Web Server Crashes + CLOSE_WAIT

Jay Harper

2006-03-15 13:20:46 UTC

Has anyone seen situations where the web server process essentially
crashes and stops serving web pages followed by 4D Server crashing? When
the crash occurs you see a lot of CLOSE_WAIT connections to the port
when doing a netstat...

Here's the background... The web server in question gets about 800,000
page hits per month. The max in any half hour period is about 1500 to
1800. It's running 4D 2004.3 on a dual processor, 2.0 GHz XServe with
plenty of RAM. The 4D web server is proxied by Apache which handles all
the requests for static pages and just hands off dynamic page requests
to 4D. Pages are served using 4D non-contextual web templates.

When a crash occurs the activity bar on 4D Server will go up while all
visible processes are at 0%. At this point if you try to shut down the
server (File->Quit) the server will hang and you'll have to force quit
the application. After this point the web server will stop responding to
requests, and eventually the server will be non-responsive. When you
look at the connections to the web serving port you see a lot of
CLOSE_WAIT statuses.

I've done all sorts of research trying to track down this issue and at
the moment I'm trying to figure out what would cause a CLOSE_WAIT and
how it can be fixed. It appears that CLOSE_WAIT can be the result of a
keep alive connection that is not properly closed by the client, but the
"Use Keep Alive Connections" checkbox is NOT selected in Preferences,
and changing the "Inactive Web Process Timeout" setting has no effect.
On top of everything I changed the proxy settings and all requests from
Apache should be HTTP 1.0 with no keep alive connections, but I need to
check that I implemented that correctly.

If anyone has seen anything like this or has any comments on the
situation, I'd appreciate it greatly. Tech support has been a little
help, but I haven't heard from them in days now...

Thanks,

- Jay Harper
Slicksurface LLC
New York

**********************************************************************
4th Dimension Internet Users Group (4D iNUG)
Unsub: mailto:4D_Tech-off-d2/***@public.gmane.org
**********************************************************************

pdavis

2006-03-16 01:47:29 UTC

Post by Jay Harper
Has anyone seen situations where the web server process essentially
crashes and stops serving web pages followed by 4D Server crashing? When
the crash occurs you see a lot of CLOSE_WAIT connections to the port
when doing a netstat...
...
When a crash occurs the activity bar on 4D Server will go up while all
visible processes are at 0%. At this point if you try to shut down the
server (File->Quit) the server will hang and you'll have to force quit
the application.

I've had this happen. In my case the fault was my own. I had an
incorrectly designed On Web Connection database method which would allow
other methods to be invoked directly from the URL. This resulted in the
possibility of the other methods being invoked recursively. Depending on
how web users were actually using the system, the server would operate fine
for weeks and then we would get several freezes/crashes per day. I was able
to find the problem by writing a call trace to a log file. This may not be
the cause of your freezes, but it might be worth taking a look. Once the
code was corrected, the freezes stopped and have not resurfaced in 18 months
or so.

Paul Davis
Enterprise Software
K12 America
pdavis-***@public.gmane.org
Westport, CT

Lahav Wolach

2006-03-16 04:29:09 UTC

Jay,

I have seen this with a web services server. In our case, the problem
was traced to a problem with a remote network card/IP stack that kept
opening new ports without closing them. The ports do close after a
timeout period; however, if connections are opened faster than the
open ones can close, and the number of open ports exceeds some threshold,
the server will crash.

Lahav

John DeSoi

2006-03-16 06:21:46 UTC

Post by Jay Harper
I've done all sorts of research trying to track down this issue and
at the moment I'm trying to figure out what would cause a
CLOSE_WAIT and how it can be fixed. It appears that CLOSE_WAIT can
be the result of a keep alive connection that is not properly
closed by the client, but the "Use Keep Alive Connections" checkbox
is NOT selected in Preferences, and changing the "Inactive Web
Process Timeout" setting has no effect. On top of everything I
changed the proxy settings and all requests from Apache should be
HTTP 1.0 with no keep alive connections, but I need to check that I
implemented that correctly.
If anyone has seen anything like this or has any comments on the
situation, I'd appreciate it greatly. Tech support has been a
little help, but I haven't heard from them in days now...

I don't think CLOSE_WAIT has anything specifically to do with keep-
alive connections. Here is the description from the netstat man page:

CLOSE_WAIT: The socket connection has been closed by the remote peer,
and the system is waiting for the local application to close its half of
the connection.

So to me it sounds like the client side has closed, but the server
side has not.
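
That situation can be reproduced on the loopback interface. The following is a minimal sketch in Python (the thread itself contains no code, and the function name is made up for illustration). The remote peer's close shows up locally as an empty read, and the socket then sits in CLOSE_WAIT until the local application calls close().

```python
import socket

def peer_close_is_seen_as_eof():
    """Return what the local side reads after the remote peer has closed."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))       # port 0: let the OS pick a free port
    listener.listen(1)
    remote = socket.create_connection(listener.getsockname())
    local, _ = listener.accept()

    remote.close()                         # remote peer sends its FIN
    data = local.recv(1024)                # returns b"" once the FIN has arrived
    # Between this point and local.close(), netstat would list the
    # accepted socket as CLOSE_WAIT.
    local.close()                          # local application closes its half
    listener.close()
    return data
```

An application that never reaches the local.close() line is what leaves the connection in CLOSE_WAIT.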

The CLOSE_WAIT setting on Windows is a fairly high number (several
minutes). Until you can track down the source of the problem, it
might help to change the setting to keep from running out of
resources. You'll find some hints on this here:

http://www.winguides.com/forums/showflat.php?Cat=&Board=genwinnt&Number=130736&page=8&view=collapsed&sb=5&part=

John DeSoi, Ph.D.
http://pgedit.com/
Power Tools for PostgreSQL

Jay Harper

2006-03-17 22:12:54 UTC

Post by John DeSoi

Post by Jay Harper
I've done all sorts of research trying to track down this issue and
at the moment I'm trying to figure out what would cause a CLOSE_WAIT
and how it can be fixed. It appears that CLOSE_WAIT can be the
result of a keep alive connection that is not properly closed by the
client, but the "Use Keep Alive Connections" checkbox is NOT
selected in Preferences, and changing the "Inactive Web Process
Timeout" setting has no effect. On top of everything I changed the
proxy settings and all requests from Apache should be HTTP 1.0 with
no keep alive connections, but I need to check that I implemented
that correctly.
If anyone has seen anything like this or has any comments on the
situation, I'd appreciate it greatly. Tech support has been a little
help, but I haven't heard from them in days now...

I don't think CLOSE_WAIT has anything specifically to do with
keep-alive connections. Here is the description from the netstat man page:
CLOSE_WAIT: The socket connection has been closed by the remote peer,
and the system is waiting for the local application to close its half of
the connection.
So to me it sounds like the client side has closed, but the server
side has not.
The CLOSE_WAIT setting on Windows is a fairly high number (several
minutes). Until you can track down the source of the problem, it
might help to change the setting to keep from running out of
resources. You'll find some hints on this here:
http://www.winguides.com/forums/showflat.php?Cat=&Board=genwinnt&Number=130736&page=8&view=collapsed&sb=5&part=

John,

As I read more, it seems you're right... CLOSE_WAIT isn't necessarily
about keep-alive connections. The problem is continuing even though
Apache is now only doing HTTP 1.0 connections and 4D is configured not
to do keep-alive connections.

I'm now wondering whether this is what happens when Apache times out (I
had the proxy timeout set to 30 seconds). So, I've upped that setting
and will wait to see if it has any effect.

In the meantime, does anyone know how to lower the CLOSE_WAIT setting
for OS X? Apparently the default is 2 minutes.

- Jay Harper
Slicksurface LLC
New York

Willie Alberty

2006-03-18 06:40:36 UTC

Post by Jay Harper

Post by John DeSoi

Post by Jay Harper
I've done all sorts of research trying to track down this issue
and at the moment I'm trying to figure out what would cause a
CLOSE_WAIT and how it can be fixed. It appears that CLOSE_WAIT
can be the result of a keep alive connection that is not
properly closed by the client, but the "Use Keep Alive
Connections" checkbox is NOT selected in Preferences, and
changing the "Inactive Web Process Timeout" setting has no
effect. On top of everything I changed the proxy settings and
all requests from Apache should be HTTP 1.0 with no keep alive
connections, but I need to check that I implemented that correctly.

I don't think CLOSE_WAIT has anything specifically to do with
keep-alive connections. Here is the description from the netstat
man page:
CLOSE_WAIT: The socket connection has been closed by the remote peer,
and the system is waiting for the local application to close its half of
the connection.
So to me it sounds like the client side has closed, but the
server side has not.

As I read more, it seems you're right... CLOSE_WAIT isn't necessarily
about keep-alive connections. The problem is continuing even though
Apache is now only doing HTTP 1.0 connections and 4D is configured
not to do keep-alive connections.
I'm now wondering whether this is what happens when Apache times
out (I had the proxy timeout set to 30 seconds). So, I've upped
that setting and will wait to see if it has any effect.
In the meantime, does anyone know how to lower the CLOSE_WAIT
setting for OS X? Apparently the default is 2 minutes.

It's not something that you should try to change. The timeout period
is used by the OS as a fail-safe; it is the responsibility of the
local application to properly and promptly close its own network
connections when done. It sounds like you've either uncovered a bug
in 4D's web server, or you have a bug in your application code that
is preventing 4D from properly closing the connection.

The CLOSE_WAIT status is part of the normal life cycle of a TCP
connection. TCP is a full-duplex protocol with each side maintaining
an independent state for the connection. CLOSE_WAIT indicates that
the remote side of the connection (in your case, Apache) has signaled
its intention to close. It is waiting for you to send any remaining
data and close your side of the connection as well.

When trying to debug a TCP connection issue, it is helpful to refer
to the connection state diagram on page 22 of the TCP specification
(RFC 793, located at http://www.rfc-editor.org/rfc/rfc793.txt). This
diagram is divided vertically into two halves. The upper half
describes the state flow for opening a connection, the lower half for
closing the connection. In the middle is the ESTABLISHED state
(status 8 for ITK and 4DIC users), where TCP connections spend most
of their time.

There are typical paths through this diagram, depending on whether
you are acting as the client or the server. For example, when opening
a connection, a server will perform a passive open (creating a
LISTEN) and wait to be contacted (top center and left of the
diagram). A client performs an active open and attempts to contact a
listening server (top right). Once both sides have completed what is
called the three-way handshake, they both enter the ESTABLISHED state
where most of the data exchange takes place.

At any time, either side may close the connection. The path through
the remainder of the diagram depends on which side initiated the
close. If you close the connection, you follow the bottom-left path,
through the FIN_WAIT-1 and FIN_WAIT-2 states. If the other side
closes the connection, you move through the CLOSE_WAIT and LAST-ACK
states. Both eventually end up at CLOSED.

With HTTP requests, it is usually the server that initiates the
close. After receiving an incoming request, it responds with the
appropriate data, then promptly closes the connection and creates a
replacement LISTEN for the next request. Here is a typical path
through the diagram:

1. Server creates new listen by performing passive open. Server
state: LISTEN; Client state: n/a
2. Server waits for incoming connection.
3. Client attempts to connect to server by performing active open.
Server state: LISTEN; Client state: SYN_SENT
4. Three-way handshake begins by server receiving the first SYN
packet. It sends an acknowledgment and its own SYN. Server state:
SYN_RCVD; Client state: SYN_SENT.
5. Step two of handshake continues with client receiving and sending
an acknowledgment of the server's SYN. It then moves to the
established state where it can begin sending and/or receiving data.
Server state: SYN_RCVD; Client state: ESTABLISHED.
6. Handshake concludes with the server's receipt of the client's
acknowledgment. Server is ready to talk and there is now a full-
duplex data connection in place. Server state: ESTABLISHED; Client
state: ESTABLISHED.
7. Client sends data (such as an HTTP request). Server state:
ESTABLISHED; Client state: ESTABLISHED.
8. Server reads data, sends other data in response. Server state:
ESTABLISHED; Client state: ESTABLISHED.
9. Both sides may continue to send more data if desired without
destroying the TCP connection. This is how HTTP/1.1 persistent
connections and the HTTP/1.0 keep-alive extension work since HTTP is
layered on top of TCP.
10. Server decides it is finished and begins closing the connection
by sending the close signal FIN. Note that the connection isn't
simply dropped at this point; it may be necessary for the OS to
retransmit a lost packet. Server state: FIN_WAIT-1; Client state:
ESTABLISHED.
11. Client receives the server's FIN and sends an acknowledgment.
Server state: FIN_WAIT-1; Client state: CLOSE_WAIT.
12. Server receives the acknowledgment and waits for the client to
close its side of the connection. Server state: FIN_WAIT-2; Client
state: CLOSE_WAIT.

...I must pause briefly here to point out something very important:
At this point, the TCP connection still exists, and in fact there is
still a half-duplex data connection from client to server. The reason
the connection doesn't go away at this point is because only one side
(the server) has indicated it will not be sending any more data. It
is still possible for the client to send more data and for the server
to receive it. This doesn't usually happen with HTTP, but may be
common with other TCP-based protocols. I will come back to this below...

13. Client is done with the connection and sends its own FIN. Server
state: FIN_WAIT-2; Client state: LAST-ACK.
14. Server receives the client's FIN and sends an acknowledgment. It
then moves into a sometimes confusing state called TIME_WAIT. I won't
go into TIME_WAIT here unless somebody really wants to know. Suffice
it to say that you rarely need to worry about TIME_WAIT and for all
intents and purposes, the connection is now closed on the server-
side. At some point, the OS will move the connection to CLOSED.
Server state: TIME_WAIT -> CLOSED; Client state: LAST-ACK.
15. Server creates new listen for next incoming connection. Server
state: LISTEN; Client state: n/a.
16. Original client receives the acknowledgment of its close. All
done. Server state: CLOSED; Client state: CLOSED.
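
The closing steps above (10 through 16) can be traced with a small lookup table. This is only a sketch in Python: the event names send_fin, recv_fin, recv_ack and timeout are labels invented here, the acknowledgment each side sends on receiving a FIN is left implicit, and only the closing half of the RFC 793 diagram is modeled.

```python
# (current state, event) -> next state, closing half of the diagram only.
TRANSITIONS = {
    ("ESTABLISHED", "send_fin"): "FIN_WAIT-1",
    ("ESTABLISHED", "recv_fin"): "CLOSE_WAIT",
    ("FIN_WAIT-1", "recv_ack"): "FIN_WAIT-2",
    ("FIN_WAIT-1", "recv_fin"): "CLOSING",
    ("FIN_WAIT-2", "recv_fin"): "TIME_WAIT",
    ("CLOSING", "recv_ack"): "TIME_WAIT",
    ("CLOSE_WAIT", "send_fin"): "LAST-ACK",
    ("LAST-ACK", "recv_ack"): "CLOSED",
    ("TIME_WAIT", "timeout"): "CLOSED",
}

def walk(events):
    """Apply (side, event) pairs; return (server, client) states after each."""
    states = {"server": "ESTABLISHED", "client": "ESTABLISHED"}
    history = []
    for side, event in events:
        states[side] = TRANSITIONS[(states[side], event)]
        history.append((states["server"], states["client"]))
    return history

# The server-initiated close described in the list above.
SERVER_CLOSES_FIRST = [
    ("server", "send_fin"),   # step 10
    ("client", "recv_fin"),   # step 11
    ("server", "recv_ack"),   # step 12
    ("client", "send_fin"),   # step 13
    ("server", "recv_fin"),   # step 14: server enters TIME_WAIT
    ("server", "timeout"),    # step 14: OS later moves it to CLOSED
    ("client", "recv_ack"),   # step 16
]
```

Calling walk(SERVER_CLOSES_FIRST) yields one (server, client) pair per event, which can be compared line by line with the states listed in the steps.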

There is no requirement that the server be the one to initiate the
connection close; the client can begin to close the connection
immediately after sending the request. It will wait for its requested
data, followed by the server's close. This happens frequently with
HTTP applications and is often a "gotcha" for people writing their
own web servers with ITK. The path through the diagram is similar,
but the roles reverse at the end:

Steps 1-7 are the same as above.
8. Before waiting for the server's response, the client indicates it
will not be sending any additional data by beginning to close the
connection. Remember that this is only a half-close, just the client-
to-server data connection. The server-to-client data connection is
still open. Server state: ESTABLISHED; Client state: FIN_WAIT-1.
9. Server reads all available data then receives the client's FIN.
Server sends acknowledgment of the FIN but continues to work on the
request. Server state: CLOSE_WAIT; Client state: FIN_WAIT-1.
10. Client receives the expected acknowledgment and continues to
receive whatever other data the server sends. Server state:
CLOSE_WAIT; Client state: FIN_WAIT-2.

...Here is where CLOSE_WAIT usually occurs in HTTP. At this point,
the server has the client's request and is working on fulfilling it.
The client has closed its side of the data connection, signaling that
it won't send any more data. However, this does not mean it is
unwilling to receive data. This is why the state is called
CLOSE_WAIT: it will be closed soon, but the client must wait until
the server is done with the request. For HTTP servers, it is
perfectly valid to send data to the client at both the ESTABLISHED
and CLOSE_WAIT states. For HTTP clients, you can receive data at any
of the ESTABLISHED, FIN_WAIT-1, or FIN_WAIT-2 states. I will cover
the case where the client is unwilling to receive any more data below...

11. Server works on request and finishes sending the requested data.
Server state: CLOSE_WAIT; Client state: FIN_WAIT-2.
12. Server indicates it is done sending data by closing the
connection. Server state: LAST-ACK; Client state: FIN_WAIT-2.
13. Client receives the server's FIN and sends an acknowledgment. It
then moves through TIME_WAIT and on to CLOSED. The connection is now
effectively closed on the client-side. Server state: LAST-ACK; Client
state: TIME_WAIT -> CLOSED.
14. Server receives acknowledgment of its FIN. All done. Server
state: CLOSED; Client state: CLOSED.
15. Server creates new listen for next incoming connection. Server
state: LISTEN; Client state: n/a.
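
The client-initiated half-close above can be tried out with ordinary sockets. The sketch below is Python on the loopback interface, not 4D or ITK code; the function names and the "echo" response are made up for illustration. The client calls shutdown(SHUT_WR) to send its FIN while keeping the receive direction open, and the server keeps sending after it has seen that FIN.

```python
import socket
import threading

def serve_one(listener):
    """Read the request until the client's FIN, respond, then close."""
    conn, _ = listener.accept()
    request = b""
    while True:
        chunk = conn.recv(1024)
        if not chunk:                 # client's FIN: this side is now CLOSE_WAIT
            break
        request += chunk
    conn.sendall(b"echo: " + request)  # sending is still valid in CLOSE_WAIT
    conn.close()                       # server's FIN: CLOSE_WAIT -> LAST-ACK

def half_close_request(payload):
    """Send payload, half-close, and return everything the server sends back."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))    # port 0: let the OS pick a free port
    listener.listen(1)
    worker = threading.Thread(target=serve_one, args=(listener,))
    worker.start()

    client = socket.create_connection(listener.getsockname())
    client.sendall(payload)
    client.shutdown(socket.SHUT_WR)    # half-close: FIN sent, receiving still open
    response = b""
    while True:
        chunk = client.recv(1024)
        if not chunk:                  # server's FIN
            break
        response += chunk
    client.close()
    worker.join()
    listener.close()
    return response
```

If serve_one skipped its conn.close(), the server socket would remain in CLOSE_WAIT and the client in FIN_WAIT-2, which is the pattern discussed in this thread.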

There is a special case where both sides decide to close the
connection simultaneously. This is rare, but in this case, both send
the closing FIN and move to the FIN_WAIT-1 state. Both will receive
the FIN and send an acknowledgment, moving to the CLOSING state.
After receiving each acknowledgment, both move through TIME_WAIT to
CLOSED.

The last thing to mention is the circumstance where one side chooses
to close the connection but is unwilling to receive any more data.
Because TCP was designed to be a robust protocol, both sides need to
agree that the connection should be closed before it is truly closed.
Most well-behaved HTTP clients will simply read in but discard the
server's response, waiting for the closing FIN packet to be received.
This allows both sides to close the TCP connection cleanly. If the
client is unwilling or unable to do that, it may send what's called a
RST (reset) packet which is essentially a "kill it now" order. Upon
sending the RST, the client force-closes its TCP connection. Upon
receiving the RST, the server force-closes its TCP connection. The
status instantly jumps to CLOSED and send and receive attempts will
return an error indicating the connection was killed.
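
One common way for an application to produce that RST is to close a socket with SO_LINGER enabled and a zero timeout. The sketch below is Python and assumes a Linux or macOS host, where the linger structure is two C ints (Windows lays it out differently). The function name is made up for illustration.

```python
import socket
import struct

def reset_instead_of_close():
    """Abort a loopback connection and report what the other side observes."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    server_side, _ = listener.accept()

    # l_onoff=1, l_linger=0: close() sends RST instead of the normal FIN.
    client.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                      struct.pack("ii", 1, 0))
    client.close()

    try:
        server_side.recv(1024)
        outcome = "clean close"        # a FIN would have produced b"" here
    except ConnectionResetError:
        outcome = "reset"              # the RST surfaces as an error
    finally:
        server_side.close()
        listener.close()
    return outcome
```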

Getting back to Jay's situation, it appears that the Apache proxy is
generating a request, sending it to the 4D web server, and either
half-closing immediately, or giving up after some timeout period.
Whatever the case, it is not force-closing the TCP connection (as
Apache is a well-behaved HTTP client); instead it is waiting for 4D to
perform its side of the close. This is why you are seeing so many connections
in the CLOSE_WAIT state; either your application code or 4D is not
finishing the job. I would suspect that if you ran netstat on the
Apache proxy server, you would see the same number of connections in
the FIN_WAIT-2 state.
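
To compare the two machines, the netstat output can be tallied by state. The helper below is a sketch in Python; it assumes the usual six-column TCP lines (protocol, receive queue, send queue, local address, foreign address, state) and accepts both the "address.port" form printed on OS X and the "address:port" form printed on Linux. The function name is made up for illustration.

```python
from collections import Counter

def count_states(netstat_text, port):
    """Count TCP states for connections whose local address uses `port`."""
    counts = Counter()
    suffixes = ("." + str(port), ":" + str(port))
    for line in netstat_text.splitlines():
        fields = line.split()
        # Skip headers and non-TCP lines.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local_address, state = fields[3], fields[5]
        if local_address.endswith(suffixes):
            counts[state] += 1
    return counts
```

The text can be captured with subprocess.run(["netstat", "-an"], capture_output=True, text=True).stdout and passed in along with the web server's port. On the proxy the port of interest appears in the foreign address column, so the column index would need to change there.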

I must admit a bit of ignorance here with respect to 4D's built-in
web server. I toyed with it for a weekend back in the 4D 6.7 days,
but since then I have worked only on built-from-scratch web servers,
one using ITK, another using a custom plug-in, where I had a great
deal of control over the underlying HTTP and TCP connections. 4D's
automatic handling of the web connections is a blessing in that there
is less code for you to write, but a curse to debug because you are
at the complete and utter mercy of whatever 4D decides to let you see
(which, in this case, is very little).

4D's web server uses a classic master-slave design. The web server
process serves as the master, delegating new connection requests to
its slaves. The slave is then responsible for the entirety of the
connection. Because you are seeing a lot of TCP connections at the
CLOSE_WAIT state, I would suspect that you have one or more processes
that never return from the On Web Connection method. 4D itself is
responsible for closing the underlying TCP connection and will only
do so when it believes the web request is complete, that is to say,
when On Web Connection finishes execution.

You probably also have a user that is compounding the situation: The
user requests some resource. It doesn't show up. So they click
"Reload" in their browser. It doesn't show up. So they click "Reload"
in their browser. And so on. With each new request, another slave
process gets tied up. Eventually, all slaves are busy doing...
something, and the web server cannot delegate new incoming requests.
It effectively goes deaf and you "crash."

Here are some things to try and/or look for:

Turn on the web server's log file. It's enabled in the Web > Advanced
pane of the database preferences. It's not the greatest log file, and
entries are only written after the hit completes (and then only after
a buffer is filled), but it might give you an idea of the user's
requests that lead up to the crash.

Since I suspect that On Web Connection is not completing, create your
own log file that simply records when that method starts and stops
for each process. I'd also include the request URL, provided in $1.
At the next freeze, examine the log file to verify that every process
has indeed completed. If so, you're probably looking at a bug in 4D
because you have no control beyond that point.
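
Checking such a log by eye gets tedious, so here is a sketch of a checker in Python. The log format is an assumption made up for this example: one line "START <process> <url>" when On Web Connection begins and one line "STOP <process>" when it ends. Adjust the parsing to whatever your own method writes.

```python
def unfinished_requests(log_lines):
    """Return {process: url} for START entries that have no matching STOP."""
    open_requests = {}
    for line in log_lines:
        parts = line.split(maxsplit=2)
        if len(parts) == 3 and parts[0] == "START":
            open_requests[parts[1]] = parts[2]
        elif len(parts) >= 2 and parts[0] == "STOP":
            # A process number may be reused later, so drop it once it stops.
            open_requests.pop(parts[1], None)
    return open_requests
```

Anything left in the result after a freeze is a request whose On Web Connection call never returned.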

You could also create a different log file and write the complete
contents of the HTTP request header, provided in $2, out at the top
of every On Web Connection call. This file could grow to be quite
large, so be sure to rotate it regularly--perhaps daily. The next
time you encounter your freeze, look at the last few entries in the
log; they should give you an idea of the specific requests that were
in process at the time.
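
For comparison, this is how the same idea looks with Python's standard logging module, which can rotate the file daily on its own. It is a sketch only, since 4D code would have to do the rotation by hand. The logger name, the BEGIN/END markers and the 14-file retention are arbitrary choices made here.

```python
import logging
from logging.handlers import TimedRotatingFileHandler

def make_header_logger(path):
    """Create a logger that starts a new file at midnight, keeping 14 old files."""
    logger = logging.getLogger("http_headers")
    logger.setLevel(logging.INFO)
    if not logger.handlers:            # avoid duplicate handlers on repeat calls
        handler = TimedRotatingFileHandler(path, when="midnight", backupCount=14)
        handler.setFormatter(logging.Formatter("%(asctime)s %(message)s"))
        logger.addHandler(handler)
    return logger

def log_request(logger, url, raw_header):
    """Record the full header before any processing starts."""
    # Written first, so the entry exists even if the handler never finishes.
    logger.info("BEGIN %s\n%s\nEND", url, raw_header.rstrip())
```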

Another possibility is that you are experiencing some sort of denial-
of-service attack. Any one of the log file options described above
can help you identify that.

I'd also recommend opening and leaving the Runtime Explorer window
open on the server. That way you'll be able to see if any of the
invisible kernel processes (at least the ones that are 4D processes,
like the Web Server) are chewing up the CPU.

Hope this helps!

--

Willie Alberty, Owner
Spenlen Media
willie-***@public.gmane.org

http://www.spenlen.com/

Jay Harper

2006-03-18 15:17:19 UTC

Willie,

What an incredible response!!! Thank you. It will take me a little while
to fully study what you said, and it's too long to meaningfully quote,
but here are answers to some of the issues you brought up...

Is this a problem with Apache timing out? Apparently not. When I look at
Apache's error log I see cases of timeouts, but they don't correspond to
the problems with CLOSE_WAIT (at least not recently). Yesterday I upped
Apache's timeout limit from 30 seconds to 300 seconds just so Apache
timing out wouldn't be a problem. However, the problem has occurred
literally while I was watching the server on two occasions and I can say
that the connection had been there for a number of seconds before going
into a persistent CLOSE_WAIT status.

Another thing I've seen by putting some "pre-failure warning" code in is
that sometimes CLOSE_WAITs will resolve themselves, and other times they
won't. There were two cases overnight where I detected the existence of
CLOSE_WAITs for a minute or two and then they cleared up, but then later
this morning I waited 10 minutes and the CLOSE_WAITs didn't clear up.
This means OS X's fail safe of terminating TCP connections that are in
CLOSE_WAIT for more than 2 minutes did not kick in, which seems truly odd.

I should also note that the connection which is not closing is the one
with 4D 'local' and Apache as 'foreign'. Prior to the persistent
CLOSE_WAIT you actually see connections in both directions, but the one
with 4D 'foreign' and Apache 'local' always closes correctly.

Is it a DoS attack? Apparently not... I looked at 4D's logweb.txt for
the time period around the case of persistent CLOSE_WAITs this morning
and there were generally about 20 requests per minute (range 12 to 21)
in the minutes preceding the persistent CLOSE_WAITs - that's actually a
bit on the quiet side for this server (busy would be averaging 30 to 50
per minute). So that isn't remarkable and all of the requests seemed
pretty routine. Then again, those are only the requests that 4D wrote
out to logweb.txt; if 4D had never responded to a request we'd expect
to see it in the Apache error log, and we don't. Either way, I will start
writing out one or both of the logs you suggest - they're good ideas for
the time being.

Your suggestion of opening the runtime explorer was a good one, but I
ran into a problem doing it - it just won't open. I quit 4D Server and
tried again - same result. So not sure what's happening there.

So at this point my big questions are:
- Why is the connection in CLOSE_WAIT status for so long to start with?
- Why doesn't OS X terminate it after two minutes of CLOSE_WAIT like
it should?
- Why can't I open Runtime Explorer on server?

I'll continue to beef up my pre-failure warning code and implement more
logging... Hopefully that will shed additional light on what's going on,
but at the moment it appears to be a problem with how 4D is handling the
closing of TCP connections compounded with OS X's failsafe not kicking
in to clean things up when there's a problem with 4D.

Thanks again,

- Jay Harper
Slicksurface LLC
New York

Willie Alberty

2006-03-22 03:13:28 UTC

Post by Jay Harper
This means OS X's fail safe of terminating TCP connections that are
in CLOSE_WAIT for more than 2 minutes did not kick in, which seems
truly odd.
- Why doesn't OS X terminate it after two minutes of CLOSE_WAIT
like it should?

A long post on a Friday night after a long week of work got the
better of me: I misspoke. The two minute timeout I was describing,
and the one John DeSoi referred to in his response, are actually for
the TIME_WAIT state, not CLOSE_WAIT. The fact that they are named
similarly contributes to the confusion surrounding TIME_WAIT.

I didn't describe TIME_WAIT completely before because you really only
need to care about it if you are writing a TCP stack at the OS-level.
Remember from the state diagram that TIME_WAIT is the second-to-last
state prior to CLOSED when your side is the one initiating the close.

TCP is a robust, reliable protocol, ensuring the delivery of every
packet, including the control packets used to set-up and tear-down
connections. TCP guarantees delivery by requiring an acknowledgment
of each and every raw packet transmitted. Since it is possible for
packets to be lost on the Internet, and impossible to determine in
which direction the packet was lost, if the transmitting side does
not receive a timely acknowledgment of a sent packet, it will send it
again (RFC 793 §1.5).

TIME_WAIT exists to protect against the possibility of the final FIN
packet, or the response ACK packet, being lost. If the remote side
doesn't receive the expected acknowledgment of its close, it will
send the FIN again. There are two possible causes:

1. The original FIN was lost to the network and we never received it,
so we were still in the FIN_WAIT-2 state. We receive it this time,
send the required ACK, then move to the TIME_WAIT state.

2. Our original ACK was lost to the network so the remote side never
received it; it sends a FIN again. We are already in the TIME_WAIT
state, so we simply send back another ACK (§3.9).

Without TIME_WAIT, there would be nothing on our side to receive this
second FIN, and the remote side would be stuck in the LAST-ACK state
forever. TCP connections in the TIME_WAIT state are managed entirely
by the host OS and die off after a timeout period (called 2MSL). Most
modern OSes use a static configurable value for this timeout,
typically two minutes.

The only reason it is configurable is that under highly extraordinary
circumstances, using a two minute idle period could cause a temporary
depletion of ephemeral TCP ports (since there are only 65535). It is
also possible that an unusually slow network link could necessitate
using a higher timeout value.
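
To put rough numbers on "highly extraordinary", here is a back-of-the-envelope calculation in Python. It uses the two-minute timeout and the 65535 figure from above, plus the peak load Jay reported (about 1800 hits in a half hour). It is only an upper bound: the usable ephemeral range is smaller in practice, and the limit really applies per remote address and port.

```python
TIME_WAIT_SECONDS = 120    # the typical two-minute timeout mentioned above
PORT_NUMBERS = 65535       # figure quoted above; the usable range is smaller

def max_sustained_rate(ports=PORT_NUMBERS, hold=TIME_WAIT_SECONDS):
    """Connections per second at which every port would sit in TIME_WAIT."""
    return ports / hold

def sockets_in_time_wait(rate_per_second, hold=TIME_WAIT_SECONDS):
    """Steady-state number of TIME_WAIT sockets at a given close rate."""
    return rate_per_second * hold

# Jay's busiest half hour: about 1800 hits in 1800 seconds.
peak_rate = 1800 / (30 * 60)
```

Here max_sustained_rate() comes to about 546 connections per second, while sockets_in_time_wait(peak_rate) is 120, so a server at this load is nowhere near depleting ports through TIME_WAIT.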

Post by Jay Harper
I should also note that the connection which is not closing is the
one with 4D 'local' and Apache as 'foreign'. Prior to the
persistent CLOSE_WAIT you actually see connections in both
directions, but the one with 4D 'foreign' and Apache 'local' always
closes correctly.
- Why is the connection in CLOSE_WAIT status for so long to start with?

CLOSE_WAIT is a perfectly valid state for a TCP connection, and may
remain that way (academically, at least) indefinitely. Here is a
potential scenario:

- Server creates a LISTEN.
- Client connects to the server. Both sides handshake, enter
ESTABLISHED.
- Client sends some request data, say a request for a stock ticker
update every 30 seconds, then closes its side of the data connection
(as it has no other data to transmit).
- Client is now in FIN_WAIT-2 and server is in CLOSE_WAIT.
- Every 30 seconds, the server sends an updated quote over the
established TCP connection. Forever.

This isn't likely to happen in practice--the client really should be
opening new single-request connections every 30 seconds instead--but
nothing in the protocol design prohibits it. Only the local
application (4D) has the authority to decide when it is finished with
a TCP connection, and it is its responsibility to properly close it
when done.

Post by Jay Harper
Your suggestion of opening the runtime explorer was a good one, but
I ran into a problem doing it - it just won't open. I quit 4D
Server and tried again - same result. So not sure what's happening
there.
- Why can't I open Runtime Explorer on server?

You've got me on this one... Does the rest of the UI on the server
behave properly? Have you tried the keyboard combination (cmd-shift-
F9 on Mac)?

Post by Jay Harper
Logging the complete HTTP header turned out to be the best piece of
advice. I did a couple of extra log files trying to figure out what
was going on, but in the end it turned out to be bad code on my
side that didn't handle attempts by hackers and botnets the way it
should have. The server handles multiple hosts and when a request
came through for a host that wasn't configured to be handled by 4D
I wasn't handling the error properly. The only log that showed the
requested host was the one with the full HTTP header, hence it's
the one that pointed me to the source of the problem.
When I get a chance I'll figure out what 4D was doing that resulted
in a CLOSE_WAIT. IMHO it should have just thrown an error and
continued on and closed the connection gracefully.

I've experienced three distinct behaviors 4D exhibits when a fatal
bug occurs in my code. In preferred order, they are:

1. Runtime error dialog appears on the server. Most errors are caught
by 4D's runtime environment before they wreak havoc, and most error
dialogs are helpful. The best part is that these errors, if left
alone, don't immediately crash the server and I can restart semi-
gracefully.

2. 4D aborts the currently running process, leaving the rest
unaffected and the application continues to run. I really don't like
these that much but at least the app didn't crash. Unfortunately,
you've got little to nothing to go on for debugging.

3. 4D crashes. This is the least desirable, obviously. Especially so
with 4D (and similar development environments, including REALbasic)
because you can't even use helpful tools like Mac OS X's built-in
CrashReporter to diagnose the problem; the stack trace is 4D's code,
not yours.

In all likelihood, you were experiencing the mini-crashing behavior
of #2. The web server slave process that was responsible for
servicing the request and closing the connection vanished, taking the
underlying OS handle to the TCP connection with it. This left the
connection half-closed (the remote half) with no way to recover.
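
For comparison, here is a hedged sketch of the defensive pattern in Python (not 4D code, and not a claim about how 4D's web server works internally). The names KNOWN_HOSTS, parse_host and handle are hypothetical, and a single recv() stands in for proper request parsing. A request for an unconfigured host gets an error response, and the finally clause closes the socket whatever happens, so the connection cannot be stranded in CLOSE_WAIT by a failure in the handler.

```python
import socket

# Hypothetical set of hosts this server is configured to answer for.
KNOWN_HOSTS = {"www.example.com"}


def parse_host(request_bytes):
    """Return the Host header value (lowercased, port stripped) or None."""
    head = request_bytes.split(b"\r\n\r\n", 1)[0].decode("latin-1")
    for line in head.split("\r\n")[1:]:  # skip the request line
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "host":
            return value.strip().lower().split(":")[0]
    return None


def handle(conn):
    """Serve one request and always close the connection afterwards."""
    try:
        request = conn.recv(65536)  # simplification: assumes one read suffices
        host = parse_host(request)
        if host not in KNOWN_HOSTS:
            # Unknown or missing host: answer with an error, do not crash.
            conn.sendall(b"HTTP/1.0 404 Not Found\r\nContent-Length: 0\r\n\r\n")
            return "rejected"
        conn.sendall(b"HTTP/1.0 200 OK\r\nContent-Length: 2\r\n\r\nok")
        return "served"
    finally:
        # Runs on success, on rejection and on any exception alike.
        conn.close()
```

The design point is the finally clause: closing the socket is tied to leaving the handler, not to the handler finishing without an error.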

This could also explain why your server eventually went deaf. It is
possible that under certain circumstances, a bug exists in 4D's built-
in web server causing it not to resurrect crashed slave processes.
Eventually there would be no living slaves left to handle new
requests. 4D might then get stuck in an infinite loop looking for an
available slave--a condition that would never become true. And as we
all know, infinite loops in 4D (since it's essentially single-
threaded) cause the user interface to die, forcing you to force-quit
the app.

In a former life, I was responsible for diagnosing mysterious crashes
in a very large 4D application. In almost every case, it was log
files like yours that eventually led to the bug fixes. 4D's
debugging capabilities, especially in compiled applications, are
limited. Log files are your friend. They are my first line of defense.

Glad to hear you're up and running again.

--

Willie Alberty, Owner
Spenlen Media
willie-***@public.gmane.org

http://www.spenlen.com/

**********************************************************************
4th Dimension Internet Users Group (4D iNUG)
Unsub: mailto:4D_Tech-off-d2/***@public.gmane.org
**********************************************************************

Jay Harper

2006-03-20 05:27:14 UTC

Willie,

Logging the complete HTTP header turned out to be the best piece of
advice. I did a couple of extra log files trying to figure out what was
going on, but in the end it turned out to be bad code on my side that
didn't handle attempts by hackers and botnets the way it should have.
The server handles multiple hosts and when a request came through for a
host that wasn't configured to be handled by 4D I wasn't handling the
error properly. The only log that showed the requested host was the one
with the full HTTP header, hence it's the one that pointed me to the
source of the problem.

When I get a chance I'll figure out what 4D was doing that resulted in a
CLOSE_WAIT. IMHO it should have just thrown an error and continued on
and closed the connection gracefully.

Thanks again for your excellent comments.

- Jay Harper
Slicksurface LLC
New York

Post by Willie Alberty
You could also create a different log file and write the complete
contents of the HTTP request header, provided in $2, out at the top
of every On Web Connection call. This file could grow to be quite
large, so be sure to rotate it regularly--perhaps daily. The next
time you encounter your freeze, look at the last few entries in the
log; they should give you an idea of the specific requests that were
in process at the time.
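
The same idea, logging the full request header to a file that rotates daily, can be sketched outside 4D. The Python below is an illustration only, not the On Web Connection code discussed in this thread; make_header_logger, log_request_header, the logger name "http_headers" and the keep_days default are all assumptions. It relies on the standard library's TimedRotatingFileHandler to start a new file at midnight and discard old ones.

```python
import logging
from logging.handlers import TimedRotatingFileHandler


def make_header_logger(path, keep_days=7):
    """Create a logger that writes to `path` and rotates the file daily."""
    logger = logging.getLogger("http_headers")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep header dumps out of the root logger
    if not logger.handlers:  # avoid attaching a second handler on re-entry
        handler = TimedRotatingFileHandler(
            path, when="midnight", backupCount=keep_days
        )
        handler.setFormatter(logging.Formatter("%(asctime)s %(message)s"))
        logger.addHandler(handler)
    return logger


def log_request_header(logger, raw_header):
    """Write the complete header as one log line.

    %r escapes CR/LF, so every request occupies exactly one line and the
    last few entries are easy to read after a freeze.
    """
    logger.info("%r", raw_header)
```

Calling log_request_header() first thing in the request handler, before any other work, means the request that triggers a hang is already on disk when the hang occurs.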

Ingo Wolf

2006-03-16 11:31:32 UTC

I had something similar. My setup is 4D 2003.5, so it may not apply to 2004, but who knows...
Check the timeout setting for inactive web processes. I had it at approx. 15 minutes and the server hung every now and then. The hanging went away once I set the timeout to "never".

HTH
Ingo Wolf

Post by Jay Harper
Has anyone seen situations where the web server process essentially
crashes and stops serving web pages followed by 4D Server crashing? When
the crash occurs you see a lot of CLOSE_WAIT connections to the port
when doing a netstat...
...
If anyone has seen anything like this or has any comments on the
situation, I'd appreciate it greatly. Tech support has been a little
help, but I haven't heard from them in days now...
Thanks,
- Jay Harper
Slicksurface LLC
New York

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ViELMAC Ingo Wolf
Bahnhofstr. 25
D 55576 Badenheim
Tel. +49 (0)6701 9119823
Fax +49 (0)6701 9119824
e-mail: ingo-qZTDS41X+***@public.gmane.org
[Please avoid sending me Word attachments (use txt, rtf, or pdf)
<http://www.gnu.org/philosophy/no-word-attachments.html>]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Jay Harper

2006-03-17 21:34:16 UTC

Ingo,

Thanks, but I changed the timeout setting for inactive web processes and
it had no effect on the problem.

- Jay

Post by Ingo Wolf
I had something similar. My setup is 4D 2003.5, so it may not apply to 2004, but who knows...
Check the timeout setting for inactive web processes. I had it at approx. 15 minutes and the server hung every now and then. The hanging went away once I set the timeout to "never".
HTH
Ingo Wolf