Hi,
Your mention of modifying system files makes me think my comments were misread, so I have added some comments below. In no way am I suggesting changing a system file. Quite the opposite: I am suggesting adding code around your select() call to handle the errors that TCP passes up. I see from your comments that you are comparing it to Linux and Windows, but those are significantly larger systems, and there may be some differences on a small embedded platform.
fanat9 wrote:Well... I just look on select() as a "high level" function. What I mean by that:
Clearly, to get data ready to read (ready bits set on "read_fds") a lot of things have to be done. But its all done behind the screen (handling ethernet frames, when IP protocol on top of ethernet, and finally TCP). So, on Netburner platform(or should I use word uCOS?), we have Ethernet driver thread/task with priority 38, IP task with priority 39, TCP task 40 and finally users tasks.
So, I just naturally expected to have at least some problems with connection reported thru errors file descriptor set. Like, for example, unplugged ethernet cable or when remote system sent packet with FIN bit set and other including "no response from remote host" when KeepAlive feature is turned On.
Only the errors reported by standard TCP, not including keepalive, are passed up. I understand you are saying this should be "standard" or included, but it is not. So an unplugged cable, a RST, or other problems will only be reported to select() if data is in transit when one of those events occurs.
Now KeepAlive: Yes, to get status of tcp connection you have to try to send some data to remote host. But it is a part of tcp protocol. Its called KeepAlive, which is basically SYN packet with zero data to be send to remote system and remote system have to ACK it the same way it ACK any other packet. So it can be used without any modifications on remote system, as long as its in compliance with TCP.
I think keepalive is a good thing to have, and I did mention it in my previous post. However, it is optional in the TCP RFC, and per the RFC the default idle period must be no less than 2 hours. That is not very useful, so in practice it is usually configured to be much shorter. Again, I am *not* disagreeing with you that keepalive is a good thing to have, or that it would be great to have select() include it automatically. I'm just trying to give you some info for implementing it on the NetBurner system, and to make it very clear that I am not recommending changing system files.
**********
Under Transmission Control Protocol (TCP) keepalives are an optional feature, and if included must default to off.[1] The keepalive packet contains null data. In an Ethernet network, a keepalive frame length is 60 bytes, while acknowledge to this, also a null data frame, is 54 bytes. There are three parameters related to keepalive:
Keepalive time is the duration between two keepalive transmissions in idle condition. TCP keepalive period is required to be configurable and by default is set to no less than 2 hours.
Keepalive interval is the duration between two successive keepalive retransmissions, if acknowledgement to the previous keepalive transmission is not received.
**********
No argument here either, but they are different operating systems.
I did search where SetHaveError() used in tcp.cpp file and found just two places: in socket_struct::DoClose() and socket_struct::ClearPending().
Don't know what to do... not really want to start modify system stuff.
Have you looked at the keepalive example? It seems like you could have the select() timeout drive the keepalive processing for your connections. How often to run it depends on how long you can wait to detect a failure, but if you only need to detect a dead connection while no data is being transmitted, doing it once every 5 minutes or so might work. This doesn't require any system mods. Just suggesting a possible solution.
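As a rough sketch of that idea, the bookkeeping below decides what to do each time select() times out: nothing, send a keepalive probe, or give up on the connection. Everything here (ConnState, noteRx, onSelectTimeout, the tick arguments) is hypothetical application code, not the NetBurner keepalive example or any system API; the actual probe would be sent with whatever call that example uses.

```cpp
#include <cstdint>

// Hypothetical per-connection bookkeeping kept by the application.
enum class IdleAction { None, SendProbe, DeclareDead };

struct ConnState {
    uint32_t lastRxTick = 0;     // tick when data (or a probe ACK) last arrived
    uint32_t probeSentTick = 0;  // tick when the outstanding probe was sent
    bool probePending = false;   // true while a probe is unanswered
};

// Call whenever anything is received from the peer.
inline void noteRx(ConnState &c, uint32_t now) {
    c.lastRxTick = now;
    c.probePending = false;
}

// Call from the select() timeout branch. Unsigned subtraction keeps the
// comparisons correct if the tick counter wraps around.
inline IdleAction onSelectTimeout(ConnState &c, uint32_t now,
                                  uint32_t idleLimit, uint32_t probeTimeout) {
    if (c.probePending) {
        // A probe is outstanding: give up once it has gone unanswered too long.
        if (now - c.probeSentTick >= probeTimeout) return IdleAction::DeclareDead;
        return IdleAction::None;
    }
    if (now - c.lastRxTick >= idleLimit) {
        // Idle for too long: ask the caller to send a keepalive probe.
        c.probePending = true;
        c.probeSentTick = now;
        return IdleAction::SendProbe;
    }
    return IdleAction::None;
}
```

With idleLimit set to roughly 5 minutes' worth of ticks, the task only sends a probe on a quiet connection, and closes the socket itself when DeclareDead comes back. All of this lives in the user task, so no system file is touched.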