TCP Connect gets stuck in infinite dead loop

Discussion to talk about software related topics only.
John.Ohl
Posts: 20
Joined: Mon Dec 03, 2012 7:00 pm

TCP Connect gets stuck in infinite dead loop

Post by John.Ohl »

Repeated tcp connect success, and closures can cause an indefinite loop in the tcp.cpp connect process depending on the hash of the IP and port. The problem occurs when all of the TCP packets are used (I call them dirty), and they are never removed from the active list in order to be re-used.

Example application:

Code: Select all

#include "predef.h"
#include <stdio.h>
#include <ctype.h>
#include <startnet.h>
#include <autoupdate.h>
#include <dhcpclient.h>
#include <smarttrap.h>
#include <taskmon.h>
#include <NetworkDebug.h>
#include <tcp.h>

extern "C" {
void UserMain(void * pd);
}

const char * AppName="Test";

// Try creating a TCP connection
void TCPConnect() {
	static int i = 0;

	// Define our variables
	IPADDR ipAddress = AsciiToIp("127.0.0.1");

	int fd;
	if ((fd = connect(ipAddress, 0, 1935, (2 * TICKS_PER_SECOND))) < 0) {
		iprintf("Connect Failed %d\n", ++i);
	} else {
		iprintf("Connect succeeded %d\n", ++i);
		close(fd);
	}
}

void UserMain(void * pd) {
    InitializeStack();
    if (EthernetIP == 0) GetDHCPAddress();
    OSChangePrio(MAIN_PRIO);
    EnableAutoUpdate();
    StartHTTP();
    EnableTaskMonitor();

    #ifndef _DEBUG
    EnableSmartTraps();
    #endif

    #ifdef _DEBUG
    InitializeNetworkGDB_and_Wait();
    #endif

    OSTimeDly(20);

    while (1) {
    	TCPConnect();
    }

    iprintf("Application started\n");
    while (1) {
        OSTimeDly(20);
    }
}
Debugger paused, this is where it will loop indefinitely:
Debugger paused, this is where it will loop indefinitely.
Debugger paused, this is where it will loop indefinitely.
Screen Shot 2014-02-14 at 11.01.54 AM.png (190.5 KiB) Viewed 7327 times
Example serial output (from a MOD5270):
Example serial output (from a MOD5270)
Example serial output (from a MOD5270)
Screen Shot 2014-02-14 at 11.02.36 AM.png (75.11 KiB) Viewed 7327 times
In the corrected version (see attachments) I changed the structure heads for the free and active lists to use standard TAILQs. I'm sure you could apply the fix a similar manner using the current setup, I simply changed it to save the allocation of a single large socket structure and replace it with the smaller TAILQs. It also affords a simpler understanding when dissecting the code [At least I think so ;-)].

If you run a DIFF against the original vs the one I provided you'll see the changes easily (you can ignore a few changes to the critical object sections, I changed them to their expanded c instead of C++ constructor and destructor versions, while in the course of troubleshooting)
Attachments
tcp.cpp
Corrected version that does not exhibit the problem with simplified list structures.
(140.22 KiB) Downloaded 399 times
User avatar
pbreed
Posts: 1088
Joined: Thu Apr 24, 2008 3:58 pm

Re: TCP Connect gets stuck in infinite dead loop

Post by pbreed »

Other than replacing our linked list with your TailQ I see no functional difference?

Can you please give me the explicit line number where you think you changed the logic?

Also why did you nuke all the Critical section objects and replace them with error prone enter/exit?

I've just found a connect bug where if you connect to a server thats not listening and returns an RST there is a socket leak....
But I don't think thats what your issue was....

Please try again because I don't follow...
User avatar
pbreed
Posts: 1088
Joined: Thu Apr 24, 2008 3:58 pm

Re: TCP Connect gets stuck in infinite dead loop

Post by pbreed »

The parts you removed were pointers... Not socket structures...

IE
PSOCKET or pointer to socket a single 32 bit pointer.

If the active list is not corrupt then the find socket list is not an infinite loop as the last socket in the list has the active next pointing to null...
(This is set by the socket init...)

So I don't understand what if any logic you changed?
John.Ohl
Posts: 20
Joined: Mon Dec 03, 2012 7:00 pm

Re: TCP Connect gets stuck in infinite dead loop

Post by John.Ohl »

I replaced the structure with lists and also the critical sections through the course of troubleshooting. The source code for tcp.cpp lacks constructive commenting, which caused me to have to change things out one piece at a time until I found the answer, I digress.

The solution is to add the following @ line 2884:

Code: Select all

active_list.RemoveSocket(ps);
Leaving the abortsocket function to look like the following:

Code: Select all

   int abortsocket( int fd )
   {
      int sl = fd - TCP_SOCKET_OFFSET;
      if ( sl < 0 )
      {
         return TCP_ERR_NOSUCH_SOCKET;
      }

      register PSOCKET ps = sockets + sl;

      SNMPCOUNT( snmp_var_tcpOutRsts );  
      ps->SendState( TCP_RST );

      OSCriticalSectionObj oscs( TCP_critical_section ); /*Locks the OS the destructor will unlock*/

      switch ( ps->state )
      {
        case TCP_STATE_LISTEN :
#ifdef SNMP
          if ( pRemoveTableElementtcpConnEntry != NULL )
          {
             pRemoveTableElementtcpConnEntry( ps );
          }
#endif
          REMOVE( ps, listen_list_head );
          break;
        case TCP_STATE_CLOSED:
        	active_list.RemoveSocket(ps);
          break;

        default:
#ifdef SNMP
          if ( pRemoveTableElementtcpConnEntry != NULL )
          {
             pRemoveTableElementtcpConnEntry( ps );
          }
#endif

          active_list.RemoveSocket( ps );
      }


      ps->SetState( TCP_STATE_CLOSED );
      ps->DoCleanup();
      ps->ClearPending( TCP_ERR_CON_ABORT );
      INSERTLIST( ps, free_list_head );
      return TCP_ERR_NORMAL;
   }
}//Extern C
The infinite loop comes into play because the socket isn't removed from the active list when abortsocket is called and the tcp state is TCP_SATE_CLOSED. If this causes a problem elsewhere please let me know, but it appears to have solved the issue on our boxes /w no apparent negative side effects (yet).
User avatar
pbreed
Posts: 1088
Joined: Thu Apr 24, 2008 3:58 pm

Re: TCP Connect gets stuck in infinite dead loop

Post by pbreed »

That fix has bad side effects...

The socket is not supposed to ever end up in state_closed unless it has been removed from the list.
Removing it from the list when its not on the list is bad....

I believe that the problem you had has to do with having connections fail...
The most recent code had a change in the way connections fail and we were not cleaning up correctly...

The change is to modify the section around line 1427 to look like...

if(OSSemPend( &ps->m_OsSemaphore, timeout )!= OS_TIMEOUT)
{
if (( ps->state == TCP_STATE_SYN_SENT )||(ps->state ==TCP_STATE_CLOSED))
{
//We can only get here if our socket was reset
//Or closed immediatly.
ps->gpflags |= TCP_USER_CLOSED;
ps->DoClose( TCP_ERR_TIMEOUT );
FREESOCKET(ps);
return TCP_ERR_CON_RESET;
}
SNMPCOUNT( snmp_var_tcpActiveOpens );
SNMPCOUNT( snmp_var_tcpCurrEstab );
return fd;
}


This fixes the leak for connections...

Paul
John.Ohl
Posts: 20
Joined: Mon Dec 03, 2012 7:00 pm

Re: TCP Connect gets stuck in infinite dead loop

Post by John.Ohl »

Thanks for the assistance. I updated it with your suggestion and it changed the output to Failed as expected instead of Success, so in that respect it is a closer solution since mine never addressed that problem.

However, it still doesn't resolve the infinite spin loop issue. The spin still occurs in the same location when ps->m_pNextActive always returns NULL. Any other suggestions? Thanks!
New output from the test application
New output from the test application
Screen Shot 2014-02-14 at 2.35.21 PM.png (56.91 KiB) Viewed 7294 times
John.Ohl
Posts: 20
Joined: Mon Dec 03, 2012 7:00 pm

Re: TCP Connect gets stuck in infinite dead loop

Post by John.Ohl »

Follow up:

Taking your suggestion and adding active_list.RemoveSocket(ps) before the FREESOCKET(ps); also fixes the problem. Hope this helps you if needed, though I'm not sure it doesn't cause other issues, it at least resolves this one particular lockup.

Code: Select all

if(OSSemPend( &ps->m_OsSemaphore, timeout )!= OS_TIMEOUT)
{
 if ( ps->state == TCP_STATE_SYN_SENT||(ps->state == TCP_STATE_CLOSED))
 {
   //We can only get here if our socket was reset
   //Or closed immediatly.
   ps->gpflags |= TCP_USER_CLOSED;
   ps->DoClose( TCP_ERR_TIMEOUT );
   if (ps->state == TCP_STATE_CLOSED) {
	   active_list.RemoveSocket( ps );
   }
   FREESOCKET(ps);
   return TCP_ERR_CON_RESET;
 }
 SNMPCOUNT( snmp_var_tcpActiveOpens );  
 SNMPCOUNT( snmp_var_tcpCurrEstab );
 return fd;
}
Thanks again!
User avatar
pbreed
Posts: 1088
Joined: Thu Apr 24, 2008 3:58 pm

Re: TCP Connect gets stuck in infinite dead loop

Post by pbreed »

For local refused connections we still have issues.....

Looking at this now... will advise...
dacoast
Posts: 14
Joined: Fri Apr 25, 2008 10:09 am

Re: TCP Connect gets stuck in infinite dead loop

Post by dacoast »

I have been experiencing similar problems with TCP.
The Netburner (MOD-54415) connects as a client to a local TCP server (currently simulated by a PC).
I experience a problem when the PC is connected but not running the simulation program (TCP Server).
In this instance, the Netburner attempts to Connect, which returns successfully, but then fails on an attempt to write
to the socket with UNCONNECTED TCP Socket Error.

MCPTCP Socket # 0 created!
MCPTCP Connect req accepted by MCP at addr 192.23.1.25, port # 52130; Socket # 28
ERROR writing to MCP: Error Code = -2
Write 115 to UNCONNECTED TCP Socket Error!! FreeCount = 257
MCPTCP.sendWatchDogMsg: msg sent, result = 255!
MCPTCP: Error on READ from MCP: Code = -1
MCPTCP: Closing MCP Socket 28

After this cycle repeats approximately 64 times, if an attempt is made to connect to the Netburner webserver or other
TCP server, the Netburner "locks up", seemingly in an infinite loop in the TCP or IP task.

-Doug
User avatar
pbreed
Posts: 1088
Joined: Thu Apr 24, 2008 3:58 pm

Re: TCP Connect gets stuck in infinite dead loop

Post by pbreed »

We have a fix for this....
I know it works with the latest beta, and latest release, not sure about earlier releases.

The nburn\include\tcp.h file
and nburn\system\tcp.cpp are attached.

This fixes the connection weirdness that happened when we added non-blocking connect.
Attachments
tcp.h
(10 KiB) Downloaded 386 times
tcp.cpp
(138.19 KiB) Downloaded 403 times
Post Reply