
Re: intermittent hanging

Posted: Mon Oct 13, 2014 12:27 pm
by rnixon
I can't think of anything else to try without answers to all the previous questions. Your last comment is also confusing. It seems to indicate you are getting link on every module, when previously you said the link leds were off (but also sometimes always on?). To solve something like this you need a very methodical sequence of debugging.

Re: intermittent hanging

Posted: Mon Oct 13, 2014 3:57 pm
by rdb9878
Which questions are you waiting for me to answer?

1) "What is the other end of the ethernet cable plugged into?" Ethernet Switch
2) "Is the power to that [Ethernet Switch] cycled at the same time?" No
3) "Maybe take one unit that constantly fails and disconnect all the others from the switch." There is not one unit that constantly/consistently fails, it's always random.
4) "Maybe a check for link, and if not, do a soft reset?" I did implement this per your suggestion, but the overall problem still occurred (random failure sometimes on powerup).
5) "I think the factory program shows link status on the serial port. I would try that first." Yes, this was a helpful step in understanding why I mistakenly thought that the EtherLink() function did not work.
6) "It seems to indicate you are getting link on every module, when previously you said the link leds were off (but also sometimes always on?)" See question 4: Per your suggestion, I added a while loop to wait for the EtherLink() function to return true in the startup procedure (or timing out and soft-resetting). Adding this code did NOT fix my problem as I am still getting random failures every few power cycles.
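
For readers following along, the wait-for-link step described in items 4 and 6 could look roughly like the sketch below. This is not the attached code. The names waitForLink, LinkResult, linkUp and sleepMs are hypothetical, and the link check and delay are passed in as callables so the logic can be exercised off-target. On the module, linkUp would presumably wrap EtherLink() and sleepMs would wrap the RTOS delay call.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Outcome of the startup link check (hypothetical type, for illustration).
enum class LinkResult { Up, TimedOut };

// Polls linkUp() until it reports true or roughly timeoutMs has elapsed.
// sleepMs(pollMs) is called between polls. Elapsed time is approximated by
// summing the requested delays, so it ignores the time spent in linkUp().
LinkResult waitForLink(const std::function<bool()>& linkUp,
                       const std::function<void(uint32_t)>& sleepMs,
                       uint32_t timeoutMs,
                       uint32_t pollMs)
{
    assert(pollMs > 0);  // a zero poll interval would never reach the timeout
    uint32_t waited = 0;
    while (!linkUp()) {
        if (waited >= timeoutMs) {
            return LinkResult::TimedOut;  // caller decides whether to soft-reset
        }
        sleepMs(pollMs);
        waited += pollMs;
    }
    return LinkResult::Up;
}
```

Returning a result instead of resetting inside the loop keeps the reset decision in one place, which makes it easier to count or log how often the timeout path is taken.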

With my latest test code (attached), here are different scenarios I have seen when powering up the devices:

Scenario A) All devices send heartbeats, and their ethernet jack LEDs blink. This is a successful power-up.
Scenario B) All devices send heartbeats except for a randomly chosen one which does not send heartbeats and its ethernet jack LEDs are both off. Unplugging the ethernet cable and plugging it back in fixes the issue. This is what I'm calling a failure.
Scenario C) All devices send heartbeats except for a randomly chosen one which does not send heartbeats and its ethernet jack LEDs are both on and not blinking. Unplugging the ethernet cable and plugging it back in fixes the issue. This is also what I'm calling a failure.

Let me know if this clears up your confusion.

Please look at the attached code and let me know what the next methodical sequence of debugging should be.

Re: intermittent hanging

Posted: Tue Oct 14, 2014 7:03 am
by pbreed
When it does not work, can you see the device with ipsetup...

The task stack is not 4-byte aligned.
This could be an issue.

I'd personally make two changes, just for style....

1) I'd use the OSSimpleTaskCreate(x,p) macro.
This takes care of the task stack for you, makes sure the parameters are right, and statically allocates a user-sized stack:
OSSimpleTaskCreate(heartbeatMonitor,MAIN_PRIO+1);

2) I'd replace your new call with a static buffer.
No need to use new for a one-time allocation:

static char heartBeatMsg[100];
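
If the stack is still declared by hand instead of through the macro, one way to guarantee 4-byte alignment is to declare it as an array of 32-bit words. The sketch below is a portable illustration only. The names heartbeatStack, kStackWords and isAligned4 are hypothetical, and the stack size is an arbitrary placeholder, not a recommendation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stack size in 32-bit words (placeholder value).
constexpr std::size_t kStackWords = 2048;

// An array of 32-bit words is 4-byte aligned by its element type;
// alignas(4) just states the requirement explicitly.
alignas(4) static uint32_t heartbeatStack[kStackWords];

// Static message buffer, as suggested above, replacing a one-time new.
static char heartBeatMsg[100];

// Returns true if p sits on a 4-byte boundary.
bool isAligned4(const void* p)
{
    return (reinterpret_cast<std::uintptr_t>(p) % 4) == 0;
}
```

Because every element is 4 bytes wide, both the base address and the one-past-the-end address (heartbeatStack + kStackWords) are 4-byte aligned, whichever end a task-create call expects as the stack top.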



Paul

Re: intermittent hanging

Posted: Tue Oct 14, 2014 9:30 am
by rnixon
That clears up some.

If you check for link and do a soft reset if it fails after some timeout period, as you say in item 4, wouldn't this mean a failed unit would constantly be resetting? You don't mention that behavior. Is this what happens?

How do you verify the reset is occurring?

Is the unit constantly resetting when you plug/unplug the cable?

The key point here is what the state of the module is when there are no link lights.
rdb9878 wrote:Which questions are you waiting for me to answer?
3) "Maybe take one unit that constantly fails and disconnect all the others from the switch." There is not one unit that constantly/consistently fails, it's always random.
4) "Maybe a check for link, and if not, do a soft reset?" I did implement this per your suggestion, but the overall problem still occurred (random failure sometimes on powerup).
6) "It seems to indicate you are getting link on every module, when previously you said the link leds were off (but also sometimes always on?)" See question 4: Per your suggestion, I added a while loop to wait for the EtherLink() function to return true in the startup procedure (or timing out and soft-resetting). Adding this code did NOT fix my problem as I am still getting random failures every few power cycles.

Scenario B) All devices send heartbeats except for a randomly chosen one which does not send heartbeats and its ethernet jack LEDs are both off. Unplugging the ethernet cable and plugging it back in fixes the issue. This is what I'm calling a failure.

Re: intermittent hanging

Posted: Thu Oct 16, 2014 9:04 am
by rdb9878
Unfortunately, the system is being utilized by other teams right now, so I will not be able to work on this for potentially several weeks (this forum thread may go dead for a while).

pbreed: Thanks, I will implement your suggestions.

rnixon: In the case where one is failing, I don't know of a way to easily determine what the device is actually doing (I don't have the ability to flash LEDs or monitor the serial port). It just looks dead and I'm not sure if it is constantly rebooting or stuck somewhere in the program...or not stuck at all. Maybe I can put an elapsed-time-from-powerup timestamp in the UDP message to get a gauge of whether the code was stuck or the device was constantly rebooting? A more extreme approach might be to solder jumper wires to all 20 NetBurner serial ports and route them outside of our mechanical assembly, so that I can see if I'm getting the boot string continually.
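
A rough sketch of the timestamp idea, assuming a plain-text heartbeat payload. The function name formatHeartbeat, the "HB id=... up=..." layout and the deviceId field are all made up for illustration, and the uptime value is passed in as a parameter because how it is obtained depends on the platform.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>

// Writes "HB id=<deviceId> up=<uptimeSecs>" into buf.
// Returns the number of characters written (excluding the terminator),
// or -1 if the message did not fit or formatting failed.
int formatHeartbeat(char* buf, std::size_t len,
                    unsigned deviceId, uint32_t uptimeSecs)
{
    assert(buf != nullptr && len > 0);
    int n = std::snprintf(buf, len, "HB id=%u up=%lu",
                          deviceId,
                          static_cast<unsigned long>(uptimeSecs));
    if (n < 0 || static_cast<std::size_t>(n) >= len) {
        return -1;  // truncated or failed; do not send a partial message
    }
    return n;
}
```

Once the cable is re-plugged and heartbeats resume, the first uptime value received would hint at what happened: a small value suggests the unit had been rebooting, while a value close to the time since power-up suggests the application kept running without link.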

Re: intermittent hanging

Posted: Thu Oct 16, 2014 9:38 am
by pbreed
Is the serial port connected to anything?

I.e., is the thing it's connected to somehow sending an 'A' during boot?

Re: intermittent hanging

Posted: Thu Oct 16, 2014 9:39 am
by pbreed
If the port is connected and might be getting an 'A', the fix is to change the baud rate setting of the monitor so that while the monitor is booting the serial chars are all misframed, then set the baud rate back in the application.