how to debug a silent crash

Post by **chourizo** » Fri Nov 22, 2019 9:27 am

Hello,

I am trying to debug some code developed with version 2.7.1 for a NetBurner Nano. The program seems to crash every few hours, and I am trying to find out why.

There are no memory leaks. I call EnableSmartTraps() in my main function, but I am redirecting stdout to UDP and have never seen any message. Should it still work?

My application receives messages via UDP, parses them using sscanf (which, from what I have read, can cause issues), and also runs a TCP server.
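Since sscanf came up: the usual hazards are an unbounded %s writing past the end of a buffer, an unchecked return value, and parsing a UDP payload that is not NUL-terminated. Here is a minimal sketch of defensive parsing, assuming a hypothetical text command of the form "SET <name> <value>". ParseCommand and Command are illustrative names only, not part of any NetBurner API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>

// Hypothetical message format for illustration: "SET <name> <value>".
struct Command {
    char name[16];
    int value;
};

// Parses one UDP payload of length len.
// The payload is copied into a local buffer first because UDP data is not
// guaranteed to be NUL-terminated, and every %s has a field width so sscanf
// can never write past the end of its destination array.
bool ParseCommand(const char* payload, std::size_t len, Command* out) {
    char line[64];
    if (len >= sizeof line) return false;   // reject oversized packets
    std::memcpy(line, payload, len);
    line[len] = '\0';

    char verb[8];
    // %7s and %15s leave room for the terminating NUL in verb[8] and name[16].
    int fields = std::sscanf(line, "%7s %15s %d", verb, out->name, &out->value);
    if (fields != 3) return false;          // always check how many fields matched
    return std::strcmp(verb, "SET") == 0;
}
```

Passing the received length separately, instead of trusting a terminator inside the packet, is the important design choice: an oversized or malformed packet is rejected rather than partially parsed into memory it does not own.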

Any suggestions on how to find silent problems like this?

Post by **TomNB** » Fri Nov 22, 2019 9:49 am

What exactly (details matter) does and does not work when you say it crashes? For example, does ping still work? The web server? The serial port? If you have no network communication at all, you could be running out of buffers or sockets. If you have multiple tasks, the way they are used could result in one of your high-priority tasks never blocking and starving the rest of the system. If you can no longer ping the device, I would recommend adding some iprintf() output to your code and monitoring the serial output. That way, if your problem has to do with the network, you still have communication.

Post by **sulliwk06** » Fri Nov 22, 2019 10:20 am

I think the smart trap diagnostics always go out the default UART, regardless of whether you redirect stdout.

Post by **chourizo** » Fri Nov 22, 2019 10:30 am

Thanks for the answer. When it crashes, it doesn't respond to ping and the network doesn't work. Another task monitors a GPIO and sends data over RS-485, and it stops working as well when the crash happens.

My communications are based on UDP, with the exception of the TCP server, which only allows 5 clients. I have been monitoring the buffers with GetFreeCount(), and the count stays around 1800; it never goes lower than that.
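A single GetFreeCount() reading only shows the pool at that instant, and because stdout goes over UDP, a report sent at the moment the pool runs dry may never arrive. Keeping a low-water mark that is updated as often as practical (for example on every received packet) shows how close the pool has come to empty, and a slow downward trend before a lockup would point at a leak. This is a sketch under the assumption that GetFreeCount() returns a count that fits in an int; LowWaterMark and the alarm level are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <climits>

// Records the smallest value ever passed to Update() and how many samples
// fell below an alarm level. Assumes Update() is called from a single task.
class LowWaterMark {
public:
    explicit LowWaterMark(int alarmLevel) : alarmLevel_(alarmLevel) {}

    // Feed the current free-buffer count here, e.g. the result of GetFreeCount().
    void Update(int current) {
        if (current < min_) min_ = current;
        if (current < alarmLevel_) ++timesBelowAlarm_;
    }

    int Min() const { return min_; }
    unsigned TimesBelowAlarm() const { return timesBelowAlarm_; }

private:
    int alarmLevel_;
    int min_ = INT_MAX;             // no samples yet
    unsigned timesBelowAlarm_ = 0;
};
```

Reporting Min() and TimesBelowAlarm() in the periodic status output, rather than the instantaneous count, makes a short dip visible even if it recovered before the next report.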

I will run some tests with the high-priority tasks disabled and see if I can reproduce the problem.

Post by **TomNB** » Fri Nov 22, 2019 1:04 pm

What about serial port access? Your device may not be crashing; it might just be out of network buffers or sockets. In that case, the serial port should still work.

Post by **sulliwk06** » Fri Nov 22, 2019 1:57 pm

In the systems I've built, I typically use the watchdog to make sure the code is still running. I also add a timer that watches a dummy task at the lowest priority, to make sure it has had a chance to run at least every so many seconds or minutes. That proves all my tasks are blocking correctly. If I detect that some task is blocking the others, I deliberately trigger a divide by zero to print out the trap diagnostics and see which task is locked up where.
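The dummy-task check can be sketched in portable C++. Assumptions: the lowest-priority task does nothing but increment a counter and then sleep briefly, and a periodic timer or a higher-priority task calls Check() with the current counter value and a millisecond clock. StarvationMonitor, g_idleHeartbeat, and the timeout are hypothetical names and values, not NetBurner APIs.

```cpp
#include <cassert>
#include <cstdint>

// Incremented only by the dummy lowest-priority task, in a loop of
// "++g_idleHeartbeat; then sleep for one tick". If a higher-priority task
// spins without blocking, this counter stops moving.
// Assumes a single writer and that aligned 32-bit loads/stores are atomic on
// the target; on other platforms std::atomic<std::uint32_t> is the safe choice.
volatile std::uint32_t g_idleHeartbeat = 0;

// Called periodically with the heartbeat value and the current time in ms.
// Returns true once the heartbeat has not changed for longer than timeoutMs.
class StarvationMonitor {
public:
    explicit StarvationMonitor(std::uint32_t timeoutMs) : timeoutMs_(timeoutMs) {}

    bool Check(std::uint32_t heartbeat, std::uint32_t nowMs) {
        if (!started_ || heartbeat != lastBeat_) {
            started_ = true;            // first call, or the dummy task ran
            lastBeat_ = heartbeat;
            lastChangeMs_ = nowMs;
            return false;
        }
        // Unsigned subtraction stays correct across a millisecond-counter wrap.
        return (nowMs - lastChangeMs_) > timeoutMs_;
    }

private:
    std::uint32_t timeoutMs_;
    std::uint32_t lastBeat_ = 0;
    std::uint32_t lastChangeMs_ = 0;
    bool started_ = false;
};
```

When Check() returns true, that is the point where the post above forces a trap so the trap output shows where each task is stuck. The checker itself must run above the dummy task; a runaway task with a higher priority than the checker would stop the checker too, which is why the hardware watchdog is still worth keeping alongside it.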

I always keep one of the serial ports available for diagnostics and as a backup way to load a new program, but if you don't have any serial ports available, I would try logging as much data to the file system as you can. I managed to get my smart trap diagnostics to write to a file (sometimes, depending on what caused the trap), but it wasn't an easy feat. While I was doing that, I built a way to write my own "trap handlers" that print out certain variables from different sections of my own code when a trap occurs. That's probably overkill unless you have a very large code base like I do, but that's how I handle debugging.
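The custom trap handlers described above could look roughly like this: each module registers a small function that prints its own key variables, and the trap path runs them all. This is a hypothetical sketch (RegisterTrapDump, RunTrapDumps, PutStr, and PutUint are illustrative names). Output goes through a byte-at-a-time callback because, as pbreed's later post notes, the trap output runs in interrupt context where stdio is not available. Where exactly to call RunTrapDumps() from inside the library's trap code is not covered in the thread.

```cpp
#include <cassert>
#include <cstddef>

// Byte-output routine; on the target this would be the same path the trap
// code uses (LocalOutByte, per pbreed's post), since stdio is unavailable.
typedef void (*ByteOut)(char);
// Each module registers one of these to print its own key variables.
typedef void (*TrapDumpFn)(ByteOut out);

// Fixed-size table: registration happens at start-up and the dump runs in
// trap context, so nothing here allocates memory or takes a lock.
static const std::size_t kMaxTrapDumps = 8;
static TrapDumpFn g_trapDumps[kMaxTrapDumps];
static std::size_t g_trapDumpCount = 0;

bool RegisterTrapDump(TrapDumpFn fn) {
    if (fn == nullptr || g_trapDumpCount >= kMaxTrapDumps) return false;
    g_trapDumps[g_trapDumpCount++] = fn;
    return true;
}

// Minimal formatting helpers that are safe without stdio.
void PutStr(ByteOut out, const char* s) {
    while (*s) out(*s++);
}

void PutUint(ByteOut out, unsigned long v) {
    char digits[20];                  // enough for a 64-bit unsigned long
    int n = 0;
    do {
        digits[n++] = static_cast<char>('0' + v % 10);
        v /= 10;
    } while (v != 0);
    while (n > 0) out(digits[--n]);   // emit most significant digit first
}

// Runs every registered dump function, in registration order.
void RunTrapDumps(ByteOut out) {
    for (std::size_t i = 0; i < g_trapDumpCount; ++i) g_trapDumps[i](out);
}
```

A module would call something like RegisterTrapDump(DumpUdpStats) once at start-up, where DumpUdpStats is a hypothetical function that prints that module's counters with PutStr() and PutUint().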

Post by **TomNB** » Fri Nov 22, 2019 2:34 pm

Those are great ideas! I have not used the first one (checking a dummy task) before, but I know of a few times it would have come in handy.

Post by **chourizo** » Fri Nov 22, 2019 5:33 pm

That's a really good idea. I cannot use the serial port because the Nano is installed on my custom board and I don't have access to it, but I can definitely try the dummy task to see what is causing the issue. Thanks a lot!

Post by **pbreed** » Mon Nov 25, 2019 6:22 am

Having no serial port makes this hard.

The Nano does have a small amount of RAM in the real-time clock module.
The boot process and the monitor don't touch this RAM.

If you can reboot the Nano without power cycling (i.e., with the reset switch),
you can try the following:

SmartTrap sends all of its output through one function: extern void LocalOutByte( char );

You can write your own version and comment out the one in the library source,
nanof54415\system\bsp.cpp.

This is called in interrupt context, so you can't use stdio, you can't do networking, you can't do ANY of that sort of thing.
You can probably write to the RAM in the real-time clock module.
That RAM is "funky/broken" and only wants to be byte-addressed.
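A replacement LocalOutByte() along these lines might look like the sketch below. It keeps the signature given above, touches the RAM only with single-byte stores, and saves the first bytes of trap output so they can be read back after a reset-switch reboot. The RTC RAM base address and size are placeholders, simulated here with a plain array so the sketch compiles anywhere; the real values would have to come from the MCF54415 reference manual. The magic byte, the two-byte header layout, and DumpAndClearTrapLog() are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// PLACEHOLDER: on the real board this would point at the RTC module's RAM.
// Simulated with an ordinary array so the sketch runs anywhere.
static volatile std::uint8_t g_simRtcRam[64];
static volatile std::uint8_t* const kRtcRam = g_simRtcRam;
static const std::size_t kRtcRamSize = sizeof(g_simRtcRam);

// Hypothetical layout: [0] magic byte, [1] write index, [2..] logged bytes.
static const std::uint8_t kMagic = 0xA5;
static const std::size_t kHeader = 2;
static const std::size_t kLogCap = kRtcRamSize - kHeader;  // must fit in a byte

// Replacement for the library's LocalOutByte(). Runs in trap/interrupt
// context, so it uses only single-byte stores: no stdio, no locks, no
// networking, and no wider-than-byte accesses to the RTC RAM.
void LocalOutByte(char c) {
    if (kRtcRam[0] != kMagic) {       // first byte since the log was cleared
        kRtcRam[0] = kMagic;
        kRtcRam[1] = 0;
    }
    std::uint8_t idx = kRtcRam[1];
    if (idx >= kLogCap) return;       // full: keep the earliest output
    kRtcRam[kHeader + idx] = static_cast<std::uint8_t>(c);
    kRtcRam[1] = static_cast<std::uint8_t>(idx + 1);
}

// Call early in main() after a reset. Copies any saved trap text into buf
// (always NUL-terminated when bufSize > 0), clears the log, returns length.
std::size_t DumpAndClearTrapLog(char* buf, std::size_t bufSize) {
    std::size_t n = 0;
    if (kRtcRam[0] == kMagic && bufSize > 0) {
        std::size_t len = kRtcRam[1];
        if (len > kLogCap) len = kLogCap;
        for (; n < len && n + 1 < bufSize; ++n)
            buf[n] = static_cast<char>(kRtcRam[kHeader + n]);
    }
    if (bufSize > 0) buf[n] = '\0';
    kRtcRam[0] = 0;                   // invalidate so the next trap starts fresh
    kRtcRam[1] = 0;
    return n;
}
```

After the reboot, the text returned by DumpAndClearTrapLog() can be sent out over the network, since networking is available again. Keeping the beginning of the output instead of wrapping is a choice: it preserves the first trap since the last read-out, while a ring buffer would keep the most recent output instead.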

After displaying the message, SmartTrap tries to reboot.
So if you were getting a trap, I'd expect the system to come back to life on its own.

Is it possible that this is a thermal or power supply issue?

Post by **pbreed** » Mon Nov 25, 2019 4:19 pm

Also, EXACTLY how are you sending stdio to UDP?

Is it possible you opened a UDP socket that you never read from? If so, that could be your crash.

NetBurner Community Forum
