floating point issue in nbfloatprint.cpp

Posted: Wed Sep 19, 2018 1:26 pm
by ephogy
Occasionally, one of the tasks that generates a string gets stuck. I finally had a development version of the software running with TaskMonitor running.

The code that seems to break it is a relatively simple call to snprintf(buffer, sizeof(buffer), "%f", some_float).
Since this is hard to reproduce, I have absolutely no idea what the value of some_float is, but it *should* be within the range (-0.1, 0.1), and can be extremely close to 0.

Multiple requests using TaskScan seem to indicate that the code is stuck in nbfloatprint.cpp between lines 83-86:

Code: Select all

while ( fi > 0xFFFFFFFFFFFFFFFFULL )
{
    l++;
    fi /= 10.0;
}
It appears as though the operation fi /= 10.0; is causing the gcc-compiled code to throw an internal exception. The while loop doesn't handle the exception gracefully, and the task gets stuck in an infinite loop.

The software is running on a MOD5272 with NNDK 2.7. Internally we use our own implementation of floats, as it is 2-3x faster than gcc's implementation of IEEE754. When something like printf is required, the internal floating point representation is converted to the IEEE754 representation.

I'm guessing that I'll need to modify this loop somehow, but I'd like to defer to someone with perhaps a little more experience than me.

Cheers,
Keith

Re: floating point issue in nbfloatprint.cpp

Posted: Mon Sep 24, 2018 8:51 am
by ephogy
I believe I've solved my problem, or at least I see what is going on now...

In nbfloatprint.cpp, TheFloatPrintf function simply doesn't handle Infinity (properly?)

In my main code,

Code: Select all

    uint64_double infTest;
    printf("Infinity (bit pattern, float value):\r\n");

    infTest.d = nan("0");
    printf ("\tnan: %llX, %f\r\n", infTest.u, infTest.d);

    infTest.d = infinity();
    printf ("\tpos: %llX, %f\r\n", infTest.u, infTest.d);

    infTest.d = -infinity();
    printf ("\tneg: %llX, %f\r\n", infTest.u, infTest.d);
Serial port response:

    EPWaiting 2sec to start 'A' to abort
    Configured IP = 192.168.1.230
    Configured Mask = 255.255.255.0
    MAC Address= 00:03:f4:09:39:51

    Application started
    Infinity (bit pattern, float value):
    nan: 7FF8000000000000, NAN
    pos: 7FF0000000000000, I?Waiting 2sec to start 'A' to abort
Is this a known problem? I don't think I'm doing anything strange in my code here, or after it: with these lines removed, the application runs fine.

Re: floating point issue in nbfloatprint.cpp

Posted: Tue Sep 25, 2018 11:41 am
by TomNB
Exactly which 2.7.x release are you using?
Can you attach a minimal main.cpp using only standard data types that exhibits the problem that we can build and test?

Re: floating point issue in nbfloatprint.cpp

Posted: Tue Sep 25, 2018 2:20 pm
by ephogy
Hi Tom,

My mistake. It's actually NNDK 2.8.7 that I'm using.

I have the install for NNDK 2.6, but it looks like this section of the kernel has gone through some heavy modification since my 2.6.0 code base. For example, none of the files I've listed above exist there.

In the above code, uint64_double is just a union of unsigned long long and double so I could print out the hex value. It's not needed;

Code: Select all

printf("%f\r\n", infinity());
should produce the same problem. Occasionally it works, but instead of printing Inf to the screen, you get some crazy number.

I added:

Code: Select all

    if ( isinf( d ) )
    {
        char buf[4] = {'-', 'I', 'n', 'f'};
        char *bufp = buf;
        if (sign)
        {
            pfs.len = 4;
        }
        else
        {
            pfs.len = 3;
            bufp++;
        }

        prespace( pfs.width, pfs.flags, pfs.len, pf, data, pfs.nsent );
        pf( data, bufp, pfs.len );
        pfs.nsent += pfs.len;
        ret = postspace( pfs.width, pfs.flags, pfs.len, pf, data, pfs.nsent );
        return ret;
    }
at line 512 of nbfloatprint.cpp (just below the isnan conditional), and this resolved my issue.

Keith

Re: floating point issue in nbfloatprint.cpp

Posted: Wed Sep 26, 2018 1:52 pm
by TomNB
Hello,

If I follow what you have described with the program below, nothing prints out at all. To be honest, I have not found anyone here, or by searching Google, who could tell me what I would use the infinity() function for. So you are saying you can run a program like the one below in 2.8.7 and you get varying results? Important: with no modification to how floats are handled.

void UserMain(void * pd) {
    InitializeStack();
    GetDHCPAddressIfNecessary();
    OSChangePrio(MAIN_PRIO);
    EnableAutoUpdate();
    StartHTTP();
    EnableTaskMonitor();

    #ifndef _DEBUG
    EnableSmartTraps();
    #endif

    iprintf("Application started\n");
    while (1)
    {
        printf("Infinity: %f\r\n", infinity());
        OSTimeDly(TICKS_PER_SECOND);
    }
}

Re: floating point issue in nbfloatprint.cpp

Posted: Fri Sep 28, 2018 8:05 am
by ephogy
Hi Tom,

I recompiled the code you've listed above with just a very minor change (no web server):

Code: Select all

#include <stdio.h>
#include <autoupdate.h>
#include <ip.h>
//#include <http.h>
#include <dhcpclient.h>
#include <taskmon.h>
#include <smarttrap.h>
#include <math.h>

void UserMain(void * pd)
{
    InitializeStack();
    GetDHCPAddressIfNecessary();
    OSChangePrio(MAIN_PRIO);
    EnableAutoUpdate();
    //StartHTTP();
    EnableTaskMonitor();

    #ifndef _DEBUG
    EnableSmartTraps();
    #endif

    iprintf("Application started\n");
    while (1)
    {
        printf("Infinity: %g\r\n", infinity());
        OSTimeDly(TICKS_PER_SECOND);
    }
}
This is the output:

Waiting 2sec to start 'A' to abort
Configured IP = 192.168.1.230
Configured Mask = 255.255.255.0
MAC Address= 00:03:f4:09:39:51
Application started

No modifications to the kernel.
The function infinity() is not the important part; it's just that the infinity case is NOT handled in nbfloatprint.cpp. The NaN case, however, IS handled.

When a float is printed to the screen, there are several functions which have while loops with no error checks:

Code: Select all

int Get_f_len(double d, pfstate & pfs)
...
    while ( fi > 0xFFFFFFFFFFFFFFFFULL )
    {
        l++;
        fi /= 10.0;
    }
...

int OutputFFloat(PutCharsFunction * pf, void * data, double d, pfstate & pfs)
...
        while ( fi > 0xFFFFFFFFFFFFFFFFULL )
        {
            fi /= 10.0;
            exp++;
        }
...

int OutputEFloat(PutCharsFunction * pf, void * data, double d, pfstate & pfs, char e_or_E)
...
    while ( d >= 10.0 )
    {
        d /= 10.0;
        exp++;
    }
...
Since no check for infinity is done before these calls are made, gcc throws an internal exception when fi (or d) is divided by 10.0, and the values of fi and d remain at infinity, hence the loops run forever.
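Whatever happens internally on the MOD5272, plain IEEE 754 arithmetic is already enough to explain the hang: infinity divided by 10.0 is still infinity, so the loop condition never becomes false. The sketch below is a hypothetical stand-in for the loops quoted above (CountScalingSteps and its maxSteps cap are not in nbfloatprint.cpp); the cap exists only so the demonstration itself terminates.

```cpp
#include <cmath>

// Hypothetical stand-in for the scaling loops in nbfloatprint.cpp.
// Returns the number of divisions needed to bring fi under the 64-bit
// limit, or -1 if maxSteps divisions were not enough.
static int CountScalingSteps(double fi, int maxSteps)
{
    // Same bound as in the original loops, converted to double once.
    const double limit = static_cast<double>(0xFFFFFFFFFFFFFFFFULL);
    int steps = 0;

    while (fi > limit)
    {
        if (steps == maxSteps)
        {
            return -1;      // would have looped forever without the cap
        }
        fi /= 10.0;         // for +Infinity this yields +Infinity again
        steps++;
    }
    return steps;
}
```

With a finite input such as 1e25 the function returns after a handful of divisions. With +Infinity it hits the cap and returns -1, which corresponds to the stuck task in the uncapped original.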

Handling the Inf case in the same location where the NaN case is currently handled solves the problem. This non-handling of infinities when printing exists in NNDK 2.8.1, 2.8.5 and 2.8.7 (though I assume it has existed since these files were introduced).

In my case, the software is performing logarithm calculations and at some point tries to evaluate logf(0.0), which results in negative Infinity. When this is printed back to a web page, the HTTP task gets stuck in this infinite loop.

Re: floating point issue in nbfloatprint.cpp

Posted: Fri Sep 28, 2018 10:36 am
by TomNB
Hello ephogy,

I ran all this by engineering, and they said it was a really good find on your part. Thank you very much for sharing all the details. The fix they came up with and tested is to make a change in nbfloatprint.cpp. In the function TheFloatPrintf(), add another check under NAN for INF. In the file I am looking at in 2.8.7 it is on line 512.

So before there was only:

if ( isnan( d ) )
{
    char buf[3] = {'N', 'A', 'N'};
    pfs.len = 3;
    prespace( pfs.width, pfs.flags, pfs.len, pf, data, pfs.nsent );
    pf( data, buf, 3 );
    pfs.nsent += 3;
    ret = postspace( pfs.width, pfs.flags, pfs.len, pf, data, pfs.nsent );
    return ret;
}

Copy that block of code and paste it underneath. Then change the N A N characters to I N F:

if ( isinf( d ) )
{
    char buf[3] = {'I', 'N', 'F'};
    pfs.len = 3;
    prespace( pfs.width, pfs.flags, pfs.len, pf, data, pfs.nsent );
    pf( data, buf, 3 );
    pfs.nsent += 3;
    ret = postspace( pfs.width, pfs.flags, pfs.len, pf, data, pfs.nsent );
    return ret;
}

So now there are two checks, one for NAN and one for INF. I have run this also on my release and the previous code that hung on INF now works properly.

Re: floating point issue in nbfloatprint.cpp

Posted: Fri Sep 28, 2018 10:49 am
by ephogy
Thanks Tom,

Glad I could help get to the bottom of this.

Cheers,
Keith

Re: floating point issue in nbfloatprint.cpp

Posted: Fri Sep 28, 2018 1:18 pm
by pbreed
That was a nice bug find.
Thanks for digging to the bottom of this...