Page 4 of 5

Re: Release 2.8.5

Posted: Fri Dec 08, 2017 5:24 am
by sulliwk06
Actually it looks like your bytes are shifted left by 2, not an endian problem

If i was an endian problem, you would see this:
300 (0x0000012C) -> 738263040 (0x2C010000)

This is a 2 byte left shift:
300 (0x0000012C) -> 19660800 (0x012C0000)


Maybe something in your structure got sized differently in the 2 builds that led to a 2 byte offset?

Re: Release 2.8.5

Posted: Fri Dec 08, 2017 7:12 am
by SeeCwriter
Ah, good point. It's same source code. The only difference is the declaration of IPADDR. I've done a sizeof on IPADDR in both builds and they are the same size. I'll do more investigating.

Re: Release 2.8.5

Posted: Fri Dec 08, 2017 3:25 pm
by SeeCwriter
I found the problem with my structure between the 2 versions of the compiler.

Below is a section of the structure from both compilers. Everything is fine up to multi_band. Then 2.7.6 inserts a pad byte after multi_band so that ntp_server falls on an even boundary. But 2.8.5 does not. So the structures are off by one byte at this point. It continues until use_multicast, where 2.7.6 inserts another pad byte, and 2.8.5 does not. Now the structures are out of sync by 2-bytes. And that’s the way it continues to the end of the structure.

Code: Select all

v2.7.6
           Member      Size  Offset
       rf_subband_frq   16   894
           multi_band    1   910
           ntp_server    4   912
           futureuse1   12   916
           ip_address    4   928
           futureuse2   12   932
              ip_mask    4   944
           futureuse3   12   948
           ip_gateway    4   960
           futureuse4   12   964
         ip_multicast    4   976
           futureuse5   12   980
        use_multicast    1   992
         amp_cfg_mask    2   994
          vtune_level    2   996

Last offset = 1190

Structure size = 1192

======================================

v2.8.5
            Member     Size  Offset
       rf_subband_frq   16   894
           multi_band    1   910
           ntp_server    4   911
           futureuse1   12   915
           ip_address    4   927
           futureuse2   12   931
              ip_mask    4   943
           futureuse3   12   947
           ip_gateway    4   959
           futureuse4   12   963
         ip_multicast    4   975
           futureuse5   12   979
        use_multicast    1   991
         amp_cfg_mask    2   992
          vtune_level    2   994

Last offset = 1188

Structure size = 1192
It looks to me like 2.7.6 is correct, and something happened with 2.8.x.

Re: Release 2.8.5

Posted: Sun Dec 10, 2017 3:26 pm
by pbreed
If you are using structures between different software, I always include the packed directive...

If you DON'T use packed there is no guarantee what the structure layout/padding will be...

typedef struct
{
.
.
} __attribute__( ( packed ) ) MyStructType;

Re: Release 2.8.5

Posted: Mon Dec 11, 2017 7:28 am
by SeeCwriter
Who doesn't upgrade their compiler version over time? I have at least 4 different compilers installed, and I get updates all the time, and I've never had to pack a structure to guarantee a structure layout would stay the same from version to version.

IMO, the way GNU 5.2 changed how the structure is laid out is a bug. The layout isn't even consistent within the same structure with the same compiler version. Up until the point that it deviated, every structure member of 1-byte in size that was followed by an integer had a pad byte inserted so that the integer fell on an even address. And then all of a sudden it breaks the pattern and decides to not do that, plus it adds 2-byes at the end of the structure so that the structure size stays the same as that produced by the previous version? That is a bug.

Re: Release 2.8.5

Posted: Mon Dec 11, 2017 10:27 am
by dciliske
Unfortunately, no. Trust me when I say this, it feels like a betrayal of the compiler for it to do this, but the C *and* C++ language specs do not place any restrictions on how a compiler packs elements within a structure. Therefore, the compiler is completely free to place the members where it chooses. Obviously, the 5.2 compiler is a little more aggressive with packing the data than 4.2.1.

However, as Paul mentioned, GCC (and pretty much every other C/C++ compiler) provide a way to ensure that no silent padding is added into the structure (the 'packed' attribute, or #pragma pack). As a rule, any structure that may be passed in binary form between two separate applications should have an '__attribute__( ( packed ) )' attached to the definition.

Re: Release 2.8.5

Posted: Mon Dec 11, 2017 12:34 pm
by SeeCwriter
I may be splitting hairs, but we aren't passing the structure between applications. We have an application who's current release was compiled with 2.7.6. We then wanted to start implementing IPv6, so we compiled the application with 2.8.5. The structure holds operating parameters of the unit that are stored in EEPROM. When the application boots it copies the structure from EEPROM to sram. But now, a portion of the data is out of alignment and filled with bad data. This isn't a problem with new units that have only had firmware built with 2.8.5. But it makes field upgrades very difficult.

While this is annoying, it is not the cause of the random reboots/crashes I am experiencing. I am no closer today than I was nearly 4-months ago when this was first noticed.

I've had the debugger running on one unit for just over two weeks now with no crashes. Other units continue to crash at random times and intervals (days to weeks), and none provide any useful information from the debug port.

Re: Release 2.8.5

Posted: Mon Dec 11, 2017 1:22 pm
by ephogy
SeeCwriter wrote:I may be splitting hairs, but we aren't passing the structure between applications. We have an application who's current release was compiled with 2.7.6. We then wanted to start implementing IPv6, so we compiled the application with 2.8.5. The structure holds operating parameters of the unit that are stored in EEPROM. When the application boots it copies the structure from EEPROM to sram. But now, a portion of the data is out of alignment and filled with bad data. This isn't a problem with new units that have only had firmware built with 2.8.5. But it makes field upgrades very difficult.
In the 2.8.5 builds, have you considered simply adding __attribute__((aligned(2))) to your structure definition?
It looks as though the alignment is set to 1 for the structure in 2.8.5, whereas it was 2 for the 2.7.6 branch. This should solve loading the values from EEPROM when updating the 2.7.6 branch to the 2.8.5 branch.

If you have any other structures that have (unsigned) chars in them, they may also now be back to 1 byte offsets. Is this potentially causing you grief elsewhere?

Re: Release 2.8.5

Posted: Mon Dec 11, 2017 1:37 pm
by dciliske
While indeed, we are splitting hairs, from the context of guarantees, two different compilations of the same source with *any* change in the build tree (which includes the compiler and libraries), must be considered two different applications.

Uh... How many physical network interfaces exist for the device that crashes and are you running multicast?

Re: Release 2.8.5

Posted: Mon Dec 11, 2017 2:13 pm
by SeeCwriter
The first member of the structure has __attribute__((aligned(2))) as part of it's declaration. Is that what you mean?

We have one network interface. We aren't using multicast at this time. We use broadcasts, twice a second.