Let’s Debug it: Snow That’s Hot to the Touch!
Monday, January 18th, 2010 | Author: lisaksimone

A friend posted this FailBlog pic on Facebook and (as always) I had to figure out how the embedded system screwed up.

Snow.  Real snow.  Bare branches, no movie set.

And a sweltering 119 degrees!

Fahrenheit?  Not with the snow.  Can’t be Celsius or the snow would be boiling.

And well, it does appear icy cold, but if that’s Kelvin then this town is more than 200 degrees below 0°F.
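
For anyone who wants to check the Kelvin claim, here is a minimal sketch of the standard conversion.  The function name `kelvin_to_fahrenheit` is made up for illustration.

```c
/* Convert a Kelvin reading to Fahrenheit: F = (K - 273.15) * 9/5 + 32.
 * The function name is hypothetical, chosen only for this example. */
double kelvin_to_fahrenheit(double kelvin)
{
    return (kelvin - 273.15) * 9.0 / 5.0 + 32.0;
}
```

Feeding it 119 gives roughly -245°F, which is comfortably "more than 200 degrees below 0°F."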

Brrrr.

What the heck is going on?

Lotsa times when stuff like this happens, a rollover is to blame.  Some little variable (say an 8-bit unsigned char) happily marches towards 255 and stumbles over the edge of the world back to 0.  Or another happy signed variable counts down towards -128 and suddenly finds itself in positive 127-land.
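
A rough sketch of both kinds of rollover in C, assuming the usual 8-bit types from `<stdint.h>`.  The helper names `bump_up` and `bump_down` are invented for this example, and the signed case leans on behavior that is implementation-defined in older C standards (it holds on typical two's-complement compilers).

```c
#include <stdint.h>

/* Increment an 8-bit unsigned counter.  255 + 1 wraps to 0, which is
 * well-defined behavior for unsigned types in C. */
uint8_t bump_up(uint8_t count)
{
    return (uint8_t)(count + 1u);
}

/* Decrement an 8-bit signed counter.  The subtraction is done in int, so
 * -128 - 1 is -129; narrowing -129 back to int8_t is implementation-defined
 * before C23, but on common two's-complement targets it lands on 127. */
int8_t bump_down(int8_t count)
{
    return (int8_t)(count - 1);
}
```

So `bump_up(255)` returns 0, and on the usual compilers `bump_down(-128)` returns 127.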

So, to the mysterious 119°-in-the-winter bug.  Since the number 119 is sooooo close to the upper end of a signed char (127), and since the temperature is sooo close to what looks like 0°F, this bug just cries out to be a boundary condition error.

Hypothesis - the numbers look suspiciously familiar.

After some mucking around with numbers between -128 and 127, I figured out in reverse that if the outside temperature was actually -9°F, I could make the math work out pretty easily.  Not saying this is the real bug, but I betcha it is.  Unless Jack Ganssle tells me otherwise! :-)

Okay, assume the real temperature outside is -9°F.  Also assume an 8-bit signed variable is used to describe temperature over the range of -128 to 127 degrees (decimal).  That’s logical if you can’t spare the change to buy a couple more bits, since there are very few places in the world (with neighborhood banks anyway) where you’ll find this temperature range exceeded.  And if you do, well, all bets are kinda off then anyway.

Prove it

So, wandering into binary land, -9 decimal is binary 1111 0111.  If the top (sign) bit got switched to a 0, then the binary number becomes 0111 0111 or 0x77, which is 119 decimal.

Okay, again in slow motion.  We start with +9 in binary, convert to -9 via twos-complement, and then see what’s left.

9: 0000 1001

Recall we use twos-complement to make a number negative.  Invert each bit and add 1.

-9: 1111 0110 + 1 = 1111 0111
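
The invert-and-add-1 recipe is a one-liner in C.  This is just a sketch, and the name `twos_complement_negate` is made up for the example.

```c
#include <stdint.h>

/* Negate an 8-bit value the manual way: invert every bit, then add 1.
 * The result is returned as the raw 8-bit pattern. */
uint8_t twos_complement_negate(uint8_t value)
{
    return (uint8_t)(~value + 1u);
}
```

Calling `twos_complement_negate(9)` returns 0xF7, which is the 1111 0111 pattern above.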

Now, check out 119 in binary.  You can convert to hex, then binary, or ask Google “119 in binary” and it will happily report 119 = 0b1110111 (“0b” being the prefix for binary.  They also annoyingly left off the leading zero).

See how close 1111 0111 is to 0111 0111?  If the top (sign) bit for -9 were changed to a 0, then the result would be interpreted as a positive number (0111 0111) which we now know is 119.
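
Here is the hypothesized failure as a small C sketch.  It assumes the temperature lives in an `int8_t` and that something downstream loses the top bit.  The function `drop_sign_bit` is hypothetical; nobody outside the sign's vendor knows what the real firmware does.

```c
#include <stdint.h>

/* Hypothetical fault model: the stored temperature is an 8-bit signed
 * value, but bit 7 is lost (masked off, stuck at 0, or simply ignored)
 * before the number reaches the display. */
uint8_t drop_sign_bit(int8_t temperature_f)
{
    return (uint8_t)temperature_f & 0x7Fu;
}
```

With this model, `drop_sign_bit(-9)` returns 119, while any temperature from 0 to 127 passes through untouched.  That would explain why nobody noticed until the first really cold day.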

But you assumed the real temperature a priori.

Yes I did.  With problems like this, where there’s a conversion of some sort and you have an idea of the “before” and “after” (in this case, “cold” and “119”), working from both directions is smart.

Root Cause?

So what can make this happen?  Since it appears the actual temperature in Fahrenheit is stored in a signed char, we clearly need a signed variable to express negative numbers.  An 8-bit signed char is reasonable.  And if the temperature was anecdotally reported around -10, then the top (8th) bit needs to be ‘1’.

Something made that upper bit 1) go away, 2) get ignored, 3) get misinterpreted.  And probably 4) and 5) I haven’t thought of.

Mechanical-electrical: Magically-changing bit seems far-fetched, but bad solder joints and other electrical-mechanical problems are often to blame.  If the 8-bit value were sent from one processor to another, this could be the cause.

Unconnected port pins: Jack Ganssle explained a similar billboard temperature problem to me - the 501F Walgreen’s sign.  On a hot day, the temperature was reported as 501 degrees.  The culprit was an unconnected input port that the firmware assumed remained as the default 0.  As bias currents changed during a hot (not 501!) day, the zero turned to a one, and the processor happened to include that bit in the display data.  Again, assuming it was zero.

Other hypotheses appear in the FailBlog comments section, and a few folks have some sort of clue.

  1. “It’s really 11.9 and the decimal point is missing.”
  2. 11.9 F = -11.2 C
  3. “128 (2^7 or 8 signed bits) - 9 = 119. “
  4. “You only need  7 bits if the variable is unsigned (0 - 127)”
  5. The range of unsigned 7-bits is 0-127.  If  1110111 (119) is interpreted as signed, then it’s -9.
  6. The closest answer uses 7-bit binary as well and suggests an accidental switch of the 7th bit rather than the 8th bit.
  7. And miscellaneous guesses about Celsius to Fahrenheit to Kelvin, etc.
  8. “He’s such a tease,” “People have always said my hair should be blonde,” and “interesting tattoo.”

1 - What billboard anywhere uses decimal points in the temperature reading?  Although this *could* be a missing decimal point.  But the anecdotal evidence shared was an outside temperature of ~ -10F.

2 - Yeah, nice coincidence.
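
The coincidence is easy to check with the standard conversion.  A minimal sketch, with a made-up function name:

```c
/* Convert Fahrenheit to Celsius: C = (F - 32) * 5/9.
 * The function name is hypothetical, chosen only for this example. */
double fahrenheit_to_celsius(double fahrenheit)
{
    return (fahrenheit - 32.0) * 5.0 / 9.0;
}
```

Plugging in 11.9 gives about -11.2, so the digits really do line up.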

3,4 - Both about the same thing.  Assumption seems to be that the temperature is computed/stored as 8 bits signed and then displayed assuming it’s 7 bits unsigned.  And in this case, the idiot designer must have assumed that the temperature would never drop below 0°F.

5,6 - To the first point - true.  To the second, if a 7-bit number is interpreted as a signed number, then the 7th bit is the sign, and the rest of the # is stored in the lower six bits (bits 0 through 5).  Switch that 7th bit off and the value is 11 0111 or 55.  No good.
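
Both halves of the 7-bit argument can be checked with a sketch like this.  The helper names are invented, and the code assumes the 7-bit field uses two's complement with its sign in bit 6.

```c
#include <stdint.h>

/* Interpret the low 7 bits of raw as a 7-bit two's-complement number.
 * If bit 6 (the 7-bit sign bit) is set, subtract 2^7 = 128. */
int8_t from_signed_7bit(uint8_t raw)
{
    raw &= 0x7Fu;
    return (raw & 0x40u) ? (int8_t)(raw - 128) : (int8_t)raw;
}

/* Switch off bit 6, the sign bit of a 7-bit field, and keep bits 0-5. */
uint8_t clear_7bit_sign(uint8_t raw)
{
    return raw & 0x3Fu;
}
```

`from_signed_7bit(0x77)` returns -9, which matches guess 5.  `clear_7bit_sign(0x77)` returns 55, so flipping the 7th bit cannot produce the 119 on the sign.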

7 - At least go to a search engine and do a couple conversions before posting drivel.

8 - Well, yeah, it is the interwebs out there and Darwin’s nowhere to be seen.

I love embedded software bugs!