Monday, November 15, 2010

What the "-1.#QNB"? Debugging We Go

I recently compiled the code I have been writing about and ran a regression data set (aka an old operational run). The original had been run on a Linux box using Lahey's Fortran compiler, and we also had an older Solaris version.

Aside from the expected differences (end of line, time and date stamps, CPU time, etc.), I ran across the text in the title of this post. Specifically, an average of some reasonable parameters was coming out as:

"-1.#QNB" instead of "0.1327"

WTF?

The rest of the run looked fine, and the final estimates of the parameters were all correct.
I googled the "#QNB", and got a number of hits for the Qatar National Bank, Google not paying attention to the "#" at the beginning.

Adding the "1." yielded something useful. I found an old discussion thread at "www.rhinocerous.net" (apparently a site about parallel computing). The actual page is gone, but I found a cached version of the thread. Long story short, this is a "not a number" (NaN) message.

Both I and the original poster were using MinGW. Specifically, I'm using g77. The number in question was printed out using F8.4. Apparently Cygwin does show this as NaN. It turns out MinGW will too, but only if the format allows enough decimal places. When I print the quantity in question using WRITE(*,*), I get "-1.#QNAN". With only 4 decimal places, however, it appears that this rounds to "-1.#QNB". I never would have guessed.

For the record, I tried varying the format from F8.0 to F10.8, and here's what I found:

F8.0 -1.
F8.1 -1.$
F8.2 -1.#R
F8.3 -1.#QO
F8.4 -1.#QNB
F8.5 -1.#QNAN
F10.6 -1.#QNAN0
F10.7 -1.#QNAN00
F10.8 ************

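which makes sense if you think of the text "-1.#QNAN" being rounded as though its characters were digits: whenever the first character dropped has an ASCII code high enough to round up, the last character kept is bumped to the next ASCII code (N becomes O, A becomes B, Q becomes R, # becomes $). Presumably more precision will simply yield more trailing 0's.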
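
The rounding pattern in the table can be modeled in a few lines of Python. This is only a sketch of my guess at the behavior, not the runtime library's actual code; the function name round_like_runtime is made up for illustration, and carries (a "9" rolling over) are ignored because they never occur here.

```python
def round_like_runtime(text, decimals):
    """Model (an assumption, not the real runtime code) of rounding the
    text of a number to `decimals` places, treating every character after
    the decimal point as if it were a digit."""
    whole, frac = text.split(".")
    # Pad with zeros so there is always one character to inspect past the cut.
    frac = frac.ljust(decimals + 1, "0")
    kept, dropped = frac[:decimals], frac[decimals]
    # Round up when the first dropped character compares >= '5' in ASCII.
    # Letters such as 'Q', 'N' and 'A' all do; '#' and '0' do not.
    if kept and dropped >= "5":
        kept = kept[:-1] + chr(ord(kept[-1]) + 1)
    return whole + "." + kept


# Print the modeled result for 0 through 7 decimal places.
for d in range(8):
    print(d, round_like_runtime("-1.#QNAN", d))
```

For 0 through 7 decimal places this reproduces the F8.0 through F10.7 rows above, including the bare "-1." for F8.0: "#" sorts below "5" in ASCII, so nothing rounds up and nothing is left to flag the NaN.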

The F8.0 result is concerning, because in that case there is no indication that there is anything wrong.

And people wonder why I mumble when they ask me what I did all day.

Friday, November 12, 2010

A Short-cut for Comparisons

Having shown the lowest possible tech version of getting the comparison of files done, I thought I'd show a couple of simple improvements. These (and a few others) can be found in MS-DOS Batch Files (2nd ed.) by Kris Jamsa.


Instead of using dir and then editing the results, we can have DOS do the iteration over files for us. The basic syntax is:

FOR %%I IN (a, b, c) DO command

where command is any DOS command. Specifically, we can use it to call our comparison batch file "comp.bat".

Within the parentheses can be an arbitrary set of files. Wildcards are allowed, so we can write:

compall.bat:

FOR %%I IN (*.f) DO call comp %%I >> results.out

This one-liner replaces the 60-line file we created with Excel yesterday, and gives identical function.

Thursday, November 11, 2010

Dude, where's my code? Sorting out a "wealth of files"

So, I've been given the job of doing a major clean-up on a small/medium-sized system (~12K lines of FORTRAN 77) that implements core business functionality. The system is about 15-20 years old, and consists of sixty-odd files. No tests for anything, and the bulk was written by statisticians. UGH.

I'm documenting the steps in the process as an aid to others (or at least an outlet for my whining).

Figure Out Where the Code Is
Historically, use of version control has been spotty in certain areas, despite my impassioned pleas. I discovered a secret to getting others to adopt it, however. I had a couple of high-priority bug fixes within a month. When we were doing impact analysis, I got to say "Since we don't do version control, we have no idea when this happened. It could have been last release, or it could have been when the original version was written in 1992." Suddenly, everyone thinks Subversion is a truly excellent idea.

Figure Out Which Versions to Merge
A survey of the department turned up two major versions of the source. Moreover, one of the major versions had spawned at least 4 minor variants. In addition, I'm guilty of having pulled part of the source down and made minor fixes. Each time, an emergency overcame the work and so I have several directories on my hard-drive with names like "temp4" and "PRE_FIX_07_2010". Any moral high ground that I had previously laid claim to just left the building. Sigh.

So, job 1 is to clean out all of the redundancy without losing any important enhancements/fixes. For this first pass, I'm taking the lowest-tech approach possible. Writing Python/Ruby scripts would make the job faster, but I wanted to keep tight control, and I needed the confidence that comes with direct, hands-on work.

Low-tech Automation: Batch Files
I chose one recent version as "base", and wrote two simple DOS batch files to do the comparisons (we live in an XP world).
The first compares two files with the same name:

COMP.BAT

fc %1 C:\BASE_VERSION\*.*

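fc is a simple file comparison utility; for text files like these it compares line by line by default (the /B switch forces a byte-by-byte comparison). In the present case, it does the job because the contents are identical for about 80% of the files in any two versions.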

Next, I created a list of the files in the source directory:

dir /B > filenames.txt

The /B switch tells dir to report only file names (one per line), without the date of last modification, etc. The output is redirected into the file "filenames.txt". Using ">" overwrites the file if it exists; ">>" would append the results.

Now I pulled the filenames into Excel. I inserted a column before the column of names. I put "call comp" in the first cell, and then dragged it down in front of all of the file names. Similarly, I put ">> results.out" in the column after the filename and dragged the fill cursor down to copy it into each row. Recall that ">>" appends the results to the file specified. Finally, I copied it to the clipboard, pasted it into an editor, and saved it to compall.bat.
The first few rows of the final file looked like:

call comp aaa.f >> results.out
call comp bbb.f >> results.out
etc.

Now all I had to do was copy both comp.bat and compall.bat into each directory, type compall, and open up results.out. Files that compared the same could simply be deleted, allowing me to focus only on the differences.

Three things to remember about this approach
1) It is about as low tech as it gets.
2) I have to remember to delete the file results.out if I decide to re-run the job. Otherwise the new run is appended to the results of the original, which can be confusing if you haven't had your morning coffee.
3) Notice that I used "call comp" instead of just "comp". In a DOS batch file, a plain "comp" transfers control to comp.bat, which would compare the first file and then quit. With "call comp", control returns to the original program (compall.bat) after comp.bat runs. Essentially, "call comp" is calling a function/subroutine/procedure, while the plain "comp" is a GOTO.
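
For comparison, here is roughly what the scripted route I passed on might look like in Python. Treat it as a sketch, not what I actually ran: the function name compare_to_base and the example paths are made up, and it uses the standard library's filecmp for a byte-for-byte check rather than calling fc.

```python
import filecmp
from pathlib import Path


def compare_to_base(candidate_dir, base_dir, pattern="*.f"):
    """Sort the files in candidate_dir into three lists of names:
    identical to the same-named file in base_dir, different from it,
    or missing from base_dir altogether."""
    identical, different, missing = [], [], []
    for path in sorted(Path(candidate_dir).glob(pattern)):
        base_path = Path(base_dir) / path.name
        if not base_path.is_file():
            missing.append(path.name)
        # shallow=False forces a comparison of the file contents,
        # not just size and modification time.
        elif filecmp.cmp(path, base_path, shallow=False):
            identical.append(path.name)
        else:
            different.append(path.name)
    return identical, different, missing


# Example call (hypothetical paths):
#   same, diff, missing = compare_to_base(r"C:\temp4", r"C:\BASE_VERSION")
```

Nothing is appended to a results file, so there is no stale output to delete before a re-run. The identical list is the set of files that could simply be deleted.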