Thursday, November 11, 2010

Dude, where's my code? Sorting out a "wealth of files"

So, I've been given the job of doing a major clean-up on a small/medium-sized system (~12K lines of FORTRAN 77) that implements a core piece of business functionality. The system is about 15-20 years old and consists of sixty-odd files. No tests for anything, and the bulk was written by statisticians. UGH.

I'm documenting the steps in the process as an aid to others (or at least an outlet for my whining).

Figure Out Where the Code Is
Historically, use of version control has been spotty in certain areas, despite my impassioned pleas. I discovered a secret to getting others to adopt it, however. I had a couple of high-priority bug fixes within a month. When we were doing impact analysis, I got to say "Since we don't do version control, we have no idea when this happened. It could have been last release, or it could have been when the original version was written in 1992." Suddenly, everyone thinks Subversion is a truly excellent idea.

Figure Out Which Versions to Merge
A survey of the department turned up two major versions of the source. Moreover, one of the major versions had spawned at least four minor variants. In addition, I'm guilty of having pulled part of the source down and made minor fixes. Each time, an emergency overtook the work, and so I have several directories on my hard drive with names like "temp4" and "PRE_FIX_07_2010". Any moral high ground that I had previously laid claim to just left the building. Sigh.

So, job 1 is to clean out all of the redundancy without losing any important enhancements/fixes. For this first pass, I'm taking the lowest-tech approach possible. Writing Python/Ruby scripts would make the job faster, but I wanted to keep tight control of this first pass, and I needed the confidence that comes with direct, hands-on work.

Low-tech Automation: Batch Files
I chose one recent version as "base", and wrote two simple DOS batch files to do the comparisons (we live in XP world).
The first compares two files with the same name:

COMP.BAT

fc %1 C:\BASE_VERSION\*.*

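fc is a simple file comparison utility. For text files like these it compares line by line; the /B switch would force a byte-by-byte comparison instead. The wildcard in the second argument makes fc take the corresponding name from the first argument, so each file is compared against its namesake in the base directory. In the present case, it does the job because the contents are identical for about 80% of the files in any two versions.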
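For anyone who would rather script this step, here is a rough Python equivalent of what comp.bat does for one file. It is only a sketch, not what I actually ran: the function name diff_against_base is made up for illustration, and it assumes the base directory holds a file with the same name.

```python
import difflib
from pathlib import Path

def diff_against_base(path, base_dir):
    """Return unified-diff lines between `path` and its namesake in `base_dir`.

    An empty list means the two files contain the same lines
    (line-ending differences are ignored, because read_text normalizes them).
    """
    candidate = Path(path)
    base = Path(base_dir) / candidate.name  # assumed to exist
    # errors="replace" keeps stray bytes in old source files from aborting the read
    base_lines = base.read_text(errors="replace").splitlines(keepends=True)
    new_lines = candidate.read_text(errors="replace").splitlines(keepends=True)
    return list(difflib.unified_diff(base_lines, new_lines,
                                     fromfile=str(base), tofile=str(candidate)))
```

Like fc on text files, this reports the lines that differ rather than just a yes/no answer, which is what you want when deciding whether a variant holds a fix worth keeping.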

Next, I created a list of the files in the source directory:

dir /B > filenames.txt

The /B switch tells dir to report only file names (one per line), without the date of last modification, etc. The output is redirected into the file "filenames.txt". Using ">" overwrites the file if it exists; ">>" would append the results.

Now I pulled the filenames into Excel. I inserted a column before the column of names. I put "call comp" in the first cell, and then dragged it down in front of all of the file names. Similarly, I put ">> results.out" in the column after the filename and dragged the fill cursor down to copy it into each row. Recall that ">>" appends the results to the file specified. Finally, I copied it to the clipboard, pasted it into an editor, and saved it as compall.bat.
The first few rows of the final file looked like:

call comp aaa.f >> results.out
call comp bbb.f >> results.out
etc.
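The Excel step is really just string concatenation, so it could also be scripted. A minimal Python sketch, assuming the sources all end in .f as in the sample rows above; the function name build_compall and the suffix filter are illustrative assumptions, not part of the workflow described here:

```python
from pathlib import Path

def build_compall(source_dir, suffix=".f", results="results.out"):
    """Return the text of a compall.bat-style script for the files in source_dir.

    Only files whose extension matches `suffix` are listed, so helper files
    such as filenames.txt or the batch files themselves are skipped.
    """
    names = sorted(p.name for p in Path(source_dir).iterdir()
                   if p.is_file() and p.suffix.lower() == suffix)
    # One "call comp <file> >> <results>" line per source file
    return "".join(f"call comp {name} >> {results}\n" for name in names)

# Possible usage (writes compall.bat into the current directory):
#     Path("compall.bat").write_text(build_compall("."))
```

File names containing spaces would need quoting in the generated lines; this sketch does not handle that.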

Now all I had to do was copy both comp.bat and compall.bat into each directory, type compall, and open up results.out. Files that compared the same could simply be deleted, allowing me to focus only on the differences.
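The sorting-out that follows (identical files can go, different ones need a look) can be sketched in Python with the standard filecmp module. Again, this is an illustration under assumptions rather than the method used here: the name triage and the three-way split are hypothetical.

```python
import filecmp
from pathlib import Path

def triage(variant_dir, base_dir):
    """Split the files in variant_dir into (same, different, missing).

    same      -- byte-for-byte identical to the base copy, safe to delete
    different -- present in both places but with different contents
    missing   -- no readable counterpart in base_dir
    """
    names = sorted(p.name for p in Path(variant_dir).iterdir() if p.is_file())
    # shallow=False compares file contents instead of trusting size and timestamp
    same, different, missing = filecmp.cmpfiles(
        variant_dir, base_dir, names, shallow=False)
    return same, different, missing
```

Unlike the fc run, this is a strict byte comparison, so two files that differ only in line endings land in the "different" pile.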

Three things to remember about this approach
1) It is about as low tech as it gets.
2) I have to remember to delete the file results.out if I decide to re-run the job. Otherwise the new run is appended to the results of the original, which can be confusing if you haven't had your morning coffee.
3) Notice that I used "call comp" instead of just "comp". In a DOS batch file, a plain "comp" transfers control to comp.bat, which would compare the first file and then quit. With "call comp", control returns to the original program (compall.bat) after comp.bat runs. Essentially, "call comp" is calling a function/subroutine/procedure, while the plain "comp" is a GOTO.
