Producing Commented Source Code from Spreadsheets with XL2E

Robert Rhode worked as a software engineer for Exele Information Systems, Inc. in a past life.

Abstract

Performance equations are periodic calculations performed as part of or in conjunction with an automated control system for an industrial process. A large system may use thousands of performance equations (PEs), many of which are identical apart from their specific data points (input variables). Using cut and paste, a programmer may produce a couple hundred equations per hour, with a high likelihood of making errors and a low likelihood of adequately documenting the code. However, if the PEs are defined in a consistent, regular format, such as in a table or spreadsheet, then an automated code generation tool can improve speed, reliability, and accountability.

XL2E is a PERL script that reads an engineering requirements specification from a tab-delimited text file, such as a spreadsheet saved as text, and produces fully commented PE source code. The spreadsheet input describes the performance equations, and the output is the implementation of the performance calculations in EDICT source code. The entire input spreadsheet is reproduced in tabular form as comments, with executable code interspersed between commented equation definitions. Using XL2E, hundreds of equations can be processed in a minute, error-free, with each equation annotated with the customer's own original specification. Furthermore, by using the original requirements specification as documentation, the programmer can be confident that the implementation does exactly what the customer requested.

EDICT is Exele Information Systems' PE add-on for PI Data Archive. With respect to XL2E, EDICT may be viewed as a special-purpose C preprocessor. PI, from OSI Software, Inc., is the leading data archive product in the industrial automation market. Exele sells versions of EDICT that work with FORTRAN, C, and VB, and that run under VMS, Unix, or Windows NT.


Introduction

You may wonder why I'm bothering to tell you about XL2E. Apart from the obvious retort that I wanted to save a bundle on the entry fee to this conference, that is. I expected that most of the speakers and attendees at this conference would come from "ivory towers" such as academia, webmastering, or writing PERL books for O'Reilly. I thought it might be a refreshing change of pace to show the wizards how PERL is used by the gnomes down in the engineering cellar. I also happen to think this script fairly elegantly performs a useful function that other people might be interested in.

Why XL2E Is Useful

Writing performance equations (PEs) is a drag. Nothing makes your day like being handed a spec containing hundreds of confusing, ill-formatted equations, full of magical constants and nearly identical apart from the tags (values read from the control system) they reference. Exele has a tool to create such boilerplate equations, but you still have to build a list of all the input and output tags, which takes time and exposes you to introducing errors.

The only way to verify the correctness of your PEs is to flip back and forth between the specification and the source code, checking the math by eye. When the customer's engineers come to review the code, they will repeat the verification by the same laborious method. When mistakes are found, the offending equations must be rewritten.

The people who will eventually maintain this code are process engineers, not programmers, so the code must be straightforward and well documented. The more effort you spend writing a beautiful library of elegant, optimized equations, the less the customer will like it. You need to spit out some code that gives your customer high confidence that they are getting precisely what they asked for, with minimal effort. For this reason, having the implementation closely resemble the original specification is more important than producing efficient code.

Who Might Use XL2E

Anyone who writes large systems of arithmetic equations from a tabular (e.g., spreadsheet) specification might find XL2E handy. XL2E can be particularly useful if internal documentation or accountability to the written specification is essential. For more difficult equations, it is possible (easy, even) to produce only the table-format comment lines and write the code by hand.

Example

The files shown below demonstrate what a customer might submit as part of an FRS (functional requirements specification), and the corresponding EDICT source code that XL2E would generate. A real FRS would likely contain the same sort of equations, but repeated 20 times for 20 different motor groups. Furthermore, there would be 20 such spreadsheets, each specifying a different category of PEs (e.g., motor status, valve positions, relay states, temperature and pressure sensors, level meters, etc.). Some PEs resemble this simple example; others can be quite messy.

Input file: Demo.txt

This is an actual XL2E input file. This file was originally a spreadsheet, which I saved as (tab-delimited) text.
Table 1.1 Motors Power and Status

	Symbol	Tag	Formula	Units	Description
Eq. 1	ATOT	POWTOT.CV	(CUR1.PV + CUR2.PV + CUR3.PV + CUR4.PV) * VOLTAGE where VOLTAGE = 230 V	Watts	Power usage in motors
					
	A1	CUR1.PV		Amperes	Current in motor 1
	A2	CUR2.PV		Amperes	Current in motor 2
	A3	CUR3.PV		Amperes	Current in motor 3
	A4	CUR4.PV		Amperes	Current in motor 4
	VOLTAGE		230	Volts	Supply voltage

Eq. 2		RUNNING.CV	.CV = MOTRUN1.PV OR MOTRUN3.PV OR MOTRUN2.PV OR MOTRUN4.PV	BOOL	Motor status
	R1	MOTRUN1.PV			Motor 1 is running (BOOLEAN)
	R2	MOTRUN2.PV			Motor 2 is running (BOOLEAN)
	R3	MOTRUN3.PV			Motor 3 is running (BOOLEAN)
	R4	MOTRUN4.PV			Motor 4 is running (BOOLEAN)

Output file: Demo.dic

XL2E generated this EDICT source code file from the table shown above. EDICT equation definitions start with an output tag declaration and contain C code fragments. Note how the original table is reproduced in block comments, with relevant executable code interspersed between the lines. "$TAG()" is an EDICT keyword that flags the enclosed variable as a PI tag, rather than a C local variable. "VOLTAGE" is an example of a local variable.
PINODE=localhost:5450
/*====================================================================*/
/* demo.dic - EDICT dictionary                                        */
/*====================================================================*/
/* Revision history:                                                  */
/* 1.0  Wed May 26 13:33:58 1999                                      */
/*      Created by xl2e.pl from: demo.txt                             */
/*====================================================================*/
/* xl2e.pl version 2.1                                                */
/* EDICT Dictionary Generator                                         */
/* (c)1998-1999 by Exele Information Systems, Inc.                    */
/* 445 W. Commercial St., East Rochester, NY 14445  (716)385-9740     */
/*====================================================================*/
/*  Table 1.1 Motors Power and Status                                                                           */
/*                                     Symbol   Tag         Formula      Units    Description                   */

{$TAG(POWTOT.CV)
/*  Eq. 1                              ATOT     POWTOT.CV   (CUR1.PV +   Watts    Power usage in motors         */
/*                                                          CUR2.PV +                                           */
/*                                                          CUR3.PV +                                           */
/*                                                          CUR4.PV) *                                          */
/*                                                          VOLTAGE                                             */
/*                                                          where                                               */
/*                                                          VOLTAGE =                                           */
/*                                                          230 V                                               */
/*                                     A1       CUR1.PV                  Amperes  Current in motor 1            */
/*                                     A2       CUR2.PV                  Amperes  Current in motor 2            */
/*                                     A3       CUR3.PV                  Amperes  Current in motor 3            */
/*                                     A4       CUR4.PV                  Amperes  Current in motor 4            */
/*                                     VOLTAGE              230          Volts    Supply voltage                */

VOLTAGE = &
	230;
$TAG(POWTOT.CV) = &
	($TAG(CUR1.PV) &
	+ $TAG(CUR2.PV) &
	+ $TAG(CUR3.PV) &
	+ $TAG(CUR4.PV)) * VOLTAGE ;



{$TAG(RUNNING.CV)
/*  Eq. 2                                       RUNNING.CV  .CV =        BOOL     Motor status                  */
/*                                                          MOTRUN1.PV                                          */
/*                                                          OR                                                  */
/*                                                          MOTRUN3.PV                                          */
/*                                                          OR                                                  */
/*                                                          MOTRUN2.PV                                          */
/*                                                          OR                                                  */
/*                                                          MOTRUN4.PV                                          */
/*                                     R1       MOTRUN1.PV                        Motor 1 is running (BOOLEAN)  */
/*                                     R2       MOTRUN2.PV                        Motor 2 is running (BOOLEAN)  */
/*                                     R3       MOTRUN3.PV                        Motor 3 is running (BOOLEAN)  */
/*                                     R4       MOTRUN4.PV                        Motor 4 is running (BOOLEAN)  */

$TAG(RUNNING.CV) = &
	$TAG(MOTRUN1.PV) &
	|| $TAG(MOTRUN3.PV) &
	|| $TAG(MOTRUN2.PV) &
	|| $TAG(MOTRUN4.PV);


STDOUT diagnostic

This is the progress message printed to STDOUT.
demo
Processing: demo.txt
Header row:  3
Target tag in column: 2
Equation in column: 3
16 lines read
Compress:  0  0  0  1  0  0
Widths:  33  7  10  11  7  28
Deleting old demo.dic

format DICTFILE = 
/*~~^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<~~^<<<<<<~~^<<<<<<<<<~~^<<<<<<<<<<~~^<<<<<<~~^<<<<<<<<<<<<<<<<<<<<<<<<<<<~~*/
$curr_line[0],$curr_line[1],$curr_line[2],$curr_line[3],$curr_line[4],$curr_line[5],
.

What XL2E Does

In brief, XL2E reads ASCII-formatted equations from a tab-delimited table and spits them out as C-like source code, with the original input file reproduced in tabular form in C comments.

Tab-delimited ASCII table input

People rarely distribute tab-delimited text tables; they more often distribute spreadsheets (which I save as text). The input need not be tab-delimited in principle; I chose tabs because FRS description fields often contain commas but rarely contain tabs, which made tab-delimited text the easier format to parse in this instance.
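As a rough illustration of why tabs are convenient here, the sketch below splits one row on tabs. The sample row and the helper name `split_row` are invented for this example. The script itself calls `split "\t"` with no limit, which drops trailing empty cells; the `-1` limit used here is an assumption on my part that keeps them.

```perl
use strict;
use warnings;

# Hypothetical helper: split one tab-delimited row into cells.
# The -1 limit keeps trailing empty cells so column positions stay stable.
sub split_row {
    my ($line) = @_;
    chomp $line;
    return split /\t/, $line, -1;
}

# Invented sample row: the description contains a comma and the last cell is empty.
my $row   = "Eq. 1\tATOT\tPOWTOT.CV\tA1 + A2\tWatts\tPower usage, all motors\t\n";
my @cells = split_row($row);
print scalar(@cells), " cells; description: $cells[5]\n";
```

A comma-delimited version of the same row would need quoting rules to protect the comma in the description, which is exactly the complication the tab format avoids.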

C-like source code output

I say "C-like" because the real C code is generated by EDICT. EDICT source is composed of code snippets in the target language (C, in this case) annotated with EDICT keywords. EDICT, by the way, stands for "Equation Dictionary". EDICT, written and sold by Exele Information Systems, is a PE add-on for PI Data Archive. The good folks at Exele would love to sell you a few copies of either product.

Tabular format comments

XL2E formats the comment lines in such a way as to preserve the tabular layout of the original spreadsheet cells, ensuring that the documentation in the generated code resembles the customer's requirements specification as closely as possible. When the table rows exceed EDICT's maximum allowed line length (as they usually do), the cell text is wrapped intelligently across multiple comment lines. This feature alone consumes about 25% of the code in XL2E.

Flexible file parsing

XL2E is not precisely intended to be a Swiss Army knife; I used about five different derivatives of it in one project. Most of the important regular expressions are defined in string variables at the top of the file. XL2E is smart enough to filter out header rows and figure out which column the equations occupy. A second (extended) equation column is supported because some of the equations in the FRS ran longer than Excel 95's 255-character cell limit.
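The column detection can be sketched in a few lines. This is a simplified illustration, not the script's own loop: the helper name `find_columns` is hypothetical, the two patterns are modelled on the header regular expressions defined at the top of xl2e.pl, and the extended-equation column is left out.

```perl
use strict;
use warnings;

# Header patterns modelled on the ones listed at the top of xl2e.pl.
my $TAG_HDR_RE = qr/TAG/i;
my $EQ_HDR_RE  = qr/FORMULA|EQUATION|CALCULATION/i;

# Hypothetical helper: return (tag column, equation column), or -1 when absent.
sub find_columns {
    my ($header_line) = @_;
    my @cell = split /\t/, $header_line, -1;
    my ($tag_col, $eq_col) = (-1, -1);
    for my $i (0 .. $#cell) {
        $tag_col = $i if $tag_col < 0 && $cell[$i] =~ $TAG_HDR_RE;
        $eq_col  = $i if $eq_col  < 0 && $cell[$i] =~ $EQ_HDR_RE;
    }
    return ($tag_col, $eq_col);
}

# Header row of Demo.txt (the first cell is empty).
my ($tag_col, $eq_col) = find_columns("\tSymbol\tTag\tFormula\tUnits\tDescription");
print "Target tag in column: $tag_col\n";
print "Equation in column: $eq_col\n";
```

Adapting to a new FRS layout then mostly means editing the two patterns rather than the loop.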

Friendly equation formatting

XL2E provides a few "reader-friendly" functions when it generates the EDICT source code. The equation is broken up across several rows for better readability. Because XL2E was written with an eye toward engineering applications, it also knows to discard units that follow constants (e.g., 120 VAC).

Process tags (variables from the control system) are clearly labeled with the EDICT "$TAG" keyword in order to distinguish them from local (PE temporary) variables, which are sometimes added for better clarity or to reuse a common subexpression. The equation block comment is placed between the equation declaration and the implementation.

XL2E also currently supports one symbolic subexpression, declared by the keyword "where" (e.g., Volume = Area * Length where Area = PI * R * R).

The equation specification may also use multiple bracket styles to improve readability in nested parentheses. XL2E converts all brackets to parentheses in generated source code.
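The three conveniences described above (dropping units, hoisting a "where" clause, and normalizing brackets) can be sketched as small independent functions. The function names are hypothetical and the regular expressions are my own simplifications, not the ones in the appendix; in particular, `strip_units` treats any bare word after a number as a unit, so it would mangle something like "2 OR X".

```perl
use strict;
use warnings;

# Drop an engineering unit that directly follows a numeric constant ("230 V" -> "230").
# Caveat: any bare word after a number is treated as a unit.
sub strip_units {
    my ($eq) = @_;
    $eq =~ s/(\d+(?:\.\d+)?)\s+[A-Za-z]+\b/$1/g;
    return $eq;
}

# Convert square and curly brackets to parentheses.
sub normalize_brackets {
    my ($eq) = @_;
    $eq =~ tr/[]{}/()()/;
    return $eq;
}

# Split "main where definition" into (definition, main) so the definition
# can be emitted first; return (main) unchanged when there is no clause.
sub hoist_where {
    my ($eq) = @_;
    return ($2, $1) if $eq =~ /^\s*(.*?)\s+where\s+(.*?)\s*$/i;
    return ($eq);
}

my $spec = "Volume = {Area * [Length]} where Area = 3.14159 * R * R";
print "$_;\n" for map { normalize_brackets(strip_units($_)) } hoist_where($spec);
```

Emitting the definition before the main expression mirrors the generated Demo.dic, where `VOLTAGE` is assigned before it is used.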

This program will put me out of work!

No, it won't.

First, XL2E needs to be tweaked every time a new FRS format is introduced (which happens all the time). Second, there are plenty of performance equations that are too complex to be handled by this simpleminded tool. Third, XL2E has no ability to create solutions; if it isn't in the FRS, XL2E can't add it in.

The next version of XL2E will put you out of work. ;-)

How XL2E Works

XL2E works in two passes. In the first pass, it reads the input file and stores it in an array of lines, at the same time measuring the width of the spreadsheet columns. In the second pass, it extracts the equations from the memory copy and formats the original input lines as comments in the output source code. One of the niftier features of XL2E is that the comments preserve the tabular formatting of the original spreadsheet. Some diagnostic information is printed to stdout, and the EDICT source code is written to files with names that correspond to the original input files. Let us touch upon some of the more interesting code snippets.

Process files

XL2E processes input in typical PERL fashion. It slurps the input file into an array, taking notes as it goes, and then writes the whole thing out, transforming the stored lines along the way. If an input filename does not exist, XL2E will happily try appending the default extension.
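The filename handling can be sketched as follows. This is an illustration under assumptions: `resolve_names` is a hypothetical helper, and unlike the script it strips only the final extension when deriving the output name. The demonstration runs in a scratch directory created with File::Temp.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: map a command-line name to (input file, output file).
# Returns an empty list when neither NAME nor NAME plus the input extension exists.
sub resolve_names {
    my ($name, $in_ext, $out_ext) = @_;
    if (-e $name) {
        (my $base = $name) =~ s/\.[^.\/\\]*$//;   # strip only the final extension
        return ($name, $base . $out_ext);
    }
    return ("$name$in_ext", "$name$out_ext") if -e "$name$in_ext";
    return;
}

# Demonstration in a scratch directory, so no real files are touched.
my $dir = tempdir(CLEANUP => 1);
open(my $fh, '>', "$dir/demo.txt") or die "Cannot create $dir/demo.txt: $!";
close($fh);

for my $arg ("$dir/demo.txt", "$dir/demo", "$dir/missing") {
    my ($in, $out) = resolve_names($arg, ".txt", ".dic");
    print defined $in ? "$arg -> $in, $out\n" : "$arg not found as an input file\n";
}
```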

Record column widths

Because the table is likely to be wider than the maximum EDICT input line length, XL2E must note both the maximum and minimum width of each column. The maximum width is the length of the longest cell in the column; the minimum width is the length of its longest word.
foreach $i (0..$#curr_line) {
    if ($i != $ex_eq_field) {
        $field_maxw[$i] = &GREATER($field_maxw[$i],
                                   length $curr_line[$i]);
        $field_minw[$i] = &GREATER($field_minw[$i],
                                   &MAXLEN( split(/ +/,$curr_line[$i]) ));
    }
}

GREATER and MAXLEN are functions I wrote. GREATER returns the value of the greater of the first two arguments, short and sweet. MAXLEN returns the length of the longest string in its argument vector.
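For comparison, the same two measurements can be taken for a whole column at once with the core List::Util module. This is an alternative sketch, not how the script does it: `column_widths` is a hypothetical helper, and the cells are the header plus two entries from the Description column of Demo.txt.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Return (max width, min width) for one column of cell strings:
# the longest cell, and the longest space-separated word in any cell.
sub column_widths {
    my @cells = @_;
    my $maxw = max(0, map { length $_ } @cells);
    my $minw = max(0, map { length $_ } map { split / +/, $_ } @cells);
    return ($maxw, $minw);
}

my @description = ("Description", "Power usage in motors", "Motor 1 is running (BOOLEAN)");
my ($maxw, $minw) = column_widths(@description);
print "max width $maxw, min width $minw\n";
```

The leading `0` passed to `max` keeps the result defined for an empty column.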

Compression Ratio

The "compression ratio" of each column is its maximum width (the length of its longest cell) divided by its minimum width (the length of its longest word).
foreach $i (0..$fieldn) {
...
    $comp_ratio[$i] = $field_maxw[$i] / $field_minw[$i];
...
}

Compress columns

While the comment line remains longer than the allowed length, the column with the greatest compression ratio is selected for compression and the length of the comment line is recalculated. When a column is compressed, the text in that column will be wrapped across multiple lines in the comment block.
while(($comment_line_len > $EDICT_line_limit) &&
      (&SUM(@compress) <= ($ex_eq_field == -1 ? $fieldn-1 : $fieldn))) {
    my $comp_idx = &IDXMAX (@comp_ratio);
    $comment_line_len -= $field_maxw[$comp_idx];
    $comment_line_len += $field_minw[$comp_idx];
    $compress[$comp_idx] = 1;
    $comp_ratio[$comp_idx] = 0;
} # while

IDXMAX is a function that returns the index of the greatest element in an array.

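The next step is to distribute the remaining free space proportionally among the compressed (wrapped) columns (uncompressed columns already have all the space they want). The first loop subtracts the widths of the uncompressed columns from the free space and totals the minimum widths of the compressed columns; the second loop divides the free space among the compressed columns in proportion to those minimum widths.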


foreach $i (0..$fieldn) {
    if ($i != $ex_eq_field) {
        $compress[$i] = 1 if ($compress_all == 1);
        if( $compress[$i] ) {
            $comment_field_total += $field_minw[$i];
        } else {
            $comment_free_space -= $field_maxw[$i];
        }
    }
} # for

foreach $i (0..$fieldn) {
    if ($i != $ex_eq_field) {
        $field_w[$i] = ($compress[$i]
                        ? int($comment_free_space
                              * $field_minw[$i]
                              / $comment_field_total)
                        : $field_maxw[$i]);
    }
} # for
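To see the compression and distribution steps work together, here is a self-contained sketch run on the demo table. It is a simplification, not the script's code: `fit_columns` is a hypothetical name, the extended-equation column and the `$compress_all` fallback are omitted, and the per-column widths passed in were counted by hand from Demo.txt, so treat them as illustrative.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Sketch of the fitting step. $limit is the line limit, $extra the fixed
# comment overhead; @$maxw and @$minw hold the per-column widths.
sub fit_columns {
    my ($limit, $extra, $maxw, $minw) = @_;
    my $last     = $#{$maxw};
    my $overhead = $extra + 2 * $last;   # same allowance as the script: 2 x last index
    my @ratio    = map { $minw->[$_] ? $maxw->[$_] / $minw->[$_] : 0 } 0 .. $last;
    my @wrap     = (0) x ($last + 1);
    my $len      = sum(@$maxw) + $overhead;

    # Wrap the column with the greatest ratio until the row fits.
    my $wrapped = 0;
    while ($len > $limit && $wrapped <= $last) {
        my $pick = 0;
        for my $i (1 .. $last) {
            $pick = $i if $ratio[$i] > $ratio[$pick];
        }
        $len += $minw->[$pick] - $maxw->[$pick];
        $wrap[$pick]  = 1;
        $ratio[$pick] = 0;
        $wrapped++;
    }

    # Unwrapped columns keep their full width; wrapped ones share what is left.
    my $free      = $limit - $overhead - sum(0, map { $wrap[$_] ? 0 : $maxw->[$_] } 0 .. $last);
    my $wrap_minw = sum(0, map { $wrap[$_] ? $minw->[$_] : 0 } 0 .. $last);
    my @width = map {
        $wrap[$_] && $wrap_minw ? int($free * $minw->[$_] / $wrap_minw) : $maxw->[$_]
    } 0 .. $last;
    return (\@wrap, \@width);
}

# Line limit 116 and overhead 10 are the values set in the appendix.
my ($wrap, $width) = fit_columns(116, 10, [33, 7, 10, 71, 7, 28], [6, 7, 10, 10, 7, 11]);
print "Compress: @$wrap\n";
print "Widths: @$width\n";
```

With these inputs only the Formula column is wrapped, and the resulting flags and widths agree with the "Compress" and "Widths" lines of the STDOUT diagnostic in the example.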

Compose comment format

In this next section things happen so fast that if you blink you may miss everything. First, a string called "$format" is built up to represent the number and widths of the columns in the commented table. This string is then evaluated as a PERL statement, which defines the format "DICTFILE" for later use. This format will automatically wrap the text in its fields across as many lines as necessary when printing out data.
$format = "format DICTFILE = \n/*";
foreach $i (@field_w) {
    if ($i > 0) {
        $format = $format . "~~^" . "<" x ($i-1);
    }
}
$format = $format . "~~*/\n";
foreach $i (0..$fieldn) {
    if ($i != $ex_eq_field) {
        $format = $format . '$curr_line[' . $i . '],';
    }
}
$format = $format . "\n.\n";

print $format;
eval $format; 
die $@ if $@;

The format string is echoed to STDOUT, as you can see in the diagnostic output section of the example. The code to print out our input spreadsheet as a commented table then looks about as simple as can be.
foreach (@file_line) {
...
    write DICTFILE;
} # foreach line
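The wrapping behavior of such a picture line can be tried in isolation with Perl's built-in `formline`, which fills the accumulator `$^A` instead of writing to a file. This is a sketch under assumptions: `wrap_comment_row` is a hypothetical helper, the two-column picture is made up for the example, and the script itself uses an eval'd `format` with `write` rather than `formline`.

```perl
use strict;
use warnings;

# Hypothetical helper: render a two-cell row as wrapped comment lines.
# A "^" field consumes the text it prints, so the cells are copied into
# local variables first. "~~" repeats the line until every field is empty.
sub wrap_comment_row {
    my ($label, $formula) = @_;
    my $picture = '/*~~^<<<<<<~~^<<<<<<<<<<~~*/' . "\n";
    $^A = "";
    formline($picture, $label, $formula);
    my $out = $^A;
    $^A = "";
    return $out;
}

print wrap_comment_row("Eq. 1", "(CUR1.PV + CUR2.PV) * VOLTAGE where VOLTAGE = 230 V");
```

The first field takes only one line, so on later lines it is left blank while the second field continues to drain, which is the effect visible in the comment blocks of Demo.dic.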

Generate source code

The ellipsis in the preceding section conceals that minor part of the loop that generates the C-like EDICT source code. There's no need to discuss that code; I'll leave it as an exercise for the reader. However, I will provide this "minor spoiler".

EDICT prefers that comment blocks be printed between the output tag declaration and the performance equation that calculates it. This version of the script also assumes that a new equation will be defined before its input variables. These are both important assumptions, which are coded in (and could easily be coded out).

When a new equation is located, the code below first prints out the previous equation, then prints the declaration for the new equation, then generates the source code for the new equation. The comment lines for the new equation are then processed and printed until the next equation is detected.


@curr_line = split "\t";

...

if( (length $curr_line[$eq_field] > 0) &&
   ($curr_line[$tag_field] =~ /($tagname_re)/) ) {
    if( $eq_to_print ) {
        print DICTFILE "\n$curr_eq";
    }

    print DICTFILE "\n{\$TAG($1)";
    $eq_to_print = 1;

    $curr_eq = (($eq_field == $tag_field)
                ? &FORMAT_EQUATION ($curr_line[$eq_field])
                : &FORMAT_EQUATION ($curr_line[$eq_field],
                                    $curr_line[$tag_field]) );
}

The meaning of the first line should be clear; the fields of the input file are separated by tabs.

Format equations

FORMAT_EQUATION is the function that parses the equation and generates the EDICT source code. Note that it is called with one or two arguments, depending on whether the output tag is defined in the same cell as the equation, as the code below illustrates.
if( $#_ > 0 ) {
    $curr_eq =~ s/^\s*[.0-9A-Z_a-z]+\s*=\s*//;
    $curr_eq = "$_[1] = $curr_eq";
}

The substitution above simply strips off any redundant output tag declaration in the equation definition. This does mean that the "TAG" column, if it exists, takes precedence over the equation definition if conflicting output tags are declared.

As previously discussed, XL2E performs a number of automatic transformations on the equation definition to turn it into EDICT source code.

Breaking up the equations achieves the dual purpose of making the code more readable and allowing longer equations (because EDICT's equation length limit is about ten times its line length limit).
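Put together, the left-hand-side handling, tag marking, and line breaking can be sketched as below. `to_edict` is a hypothetical, trimmed-down stand-in for FORMAT_EQUATION: it reuses the shape of the script's tag pattern and continuation string but skips the "where" clause, bracket, unit, and error-check steps.

```perl
use strict;
use warnings;

my $TAG_RE = qr/[&0-9_A-Z]+[.][A-Z]+/i;   # same shape as $tagname_re in the script
my $CONT   = " &\n\t";                    # EDICT continuation, then an indent

# Hypothetical, trimmed-down stand-in for FORMAT_EQUATION.
sub to_edict {
    my ($formula, $tag) = @_;
    my $eq = $formula;
    if (defined $tag) {
        $eq =~ s/^\s*[.0-9A-Z_a-z]+\s*=\s*//;   # drop a redundant ".CV =" prefix
        $eq = "$tag = $eq";
    }
    $eq =~ s/($TAG_RE)/\$TAG($1)/g;             # mark process tags
    $eq =~ s/\s+OR\s+/ || /gi;                  # OR becomes C's ||
    $eq =~ s/=\s*/=$CONT/;                      # break after the first "="
    $eq =~ s/\s*(\+|\|\|)/$CONT$1/g;            # break before each + or ||
    return "$eq;";
}

print to_edict(".CV = MOTRUN1.PV OR MOTRUN3.PV", "RUNNING.CV"), "\n";
```

Breaking before each operator keeps every generated line short, so the equation as a whole can grow well past the single-line limit.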

Results

XL2E was written, tested, and used over a period of about four weeks (part-time). The final version produces EDICT dictionaries that need no hand-tweaking for about 75% of the approximately two dozen input files in the project for which XL2E was written. Taking into account the time needed to adapt XL2E to different table formats in new input files, what once took all day can now be accomplished in an hour.

Compared with generating code by hand or using text editor macros, XL2E is faster, more accurate, more consistent, and produces more maintainable code.


Appendix A: XL2E Source Code

$script_name = $0;
# xl2e.pl
# by Robert Rhode (robert_rhode@yahoo.com)
#
# Description
#=============
# Create EDICT performance equations for PI Data Archive based on
# spreadsheet containing functional requirement specification (FRS).
#
# Read in FRS files saved as .txt (tab-delimited cells) and create
# EDICT dictionaries automatically.
#
# This is a second-generation script and is designed to be much
# smarter than its predecessors in detecting equations and long
# comments.  It also is better able to fit the line width to the
# constraints imposed by EDICT.
#
# After the script has run, be sure to check over the output file.
# Some errors are detected by the script and are flagged with
# a comment line that starts with "ERROR!".  Others may be beyond the
# capabilities of the script to detect.
#
# It is expected that some of the regular expressions in this script
# will have to be modified for individual FRS documents.
#
# Usage
#=======
# For each file on the command line, a corresponding .dic file is
# created, which is the EDICT dictionary.
#
# example:
# perl xl2e.pl file1.txt

# Xl2e is a PERL script.  PERL is the command and Xl2e.pl is the
# document that PERL interprets.  Xl2e takes its input from file1.txt
# and prints to file1.dic.
#
# Revision history
#===================
# 1.0   17 jul 1998  bgr  Original version
# 2.0   23 jul 1998  bgr  Rewritten bigger, smarter
# 2.1   12 may 1999  bgr  Cleaned up for YAPC
#
$script_version = "2.1";
#
# TO DO
#======
# Symbol fields
# IF...THEN statements
# Digital tags
# Multiline equations
# Use english variable names
#======================================================
# Input Example: demo.txt
#
# Line	Symbol	Tag	Description	Equation
# 11	Production	WH1_FL_AVG.CV	TOTAL FLOW FROM WELLHEAD NUMBER  1	.CV = WH1W1FL.PV + WH1W2FL.PV + WH1W3FL.PV + WH1W4FL.PV + WH1W5FL.PV	
# 12	WT-1	WH1W1FL.PV	WELL NO. 1 (WH # 1-5)		
# 13		WH1W2FL.PV	WELL NO. 2 (WH # 1-1)		
# 14		WH1W3FL.PV	WELL NO. 3 (WH # 1-3)	
# 15		WH1W4FL.PV	WELL NO. 4 (WH # 1-4)	
# 16		WH1W5FL.PV	WELL NO. 5 (WH # 1-2)	
# 				
#======================================================
# Output Example: demo.dic
#
# /*  Line  Symbol      Tag            Description                         Equation                               */
#
# {$TAG(WH1_FL_AVG.CV)
# /*  11    Production  WH1_FL_AVG.CV  TOTAL FLOW FROM WELLHEAD NUMBER  1  .CV = WH1W1FL.PV + WH1W2FL.PV +        */
# /*                                                                       WH1W3FL.PV + WH1W4FL.PV + WH1W5FL.PV   */
# /*  12    WT-1        WH1W1FL.PV     WELL NO. 1 (WH # 1-5)                                                      */
# /*  13                WH1W2FL.PV     WELL NO. 2 (WH # 1-1)                                                      */
# /*  14                WH1W3FL.PV     WELL NO. 3 (WH # 1-3)                                                      */
# /*  15                WH1W4FL.PV     WELL NO. 4 (WH # 1-4)                                                      */
# /*  16                WH1W5FL.PV     WELL NO. 5 (WH # 1-2)                                                      */
#
# $TAG(WH1_FL_AVG.CV) = &
# 	$TAG(WH1W1FL.PV) &
# 	+ $TAG(WH1W2FL.PV) &
# 	+ $TAG(WH1W3FL.PV) &
# 	+ $TAG(WH1W4FL.PV) &
# 	+ $TAG(WH1W5FL.PV);
#
#
#======================================================
#
# Subroutines used in this program
#
sub GREATER;
sub IDXMAX;
sub MAXLEN;
sub SUM;
sub FORMAT_EQUATION;
sub FORMAT_HEADER;
#===========================================================
#
# Set output delimiters
#
$, = "  ";
$\ = "\n";
#
# Set some global vars
#
$EDICT_line_limit = 116;
$EDICT_eq_char_limit = 1100;
$EDICT_continue = " &\n\t";
$EDICT_terminate = ";\n";
$File_ext_input = ".txt";
$File_ext_output = ".dic";
$comment_extra_chars = 10;
#
# what does a tagname look like
#             operator
#             equation header
#             extended equation header
#             target tag header
#             symbol header
#
$tagname_re = "(?i)[&0-9_A-Z]+[.][A-Z]+";
# note: "-" goes last in the class so it is a literal, not the range "*" to "+"
$operator_re = "[=/*+-]|\\b(?:AND|OR)\\b";
$eq_hdr_re = "(?i)(?:FORMULA|EQUATION|CALCULATION)";
$ex_eq_hdr_re = "(?i)EXT";
$tag_hdr_re = "(?i)TAG";
$sym_hdr_re = "(?i)SYMBOL";
#===========================================================
#
# BEGIN EXECUTION
#
# Process every file in argv
#
print @ARGV;
foreach $file (@ARGV) {
#
# prepare variables
#
    $curr_eq = "";
    $eq_to_print = 0;
    @file_line = ();
    %tag_refs = ();
    $hdr_row = -1;
    $tag_field = -1;
    $eq_field = -1;
    $ex_eq_field = -1;
    @field_maxw = ();
    @field_minw = ();
    $compress_all = 0;
#
# designate script and output file
#
    if ( -e $file ) {
	$dictfile = $file;
	$dictfile =~ s/\.[^.\/\\]*$//;   # strip only the final extension, not dots in the path
	$dictfile .= $File_ext_output;
    } elsif ( -e $file . $File_ext_input ) {
	$dictfile = $file . $File_ext_output;
        $file .= $File_ext_input;
    } else {
	print "$file not found as an input file\n";
	next;
    }

#
# echo filename
#
    print "Processing: $file";
    $src_name = $file;
    
#
# read FRS file into script
#
    unless( open( INPUTFILE, $file ) ) {
	print "ERROR! Cannot open $file: $!";
	next;
    }
    while (<INPUTFILE>) {
#
	my $i;
#	print @_;
# strip off newline
	chomp;
#
# Save line
#
	push @file_line, $_;
#
	@curr_line = split "\t"; # split line on tabs
#
# See if header row found
# Note: Extended equation field must be next after equation
# Note: Equation may be in same field as target tag
#
	if( ($hdr_row == -1)
	   && ( /$tag_hdr_re/
	       || /$eq_hdr_re/
	       || /$ex_eq_hdr_re/ ) ) {
	    $hdr_row = $.;
	    print ("Header row:",$.);
	    foreach $term (0..$#curr_line) {
		if( ($tag_field == -1)
		   && ($curr_line[$term] =~ /$tag_hdr_re/) ) {
		    $tag_field = $term;
		    print ("Target tag in column: $term");
		}
		if( ($eq_field == -1)
		   && ($curr_line[$term] =~ /$eq_hdr_re/) ) {
		    $eq_field = $term;
		    print ("Equation in column: $term");
		    $term++;
		    if( $curr_line[$term] =~ /$ex_eq_hdr_re/ ) {
			$ex_eq_field = $term;
			print ("Extended equation in column: $term");
		    } # if extended eq
		} # if
	    } # foreach
	} # if
#
# Merge equation and extended equation fields
#
	if( $ex_eq_field != -1 ) {
	    if( length $curr_line[$ex_eq_field] > 0 ) {
		$curr_line[$eq_field] = $curr_line[$eq_field]
		    . " " 
		    . $curr_line[$ex_eq_field];
		$curr_line[$ex_eq_field] = "";
	    }
	    $field_maxw[$ex_eq_field] = 0;
	    $field_minw[$ex_eq_field] = 0;
	}
#
# Check field widths
#
	foreach $i (0..$#curr_line) {
	    if ($i != $ex_eq_field) {
		$field_maxw[$i] = &GREATER($field_maxw[$i],
					   length $curr_line[$i]);
		$field_minw[$i] = &GREATER($field_minw[$i],
					   &MAXLEN( split(/ +/,$curr_line[$i]) ));
	    }
	}
    }
    print "$. lines read";
    close( INPUTFILE );
#
# Some basic error checking
#
    print "ERROR! No header record located" if( $hdr_row == -1 );
    print "ERROR! No equation field located" if( $eq_field == -1 );
    next if( $eq_field == -1 );
#
    $fieldn = $#field_maxw;
    if( $#field_minw != $fieldn ) {
	print ("ERROR! Field width counters do not match:",
	       $#field_maxw,
	       $#field_minw);
    }
#
# Calculate comment line length
#
    $compress_all = 0;
    @compress = ();
    foreach $i (0..$fieldn) {
	$compress[$i] = 0;
	if ($i != $ex_eq_field) {
	    # an entirely empty column has zero minimum width; avoid dividing by zero
	    $comp_ratio[$i] = ($field_minw[$i]
			       ? $field_maxw[$i] / $field_minw[$i]
			       : 0);
	} else {
	    $comp_ratio[$i] = 0;
	}
    }
#
    @field_w = @field_maxw;
    $comment_line_len = &SUM (@field_w, $comment_extra_chars, $fieldn * 2);
    $comment_line_len -= 2 if ($ex_eq_field != -1);
#
# Comment line compression algorithm:
#
# while line too long
#   select [equation or] field with greatest compression ratio
#   set compress to 1
#   set field widths of compressed fields to greater of minw or
#     proportional space available
#
    while(($comment_line_len > $EDICT_line_limit) &&
	  (&SUM(@compress) <= ($ex_eq_field == -1 ? $fieldn-1 : $fieldn))) {
	my $comp_idx = &IDXMAX (@comp_ratio);
	$comment_line_len -= $field_maxw[$comp_idx];
	$comment_line_len += $field_minw[$comp_idx];
	$compress[$comp_idx] = 1;
	$comp_ratio[$comp_idx] = 0;
    } # while
#
    print ("Compress:",@compress);
#
# Proportionally divide up remaining space based on minw
#
    if ($EDICT_line_limit < $comment_line_len) {
	$compress_all = 1;
    }
#
    my $comment_free_space = ($EDICT_line_limit
			      - $comment_extra_chars
			      - $fieldn * 2);
    $comment_free_space -= 2 if ($ex_eq_field != -1);
    my $comment_field_total = 0;
#
    foreach $i (0..$fieldn) {
	if ($i != $ex_eq_field) {
	    $compress[$i] = 1 if ($compress_all == 1);
	    if( $compress[$i] ) {
		$comment_field_total += $field_minw[$i];
	    } else {
		$comment_free_space -= $field_maxw[$i];
	    }
	}
    } # for
#
    foreach $i (0..$fieldn) {
	if ($i != $ex_eq_field) {
	    $field_w[$i] = ($compress[$i]
			    ? int($comment_free_space
				  * $field_minw[$i]
				  / $comment_field_total)
			    : $field_maxw[$i]);
	}
    } # for
#
    print ("Widths:", @field_w);
#
#===========================================================
#
# OUTPUT RESULTS
#

#
# open output file
#
    if( -e $dictfile ) {
	print "Deleting old $dictfile\n";
    }

    unless( open ( DICTFILE, ">$dictfile" ) ) {
	print "ERROR! Cannot write $dictfile: $!";
	next;
    }
    
#
# compose comment format
#
    $format = "format DICTFILE = \n/*";
    foreach $i (@field_w) {
	if ($i > 0) {
	    $format = $format . "~~^" . "<" x ($i-1);
	}
    }
    $format = $format . "~~*/\n";
    foreach $i (0..$fieldn) {
	if ($i != $ex_eq_field) {
	    $format = $format . '$curr_line[' . $i . '],';
        }
    }
    $format = $format . "\n.\n";

    print $format;
    eval $format; 
    die $@ if $@;

#
# print header
#
    printf DICTFILE ("%s\n%s\n%s\n%s\n%s\n%s\n%s\n%s\n%s\n%s\n%s\n%s\n%s\n",
		     &FORMAT_HEADER ($dictfile,$script_name,$src_name,$script_version));

    foreach (@file_line) {
	my $i;
#
	@curr_line = split "\t";
	if( $ex_eq_field != -1 ) {
	    if( length $curr_line[$ex_eq_field] > 0 ) {
		$curr_line[$eq_field] = $curr_line[$eq_field]
		    . " " 
		    . $curr_line[$ex_eq_field];
		$curr_line[$ex_eq_field] = "";
	    }	    
	}

#
# evaluate line for equation
#
	if( (length $curr_line[$eq_field] > 0) &&
	   ($curr_line[$tag_field] =~ /($tagname_re)/) ) {
	    if( $eq_to_print ) {
		print DICTFILE "\n$curr_eq";
	    }

	    print DICTFILE "\n{\$TAG($1)";
	    $eq_to_print = 1;
	    
	    $curr_eq = (($eq_field == $tag_field)
			? &FORMAT_EQUATION ($curr_line[$eq_field])
			: &FORMAT_EQUATION ($curr_line[$eq_field],
					    $curr_line[$tag_field]) );
	}

#
# spit out original line as a C comment
#
	write DICTFILE;
    } # foreach line
#
# print final eq
#
    if( $eq_to_print ) {
	print DICTFILE "\n$curr_eq\n";
    }

    close( DICTFILE );
} # for
    
#===========================================================
#
# define subroutines
#

sub FORMAT_EQUATION {
#
# Add equation terminator
#
    my $curr_eq = "$_[0]$EDICT_terminate";

#
# Establish the left-hand side
# Eliminate leading ".CV ="
# Prepend "Tagname.CV ="
#
    if( $#_ > 0 ) {
	$curr_eq =~ s/^\s*[.0-9A-Z_a-z]+\s*=\s*//;
	$curr_eq = "$_[1] = $curr_eq";
    }
#
# Declare all tags
#
    $curr_eq =~ s/($tagname_re)/\$TAG($1)/g;

#
# Replace OR with C ||
#
    $curr_eq =~ s/ (?i)OR / || /g;

#
# Replace square and curly braces with parens
#
    $curr_eq =~ s^[{]^\(^g;
    $curr_eq =~ s^[}]^\)^g;
    $curr_eq =~ s^\[^\(^g;
    $curr_eq =~ s^\]^\)^g;

#
# Find keyword 'where' and prepend subexpression
#
    $curr_eq =~ s/(.*)\s*where\s*(.*)/$2\n$1$EDICT_terminate/;

#
# Suppress EU after constant
#
    $curr_eq =~ s^(\d+) +((?i)[0-9A-Z /]+)^$1^g;

#
# Check for obvious syntax errors
#
    if ( $curr_eq =~ /(\w+\s+\w+)/ ) {
	$curr_eq .= "/* ERROR! Illegal expression: $1 */\n";
    }

#
# Check line length
#
    if ( length $curr_eq > $EDICT_eq_char_limit ) {
	$curr_eq .= "/* ERROR! Equation too long. */\n";
    }

#
# break up output line
#
#    $curr_eq =~ s/\) +(?!=)/\)$EDICT_continue/g;
    $curr_eq =~ s/= */=$EDICT_continue/g;
    $curr_eq =~ s/ *([+|]+)/$EDICT_continue$1/g;
    $curr_eq =~ s/ +(?=\()/$EDICT_continue/g;

    $curr_eq;
}

sub GREATER {
    $_[1] > $_[0] ? $_[1] : $_[0];
}

sub IDXMAX {
    my $i;
    my $idx = 0;
    if( $#_ > 0 ) {
	foreach $i (1..$#_) {
	    $idx = $i if( $_[$i] > $_[$idx] );
	}
    }
    $idx;
}

sub MAXLEN {
    my $max = 0;
    my $foo;
    foreach $foo (@_) {
	my $i = length $foo;
	$max = $i if $max < $i;
    }
    $max;
}

sub SUM {
    my $sum = 0;
    my $foo;
    foreach $foo (@_) {
	$sum += $foo;
    }
    $sum;
}

sub FORMAT_HEADER {
    my ($dictfile,$script_name,$src_name,$script_version,$junk) = @_;
    my $the_time = localtime;
    my @hdr_text;
    $hdr_text[0] = "PINODE=localhost:5450";
    $hdr_text[1] = ("/*====================================================================*/");
    $hdr_text[2] = sprintf ("/* %-66s */", "$dictfile - EDICT dictionary");
    $hdr_text[3] = ("/*====================================================================*/");
    $hdr_text[4] = ("/* Revision history:                                                  */");
    $hdr_text[5] = sprintf  ("/* %-66s */", "1.0  $the_time" );
    $temp = 43 - length $script_name;
    $hdr_text[6] = sprintf  ("/*      Created by $script_name from: %-$temp"."s */", $src_name);
    $hdr_text[7] = ("/*====================================================================*/");
    $temp = 57 - length $script_name;
    $hdr_text[8] = sprintf  ("/* $script_name version %-$temp"."s */", $script_version);
    $hdr_text[9] = ("/* EDICT Dictionary Generator                                         */");
    $hdr_text[10] = ("/* (c)1998-1999 by Exele Information Systems, Inc.                    */");
    $hdr_text[11] = ("/* 445 W. Commercial St., East Rochester, NY 14445  (716)385-9740     */");
    $hdr_text[12] = ("/*====================================================================*/");
    @hdr_text;
}