
Chapter 8 : DEBUGGING AND PROGRAM EFFICIENCY

8.1 DEBUGGING AND TRACING

You are probably well aware of the diversity of potential errors when writing computer programs. They range from simple typographical errors made while entering a program, to subtle design problems which may only be revealed by unexpected input data.

Debugging a SNOBOL4 program is not fundamentally different from debugging programs written in other languages. However, SNOBOL4's syntactic flexibility and lack of type declarations for variables produce some unexpected problems. By way of compensation, an unusually powerful trace capability is provided.

Of course, there may come a time when you can't explain your program's behavior, and decide "the system" is at fault. No guarantee can ever be made that SNOBOL4 is completely free of errors. However, its internal algorithms have been in use in other SNOBOL4 systems since 1967, and all known errors have been removed. Often the problem is a misunderstanding of how a function works with exceptional data, and a close reading of the reference section clears the problem up. In short, suspect the system last.

8.1.1 Compilation Errors

Compilation errors are the simplest to find; SNOBOL4 displays the erroneous line on your screen with its statement number, and places a marker below the point where the error was encountered. The source file name, line number, and column number of the error are displayed for use by your text editor. Only the first error in a statement is identified, so you should also carefully check the remainder of the statement. A typical line looks like this:

    32              ,OUTPUT = CNT+ 1
                     ^
    test.sno(57,10) : Compilation Error : Erroneous statement
Here, the comma preceding the word OUTPUT is misplaced. The message indicates that ",OUTPUT" is not a valid language element.

Programs containing compilation errors can still be run, at least until a statement containing an error is encountered. When that happens, SNOBOL4 will produce an execution error message, and stop.

A complete description of error messages is provided in Chapter 9 of the Reference Manual, "System Messages."

8.1.2 Execution Errors

Once a program compiles without error, testing can begin. Two kinds of errors are possible: errors that SNOBOL4 can detect, such as an incorrect data type or a call to an undefined function, and program logic errors that produce incorrect results.

With the first type of error, you'll get a SNOBOL4 error message with statement and line numbers. Inspecting the offending line will often reveal typing errors, such as a misspelled function name, keyword, or label. If the error is due to incorrect data in a variable -- such as trying to perform arithmetic on a non-numeric string -- you'll have to start debugging to discover how the incorrect data was created. Placing output statements in your program, or using the trace techniques described below, will usually find such errors.

Here are some common errors to look for first:

  1. Setting keywords &ANCHOR, &FULLSCAN, and &TRIM improperly. We may have written a program with anchored pattern matching in mind, but let an unanchored match slip in inadvertently. Forgetting to set &TRIM to 1 causes blanks to be appended to input lines, and they usually interfere with pattern matching and conversion of a string to an integer.

  2. Misspelled variable names. Using PUTPUT instead of OUTPUT, as in:
        PUTPUT = LINE1
    
    creates a new variable and assigns LINE1 to it. Worse still is using a misspelled name as a value source, since it will return a null string value.

    The first type of error is relatively easy to find -- produce an end-of-run dump by using the SNOBOL4 command line option /D. You can study the list of variables for an unexpected name. The second type of error is naturally much harder to find, because variables with null string values are omitted from the end-of-run dump. In this case, you will have to study the source program closely for misspellings.

  3. Spurious spaces between a function name and its argument list. A line like:
        LINE = TRIM (INPUT)
    
    is not a call to the TRIM function. The blank between TRIM and the left parenthesis is interpreted as concatenating variable TRIM with the expression (INPUT). TRIM used as a variable is likely to be the null string, so INPUT is returned unchanged.

  4. No blank space after a binary operator. SNOBOL4 sees a unary operator instead, with completely unexpected results. For instance:
        X = Y -Z
    
    concatenates Y with the expression -Z.

  5. Confusion occurring when a variable contains a number in string form. When used as an argument to most functions, conversion from string to number is automatic, and proper execution results. However, functions IDENT and DIFFER do not convert their arguments, and seemingly equal values are thought to be different. For example, if we want to test an input line for the number 3, the statements:
        N = INPUT
        IDENT(N, 3)                               :S(OK)
    
    are not correct. N contains a string, which is a different data type from the integer 3. This could be corrected by using IDENT(+N, 3), or EQ(N, 3). Once again, &TRIM should be 1, or the blanks appended to N will prevent its conversion to an integer.

  6. Omitting the assignment operator when we wish to remove the matching substring from a subject, resulting in a program which loops forever. For example, our word-counting program replaced each word with the null string:
        NEXTWRD LINE WRDPAT =                        :F(READ)
    
    However, by omitting the equal sign we would repeatedly find the same first word in LINE:
        NEXTWRD LINE WRDPAT                          :F(READ)
    
  7. Unexpected statement failure, with no provision for detecting it in the GOTO field. For example, the CONVERT function fails if the table being converted is empty:
        RESULT = CONVERT(TALLY, "ARRAY")
    
    RESULT will not be set if CONVERT fails, and a subsequent array reference to RESULT would produce an execution error.

  8. Failure can be detected but misinterpreted when there are several causes for it in a statement. This statement fails when an End-of-File is read, or if the input line does not contain any digits:
        INPUT SPAN('0123456789') . N         :F(EOF)
    
    In the latter case, if we want to generate an error message, the statement should be split in two:
        N = INPUT                            :F(EOF)
        N SPAN('0123456789') . N             :F(WARN)
    
  9. Using operators such as alternation (|) and conditional assignment (.) for purposes other than pattern construction. Using them in the subject field will produce an 'Illegal data type' error message. Using them in the replacement field produces a pattern, intended for subsequent use in a pattern match statement. For example, this statement sets N to a pattern; it does not replace it with the words 'EVEN' or 'ODD', as was probably intended:
        N = EQ(REMDR(N,2),0) 'EVEN' | 'ODD'
    
    We note in passing that SNOBOL4+, Catspaw's professional SNOBOL4 package, provides language extensions that allow just that:
        N = (EQ(REMDR(N,2),0) 'EVEN', 'ODD')
    
  10. Forgetting that functions like TAB and BREAK bind subject characters. This won't matter for simple pattern matching, but for matching with replacement, problems can appear. For example, suppose we wanted to replace the 50th character in string S with '*'. If we used:
        S TAB(49) LEN(1) = '*'
    
    we would find the first 50 characters replaced by a single asterisk. Instead, we should say:
        S POS(49) LEN(1) = '*'
    
    or, even more efficiently:
        S TAB(49) . FRONT LEN(1) = FRONT '*'
    
  11. Omitting the unevaluated expression operator when defining a pattern containing variable arguments. For example, the pattern
        NTH_CHAR = POS(*N - 1) LEN(1) . CHAR
    
    will copy the Nth subject character to variable CHAR. The pattern adjusts automatically if N's value is subsequently changed. Omitting the asterisk would capture the value of N at the time the pattern is defined (probably the null string).
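
As a small check of item 5, the following sketch assigns the string '3' to N directly, standing in for a line read from INPUT. The label TRYEQ and the message texts are illustrative assumptions, not part of any program in this tutorial.

            N = '3'
    *       N holds a string, so this comparison with the integer 3 fails.
            IDENT(N, 3)                          :S(TRYEQ)
            OUTPUT = 'IDENT(N, 3) failed: N is a string'
    *       Unary plus converts N to a number; EQ converts its arguments.
    TRYEQ   IDENT(+N, 3)                         :F(END)
            EQ(N, 3)                             :F(END)
            OUTPUT = 'IDENT(+N, 3) and EQ(N, 3) both succeeded'
    END

If both messages appear, the original comparison failed only because of the difference in data types, not because of the value.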

8.1.3 Simple Debugging

These simple methods should find a majority of your bugs:

  1. Set keyword &DUMP nonzero, or use command line option /D to get an end-of-run dump. Examine it closely for reasonable values and variable names. Dumps can also be produced at any time during execution by calling the built-in function DUMP.

  2. Use keyword &STLIMIT to end execution after a fixed number of statements.

  3. Use the keyboard Control-C key to interrupt a program which is looping endlessly, and record the statement number.

  4. Use the GOTO :F(ERROR) to detect unexpected failures and data errors. Do not define the label ERROR -- SNOBOL4 will display the statement number of the error if an attempt is made to transfer to label ERROR.

  5. Assign values to OUTPUT to monitor data values. Use immediate assignment and cursor assignment (to OUTPUT) to observe the operation of a pattern match.

  6. Produce end-of-run statistics with the command line option /S. Are the number and kind of operations reasonable?

  7. Use the CODE.SNO program to set up simple test cases. This is particularly useful when pattern-matching statements do not behave as expected.
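
Methods 4 and 5 can be combined in a short test. This is only a sketch: the subject is a literal rather than real input, and the string itself is an arbitrary choice.

            &TRIM = 1
            LINE = 'ED RAN TO TOWN'
    *       $ OUTPUT displays what BREAK matched; @OUTPUT displays the cursor.
    *       Label ERROR is deliberately left undefined (see method 4).
            LINE BREAK(' ') $ OUTPUT @OUTPUT ' '     :F(ERROR)
    END

With this subject the statement should display the first word and then the cursor position just past it. If the match ever failed, the attempted transfer to the undefined label ERROR would report the statement number.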

More subtle errors can be pinpointed using SNOBOL4's trace facility, described below.

8.2 EXECUTION TRACING

Tracing the flow of control and data in a program is usually the best way to find difficult problems. SNOBOL4 allows tracing of data in variables and some keywords, transfers of control to specified labels, and function calls and returns. Two keywords control tracing: &FTRACE and &TRACE.

8.2.1 Function Tracing

Keyword &FTRACE is set nonzero to produce a trace message each time a program-defined function is called or returns. The trace message displays the statement number where the action occurred, the name of the function, and the values of its arguments. Function returns display the type of return and value, if any. Each trace message decrements &FTRACE by one, and tracing ends when &FTRACE reaches zero. Typical trace messages look like this:

    STATEMENT 39: LEVEL 0 CALL OF SHIFT('SKYBLUE',3),TIME = 140
    STATEMENT 12: LEVEL 1 RETURN OF SHIFT = 'BLUESKY',TIME = 141
The level number is the overall function call depth. The program execution time in tenths of a second is also provided.
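
A program along the following lines would produce messages of this kind. The body of SHIFT shown here is an assumption, written only to agree with the call and return values in the sample messages; the statement numbers and times in a real trace will differ.

            &FTRACE = 100
            DEFINE('SHIFT(S,N)FRONT')            :(SHIFT_END)
    *       Rotate S left by N characters; fail if S is shorter than N.
    SHIFT   S LEN(N) . FRONT REM . SHIFT         :F(FRETURN)
            SHIFT = SHIFT FRONT                  :(RETURN)
    SHIFT_END
            OUTPUT = SHIFT('SKYBLUE', 3)
    END

Setting &FTRACE to 100 allows up to 100 trace messages; the single call here uses two of them, one for the call and one for the return.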

8.2.2 Selective Tracing

Keyword &TRACE will also produce trace messages when it is set nonzero. However, the TRACE function must be called to specify what is to be traced. Tracing can be selectively ended by using the STOPTR function. The TRACE function call takes the form:

    TRACE(name, type, string, function)
The name of the item being traced is specified using a string or the unary name operator. Besides variables, it is also possible to trace a particular element of an array or table:
    TRACE('VAR1', ...
    TRACE(.A<2,5>, ...
    TRACE('SHIFT', ...
"Type" is a string describing the kind of trace to be performed. If omitted, a VALUE trace is assumed:

'VALUE' Trace whenever name has a value assigned to it. Assignment statements, as well as conditional and immediate assignments within pattern matching, will all produce trace messages.

'CALL' Produce a trace whenever function name is called.

'RETURN' Produce a trace whenever function name returns.

'FUNCTION' Combine the previous two types: trace both calls and returns of function name.

'LABEL' Produce a trace when a GOTO transfer to statement name occurs. Flowing sequentially into the labeled statement does not produce a trace.

'KEYWORD' Produce a trace when keyword name's value is changed by the system. The name is specified without an ampersand. Only keywords &ERRTYPE, &FNCLEVEL, &STCOUNT, and &STFCOUNT may be traced.

When the first argument is specified with the unary name operator, the third argument, string, will be displayed to identify the item being traced:
    TRACE(.T<"zip">, "VALUE", "Table entry 'zip'")
The last argument, function, is usually omitted. Its use is described in the next section.

The form of trace message displayed for each type of trace is listed in Chapter 9 of the Reference Manual, "System Messages."

Each time a trace is performed, keyword &TRACE is decreased by one. Tracing stops when it reaches zero. Tracing of a particular item can also be stopped by function STOPTR:

    STOPTR(name, type)
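
The following sketch puts TRACE and STOPTR together. The variable names TOTAL and I and the label LOOP are illustrative assumptions.

            &TRACE = 100
            TRACE('TOTAL', 'VALUE')
            TRACE('LOOP', 'LABEL')
            I = 0
    LOOP    I = I + 1
            TOTAL = TOTAL + I
            LT(I, 3)                             :S(LOOP)
    *       Stop tracing TOTAL; the next assignment is not reported.
            STOPTR('TOTAL', 'VALUE')
            TOTAL = 0
    END

Each assignment to TOTAL inside the loop should produce a VALUE trace message, and each GOTO transfer back to LOOP a LABEL trace message. The first entry into LOOP is sequential, so it is not traced.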

8.2.3 Program Trace Functions

Normally, each trace action displays a descriptive message, such as:

    STATEMENT 371: SENTENCE = 'Ed ran to town',TIME = 810
Instead, we can instruct SNOBOL4 to call our own program-defined function. This allows us to perform whatever trace actions we wish. We define the trace function in the normal way, using DEFINE, and then specify its name as the fourth argument of TRACE. For example, if we want function TRFUN called whenever variable COUNT is altered, we would say:
    &TRACE = 10000
    TRACE('COUNT', 'VALUE', , 'TRFUN')
    DEFINE('TRFUN(NAME,ID)')             :(TRFUN_END)
     . . .
TRFUN will be called with the name of the item being traced, 'COUNT', as its first argument. If a third argument was provided with TRACE, it too is passed to your trace function, as ID. (Here the argument was omitted.) To use trace functions effectively, we must pause to describe a few more SNOBOL4 keywords:
&LASTNO The statement number of the previous SNOBOL4 statement executed.

&STCOUNT The total number of statements executed. Incremented by one as each statement begins execution.

&ERRTYPE Error message number of the last execution error.

&ERRLIMIT Number of nonfatal execution errors allowed before SNOBOL4 will terminate.
The first three keywords are continuously updated by SNOBOL4 as a program is executed.

Now, let's consider debugging a program where variable COUNT is inexplicably being set to a negative number. Continuing with the previous example, the function body would look like this:

            &TRACE = 10000
            TRACE('COUNT', 'VALUE', , 'TRFUN')
            DEFINE('TRFUN(NAME,ID)TEMP')         :(TRFUN_END)

    TRFUN   TEMP = &LASTNO
            GE($NAME, 0)                         :S(RETURN)
            OUTPUT = 'COUNT negative in statement ' TEMP  :(END)
    TRFUN_END
The first statement of the function captures the number of the last statement executed -- the statement that triggered the trace. We then check COUNT, and return if it is satisfactory. If it is negative, we print an error message and stop the program.

When a trace function is invoked, keywords &TRACE and &FTRACE are temporarily set to zero. Their values are restored when the trace function returns. There is no limit to the number of functions or items which may be traced.

Tracing keyword &STCOUNT will call your trace function before every program statement is executed.

Program CODE.SNO traces keyword &ERRTYPE to trap nonfatal execution errors from your sample statements, and produce an error message. Keyword &ERRLIMIT must be set nonzero to prevent SNOBOL4 from terminating when an error occurs.

8.3 PROGRAM EFFICIENCY

To a greater extent than programs in other languages, SNOBOL4 programs are sensitive to programming methods. Often, there are many different ways to formulate a pattern match, and some will require many more match attempts than others.

As you work with SNOBOL4, you will develop an intuitive feel for the operation of the pattern matcher, and will write more efficient patterns. I can, however, start you off with some general rules:

  1. Try to use anchored, quickscan, and trim modes when possible. If operating unanchored, artificially anchor whenever possible by using POS(0) or FENCE as the first subpattern.

  2. Try to use BREAK and SPAN instead of ARB.

  3. Use ANY instead of an explicit list of one-character strings and the alternation operator.

  4. LEN, TAB and RTAB are faster than POS and RPOS. The former "step over" subject characters in one operation; the latter continually fail until the subject cursor is positioned correctly. But be careful when using them with replacement: because they bind the subject characters they step over, you may replace more than you expected.

  5. Use conditional assignment instead of immediate assignment in pattern matching.

  6. Use IDENT and DIFFER to compare strings for equality, instead of pattern matching. Since each unique string is stored only once in SNOBOL4, these functions merely compare one-word pointers, regardless of string length. By contrast, pattern matching and functions such as LGT must perform character by character comparisons.

  7. Avoid ARBNO and recursion if possible.

  8. Pattern construction is time-consuming. Preconstruct patterns and store them in variables whenever possible.

  9. Keep strings modest in length. Although SNOBOL4 allows strings to be thousands of characters long, operating upon them is very time-consuming. They use large amounts of memory, and force SNOBOL4 to frequently rearrange storage.

  10. Use functions to modularize a program and make it easier to understand and maintain.

  11. Avoid algorithms that make a linear search of an array or list. The algorithms can usually be rewritten using tables and indirect references for associative programming.
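
Several of these rules can be seen together in a short word-tallying sketch. It is an illustration under stated assumptions, not a tuned program: it treats only upper-case letters as word characters, and the names LETTERS, WRDPAT, TALLY, and the word 'THE' are chosen for the example.

            &TRIM = 1
            LETTERS = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
    *       Rule 8: build the pattern once, outside the loop.
    *       Rules 2 and 5: BREAK and SPAN with conditional assignment.
            WRDPAT = BREAK(LETTERS) SPAN(LETTERS) . WORD
    *       Rule 11: a table indexed by the word replaces a linear search.
            TALLY = TABLE()
    READ    LINE = INPUT                         :F(DONE)
    NEXTWRD LINE WRDPAT =                        :F(READ)
            TALLY<WORD> = TALLY<WORD> + 1        :(NEXTWRD)
    DONE    OUTPUT = 'THE: ' TALLY<'THE'>
    END

Each word is looked up by a single table reference, so no statement searches through the words seen so far.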

Efficiency should not be measured purely in terms of program execution time. With the relatively low cost of microcomputers, the larger picture of time spent designing, coding, and debugging a program also must be considered. A direct approach, emphasizing simplicity, robustness, and ease of understanding usually outweighs the advantages of tricky algorithms and shortcut techniques. (But we admit that tricky pattern matching is fun!)

