Previous Previous chapter · Next Next chapter · Contents Table of Contents

Chapter 7 : ADVANCED TOPICS

The material presented so far allows you to write powerful SNOBOL4 programs. In this chapter, we will examine other interesting and useful features of the language.

7.1 THE ARBNO FUNCTION

This function produces a pattern which will match zero or more consecutive occurrences of the pattern specified by its argument. As its name implies, ARBNO is useful when an arbitrary number of instances of a pattern may occur. For example, ARBNO(LEN(3)) matches strings of length 0, 3, 6, 9, .... There is no restriction on the complexity of the pattern argument.

Like the ARB pattern, ARBNO is shy, and tries to match the shortest possible string. Initially, it simply matches the null string. If a subsequent pattern component fails to match, SNOBOL4 backs up, and asks ARBNO to try again. Each time ARBNO is retried, it supplies another instance of its argument pattern. In other words, ARBNO(PAT) behaves like

    ( '' |  PAT | PAT PAT | PAT PAT PAT | ... )
Also like ARB, ARBNO is usually used with adjacent patterns to "draw it out." Let's consider a simple example. We want to write a pattern to test for a list. We'll define a list as being one or more numbers separated by comma, and enclosed by parentheses. Use CODE.SNO to try this definition:
    ?       ITEM = SPAN('0123456789')
    ?       LIST = POS(0) '(' ITEM  ARBNO(',' ITEM) ')' RPOS(0)
    ?       '(12,345,6)' LIST
    Success
    ?       '(12,,34)' LIST
    Failure
ARBNO is retried and extended until its subsequent, ')', finally matches. POS(0) and RPOS(0) force the pattern to be applied to the entire subject string.

Alternation may be used within ARBNO's argument. This pattern matches any number of pairs of certain letters:

    ?       PAIRS = POS(0) ARBNO('AA' | 'BB' | 'CC') RPOS(0)
    ?       'CCBBAAAACC' PAIRS
    Success
    ?       'AABBB' PAIRS
    Failure

7.2 RECURSIVE PATTERNS

This is the pattern analogue of a recursive function -- a pattern is defined in terms of itself. The unevaluated expression operator makes the definition possible.

Suppose we wanted to expand the previous definition of a list to say that a list item may be a span of digits, or another list. The definition proceeds as before, except that the unevaluated expression operator is used in the first statement; the concept of a list has not yet been defined:

    ?       ITEM = SPAN('0123456789') | *LIST
    ?       LIST = '(' ITEM  ARBNO(',' ITEM) ')'
    ?       TEST = POS(0) LIST RPOS(0)
    ?       '(12,(3,45,(6)),78)' TEST
    Success
    ?       '(12,(34)' TEST
    Failure
Recursion occurs because LIST is defined in terms of ITEM, which is defined in terms of LIST, and so on. Note that functions POS(0) and RPOS(0) were "moved out one level," to TEST, because LIST must now match substrings within the subject.

In our previous discussion of recursive functions, we said they work because successive calls present the function with progressively simpler problems, until the problem can be solved without further recursion. Similarly, patterns ITEM and LIST are applied to successively smaller substrings, until ITEM can use its SPAN() alternative instead of invoking LIST again.

In general, you will need an alternative somewhere in the recursive loop to allow the pattern matcher "a way out." Also, you should place recursive objects last in a series of alternatives, so that the simpler, nonrecursive patterns are attempted first and "recursive plunges" can be avoided.

SNOBOL4 saves information on a "pattern stack" during the pattern match process. Heavily recursive patterns and long subject strings can sometimes result in stack overflow. If this occurs, you should break the problem apart into several smaller pattern matches.

As recursive patterns use the unevaluated expression operator, it is sometimes necessary to disable SNOBOL4's heuristics by setting &FULLSCAN = 1.

7.3 QUICKSCAN AND FULLSCAN

Pattern matching can be very time-consuming because of the number of possibilities which must be attempted. In the normal "quickscan" mode, SNOBOL4 stops searching for a match when it

thinks further efforts would be futile. The heuristics are complex, but can be summarized as follows: pattern matching fails when there are insufficient subject characters to satisfy the remaining pattern components.

The cursor operator can be used to demonstrate at what point SNOBOL4 gives up. For example, in the pattern match

    ?       'ABCD' @OUTPUT 'X' LEN(3)
    0
    Failure
SNOBOL4 does not attempt to match 'X' against 'B' because fewer than 3 subject characters remain after it, and LEN(3) could never succeed.

A second type of heuristic is the "one character assumption" for unevaluated expressions. SNOBOL4 assumes that unevaluated expressions will match at least one character. This heuristic was originally provided to break recursive loops, but can cause programming problems when an unevaluated expression must match the null string. Consider a pattern which succeeds if 'B' is at least 4 character positions beyond an 'A' in the subject:

    ?       P = 'A' ARB $ X 'B' *GE(SIZE(X), 4)
    ?       'A12345BC' P
    Success
    ?       'A12345B' P
    Failure
The characters between 'A' and 'B' are matched by ARB, and immediately assigned to X. The size of X is then compared to 4 by the GE function, which succeeds and returns the null string. This null string result should not interfere with the pattern match, but we find the pattern misbehaves when 'B' is the last character of the subject. The unevaluated expression operator made SNOBOL4 assume a one character length for the GE function, and matching 'B' against the last subject character was never attempted.

For most pattern matching, heuristics are invisible. However, there are circumstances when we would like SNOBOL4 to be exhaustive in its match attempts. We can disable heuristics and enter "fullscan" mode by setting keyword &FULLSCAN nonzero:

    ?       &FULLSCAN = 1
    ?       'A12345B' P
    Success
    ?       'ABCD' @OUTPUT 'X' LEN(3)
    0
    1
    2
    3
    4
    Failure
The quickscan mode can be reinstated by setting &FULLSCAN = 0.

7.4 OTHER PRIMITIVE PATTERNS

We can accomplish quite a lot with just the primitive patterns ARB and REM. However, there are five additional patterns which you should be aware of:

7.5 OTHER FUNCTIONS

I'd like to briefly point out a few more built-in functions. See Chapter 8 of the Reference Manual for a complete description of their use.
APPLY Allows an indirect call to a function through a variable.
CONVERT Provides explicit conversion from one data type to another. Chapter 6 of the Reference Manual, "Data Types and Conversion," describes the conversions possible.
ENDFILE Closes a file and detaches all variables associated with it.
ITEM Allows an indirect reference to an array or table.
LPAD & RPAD These are padding functions, which will pad a string on its left or right side with blanks or a given character. Padding is provided to a specified width, and is useful when producing columnar output.

7.6 OTHER UNARY OPERATORS

Operation: Negation
Symbol: ~ (tilde)

Operation: Interrogation
Symbol ? (question mark)

7.7 RUN-TIME COMPILATION

The two functions described below are among the most esoteric features, not just of SNOBOL4, but of all programming languages in existence. While your program is executing, the entire SNOBOL4 compiler is just a function call away.

A SNOBOL4 program is nothing more than a string of characters. The functions EVAL and CODE let you supply the compiler with character strings from within the program itself.

7.7.1 The EVAL Function

This function is used to evaluate an expression. Its argument may take a number of forms:

  1. If the argument is an integer, or a number in string form, the number is returned as the function result:
        ?       OUTPUT = EVAL(19)
        19
    

  2. If the argument is an unevaluated expression, it is evaluated using current values for any variables it might contain. EVAL returns the expression's value as its result:
        ?       E = *('N SQUARED IS ' N ** 2)
        ?       N = 15
        ?       OUTPUT = EVAL(E)
        N SQUARED IS 225
    
    This is similar to our earlier use of unevaluated expressions with patterns. In this case, however, the unevaluated expression operator (*) must be applied to the entire expression to create an object with the EXPRESSION data type.

  3. If the argument is a string (other than a simple number), EVAL tries to compile it as a SNOBOL4 expression. Only an expression is permitted -- not an entire SNOBOL4 statement:
        ?       OUTPUT = EVAL('3 * N + 2')
        47
    
    If the string compiles without error, EVAL then evaluates the expression and returns the result.

It is this last use of EVAL -- to compile a string -- which is the most interesting. Here is a trivial program which behaves like a simple desk calculator.

    LOOP    OUTPUT = EVAL(INPUT)                 :S(LOOP)
    END
You can easily try it by placing a semicolon after the GOTO to protect it from CODE.SNO's own machinations:
    ?LOOP   OUTPUT = EVAL(INPUT)                 :S(LOOP);
    4 * (5 - 2) / 2
    6
    N + 1
    16
    ^Z
The program reads a line of input, compiles and evaluates it, and displays the result. Each expression you enter must be wellformed according to SNOBOL4's syntax rules. In particular, this means there must be blanks around the binary operators.

The BNF program included with Vanilla SNOBOL4 demonstrates that EVAL's power is useful even if the input data does not conform to SNOBOL4 syntax. It reads a definition of a grammar from a file, and converts it to SNOBOL4 patterns.

EVAL fails if evaluation of the argument fails, or if the argument contains a syntax error. The SNOBOL4 keyword &ERRTEXT will contain a string describing the error.

The expressions used with EVAL may return any SNOBOL4 data type, not just numbers. For instance, the expression might construct a new pattern, and return it as the result:

    ITEM = EVAL('SPAN("0123456789") | *LIST')
Note that EVAL can only call the compiler with a string argument. If we used a pattern as the argument, we would produce an execution error:
    ITEM = EVAL(SPAN("0123456789") | *LIST)  (incorrect)

7.7.2 The CODE Function

CODE accepts a string argument containing one or more statements to be compiled. Multiple statements are separated by semicolons (;). Statements may be labeled, and can include all the usual components -- subject, pattern, replacement, and GOTO. However, comment and continuation statements are not permitted.

The CODE function compiles the statements, and returns a pointer to the resulting object code. It fails if any statement contains an error, and places an error message in &ERRTEXT.

There are two ways to execute the new object code.

  1. Transfer to a label which is defined in the new code:
        *  Compile a sample piece of code:
                S = 'L OUTPUT = N; N = LT(N,10) N + 1  :S(L)F(DONE)'
                CODE(S)
        *  Transfer to a label in it:
                                                       :(L)
        *  Come here when the new code transfers back.
        DONE     . . .
    
    Notice how we placed a GOTO from the new code back to label DONE in the main program. If we had not done this, SNOBOL4 would terminate when execution "fell out of the bottom" of the new code block.

  2. The pointer returned by the CODE function can be used in a "direct GOTO" to transfer to the first statement in the code block. A direct GOTO is performed by enclosing the pointer in angular brackets in the GOTO field:
        *  Compile a sample piece of code:
                S = 'L OUTPUT = N; N = LT(N,10) N + 1  :S(L)F(DONE)'
                C = CODE(S)
        *  Transfer to the first statement in the block:
                                                       :<C>
        DONE     . . .
    

Labels contained in the new program fragment override any labels of the same name in your main program. This provides the ability to write self-modifying SNOBOL4 programs, and makes the division between "code" and "data" far less distinct than in other high-level languages.


Previous Previous chapter · Next Next chapter · Contents Table of Contents