JPattern

Snobol4-Style Pattern Matching Primitives for Java


Last Updated:27 August 2006
Latest Version:Jpattern 1.1
Minimum JDK Level:JDK 1.5

Table of contents

Introduction

Most current programming languages provide some form of pattern matching. As a rule, the pattern matching is based on regular expressions. The Comit/Snobol3/Snobol4/Spitbol programming languages have provided pattern matching but based on a pattern matching paradigm that is strictly more powerful than standard regular expressions.

The goal of this project is to make Snobol4-style pattern matching available as a package for the Java programming language. Rather than build such a package from scratch, the existing Ada-based Gnat Spitbol Patterns code was converted to Java. The result is generally consistent with that Ada package, although some changes were made to conform to the capabilities and limitations of the Java language. In light of the derived nature of this code, the original GNAT license is assumed to cover this code. The compiler, however is licensed under a BSD license. See the License Section for more details.

Download

The source file may be retrieved at the following location.
DescriptionURL
Source Version 1.1: jpattern-1.1.jar

Dependencies

Installation

The distribution contains both an ant build.xml file and a Makefile. Both have the following major tasks defined. The final product is the jar file called jpattern.jar.

Testing

A set of tests is provided in the directory name tests. A make Makefile and an ant build.xml file are both provided.

The output of the tests is captured and compared to the expected output. Any discrepancies are reported.



Pattern Matching Tutorial

The following introduction is taken from the file G-SPIPAT.ADS. It has been modified to conform to the Java version for various details.

API Overview

A pattern matching operation (a call to one of the Match subprograms) takes a subject string and a pattern. The pattern is matched against the subject string, and either the match fails, or it succeeds matching a contiguous substring.

Pattern Functions

Patterns are constructed using a set of functions defined in the class Pattern. These pattern functions take one of three kinds of argument types.
  1. java.lang.string. This represents a simple string. In some cases, a string of length one will be treated as a single character.

  2. jpattern.pattern.Pattern. This represents a sub-pattern.

  3. jpattern.pattern.Variable. This represents a name that will be interpreted later to refer to a value in a table. See Variables for more details.
Consider the following pattern.
("abc" | "ab") & ("de" | "cd")
This pattern could be construct using the following sequence of Pattern functions.

Pattern pe1 = Pattern.Alt("abc","ab");
Pattern pe2 = Pattern.Alt("de","cd");
Pattern pe3 = Pattern.Concat(pe1,pe2);

Variables

It is possible to use variables within patterns for a number of purposes. A variable is represented as a string matching the following regular expression: "[_a-zA-Z][_a-zA-Z0-9]*" (Note that sometimes RE's are more compact).

As a rule, the variable name should be preceded by the defer operator ("+") or an assignment operator ("." "$" "*" "**") or a replacement operator ("="). The leading "+" is not required unless the variable name is also a keyword.

As a rule, any elementary pattern operator that takes an argument may have a variable as its argument.

During the match, if a reference to a variable is required, then it searched for in the VarMap passed to the Match() function. The value associated with the variable in the map can be of the following types.

jpattern.pattern.Function
If the value associated with the variable in the map is of this type, then the get() method is invoked for that object. The value returned is then re-evaluated.
java.util.Collection
If a java Collection is associated with the variable, and the variable is being used as an immediate assignment (e.g. "*x"), then the value assigned to the variable is appended to the end of the collection. In conjunction with the Fail pattern, this can be used to collect all possible matches of a pattern against a subject string.
java.lang.Object
For this final case, the object is converted to a string and that value is used either for replacement or for matching.
If the variable is not defined in the map, then if it is involved in an assignment, it is inserted into the map. Otherwise, an exception is thrown signalling an undefined variable.

Pattern Specification Language

The string representation of patterns uses the following syntax.

pattern := alt;
alt := cat | cat '|' alt;
cat := replace | replace '&' cat | replace cat;
replace := assign | assign '=' variable;
assign :=   element
          | element '*' variable
          | element '**' variable
          | element '.' variable
          | element '$' variable;
element := operator | '(' pattern ')' | defer | INT | STRING;
operator := 
            | "+" variable
            | "abort" | cancel
            | "any" "(" svarg ")"
            | "arb" "(" svarg ")"
            | "arbno" "(" svarg ")"
            | "bal" | bal "(" svarg ")"
            | "break" "(" svarg ")"
            | "breakx" "(" svarg ")"
            | "fence" | fence "(" svarg ")"
            | "fail"
            | "len" "(" nvarg ")"
            | "notany", "(" svarg ")"
            | "nspan" "(" svarg ")"
            | "pos" "(" nvarg ")"
            | "rem" | "rest"
            | "rpos", "(" nvarg ")"
            | "rtab" "(" nvarg ")"
            | "setcur" "(" defer ")"
            | "span" "(" svarg ")"
            | "succeed"
            | "tab" "(" nvarg ")"
	    ;
svarg := STRING | defer;
nvarg := INT | defer;
defer := variable | "+" variable;

variable := ID;

# Lexical Elements
ID := [_a-zA-Z][_a-zA-Z0-9]* that is not a keyword.
INT := [-]?[0-9]+ | "0x" [0-9a-fA-F]+
STRING := '"'  '"'
Whitespace (" \t\n\r...") and comments may appear in expressions where desired. The comments are either the "//...\n" type or the "/*...*/" type. Be careful to not put an end-of-line escape before any comments on that line. That is, do not write
@len(5) \ //comment 1
bal()@
but rather write this.
@len(5) //comment 1 \
bal()@

Pattern Matching Overview

We will start with two basic functions: concatenation ("&") and alternation ("|"). If A and B are patterns, then "A & B" means try to match A, and if that succeeds, then try to match B starting after A. Note that the & is actually optional, and the juxtiposition of the two patterns is taken to imply concatenation. Thus "A B" is the treated the same as "A & B".

The expression "A | B" means try to match A and if that fails, then try to match B starting at the same place where the attempt to match A began.

There is full backtracking, which means that if a given pattern element fails to match, then previous alternatives are matched. For example if we have the pattern:

(A | B) (C | D) (E | F)
First we attempt to match A, if that succeeds, then we go on to try to match C, and if that succeeds, we go on to try to match E. If E fails, then we try F. If F fails, then we go back and try matching D instead of C. Let's make this explicit using a specific example, and introducing the simplest kind of pattern element, which is a literal string. The meaning of this pattern element is simply to match the characters that correspond to the string characters. Now let's rewrite the above pattern form with specific string literals as the pattern elements:
("ABC" | "AB")("DEF" | "CDE")("GH" | "IJ")
The following strings will be attempted in sequence:
  1. ABC . DEF . GH
  2. ABC . DEF . IJ
  3. ABC . CDE . GH
  4. ABC . CDE . IJ
  5. AB . DEF . GH
  6. AB . DEF . IJ
  7. AB . CDE . GH
  8. AB . CDE . IJ
Here we use the dot simply to separate the pieces of the string matched by the three separate elements.

Moving the Start Point

A pattern is not required to match starting at the first character of the string, and is not required to match to the end of the string. The first attempt does indeed attempt to match starting at the first character of the string, trying all the possible alternatives. But if all alternatives fail, then the starting point of the match is moved one character, and all possible alternatives are attempted at the new anchor point.

The entire match fails only when every possible starting point has been attempted. As an example, suppose that we had the subject string

"ABABCDEIJKL"
matched using the pattern in the previous example:
("ABC" | "AB") & ("DEF" | "CDE") & ("GH" or "IJ")
This would succeed, after two anchor point moves:

"ABABCDEIJKL"
   ^^^^^^^
where the "^^^^^^^" indicates the matched section.

This mode of pattern matching is called the unanchored mode. It is also possible to put the pattern matcher into anchored mode by calling the function Match.setAnchorMode(true) This will cause all subsequent matches to be performed in anchored mode, where the match is required to start at the first character.

We will also see later how the effect of an anchored match can be obtained for a single specified anchor point if this is desired.

Available Pattern Descriptions

Abort/Cancel
Pattern Function: Abort() or Cancel()
Compiler Syntax: abort or cancel
Description: The Abort (Cancel) operator immediately aborts the entire pattern match, signalling failure. This is a specialized pattern element, which is useful in conjunction with some of the special pattern elements that have side effects. Note: the original spitbol name was Abort, but the GNU version changed it to Cancel because Abort is a reserved Ada word.

Alternation
Pattern Function: Alternate(Pattern A, Pattern B)
Compiler Syntax: A | B
Description: The alternation operator means first attempt to match A, and then if that does not succeed, match B.

Any
Pattern Function: Any(String s)
Compiler Syntax: any(<string>)
Description: The Any(S) operator where S is a string, matches a single character that is any one of the characters in S. Fails if the current character is not one of the given set of characters.

Arb
Pattern Function: Arb()
Compiler Syntax: arb
Description: The Arb operator matches any string. First it matches the null string, and then on a subsequent failure, matches one character, and then two characters, and so on. It only fails if the entire remaining string is matched.

Arbno
Pattern Function: Arbno(String S) or Arbno(Pattern P)
Compiler Syntax: arbno(<pattern>)
Description: The Arbno(P) operator where P is any pattern, matches any number of instances of the pattern, starting with zero occurrences. It is thus equivalent to ("" | (P & ("" | (P & ("" ....))))). The pattern P may contain any number of pattern elements including the use of alternatiion and concatenation.

Bal
Pattern Function: Bal() or Bal(String parens)
Compiler Syntax: bal or bal(<string>)
Description: The Bal operator matches a non-empty string that is parentheses balanced with respect to ordinary () characters. Examples of balanced strings are "ABC", "A((B)C)", and "A(B)C(D)E". Bal matches the shortest possible balanced string on the first attempt, and if there is a subsequent failure, attempts to extend the string. Optionally, the Bal operator can be given a two character string to indicate the parentheses characters. For example Bal("[]") will match balanced square brackets.

Break
Pattern Function: Break(String S)
Compiler Syntax: break(<string>)
Description: The Break(S) operator where S is a string, matches a string of zero or more characters up to but not including a break character that is one of the characters given in the string S. Can match the null string, but cannot match the last character in the string, since a break character is required to be present.

BreakX
Pattern Function: BreakX(String S)
Compiler Syntax: breakx(<string>)
Description: The BreakX(S) operator where S is a string, behaves exactly like Break(S) when it first matches, but if a string is successfully matched, then a susequent failure causes an attempt to extend the matched string.

Concatenation
Pattern Function: Concat(Pattern A, Pattern B)
Compiler Syntax: A & B
Description: The concatenation operator means match A followed immediately by matching B.

Fail
Pattern Function: Fail()
Compiler Syntax: fail
Description: The Fail operator is equivalent to the null alternation. Matches no possible strings, so it always signals failure. This is a specialized pattern element, which is useful in conjunction with some of the special pattern elements that have side effects.

Fence
Pattern Function: Fence() or Fence(Pattern p)
Compiler Syntax: fence or Fence(<pattern>)
Description: The Fence operator matches the null string at first, and then if a failure causes alternatives to be sought, aborts the match (like a Cancel). Note that using Fence at the start of a pattern has the same effect as matching in anchored mode.

The Fence(P) operator, where P is a pattern, attempts to match the pattern P including trying all possible alternatives of P. If none of these alternatives succeeds, then the Fence pattern fails. If one alternative succeeds, then the pattern match proceeds, but on a subsequent failure, no attempt is made to search for alternative matches of P. The pattern P may contain any number of pattern elements including the use of alternatiion and concatenation.

Len
Pattern Function: Len(int n)
Compiler Syntax: len(<integer>)
Description: The Len(N) operator where N is a natural number, matches the given number of characters. For example, Len(10) matches any string that is exactly ten characters long.

NotAny
Pattern Function: NotAny(String S)
Compiler Syntax: notany(<string>)
Description: The NotAny(N) operator where S is a string, matches a single character that is not one of the characters of S. Fails if the current characer is one of the given set of characters.

NSpan
Pattern Function: NSpan(String S)
Compiler Syntax: nspan(<string>)
Description: The NSpan(S) operator where S is a string, matches a string of zero or more characters that is among the characters given in the string. Always matches the longest possible such string. Always succeeds, since it can match the null string.

Pos
Pattern Function: Pos(int n)
Compiler Syntax: pos(<integer>)
Description: The Pos(N) operator where N is a natural number, matches the null string if exactly N characters have been matched so far, and otherwise fails.

Rem/Rest
Pattern Function: Rem() or Rest()
Compiler Syntax: rem or rest
Description: The Rem (Rest) operator matches from the current point to the last character in the string. This is a specialized pattern element, which is useful in conjunction with some of the special pattern elements that have side effects. Note: the original spitbol name was Rem, but the GNU version changed it to Rest because Rem is a reserved Ada word.

Rpos
Pattern Function: Rpos(int n)
Compiler Syntax: rpos(<integer>)
Description: The Rpos(N) operator where N is a natural number, matches the null string if exactly N characters remain to be matched, and otherwise fails.

Rtab
Pattern Function: Rtab(int n)
Compiler Syntax: rtab(<integer>)
Description: The Rtab(N) operator where N is a natural number, matches characters from the current position until exactly N characters remain to be matched in the string. Fails if fewer than N unmatched characters remain in the string.

Span
Pattern Function: Span(String S)
Compiler Syntax: span(<string>)
Description: The Span(S) operator where S is a string, matches a string of one or more characters that is among the characters given in the string. Always matches the longest possible such string. Fails if the current character is not one of the given set of characters.

Succeed
Pattern Function: Succeed()
Compiler Syntax: succeed
Description: The Succeed operator repeatedly matches the null string (it is equivalent to the alternation ("" | "" | "" ....)). This is a special pattern element, which is useful in conjunction with some of the special pattern elements that have side effects.

Tab
Pattern Function: Tab(int n)
Compiler Syntax: tab(<integer>)
Description: The Tab(N) operator where N is a natural number, matches characters from the current position until exactly N characters have been matched in all. Fails if more than N characters have already been matched.

Recursive Pattern Matching

Defer
Pattern Function: Defer(Variable B)
Compiler Syntax: [+]<name>/td>
Description: The Defer operator (+V) where V is a variable, obtains at match time a value for V from the VarMap argument to the Match function. It then proceeds to match that value as if it had been present in the original pattern at that point. The leading "+" is optional unless the variable name matches a keyword. See also the Variables discussion.

A deferred pattern can be used to create a recursive pattern that will, at pattern matching time, follow the pointer to obtain the referenced pattern, and then match this pattern. Consider for example, the following Java code.

Pattern P
    = Pattern.Alternate("A",
                        Pattern.Concat("B",
                                       Pattern.Defer("P")));
VarMap vars = new VarMap();
vars.put("P",P);
By placing the pattern P into the map with the name P (the name can be anything consistent), we cause P to recurse through the map. The Examples section provides illustrations of recursive patterns.

On the first attempt to match, this pattern attempts to match the string "A". If this fails, then the alternative matches a "B", followed by an attempt to match the value of "P", which is of source the Pattern P. This second attempt now first attempts to match "A", and so on. The result is a pattern that will match a string of B's followed by a single A.

This particular example could simply be written as nspan("B") & "A", but the use of recursive patterns in the general case can construct complex patterns which could not otherwise be built.

Pattern Assignment Operations In addition to the overall result of a pattern match, which indicates success or failure, it is often useful to be able to keep track of the pieces of the subject string that are matched by individual pattern elements, or subsections of the pattern.

Deferred Assignment
Pattern Function: Assign(Pattern P, Variable v)
Compiler Syntax: <pattern> * <variable> or <pattern> . <variable>
Description: The deferred assignment operator takes an arbitrary pattern P and a variable V that will be set to the substring matched by P. The assignment is deferred to the end of the match. If the entire match is successful, and if the pattern P was part of the successful match, then at the end of the matching operation the assignment to S of the string matching P is performed.

Immediate Assignment
Pattern Function: IAssign(Pattern P, Variable v)
Compiler Syntax: <pattern> ** <variable> or <pattern> $ <variable>
Description: The immediate assignment operator takes an arbitrary pattern P P a variable V that will be set to the substring matched by P. This assignment happens during pattern matching, so if P matches more than once, then the assignment happens more than once.

Set Cursor
Pattern Function: Setcur(Variable v)
Compiler Syntax: setcur(<variable>)
Description: The cursor assignment operation assigns the current cursor position to the variable N. The cursor position is defined as the count of characters that have been matched so far (including any start point moves).

Deferred Replacement

Deferred Replacement
Pattern Function: Replace(Pattern P, Variable v)
Compiler Syntax: <pattern>=<variable>
Description: Deferred replacement is similar to deferred assignment. With assignment, a variable is associated with a pattern and when matching completes, the substring matched by that pattern is assigned to the variable.

With deferred replacement, after a successful match, the substring matched by the pattern is replaced by the value of the variable (at the end of matching).

In order to access the modified subject string, the MatchResult argument must used.

The JPattern Pseudo-Compiler

The package jpattern.compiler supports the compilation of string expressions into equivalent Pattern objects. It is capable of converting a string representation of a pattern to equivalent Java code (via a command line interface) or to a Patten object at runtime (via an API).

jpattern.compiler.Main

The command line interface is supported by the program jpattern.compiler.Main.

Summary:

java -classpath jpattern.jar jpattern.compiler.Main OPTIONS

Options:

OptionDescription
-pat '<pattern>' Specify a pattern expression.
-subject="<string>"Specify a subject string.
-var "var=<value>"Specify an initial variable assignment.; may be repeated.
-tag "<string>Specify a string that identifies pattern lines in the standard input. The same tag is used to mark the beginning and the end of the string.
-xmltag "<string>Specify an xml style opening tag that identifies pattern lines in the standard input. The closing tag is the same as the open tag, but with the standard "/". For example, -xmltag <pattern> would mark strings with the tags <pattern>...</pattern>.
-squoteIndicate that pattern expressions are using single quotes instead of the usual double quotes.
-debugTurn on lowest level of debug output.
-debugn <N> Set the debug level to N
-vPrint out the parameters.
-f <filename>Take input from this file
-o <filename>Send non-error output to this file.
-err <filename>Send error output to this file.

Description:

Command Line Usage

There are two primary calling forms, exemplified by the following.
  1. java -classpath jpattern.jar jpattern.compiler.Main
    -pat "arb & len(5)=x" -subject "1234567" -var "x=xyz"

  2. java -classpath jpattern.jar jpattern.compiler.Main
    -tag "@" < javacode.pat > javacode.java
The first command line case performs an actual match. It compiles the "-pat" argument and matches it agains the "-subject" string. It reports if the match succeeds, and if so, prints out the final values of all variables and the final subject line after replacement. The example above, for example would report success and that the final subject line was "xyz67" and the final value of x was "xyz".

The second case reads lines from standard input ("javacode.pat" in this case) and writes them to standard output ("javacode.java" in this case). As each line is read, it is searched for paired occurrences of the java tag string (the argument to the -tag flag or -xmltag flag). The text between the occurrences is assumed to be a pattern expression. Multiple patterns per line is disallowed. That expression is compiled to equivalent Java code to construct the Pattern and is written to standard output in place of the tagged text. Patterns may span multiple lines by ending each line with a backslash character ("\").

Suppose the first example was run with the following input file.

public class Test0
{
    static void  makePattern()
    {
	Pattern Lnum = @pos(0) (Digs ".") span(" ")@;
	Pattern Digs = @span("0123456789")@;
    }
}
It would produce the following output file.
public class Test0
{
    static void makePattern()
    {
	Pattern Lnum = Pattern.Concat(
            Pattern.Pos(0),
            Pattern.Concat(
                Pattern.Concat(
                    Pattern.Defer(Variable.create("Digs")),
                    Pattern.StringPattern(".")
                ),
                Pattern.Span(" ")
            )
        );
	Pattern Digs = Pattern.Span("0123456789");
    }
}

You can examine the files Test1.pat thru Test5.pat in the test directory to see other examples.

API Usage

The compiler may be invoked within a Java program to dynamically generate Patterns from pattern specifications. The Java code fragment equivalent to the first command line above would be as follows.
// Instantiate the compiler, a matcher,
// a MatchResult, and a VarMap for
// definiing variable names.
Compiler compiler = new Compiler();
Match matcher = new Match();
MatchResult result = new MatchResult();
VarMap vars = new VarMap();

Pattern p = compiler.compile("arb & len(5)=x");
String subject = "1234567";

boolean ok = matcher.Match(subject,p,result,vars);

stdout.print(ok?"succeed: "+result.Subject:"fail.");

Examples of Pattern Matching

Example 1. This shows a simple example of the use of pattern replacement to remove a line number from the start of a string. We assume that the line number has the form of a string of decimal digits followed by a period, followed by one or more spaces. The tag delimiter for pattern specifications is "@". We will occasionally include the concatenation operator ("&").
Pattern Digs = @span("0123456789")@;
Pattern Lnum = @pos(0) & (+Digs & ".")=nil & span(" ")@;

Now to use this pattern we simply do a match with a replacement:

VarMap vars = new VarMap(); vars.put("Digs",Digs); vars.put("nil","");
boolean ok = matcher.Match(Line, Lnum, vars);
which replaces the line number by the null string.

The style we use here -- defining constant patterns and then using them -- is typical. It is possible to build up patterns dynamically, but it is usually more efficient to build them in pieces in advance using constant declarations. Note in particular that although it is possible to construct a pattern directly as an argument for the Match routine, it is much more efficient to preconstruct the pattern as we did in this example.

Example 2. Now let's look at the use of pattern assignment to break a string into sections. Suppose that the input string has two unsigned decimal integers, separated by spaces or a comma, with spaces allowed anywhere. Then we can isolate the two numbers with the following pattern:

Pattern B = @nspan(" ")@;
Pattern N = @span("0123456789")@;
Pattern T = @nspan(" ") +N*Num1 span(" ,") +N*Num2@;

The match operation

Match(" 124, 257  ", T)
would assign the string 124 to Num1 and the string 257 to Num2.

Example 3. Now let's see how more complex elements can be built from the set of primitive elements. The following pattern matches strings that have the syntax of Ada 95 based literals:

VarMap vars = new VarMap();
vars.put("DecDigits","0123456789");
vars.put("HexDigits","0123456789abcdefABCDEF");

Pattern Digs = @span(+DecDigits)@;
Pattern UDigs = @+Digs arbno("_" +Digs")@;
Pattern Hdig = @span(+HexDigits)@;
Pattern UHdig = @+Hdig arbno("_" +Hdig)@;
Pattern Bnum = @+UDigs "#" +UHdig "#"@;
vars.put("Digs",Digs);
vars.put("UDigs",UDigs);
vars.put("Hdig",Hdigs);
vars.put("UHdig",UHdigs);
vars.put("Bnum",Bnum);
A match against Bnum will now match the desired strings, e.g. it will match "16#123_abc#", but not "a#b#". However, this pattern is not quite complete, since it does not allow colons to replace the pound signs. The following is more complete:
Pattern Bchar = @any("#:")@;
vars.put("Bchar",Bchar);
Pattern Bnum = @+UDigs +Bchar +UHdig +Bchar@;
But that is still not quite right, since it allows # and : to be mixed, and they are supposed to be used consistently. We solve this by using a deferred match.
Pattern Bnum  = @+UDigs Bchar*Temp UHdig +Temp@;
Here the first instance of the base character is stored in Temp, and then later in the pattern we rematch the value that was assigned.

Example 4. For an example of a recursive pattern, let's define a pattern that is like the built in Bal, but the string matched is balanced with respect to square brackets or curly brackets.

The language for such strings might be defined in extended BNF as

ELEMENT ::= <any character other than [] or {}>
           | '[' BALANCED_STRING ']'
           | '{' BALANCED_STRING '}'
BALANCED_STRING ::= ELEMENT {ELEMENT}
Here we use "{}" to indicate zero or more occurrences of a term, as is common practice in extended BNF. Now we can translate the above BNF into recursive patterns as follows:
Pattern Element = @notany("[]{}") \
                  | ("[" Balanced_String "]") \
                  | ("{" Balanced_String "}")@;
Pattern Balanced_String = @Element arbno(Element)@;

VarMap vars = new VarMap();
vars.put("Balanced_String",Balanced_String);
Note the important use of + here to refer to a pattern not yet defined. Note also that we use assignments precisely because we cannot refer to as yet undeclared variables in initializations.

Now that this pattern is constructed, we can use it as though it were a new primitive pattern element. For example, the match:

Match("xy[ab{cd}]", @+Balanced_String*Current_Output fail@);
will generate the output:
x xy xy[ab{cd}] y y[ab{cd}] [ab{cd}] a ab ab{cd} b b{cd} {cd} c cd d
Note that the function of the fail here is simply to force the pattern Balanced_String to match all possible alternatives. Studying the operation of this pattern in detail is highly instructive.

Example 5. Finally we give a rather elaborate example of the use of deferred matching. The following declarations build up a pattern which will find the longest string of decimal digits in the subject string.

VarMap vars = new Varmap();

Function GTS = new Function() {
	Object get(VarMap vars)
        {
            String Cur = vars.getString("Cur","");
            String Max = vars.getString("Max","");
            return Boolean.valueOf(Cur.length() > Max.length());
         }}
vars.put("GTS",GTS);

vars.put("Digit","0123456789");

Pattern Digs = @span(+Digit)@;
Pattern Find = @""*Max fence \          // initialize Max to null
                & breakx(+Digit) \      // scan looking for digits
                & ((span(+Digit)*Cur \  // assign next string to Cur
	            & (+GTS) \          // check size(Cur) > Size(Max)
                    & setcur(+Loc)) \   // if so, save location
                   * Max) \             // and assign to Max
                & fail@;               // seek all alternatives
As we see from the comments here, complex patterns like this take on aspects of sequential programs. In fact they are sequential programs with general backtracking. In this pattern, we first use a pattern assignment that matches null and assigns it to Max, so that it is initialized for the new match. Now BreakX scans to the next digit. Arb would do here, but BreakX will be more efficient. Once we have found a digit, we scan out the longest string of digits with Span, and assign it to Cur. The deferred call to GtS tests if the string we assigned to Cur is the longest so far. If not, then failure is signalled, and we seek alternatives (this means that BreakX will extend and look for the next digit string). If the call to GtS succeeds then the matched string is assigned as the largest string so far into Max and its location is saved in Loc. Finally Fail forces the match to fail and seek alternatives, so that the entire string is searched.

If the pattern Find is matched against a string, the variable Max at the end of the pattern will have the longest string of digits, and Loc will be the starting character location of the string. For example, Match("ab123cd4657ef23", Find) will assign "4657" to Max and 11 to Loc (indicating that Subject.substring(0,11) has been matched.



Change Log


Note that this list of changes is a partial summary taken from the more complete list in the Changelog file.

Changes Incorporated into Version 1

Minor version levels are indicated in parentheses.



Point of Contact

Author:Dennis Heimbigner
Software Engineering Research Laboratory
Computer Science Department
University of Colorado
Boulder, Colorado 80309-0430 USA
dennis.heimbigner@colorado.edu


License

The code is divided into two parts. The source code which is derived directly from the Ada source code is licensed under that source's license, which is essentially the LGPL. The compiler source code is licensed under the BSD license. See the file license.txt for more more details.