SNOBOL4IO(1) | CSNOBOL4B 2.3.2 | January 1, 2024

NAME

snobol4io – SNOBOL4 file I/O

DESCRIPTION

Macro SNOBOL4 originally depended on FORTRAN libraries, unit numbers and FORMATs for input and output. CSNOBOL4 uses the C stdio(3) library instead, but unit numbers (INTEGERs between 1 and 256) and record lengths remain embedded in the Macro SNOBOL4 code.

I/O Associations

Output on a closed unit generates a fatal “Output error”, see snobol4error(1).

The following variable/unit/file associations exist by default:

Variable   Unit   Association
INPUT      5      standard input (input)
OUTPUT     6      standard output (output)
TERMINAL   7      standard error (output)
TERMINAL   8      /dev/tty (input)
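
As a minimal sketch of these default associations in use (the label names and message text are illustrative only), the following program copies standard input to standard output through INPUT and OUTPUT, then reports a line count on standard error through TERMINAL:

        *       Copy stdin (unit 5) to stdout (unit 6), counting lines.
                N = 0
        LOOP    LINE = INPUT                    :F(DONE)
                OUTPUT = LINE
                N = N + 1                       :(LOOP)
        *       Assigning to TERMINAL writes to standard error (unit 7).
        DONE    TERMINAL = N ' lines copied'
        END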

Named files

Input and output filenames can be supplied to the INPUT() and OUTPUT() functions via an optional fourth argument.
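
A short sketch of the fourth argument in use, assuming hypothetical filenames notes.txt and copy.txt and arbitrarily chosen unit numbers 10 and 11:

        *       Copy one named file to another, line by line.
                INPUT(.IN, 10,, 'notes.txt')    :F(NOFILE)
                OUTPUT(.OUT, 11,, 'copy.txt')   :F(NOFILE)
        *       The assignment fails at end of input, which ends the loop.
        COPY    OUT = IN                        :S(COPY)
                ENDFILE(10)
                ENDFILE(11)                     :(END)
        NOFILE  TERMINAL = 'cannot open file'
        END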

filename - (hyphen)
is interpreted as stdin on INPUT() and stdout on OUTPUT().

sub-process I/O using PIPE and Pseudo-terminals
If the filename begins with a single vertical bar (|), the remainder is used as a shell command whose stdin (in the case of OUTPUT()) or stdout (in the case of INPUT()) will be connected to the file variable via a pipe. If a pipe is opened by INPUT() in “update” mode, the connection will be bi-directional (on systems with socketpair and Unix-domain sockets). See below for how to associate a variable for I/O in both directions.

If the filename begins with two vertical bars (||), the remainder is used as a shell command executed with stdin, stdout and stderr attached to the slave side of a pseudo-terminal (pty), if the system C library contains the forkpty(3) routine. Use of a pty is necessary when the program to be invoked cannot be run without a “terminal” for I/O. See below on how to properly associate the I/O variable.
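
The one-directional pipe case might look like the sketch below. The shell command ls -l is only an example, and the unit comes from IO_FINDUNIT() as in the update-mode sequence shown under the U option:

        *       Read the output of a shell command through a pipe.
                UNIT = IO_FINDUNIT()
                INPUT(.LS, UNIT,, '|ls -l')     :F(FAIL)
        *       Each read returns one line of the command's stdout.
        NEXT    OUTPUT = LS                     :S(NEXT)
                ENDFILE(UNIT)                   :(END)
        FAIL    TERMINAL = 'cannot open pipe'
        END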

magic paths /dev/stdin, /dev/stdout, and /dev/stderr
/dev/stdin, /dev/stdout, and /dev/stderr refer to the current process's standard input, standard output and standard error I/O streams, respectively, regardless of whether those special filenames exist on your system.

magic path /dev/fd/n
/dev/fd/n uses fdopen(3) to open a new I/O stream associated with file descriptor number n, regardless of whether the special device entries exist.
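
As an illustration (unit numbers 16 and 17 and the variable names are arbitrary), both kinds of magic path can be used to reach the process's standard error. No output ordering is claimed here, since the two units may be buffered differently:

        *       Two units that both write to standard error.
                OUTPUT(.ERR1, 16,, '/dev/stderr')       :F(END)
                OUTPUT(.ERR2, 17,, '/dev/fd/2')         :F(END)
                ERR1 = 'written via /dev/stderr'
                ERR2 = 'written via /dev/fd/2'
        END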

magic paths /tcp/hostname/service, /udp/hostname/service and /tls/hostname/service
/tcp/hostname/service can be used to open a connection to a TCP server. /udp/hostname/service behaves similarly for UDP. /tls/hostname/service opens a TLS over TCP connection (NOTE: it does not attempt to verify the server certificate unless the "verify" option is used, and even then does not handle SNI or SAN). The path can be followed by any number of slash-separated options:

broadcast   Allow broadcast address (UDP only).
dontroute   Enables routing bypass for outgoing messages.
keepalive   Enables TCP connection keep alive messages.
nodelay     Send TCP data without waiting.
oobinline   Enables reception of out-of-band data in band.
priv        Bind local port number under 1024 (if allowed).
reuseaddr   Allow quick reuse of local addresses.
verify      Attempt to verify server TLS certificate.
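
A hedged sketch of a network path with one option appended. The host localhost and the daytime service are assumptions for illustration, and a server must actually be listening for the open to succeed:

        *       Read one line from a TCP server, with keepalive enabled.
                UNIT = IO_FINDUNIT()
                PATH = '/tcp/localhost/daytime/keepalive'
                INPUT(.NET, UNIT,, PATH)                :F(FAIL)
                OUTPUT = NET                            :F(FAIL)
                ENDFILE(UNIT)                           :(END)
        FAIL    TERMINAL = 'connection failed or closed'
        END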

magic pathname /dev/tmpfile
/dev/tmpfile opens an anonymous temporary file for reading and writing, see tmpfile(3).
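
One possible use, following the two-way association sequence given under the U option. Whether the U option is strictly required for /dev/tmpfile is not stated here, so it is included as a precaution:

        *       Write two lines to an anonymous temporary file.
                UNIT = IO_FINDUNIT()
                INPUT(.TMP, UNIT, 'U', '/dev/tmpfile')  :F(END)
                OUTPUT(.TMP, UNIT)
                TMP = 'first line'
                TMP = 'second line'
        *       Return to the start of the file, then read the lines back.
                REWIND(UNIT)
        READ    OUTPUT = TMP                            :S(READ)
        END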

/dev/null and /dev/tty
On non-POSIX systems /dev/null and /dev/tty are magical, and refer to the null device, and the user's terminal/console, respectively.

I/O Options

Originally the third argument specified record length for INPUT(), or a FORTRAN FORMAT for OUTPUT().

CSNOBOL4 interprets it as a string of single-letter options; commas are ignored. Some options affect only the I/O variable named in the first argument; others affect any variable associated with the unit number in the second argument.

digits
A span of digits will set the input record length for the named I/O variable. This controls the maximum length of the string returned for regular text I/O, and the number of bytes returned for binary I/O. Record length is per-variable association; multiple variables may be associated with the same unit, but with different record lengths. The default record length for input is 1024. Lines longer than the record length will be silently truncated. Since CSNOBOL4 2.2, record length is only honored for binary I/O, and all characters up to a newline (ASCII Line Feed) are interpreted as a single line.

A
For OUTPUT(), the unit will be opened for append access (the option is ignored by INPUT()). All writes will occur at the end of the file at the time of the write, regardless of the file position before the write.
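
For example, a program might append to a log file like this (run.log, unit 13 and the variable name LOGF are hypothetical):

        *       Append a timestamped line; earlier contents are kept.
                OUTPUT(.LOGF, 13, 'A', 'run.log')       :F(FAIL)
                LOGF = 'run started ' DATE()
                ENDFILE(13)                             :(END)
        FAIL    TERMINAL = 'cannot open run.log'
        END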

B
The unit will be opened for binary access. On input, newline characters have no special meaning; the number of bytes transferred depends on record length (see above). On output, no newline is appended.

B
For terminal devices, all input from this unit will be done without special processing for line editing or EOF; the number of characters returned depends on the record length. Characters which deliver signals (including interrupt, kill, and suspend) are still processed. Units (with different fds) opened on the same terminal device operate independently; some can use binary mode, while others operate in text mode.

C
Character at a time I/O. A synonym for B,1.
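
A sketch combining binary mode with a record length, assuming a hypothetical file data.bin and an arbitrary unit number 12. Each read returns at most 16 bytes:

        *       Count the bytes in a file using 16-byte binary reads.
                INPUT(.CHUNK, 12, 'B,16', 'data.bin')   :F(FAIL)
                TOTAL = 0
        NEXT    BLOCK = CHUNK                           :F(DONE)
                TOTAL = TOTAL + SIZE(BLOCK)             :(NEXT)
        DONE    OUTPUT = TOTAL ' bytes read'
                ENDFILE(12)                             :(END)
        FAIL    TERMINAL = 'cannot open data.bin'
        END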

E
Set the "close on exec" flag for the underlying file descriptor. Depends on support by the C library fopen(3) call for 'e' in the mode string for regular files. Honored for sockets regardless (but not on Windows).

J
Read and write compressed data in .xz format, using liblzma, as written by xz(1). If a digit 0 through 9 immediately follows the option, it will be interpreted as the compression level to use when writing. It's claimed that level zero is "sometimes faster than gzip -9 while compressing much better". The default compression level is 6; larger numbers require more than 16 MiB of memory to decompress, and are useful only when compressing files bigger than 8 MiB (level 7), 16 MiB (level 8), and 32 MiB (level 9). Matches the tar(1) command line option. Added in CSNOBOL4 2.2.

j
Read and write compressed data in .bz2 format, using libbz2, as created by bzip2(1). If a digit 1 through 9 immediately follows the option, it will be interpreted as the compression level to use when writing. Matches the tar(1) command line option. Added in CSNOBOL4 2.2.

K
If an input line is longer than the input record length, return the line in multiple reads (breaK up the line) instead of discarding the extra characters. Added in CSNOBOL4 2.0. Obsolete in CSNOBOL4 2.2.

T
Terminal mode. Writes are performed “unbuffered” (see below), and no newline characters are added. On input, newline characters are returned. Terminal mode affects only the referenced unit, and does not require opening a new file descriptor (i.e., by using a magic pathname): OUTPUT(.TT, 8, "T", "-"). Terminal mode is useful for outputting prompts in interactive programs.
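
A minimal prompt sketch along those lines, using an arbitrarily chosen unit 9 and a hypothetical variable PROMPT:

        *       Unit 9 shares standard output but adds no newline.
                OUTPUT(.PROMPT, 9, 'T', '-')
                PROMPT = 'Name: '
        *       The reply is read from standard input as usual.
                NAME = INPUT                            :F(END)
                OUTPUT = 'Hello, ' NAME
        END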

Q
Quiet mode. Turns off input echo on terminals. Affects only input on this file descriptor.

U
Update mode. The unit is opened for both input and output. Example of associating a variable for I/O in both directions:
        unit = IO_FINDUNIT()
        INPUT(.name, unit, 'U', 'filepath')
        OUTPUT(.name, unit)

This is useful when filepath is /dev/fd/n, where n is a file descriptor number returned by SERV_LISTEN(), or when filepath specifies a pipe (|command) or pseudo-terminal (||command).

The above sequence is also useful when combined with fixed record length, binary mode and the SET() function for I/O to preexisting files. Performing OUTPUT() first will create a regular file if it does not exist, but will also truncate a preexisting file!
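
A sketch of that combination follows. It rests on several assumptions: records.dat is a hypothetical preexisting file of 8-byte records, SET() is taken to accept (unit, offset, whence) with whence 0 meaning "from the start of the file" (check snobol4func(1)), and binary mode is assumed to apply to the unit as a whole, so that no newline is appended on output:

        *       Replace the third 8-byte record of an existing file.
                UNIT = IO_FINDUNIT()
                INPUT(.REC, UNIT, 'U,B,8', 'records.dat')       :F(FAIL)
                OUTPUT(.REC, UNIT)
        *       Seek to byte offset 16 (record 3), then overwrite it.
                SET(UNIT, 2 * 8, 0)
                REC = 'NEWDATA!'
                ENDFILE(UNIT)                                   :(END)
        FAIL    TERMINAL = 'cannot open records.dat'
        END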

W
Unbuffered mode. Each output variable assignment causes an immediate I/O transfer to occur by direct read(2) or write(2) system calls, rather than collecting the data in a buffer for efficiency.

X
Open fails if the file exists (meaningless for /dev/fd/n). Depends on support by the C library fopen(3) call for 'x' in the mode string. Added in CSNOBOL4 2.1, where it was ignored for sockets. In CSNOBOL4 2.2 it applies to sockets as well, where it means local socket address reuse is not allowed.

Z
Reserved for .Z (compress(1)) style compression?!

z
Read and write compressed data in .gz format using zlib(3), as created by gzip(1). If a digit 0 through 9 immediately follows the option, it will be interpreted as the compression level to use when writing. Matches the tar(1) command line option. Added in CSNOBOL4 2.2.
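
As an illustration of the compression options, the sketch below writes a gzip file at level 9 and reads it back. The filename and unit numbers are hypothetical; the J and j options would be used the same way:

        *       Write one line to a gzip-compressed file at level 9.
                OUTPUT(.GZ, 14, 'z9', 'report.txt.gz')  :F(FAIL)
                GZ = 'compressed line'
                ENDFILE(14)
        *       Reopen the file on another unit and decompress on input.
                INPUT(.GZIN, 15, 'z', 'report.txt.gz')  :F(FAIL)
                OUTPUT = GZIN
                ENDFILE(15)                             :(END)
        FAIL    TERMINAL = 'cannot open report.txt.gz'
        END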

Other I/O extensions

SERV_LISTEN(), SET(), SSET()
see snobol4func(1).

I/O Layers

The Macro SNOBOL4 and POSIX I/O architectures have subtleties which interact, and are explained here:

Variable association
Input and output is done by reading or writing variables associated with a unit number for I/O.

Input (maximum) record lengths are associated with each variable association!

Unit number
Multiple variables can be associated with the same unit number using the INPUT() and OUTPUT() functions.

Each unit number refers to a stdio(3) stream (except on broken systems like Windows, where socket handles are incompatible with file handles, and all network I/O is performed “unbuffered”).

Sequential named files can be associated with an I/O unit when the -r option is given on the command line! REWIND() should return to just after the program END label!
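
To illustrate several variables sharing one unit, the sketch below associates two variables with different binary record lengths. The file image.dat, unit 15 and the variable names are assumptions, and the second INPUT() call omits the filename because the unit is already open:

        *       Two variables on one unit, with record lengths 4 and 64.
                INPUT(.HDR, 15, 'B,4', 'image.dat')     :F(FAIL)
                INPUT(.BODY, 15, '64')
        *       Both reads advance the same underlying file position.
                MAGIC = HDR                             :F(FAIL)
                PAYLOAD = BODY                          :F(FAIL)
                OUTPUT = 'read ' SIZE(MAGIC) ' + ' SIZE(PAYLOAD) ' bytes'
                ENDFILE(15)                             :(END)
        FAIL    TERMINAL = 'cannot open or read image.dat'
        END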

“Standard I/O” Stream
snobol4(1) performs MOST I/O through “Standard Input/Output” streams. Multiple units can be associated with the same stdio stream (FILE struct) using magic pathnames (“-” and /dev/std{in,out,err}). Buffering is performed by the stdio layer.

Operating System file descriptor
More than one stdio stream can be associated with the same O/S “fd” (by opening magic pathname “/dev/fd/n”).

Each POSIX “fd” has a file position pointer, changed by reading, writing and the REWIND(), SET() and SSET() functions.

Normally terminal device “special files” have one set of mode settings, but CSNOBOL4 associates (saves and restores) different terminal settings (echo and the number of characters returned on read) based on fd numbers.

Operating System open file object
More than one “fd” slot can be associated with the same “open file” object, either in multiple forks, or by dup(2) of the same fd. This is often the case for stdin, stdout and stderr.

Open file objects have flags which affect all associated fds, including input, output and append modes.

Operating System named file
Independent opens of the same named “regular” file will have different open file objects, and thus have independent access modes and file positions.

Terminal devices normally have one set of “line discipline” mode settings, but CSNOBOL4 maintains different settings for each file descriptor (see above).

BUGS

This page was cut and pasted from various parts of the original snobol4(1) man page, and still needs review and cleanup.

All extensions should be annotated with the version they appeared in (and what other implementations they're compatible or inspired by).

Record lengths.

Unit numbers.

SEE ALSO

snobol4(1), snobol4ezio(3)