A (Revised^2) Report on SLAN: Scheme List-Atom Notation

Frank Mitchell

Posted: 2025-06-13
Last Modified: 2026-02-09
Word Count: 1497
Tags: scheme software programming


CHANGES (2025-06-13): Since R1:

CHANGES (2025-06-14): spell-check.

Introduction

SLAN is the Scheme List-Atom Notation[1], because I really wanted the acronym SLAN. Like XML, YAML, JSON, TOML, and my own ELTN, it’s a language for describing data structures, not performing calculations. In SLAN everything is a list of “atoms” – strings, numbers, “symbols”, and booleans – and other lists.

SLAN somewhat resembles the programming language Scheme, but is even smaller: entire data types and constructions have been cut out to make parsing simpler.

History

Since the invention of LISP in 1960 and Scheme around 1975, programmers have used S-expressions – space-separated words surrounded by parentheses – as a simple file format. SXML, for example, writes XML using S-expressions.

While contemplating how to serialize what’s essentially a rules engine[2] as a Java .properties file, I realized I had other choices. The project I’m working on parses JSON, so to avoid infinite regress I decided on something simpler. Thus SLAN was born.

File Format

The SLAN file format consists entirely of ASCII characters representing Scheme-like lists. As a simple serialization format (not a mathematical programming language) it can represent only a few basic data types: strings, “symbols” (basically strings without quotes), simple decimal numbers, booleans, and of course lists.

Parsers should allow non-ASCII characters in a string constant or in comments, but not elsewhere in the file. Such characters will pass through the parser unchanged and uninterpreted. Parsers should also look for the Unicode “byte order mark” to verify they are parsing a byte encoding and not UTF-16 or UTF-32.
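The byte order mark check can be sketched in a few lines of Python. This is only an illustration: the function name `check_bom` is hypothetical, and stripping a leading UTF-8 mark (rather than merely tolerating it) is my assumption, not something this report requires.

```python
import codecs


def check_bom(data: bytes) -> bytes:
    """Return data without a UTF-8 BOM; reject UTF-16 and UTF-32 input."""
    # Test UTF-32 first: the UTF-32-LE mark begins with the UTF-16-LE mark.
    if data.startswith((codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE)):
        raise ValueError("input looks like UTF-32, not a byte encoding")
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        raise ValueError("input looks like UTF-16, not a byte encoding")
    # Assumption: a UTF-8 mark is harmless, so drop it before parsing.
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data
```

A parser would call something like this once on the first bytes of the file, before any tokenizing.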

The intent is that SLAN avoids the complexity of character encodings by staying compatible with 1960s technology, while still coexisting with newer standards such as Unicode.

Comments

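Following Scheme R6RS, SLAN regards the following as comments (see the comment rule under Whitespace, below):

    ;           everything from the semicolon to the end of the line
    #| ... |#   everything between the opening #| and the closing |#

The parser will treat comment text as whitespace.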

Data Types

The following are the only data types in SLAN.

List

Lists are sequences of other data types, including nested lists. SLAN doesn’t care whether one implements lists as linked cons cells, arrays, or an arbitrary data structure that’s simply serialized as nested lists.

Symbol

In SLAN symbols and strings are two ways of writing a sequence of characters, one more restrictive than the other.

Nevertheless, parsers should indicate whether a value is a symbol or a string, in case the application attaches semantic meaning to one or the other. Applications can use a “symbol” like a C or Java enum, the name of a datatype, the name of a (remote?) procedure call, or any other meaning programmers give to names. That said, the current specification has no mechanism to verify symbols against a list of “known” names; the application must do that.
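One way a parser might keep the distinction, sketched in Python: return symbols as a small subclass of `str`, so they still behave like text but can be told apart from quoted strings. The names `Symbol`, `is_symbol`, `check_known`, and `KNOWN_COLORS` are all hypothetical, and the “known names” check lives in the application, as the paragraph above says it must.

```python
class Symbol(str):
    """A SLAN symbol: text that the parser read without quotes."""
    __slots__ = ()

    def __repr__(self) -> str:
        return f"Symbol({str(self)!r})"


def is_symbol(value: object) -> bool:
    """True only for values the parser tagged as symbols."""
    return isinstance(value, Symbol)


# Application-side list of accepted names (purely illustrative).
KNOWN_COLORS = {"red", "green", "blue"}


def check_known(value: object, known: set) -> bool:
    """Accept a value only if it is a symbol and one of the known names."""
    return is_symbol(value) and value in known
```

With this arrangement `Symbol("red") == "red"` still holds, so code that doesn’t care about the difference can ignore it.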

String

After Lists, Strings are the main data type in SLAN. As SLAN exists to serialize and deserialize data structures, a String could denote an arbitrary character sequence, a specific data structure like a date, or even a complicated number.

Escape Sequences

SRFI 75 defines the following escape sequences for strings:

Escape        Byte(s)           Meaning
\a            0x07              bell
\b            0x08              backspace
\t            0x09              horizontal tab
\n            0x0a              newline
\v            0x0b              vertical tab
\f            0x0c              form feed
\r            0x0d              carriage return
\"            "                 quotation mark / double quote
\'            '                 apostrophe / single quote
\\            \                 backslash
\␤                              skip newline and following whitespace
\xXX          0xXX              byte value in hexadecimal digits
\uXXXX        utf8(XXXX)        UTF-8 bytes for code point 0xXXXX
\UXXXXXXXX    utf8(XXXXXXXX)    UTF-8 bytes for code point 0xXXXXXXXX

Escaping a newline skips over the newline and any whitespace to the next printable character:

"This is a string with a newline in it. \
        ... oh, wait, it's not."

becomes “This is a string with a newline in it. … oh, wait, it’s not.”

The Unicode escapes \uXXXX and \UXXXXXXXX cover all valid Unicode code points, i.e. 0x01 through 0x10FFFF, excluding the surrogates 0xD800 through 0xDFFF.
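To make the escape rules concrete, here is a rough Python sketch of a decoder for the bytes between the quotes. The function name `unescape` and the error messages are hypothetical; the table of escapes, the newline-skipping rule, and the code point limits come from this section.

```python
SIMPLE = {
    ord("a"): b"\x07", ord("b"): b"\x08", ord("t"): b"\x09",
    ord("n"): b"\x0a", ord("v"): b"\x0b", ord("f"): b"\x0c",
    ord("r"): b"\x0d", ord('"'): b'"', ord("'"): b"'", ord("\\"): b"\\",
}
HEX_LENGTHS = {ord("x"): 2, ord("u"): 4, ord("U"): 8}
HEX_DIGITS = b"0123456789abcdefABCDEF"
WHITESPACE = b" \f\t\r\n\v"


def unescape(body: bytes) -> bytes:
    """Decode the escapes in a string body (the bytes between the quotes)."""
    out = bytearray()
    i = 0
    while i < len(body):
        if body[i] != ord("\\"):
            out.append(body[i])       # ordinary bytes pass through unchanged
            i += 1
            continue
        if i + 1 >= len(body):
            raise ValueError("dangling backslash")
        code = body[i + 1]
        i += 2
        if code in SIMPLE:
            out += SIMPLE[code]
        elif code in HEX_LENGTHS:
            width = HEX_LENGTHS[code]
            digits = body[i:i + width]
            if len(digits) != width or any(d not in HEX_DIGITS for d in digits):
                raise ValueError("malformed hex escape")
            value = int(digits, 16)
            if code == ord("x"):
                out.append(value)     # \xXX is a single raw byte
            elif value == 0 or value > 0x10FFFF or 0xD800 <= value <= 0xDFFF:
                raise ValueError("invalid Unicode code point")
            else:
                out += chr(value).encode("utf-8")
            i += width
        elif code in b"\r\n":
            # Escaped newline: skip it and any whitespace that follows.
            while i < len(body) and body[i] in WHITESPACE:
                i += 1
        else:
            raise ValueError("unknown escape")
    return bytes(out)
```

Working on bytes rather than decoded text keeps non-ASCII characters in the string uninterpreted, which matches the File Format section.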

Number

Numbers in SLAN resemble those in JSON: a whole part, an optional fractional part, and an optional exponent. That’s it. Nothing like a Scheme number.

Boolean

A basic true-or-false value.

Empty List

In Scheme the Empty List has special status. Scheme implements lists as linked lists of “cons” cells, and the empty list is effectively a null pointer. In SLAN we continue the tradition of Empty List being the equivalent of nil or null in other languages. It’s the only value equivalent to Boolean false; all others are true.
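As a small illustration of that rule, here is how an application might test a parsed value for truth in Python. I’m assuming lists arrive as Python lists (or tuples) and booleans as Python `bool`; the name `is_true` is hypothetical. Note that, unlike Python’s own rules, zero and the empty string count as true.

```python
def is_true(value: object) -> bool:
    """Truth value of a parsed SLAN value under the Empty List rule."""
    if isinstance(value, bool):
        return value                  # assumes #t and #f were parsed as True and False
    if isinstance(value, (list, tuple)) and len(value) == 0:
        return False                  # the empty list stands in for nil / null
    return True                       # every other value, including 0 and "", is true
```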

Syntax

Below is the formal syntax of SLAN. This is the notation in use:

lower_case
    A reference to a grammar rule, defined with the symbol =.
SOME WORDS
    A description of a rule in plain English.
"x"
    A literal character or sequence of characters.
"\x"
    An escape sequence denoting a non-printable or confusing character. A literal " is written as "\"; a literal \ is written as "\\". See Escape Sequences, above.
x , y
    A sequence containing both x and y.
x | y
    Either x or y.
[ x ]
    Zero or one of x.
{ x }
    Zero or more of x.
( x )
    A group of items that are treated as a unit.

For example, "y" , { "a" | "b" } , "z" means a sequence of zero or more “a”s or “b”s, starting with a “y” and ending with a “z”: “yz”, “yaz”, “ybz”, “yaaz”, “ybaz”, etc.

Streams, Lists, and Values

stream      = ws , list , { ws , list } , ws ;

list        = "(" , ws , value , { reqws , value } , ws , ")" ;

value       = list | emptylist | symbol | string | number | boolean ;

emptylist   = "(" , ws , ")" ;
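The four rules above map fairly directly onto a recursive-descent reader. The Python below is a sketch under several assumptions: it leaves atoms as raw text instead of classifying them, it ignores comments entirely, and it is lenient about a top-level empty list. All function names are hypothetical.

```python
WS = " \f\t\r\n\v"
DELIMS = WS + '()"'


def skip_ws(text: str, i: int) -> int:
    """Advance past whitespace (comments are not handled in this sketch)."""
    while i < len(text) and text[i] in WS:
        i += 1
    return i


def read_atom(text: str, i: int):
    """Return (raw_text, next_index) for a symbol, string, number, or boolean."""
    start = i
    if text[i] == '"':
        i += 1
        while i < len(text) and text[i] != '"':
            i += 2 if text[i] == "\\" else 1   # step over an escaped character
        if i >= len(text):
            raise SyntaxError("unterminated string")
        return text[start:i + 1], i + 1
    while i < len(text) and text[i] not in DELIMS:
        i += 1
    return text[start:i], i


def read_list(text: str, i: int):
    """Parse a list or emptylist starting at '('; return (items, next_index)."""
    i = skip_ws(text, i + 1)
    items = []
    while True:
        if i >= len(text):
            raise SyntaxError("unterminated list")
        if text[i] == ")":
            return items, i + 1
        if text[i] == "(":
            value, i = read_list(text, i)
        else:
            value, i = read_atom(text, i)
        items.append(value)
        after = skip_ws(text, i)
        # reqws: another value may only follow after some whitespace
        if after == i and after < len(text) and text[after] != ")":
            raise SyntaxError("missing whitespace between values")
        i = after


def read_stream(text: str):
    """Parse a whole stream: one or more top-level lists."""
    lists = []
    i = skip_ws(text, 0)
    while i < len(text):
        if text[i] != "(":
            raise SyntaxError("expected a list at top level")
        value, i = read_list(text, i)
        lists.append(value)
        i = skip_ws(text, i)
    if not lists:
        raise SyntaxError("a stream needs at least one list")
    return lists
```

Returning `(value, next_index)` pairs keeps the reader free of shared state, at the cost of a little tuple juggling.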

Symbols

symbol      = ( ichar , { schar } ) | nchar ;

ichar       = letter | "!" | "$" | "%" | "&" | "*" | "/" | ":"
                 | "<" | "=" | ">" | "?" | "~" | "_" | "^" ;

nchar       = "." | "+" | "-" ;

schar       = ichar | digit | nchar ;

letter      = "A" THROUGH "Z" | "a" THROUGH "z" ;

digit       = "0" THROUGH "9" ;
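For anyone who would rather check symbols with a regular expression, the rules above translate roughly as follows in Python. The constant and function names are hypothetical; the character sets are copied from the ichar, nchar, and schar rules.

```python
import re

ICHAR = r"A-Za-z!$%&*/:<=>?~_^"      # letters plus the ichar punctuation
NCHAR = r".+\-"                       # ".", "+", "-"

# symbol = ( ichar , { schar } ) | nchar
SYMBOL = re.compile(rf"(?:[{ICHAR}][{ICHAR}0-9{NCHAR}]*|[{NCHAR}])")


def is_symbol_text(text: str) -> bool:
    """True if the whole of text matches the symbol rule."""
    return SYMBOL.fullmatch(text) is not None
```

Note that a lone “.”, “+”, or “-” is a symbol, but anything longer that starts with one of them is not.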

Strings

string      = dquo , { char | "\\" , escape } , dquo ;

dquo        = """" ;    (* literal double quote *)

char        = ANY CHARACTER BUT A DOUBLE QUOTE OR BACKSLASH ;

escape      = "'" | dquo | "\\"
                 | "a" | "b" | "f" | "n" | "r" | "t" | "v" 
                 | newline , { whitespace }
                 | "x" , hexdigit , hexdigit 
                 | "u" , hexdigit , hexdigit , hexdigit , hexdigit
                 | "U" , hexdigit , hexdigit , hexdigit , hexdigit ,
                         hexdigit , hexdigit , hexdigit , hexdigit ;

hexdigit    = "0" THROUGH "9" | "A" THROUGH "F" | "a" THROUGH "f";

Numbers

number      = [ sign ] , whole , [ decimal ] , [ exponent ]
                | [ sign ] , [ whole ] , decimal , [ exponent ]
                | [ sign ] , [ whole ] , "/" , positive
                | "0/0" | "+1/0" | "-1/0" ;     (* NaN, +Infinity, -Infinity *)

sign        = "+" | "-" ;

whole       = "0" | positive ;

positive    = ( "1" THROUGH "9" ) , { digit } ;

decimal     = "." , digit , { digit } ;

exponent    = ( "e" | "E" ) , [ sign ] , digit , { digit } ;
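The number rules can likewise be written as one regular expression. This Python sketch mirrors the grammar above alternative by alternative, including the fraction forms; the names are hypothetical, and it only recognizes the text of a number without converting it to a value.

```python
import re

WHOLE = r"(?:0|[1-9][0-9]*)"
POSITIVE = r"[1-9][0-9]*"
DECIMAL = r"\.[0-9]+"
EXPONENT = r"[eE][+-]?[0-9]+"

NUMBER = re.compile(
    rf"[+-]?{WHOLE}(?:{DECIMAL})?(?:{EXPONENT})?"    # whole, optional fraction and exponent
    rf"|[+-]?{WHOLE}?{DECIMAL}(?:{EXPONENT})?"       # fraction with optional whole part
    rf"|[+-]?{WHOLE}?/{POSITIVE}"                    # ratio with a positive denominator
    r"|0/0|\+1/0|-1/0"                               # NaN, +Infinity, -Infinity
)


def is_number_text(text: str) -> bool:
    """True if the whole of text matches the number rule."""
    return NUMBER.fullmatch(text) is not None
```

Leading zeros such as “007” are rejected because whole is either “0” or a run of digits starting with 1 through 9.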

Booleans

boolean     = "#t" | "#f" ;

Whitespace

reqws       = ( whitespace | comment ) , ws ;

ws          = { whitespace | comment } ;

whitespace  = " " | "\f" | "\t" | "\r" | "\n" | "\v" ;

comment     = ( ";" , NOT A NEWLINE , newline )
                | ( "#|" , NOT A CLOSE-COMMENT , "|#" ) ;

newline     = "\r" | "\n" | "\r\n" ;
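Since comments count as whitespace, a parser can fold both into a single skipping routine. Here is a minimal Python sketch of the ws rule; the name `skip_ws` is hypothetical. Two choices in it are my assumptions: a line comment may end at the end of input as well as at a newline, and block comments do not nest (the rule above reads that way, although R6RS block comments do nest).

```python
WHITESPACE = " \f\t\r\n\v"


def skip_ws(text: str, i: int = 0) -> int:
    """Return the index of the first character at or after i that is
    neither whitespace nor part of a comment."""
    while i < len(text):
        if text[i] in WHITESPACE:
            i += 1
        elif text[i] == ";":
            # Line comment: runs up to the next newline (or end of input).
            while i < len(text) and text[i] not in "\r\n":
                i += 1
        elif text.startswith("#|", i):
            # Block comment: runs through the first closing "|#".
            end = text.find("|#", i + 2)
            if end < 0:
                raise SyntaxError("unterminated block comment")
            i = end + 2
        else:
            break
    return i
```

The reqws rule is then just a check that this function moved the index forward by at least one character.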

Semantic Concerns


  1. Formerly the “Scheme-Like Abridged Notation”, but this sounds much better and is more descriptive. JSON has Objects (and Arrays), ELTN has Tables, SLAN has Lists … and Atoms.

  2. Specifically how to choose among Java classes to wrap an object based on the signature(s) of the wrapper’s constructor(s). For extra credit, imagine the user offers up multiple arguments, e.g. (byte[], int, int). I think I implemented something like this for Rhino back in the day, but I can’t remember the details.