CHANGES (2025-06-13): Since R1:
- Removed restrictions on character encodings, in line with ELTN; any bytes outside the ASCII range remain unchanged.
- Converted the formal grammar to official EBNF.
- Removed rejected proposals.
- Moved Java APIs to a new page.
- Replaced Lua string escapes with SRFI 75 escapes.
CHANGES (2025-06-14): spell-check.
Introduction
SLAN is the Scheme List-Atom Notation[1], because I really wanted the acronym SLAN. Like XML, YAML, JSON, TOML, and my own ELTN, it’s a language for describing data structures, not performing calculations. In SLAN everything is a list of “atoms” – strings, numbers, “symbols”, and booleans – and other lists.
SLAN somewhat resembles the programming language Scheme, but is even smaller: entire data types and constructions have been cut out to make parsing simpler.
History
Since the invention of LISP in 1960 and Scheme around 1975, programmers have used S-expressions – space-separated words surrounded by parentheses – as a simple file format. SXML, for example, writes XML using S-expressions.
While contemplating how to serialize what’s essentially a rules engine[2] as a Java .properties file, I realized I had other choices. The project I’m working on parses JSON, so to avoid infinite regress I decided on something simpler. Thus SLAN was born.
File Format
The SLAN file format consists of ASCII characters representing Scheme-like lists; only string constants and comments may contain anything else (see below). As a simple serialization format (not a mathematical programming language) it can represent only a few basic data types: strings, “symbols” (basically strings without quotes), simple decimal numbers, booleans, and of course lists.
Parsers should allow non-ASCII characters in a string constant or in comments, but not elsewhere in the file. Such characters will pass through the parser unchanged and uninterpreted. Parsers should also look for the Unicode “byte order mark” to verify they are parsing a byte encoding and not UTF-16 or UTF-32.
The intent is that SLAN avoids the complexity of character encodings by staying compatible with 1960s technology, while remaining current with those newer standards.
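A minimal Python sketch of the byte order mark check, assuming the parser receives the raw file as `bytes`. The name `check_encoding` is hypothetical, and dropping a leading UTF-8 BOM is an assumption: the text above only says to look for a BOM, but its three bytes are non-ASCII and would otherwise be rejected outside a string.

```python
import codecs

# Byte order marks meaning the input is NOT a byte encoding. The UTF-32LE
# mark begins with the UTF-16LE mark, so the 4-byte marks are checked first.
_WIDE_BOMS = [
    (codecs.BOM_UTF32_LE, "UTF-32LE"),
    (codecs.BOM_UTF32_BE, "UTF-32BE"),
    (codecs.BOM_UTF16_LE, "UTF-16LE"),
    (codecs.BOM_UTF16_BE, "UTF-16BE"),
]


def check_encoding(data: bytes) -> bytes:
    """Reject UTF-16/UTF-32 input; return the bytes minus any UTF-8 BOM."""
    for bom, name in _WIDE_BOMS:
        if data.startswith(bom):
            raise ValueError(f"input looks like {name}, not a byte encoding")
    # Assumption: a UTF-8 BOM is tolerated and dropped before parsing.
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data
```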
Comments
Following Scheme R6RS, SLAN regards the following as comments:
- text outside a string that starts with `;` and runs to the end of the line.
- text outside a string starting with `#|` and ending with `|#`.
The parser will treat comment text as whitespace.
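Treating comments as whitespace means a parser can skip both in one loop. The Python sketch below assumes the input has already been turned into a `str`; `skip_ws` is a hypothetical helper name. Following the grammar’s `NOT A CLOSE-COMMENT` wording, block comments do not nest here (R6RS block comments do), and, as a lenient assumption, a `;` comment may also end at the end of the input.

```python
# Whitespace characters from the grammar's `whitespace` rule.
WHITESPACE = " \f\t\r\n\v"


def skip_ws(text: str, pos: int) -> int:
    """Return the first index at or after pos that is neither whitespace
    nor inside a comment."""
    while pos < len(text):
        if text[pos] in WHITESPACE:
            pos += 1
        elif text[pos] == ";":
            # Line comment: skip up to (not past) the newline; the loop then
            # consumes the newline itself as whitespace.
            while pos < len(text) and text[pos] not in "\r\n":
                pos += 1
        elif text.startswith("#|", pos):
            # Block comment: ends at the first "|#" (no nesting).
            end = text.find("|#", pos + 2)
            if end < 0:
                raise ValueError(f"unterminated #| comment at index {pos}")
            pos = end + 2
        else:
            break
    return pos
```

For example, `skip_ws("  ; note\n #| more |# (a)", 0)` returns the index of the `(`.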
Data Types
The following are the only data types in SLAN.
List
Lists are sequences of other data types, including nested lists. SLAN doesn’t care whether one implements lists as linked cons cells, arrays, or an arbitrary data structure that’s simply serialized as nested lists.
Symbol
In SLAN symbols and strings are two ways of writing a sequence of characters, one more restrictive than the other.
Nevertheless, parsers should indicate whether a value is a symbol or a string,
in case the application attaches semantic meaning to one or the other.
Applications can use a “symbol” like a C or Java enum, the name of a
datatype, the name of a (remote?) procedure call, or any other meaning
programmers give to names. That said, the current specification has no
mechanism to verify symbols against a list of “known” names; the application
must do that.
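One way to keep the two distinguishable in Python is a thin `str` subclass for symbols, so that plain `str` values remain strings. In this sketch `Symbol`, `Color`, and `to_color` are all hypothetical names; `Color` stands in for whatever set of known names an application checks against, since the specification leaves that check to the application.

```python
from enum import Enum


class Symbol(str):
    """A SLAN symbol: compares and hashes like str, but isinstance() can
    tell it apart from a SLAN string (a plain str)."""

    def __repr__(self) -> str:
        return f"Symbol({str.__repr__(self)})"


class Color(Enum):
    """Hypothetical application vocabulary, used like a C or Java enum."""
    RED = "red"
    GREEN = "green"


def to_color(value: object) -> Color:
    """Application-side check: accept only a symbol naming a known Color."""
    if not isinstance(value, Symbol):
        raise TypeError(f"expected a symbol, got {value!r}")
    return Color(value)  # raises ValueError for unknown names
```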
String
After Lists, Strings are the main data type in SLAN. As SLAN exists to serialize and deserialize data structures, a String could denote an arbitrary character sequence, a specific data structure like a date, or even a complicated number.
Escape Sequences
SRFI 75 defines the following escape sequences for strings:
| Escape | Byte(s) | Meaning |
|---|---|---|
| `\a` | 0x07 | bell |
| `\b` | 0x08 | backspace |
| `\t` | 0x09 | horizontal tab |
| `\n` | 0x0a | newline |
| `\v` | 0x0b | vertical tab |
| `\f` | 0x0c | form feed |
| `\r` | 0x0d | carriage return |
| `\"` | `"` | quotation mark / double quote |
| `\'` | `'` | apostrophe / single quote |
| `\\` | `\` | backslash |
| `\` followed by a newline | (none) | skip the newline and following whitespace |
| `\xXX` | 0xXX | byte value in hexadecimal digits |
| `\uXXXX` | utf8(XXXX) | UTF-8 bytes for code point 0xXXXX |
| `\UXXXXXXXX` | utf8(XXXXXXXX) | UTF-8 bytes for code point 0xXXXXXXXX |
Escaping a newline skips over the newline and any whitespace to the next printable character:
"This is a string with a newline in it. \
... oh, wait, it's not."
becomes “This is a string with a newline in it. ... oh, wait, it's not.”
Together, the Unicode escapes `\uXXXX` and `\UXXXXXXXX` cover all valid Unicode code points, i.e. 0x01 through 0x10FFFF, excluding the surrogates 0xD800 through 0xDFFF.
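Because the table maps escapes to bytes rather than characters, a decoder can work on raw bytes throughout and pass non-ASCII bytes through untouched. This Python sketch assumes it is handed the bytes between a string’s opening and closing double quotes; `decode_string` is a hypothetical name. It accepts exactly the escapes listed in the `escape` grammar rule and applies the code point range above to `\u` and `\U`.

```python
# Single-character escapes from the table above, keyed by the byte after "\".
_SIMPLE = {
    ord("a"): b"\x07", ord("b"): b"\x08", ord("t"): b"\t",
    ord("n"): b"\n", ord("v"): b"\x0b", ord("f"): b"\x0c",
    ord("r"): b"\r", ord('"'): b'"', ord("'"): b"'", ord("\\"): b"\\",
}
# Number of hex digits that must follow \x, \u and \U.
_HEX_LEN = {ord("x"): 2, ord("u"): 4, ord("U"): 8}
_HEXDIGITS = b"0123456789abcdefABCDEF"
_WHITESPACE = b" \f\t\r\n\v"


def decode_string(body: bytes) -> bytes:
    """Decode the escapes in the raw bytes between a string's quotes."""
    out = bytearray()
    i = 0
    while i < len(body):
        byte = body[i]
        if byte != ord("\\"):
            out.append(byte)  # ordinary (possibly non-ASCII) byte: unchanged
            i += 1
            continue
        if i + 1 >= len(body):
            raise ValueError("backslash at end of string")
        esc = body[i + 1]
        i += 2
        if esc in _SIMPLE:
            out += _SIMPLE[esc]
        elif esc in b"\r\n":
            # Escaped newline: drop it and all whitespace that follows.
            while i < len(body) and body[i] in _WHITESPACE:
                i += 1
        elif esc in _HEX_LEN:
            n = _HEX_LEN[esc]
            digits = body[i:i + n]
            if len(digits) != n or any(d not in _HEXDIGITS for d in digits):
                raise ValueError(f"\\{chr(esc)} needs {n} hex digits")
            value = int(digits, 16)
            i += n
            if esc == ord("x"):
                out.append(value)  # raw byte value
            elif not 0x01 <= value <= 0x10FFFF or 0xD800 <= value <= 0xDFFF:
                raise ValueError(f"invalid code point 0x{value:X}")
            else:
                out += chr(value).encode("utf-8")
        else:
            raise ValueError(f"unknown escape \\{chr(esc)}")
    return bytes(out)
```

Applied to the newline example above, `decode_string(b"This is a string with a newline in it. \\\n... oh, wait, it's not.")` yields the joined single-line text.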
Number
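Numbers in SLAN mostly resemble those in JSON: an optional sign, a whole part, an optional fractional part, and an optional exponent (the whole part may be left out when a fractional part is present, as in `.5`). Beyond that, the grammar below admits only simple ratios such as `1/3` and the special forms `0/0`, `+1/0`, and `-1/0` for NaN, positive infinity, and negative infinity. That’s it. Nothing like a full Scheme number.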
Boolean
A basic true-or-false value, written `#t` or `#f`.
Empty List
In Scheme the Empty List has special status. Scheme implements lists as
linked lists of “cons” cells, and the empty list is effectively a null pointer.
In SLAN we continue the tradition of the Empty List being the equivalent
of nil or null in other languages. Apart from `#f` itself, it’s the only value
equivalent to Boolean false; all others are true.
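In Python terms, a sketch assuming `#f` is read as `False` and lists as Python `list`s (`is_true` is a hypothetical name). Unlike Python’s own truthiness, `0` and the empty string count as true here.

```python
def is_true(value: object) -> bool:
    """SLAN truthiness: #f (False) and the empty list are false; all else is true."""
    if value is False:
        return False
    if isinstance(value, list) and not value:
        return False
    return True
```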
Syntax
Below is the formal syntax of SLAN. This is the notation in use:
- `lower_case` - A reference to a grammar rule, defined with the symbol `=`.
- `SOME WORDS` - A description of a rule in plain English.
- `"x"` - A literal character or sequence of characters.
- `"\x"` - An escape sequence denoting a non-printable or confusing character. A literal `"` is written as `"\""`; a literal `\` is written as `"\\"`. See Escape Sequences, above.
- `x , y` - A sequence containing both x and y.
- `x | y` - Either x or y.
- `[x]` - Zero or one of x.
- `{x}` - Zero or more of x.
- `(x)` - A group of items that are treated as a unit.
For example, `"y" , { "a" | "b" } , "z"` means a sequence of zero or more “a”s or “b”s, starting with a “y” and ending with a “z”: “yz”, “yaz”, “ybz”, “yaaz”, “ybaz”, etc.
Streams, Lists, and Values
stream = ws , list , { ws , list } , ws ;
list = "(" , ws , value , { reqws , value } , ws , ")" ;
value = list | emptylist | symbol | string | number | boolean ;
emptylist = "(" , ws , ")" ;
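The four rules above can be sketched as a small recursive-descent parser. To keep it short, this Python sketch (all names hypothetical) treats every atom as a bare run of characters other than whitespace and parentheses, returned as a `str`, and ignores comments; a real parser would dispatch on the symbol, string, number, and boolean rules that follow. Note where it looks for `)` before trying to read a first value: that is how the `emptylist` alternative gets recognized.

```python
WS = " \f\t\r\n\v"


def _skip(text: str, pos: int) -> int:
    # Whitespace only; comments are omitted from this sketch.
    while pos < len(text) and text[pos] in WS:
        pos += 1
    return pos


def _parse_value(text: str, pos: int):
    if pos < len(text) and text[pos] == "(":
        return _parse_list(text, pos)
    end = pos
    # Simplified atom: anything up to whitespace or a parenthesis.
    while end < len(text) and text[end] not in WS and text[end] not in "()":
        end += 1
    if end == pos:
        raise ValueError(f"expected a value at index {pos}")
    return text[pos:end], end


def _parse_list(text: str, pos: int):
    if pos >= len(text) or text[pos] != "(":
        raise ValueError(f"expected '(' at index {pos}")
    pos = _skip(text, pos + 1)
    items = []
    # emptylist is "(" , ws , ")": check for ")" before reading a first value.
    if pos < len(text) and text[pos] == ")":
        return items, pos + 1
    while True:
        value, pos = _parse_value(text, pos)
        items.append(value)
        after = _skip(text, pos)
        if after >= len(text):
            raise ValueError("unterminated list")
        if text[after] == ")":
            return items, after + 1
        if after == pos:  # reqws: values must be separated by whitespace
            raise ValueError(f"whitespace required at index {pos}")
        pos = after


def parse_stream(text: str) -> list:
    """stream = ws , list , { ws , list } , ws ;"""
    # Unlike the stream rule as written, this also accepts a top-level "()".
    results = []
    pos = _skip(text, 0)
    while pos < len(text):
        value, pos = _parse_list(text, pos)
        results.append(value)
        pos = _skip(text, pos)
    if not results:
        raise ValueError("a stream needs at least one list")
    return results
```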
Symbols
symbol = ( ichar , { schar } ) | nchar ;
ichar = letter | "!" | "$" | "%" | "&" | "*" | "/" | ":"
| "<" | "=" | ">" | "?" | "~" | "_" | "^" ;
nchar = "." | "+" | "-" ;
schar = ichar | digit | nchar ;
letter = "A" THROUGH "Z" | "a" THROUGH "z" ;
digit = "0" THROUGH "9" ;
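The symbol rules translate directly into a regular expression. This Python sketch (`SYMBOL_RE` and `is_symbol` are hypothetical names) checks a whole token against them.

```python
import re

# ichar: letters plus the punctuation listed in the ichar rule.
_ICHAR = r"A-Za-z!$%&*/:<=>?~_^"
# symbol = ( ichar , { schar } ) | nchar ;  schar adds digits and nchar.
SYMBOL_RE = re.compile(rf"[{_ICHAR}][{_ICHAR}0-9.+\-]*|[.+\-]")


def is_symbol(token: str) -> bool:
    return SYMBOL_RE.fullmatch(token) is not None
```

Under these rules `list->vector` and `+` are symbols, but `...`, `-x`, and `1st` are not. Since `/` is an initial character, a token such as `/3` matches both this rule and the ratio branch of the number rule, so a parser has to decide which to try first.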
Strings
string = dquo , { char | "\\" escape } , dquo ;
dquo = """" ; (* literal double quote *)
char = ANY CHARACTER BUT A DOUBLE QUOTE OR BACKSLASH ;
escape = "'" | dquo | "\\"
| "a" | "b" | "f" | "n" | "r" | "t" | "v"
| newline , { whitespace }
| "x" , hexdigit , hexdigit
| "u" , hexdigit , hexdigit , hexdigit , hexdigit
| "U" , hexdigit , hexdigit , hexdigit , hexdigit ,
hexdigit , hexdigit , hexdigit , hexdigit ;
hexdigit = "0" THROUGH "9" | "A" THROUGH "F" | "a" THROUGH "f";
Numbers
number = [ sign ] , whole , [ decimal ] , [ exponent ]
| [ sign ] , [ whole ] , decimal , [exponent]
| [ sign ] , [ whole ] , "/" , positive
| "0/0" | "+1/0" | "-1/0" ; (* NaN, +Infinity, -Infinity *)
sign = "+" | "-" ;
whole = "0" | positive ;
positive = ( "1" THROUGH "9" ) , { digit } ;
decimal = "." , digit , { digit } ;
exponent = ( "e" | "E" ) , [ sign ] , digit , {digit} ;
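A Python sketch of the number rules (all names hypothetical). Mapping integers to `int`, fractional or exponent forms to `float`, ratios to `fractions.Fraction`, and the three special forms to IEEE NaN and infinities is an assumption; the grammar fixes only the syntax. The grammar also allows a ratio with no whole part before the `/` without saying what it means, so this sketch rejects it rather than guess.

```python
import math
import re
from fractions import Fraction

_WHOLE = r"(?:0|[1-9][0-9]*)"
_DECIMAL = r"\.[0-9]+"
_EXPONENT = r"[eE][+-]?[0-9]+"
# The first two alternatives of the number rule.
_DECIMAL_RE = re.compile(
    rf"[+-]?(?:{_WHOLE}(?:{_DECIMAL})?|(?:{_WHOLE})?{_DECIMAL})(?:{_EXPONENT})?"
)
# The ratio alternative: [ sign ] , [ whole ] , "/" , positive
_RATIO_RE = re.compile(rf"([+-]?)({_WHOLE}?)/([1-9][0-9]*)")
_SPECIAL = {"0/0": math.nan, "+1/0": math.inf, "-1/0": -math.inf}


def parse_number(token: str):
    """Convert a SLAN number token to int, float, or Fraction."""
    if token in _SPECIAL:
        return _SPECIAL[token]
    ratio = _RATIO_RE.fullmatch(token)
    if ratio:
        sign, whole, denominator = ratio.groups()
        if not whole:
            raise ValueError(f"ratio without a whole part: {token!r}")
        value = Fraction(int(whole), int(denominator))
        return -value if sign == "-" else value
    if _DECIMAL_RE.fullmatch(token):
        # No fraction or exponent: an integer; otherwise a float.
        if any(c in token for c in ".eE"):
            return float(token)
        return int(token)
    raise ValueError(f"not a SLAN number: {token!r}")
```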
Booleans
boolean = "#t" | "#f" ;
Whitespace
reqws = ( whitespace | comment ) , ws ;
ws = { whitespace | comment } ;
whitespace = " " | "\f" | "\t" | "\r" | "\n" | "\v" ;
comment = ( ";" , { NOT A NEWLINE } , newline )
| ( "#|" , { NOT A CLOSE-COMMENT } , "|#" ) ;
newline = "\r" | "\n" | "\r\n" ;
Semantic Concerns
- Because a list may be empty, after an opening parenthesis (and any whitespace) a parser must look for either a first element or the closing parenthesis. This complicates parsing a bit, but it’s the only consistent way to treat empty lists.
- `#f` and `()` are interpreted as “false”, if they’re interpreted at all. Everything else counts as “true”.
[1] Formerly the “Scheme-Like Abridged Notation”, but this sounds much better and more descriptive. JSON has Objects (and Arrays), ELTN has Tables, SLAN has Lists … and Atoms.

[2] Specifically, how to choose among Java classes to wrap an object based on the signature(s) of the wrapper’s constructor(s). For extra credit, imagine the user offers up multiple arguments, e.g. (byte[], int, int). I think I implemented something like this for Rhino back in the day, but I can’t remember the details.