A Specification of the ELTN File Format (v0.5)

Frank Mitchell

Posted: 2023-01-26
Last Modified: 2023-04-12
Word Count: 3257
Tags: software programming lua


ELTN (Extended Lua Table Notation) is a structured text format for describing data structures. It fills niches similar to those of other text formats like XML (1997-2008), YAML (2004-2021), JSON (2006-2017), and TOML (2013-2021). Just as JSON is a subset of JavaScript, ELTN is a strict subset of a dynamically typed embedded programming language, Lua.

The name Extended Lua Table Notation reflects that the syntax includes not only Lua tables but also a sequence of key-value pairs similar to Lua global variable assignments. Thus one doesn’t have to wrap the whole document in curly brackets (‘{’ … ‘}’).

The author believes ELTN fills a niche between the simplicity of JSON and the readability of YAML and TOML.

Use Cases

Configuration Files

We intended ELTN mainly as a format for configuration files. Here is a portion of my Hugo configuration in TOML:

[markup]
    [markup.tableOfContents]
        startLevel = 2
        endLevel = 5
    [markup.highlight]
        style = "monokailight"
        tabWidth = 4
    [markup.goldmark]
        [markup.goldmark.renderer]
            unsafe = true

[taxonomies]
    tag = "tags"

Here’s the equivalent in YAML:

markup:
  tableOfContents:
    startLevel: 2
    endLevel: 5
  highlight:
    style: "monokailight"
    tabWidth: 4
  goldmark:
    renderer:
      unsafe: true

taxonomies:
    tag: "tags"

Here’s what it might look like in JSON:

{
"markup": {
  "tableOfContents": { "startLevel": 2, "endLevel": 5 },
  "highlight": {
    "style": "monokailight",
    "tabWidth": 4
  },
  "goldmark": { "renderer": { "unsafe": true }}
},
"taxonomies": { "tag": "tags" }
}

And here’s the equivalent in ELTN:

markup = {
  tableOfContents = { startLevel = 2, endLevel = 5 };
  highlight = {
    style = "monokailight";
    tabWidth = 4;
  };
  goldmark = { renderer = { unsafe = true }};
}
taxonomies = { tag = "tags" }

The ELTN version makes better use of vertical space, and unlike the JSON version there’s no need to quote strings used as keys. Note also that commas separate the two keys inside tableOfContents, but semicolons separate the other fields. That’s mainly a stylistic issue: within a Lua table one can use either a comma or a semicolon as a separator, and Lua accepts a trailing separator after the last field. The top-level keys markup and taxonomies don’t need separators at all, although one can use a semicolon (not a comma!) if one wants.
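Generating a file like this is mostly a matter of deciding, for each key, between a bare NAME and a bracketed constant, and writing top-level keys as statements. The following Python sketch is not part of the specification. The names to_eltn_document, format_value, format_key, and is_name are hypothetical. It handles only None, booleans, numbers, strings, lists, and dicts. It also assumes that a reader accepts a leading minus sign on negative numbers, as Lua does.

```python
import math
import re

# Reserved words listed in the spec; they can never be bare NAMEs.
RESERVED = frozenset("""
    and break do else elseif end false for function goto if in
    local nil not or repeat return then true until while
""".split())
NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_name(key):
    """True if key can be written as a bare NAME rather than ["key"]."""
    return isinstance(key, str) and NAME_RE.fullmatch(key) is not None and key not in RESERVED


def format_key(key):
    if is_name(key):
        return key
    if isinstance(key, (bool, int, float, str)):   # nil (None) is not a valid key
        return "[" + format_value(key) + "]"
    raise TypeError(f"invalid ELTN key: {key!r}")


def format_value(value):
    if value is None:
        return "nil"
    if isinstance(value, bool):   # test bool before int: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return str(value)         # assumption: readers accept a leading '-', as Lua does
    if isinstance(value, float):
        if not math.isfinite(value):
            raise ValueError("ELTN has no literal for inf or nan")
        return repr(value)
    if isinstance(value, str):
        # A short string may not contain raw line breaks, so escape CR and LF too.
        for raw, esc in (("\\", "\\\\"), ('"', '\\"'), ("\n", "\\n"), ("\r", "\\r")):
            value = value.replace(raw, esc)
        return f'"{value}"'
    if isinstance(value, (list, tuple)):      # sequential fields: indexes 1, 2, ...
        parts = [format_value(v) for v in value]
    elif isinstance(value, dict):
        parts = [f"{format_key(k)} = {format_value(v)}" for k, v in value.items()]
    else:
        raise TypeError(f"cannot represent {type(value).__name__} in ELTN")
    return ("{ " + ", ".join(parts) + " }") if parts else "{}"


def to_eltn_document(config):
    """Write each top-level key as a `NAME = value` statement on its own line."""
    lines = []
    for key, value in config.items():
        if not is_name(key):
            raise ValueError(f"top-level key {key!r} must be a NAME")
        lines.append(f"{key} = {format_value(value)}")
    return "\n".join(lines)
```

Fed a dict holding the Hugo settings above, this writes one statement per top-level key. It uses commas throughout and no trailing separators, which is just one of the styles Lua accepts.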

Data Persistence

One could also have programs write out ELTN files to persist their data. They or a cooperating program could read them in later. The format is only a little more difficult to parse than JSON, owing to “variable” assignments outside of a table instance and the use of tables for both sequential and associative elements.
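The double duty of tables shows up as soon as a host language has separate list and map types: the reader has to decide what each table becomes. One possible policy treats a table as a list only when its keys are exactly 1 through n. The Python sketch below shows that policy. The helper normalize_table is hypothetical, and the spec does not require this rule.

```python
def normalize_table(table):
    """Return a list when a table's keys are exactly 1..n, else the dict itself.

    Assumption: an empty table stays a dict, because a reader cannot tell
    whether {} was meant as an empty sequence or an empty mapping.
    """
    n = len(table)
    is_sequence = (
        n > 0
        and not any(isinstance(k, bool) for k in table)   # True == 1 in Python
        and all(isinstance(k, int) for k in table)
        and set(table) == set(range(1, n + 1))
    )
    if not is_sequence:
        return table
    return [table[i] for i in range(1, n + 1)]
```

A table with a hole, such as one holding indexes 1, 2, 3, and 5, stays a dict under this rule, so no index is silently renumbered.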

Here I’ll borrow an example from Programming in Lua, 3rd edition, by Roberto Ierusalimschy, adapted to ELTN syntax. Let’s say a group of professional programmers keeps a distributed lending library of the books on their shelves.¹ Each member uploads an ELTN document listing the books they’re willing to lend out, which might look like this:

{
  member = {
     number = 13,
     name = "Frank Mitchell",
     contact = {
        email = "frank.mitchell@nosuchplace.com",
        phone = false,     -- no phone calls!
        pidgin_im = false; -- no IMs!
     }
  },
  books = {
    { 
      author = "Donald E. Knuth", 
      title = "Literate Programming", 
      publisher = "CSLI", 
      year = 1992
    },
    { 
      author = "Jon Bentley", 
      title = "More Programming Pearls", 
      year = 1990, 
      publisher = "Addison-Wesley", 
    },
    --[[
    ... many, many more ...
    ]]
  }
}

Since most of the members of this group are programmers, they write a system that collates all the data. It transforms the ELTN tree to index the data by author and title, counts how many total copies of each book are available, and persists the total library in a big ELTN file, or maybe a set of small ones indexed by the author’s name.
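A minimal Python sketch of that collation step follows. It assumes the members’ documents have already been parsed into dicts, with each books sequence converted to a Python list. The names collate and index_by_author are hypothetical. Keying on (author, title) is one simple choice among several.

```python
from collections import Counter


def collate(documents):
    """Count available copies per (author, title) and record which members own them."""
    copies = Counter()
    owners = {}
    for doc in documents:
        member = doc["member"]["number"]
        for book in doc["books"]:
            key = (book["author"], book["title"])
            copies[key] += 1
            owners.setdefault(key, []).append(member)
    return copies, owners


def index_by_author(copies):
    """Regroup the counts as {author: {title: copies}}, e.g. to write one file per author."""
    by_author = {}
    for (author, title), count in sorted(copies.items()):
        by_author.setdefault(author, {})[title] = count
    return by_author
```

The result of index_by_author is itself a nested table, so it can be written back out as one big ELTN document or split per author.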

Data Transfer

ELTN can transfer data between processes with embedded Lua interpreters the same way JSON transfers data between a Web server and a browser. Neither side actually has to run Lua; both could parse ELTN.

Let’s continue the example from above. The uploaded ELTN documents get translated into entries in a SQL or NoSQL database, for reasons, but a Web site allows borrowers to search for books. Having found one, our borrower – let’s call him Tom – clicks a button to notify the owner. The system notifies the unsociable member with too many books above (through email, not text). The borrower and lender then meet to exchange the book, and both confirm the exchange took place. The system acknowledges with some boilerplate text and the following bit at the bottom:

{
  transaction = 982,
  requested = "2023-01-29 18:42 CST",
  completed = "2023-02-01 11:29 CST",
  lender = {
     number = 13,
     name = "Frank Mitchell",
     comments = "Please don't break the spine."
  },
  borrower = {
     number = 23,
     name = "Tom Morrow",
     comments = "Gonna read this on the plane to Tokyo!"
  },
  books = {
    { 
      author = "Donald E. Knuth", 
      title = "Literate Programming", 
      publisher = "CSLI", 
      year = 1992
    },
  }
}

When Tom returns the book he replies to that e-mail with the ELTN part intact, and the unsociable bookworm e-mails his copy back. Ideally one would only need the transaction number, but by replying with all that information (and maybe more) both parties acknowledge what they’re lending or borrowing. Conversely, if the lender doesn’t get his book back, or gets it with a broken spine, he has a machine readable e-mail trail.

Syntax and Semantics

ELTN syntax is a strict subset of Lua 5.4 syntax, specifically creating tables and assigning values to their keys.

Lexical Conventions

Lexical conventions follow the Lua Manual for 5.4.

By default the lexer treats the input as a sequence of UTF-8 bytes. Parsing other ASCII-compatible encodings is also valid, but how to inform the lexer of an alternative encoding is outside the scope of this specification.

Fixed-Length Tokens

The language requires the following fixed-length tokens.

Token  Rules              Meaning
;      fieldsep, stat     separates table fields or top-level definitions
,      fieldsep           separates table fields
=      stat, field        assignment to a name or explicit table index
[      key                opens an explicit table index
]      key                closes an explicit table index
nil    value              empty reference²
false  constant           Boolean false
true   constant           Boolean true
{      tableconstructor   creates a new table
}      tableconstructor   marks the end of a table

Variable-Length Tokens

To summarize variable-length tokens in ELTN, as used in the formal syntax below:

identifier (NAME)
To quote the Lua 5.4 Manual, “any string of Latin letters, Arabic-Indic digits, and underscores, not beginning with a digit and not being a reserved word.” For backward compatibility, otherwise valid NAMEs that are not significant to ELTN but meaningful in Lua are forbidden; see below for a list.
literal string (STRING_LITERAL)
Lua literal strings come in two forms. A short literal string is delimited by single (') or double quotes ("); a backslash (\) escapes quote marks within the string and forms C-like escape sequences, detailed below. A long literal string is delimited by the sequences [[ and ]]. Unlike a short literal, it interprets no escape sequences: every character between those sequences is part of the string, except for a newline immediately after the opening [[, which is dropped. As in Lua 5.4, the string ends at the first ]]; nested [[ ]] pairs are not matched. (This specification will leave the leveled “long brackets”, such as [==[ … ]==], up to the implementor.)
numeric constant (NUMERAL)
Numbers follow conventions similar to other programming languages: a whole or decimal number, optionally followed by an e or E and a positive or negative integer exponent; or 0x or 0X followed by a whole or fractional hexadecimal number, optionally followed by a p or P and a positive or negative binary exponent written in decimal (so 0x1p4 means 1 × 2⁴ = 16).
whitespace (WS)
an uninterrupted sequence of whitespace characters: space (' ' or 0x20), form feed (0x0c), newline (0x0a), carriage return (0x0d), horizontal tab (0x09), or vertical tab (0x0b). Whitespace only serves to separate other tokens and improve readability.
newline
CR (0x0d), LF (0x0a), or a CR-LF (0x0d 0x0a) sequence.
comments
Comments also come in two forms. A short comment starts with -- and runs until the first newline. A long comment starts with --[[ and runs to the first following ]], just like a long string literal. In both cases, a comment is essentially whitespace in this specification. A future specification may configure parsing behavior from comments.
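These token classes fit into a single regular expression. The following Python sketch is one possible reading of the rules above, not a reference implementation. The token kinds (NAME, KEYWORD, NUMERAL, STRING, PUNCT) are hypothetical names. The sketch handles only level-0 long brackets ([[ ]] and --[[ ]]). It returns string literals raw, leaving escape decoding to a later step.

```python
import re

RESERVED = frozenset("""
    and break do else elseif end false for function goto if in
    local nil not or repeat return then true until while
""".split())

# One named group per token class. Order matters: long comments before short
# ones, long strings before '[', and hexadecimal numerals before decimal ones.
TOKEN_RE = re.compile(r"""
    (?P<WS>[ \f\n\r\t\v]+)
  | (?P<LONGCOMMENT>--\[\[.*?\]\])
  | (?P<COMMENT>--[^\r\n]*)
  | (?P<LONGSTRING>\[\[.*?\]\])
  | (?P<STRING>"(?:[^"\\\r\n]|\\z\s*|\\(?:\r\n|\n\r|.))*"
              |'(?:[^'\\\r\n]|\\z\s*|\\(?:\r\n|\n\r|.))*')
  | (?P<NUMERAL>(?:0[xX](?:[0-9a-fA-F]+(?:\.[0-9a-fA-F]*)?|\.[0-9a-fA-F]+)(?:[pP][+-]?[0-9]+)?
                |(?:[0-9]+(?:\.[0-9]*)?|\.[0-9]+)(?:[eE][+-]?[0-9]+)?)
                (?![A-Za-z0-9_.]))
  | (?P<NAME>[A-Za-z_][A-Za-z0-9_]*)
  | (?P<PUNCT>[{}\[\]=,;])
""", re.VERBOSE | re.DOTALL)


def tokenize(text):
    """Return (kind, lexeme) pairs, dropping whitespace and comments."""
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        kind, lexeme, pos = m.lastgroup, m.group(), m.end()
        if kind == "COMMENT" and lexeme.startswith("--[["):
            raise SyntaxError("unfinished long comment")
        if kind in ("WS", "COMMENT", "LONGCOMMENT"):
            continue
        if kind == "NAME" and lexeme in RESERVED:
            if lexeme not in ("nil", "true", "false"):
                raise SyntaxError(f"reserved word {lexeme!r} cannot be used in ELTN")
            kind = "KEYWORD"
        if kind == "LONGSTRING":
            kind = "STRING"          # both forms are STRING_LITERAL in the grammar
        tokens.append((kind, lexeme))
    return tokens
```

The negative lookahead after NUMERAL makes inputs like 3abc or 0x fail outright, instead of splitting into a number followed by a name. Lua likewise rejects these as malformed numbers.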

Reserved Words

For backward compatibility with Lua, the following are reserved words and invalid as NAMEs:

and       break     do        else      elseif    end
false     for       function  goto      if        in
local     nil       not       or        repeat    return
then      true      until     while

Any non-alphanumeric, non-whitespace character not used in one of the productions above is also invalid, including nearly all those in Lua operators.
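A NAME check combines the character rule with the reserved-word list. In this Python sketch the function is_valid_name is a hypothetical helper, and the character class is deliberately ASCII-only.

```python
import re

RESERVED = frozenset("""
    and break do else elseif end false for function goto if in
    local nil not or repeat return then true until while
""".split())

# ASCII letters, digits, and underscores only, not starting with a digit.
NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_valid_name(text):
    """Check whether text may appear as a NAME (a top-level var or a bare key)."""
    return NAME_RE.fullmatch(text) is not None and text not in RESERVED
```

Python’s own str.isidentifier() is not a substitute. It accepts non-ASCII letters, so "café".isidentifier() is true, and it knows nothing of Lua’s reserved words.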

Escape Sequences

Escape      Byte(s)        Meaning
\a          0x07           bell
\b          0x08           backspace
\f          0x0c           form feed
\n          0x0a           newline
\r          0x0d           carriage return
\t          0x09           horizontal tab
\v          0x0b           vertical tab
\\          \              backslash
\"          "              quotation mark / double quote
\'          '              apostrophe / single quote
\<newline>  0x0a           escaped newline (a backslash followed by a line break)³
\z          (none)         skip to the next non-whitespace character⁴
\xXX        0xXX           byte value in exactly two hexadecimal digits
\DDD        DDD            byte value in up to three decimal digits (at most 255)⁵
\u{XXXX}    utf8(XXXX)⁶    UTF-8 bytes for code point 0xXXXX⁷
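The table translates directly into a decoder. The Python sketch below uses a hypothetical function, decode_short_string. It takes a short literal exactly as a lexer would capture it, quotes included, and returns bytes, because \xXX and \DDD can produce bytes that are not valid UTF-8 on their own. As a simplification, it rejects \u code points above U+10FFFF, which Lua 5.4 would still encode.

```python
import re

SIMPLE_ESCAPES = {
    "a": 0x07, "b": 0x08, "f": 0x0C, "n": 0x0A, "r": 0x0D, "t": 0x09,
    "v": 0x0B, "\\": 0x5C, '"': 0x22, "'": 0x27,
}
WHITESPACE = " \f\n\r\t\v"
DECIMAL_RE = re.compile(r"[0-9]{1,3}")
HEX_RE = re.compile(r"[0-9a-fA-F]{2}")
UNICODE_RE = re.compile(r"\{([0-9a-fA-F]+)\}")


def decode_short_string(literal):
    """Decode a quoted short string literal (quotes included) into bytes."""
    body, out, i = literal[1:-1], bytearray(), 0
    while i < len(body):
        ch = body[i]
        if ch != "\\":
            out += ch.encode("utf-8")
            i += 1
            continue
        esc = body[i + 1]            # a lexed literal never ends in a lone backslash
        i += 2
        if esc in SIMPLE_ESCAPES:
            out.append(SIMPLE_ESCAPES[esc])
        elif esc in "\r\n":          # backslash + line break -> one newline (0x0a)
            out.append(0x0A)
            if i < len(body) and body[i] in "\r\n" and body[i] != esc:
                i += 1               # CR-LF or LF-CR counts as a single line break
        elif esc == "z":             # skip the following span of whitespace
            while i < len(body) and body[i] in WHITESPACE:
                i += 1
        elif esc == "x":
            m = HEX_RE.match(body, i)
            if not m:
                raise ValueError("\\x needs exactly two hexadecimal digits")
            out.append(int(m.group(), 16))
            i = m.end()
        elif esc.isascii() and esc.isdigit():   # \DDD: up to three decimal digits
            m = DECIMAL_RE.match(body, i - 1)
            value = int(m.group())
            if value > 255:
                raise ValueError("decimal escape too large")
            out.append(value)
            i = m.end()
        elif esc == "u":
            m = UNICODE_RE.match(body, i)
            if not m:
                raise ValueError("\\u needs a braced hexadecimal code point")
            code = int(m.group(1), 16)
            if code > 0x10FFFF:      # Lua accepts up to 2**31 - 1; this sketch does not
                raise ValueError("code point beyond U+10FFFF")
            # surrogatepass lets lone surrogates through, since Lua does not reject them
            out += chr(code).encode("utf-8", "surrogatepass")
            i = m.end()
        else:
            raise ValueError(f"invalid escape sequence \\{esc}")
    return bytes(out)
```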

Parsing

This BNF defines how the tokens above form a valid ELTN document, ignoring whitespace and comments.

UPDATED 2023-02-09: Changed exp to value, to match both intent and JSON naming.

UPDATED 2023-04-11: Disallow nil as a table key, for backward compatibility with Lua.

document         ::= tableconstructor | statlist
statlist         ::= ( stat )*
stat             ::= var '=' value | ';'
var              ::= NAME
value            ::= 'nil' | constant | tableconstructor
constant         ::= 'false' | 'true' | NUMERAL | STRING_LITERAL
tableconstructor ::= '{' ( fieldlist )? '}'
fieldlist        ::= field ( fieldsep field )* ( fieldsep )?
field            ::= key '=' value | value
key              ::= '[' constant ']' | NAME
fieldsep         ::= ',' | ';'

This grammar uses the following notation:

lowercase
A parser rule, defined elsewhere in the grammar. document is the top level rule.
NAME, NUMERAL, and STRING_LITERAL
Lexical symbols described above.
''
A lexical symbol defined as the exact sequence between the single quotes.
|
A separator between alternatives, e.g. key ::= '[' constant ']' | NAME means that a key may either be a constant between square brackets or a NAME (identifier).
()?
Zero or one occurrence.
()*
Zero or more occurrences.
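The grammar maps almost one-to-one onto a recursive-descent parser. The Python sketch below is illustrative rather than normative. It consumes a list of (kind, lexeme) tuples, using the hypothetical token kinds NAME, KEYWORD, NUMERAL, STRING, and PUNCT, and it builds dicts with 1-based integer keys for positional fields. It ignores Lua’s integer-overflow rules and leaves escape decoding out. Note that Python treats True and 1 as the same dict key, which Lua does not; a faithful implementation would need a wrapper for one of them.

```python
def _atom(kind, lexeme):
    """Convert a NUMERAL or STRING token (no overflow rules, no escape decoding)."""
    if kind == "NUMERAL":
        if lexeme[:2] in ("0x", "0X"):
            return float.fromhex(lexeme) if any(c in lexeme for c in ".pP") else int(lexeme, 16)
        return float(lexeme) if any(c in lexeme for c in ".eE") else int(lexeme)
    if lexeme.startswith("[["):                     # long string: drop one leading newline
        body = lexeme[2:-2]
        skip = 2 if body[:2] in ("\r\n", "\n\r") else 1 if body[:1] in ("\r", "\n") else 0
        return body[skip:]
    if "\\" in lexeme:
        raise NotImplementedError("escape sequences are not decoded in this sketch")
    return lexeme[1:-1]


class Parser:
    """Recursive descent over (kind, lexeme) tokens, one method per grammar rule."""

    def __init__(self, tokens):
        self.tokens, self.pos = list(tokens), 0

    def peek(self, offset=0):
        i = self.pos + offset
        return self.tokens[i] if i < len(self.tokens) else ("EOF", "")

    def punct(self, ch, offset=0):
        return self.peek(offset) == ("PUNCT", ch)

    def expect(self, kind, lexeme=None):
        tok = self.peek()
        if tok[0] != kind or (lexeme is not None and tok[1] != lexeme):
            raise SyntaxError(f"expected {lexeme or kind}, got {tok}")
        self.pos += 1
        return tok[1]

    def document(self):                  # document ::= tableconstructor | statlist
        if self.punct("{"):
            result = self.tableconstructor()
        else:
            result = {}
            while self.peek()[0] != "EOF":       # statlist ::= ( stat )*
                if self.punct(";"):              # stat ::= ';'
                    self.pos += 1
                    continue
                name = self.expect("NAME")       # stat ::= var '=' value
                self.expect("PUNCT", "=")
                result[name] = self.value()
        self.expect("EOF")
        return result

    def value(self):                     # value ::= 'nil' | constant | tableconstructor
        if self.peek() == ("KEYWORD", "nil"):
            self.pos += 1
            return None
        return self.tableconstructor() if self.punct("{") else self.constant()

    def constant(self):                  # constant ::= 'false' | 'true' | NUMERAL | STRING_LITERAL
        kind, lexeme = self.peek()
        if kind == "KEYWORD" and lexeme in ("true", "false"):
            result = lexeme == "true"
        elif kind in ("NUMERAL", "STRING"):
            result = _atom(kind, lexeme)
        else:
            raise SyntaxError(f"expected a constant, got {(kind, lexeme)}")
        self.pos += 1
        return result

    def tableconstructor(self):          # '{' ( field ( fieldsep field )* ( fieldsep )? )? '}'
        self.expect("PUNCT", "{")
        table, index = {}, 1
        while not self.punct("}"):
            if self.punct("["):                                   # key ::= '[' constant ']'
                self.pos += 1
                key = self.constant()
                self.expect("PUNCT", "]")
                self.expect("PUNCT", "=")
                table[key] = self.value()
            elif self.peek()[0] == "NAME" and self.punct("=", 1):   # key ::= NAME
                key = self.expect("NAME")
                self.pos += 1                                     # skip the '='
                table[key] = self.value()
            else:                                                 # positional field
                table[index] = self.value()
                index += 1
            if self.punct(",") or self.punct(";"):                # fieldsep
                self.pos += 1
            elif not self.punct("}"):
                raise SyntaxError(f"expected a field separator or '}}', got {self.peek()}")
        self.expect("PUNCT", "}")
        return table


def parse(tokens):
    return Parser(tokens).document()
```

A NAME counts as a key only when the next token is '='. This is the one place the parser looks two tokens ahead. Lua leaves the order of assignments in a constructor undefined when keys repeat; here the last assignment wins.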

Streaming ELTN

Added 2023-02-09.

ELTN could also communicate between processes as a wire protocol. In that case the top part of the grammar might change to this:

stream   ::= ( tableconstructor | stat )* EOT
EOT      ::=

A tableconstructor sends an asynchronous message from client to server or vice-versa. A future spec might detail an ELTN-RPC analogous to JSON-RPC.

A stat might configure some session state or otherwise preserve values during the session. We recommend that the exchange of tables represent application-specific persistent changes, and that the effects of var = value statements not persist beyond the session. In distributed computing, one process maintaining state for another causes a lot of headaches, only partially solved by expiration dates, keepalive heartbeats, and session cookies. All that is far beyond the scope of this document; suffice to say one doesn’t want to bring down a server because someone decided to make it hold onto a lot of large tables.
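One way to follow that recommendation is to keep the effects of stats in an object that lives exactly as long as the connection. In this Python sketch, the Session class, its on_message callback, and the max_variables cap are all hypothetical. It assumes a parser already turns each top-level item into either a dict (a message) or a (name, value) pair (a stat); a bare ';' produces no item at all. The sketch also caps the number of variables, not their size in bytes, which is cruder than a real server would want.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict


@dataclass
class Session:
    """Hypothetical per-connection state for streaming ELTN.

    Tables are handed to the application as messages; `var = value`
    statements only update session variables, which disappear with the
    Session object. max_variables caps how much a peer can make us hold.
    """
    on_message: Callable[[Dict[Any, Any]], None]
    max_variables: int = 64
    variables: Dict[str, Any] = field(default_factory=dict)

    def feed(self, item):
        """Accept one parsed top-level item: a table, or a (name, value) pair."""
        if isinstance(item, dict):
            self.on_message(item)
            return
        name, value = item
        if name not in self.variables and len(self.variables) >= self.max_variables:
            raise RuntimeError(f"session variable limit ({self.max_variables}) reached")
        self.variables[name] = value
```

Redefining an existing variable does not count against the cap. Only new names do, so a peer can update its settings but cannot grow the session without bound.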

Constraints

Semantics

The atomic values – nil, Boolean true and false, numbers, and strings – stand for themselves.

The tableconstructor rule has two ways to populate its structure:

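  1. key = value, whether the key is a NAME or a constant in square brackets ([]), associates that key with the given value. A NAME key stands for the string with the same spelling, so { tag = "tags" } is equivalent to { ["tag"] = "tags" }.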

  2. Without a key, the values are stored in sequential indexes: 1, 2, etc. Thus a value like:

    {"foo", "bar", "baz", [5] = "foobar"}
    

    is equivalent to

    { [1]="foo", [2]="bar", [3]="baz", [5]="foobar" }
    

If an ELTN table doesn’t define a value for a key, looking up that key should return nil², or its equivalent in the host language.
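Both population rules, and the nil result for an undefined key, fit in a few lines of Python. This sketch represents a parsed field list with a hypothetical Keyed tuple for key = value fields; positional fields are bare values. Only positional fields advance the counter, which is why [5] = "foobar" leaves index 4 undefined.

```python
from typing import Any, NamedTuple


class Keyed(NamedTuple):
    """A `key = value` field; any other field is positional."""
    key: Any
    value: Any


def build_table(fields):
    """Store keyed fields under their keys and positional ones at 1, 2, 3, ..."""
    table, next_index = {}, 1
    for f in fields:
        if isinstance(f, Keyed):
            table[f.key] = f.value
        else:
            table[next_index] = f
            next_index += 1
    return table
```

For the example above, build_table(["foo", "bar", "baz", Keyed(5, "foobar")]) gives {1: "foo", 2: "bar", 3: "baz", 5: "foobar"}, and dict.get(4) returns None, Python’s stand-in for nil.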

It’s up to the implementer whether the resulting structure is immutable, and therefore easy to share among multiple threads, or mutable, and therefore easy to transform or decorate.

The “statement” “;” does nothing. The Lua grammar allows one to add as many or as few semicolons as one wants, or none at all.

Type Systems

ELTN derives from a dynamically typed language (Lua), just as JSON derives from one (JavaScript). The same solutions for representing JSON in statically typed languages like Java apply to ELTN:
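One approach familiar from JSON libraries for statically typed languages is a generic node type plus checked accessors. A rough Python rendition with type hints follows. Table, get_as, and the Key and Value aliases are hypothetical names. As with any data read from a file, the checks happen at run time; the hints only help external type checkers.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Dict, Type, TypeVar, Union

Key = Union[bool, int, float, str]                  # nil cannot be a key
Value = Union[None, bool, int, float, str, "Table"]
T = TypeVar("T")


@dataclass
class Table:
    """Generic node for any ELTN table, like the tree model of a JSON library."""
    fields: Dict[Key, Value]

    def get(self, key: Key) -> Value:
        return self.fields.get(key)                   # missing keys read as nil (None)

    def get_as(self, key: Key, kind: Type[T]) -> T:
        """Fetch a value and check its type at run time, raising TypeError on a mismatch."""
        value = self.fields.get(key)
        # bool is a subclass of int in Python, so reject it when an int is wanted.
        if isinstance(value, kind) and not (kind is int and isinstance(value, bool)):
            return value
        raise TypeError(f"{key!r}: expected {kind.__name__}, got {type(value).__name__}")
```

The other common approach, binding tables directly to application classes, builds on the same checks.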

Rejected and Postponed Suggestions

Exclusion of nil

Lua does not distinguish between a table that lacks the requested key and one whose value for that key is nil.⁸ JSON, however, does make such a distinction because JavaScript does. JSON parsers written in Lua must therefore provide a unique NULL value for JSON’s null.

If the host language has similar semantics to Lua, then a nil constant may not even be necessary. On the other hand, if its native associative arrays (hash tables, etc.) distinguish between nil/null and an undefined value, as JavaScript does, then the implementation must also do so.

On the third hand⁹, since we intended people, not machines, to write ELTN documents, a nil indicates to the reader that the writer intentionally did not define a value for that key, even if the resulting data structure looks the same either way.

At this time the author is unsure which convention to adhere to. For now we’ll keep nil in ELTN and let language idioms and implementers decide whether nil is a distinct value or just the result of a missing key.
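The choice can live in a single switch in the reader. In this Python sketch, assign and its keep_nil flag are hypothetical. With the flag off, an explicit nil leaves the key undefined, as a Lua table constructor would. With it on, the key is kept with None as its value, so a caller can tell the two cases apart with `in`.

```python
def assign(table, key, value, keep_nil=False):
    """Apply `key = value`, treating nil (None) according to the chosen idiom.

    keep_nil=False follows Lua: assigning nil leaves the key undefined.
    keep_nil=True follows JavaScript/JSON: the key exists and holds None.
    """
    if value is None and not keep_nil:
        table.pop(key, None)
    else:
        table[key] = value
    return table
```

Either way, table.get(key) returns None, so code that doesn’t care about the distinction behaves the same under both idioms.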

Function Call syntax

An earlier version of the grammar included a limited version of Lua’s functioncall production:

stat         ::=  ';' | var '=' value | functioncall

value        ::= constant | tableconstructor | functioncall

functioncall ::= funcname arg 
funcname     ::= NAME
arg          ::= tableconstructor | STRING_LITERAL

The host program would define all “functions” and provide them to the parser. It was intended as an extension mechanism to allow the following:

Difficulties in writing the parser caused us to cut this syntax out. More importantly, though, the functioncall syntax presumes that the parser knows what each function is. Unlike with YAML’s tag syntax, the semantics of the data file could depend entirely on what the function calls mean. Instead, for now, we recommend the following:

  1. Implementers should provide APIs to convert constants or tables into more specific data structures.
  2. Users should develop conventions to indicate forward, backward, and cyclic references if they’re needed.
  3. Users should place a sequence of tables within a larger “list-like” table to group similar things.
  4. Implementers and developers should work on a schema for validating ELTN documents, ideally written in ELTN, just like similar efforts for XML and JSON.
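Recommendation 1 can be as small as a generic converter from tables to application types. This Python sketch uses a dataclass. Book and from_table are hypothetical names, and the converter checks only which keys are present, not the types of their values.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Book:
    """Hypothetical application type for one entry in a member's `books` list."""
    author: str
    title: str
    publisher: str
    year: int


def from_table(cls, table):
    """Convert a parsed ELTN table into a dataclass, rejecting missing or extra keys."""
    expected = {f.name for f in fields(cls)}
    missing = expected - set(table)
    extra = set(table) - expected
    if missing or extra:
        raise ValueError(f"{cls.__name__}: missing {sorted(missing)}, unexpected {sorted(map(repr, extra))}")
    return cls(**table)
```

A list-like table of books, per recommendation 3, then converts with a comprehension such as [from_table(Book, t) for t in books].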

Variable Substitution

Added 2023-02-09, updated 2023-02-17.

A small change to the grammar would enable variable substitution:

value ::= constant | tableconstructor | var

Here var refers to a NAME defined in a stat production. In a document one can resolve forward and backward references with little difficulty. During a stream one can only use previous definitions.

While this would give the stat production new meaning in the streaming protocol, it could also lead to cyclic dependencies, which would make the value graph difficult to re-serialize in a way that stays backward compatible with Lua. One could restrict substitutions by forbidding cyclic graphs, by substituting by value instead of by reference, or by excluding tables entirely, but each option merely complicates parsing.
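To make the trade-off concrete, here is a Python sketch that combines two of those restrictions: substitution by value, with cycles rejected. It assumes the hypothetical grammar change above and represents each var used as a value with a hypothetical Ref tuple. It also treats an undefined name as an error rather than nil. None of this is part of ELTN.

```python
import copy
from typing import NamedTuple


class Ref(NamedTuple):
    """A hypothetical `var` appearing where a value is expected."""
    name: str


def resolve(stats):
    """Substitute every Ref by value; forward references work, cycles raise."""
    resolved, in_progress = {}, set()

    def expand(value):
        if isinstance(value, Ref):
            # Copy so each use is an independent value, not a shared reference.
            return copy.deepcopy(lookup(value.name))
        if isinstance(value, dict):
            return {k: expand(v) for k, v in value.items()}
        return value

    def lookup(name):
        if name in resolved:
            return resolved[name]
        if name in in_progress:
            raise ValueError(f"cyclic reference through {name!r}")
        if name not in stats:
            raise KeyError(f"undefined variable {name!r}")
        in_progress.add(name)
        resolved[name] = expand(stats[name])
        in_progress.discard(name)
        return resolved[name]

    for name in stats:
        lookup(name)
    return resolved
```

Even this small version needs bookkeeping (the in_progress set) that a plain ELTN parser never does, which supports leaving substitution out.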

More seriously it raises the specter of how long a variable declaration lasts. During the lifetime of a session? (Or document.) Tied to a user account, like cookies on the Web? Where would the information reside? This little document shouldn’t introduce new implementation headaches beyond parsing and interpreting a simple alternative to JSON or TOML.

History

The original specification was on the G+ blog, repeated in a post here.

Removed implementation details 2023-04-12.


  1. An idea I had when I had similarly bookish co-workers. Sadly something this formal would be overkill now.

  2. But see this.

  3. That is, the newline after the backslash becomes part of the string.

  4. To quote Ierusalimschy et al. verbatim: “The escape sequence ‘\z’ skips the following span of whitespace characters, including line breaks; it is particularly useful to break and indent a long literal string into multiple lines without adding the newlines and spaces into the string contents. A short literal string cannot contain unescaped line breaks nor escapes not forming a valid escape sequence.”

  5. The sequence may use only one or two decimal digits unless the next character is also a digit; in that case it must use all three, to avoid ambiguity.

  6. That is, a function that encodes a Unicode code point into a UTF-8 sequence of one to four bytes. (Lua 5.4 also accepts values up to 2^31 − 1, which need up to six bytes.)

  7. The enclosing brackets are mandatory. The code can specify any number of hexadecimal digits, from one to four (or more!).

  8. Internally it probably removes the key if the value is nil.

  9. Some of us think we’re Edosians.