ELTN in C (Work in Progress)

Frank Mitchell

Posted: 2023-04-12
Last Modified: 2026-02-09
Word Count: 2406
Tags: c-programming eltn lua programming python ruby


This project will create a C library to parse and emit ELTN text. It will then provide wrappers for various languages including Lua, Python, and Ruby.

Design

The ELTN parser is a pull parser similar to the streaming JSON parser implemented in Java. The Buffer object takes in text, either passed directly or read through a callback, and stores it in an expandable ring buffer. When the Parser is asked for the next event, it requests tokens from the Lexer. The Lexer reads characters from the Buffer and groups them into tokens, which the Parser then interprets as events.

TEXT ==> Buffer <=> Lexer <=> Parser ==> EVENTS
  sequenceDiagram
    actor Reader
    participant Parser
    participant Lexer
    participant Buffer
    participant Text@{"type": "entity"}
    Reader->>+Parser: next
    Parser->>+Lexer: next_token
    Lexer->>+Buffer: next_char
    Buffer->>Text: read
    Text-->>Buffer: "var=1"
    Buffer-->>-Lexer: 'v'
    Lexer->>+Buffer: next_char
    Buffer-->>-Lexer: 'a'
    Lexer->>+Buffer: next_char
    Buffer-->>-Lexer: 'r'
    Lexer->>+Buffer: next_char
    Buffer-->>-Lexer: '='
    Lexer-->>-Parser: (ELTN_TOKEN_NAME, "var")
    Parser->>+Lexer: next_token
    Lexer-->>-Parser: (ELTN_TOKEN_EQUAL, "=")
    Parser-->>-Reader:
    Reader->>+Parser: event
    Parser-->>-Reader: ELTN_DEF_NAME
    Reader->>+Parser: string
    Parser-->>-Reader: "var"
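To make the hand-off in the diagram concrete, here is a deliberately tiny sketch of the Lexer's side: it pulls characters one at a time and groups them into tokens. This is not the library's code. `ToyBuffer` stands in for the Buffer, and the `TOK_*` kinds and `toy_*` functions are hypothetical; real ELTN numbers, strings, and punctuation are much richer than the three token kinds handled here.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token kinds; the real Lexer has its own (ELTN_TOKEN_NAME etc.). */
typedef enum { TOK_NAME, TOK_EQUAL, TOK_NUMBER, TOK_END, TOK_INVALID } ToyTokenType;

/* Stand-in for the Buffer: hands out one character at a time from a string. */
typedef struct {
    const char* text;
    size_t len;
    size_t pos;
} ToyBuffer;

/* Returns the next character, or -1 when the text is exhausted. */
static int toy_next_char(ToyBuffer* b) {
    return b->pos < b->len ? (unsigned char)b->text[b->pos++] : -1;
}

/* Pushes back the last character read (one character of lookahead). */
static void toy_unread(ToyBuffer* b) {
    if (b->pos > 0)
        b->pos--;
}

/* Reads one token; its text goes into buf, which must hold at least 2 bytes
 * and is truncated (but always NUL-terminated) if the token is longer. */
static ToyTokenType toy_next_token(ToyBuffer* b, char* buf, size_t bufsize) {
    size_t n = 0;
    int c;
    ToyTokenType type;

    do {                                /* skip whitespace */
        c = toy_next_char(b);
    } while (c != -1 && isspace(c));

    if (c == -1) {
        buf[0] = '\0';
        return TOK_END;
    }
    if (isalpha(c) || c == '_') {
        type = TOK_NAME;
    } else if (isdigit(c)) {
        type = TOK_NUMBER;              /* toy: unsigned decimal integers only */
    } else {
        buf[0] = (char)c;               /* single-character token */
        buf[1] = '\0';
        return c == '=' ? TOK_EQUAL : TOK_INVALID;
    }

    /* Accumulate characters until one cannot belong to this token. */
    while (c != -1 && (type == TOK_NAME ? (isalnum(c) || c == '_') : isdigit(c))) {
        if (n + 1 < bufsize)
            buf[n++] = (char)c;
        c = toy_next_char(b);
    }
    if (c != -1)
        toy_unread(b);                  /* e.g. the '=' that ended "var" */
    buf[n] = '\0';
    return type;
}
```

Finishing "var" requires reading one character too far. Pushing the '=' back is one plausible way to handle the lookahead visible in the diagram, where the Lexer has already seen the '=' by the time it returns the NAME token.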

Error Handling

Since C has no exception mechanism, the Parser indicates error conditions through an ERROR event and a set of numeric error codes (not yet final) that can be mapped to internationalized error messages. The parser tracks the line and column number of each character it reads, and provides them to the caller if an error occurs.

Once an error occurs, the Parser stops. The caller must take care of deleting the Parser (which removes the Buffer and Lexer) and cleaning up file descriptors and the like (which the caller passes into the Parser / Buffer).
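Tracking positions for error reports takes very little state. A minimal sketch, assuming 1-based lines and columns, '\n' as the line terminator, and columns counted in bytes (the library may count differently); `TextPos` and its functions are hypothetical:

```c
/* Hypothetical position tracker. Assumes 1-based lines and columns, '\n' as
 * the line terminator, and columns counted in bytes, not code points. */
typedef struct {
    unsigned int line;
    unsigned int column;
} TextPos;

static void textpos_init(TextPos* p) {
    p->line = 1;
    p->column = 1;
}

/* Updates the position after consuming character c, so that it always
 * names the next character to be read. */
static void textpos_advance(TextPos* p, int c) {
    if (c == '\n') {
        p->line++;
        p->column = 1;
    } else {
        p->column++;
    }
}
```

A lexer would copy the position when a token begins (`TextPos start = pos;`), so that an error report can point at the start of the offending text rather than wherever reading happened to stop.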

Multi-threading

Right now the implementation is not thread-safe at all. The Parser interface in particular depends on functions being called in a specific order, so only one thread should ever use the Parser. Multiple threads could write to the Buffer if it were equipped with a mutex to prevent simultaneous reads and writes, since a read can provoke a resize.

The Buffer's ring buffer can grow arbitrarily large if a chunk of text read through the callback is too large, or if an external agent feeds in text faster than the Parser consumes it. If this design were multi-threaded, we would need a condition variable to prevent the Buffer from taking in more text than the Parser thread can read.
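A minimal sketch of that condition-variable arrangement, using POSIX threads. It deliberately swaps the expandable ring for a fixed-capacity one, so a writer that gets ahead of the parser blocks instead of growing the buffer. `SharedRing`, `FeedArgs`, and the `ring_*` functions are hypothetical and not part of eltn.h.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_CAP 8  /* deliberately tiny so the writer has to wait */

/* Hypothetical fixed-capacity byte queue shared by a writer thread and the
 * parser thread. */
typedef struct {
    char data[RING_CAP];
    size_t head;   /* index of the next byte to read */
    size_t count;  /* bytes currently stored */
    bool closed;   /* writer has no more text */
    pthread_mutex_t lock;
    pthread_cond_t not_full;
    pthread_cond_t not_empty;
} SharedRing;

static void ring_init(SharedRing* r) {
    r->head = r->count = 0;
    r->closed = false;
    pthread_mutex_init(&r->lock, NULL);
    pthread_cond_init(&r->not_full, NULL);
    pthread_cond_init(&r->not_empty, NULL);
}

static void ring_destroy(SharedRing* r) {
    pthread_mutex_destroy(&r->lock);
    pthread_cond_destroy(&r->not_full);
    pthread_cond_destroy(&r->not_empty);
}

/* Writer side: blocks while the ring is full instead of growing it. */
static void ring_put(SharedRing* r, char c) {
    pthread_mutex_lock(&r->lock);
    while (r->count == RING_CAP)
        pthread_cond_wait(&r->not_full, &r->lock);
    r->data[(r->head + r->count) % RING_CAP] = c;
    r->count++;
    pthread_cond_signal(&r->not_empty);
    pthread_mutex_unlock(&r->lock);
}

/* Writer side: no more text is coming; wakes a parser waiting for text. */
static void ring_close(SharedRing* r) {
    pthread_mutex_lock(&r->lock);
    r->closed = true;
    pthread_cond_broadcast(&r->not_empty);
    pthread_mutex_unlock(&r->lock);
}

/* Parser side: next byte, or -1 once the ring is closed and drained. */
static int ring_get(SharedRing* r) {
    int c = -1;
    pthread_mutex_lock(&r->lock);
    while (r->count == 0 && !r->closed)
        pthread_cond_wait(&r->not_empty, &r->lock);
    if (r->count > 0) {
        c = (unsigned char)r->data[r->head];
        r->head = (r->head + 1) % RING_CAP;
        r->count--;
        pthread_cond_signal(&r->not_full);
    }
    pthread_mutex_unlock(&r->lock);
    return c;
}

/* Example producer thread: feeds a NUL-terminated string, then closes. */
typedef struct { SharedRing* ring; const char* text; } FeedArgs;

static void* ring_feed(void* arg) {
    FeedArgs* a = arg;
    for (const char* p = a->text; *p; p++)
        ring_put(a->ring, *p);
    ring_close(a->ring);
    return NULL;
}
```

Using two condition variables lets each side wait for the specific change it needs (free space for the writer, new text for the parser) instead of waking on every update.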

Interface

This section describes the C interface as it currently exists. The header file “eltn.h” provides prototypes and documentation for all public functions, types, and #defines, including the ELTN_API marker for all public functions. Documentation has been omitted for brevity.

Definitions

typedef struct ELTN_Parser ELTN_Parser;

typedef struct ELTN_Buffer ELTN_Buffer;

typedef struct ELTN_Emitter ELTN_Emitter;

typedef struct ELTN_Pool ELTN_Pool;

typedef void* (*ELTN_Alloc)(void* ptr, size_t size);

typedef int (*ELTN_Reader)(void* state, char** strptr, size_t *lenptr);

typedef ssize_t (*ELTN_Writer)(void* ud, const char* str, size_t len, int *errptr);

These declarations provide forward references for the parser, buffer (source), emitter, and memory pool types, plus three callback types: an allocator, for when malloc, realloc, and free are not suitable; a reader, which pulls text from a file descriptor or other resource; and a writer, which receives the emitter's output.
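The writer's contract is not spelled out here. A sketch of a writer that appends to an in-memory string, assuming the callback returns the number of bytes it consumed, or -1 with *errptr set on failure; `StringSink` and `string_sink_write` are hypothetical.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>  /* ssize_t */

/* Same shape as the ELTN_Writer typedef above. */
typedef ssize_t (*ELTN_Writer)(void* ud, const char* str, size_t len, int* errptr);

/* Hypothetical user data: a growable in-memory string. */
typedef struct {
    char* data;
    size_t len;
    size_t cap;
} StringSink;

/* Assumed contract: return bytes consumed, or -1 with *errptr set on failure. */
static ssize_t string_sink_write(void* ud, const char* str, size_t len, int* errptr) {
    StringSink* s = ud;

    if (s->len + len + 1 > s->cap) {          /* +1 keeps room for a NUL */
        size_t newcap = s->cap ? s->cap : 64;
        while (newcap < s->len + len + 1)
            newcap *= 2;
        char* nd = realloc(s->data, newcap);
        if (nd == NULL) {
            *errptr = ENOMEM;
            return -1;
        }
        s->data = nd;
        s->cap = newcap;
    }
    memcpy(s->data + s->len, str, len);
    s->len += len;
    s->data[s->len] = '\0';                   /* usable as a C string */
    *errptr = 0;
    return (ssize_t)len;
}
```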

Allocator Protocol

The allocator function mimics malloc, realloc, and free based on its arguments:

ptr        size       behaves like
NULL       any        malloc
non-NULL   non-zero   realloc
non-NULL   0          free
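Because realloc(ptr, 0) is implementation-defined in C, an allocator that simply forwarded every call to realloc would not reliably follow the table above. A sketch of a libc-backed allocator that handles each row explicitly (`libc_alloc` is a hypothetical name):

```c
#include <stdlib.h>

/* Same shape as the ELTN_Alloc typedef above. */
typedef void* (*ELTN_Alloc)(void* ptr, size_t size);

static void* libc_alloc(void* ptr, size_t size) {
    if (ptr == NULL)          /* NULL, any size: allocate */
        return malloc(size);
    if (size == 0) {          /* non-NULL, 0: free */
        free(ptr);
        return NULL;
    }
    return realloc(ptr, size); /* non-NULL, non-zero: resize */
}
```

As with realloc, a NULL result for a nonzero size means the allocation failed and the original block, if any, is still valid.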

Reader Protocol

The reader function takes an arbitrary state pointer for an external resource (which may wrap a file descriptor), a pointer to a size variable, and a pointer to an error number variable.

Each time it is invoked, the reader returns the next block of text from the resource and stores the number of bytes in that block in *sizeptr. If it encounters an error, it returns NULL and sets *errptr to a nonzero number.

To signal the normal end of the text, the reader returns NULL and sets both *sizeptr and *errptr to 0.
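A sketch of a reader that follows the protocol as described in this section, serving an in-memory string in fixed-size blocks. Note that it uses a hypothetical `ExampleReader` type shaped after the prose (text block returned, size and error through pointers), which differs from the ELTN_Reader typedef shown earlier; `ChunkState` and `chunk_reader` are also hypothetical.

```c
#include <stddef.h>

/* Hypothetical reader type matching the prose description. */
typedef const char* (*ExampleReader)(void* state, size_t* sizeptr, int* errptr);

/* State for a reader that serves an in-memory string in fixed-size blocks. */
typedef struct {
    const char* text;
    size_t len;
    size_t pos;
    size_t block;  /* maximum bytes returned per call */
} ChunkState;

static const char* chunk_reader(void* state, size_t* sizeptr, int* errptr) {
    ChunkState* s = state;

    if (s->block == 0) {      /* misconfigured: report an error */
        *sizeptr = 0;
        *errptr = 1;          /* any nonzero value signals an error */
        return NULL;
    }
    *errptr = 0;
    if (s->pos >= s->len) {   /* normal end of text: NULL, size 0, error 0 */
        *sizeptr = 0;
        return NULL;
    }

    size_t n = s->len - s->pos;
    if (n > s->block)
        n = s->block;
    const char* chunk = s->text + s->pos;
    s->pos += n;
    *sizeptr = n;
    return chunk;
}
```

A caller loops until the reader returns NULL, then checks the error value to tell a failure from the normal end of the text.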

Events

typedef enum ELTN_Event {
    ELTN_ERROR = -1,
    ELTN_STREAM_START = 0,
    ELTN_COMMENT,
    ELTN_DEF_NAME,
    ELTN_KEY_STRING,
    ELTN_KEY_NUMBER,
    ELTN_KEY_INTEGER,
    ELTN_VALUE_STRING,
    ELTN_VALUE_NUMBER,
    ELTN_VALUE_INTEGER,
    ELTN_VALUE_TRUE,
    ELTN_VALUE_FALSE,
    ELTN_VALUE_NIL,
    ELTN_TABLE_START,
    ELTN_TABLE_END,
    ELTN_STREAM_END
} ELTN_Event;

ELTN_API const char* ELTN_Event_name(ELTN_Event e);

ELTN_API void ELTN_Event_string(ELTN_Event e, char** strptr, size_t* sizeptr);

Most of the event names are pretty straightforward. ELTN_DEF_NAME indicates a key outside a table, while ELTN_KEY_STRING indicates a string key inside a table. See the specification for more details about the difference between “definitions” and table keys.
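ELTN_ERROR's value of -1 makes a plain lookup table slightly awkward. A sketch of one way to implement a name lookup, assuming a name is the enumerator without its ELTN_ prefix (the library may return something else); `example_event_name` is hypothetical.

```c
#include <stddef.h>

/* Copied from the Events listing above. */
typedef enum ELTN_Event {
    ELTN_ERROR = -1,
    ELTN_STREAM_START = 0,
    ELTN_COMMENT,
    ELTN_DEF_NAME,
    ELTN_KEY_STRING,
    ELTN_KEY_NUMBER,
    ELTN_KEY_INTEGER,
    ELTN_VALUE_STRING,
    ELTN_VALUE_NUMBER,
    ELTN_VALUE_INTEGER,
    ELTN_VALUE_TRUE,
    ELTN_VALUE_FALSE,
    ELTN_VALUE_NIL,
    ELTN_TABLE_START,
    ELTN_TABLE_END,
    ELTN_STREAM_END
} ELTN_Event;

/* ELTN_ERROR is -1, so the table is indexed by e + 1. */
static const char* const event_names[] = {
    [ELTN_ERROR + 1]         = "ERROR",
    [ELTN_STREAM_START + 1]  = "STREAM_START",
    [ELTN_COMMENT + 1]       = "COMMENT",
    [ELTN_DEF_NAME + 1]      = "DEF_NAME",
    [ELTN_KEY_STRING + 1]    = "KEY_STRING",
    [ELTN_KEY_NUMBER + 1]    = "KEY_NUMBER",
    [ELTN_KEY_INTEGER + 1]   = "KEY_INTEGER",
    [ELTN_VALUE_STRING + 1]  = "VALUE_STRING",
    [ELTN_VALUE_NUMBER + 1]  = "VALUE_NUMBER",
    [ELTN_VALUE_INTEGER + 1] = "VALUE_INTEGER",
    [ELTN_VALUE_TRUE + 1]    = "VALUE_TRUE",
    [ELTN_VALUE_FALSE + 1]   = "VALUE_FALSE",
    [ELTN_VALUE_NIL + 1]     = "VALUE_NIL",
    [ELTN_TABLE_START + 1]   = "TABLE_START",
    [ELTN_TABLE_END + 1]     = "TABLE_END",
    [ELTN_STREAM_END + 1]    = "STREAM_END",
};

static const char* example_event_name(ELTN_Event e) {
    int i = (int)e + 1;
    if (i < 0 || (size_t)i >= sizeof event_names / sizeof event_names[0])
        return "UNKNOWN";
    return event_names[i];
}
```

Keying each entry by its enumerator with designated initializers means that an event added without a table entry yields NULL or "UNKNOWN" instead of silently shifting every later name.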

Errors

typedef enum ELTN_Error {
    ELTN_ERR_UNKNOWN = -1,
    ELTN_OK = 0,
    ELTN_ERR_OUT_OF_MEMORY,
    ELTN_ERR_STREAM_END,
    ELTN_ERR_UNEXPECTED_TOKEN,
    ELTN_ERR_INVALID_TOKEN,
    ELTN_ERR_DUPLICATE_KEY
} ELTN_Error;

ELTN_API const char* ELTN_Error_name(ELTN_Error e);

ELTN_API void ELTN_Error_string(ELTN_Error e, char** strptr, size_t* sizeptr);

The list of potential errors is not yet fixed, but as of this writing these are the types of errors the parser can distinguish:

ELTN_ERR_DUPLICATE_KEY
The document specified the same key twice, explicitly or implicitly, within the same table.
ELTN_ERR_INVALID_TOKEN
The parser encountered a character or word not allowed outside a string or comment.
ELTN_ERR_OUT_OF_MEMORY
An attempt to allocate more memory failed.
ELTN_ERR_STREAM_END
The document or character stream ended unexpectedly, without completing open tables or table entries.
ELTN_ERR_UNEXPECTED_TOKEN
The parser encountered a valid character or word in an unexpected place.
ELTN_ERR_UNKNOWN
An unclassified error.
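A sketch of how ELTN_Error_string might produce its text, assuming it follows the same convention as the Parser's string functions (a newly allocated string that the caller frees) and using English messages paraphrased from the list above. `example_error_string` is hypothetical; an internationalized version would look the message up by code in a per-locale table instead.

```c
#include <stdlib.h>
#include <string.h>

/* Copied from the Errors listing above. */
typedef enum ELTN_Error {
    ELTN_ERR_UNKNOWN = -1,
    ELTN_OK = 0,
    ELTN_ERR_OUT_OF_MEMORY,
    ELTN_ERR_STREAM_END,
    ELTN_ERR_UNEXPECTED_TOKEN,
    ELTN_ERR_INVALID_TOKEN,
    ELTN_ERR_DUPLICATE_KEY
} ELTN_Error;

static void example_error_string(ELTN_Error e, char** strptr, size_t* sizeptr) {
    const char* msg;

    switch (e) {
    case ELTN_OK:
        msg = "No error.";
        break;
    case ELTN_ERR_OUT_OF_MEMORY:
        msg = "An attempt to allocate more memory failed.";
        break;
    case ELTN_ERR_STREAM_END:
        msg = "The document ended before all tables and entries were complete.";
        break;
    case ELTN_ERR_UNEXPECTED_TOKEN:
        msg = "A valid character or word appeared in an unexpected place.";
        break;
    case ELTN_ERR_INVALID_TOKEN:
        msg = "A character or word appeared that is not allowed outside a string or comment.";
        break;
    case ELTN_ERR_DUPLICATE_KEY:
        msg = "The same key appeared twice within one table.";
        break;
    default:                      /* ELTN_ERR_UNKNOWN and anything else */
        msg = "Unclassified error.";
        break;
    }

    size_t len = strlen(msg);
    char* copy = malloc(len + 1);  /* caller frees, per the assumed convention */
    if (copy == NULL) {
        *strptr = NULL;
        *sizeptr = 0;
        return;
    }
    memcpy(copy, msg, len + 1);
    *strptr = copy;
    *sizeptr = len;
}
```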

Parser “Class”

ELTN_API ELTN_Parser* ELTN_Parser_new();

ELTN_API ELTN_Parser* ELTN_Parser_new_with_pool(ELTN_Pool * pool);

ELTN_API bool ELTN_Parser_include_comments(ELTN_Parser * p);

ELTN_API void ELTN_Parser_set_include_comments(ELTN_Parser * p, bool b);

ELTN_API ELTN_Buffer* ELTN_Parser_buffer(ELTN_Parser * p);

ELTN_API ssize_t ELTN_Parser_read(ELTN_Parser * p, ELTN_Reader reader,
                                  void* state);

ELTN_API ssize_t ELTN_Parser_read_string(ELTN_Parser * p, const char* str,
                                         size_t len);
 
ELTN_API ssize_t ELTN_Parser_read_file(ELTN_Parser * p, FILE * fp);
 
ELTN_API bool ELTN_Parser_has_next(ELTN_Parser * p);

ELTN_API void ELTN_Parser_next(ELTN_Parser * p);

ELTN_API ELTN_Event ELTN_Parser_event(ELTN_Parser * p);

ELTN_API unsigned int ELTN_Parser_depth(ELTN_Parser *p);

ELTN_API void ELTN_Parser_current_key(ELTN_Parser *p, ELTN_Event* typeptr,
                                      char** strptr, size_t* lenptr);
 
ELTN_API void ELTN_Parser_text(ELTN_Parser * p, char** strptr, size_t* lenptr);

ELTN_API void ELTN_Parser_string(ELTN_Parser * p, char** strptr,
                                 size_t* lenptr);

ELTN_API double ELTN_Parser_number(ELTN_Parser * p);

ELTN_API long int ELTN_Parser_integer(ELTN_Parser * p);

ELTN_API bool ELTN_Parser_boolean(ELTN_Parser * p);

ELTN_API ELTN_Error ELTN_Parser_error_code(ELTN_Parser * p);

ELTN_API unsigned int ELTN_Parser_error_line(ELTN_Parser * p);

ELTN_API unsigned int ELTN_Parser_error_column(ELTN_Parser * p);

ELTN_API void ELTN_Parser_free(ELTN_Parser * p);

The parser’s basic protocol is detailed below.

All functions that return strings take a pointer to a char* and a pointer to a size_t. On return, the char* points to a newly allocated string and the size_t holds its length. The caller should free the string with free() when done with it.

The functions ELTN_Parser_number() and ELTN_Parser_integer() convert the text read to a double or a long int, respectively. ELTN_Parser_boolean() interprets the value as a boolean, i.e. any value but “false” or “nil” is true.
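A sketch of the three conversions, under stated assumptions: the helpers receive the raw text of the current value, integers are decimal or 0x-prefixed hexadecimal (as in Lua), and booleans follow the rule above. The helper names are hypothetical, and the library's handling of overflow, exponents, and special values may differ.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Copies text into a NUL-terminated scratch buffer; fails if too long. */
static bool to_cstr(const char* text, size_t len, char* buf, size_t bufsize) {
    if (len == 0 || len >= bufsize)
        return false;
    memcpy(buf, text, len);
    buf[len] = '\0';
    return true;
}

static bool text_to_double(const char* text, size_t len, double* out) {
    char buf[64];
    char* end;
    if (!to_cstr(text, len, buf, sizeof buf))
        return false;
    errno = 0;
    *out = strtod(buf, &end);
    return errno == 0 && end != buf && *end == '\0';  /* whole text consumed */
}

static bool text_to_long(const char* text, size_t len, long* out) {
    char buf[64];
    char* end;
    if (!to_cstr(text, len, buf, sizeof buf))
        return false;

    /* Pick base 16 only for an explicit 0x/0X prefix (after any sign). */
    const char* digits = buf;
    if (*digits == '-' || *digits == '+')
        digits++;
    int base = (digits[0] == '0' && (digits[1] == 'x' || digits[1] == 'X')) ? 16 : 10;

    errno = 0;
    *out = strtol(buf, &end, base);
    return errno == 0 && end != buf && *end == '\0';
}

/* Any value but "false" or "nil" counts as true. */
static bool text_truthy(const char* text, size_t len) {
    return !((len == 5 && memcmp(text, "false", 5) == 0) ||
             (len == 3 && memcmp(text, "nil", 3) == 0));
}
```

Passing base 0 to strtol would be shorter, but it would read "010" as octal 8, whereas Lua reads it as decimal 10. strtod also accepts "inf" and "nan", which a stricter implementation would reject.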

In case of an ERROR event, the caller can get an error code and the position at the beginning of the text that caused the error.

Buffer “Class”

ELTN_API ELTN_Buffer* ELTN_Parser_source(ELTN_Parser* s);

ELTN_API size_t ELTN_Buffer_capacity(ELTN_Buffer* s);

ELTN_API bool ELTN_Buffer_set_capacity(ELTN_Buffer* s, size_t newcap);

ELTN_API bool ELTN_Buffer_is_empty(ELTN_Buffer* s);

ELTN_API bool ELTN_Buffer_is_closed(ELTN_Buffer* s);

ELTN_API ssize_t ELTN_Buffer_write(ELTN_Buffer* s, const char *text, size_t len);

ELTN_API void ELTN_Buffer_close(ELTN_Buffer* s);

A Buffer is responsible for reading in text, either by pulling it on demand through a reader function or by accepting it through the “write” function. The write function returns the number of bytes written, which will be negative if the Buffer has been “closed” or if some error condition happens.

The “close” function signals that no more text is forthcoming. Once the source is closed, it, and by extension the parser, cannot be re-opened for writing. The caller may still read events from it as long as text to process remains. If the text ends prematurely and the source is closed, the parser’s last event will be an ERROR event. If the text ends prematurely but the source is not closed, the parser’s last event will be INCOMPLETE; if more text is written to the Buffer, parsing will resume.

A Buffer is “empty” if it contains no unprocessed text.
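A sketch of an expandable ring buffer with the write, close, and emptiness semantics described above. `GrowRing` and the `growring_*` functions are hypothetical; the doubling growth policy and the use of -1 as the failure value are assumptions, not necessarily what ELTN_Buffer does.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <sys/types.h>  /* ssize_t */

typedef struct {
    char* data;
    size_t cap;    /* allocated bytes */
    size_t head;   /* index of the next unread byte */
    size_t count;  /* unread bytes */
    bool closed;
} GrowRing;

/* Doubles capacity until `needed` bytes fit, unwrapping the unread text so
 * it starts at index 0. Returns false if allocation fails. */
static bool growring_reserve(GrowRing* r, size_t needed) {
    if (needed <= r->cap)
        return true;
    size_t newcap = r->cap ? r->cap : 16;
    while (newcap < needed)
        newcap *= 2;
    char* nd = malloc(newcap);
    if (nd == NULL)
        return false;
    for (size_t i = 0; i < r->count; i++)   /* count is 0 whenever cap is 0 */
        nd[i] = r->data[(r->head + i) % r->cap];
    free(r->data);
    r->data = nd;
    r->cap = newcap;
    r->head = 0;
    return true;
}

/* Returns the bytes accepted, or -1 if closed or out of memory. */
static ssize_t growring_write(GrowRing* r, const char* text, size_t len) {
    if (r->closed || !growring_reserve(r, r->count + len))
        return -1;
    for (size_t i = 0; i < len; i++)
        r->data[(r->head + r->count + i) % r->cap] = text[i];
    r->count += len;
    return (ssize_t)len;
}

/* Returns the next unread byte, or -1 if none is available. */
static int growring_next_char(GrowRing* r) {
    if (r->count == 0)
        return -1;
    int c = (unsigned char)r->data[r->head];
    r->head = (r->head + 1) % r->cap;
    r->count--;
    return c;
}

static bool growring_is_empty(const GrowRing* r) { return r->count == 0; }

static void growring_close(GrowRing* r) { r->closed = true; }

static void growring_free(GrowRing* r) {
    free(r->data);
    r->data = NULL;
    r->cap = r->head = r->count = 0;
}
```

Unwrapping on growth keeps the index arithmetic simple: after a resize the unread text is contiguous again, and new writes continue to wrap around the larger array.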

Emitter “Class”

As of this writing (2025-05-14) the Emitter class has yet to be implemented. This is the current design.

ELTN_API ELTN_Emitter* ELTN_Emitter_new();

ELTN_API ELTN_Emitter* ELTN_Emitter_new_with_pool(ELTN_Pool* pool);

ELTN_API bool ELTN_Emitter_def_name(ELTN_Emitter* e, const char* n);

ELTN_API bool ELTN_Emitter_key_string(ELTN_Emitter* e, const char* s,
                                      size_t len);

ELTN_API bool ELTN_Emitter_key_number(ELTN_Emitter* e, double n, int sigfigs);

ELTN_API bool ELTN_Emitter_key_integer(ELTN_Emitter* e, int i);

ELTN_API bool ELTN_Emitter_value_string(ELTN_Emitter* e, const char* s,
                                        size_t len);

ELTN_API bool ELTN_Emitter_value_number(ELTN_Emitter* e, double n, int sigfigs);

ELTN_API bool ELTN_Emitter_value_integer(ELTN_Emitter* e, int i);

ELTN_API bool ELTN_Emitter_value_boolean(ELTN_Emitter* e, bool b);

ELTN_API bool ELTN_Emitter_value_nil(ELTN_Emitter* e);

ELTN_API bool ELTN_Emitter_table_start(ELTN_Emitter* e);

ELTN_API bool ELTN_Emitter_table_end(ELTN_Emitter* e);

ELTN_API int ELTN_Emitter_error_code(ELTN_Emitter* e);

ELTN_API void ELTN_Emitter_error_path(ELTN_Emitter* e, char** pathbuf,
                                      size_t* len);

ELTN_API bool ELTN_Emitter_pretty_print(ELTN_Emitter* e);

ELTN_API void ELTN_Emitter_set_pretty_print(ELTN_Emitter* e, bool pretty);

ELTN_API unsigned int ELTN_Emitter_indent(ELTN_Emitter* e);

ELTN_API void ELTN_Emitter_set_indent(ELTN_Emitter* e, unsigned int indent);

ELTN_API ssize_t ELTN_Emitter_length(ELTN_Emitter* e);

ELTN_API ssize_t ELTN_Emitter_write(ELTN_Emitter* e, ELTN_Writer writer,
                                    void* state);

ELTN_API ssize_t ELTN_Emitter_write_file(ELTN_Emitter* e, FILE* fp);

ELTN_API void ELTN_Emitter_free(ELTN_Emitter* e);

The Emitter is the reverse of the Parser: its functions take events as arguments and assemble an internal representation of the ELTN document. If the sequence of events forms a valid document, the caller can then obtain it as a C string or write it to a file.
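Rejecting syntactically invalid event sequences comes down to a small state machine. A sketch, assuming that inside a table a value may appear without a key (as in Lua table constructors such as `{1, 2}`) and ignoring duplicate-key detection; `EmitCheck` and the `check_*` functions are hypothetical.

```c
#include <stdbool.h>

typedef struct {
    unsigned int depth;   /* open tables */
    bool awaiting_value;  /* a name or key has been given but no value yet */
} EmitCheck;

/* Definitions are only allowed at the top level. */
static bool check_def_name(EmitCheck* s) {
    if (s->depth != 0 || s->awaiting_value)
        return false;
    s->awaiting_value = true;
    return true;
}

/* Keys are only allowed inside a table. */
static bool check_key(EmitCheck* s) {
    if (s->depth == 0 || s->awaiting_value)
        return false;
    s->awaiting_value = true;
    return true;
}

/* A top-level value needs a preceding name; inside a table it may be keyless. */
static bool check_value(EmitCheck* s) {
    if (s->depth == 0 && !s->awaiting_value)
        return false;
    s->awaiting_value = false;
    return true;
}

static bool check_table_start(EmitCheck* s) {
    if (!check_value(s))  /* a table is itself a value */
        return false;
    s->depth++;
    return true;
}

static bool check_table_end(EmitCheck* s) {
    if (s->depth == 0 || s->awaiting_value)
        return false;
    s->depth--;
    return true;
}

/* Complete when no table is open and no name lacks a value. */
static bool check_complete(const EmitCheck* s) {
    return s->depth == 0 && !s->awaiting_value;
}
```

Each check leaves the state unchanged when it fails, so a caller that gets false can report the error without the checker drifting out of sync with what was actually emitted.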

Operation

Parsing

The core parser essentially ports ELTNPP to C:

#include <stdio.h>
#include <stdlib.h>

#include "eltn.h"

FILE* fp = fopen("config.eltn", "r");

/* omitting check of `fp` */

ELTN_Parser* epp = ELTN_Parser_new();

ELTN_Parser_read_file(epp, fp);

while (ELTN_Parser_has_next(epp)) {
    char* str;
    size_t len;
    unsigned int depth;
    unsigned int olddepth = ELTN_Parser_depth(epp);

    ELTN_Parser_next(epp);

    depth = ELTN_Parser_depth(epp);

    switch (ELTN_Parser_event(epp)) {
    case ELTN_ERROR:
        printf("ERROR (%u:%u) %s\n",
               ELTN_Parser_error_line(epp),
               ELTN_Parser_error_column(epp),
               ELTN_Error_name(ELTN_Parser_error_code(epp)));
        break;
    case ELTN_STREAM_START:
        printf("STREAM_START (%u)\n", depth);
        break;
    case ELTN_COMMENT:
        ELTN_Parser_string(epp, &str, &len);
        /* %.*s prints exactly len bytes; %*s would treat len as a field width */
        printf("COMMENT --<<%.*s>>\n", (int)len, str);
        free(str);
        break;
    case ELTN_DEF_NAME:
        ELTN_Parser_string(epp, &str, &len);
        printf("DEF_NAME %.*s\n", (int)len, str);
        free(str);
        break;
    case ELTN_KEY_STRING:
        ELTN_Parser_string(epp, &str, &len);
        printf("KEY_STRING [<<%.*s>>] =\n", (int)len, str);
        free(str);
        break;
    case ELTN_KEY_NUMBER:
        printf("KEY_NUMBER [%lf] =\n", ELTN_Parser_number(epp));
        break;
    case ELTN_KEY_INTEGER:
        printf("KEY_INTEGER [%ld] =\n", ELTN_Parser_integer(epp));
        break;
    case ELTN_VALUE_STRING:
        ELTN_Parser_string(epp, &str, &len);
        printf("VALUE_STRING <<%.*s>>\n", (int)len, str);
        free(str);
        break;
    case ELTN_VALUE_NUMBER:
        printf("VALUE_NUMBER %lf\n", ELTN_Parser_number(epp));
        break;
    case ELTN_VALUE_INTEGER:
        printf("VALUE_INTEGER %ld\n", ELTN_Parser_integer(epp));
        break;
    case ELTN_VALUE_TRUE:
        printf("VALUE_TRUE\n");
        break;
    case ELTN_VALUE_FALSE:
        printf("VALUE_FALSE\n");
        break;
    case ELTN_VALUE_NIL:
        printf("VALUE_NIL\n");
        break;
    case ELTN_TABLE_START:
        printf("TABLE_START (%u -> %u)\n", olddepth, depth);
        break;
    case ELTN_TABLE_END:
        printf("TABLE_END (%u -> %u)\n", olddepth, depth);
        break;
    case ELTN_STREAM_END:
        printf("STREAM_END (%u)\n", depth);
        break;
    }
}

ELTN_Parser_free(epp);

fclose(fp);

ELTN_Parser_event() will signal read errors as well as parse errors, but the API is otherwise remarkably similar to ELTNPP's.

Emitting

The Emitter in C is somewhat verbose. On the other hand, it will check the syntax of what it emits. Each function returns a boolean that tells the caller whether their event was syntactically correct.

#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#include "eltn.h"

ELTN_Emitter* eep = ELTN_Emitter_new();

const char* url = "git+https://github.com/frank-mitchell-com/eltnc.git";

bool ok = true;
ok &= ELTN_Emitter_def_name(eep, "package");
ok &= ELTN_Emitter_value_string(eep, "eltnc", 5);
ok &= ELTN_Emitter_def_name(eep, "version");
ok &= ELTN_Emitter_value_string(eep, "1.0.0", 5);
ok &= ELTN_Emitter_def_name(eep, "source");
ok &= ELTN_Emitter_table_start(eep);
ok &= ELTN_Emitter_key_string(eep, "url", 3);
ok &= ELTN_Emitter_value_string(eep, url, strlen(url));
ok &= ELTN_Emitter_key_string(eep, "module", 6);
ok &= ELTN_Emitter_value_string(eep, "eltnc", 5);
ok &= ELTN_Emitter_key_string(eep, "tag", 3);
ok &= ELTN_Emitter_value_string(eep, "v1.0.0", 6);
ok &= ELTN_Emitter_table_end(eep);
/* ... and so on ... */

if (!ok) {
    log_error_code(ELTN_Emitter_error_code(eep));
    ELTN_Emitter_free(eep);
    return -1;
}

ELTN_Emitter_set_pretty_print(eep, true);
ELTN_Emitter_set_indent(eep, 3);

ssize_t expect = ELTN_Emitter_length(eep);

if (expect < 0) {
    log_error_code(ELTN_Emitter_error_code(eep));
    ELTN_Emitter_free(eep);
    return -1;
}

FILE* fp = fopen("eltnc-1.0.0-1.rockspec", "w");

/* omitting check of `fp` */

ssize_t result = ELTN_Emitter_write_file(eep, fp);

/*
 * Should be something like:
 *
 * package = "eltnc"
 * version = "1.0.0"
 * source = {
 *    url = "git+https://github.com/frank-mitchell-com/eltnc.git",
 *    module = "eltnc",
 *    tag = "v1.0.0"
 * }
 * -- ... and so on ...
 */

ELTN_Emitter_free(eep);

int fresult = fclose(fp);

ASSERT(fresult == 0);
ASSERT(result == expect);
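The expected output above writes keys such as url as bare names and values as double-quoted strings. A sketch of the two decisions behind that, assuming ELTN shares Lua 5.4's rules for names and short string literals: whether a key can be written as a bare name, and how to quote a string. `is_lua_name` and `quote_string` are hypothetical helpers, not library functions.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* True if s can be written as a bare key: [A-Za-z_][A-Za-z0-9_]* and not a
 * Lua 5.4 reserved word. Assumes the "C" locale for the ctype calls. */
static bool is_lua_name(const char* s, size_t len) {
    static const char* const reserved[] = {
        "and", "break", "do", "else", "elseif", "end", "false", "for",
        "function", "goto", "if", "in", "local", "nil", "not", "or",
        "repeat", "return", "then", "true", "until", "while", NULL
    };
    if (len == 0 || !(isalpha((unsigned char)s[0]) || s[0] == '_'))
        return false;
    for (size_t i = 1; i < len; i++)
        if (!(isalnum((unsigned char)s[i]) || s[i] == '_'))
            return false;
    for (size_t i = 0; reserved[i] != NULL; i++)
        if (strlen(reserved[i]) == len && memcmp(reserved[i], s, len) == 0)
            return false;
    return true;
}

/* Returns a newly allocated double-quoted literal (caller frees), or NULL. */
static char* quote_string(const char* s, size_t len) {
    char* out = malloc(4 * len + 3);  /* worst case: every byte becomes \ddd */
    if (out == NULL)
        return NULL;
    char* p = out;
    *p++ = '"';
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)s[i];
        switch (c) {
        case '"':  *p++ = '\\'; *p++ = '"';  break;
        case '\\': *p++ = '\\'; *p++ = '\\'; break;
        case '\n': *p++ = '\\'; *p++ = 'n';  break;
        case '\r': *p++ = '\\'; *p++ = 'r';  break;
        case '\t': *p++ = '\\'; *p++ = 't';  break;
        default:
            if (c < 0x20 || c == 0x7f)
                p += sprintf(p, "\\%03u", (unsigned)c);  /* always 3 digits */
            else
                *p++ = (char)c;  /* includes UTF-8 bytes >= 0x80 */
        }
    }
    *p++ = '"';
    *p = '\0';
    return out;
}
```

The three-digit form of the decimal escape matters: writing byte 0 followed by '1' as "\01" would be read back as the single byte 1.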

Pools

Pools grew out of the necessity of copying the ELTN_Alloc function and its state everywhere. Unlike the rest of the API, they use a reference counting scheme.

ELTN_API void ELTN_Pool_new_with_alloc(ELTN_Pool ** hptr, ELTN_Alloc alloc,
                                       void* state);

ELTN_API void ELTN_Pool_acquire(ELTN_Pool ** hptr);

ELTN_API void ELTN_Pool_release(ELTN_Pool ** hptr);

ELTN_API void ELTN_Pool_set(ELTN_Pool ** dest, ELTN_Pool ** src);

Internal routines call the equivalents of malloc, realloc, and free on the Pool. If the pool is NULL, these routines default to the respective libc functions.
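A sketch of the reference-counting scheme, with hypothetical `pool_*` names. It assumes that release clears the caller's handle and that set makes the destination share the source's pool; ELTN_Pool_new_with_alloc also takes a state pointer, which this sketch omits for brevity. `pool_realloc` shows the fallback to libc when no pool (or no allocator) is given.

```c
#include <stdlib.h>

/* Same shape as ELTN_Alloc. */
typedef void* (*ExampleAlloc)(void* ptr, size_t size);

typedef struct ExamplePool {
    unsigned int refs;
    ExampleAlloc alloc;  /* NULL means "use libc" */
} ExamplePool;

static void pool_new_with_alloc(ExamplePool** hptr, ExampleAlloc alloc) {
    ExamplePool* p = malloc(sizeof *p);
    if (p != NULL) {
        p->refs = 1;
        p->alloc = alloc;
    }
    *hptr = p;
}

static void pool_acquire(ExamplePool** hptr) {
    if (*hptr != NULL)
        (*hptr)->refs++;
}

/* Drops one reference, frees the pool at zero, and clears the handle. */
static void pool_release(ExamplePool** hptr) {
    ExamplePool* p = *hptr;
    if (p != NULL && --p->refs == 0)
        free(p);
    *hptr = NULL;
}

/* Acquire before release so that setting a handle to itself is safe. */
static void pool_set(ExamplePool** dest, ExamplePool** src) {
    ExamplePool* s = *src;
    pool_acquire(src);
    pool_release(dest);
    *dest = s;
}

/* One entry point following the allocator protocol; defaults to libc. */
static void* pool_realloc(ExamplePool* p, void* ptr, size_t size) {
    if (p == NULL || p->alloc == NULL) {
        if (size == 0) {
            free(ptr);
            return NULL;
        }
        return realloc(ptr, size);  /* realloc(NULL, n) acts like malloc(n) */
    }
    return p->alloc(ptr, size);
}
```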

Installation

Under POSIX-Compliant Platforms

The GNU makefile's install target installs the libraries as /usr/local/lib/libeltnc-VERSION.a and /usr/local/lib/libeltnc.so.VERSION, and the header files in /usr/local/include/eltnc-VERSION. To change the root directory, override the INSTALL_DIR variable; to change the header and library installation directories individually, override INSTALL_HDR and INSTALL_LIB, respectively.

Under MinGW

The dist target for the makefile compiles a Windows DLL and a header file, then bundles them in eltnc-VERSION.zip in the root directory.

Under Other Platforms

The project consists of four source files and four header files. Compiling for one’s specific platform should not be too difficult.

Wrappers

Wrappers will use native I/O to construct a tree of native associative arrays, strings, numbers, booleans, and nil/null values.

Lua

The Lua wrapper will support the event-driven interface above, plus some Lua-specific idioms.

local eltnc = require "eltnc"

local parser <close> = eltnc.file_parser("config.eltn")
for event, value in parser:iter() do
    if event == eltnc.DEF_NAME then
        -- etc. ...
    end
end

In addition to representing an ELTN object as a series of events, the parser will create a tree of tables.

local eltnc = require "eltnc"

local eltndoc, parse_err = eltnc.parse_file("config.eltn")

local eltnstr, emit_err = eltnc.emit(eltndoc)

The parse_err and emit_err values will contain an error code or string identifying the specific error that prevented the parser or emitter from completing.

Python

The Python wrapper will support the event-driven interface above, plus some Python-specific idioms.

import eltnc

with eltnc.FileParser("config.eltn") as parser:
    for event, value in parser.iter():
        match event:
            case eltnc.STREAM_START:
                pass  # ... some processing
            # ... etc.

In addition to representing an ELTN object as a series of events, the parser will create a tree of dictionaries.

import eltnc

eltndoc, parse_err = eltnc.parse_file("config.eltn")

eltnstr, emit_err = eltnc.emit(eltndoc)

Ruby

The Ruby wrapper will support the event-driven interface above, plus some Ruby-specific idioms.

require "eltnc"

ELTNC::FileParser.new("config.eltn") do |parser|
    parser.each do |event, value|
        case event
        when ELTNC::STREAM_START then
            # do something
        # and so forth
        end
    end
end

In addition to representing an ELTN object as a series of events, the parser will create a tree of Hashes.

require "eltnc"

eltndoc, parse_err = ELTNC.parse_file("config.eltn")

eltnstr, emit_err = ELTNC.emit(eltndoc)