This project will create a C library to parse and emit ELTN text. It will then provide wrappers for various languages including Lua, Python, and Ruby.
Design
The ELTN parser is a pull parser similar to the JSON Parser implemented in Java. The Buffer object takes in text, either passed directly or read through a callback, and stores it in an expandable ring buffer. When the Parser is queried for the next event, it asks the Lexer for tokens; the Lexer reads characters from the Buffer and transforms them into tokens, which the Parser then interprets as events.
```mermaid
sequenceDiagram
    actor Reader
    participant Parser
    participant Lexer
    participant Buffer
    participant Text@{"type": "entity"}
    Reader->>+Parser: next
    Parser->>+Lexer: next_token
    Lexer->>+Buffer: next_char
    Buffer->>Text: read
    Text-->>Buffer: "var=1"
    Buffer-->>-Lexer: 'v'
    Lexer->>+Buffer: next_char
    Buffer-->>-Lexer: 'a'
    Lexer->>+Buffer: next_char
    Buffer-->>-Lexer: 'r'
    Lexer->>+Buffer: next_char
    Buffer-->>-Lexer: '='
    Lexer-->>-Parser: (ELTN_TOKEN_NAME, "var")
    Parser->>+Lexer: next_token
    Lexer-->>-Parser: (ELTN_TOKEN_EQUAL, "=")
    Parser-->>-Reader:
    Reader->>+Parser: event
    Parser-->>-Reader: ELTN_DEF_NAME
    Reader->>+Parser: string
    Parser-->>-Reader: "var"
```
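This section does not show how the expandable ring buffer works internally. The following is a minimal sketch of one possible approach, assuming the buffer doubles its capacity whenever a write would overflow and unwraps its unread contents into the new allocation. RingBuf and the ring_* functions are hypothetical names, not part of eltn.h.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical growable ring buffer; not the library's ELTN_Buffer. */
typedef struct RingBuf {
    char*  data;
    size_t cap;   /* allocated bytes */
    size_t head;  /* index of the next unread byte */
    size_t len;   /* number of unread bytes */
} RingBuf;

bool ring_init(RingBuf* rb, size_t cap) {
    rb->data = malloc(cap);
    rb->cap = rb->data ? cap : 0;
    rb->head = rb->len = 0;
    return rb->data != NULL;
}

/* Grow to at least `need` bytes, copying unread bytes to the front. */
static bool ring_grow(RingBuf* rb, size_t need) {
    size_t newcap = rb->cap ? rb->cap : 16;
    while (newcap < need) {
        newcap *= 2;
    }
    char* p = malloc(newcap);
    if (p == NULL) {
        return false;
    }
    for (size_t i = 0; i < rb->len; i++) {
        p[i] = rb->data[(rb->head + i) % rb->cap];
    }
    free(rb->data);
    rb->data = p;
    rb->cap = newcap;
    rb->head = 0;
    return true;
}

/* Append text after the unread bytes, growing if it would not fit. */
bool ring_write(RingBuf* rb, const char* text, size_t n) {
    if (rb->len + n > rb->cap && !ring_grow(rb, rb->len + n)) {
        return false;
    }
    for (size_t i = 0; i < n; i++) {
        rb->data[(rb->head + rb->len + i) % rb->cap] = text[i];
    }
    rb->len += n;
    return true;
}

/* Return the next unread character, or -1 if none remains. */
int ring_next_char(RingBuf* rb) {
    if (rb->len == 0) {
        return -1;
    }
    unsigned char c = (unsigned char)rb->data[rb->head];
    rb->head = (rb->head + 1) % rb->cap;
    rb->len--;
    return c;
}

void ring_free(RingBuf* rb) {
    free(rb->data);
    rb->data = NULL;
    rb->cap = rb->head = rb->len = 0;
}
```

Unwrapping on growth keeps index arithmetic simple at the cost of copying every unread byte once per doubling.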
Error Handling
Since C lacks a consistent exception mechanism, the Parser indicates error conditions through an ERROR event and a yet-to-be-determined set of numerical error codes which can be mapped to internationalized error messages. This parser tracks the line and column number of characters read, and provides them to the caller if an error occurs.
Once an error occurs, the Parser stops. The caller is responsible for freeing the Parser (which also frees its Buffer and Lexer) and for closing any file descriptors or other resources it passed to the Parser or Buffer.
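The line and column tracking described above could work roughly like the sketch below, which assumes 1-based positions and treats only '\n' as a line break. Position, position_advance, and position_of are hypothetical; position_of stands in for a lexer that remembers where the current token began, so an error can point at the start of the offending text.

```c
#include <stddef.h>

/* Hypothetical 1-based source position. */
typedef struct Position {
    unsigned int line;
    unsigned int column;
} Position;

/* Advance past one character; '\n' starts a new line. */
void position_advance(Position* pos, char c) {
    if (c == '\n') {
        pos->line++;
        pos->column = 1;
    } else {
        pos->column++;
    }
}

/* Return the position where the first `bad` character begins. */
Position position_of(const char* text, size_t len, char bad) {
    Position pos = {1, 1};
    for (size_t i = 0; i < len && text[i] != bad; i++) {
        position_advance(&pos, text[i]);
    }
    return pos;
}
```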
Multi-threading
Right now the implementation is not thread-safe at all. The Parser interface in particular depends on functions being called in a specific order, so only one thread should ever use a given Parser. Multiple threads could write to the Buffer if it were equipped with a mutex to prevent simultaneous reads and writes, since a read can trigger a resize of the ring buffer.
The Buffer's ring buffer can grow arbitrarily large if a chunk of text read through the callback is too large or if an external agent feeds in text faster than the Parser consumes it. A multi-threaded design would need a condition variable to keep the Buffer from taking in more text than the Parser thread can read.
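The mutex-and-condition-variable design described above might look like the following sketch, assuming POSIX threads and a fixed capacity instead of growth. BoundedBuf and the bb_* functions are hypothetical and unrelated to the current single-threaded ELTN_Buffer: a writer blocks while the buffer is full, and the parser thread blocks while it is empty until text arrives or the buffer is closed.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define BB_CAP 64  /* hypothetical fixed capacity */

typedef struct BoundedBuf {
    char            data[BB_CAP];
    size_t          head, len;
    bool            closed;
    pthread_mutex_t lock;
    pthread_cond_t  not_full;   /* signalled when the reader frees space */
    pthread_cond_t  not_empty;  /* signalled when text arrives or on close */
} BoundedBuf;

void bb_init(BoundedBuf* b) {
    b->head = b->len = 0;
    b->closed = false;
    pthread_mutex_init(&b->lock, NULL);
    pthread_cond_init(&b->not_full, NULL);
    pthread_cond_init(&b->not_empty, NULL);
}

/* Writer side: waits while full instead of growing; fails once closed. */
bool bb_put(BoundedBuf* b, char c) {
    pthread_mutex_lock(&b->lock);
    while (b->len == BB_CAP && !b->closed) {
        pthread_cond_wait(&b->not_full, &b->lock);
    }
    bool ok = !b->closed;
    if (ok) {
        b->data[(b->head + b->len) % BB_CAP] = c;
        b->len++;
        pthread_cond_signal(&b->not_empty);
    }
    pthread_mutex_unlock(&b->lock);
    return ok;
}

/* Parser side: waits for text; returns -1 once closed and drained. */
int bb_get(BoundedBuf* b) {
    pthread_mutex_lock(&b->lock);
    while (b->len == 0 && !b->closed) {
        pthread_cond_wait(&b->not_empty, &b->lock);
    }
    int c = -1;
    if (b->len > 0) {
        c = (unsigned char)b->data[b->head];
        b->head = (b->head + 1) % BB_CAP;
        b->len--;
        pthread_cond_signal(&b->not_full);
    }
    pthread_mutex_unlock(&b->lock);
    return c;
}

/* Wake every waiter so blocked writers fail and the reader can drain. */
void bb_close(BoundedBuf* b) {
    pthread_mutex_lock(&b->lock);
    b->closed = true;
    pthread_cond_broadcast(&b->not_empty);
    pthread_cond_broadcast(&b->not_full);
    pthread_mutex_unlock(&b->lock);
}

void bb_destroy(BoundedBuf* b) {
    pthread_cond_destroy(&b->not_empty);
    pthread_cond_destroy(&b->not_full);
    pthread_mutex_destroy(&b->lock);
}
```

Moving one character per lock keeps the sketch short; a practical version would copy whole blocks under each lock.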
Interface
This section describes the C interface as it currently exists.
The header file “eltn.h” provides prototypes and documentation for all public
functions, types, and #defines, including the ELTN_API marker for all
public functions. Documentation has been omitted for brevity.
Definitions
```c
typedef struct ELTN_Parser ELTN_Parser;
typedef struct ELTN_Buffer ELTN_Buffer;
typedef struct ELTN_Emitter ELTN_Emitter;
typedef struct ELTN_Pool ELTN_Pool;
typedef void* (*ELTN_Alloc)(void* ptr, size_t size);
typedef int (*ELTN_Reader)(void* state, char** strptr, size_t *lenptr);
typedef ssize_t (*ELTN_Writer)(void* ud, const char* str, size_t len, int *errptr);
```
These definitions declare the allocator, reader, and writer callbacks the library uses to allocate memory (when malloc, realloc, and free are not suitable), to read text from a file descriptor or other resource, and to write emitted text, as well as forward declarations for the parser, buffer, emitter, and memory pool types.
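This section does not spell out the ELTN_Writer contract. Assuming it mirrors write(2), returning the number of bytes written or -1 with an error number stored through errptr, a writer that appends emitted text to a stdio stream could look like the sketch below. file_writer is a hypothetical name.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>  /* ssize_t */

/* Same signature as ELTN_Writer in eltn.h. */
typedef ssize_t (*ELTN_Writer)(void* ud, const char* str, size_t len, int* errptr);

/* Assumed contract: return bytes written, or -1 with *errptr set. */
ssize_t file_writer(void* ud, const char* str, size_t len, int* errptr) {
    FILE* fp = ud;
    size_t n = fwrite(str, 1, len, fp);
    if (n < len && ferror(fp)) {
        *errptr = errno ? errno : EIO;  /* fall back if errno is unset */
        return -1;
    }
    *errptr = 0;
    return (ssize_t)n;
}
```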
Allocator Protocol
The allocator function mimics malloc, realloc, and free based on its
arguments:
| ptr | size | function |
|---|---|---|
| NULL | any | malloc |
| non-null | non-zero | realloc |
| non-null | 0 | free |
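An allocator that satisfies the table above by delegating to the C library might look like this sketch; libc_alloc is a hypothetical name. It handles the free case explicitly because realloc(ptr, 0) is implementation-defined in C17 and undefined behavior in C23, so it cannot be relied on to free the block.

```c
#include <stdlib.h>

/* Same signature as ELTN_Alloc in eltn.h. */
typedef void* (*ELTN_Alloc)(void* ptr, size_t size);

/*   ptr == NULL            -> malloc(size)
 *   ptr != NULL, size > 0  -> realloc(ptr, size)
 *   ptr != NULL, size == 0 -> free(ptr), returns NULL */
void* libc_alloc(void* ptr, size_t size) {
    if (ptr == NULL) {
        return malloc(size);
    }
    if (size == 0) {
        free(ptr);
        return NULL;
    }
    return realloc(ptr, size);
}
```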
Reader Protocol
The reader function takes an arbitrary pointer (or file descriptor) to an external resource, a pointer to a size variable, and a pointer to an error number variable.
Each time it is invoked, the reader returns the next block of text
from the resource and stores the number of bytes in that block in *sizeptr.
If it encounters an error, it returns NULL and sets *errptr
to a nonzero number.
To signal the normal end of the text, the reader returns NULL and sets both
*sizeptr and *errptr to 0.
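The ELTN_Reader typedef in Definitions differs from the prose above: it returns an int and passes the text back through strptr and lenptr. Assuming the int return plays the role of *errptr (0 for success, nonzero for an error) and that the end of text is signalled by storing NULL and 0, a reader that serves an in-memory string in fixed-size chunks could look like the sketch below. StringSource and string_reader are hypothetical, and the sketch assumes the reader keeps ownership of the text it hands out.

```c
#include <errno.h>
#include <stddef.h>

/* Same signature as ELTN_Reader in eltn.h. */
typedef int (*ELTN_Reader)(void* state, char** strptr, size_t* lenptr);

/* Hypothetical reader state: an in-memory string served in chunks. */
typedef struct StringSource {
    const char* text;
    size_t      len;
    size_t      pos;
    size_t      chunk;
} StringSource;

int string_reader(void* state, char** strptr, size_t* lenptr) {
    StringSource* src = state;
    if (src->chunk == 0) {            /* error: would never make progress */
        *strptr = NULL;
        *lenptr = 0;
        return EINVAL;
    }
    if (src->pos >= src->len) {       /* normal end of text */
        *strptr = NULL;
        *lenptr = 0;
        return 0;
    }
    size_t n = src->len - src->pos;
    if (n > src->chunk) {
        n = src->chunk;
    }
    *strptr = (char*)(src->text + src->pos);  /* reader keeps ownership */
    *lenptr = n;
    src->pos += n;
    return 0;
}
```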
Events
```c
typedef enum ELTN_Event {
    ELTN_ERROR = -1,
    ELTN_STREAM_START = 0,
    ELTN_COMMENT,
    ELTN_DEF_NAME,
    ELTN_KEY_STRING,
    ELTN_KEY_NUMBER,
    ELTN_KEY_INTEGER,
    ELTN_VALUE_STRING,
    ELTN_VALUE_NUMBER,
    ELTN_VALUE_INTEGER,
    ELTN_VALUE_TRUE,
    ELTN_VALUE_FALSE,
    ELTN_VALUE_NIL,
    ELTN_TABLE_START,
    ELTN_TABLE_END,
    ELTN_STREAM_END
} ELTN_Event;

ELTN_API const char* ELTN_Event_name(ELTN_Event e);
ELTN_API void ELTN_Event_string(ELTN_Event e, char** strptr, size_t* sizeptr);
```
Most of the event names are pretty straightforward.
ELTN_DEF_NAME indicates a key outside a
table, while ELTN_KEY_STRING indicates a string key inside a table.
See the specification for more details about the difference
between “definitions” and table keys.
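To make the difference between definitions and table keys concrete, here is a sketch of two hypothetical helpers a caller might write on top of the ELTN_Event enum copied from above: is_key_event groups ELTN_DEF_NAME with the three table-key events, and is_scalar_value picks out the events that carry a complete non-table value.

```c
#include <stdbool.h>

typedef enum ELTN_Event {
    ELTN_ERROR = -1,
    ELTN_STREAM_START = 0,
    ELTN_COMMENT,
    ELTN_DEF_NAME,
    ELTN_KEY_STRING,
    ELTN_KEY_NUMBER,
    ELTN_KEY_INTEGER,
    ELTN_VALUE_STRING,
    ELTN_VALUE_NUMBER,
    ELTN_VALUE_INTEGER,
    ELTN_VALUE_TRUE,
    ELTN_VALUE_FALSE,
    ELTN_VALUE_NIL,
    ELTN_TABLE_START,
    ELTN_TABLE_END,
    ELTN_STREAM_END
} ELTN_Event;

/* True if the event names the slot the next value will fill:
 * a definition outside any table, or a key inside one. */
bool is_key_event(ELTN_Event e) {
    switch (e) {
    case ELTN_DEF_NAME:     /* name = ... outside any table */
    case ELTN_KEY_STRING:   /* string key inside a table */
    case ELTN_KEY_NUMBER:
    case ELTN_KEY_INTEGER:
        return true;
    default:
        return false;
    }
}

/* True for value events that need no further events to complete;
 * a table value instead spans ELTN_TABLE_START ... ELTN_TABLE_END. */
bool is_scalar_value(ELTN_Event e) {
    return e >= ELTN_VALUE_STRING && e <= ELTN_VALUE_NIL;
}
```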
Errors
```c
typedef enum ELTN_Error {
    ELTN_ERR_UNKNOWN = -1,
    ELTN_OK = 0,
    ELTN_ERR_OUT_OF_MEMORY,
    ELTN_ERR_STREAM_END,
    ELTN_ERR_UNEXPECTED_TOKEN,
    ELTN_ERR_INVALID_TOKEN,
    ELTN_ERR_DUPLICATE_KEY
} ELTN_Error;

ELTN_API const char* ELTN_Error_name(ELTN_Error e);
ELTN_API void ELTN_Error_string(ELTN_Error e, char** strptr, size_t* sizeptr);
```
The list of potential errors is not yet fixed, but as of this writing the parser distinguishes these types of errors:
- ELTN_ERR_DUPLICATE_KEY: The document specified the same key twice, explicitly or implicitly, within the same table.
- ELTN_ERR_INVALID_TOKEN: The parser encountered a character or word not allowed outside a string or comment.
- ELTN_ERR_OUT_OF_MEMORY: An attempt to allocate more memory failed.
- ELTN_ERR_STREAM_END: The document or character stream ended unexpectedly, without completing open tables or table entries.
- ELTN_ERR_UNEXPECTED_TOKEN: The parser encountered a valid character or word in an unexpected place.
- ELTN_ERR_UNKNOWN: An unclassified error.
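Because the codes are meant to map to internationalized messages, one possible implementation keeps a per-language message table and copies the chosen message out through the same (char**, size_t*) convention the Parser uses. error_string and the English messages below are hypothetical; they are not the output of ELTN_Error_string.

```c
#include <stdlib.h>
#include <string.h>

typedef enum ELTN_Error {
    ELTN_ERR_UNKNOWN = -1,
    ELTN_OK = 0,
    ELTN_ERR_OUT_OF_MEMORY,
    ELTN_ERR_STREAM_END,
    ELTN_ERR_UNEXPECTED_TOKEN,
    ELTN_ERR_INVALID_TOKEN,
    ELTN_ERR_DUPLICATE_KEY
} ELTN_Error;

/* Hypothetical English table; a localized build could swap it out. */
static const char* message_for(ELTN_Error e) {
    switch (e) {
    case ELTN_OK:                   return "no error";
    case ELTN_ERR_OUT_OF_MEMORY:    return "out of memory";
    case ELTN_ERR_STREAM_END:       return "unexpected end of text";
    case ELTN_ERR_UNEXPECTED_TOKEN: return "unexpected token";
    case ELTN_ERR_INVALID_TOKEN:    return "invalid token";
    case ELTN_ERR_DUPLICATE_KEY:    return "duplicate key";
    default:                        return "unknown error";
    }
}

/* Stores a newly allocated copy in *strptr and its length in *sizeptr;
 * the caller frees it. On allocation failure, *strptr is NULL. */
void error_string(ELTN_Error e, char** strptr, size_t* sizeptr) {
    const char* msg = message_for(e);
    size_t n = strlen(msg);
    char* copy = malloc(n + 1);
    if (copy != NULL) {
        memcpy(copy, msg, n + 1);
    }
    *strptr = copy;
    *sizeptr = copy ? n : 0;
}
```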
Parser “Class”
```c
ELTN_API ELTN_Parser* ELTN_Parser_new();
ELTN_API ELTN_Parser* ELTN_Parser_new_with_pool(ELTN_Pool * pool);
ELTN_API bool ELTN_Parser_include_comments(ELTN_Parser * p);
ELTN_API void ELTN_Parser_set_include_comments(ELTN_Parser * p, bool b);
ELTN_API ELTN_Buffer* ELTN_Parser_buffer(ELTN_Parser * p);
ELTN_API ssize_t ELTN_Parser_read(ELTN_Parser * p, ELTN_Reader reader,
                                  void* state);
ELTN_API ssize_t ELTN_Parser_read_string(ELTN_Parser * p, const char* str,
                                         size_t len);
ELTN_API ssize_t ELTN_Parser_read_file(ELTN_Parser * p, FILE * fp);
ELTN_API bool ELTN_Parser_has_next(ELTN_Parser * p);
ELTN_API void ELTN_Parser_next(ELTN_Parser * p);
ELTN_API ELTN_Event ELTN_Parser_event(ELTN_Parser * p);
ELTN_API unsigned int ELTN_Parser_depth(ELTN_Parser *p);
ELTN_API void ELTN_Parser_current_key(ELTN_Parser *p, ELTN_Event* typeptr,
                                      char** strptr, size_t* lenptr);
ELTN_API void ELTN_Parser_text(ELTN_Parser * p, char** strptr, size_t* lenptr);
ELTN_API void ELTN_Parser_string(ELTN_Parser * p, char** strptr,
                                 size_t* lenptr);
ELTN_API double ELTN_Parser_number(ELTN_Parser * p);
ELTN_API long int ELTN_Parser_integer(ELTN_Parser * p);
ELTN_API bool ELTN_Parser_boolean(ELTN_Parser * p);
ELTN_API ELTN_Error ELTN_Parser_error_code(ELTN_Parser * p);
ELTN_API unsigned int ELTN_Parser_error_line(ELTN_Parser * p);
ELTN_API unsigned int ELTN_Parser_error_column(ELTN_Parser * p);
ELTN_API void ELTN_Parser_free(ELTN_Parser * p);
```
The parser’s basic protocol is detailed below.
All functions that return strings take a pointer to a string pointer (char**) and a pointer to a length (size_t*). On return, the string pointer refers to a newly allocated string and the length variable holds its length. The caller should free the string with free() when done with it.
- ELTN_Parser_text() provides the relevant section of the source text, shorn of whitespace.
- ELTN_Parser_current_key() provides the explicit or implicit key corresponding to a value. Usually this is the last key parsed. Upon ELTN_TABLE_END, however, it provides the key associated with the just-closed table in the enclosing table (or definition).
- ELTN_Parser_string() provides the text of a comment, quoted string, or long string, shorn of delimiters and with escape sequences resolved to their literal characters.
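Returning a length alongside the string matters because a resolved string can contain NUL bytes, so callers should not rely on strlen(). The sketch below resolves a small subset of Lua-style escapes (\n, \t, \\, \", \', and decimal \ddd) and returns a newly allocated result through strptr and lenptr. unescape is a hypothetical helper, not the library's implementation; unlike a full implementation, it simply keeps the character after any other backslash.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>

/* Resolves a subset of escapes in a string body (delimiters removed).
 * On success, *strptr receives a buffer the caller must free() and
 * *lenptr its length, which counts any embedded NUL bytes. */
bool unescape(const char* src, size_t n, char** strptr, size_t* lenptr) {
    char* out = malloc(n + 1);         /* output never exceeds input */
    if (out == NULL) {
        return false;
    }
    size_t j = 0;
    for (size_t i = 0; i < n; i++) {
        char c = src[i];
        if (c != '\\' || i + 1 == n) { /* ordinary char or trailing '\' */
            out[j++] = c;
            continue;
        }
        c = src[++i];                  /* character after the backslash */
        switch (c) {
        case 'n':  out[j++] = '\n'; break;
        case 't':  out[j++] = '\t'; break;
        case '\\': case '"': case '\'': out[j++] = c; break;
        default:
            if (isdigit((unsigned char)c)) {
                int v = 0;             /* \ddd: up to three decimal digits */
                for (int k = 0; k < 3 && i < n && isdigit((unsigned char)src[i]); k++, i++) {
                    v = v * 10 + (src[i] - '0');
                }
                i--;                   /* the outer loop's i++ steps past it */
                if (v > 255) {
                    free(out);
                    return false;      /* not a byte value */
                }
                out[j++] = (char)v;
            } else {
                out[j++] = c;          /* other escapes: keep the char only */
            }
        }
    }
    out[j] = '\0';
    *strptr = out;
    *lenptr = j;
    return true;
}
```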
The functions ELTN_Parser_number() and ELTN_Parser_integer()
convert the text read to double precision floating point or a long integer.
ELTN_Parser_boolean() interprets the value as a boolean value, i.e.
any value but “false” or “nil” is true.
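A sketch of these conversions using the C library follows, under two assumptions: integer text is decimal or 0x-prefixed hexadecimal, and the boolean test compares raw token text, so a quoted "false" string stays true. strtol with base 0 is avoided because it would read "010" as octal eight, whereas Lua-style numerals have no octal form. All three function names are hypothetical.

```c
#include <ctype.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Optional '-', then decimal digits or a 0x/0X hexadecimal literal. */
bool text_to_integer(const char* text, long* out) {
    const char* p = (*text == '-') ? text + 1 : text;
    if (!isdigit((unsigned char)*p)) {
        return false;                  /* rejects "", " 7", "+7", "--7" */
    }
    int base = (p[0] == '0' && (p[1] == 'x' || p[1] == 'X')) ? 16 : 10;
    char* end;
    errno = 0;
    long v = strtol(text, &end, base); /* base 16 accepts the 0x prefix */
    if (*end != '\0' || errno == ERANGE) {
        return false;                  /* trailing junk or out of range */
    }
    *out = v;
    return true;
}

/* Optional '-', then text strtod() accepts starting with a digit or '.';
 * this excludes "inf" and "nan". */
bool text_to_number(const char* text, double* out) {
    const char* p = (*text == '-') ? text + 1 : text;
    if (!isdigit((unsigned char)*p) && *p != '.') {
        return false;
    }
    char* end;
    double v = strtod(text, &end);
    if (end == text || *end != '\0') {
        return false;
    }
    *out = v;
    return true;
}

/* Only the raw tokens false and nil are false. */
bool text_to_boolean(const char* text) {
    return strcmp(text, "false") != 0 && strcmp(text, "nil") != 0;
}
```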
In case of an ERROR event, the caller can get an error code and the position at the beginning of the text that caused the error.
Buffer “Class”
```c
ELTN_API ELTN_Buffer* ELTN_Parser_source(ELTN_Parser* s);
ELTN_API size_t ELTN_Buffer_capacity(ELTN_Buffer* s);
ELTN_API bool ELTN_Buffer_set_capacity(ELTN_Buffer* s, size_t newcap);
ELTN_API bool ELTN_Buffer_is_empty(ELTN_Buffer* s);
ELTN_API bool ELTN_Buffer_is_closed(ELTN_Buffer* s);
ELTN_API ssize_t ELTN_Buffer_write(ELTN_Buffer* s, const char *text, size_t len);
ELTN_API void ELTN_Buffer_close(ELTN_Buffer* s);
```
A Buffer is responsible for reading in text, either by pulling it on demand through a reader function or by accepting it through the “write” function. The write function returns the number of bytes written, or a negative number if the Buffer has been “closed” or some other error occurs.
The “close” function signals that no more text is forthcoming. Once the Buffer is closed, it, and by extension the parser, cannot be reopened for writing. The caller may still read events as long as unprocessed text remains. If the text ends prematurely and the Buffer is closed, the parser’s last event will be an ERROR event. If the text ends prematurely but the Buffer is not closed, the parser’s last event will be INCOMPLETE; if more text is written to the Buffer, parsing resumes.
A Buffer is “empty” if it contains no unprocessed text.
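The end-of-text rules above reduce to a small decision, sketched here with hypothetical names. The sketch assumes that an open Buffer also yields INCOMPLETE when the text stops at a clean boundary, since another definition could still arrive; the section itself only describes the premature case.

```c
#include <stdbool.h>

/* Hypothetical outcome when the parser finds no unread text. */
typedef enum Exhausted {
    EXHAUSTED_STREAM_END,  /* closed, and every table and entry completed */
    EXHAUSTED_ERROR,       /* closed mid-table or mid-entry */
    EXHAUSTED_INCOMPLETE   /* still open: wait for more text, then resume */
} Exhausted;

Exhausted when_text_runs_out(bool buffer_closed, bool inside_entry_or_table) {
    if (!buffer_closed) {
        return EXHAUSTED_INCOMPLETE;
    }
    return inside_entry_or_table ? EXHAUSTED_ERROR : EXHAUSTED_STREAM_END;
}
```

In the error case the parser would presumably report ELTN_ERR_STREAM_END, which the Errors list describes as a stream ending without completing open tables or entries.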
Emitter “Class”
As of this writing (2025-05-14) the Emitter class has yet to be implemented. This is the current design.
```c
ELTN_API ELTN_Emitter* ELTN_Emitter_new();
ELTN_API ELTN_Emitter* ELTN_Emitter_new_with_pool(ELTN_Pool* pool);
ELTN_API bool ELTN_Emitter_def_name(ELTN_Emitter* e, const char* n);
ELTN_API bool ELTN_Emitter_key_string(ELTN_Emitter* e, const char* s,
                                      size_t len);
ELTN_API bool ELTN_Emitter_key_number(ELTN_Emitter* e, double n, int sigfigs);
ELTN_API bool ELTN_Emitter_key_integer(ELTN_Emitter* e, int i);
ELTN_API bool ELTN_Emitter_value_string(ELTN_Emitter* e, const char* s,
                                        size_t len);
ELTN_API bool ELTN_Emitter_value_number(ELTN_Emitter* e, double n, int sigfigs);
ELTN_API bool ELTN_Emitter_value_integer(ELTN_Emitter* e, int i);
ELTN_API bool ELTN_Emitter_value_boolean(ELTN_Emitter* e, bool b);
ELTN_API bool ELTN_Emitter_value_nil(ELTN_Emitter* e);
ELTN_API bool ELTN_Emitter_table_start(ELTN_Emitter* e);
ELTN_API bool ELTN_Emitter_table_end(ELTN_Emitter* e);
ELTN_API int ELTN_Emitter_error_code(ELTN_Emitter* e);
ELTN_API void ELTN_Emitter_error_path(ELTN_Emitter* e, char** pathbuf,
                                      size_t* len);
ELTN_API bool ELTN_Emitter_pretty_print(ELTN_Emitter* e);
ELTN_API void ELTN_Emitter_set_pretty_print(ELTN_Emitter* e, bool pretty);
ELTN_API unsigned int ELTN_Emitter_indent(ELTN_Emitter* e);
ELTN_API void ELTN_Emitter_set_indent(ELTN_Emitter* e, unsigned int indent);
ELTN_API ssize_t ELTN_Emitter_length(ELTN_Emitter* e);
ELTN_API ssize_t ELTN_Emitter_write(ELTN_Emitter* e, ELTN_Writer writer,
                                    void* state);
ELTN_API ssize_t ELTN_Emitter_write_file(ELTN_Emitter* e, FILE* fp);
ELTN_API void ELTN_Emitter_free(ELTN_Emitter* e);
```
The Emitter is the reverse of the Parser: its functions take events as arguments and assemble an internal representation of the ELTN document. If the sequence of events forms a valid document, the caller can then obtain that document as a C string or write it to a file.
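The validity checking the Emitter promises can be modelled as a small state machine that tracks, per nesting level, whether a name or key is still waiting for its value. The sketch below is hypothetical: Checker and the check_* functions are not part of eltn.h, it assumes a value inside a table may omit its key (an implicit positional key), and it does not detect duplicate keys.

```c
#include <stdbool.h>

#define MAX_DEPTH 32   /* hypothetical nesting limit */

typedef struct Checker {
    int  depth;                    /* 0 = top level */
    bool have_key[MAX_DEPTH + 1];  /* a name/key awaits its value */
} Checker;

/* Definition names: top level only, with no name already pending. */
bool check_def_name(Checker* c) {
    if (c->depth != 0 || c->have_key[0]) return false;
    c->have_key[0] = true;
    return true;
}

/* Table keys: inside a table only, with no key already pending. */
bool check_key(Checker* c) {
    if (c->depth == 0 || c->have_key[c->depth]) return false;
    c->have_key[c->depth] = true;
    return true;
}

/* Values need a pending name at top level; inside a table they may
 * follow a key or stand alone. */
bool check_value(Checker* c) {
    if (c->depth == 0 && !c->have_key[0]) return false;
    c->have_key[c->depth] = false;
    return true;
}

/* A table start is itself the value for the pending name or key. */
bool check_table_start(Checker* c) {
    if (c->depth == MAX_DEPTH || !check_value(c)) return false;
    c->depth++;
    c->have_key[c->depth] = false;
    return true;
}

/* A table may not close while one of its keys lacks a value. */
bool check_table_end(Checker* c) {
    if (c->depth == 0 || c->have_key[c->depth]) return false;
    c->depth--;
    return true;
}

/* Complete only at top level with no dangling name. */
bool check_done(const Checker* c) {
    return c->depth == 0 && !c->have_key[0];
}
```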
Operation
Parsing
The core parser essentially ports ELTNPP to C:
```c
#include <stdio.h>
#include <stdlib.h>
#include "eltn.h"

int main(void) {
    FILE* fp = fopen("config.eltn", "r");
    /* omitting check of `fp` */
    ELTN_Parser* epp = ELTN_Parser_new();
    ELTN_Parser_read_file(epp, fp);
    while (ELTN_Parser_has_next(epp)) {
        char* str;
        size_t len;
        unsigned int depth;
        unsigned int olddepth = ELTN_Parser_depth(epp);
        ELTN_Parser_next(epp);
        depth = ELTN_Parser_depth(epp);
        switch (ELTN_Parser_event(epp)) {
        case ELTN_ERROR:
            printf("ERROR (%u:%u) %s\n",
                   ELTN_Parser_error_line(epp),
                   ELTN_Parser_error_column(epp),
                   ELTN_Error_name(ELTN_Parser_error_code(epp)));
            break;
        case ELTN_STREAM_START:
            printf("STREAM_START (%u)\n", depth);
            break;
        case ELTN_COMMENT:
            ELTN_Parser_string(epp, &str, &len);
            /* "%.*s" prints exactly len bytes; the precision must be an int */
            printf("COMMENT --<<%.*s>>\n", (int)len, str);
            free(str);
            break;
        case ELTN_DEF_NAME:
            ELTN_Parser_string(epp, &str, &len);
            printf("DEF_NAME %.*s\n", (int)len, str);
            free(str);
            break;
        case ELTN_KEY_STRING:
            ELTN_Parser_string(epp, &str, &len);
            printf("KEY_STRING [<<%.*s>>] =\n", (int)len, str);
            free(str);
            break;
        case ELTN_KEY_NUMBER:
            printf("KEY_NUMBER [%f] =\n", ELTN_Parser_number(epp));
            break;
        case ELTN_KEY_INTEGER:
            printf("KEY_INTEGER [%ld] =\n", ELTN_Parser_integer(epp));
            break;
        case ELTN_VALUE_STRING:
            ELTN_Parser_string(epp, &str, &len);
            printf("VALUE_STRING <<%.*s>>\n", (int)len, str);
            free(str);
            break;
        case ELTN_VALUE_NUMBER:
            printf("VALUE_NUMBER %f\n", ELTN_Parser_number(epp));
            break;
        case ELTN_VALUE_INTEGER:
            printf("VALUE_INTEGER %ld\n", ELTN_Parser_integer(epp));
            break;
        case ELTN_VALUE_TRUE:
            printf("VALUE_TRUE\n");
            break;
        case ELTN_VALUE_FALSE:
            printf("VALUE_FALSE\n");
            break;
        case ELTN_VALUE_NIL:
            printf("VALUE_NIL\n");
            break;
        case ELTN_TABLE_START:
            printf("TABLE_START (%u -> %u)\n", olddepth, depth);
            break;
        case ELTN_TABLE_END:
            printf("TABLE_END (%u -> %u)\n", olddepth, depth);
            break;
        case ELTN_STREAM_END:
            printf("STREAM_END (%u)\n", depth);
            break;
        }
    }
    ELTN_Parser_free(epp);
    fclose(fp);
    return 0;
}
```
ELTN_Parser_event() signals read errors as well as parse errors,
but the API is otherwise remarkably similar to ELTNPP's.
Emitting
The Emitter in C is somewhat verbose. On the other hand, it will check the syntax of what it emits. Each function returns a boolean that tells the caller whether their event was syntactically correct.
```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include "eltn.h"

/* log_error_code() and ASSERT() stand in for the application's own helpers. */
int write_rockspec(void) {
    ELTN_Emitter* eep = ELTN_Emitter_new();
    const char* url = "git+https://github.com/frank-mitchell-com/eltnc.git";
    bool ok = true;
    ok &= ELTN_Emitter_def_name(eep, "package");
    ok &= ELTN_Emitter_value_string(eep, "eltnc", 5);
    ok &= ELTN_Emitter_def_name(eep, "version");
    ok &= ELTN_Emitter_value_string(eep, "1.0.0", 5);
    ok &= ELTN_Emitter_def_name(eep, "source");
    ok &= ELTN_Emitter_table_start(eep);
    ok &= ELTN_Emitter_key_string(eep, "url", 3);
    ok &= ELTN_Emitter_value_string(eep, url, strlen(url));
    ok &= ELTN_Emitter_key_string(eep, "module", 6);
    ok &= ELTN_Emitter_value_string(eep, "eltnc", 5);
    ok &= ELTN_Emitter_key_string(eep, "tag", 3);
    ok &= ELTN_Emitter_value_string(eep, "v1.0.0", 6);
    ok &= ELTN_Emitter_table_end(eep);
    /* ... and so on ... */
    if (!ok) {
        log_error_code(ELTN_Emitter_error_code(eep));
        ELTN_Emitter_free(eep);
        return -1;
    }
    ELTN_Emitter_set_pretty_print(eep, true);
    ELTN_Emitter_set_indent(eep, 3);
    ssize_t expect = ELTN_Emitter_length(eep);
    if (expect < 0) {
        log_error_code(ELTN_Emitter_error_code(eep));
        ELTN_Emitter_free(eep);
        return -1;
    }
    FILE* fp = fopen("eltnc-1.0.0-1.rockspec", "w");
    /* omitting check of `fp` */
    ssize_t result = ELTN_Emitter_write_file(eep, fp);
    /*
     * Should be something like:
     *
     * package = "eltnc"
     * version = "1.0.0"
     * source = {
     *    url = "git+https://github.com/frank-mitchell-com/eltnc.git",
     *    module = "eltnc",
     *    tag = "v1.0.0"
     * }
     * -- ... and so on ...
     */
    ELTN_Emitter_free(eep);
    int fresult = fclose(fp);
    ASSERT(fresult == 0);
    ASSERT(result == expect);
    return 0;
}
```
Pools
Pools grew out of the need to carry the ELTN_Alloc function and
its state everywhere memory is allocated. Unlike the rest of the API,
they use a reference-counting scheme.
```c
ELTN_API void ELTN_Pool_new_with_alloc(ELTN_Pool ** hptr, ELTN_Alloc alloc,
                                       void* state);
ELTN_API void ELTN_Pool_acquire(ELTN_Pool ** hptr);
ELTN_API void ELTN_Pool_release(ELTN_Pool ** hptr);
ELTN_API void ELTN_Pool_set(ELTN_Pool ** dest, ELTN_Pool ** src);
```
Internal routines call the equivalents of malloc, realloc, and free on
the Pool. If the pool is NULL, these routines default to the respective libc
functions.
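The reference-counting handle functions might behave like the sketch below, which assumes that release clears the caller's handle and that set adjusts both counts. Pool and the pool_* functions are hypothetical stand-ins for the opaque ELTN_Pool, and the state argument of ELTN_Pool_new_with_alloc is omitted because ELTN_Alloc itself takes no state parameter.

```c
#include <stdlib.h>

/* Same signature as ELTN_Alloc in eltn.h. */
typedef void* (*ELTN_Alloc)(void* ptr, size_t size);

/* Hypothetical pool layout; the real ELTN_Pool is opaque. */
typedef struct Pool {
    ELTN_Alloc   alloc;
    unsigned int refs;
} Pool;

/* Creates a pool holding one reference, owned by *hptr. */
void pool_new(Pool** hptr, ELTN_Alloc alloc) {
    Pool* p = malloc(sizeof *p);
    if (p != NULL) {
        p->alloc = alloc;
        p->refs = 1;
    }
    *hptr = p;
}

/* Adds a reference on behalf of another owner. */
void pool_acquire(Pool** hptr) {
    if (*hptr != NULL) {
        (*hptr)->refs++;
    }
}

/* Drops a reference, frees the pool when none remain, clears *hptr. */
void pool_release(Pool** hptr) {
    Pool* p = *hptr;
    if (p != NULL && --p->refs == 0) {
        free(p);
    }
    *hptr = NULL;
}

/* Makes *dest share *src's pool, releasing what *dest held before. */
void pool_set(Pool** dest, Pool** src) {
    if (*dest == *src) {
        return;
    }
    pool_acquire(src);
    pool_release(dest);
    *dest = *src;
}

/* malloc/realloc/free-style request via the pool, or libc if NULL. */
void* pool_alloc(Pool* pool, void* ptr, size_t size) {
    if (pool != NULL) {
        return pool->alloc(ptr, size);
    }
    if (ptr != NULL && size == 0) {
        free(ptr);
        return NULL;
    }
    return realloc(ptr, size);  /* realloc(NULL, n) acts as malloc(n) */
}
```

Passing the handle's address lets release and set clear or overwrite the caller's pointer, so a released pool cannot be used through that handle again.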
Installation
Under Posix-Compliant Platforms
The GNU makefile install target installs the libraries as
/usr/local/lib/libeltnc-VERSION.a and
/usr/local/lib/libeltnc.so.VERSION;
header files go to the directory /usr/local/include/eltnc-VERSION.
To change the root directory, override the INSTALL_DIR variable; to change
the header and library installation directories individually, override
INSTALL_HDR and INSTALL_LIB, respectively.
Under MinGW
The dist target for the makefile compiles a Windows DLL and a header file,
then bundles them in eltnc-VERSION.zip in the root directory.
Under Other Platforms
The project consists of four source files and four header files. Compiling for one’s specific platform should not be too difficult.
Wrappers
Wrappers will use native I/O to construct a tree of native associative
arrays, strings, numbers, booleans, and nil/null values.
Lua
The Lua wrapper will support the event-driven interface above, plus some Lua-specific idioms.
```lua
local eltnc = require "eltnc"

local parser <close> = eltnc.file_parser("config.eltn")
for event, value in parser:iter() do
    if event == eltnc.DEF_NAME then
        -- etc. ...
    end
end
```
In addition to representing an ELTN object as a series of events, the parser will create a tree of tables.
```lua
local eltn = require "eltnc"

local eltndoc, parse_err = eltn.parse_file("config.eltn")
local eltnstr, emit_err = eltn.emit(eltndoc)
```
The parse_err and emit_err values will contain an error code or string
denoting the specific error that prevented the parser or emitter from completing.
Python
The Python wrapper will support the event-driven interface above, plus some Python-specific idioms.
```python
import eltnc

with eltnc.FileParser("config.eltn") as parser:
    for event, value in parser.iter():
        match event:
            case eltnc.STREAM_START:
                pass  # ... some processing
            # ... etc.
```
In addition to representing an ELTN object as a series of events, the parser will create a tree of dictionaries.
```python
import eltnc

eltndoc, parse_err = eltnc.parse_file("config.eltn")
eltnstr, emit_err = eltnc.emit(eltndoc)
```
Ruby
The Ruby wrapper will support the event-driven interface above, plus some Ruby-specific idioms.
```ruby
require "eltnc"

ELTNC::FileParser.new("config.eltn") do |parser|
  parser.each do |event, value|
    case event
    when ELTNC::STREAM_START
      # do something
    # and so forth
    end
  end
end
```
In addition to representing an ELTN object as a series of events,
the parser will create a tree of Hashes.
```ruby
require "eltnc"

eltndoc, parse_err = ELTNC.parse_file("config.eltn")
eltnstr, emit_err = ELTNC.emit(eltndoc)
```