binfmt: binary format parser generator

Bogdan Popa <bogdan@defn.io>

 #lang binfmt package: binfmt-lib

This package provides a #lang for building binary format parsers with support for limited context-sensitivity.

1 Example

Here is a parser definition for the ID3v1 format:

#lang binfmt
id3     = magic title artist album year comment genre;
magic   = 'T' 'A' 'G';
title   = u8{30};
artist  = u8{30};
album   = u8{30};
year    = u8{4};
comment = u8{30};
genre   = u8;

Assuming this is saved in a file called "id3v1.b", you can import it from Racket and apply any of the definitions to an input port in order to parse its contents:

> (require "id3v1.b")

You can parse the magic header by itself:

> (magic (open-input-bytes #"TAG"))

'((char_1 . #\T) (char_2 . #\A) (char_3 . #\G))

Or a full tag:

> (define data
   (bytes-append
    #"TAGCreative Commons Song         Improbulus                    N"
    #"/A                           2005Take on O Mio Babbino Caro!   g"))
> (define tree
    (id3 (open-input-bytes data)))

And inspect the resulting parse tree:

> (map car tree)

'(magic_1 title_1 artist_1 album_1 year_1 comment_1 genre_1)

> (define ref (compose1 cdr assq))
> (take (ref 'title_1 tree) 8)

'(67 114 101 97 116 105 118 101)

> (apply bytes (ref 'title_1 tree))

#"Creative Commons Song         "

Finally, parsing invalid data results in a syntax error:

> (id3 (open-input-bytes #"TAG..."))

parse failed
  expected 'u8' but found EOF
  in: string
  position: 7

Every definition automatically creates an un-parser. Un-parsers are functions that take a parse tree as input and serialize the data to an output port. They are named by prepending un- to the name of a definition.

> (define bs
    (call-with-output-bytes
     (lambda (out)
       (un-id3 tree out))))
> (for ([n (in-range 0 (bytes-length bs) 64)])
    (println (subbytes bs n (+ n 64))))

#"TAGCreative Commons Song         Improbulus                    N"

#"/A                           2005Take on O Mio Babbino Caro!   g"
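To make the parser/un-parser relationship concrete, here is a hand-written, plain-Racket sketch for the magic definition. It is not code generated by binfmt: parse-magic and unparse-magic are hypothetical names, and the sketch only assumes the tree shape shown for magic earlier. It demonstrates the round-trip property, where un-parsing a parse tree writes back the bytes that were read.

```racket
#lang racket/base
;; Hypothetical sketch, not binfmt output: a hand-written parser and
;; un-parser for  magic = 'T' 'A' 'G';  producing the same tree shape
;; as the generated `magic` parser shown above.

;; Read one character and check that it is the expected literal.
(define (parse-char in expected)
  (define c (read-char in))
  (unless (eqv? c expected)
    (error 'parse-magic "expected ~s but found ~s" expected c))
  c)

;; Parse "TAG" into an association list keyed by expr names.
(define (parse-magic in)
  (list (cons 'char_1 (parse-char in #\T))
        (cons 'char_2 (parse-char in #\A))
        (cons 'char_3 (parse-char in #\G))))

;; The un-parser walks the same names in order and writes each leaf back.
(define (unparse-magic tree out)
  (for ([name (in-list '(char_1 char_2 char_3))])
    (write-char (cdr (assq name tree)) out)))

;; Round trip: parse, then un-parse into a fresh byte-string port.
(define tree (parse-magic (open-input-bytes #"TAG")))
(define out (open-output-bytes))
(unparse-magic tree out)
(get-output-bytes out) ; => #"TAG"
```

The generated un-parsers take the same (tree, output-port) arguments as un-id3 above, which is why the sketch's unparse-magic follows that shape.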

2 Grammar and Operation

The grammar for binfmt is as follows:

def     ::= alt {| alt}* ;
alt     ::= expr+
expr    ::= term | star | plus | repeat
star    ::= term *
plus    ::= term +
repeat  ::= term { id | natural }
term    ::= byte | char | id
byte    ::= an integer between 0x00 and 0xFF
char    ::= ' ascii character '
id      ::= any identifier
natural ::= any natural number

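Within an alt, each expr is assigned a unique name based on its id: the first time an id appears in an alt, _1 is appended to its name, the second time _2, and so on. Character literals use the id char, which is why magic in the example parses to char_1, char_2 and char_3.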
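The naming rule can be sketched in a few lines of plain Racket. assign-names is a hypothetical helper, not part of binfmt: it takes the ids of an alt's exprs in order and returns the names they would receive under the rule above.

```racket
#lang racket/base
;; Hypothetical helper, not part of binfmt: given the ids of an alt's
;; exprs in order, append _1, _2, ... per id in order of appearance.
(define (assign-names ids)
  (define counts (make-hasheq)) ; id -> occurrences seen so far
  (for/list ([id (in-list ids)])
    (define n (add1 (hash-ref counts id 0)))
    (hash-set! counts id n)
    (string->symbol (format "~a_~a" id n))))

(assign-names '(magic title artist album year comment genre))
;; => '(magic_1 title_1 artist_1 album_1 year_1 comment_1 genre_1)
(assign-names '(char char char))
;; => '(char_1 char_2 char_3)
```

Each id keeps its own counter, so a mixed alt such as '(u8 i8 u8) yields '(u8_1 i8_1 u8_2).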

Alternatives containing two or more exprs parse to an association list mapping expr names (as defined above) to parse results, as with magic in the example. Alternatives containing a single expr collapse to the result of that expr; title, for instance, parses directly to a list of bytes.
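As a rough illustration of that rule, the hypothetical helper alt-result below (not part of binfmt) combines an alt's expr names with their parse results. It assumes the names have already been assigned as described above.

```racket
#lang racket/base
;; Hypothetical helper, not part of binfmt: combine an alt's expr names
;; and parse results the way the rule above describes.
(define (alt-result names results)
  (if (= (length results) 1)
      (car results)               ; a single expr collapses to its result
      (map cons names results)))  ; otherwise build an association list

(alt-result '(char_1 char_2 char_3) '(#\T #\A #\G))
;; => '((char_1 . #\T) (char_2 . #\A) (char_3 . #\G))
(alt-result '(u8_1) '((67 114 101)))
;; => '(67 114 101)
```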

The repeat syntax can either repeat a parser an exact number of times or repeat it based on the result of a previous parser within the same alt. For example, the following parser parses an i8 to determine the length of a string, then parses that many u8s following it.

#lang binfmt
string = strlen u8{strlen_1};
strlen = i8;

Negative length values are allowed, in which case they’re treated the same as 0. For example, the parser above would parse #"\377" (which is -1 when read as an i8) to an empty string.
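The following plain-Racket sketch spells out what the string definition above does. It is hand-written, not generated by binfmt; read-i8, read-u8 and parse-string are hypothetical names, and the result keys assume the naming rule described earlier (strlen_1 and u8_1).

```racket
#lang racket/base
;; Hypothetical hand-written equivalent of
;;   string = strlen u8{strlen_1};
;;   strlen = i8;

;; Read one byte and interpret it as a two's-complement i8.
(define (read-i8 in)
  (define b (read-byte in))
  (when (eof-object? b) (error 'read-i8 "expected 'i8' but found EOF"))
  (if (> b 127) (- b 256) b))

;; Read one unsigned byte.
(define (read-u8 in)
  (define b (read-byte in))
  (when (eof-object? b) (error 'read-u8 "expected 'u8' but found EOF"))
  b)

(define (parse-string in)
  (define strlen_1 (read-i8 in))
  ;; negative lengths are treated the same as 0
  (define u8_1 (for/list ([i (in-range (max 0 strlen_1))]) (read-u8 in)))
  (list (cons 'strlen_1 strlen_1) (cons 'u8_1 u8_1)))

(parse-string (open-input-bytes #"\377"))    ; strlen_1 = -1, u8_1 = '()
(parse-string (open-input-bytes #"\003abc")) ; strlen_1 = 3, u8_1 = '(97 98 99)
```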

The following parsers are built-in:

When an alt fails, the parser may backtrack and try the next alt, but backtracking is only supported on file and string input ports. All other types of ports (e.g., pipes and custom ports that don’t support setting the file position) cause backtracking to fail with a parse error.
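One way to see why backtracking depends on the port type is to sketch it with Racket's file-position, which can save and restore the read position of file and string ports. This is an illustrative assumption about the general technique, not binfmt's actual implementation; expect-bytes and try-alts are hypothetical names.

```racket
#lang racket/base
;; Hypothetical sketch of backtracking between alternatives by saving and
;; restoring the port position with `file-position`.

;; An alternative that matches an exact byte sequence, returning the
;; bytes on success and #f on mismatch (possibly after consuming input).
(define (expect-bytes bs)
  (lambda (in)
    (and (for/and ([b (in-bytes bs)])
           (eqv? (read-byte in) b))
         bs)))

;; Try each alternative in turn, rewinding the port after each failure.
(define (try-alts in alts)
  (define start (file-position in))
  (let loop ([alts alts])
    (cond
      [(null? alts)
       (error 'try-alts "no alternative matched at position ~a" start)]
      [((car alts) in) => values]
      [else
       (file-position in start) ; rewind before trying the next alt
       (loop (cdr alts))])))

;; The first alternative consumes "TAX" before failing on the X;
;; rewinding lets the second alternative match from the start.
(try-alts (open-input-bytes #"TAX")
          (list (expect-bytes #"TAG") (expect-bytes #"TAX")))
```

On a port that cannot set its position, the rewind step is exactly what would fail, which matches the restriction described above.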

When parsing or un-parsing fails, an exception satisfying exn:fail:binfmt? is raised.

3 Reference

 (require binfmt/runtime) package: binfmt-lib

procedure

(exn:fail:binfmt? v) → boolean?

  v : any/c
Returns #t when v is a binfmt error.

procedure

(exn:fail:binfmt-id e) → symbol?

  e : exn:fail:binfmt?
Returns the id of the parser or unparser that failed.