Extracting binary data from bytestrings using match

 (require binary-matcher) package: binary-matcher

This module introduces a new match pattern for matching and destructuring binary data encoded in a bytestring.

The API is at a very early (alpha) stage and may change in incompatible ways.

Similar packages include xenomorph and the #lang-based binfmt.

1 The binary match pattern

syntax

(binary byte-pattern ...+ maybe-rest)

 
byte-pattern = (bytes pat length)
  | (zero-padded pat length)
  | (until-byte pat byte)
  | (until-byte* pat byte)
  | (length-prefixed pat)
  | (length-prefixed pat prefix-length endianness)
  | (number-type pat)
  | (number-type pat endianness)
  | control-pattern
     
maybe-rest = 
  | (rest* pat)
     
control-pattern = (get-offset pat)
  | (set-offset! offset)
     
number-type = s8
  | u8
  | s16
  | u16
  | s32
  | u32
  | u64
  | s64
  | f32
  | f64
     
prefix-length = u8
  | u16
  | u32
  | u64
     
endianness = big-endian
  | little-endian
  | native-endian
  | host-order
  | network-order
 
  byte : byte?
  length : (and/c fixnum? positive?)
  offset : (and/c fixnum? (>=/c 0))
A match expander that, when matched against a bytestring, tries to destructure it according to the given spec and matches the extracted values against the given match patterns.

An example:

(match #"\17\240bc"
  ((binary (s16 num big-endian) (bytes rest 2))
   (list num rest))) ; (4000 #"bc")

bytes extracts a fixed-width field. zero-padded extracts a fixed-width field and strips trailing 0 bytes. until-byte extracts bytes until the given delimiter byte is encountered; if the delimiter is never found, the match fails. until-byte* is the same, except that failing to find the delimiter is not a match failure. length-prefixed reads a length header and then that many bytes. If the prefix length and endianness are not given explicitly, it defaults to the 9P protocol convention of a 2-byte little-endian length.
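As a rough illustration of the field patterns described above, the following sketch parses a made-up record layout. The layout, the record data, and the names parse-record and c-string are invented for this example; only the pattern forms come from the grammar above. Whether until-byte consumes the delimiter is not stated here, so the second function should be treated as an experiment rather than a specification.

```racket
#lang racket/base
(require racket/match binary-matcher)

;; Hypothetical layout (not from any real format):
;;   an 8-byte name field padded with 0 bytes, followed by
;;   a payload prefixed with a 2-byte little-endian length.
(define record
  (bytes-append #"abc" (make-bytes 5 0)   ; name, zero-padded to 8 bytes
                (bytes 5 0)               ; payload length 5, little-endian
                #"hello"))                ; payload

(define (parse-record bs)
  (match bs
    [(binary (zero-padded name 8)
             (length-prefixed payload u16 little-endian))
     (list name payload)]
    [_ #f]))                              ; not a well-formed record

;; Reads bytes up to a 0 delimiter; anything after it goes to `more`.
(define (c-string bs)
  (match bs
    [(binary (until-byte str 0) (rest* more))
     (list str more)]
    [_ #f]))                              ; no 0 byte found

(parse-record record)
(c-string #"hi\0there")
```

Going by the descriptions above, parse-record should return the name with its padding stripped and the five payload bytes. Swapping until-byte for until-byte* in c-string would let an unterminated input match instead of falling through to the #f clause.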

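The number patterns follow the usual naming: sN and uN are signed and unsigned N-bit integers, and f32 and f64 are 32-bit and 64-bit floating-point numbers. Each takes an optional endianness; when it is omitted, the value of binary-match-default-endianness is used.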
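To make the effect of signedness and endianness concrete, the sketch below decodes the same bytes several ways using only Racket's standard integer-bytes->integer and floating-point-bytes->real. It does not use this package, and it is only an assumption that the numeric patterns behave like these standard functions.

```racket
#lang racket/base

;; The same two bytes read as a 16-bit integer in four ways.
(define bs (bytes #xFF #xFE))

;; integer-bytes->integer takes: bytes, signed?, big-endian?
(define s16-be (integer-bytes->integer bs #t #t))  ; signed, big-endian
(define u16-be (integer-bytes->integer bs #f #t))  ; unsigned, big-endian
(define s16-le (integer-bytes->integer bs #t #f))  ; signed, little-endian
(define u16-le (integer-bytes->integer bs #f #f))  ; unsigned, little-endian

;; Four bytes read as a big-endian single-precision float.
(define f32-be
  (floating-point-bytes->real (bytes #x3F #xC0 #x00 #x00) #t))

(list s16-be u16-be s16-le u16-le f32-be)
;; => '(-2 65534 -257 65279 1.5)
```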

rest* takes any bytes remaining at the end of the bytestring after everything else has been matched; if there are none, its pattern is matched against an empty bytestring.

Normally, matching starts with the first byte in the bytestring. (set-offset! where) changes the current position (to facilitate matching bytestrings that contain multiple records), and (get-offset pat) matches the current index at that point in the matching against pat.
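The offset controls might be used along these lines to read the second of two fixed-size records. The record layout, the data, and the name second-record are hypothetical; the sketch assumes that set-offset! may appear as the first pattern, which the grammar above permits.

```racket
#lang racket/base
(require racket/match binary-matcher)

;; Hypothetical layout: two 4-byte records back to back,
;; each a big-endian u16 id followed by a big-endian u16 value.
(define data (bytes 0 1 0 10    ; record 0: id 1, value 10
                    0 2 0 20))  ; record 1: id 2, value 20

(define (second-record bs)
  (match bs
    [(binary (set-offset! 4)           ; skip over record 0
             (u16 id big-endian)
             (u16 value big-endian)
             (get-offset end))         ; index after the last field read
     (list id value end)]
    [_ #f]))

(second-record data)
```

Based on the description above, this should produce the id and value of the second record, with end bound to the index just past it.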

A more complex example that matches an IPv4 header:

(parameterize ([binary-match-default-endianness 'network-order])
  (match header
    ((binary
      (u8 (app byte->nybbles version header-length)) (u8 service-type) (u16 total-length)
      (u16 identification) (u16 flags+fragment)
      (u8 ttl) (u8 protocol) (u16 checksum)
      (bytes (app make-ip-address source-address) 4)
      (u32 (app (lambda (n) (make-ip-address n 4)) dest-address))
      (rest* options))
     (list version header-length service-type total-length ttl protocol
           (ip-address->string source-address) (ip-address->string dest-address)
           options))))

2 Additional functions

procedure

(byte->nybbles b) → byte? byte?
  b : byte?
Splits a single byte into two 4-bit nybbles, returned as two values. The upper 4 bits are the first value; the lower 4 bits are the second.
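The splitting can be pictured with ordinary bit operations. The function split-nybbles below is a hypothetical stand-in written for illustration, not the package's actual implementation.

```racket
#lang racket/base

;; Hypothetical equivalent of byte->nybbles using standard bit operations.
(define (split-nybbles b)
  (values (arithmetic-shift b -4)   ; upper 4 bits
          (bitwise-and b #x0F)))    ; lower 4 bits

;; #x45 is a typical first byte of an IPv4 header:
;; version 4, header length 5 (in 32-bit words).
(define-values (version header-length) (split-nybbles #x45))

(list version header-length)
;; => '(4 5)
```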

parameter

(binary-match-default-endianness) → (or/c 'big-endian 'little-endian 'native-endian)
(binary-match-default-endianness endianness) → void?
  endianness : (or/c 'big-endian 'little-endian 'native-endian 'network-order 'host-order)
 = 'native-endian
A parameter that controls the endianness used by numeric patterns when one isn’t explicitly given.
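A small sketch of how the parameter might interact with explicit endianness annotations. The helper names read-u16 and read-u16-le are invented for this example, and it assumes the parameter is consulted when the match runs, as the IPv4 example suggests.

```racket
#lang racket/base
(require racket/match binary-matcher)

(define two-bytes (bytes #x0F #xA0))

;; No endianness given: uses binary-match-default-endianness.
(define (read-u16 bs)
  (match bs
    [(binary (u16 n)) n]))

;; Explicit endianness: expected to ignore the parameter.
(define (read-u16-le bs)
  (match bs
    [(binary (u16 n little-endian)) n]))

(define results
  (parameterize ([binary-match-default-endianness 'big-endian])
    (list (read-u16 two-bytes) (read-u16-le two-bytes))))

results
```

If the parameter behaves as described, the first element is #x0FA0 (4000) and the second is #xA00F (40975). Outside the parameterize form, read-u16 would fall back to the platform's native byte order.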