PEG

8.12

PEG🔗ℹ

This library implements a PEG parser generator.

1 Introduction🔗ℹ

PEG can be thought of as an advance over regex. It can match more languages (for example balanced brackets) and can be paired with semantic actions to produce structured results from a parse.

The PEG language is implemented as a system of macros that compiles parser descriptions (rules) into scheme code. It is also provided with a custom syntax via "#lang peg".

The generated code parses text by interacting with the "PEG VM", which is a set of registers holding the input text, input position, control stack for backtracking and error reporting notes.

2 Syntax Reference🔗ℹ

2.1 define-peg🔗ℹ

syntax
(define-peg name rule)

<rule> = (epsilon) ; always succeeds
| (char c) ; matches the character c
| (any-char) ; matches any character
| (range c1 c2) ; match any char between c1 and c2
| (string str) ; matches string str
| (and <rule> ...) ; sequence
| (or <rule> ...) ; prioritized choice
| (* <rule> ...) ; zero or more
| (+ <rule> ...) ; one or more
| (? <rule> ...) ; zero or one
| (call name)
| (capture name <rule>)
| (! <rule> ...) ; negative lookahead
| (& <rule>) ; positive lookahead
| (drop <rule> ...) ; discard the semantic result on matching

Defines a new scheme function named peg-rule:name by compiling the peg rule into scheme code that interacts with the PEG VM.

syntax
(define-peg name rule action)

Same as above, but also performs a semantic action to produce its result. Semantic actions are regular scheme expressions, they can refer to variables named by a capture.

We also provide shorthands for some common semantic actions:

syntax
(define-peg/drop name rule)

= (define-peg rule-name (drop rule))

makes the parser produce no result.

syntax
(define-peg/bake name rule)

= (define-peg rule-name (name res rule) res)

transforms the peg-result into a scheme object.

syntax
(define-peg/tag name rule)

= (define-peg rule-name (name res exp) `(rule-name . ,res))

tags the result with the peg rule name. Useful for parsers that create an AST.

2.2 peg🔗ℹ

syntax
(peg rule input-text)

Run a PEG parser. Attempt to parse the input-text string using the given rule. This is sets up the PEG VM registers into an initial state and then calls into the parser for rule.

3 Examples🔗ℹ

3.1 Example 1🔗ℹ

For a simple example, lets try splitting a sentence into words. We can describe a word as one or more of non-space characters, optionally followed by a space:

> (require peg)
> (define sentence "the quick brown fox jumps over the lazy dog")
> (define-peg non-space
    (and (! #\space) (any-char)))
> (define-peg/bake word
    (and (+ non-space)
         (drop (? #\space))))
> (peg word sentence)
"the"
> (peg (+ word) sentence)
'("the" "quick" "brown" "fox" "jumps" "over" "the" "lazy" "dog")

Using the peg lang, the example above is equal to

#lang peg

(define sentence "the quick brown fox jumps over the lazy dog"); //yes, we can use

//one-line comments and any sequence of s-exps BEFORE the grammar definition

non-space <- (! ' ') . ; //the dot is "any-char" in peg

word <- c:(non-space+ ~(' ' ?)) -> c ; //the ~ is drop

//we can use ident:peg to act as (name ident peg).

//and rule <- exp -> action is equal to (define-peg rule exp action)

//with this in a file, we can use the repl of drracket to do exactly the

//uses of peg above.

3.2 Example 2🔗ℹ

Here is a simple calculator example that demonstrates semantic actions and recursive rules.

(define-peg number (name res (+ (range #\0 #\9)))
  (string->number res))
(define-peg sum
  (and (name v1 prod) (? (and #\+ (name v2 sum))))
  (if v2 (+ v1 v2) v1))
(define-peg prod
  (and (name v1 number) (? (and #\* (name v2 prod))))
  (if v2 (* v1 v2) v1))

this grammar in peg lang is equivalent to:

#lang peg
number <- res:[0-9]+ -> (string->number res);
sum <- v1:prod ('+' v2:sum)? -> (if v2 (+ v1 v2) v1);
prod <- v1:number ('*' v2:prod)? -> (if v2 (* v1 v2) v1);

Usage:

> (peg sum "2+3*4")
14
> (peg sum "2*3+4")
10
> (peg sum "7*2+3*4")
26

3.3 Example 3🔗ℹ

Here is an example of parsing balanced parenthesis. It demonstrates a common technique of using _ for skipping whitespace, and using "define-peg/bake" to produce a list rather than a sequence from a *.

#lang racket
(require peg)

(define-peg/drop _ (* (or #\space #\newline)))

(define-peg symbol
  (and (name res (+ (and (! #\( #\) #\space #\newline) (any-char)))) _)
  (string->symbol res))

(define-peg/bake sexp
  (or symbol
      (and (drop #\() (* sexp) (drop #\) _))))

or in PEG syntax:

#lang peg
_ < [ \n]*;
symbol <- res:(![() \n] .)+ _ -> (string->symbol res);
sexp <- s:symbol / ~'(' s:sexp* ~')' _ -> s;
// had to use s: -> s because there is no way to express bake from the PEG language

Usage:

> (peg sexp "(foob (ar baz)quux)")
'(foob (ar baz) quux)
> (peg sexp "((())(()(())))")
'((()) (() (())))
> (peg sexp "(lambda (x) (list x (list (quote quote) x)))")
'(lambda (x) (list x (list 'quote x)))

4 PEG Syntax🔗ℹ

This package also provides a "#lang peg" alternative, to allow you to make parsers in a more standard PEG syntax.

4.1 PEG Syntax Reference🔗ℹ

The best way to understand the PEG syntax would be by reference to examples, there are many simple examples in the racket peg repo and the follow is the actual grammar used by racket-peg to implemet the peg lang:

Note: When you match the empty string in peg lang, the result object is the empty sequence, not the empty string, be careful.

#lang peg

(require "s-exp.rkt");

_ < ([ \t\n] / '//' [^\n]*)*;

SLASH < '/' _;

name <-- [a-zA-Z_] [a-zA-Z0-9\-_.]* _;

rule <-- name ('<--' / '<-' / '<') _ pattern ('->' _ s-exp _)? ';' _;

pattern <-- alternative (SLASH alternative)*;

alternative <-- expression+;

expression <-- (name ~':' _)? ([!&~] _)? primary ([*+?] _)?;

primary <-- '(' _ pattern ')' _ / '.' _ / literal / charclass / name;

literal <-- ~['] (~[\\] ['\\] / !['\\] .)* ~['] _;

charclass <-- ~'[' '^'? (cc-range / cc-escape / cc-single)+ ~']' _;

cc-range <-- cc-char ~'-' cc-char;

cc-escape <-- ~[\\] .;

cc-single <-- cc-char;

cc-char <- !cc-escape-char . / 'n' / 't';

cc-escape-char <- '[' / ']' / '-' / '^' / '\\' / 'n' / 't';

peg <-- _ import* rule+;

import <-- s-exp _ ';' _;