On this page:
2.1 RACR Overview
2.2 Holes and Choice Objects
2.3 Scope Graphs
2.4 Attributes, Choices, and Properties, Oh My!
2.5 Lifting
2.6 Getting Started
2.7 Minimal Example
2.8 Another Small Example with Variables
2.9 An Upgrade:   Using Canned Components
8.12

2 Xsmith Guide🔗ℹ

Xsmith uses RACR, an attribute grammar library, in its implementation, and some knowledge of RACR is necessary when using Xsmith.

To create a fuzzer with Xsmith, users create a specification by combining specification components (more commonly referred to as spec components), defined with define-spec-component. A spec component can define productions of a grammar with add-to-grammar, or it can provide properties of grammar productions with add-property. The grammar and properties are used to generate a RACR grammar, attributes for the grammar, and choice objects, which guide AST generation.

Program generation starts by generating an AST hole for a given grammar production. Generation continues by filling holes with concrete AST nodes, which may introduce new holes as child nodes. The grammar specification is used to determine how to fill holes in the AST. For example, in a grammar with addition and subtraction expressions, a generic Expression hole may be replaced by an Addition or Subtraction node. A choice object is created for each valid possible replacement. Choice objects have methods (called choice-methods) which aid in choosing a concrete replacement. Some of these methods act as predicates to filter out choices that are not legal in a particular context, such as choices that introduce more holes when the maximum tree depth has been reached. The choice-weight property defines a method which determines the relative probability of each choice being chosen. The fresh property defines a method which determines how the choice is instantiated as a RACR node. Additional methods may be defined as helpers. Choice objects have access to the current-hole, so they may query RACR attributes in method bodies. Choice object classes follow the same hierarchy as the grammar, so method inheritance for choice objects is similar to attribute inheritance for RACR nodes.

RACR attributes and choice object methods may be added directly with add-attribute and add-choice-method, respectively, but many are defined indirectly by various Xsmith properties. Properties allow users to specify various attributes and choice methods in a more declarative fashion.

Xsmith was primarily designed for the implementation of language specifications for differential compiler/interpreter testing. Therefore Xsmith takes pains to avoid producing programs that rely on commonly unspecified behaviors, such as the order of evaluation of function or operator arguments. Because the language-agnostic analysis within Xsmith is quite conservative, there are many valid programs that will not be generated. However, Xsmith makes it easy to create a fuzzer that obeys such rules without much language-specific work.

2.1 RACR Overview🔗ℹ

RACR is a library for Reference Attribute Grammars. Xsmith’s DSL defines a RACR grammar specification as well as various attributes. The attributes are queried to determine how to generate the AST.

RACR caches the results of attribute queries and keeps track of the nodes accessed for any attribute. When nodes used in an attribute computation are changed, future queries to that attribute are re-computed.

Users can specify new RACR attributes for Xsmith generators, but they should use add-attribute or add-property from Xsmith rather than using RACR functions directly. In expressions evaluated in the context of RACR attributes (attributes) or choice methods, RACR attributes may be queried.

The main RACR APIs of interest are:

Functions for querying the AST:

Xsmith provides a function which generates a complete AST, but users can also perform AST rewrites after initial program generation. Relevant RACR functions for performing AST rewrites include:

Full RACR documentation is here.

2.2 Holes and Choice Objects🔗ℹ

Hole nodes are RACR AST nodes. For every node type in the grammar, a hole node is created as a subclass of that node, inheriting all of its RACR attributes. A hole can be recognized by the xsmith_is-hole? attribute.

Consider the following (partial) grammar defined in the my-spec-component grammar specification:

(add-to-grammar
 my-spec-component
 [Expression #f ()]
 [LiteralInt Expression (v = (random 1000))]
 [AdditionExpression Expression
                     ([left : Expression]
                      [right : Expression])])

When a fresh AdditionExpression is created, it will include two Expression hole nodes. When the generator gets to those holes, a choice object is created for each subclass of Expression (including Expression itself unless it is disabled with the may-be-generated property). The choice objects have types corresponding to LiteralInt and AdditionExpression, and therefore may have different implementations for various choice methods. The choice objects all have access to the Expression hole (through current-hole), but while choice objects have access to their specialized choice method implementations, the hole is of type Expression, and so all RACR attributes that may be queried are specialized only as far as Expression, not to LiteralInt or AdditionExpression.

Although Xsmith can create holes for any type of production defined in the grammar, by default it will only generate more general holes. In the case of this example, the default behavior would be for Xsmith to generate Expression holes, but not LiteralInt or AdditionExpression holes. More specific holes are used either when a grammar production specifies that a child must be of a specific kind, or when a custom fresh implementation uses make-hole with a specific kind of production. For example, we could extend the above example:

(define-spec-component my-spec-component)
 
(add-to-grammar
 my-spec-component
 [Expression #f ()]
 [LiteralInt Expression (v = (random 1000))]
 [AdditionExpression Expression
                     ([left : Expression]
                      [right : Expression])]
 [LiteralSubtractionExpression Expression
                               ([left : LiteralInt]
                                [right : LiteralInt])])

Now, an Expression hole could be filled with LiteralInt, AdditionExpression, or LiteralSubtractionExpression. However, while the AdditionExpression’s two Expression child holes could be filled with any kind of Expression, the LiteralSubtractionExpression’s children will only be LiteralInts.

2.3 Scope Graphs🔗ℹ

Xsmith uses the language-independent resolution algorithm of scope graphs.

The theory of scope graphs is described in the paper “A Theory of Name Resolution with Extended Coverage and Proofs”, (available at https://researchr.org/publication/NeronTVW15).

2.4 Attributes, Choices, and Properties, Oh My!🔗ℹ

Aside from the grammar productions themselves, Xsmith language specifications deal with attributes (eg. via add-attribute), choice methods (via add-choice-method), and properties (via add-property). The exact nature of these terms and how they relate to one another can be confusing, so let’s talk about them.

Remember: Attributes and choice methods are functions that can be used within specific contexts of Xsmith fuzzers. On ther other hand, properties are compile-time macros for generating attributes and choice methods and so are not themselves used during generation.

2.5 Lifting🔗ℹ

The term “lift” is used throughout this document and the Xsmith library. In the context of Xsmith, "lifting" refers to creating a binding definition after it’s needed by a reference. In other words, when a reference node (i.e., a node which implements the reference-info property) is created and there is not an available definition that satisfies the type system and effect system, a new definition is “lifted” to a binding form that is visible from that reference location. Additionally, with some probability, new definitions may be lifted even when there is a suitable definition available. This probability can be controlled with the reference-choice-info property.

Importantly, lifts can only be done when there is a surrounding node that can hold the lifted definition. Only definitions are lifted, not function parameters. In practice, this means that your outermost “program” node needs to have a field with Definition *, so any number of definitions may be lifted to it.

Note that lifted definitions inherit the tree depth (relevant to xsmith-max-depth) of the reference expression that caused the lift. This is to prevent xsmith from endlessly choosing to lift new things.

2.6 Getting Started🔗ℹ

This section is a small walkthrough of creating a simple fuzzer using only basic Xsmith forms to show the fundamentals. However, we recommend using Canned Components for any serious fuzzer implementation.

To get started, we must require xsmith and racr. Other modules may also be convenient, such as racket/string.
(require xsmith
         racr
         racket/string)

To define a fuzzer, first define a spec component with define-spec-component. We’re going to create a simple arithmetic language, so we’ll name our spec component arith for short.

The spec component is where we store definitions of the grammar productions, properties, attributes, and choice methods. Let’s add some productions to the grammar defined by this spec component! When adding nodes, we must always specify three components: the node’s name, its supertype (also called a “parent type”), and a list of children nodes. We’ll define a top-level node that we will later use to specify where to start program generation, which we’ll call Program. The Program production will have a single child: an Expression. Therefore, we will also need to provide a base Expression production. Neither of these nodes has a supertype, so we will supply #f in the supertype field after the node name. Note that the names of node types should be capitalized camel case (punctuation characters like - and _ are disallowed).

(add-to-grammar
 arith
 [Program #f (Expression)]
 [Expression #f ()])

We want the Expression node to be abstract and not generated itself, so we’ll use the may-be-generated property to restrict it.

(add-property
 arith
 may-be-generated
 [Expression #f])

We can also put properties inline with the grammar definition if we prefer. We can replace the above definition using this style:

(add-to-grammar
 arith
 [Program #f (Expression)]
 [Expression #f ()
             #:prop may-be-generated #f])

(Note that we cannot use both forms, as this will produce an error due to conflicting definitions of a property value for a given grammar production.)

If we add node types that are subtypes of Expression, they can be generated in Expression holes. Let’s add a node for literal integers.

(add-to-grammar
 arith
 [LiteralInt Expression ([v = (random 100)])])

Note that the literal node contains a child v that is a normal Racket value, not a grammar node type. It is initialized with the expression on the right-hand side of the = sign.

The literal integers generated in our language will only be values from 0 to 99. Note that we can add the initialization expression inline as above or with the fresh property. If we don’t add an initialization expression, then non-node fields will be initialized with #f, while node fields will be initialized with hole nodes of the appropriate type..

Let’s add addition expressions. Because we have multiple Expressions, we need to give them names. Note that we aren’t supplying initialization code. Because they are grammar node typed, they will be initialized with Expression Hole nodes (same with the Program node’s Expression child).

(add-to-grammar
 arith
 [Addition Expression ([l : Expression]
                       [r : Expression])])

Our language only has one type: integers. This means we don’t need to add type rules, but we will anyway for the sake of a complete tutorial. We will call the type int and add an implementation of the type-info property for our language so far.

(define int (base-type 'int))
(add-property
 arith
 type-info
 [Program [int (λ (n t) (hash 'Expression int))]]
 [LiteralInt [int (λ (n t) (hash))]]
 [Addition [int (λ (n t) (hash 'l int 'r int))]])

Each rule supplied to the property is surrounded in square brackets, [].

The left-hand side of each of these rules is an expression that returns the type the node can inhabit. Most often, this will just be the name of the node type, such as Program and LiteralInt in this example. If a node can inhabit multiple types, fresh-type-variable can be used to specify an unconstrained or partially constrained type.

On the right-hand side is a function with two parameters: the node currently being evaluated, and the type that is currently being used to confine generation of that node. The body of the function returns a dictionary mapping children (by name or by node object) to a type. Note that type variables are unified during the type analysis, so you should take care with object/pointer equality (i.e. eq?) of type variables.

Now we need to specify how to print our programs, or “render” them. For this, we use the render-node-info property.

(add-property
 arith
 render-node-info
 [Program
  (λ (n) (att-value 'xsmith_render-node (ast-child 'Expression n)))]
 [LiteralInt
  (λ (n) (number->string (ast-child 'v n)))]
 [Addition
  (λ (n) (format "(~a + ~a)"
                 (att-value 'xsmith_render-node (ast-child 'l n))
                 (att-value 'xsmith_render-node (ast-child 'r n))))])

In this case, we are rendering each node directly to a string, but that’s not usually the best approach. More often, we render programs with some intermediate data structure and convert that to a string only as a last step. Most Xsmith fuzzers either use s-expressions for rendering Lisp-like languages, or else the pprint Racket library’s document objects. You can, of course, use your own custom implementation if you prefer.

We put everything together with the define-xsmith-interface-functions macro, which compiles our language specification and defines the arith-command-line function.

(define-xsmith-interface-functions
  [arith]
  #:comment-wrap (λ (lines)
                   (string-join
                    (map (λ (x) (format "// ~a" x)) lines)
                    "\n")))

Note that we give it a function for how it wraps comments. Specifically it is a function that takes a list of single-line strings and returns a single string that’s been appropriately formatted to be commented in your language. There are many optional arguments to the define-xsmith-interface-functions that changes the way it behaves.

To actually run our fuzzer, we need to run the arith-command-line function. Let’s put it in our main submodule. Then when we run our program from the command line it will generate a program. It automatically has --help support, and has various options, such as setting a seed, a max depth, etc.

(module+ main
  (arith-command-line))

2.7 Minimal Example🔗ℹ

Below is a complete generator of arithmetic expressions, following the implementation we just outlined in Getting Started. Note that we use #lang clotho instead of #lang racket, which allows us to capture, replay, and modify random choices during generation.

"minimal-example.rkt"

#lang clotho
(require xsmith
         racr
         racket/string)
 
(define-spec-component arith)
 
(add-to-grammar
 arith
 [Program #f (Expression)]
 [Expression #f ()
             #:prop may-be-generated #f]
 [LiteralInt Expression ([v = (random 100)])]
 [Addition Expression ([l : Expression]
                       [r : Expression])
           ;; The default weight is 10.
           #:prop choice-weight 20])
 
(define int (base-type 'int))
(add-property
 arith
 type-info
 [Program [int (λ (n t) (hash 'Expression int))]]
 [LiteralInt [int (λ (n t) (hash))]]
 [Addition [int (λ (n t) (hash 'l int 'r int))]])
 
(add-property
 arith
 render-node-info
 [Program
  (λ (n) (att-value 'xsmith_render-node (ast-child 'Expression n)))]
 [LiteralInt
  (λ (n) (number->string (ast-child 'v n)))]
 [Addition
  (λ (n) (format "(~a + ~a)"
                 (att-value 'xsmith_render-node (ast-child 'l n))
                 (att-value 'xsmith_render-node (ast-child 'r n))))])
 
;; This line defines `arith-command-line`.
(define-xsmith-interface-functions
  [arith]
  #:comment-wrap (λ (lines)
                   (string-join
                    (map (λ (x) (format "// ~a" x)) lines)
                    "\n")))
 
(module+ main (arith-command-line))
 

2.8 Another Small Example with Variables🔗ℹ

Here is a bigger example that contains variables and references to those variables.

Note that instead of a Program node, we use the LetStar node as the base node from which to begin generation. The purpose in having a specific top-level Program node is to be sure that the top-level node can have definitions lifted to it. The LetStar node in the following example satisfies this purpose.

This example also renders to s-expressions rather than directly to strings like the previous example. Because of this change, we have to give xsmith-command-line another optional argument, called #:format-render, specifying how we convert our rendered format into strings.

"minimal-example-with-variables.rkt"

#lang clotho
(require xsmith
         xsmith/app
         racr
         racket/pretty
         racket/string
         racket/port)
 
(define-spec-component arith)
 
(add-to-grammar
 arith
 [Definition #f (name type Expression)
   #:prop binder-info ()]
 [Expression #f ()
             #:prop may-be-generated #f]
 [LetStar Expression ([definitions : Definition *]
                      [sideEs : Expression * = (random 2)]
                      Expression)
          #:prop strict-child-order? #t]
 [VariableReference Expression (name)
                    #:prop reference-info (read)]
 [SetBangRet Expression (name Expression)
             #:prop reference-info (write)]
 [LiteralInt Expression ([v = (random 100)])]
 [Addition Expression ([es : Expression * = (+ 1 (random 5))])
           #:prop choice-weight 50])
 
 
(define int (base-type 'int))
(add-property
 arith
 type-info
 [Definition [(fresh-type-variable) (λ (n t) (hash 'Expression t))]]
 [LetStar [(fresh-type-variable)
           (λ (n t) (hash 'definitions (λ (cn) (fresh-type-variable))
                          'sideEs (λ (cn) (fresh-type-variable))
                          'Expression t))]]
 [LiteralInt [int (λ (n t) (hash))]]
 [VariableReference [(fresh-type-variable) (λ (n t) (hash))]]
 [SetBangRet [(fresh-type-variable) (λ (n t) (hash 'Expression t))]]
 [Addition [int (λ (n t) (hash 'es t))]])
 
(add-property
 arith
 render-node-info
 ;; Note that we've imported xsmith/app, so our #%app allows quoted
 ;; symbols to be used in function position as a shorthand for
 ;; using `att-value`.
 [LetStar
  (λ (n)
    `(let* (,@(map (λ (d)
                     `[,(string->symbol (ast-child 'name d))
                       ,($xsmith_render-node
                         (ast-child 'Expression d))])
                   (ast-children (ast-child 'definitions n))))
       ,@(map (λ (c) ($xsmith_render-node c))
              (ast-children (ast-child 'sideEs n)))
       ,($xsmith_render-node (ast-child 'Expression n))))]
 [LiteralInt (λ (n) (ast-child 'v n))]
 [VariableReference (λ (n) (string->symbol (ast-child 'name n)))]
 [SetBangRet (λ (n) `(begin (set! ,(string->symbol (ast-child 'name n))
                                  ,($xsmith_render-node
                                    (ast-child 'Expression n)))
                            ,(string->symbol (ast-child 'name n))))]
 [Addition (λ (n) `(+ ,@(map (λ (c) ($xsmith_render-node c))
                             (ast-children (ast-child 'es n)))))])
 
 
 
;; This line defines `arith-command-line`.
(define-xsmith-interface-functions
  [arith]
  #:program-node LetStar
  #:type-thunks (list (λ () int))
  #:comment-wrap (λ (lines)
                   (string-join
                    (map (λ (x) (format ";; ~a" x)) lines)
                    "\n"))
  #:format-render (λ (ast)
                    (with-output-to-string
                      (λ ()
                        (pretty-print ast (current-output-port) 1)))))
 
(module+ main (arith-command-line))
 

2.9 An Upgrade: Using Canned Components🔗ℹ

Many programming languages share various features in common, such as binding scope, arithmetic expressions, functions, etc. The xsmith/canned-components library in Xsmith provides tools for quickly building this common infrastructure for your language specification with minimal effort. It supplies the ability to enable various productions as needed, as well as default implementations of the type-info and fresh properties for those nodes. Here is an example of a fuzzer built using Canned Components.

"minimal-example-with-canned-components.rkt"

#lang clotho
 
(require
 xsmith
 racr
 xsmith/racr-convenience
 xsmith/canned-components
 racket/string
 racket/list
 racket/pretty)
 
;; We first define a basic component and add a bunch of expressions.
 
(define-basic-spec-component somelisp)
 
(add-basic-expressions somelisp
                       #:ProgramWithSequence #t
                       #:VoidExpression #t
                       #:AssignmentExpression #t
                       #:VariableReference #t
                       #:ProcedureApplication #t
                       #:IfExpression #t
                       #:ExpressionSequence #t
                       #:LetSequential #t
                       #:LambdaWithExpression #t
                       #:Numbers #t
                       #:Booleans #t
                       #:Strings #t
                       #:ImmutableList #t)
 
(add-loop-over-container
 somelisp
 #:name Map
 #:collection-type-constructor (λ (inner) (immutable (list-type inner)))
 #:loop-type-constructor (λ (inner) (immutable (list-type inner))))
 
 
;; Now we basically just add the render property unless we want to manually
;; define more.
 
 
(define (render-child sym n)
  (att-value 'xsmith_render-node (ast-child sym n)))
(define (render-children sym n)
  (map (λ (cn) (att-value 'xsmith_render-node cn))
       (ast-children (ast-child sym n))))
(define ((binary-op-renderer op) n)
  `(,op ,(render-child 'l n) ,(render-child 'r n)))
 
 
(add-property
 somelisp
 render-node-info
 
 [ProgramWithSequence
  (λ (n)
    ;; Unless our language has a well-defined exception interface that
    ;; we are trying to fuzz, we need to provide “safe” versions of
    ;; functions that aren't total.
    ;; Canned components uses safe functions and requires us to provide
    ;; definitions.
    ;; Here we provide some before the generated program.
    `((define (safe-divide numerator divisor)
        (if (equal? 0 divisor)
            0
            (/ numerator divisor)))
      (define (safe-car list fallback)
        (if (null? list)
            fallback
            (car list)))
      (define (safe-cdr list fallback)
        (if (null? list)
            fallback
            (cdr list)))
      ,@(render-children 'definitions n)
      (define program-result ,(render-child 'ExpressionSequence n))
 
      ;; Let's add code to print the values of generated global variables.
      (begin
        ,(if (base-type? (att-value 'xsmith_type
                                    (ast-child 'ExpressionSequence n)))
             '(printf "Program body result: ~v\n" program-result)
             '(void))
        ,@(for/list ([c (ast-children (ast-child 'definitions n))]
                     #:when (base-type? (concretize-type
                                         (att-value 'xsmith_type c))))
            `(printf "Variable ~a value: ~v\n"
                     ',(string->symbol (ast-child 'name c))
                     ,(string->symbol (ast-child 'name c)))))))]
 
 [Definition (λ (n)
               `(define ,(string->symbol (ast-child 'name n))
                  ,(render-child 'Expression n)))]
 [AssignmentExpression
  (λ (n) `(set! ,(string->symbol (ast-child 'name n))
                ,(render-child 'newvalue n)))]
 [ExpressionSequence
  (λ (n)
    `(begin
       ,@(render-children 'effectexpressions n)
       ,(render-child 'finalexpression n)))]
 [LetSequential
  (λ (n)
    `(let* (,@(map (λ (dn) `[,(string->symbol (ast-child 'name dn))
                             ,(render-child 'Expression dn)])
                   (ast-children (ast-child 'definitions n))))
       ,(render-child 'body n)))]
 
 [IfExpression
  (λ (n)
    `(if ,(render-child 'test n)
         ,(render-child 'then n)
         ,(render-child 'else n)))]
 
 [VariableReference (λ (n) (string->symbol (ast-child 'name n)))]
 
 [ProcedureApplication
  (λ (n)
    `(,(render-child 'procedure n)
      ,@(render-children 'arguments n)))]
 [FormalParameter (λ (n) (string->symbol (ast-child 'name n)))]
 [LambdaWithExpression
  (λ (n)
    `(lambda (,@(render-children 'parameters n))
       ,(render-child 'body n)))]
 
 [BoolLiteral (λ (n) (not (not (ast-child 'v n))))]
 [Not (λ (n) `(not ,(render-child 'Expression n)))]
 [And (binary-op-renderer 'and)]
 [Or (binary-op-renderer 'or)]
 
 [IntLiteral (λ (n) (ast-child 'v n))]
 [Plus (binary-op-renderer '+)]
 [Minus (binary-op-renderer '-)]
 [Times (binary-op-renderer '*)]
 [LessThan (binary-op-renderer '<)]
 [GreaterThan (binary-op-renderer '>)]
 [SafeDivide (binary-op-renderer 'safe-divide)]
 
 [StringLiteral (λ (n) (ast-child 'v n))]
 [StringAppend (binary-op-renderer 'string-append)]
 [StringLength (λ (n) `(string-length ,(render-child 'Expression n)))]
 
 [ImmutableListLiteral
  (λ (n) `(list ,@(render-children 'expressions n)))]
 [ImmutableListSafeCar
  (λ (n) `(safe-car ,(render-child 'list n)
                    ,(render-child 'fallback n)))]
 [ImmutableListSafeCdr
  (λ (n) `(safe-cdr ,(render-child 'list n)
                    ,(render-child 'fallback n)))]
 [ImmutableListCons
  (λ (n) `(cons ,(render-child 'newvalue n)
                ,(render-child 'list n)))]
 [VoidExpression (λ (n) '(void))]
 [Map (λ (n) `(map (λ (,(ast-child 'name (ast-child 'elemname n)))
                     ,(render-child 'body n))
                   ,(render-child 'collection n)))])
 
 
 
;; Sometimes we have a type variable that is not concrete but needs to be.
;; Here we provide a list of options we can pick for an unconstrained
;; type variable.
(define (type-thunks-for-concretization)
  (list
   (λ()int-type)
   (λ()bool-type)
   (λ()string-type)
   (λ()(immutable (list-type (fresh-type-variable))))))
 
(define (somelisp-format-render s-exps)
  (define out (open-output-string))
  (for ([symex s-exps])
    (pretty-print symex out 1))
  (get-output-string out))
 
 
(define-xsmith-interface-functions
  [somelisp]
  #:program-node ProgramWithSequence
  #:type-thunks type-thunks-for-concretization
  #:comment-wrap (λ (lines)
                   (string-join
                    (map (λ (l) (format ";; ~a" l)) lines)
                    "\n"))
  #:format-render somelisp-format-render)
 
(module+ main (somelisp-command-line))