Open  Fst:   Racket Bindings
1 Getting Started
1.1 Example:   Thousand Separators
1.2 Example:   Leading Zeros
2 Abstract Automata Manipulation
fst?
fst-like?
fst-accept
fst-cross
fst-closure
fst-union
fst-compose
fst-concat
fst-project
fst-shortest-path
fst->string
fst-inverse
fst-reverse
fst-optimize
fst-difference
fst-sane?
2.1 File I/  O
fst-save
fst-load
2.2 Derived Utilities
fst-add-weight
fst-insert
fst-delete
fst-join
fst-rewrite
3 Direct Automata Access
make-fst
fst-add-state!
fst-add-arc!
fst-set-start!
fst-set-final!
fst-num-states
fst-num-arcs
fst-start
fst-final
fst-states
fst-arcs
3.1 Transition Arcs
arc?
label?
arc
arc-ilabel
arc-olabel
arc-weight
arc-next-state
4 Limitations
Bibliography
8.12

OpenFst: Racket Bindings🔗ℹ

Alex MacLean <alex@alex-maclean.com>

 (require openfst) package: openfst

This package provides Racket bindings for OpenFst, "a library for constructing, combining, optimizing, and searching weighted finite-state transducers (FSTs)" [openfst]. Extensions to OpenFst based on the Python package pynini are also included [pynini].

1 Getting Started🔗ℹ

A weighted finite-state transducers (FSTs) is a type of automata that consists of:

This library provides a high- and low-level interface for interacting with FSTs. The high-level abstract functional interface contains functions over FSTs such as fst-union and fst-compose. The low-level direct interface allows for inspection and stateful mutation of the structure of an FST.

1.1 Example: Thousand Separators🔗ℹ

To better understand what FSTs are and how OpenFST enables their use, we’ll perform some simple formatting tasks on natural numbers. First consider the task of adding commas after every third digit as thousand separators, that is:

"7577130400""7,577,130,400"

To do this, we first define an FST that accepts any single digit called digit. Next, we take the closure over this FST with a lower and upper bound of 3. The resulting FST, digits3, will accept only strings of exactly 3 digits and rewrite them unchanged.

(define digit (fst-union "0" "1" "2" "3" "4" "5" "6" "7" "8" "9"))
(define digits3 (fst-closure digit #:lower 3 #:upper 3))

Next we add a comma inserting FST before digits3 and take the Kleene * of this to get digits3*. The resulting FST will accept a string of digits with length divisible by 3 and insert a comma before every 3rd digit. Finally we concatenate this FST with an FST taking 1, 2, or 3 digits representing the leading digit of the number, to produce add-commas-fst.

(define digits3*
  (fst-closure (fst-concat (fst-cross "" ",") digits3)))
(define add-commas-fst
  (fst-concat (fst-closure digit #:lower 1 #:upper 3)
              digits3*))

To actually apply this FST to a string we use fst-compose to implicitly convert the string to an FST itself and then compose that FST with the one we build for thousand separation. Finally, we convert the resulting FST back into a string. For the purposes of this example we do this explicitly, but fst-rewrite wraps this functionality up nicely.

(define (add-commas number-string)
  (fst->string
   (fst-compose number-string add-commas-fst)))

Examples:
> (add-commas "7577130400")

"7,577,130,400"

> (add-commas "81")

"81"

> (add-commas "100000")

"100,000"

1.2 Example: Leading Zeros🔗ℹ

To see how weights can be applied to FSTs, consider the problem of removing leading zeros from a number. While this task can be accomplished with an unweighted-FST, introducing weights simplifies the solution.

"007""7"

The leading-zero removing transducer is defined as the concatenation of an FST that removes zero or more "0"s and an FST that accepts one or more digits. While the second FST could process the leading "0"s without removing them, this path will have more weight associated with it.

(define leading-0s
  (fst-concat (fst-closure (fst-cross "0" ""))
              (fst-closure (fst-add-weight digit 1.0) #:lower 1)))

Note that this FST can be composed with the thousand separator from the previous problem into a single new FST. When applying this FST, fst-shortest-path must be used since now multiple computations exist for some strings.

(define cleanup-fst (fst-compose leading-0s add-commas-fst))
 
(define (cleanup-number number-string)
  (fst->string
   (fst-shortest-path
    (fst-compose number-string cleanup-fst))))

Examples:
> (cleanup-number "07577130400")

"7,577,130,400"

> (cleanup-number "00000000")

"0"

> (cleanup-number "100000")

"100,000"

2 Abstract Automata Manipulation🔗ℹ

procedure

(fst? v)  boolean?

  v : any/c
Returns #true if the given v is a finite-state transducer.

procedure

(fst-like? v)  boolean?

  v : any/c
Returns (or (string? v) (fst? v)). Nearly all of the following functions accept FST-like arguments and convert strings to FSTs with fst-accept.

procedure

(fst-accept str [#:weight weight])  fst?

  str : string?
  weight : real? = 0
Constructs a new FST that accepts the given str at the given weight. This FST takes the form of a chain of states each connected to the next by a single arc with weight 0 and an input and output label corresponding to a character in the input string. The final state has a weight of weight.

procedure

(fst-cross fst1 fst2)  fst?

  fst1 : fst-like?
  fst2 : fst-like?
Creates a new FST that accepts input strings from the language of fst1 and produces output strings in the language of fst2.

Example:
> (fst-rewrite (fst-cross "hello" "world") "hello")

"world"

procedure

(fst-closure fst    
  [#:lower lower    
  #:upper upper])  fst?
  fst : fst-like?
  lower : exact-nonnegative-integer? = 0
  upper : (or/c #f exact-positive-integer?) = #f
Produces an FST that accepts strings in the language of fst repeated between lower and upper times. If upper is #false, then strings may be repeated an infinite number of times.

Examples:
> (define f (fst-closure (fst-cross "A" "B") #:lower 2))
> (fst-rewrite f "AAAAA")

"BBBBB"

> (fst-rewrite f "AA")

"BB"

> (fst-rewrite f "A")

#f

procedure

(fst-union fst ...+)  fst?

  fst : fst-like?
Create a new FST representing the union (or sum) of the given FSTs. The resulting FSTs will transduce all the strings in the input FSTs to all the corresponding output strings.

procedure

(fst-compose fst ...+)  fst?

  fst : fst-like?
Create a new FST that has the effect of applying each of the given FSTs in order. This operation is also useful because it can be used to rewrite a string represented as an acceptor FST.

Examples:
> (define more+s (fst-closure (fst-cross "+" "++")))
> (fst->string (fst-compose "+" more+s))

"++"

> (fst->string (fst-compose "+" more+s more+s more+s))

"++++++++"

procedure

(fst-concat fst ...+)  fst?

  fst : fst-like?
Construct a new FST that transduces strings that are a concatenation of the strings of the input FSTs.

Examples:
> (define f (fst-concat (fst-cross "a" "b") (fst-cross "x" "y")))
> (fst-rewrite f "ax")

"by"

> (fst-rewrite f "a")

#f

procedure

(fst-project fst type)  fst?

  fst : fst-like?
  type : (or/c 'input 'output)
Given a transducer fst, produces an acceptor that accepts either the input or the output language. This is accomplished by replacing the in-labels of every arc with the out-labels or vice versa.

Examples:
> (define A->B (fst-cross "A" "B"))
> (fst->string (fst-project A->B 'input))

"A"

> (fst->string (fst-project A->B 'output))

"B"

procedure

(fst-shortest-path fst [n])  fst?

  fst : fst-like?
  n : exact-positive-integer? = 1
Constructs a new FST representing the n shortest paths through the given FST.

procedure

(fst->string fst)  string?

  fst : fst-like?
If there is only one possible path through the given FST, produces a string by walking this path and adding all output-labels to a string. If there are multiple paths an error is printed and "" is returned.

procedure

(fst-inverse fst)  fst?

  fst : fst-like?
Creates a new FST that has opposite input and output languages as the given fst. This is accomplished by switching the input and output labels on every arc.

Examples:
> (define f (fst-inverse (fst-cross "hello" "world")))
> (fst-rewrite f "world")

"hello"

procedure

(fst-reverse fst)  fst?

  fst : fst-like?
Creates a new FST that accepts the reverse of the strings in the given FST and produces the reverse of the products.

Examples:
> (define f (fst-reverse (fst-cross "hello" "world")))
> (fst-rewrite f "olleh")

"dlrow"

procedure

(fst-optimize fst)  fst?

  fst : fst-like?
Creates a new FST with the same effect as the provided FST but possibly with fewer states and arcs.

procedure

(fst-difference fst1 fst2)  fst?

  fst1 : fst-like?
  fst2 : fst-like?
Create an FST acceptor that accepts all strings accepted by fst1, except those accepted by fst2. Both input FSTs must be acceptors.

Examples:
> (define f (fst-difference (fst-union "a" "b" "c") "c"))
> (fst-rewrite f "a")

"a"

> (fst-rewrite f "c")

#f

procedure

(fst-sane? fst)  boolean?

  fst : fst-like?
Performs a sanity check on the given FST returning #true if all arcs point to valid states and there is a start state for non-empty FSTs.

2.1 File I/O🔗ℹ

Because of the limiations of the interfaces between Racket, C, and C++ these functions work by creating a named temporary file and passing the name of this file to the underlying C++ implementation. This should not be apparent to users of the library, but this approach has not been fully tested on all systems.

procedure

(fst-save fst port)  void?

  fst : fst-like?
  port : output-port?
Write a binary reperesentation to the given port.

procedure

(fst-load port)  fst?

  port : input-port?
Read an FST written by fst-save returning the FST.

2.2 Derived Utilities🔗ℹ

 (require openfst/utils) package: openfst

procedure

(fst-add-weight fst weight)  fst?

  fst : fst-like?
  weight : real?
Creates a new FST equivalent to fst, except that every path through the FST has weight added to it.

This is accomplished by adding an ε transition to the start of fst with the given weight.

procedure

(fst-insert fst [#:weight weight])  fst?

  fst : fst-like?
  weight : real? = 0
Create a new FST that accepts "" and produces the language of the given fst. This function is nearly equivalent to (fst-cross "" fst). If a weight is provided this weight is applied to the resulting FST.

This function has the effect of replacing the input-label of every arc in the input FST with ε.

procedure

(fst-delete fst [#:weight weight])  fst?

  fst : fst-like?
  weight : real? = 0
Create a new FST that accepts the language of the given fst and produces "". This function is nearly equivalent to (fst-cross fst ""). If a weight is provided this weight is applied to the resulting FST.

This function has the effect of replacing the output-label of every arc in the input FST with ε.

procedure

(fst-join exp sep)  fst?

  exp : fst-like?
  sep : fst-like?
Creates a new FST that is equivalent to (<exp> (<sep> <exp>)*)

procedure

(fst-rewrite fst input)  (or/c string #f)

  fst : fst-like?
  input : string?
Rewrites the given input string with the given FST. If the FST cannot accept input then #f is returned, otherwise the rewritten string is returned. If the FST can accept the input string this is equivalent to:

3 Direct Automata Access🔗ℹ

procedure

(make-fst state-info ...)  fst?

  state-info : 
(cons/c
 (or/c symbol? (list/c symbol? real?))
 (listof (list/c label? label? real? symbol?)))
Creates a new finite state transducer given a series of s-expressions representing states. The first element in each list is either a symbol used to refer to the state or a list containing the symbol and a final weight to be assigned to this state. Without a weight, states receive a weight of +inf.0. The elements in the list after this are quadruples of arguments passed to arc. The first element in the list is assigned the role of start state. Note that the symbols used for the states do not persist beyond construction of the FST.

Example:
(make-fst
 '(a (#\A #\B 1 a)
     (#\B #\C 1 b))
 '((b 0) (#\A #\B 1 a)))

procedure

(fst-add-state! fst)  exact-nonnegative-integer?

  fst : fst?
Mutates the given fst adding a new state and returns that state’s id. Generally states start at 0 and increase sequentially, but probably not a good idea to count on this.

procedure

(fst-add-arc! fst state arc)  void?

  fst : fst?
  state : exact-nonnegative-integer?
  arc : arc?
Mutates the given fst adding a transition arc initiating at state.

procedure

(fst-set-start! fst state)  void?

  fst : fst?
  state : exact-nonnegative-integer?
Mutates the given fst re-assigning the start state to the given state. The provided state must already be in the FST. The state that was previously the start state will no longer be.

procedure

(fst-set-final! fst state weight)  void?

  fst : fst?
  state : exact-nonnegative-integer?
  weight : real?
Mutates the given fst, setting the final weight for state. This value represents the weight that will be added to a path that ends at this state. A value of +inf.0 is the default weight for all states and indicates that a path may not end at this state (a non-final state).

procedure

(fst-num-states fst)  exact-nonnegative-integer?

  fst : fst?
Returns the total number of states in fst. This is equivalent to (length (fst-states fst)), though somewhat more efficient.

procedure

(fst-num-arcs fst state)  exact-nonnegative-integer?

  fst : fst?
  state : exact-nonnegative-integer?
Returns the total number of arcs coming from state in fst. This is equivalent to (length (fst-arcs fst state)), though somewhat more efficient.

procedure

(fst-start fst)  (or/c #f exact-nonnegative-integer?)

  fst : fst?
Returns the state id of the start state for the given fst. If the FST does not have a start state, either because it is empty or invalid, returns #false.

procedure

(fst-final fst state)  real?

  fst : fst?
  state : exact-nonnegative-integer?
Returns the final weight associated with the given state. This represents the weight that will be added to a path that ends on this state. Final states will have finite weights while non-final states have a weight of +inf.0.

procedure

(fst-states fst)  (listof exact-nonnegative-integer?)

  fst : fst?
Returns a list of the state ids present in fst.

procedure

(fst-arcs fst state)  (listof arc?)

  fst : fst?
  state : exact-nonnegative-integer?
Returns a list of the arcs originating from state in fst.

3.1 Transition Arcs🔗ℹ

Arcs represent transitions between states in an finite-state transducer. Each arc consists of a in-label, an out-label, a weight, and a next state. Arcs are added to the automata at the states from which they originate.

As an implementation note, arc objects are in fact pointers to underlying C++ objects in foreign memory. From the perspective of the user, however they conform to the struct interface.

procedure

(arc? v)  boolean?

  v : any/c
Returns #true if the given v is a finite-state transducer arc.

procedure

(label? v)  boolean?

  v : any/c
Returns #true if the given v is suitable for use as an input or output label for an arc. Equivalent to (or (exact-nonnegative-integer? v) (char? v)).

procedure

(arc ilabel olable weight next-state)  arc?

  ilabel : label?
  olable : label?
  weight : real?
  next-state : exact-nonnegative-integer?
Create a new arc object.

procedure

(arc-ilabel arc)  exact-nonnegative-integer?

  arc : arc?

procedure

(arc-olabel arc)  exact-nonnegative-integer?

  arc : arc?

procedure

(arc-weight arc)  real?

  arc : arc?

procedure

(arc-next-state arc)  exact-nonnegative-integer?

  arc : arc?
Accessor methods for an arc object. Note that even when arcs are constructed with character labels these values are represented internally as integers so arc-ilabel and arc-olabel will return the corresponding integer.

4 Limitations🔗ℹ

This library currently only supports x86_64 architecture. Getting it working on different hardware should be as simple as running the build script corresponding to the OS in the lib/ directory of the GitHub Repo, but I haven’t tested this...

While it should not be visible to the consumers of this library, the Windows implementation of this library is using an unofficial port of OpenFST [winfst] because the official OpenFST distribution does not support Windows.

Bibliography🔗ℹ

[pynini] K. Gorman, “Pynini: A Python library for weighted finite-state grammar compilation,” Proc. ACL Workshop on Statistical NLP and Weighted Automata, 75-80, 2016. https://pypi.org/project/pynini/
[openfst] Cyril Allauzen, M. Riley, J. Schalkwyk, Wojciech Skut, Mehryar Mohri, “OpenFst: A General and Efficient Weighted Finite-State Transducer Library.” 16 July 2007. https://www.openfst.org
[winfst] “OpenFST port for Windows.” https://github.com/kkm000/openfst