String Searching Algorithms
1 Searching for single strings
1.1 Knuth-Morris-Pratt
kmp-string-contains
kmp-string-contains-ci
kmp-make-matcher
kmp-matcher?
kmp-matcher-ci?
kmp-matcher-pattern
kmp-find-string
kmp-find-all-strings
1.2 Boyer-Moore-Horspool
1.2.1 Strings
bmh-string-contains
bmh-string-contains-ci
bmh-make-matcher
bmh-matcher?
bmh-matcher-ci?
bmh-matcher-pattern
bmh-find-string
bmh-find-all-strings
1.2.2 Byte Strings
bmh-byte-string-contains
bmh-make-byte-matcher
bmh-byte-matcher?
bmh-byte-matcher-pattern
bmh-find-byte-string
bmh-find-all-byte-strings
2 Searching for multiple strings
2.1 Aho-Corasick
ahoc-make-matcher
list->ahoc-matcher
ahoc-matcher?
ahoc-matcher-patterns
ahoc-find-string
ahoc-find-allstrings
8.12

String Searching Algorithms🔗ℹ

    1 Searching for single strings

    2 Searching for multiple strings

 (require string-searchers) package: string-searchers

Provides a variety of string search algorithms written in Typed Racket. They look for sequences of exact code points or bytes, not equivalencies. When using with non-ASCII text, consider normalizing strings first.

1 Searching for single strings🔗ℹ

Functions for searching for a single substring in strings.

1.1 Knuth-Morris-Pratt🔗ℹ

Search for strings using the Knuth-Morris-Pratt algorithm. These functions are also available in the string-searchers/kmp module, without the kmp- prefix.

procedure

(kmp-string-contains haystack    
  needle    
  [start    
  end])  (Option Index)
  haystack : String
  needle : String
  start : Index = 0
  end : Index = (string-length haystack)
Returns the first index of haystack where needle occurs, or #f if not found. If searching for the same substring many times, prefer compiling a matcher object and using the following functions instead.

procedure

(kmp-string-contains-ci haystack    
  needle    
  [start    
  end])  (Option Index)
  haystack : String
  needle : String
  start : Index = 0
  end : Index = (string-length haystack)
Returns the first index of haystack where needle occurs, ignoring case, or #f if not found. If searching for the same substring many times, prefer compiling a matcher object and using the following functions instead.

procedure

(kmp-make-matcher pattern 
  [#:case-insensitive case-insensitive?]) 
  kmp-matcher
  pattern : String
  case-insensitive? : Boolean = #f
Compiles a search string into a matcher object, optionally using case-insensitive searching.

procedure

(kmp-matcher? obj)  Boolean

  obj : Any
Returns true if its argument is a kmp-matcher object.

procedure

(kmp-matcher-ci? m)  Boolean

  m : kmp-matcher
Tests if a matcher is case-insensitive or not

procedure

(kmp-matcher-pattern m)  String

  m : kmp-matcher
Returns the search string for this matcher.

procedure

(kmp-find-string m text [start end])  (Option Index)

  m : kmp-matcher
  text : String
  start : Index = 0
  end : Index = (string-length text)
Return the index of the first occurance of the matcher’s search string in text, or #f if not found.

procedure

(kmp-find-all-strings m    
  text    
  [start    
  end    
  #:overlap overlap?])  (Listof Index)
  m : kmp-matcher
  text : String
  start : Index = 0
  end : Index = (string-length text)
  overlap? : Boolean = #t
Return the indexs of all occurances of the matcher’s search string in text, or an empty list if not found. If the overlap option is true, found matches can overlap - searching for "bb" in "bbb" will return '(0 1) when true, (0) when false.

1.2 Boyer-Moore-Horspool🔗ℹ

Search for strings using the Boyer-Moore-Horspool algorithm. These functions are also available in the string-searchers/bmh module, without the bmh- prefix.

1.2.1 Strings🔗ℹ

procedure

(bmh-string-contains haystack    
  needle    
  [start    
  end])  (Option Index)
  haystack : String
  needle : String
  start : Index = 0
  end : Index = (string-length haystack)
Returns the first index of haystack where needle occurs, or #f if not found. If searching for the same substring many times, prefer compiling a matcher object and using the following functions instead.

procedure

(bmh-string-contains-ci haystack    
  needle    
  [start    
  end])  (Option Index)
  haystack : String
  needle : String
  start : Index = 0
  end : Index = (string-length haystack)
Returns the first index of haystack where needle occurs, ignoring case, or #f if not found. If searching for the same substring many times, prefer compiling a matcher object and using the following functions instead.

procedure

(bmh-make-matcher pattern 
  [#:case-insensitive case-insensitive?]) 
  bmh-matcher
  pattern : String
  case-insensitive? : Boolean = #f
Compiles a search string into a matcher object, optionally using case-insensitive searching.

procedure

(bmh-matcher? obj)  Boolean

  obj : Any
Returns true if its argument is a bmh-matcher object.

procedure

(bmh-matcher-ci? m)  Boolean

  m : bmh-matcher
Tests if a matcher is case-insensitive or not

procedure

(bmh-matcher-pattern m)  String

  m : bmh-matcher
Returns the search string for this matcher.

procedure

(bmh-find-string m text [start end])  (Option Index)

  m : bmh-matcher
  text : String
  start : Index = 0
  end : Index = (string-length text)
Return the index of the first occurance of the matcher’s search string in text, or #f if not found.

procedure

(bmh-find-all-strings m    
  text    
  [start    
  end    
  #:overlap overlap?])  (Listof Index)
  m : bmh-matcher
  text : String
  start : Index = 0
  end : Index = (string-length text)
  overlap? : Boolean = #t
Return the indexes of all occurances of the matcher’s search string in text, or an empty list if not found. If the overlap option is true, found matches can overlap - searching for "bb" in "bbb" will return '(0 1) when true, '(0) when false.

1.2.2 Byte Strings🔗ℹ

Note: There are no case-insensitive routines for byte strings.

procedure

(bmh-byte-string-contains haystack    
  needle    
  [start    
  end])  (Option Index)
  haystack : Bytes
  needle : Bytes
  start : Index = 0
  end : Index = (bytes-length haystack)
Returns the first index of haystack where needle occurs, or #f if not found. If searching for the same substring many times, prefer compiling a matcher object and using the following functions instead.

procedure

(bmh-make-byte-matcher pattern)  bmh-byte-matcher

  pattern : Bytes
Return a new matcher that searches for the given byte string.

procedure

(bmh-byte-matcher? obj)  Boolean

  obj : Any
Returns true if its argument is a bmh-byte-matcher object.

procedure

(bmh-byte-matcher-pattern m)  Bytes

  m : bmh-byte-matcher
Returns the search byte string for this matcher.

procedure

(bmh-find-byte-string m text [start end])  (Option Index)

  m : bmh-byte-matcher
  text : Bytes
  start : Index = 0
  end : Index = (bytes-length text)
Return the index of the first occurance of the matcher’s search byte string in text, or #f if not found.

procedure

(bmh-find-all-byte-strings m    
  text    
  [start    
  end    
  #:overlap overlap?])  (Listof Index)
  m : bmh-byte-matcher
  text : Bytes
  start : Index = 0
  end : Index = (bytes-length text)
  overlap? : Boolean = #t
Return the indexes of all occurances of the matcher’s search byte string in text, or an empty list if not found. If the overlap option is true, found matches can overlap - searching for #"bb" in #"bbb" will return '(0 1) when true, '(0) when false.

2 Searching for multiple strings🔗ℹ

Sections for searching for any of multiple different substrings in a string.

2.1 Aho-Corasick🔗ℹ

Search for strings using the Aho-Corasick algorithm. These functions are also available in the string-searchers/ahoc module, without the ahoc- prefix.

procedure

(ahoc-make-matcher s ...)  ahoc-matcher

  s : String
Create a new Aho-Corasick matcher object that looks for the given string(s).

procedure

(list->ahoc-matcher strings)  ahoc-matcher

  strings : (Listof String)
Create a new Aho-Corasick matcher object that looks for the string(s) in the given list.

procedure

(ahoc-matcher? obj)  Boolean

  obj : Any
Tests if its argument is an Aho-Corasick matcher or not.

procedure

(ahoc-matcher-patterns m)  (Listof String)

  m : ahoc-matcher
Return the search strings this matcher looks for.

procedure

(ahoc-find-string m text)  (Option (Pair Index String))

  m : ahoc-matcher
  text : String
Returns the first index of a matched string and which string it is, or #f if there are no matches.

procedure

(ahoc-find-allstrings m text)  (List (Pair Index String))

  m : ahoc-matcher
  text : String
Returns the locations of all matched strings, or an empty list if none match. If one of the search strings is a prefix of another, multiple results can have the same index.