udelim

8.12

udelim

package: udelim

1 Stability

Don’t consider this library stable yet, particularly the udelim metalanguage: I’m not sure which extra parens would make a good default (e.g. which ones are distinctive enough, have good font support, or are actually wanted by people). I am definitely keen on «» as nestable string delimiters that aren’t wrapped in anything (I would like to always have a nestable string delimiter that just gives me a string). I think I would also like one nestable string delimiter that is wrapped with some #%symbol, and several paren types that are wrapped as well.

2 Guide

This is a library I wrote primarily to help with nestable embedding of different syntaxes in #lang rash, but it is generally useful for adding extra types of parentheses or string delimiters to a language. After watching Jay McCarthy’s talk at the sixth RacketCon, I also decided to steal his idea of making different types of parentheses wrap their contents with an additional #%symbol.

You can use the udelim meta-language (e.g. #lang udelim racket/base) to essentially wrap your language’s readtable with udelimify.
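As a minimal sketch of what that looks like in practice, assuming the default delimiters listed under udelimify in the reference (which are flagged as unstable there), a module could use «» for a string literal that needs no escaping. The name greeting is just for illustration.

```racket
#lang udelim racket/base

;; «» reads as a plain string: the inner double quotes need no escaping,
;; and the nested «» pair is kept verbatim as part of the string.
(define greeting «She said "hi" and then «left».»)

(displayln greeting)
```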

3 Reference

procedure
(make-list-delim-readtable l-paren
                           r-paren
                           [#:base-readtable base-readtable
                            #:as-dispatch-macro? as-dispatch-macro?
                            #:wrapper wrapper
                            #:inside-readtable inside-readtable])
 → readtable?
  l-paren : char?
  r-paren : char?
  base-readtable : readtable? = #f
  as-dispatch-macro? : any/c = #f
  wrapper : (or/c false/c symbol? procedure?) = #f
  inside-readtable : (or/c false/c readtable? (-> (or/c false/c readtable?)) 'inherit) = 'inherit

Returns a new readtable based on base-readtable that treats l-paren and r-paren like parentheses, i.e. their contents read as a list. If wrapper is a symbol, it is placed at the head of the list. If wrapper is a function, it is applied to the syntax object that results from reading (the argument is a syntax object whether read or read-syntax is used; the result of read is created by calling syntax->datum on the result of read-syntax).

Examples:

> (require udelim)
> (parameterize ([current-readtable (make-list-delim-readtable #\⟦ #\⟧)])
    (read
     (open-input-string "(a b ⟦c d e ⟦f g⟧ h i⟧ j k)")))
'(a b (c d e (f g) h i) j k)
> (parameterize ([current-readtable
                 (make-list-delim-readtable #\⟦ #\⟧ #:wrapper '#%white-brackets)])
    (read
     (open-input-string "(a b ⟦c d e ⟦f g⟧ h i⟧ j k)")))
'(a b (#%white-brackets c d e (#%white-brackets f g) h i) j k)
> (parameterize ([current-readtable
                 (make-list-delim-readtable #\⟦ #\⟧
                 #:wrapper '#%white-brackets
                 #:inside-readtable (make-list-delim-readtable #\⟦ #\⟧))])
    (read
     (open-input-string "(a b ⟦c d e ⟦f g⟧ h i⟧ j k)")))
'(a b (#%white-brackets c d e (f g) h i) j k)
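The examples above only pass a symbol as wrapper. As a rough sketch of the procedure case, the wrapper below receives the syntax object for the whole list and returns a new syntax object with a tag and the element count prepended. The names tag-with-count, counting-readtable and #%counted are made up for this illustration, and returning a syntax object (rather than a plain datum) is an assumption about what the library expects back.

```racket
#lang racket/base
(require udelim)

;; Hypothetical wrapper: takes the syntax object for the bracketed list and
;; returns a syntax object with '#%counted and the element count prepended.
(define (tag-with-count stx)
  (define elems (syntax->list stx))
  (datum->syntax stx
                 (list* '#%counted (length elems) elems)
                 stx))

(define counting-readtable
  (make-list-delim-readtable #\⟦ #\⟧ #:wrapper tag-with-count))

;; Read a list containing one bracketed sub-list through the wrapper.
(parameterize ([current-readtable counting-readtable])
  (read (open-input-string "(a ⟦b c d⟧ e)")))
```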

The inside-readtable argument determines which readtable is used to read the elements of the list. If inside-readtable is the symbol 'inherit, then the readtable being modified is used. Beware that this is likely not what you want! If you call make-list-delim-readtable multiple times to add multiple delimiters, you probably want the inner readtable to be the final readtable after all modifications are made (e.g. by providing a function that returns the finalized readtable).
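One way to provide such a function is a thunk that refers back to the final table. The sketch below layers two delimiter pairs and points both layers at the finished readtable. It assumes the thunk is only called while reading, i.e. after the definition has completed; if it were called during construction, this would fail with a use-before-initialization error. The name final-readtable is just for illustration.

```racket
#lang racket/base
(require udelim)

;; Assumption: the #:inside-readtable thunk is called lazily at read time,
;; so it may refer to final-readtable before the definition has finished.
(define final-readtable
  (make-list-delim-readtable
   #\⟦ #\⟧
   #:wrapper '#%white-brackets
   #:inside-readtable (λ () final-readtable)
   #:base-readtable
   (make-list-delim-readtable
    #\⦓ #\⦔
    #:wrapper '#%inequality-brackets
    #:inside-readtable (λ () final-readtable))))

;; Both delimiter pairs should be usable at any nesting depth.
(parameterize ([current-readtable final-readtable])
  (read (open-input-string "(a ⟦b ⦓c d⦔ e⟧ f)")))
```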

Be careful with inside-readtable: switching the readtable inside a set of parentheses can produce unexpected errors. Specifically, if the inside-readtable does not treat the delimiters you are defining specially, then you will need a space between any symbol and the closing delimiter, or the reader will make that character part of the symbol! This is particularly visible if the inside-readtable is the base (#f) readtable. It is therefore recommended to only use an inside-readtable that has the same delimiter extensions (though perhaps with more defined, or with other extensions).

Examples:

> (require udelim)
> "In this example the result is what we expect."
"In this example the result is what we expect."
> (parameterize ([current-readtable (make-list-delim-readtable #\⟦ #\⟧ #:inside-readtable #f)])
    (read
     (open-input-string "(a b ⟦c d e ⟧ f g)")))
'(a b (c d e) f g)
> "In this example, the closing ⟧ will be read as part of a symbol, causing a strange error later!"
"In this example, the closing ⟧ will be read as part of a symbol, causing a strange error later!"
> (parameterize ([current-readtable (make-list-delim-readtable #\⟦ #\⟧ #:inside-readtable #f)])
    (read
     (open-input-string "(a b ⟦c d e⟧ f g)")))
string::21: read-syntax: unexpected `)`

If as-dispatch-macro? is true, the extension is added to the readtable as a 'dispatch-macro instead of a 'terminating-macro. This means that there will be a # character before the opening delimiter.
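A small sketch of the dispatch-macro form, going only by the description above: the list is written with a leading #, so plain ⟦ is no longer the opener. The name dispatch-readtable is illustrative, and the exact handling of the closing delimiter in this mode is an assumption.

```racket
#lang racket/base
(require udelim)

(define dispatch-readtable
  (make-list-delim-readtable #\⟦ #\⟧ #:as-dispatch-macro? #t))

;; The bracketed list is written as #⟦ ... ⟧ rather than ⟦ ... ⟧.
(parameterize ([current-readtable dispatch-readtable])
  (read (open-input-string "(a #⟦b c⟧ d)")))
```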

procedure
(make-string-delim-readtable l-paren
                             r-paren
                             [#:base-readtable base-readtable
                              #:as-dispatch-macro? as-dispatch-macro?
                              #:wrapper wrapper
                              #:string-read-syntax string-read-syntax
                              #:whole-body-readers? whole-body-readers?])
 → readtable?
  l-paren : char?
  r-paren : char?
  base-readtable : readtable? = #f
  as-dispatch-macro? : any/c = #f
  wrapper : (or/c false/c symbol? procedure?) = #f
  string-read-syntax : (or/c false/c (-> any/c input-port? any/c)) = #f
  whole-body-readers? : any/c = #f

Returns a new readtable based on base-readtable that uses l-paren and r-paren as delimiters of a non-escaping string (with balanced internal delimiters). If wrapper is a symbol, the string is wrapped in an s-expression with that symbol at the head. If wrapper is a function, it is applied to the syntax object that results from reading (the argument is a syntax object whether read or read-syntax is used; the result of read is created by calling syntax->datum on the result of read-syntax).

In addition to simply being a nice extra way to write literal strings, it goes great with stx-string->port in macros that read alternative syntax, such as those used in #lang rash. Other things you might do are create macros that read interesting surface syntax for different data structures, list comprehensions, or common patterns of yours that would benefit from a different syntax.

If string-read-syntax is provided, then it will be applied to the string (transformed into a port with correct location info) to obtain a (probably non-string) syntax object. If whole-body-readers? is true, the function is applied just once to get a syntax object. Otherwise, the reader is applied repeatedly until it produces an EOF object, and the results are placed in a syntax-object-wrapped list. string-read-syntax must be a function that could be used in place of read-syntax (i.e. it must accept two arguments, src and port).

string-read-syntax is essentially useful for making non-readtable-based reader extensions where the extension is bounded by the given delimiters. However, using it means your inner language still requires any nested delimiters to be balanced, so for instance an unbalanced delimiter can’t appear inside a string in the language of the nested reader. The tradeoff for needing balanced delimiters is that your inner reader doesn’t need to know how to detect its termination character and stop itself: it will get an EOF from an empty port instead, which can simplify some implementations or allow embedding of whole-file readers that weren’t originally intended to be embedded as a readtable extension. Most of the time, though, you are probably better off writing a read function by hand that detects its closing delimiter itself, so users aren’t confused about the situation of inner delimiters.

If wrapper and string-read-syntax are both provided, the string-read-syntax function will be applied first.
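As a hedged sketch of string-read-syntax, the built-in read-syntax already has the required (src port) shape, so it can be passed directly. With whole-body-readers? left at its default, the description above suggests the delimited text is read datum by datum and collected into a list. The name sexp-block-readtable is illustrative.

```racket
#lang racket/base
(require udelim)

;; read-syntax accepts (src port), which is the shape string-read-syntax needs.
(define sexp-block-readtable
  (make-string-delim-readtable #\« #\» #:string-read-syntax read-syntax))

;; The text between « and » is handed to read-syntax repeatedly until EOF.
(parameterize ([current-readtable sexp-block-readtable])
  (read (open-input-string "«1 (2 3) four»")))
```

Passing #:whole-body-readers? #t instead would apply the reader once, which suits readers that consume an entire port in a single call.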

If as-dispatch-macro? is true, the extension is added to the readtable as a 'dispatch-macro instead of a 'terminating-macro. This means that there will be a # character before the opening delimiter.

Examples:

> (require udelim)
> (parameterize ([current-readtable (make-string-delim-readtable #\« #\»)])
    (read
     (open-input-string "«this is a string with nested «string delimiters.»  No \\n escape interpreting.»")))
"this is a string with nested «string delimiters.»  No \\n escape interpreting."
> (parameterize ([current-readtable
                  (make-string-delim-readtable #\｢ #\｣ #:wrapper '#%cjk-corner-quotes)])
    (read
     (open-input-string "｢this is a string with nested ｢string delimiters.｣  No \\n escape interpreting.｣")))
'(#%cjk-corner-quotes "this is a string with nested ｢string delimiters.｣  No \\n escape interpreting.")

It’s great for regexps:

(regexp-match (pregexp "\\w*\\.+\\s\\d+\\w*") "foo.. 97bar")
(regexp-match (pregexp «\w*\.+\s\d+\w*») "foo.. 97bar")

It’s great for using macros that embed code in another syntax:

(rash «ls -l»)

procedure
(udelimify table) → readtable?
table : (or/c readtable? false/c)

Unstable. Added delimiters may change.

Returns the readtable given, but extended with several more delimiters (the same ones as #lang udelim).

Specifically: «» are nestable non-escaping string delimiters (i.e. «foo «bar»» reads as "foo «bar»"), ｢｣ are like «» but wrapped, so ｢foo bar｣ produces (#%cjk-corner-quotes "foo bar"), ﴾foo bar﴿ reads as (#%ornate-parens foo bar), ⦓foo bar⦔ reads as (#%inequality-brackets foo bar), ⦕foo bar⦖ reads as (#%double-inequality-brackets foo bar), 🌜foo bar🌛 reads as (#%moon-faces foo bar), and ⟅foo bar⟆ reads as (#%s-shaped-bag-delim foo bar).

To get default meanings for the #% identifiers (currently mostly pass-through macros), use (require udelim/defaults). The only one that has a non-passthrough default is #%cjk-corner-quotes (given by ｢｣, defaults to pregexp).

procedure
(stx-string->port stx) → input-port?
stx : syntax?

Unstable. This doesn’t really fit with the rest of the package, and could be moved to another module or package.

stx should only contain a string. The location data on stx is used to give the resulting port accurate location information when it is read. This is useful for creating macros that allow embedding of alternate syntax, such as #lang rash does.

When you use read-syntax on the resulting port, the syntax objects will have correct location information, but will be lacking lexical context. To fix this, use replace-context.

Examples:

> (require udelim)
> (with-syntax
      ([str #'"#(this \"is a\" ((string that) can be) read with some reader function)"])
    (read-syntax (syntax-source #'str) (stx-string->port #'str)))
#<syntax:eval:14:0 #(this "is a" ((string that) can be) read with some reader function)>

procedure
(scribble-strings->string stx) → syntax?
stx : syntax?

Unstable. This doesn’t really fit with the rest of the package, and could be moved to another module or package.

Takes a syntax object that represents a list of strings created by the scribble reader, and reconstitutes them into one string. If the syntax contains anything that is not a string, it raises an error.

This makes it easier for a sub-parsing macro to accept input either from the scribble reader or from a string (including the wonderful verbatim strings with nestable delimiters made with make-string-delim-readtable).

Example:

(require (for-syntax udelim syntax/strip-context syntax/parse))

;; this function likely exists somewhere...
(define-for-syntax (read-syntax* src in)
  (define (rec rlist)
    (let ([part (read-syntax src in)])
      (if (eof-object? part)
          (reverse rlist)
          (rec (cons part rlist)))))
  (rec '()))

(define-syntax (subparse stx)
  (syntax-parse stx
    [(subparse arg:str)
     (with-syntax ([(parg ...) (map (λ (s) (replace-context #'arg s))
                                     (read-syntax* (syntax-source #'arg)
                                                   (stx-string->port #'arg)))])
       #'(begin parg ...))]
    [(subparse arg:str ...+)
     (with-syntax ([one-str (scribble-strings->string #'(arg ...))])
       #'(subparse one-str))]))

4 Code and License

The code is available on GitHub.

This library is distributed under the MIT license and the Apache version 2.0 license, at your option.