shlex for Racket: Simple lexical analysis

Sorawee Porncharoenwase <sorawee.pwase@gmail.com>

 (require shlex) package: shlex

This library is a port of Python’s shlex library, originally implemented by Eric S. Raymond, Gustavo Niemeyer, Vinay Sajip, and other contributors. It allows users to call system-like functions (e.g., system, process) safely, avoiding shell injection attacks. In the other direction, it allows users to convert the command-line string that would be passed to a system-like function into a list of arguments suitable for (safe) system*-like functions (e.g., system*, process*).

Note, however, that this library differs from Python’s library: it supports only split (shlex.split), join (shlex.join), and quote-arg (shlex.quote), with limited customization. The implementation of split is based directly on the specification of the Shell Command Language in the Open Group Base Specifications Issue 7, 2018 edition, rather than on Python’s implementation.

1 Functions

procedure

(split s [#:comment? comment?]) → (listof string?)

  s : (or/c string? input-port?)
  comment? : any/c = #t
Split s using shell-like syntax in POSIX mode.

When comment? is not #f, line comments are supported: a # that begins a word starts a comment that extends to the end of the line.

If s contains an unterminated quote or escape sequence, exn:fail:read:eof is raised.
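Because of this, code that splits untrusted input may want to catch the exception. The following is a minimal sketch; the helper name try-split is hypothetical (it is not part of shlex), and the example also shows that s may be an input port rather than a string.

```racket
#lang racket/base
(require shlex)

;; try-split: hypothetical helper, not provided by shlex.
;; Returns the list of tokens, or #f when the input ends inside a
;; quote or escape sequence (in which case split raises
;; exn:fail:read:eof).
(define (try-split s)
  (with-handlers ([exn:fail:read:eof? (λ (e) #f)])
    (split s)))

;; split also accepts an input port, so a string port works too.
(try-split (open-input-string "cp 'a file' b"))
;; The closing ' is missing, so the handler returns #f.
(try-split "echo 'unterminated")
```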

In particular, the result can be passed to system*-like functions.
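For instance, a command line read from a configuration file could be split and then run without ever invoking a shell. This is a minimal sketch, assuming the first token names a program that find-executable-path can locate on the PATH; the helper run-command-line is hypothetical.

```racket
#lang racket/base
(require racket/system shlex)

;; run-command-line: hypothetical helper, not provided by shlex.
;; Splits a shell-like command line and runs it with system*, so no
;; shell ever reinterprets the tokens.
(define (run-command-line line)
  (define args (split line))
  (when (null? args)
    (error 'run-command-line "empty command line"))
  ;; system* needs the program's path, so resolve it via PATH.
  (define exe (find-executable-path (car args)))
  (unless exe
    (error 'run-command-line "program not found: ~a" (car args)))
  ;; Each remaining token becomes one argument. system* returns #t
  ;; if the program exits with status 0 and #f otherwise.
  (apply system* exe (cdr args)))

(run-command-line "echo -n 'Multiple words'")
```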

Notably, the function passes all but one of the tests in Python’s test suite. The one discrepancy is due to Python’s incorrect handling of comments (see the ls#b example below).

Examples:
> (split "echo -n 'Multiple words'")

'("echo" "-n" "Multiple words")

> (split "echo \"abc \\\"123\\\" def\" ghi")

'("echo" "abc \"123\" def" "ghi")

> (split "ls#b") ; Python is wrong in this example, outputting ["ls"]

'("ls#b")

> (split "ls #b\nsome-file")

'("ls" "some-file")

> (split "ls #b\nsome-file" #:comment? #f)

'("ls" "#b" "some-file")

procedure

(join xs) → string?

  xs : (listof string?)
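Concatenate the tokens of xs into a single command line, quoting them as needed. This function is the inverse of split in the sense that (split (join xs)) produces xs. The returned value can safely be used as a shell command line, for example with system-like functions.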
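The inverse property can be checked directly. This sketch uses tokens containing spaces, a backslash, and a single quote; the token values are arbitrary illustrations. It also shows that the converse does not hold in general, since split discards the original spacing and quoting style.

```racket
#lang racket/base
(require shlex)

;; Tokens that all need quoting for a shell, except "printf".
(define tokens (list "printf" "%s\\n" "it's" "two words"))

;; Joining and then splitting recovers the original tokens.
(define line (join tokens))
(equal? (split line) tokens)

;; The converse need not hold: extra spaces and the double quotes
;; are not preserved when re-joining.
(join (split "echo   \"two words\""))
```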

Example:
> (join (list "echo" "-n" "Multiple words"))

"echo -n 'Multiple words'"

procedure

(quote-arg s) → string?

  s : string?
Return a shell-escaped version of s. The returned value can safely be used as a single token in a shell command line, including one passed to system-like functions.
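To illustrate what shell-escaping involves, here is a sketch modeled on the approach of Python’s shlex.quote: leave strings made only of "safe" characters alone, and otherwise wrap the string in single quotes. The name sketch-quote-arg and the exact safe-character set are assumptions taken from Python’s version; shlex’s own quote-arg may differ in detail.

```racket
#lang racket/base
(require racket/string)

;; sketch-quote-arg: hypothetical re-implementation modeled on
;; Python's shlex.quote; shlex's quote-arg may differ in detail.
(define (sketch-quote-arg s)
  (cond
    ;; An empty argument must still form a token: ''
    [(string=? s "") "''"]
    ;; Only safe characters (assumed set): no quoting needed.
    [(not (regexp-match? #rx"[^A-Za-z0-9_@%+=:,./-]" s)) s]
    ;; Otherwise wrap in single quotes. Inside single quotes nothing
    ;; is special except ' itself, which is written as '"'"'
    ;; (close the quote, a double-quoted ', reopen the quote).
    [else (string-append "'" (string-replace s "'" "'\"'\"'") "'")]))

(sketch-quote-arg "somefile; rm -rf ~") ; => "'somefile; rm -rf ~'"
(sketch-quote-arg "it's")               ; => "'it'\"'\"'s'"
(sketch-quote-arg "plain-file.txt")     ; => "plain-file.txt"
```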

Example:
> (quote-arg "somefile; rm -rf ~")

"'somefile; rm -rf ~'"

2 Safety

It might be tempting to write code as follows:

(require racket/system) ; provides system

(define (ls-unsafe arg)
  (system (format "ls ~a" arg)))

However, the above code has a shell injection vulnerability. For example, when ls-unsafe is invoked with the argument "somefile; rm -rf ~", the command string passed to system is:

> (format "ls ~a" "somefile; rm -rf ~")

"ls somefile; rm -rf ~"

which the shell runs as two commands, the second of which deletes the user’s home directory.

By using quote-arg, one can avoid this attack:

(require racket/system shlex) ; system and quote-arg

(define (ls-safe arg)
  (system (format "ls ~a" (quote-arg arg))))

When ls-safe is called with "somefile; rm -rf ~", the argument is quoted properly, so the shell passes it to ls as a single file name and the attack is avoided:

> (format "ls ~a" (quote-arg "somefile; rm -rf ~"))

"ls 'somefile; rm -rf ~'"
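Quoting is only needed because a shell parses the command string. As mentioned in the introduction, system*-like functions avoid the shell altogether by passing each argument to the program directly. A minimal sketch of that alternative, assuming ls is on the PATH (ls-direct is a hypothetical name):

```racket
#lang racket/base
(require racket/system)

;; ls-direct: hypothetical alternative to ls-safe.
;; system* runs ls without a shell, so arg always reaches ls as one
;; argument, whatever characters it contains.
(define (ls-direct arg)
  (define ls (find-executable-path "ls"))
  (unless ls
    (error 'ls-direct "ls not found on PATH"))
  (system* ls arg))

;; ls looks for a file literally named "somefile; rm -rf ~"; if none
;; exists it reports an error and returns #f. rm is never run.
(ls-direct "somefile; rm -rf ~")
```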