8.13

1.1 Notation

Like most languages, Rhombus syntax builds on a set of rules for parsing characters into tokens. Unlike most languages (but like Lisp, Scheme, and Racket), Rhombus syntax uses an additional layer of rules for grouping and nesting tokens. For languages in the Lisp family, the intermediate structure is S-expression notation, which gives Lisp its parenthesized, prefix notation. For Rhombus, the intermediate structure is shrubbery notation, which is designed to support traditional infix operators and to rely on line breaks and indentation for grouping and nesting. This section offers a brief summary of shrubbery notation, but see Shrubbery Notation for complete details.

To explore shrubbery notation independent of Rhombus, try #lang shrubbery. The parsed form is represented as an S-expression, so the output is only useful if you’re familiar with S-expression notation.

Numbers are decimal, either integer or floating-point, or they’re hexadecimal, octal, or binary integers written with a 0x, 0o, or 0b prefix, respectively. An underscore can be used to separate digits in a number.

0

42

-42

1_048_576

3.14157

.5

6.022e23

0xf00ba7ba2

0o377

0b1001
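As a small illustration of the prefixes and the digit separator, the sketch below compares the same integers written in different ways. It assumes a module that starts with #lang rhombus and uses println and == from that language; each comparison is expected to hold.

```rhombus
#lang rhombus

// The same integer written with different radix prefixes.
println(0xff == 255)
println(0o377 == 255)
println(0b1001 == 9)

// Underscores only separate digits; they do not change the value.
println(1_000_000 == 1000000)
```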

Identifiers use Unicode alphanumeric characters, _, and emoji sequences, with an initial character that is not numeric.

pi

scissor7

π

underscore_case

camelCase
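A minimal sketch of these identifier forms in use, assuming #lang rhombus and its def form. The bound values are arbitrary and chosen only for illustration.

```rhombus
#lang rhombus

// A non-ASCII letter is an ordinary identifier character.
def π = 3.14159

// Underscores and mixed case are both allowed; digits may appear after the first character.
def underscore_case = 1
def camelCase = 2
def scissor7 = 7

println(π * 2)
println(underscore_case + camelCase + scissor7)
```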

Keywords are like identifiers, but prefixed with ~ with no space in between. As a datatype distinct from identifiers, they are useful as names that cannot be misconstrued as bound variables or as any other kind of expression form.

~base

~stronger_than
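One common place where keywords appear is as labels for function arguments. The sketch below assumes #lang rhombus; the function name scale and the variable b are hypothetical and not taken from the text.

```rhombus
#lang rhombus

// ~base labels the second argument; b is the local variable that receives it.
fun scale(n, ~base: b):
  n * b

// The keyword cannot be confused with a variable reference at the call site.
println(scale(3, ~base: 10))
```

Because ~base is not an identifier, it never refers to a binding, so it can safely share a name with a variable in scope.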

The following characters are used for shrubbery structure and are mostly not available for use in operators:

  ( ) [ ] { } '   ; ,   : |   « »  \   "  # @

Any other Unicode punctuation or symbol character (but not an emoji) is fair game for an operator:

+

.

->

>=

!^$&%$

The : and | characters can be used as part of an operator, even though the characters have a special meaning when used alone. To avoid confusion with blocks, an operator cannot end with : unless it contains only : characters. Similarly, to avoid potential confusion with operators alongside numbers, an operator that ends in +, -, or . must consist only of that character. So, ++ and ... are operators, but !+ is not. Similar problems happen with comments, so an operator cannot contain // or /* or have multiple characters and end in /. A ~ cannot be used by itself as an operator to avoid confusion with ~ to form a keyword.
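The operator rules can be explored with #lang shrubbery, which only parses its input. In the sketch below, a and b are placeholder identifiers, and each line is intended to contain a single operator token between them according to the rules above.

```rhombus
#lang shrubbery

// Made only of + or only of . characters, so the trailing + or . is allowed
a ++ b
a ... b

// Made only of : characters, so ending in : is allowed
a :: b

// | used as part of a longer operator
a || b

// Ordinary multi-character operators
a -> b
a >= b
```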

Shrubbery notation does not include a notion of operator precedence. Instead, Rhombus builds a precedence-parsing layer on top of shrubbery notation (which is why shrubberies are not full-grown trees). Precedence in Rhombus is macro-defined in the same way that syntactic forms are macro-defined in Rhombus.

Booleans are written with a leading # followed immediately by true or false.

#true

#false

Strings of Unicode characters use double quotes, and byte strings are similar, but with a # prefix. Strings and byte strings support the usual escapes, such as \n for a newline character or byte.

"This is a string,\n just like you’d expect"

#"a byte string"

Comments are C-style, but block comments are nestable.

// This is a line comment

/* This is a multiline
      comment that /* continues */
      on further lines */
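Nesting matters when a block comment is used to disable code that already contains a block comment. A minimal sketch, assuming #lang rhombus:

```rhombus
#lang rhombus

/* The outer comment starts here.
   /* The inner comment closes here, */
   but this line is still inside the outer comment. */
println("only this line is code")
```

With non-nesting C-style comments, the first */ would end the whole comment and the remaining text would be parsed as code.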

To aid interoperability with Racket and to support some rarely useful datatypes, such as characters, shrubbery notation includes an escape to S-expression notation through #{}. For example, #{list-first} is a single identifier that includes - as one of its characters. A #{} cannot wrap a list-structured S-expression that uses immediate parentheses, however.

Shrubbery notation is whitespace-sensitive, and it uses line breaks and indentation for grouping. A line with more indentation starts a block, and it always follows a line that ends with :. A | alternative also starts a block, and the | itself can start a new line, in which case it must line up with the start of its enclosing form.

In DrRacket, hit Tab to cycle through the possible indentations for a line. See also Shrubbery Support in DrRacket.

In the examples below, the |s are written with the same indentation as if, match, or cond to create the alternative cases within those forms:

block:
  println("group within block")
  println("another group within block")

if is_rotten(apple)
| get_another()
| take_bite()
  be_happy()

match x
| 0:
    let zero = x
    x + zero
| n:
    n + 1

cond
| // check the weather
  is_raining():
    take_umbrella()
| // check the destination
  going_to_beach():
    wear_sunscreen()
    take_umbrella()
| // assume a hat is enough
  ~else:
    wear_hat()

A : isn’t needed before the first | in an alts-block, because the | itself is enough of an indication that a sequence of alternatives is starting, but a : is allowed. Some forms support the combination of a : followed by a sequence of | alternatives, but most forms have either a : block or a sequence of | alternatives.

Each line within a block forms a group. Groups are important, because parsing and macro expansion are constrained to operate on groups (although a group can contain nested blocks, etc.). Groups at the same level of indentation as a previous line continue that group’s block. A | can have multiple groups in the subblock to its right. A : block or sequence of | alternatives can only be at the end of an enclosing group.

A : doesn’t have to be followed by a new line, but it starts a new block, anyway. Similarly, a | that starts an alternative doesn’t have to be on a new line. These examples parse the same as the previous examples:

block: println("group within block")
       println("another group within block")

if is_rotten(apple) | get_another() | take_bite()
                                      be_happy()

match x | 0: let zero = x
             x + zero
        | n: n + 1

cond | is_raining(): take_umbrella()
     | going_to_beach(): wear_sunscreen()
                         take_umbrella()
     | ~else: wear_hat()

Within a block, a ; can be used instead of a new line to start a new group, so these examples also parse the same:

block: println("group within block"); println("another group within block")

if is_rotten(apple) | get_another() | take_bite(); be_happy()

match x | 0: let zero = x; x + zero
        | n: n + 1

cond | is_raining(): take_umbrella()
     | going_to_beach(): wear_sunscreen(); take_umbrella()
     | ~else: wear_hat()

You can add extra ;s, such as at the end of lines, since ; will never create an empty group.
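For instance, a block written in a semicolon-terminated style still contains exactly two groups. This is a sketch assuming #lang rhombus:

```rhombus
#lang rhombus

block:
  println("first group");
  println("second group");
```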

Finally, anything that can be written with newlines and indentation can be written on a single line, but « and » may be required to delimit a block, with « placed just after : or | and » placed at the end of the block. Normally, parentheses work just as well, since they can be wrapped around any expression. Definitions, however, can create a situation where « and » are needed to fit everything on a single line. Without « and », the following form would put x + zero() inside the definition of zero:

match x | 0: fun zero():« x »; x + zero() | n: n + 1
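For comparison, the same form can be written across several lines, where indentation does the work of « and ». The sketch below assumes #lang rhombus; the definition of x is hypothetical and is there only to make the module self-contained.

```rhombus
#lang rhombus

// Hypothetical value to match against.
def x = 0

match x
| 0:
    // The body of zero is just x, because the next line returns to this indentation.
    fun zero(): x
    x + zero()
| n: n + 1
```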

Parentheses (), square brackets [], and curly braces {} combine a sequence of groups. A comma , can be used to separate groups on one line between the opener and closer. Furthermore, a , is required to separate groups, even if they’re not on the same line. You can’t have extra ,s, except after the last group.

f(1, 2,
  3, 4)

["apples",
 "bananas",
 "cookies",
 "milk"]

map(add_five, [1, 2, 3, 4,])

Indentation still works for creating blocks within (), [], or {}:

map(fun (x):
      x + 5,
    [1, 2, 3, 4])

There are some subtleties related to the “precedence” of :, |, ;, and ,, but they’re likely to work as you expect in a given example.

Single-quote marks '' are used for quoting code (not strings), as in macros. Quotes work like (), except that the content is more like a top-level or block sequence, and ; is used as a group separator (optional when groups are on separate lines).

macro 'thunk: $(body :: Block)':
  'fun () $body'
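A possible use of this macro is sketched below. It assumes that macro is available directly from #lang rhombus; the name delayed is hypothetical.

```rhombus
#lang rhombus

// The macro from the text: thunk followed by a block becomes a zero-argument function.
macro 'thunk: $(body :: Block)':
  'fun () $body'

// The block is not run here; it is only wrapped in a function.
def delayed = thunk: println("running the delayed block")

// Calling the function runs the block.
delayed()
```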

Nested quoting sometimes requires the use of ' « ... » ' so that the nested opening quote is not parsed as a close quote. This counts as a different use of « and » than with : or |, and it doesn’t disable indentation for the quoted code.