GTP measure

8.16


Ben Greenman

For benchmarking.

1 Command-line: raco gtp-measure

The gtp-measure raco command is a tool for measuring the performance of a set of gtp-measure targets according to a set of configuration options.

See also: GTP targets

See also: GTP configuration

To see all accepted flags: raco gtp-measure --help

To measure performance and print status messages:

PLTSTDERR="error info@gtp-measure" raco gtp-measure ....

1.1 Stages of measurement

After gtp-measure is invoked on the command line, it operates in five stages:

resolve the command-line targets to actual files / directories;
resolve the command-line configuration options to a configuration;
set up a measuring task, based on the targets;
divide the measuring task into sub-tasks;
collect data, write to the task’s data directory.

1.2 Configuration and Data Files

The gtp-measure library uses the basedir library to obtain configuration and data files.

User-level configuration settings are stored in the file:

(writable-config-file "config.rktd" #:program "gtp-measure")

Each task gets a data directory, stored under:

(writable-data-dir #:program "gtp-measure")

Together, the data files and command-line arguments build a gtp-measure configuration value. See Configuration Fallback for details on how these data sources work together.

2 GTP targets

A gtp-measure target is either:

gtp file target : a file containing a Racket module and exactly one call to time-apply (possibly via time);
gtp typed-untyped target : a directory containing: (1) a "typed" directory, (2) an "untyped" directory, (3) optionally a "base" directory, and (4) optionally a "both" directory.
- The "typed" directory must contain a few typed/racket modules.
- The "untyped" directory must contain matching Racket modules. These modules must have the same name as the modules in the "typed" directory, and should have the same code as the typed modules — just missing type annotations and type casts.
- The optional "base" directory may contain data files that the "typed" and "untyped" modules may reference via a relative path (e.g. "../base/file.rkt").
- The optional "both" directory may contain modules that the "typed" and "untyped" modules may reference as if they were in the same directory (e.g. "file.rkt"). If so, the "typed" and "untyped" modules will not compile unless the "both" modules are copied into their directory. This is by design.
gtp manifest target : a file containing a gtp-measure/manifest module.
gtp deep-shallow-untyped target : a directory containing: (1) a "typed" directory, (2) an "untyped" directory, (3) a "shallow" directory, (4) optionally a "base" directory, and (5) optionally a "both" directory.
- The "typed" and "untyped" directories must follow the same guidelines as for a typed-untyped target.
- The "shallow" directory must contain matching typed modules in Transient mode (#lang typed/racket #:transient).
- The optional "base" directory may contain data files that the "typed" and "untyped" modules may reference via a relative path (e.g. "../base/file.rkt").
- The optional "both" directory may contain modules that the "typed" and "untyped" modules may reference as if they were in the same directory (e.g. "file.rkt"). If so, the "typed" and "untyped" modules will not compile unless the "both" modules are copied into their directory. This is by design.

To measure a file target, gtp-measure compiles the file once and then repeatedly runs the file and parses the output of time-apply. See GTP configuration for details on how gtp-measure compiles and runs Racket modules.
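For illustration, a minimal file target might look like the sketch below. The module and its fib workload are hypothetical; the only property taken from the definition above is that the module contains exactly one call to time. Wrapping the call in void is our own choice, so that the timing line is the only output.

```racket
#lang racket/base

;; Hypothetical file target: a Racket module with exactly one call to `time`.
;; `fib` is an illustrative workload, not part of gtp-measure.
(define (fib n)
  (if (< n 2)
      n
      (+ (fib (- n 1)) (fib (- n 2)))))

;; `time` prints one line of the form
;;   cpu time: N real time: N gc time: N
;; and returns the result of its body; `void` discards that result
;; so nothing else is printed at module level.
(void (time (fib 25)))
```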

To measure a typed-untyped target, gtp-measure chooses a sequence of typed-untyped configurations and, for each: copies the configuration to a directory, and runs this program’s entry module as a file target. The sequence of configurations is either exhaustive or approximate.

To measure a manifest target, gtp-measure runs the targets listed in the manifest.

To measure a deep-shallow-untyped target, the protocol is similar to typed-untyped targets.

2.1 Typed-Untyped Configuration

A typed-untyped configuration for a typed-untyped target with M modules is a working program with M modules — some typed (maybe none), some untyped.

The gtp-measure library encodes such a configuration with a string of length M where each character is either #\0 or #\1. If the character at position i is #\0, the configuration uses the i-th module in the "untyped" directory and ignores the i-th module in the "typed" directory. If the character at position i is #\1, the configuration uses the i-th "typed" module and ignores the "untyped" module. Modules are ordered by filename-sort.
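The encoding can be sketched as a small decoder. The helper configuration->paths is hypothetical (it is not part of gtp-measure) and assumes the list of module names is already sorted.

```racket
#lang racket/base

;; Sketch: map a configuration string to the module files it selects.
;; `configuration->paths` is a hypothetical helper, not a gtp-measure function.
;; `module-names` is assumed to be sorted already.
(define (configuration->paths bits module-names)
  (unless (= (string-length bits) (length module-names))
    (raise-argument-error 'configuration->paths "one character per module" bits))
  (for/list ([bit (in-string bits)]
             [name (in-list module-names)])
    (define dir
      (case bit
        [(#\0) "untyped"] ; #\0 selects the untyped version of the module
        [(#\1) "typed"]   ; #\1 selects the typed version of the module
        [else (raise-argument-error 'configuration->paths
                                    "string of #\\0 and #\\1 characters"
                                    bits)]))
    (string-append dir "/" name)))

(configuration->paths "010" '("a.rkt" "b.rkt" "main.rkt"))
;; => '("untyped/a.rkt" "typed/b.rkt" "untyped/main.rkt")
```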

2.2 Exhaustive vs. Approximate evaluation

An exhaustive evaluation of a typed-untyped target with M modules measures the performance of all 2^M configurations. This is a lot of measuring, and will probably take a very long time if M is 15 or more.

An R-S-approximate evaluation measures R * S * M randomly-selected configurations; more precisely, R sequences containing S*M configurations each. This number, R*S*M, is probably less than 2^M. (If it’s not, just do an exhaustive evaluation.) See GTP configuration for how to set R and S, and how to switch from an exhaustive evaluation to an approximate one.
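To get a feel for the two sizes, the following sketch prints both counts for a few values of M. The helper names are hypothetical; R = 10 and S = 10 are the default values of key:num-samples and key:sample-factor listed under Configuration Fallback.

```racket
#lang racket/base

;; Number of configurations in an exhaustive evaluation of M modules.
(define (exhaustive-count M)
  (expt 2 M))

;; Number of configurations in an R-S-approximate evaluation of M modules.
(define (approximate-count R S M)
  (* R S M))

;; Compare the two counts for R = 10 and S = 10.
(for ([M (in-list '(5 10 15 20))])
  (printf "M = ~a: exhaustive = ~a, approximate = ~a\n"
          M
          (exhaustive-count M)
          (approximate-count 10 10 M)))
```

With these settings the approximate evaluation is smaller from M = 10 onward (1000 configurations instead of 1024), and the gap grows quickly: at M = 20 it is 2000 instead of 1048576.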

The idea of an approximate evaluation comes from our work on Typed Racket. Greenman and Migeed (PEPM 2018) give a more precise definition, and apply the idea to Reticulated Python. Note that gtp-measure uses a different definition of S than the PEPM paper.

2.3 Design: typed-untyped directory

The point of a typed-untyped directory is to describe an exponentially-large set of programs in “less than exponential” space. The set is all ways of taking a Typed Racket program and removing some of its types — specifically, removing types from some of the modules in the program. So given a typed-untyped directory, gtp-measure needs to be able to generate and run each program.

The "typed" and "untyped" directories are a first step to reduce space. Instead of storing all 2^M programs for a program with M modules, we store 2M modules. The reason we store 2M instead of just M typed modules is that we do not have a way to automatically remove types from a Typed Racket program (to remove types, we sometimes want to translate type casts to Racket).

The "base" directory is a second way to save space. If a program depends on data or libraries, they belong in the "base" directory so that all configurations can reference one copy.

The "both" directory helps us automatically generate configurations by solving a technical problem. The problem is that if an untyped module defines a struct and two typed modules import it, both typed modules need to reference a canonical require/typed for the struct’s type definitions. We solve this by putting a type adaptor module with the require/typed in the "both" directory. An adaptor can require "typed" or "untyped" modules, and typed modules can require the adaptor.

3 GTP configuration

(require gtp-measure/configure)

package: gtp-measure

The gtp-measure library is parameterized by a set of key/value pairs. This section documents the available keys and the type of values each key expects.

value
gtp-measure-config/c : flat-contract?

Contract for a gtp-measure configuration; that is, an immutable hash whose keys are a subset of those documented below and whose values match the descriptions below.

symbol
key:bin

Value must be a string that represents a path to a directory. The directory must contain executables named raco and racket.

Used to compile and run Racket programs.

In particular, if <BIN> is the value of key:bin then the command to compile the target <FILE> is:

<BIN>/raco make -v <FILE>

and the command to run <FILE> is:

<BIN>/racket <FILE>

Since this package was originally created to measure the GTP benchmarks, which depend on the require-typed-check package, invoking raco gtp-measure ensures that the package is installed for the current value of key:bin. If the package is missing, <BIN>/raco pkg installs it.

Changed in version 0.3 of package gtp-measure: Automatically install require-typed-check if missing.

symbol
key:iterations

Value must be an exact-positive-integer?.

Determines the number of times to run a file target and collect data.

symbol
key:jit-warmup

Value must be an exact-nonnegative-integer?.

Determines the number of times (if any) to run a file target and ignore the output BEFORE collecting data.

symbol
key:num-samples

Value must be an exact-positive-integer?.

Determines R, the number of samples for any approximate evaluations.

symbol
key:sample-factor

Value must be an exact-positive-integer?.

Determines the size of each sample in any approximate evaluations. The size is S*M, where S is the value associated with this key and M is the number of modules in the typed-untyped target.

symbol
key:cutoff

Value must be an exact-nonnegative-integer?.

Determines whether to run an exhaustive or approximate evaluation for a typed-untyped target. Let M be the number of modules in the target and let C be the value associated with this key. If (<= M C), then gtp-measure runs an exhaustive evaluation; otherwise, it runs an approximate evaluation.
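The rule is a single comparison. The sketch below restates it with a hypothetical helper, evaluation-kind, using the default cutoff of 9 listed under Configuration Fallback.

```racket
#lang racket/base

;; Sketch of the cutoff rule; `evaluation-kind` is a hypothetical helper.
;; M = number of modules in the target, C = value of key:cutoff.
(define (evaluation-kind M C)
  (if (<= M C)
      'exhaustive
      'approximate))

(evaluation-kind 9 9)  ; => 'exhaustive
(evaluation-kind 10 9) ; => 'approximate
```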

symbol
key:entry-point

Value must be a string that represents a filename.

Determines the entry module of all typed-untyped targets. This module is treated as a file target for each configuration in the typed-untyped evaluation.

symbol
key:start-time

Value must be a real number.

By default, this is the value of current-inexact-milliseconds when gtp-measure was invoked. You should probably not override this default.

symbol
key:time-limit

Value must be an (or/c #false exact-nonnegative-integer?).

Sets a time limit for the total time to run a configuration. If the value is #false then there is no time limit. Otherwise, the value is the time limit in seconds.

The total time includes all the warmup iterations and all the collecting iterations.

3.1 Configuration Fallback

The gtp-measure library defines a default value for each configuration key. Users can override this default by writing a hashtable with relevant keys (a subset of the keys listed above) to their configuration file. Users can override both the defaults and their global configuration by supplying a command-line flag. Run raco gtp-measure --help to see available flags.
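The fallback order can be pictured as layering three hashes, where later sources win. This is only a sketch of the idea, not the library's implementation: the three hashes and the helper merge-config are hypothetical, and the symbol keys are assumed to follow the manifest example (iterations).

```racket
#lang racket/base

;; Hypothetical stand-ins for the three configuration sources.
(define defaults     (hash 'iterations 8 'jit-warmup 1 'cutoff 9))
(define user-config  (hash 'iterations 10)) ; e.g. read from the config file
(define command-line (hash 'cutoff 12))     ; e.g. parsed from flags

;; Copy every key of `override` on top of `base`.
;; Keys that `override` does not mention keep their value from `base`.
(define (merge-config base override)
  (for/fold ([acc base])
            ([(k v) (in-hash override)])
    (hash-set acc k v)))

;; Defaults first, then the user's file, then the command line.
(define config
  (merge-config (merge-config defaults user-config) command-line))

(hash-ref config 'iterations) ; => 10, from the user's configuration file
(hash-ref config 'jit-warmup) ; => 1, from the defaults
(hash-ref config 'cutoff)     ; => 12, from the command line
```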

The defaults for the machine that rendered this document are the following:

key:bin = "/home/root/racket/bin/"
key:iterations = 8
key:jit-warmup = 1
key:num-samples = 10
key:sample-factor = 10
key:cutoff = 9
key:entry-point = "main.rkt"
key:start-time = 0
key:time-limit = #f
key:argv = ()

4 GTP measuring task

A task describes a sequence of targets to measure.

4.1 GTP task setup

Before measuring the targets in a task, the gtp-measure library allocates a directory for the task and writes files that describe what is to be run. If the task is interrupted, gtp-measure may be able to resume the task; run raco gtp-measure --help for instructions.

4.2 GTP sub-task

A sub-task is one unit of a task. This concept is not well-defined. The idea is to divide measuring tasks into small pieces so there is little to recompute if a task is interrupted.

More later.

5 Data Description Languages

The gtp-measure library includes a few small languages to describe data formats.

5.1 Manifest = Benchmark Instructions

#lang gtp-measure/manifest

package: gtp-measure

A manifest contains an optional hash with configuration options and a sequence of target descriptors.

The configuration options must be prefixed by the keyword #:config and must be a hash literal that matches the gtp-measure-config/c contract. If present, the options specified in the hash override any defaults.

A target descriptor is either a string representing a file or directory, or a pair of such a string and a target kind. In the first case, the target kind is inferred at runtime. In the second case, the target kind is checked at runtime.

#lang gtp-measure/manifest

#:config #hash((iterations . 10))

file-0.rkt
typed-untyped-dir-0
"file-1.rkt"
("file-2.rkt" . file)
(typed-untyped-dir-1 . typed-untyped)

There is an internal syntax class for these “target descriptors” that should be made public.

5.2 Output Data: File Target

#lang gtp-measure/output/file

package: gtp-measure

Output data for one gtp file target.

Each line contains a result for one iteration of the file. The result may be:

successful time output, containing the CPU time, real time, and GC time;
a Racket runtime error message;
or a timeout notice ("timeout N").
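A successful result line can be pulled apart with a regular expression. The helper parse-time-line below is hypothetical (it is not part of gtp-measure) and assumes the line has exactly the shape that time prints.

```racket
#lang racket/base

;; Sketch: extract (list cpu real gc) in milliseconds from one result line.
;; Returns #f for lines that are not successful time output, such as an
;; error message or a timeout notice.
;; `parse-time-line` is a hypothetical helper, not a gtp-measure function.
(define (parse-time-line str)
  (define m
    (regexp-match #px"^cpu time: (\\d+) real time: (\\d+) gc time: (\\d+)$"
                  str))
  (and m (map string->number (cdr m))))

(parse-time-line "cpu time: 566 real time: 567 gc time: 62")
;; => '(566 567 62)
(parse-time-line "timeout 10")
;; => #f
```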

5.3 Output Data: Typed-Untyped Target

#lang gtp-measure/output/typed-untyped

package: gtp-measure

Output data for a gtp typed-untyped target.

Each line is the result for one configuration. The first element is the name of the configuration; the second is a sequence of file results.

Provides an identifier gtp-data that is bound to the full dataset.

Example data from a benchmark that ran with no timeouts or errors:

#lang gtp-measure/output/typed-untyped
("00000" ("cpu time: 566 real time: 567 gc time: 62" "cpu time: 577 real time: 578 gc time: 62"))
("00001" ("cpu time: 820 real time: 822 gc time: 46" "cpu time: 793 real time: 795 gc time: 44"))
("00010" ("cpu time: 561 real time: 562 gc time: 46" "cpu time: 565 real time: 566 gc time: 44"))
("00011" ("cpu time: 805 real time: 807 gc time: 47" "cpu time: 813 real time: 815 gc time: 45"))
....

Running an output file prints a summary:

$ racket jpeg-2020-08-17.rktd
dataset info:
- num configs: 32
- num timings: 256
- min time: 110 ms
- max time: 8453 ms
- total time: 968537 ms

5.4 Output Data: Deep-Shallow-Untyped Target

#lang gtp-measure/output/deep-shallow-untyped

package: gtp-measure

Output data for a gtp deep-shallow-untyped target.

Each line is the result for one configuration. The first element is the name of the configuration; the second is a sequence of file results.

Provides an identifier gtp-data that is bound to the full dataset.

Example data from a benchmark that ran with no timeouts or errors:

#lang gtp-measure/output/deep-shallow-untyped
("00000" ("cpu time: 325 real time: 325 gc time: 60"))
("00001" ("cpu time: 336 real time: 336 gc time: 64"))
("00002" ("cpu time: 332 real time: 332 gc time: 64"))
("00010" ("cpu time: 7059 real time: 7061 gc time: 70"))
("00020" ("cpu time: 410 real time: 410 gc time: 64"))
("00011" ("cpu time: 7119 real time: 7121 gc time: 76"))
("00012" ("cpu time: 7035 real time: 7037 gc time: 76"))
("00021" ("cpu time: 426 real time: 426 gc time: 63"))
("00022" ("cpu time: 433 real time: 433 gc time: 77"))
("00100" ("cpu time: 7154 real time: 7158 gc time: 80"))
....

Running an output file prints a summary:

$ racket jpeg-2020-08-17.rktd
dataset info:
- num configs: 243
- num timings: 1944
- min time: 117 ms
- max time: 9036 ms
- total time: 6827787 ms

6 gtp-measure Utilities

6.1 Time Limit Parsing

procedure
(string->time-limit str) → exact-nonnegative-integer?
str : string?

Convert a string to a natural number that represents a time limit in seconds. The string must begin with a sequence of digits ([0-9]*) and may optionally end with one of the following unit suffixes: "h" (hours), "m" (minutes), "s" (seconds, the default).

Examples:

> (string->time-limit "1")
1
> (string->time-limit "1s")
1
> (string->time-limit "1m")
60
> (string->time-limit "1h")
3600
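The conversion can be sketched without the library. The function parse-time-limit below is a hypothetical re-implementation for illustration only; the library's own string->time-limit may differ, for example in how it treats malformed input. The sketch assumes at least one digit is required.

```racket
#lang racket/base

;; Hypothetical re-implementation, for illustration only.
;; Accepts one or more digits followed by an optional h, m, or s suffix.
(define (parse-time-limit str)
  (define m (regexp-match #px"^(\\d+)([hms]?)$" str))
  (unless m
    (raise-argument-error 'parse-time-limit
                          "digits with an optional h, m, or s suffix"
                          str))
  (define n (string->number (cadr m)))
  (case (caddr m)
    [("h") (* n 60 60)] ; hours to seconds
    [("m") (* n 60)]    ; minutes to seconds
    [else n]))          ; "s" or no suffix: already seconds

(parse-time-limit "90m") ; => 5400
(parse-time-limit "2h")  ; => 7200
```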

procedure
(hours->seconds h) → exact-nonnegative-integer?
h : exact-nonnegative-integer?
procedure
(minutes->seconds m) → exact-nonnegative-integer?
m : exact-nonnegative-integer?

Convert a natural number of hours (hours->seconds) or minutes (minutes->seconds) to the equivalent number of seconds.

Examples:

> (hours->seconds 1)
3600
> (minutes->seconds 1)
60
