GTP plot
Tools for visualizing the performance of a gradual typing system. If you have performance data, this package should help you plot it.
1 Introduction: Gradual Typing Performance
Typed Racket is a gradual typing system because programs can mix statically-typed typed/racket code with dynamically-typed racket code.
A run-time check like T? makes the program safe, but it also makes it slow. How slow? Depends on the number of checks like T? and the cost of each check.
The purpose of this package is to help answer the question of "how slow".
1.1 Performance Evaluation Assumptions
This package assumes that its clients have:
a program,
a set of gradually-typed configurations of the program,
data on the performance of each configuration.
The data must be in one of the built-in data formats.
2 Data Definitions
2.1 Configuration Info
(require gtp-plot/configuration-info) | package: gtp-plot |
struct
(struct configuration-info (id num-types runtime*) #:extra-constructor-name make-configuration-info #:prefab) id : any/c num-types : natural? runtime* : nonnegative-real/c
The id field is an identifier for the configuration; the purpose is to distinguish one configuration from other gradually-typed versions of the same program. The num-types field is the number of type annotations in the program. The runtime* field is a list of running times for the configuration.
For example, a fully-typed Typed Racket program with N modules
can be gradually typed in (expt 2 N) different ways by removing
some of the N type annotations.
The gtp-plot/typed-racket-info module represents configurations
with "bitstrings" —
#s(configuration-info "0000" 0 (4133 4074 4163)) #s(configuration-info "0001" 1 (6380 6189 6423)) #s(configuration-info "0010" 1 (11075 10937 11863)) #s(configuration-info "0011" 2 (9384 9740 9418))
procedure
(configuration-info->id cfg) → any/c
cfg : configuration-info?
procedure
cfg : configuration-info?
procedure
→ (listof nonnegative-real/c) cfg : configuration-info?
procedure
cfg : configuration-info?
2.2 Performance Info
(require gtp-plot/performance-info) | package: gtp-plot |
syntax
A performance info structure contains performance data for one gradually typed program. An exhaustive performance info structure contains data for all configurations in the program. An (r,s) simple-random-approximate (SRA) performance info structure contains data for r samples of configurations where each sample contains data for s configurations chosen uniformly at random.
procedure
(performance-info? x) → boolean?
x : any/c
procedure
(make-performance-info name #:src src #:num-units num-units #:num-configurations num-configurations #:baseline-runtime* baseline-runtime* #:untyped-runtime* untyped-runtime* #:typed-runtime* typed-runtime* #:make-in-configurations make-in-configurations) → performance-info? name : symbol? src : (or/c #f path-string?) num-units : natural? num-configurations : natural? baseline-runtime* : (listof nonnegative-real/c) untyped-runtime* : (listof nonnegative-real/c) typed-runtime* : (listof nonnegative-real/c) make-in-configurations : (-> performance-info? (sequence/c configuration-info?))
name is the name of the program;
src is a performance data for the program (or #f for datasets that do not live in a file), this file can be in any format that make-in-configurations understands;
num-units is the number of typeable components in the program;
num-configurations is the number of configurations in the program;
baseline-runtime* is the performance of the given program without any gradual typing;
untyped-runtime* is the performance of the program with no type annotations, this may be the same as the baseline runtime;
typed-runtime* is the performance of the program when fully-typed;
make-in-configurations is a function that accepts a performance info structure (specifically, itself) and returns a sequence with the data for each configuration in the program.
procedure
(performance-info->name pi) → symbol?
pi : performance-info?
procedure
(performance-info->src pi) → (or/c #f path-string?)
pi : performance-info?
procedure
pi : performance-info?
procedure
→ (listof nonnegative-real/c) pi : performance-info?
procedure
→ (listof nonnegative-real/c) pi : performance-info?
procedure
→ (listof nonnegative-real/c) pi : performance-info?
procedure
(performance-info->baseline-runtime pi) → nonnegative-real/c
pi : performance-info?
procedure
pi : performance-info?
procedure
pi : performance-info?
procedure
(performance-update-name pi) → performance-info?
pi : performance-info?
procedure
(performance-update-src pi) → performance-info?
pi : performance-info?
procedure
pi : performance-info?
procedure
(filter-performance-info pi keep-cfg?) → performance-info?
pi : performance-info? keep-cfg? : (-> configuration-info? any/c)
#lang racket/base (require gtp-plot/configuration-info gtp-plot/performance-info) ;; Keep fully-typed configurations, and 'fully-typed but for 1-2 units' (define (keep-nearly-typed-configs pi) (define max-types (performance-info->num-units pi)) (define min-types (max (- max-types 2) 0)) (define (nearly-typed? cfg) (<= min-types (configuration-info->num-types cfg) max-types)) (filter-performance-info pi nearly-typed?))
procedure
pi : performance-info?
procedure
(deliverable D) → (-> performance-info? natural?)
D : nonnegative-real/c
A configuration is D-deliverable if its performance is at most Dx slower than the baseline runtime.
procedure
(overhead pi t) → nonnegative-real/c
pi : performance-info? t : nonnegative-real/c (overhead pi) → (-> nonnegative-real/c nonnegative-real/c) pi : performance-info?
procedure
(count-configurations pi f) → natural?
pi : performance-info? f : (-> configuration-info? any)
procedure
(filter-configurations pi f) → (listof configuration-info?)
pi : performance-info? f : (-> nonnegative-real/c any)
procedure
(max-overhead pi) → nonnegative-real/c
pi : performance-info?
procedure
(mean-overhead pi) → nonnegative-real/c
pi : performance-info?
procedure
(min-overhead pi) → nonnegative-real/c
pi : performance-info?
procedure
pi : performance-info?
procedure
pi : performance-info?
procedure
pi : performance-info?
procedure
pi : performance-info?
procedure
pi : performance-info?
procedure
(fold/mean-runtime pi f [#:init init]) → any
pi : performance-info? f : (-> any/c nonnegative-real/c any) init : (or/c #f (-> nonnegative-real/c any)) = #f
If init is #false, then the initial accumulator is the mean running time of some configuration and f must return a nonnegative real number. If init is a procedure, then the initial accumulator is the result of applying init to an arbitrary configuration.
2.3 Sample Info
(require gtp-plot/sample-info) | package: gtp-plot |
procedure
(sample-info? x) → boolean?
x : any/c
procedure
(make-sample-info pi samples) → sample-info?
pi : performance-info? samples : (listof (listof configuration-info?))
procedure
(sample-info->sample-size si) → natural?
si : sample-info?
procedure
(sample-info->num-samples si) → natural?
si : sample-info?
procedure
→ (listof performance-info?) si : sample-info?
3 Plot
(require gtp-plot/plot) | package: gtp-plot |
The examples in this section use the following context:
(require gtp-plot/performance-info gtp-plot/reticulated-info gtp-plot/typed-racket-info gtp-plot/sample-info gtp-plot/plot pict (only-in racket/random random-sample) racket/runtime-path) (define-runtime-path mbta-data "./scribblings/data/mbta-v6.2.rktd") (define mbta (make-typed-racket-info mbta-data)) (define-runtime-path sample_fsm-data "./scribblings/data/sample_fsm/") (define sample_fsm (make-reticulated-info sample_fsm-data)) (*OVERHEAD-PLOT-HEIGHT* 200) (*OVERHEAD-PLOT-WIDTH* 400) (*FONT-SIZE* 14)
For command-line options, run raco gtp-plot --help.
procedure
(overhead-plot pi*) → pict?
pi* : (treeof performance-info?)
This function is intended for exhaustive performance info structures.
(overhead-plot mbta)
(parameterize ((*OVERHEAD-SHOW-RATIO* #f)) (overhead-plot (list mbta (typed-racket-info%best-typed-path mbta 2))))
procedure
(exact-runtime-plot pi) → pict?
pi : (treeof performance-info?)
One configuration cfg renders N points, where N is the length of the list (configuration-info->runtime* cfg). These points are spread across the x-axis bucket that represents configurations with T type annotations, where T is the value of (configuration-info->num-types cfg).
(exact-runtime-plot mbta)
procedure
(samples-plot si) → pict?
si : sample-info?
(parameterize ([*OVERHEAD-SHOW-RATIO* #f]) (samples-plot sample_fsm))
procedure
(validate-samples-plot pi si) → pict?
pi : performance-info? si : sample-info?
(parameterize ([*OVERHEAD-SHOW-RATIO* #f]) (let* ((sample* (for/list ((i (in-range 4))) (random-sample (in-configurations mbta) 20))) (si (make-sample-info mbta sample*))) (validate-samples-plot mbta si)))
procedure
(relative-scatterplot pi-x pi-y) → pict?
pi-x : performance-info? pi-y : performance-info?
Points along the line X=Y represent configurations with equal overhead in both datasets. Points below the line are faster in pi-x and points to the left of the line are faster in pi-y.
(parameterize ([*POINT-SYMBOL* 'dot] [*POINT-SIZE* 30] [*POINT-ALPHA* 0.5] [*POINT-COLOR* 6]) (relative-scatterplot mbta mbta))
Added in version 0.2 of package gtp-plot.
(parameterize ([*FONT-SIZE* 8] [*GRID-NUM-COLUMNS* 2] [*GRID-Y* #f] [*OVERHEAD-PLOT-HEIGHT* 100] [*OVERHEAD-SHOW-RATIO* #f]) (grid-plot overhead-plot (list mbta mbta mbta)))
procedure
(performance-lattice pi) → pict?
pi : (or/c performance-info? natural?)
(parameterize ([*FONT-SIZE* 14] [*LATTICE-UNIT-WIDTH* 16] [*LATTICE-UNIT-HEIGHT* 12] [*LATTICE-CONFIG-X-MARGIN* 10] [*LATTICE-CONFIG-Y-MARGIN* 25] [*LATTICE-LINES?* #true] [*LATTICE-LINE-ALPHA* 0.5]) (ht-append 4 (performance-lattice mbta) (performance-lattice 3)))
Added in version 0.5 of package gtp-plot.
3.1 Experimental Plotting Functions
The functions in this section make strange plots. The appearance of these plots is subject to change.
procedure
(cloud-plot pi) → pict?
pi : performance-info?
Intended for exhaustive performance info structures.
(cloud-plot mbta)
procedure
(discrete-overhead-plot pi D*) → pict?
pi : performance-info? D* : (listof nonnegative-real/c)
(discrete-overhead-plot mbta '(1.2 4 10))
procedure
(rectangle-plot pi) → pict?
pi : performance-info?
Intended for exhaustive performance info structures.
(rectangle-plot mbta)
procedure
(relative-overhead-cdf pi-base pi-new) → pict?
pi-base : performance-info? pi-new : performance-info?
finds the configurations that are common to pi-new and pi-base;
groups the configurations by the quotient of their pi-new runtime and pi-base runtime;
plots a line of points (x,y), which means: y% of the common configurations are at most x times slower on pi-new
If pi-old and pi-new are the same, then the relative overhead of each configuration is 1.
(relative-overhead-cdf mbta mbta)
3.2 Plot Parameters
parameter
(*BAR-WIDTH*) → nonnegative-real/c
(*BAR-WIDTH* bar-width) → void? bar-width : nonnegative-real/c
= 0.1
parameter
(*DECORATIONS-COLOR*) → plot-color/c
(*DECORATIONS-COLOR* pc) → void? pc : plot-color/c
= 0
Added in version 0.2 of package gtp-plot.
parameter
(*FONT-SIZE*) → exact-positive-integer?
(*FONT-SIZE* font-size) → void? font-size : exact-positive-integer?
= 10
parameter
(*GRID-NUM-COLUMNS*) → exact-positive-integer?
(*GRID-NUM-COLUMNS* grid-num-columns) → void? grid-num-columns : exact-positive-integer?
= 3
parameter
(*INTERVAL-ALPHA*) → nonnegative-real/c
(*INTERVAL-ALPHA* ia) → void? ia : nonnegative-real/c
= 1
parameter
(*OVERHEAD-FREEZE-BODY*) → boolean?
(*OVERHEAD-FREEZE-BODY* freeze?) → void? freeze? : boolean?
= #f
parameter
(*OVERHEAD-LEGEND?*) → boolean?
(*OVERHEAD-LEGEND?* show-legend?) → void? show-legend? : boolean?
= #true
parameter
(*OVERHEAD-LINE-COLOR*) → plot-color/c
(*OVERHEAD-LINE-COLOR* line-color) → void? line-color : plot-color/c
= 3
parameter
(*OVERHEAD-LINE-WIDTH*) → nonnegative-real/c
(*OVERHEAD-LINE-WIDTH* line-width) → void? line-width : nonnegative-real/c
= 1
parameter
(*OVERHEAD-MAX*) → exact-positive-integer?
(*OVERHEAD-MAX* x-max) → void? x-max : exact-positive-integer?
= 20
parameter
(*OVERHEAD-PLOT-HEIGHT*) → nonnegative-real/c
(*OVERHEAD-PLOT-HEIGHT* plot-height) → void? plot-height : nonnegative-real/c
= 300
parameter
(*OVERHEAD-PLOT-WIDTH*) → nonnegative-real/c
(*OVERHEAD-PLOT-WIDTH* plot-height) → void? plot-height : nonnegative-real/c
= 600
parameter
(*OVERHEAD-SAMPLES*) → exact-positive-integer?
(*OVERHEAD-SAMPLES* num-samples) → void? num-samples : exact-positive-integer?
= 20
parameter
(*OVERHEAD-SHOW-RATIO*) → boolean?
(*OVERHEAD-SHOW-RATIO* show?) → void? show? : boolean?
= #t
parameter
(*POINT-ALPHA*) → nonnegative-real/c
(*POINT-ALPHA* pa) → void? pa : nonnegative-real/c
= 0.4
parameter
(*POINT-COLOR*) → plot-color/c
(*POINT-COLOR* pc) → void? pc : plot-color/c
= 2
parameter
(*POINT-SIZE*) → exact-positive-integer?
(*POINT-SIZE* ps) → void? ps : exact-positive-integer?
= 3
parameter
(*POINT-SYMBOL*) → point-sym/c
(*POINT-SYMBOL* ps) → void? ps : point-sym/c
= 'fullcircle
parameter
(*SAMPLE-COLOR*) → plot-color/c
(*SAMPLE-COLOR* sc) → void? sc : plot-color/c
= "chocolate"
parameter
(*LATTICE-UNIT-BORDER-WIDTH* r) → void? r : nonnegative-real?
= 0
parameter
(*LATTICE-UNIT-X-MARGIN* r) → void? r : nonnegative-real?
= 0
parameter
(*LATTICE-UNIT-HEIGHT* r) → void? r : nonnegative-real?
= 6
parameter
(*LATTICE-UNIT-WIDTH* r) → void? r : nonnegative-real?
= 3
parameter
(*LATTICE-CONFIG-X-MARGIN* r) → void? r : nonnegative-real?
= 3
parameter
(*LATTICE-CONFIG-Y-MARGIN* r) → void? r : nonnegative-real?
= 8
parameter
(*LATTICE-CONFIG-LABEL-MARGIN* r) → void? r : nonnegative-real?
= 1
parameter
(*LATTICE-LINE-WIDTH* r) → void? r : nonnegative-real?
= 0.5
parameter
(*LATTICE-LINE-ALPHA* r) → void? r : nonnegative-real?
= 0.2
parameter
(*LATTICE-UNTYPED-COLOR*) → plot-color/c
(*LATTICE-UNTYPED-COLOR* c) → void? c : plot-color/c
= "white"
parameter
(*LATTICE-TYPED-COLOR*) → plot-color/c
(*LATTICE-TYPED-COLOR* c) → void? c : plot-color/c
= "black"
4 Data Formats
4.1 Typed Racket Info
(require gtp-plot/typed-racket-info) | package: gtp-plot |
procedure
(typed-racket-data? ps) → boolean?
ps : path-string?
lives in a file named "NAME-vX-OTHER.rktd", where "NAME" is the name of the program and "vX" is a Racket version number, and "OTHER" is any string (used to distinguish this data from other data with the same prefix).
or lives in a file with any name where the first non-whitespace characters on the first non-blank-line, non-comment line are the #( characters.
or lives in a #lang gtp-measure/output/typed-untyped file.
contains (or is) a Racket vector with (expt 2 N) elements (for some natural N); entries in the vector are lists of runtimes.
or contains valid #lang gtp-measure/output/typed-untyped data.
Changed in version 0.4 of package gtp-plot: Accept gtp-measure output.
Changed in version 0.5: Accept vector
;; Example Typed Racket GTP dataset
#((1328)
(42)
(8) (1))
procedure
(make-typed-racket-info ps [#:name name]) → typed-racket-info?
ps : typed-racket-data? name : (or/c #f symbol?) = #f
procedure
(typed-racket-info? x) → boolean?
x : any/c
procedure
(typed-racket-id? x) → boolean?
x : any/c
> (typed-racket-id? "0000") '(#\0 #\1)
> (typed-racket-id? "001000011") '(#\1)
> (typed-racket-id? "2001") #f
> (typed-racket-id? 10) #f
procedure
(typed-racket-id<=? id0 id1) → boolean?
id0 : typed-racket-id? id1 : typed-racket-id?
> (typed-racket-id<=? "0000" "1111") #t
> (typed-racket-id<=? "1111" "0000") #f
> (typed-racket-id<=? "0110" "1110") #t
> (typed-racket-id<=? "0110" "1011") #f
Added in version 0.6 of package gtp-plot.
procedure
(make-typed-racket-sample-info src #:name name #:typed-configuration tc #:untyped-configuration uc) → sample-info? src : (listof typed-racket-info?) name : symbol? tc : typed-configuration-info? uc : untyped-configuration-info?
procedure
(make-typed-racket-configuration-info id time*) → typed-racket-configuration-info? id : typed-racket-id? time* : (listof nonnegative-real/c)
procedure
x : any/c
procedure
x : any/c
procedure
x : any/c
procedure
(typed-racket-info%best-typed-path pi n) → typed-racket-info?
pi : typed-racket-info? n : natural?
Added in version 0.6 of package gtp-plot.
4.2 Reticulated Info
(require gtp-plot/reticulated-info) | package: gtp-plot |
procedure
(reticulated-data? ps) → boolean?
ps : path-string?
has a name like "NAME-OTHER", where "NAME" is the name of the program and "OTHER" is any string (used to distinguish from other data with the same prefix);
contains a file named "NAME-python.tab", where each line has a floating point number;
- either:
contains a Reticulated data file named "NAME.tab";
or a compressed Reticulated data file named "NAME.tab.gz";
or a Reticulated meta file "NAME-meta.rktd", a data file "NAME-retic-typed.tab", a data file "NAME-retic-untyped.tab", and a sequence of files named "sample-NUM.tab" (where "NUM" is a sequence of digits).
ID NUM-TYPES [TIME, ...]
Wikipedia: mixed-radix number
ID is a hyphen-separated sequence of digits. These represent a configuration as a mixed-radix number; the radix of position k is the number of typeable components in module k of the program.NUM-TYPES is a natural number, and
TIME is a positive floating-point number.
13-0-2-7-7 9 [2.096725507, 2.134529453, 2.088266101, 2.1067140589999998, ]
0-1-18-7-3 11 [2.153331553, 2.217980131, 2.0789024329999997, 2.1293707979999996, ]
#hash((num-units . 19))
Sorry for the complex format, it’s strongly motivated by the data in the gm-pepm-2018 package.
procedure
(make-reticulated-info ps [kind]) → reticulated-info?
ps : reticulated-data? kind : (or/c 'exhaustive 'approximate #f) = #f
By default, inspects the given dataset and chooses whether to return a reticulated-info? struct with exhaustive data or a sample-info? struct with approximate data. Use the kind argument to override the default.
procedure
(reticulated-info? x) → boolean?
x : any/c
procedure
(reticulated-id? x) → boolean?
x : any/c
Configuration names may be strings of hypen-separated numbers or lists of natural numbers.
> (reticulated-id? "0") #t
> (reticulated-id? "0-0-0") #t
> (reticulated-id? "2001-14-3") #t
> (reticulated-id? '(2001 14 3)) #t
To decode the type annotations from one number N, convert N to base-2 and look for #\0 digits. Each place in a binary number correponds to a typed unit, each #\0 means the unit is typed, and each #\1 means the unit in untyped.
Be advised, the interpretations of #\0 and #\1 are the opposite of what typed-racket-id? does.
procedure
(reticulated-id<=? id0 id1) → boolean?
id0 : reticulated-id? id1 : reticulated-id?
> (reticulated-id<=? "1" "0") #t
> (reticulated-id<=? "0" "1") #f
> (reticulated-id<=? "7-6" "5-6") #t
> (reticulated-id<=? "6-6" "5-6") #f
The input configurations must be for the same benchmark.
Added in version 0.6 of package gtp-plot.
procedure
pi : reticulated-info? n : natural?
Added in version 0.6 of package gtp-plot.
5 Support
5.1 System API
(require gtp-plot/system) | package: gtp-plot |
NOTE: This library is deprecated; use gtp-util, instead.
See also racket/system.
procedure
cmd : path-string? arg* : (or/c path-string? (listof path-string?))
procedure
filename : path-string?
5.2 Other Helper Functions
(require gtp-plot/util) | package: gtp-plot |
syntax
syntax
syntax
syntax
syntax
syntax
See also: define-logger.
NOTE: This sequence of bindings from here down is deprecated; use gtp-util, instead. Keep using gtp-plot/util for the logger above.
procedure
(nonnegative-real/c x) → boolean?
x : any/c
procedure
(confidence-interval r* [ #:cv confidence-value]) → (cons/c real? real?) r* : (listof real?) confidence-value : nonnegative-real/c = 1.96
procedure
(path-string->string ps) → string?
ps : path-string?
procedure
(ensure-directory ps) → void?
ps : path-string?
Assumes n is a power of 2.
procedure
(order-of-magnitude n) → natural?
n : real?
procedure
ps : path-string?
procedure
out-path : path-string? p : pict?
procedure
(force/cpu-time thunk) →
any/c natural? thunk : (-> any)
procedure
(natural->bitstring n #:pad pad) → string?
n : natural? pad : natural?
procedure
(bitstring->natural str) → natural?
str : string?
procedure
(count-zero-bits str) → natural?
str : string?
6 GTP Glossary
(These aren’t definitions, these are examples.)
A typeable component in Typed Racket is a module. A typeable component in Reticulated is a function parameter, function return, or class field (see def and @fields).
A configuration is a gradually-typed program. Given a fully-typed program with N typeable components, there are (expt 2 N) configurations that have fewer annotations than the fully-typed configuration.