Variable-Length-Input Benchmarks

8.12

Variable-Length-Input Benchmarks🔗ℹ

Dominik Pantůček <dominik.pantucek@trustica.cz>

This package provides a library to measure the running time of given procedure(s) for successively increasing input sizes and performing elementary statistical analysis on the resulting series.

1 Introduction🔗ℹ

This package is currently a work-in-progress. Use with caution!

It is developed for helping with benchmarking and profiling the qi modular compiler.

2 Statistical Analysis🔗ℹ

(require vlibench/private/statistics)

package: vlibench

A simple module for performing statistical analysis on a set of results - durations - expected to have normal distribution.

struct
(struct stats (avg stdev min max count))
  avg : real?
  stdev : real?
  min : real?
  max : real?
  count : positive-fixnum?

A struct representing the result of statistical analysis. The fields hold avg average value, standard deviation stdev, minimal sample min, maximal sample max, and the number of samples analysed count.

procedure
(make-stats lst) → stats?
lst : (listof real?)

Takes a list lst of numbers and performs statistical analysis on it. It computes the average \bar{x}, standard deviation \sigma(x), and minimum and maximum values.

\bar{x}=\frac{1}{n}\sum_{i=1}^n x_i

\sigma(x)=\sqrt{\frac{1}{n}\sum_{i=1}^n (x_i-\bar{x})^2}

Returns the stats? struct containing the results.

procedure
(statistical-analysis raw-values
[ #:max-sigma max-sigma])
→
(listof real?) stats?
raw-values : (listof real?)
max-sigma : real? = 3

Performs a more rigorous statistical analysis on given sample of numbers by repeatedly removing all input values that fall outside given max-sigma range from the average, until all samples are within the allowed range. This is to make sure the results are not affected by random glitches that produce extreme values.

Returns two values: the list of remaining samples that are taken into consideration when calculating the statistical analysis and the actual statistical analysis results.

3 Benchmarking🔗ℹ

(require vlibench)

package: vlibench

This module provides all procedures for setting the benchmarking up, running it and formatting results in a scribble document. It re-provides applicable bindings from the following private modules:

vlibench/profile-defaults ...

3.1 Profiles🔗ℹ

(require vlibench/private/profile-struct)
	package: vlibench

This module defines a benchmarking profile.

struct
(struct vlib/profile (name mini mid maxi rep gc))
  name : string?
  mini : fixnum?
  mid : (or/c fixnum? #f)
  maxi : fixnum?
  rep : fixnum?
  gc : (or/c 'never 'step 'always)

Named profile specifying the minimum, middle, and maximum value of input sizes. See make-benchmark-steps for details. In addition it specifies the number of repeats each input length is benchmarked and how often the garbage collection should be run based on gc field. Allowed values are: 'never to never run collect-garbage explicitly, 'step to run it after all the runs for given input size, and 'always to run it after each measurement.

procedure
(make-vlib-profile #:name name
[ #:mini mini
#:mid mid
#:maxi maxi]
#:rep rep
#:gc gc) → vlib/profile?
  name : string?
  mini : fixnum? = 10
  mid : fixnum? = 10000
  maxi : fixnum? = 1000000
  rep : fixnum?
  gc : (or/c 'never 'step 'always)

Helper procedure to create vlib/profile struct with keyword arguments specifying the fields and providing reasonable defaults.

3.2 Benchmark Steps / Profile Input Sizes🔗ℹ

(require vlibench/private/profile-steps)

package: vlibench

procedure
(vlib/profile->steps profile) → sequence?
profile : vlib/profile?

Returns a sequence? of input sizes for given profile.

3.3 Benchmark Structs🔗ℹ

(require vlibench/private/benchmark-structs)
	package: vlibench

This module contains only the core structs for defininy and running benchmarks. Also, auxilliary syntaxes are provided to create those with source syntax included when applicable.

struct
(struct vlib/prog (label proc src))
  label : string?
  proc : (-> any/c any)
  src : syntax?

Single program to be benchmarked with different input sizes. The label field is used to display the results alongside the source syntax src - which can easily be rendered with syntax highlighting. The procedure proc representing the program must accept single argument and can produce anything.

Usually the procedure accepts a number or list of particular size as its argument. Such lists are created as part of the vlib/spec specification struct.

syntax
(make-vlib/prog label prog)

Convenience syntax to populate the vlib/prog struct with both the program and its source alongside specified label.

struct
(struct vlib/spec (name prepare progs))
  name : string?
  prepare : (-> fixnum? any/c)
  progs : (listof vlib/prog?)

A struct representing a single benchmark specification named name, having prepare as preparation procedure and list of vlib/prog? in the progs field.

The preparation procedure is used to create an input value of desired length before actual benchmark is being run and measured.

syntax
(make-vlib/spec name prepare (label prog) ...)

Convenience syntax to create vlib/spec of given name and prepare preparation procedure containing multiple programs with given labels and their sources.

struct
(struct vlib/result (len raw-times times stats))
  len : fixnum?
  raw-times : (listof (listof positive-real?))
  times : (listof (listof positive-real?))
  stats : (listof stats?)

Single input length result for all progs in given benchmark specification. The len is the input size used, raw-times contains a list of list of all running durations measured, times contains a list of list of running durations with extremes outside desired max-sigma range removed and stats contains a list of stats? structs with the statistical analysis results for each program being benchmarked.

struct
(struct vli/benchark ( spec
profile
results
duration
fits
rfit
ratios))
  spec : vlib/spec?
  profile : vlib/profile?
  results : (listof vlib/result)
  duration : positive-real?
  fits : (listof polynomial?)
  rfit : (list/c real? real?)
  ratios : (listof (listof real?))

Represents complete results of running given benchmark under given profile. It contains the spec specification and profile used a list of vlib/result? for all input sizes, the total duration (time spent) polynomial fits of each running times of program from the specification (polynomial of up to 2nd degree), hyperbolic fit (rfit) of the ratio between the first and second program running times and a matrix (a table) of ratios between all programs benchmarked.

3.4 Running Benchmarks🔗ℹ

(require vlibench/private/benchmark-run)

package: vlibench

This module provides the means of running the benchmark specification under given profile, producing the result.

procedure
(run-benchmark spec #:profile prof) → vli/benchmark
spec : vlib/spec?
prof : vlib/profile?

Runs a benchmark given in specification spec under a profile prof and produces vli/benchmark? result.

3.5 Benchmark Utils🔗ℹ

(require vlibench/private/benchmark-utils)
	package: vlibench

This module provides auxilliary procedures for constructing the benchmark specifications.

procedure
(make-random-integer-list len [max-test-int]) → (listof integer?)
len : fixnum?
max-test-int : integer? = 1000000

Creates a random list of integers of given length and maximum value. Should be used as a prepare field of specifications where the input is a list of integers.

4 Benchmark Reports🔗ℹ

(require vlibench/private/report-snippets)
	package: vlibench

This module contains various scribble snippets allowing for simple creation of a comprehensive report about given benchmarks.

procedure
(snippet:vlib/profile profile [heading]) → content?
profile : vlib/profile?
heading : block? = subsection

Outputs information about the profile - its name, whether logarithmic steps are used throughout the tested range or there is a boundary where the steps change to linear progression, the number of measurements for each input length and garbage collection settings.

For example:

Profile: some name
Logarithmic/Linear Boundary: 10000
Number of measurements for each input length: 100
Garbage collection: never

procedure
(snippet:system-information [ heading
#:more-versions more-versions
#:more-generic more-generic])
→ content?
  heading : block? = subsection
  more-versions : list? = '()
  more-generic : list? = '()

Displays information about the hardware and software of the system where the benchmarks were run. Racket version, operating system, architecture, Racket runtime, machine type (CPU class), number of processor threads and processor descriptive string.

Detailed processor description is not available on Windows platform.

An example output is:

System Information
Racket Version: 8.12 - Welcome to Racket v8.12 [cs].
Operating System: linux
Architecture: x86_64
Runtime: chez-scheme
Machine: Linux 5e3a64569911 4.14.336-253.554.amzn2.x86_64 #1 SMP Fri Jan 12 09:58:17 UTC 2024 x86_64
Processor Count: 1
Processor: Intel(R) Xeon(R) CPU E5-2676 v3 @ 2.40GHz

procedure
(snippet:hyperbolic-formula lst) → content?
lst : (listof real?)

Renders hyperbolic equation in the form: ax^{-1}+b. The argument is a list in the form (list a b).

procedure
(snippet:polynomial-formula fit
[ max-rank
epsilon]) → content?
  fit : polynomial?
  max-rank : fixnum? = 2
  epsilon : real? = 1e-6

Renders polynomial formula of given maximal rank max-rank in the form ax^2+bx+c suitable for inclusion in the alignat* LaTeX math environment. See snippet:benchmark-polynomial-formulae below.

procedure
(snippet:spec-sources spec) → content?
spec : vlib/spec?

Shows a program sources for all programs of given specification spec. For example:

Program Name:
(lambda (lst)
(for/sum ((v (in-list lst)))
(* v v)))

procedure
(snippet:benchmark-polynomial-formulae benchmark) → content?
benchmark : vli/benchmark?

Displays multiple polynomial formulae from vli/benchmark?’s fits field aligned using the alignat* LaTeX math environment. For example:

\begin{alignat*}{5} \text{First Name:}\quad t & = & 2 & x^2 & + & & x & - & 5 \\ \text{Second Name:}\quad t & = & & x^2 & - & 3 & x & & \\ \end{alignat*}

procedure
(snippet:benchmark-performance-matrix benchmark) → content?
benchmark : vli/benchmark?

Display matrix of running time durations ratios comparing all programs with each other. For example:

A
B
C
A
1.0
2.0
5.0
B
0.5
1.0
2.5
C
0.2
0.4
1.0

procedure
(snippet:benchmark-ratio-fit benchmark
[ nominator-label
denominator-label]) → content?
  benchmark : vli/benchmark?
  nominator-label : string? = "Q"
  denominator-label : string? = "R"

Displays the hyperbolic fit formula for given result. For example:

\frac{Q}{R}=5x^{-1}+2

procedure
(snippet:benchmar/s-duration benchmark/s) → content?
benchmark/s :
(or/c vli/benchmark?
(listof vli/benchmark?))

Displays the duration of single benchmark or a total duration of a list of benchmarks. The format is "Benchmark(s) took 123 seconds (00:02:03)."

procedure
(snippet:benchmark-results benchmark) → content?
benchmark : vli/benchmark?

Outputs comprehensive results of given benchmark as a subsection labeled with benchmark name. It contains in order: benchmark duration, programs’ sources, graphical plot of running times for all programs for comparison, polynomial formulae list, performance matrix, graphical plot of ratios and fitted formula.

procedure
(snippet:summary-results-table benchmarks) → content?
benchmarks : (listof vli/benchmark?)

Displays summary table of all the benchmark results comparing the first two implementations for all of them and reporting the ratio of the first over second.

1	Introduction
2	Statistical Analysis
3	Benchmarking
4	Benchmark Reports