Posted on 2022-10-12, 13:04. Authored by Frederik Jakob Gossen.
Among the first steps in a compilation pipeline is the construction of an Intermediate Representation (IR), an in-memory representation of the input program. Any attempt at program
optimisation, in terms of both size and running time, has to operate on this structure. There may
be one or multiple such IRs; however, most compilers use some form of a Control Flow Graph
(CFG) internally. This representation clearly aims at general-purpose programming languages,
for which it is well suited and allows for many classical program optimisations. On the other
hand, a growing structural difference between the input program and the chosen IR can lose
or obfuscate information that can be crucial for effective optimisation. With today’s rise of a
multitude of different programming languages, Domain-Specific Languages (DSLs), and computing platforms, the classical machine-oriented IR is reaching its limits and a broader variety of
IRs is needed. This realisation led, for example, to the Multi-Level Intermediate Representation (MLIR),
a compiler framework that facilitates the creation of a wide range of IRs and encourages their
reuse among different programming languages and the corresponding compilers.
In this modern spirit, this dissertation explores the potential of Algebraic Decision Diagrams
(ADDs) as an IR for (domain-specific) program optimisation. The data structure has remained the
state of the art for Boolean function representation for more than thirty years and is well known
for its optimality in size and depth, i.e. running time. As such, it is ideally suited to represent
the corresponding classes of programs in the role of an IR. We will discuss its application in
a variety of different program domains, ranging from DSLs to machine-learned programs and
even to general-purpose programming languages.
Two representatives for DSLs, a graphical and a textual one, prove the adequacy of ADDs
for the program optimisation of modelled decision services. The resulting DSLs facilitate
experimentation with ADDs and provide valuable insight into their potential and limitations:
input programs can be aggregated in a radical fashion, at the risk of occasional exponential
growth. With the aggregation of large Random Forests into a single ADD, we
bring this potential to a program domain of practical relevance. The results are impressive:
both running time and size of the Random Forest program are reduced by multiple orders of
magnitude. It turns out that this ADD-based aggregation can be generalised, even to general-purpose programming languages. The resulting method achieves impressive speedups for a
seemingly optimal program: the iterative Fibonacci implementation.
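To illustrate the idea of aggregating a forest into a single ADD, the following is a minimal sketch and not the dissertation's implementation. It assumes that every tree predicate has already been abstracted to a Boolean variable with a fixed order, that each tree casts a vote of 0 or 1, and that the forest decides by majority. All names (`leaf`, `node`, `apply`, `evaluate`) are hypothetical helpers introduced only for this example.

```python
# Minimal ADD sketch (assumption: predicates are Boolean variables 0, 1, 2, ...
# tested in ascending order). Nodes are hash-consed tuples:
#   terminal: ("leaf", value)      inner: (var, low, high)
_unique = {}

def leaf(value):
    key = ("leaf", value)
    return _unique.setdefault(key, key)

def node(var, low, high):
    if low is high:              # redundant test: both branches agree
        return low
    key = (var, low, high)
    return _unique.setdefault(key, key)

def is_leaf(n):
    return n[0] == "leaf"

def apply(op, a, b, memo=None):
    """Combine two ADDs pointwise with a binary operation on terminal values."""
    memo = {} if memo is None else memo
    key = (id(a), id(b))
    if key in memo:
        return memo[key]
    if is_leaf(a) and is_leaf(b):
        result = leaf(op(a[1], b[1]))
    else:
        # Split on the smallest variable tested at the root of either operand.
        var = min(n[0] for n in (a, b) if not is_leaf(n))
        a0, a1 = (a[1], a[2]) if a[0] == var else (a, a)
        b0, b1 = (b[1], b[2]) if b[0] == var else (b, b)
        result = node(var, apply(op, a0, b0, memo), apply(op, a1, b1, memo))
    memo[key] = result
    return result

def evaluate(n, assignment):
    """Follow one path from the root to a terminal."""
    while not is_leaf(n):
        n = n[2] if assignment[n[0]] else n[1]
    return n[1]

# A toy "forest" of three trees, each voting 0 or 1.
t1 = node(0, leaf(0), leaf(1))                        # votes 1 if x0
t2 = node(1, leaf(0), leaf(1))                        # votes 1 if x1
t3 = node(0, node(2, leaf(0), leaf(1)), leaf(1))      # votes 1 if x0 or x2

# Aggregate: sum the votes, then abstract the vote count to the majority class.
add = lambda u, v: u + v
votes = apply(add, apply(add, t1, t2), t3)
majority = apply(lambda v, _: int(v >= 2), votes, leaf(0))
```

In this toy case the three trees collapse into one diagram with three inner nodes, so a classification follows a single path instead of traversing every tree. The sketch does not show the exponential growth that aggregation can cause in unfavourable cases.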
Altogether, ADDs facilitate effective program optimisation where the input programs allow
for a natural transformation to the data structure. In these cases, they have proven to be an
extremely powerful tool for the optimisation of a program’s running time and, in some cases,
of its size. The exploration of their potential as an IR has only started and deserves attention in
future research.