Sable Publications (Theses)
Efficient Design and Implementation of Software Thread-Level Speculation
Author: Zhen Cao
Date: December 2015
Software approaches to Thread-Level Speculation (TLS) have
been recently explored, bypassing the need for specialized
hardware designs. These approaches, however, tend to focus
on source or VM-level implementations aimed at specific
language and runtime environments. In addition, previous
software approaches tend to make use of a simple thread
forking model, reducing their ability to extract substantial
parallelism from tree-form recursion programs. We propose a
Mixed forking model Universal software-TLS (MUTLS) system to
overcome these limitations. Our work demonstrates that
actual speedup is achievable on existing, commodity
multi-core processors while maintaining the flexibility of a
highly generic implementation context, though
memory-intensive benchmarks could be improved by further
optimizations. Our experiments also indicate that a mixed
model is preferable for parallelization of tree-form
recursion applications over the simple forking models used
by previous software-TLS approaches.
We then improve the performance of the MUTLS system by
proposing memory buffering optimizations. Traditional "lazy"
buffering mechanisms enable strong isolation of speculative
threads, but imply large memory overheads, while more recent
"eager" mechanisms improve scalability, but are more
sensitive to data dependencies and have higher rollback
costs. We thus describe an integrated system that incorporates
the best of both designs, automatically selecting the
best buffering mechanism. Our approach builds on novel
buffering designs with specific optimizations that improve
both lazy and eager buffer management, allowing us to
achieve 71% of the geometric-mean performance of the manually
parallelized OpenMP version.
Fork-heuristics play a key role in automatic parallelization
using TLS. Current fork-heuristics either lack real parallel
execution environment information to accurately evaluate
fork points and/or focus on hardware-TLS implementations,
which cannot be directly applied to software-TLS. We propose
adaptive fork-heuristics as well as a feedback-based
selection technique to overcome the problems. Experiments
show that the adaptive fork-heuristics and feedback-based
selection are generally effective for automatic
parallelization using software-TLS, achieving 56% and 76%,
respectively, of the performance of the manually annotated
speculative version. More advanced automatic workload
distribution strategies can also help to improve the
effectiveness of the automatic parallelization approaches.
Finally, dynamic languages such as Python are difficult to
statically analyze and parallelize, which makes them an ideal
context for the dynamic parallelization offered by software-TLS.
We integrate the Python JIT specializing compiler
Numba with the MUTLS system to obtain a parallelizing compiler
for Python. Experiments on 6 benchmarks show that the
speculatively parallelized versions achieve speedups from 1.8
to 16.4 and from 2.2 to 40.5 for the JIT-compiled Python
programs with and without accounting for JIT compilation
time, respectively. Overall, we demonstrate that
software-TLS can be an efficient and effective approach for
automatic parallelization on commodity multi-core processors
for a variety of language contexts/execution
environments.
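As a rough illustration (not taken from the thesis), the tree-form recursive kernels targeted by the Numba/MUTLS integration look like the sketch below; only the standard Numba @njit decorator is assumed, and the speculative fork points MUTLS would exploit are only marked in comments.

```python
# Illustrative sketch only: a tree-form recursive kernel of the kind the
# Numba/MUTLS integration targets. Only the standard numba.njit decorator is
# assumed; the MUTLS speculation annotations themselves are not shown.
from numba import njit

@njit
def fib(n):
    if n < 2:
        return n
    # The two recursive calls are natural speculative fork points: a mixed
    # forking model can run one speculatively while the other runs normally.
    return fib(n - 1) + fib(n - 2)

if __name__ == "__main__":
    print(fib(30))   # JIT-compiled on first call; subsequent calls are fast
```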
VeloCty: A Static Optimising Compiler for MATLAB and NumPy
Author: Sameer Jagdale
Date: March 2015
Abstract
High-level scientific languages such as MATLAB and Python's NumPy library are
gaining popularity among scientists and mathematicians. These languages provide
many features, such as dynamic typing and high-level scientific functions,
which allow easy prototyping. However, these features also inhibit the
performance of the code.
We present VeloCty, an optimizing static compiler for MATLAB and Python as a
solution to the problem of enhancing performance of programs written in these
languages. In most programs, a large portion of the time is spent executing a
small part of the code. Moreover, these sections can often be compiled ahead of
time and improved performance can be achieved by optimizing only these `hot'
sections of the code. VeloCty takes as input functions written in MATLAB and
Python specified by the user and generates an equivalent C++ version. VeloCty
also generates glue code to interface with MATLAB and Python. The generated
code can then be compiled and packaged as a shared library that can be linked
to any program written in MATLAB or Python. We also implemented optimisations
to eliminate array bounds checks, reuse previously allocated memory during
array operations and support parallel execution using OpenMP.
VeloCty uses the Velociraptor toolkit. We implemented a C++ backend for the
Velociraptor intermediate representation, VRIR, and language-specific runtimes
for MATLAB and Python. We have also implemented a MATLAB VRIR generator using
the McLab toolkit.
VeloCty was evaluated using 17 MATLAB benchmarks and 9 Python benchmarks. The
MATLAB benchmark versions compiled using VeloCty with all optimisations enabled
were between 1.3 and 458 times faster than MathWorks' MATLAB 2014b
interpreter and JIT compiler. Similarly, Python benchmark versions were between
44.11 and 1681 times faster than the CPython interpreter.
MC2FOR: A MATLAB TO FORTRAN 95 COMPILER
Author: Xu Li
Date: April 2014
Abstract
MATLAB is a dynamic numerical scripting
language widely used by scientists, engineers and students. While
MATLAB's high-level syntax and dynamic types make it ideal for fast
prototyping, programmers often prefer using high-performance static languages
such as FORTRAN for their final distribution. Rather than rewriting
the code by hand, our solution is to provide a source-to-source compiler that
translates the original MATLAB program to an equivalent FORTRAN program.
In this thesis, we introduce MC2FOR, a source-to-source compiler which
transforms MATLAB to FORTRAN and handles several important challenges
during the transformation, such as efficiently estimating the static type
characteristics of all the variables in a given MATLAB program,
mapping numerous MATLAB built-in functions to FORTRAN, and correctly
supporting some MATLAB dynamic features in the generated FORTRAN code.
This compiler consists of two major parts. The first part is an
interprocedural analysis component to estimate the static type
characteristics, such as the shapes of the arrays and the ranges
of the scalars, which are used to generate variable declarations and to remove
unnecessary array bounds checking in the translated FORTRAN program.
The second part is an extensible FORTRAN code generation framework
automatically transforming MATLAB constructs to equivalent FORTRAN
constructs.
This work has been implemented within the McLab framework, and we
evaluated the performance of the MC2FOR compiler on a collection of
20 MATLAB benchmarks. For most of the benchmarks, the generated
FORTRAN program runs 1.2 to 337 times faster than the original MATLAB
program, and in terms of physical lines of code, typically grows only
by a factor of around 2. These experimental results show that the code
generated by MC2FOR performs better on average, at the cost of only a
modest increase in code size.
MiX10: Compiling MATLAB to X10 for High Performance
Author: Vineet Kumar
Date: April 2014
Abstract
MATLAB is a popular dynamic array-based language commonly used by
students, scientists and engineers who appreciate the interactive
development style, the rich set of array operators, the extensive
builtin library, and the fact that they do not have to declare static
types. Even though these users like to program in MATLAB, their
computations are often very compute-intensive and are better suited
for emerging high performance computing systems. This thesis reports
on MIX10, a source-to-source compiler that automatically translates
MATLAB programs to X10, a language designed for “Performance and
Productivity at Scale”; thus helping scientific programmers make
better use of high performance computing systems. There is a large
semantic gap between the array-based dynamically-typed nature of
MATLAB and the object-oriented, statically-typed, and high-level array
abstractions of X10. This thesis addresses the major challenges that
must be overcome to produce sequential X10 code that is competitive
with state-of-the-art static compilers for MATLAB which target more
conventional imperative languages such as C and Fortran. Given that
efficient basis, the thesis then provides a translation for the MATLAB
parfor construct that leverages the powerful concurrency constructs in
X10. The MIX10 compiler has been implemented using the McLab compiler
tools, is open source, and is available both for compiler researchers
and end-user MATLAB programmers. We have used the implementation to
perform many empirical measurements on a set of 17 MATLAB benchmarks.
We show that our best MIX10-generated code is significantly faster
than the de facto Mathworks’ MATLAB system, and that our results are
competitive with state-of-the-art static compilers that target C and
Fortran. We also show the importance of finding the correct approach
to representing the arrays in X10, and the necessity of an IntegerOkay
analysis that determines which double variables can be safely
represented as integers. Finally, we show that our X10-based handling
of the MATLAB parfor greatly outperforms the de facto MATLAB
implementation.
DYNAMIC COMPILER OPTIMIZATION TECHNIQUES FOR MATLAB
Author: Nurudeen Abiodum Lameed
Date: April 2013
MATLAB has gained widespread acceptance among engineers and scientists. Several
aspects of the language such as dynamic loading and typing, safe updates, copy semantics
for arrays, and support for higher-order functions contribute to its appeal, but at the same
time provide many challenges to the compiler and virtual machine. MATLAB is a dynamic
language. Traditional implementations of the language use interpreters and have been found
to be too slow for large computations. More recently, researchers and software developers
have been developing JIT compilers for MATLAB and other dynamic languages.
This thesis is about the development of new compiler analyses and
transformations for a MATLAB JIT compiler, McJIT, which is based on the LLVM JIT compiler toolkit.
The new contributions include a collection of novel analyses for optimizing copying
of arrays, which are performed when a function is first compiled. We designed and
implemented four analyses to support an efficient implementation of array copy semantics in a
MATLAB JIT compiler. Experimental results show that copy optimization is essential for
performance improvement in a compiler for the MATLAB language.
We also developed a variety of new dynamic analyses and code transformations for
optimizing running code on-the-fly according to the current conditions of the runtime
environment. LLVM does not currently support on-the-fly code transformation. So, we first
developed a new on-stack replacement approach for LLVM. This capability allows the
runtime stack to be modified during the execution of a function, thus enabling a continuation
of the execution at a higher optimization level. We then used the on-stack replacement
implementation to support selective inlining of function calls in long-running loops. Our
experimental results show that function calls in long-running loops can result in high
runtime overhead, and that selective dynamic inlining can be used to drastically reduce this
overhead.
The built-in function feval is an important MATLAB feature for certain classes of
numerical programs and solvers which benefit from having functions as parameters.
Programmers may pass a function name or function handle to the solver and then the solver
uses feval to indirectly call the function. In this thesis, we show that although feval
provides an acceptable abstraction mechanism for these types of applications, there are
significant performance overheads for function calls via feval, in both MATLAB
interpreters and JITs. The thesis then proposes, implements and compares two on-the-fly
mechanisms for specialization of feval calls. The first approach uses our on-stack replacement
technology. The second approach specializes calls of functions with feval using a
combination of runtime input argument types and values. Experimental results on seven numerical
solvers show that the techniques provide good performance improvements.
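The cost of feval-style indirection, and the benefit of specializing it, can be seen in a rough Python analogue (illustrative only; the function names are hypothetical and MATLAB's feval itself is not involved):

```python
import math

def solve_generic(fname, xs, env):
    # Analogue of an feval-based solver: the callee is looked up by name and
    # called indirectly on every evaluation.
    return [env[fname](x) for x in xs]

def specialize(fname, env):
    # Analogue of on-the-fly feval specialization: bind the callee once so
    # that subsequent calls are direct.
    f = env[fname]
    return lambda xs: [f(x) for x in xs]

env = {"sin": math.sin, "cos": math.cos}
xs = [i * 0.01 for i in range(1000)]
assert solve_generic("sin", xs, env) == specialize("sin", env)(xs)
```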
The implementation of all the analyses and code transformations presented in this thesis
has been done within the McLab virtual machine, McVM, and is available to the public as
open source software.
UNDERSTANDING AND REFACTORING THE MATLAB LANGUAGE
Author: Soroush Radpour
Date: August 2012
Abstract
MATLAB is a very popular dynamic "scripting" language for numerical computations
used by scientists, engineers and students world-wide. MATLAB programs are often developed incrementally using a mixture of MATLAB scripts and functions and frequently build
upon existing code which may use outdated features. This results in programs that could
benefit from refactoring, especially if the code will be reused and/or distributed. Despite
the need for refactoring there appear to be no MATLAB refactoring tools available. Correct
refactoring of MATLAB is quite challenging because of its non-standard rules for binding
identifiers. Even simple refactorings are non-trivial. Compiler writers and software engineers are generally not familiar with MATLAB and how it is used so the problem has been
left untouched so far.
This thesis has two main contributions. The first is McBench, a tool that helps compiler
writers understand the language better. In order to have a systematic approach to the problem, we developed this tool to give us some insight about how programmers use MATLAB.
The second contribution is a suite of semantics-preserving refactorings for MATLAB functions and scripts, including: function and script inlining, converting scripts to functions,
extracting new functions, and converting dynamic feval calls to static function calls. These
refactorings have been implemented using the McLAB compiler framework, and an evaluation is given on a large set of MATLAB programs which demonstrates the effectiveness of
our approach.
Software Method Level Speculation for Java
Author: Christopher J. F. Pickett
Date: April 2012
Abstract
Speculative multithreading (SpMT), also known as thread level
speculation (TLS), is a dynamic parallelization technique that
relies on out-of-order execution, dependence buffering, and
misspeculation rollback to achieve speedup of sequential programs
on multiprocessors. A large number of hardware studies have shown
good results for irregular programs, as have a smaller number of
software studies in the context of loop level speculation for
unmanaged languages.
In this thesis we explore software method level speculation for
Java. A software environment means that speculation will run on
existing multiprocessors, at the cost of extra overhead. Method
level speculation (MLS) is a kind of SpMT / TLS that creates
threads on method invocation, executing the continuation
speculatively. Although MLS can subsume loop level speculation,
it is still a relatively unexplored paradigm. The Java
programming language and virtual machine environment are rich and
complex, posing many implementation challenges, but also
supporting a compelling variety of object-oriented programs.
We first describe the design and implementation of a prototype
system in a Java bytecode interpreter. This includes support for
various MLS components, such as return value prediction and
dependence buffering, as well as various interactions with
features of the Java virtual machine, for example bytecode
interpretation, exception handling, and the Java memory model.
Experimentally we found that although high thread overheads
preclude speedup, we could extract significant parallelism if
overheads were excluded. Furthermore, profiling revealed three
key areas for optimization.
The first key area for optimization was the return value
prediction system. In our initial model, a variety of predictors
were all executing naïvely on every method invocation, in order
that a hybrid predictor might select the best performing ones. We
developed an adaptive system wherein hybrid predictors dynamically
specialize on a per-callsite basis, thus dramatically reducing
speed and memory costs whilst maintaining high accuracy.
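The idea behind an adaptive hybrid return value predictor can be conveyed with a small sketch (Python here purely for brevity; the thesis implements this inside a Java bytecode interpreter, and the class below is hypothetical):

```python
class HybridReturnValuePredictor:
    """Per-callsite hybrid of a last-value and a stride predictor (sketch)."""
    def __init__(self):
        self.last = 0
        self.stride = 0
        self.correct = {"last": 0, "stride": 0}

    def predict(self):
        # Use whichever sub-predictor has been more accurate so far.
        if self.correct["stride"] > self.correct["last"]:
            return self.last + self.stride
        return self.last

    def update(self, actual):
        # Score both sub-predictors against the actual return value...
        if actual == self.last:
            self.correct["last"] += 1
        if actual == self.last + self.stride:
            self.correct["stride"] += 1
        # ...then update their state.
        self.stride = actual - self.last
        self.last = actual

p = HybridReturnValuePredictor()
for v in [3, 6, 9, 12]:          # a strided return sequence
    guess = p.predict()
    p.update(v)
print(p.predict())               # 15: the stride predictor has taken over
```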
The second area for optimization was the nesting model. Our
initial system only allowed for out-of-order nesting, wherein a
single parent thread creates multiple child threads. Enabling
support for in-order nesting exposes significantly more
parallelization opportunities, because now speculative child
threads can create their own children that are even more
speculative. This required developing a memory manager for child
threads based on recycling aggregate data structures. We present
an operational semantics for our nesting model derived from our
implementation.
Finally, we use this semantics to address the third area for
optimization, namely a need for better fork heuristics. Initial
heuristics based on online profiling made it difficult to identify
the best places to create threads due to complex feedback
interactions between speculation decisions at independent
speculation points. This problem grew exponentially worse with
the support for in-order nesting. Instead, we chose to clarify
the effect of program structure on runtime parallelism. We did
this by systematically exploring the interaction between
speculation and a variety of coding idioms. The patterns we
identify are intended to guide both manual parallelization and
static compilation efforts.
Taming MATLAB
Author: Anton Dubrau
Date: April 2012
Abstract
MATLAB is a dynamic scientific language used by scientists, engineers and students
worldwide. Although MATLAB is very suitable for rapid prototyping and development,
MATLAB users often want to convert their final MATLAB programs to a static language
such as FORTRAN, to integrate them into already existing programs of that language,
to leverage the performance of powerful static compilers, or to ease the distribution of
executables.
This thesis presents an extensible object-oriented toolkit to help facilitate the generation
of static programs from dynamic MATLAB programs. Our open source toolkit, called the
MATLAB Tamer, targets a large subset of MATLAB. Given information about the entry
point of the program, the MATLAB Tamer builds a complete callgraph, transforms every
function into a reduced intermediate representation, and provides typing information to aid
the generation of static code.
In order to provide this functionality, we need to handle a large number of MATLAB
builtin functions. Part of the Tamer framework is the builtin framework, an extensible
toolkit which provides a principled approach to handle a large number of builtin functions.
To build the callgraph, we provide an interprocedural analysis framework, which
can be used to implement full-program analyses. Using this interprocedural framework, we
have developed value analysis, an extensible interprocedural analysis to estimate MATLAB
types, which helps discover the call edges needed to build the call graph.
In order to make the static analyses possible at all, we disallow a small number of MATLAB
constructs and features, but attempt to support as large a subset of MATLAB as possible.
Thus, by both slightly restricting MATLAB, and by providing a framework with powerful
analyses and simplifying transformations, we can “Tame MATLAB”.
IMPLEMENTATION AND OPTIMIZATION OF THREAD-LOCAL
VARIABLES FOR A RACE-FREE JAVA DIALECT
Author: Yi Zhang
Date: December 2011
Abstract
Despite the popularity of Java, problems may arise from potential
data-race conditions during execution of a Java program. Data-races
are considered errors in concurrent programming languages and greatly
complicate both programming and runtime optimization efforts. A
race-free version of Java is therefore desirable as a way of avoiding
this complexity and simplifying the programming model.
This thesis is part of work trying to build a race-free version of
Java. It implements and optimizes thread-local accesses and comes up
with a new semantics for this language. An important part of
implementing a language without races is to distinguish thread-local
data from shared data because these two groups of data need to be
treated differently. This is complex in Java because in the current
Java semantics all objects are allocated on a single heap and
implicitly shared by multiple threads. Furthermore, while Java does
provide a mechanism for thread-local storage, it is awkward to use and
inefficient.
Many of the new concurrent programming languages, such as OpenMP, UPC,
and D, use “sharing directives” to distinguish shared data from
thread-local data, and have features that make heavy use of
thread-local data. Our goal here is to apply some of these language
ideas to a Java context in order to provide a simpler and less
error-prone programming model. When porting such features as part of a
language extension to Java, however, performance can suffer due to the
simple, map-based implementation of Java's built-in ThreadLocal class.
We implement an optimized mechanism based on programmer annotations
that can efficiently ensure class and instance variables are only
accessed by their owner thread. Both class and instance variables
inherit values from the parent thread through deep copying, allowing
all the reachable objects of child threads to have local copies if
syntactically specified. In particular, class variable access involves
direct access to thread-local variables through a localized heap,
which is faster and easier than the default map mechanism defined for
ThreadLocal objects. Our design improves performance significantly
over the traditional thread-local access method for class variables
and provides a simplified and more appealing syntax for doing so. We
further evaluate our approach by modifying non-trivial, existing
benchmarks to make better use of thread-local features, illustrating
feasibility and allowing us to measure the performance in realistic
contexts. This work is intended to bring us closer to designs for a
complete race-free version of Java, as well as show how improved
support for use of thread-local data could be implemented in other
languages.
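A rough analogue of the deep-copy inheritance scheme, written in Python only for illustration (the thesis targets Java class and instance variables, and the helper below is hypothetical):

```python
# Python analogue only: each child thread receives a deep copy of the parent's
# data in thread-local storage, so later accesses touch owner-thread data and
# cannot race with other threads.
import copy
import threading

_tls = threading.local()

def spawn_with_local(parent_data, work):
    def run():
        _tls.data = copy.deepcopy(parent_data)   # inherit by deep copy
        work(_tls.data)                          # mutations stay thread-local
    t = threading.Thread(target=run)
    t.start()
    return t

shared_template = {"counters": [0, 0, 0]}
threads = [spawn_with_local(shared_template,
                            lambda d: d["counters"].__setitem__(0, 1))
           for _ in range(4)]
for t in threads:
    t.join()
print(shared_template["counters"][0])   # still 0: children worked on copies
```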
MCSAF: AN EXTENSIBLE STATIC ANALYSIS FRAMEWORK FOR THE MATLAB LANGUAGE
Author: Jesse Doherty
Date: August 2011
Abstract
MATLAB® is a popular language for scientific and numerical programming.
Despite its popularity, there are few active projects providing open tools
for MATLAB related compiler research. This thesis provides the McLAB
Static Analysis Framework, McSAF, the goal of which is to simplify the
development of new compiler tools for MATLAB.
The McLAB project was started in order to develop such tools in the hopes
of attracting further research. The goal of the project is to provide an
extensible compiler toolkit for MATLAB and scientific programming. It is
intended to explore the compilation challenges unique to MATLAB and to
explore new language features that could help scientific programmers
be more productive. One piece of functionality that is particularly
important for compiler research is the ability to perform static analysis.
Without the information provided by static analyses, program transformations,
optimizations, and automated programmer feedback would not be possible.
In order to make the development of static analyses simpler, this thesis
contributes a framework for creating static analyses for the MATLAB language.
This framework is intended to make writing analyses easier by providing core
functionality and API for developing such analyses. It also aims to make
analysis development easier by providing an intermediate representation called
McLAST, which provides simpler syntax and explicitly exposes some of MATLAB’s
semantics. In order to give analysis writers a head start, some example analyses
are provided. These include simple analyses intended to demonstrate the use of
the framework, and some more complicated analyses that provide basic semantic
information about MATLAB programs.
In addition to the framework making development of analyses simpler, McSAF
is also designed to be extended to new language features. Not only can the
framework be extended, but existing analyses can also be extended. This allows
work that was previously done for analyzing MATLAB code to be applied to future
language extensions.
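The core service such a framework provides is a fixed-point traversal over an intermediate representation. The following is a generic, language-neutral sketch of that idea in Python; it is not McSAF's actual API, and the tiny example program is hypothetical.

```python
# Generic forward dataflow fixed point (may-union join), illustrating the kind
# of machinery a static-analysis framework supplies to analysis writers.
def forward_analysis(blocks, succs, transfer, init):
    out = {b: set(init) for b in blocks}
    changed = True
    while changed:
        changed = False
        for b in blocks:
            in_b = set(init)
            for p in blocks:                 # join over predecessors
                if b in succs.get(p, ()):
                    in_b |= out[p]
            new_out = transfer(b, in_b)
            if new_out != out[b]:
                out[b] = new_out
                changed = True
    return out

# Tiny example: which variables may have been assigned by each block's exit.
defs = {"entry": {"x"}, "then": {"y"}, "else": {"y", "z"}, "exit": set()}
succs = {"entry": ["then", "else"], "then": ["exit"], "else": ["exit"]}
result = forward_analysis(["entry", "then", "else", "exit"], succs,
                          lambda b, in_b: in_b | defs[b], set())
print(sorted(result["exit"]))   # ['x', 'y', 'z'] under the may-union join
```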
McFLAT : A Profile-based Framework for MATLAB Loop Analysis and Transformations
Author: Amina Aslam
Date: August 2010
Abstract
Parallelization and optimization of the MATLAB™ programming language
presents several challenges due to the dynamic nature of MATLAB. Since
MATLAB does not have static type declarations, neither the shape and size
of arrays, nor the loop bounds are known at compile-time. This means that
many standard array dependence tests and associated transformations cannot
be applied straight-forwardly. On the other hand, many MATLAB programs operate
on arrays using loops and thus are ideal candidates for loop transformations
and possibly loop vectorization/parallelization.
This thesis presents a new framework, McFLAT, which uses profile-based
training runs to determine likely loop-bounds ranges for which specialized
versions of the loops may be generated. The main idea is to collect
information about observed loop bounds and hot loops using training data which
is then used to heuristically decide upon which loops and which ranges are
worth specializing using a variety of loop transformations.
Our McFLAT framework has been implemented as part of the McLAB extensible
compiler toolkit. Currently, McFLAT is used to automatically transform
ordinary MATLAB code into specialized MATLAB code with transformations
applied to it. This specialized code can be executed on any MATLAB system,
and we report results for four execution engines: MathWorks’ proprietary
MATLAB system, the GNU Octave open-source interpreter, McLAB’s McVM
interpreter, and the McVM JIT. For several benchmarks, we observed significant
speedups for the specialized versions, and noted that loop transformations
had different impacts depending on the loop range and execution engine.
This thesis reports on the design and implementation of McFLAT, a framework
that is designed to study the effect of various loop transformations on
different loop-bound ranges by introducing loop-level specializations in
MATLAB programs.
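The specialization idea can be pictured as guarding several versions of a loop by the ranges observed during training runs. The sketch below is illustrative Python only, not McFLAT output, and the threshold and kernel are hypothetical.

```python
# Illustrative only: dispatch to a loop version specialized for the loop-bound
# ranges observed during profiling/training runs.
def saxpy_small(a, x, y):
    # Specialized for short vectors: plain loop, no extra setup.
    return [a * xi + yi for xi, yi in zip(x, y)]

def saxpy_large(a, x, y):
    # Specialized for long vectors: a version a compiler might unroll or
    # parallelize; here a simple 4-way "unrolled" body.
    out = [0.0] * len(x)
    for i in range(0, len(x) - 3, 4):
        out[i]     = a * x[i]     + y[i]
        out[i + 1] = a * x[i + 1] + y[i + 1]
        out[i + 2] = a * x[i + 2] + y[i + 2]
        out[i + 3] = a * x[i + 3] + y[i + 3]
    for i in range(len(x) - len(x) % 4, len(x)):   # remainder iterations
        out[i] = a * x[i] + y[i]
    return out

def saxpy(a, x, y, threshold=10_000):
    # Guard chosen from the loop bounds observed in training data.
    return saxpy_small(a, x, y) if len(x) < threshold else saxpy_large(a, x, y)
```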
AspectMatlab: An Aspect-Oriented Scientific Programming Language
Author: Toheed Aslam
Date: February 2010
Abstract
There has been relatively little work done in the compiler research community
for incorporating aspect-oriented features in scientific and dynamic
programming languages. MATLAB is a dynamic scientific programming
language that is commonly used by scientists because of its convenient and
high-level syntax for arrays, the fact that type declarations are not required,
and the availability of a rich set of application libraries. This thesis
introduces a new aspect-oriented scientific language, AspectMatlab.
AspectMatlab introduces key aspect-oriented features in a way that is
both accessible to scientists and where the aspect-oriented features
concentrate on array accesses and loops, the core computation elements
in scientific programs. One of the main contributions of this thesis is to
provide a compiler implementation of the AspectMatlab language. It is
supported by a collection of scientific use cases, which demonstrate the
potential of aspect-orientation for scientific problems.
Introducing aspects into a dynamic language such as MATLAB also
provides some new challenges. In particular, it is difficult to
statically determine precisely where patterns match, resulting in many
dynamic checks in the woven code. The AspectMatlab compiler uses flow
analyses to eliminate many of those dynamic checks.
This thesis reports on the language design of AspectMatlab, the amc
compiler implementation, and also provides an overview of the use cases that
are specific to scientific programming. By
providing clear extensions to an already popular language, AspectMatlab
will make aspect-oriented programming accessible to a new group of
programmers including scientists and engineers.
McVM: an Optimizing Virtual Machine for the MATLAB Programming Language
Author: Maxime Chevalier-Boisvert
Date: August 2009
Abstract
In recent years, there has been an increase in the popularity of dynamic languages such as Python, Ruby, PHP, JavaScript and MATLAB. Programmers appreciate the productivity
gains and ease of use associated with such languages. However, most of them still run in virtual machines which provide no Just-In-Time (JIT) compilation support, and thus perform
relatively poorly when compared to their statically compiled counterparts. While the reference MATLAB implementation does include a built-in compiler, this implementation
is not open sourced and little is known about its internal workings. The McVM project has focused on the design and implementation of an optimizing virtual machine for a subset of
the MATLAB programming language.
Virtual machines and JIT compilers can benefit from advantages that static compilers do not have. It is possible for virtual machines to make use of more dynamic information
than static compilers have access to, and thus, to implement optimization strategies that are more adapted to dynamic languages. Through the McVM project, some possible avenues to
significantly improve the performance of dynamic languages have been explored. Namely, a just-in-time type-based program specialization scheme has been implemented in order to
take advantage of dynamically available type information.
One of the main contributions of this project is to provide an alternative implementation
of the MATLAB programming language. There is already an open source MATLAB interpreter (GNU Octave), but our implementation also includes an optimizing JIT compiler
and will be open sourced under the BSD license. McVM aims to become a viable implementation for end-users, but could also see use in the compiler research community as a
testbed for dynamic language optimizations. In addition to the contribution of the McVM framework itself, we also contribute the design and implementation of a novel just-in-time
type-based program specialization system aimed at dynamic languages.
The novel specialization system implemented in McVM shows much promise in terms of
potential speed improvements, yielding performance up to three orders of magnitude
faster than competing implementations such as GNU Octave. It is also easily adaptable to
other dynamic programming languages such as Python, Ruby and JavaScript. The investigation
of performance issues we carry out in this thesis also suggests research directions
for the design of future dynamic language compilers.
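The essence of just-in-time type-based specialization is to create and cache one version of a function per observed tuple of argument types. The toy Python sketch below shows only that caching structure; it is not McVM's actual mechanism, and all names are hypothetical.

```python
# Toy sketch of type-based specialization: one version per argument-type tuple.
_specialized_cache = {}

def specialize_for_types(generic_fn, arg_types):
    key = (generic_fn.__name__, arg_types)
    if key not in _specialized_cache:
        # A real JIT would emit type-specialized machine code here; we only
        # record the decision to show the caching structure.
        def specialized(*args):
            return generic_fn(*args)
        specialized.arg_types = arg_types
        _specialized_cache[key] = specialized
    return _specialized_cache[key]

def call_specialized(generic_fn, *args):
    version = specialize_for_types(generic_fn, tuple(type(a) for a in args))
    return version(*args)

def add(a, b):
    return a + b

call_specialized(add, 1, 2)        # creates the (int, int) version
call_specialized(add, 1.0, 2.0)    # creates the (float, float) version
call_specialized(add, 3, 4)        # reuses the (int, int) version
print(len(_specialized_cache))     # 2 specialized versions cached
```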
Verifying finite-state properties of large-scale programs
Author: Eric Bodden
Date: June 2009
Abstract
Designers of software components can use finite-state properties to denote behavioral interface specifications which enforce client-side programming rules that state how the components ought to be used. This allows users of these components to check their client code for compliance with these rules, both statically and at runtime.
In this dissertation we explain the design and implementation of Clara, a framework for specifying and verifying finite-state properties of large-scale programs. With Clara, programmers specify finite-state properties together with runtime monitors, using a syntactic extension to the aspect-oriented programming language AspectJ. Clara then uses a sequence of three increasingly detailed static analyses to determine if the program satisfies the finite-state properties, i.e., is free of property violations.
Clara produces a list of program points at which the program may violate the properties, ranked by a confidence value. If violations are possible, Clara also instruments the program with the supplied runtime monitor, which will capture property violations when the program executes. Due to its static analyses, Clara can omit the instrumentation at program locations which the analyses proved safe, and so optimize the instrumented program. When much instrumentation remains, Clara partitions the instrumentation into subsets, so that one can distribute multiple partially instrumented program versions that each run with a low overhead.
We validated the approach by applying Clara to finite-state properties denoted in multiple formalisms over several large-scale Java programs. Clara proved that most of the programs fulfill our example properties. For most other programs, Clara could remove the monitoring overhead to below 10%. We also found multiple property violations by manually inspecting the top entries in Clara's ranked result list.
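A typical finite-state property of the kind Clara checks is the iterator rule "call hasNext() before every next()". The toy Python monitor below shows only the runtime-monitor half of the picture; Clara's monitors are written in an AspectJ extension and woven into Java programs.

```python
class HasNextMonitor:
    """FSM for the property: every next() must be preceded by hasNext()."""
    def __init__(self):
        self.safe = False

    def on_has_next(self):
        self.safe = True

    def on_next(self):
        if not self.safe:
            raise AssertionError("violation: next() without a preceding hasNext()")
        self.safe = False              # consume the license

m = HasNextMonitor()
m.on_has_next(); m.on_next()           # ok
try:
    m.on_next()                        # second next() without hasNext()
except AssertionError as e:
    print(e)
```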
McFOR: A MATLAB to FORTRAN 95 Compiler
Author: Jun Li
Date: August 2009
Abstract
The high-level array programming language MATLAB is widely used for prototyping
algorithms and applications of scientific computations. However, its dynamically-typed
nature, which means that MATLAB programs are usually executed via an
interpreter, leads to poor performance. An alternative approach would be converting
MATLAB programs to equivalent Fortran 95 programs. The resulting programs could
be compiled using existing high-performance Fortran compilers and thus could provide
better performance. This thesis presents techniques that are developed for our
MATLAB-to-Fortran compiler, McFor, for extracting information from the high-level
semantics of MATLAB programs to produce efficient and reusable Fortran code.
The McFor compiler includes new type inference techniques for inferring intrinsic
type and shape of variables and uses a value-propagation analysis to precisely estimate
the sizes of arrays and to eliminate unnecessary array bounds checks and dynamic
reallocations. In addition to the techniques for reducing execution overhead, McFor
also aims to produce programmer-friendly Fortran code. By utilizing Fortran 95 features,
the compiler generates Fortran code that keeps the original program structure
and preserves the same function declarations.
We implemented the McFor system and experimented with a set of benchmarks
with different kinds of computations. The results show that the compiled Fortran
programs perform better than corresponding MATLAB executions, with speedups
ranging from 1.16 to 102, depending on the characteristics of the program.
The Metalexer Lexer Specification Language
Author: Andrew Michael Casey
Date: June 2009
Abstract
Compiler toolkits make it possible to rapidly develop compilers and translators for new
programming languages. Recently, toolkit writers have focused on supporting extensible
languages and systems that mix the syntaxes of multiple programming languages. However,
this work has not been extended down to the lexical analysis level. As a result, users of
these toolkits have to rely on ad-hoc solutions when they extend or mix syntaxes. This thesis
presents MetaLexer, a new lexical specification language that remedies this deficiency.
MetaLexer has three key features: it abstracts lexical state transitions out of semantic actions,
makes modules extensible by introducing multiple inheritance, and provides cross-platform support
for a variety of programming languages and compiler front-end toolchains.
In addition to designing this new language, we have constructed a number of practical
tools. The most important are a pair of translators that map MetaLexer to the popular JFlex
lexical specification language and vice versa.
We have exercised MetaLexer by using it to create lexers for three real programming languages:
AspectJ (and two extensions), a large subset of Matlab, and MetaLexer itself. The
new specifications are easier to read and require much less action code than the originals.
Optimizing Software-Hardware Interplay in Efficient Virtual Machines
Author: Gregory B. Prokopski
Date: February 2009
Abstract
To achieve the best performance, most computer languages are compiled, either
ahead of time and statically, or dynamically during runtime by means of a
Just-in-Time (JIT) compiler. Optimizing compilers are complex, however, and for
many languages such as Ruby, Python, PHP, etc., an interpreter-based Virtual Machine
(VM) offers a more flexible and portable implementation method, and moreover
represents an acceptable trade-off between runtime performance and development
costs. VM performance is typically maximized by use of the basic direct-threading
interpretation technique which, unfortunately, interacts poorly with modern
branch predictors. More advanced techniques, like code-copying, have been proposed
[RS96, PR98, EG03c, Gag02] but have remained practically infeasible due to important
safety concerns. On this basis we developed two cost-efficient, well-performing
solutions.
First, we designed a C/C++ language extension that allows programmers to express
the need for the special safety guarantees of code-copying. Our low-maintenance
approach is designed as an extension to a highly-optimizing, industry-standard GNU
C Compiler (GCC), and we apply it to Java, OCaml, and Ruby interpreters. We
tested the performance of these VMs on several architectures, gathering extensive
analysis data for both software and hardware performance. Significant improvement
is possible, 2.81 times average speedup for OCaml and 1.44 for Java on Intel 32-bit,
but varies by VM and platform. We provide detailed analysis and design guidelines
for helping developers predict and evaluate the benefit provided by safe code-copying.
In our second approach we focused on alleviating the limited scope of optimizations
in code-copying with an ahead-of-time-based (AOT) approach. A source-based
approach to grouping bytecode instructions together allows for more extensive cross-bytecode
optimizations, and so we develop a caching compilation server that specializes
interpreter source code to a given input application. Although this requires AOT
profiling, it further improves performance over code-copying, 27% average for OCaml
and 4-5% on selected Java benchmarks.
This thesis work provides designs for low maintenance, high efficiency VMs, and
also demonstrates the large performance improvement potentially enabled by tailoring
language implementation to modern hardware. Our designs are both based on
understanding the impact of lower-level components on VM behavior. By optimizing
the software-hardware interplay we thus show significant speed-up is possible with
very minimal VM and compiler development costs.
Aspect Impact Analysis
Author: Dehua Zhang
Date: August 2008
Abstract
One of the major challenges in aspect-oriented programming
is that aspects may have unintended impacts on a base program. Thus, it
is important to develop techniques and
tools that can both summarize the impacts and provide information about
the causes of the impacts. This thesis presents impact analyses for
AspectJ.
Our approach detects different ways advice and inter-type
declarations interact and interfere with the base program and focuses
on four kinds of impacts: state impacts, which
cause changes of state in the base program; computation impacts, which
cause changes in functionality by adding, removing or replacing
computations of the base program; shadowing
impacts, which cause changes of field reference in the base
program; and lookup impacts, which cause changes of method lookup in the
base program.
We provide a classification scheme for these kinds of
impacts and then develop a set of static analyses to estimate these
impacts. A key feature of our approach is the use of
points-to analysis to provide more accurate estimates. Further, our
analysis results allow us to trace back to find the causes of the
impacts.
We have implemented our techniques in the AspectBench
compiler. By implementing them in an AspectJ compiler, all kinds of
pointcuts, advice and inter-type declarations can
be analyzed. We also have integrated these analyses into an AspectJ IDE
and provided a two-way navigation between impacts and program source
code. In addition, we have
carried out experiments on example programs and benchmarks to
investigate the results of our analyses.
Static Lock Allocation
Author: Richard L. Halpert
Date: April 2008
Abstract
The allocation of lock objects to critical sections in
concurrent
programs affects both performance and correctness. Traditionally,
this allocation is done manually by the programmer. Recent work
explores automatic lock allocation, aiming primarily to minimize
conflicts and maximize parallelism by allocating locks to individual
critical sections. We investigate several modes of lock allocation,
using connected components (groups) of interfering critical sections
on a critical-section interference graph as the basis for allocation
decisions. Our allocator uses thread-based side effect analysis which
is built from several pluggable component analyses. It benefits from
precise points-to and may happen in parallel information. Thread-local
object information provides a small improvement over points-to
analysis alone. Our framework minimizes the restrictions on input
programs, dealing gracefully with nesting and deadlock, and requiring
only simple annotations identifying all critical sections. Legacy
programs using synchronized regions can be processed without
alteration. Dynamic locks do not broadly improve upon identical
allocations of static locks, but allocating several dynamic locks in
place of a single static lock can significantly increase parallelism.
We experiment with a range of small and large Java benchmarks on 1 to
8 processors, and find that singleton locking is sufficient for four
of our benchmarks, and that static locking with Spark points-to
analysis is sufficient for another two. Of the other five benchmarks,
two require the use of all phases of our analysis, one depends on
using the lockset transformation, and two benchmarks proved too
complex to be automatically transformed to satisfactory performance.
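The group-based allocation mode can be sketched as computing connected components of the interference graph and handing each component one lock. The Python below is illustrative only, not the thesis's Soot-based implementation, and the section names are hypothetical.

```python
# Illustrative sketch: one lock per connected component (group) of the
# critical-section interference graph.
import threading
from collections import defaultdict

def allocate_locks(sections, interferes):
    """sections: iterable of critical-section ids.
    interferes: set of frozensets {a, b} meaning a and b may conflict."""
    adj = defaultdict(set)
    for pair in interferes:
        a, b = tuple(pair)
        adj[a].add(b)
        adj[b].add(a)
    lock_of, seen = {}, set()
    for s in sections:
        if s in seen:
            continue
        group_lock = threading.Lock()          # one lock per component
        stack = [s]
        while stack:                           # flood-fill the component
            n = stack.pop()
            if n in seen:
                continue
            seen.add(n)
            lock_of[n] = group_lock
            stack.extend(adj[n])
        # independent components get distinct locks, so they can run in parallel
    return lock_of

locks = allocate_locks(["cs1", "cs2", "cs3"], {frozenset({"cs1", "cs2"})})
print(locks["cs1"] is locks["cs2"], locks["cs1"] is locks["cs3"])  # True False
```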
Java bytecode obfuscation
Author: Michael R. Batchelder
Date: January 2007
Abstract
Programs written for machine execution will always be
susceptible to information theft. This information can include
trademarked algorithms, data embedded in the program, or even data the
program accesses. As technology advances, computer scientists are
building more and more powerful tools for reverse-engineering, such as
the decompiler.
The Java programming language is particularly open to
reverse-engineering attacks because of its well-defined, open, and
portable binary format. We examine one area of better securing the
intellectual property of a Java program: obfuscation. Obfuscation of a
program involves transforming the code of the program into a more
complex but semantically equivalent representation. This can include
the addition of confusing control flow, the removal of certain
information embedded in the program which is not explicitly required
for execution, or the cloaking of data.
Obfuscation is one of the only techniques available other
than cryptological options. While many approaches to obfuscation are
ultimately reversible, it nevertheless seriously hinders those
attempting to steal information by increasing the computing time and
power required by software to reverse-engineer the program and also
severely increases the complexity of any source code that is
recovered by the reverse-engineering.
In this thesis we present a number of obfuscating
transformations implemented within an automatic tool we name the Java
Bytecode Obfuscator (JBCO). We present empirical measures of the
performance costs of these transformations in terms of execution speed
and program size. Complexity measurements that gauge the effectiveness
of the obfuscations are also given. Finally, we review the feasibility
of each transformation by looking at source code generated from
obfuscated bytecode by various decompilers.
Programmer-friendly decompiled Java
Author: Nomair A. Naeem
Date: January 2007
Abstract
Java decompilers convert Java class files to Java source.
Common Java decompilers are javac-specific decompilers since they
target bytecode produced from a particular javac compiler. We present
work carried out on Dava, a tool-independent decompiler that
decompiles bytecode produced from any compiler. A known deficiency of
tool-independent decompilers is the generation of complicated
decompiled Java source which does not resemble the original source as
closely as output produced by javac-specific decompilers. This thesis
tackles this shortcoming, for Dava, by introducing a new back-end
consisting of simplifying transformations.
The work presented can be broken into three major
categories: transformations using tree traversals and pattern matching
to simplify the control flow, the creation of a flow analysis framework
for an Abstract Syntax Tree (AST) representation of Java source code
and the implementation of flow analyses with their use in complicated
transformations.
The pattern matching transformations rewrite the ASTs to
semantically-equivalent ASTs that correspond to code that is easier for
programmers to understand. The targeted Java constructs include If
and If-Else aggregation, for-loop creation and the removal of abrupt
control flow. Pattern matching using tree traversals has its
limitations. Thus, we introduce a new structure-based data flow
analysis framework that can be used to gather information required by
more complex transformations. Popular compiler analyses, e.g., reaching
definitions and constant propagation, were implemented using the
framework. Information from these analyses is then leveraged to perform
more advanced AST transformations.
We performed experiments comparing different decompiler
outputs for different sources of bytecode. The results from these
experiments indicate that the new Dava back-end considerably improves
code comprehensibility and readability.
Information flow in a Java intermediate language
Author: Ahmer Ahmedani
Date: August 2006
Abstract
It is a common practice to retrieve code from an outside
source, execute it, and return the result to the user. During execution,
secure data can enter the program through user input or access to a data
resource. It is important to track the secure data once it enters the
program to identify possible information flows to unwanted regions of
the code which would permit undesirable data output to a user. Most
approaches to restrict information flow in programs have fallen short
of providing a practical solution for mainstream programming languages.
To address this issue, this thesis presents two
context-sensitive inter-procedural analyses which analyze an
intermediate representation of Java Bytecode for secure information
flow. The first analysis assumes that there is only one instance of all
class fields, whereas the second analysis uses points-to information to
differentiate between instance fields which belong to different
instances of the same class. The analyses track secure information in
the program by maintaining sets of secure data. The analyses resolve
dynamic method resolution in Java statically by analyzing all possible
methods which may be invoked at a call site and merging the secure data
sets. We were able to define rules to analyze all the statements in the
intermediate representation and also accounted for Java libraries. The
analyses do not expect any security annotations in the program.
Type information is useful in debugging, guiding
optimizations, and specifying and providing safety proofs for programs.
A type system for a subset of the Java Bytecode intermediate
representation is also formulated in this thesis. An operational
semantics is specified and a type preservation proof assures the
soundness of the type system.
Non-trivial benchmarks were analyzed and explicit and
implicit information flows were counted for both analyses. The
empirical data collected suggests that secure data is used in many
statements of programs, and that output of data to a user at several places
in a program can leak data.
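At its core, the analyses propagate sets of secure values through statements and flag flows into output statements. The following is a toy straight-line version of that propagation, not the thesis's Jimple-level, context-sensitive analysis; the program and variable names are hypothetical.

```python
# Toy propagation of a "secure" set through assignments, flagging flows into
# output sinks.
def track_flows(statements, initially_secure):
    secure = set(initially_secure)
    leaks = []
    for target, sources, is_output in statements:
        if is_output and secure & set(sources):
            leaks.append((target, sorted(secure & set(sources))))
        elif secure & set(sources):
            secure.add(target)      # taint propagates through the assignment
    return secure, leaks

program = [
    ("x", ["password"], False),     # x = f(password)
    ("y", ["x", "k"],   False),     # y = x + k
    ("out", ["y"],      True),      # print(y)  <- output sink
]
secure, leaks = track_flows(program, {"password"})
print(sorted(secure), leaks)        # ['password', 'x', 'y'] [('out', ['y'])]
```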
Dynamic data structure analysis and visualization of
Java programs
Author: Sokhom Pheng
Date: May 2006
Abstract
For many years, programmers have faced the problem of
reading and trying to understand other programmers' code, either to
maintain it or to learn from it. Analysis of dynamic data structure
usage is useful for both program understanding and for improving the
accuracy of other program analyses.
Data structure usage has been the target of various static
techniques. Static approaches, however, may suffer from reduced
accuracy in complex situations and have the potential to be
overly-conservative in their approximation. An accurate, clean picture
of runtime heap activity is difficult to achieve.
We have designed and implemented a dynamic heap analysis
system that allows one to examine and analyze how Java programs build
and modify data structures. Using a complete execution trace from a
profiled run of the program, we build an internal representation that
mirrors the evolving runtime data structures. The resulting series of
representations can then be analyzed and visualized. This gives us an
accurate representation of the data structures created and an insight
into the program’s behaviour. Furthermore we show how to use our
approach to help understand how programs use data structures, the
precise effect of garbage collection, and to establish limits on static
data structure analysis.
A deep understanding of dynamic data structures is
particularly important for modern, object-oriented languages that make
extensive use of heap-based data structures. These analysis results can
be useful for an important group of applications such as
parallelization, garbage collection optimization, program understanding
or improvements to other optimization.
Shimple: An Investigation of Static Single Assignment
Form
Author: Navindra Umanee
Date: February 2006
Abstract
In designing compiler analyses and optimisations, the choice
of intermediate representation (IR) is a crucial one. Static Single
Assignment (SSA) form in particular is an IR with interesting
properties, enabling certain analyses and optimisations to be more
easily implemented while also being more powerful. Our goal has been to
research and implement various SSA-based IRs for the Soot compiler
framework for Java.
We present three new IRs based on our Shimple framework for
Soot. Simple Shimple is an implementation of the basic SSA form. We
explore extensions of SSA form (Extended SSA form and Static Single
Information form) and unify them in our implementation of Extended
Shimple. Thirdly, we explore the possibility of applying the concepts
of SSA form to array element analysis in the context of Java and
present Array Shimple.
We have designed Shimple to be extensible and reusable,
facilitating further research into variants and improvements of SSA
form. We have also implemented several analyses and optimisations to
demonstrate the utility of Shimple.
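For readers unfamiliar with SSA form, the transformation can be shown in miniature: every variable gets exactly one assignment, and a phi node merges values at control-flow joins. Runnable Python is used here only to make the renaming concrete; Shimple applies this to Soot's Jimple IR, and the functions below are hypothetical.

```python
# SSA in miniature (concept only, not Shimple syntax).
def before(flag):
    x = 1
    if flag:
        x = 2
    return x + 1

def after_ssa(flag):
    x0 = 1
    if flag:
        x1 = 2
        x2 = x1            # phi(x1, x0) resolved along the "then" edge
    else:
        x2 = x0            # phi(x1, x0) resolved along the "else" edge
    return x2 + 1

assert before(True) == after_ssa(True) and before(False) == after_ssa(False)
```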
Visualization Tools for Optimizing Compilers
Author: Jennifer Lhoták
Date: February 2006
Abstract
Optimizing
compilers have traditionally had little support for visual tools which
display the vast amount of information generated and which could aid in
the development of analyses, in teaching, and in providing extra information
to general programmers. This thesis presents a set of visualization
tools which integrate visualization support for Soot, an optimizing
compiler framework, into Eclipse, a popular, extensible IDE.
In
particular, this work explores making the research compiler framework
more accessible to new users and general programmers. Tools for
displaying data flow analysis results in intermediate representations (IRs)
and in the original source code are discussed, with consideration for
the issue of the mapping of information between the low-level IRs
and the source using flexible and extensible mechanisms. Also described
are tools for interactive control flow graphs which can be used for
research and teaching and tools for displaying large graphs, such as
call graphs, in a manageable way.
Additionally,
the area of communicating information generated by
optimizing compilers to general programmers is explored with a small
case study to determine if analyses could be useful to general
programmers and how the information is displayed.
This
work is shown to be useful for research to find imprecision or errors
in analyses, both from visualizing the intermediate results with the
interactive control flow graphs and the final results at the IR
and source code levels, and for students learning about compiler
optimizations and writing their first data flow analyses.
Program Analysis using Binary Decision Diagrams
Author: Ondřej Lhoták
Date: January 2006
Abstract
A fundamental problem in interprocedural program analyses is
the need to represent and manipulate collections of large sets. BDDs
are a data structure widely used in model checking to compactly encode
large state sets. In this dissertation, we develop new techniques and
frameworks for applying BDDs to program analysis, and use our BDD-based
analyses to gain new insight into factors influencing analysis
precision.
To make it feasible to express complicated, interrelated
analyses using BDDs, we first present the design and implementation of
Jedd, a Java language extension which adds relations implemented with
BDDs as a datatype, and makes it possible to express BDD-based
algorithms at a higher level than existing BDD libraries.
Using Jedd, we develop Paddle, a framework of
context-sensitive points-to and call graph analyses for Java, as well
as client analyses that make use of their results. Paddle supports
several variations of context-sensitive analyses, including the use of
call site strings and abstract receiver object strings as abstractions
of context.
We use the Paddle framework to perform an in-depth empirical
study of the effect of context-sensitivity variations on the precision
of interprocedural program analyses. The use of BDDs enables us to
compare context-sensitive analyses on much larger, more realistic
benchmarks than has been possible with traditional analysis
implementations.
Finally, based on the call graph computed by Paddle, we
implement, using Jedd, a novel static analysis of the cflow construct
in the aspect-oriented language AspectJ. Thanks to the Jedd high-level
representation, the implementation of the analysis closely mirrors its
specification.
Measuring and improving the runtime behaviour of
AspectJ programs
Author:
Christopher Goard
Date: August 2005
Abstract
AspectJ
is a popular aspect-oriented extension to Java, providing powerful new
features for the modularizing of crosscutting concerns, promising
improved code quality. The runtime cost of these features, however, is
currently not well understood, and is a concern limiting even more
wide-spread adoption of the language. The crosscutting nature of
AspectJ complicates the measurement of these costs.
This
thesis presents a methodology for analyzing the runtime behaviour of
AspectJ programs, with a particular emphasis on identifying runtime
overheads resulting from the implementation of AspectJ features. It
presents a taxonomy of overhead kinds and defines some new
AspectJ-specific dynamic metrics. A toolset for measuring these metrics
is described, covering both of the current AspectJ compilers, ajc and
abc, and results for a newly collected set of AspectJ benchmarks are
presented.
Significant
overheads are found in some cases, suggesting improvements to the code
generation strategy of the AspectJ compilers. Initial implementations
of some improvements are presented, resulting, for some benchmarks, in
order of magnitude improvements to execution time. These improvements
have since been integrated in abc and ajc.
Clearly
understanding the runtime behaviour of AspectJ programs should result
in both better implementations of the language and more confident
adoption by the mainstream.
Runtime Techniques and Interprocedural Analysis in Java
Virtual Machines
Author:
Feng Qian
Date: May 2005
Abstract
Java programs
are deployed in a bytecode format that is executed by a Java virtual
machine (JVM). JVM performance is determined by several major
components: execution engine, garbage collector, and threading system.
The static compilation and optimization approach taken in C/C++
compilers does not fit Java's execution model very well, because Java
allows dynamic class loading, lazy resolution, just-in-time (JIT)
compilation, and garbage collection. These dynamic features present
new challenges to JVM designers.
In this thesis,
we study both the challenges and opportunities of dynamic optimizations
in Java virtual machines. Our contributions include a new garbage
collector using dynamic techniques and dynamic interprocedural program
analyses for speculative optimizations in JIT compilers.
We present a
novel approach for reducing garbage collection frequencies. Instead of
relying on an ahead-of-time escape analysis or a region-based type
system, our approach adapts regions based on the runtime history of an
application. By freeing regions with associated stack frames, the
system can reduce the frequency of garbage collections. We present the
overall idea and provide details of a specific design and
implementation.
Dynamic class
loading is a two-edged sword. A JIT compiler can speculatively optimize
methods based on loaded classes only. However, newly loaded classes may
invalidate previous optimization assumptions. We review existing
techniques supporting speculative optimizations, including runtime
guards, code patching, and on-stack replacement. We present an
improvement and implementation of an on-stack replacement mechanism.
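As a rough, Java-level sketch (not code from the thesis, and ignoring
the machine-level details of code patching and on-stack replacement),
a runtime guard protecting a speculatively inlined virtual call might
look like this; the class names are hypothetical:

    // Hypothetical sketch: only Circle has been loaded so far, so the
    // JIT speculatively inlines Circle.area() and guards the inlined
    // body with a class test. If another subclass later reaches this
    // call, the guard fails and the general virtual dispatch is used
    // (a real JIT would instead patch code or do on-stack replacement).
    abstract class Shape { abstract double area(); }

    class Circle extends Shape {
        double r;
        Circle(double r) { this.r = r; }
        double area() { return Math.PI * r * r; }
    }

    class Renderer {
        static double totalArea(Shape[] shapes) {
            double sum = 0;
            for (Shape s : shapes) {
                if (s.getClass() == Circle.class) {   // runtime guard
                    Circle c = (Circle) s;
                    sum += Math.PI * c.r * c.r;       // inlined fast path
                } else {
                    sum += s.area();                  // fallback: virtual call
                }
            }
            return sum;
        }
    }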
A call graph is
necessary for developing interprocedural program analyses. Call graph
construction in a Java virtual machine needs to overcome the
difficulties of dynamic class loading and lazy reference resolution. We
show a general approach to adapt static type analyses to dynamic
versions suitable for building call graphs in a JIT environment. We
also introduce a new call graph profiling mechanism using code stubs.
Having dynamic
call graphs, we study reachability-based interprocedural analysis. We
describe a general type analysis framework for supporting speculative
method inlining in a JIT environment. Several popular type analyses
were implemented in the framework, including an interprocedural one,
variable type analysis (VTA) (Sundaresan et al., 2000). Using the
framework, we present the results
of a limit study of method inlining and report our findings and
experience.
In each chapter
we discuss the related work for that chapter's topic. At the end of the
thesis, we point out future research directions.
Using Inter-Procedural Side-Effect Information in JIT
Optimizations
|
back |
Author: Anatole Le
Date: February 2005
Abstract
Side-effect analysis gives information about the set of
locations that a statement may read or modify. This analysis can
provide information useful in a compiler for performing aggressive
optimizations. The impact of the use of side-effect analysis in
compiler optimizations has been studied for programming languages such
as Modula-3 and C, but no thorough investigation for Java has been
done. We present a study of whether side-effect information improves
performance in Java just-in-time (JIT) compilers, and if so, what level
of analysis precision is needed. We also analyze the optimizations and
benchmarks that benefit most from side-effect analysis.
We used Spark, the inter-procedural analysis component of
the Soot Java analysis and optimization framework, to compute
side-effect information and encode it in class files. We modified Jikes
RVM, a research JIT, to make use of side-effect analysis in various
local and global analyses and optimizations such as local common
sub-expression elimination, heap SSA, redundant load elimination and
loop-invariant code motion. On the SPECjvm98 benchmarks, we measured
the static number of memory operations removed, the dynamic counts of
memory reads eliminated, and the execution time.
Our results show that the use of side-effect analysis
increases the number of static opportunities for load elimination by up
to 98%, and reduces dynamic field read instructions by up to 27%.
Side-effect information enabled speedups of up to 20% for some
benchmarks. The main cause of the speedups is the use of side-effect
information in load elimination. Finally, among the different levels of
precision of side-effect information, a simple side-effect analysis is
usually sufficient to obtain most of these speedups.
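For illustration (this example is not from the thesis), the kind of
redundant load that side-effect information allows a JIT to remove
looks like the following:

    // Illustrative sketch: without side-effect information, the JIT
    // must assume that log() might modify p.x, so it reloads the field
    // after the call. If the analysis proves log() writes no heap
    // location read here, the second load of p.x is redundant and can
    // simply reuse the earlier value.
    class Point {
        int x;
    }

    class Demo {
        static void log(int v) {
            System.out.println(v);
        }

        static int twice(Point p) {
            int a = p.x;   // first load of p.x
            log(a);        // does this call write p.x? side-effects say no
            int b = p.x;   // redundant load -- can reuse a
            return a + b;
        }
    }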
Objective Quantification of Program Behaviour Using
Dynamic Metrics
|
back |
Author: Bruno Dufour
Date: October 2004
Abstract
In order to perform meaningful experiments in optimizing
compilation and runtime system design, researchers usually rely on a
suite of benchmark programs of interest to the optimization technique
under consideration. Programs are described as numeric,
memory-intensive, concurrent, or object-oriented, based on a
qualitative appraisal, in some cases with little justification.
In order to make these intuitive notions of program
behaviour more concrete and subject to experimental validation, this
thesis introduces a methodology to objectively quantify key aspects of
program behaviour using dynamic metrics. A set of unambiguous, dynamic,
robust and architecture-independent dynamic metrics is defined, and can
be used to categorize programs according to their dynamic behaviour in
five areas: size, data structures, memory use, polymorphism and
concurrency. Each metric is also empirically validated.
A general-purpose, easily extensible dynamic analysis
framework has been designed and implemented to gather empirical metric
results. This framework consists of three major components. The
profiling agent collects execution data from a Java virtual machine.
The trace analyzer performs computations on this data, and the web
interface presents the result of the analysis in a convenient and
user-friendly way.
The utility of the approach as well as selected specific
metrics is validated by examining metric data for a number of commonly
used benchmarks. Case studies of program transformations and the
consequent effects on metric data are also considered. Results show
that the information that can be obtained from the metrics not only
corresponds well with the intuitive notions of program behaviour, but
can also reveal interesting behaviour that would have otherwise
required lengthy investigations using more traditional techniques.
Dataflow Analysis of the Pi-Calculus
|
back |
Author: Sam Sanjabi
Date: September 2004
Abstract
Static analysis is a technique used to compute information
about the runtime behaviour of a program prior to execution.
Traditionally, it has been used in the context of optimizing compilers,
but it has recently been applied to more formalized languages in order
to develop provable policies that can be used to verify the security of
networks. Best results are naturally achieved with the most precise
information flow techniques, though complex systems impose feasibility
constraints. The accuracy of results, particularly relative to the
cost of computing them, is thus an important quality.
This thesis presents a series of dataflow analyses of the
pi-calculus, an extensively studied concurrent language that has been
used to model and verify security protocols. Some of the presented
analyses are equivalent to previous work done in the field, but the
framework in which the analysis is done is new in that it immediately
suggests an iterative implementation.
Analyses are also presented that improve on existing
approaches in two ways. First, they fully treat the sequentiality of
potential actions in a protocol, improving the accuracy of previous
approaches. Second, by considering the potential environment that a
process could be running in, the computed results remain correct
regardless of the context with which the analyzed process may be
composed in parallel.
SableJIT: A retargetable just-in-time compiler
|
back |
Author: David Belanger
Date: August 2004
Abstract
In this thesis, we introduce SableJIT, a retargetable
just-in-time compiler for the SableVM Java virtual machine. Our design
attempts to minimize the amount of work required to port (or to
retarget) our compiler to a new platform. We accomplish this in three
ways. First, we introduce a retargetable backend where the amount of
work required to port it is reduced to the implementation of simple
well-defined primitive functions. Second, we keep all code related to
the internals of the virtual machine in the frontend, making knowledge
of the target architecture sufficient for a port. Finally, we provide a
good development environment that favours incremental development and
testing, in part through a robust runtime system that can recover from
compilation failures.
We demonstrate the portability of SableJIT by supporting
three processor architectures and various operating systems. In
particular, we describe the experience acquired in porting our compiler
to the Solaris/SPARC platform.
A practical MHP information computation for concurrent
Java programs
|
back |
Author: Lin Li
Date: August 2004
Abstract
With the development of multi-processors, multi-threaded
programs and programming languages have become more and more popular.
This requires extending the scope of program analysis and compiler
optimization from traditional, sequential programs to concurrent
programs.
Naumovich et al. proposed May Happen in Parallel (MHP)
analysis that determines which program statements may be executed
concurrently. From this information, compiler optimizations can be
improved, and potential program problems such as data races can be
discovered.
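As a small illustrative example (not taken from the thesis), MHP
information for a simple two-thread Java program distinguishes
statements that may run concurrently from those ordered by thread
start and join:

    // Illustrative sketch: S2 and S3 may happen in parallel, while S1
    // happens before both (it precedes start()) and S4 happens after
    // S2 (join() orders the threads). An MHP analysis computes such
    // may-happen-in-parallel pairs statically.
    class MhpExample {
        static int shared;

        public static void main(String[] args) throws InterruptedException {
            shared = 0;                                  // S1
            Thread t = new Thread(() -> { shared++; });  // S2 runs in t
            t.start();
            shared--;             // S3: may happen in parallel with S2
            t.join();
            System.out.println(shared);  // S4: ordered after S2 by join
        }
    }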
Unfortunately, MHP analysis has some limitations with
respect to practical use. In this thesis we present an implementation
of MHP analysis for Java that attempts to address some of the practical
implementation concerns of the original work. We describe a design that
incorporates techniques for aiding a feasible implementation and
expanding the range of acceptable inputs.
The MHP analysis requires a particular internal
representation in order to run. By using a combination of techniques,
we are able to compact that representation, and thus significantly
improve MHP execution time without affecting accuracy. We also provide
experimental results showing the utility and impact of our approach and
optimizations using a variety of concurrent benchmarks. The results
show that our optimizations are effective, and allow more and larger
benchmarks to be analyzed. For some benchmarks, our optimizations have
very impressive results, speeding up MHP analysis by several orders of
magnitude.
STEP: A Framework for the Efficient Encoding of General
Trace Data
|
back |
Author: Rhodes H. F. Brown
Date: May 2003
Abstract
Program tracing is a common technique employed by software
and hardware developers who are interested in characterizing the
dynamic behavior of complex software systems. However, despite the
popularity of trace-driven analyses, there are surprisingly few options
for encoding trace data in a standard format.
In the past, many developers have resorted to creating their
own ad-hoc trace encoding solutions, tailored specifically to the data
they are considering. Such efforts are usually redundant, and in many
cases lead to an obscure and poorly documented trace format which
ultimately limits the reuse and sharing of potentially valuable
information.
The STEP system was created to address this problem by
providing a standard method for encoding general program trace data in
a flexible and compact format. The system consists of a trace data
definition language along with a compiler for the language and an
encoding architecture that implements a number of common trace
compaction techniques. The system simplifies the development and
interoperability of trace clients by encapsulating the encoding process
and presenting the data as an abstract object stream.
This thesis presents a detailed description of the STEP
system and evaluates its utility by applying it to a variety of trace
data from Java programs. Initial results indicate that compressed STEP
encodings are often substantially more compact than similarly
compressed naive formats.
New algorithms for a Java decompiler and their
implementation in Soot
|
back |
Author: Jerome Miecznikowski
Date: April 2003
Abstract
This thesis presents Dava, a Java bytecode to Java source
code decompiler built on top of the Soot framework.
The Java Virtual Machine Specification of valid bytecode is
much less restrictive than the Java Language Specification of valid
Java source programs. For example, bytecode has unstructured control
flow, loose local typing, and few restrictions on method modifiers. By
contrast, the Java language has highly structured control flow, strong
local typing, and many restrictions on its method modifiers. The goal
of this thesis was to build a tool that could correctly decompile the
widest range of verifiable Java bytecode back into compilable Java
source. This includes bytecode coming from other source languages,
bytecode that has undergone certain types of obfuscation, and optimized
bytecode. To accomplish this goal we created the Structure
Encapsulation Tree data structure and a set of new decompiling
algorithms.
The algorithms fall into three categories: regular control
flow, exceptional control flow, and idioms. Regular control flow
includes finding constructs such as loops, if-else statements, and
labeled blocks. Exceptional control flow refers strictly to try-catch
statements. Idioms is a collection of miscellaneous restructuring
enhancements and techniques for ensuring that we produce
syntactically correct Java. The Structure Encapsulation Tree allows us
to address various decompiling problems one at a time, in order of
their difficulty. For example, we find all loops before we find any
if-else statements. The result is a very robust algorithm that will
restructure any control flow graph without resorting to tricks such as
encoding the control flow graph as a finite state machine.
We test our implementation of the decompiler on a wide range
of inputs and compare the results to the output of four leading Java
decompilers. Our results show that Dava always produces correct source
code and that it far outperforms the competition on all of our more
advanced tests.
Spark: A flexible points-to analysis framework for Java
|
back |
Author: Ondřej Lhoták
Date: February 2003
Abstract
Many compiler analyses and optimizations require precise
information about the behaviour of pointers in order to be effective.
Points-to analysis is a technique for computing this information that
has been studied extensively over the last decade. Most of this
research has focused on points-to analyses for C. The behaviour of
points-to analysis on higher-level languages such as Java appears very
different from that on C. Moreover, most proposed points-to analysis
techniques were evaluated in disparate analysis systems and benchmarks,
making it difficult to compare their effectiveness.
To address these issues, this thesis introduces Spark, a
flexible framework for experimenting with points-to analyses for Java.
Spark is intended to be a universal framework within which different
points-to analyses can be easily implemented and compared in a common
context. Currently, Spark supports equality- and subset-based analyses,
variations in field sensitivity, respect for declared types, variations
in call graph construction, off-line simplification, and several
points-to set propagation algorithms.
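To illustrate the kind of precision difference Spark can measure (an
illustrative example, not taken from the thesis), subset-based and
equality-based analyses disagree even on tiny programs:

    // Illustrative sketch: a subset-based (Andersen-style) analysis
    // records only that q's points-to set includes p's, so
    // p -> {O1} and q -> {O1, O2}. An equality-based
    // (Steensgaard-style) analysis unifies p and q, so both end up
    // pointing to {O1, O2} -- cheaper to compute, but less precise.
    class PointsToExample {
        public static void main(String[] args) {
            Object p = new Object();   // allocation site O1
            Object q = new Object();   // allocation site O2
            q = p;                     // constraint: pts(p) subset of pts(q)
        }
    }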
A substantial study of factors affecting precision and
efficiency of points-to analyses has been performed as a demonstration
of Spark in action. The results show that Spark is not only flexible
and modular, but also very efficient compared to other points-to
analysis implementations.
Two client analyses that use the points-to information are
described, call graph construction and side-effect analysis. The
side-effect information can be encoded in Java class file attributes,
so that it can later be used for optimization by other compilers and
virtual machines.
Spark has been demonstrated to be a flexible and efficient
framework for Java points-to analysis. Several experiments that could
be performed with it are suggested.
A Portable Research Framework for the Execution of Java
Bytecode
|
back |
Author: Etienne Gagnon
Date: December 2002
Abstract
Compilation to bytecode paired with interpretation is often
used as a technique to easily build prototypes for new programming
languages. Some languages, including Java, push this further and use
the bytecode layer to isolate programs from the underlying platform.
Current state-of-the-art commercial and research Java virtual machines
implement advanced just-in-time and adaptive compilation techniques to
deliver high-performance execution of Java bytecode. Yet, experimenting
with new features such as adding new bytecodes or redesigning the type
system can be a daunting task within these complex systems, when new
features invalidate assumptions on which the internal dynamic
optimizing compiler depends. On the other hand, simpler existing Java
bytecode interpreters, written purely in high-level languages, deliver
poor performance. The main motivation behind this thesis was to answer
the question: How fast can a portable, easily modifiable Java
bytecode interpreter be? In order to address this question, we have
designed and developed the SableVM research framework, a
portable interpreter-based Java virtual machine written in portable C.
In this thesis we introduce innovative techniques for
implementing an efficient, yet portable Java bytecode interpreter.
These techniques address three areas: instruction dispatch, memory
management, and synchronization. Specifically, we show how to implement
an inline-threaded engine in the presence of lazy code preparation,
without incurring a high synchronization penalty. We then introduce a
logical partitioning of runtime system memory that simplifies memory
management, and a related sparse interface virtual table design for
fast interface-method invocation. We show how to efficiently compute
space-efficient garbage collection maps for verifiable bytecode.
We also present a bidirectional object layout that simplifies garbage
collection. Finally, we introduce an improvement to thin locks,
eliminating busy-wait in case of contention.
Our experiments within the SableVM framework show
that inline threading delivers significant performance improvements
over switch-based and direct-threaded dispatch, that sparse interface
tables cause no memory loss, and that our map computation algorithm
delivers a very small number of distinct garbage collection maps. Our
overall performance measurements show that, using our techniques, a
portable interpreter can deliver competitive interpretation
performance, and even surpass that of a less-portable state-of-the-art
interpreter on some benchmarks.
Combining Static and Dynamic Data in Code Visualization
|
back |
Author: David Eng
Date: August 2002
Abstract
The task of developing, tuning, and debugging compiler
optimizations is a difficult one which can be facilitated by software
visualization. There are many characteristics of the code which must be
considered when studying the kinds of optimizations which can be
performed. These characteristics can include large amounts of data
which would be difficult to inspect without the appropriate tools. Both
static data collected at compile-time and dynamic runtime data can
reveal opportunities for optimization and affect code transformations.
In order to expose the behavior of such complex systems, visualizations
should include as much information as possible and accommodate the
different sources from which this information is acquired.
This thesis presents a visualization framework designed to
address these issues. The framework is based on a new, extensible
language called JIL which provides a common format for encapsulating
intermediate representations and associating them with compile-time and
runtime data. We present new contributions which extend existing
compiler and profiling frameworks, allowing them to export the
intermediate languages, analysis results, and code metadata they
collect as JIL documents. Visualization interfaces can then combine the
JIL data from separate tools, exposing both static and dynamic
characteristics of the underlying code. We present such an interface in
the form of a new web-based visualizer, allowing JIL documents to be
visualized online in a portable, customizable interface.
A Comprehensive Approach to Array Bounds Check
Elimination for Java
|
back |
Author: Feng Qian
Date: April 2001
Abstract
The Java programming language requires array reference range
checks at run time to guarantee a program's safe execution. If the
array index exceeds the range, the run-time environment must throw an
IndexOutOfBoundsException at the precise program point where the array
reference occurs. Compilers generate conditional branch instructions
for implementing array bounds checks. A branch instruction carries a
significant performance penalty on modern pipelined architectures, and
it also makes many other optimizations difficult. For array-intensive
applications,
array bounds checks may cause a heavy run-time overhead, and thus it is
beneficial to eliminate all checks which a static analysis can prove to
be unneeded. Array bounds checks are required by some other languages
such as Ada and Fortran, and some bounds check elimination algorithms
have been developed for these kinds of languages. However, these
algorithms are not directly applicable to Java applications because of
the precise-exception requirement of the language.
We present a new approach to eliminate array bounds checks
in Java by using static analyses. Our approach is based upon a
flow-sensitive intraprocedural analysis called variable constraint
analysis (VCA). VCA collects constraints between locals related to
array references. The array bounds check problem is formulated as
solving a system of difference constraints. The analysis builds a small
constraint graph for each important point in a method, and then
computes the shortest-path weight of the graph. The shortest-path
weights from upper bound to array index and from the index to lower
bound indicate the safety of checks. Using VCA as the base analysis,
we also show how two further analyses can improve the results of VCA.
Array field analysis is applied on each class and provides information
about some arrays stored in fields, while rectangular array analysis is
an interprocedural analysis to approximate the shape of arrays, and is
useful for finding rectangular (non-ragged) arrays.
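As a hedged sketch of the underlying idea (the names and data
structures here are illustrative, not VCA's actual implementation),
deciding whether a bounds check is redundant reduces to shortest-path
queries over a difference-constraint graph: a constraint x - y <= c
becomes an edge from y to x with weight c, and the check on a[j] is
provably safe if the derivable bounds give j - a.length <= -1 and
0 - j <= 0.

    import java.util.*;

    // Illustrative sketch of solving difference constraints with
    // shortest paths (Bellman-Ford style relaxation).
    public class DiffConstraints {
        // edges.get(y) maps each successor x to the weight of edge y -> x
        private final Map<String, Map<String, Integer>> edges = new HashMap<>();
        private final Set<String> nodes = new HashSet<>();

        void constraint(String x, String y, int c) {   // records x - y <= c
            nodes.add(x); nodes.add(y);
            edges.computeIfAbsent(y, k -> new HashMap<>())
                 .merge(x, c, Math::min);
        }

        // Shortest-path distances from src; missing key = no derivable bound.
        Map<String, Long> shortestFrom(String src) {
            Map<String, Long> dist = new HashMap<>();
            dist.put(src, 0L);
            for (int pass = 0; pass < nodes.size(); pass++) {
                for (var e : edges.entrySet()) {
                    Long dy = dist.get(e.getKey());
                    if (dy == null) continue;
                    for (var succ : e.getValue().entrySet()) {
                        long cand = dy + succ.getValue();
                        dist.merge(succ.getKey(), cand, Math::min);
                    }
                }
            }
            return dist;
        }

        public static void main(String[] args) {
            // Facts at the access a[j] inside:
            //   for (i = 0; i < a.length - 1; i++) { j = i + 1; ... a[j] ... }
            DiffConstraints g = new DiffConstraints();
            g.constraint("i", "len", -2);  // i - len <= -2  (loop condition)
            g.constraint("j", "i", 1);     // j - i   <= 1   (j = i + 1)
            g.constraint("i", "j", -1);    // i - j   <= -1  (also from j = i + 1)
            g.constraint("0", "i", 0);     // 0 - i   <= 0   (i >= 0)

            Long upper = g.shortestFrom("len").get("j");  // bound on j - len
            Long lower = g.shortestFrom("j").get("0");    // bound on 0 - j
            boolean safe = upper != null && upper <= -1
                        && lower != null && lower <= 0;
            System.out.println("check on a[j] provably safe: " + safe); // true
        }
    }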
We have implemented all three analyses using the Soot
bytecode optimization/annotation framework and we transmit the results
of the analysis to virtual machines using class file attributes. We
have modified the Kaffe JIT and IBM's High Performance Compiler for
Java (HPCJ) (the HPCJ experiment was conducted by Clark Verbrugge) to
make use of these attributes, and we demonstrate significant
speed-ups.
Soot: A Java Bytecode Optimization Framework
|
back |
Author: Raja Vallée-Rai
Date: July 2000
Abstract
Java provides many attractive features such as platform
independence, execution safety, garbage collection and object
orientation. These features facilitate application development but are
expensive to support; applications written in Java are often much
slower than their counterparts written in C or C++. To use these
features without having to pay a great performance penalty,
sophisticated optimizations and runtime systems are required.
We present Soot, a framework for optimizing Java bytecode.
The framework is implemented in Java and supports three intermediate
representations for representing Java bytecode: Baf, a streamlined
representation of bytecode which is simple to manipulate; Jimple, a
typed 3-address intermediate representation suitable for optimization;
and Grimp, an aggregated version of Jimple suitable for decompilation.
Soot also contains a set of transformations between these intermediate
representations, and an application programming interface (API) is
provided to write optimizations and analyses on Java bytecode in these
forms.
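For a flavour of what a typed 3-address form looks like (shown here as
plain Java with explicit temporaries, not actual Jimple output), a
single compound expression is broken into simple, one-operation
statements:

    // Illustrative only: the same computation written as one compound
    // expression and as a sequence of 3-address-style statements.
    class ThreeAddressFlavour {
        static int original(int[] a, int i, int k) {
            return a[i] + k * 2;       // one compound expression
        }

        static int lowered(int[] a, int i, int k) {
            int t0 = a[i];             // t0 := a[i]
            int t1 = k * 2;            // t1 := k * 2
            int t2 = t0 + t1;          // t2 := t0 + t1
            return t2;                 // return t2
        }
    }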
In order to demonstrate the usefulness of the framework, we
have implemented intraprocedural and whole program optimizations. To
show that whole program bytecode optimization can give performance
improvements, we provide experimental results for 10 large benchmarks,
including 8 SPECjvm98 benchmarks running on JDK 1.2. These results show
a speedup of up to 38%.
A study of side-effect analyses for Java
|
back |
Author: Chrislain Razafimahefa
Date: December 1999
Abstract
In the context of an object-oriented programming language
such as Java, the ubiquitous use of instance fields and the presence of
pointers (references to objects) make it difficult to understand a
program's memory usage. Understanding memory usage is a key element
for optimizing programs, and side-effect analysis, which estimates the
variables that are inspected or altered by a computation, is of prime
importance for understanding and optimizing Java programs.
In this thesis, we study the impact of a set of practical
side-effect analyses on optimizing Java programs. The main contribution
of the thesis is the design and implementation of two groups of
analyses, namely type-based analysis and refers-to analysis. These
analyses use different strategies to capture the effect of an
instruction on variables present in the program. The former uses type
information whereas the latter models the program's storage shape
graph. In both type-based analysis and refers-to analysis, two
variations of the analysis are presented, one which considers the
internal structure of objects such as fields as a key element of the
analysis, and another which considers objects as aggregates and ignores
their internal structure.
In order to assess the effectiveness of each analysis, we
have implemented loop-invariant removal, an optimization which moves
invariant computation out of loops. We used the Soot framework to
implement the analyses and the optimization. We present both static and
dynamic results based on experiments performed on a set of Java
benchmarks of various sizes. Our results suggest that side-effect
analysis is beneficial. Furthermore, we also notice that analyses that
exploit the internal structure of objects as a key element of the
analysis perform significantly better than their counterparts which
do not.
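As an illustrative example (not drawn from the thesis), the
loop-invariant removal client benefits from side-effect information
along these lines:

    // Illustrative sketch: the field read this.limit is loop-invariant
    // only if process() is known not to modify it. A side-effect
    // analysis that proves this allows the load to be hoisted out of
    // the loop.
    class Worker {
        int limit;

        void process(int i) {
            // ... does not write this.limit ...
        }

        void runAll(int n) {
            for (int i = 0; i < n; i++) {
                if (i < this.limit) {  // candidate invariant load of this.limit
                    process(i);
                }
            }
            // With side-effect information, the compiler may rewrite this as:
            //   int lim = this.limit;
            //   for (int i = 0; i < n; i++) { if (i < lim) process(i); }
        }
    }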
Practical Techniques for Virtual Call Resolution in Java
|
back |
Author: Vijay Sundaresan
Date: June 1999
Abstract
Virtual method calls are a fundamental feature offered by
Java, an object-oriented programming language. However, they are also a
source of degradation of performance at run time and imprecision in
interprocedural analyses. Several well-known, inexpensive analyses
have been developed for virtual call resolution. However, they have
been observed to be effective at resolving method calls in library
code, while being much less effective in the benchmark code excluding
libraries.
In this thesis we present a new flow-insensitive and
context-insensitive analysis called reaching type analysis. We present
the analysis rules for two variations of this analysis: variable type
analysis and a coarser-grained version, declared type analysis.
Reaching type analysis builds a type propagation graph in which nodes
represent variables and edges represent the flow of types due to
assignments.
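To illustrate the difference such an analysis can make (this example
is illustrative, not from the thesis), consider a call site that class
hierarchy analysis cannot devirtualize but that a reaching type
analysis can:

    // Illustrative sketch: the class hierarchy contains both Disc and
    // Box, so class hierarchy analysis resolves f.area() to both
    // targets. A reaching type analysis tracks which types flow into f
    // through assignments and sees that only Disc reaches it, so the
    // call can be resolved to a single target (and possibly inlined).
    abstract class Figure { abstract double area(); }
    class Disc extends Figure { double area() { return 3.14; } }
    class Box  extends Figure { double area() { return 1.0;  } }

    class Resolver {
        public static void main(String[] args) {
            Figure f = new Disc();      // only Disc flows into f
            Box unused = new Box();     // Box exists, but never reaches f
            System.out.println(f.area());
            // CHA targets: {Disc.area, Box.area}; VTA targets: {Disc.area}
        }
    }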
We have implemented variable type analysis and declared type
analysis, and two existing analyses, class hierarchy analysis and rapid
type analysis, in the Soot framework and compared their relative
success at building accurate call graphs for complete applications. We
present static empirical results for call graph improvement for the
whole application as well as for the benchmark code alone. We have also
made dynamic measurements focusing on the benchmark code excluding
libraries.
Method inlining is a compiler optimization in which the
method call is replaced by the body of the method being called. Method
inlining is very effective in improving performance of benchmarks that
have many small methods and in which a large proportion of instructions
executed are virtual calls. We have implemented method inlining
(automatic and profile guided) at the Java bytecode level using the
Soot framework. We demonstrate the effectiveness of our analyses and
method inlining on a set of 15 benchmarks whose bytecodes were
generated by Java, ML, Ada, Eiffel and Pizza compilers.
SableCC, an Object Oriented Compiler Framework
|
back |
Author: Etienne Gagnon
Date: March 1998
Abstract
In this thesis, we introduce SableCC, an object-oriented
framework that generates compilers (and interpreters) in the Java
programming language. This framework is based on two fundamental design
decisions. Firstly, the framework uses object-oriented techniques to
automatically build a strictly-typed abstract syntax tree that matches
the grammar of the compiled language and simplifies debugging.
Secondly, the framework generates tree-walker classes using an extended
version of the visitor design pattern which enables the implementation
of actions on the nodes of the abstract syntax tree using
inheritance. These two design decisions lead to a tool that supports a
shorter development cycle for constructing compilers.
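As a hedged sketch of the general idea (the class and method names
here are hypothetical, not SableCC's actual generated code), a
strictly-typed AST with an inheritance-based tree walker might look
like this:

    // Hypothetical sketch: each grammar alternative becomes its own
    // node class, and a generated depth-first walker exposes one
    // overridable method per node type, so user actions are added by
    // inheritance rather than by editing generated code.
    abstract class Node { abstract void apply(Walker w); }

    class APlusExpr extends Node {
        Node left, right;
        APlusExpr(Node l, Node r) { left = l; right = r; }
        void apply(Walker w) { w.casePlus(this); }
    }

    class ANumExpr extends Node {
        int value;
        ANumExpr(int v) { value = v; }
        void apply(Walker w) { w.caseNum(this); }
    }

    // Generated base walker: default behaviour is depth-first traversal.
    class Walker {
        void casePlus(APlusExpr n) { n.left.apply(this); n.right.apply(this); }
        void caseNum(ANumExpr n) { }
    }

    // User code: override only the cases of interest.
    class Evaluator extends Walker {
        java.util.Deque<Integer> stack = new java.util.ArrayDeque<>();

        void casePlus(APlusExpr n) {
            super.casePlus(n);               // visit children first
            int r = stack.pop(), l = stack.pop();
            stack.push(l + r);
        }
        void caseNum(ANumExpr n) { stack.push(n.value); }

        public static void main(String[] args) {
            Node ast = new APlusExpr(new ANumExpr(2), new ANumExpr(3));
            Evaluator e = new Evaluator();
            ast.apply(e);
            System.out.println(e.stack.pop());   // prints 5
        }
    }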
To demonstrate the simplicity of the framework, we discuss
the implementation of a state-of-the-art almost linear time points-to
analysis. We also provide a brief description of other systems that
have been implemented using the SableCC tool.
We conclude that the use of object-oriented techniques
significantly reduces the length of the programmer-written code, can
shorten the development time and, finally, makes the code easier to
read and maintain.