Sable Publications (Papers)
Improving Database Query Performance with Automatic Fusion
Best paper finalist.
Authors: Hanfeng Chen and Alexander Krolik and Bettina Kemme and Clark Verbrugge and Laurie Hendren
Date: 22-26 February 2020
CC '20, San Diego, CA, USA
Abstract
Array-based programming languages have shown significant promise for improving
performance of column-based in-memory database systems, allowing elegant
representation of query execution plans that are also amenable to standard
compiler optimization techniques. Use of loop fusion, however, is not
straightforward, due to the complexity of built-in functions for implementing
complex database operators. In this work, we apply a compiler approach to
optimize SQL query execution plans that are expressed in an array-based
intermediate representation. We analyze this code to determine shape properties
of the data being processed, and use a subsequent optimization phase to fuse
multiple database operators into single, compound operations, reducing the need
for separate computation and storage of intermediate values. Experimental
results on a range of TPC-H queries show that our fusion technique is effective
in generating efficient code, improving query time over a baseline system.
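As a rough illustration of the fusion idea described in this abstract (a hypothetical Python sketch, not the paper's compiler, IR, or operator set), the unfused version below materializes an intermediate column for each operator, while the fused version combines a map, a filter, and an aggregation into one compound loop:

    # Hypothetical columnar data: a "price" column and a "quantity" column.
    prices = [3.5, 10.0, 7.25, 1.0, 8.5]
    quantities = [2, 1, 4, 10, 3]

    def unfused_total(prices, quantities, threshold):
        # Unfused: each operator materializes an intermediate array.
        revenue = [p * q for p, q in zip(prices, quantities)]   # operator 1
        kept = [r for r in revenue if r > threshold]            # operator 2
        return sum(kept)                                        # operator 3

    def fused_total(prices, quantities, threshold):
        # Fused: one compound loop, no intermediate columns stored.
        total = 0.0
        for p, q in zip(prices, quantities):
            r = p * q
            if r > threshold:
                total += r
        return total

    assert unfused_total(prices, quantities, 5.0) == fused_total(prices, quantities, 5.0)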
View the paper (.pdf)
BibTeX entry
Numerical Computing on the Web: Benchmarking for the Future
Authors: David Herrera and Hanfeng Chen and Erick Lavoie and Laurie Hendren
Date: 4-9 November 2018
DLS '18, Boston, MA, USA
Abstract
Recent advances in execution environments for JavaScript and WebAssembly that
run on a broad range of devices, from workstations and mobile phones to IoT
devices, provide new opportunities for portable and web-based numerical
computing. Indeed, numerous numerical libraries and applications are emerging
on the web, including Tensorflow.js, JSMapReduce, and the NLG Protein Viewer.
This paper evaluates the current performance of numerical computing on the web,
including both JavaScript and WebAssembly, over a wide range of devices from
workstations to IoT devices. We developed a new benchmarking approach, which
allowed us to perform centralized benchmarking, including benchmarking on
mobile and IoT devices. Using this approach we performed four performance
studies using the Ostrich benchmark suite, a collection of numerical programs
representing the numerical dwarf categories identified by Colella. We studied
the performance evolution of JavaScript, the relative performance of
WebAssembly, the performance of server-side Node.js, and a comprehensive
performance showdown for a wide range of devices.
View the paper (.pdf)
BibTeX entry
HorseIR: Bringing Array Programming Languages Together with Database Query Processing
Authors: Hanfeng Chen and Joseph Vinish D'silva and Hongji Chen and Bettina Kemme and Laurie Hendren
Date: 4-9 November 2018
DLS '18, Boston, MA, USA
Abstract
Relational database management systems (RDBMS) are operationally similar to a
dynamic language processor. They take SQL queries as input, dynamically
generate an optimized execution plan, and then execute it. In recent decades,
the emergence of in-memory databases with columnar storage, which use
array-like storage structures, has shifted the focus on optimizations from the
traditional I/O bottleneck to CPU and memory. However, database research so far
has primarily focused on CPU cache optimizations. The similarity in the
computational characteristics of such database workloads and array programming
language optimizations are largely unexplored. We believe that these database
implementations can benefit from merging database optimizations with dynamic
array-based programming language approaches. Therefore, in this paper, we
propose a novel approach to optimize database query execution using a new
array-based intermediate representation, HorseIR, that resides between database
queries and compiled code. Furthermore, we provide a translator to generate
HorseIR from database execution plans and a compiler that optimizes HorseIR and
generates efficient code. We compare HorseIR with the MonetDB RDBMS, by testing
standard SQL queries, and show how our approach and compiler optimizations
improve the runtime of complex queries.
View the paper (.pdf)
BibTeX entry
Efficiently implementing the copy semantics of MATLAB's arrays in JavaScript
Authors: Vincent Foley-Bourgon and Laurie J. Hendren
Date: 1 November 2016
DLS '16, Amsterdam, Netherlands
Abstract
Compiling MATLAB --- a dynamic, array-based language --- to JavaScript is an
attractive proposal: the output code can be deployed on a platform used by
billions and can leverage the countless hours that have gone into making
JavaScript JIT engines fast. But before that can happen, the original MATLAB
code must be properly translated, making sure to bridge the semantic gaps of
the two languages.
An important area where MATLAB and JavaScript differ is in their handling of
arrays: for example, in MATLAB, arrays are one-indexed and writing at an index
beyond the end of an array extends it; in JavaScript, typed arrays are
zero-indexed and writing out of bounds is a no-op. A MATLAB-to-JavaScript
compiler must address these mismatches. Another salient and pervasive
difference between the two languages is the assignment of arrays to variables:
in MATLAB, this operation has value semantics, while in JavaScript it has
reference semantics.
In this paper, we present MatJuice --- a source-to-source, ahead-of-time
compiler back-end for MATLAB --- and how it deals efficiently with this last
issue. We present an intra-procedural data-flow analysis to track where each
array variable may point to and which variables are possibly aliased. We also
present the associated copy insertion transformation that uses the points-to
information to insert explicit copies when necessary. The resulting JavaScript
program respects the MATLAB value semantics and we show that it performs fewer
run-time copies than some alternative approaches.
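To make the semantic gap concrete, here is a minimal Python sketch (plain lists standing in for JavaScript arrays; the assign helper and its may_be_aliased flag are hypothetical, not MatJuice's API) of reference semantics on assignment and of inserting an explicit copy only when an aliasing analysis cannot rule out sharing:

    # Plain Python lists behave like JavaScript arrays: assignment aliases.
    a = [1, 2, 3]
    b = a          # reference semantics: b and a are the same array
    b[0] = 99
    assert a[0] == 99

    # MATLAB assignment has value semantics; a naive emulation copies on
    # every assignment, while a smarter backend copies only when a static
    # analysis cannot prove the source is unaliased.
    def assign(src, may_be_aliased):
        # Hypothetical helper: 'may_be_aliased' stands in for the result of
        # a points-to / aliasing analysis like the one described above.
        return list(src) if may_be_aliased else src

    x = [1, 2, 3]
    y = assign(x, may_be_aliased=True)   # explicit copy inserted
    y[0] = 99
    assert x[0] == 1                     # value semantics preserved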
View the paper (.pdf)
BibTeX entry
Exhaustive Analysis of Thread-level Speculation
Authors: Clark Verbrugge and Christopher J.F. Pickett and Alexander Krolik and Allan Kielstra
Date: 1 November 2016
SEPS '16, Amsterdam, Netherlands
Abstract
Thread-level Speculation (TLS) is a technique for automatic parallelization. The complexity of
even prototype implementations, however, limits the ability to explore and compare the wide
variety of possible design choices, and also makes understanding performance characteristics
difficult. In this work we build a general analytical model of the method-level variant of TLS
which we can use for determining program speedup under a wide range of TLS designs. Our
approach is exhaustive, and using either simple brute force or more efficient dynamic
programming implementations we are able to show how performance is strongly limited by program
structure, as well as core choices in speculation design, irrespective of and complementary to
the impact of data-dependencies. These results provide new, high-level insight into where and
how thread-level speculation can and should be applied in order to produce practical speedup.
View the paper (.pdf)
BibTeX entry
Automatic Vectorization for MATLAB
Authors: Hanfeng Chen and Alexander Krolik and Erick Lavoie and Laurie J. Hendren
Date: 28-30 September 2016
LCPC '16, Rochester, NY, USA
Abstract
Dynamic array-based languages such as MATLAB provide a wide range of built-in
operations which can be efficiently applied to all elements of an array.
Historically, MATLAB and Octave programmers have been advised to manually
transform loops to equivalent "vectorized" computations in order to maximize
performance. In this paper we present the techniques and tools to perform
automatic vectorization, including handling for loops with calls to
user-defined functions. We evaluate the technique on 9 benchmarks using two
interpreters and two JIT-based platforms and show that automatic vectorization
is extremely effective for the interpreters on most benchmarks, and moderately
effective on some benchmarks in the JIT context.
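A toy example of the loop-to-vectorized rewrite discussed above (Python with NumPy standing in for MATLAB's built-in array operations; this is not output of the actual tool):

    import numpy as np

    def loop_scale(xs, factor):
        # Element-wise loop, the style the vectorizer starts from.
        out = []
        for x in xs:
            out.append(x * factor)
        return out

    def vectorized_scale(xs, factor):
        # "Vectorized" form: one built-in array operation replaces the loop.
        return (np.asarray(xs) * factor).tolist()

    assert loop_scale([1, 2, 3], 2.0) == vectorized_scale([1, 2, 3], 2.0)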
View the paper (.pdf)
BibTeX entry
Reducing Memory Buffering Overhead in Software Thread-level Speculation
Authors: Zhen Cao and Clark Verbrugge
Date: 17-18 March 2016
CC '16, Barcelona, Spain
Abstract
Software-based, automatic parallelization through Thread-Level Speculation (TLS) has significant
practical potential, but also high overhead costs. Traditional "lazy" buffering mechanisms enable strong
isolation of speculative threads, but imply large memory overheads, while more recent "eager" mechanisms improve
scalability, but are more sensitive to data dependencies and have higher rollback costs. We here
describe an integrated system that incorporates the best of both designs, automatically selecting
the best buffering mechanism. Our approach builds on well-optimized designs for both techniques,
and we describe specific optimizations that improve both lazy and eager buffer management as well.
We implement our design within MUTLS, a software-TLS system based on
the LLVM compiler framework. Results show that we achieve 75% of the geometric mean performance of OpenMP
versions on 9 memory intensive benchmarks. Application of these optimizations is thus a useful part of the
optimization stack needed for effective and practical software TLS.
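A minimal sketch of the two buffering styles contrasted above (hypothetical Python classes, not MUTLS code): lazy buffering keeps speculative writes private until commit, while eager buffering writes in place and keeps an undo log for rollback.

    class LazyBuffer:
        # Lazy: speculative writes go to a private buffer; commit publishes them.
        def __init__(self, memory):
            self.memory, self.buffer = memory, {}
        def write(self, addr, value): self.buffer[addr] = value
        def read(self, addr): return self.buffer.get(addr, self.memory[addr])
        def commit(self): self.memory.update(self.buffer)
        def rollback(self): self.buffer.clear()      # cheap: just drop the buffer

    class EagerBuffer:
        # Eager: writes go directly to memory; an undo log enables rollback.
        def __init__(self, memory):
            self.memory, self.undo = memory, []
        def write(self, addr, value):
            self.undo.append((addr, self.memory[addr]))
            self.memory[addr] = value
        def read(self, addr): return self.memory[addr]
        def commit(self): self.undo.clear()           # cheap: nothing to publish
        def rollback(self):
            for addr, old in reversed(self.undo):
                self.memory[addr] = old
            self.undo.clear()

    memory = {0: 1}
    tx = LazyBuffer(memory)
    tx.write(0, 5)
    assert tx.read(0) == 5 and memory[0] == 1   # speculative write not yet visible
    tx.rollback()
    assert memory[0] == 1
    tx = EagerBuffer(memory)
    tx.write(0, 5)
    assert memory[0] == 5                       # visible immediately
    tx.rollback()
    assert memory[0] == 1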
View the paper (.pdf)
View the presentation slides (.pdf)
BibTeX entry
Velociraptor: a compiler toolkit for array-based languages targeting CPUs and GPUs
Authors: Rahul Garg and Laurie Hendren
Date: June 15 - 17, 2015
ARRAY@PLDI '15, Portland, OR, USA
Abstract
We present a toolkit called Velociraptor that can be used by compiler writers
to quickly build compilers and other tools for array-based languages.
Velociraptor operates on its own unique intermediate representation (IR)
designed to support a variety of array-based languages. The toolkit also
provides some novel analysis and transformations such as region detection and
specialization, as well as a dynamic backend with CPU and GPU code generation.
We discuss the components of the toolkit and also present case-studies
illustrating the use of the toolkit.
View the paper (.pdf)
BibTeX entry
AspectMatlab++: annotations, types, and aspects for scientists
Authors: Andrew Bodzay and Laurie Hendren
Date: March 16 - 19, 2015
MODULARITY '15, Fort Collins, CO, USA
Abstract
In this paper we present extensions to an aspect oriented compiler developed
for MATLAB. These extensions are intended to support important functionality
for scientists, and include pattern matching on annotations, and types of
variables, as well as new manners of exposing context. We provide use-cases of
these features in the form of several general-use aspects which focus on
solving issues that arise from use of dynamically-typed languages. We also
detail performance enhancements to the ASPECTMATLAB compiler which result in
performance gains of an order of magnitude.
View the paper (.pdf)
BibTeX entry
Velociraptor: an embedded compiler toolkit for numerical programs targeting CPUs and GPUs
Authors: Rahul Garg and Laurie Hendren
Date: August 24-27, 2014
PACT '14, Edmonton, AB, Canada
Abstract
Developing just-in-time (JIT) compilers that allow scientific programmers
to efficiently target both CPUs and GPUs is of increasing interest. However
building such compilers requires considerable effort. We present a reusable and
embeddable compiler toolkit called Velociraptor that can be used to easily
build compilers for numerical programs targeting multicores and GPUs.
Velociraptor provides a new high-level IR called VRIR which has been
specifically designed for numeric computations, with rich support for arrays,
plus support for high-level parallel and GPU constructs. A compiler developer
uses Velociraptor by generating VRIR for key parts of an input program.
Velociraptor provides an optimizing compiler toolkit for generating CPU and GPU
code and also provides a smart runtime system to manage the GPU.
To demonstrate Velociraptor in action, we present two proof-of-concept case
studies: a GPU extension for a JIT implementation of the MATLAB language, and a JIT
compiler for Python targeting CPUs and GPUs.
View the paper (.pdf)
BibTeX entry
Mc2For: A Tool for Automatically Translating MATLAB to FORTRAN 95
Authors: Xu Li and Laurie Hendren
Date: 3-6 Feb. 2014
WCRE '14, Antwerp, Belgium
Abstract
MATLAB is a dynamic numerical scripting language widely used by scientists,
engineers and students. While MATLAB's high-level syntax and dynamic types
make it ideal for prototyping, programmers often prefer using high-performance
static languages such as FORTRAN for their final distributable code. Rather
than rewriting the code by hand, our solution is to provide a tool that
automatically translates the original MATLAB program to an equivalent FORTRAN
program. There are several important challenges for automatically translating
MATLAB to FORTRAN, such as correctly estimating the static type characteristics
of all the variables in a MATLAB program, mapping MATLAB built-in functions,
and effectively mapping MATLAB constructs to equivalent FORTRAN constructs.
In this paper, we introduce Mc2FOR, a tool which automatically translates
MATLAB to FORTRAN. This tool consists of two major parts. The first part
is an interprocedural analysis component to estimate the static type
characteristics, such as the shape of arrays and the range of scalars,
which are used to generate variable declarations and to remove
unnecessary array bounds checking in the translated FORTRAN program.
The second part is an extensible FORTRAN code generation framework
automatically transforming MATLAB constructs to FORTRAN. This work
has been implemented within the McLab framework, and we demonstrate the
performance of the translated FORTRAN code on a collection of MATLAB
benchmarks.
View the paper (.pdf)
BibTeX entry
Optimizing MATLAB Feval with Dynamic Techniques
Authors: Nurudeen Lameed and Laurie Hendren
Date: October 2013
DLS '13, Indianapolis, USA
Abstract
MATLAB is a popular dynamic array-based language used by engineers, scientists
and students worldwide. The built-in function feval is an important MATLAB
feature for certain classes of numerical programs and solvers which benefit
from having functions as parameters. Programmers may pass a function name or
function handle to the solver and then the solver uses feval to indirectly
call the function. In this paper, we show that there are significant
performance overheads for function calls via feval, in both MATLAB
interpreters and JITs. The paper then proposes, implements and compares two
on-the-fly mechanisms for specialization of feval calls. The first approach
uses on-stack replacement technology, as supported by McVM/McOSR. The second
approach specializes calls of functions with feval using a combination of
runtime input argument types and values. Experimental results on seven
numerical solvers show that the techniques provide good performance
improvements.
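The following sketch illustrates the general idea of specializing an feval-style indirect call on its function argument (hypothetical Python, with a closure standing in for JIT-generated code; this is not McVM's actual mechanism):

    _specialized = {}

    def solver(f, xs):
        # Generic solver: every iteration goes through an indirect call,
        # the analogue of MATLAB's feval(f, x).
        return [f(x) for x in xs]

    def specialized_solver(f, xs):
        # Specialize the solver once per concrete callee, caching the result,
        # so later calls dispatch directly to the specialized version.
        body = _specialized.get(f)
        if body is None:
            # In a JIT this is where code would be generated with the call
            # to 'f' inlined; here a closure stands in for that.
            def body(xs, f=f):
                return [f(x) for x in xs]
            _specialized[f] = body
        return body(xs)

    square = lambda x: x * x
    assert solver(square, [1, 2, 3]) == specialized_solver(square, [1, 2, 3]) == [1, 4, 9]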
View the paper (.pdf)
BibTeX entry
Mixed Model Universal Software Thread-Level Speculation
Authors: Zhen Cao and Clark Verbrugge
Date: October 2013
ICPP '13, Lyon, France
Abstract
Software approaches to Thread-Level Speculation (TLS) have been recently explored,
bypassing the need for specialized hardware designs. These approaches, however, tend to
focus on source or VM-level implementations aimed at specific language and runtime
environments. In addition, previous software approaches tend to make use of a simple
thread forking model, reducing their ability to extract substantial parallelism from
tree-form recursion programs such as depth-first search and divide-and-conquer. This
paper proposes a Mixed forking model Universal software-TLS (MUTLS) system to
overcome these limitations. MUTLS is purely based on the LLVM intermediate
representation (IR), a language- and architecture-independent IR that many projects use to
support more than 10 source languages and target architectures. MUTLS maximizes parallel
coverage by applying a mixed forking model that allows all threads to speculate, forming
a tree of threads. We evaluate MUTLS using several C/C++ and Fortran benchmarks on a 64-core
machine. On 3 computation intensive applications we achieve speedups of 30 to 50 and
20 to 50 for the C and Fortran versions, respectively. We also observe speedups of 2 to 7
for memory intensive applications. Our experiments indicate
that a mixed model is preferable for parallelization of tree-form recursion applications
over the simple forking models used by previous software-TLS approaches. Our work also
demonstrates that actual speedup is achievable on existing, commodity multi-core
processors while maintaining the flexibility of a highly generic implementation context.
View the paper (.pdf)
BibTeX entry
Adaptive Fork-Heuristics for Software Thread-Level Speculation
Authors: Zhen Cao and Clark Verbrugge
Date: September 2013
PPAM '13, Warsaw, Poland
Abstract
Fork-heuristics play a key role in software Thread-Level Speculation (TLS). Current fork-heuristics either lack real parallel execution environment information to accurately evaluate fork points and/or focus on hardware-TLS implementation which cannot be directly applied to software TLS. This paper proposes adaptive fork-heuristics as well as a feedback-based selection technique to overcome the problems. Adaptive fork-heuristics insert and speculate on all potential fork/join points and purely rely on the runtime system to disable inappropriate ones. Feedback-based selection produces parallelized programs with ideal speedups using log files generated by adaptive heuristics. Experiments of three scientific computing benchmarks on a 64-core machine show that feedback-based selection and adaptive heuristics achieve more than 88% and 50% speedups of the manual-parallel version, respectively. For the Barnes-Hut benchmark, feedback-based selection is 49% faster than the manual-parallel version.
View the paper (.pdf)
BibTeX entry
Refactoring MATLAB
Authors: Soroush Radpour, Laurie Hendren and Max Schäfer
Date: March 2013
CC '13, Rome, Italy
Abstract
MATLAB is a very popular dynamic "scripting" language for numerical
computations used by scientists, engineers and students world-wide. MATLAB
programs are often developed incrementally using a mixture of MATLAB scripts
and functions, and frequently build upon existing code which may use outdated
features. This results in programs that could benefit from refactoring,
especially if the code will be reused and/or distributed. Despite the need
for refactoring, there appear to be no MATLAB refactoring tools available.
Furthermore, correct refactoring of MATLAB is quite challenging because of its
non-standard rules for binding identifiers. Even simple refactorings are
non-trivial.
This paper presents the important challenges of refactoring MATLAB
along with automated techniques to handle a collection of refactorings
for MATLAB functions and scripts including: converting scripts to
functions, extracting functions, and converting dynamic function calls
to static ones. The refactorings have been implemented using
the McLAB compiler framework, and an evaluation is given on a large
set of MATLAB benchmarks which demonstrates the effectiveness of our
approach.
View the paper (.pdf)
BibTeX entry
A Modular Approach to On-Stack Replacement in LLVM
Authors: Nurudeen Lameed and Laurie Hendren
Date: March 2013
VEE '13, Houston, Texas, USA
Abstract
On-stack replacement (OSR) is a technique that allows a virtual machine to
interrupt running code during the execution of a function/method, to
re-optimize the function on-the-fly using an optimizing JIT compiler, and
then to resume the interrupted function at the point and state at which it
was interrupted. OSR is particularly useful for programs with potentially
long-running loops, as it allows dynamic optimization of those loops as
soon as they become hot.
This paper presents a modular approach to implementing OSR for the LLVM
compiler infrastructure. This is an important step forward because LLVM is
gaining popular support, and adding the OSR capability allows compiler
developers to develop new dynamic techniques. In particular, it will enable
more sophisticated LLVM-based JIT compiler approaches. Indeed, other
compiler/VM developers can use our approach because it is a clean modular
addition to the standard LLVM distribution. Further, our approach is defined
completely at the LLVM-IR level and thus does not require any modifications
to the target code generation.
The OSR implementation can be used by different compilers to support a
variety of dynamic optimizations. As a demonstration of our OSR approach,
we have used it to support dynamic inlining in McVM. McVM is a virtual
machine for MATLAB which uses a LLVM-based JIT compiler. MATLAB is a popular
dynamic language for scientific and engineering applications that typically
manipulate large matrices and often contain long-running loops, and is thus
an ideal target for dynamic JIT compilation and OSRs. Using our McVM example,
we demonstrate reasonable overheads for our benchmark set, and performance
improvements when using it to perform dynamic inlining.
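As a schematic picture of what an OSR point does (a hypothetical Python sketch, not the LLVM-level implementation described in the paper), the loop below checks a hotness condition and, when it fires, hands its live state to an "optimized" continuation and resumes there:

    def run_loop(n, compile_optimized, osr_threshold=1000):
        # Baseline (unoptimized) loop with an OSR check. 'compile_optimized'
        # returns a faster version of the remaining loop; both are stand-ins
        # for code a JIT would generate.
        total, i = 0, 0
        while i < n:
            if i == osr_threshold:
                # OSR point: hand the live state (i, total) to optimized code
                # and resume execution there instead of here.
                return compile_optimized(i, total, n)
            total += i
            i += 1
        return total

    def optimized_rest(i, total, n):
        # "Optimized" continuation of the loop, starting from the OSR state.
        return total + sum(range(i, n))

    assert run_loop(10, optimized_rest) == sum(range(10))
    assert run_loop(5000, optimized_rest) == sum(range(5000))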
View the paper (.pdf)
BibTeX entry
Taming MATLAB
Authors: Anton Dubrau and Laurie Hendren
Date: October 2012
OOPSLA '12, Tucson, Arizona, USA
Abstract
MATLAB is a dynamic scientific language used by scientists, engineers
and students worldwide. Although MATLAB is very suitable for rapid
prototyping and development, MATLAB users often want to convert their
final MATLAB programs to a static language such as FORTRAN. This paper
presents an extensible object-oriented toolkit for supporting the generation
of static programs from dynamic MATLAB programs. Our open source toolkit,
called the MATLAB Tamer, identifies a large tame subset of MATLAB, supports
the generation of a specialized Tame IR for that subset, provides a principled
approach to handling the large number of builtin MATLAB functions, and supports
an extensible interprocedural value analysis for estimating MATLAB types and call graphs.
View the paper (.pdf)
BibTeX entry
Kind Analysis for MATLAB
Authors: Jesse Doherty, Laurie Hendren and Soroush Radpour
Date: October 2011
OOPSLA '11, Portland, Oregon, USA
Abstract
MATLAB is a popular dynamic programming language used for scientific and
numerical programming. As a language, it has evolved from a small scripting
language intended as an interactive interface to numerical libraries, to a very
popular language supporting many language features and libraries. The
overloaded syntax and dynamic nature of the language, plus the somewhat organic
addition of language features over the years, makes static analysis of modern
MATLAB quite challenging.
A fundamental problem in MATLAB is determining the kind of an
identifier. Does an identifier refer to a variable, a named function or a
prefix? Although this is a trivial problem for most programming languages,
it was not clear how to do this properly in MATLAB. Furthermore, there was
no simple explanation of kind analysis suitable for MATLAB programmers,
nor a publicly-available implementation suitable for compiler researchers.
This paper explains the required background of MATLAB, clarifies the kind
assignment program, and proposes some general guidelines for developing
good kind analyses. Based on these foundations we present our design
and implementation of a variety of kind analyses, including an
approach that matches the intended behaviour of modern MATLAB 7 and two
potentially better alternatives.
We have implemented all the variations of the kind analysis in McLAB, our
extensible compiler framework, and we present an empirical evaluation of the
various analyses on a large set of benchmark programs.
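A deliberately simplified sketch of kind assignment (hypothetical Python; real MATLAB kind rules are far subtler than this): identifiers assigned somewhere in the code are classified as variables, and everything else defaults to a named function.

    import re

    def kind_assignment(source):
        # Toy kind analysis for a MATLAB-like snippet: identifiers on the
        # left-hand side of an assignment are VAR; every other identifier
        # defaults to FN (named function).
        idents = set(re.findall(r"[A-Za-z_]\w*", source))
        assigned = set(re.findall(r"([A-Za-z_]\w*)\s*=(?!=)", source))
        return {name: ("VAR" if name in assigned else "FN") for name in idents}

    kinds = kind_assignment("x = sin(y); y = x + ones(3)")
    assert kinds["x"] == "VAR" and kinds["y"] == "VAR"
    assert kinds["sin"] == "FN" and kinds["ones"] == "FN"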
View the paper (.pdf)
View the slides (.pptx)
BibTeX entry
The Soot framework for Java program analysis: a retrospective
Authors: Patrick Lam, Eric Bodden, Ondřej Lhoták and Laurie Hendren
Date: October 2011
CETUS '11, Galveston, Texas, USA
Abstract
Soot is a successful framework for experimenting with compiler and
software engineering techniques for Java programs. Researchers from
around the world have implemented a wide range of research tools which
build on Soot, and Soot has been widely used by students for both
courses and thesis research. In this paper, we describe relevant
features of Soot, summarize its development process, and discuss
useful features for future program analysis frameworks.
View the paper (.pdf)
View the slides (.pdf)
BibTeX entry
There is Nothing Wrong with Out-of-Thin-Air: Compiler Optimization and Memory Models
Authors: Clark Verbrugge, Allan Kielstra, and Yi Zhang
Date: June 2011
MSPC 2011, San Jose, California, USA
Abstract
Memory models are used in concurrent systems to specify visibility
properties of shared data. A practical memory model, however, must
permit code optimization as well as provide a useful semantics for
programmers. Here we extend recent observations that the current Java
memory model imposes significant restrictions on the ability to
optimize code. Beyond the known and potentially correctable proof
concerns illustrated by others we show that major constraints on code
generation and optimization can in fact be derived from fundamental
properties and guarantees provided by the memory model. To address
this and accommodate a better balance between programmability and
optimization we present ideas for a simple concurrency
semantics for Java that avoids basic problems at a cost of backward
compatibility.
View the paper (.pdf)
View the slides (.pdf)
BibTeX entry
MetaLexer: A Modular Lexical Specification Language
Authors: Andrew Casey and Laurie Hendren
Date: March 2011
AOSD '11, Pernambuco, Brazil
Abstract
Compiler toolkits make it possible to rapidly develop compilers and
translators for new programming languages. Although there exist elegant
toolkits for modular and extensible parsers, compiler developers must
often resort to ad-hoc solutions when extending or composing lexers.
This paper presents MetaLexer, a new modular lexical specification
language and associated tool.
MetaLexer allows programmers to define lexers in a modular fashion.
MetaLexer modules can be used to break the lexical specification of
a language into a collection of smaller modular lexical specifications.
Control is passed between the modules using the concept of meta-tokens
and meta-lexing. MetaLexer modules are also extensible.
MetaLexer has three key features: it abstracts lexical state transitions
out of semantic actions, it makes modules extensible by introducing multiple
inheritance, and it provides platform agnostic support for a variety of
programming languages and compiler front-end toolchains.
We have constructed a MetaLexer tool which converts MetaLexer specifications
to the popular JFlex lexical specification language and we have used our tool
to create lexers for three real programming languages and their extensions:
AspectJ (and two AspectJ extensions), MATLAB, and MetaLexer itself. The new
specifications are easier to read, are extensible, and require much less
action code than the originals.
View the paper (.pdf)
View the slides (.pptx)
BibTeX entry
Typing Aspects for MATLAB
Authors: Laurie Hendren
Date: March 2011
DSAL '11, Pernambuco, Brazil
Abstract
The MATLAB programming language is heavily used in many scientific
and engineering domains. Part of the appeal of the language is that
one can quickly prototype numerical algorithms without requiring
any static type declarations. However, this lack of type information
is detrimental to both the programmer in terms of software reliability
and understanding, and to the compiler in terms of generating efficient code.
This paper introduces the idea of adding typing aspects to MATLAB programs.
A typing aspect can be used to: (1) capture the run-time types of variables,
and (2) to check run-time types against either a declared type or against
a previously captured run-time type. Typing aspects can be deployed at
three different levels; they can be used: (1) solely as documentation,
(2) to log type errors or (3) to catch type errors at run-time.
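A rough analogue of a typing aspect in Python (a hypothetical decorator, not the MATLAB implementation): it captures run-time argument types and, depending on the deployment level, documents, logs, or raises on mismatches.

    import functools

    def typing_aspect(expected=None, mode="log"):
        # Capture run-time argument types; optionally check them against a
        # previously captured or declared signature. 'mode' selects the
        # deployment level: "document", "log", or "error".
        def wrap(fn):
            captured = {"types": expected}
            @functools.wraps(fn)
            def wrapper(*args):
                observed = tuple(type(a).__name__ for a in args)
                if captured["types"] is None:
                    captured["types"] = observed          # (1) document
                elif observed != captured["types"]:
                    msg = f"{fn.__name__}: expected {captured['types']}, got {observed}"
                    if mode == "error":
                        raise TypeError(msg)              # (3) catch at run time
                    print("type mismatch:", msg)          # (2) log
                return fn(*args)
            return wrapper
        return wrap

    @typing_aspect(expected=("int", "int"), mode="error")
    def add(a, b):
        return a + b

    assert add(1, 2) == 3   # OK; add(1.0, 2) would raise TypeError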
View the paper (.pdf)
View the slides (.pptx)
BibTeX entry
Staged Static Techniques to Efficiently Implement Array Copy Semantics in a MATLAB JIT
Compiler
Authors: Nurudeen Lameed and Laurie Hendren
Date: March 2011
CC 2011, Saarbrücken, Germany
Abstract
MATLAB has gained widespread acceptance among scientists. Several dynamic
aspects of the language contribute to its appeal, but also provide many
challenges. One such problem is caused by the copy semantics of MATLAB.
Existing MATLAB systems rely on reference-counting schemes to create
copies only when a shared array representation is updated. This reduces
array copies, but requires runtime checks.
We present a staged static analysis approach to determine when copies
are not required. The first stage uses two simple, intraprocedural analyses,
while the second stage combines a forward necessary copy analysis with a
backward copy placement analysis. Our approach eliminates unneeded array
copies without requiring reference counting or frequent runtime checks.
We have implemented our approach in the McVM JIT. Our results demonstrate
that, for our benchmark set, there are significant overheads for both
existing reference-counted and naive copy-insertion approaches, and that
our staged approach is effective in avoiding unnecessary copies.
View the paper (.pdf)
BibTeX entry
McFLAT: A Profile-based Framework for MATLAB Loop Analysis and Transformations
Authors: Amina Aslam and Laurie Hendren
Date: October 2010
LCPC 2010, Houston, Texas, USA
Abstract
Parallelization and optimization of the MATLAB programming language
presents several challenges due to the dynamic nature of MATLAB. Since
MATLAB does not have static type declarations, neither the shape and size of
arrays, nor the loop bounds are known at compile-time. This means that many
standard array dependence tests and associated transformations cannot be applied
straight-forwardly. On the other hand, many MATLAB programs operate on arrays
using loops and thus are ideal candidates for loop transformations and possibly
loop vectorization/parallelization.
This paper presents a new framework, McFLAT, which uses profile-based training
runs to determine likely loop-bounds ranges for which specialized versions of the
loops may be generated. The main idea is to collect information about observed
loop bounds and hot loops using training data which is then used to heuristically
decide upon which loops and which ranges are worth specializing using a variety
of loop transformations.
Our McFLAT framework has been implemented as part of the McLAB extensible
compiler toolkit. Currently, McFLAT, is used to automatically transform ordinary
MATLAB code into specialized MATLAB code with transformations applied
to it. This specialized code can be executed on any MATLAB system, and we report
results for four execution engines: MathWorks’ proprietary MATLAB system,
the GNU Octave open-source interpreter, McLAB’s McVM interpreter and the
McVM JIT. For several benchmarks, we observed significant speedups for the
specialized versions, and noted that loop transformations had different impacts
depending on the loop range and execution engine.
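A toy illustration of range-based loop specialization (hypothetical Python; the trained range, the specialized unrolled loop, and the dispatch are all stand-ins for what a profile-guided system like the one described above would derive):

    def sum_squares_general(n):
        total = 0
        for i in range(n):
            total += i * i
        return total

    def sum_squares_small(n):
        # Specialized version for the small range observed during training;
        # a real system would apply loop transformations (e.g. unrolling) here.
        assert n <= 16
        total = 0
        for i in range(0, n - n % 4, 4):
            total += i*i + (i+1)*(i+1) + (i+2)*(i+2) + (i+3)*(i+3)
        for i in range(n - n % 4, n):
            total += i * i
        return total

    def sum_squares(n, trained_range=(0, 16)):
        lo, hi = trained_range
        return sum_squares_small(n) if lo <= n <= hi else sum_squares_general(n)

    assert sum_squares(10) == sum_squares_general(10)
    assert sum_squares(1000) == sum_squares_general(1000)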
View the paper (.pdf)
BibTeX entry
Optimizing Matlab through Just-In-Time Specialization
Authors: Maxime Chevalier-Boisvert, Laurie Hendren, and Clark Verbrugge
Date: March 2010
CC 2010, Paphos, Cyprus
Abstract
Scientists are increasingly using dynamic programming languages
like Matlab for prototyping and implementation. Effectively
compiling Matlab raises many challenges due to the dynamic and complex
nature of Matlab types. This paper presents a new JIT-based approach
which specializes and optimizes functions on-the-fly based on the
current types of function arguments.
A key component of our approach is a new type inference algorithm which
uses the run-time argument types to infer further type and shape information,
which in turn provides new optimization opportunities. These
techniques are implemented in McVM, our open implementation of a
Matlab virtual machine. As this is the first paper reporting on McVM,
a brief introduction to McVM is also given.
We have experimented with our implementation and compared it to several
other Matlab implementations, including the Mathworks proprietary
system, McVM without specialization, the Octave open-source interpreter
and the McFor static compiler. The results are quite encouraging
and indicate that specialization is an effective optimization—McVM
with specialization outperforms Octave by a large margin and also sometimes
outperforms the Mathworks implementation.
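A minimal sketch of the specialization-on-argument-types idea (hypothetical Python; the specialize function merely stands in for type inference plus JIT code generation in a real VM): compiled versions are cached per argument-type signature, so repeated calls with the same types reuse the specialized code.

    _cache = {}

    def specialize(fn, sig):
        # Stand-in for JIT compilation: in a real VM this would run type
        # inference seeded with 'sig' and emit optimized machine code.
        def compiled(*args):
            return fn(*args)
        return compiled

    def call_specialized(fn, *args):
        sig = (fn.__name__,) + tuple(type(a).__name__ for a in args)
        if sig not in _cache:
            _cache[sig] = specialize(fn, sig)   # compile once per type signature
        return _cache[sig](*args)

    def scale(v, k):
        return [x * k for x in v]

    assert call_specialized(scale, [1, 2], 3) == [3, 6]
    assert call_specialized(scale, [1.0, 2.0], 0.5) == [0.5, 1.0]   # second specialization
    assert len(_cache) == 2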
View the paper (.pdf)
Download the paper (.ps.gz)
BibTeX entry
AspectMatlab: An Aspect-Oriented Scientific
Programming Language
Authors: Toheed Aslam, Jesse Doherty, Anton Dubrau and Laurie Hendren
Date: March 2010
AOSD 2010, Rennes and Saint-Malo, France
Abstract
This paper introduces a new aspect-oriented programming
language, AspectMatlab. Matlab is a dynamic scientific
programming language that is commonly used by scientists
because of its convenient and high-level syntax for arrays,
the fact that type declarations are not required, and the
availability of a rich set of application libraries.
AspectMatlab introduces key aspect-oriented features in
a way that is both accessible to scientists and where the
aspect-oriented features concentrate on array accesses and
loops, the core computation elements in scientific programs.
Introducing aspects into a dynamic language such as Matlab
also provides some new challenges. In particular, it
is difficult to statically determine precisely where patterns
match, resulting in many dynamic checks in the woven code.
Our compiler includes flow analyses which are used to eliminate
many of those dynamic checks.
This paper reports on the language design of AspectMatlab,
the amc compiler implementation and related optimizations,
and also provides an overview of use cases that are
specific to scientific programming.
View the paper (.pdf)
Download the paper (.ps.gz)
BibTeX entry
Dependent Advice: A General Approach to Optimizing History-based
Aspects
Authors: Eric Bodden, Feng Chen and Grigore Rosu
Date: March 2009
AOSD 2009, Charlottesville, VA
Abstract
Many aspects for runtime monitoring are history-based: they contain pieces
of advice that execute conditionally, based on the observed execution history.
History-based aspects are notorious for causing high runtime overhead. Compilers
can apply powerful optimizations to history-based aspects using domain knowledge.
Unfortunately, current aspect languages like AspectJ impede optimizations, as
they provide no means to express this domain knowledge.
In this paper we present dependent advice, a novel AspectJ language
extension. A dependent advice contains dependency annotations that preserve
crucial domain knowledge: a dependent advice needs to execute only when its
dependencies are fulfilled. Optimizations can exploit this knowledge: we present
a whole-program analysis that removes advice-dispatch code from program locations
at which an advice's dependencies cannot be fulfilled.
Programmers often opt to have history-based aspects generated automatically,
from formal specifications from model-driven development or runtime monitoring.
As we show using code-generation tools for two runtime-monitoring approaches,
tracematches and JavaMOP, such tools can use knowledge contained in the
specification to automatically generate dependency annotations as well.
Our extensive evaluation using the DaCapo benchmark suite shows that the use of
dependent advice can significantly lower, sometimes even completely eliminate,
the runtime overhead caused by history-based aspects, independently of the
specification formalism.
View the paper (.pdf)
Download the paper (.ps.gz)
BibTeX entry
Finding Programming Errors Earlier by Evaluating Runtime Monitors
Ahead-of-Time
Authors: Eric Bodden, Patrick Lam and Laurie Hendren
Date: November 2008
FSE 2008
Abstract
Runtime monitoring allows programmers to validate, for instance,
the proper use of application interfaces. Given a property specification,
a runtime monitor tracks appropriate runtime events to detect violations and
possibly execute recovery code. Although powerful, runtime monitoring
inspects only one program run at a time and so may require
many program runs to find errors. Therefore, in this paper, we present
ahead-of-time techniques that can (1) prove the absence of property violations on all
program runs, or (2) flag locations where violations are likely to occur.
Our work focuses on tracematches, an expressive runtime monitoring
notation for reasoning about groups of correlated objects.
We describe a novel flow-sensitive static analysis for
analyzing monitor states. Our abstraction captures both positive
information (a set of objects could be in a particular monitor
state) and negative information (the set is known not to be in a
state). The analysis resolves heap references by combining the
results of three points-to and alias analyses. We also propose a
machine learning phase to filter out likely false
positives.
We applied a set of 13 tracematches to the DaCapo benchmark suite and
SciMark2. Our static analysis rules out all potential points of failure
in 50% of the cases, and 75% of false positives on average. Our
machine learning algorithm correctly classifies the remaining potential
points of failure in all but three of 461 cases. The approach revealed
defects and suspicious code in three benchmark programs.
View the paper (.pdf)
BibTeX entry
Object representatives: a uniform abstraction for pointer information
Authors: Eric Bodden, Patrick Lam and Laurie Hendren
Date: October 2008
1st International Academic Conference of the British Computer Society (BCS)
Abstract
Pointer analyses enable many subsequent program analyses and
transformations by statically disambiguating references to the
heap. However, different client analyses may have different sets
of pointer analysis needs, and each must pick some pointer analysis
along the cost/precision spectrum to meet those needs. Some analysis
clients employ combinations of pointer analyses to obtain better
precision with reduced analysis times. Our goal is to ease the task of
developing client analyses by enabling composition and
substitutability for pointer analyses. We therefore
propose object representatives, which statically represent runtime objects.
A representative encapsulates the notion of object identity, as observed
through the representative's aliasing relations with other representatives.
Object representatives enable pointer analysis clients to disambiguate
references to the heap in a uniform yet flexible way.
Representatives can be generated from many
combinations of pointer analyses, and pointer analyses can be freely exchanged
and combined without changing client code.
We believe that the use of object representatives brings many software
engineering benefits to compiler implementations because, at compile time,
object representatives are Java objects. We discuss our motivating case for
object representatives, namely, the development of an abstract interpreter for
tracematches, a language feature for runtime monitoring. We explain one
particular algorithm for computing object representatives which combines
flow-sensitive intraprocedural must-alias and must-not-alias analyses with a
flow-insensitive, context-sensitive whole-program points-to analysis. In our
experience, client analysis implementations can almost directly substitute
object representatives for runtime objects, simplifying the design and
implementation of such analyses.
View the paper (.pdf)
BibTeX entry
Racer: Effective Race Detection Using AspectJ
Winner of an "ACM SIGSOFT Distinguished Paper Award".
Authors: Eric Bodden and Klaus Havelund
Date: July 2008
ISSTA 08, July 2008, Seattle, WA
Abstract
Programming errors occur frequently in large software systems, and even more so
if these systems are concurrent. In the past researchers have developed
specialized programs to aid programmers detecting concurrent programming errors
such as deadlocks, livelocks, starvation and data races.
In this work we propose a language extension to the aspect-oriented programming
language AspectJ, in the form of three new pointcuts, lock(),
unlock() and maybeShared(). These pointcuts allow programmers
to monitor program
events where locks are granted or handed back, and where values are accessed
that may be shared amongst multiple Java threads. We decide thread-locality
using a static thread-local objects analysis developed by others.
Using the three new primitive pointcuts, researchers can directly implement efficient
monitoring algorithms to detect concurrent programming errors online.
As an example, we present a new algorithm which we call Racer, an
adaptation of the well-known Eraser algorithm to the memory model of Java.
We implemented the new pointcuts as an extension to the AspectBench Compiler,
implemented the Racer algorithm using this language extension and then applied
the algorithm to the NASA K9 Rover Executive.
Our experiments showed our implementation to be very effective. In the Rover
Executive, Racer finds 70 data races. Only one of these races was previously known.
We further applied the algorithm to two other multi-threaded programs written by
Computer Science researchers, in which we found races as well.
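For context, here is a compact sketch of the Eraser-style lockset refinement that Racer builds on (simplified Python, assuming a trace of (thread, location, locks-held) events; not the AspectJ-based implementation):

    def lockset_monitor(accesses):
        # accesses: iterable of (thread_id, location, locks_held) events.
        # Eraser-style check: the candidate lockset of each location is the
        # intersection of locks held across shared accesses; if it becomes
        # empty once a second thread touches the location, report a possible race.
        candidate, owners, races = {}, {}, set()
        for thread, loc, held in accesses:
            owners.setdefault(loc, set()).add(thread)
            if len(owners[loc]) < 2:
                continue                      # still thread-local, no check yet
            candidate[loc] = candidate.get(loc, set(held)) & set(held)
            if not candidate[loc]:
                races.add(loc)
        return races

    events = [("t1", "x", {"L"}), ("t2", "x", {"L"}),    # consistently locked
              ("t1", "y", {"L"}), ("t2", "y", set())]    # unprotected access
    assert lockset_monitor(events) == {"y"}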
View the paper (.pdf)
BibTeX entry
Relational Aspects as Tracematches
Authors: Eric Bodden, Reehan Shaikh and Laurie Hendren
Date: March 2008
AOSD 2008, March 2008, Brussels, Belgium
Abstract
The relationships between objects in an object-oriented program are an
essential property of the program's design and implementation. Two
previous approaches to implement relationships with aspects were
association aspects, an AspectJ-based language extension, and the
relationship aspects library. While those approaches greatly ease
software development, we believe that they are not general enough. For
instance, the library approach only works for binary relationships, while
the language extension does not allow for the association of primitive
values or values from non-weavable classes.
Hence, in this work we propose a generalized alternative implementation
via a direct reduction to tracematches, a language feature for executing
an advice after having matched a sequence of events.
This new implementation scheme yields multiple benefits. Firstly, our
implementation is more general than existing ones, avoiding most
previous limitations. It also yields a new language construct,
relational tracematches.
We provide an efficient implementation based on the AspectBench
Compiler, along with test cases and microbenchmarks. Our empirical
studies showed that our implementation, when compared to previous
approaches, uses a similar memory footprint with no leaking, but the
generality of our approach does lead to some runtime overhead. We
believe that our implementation can provide a solid foundation for
future research.
View the paper (.pdf)
BibTeX entry
Compiler-guaranteed Safety in Code-copying Virtual Machines
Authors: Gregory B. Prokopski and Clark Verbrugge
Date: March 2008
CC 2008, March 29 - April 6, 2008, Budapest, Hungary
Abstract
Virtual Machine authors face a difficult choice between low performance, cheap interpreters, or specialized and costly compilers. A method able to bridge this wide gap is the existing code-copying technique that reuses chunks of the VM's binary code to create a simple JIT. This technique is not reliable without a compiler guaranteeing that copied chunks are still functionally equivalent despite aggressive optimizations. We present a proof-of-concept, minimal-impact modification of a highly optimizing compiler, GCC. A VM programmer marks chunks of VM source code as copyable. The chunks of native code resulting from compilation of the marked source become addressable and self-contained. Chunks can be safely copied at VM runtime, concatenated and executed together. This allows code-copying VMs to safely achieve speedup up to 3 times, 1.67 on average, over direct interpretation. This maintainable enhancement makes the code-copying technique reliable and thus practically usable.
View the paper (.pdf)
BibTeX entry
View the slides (.pdf)
Springer version
Phase-Based Adaptive Recompilation in a JVM
Authors: Dayong Gu and Clark Verbrugge
Date: April 2008
CGO 2008, April 6 - 9, 2008, Boston, Massachusetts
Abstract
Modern JIT compilers often employ multi-level recompilation strategies as a means of ensuring the most used code is also the most highly optimized, balancing optimization costs and expected future performance. Accurate selection of code to compile and level of optimization to apply is thus important to performance. In this paper we investigate the effect of an improved recompilation strategy for a Java virtual machine. Our design makes use of a lightweight, low-level profiling mechanism to detect high-level, variable length phases in program execution. Phases are then used to guide adaptive recompilation choices, improving performance. We develop both an offline implementation based on trace data and a self-contained online version. Our offline study shows an average speedup of 8.7% and up to 21%, and our online system achieves an average speedup of 4.4%, up to 18%. We subject our results to extensive analysis and show that our design achieves good overall performance with high consistency despite the existence of many complex and interacting factors in such an environment.
View the paper (.pdf)
BibTeX entry
View the slides (.pdf)
ACM version
A staged static program analysis to improve the performance of runtime monitoring
Authors: Eric Bodden, Laurie Hendren and Ondřej Lhoták
Date: July 2007
21st European Conference on Object-Oriented Programming, July 30th - August 3rd 2007, Berlin, Germany
There exists an extended Technical Report version of this paper: abc-2007-2.
Abstract
In runtime monitoring, a programmer specifies a piece of code to execute when
a trace of events occurs during program execution.
Our work is based on tracematches, an extension to AspectJ,
which allows programmers to specify
traces via regular expressions with free variables.
In this paper we present a
staged static analysis which speeds up trace matching by
reducing the required runtime instrumentation.
The first stage is a simple analysis that
rules out entire tracematches, just based on
the names of symbols. In the second stage,
a points-to analysis is used, along with a flow-insensitive
analysis that eliminates instrumentation points with
inconsistent variable bindings. In the third stage the
points-to analysis is combined with a flow-sensitive
analysis that also takes into consideration the order in
which the symbols may execute.
To examine the effectiveness of each stage, we experimented
with a set of nine tracematches applied to the DaCapo benchmark suite.
We found that about 25% of the tracematch/benchmark combinations
had instrumentation overheads greater than 10%.
In these cases the first two stages work well for certain
classes of tracematches, often leading to significant performance
improvements. Somewhat surprisingly, we found the
third, flow-sensitive, stage did not add any improvements.
View the paper (.pdf)
BibTeX entry
Component-Based Lock Allocation
Authors: Richard L. Halpert and Christopher J. F. Pickett and Clark Verbrugge
Date: July 2007
PACT 2007, September 2007, Brasov, Romania
Abstract
The allocation of lock objects to critical sections in concurrent
programs affects both performance and correctness. Recent work
explores automatic lock allocation, aiming primarily to minimize
conflicts and maximize parallelism by allocating locks to individual
critical section interferences. We investigate component-based lock
allocation, which allocates locks to entire groups of interfering
critical sections. Our allocator depends on a thread-based side
effect analysis, and benefits from precise points-to and may happen in
parallel information. Thread-local object information has a small
impact, and dynamic locks do not improve significantly on static
locks. We experiment with a range of small and large Java benchmarks
on 2-way, 4-way, and 8-way machines, and find that a single static
lock is sufficient for mtrt, that performance degrades by 10% for
hsqldb, that jbb2000 becomes mostly serialized, and that for lusearch,
xalan, and jbb2005, component-based lock allocation recovers the
performance of the original program.
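A small sketch of the grouping idea behind component-based allocation (hypothetical Python using union-find; not the actual allocator or its side-effect analysis): critical sections connected by interference edges are merged into components, and each component receives one lock.

    class UnionFind:
        def __init__(self, items):
            self.parent = {x: x for x in items}
        def find(self, x):
            while self.parent[x] != x:
                self.parent[x] = self.parent[self.parent[x]]   # path compression
                x = self.parent[x]
            return x
        def union(self, a, b):
            self.parent[self.find(a)] = self.find(b)

    def allocate_locks(critical_sections, interferences):
        # Merge critical sections that (transitively) interfere, then hand
        # out one lock per resulting component.
        uf = UnionFind(critical_sections)
        for a, b in interferences:
            uf.union(a, b)
        locks = {}
        return {cs: locks.setdefault(uf.find(cs), f"lock{len(locks)}")
                for cs in critical_sections}

    sections = ["cs1", "cs2", "cs3", "cs4"]
    edges = [("cs1", "cs2"), ("cs2", "cs3")]          # cs4 interferes with nothing
    alloc = allocate_locks(sections, edges)
    assert alloc["cs1"] == alloc["cs2"] == alloc["cs3"] != alloc["cs4"]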
View the paper (.pdf)
BibTeX entry
Dynamic Purity Analysis for Java Programs
Authors: Haiying Xu and Christopher J. F. Pickett and Clark Verbrugge
Date: April 2007
PASTE 2007, June 2007, San Diego, California, USA
Abstract
The pure methods in a program are those that exhibit functional
or side effect free behaviour, a useful property in many contexts.
However, existing purity investigations present primarily static
results. We perform a detailed examination of dynamic method purity
in Java programs using a JVM-based analysis. We evaluate multiple
purity definitions that range from strong to weak, consider purity
forms specific to dynamic execution, and accommodate constraints
imposed by an example consumer application, memoization. We show that
while dynamic method purity is actually fairly consistent between
programs, examining pure invocation counts and the percentage of the
bytecode instruction stream contained within some pure method reveals
great variation. We also show that while weakening purity definitions
exposes considerable dynamic purity, consumer requirements can limit
the actual utility of this information.
View the paper (.pdf)
BibTeX entry
Obfuscating Java: the most pain for the least gain
Authors: Michael Batchelder and Laurie Hendren
Date: March 2007
International Conference on Compiler Construction (CC 2007), Braga, Portugal.
Abstract
Bytecode, Java's binary form, is relatively high-level and therefore susceptible to decompilation attacks. An obfuscator transforms code such that it becomes more complex and therefore harder to reverse engineer. We develop bytecode obfuscations that are complex to reverse engineer but also do not significantly degrade performance. We present three kinds of techniques that: (1) obscure intent at the operational level; (2) complicate control flow and object-oriented design (i.e. program structure); and (3) exploit the semantic gap between what is legal in source code and what is legal in bytecode. Obfuscations are applied to a benchmark suite to examine their effect on runtime performance, control flow graph complexity, and decompilation. These results show that most of the obfuscations have only minor negative performance impacts and many increase complexity. In almost all cases, tested decompilers fail to produce legal source code or crash completely. Those obfuscations that are decompilable greatly reduce the readability of the output source code.
View the paper (.pdf)
Download the paper (.ps.gz)
BibTeX entry
Avoiding Infinite Recursion with Stratified Aspects
Authors: Eric Bodden, Florian Forster and Friedrich Steimann
Date: March 2006
Net.ObjectDays 2006 - published in: GI-Edition Lecture Notes in Informatics 'NODe 2006 GSEM 2006'
Abstract
Infinite recursion is a known problem of aspect-oriented programming with AspectJ: if no special precautions are taken, aspects which advise other aspects can easily and unintentionally advise themselves. We present a compiler for an extension of the AspectJ programming language that avoids self reference by associating aspects with levels, and by automatically restricting the scope of pointcuts used by an aspect to join points of lower levels. We report on a case study using our language extension and quantify the changes necessary for migrating existing applications to it. Our results suggest that we can make programming with AspectJ simpler and safer, without restricting its expressive power unduly.
View the paper (.pdf)
BibTeX entry
Programmer-Friendly Decompiled Java
Authors: Nomair A. Naeem and Laurie Hendren
Date: March 2006
International Conference on Program Comprehension (ICPC 2006), Athens, Greece.
Abstract
Java decompilers convert Java class files to Java source. Java class files may be created by a
number of different tools including standard Java compilers, compilers for other languages
such as AspectJ, or other tools such as optimizers or obfuscators. There are two kinds of Java
decompilers, javac-specific decompilers that assume that the class file was created by a
standard javac compiler and tool-independent decompilers that can decompile arbitrary class
files, independent of the tool that created the class files. Typically javac-specific
decompilers produce more readable code, but they fail to decompile many class files produced
by other tools.
This paper tackles the problem of how to make a tool-independent decompiler, Dava, produce Java
source code that is programmer-friendly. In past work it has been shown that Dava can
decompile arbitrary class files, but often the output, although correct, is very different
from what a programmer would write and is hard to understand. Furthermore, tools like
obfuscators intentionally confuse the class files and this also leads to confusing decompiled
source files.
Given that Dava already produces correct Java abstract syntax trees (ASTs) for arbitrary class
files, we provide a new back-end for Dava. The back-end rewrites the ASTs to semantically
equivalent ASTs that correspond to code that is easier for programmers to understand. Our new
backend includes a new AST traversal framework, a set of simple pattern-based transformations,
a structure-based data flow analysis framework and a collection of more advanced AST
transformations that use flow analysis information. We include several illustrative examples
including the use of advanced transformations to clean up obfuscated code.
View the paper (.pdf)
BibTeX entry
Context-sensitive points-to analysis: is it worth it?
Authors: Ondřej Lhoták and Laurie Hendren
Date: March 2006
15th International Conference on Compiler Construction (CC 2006)
Abstract
We present the results of an empirical study evaluating the precision
of subset-based points-to analysis with several variations of context sensitivity on
Java benchmarks of significant size. We compare the use of call site strings as the
context abstraction, object sensitivity, and the BDD-based context-sensitive
algorithm proposed by Zhu and Calman, and by Whaley and Lam. Our study includes
analyses that context-sensitively specialize only pointer variables, as well as ones
that also specialize the heap abstraction. We measure both characteristics of the
points-to sets themselves, as well as effects on the precision of client analyses. To
guide development of efficient analysis implementations, we measure the number
of contexts, the number of distinct contexts, and the number of distinct points-to
sets that arise with each context sensitivity variation. To evaluate precision, we
measure the size of the call graph in terms of methods and edges, the number of
devirtualizable call sites, and the number of casts statically provable to be safe.
The results of our study indicate that object-sensitive analysis implementations are
likely to scale better and more predictably than the other approaches; that
object-sensitive analyses are more precise than comparable variations of the other
approaches; that specializing the heap abstraction improves precision more than
extending the length of context strings; and that the profusion of cycles in Java call
graphs severely reduces precision of analyses that forsake context sensitivity in
cyclic regions.
View the paper (.pdf)
Download the paper (.ps.gz)
BibTeX entry
Dynamic Data Structure Analysis for Java Programs
Authors: Sokhom Pheng and Clark Verbrugge
Date: June 2006
ICPC 2006, Athens, Greece
Abstract
Analysis of dynamic data structure usage is useful for both program
understanding and for improving the accuracy of other program analyses.
Static analysis techniques, however, suffer from reduced accuracy in
complex situations, and do not necessarily give a clear picture of
runtime heap activity. We have designed and implemented a dynamic heap
analysis system that allows one to examine and analyze how Java programs
build and modify data structures. Using a complete execution trace from
a profiled run of the program, we build an internal representation that
mirrors the evolving runtime data structures. The resulting series of
representations can then be analyzed and visualized, and we show how to
use our approach to understand how programs use data structures, to see
the precise effect of garbage collection, and to establish limits on
static data structure analysis. A deep understanding of dynamic data
structures is particularly important for modern, object-oriented
languages that make extensive use of heap-based data structures.
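The sketch below gives a rough, hypothetical picture of how such a trace-driven analysis can
mirror the evolving heap: allocation, field-write, and reclaim events update an object graph that
can then be queried. The event names and the HeapMirror class are invented for illustration and
do not reflect the actual trace format or tool.

    import java.util.*;

    class HeapMirror {
        // object id -> (field name -> target object id)
        private final Map<Long, Map<String, Long>> heap = new HashMap<>();

        void onAlloc(long objId) { heap.put(objId, new HashMap<>()); }

        void onFieldWrite(long src, String field, long dst) {
            heap.computeIfAbsent(src, k -> new HashMap<>()).put(field, dst);
        }

        void onGcReclaim(long objId) { heap.remove(objId); }

        // Size of the structure reachable from root, e.g. to watch a list or tree grow.
        int reachableSize(long root) {
            Set<Long> seen = new HashSet<>();
            Deque<Long> work = new ArrayDeque<>();
            work.push(root);
            while (!work.isEmpty()) {
                long o = work.pop();
                if (!seen.add(o) || !heap.containsKey(o)) continue;
                work.addAll(heap.get(o).values());
            }
            return seen.size();
        }
    }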
View the paper (.pdf)
BibTeX entry
Relative Factors in Performance Analysis of Java Virtual Machines
|
back |
Authors: Dayong Gu and Clark Verbrugge and Etienne M. Gagnon
Date: June 2006
VEE 2006, Ottawa, Canada
Abstract
Many new Java runtime optimizations report relatively small,
single-digit performance improvements. On modern virtual and actual
hardware, however, the performance impact of an optimization can be
influenced by a variety of factors in the underlying systems. Using a
case study of a new garbage collection optimization in two different
Java virtual machines, we show the relative effects of issues that must
be taken into consideration when claiming an improvement. We examine the
specific and overall performance changes due to our optimization and
show how unintended side-effects can contribute to, and distort, the
final assessment. Our experience shows that VM and hardware concerns can
generate variances of up to 9.5% in whole program execution time.
Consideration of these confounding effects is critical to a good,
objective understanding of Java performance and optimization.
View the paper (.pdf)
View the presentation slides (.pdf)
BibTeX entry
Software Thread Level Speculation for the Java Language and Virtual
Machine Environment
|
back |
Authors: Christopher J.F. Pickett and Clark Verbrugge
Date: October 2005
LCPC 2005, October 2005, Hawthorne, NY, USA
Abstract
Thread level speculation (TLS) has shown great promise as a strategy
for fine to medium grain automatic parallelisation, and in a hardware
context techniques to ensure correct TLS behaviour are now well
established. Software and virtual machine TLS designs, however,
require adherence to high level language semantics, and this can
impose many additional constraints on TLS behaviour, as well as open
up new opportunities to exploit language-specific information.
We present a detailed design for a Java-specific, software TLS system
that operates at the bytecode level, and fully addresses the problems
and requirements imposed by the Java language and VM
environment. Using SableSpMT, our research TLS framework, we
provide experimental data on the corresponding costs and benefits; we
find that exceptions, GC, and dynamic class loading have only a small
impact, but that concurrency, native methods, and memory model
concerns do play an important role, as does an appropriate,
language-specific runtime TLS support system. Full consideration
of language and execution semantics is critical to correct and
efficient execution of high level TLS designs, and our work here
provides a baseline for future Java or Java virtual machine
implementations.
View
the paper (.pdf)
View
the presentation slides (.pdf)
BibTeX entry
SableSpMT: A Software Framework for Analysing Speculative
Multithreading in Java
|
back |
Authors: Christopher J.F. Pickett and Clark Verbrugge
Date: August 2005
PASTE 2005, September 2005, Lisbon, Portugal
Abstract
Speculative multithreading (SpMT) is a promising optimisation
technique for achieving faster execution of sequential programs on
multiprocessor hardware. Analysis of and data acquisition from such
systems is however difficult and complex, and is typically limited to
a specific hardware design and simulation environment. We have
implemented a flexible, software-based speculative multithreading
architecture within the context of a full-featured Java virtual
machine. We consider the entire Java language and provide a complete
set of support features for speculative execution, including return
value prediction. Using our system we are able to generate extensive
dynamic analysis information, analyse the effects of runtime feedback,
and determine the impact of incorporating static, offline information.
Our approach allows for accurate analysis of Java SpMT on existing,
commodity multiprocessor hardware, and provides a vehicle for further
experimentation with speculative approaches and optimisations.
View
the paper (.pdf)
View
the presentation slides (.pdf)
BibTeX entry
(P)NFG: A Language and Runtime System for Structured Computer Narratives
|
back |
Authors: Christopher J.F. Pickett and Clark Verbrugge and Félix Martineau
Date: August 2005
GameOn'NA 2005, August 2005, Montréal, Québec, Canada
Abstract
Complex computer game narratives can suffer from logical consistency
and playability problems if not carefully constructed, and current
state-of-the-art design tools do little to help analysis or ensure
good narrative properties. A formally-grounded system that
allows for relatively easy design and analysis is therefore
desirable. We present a language and an environment for
expressing game narratives based on a structured form of Petri Net,
the Narrative Flow Graph. Our "(P)NFG" system provides
a simple, high level view of narrative programming that maps onto a
low level representation suitable for expressing and analysing game
properties. The (P)NFG framework is demonstrated experimentally
by modelling narratives based on non-trivial interactive fiction
games, and integrates with the NuSMV model checker. Our system
provides a necessary component for systematic analysis of computer
game narratives, and lays the foundation for all-around improvements
to game quality.
View
the paper (.pdf)
BibTeX entry
A Study of Type Analysis for Speculative Method Inlining in a JIT Environment
|
back |
Authors: Feng Qian and Laurie Hendren
Date: April 2005
CC 2005
Abstract
Method inlining is one of the most important optimizations for achieving a
high-performance JIT compiler in Java virtual machines. A type
analysis allows the compiler to directly inline monomorphic calls. At
runtime, the compiler and type analysis have to handle dynamic class
loading properly because the analysis result is only correct at
compile time. Loading of new classes could invalidate previous
analysis results and optimizations. Class hierarchy analysis (CHA)
has been used successfully in JIT compilers for speculative inlining
with various invalidation techniques as backup.
In this paper, we present the results of a limit study of method
inlining using dynamic type analysis on a set of standard Java
benchmarks. We developed a general type analysis framework for measuring
the effectiveness of several well-known type analyses, including CHA,
RTA, XTA and VTA. Surprisingly, simple dynamic CHA is nearly as
good as an ideal type analysis for inlining virtual method calls,
leaving little room for other type analyses to improve on it. On the other hand,
only reachability-based interprocedural type analysis (VTA) is able to
capture the majority of monomorphic interface calls. We measured the
runtime overhead of interprocedural type analysis in the JIT
environment. To overcome the memory overhead of dynamic whole-program
analysis, we outline the design of a demand-driven inter-procedural
type analysis for inlining hot interface calls.
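As a rough illustration of the class-hierarchy check behind speculative inlining, the hypothetical
sketch below marks a virtual call as monomorphic when no currently loaded subclass of the
receiver's declared type overrides the method; a newly loaded class can invalidate that decision.
The data structures and method names are invented, and a real CHA walks the hierarchy
transitively rather than only looking at direct subclasses.

    import java.util.*;

    class ChaInliner {
        // declared class -> its currently loaded direct subclasses (a full CHA is transitive)
        private final Map<String, Set<String>> subclasses = new HashMap<>();
        // class -> methods it declares (including overrides)
        private final Map<String, Set<String>> declaredMethods = new HashMap<>();

        boolean isMonomorphic(String declaredType, String method) {
            for (String sub : subclasses.getOrDefault(declaredType, Set.of())) {
                if (declaredMethods.getOrDefault(sub, Set.of()).contains(method)) {
                    return false;   // an override exists, so the call may dispatch elsewhere
                }
            }
            return true;            // speculatively inlinable; must be invalidated on class load
        }

        // Called by the class loader: a new class can invalidate earlier inlining decisions.
        void onClassLoaded(String cls, String superCls, Set<String> methods) {
            subclasses.computeIfAbsent(superCls, k -> new HashSet<>()).add(cls);
            declaredMethods.put(cls, methods);
        }
    }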
View the paper (.ps)
Using inter-procedural side-effect information in JIT optimizations
|
back |
Authors: Anatole Le, Ondřej Lhoták and Laurie Hendren
Date: April 2005
CC 2005
Abstract
Inter-procedural analyses such as side-effect analysis can provide
information useful for performing aggressive optimizations. We present
a study of whether side-effect information improves performance in
just-in-time (JIT) compilers, and if so, what level of analysis
precision is needed.
We used Spark, the inter-procedural analysis component of the Soot Java
analysis and optimization framework, to compute side-effect information
and encode it in class files. We modified Jikes RVM, a research JIT,
to make use of side-effect analysis in local common sub-expression
elimination, heap SSA, redundant load elimination and loop-invariant
code motion. On the SpecJVM98 benchmarks, we measured the static number
of memory operations removed, the dynamic counts of memory reads eliminated,
and the execution time.
Our results show that the use of side-effect analysis increases the
number of static opportunities for load elimination by up to 98%,
and reduces dynamic field read instructions by up to 27%. Side-effect
information enabled speedups in the range of 1.08x to 1.20x for some
benchmarks. Finally, among the different levels of precision of
side-effect information, a simple side-effect analysis is usually
sufficient to obtain most of these speedups.
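The hand-written example below (not taken from the paper) illustrates the kind of redundant load
that side-effect information exposes: if the analysis proves that the call writes no field that
may alias this.size, the JIT can reuse the first read instead of reloading the field after the
call.

    class Counter {
        private int size;

        void log() { /* assumed proven side-effect free with respect to size */ }

        int doubled() {
            int a = this.size;   // first heap read
            log();               // without side-effect info, the JIT must assume size changed
            int b = this.size;   // redundant load: can be replaced by 'a'
            return a + b;
        }
    }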
View the paper (.ps)
BibTeX entry
abc: An extensible AspectJ compiler
|
back |
Authors:
Pavel Avgustinov,
Aske Simon Christensen,
Laurie Hendren,
Sascha Kuzins,
Jennifer Lhoták,
Ondřej Lhoták,
Oege de Moor,
Damien Sereni,
Ganesh Sittampalam, and
Julian Tibble
Date: March 2005
AOSD 2005
Abstract
Research in the design of aspect-oriented programming languages
requires a workbench that facilitates easy experimentation with
new language features and implementation techniques. In particular,
new features for AspectJ have been proposed that require extensions
in many dimensions: syntax, type checking and code generation, as well as
data flow and control flow analyses.
The AspectBench Compiler (abc) is an implementation of such a workbench.
The base version of abc implements the full AspectJ language.
Its frontend is built, using the Polyglot framework, as a modular
extension of the Java language. The use of Polyglot
gives flexibility of syntax and type checking.
The backend is built using the Soot framework, to give modular code
generation and analyses.
In this paper, we outline the design of abc, focusing mostly on how
the design supports extensibility. We then provide a general overview of how
to use abc to implement an extension. Finally,
we illustrate the extension mechanisms of abc through a number of
small, but non-trivial, examples. abc is freely available under
the GNU LGPL.
View the paper (.ps)
BibTeX entry
Code Layout as a Source of Noise in JVM Performance
|
back |
Authors: Dayong Gu and Clark Verbrugge and Etienne Gagnon
Date: October 2004
CAMP04, October 2004, Vancouver, BC, Canada
Abstract
We describe the effect of a particular form of
"noise" in benchmarking. We investigate the source of anomalous
measurement data in a series of optimization strategies that attempt
to improve runtime performance in the garbage collector of a Java
virtual machine. The results of our experiments can be explained in
terms of the difference in code layout, and hence instruction and data
cache behaviour. We show that unintended changes in code layout due to
code modifications as trivial as symbol renaming can contribute up to
2.7% in measured machine cycle cost, 20% in data cache misses, and 37%
in instruction cache misses.
View
the paper (.pdf)
View
the presentation slides (.ppt)
BibTeX entry
Return Value Prediction in a Java Virtual Machine
|
back |
Authors: Christopher J.F. Pickett and Clark Verbrugge
Date: September 2004
VPW2, October 2004, Boston, MA, USA
Abstract
We present the design and implementation of return value prediction in
SableVM, a Java Virtual Machine.
We give detailed results for the full
SPEC JVM98 benchmark suite, and compare our results with previous,
more limited data.
At the performance limit of existing last value, stride, 2-delta
stride, parameter stride, and context (FCM) sub-predictors in a
hybrid, we achieve an average accuracy of 72%.
We describe and characterize a new table-based memoization predictor
that complements these predictors nicely, yielding an
increased average hybrid accuracy of
81%.
VM level information about data widths provides a 35%
reduction in space, and
dynamic allocation and expansion of per-callsite hashtables allows for
highly accurate prediction with an average per-benchmark requirement
of 119 MB for the context predictor and
43 MB for the memoization
predictor.
As far as we know, this is the first implementation of non-trace-based
return value prediction within a JVM.
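As a rough, hypothetical illustration of two of the sub-predictors in such a hybrid, the sketch
below shows a stride predictor and a table-based memoization predictor keyed on a hash of the
argument values. The class and method names are invented, and the predictors in the actual VM are
considerably more engineered (per-callsite tables, dynamic resizing, width-aware storage).

    import java.util.HashMap;
    import java.util.Map;

    class StridePredictor {
        private long last, stride;
        long predict()           { return last + stride; }
        void update(long actual) { stride = actual - last; last = actual; }
    }

    class MemoizationPredictor {
        private final Map<Long, Long> table = new HashMap<>();  // hash(args) -> return value
        Long predict(long argsHash)             { return table.get(argsHash); }
        void update(long argsHash, long actual) { table.put(argsHash, actual); }
    }

A hybrid predictor would consult each sub-predictor and choose the one with the best recent
accuracy for the call site in question.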
View
the paper (.pdf)
View
the presentation slides (.pdf)
BibTeX entry
A Practical MHP Information Analysis for Concurrent Java Programs
|
back |
Authors: Lin Li and Clark Verbrugge
Date: September 2004
LCPC 2004, September 2004, West Lafayette, IN, USA
Abstract
In this paper we present an implementation of May Happen in Parallel
analysis for Java that attempts to address some of the practical
implementation concerns of the original work. We describe a design
that incorporates techniques for aiding a feasible implementation and
expanding the range of acceptable inputs. We provide experimental
results showing the utility and impact of our approach and
optimizations using a variety of concurrent benchmarks.
View
the paper (.pdf)
BibTeX entry
Jedd: A BDD-based Relational Extension of Java
|
back |
Authors: Ondřej Lhoták and Laurie Hendren
Date: April 2004
PLDI 2004, June 2004, Washington, D.C., USA
Abstract
In this paper we present Jedd, a language extension to Java that supports
a convenient way of programming with Binary Decision Diagrams (BDDs).
The Jedd language abstracts BDDs as database-style relations and operations
on relations, and provides static type rules to ensure that relational
operations are used correctly.
The paper provides a description of the Jedd language and reports on the
design and implementation of the Jedd translator and associated runtime
system. Of particular interest is the approach to assigning attributes
from the high-level relations to physical domains in the underlying BDDs, which
is done by expressing the constraints as a SAT problem and using a modern
SAT solver to compute the solution. Further, a runtime system is
defined that handles memory management issues and supports a browsable
profiling tool for tuning the key BDD operations.
The motivation for designing Jedd was to support the development of whole
program analyses based on BDDs, and we have used Jedd to express five
key interrelated whole program analyses in our Soot compiler framework.
We provide some examples of this application and discuss our experiences
using Jedd.
View the
paper (.pdf)
Download the
paper (.ps.gz)
BibTeX entry
Towards Dynamic Interprocedural Analysis in JVMs
|
back |
Authors: Feng Qian and Laurie Hendren
Date: May 2004
VM 2004, May 2004, San Jose, USA
Abstract
This paper presents a new, inexpensive, mechanism for constructing a
complete call graph for Java programs at runtime, and provides an
example of using the mechanism for implementing a dynamic
reachability-based interprocedural analysis (IPA), namely dynamic XTA.
Reachability-based IPAs, such as points-to analysis and escape
analysis, require a context-insensitive call graph of the analyzed
program. Computing a call graph at runtime presents several
challenges. First, the overhead must be low. Second, when
implementing the mechanism for languages such as Java, both
polymorphism and lazy class loading must be dealt with correctly and
efficiently. We propose a new, low-cost, mechanism for constructing
runtime call graphs in a JIT environment. The mechanism uses a
profiling code stub to capture the first execution of a call edge, and
adds at most one more instruction to repeated call edge invocations.
Polymorphism and lazy class loading are handled transparently. The
call graph is constructed incrementally, and it supports optimistic
analysis and speculative optimizations with invalidations.
We also developed a dynamic, reachability-based type analysis, dynamic
XTA, as an application of runtime call graphs. It also serves as an
example of handling lazy class loading in dynamic IPAs.
The dynamic call graph construction algorithm and dynamic version of
XTA have been implemented in Jikes RVM. We present empirical
measurements of the overhead of call graph profiling and compare the
characteristics of call graphs built using our profiling code stubs
with conservative ones constructed by using dynamic class hierarchy
analysis (CHA).
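The sketch below gives a rough, hypothetical picture of the stub idea: the first traversal of a
call edge records it in the call graph and retires the stub, so repeated invocations pay at most
a single extra check. In the actual JIT the stub is generated machine code that patches itself
away; the Java classes here are purely illustrative.

    import java.util.*;

    class DynamicCallGraph {
        private final Set<String> edges = new HashSet<>();             // "caller->callee"
        private final Set<String> stubsStillActive = new HashSet<>();

        void installStub(String caller, String callee) {
            stubsStillActive.add(caller + "->" + callee);
        }

        // Invoked from the stub on the edge's first execution only.
        void onFirstCall(String caller, String callee) {
            String edge = caller + "->" + callee;
            if (stubsStillActive.remove(edge)) {
                edges.add(edge);   // the call graph grows incrementally
                // a dynamic IPA such as XTA would be notified of the new edge here
            }
        }

        Set<String> callGraphEdges() { return Collections.unmodifiableSet(edges); }
    }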
View the
paper (.pdf)
Download the
paper (.ps.gz)
Slides
Integrating the Soot compiler infrastructure into an IDE
|
back |
Authors: Jennifer Lhoták, Ondřej Lhoták, and Laurie Hendren
Date: April 2004
CC 2004, April 2004, Barcelona, Spain
Abstract
This paper presents the integration of Soot, a byte-code analysis and
transformation framework, with an integrated development environment (IDE),
Eclipse. Such an integrated toolkit is useful for both the compiler
developer, to aid in understanding and debugging new analyses,
and also for the end-user of the IDE, to aid in program
understanding by exposing semantic information gathered by the advanced
compiler analyses. The paper discusses these advantages and provides
concrete examples of its usefulness.
There are several major challenges to overcome in developing the integrated
toolkit, and the paper discusses three major challenges and the solutions
to those challenges. An overview of Soot and the integrated toolkit is
given, followed by a more detailed discussion of the fundamental components.
The paper concludes with several illustrative examples of using the
integrated toolkit along with a discussion of future plans and research.
View the
paper (.pdf)
Download the
paper (.ps.gz)
BibTeX entry
Visualizing Program Analysis with the Soot-Eclipse Plugin
|
back |
Authors: Jennifer Lhoták and Ondřej Lhoták
Date: April 2004
eTX (at ETAPS) 2004, March 2004, Barcelona, Spain
Abstract
Our integration of the Soot bytecode manipulation framework into the
Eclipse IDE forms a powerful tool for graphically visualizing both
the progress and output of program analyses. We demonstrate several
examples of the visualizations that we have developed, and explain how
they are useful for both compiler research and teaching.
View the
paper (.pdf)
BibTeX entry
Dynamic Metrics for Java
|
back |
Authors: Bruno Dufour, Karel Driesen, Laurie Hendren and Clark Verbrugge
Date: November 2003
OOPSLA 2003
Abstract
In order to perform meaningful experiments in optimizing compilation
and run-time system design, researchers usually rely on a suite of
benchmark programs of interest to the optimization
technique under consideration. Programs are described
as numeric, memory-intensive, concurrent,
or object-oriented, based on a qualitative appraisal,
in some cases with little justification. We believe it is beneficial
to quantify the behaviour of programs with a concise and precisely
defined set of metrics, in order to make these intuitive notions of program
behaviour more concrete and subject to experimental validation.
We therefore define and measure a set of unambiguous, dynamic, robust
and architecture-independent metrics that can be used to categorize
programs according to their dynamic behaviour in five areas:
size, data structure, memory use, concurrency, and polymorphism.
A framework computing some of these metrics for Java programs is
presented along with specific results demonstrating how to use metric
data to understand a program's behaviour, and both guide and evaluate
compiler optimizations.
View the
paper (.pdf)
View the presentation slides
BibTeX entry
EVolve, an Open Extensible Software Visualization Framework
|
back |
Authors: Qin Wang, Wei Wang, Rhodes Brown, Karel Driesen, Bruno Dufour, Laurie Hendren and Clark Verbrugge
Date: June 2003
ACM Symposium on Software Visualization 2003
Abstract
Existing visualization tools typically do not allow easy extension by new
visualization techniques, and are often coupled with inflexible data input
mechanisms. This paper presents EVolve, a flexible and extensible framework
for visualizing program characteristics and behaviour. The framework is
flexible in the sense that it can visualize many kinds of data, and it is
extensible in the sense that it is quite straightforward to add new kinds of
visualizations.
The overall architecture of the framework consists of the core EVolve platform,
which communicates with data sources via a well-defined data protocol
and with visualization methods via a visualization protocol.
Given a data source, an end-user can use EVolve as a stand-alone tool by interactively
creating, configuring and modifying visualizations. A variety of visualizations are
provided in the current EVolve library, with features that facilitate the
comparison of multiple views on the same execution data. We demonstrate
EVolve in the context of visualizing execution behaviour of Java programs.
View the paper (.pdf)
Points-to Analysis using BDDs
|
back |
Authors: Marc Berndl, Ondřej Lhoták, Feng Qian, Laurie Hendren and Navindra Umanee
Date: April 2003
PLDI 2003, June 2003, San Diego, USA
Abstract
This paper reports on a new approach to solving a subset-based
points-to analysis for Java using Binary Decision Diagrams (BDDs).
In the model checking community, BDDs have been shown very effective for
representing large sets and solving very large verification problems.
Our work shows that BDDs can also be very effective for developing a
points-to analysis that is simple to implement and that
scales well, in both space and time, to large programs.
The paper first introduces BDDs and operations on BDDs using some
simple points-to examples. Then, a complete subset-based points-to
algorithm is presented, expressed completely using BDDs and BDD
operations. This algorithm is then refined by finding appropriate
variable orderings and by making the algorithm propagate sets incrementally, in order to
arrive at a very efficient algorithm.
Experimental results are given to justify the
choice of variable ordering, to demonstrate the improvement due to
incrementalization, and to compare the performance of the BDD-based
solver to an efficient hand-coded graph-based solver. Finally,
based on the results of the BDD-based solver, a variety of BDD-based queries
are presented, including the points-to query.
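For readers unfamiliar with subset-based points-to analysis, the simplified sketch below shows
the underlying propagation using plain BitSets in place of the BDD relations used in the paper;
the BDD formulation performs the same subset propagation, but on whole relations at once rather
than one variable at a time. The numbering of variables and allocation sites is invented, and
field accesses and call-graph edges are omitted.

    import java.util.*;

    class SubsetPointsTo {
        private final BitSet[] pointsTo;               // var -> set of allocation sites
        private final List<List<Integer>> assignEdges; // "a = b" edges, stored as b -> a

        SubsetPointsTo(int numVars) {
            pointsTo = new BitSet[numVars];
            assignEdges = new ArrayList<>();
            for (int v = 0; v < numVars; v++) {
                pointsTo[v] = new BitSet();
                assignEdges.add(new ArrayList<>());
            }
        }

        void addAllocation(int var, int site) { pointsTo[var].set(site); }
        void addAssignment(int to, int from)  { assignEdges.get(from).add(to); }

        void solve() {
            Deque<Integer> worklist = new ArrayDeque<>();
            for (int v = 0; v < pointsTo.length; v++) worklist.add(v);
            while (!worklist.isEmpty()) {
                int from = worklist.poll();
                for (int to : assignEdges.get(from)) {
                    BitSet before = (BitSet) pointsTo[to].clone();
                    pointsTo[to].or(pointsTo[from]);   // subset constraint pt(from) <= pt(to)
                    if (!pointsTo[to].equals(before)) worklist.add(to);
                }
            }
        }
    }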
View the paper (.pdf)
Download the paper (.ps.gz)
Presentation slides (.pdf)
Presentation slides (.ps)
BibTeX entry
Dynamic Profiling and Trace Cache Generation
|
back |
Authors: Marc Berndl and Laurie Hendren
Date: March 2003
CGO'03, March 2003, San Francisco, USA
Abstract
Dynamic program optimization is increasingly important for achieving
good runtime performance. A key issue is how to select which code to
optimize. One approach is to dynamically detect traces, long
sequences of instructions spanning multiple methods, which are likely
to execute to completion. Traces are easy to optimize and have been
shown to be a good unit for optimization.
This paper reports on a new approach for dynamically detecting,
creating and storing traces in a Java virtual machine. We first
describe four important criteria for a successful trace strategy: good
instruction stream coverage, low dispatch rate, cache stability, and
optimizability of traces. We then present our approach based on
branch correlation graphs. A branch correlation graph stores
information about the correlation between pairs of branches, as well
as additional state information.
We present the complete design for an efficient implementation of the
system, including a detailed discussion of the trace cache and
profiling mechanisms. We have implemented an experimental framework
to measure the traces generated by our approach in a direct-threaded
Java VM (SableVM), and we present experimental results to show that the
traces we generate meet the design criteria.
View the technical report (pdf)
Design, Implementation and Evaluation of Adaptive Recompilation with On-Stack Replacement
|
back |
Authors: Stephen J. Fink (IBM T.J. Watson) and Feng Qian
Date: March 2003
CGO'03, March 23-26, San Francisco, USA
Abstract
Modern virtual machines often maintain multiple compiled versions of a
method. An on-stack replacement (OSR) mechanism enables a virtual
machine to transfer execution between compiled versions, even while a
method runs. Relying on this mechanism, the system can exploit
powerful techniques to reduce compile time and code space, dynamically
de-optimize code, and invalidate speculative optimizations.
This paper presents a new, simple, mostly compiler-independent
mechanism to transfer execution into compiled code. Additionally, we
present enhancements to an analytic model for recompilation to exploit
OSR for more aggressive optimization. We have implemented these
techniques in Jikes RVM and present a comprehensive evaluation,
including a study of fully automatic, online, profile-driven deferred
compilation.
Paper available upon request.
CC2003: Effective Inline-Threaded Interpretation of Java Bytecode Using Preparation Sequences
|
back |
Authors: Etienne Gagnon and Laurie Hendren
Date: January 2003
CC 2003, April 2003, Warsaw, Poland
Abstract
Inline-threaded interpretation is a recent technique that improves
performance by eliminating dispatch overhead within basic blocks for
interpreters written in C. The dynamic class loading,
lazy class initialization, and multi-threading features of Java reduce
the effectiveness of a straightforward implementation of this
technique within Java interpreters. In this paper, we introduce
preparation sequences, a new technique that solves the particular
challenge of effectively inline-threading Java. We have implemented
our technique in the SableVM Java virtual machine, and our
experimental results show that using our technique, inline-threaded
interpretation of Java, on a set of benchmarks, achieves a speedup
ranging from 1.20 to 2.41 over switch-based interpretation, and a
speedup ranging from 1.15 to 2.14 over direct-threaded interpretation.
Download the paper (.ps.gz)
View the paper (.pdf)
CC2003: Scaling Java Points-To Analysis using Spark
|
back |
Authors: Ondřej Lhoták and Laurie Hendren
Date: January 2003
CC 2003, April 2003, Warsaw, Poland
Abstract Most points-to analysis research has been done on different systems by
different groups, making it difficult to compare results, and to understand
interactions between individual factors each group studied.
Furthermore, points-to analysis for Java has been studied much less
thoroughly than for C, and the tradeoffs appear very different.
We introduce Spark, a flexible framework for experimenting with
points-to analyses for Java. Spark supports equality- and subset-based
analyses, variations in field sensitivity, respect for declared types,
variations in call graph construction, off-line simplification, and
several solving algorithms. Spark is composed of building blocks on
which new analyses can be based.
We demonstrate Spark in a substantial study of factors affecting
precision and efficiency of subset-based points-to analyses, including
interactions between these factors. Our results show that Spark is
not only flexible and modular, but also offers superior time/space
performance when compared to other points-to analysis implementations.
PASTE02-2: STEP: A Framework for the Efficient Encoding of General Trace Data
|
back |
Authors: Rhodes Brown, Karel Driesen, David Eng, Laurie Hendren, John Jorgensen, Clark Verbrugge and Qin Wang
Date: November 2002
PASTE 2002, Charleston, SC, USA
Abstract
Traditional tracing systems are often limited to recording a fixed set
of basic program events. This limitation can frustrate an application
or compiler developer who is trying to understand and characterize the
complex behavior of software systems such as a Java program running on
a Java Virtual Machine. In the past, many developers have resorted to
specialized tracing systems that target a particular type of program
event. This approach often results in an obscure and poorly documented
encoding format which can limit the reuse and sharing of potentially
valuable information. To address this problem, we present STEP, a
system designed to provide profiler developers with a standard method
for encoding general program trace data in a flexible and compact
format. The system consists of a trace data definition language along
with a compiler and an architecture that simplifies the client
interface by encapsulating the details of encoding and interpretation.
PASTE02-1: Combining Static and Dynamic Data in Code Visualization
|
back |
Authors: David Eng
Date: November 2002
PASTE 2002, Charleston, SC, USA
Abstract
The task of developing, tuning, and debugging compiler optimizations is a
difficult one which can be facilitated by software visualization. There
are many characteristics of the code which must be considered when
studying the kinds of optimizations which can be performed. Both static
data collected at compile-time and dynamic runtime data can reveal
opportunities for optimization and affect code transformations. In order
to expose the behavior of such complex systems, visualizations should
include as much information as possible and accommodate the different
sources from which this information is acquired.
This paper presents a visualization framework designed to address these
issues. The framework is based on a new, extensible language called JIL
which provides a common format for encapsulating intermediate
representations and associating them with compile-time and runtime data.
We present new contributions which extend existing compiler and profiling
frameworks, allowing them to export the intermediate languages, analysis
results, and code metadata they collect as JIL documents. Visualization
interfaces can then combine the JIL data from separate tools, exposing
both static and dynamic characteristics of the underlying code. We
present such an interface in the form of a new web-based visualizer,
allowing JIL documents to be visualized online in a portable,
customizable interface.
JGI02: Run-time Evaluation of Opportunities for Object Inlining in Java
|
back |
Authors: Ondřej Lhoták and Laurie Hendren
Date: September, 2002
JGI'02, November 2002, Seattle, WA, USA
Abstract
Object-oriented languages, such as Java, encourage the use of many small
objects linked together by field references, instead of a few monolithic
structures. While this practice is beneficial from a program design
perspective, it can slow down program execution by incurring many
pointer indirections. One solution to this problem is object inlining:
when the compiler can safely do so, it fuses small objects together,
thus removing the reads/writes of the inlined field, saving the memory
needed to store the field and object header, and reducing the number of
object allocations.
The objective of this paper is to measure the potential for object inlining
by studying the run-time behaviour of a comprehensive set of Java programs.
We study the traces of program executions in order to determine which
fields behave like inlinable fields. Since we are using dynamic information
instead of a static analysis, our results give an upper bound on what
could be achieved via a static compiler-based approach.
Our experimental results measure the potential improvements
attainable with object inlining, including reductions in the numbers of
field reads and writes, and reduced memory usage.
Our study shows that some Java programs can benefit significantly
from object inlining, with close to a 10% speedup. Somewhat to our
surprise, our study found one case, the db benchmark,
where the most important inlinable field was the result of unusual
program design, and fixing this small flaw led to both better performance
and clearer program design. However, the opportunities for
object inlining are highly dependent on the individual program being
considered, and are in many
cases very limited. Furthermore, fields that are inlinable also have
properties that make them potential candidates for other optimizations such
as removing redundant memory accesses.
The memory savings possible through object inlining are moderate.
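The hand-written example below (not from the paper) shows the shape of an object inlining
opportunity: fusing a small Point-like object into its container removes a pointer indirection,
an object header, and an allocation on the hot path. The class names are invented.

    class PointBefore {
        int x, y;
    }

    class ParticleBefore {
        PointBefore position = new PointBefore();   // one extra object per particle
        int distanceSquared() { return position.x * position.x + position.y * position.y; }
    }

    class ParticleAfter {
        int posX, posY;                              // the Point fields inlined into the container
        int distanceSquared() { return posX * posX + posY * posY; }
    }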
ISMM2002: An Adaptive, Region-based Allocator for Java
|
back |
Authors: Feng Qian and Laurie Hendren
Date: April 22, 2002
ISMM'02, June 2002, Berlin, Germany
Abstract
This paper introduces an adaptive, region-based allocator for Java.
The basic idea is to allocate non-escaping objects in local regions,
which are allocated and freed in conjunction with their associated
stack frames. By releasing memory associated with these stack frames,
the burden on the garbage collector is reduced, possibly resulting in
fewer collections.
The novelty of our approach is that it does not require static escape
analysis, programmer annotations, or special type systems. The
approach is transparent to the Java programmer and relatively simple
to add to an existing JVM. The system starts by assuming that all
allocated objects are local to their stack region, and then catches
escaping objects via write barriers. When an object is caught
escaping, its associated allocation site is marked as a non-local
site, so that subsequent allocations will be put directly in the
global region. Thus, as execution proceeds, only those allocation
sites that are likely to produce non-escaping objects are allocated to
their local stack region.
The paper presents the overall idea, and then provides details of a
specific design and implementation. In particular, we present a
region-based allocator and the necessary modifications of the Jikes RVM
baseline JIT and a copying collector. Our experimental study
evaluates the idea using the SPEC JVM98 benchmarks, plus one other large
benchmark. We show that a region-based allocator is a reasonable
choice, that overheads can be kept low, and that the adaptive system
is successful at finding local regions that contain no escaping
objects.
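The sketch below gives a rough, hypothetical picture of the adaptive policy: every allocation
site starts out as local, and the write barrier demotes a site to global the first time one of
its objects is caught escaping its stack region. The real mechanism lives inside the JIT and the
collector; the class and the frame-numbering convention below are invented for illustration.

    import java.util.*;

    class AdaptiveRegionPolicy {
        private final Set<Integer> nonLocalSites = new HashSet<>();   // demoted allocation sites

        boolean allocateInLocalRegion(int allocSiteId) {
            return !nonLocalSites.contains(allocSiteId);
        }

        // Write barrier hook: an object from allocSiteId, allocated in frame objFrame,
        // is being stored into an object that lives in frame targetFrame (or the global
        // region).  Frame indices are assumed to grow with call depth, so a smaller
        // index means an older frame that outlives the object's region.
        void onEscapingWrite(int allocSiteId, int objFrame, int targetFrame) {
            if (targetFrame < objFrame) {
                nonLocalSites.add(allocSiteId);   // future allocations go straight to the heap
                // the escaping object itself must also be promoted out of its region here
            }
        }
    }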
CC2002: Decompiling Java Bytecode: Problems, Traps and Pitfalls
|
back |
Authors: Jerome Miecznikowski and Laurie Hendren
Date: February 2002
CC'02, April 2002, Grenoble France
Abstract
Java virtual machines execute Java bytecode instructions. Since this
bytecode is a higher level representation than traditional object code, it
is possible to decompile it back to Java source. Many such decompilers
have been developed and the conventional wisdom is that decompiling Java
bytecode is relatively simple. This may be true when decompiling bytecode
produced directly from a specific compiler, most often Sun's javac
compiler. In this case it is really a matter of inverting a known
compilation strategy. However, there are many problems, traps and
pitfalls when decompiling arbitrary verifiable Java bytecode. Such
bytecode could be produced by other Java compilers, Java bytecode
optimizers or Java bytecode obfuscators. Java bytecode can also be
produced by compilers for other languages, including Haskell, Eiffel, ML,
Ada and Fortran. These compilers often use very different code generation
strategies from javac.
This paper outlines the problems and solutions we have found in our
development of Dava, a decompiler for arbitrary Java bytecode. We first
outline the problems in assigning types to variables and literals, and the
problems due to expression evaluation on the Java stack. Then, we look at
finding structured control flow with a particular emphasis on how to deal
with Java exceptions and synchronized blocks. Throughout the paper we
provide small examples which are not properly decompiled by commonly used
decompilers.
Authors: Feng Qian, Laurie Hendren and Clark Verbrugge
Date: February 2002
CC'02, April 2002, Grenoble France
Abstract
This paper reports on a comprehensive approach to eliminating array
bounds checks in Java. Our approach is based upon three analyses. The
first analysis is a flow-sensitive
intraprocedural analysis called variable constraint analysis
(VCA). This analysis builds a small constraint graph for each
important point in a method, and then uses the information encoded in
the graph to infer the relationship between array index expressions
and the bounds of the array. Using VCA as the base analysis, we also
show how two further analyses can improve the results of VCA.
Array field analysis is applied on each class and provides
information about some arrays stored in fields, while rectangular
array analysis is an interprocedural analysis to approximate the
shape of arrays, and is useful for finding rectangular (non-ragged)
arrays.
We have implemented all three analyses using the Soot bytecode
optimization/annotation framework and we transmit the results of the
analysis to virtual machines using class file attributes. We have
modified the Kaffe JIT, and IBM's High Performance Compiler for Java
(HPCJ) to make use of these attributes, and we demonstrate significant
speedups.
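The hand-written loop below (not from the paper) is the kind of code where a variable constraint
analysis can prove every access in bounds: on each iteration the loop condition guarantees
0 <= i and i < a.length, so both the lower- and upper-bound checks on a[i] are redundant and can
be dropped by the VM.

    class BoundsExample {
        static int sum(int[] a) {
            int s = 0;
            for (int i = 0; i < a.length; i++) {   // i is constrained by the loop condition
                s += a[i];                          // bounds check provably redundant
            }
            return s;
        }
    }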
Authors: Jerome Miecznikowski and Laurie Hendren
Date: October 2001
Abstract
This paper presents an approach to program structuring for use in
decompiling Java bytecode to Java source. The structuring approach uses
three intermediate representations: (1) a list of typed, aggregated
statements with an associated exception table, (2) a control flow graph,
and (3) a structure encapsulation tree.
The approach works in six distinct stages, with each stage focusing on a
specific family of Java constructs, and each stage contributing more
detail to the structure encapsulation tree. After completion of all
stages the structure encapsulation tree contains enough information to
allow a simple extraction of a structured Java program.
The approach targets general Java bytecode including bytecode that may be
the result of front-ends for languages other than Java, and also bytecode
that has been produced by a bytecode optimizer. Thus, the techniques have
been designed to work for bytecode that may not exhibit the typical
structured patterns of bytecode produced by a standard Java compiler.
The structuring techniques have been implemented as part of the Dava
decompiler which has been built using the Soot framework.
Authors: Patrice Pominville, Feng Qian, Raja Vallée-Rai, Laurie Hendren and Clark Verbrugge
Date: November 2000
Abstract
This paper presents a framework for supporting the optimization of Java programs using attributes in Java class files. We show how
class file attributes may be used to convey both optimization opportunities and profile information to a variety of Java virtual machines
including ahead-of-time compilers and just-in-time compilers.
We present our work in the context of Soot, a framework that supports the analysis and transformation of Java bytecode (class files).
We demonstrate the framework with attributes for elimination of array bounds and null pointer checks, and we provide experimental
results for the Kaffe just-in-time compiler, and IBM's High Performance Compiler for Java ahead-of-time compiler.
Winner of the "best paper that is primarily the work of a student" award.
Authors: Etienne Gagnon and Laurie Hendren
Date: April 2001
Conference: Java Virtual Machine Research and Technology Symposium (JVM '01)
Abstract
SableVM is an open-source virtual machine for Java
intended as a research framework for efficient
execution of Java bytecode.
The framework is essentially composed
of an extensible bytecode interpreter using state-of-the-art
and innovative techniques.
Written in the C programming language, and assuming
minimal system dependencies, the interpreter emphasizes high-level
techniques to support efficient execution.
In particular, we introduce a bidirectional layout for object
instances that groups reference fields sequentially to allow
efficient garbage collection. We also introduce
a sparse interface virtual table layout that reduces the cost
of interface method calls to that of normal virtual calls.
Finally, we present a technique to improve thin locks
by eliminating busy-waiting in the presence of contention.
Authors: Vijay Sundaresan, Laurie Hendren, Chrislain Razafimahefa, Raja Vallée-Rai, Patrick Lam, Etienne Gagnon, and Charles Godin
Date: October 2000
Abstract
This paper addresses the problem of resolving virtual method and
interface calls in Java bytecode.
The main focus is on a new practical technique that can
be used to analyze large applications.
Our fundamental design goal was to develop an analysis that can be solved
with only one iteration, and thus scales linearly with the size of the
program,
while at the same time providing
more accurate results than two popular existing linear techniques,
class hierarchy analysis and rapid type analysis.
We present two variations of our new technique, variable-type analysis
and a coarser-grain version called declared-type analysis.
Both of these analyses are inexpensive, easy to implement,
and our experimental results show that they scale linearly in
the size of the program.
We have implemented our new analyses
using the Soot framework, and we report on
empirical results for seven benchmarks.
We have used our techniques to build
accurate call graphs for complete applications (including libraries)
and we show that compared to a conservative call graph built
using class hierarchy analysis, our new variable-type analysis
can remove a significant number of nodes (methods) and call edges.
Further, our results show that we can improve upon the compression obtained
using rapid type analysis.
We also provide dynamic measurements of monomorphic call sites, focusing
on the benchmark code excluding libraries. We demonstrate that when
considering only the benchmark code,
both rapid type analysis and our new declared-type analysis do not add much
precision over class hierarchy analysis. However, our finer-grained
variable-type analysis does resolve significantly more
call sites, particularly for programs with more complex uses of objects.
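The hand-written example below (not from the paper) shows a call site that class hierarchy
analysis leaves polymorphic but that variable-type analysis can resolve: both Circle and Square
are in the hierarchy, so CHA reports two possible targets for s.area(), while VTA tracks which
allocated types actually reach s through assignments and finds only Circle.

    interface Shape { double area(); }
    class Circle implements Shape { public double area() { return 3.14159; } }
    class Square implements Shape { public double area() { return 1.0; } }

    class VtaExample {
        static double run() {
            Shape s = new Circle();   // the only allocation that flows into 's'
            return s.area();          // monomorphic under VTA, polymorphic under CHA
        }
    }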
Authors: Etienne Gagnon, Laurie Hendren and Guillaume Marceau
Date: June-July 2000
Abstract
Even though Java bytecode has a significant amount of type
information embedded in it,
there are no explicit types for local variables.
However, knowing types for local variables is very useful
for both program optimization and decompilation.
In this paper, we present an efficient and practical
algorithm for inferring static types for local variables
in a 3-address, stackless, representation of Java bytecode.
By decoupling the type inference problem from the
low level bytecode representation, and abstracting it into a
constraint system, we show that there exists verifiable
bytecode that cannot be statically typed. Further, we show that,
without transforming the program, the static typing problem
is NP-hard. In order to develop a practical approach we
have developed an algorithm that works efficiently for the
usual cases and then applies efficient program transformations to
simplify the hard cases.
Our solution is a multi-stage algorithm.
In the first stage, we
propose an efficient algorithm that infers static types for most
bytecode found in practice. In case this
stage fails, the second stage is applied. It consists of a simple
and efficient variable splitting operation that renders
most bytecode typeable using the algorithm of stage
one. Finally, for completeness of the algorithm, we present a
final stage that efficiently transforms and infers types for all
remaining bytecode (such bytecode is likely to be a contrived example,
and not code produced from a compiler).
We have implemented this algorithm in the Soot framework. Our
experimental results show that all of the 17,000 methods used
in our tests were successfully typed, 99.8% of those required only
the first stage, 0.2% required the second stage, and no methods
required the third stage.
Authors: Raja Vallée-Rai, Etienne Gagnon, Laurie Hendren, Patrick Lam, Patrice Pominville, and Vijay Sundaresan
Date: March-April 2000
Abstract
This paper presents Soot, a framework for optimizing Java(tm) bytecode. The
framework is implemented in Java and supports three intermediate representations
for representing Java bytecode: Baf, a streamlined representation of Java's
stack-based bytecode;
Jimple, a typed three-address intermediate
representation suitable for optimization; and Grimp, an aggregated version of
Jimple.
Our approach to class file optimization is to first convert the stack-based
bytecode into Jimple, a three-address form more amenable to traditional program
optimization, and then convert the optimized Jimple back to bytecode.
In order to demonstrate that our approach is feasible, we present
experimental results showing the effects of processing class files through
our framework. In particular, we study the techniques necessary to effectively
translate Jimple back to bytecode, without losing performance. Finally, we
demonstrate that class file optimization can be quite effective by
showing the results of some basic optimizations using our framework.
Our experiments
were done on ten benchmarks, including seven SPECjvm98 benchmarks, and were
executed on five different Java virtual machine implementations.
Authors: Raja Vallée-Rai, Laurie Hendren, Vijay Sundaresan, Patrick Lam, Etienne Gagnon and Phong Co
Date: September 99
Abstract
This paper presents Soot, a framework for optimizing Java(tm) bytecode. The
framework is implemented in Java and supports three intermediate representations
for representing Java bytecode: Baf, a streamlined representation of bytecode
which is simple to manipulate; Jimple, a typed 3-address intermediate
representation suitable for optimization; and Grimp, an aggregated version of
Jimple suitable for decompilation. We describe the motivation for each
representation, and the salient points in translating from one representation to
another.
In order to demonstrate the usefulness of the framework, we have implemented
intraprocedural and whole program optimizations. To show that whole program
bytecode optimization can give performance improvements, we provide experimental
results for 12 large benchmarks, including 8 SPECjvm98 benchmarks running on JDK
1.2 for GNU/Linux(tm). These results show up to 8% improvement when the
optimized bytecode is run using the interpreter and up to 21% when run using the
JIT compiler.
TOOLS98: SableCC, an Object-Oriented Compiler Framework |
back |
Authors: Etienne Gagnon and Laurie J. Hendren
Date: August 1998
Abstract
In this paper, we introduce SableCC, an object-oriented framework that generates
compilers (and interpreters) in the Java programming language. This framework is based on
two fundamental design decisions. Firstly, the framework uses object-oriented techniques
to automatically build a strictly-typed abstract syntax tree that matches the grammar of
the compiled language, which simplifies debugging. Secondly, the framework generates tree-walker
classes using an extended version of the visitor design pattern which enables the
implementation of actions on the nodes of the abstract syntax tree using inheritance. These
two design decisions lead to a tool that supports a shorter development cycle for constructing
compilers.
To demonstrate the simplicity of the framework, we present all the steps of building an
interpreter for a mini-BASIC language. This example could easily be modified to provide
an embedded scripting language in an application. We also provide a brief description of
larger systems that have been implemented using the SableCC tool.
We conclude that the use of object-oriented techniques significantly reduces the length of
the programmer-written code, can shorten the development time and, finally, makes the code
easier to read and maintain.
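The hand-written sketch below illustrates the two design decisions in miniature: a strictly-typed
AST and visitor-based tree-walker classes, so that actions are added by overriding cases rather
than by editing generated code. This is not actual SableCC-generated code; the node and walker
names are invented.

    abstract class ExprNode { abstract void apply(Analysis a); }

    class NumberNode extends ExprNode {
        final int value;
        NumberNode(int value) { this.value = value; }
        void apply(Analysis a) { a.caseNumber(this); }
    }

    class AddNode extends ExprNode {
        final ExprNode left, right;
        AddNode(ExprNode l, ExprNode r) { left = l; right = r; }
        void apply(Analysis a) { a.caseAdd(this); }
    }

    // Generated-style base walker: the default behaviour visits children, and
    // subclasses override only the cases they care about.
    class Analysis {
        void caseNumber(NumberNode n) { }
        void caseAdd(AddNode n) { n.left.apply(this); n.right.apply(this); }
    }

    // A user-written interpreter pass implemented purely by inheritance.
    class Evaluator extends Analysis {
        int result;
        @Override void caseNumber(NumberNode n) { result = n.value; }
        @Override void caseAdd(AddNode n) {
            n.left.apply(this);  int l = result;
            n.right.apply(this); int r = result;
            result = l + r;
        }
    }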
Last updated Fri Apr 11 23:53:33 EDT 2003.
|