If you're doing non-semantics-preserving transformations, such as using
Soot to add code for instrumenting, then do you really want the
information to reflect the code you've added so far, or the original
code?
I think this is exactly the point... by viewing transformations/optimizations
as modifications on hidden compiler state, it is very difficult for a user
of Soot to address the question above.
As Patrick mentions, it is also going to be very difficult to build a
multithreaded compiler...
Personally, I've always seen compilers as dataflow problems, even though
nobody ever builds them that way.
For example, think of any optimization as being a single-input,
single-output box that takes a scene and produces a new scene. If all I'm
interested in is the output of the compiler, then maybe I can do the
transformation "in place". If I'm interested in visualizing what the
optimizer did, then maybe I need another box that takes two scenes and
compares them.
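
A minimal Java sketch of the two kinds of box described above. MiniScene,
Passes, DROP_NOPS and removed are hypothetical names made up for this
illustration, not part of Soot's API, and a "scene" here is just a list of
statement strings.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical immutable stand-in for a scene: a list of statement strings.
final class MiniScene {
    final List<String> stmts;

    MiniScene(List<String> stmts) {
        // Defensive copy so later changes to the caller's list cannot leak in.
        this.stmts = Collections.unmodifiableList(new ArrayList<>(stmts));
    }
}

final class Passes {
    // A pass is a box: one scene in, one new scene out. The input is not mutated.
    static final UnaryOperator<MiniScene> DROP_NOPS = in -> {
        List<String> out = new ArrayList<>();
        for (String s : in.stmts) {
            if (!s.equals("nop")) {
                out.add(s);
            }
        }
        return new MiniScene(out);
    };

    // A second kind of box: two scenes in, a list of the statements that disappeared out.
    static List<String> removed(MiniScene before, MiniScene after) {
        List<String> gone = new ArrayList<>(before.stmts);
        for (String s : after.stmts) {
            gone.remove(s); // removes a single matching occurrence, if any
        }
        return gone;
    }
}
```

Because the input scene survives the pass, the comparison box can be wired
in or left out without touching the pass itself.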
Note that this also makes the compiler a lot more modular: I can reorder
compilation phases *and I have control over the dataflow between them*.
I can also easily manage concurrent transformations because I know what
depends on what. Note that this implies that it is important to pull as
much data as possible out of the scene, since having a central data
structure is going to increase data dependence and reduce available
concurrency.
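
A rough Java sketch of what explicit dataflow between phases can buy,
using the standard CompletableFuture class. PhaseGraph and its two toy
analyses are hypothetical and work on a plain string, not on anything from
Soot; the point is only that each phase receives its input as an argument
instead of reading shared state.

```java
import java.util.concurrent.CompletableFuture;

final class PhaseGraph {
    // Hypothetical analysis: counts ';'-separated statements.
    static int countStatements(String code) {
        return code.split(";").length;
    }

    // Hypothetical analysis: counts '(' characters as a crude stand-in for calls.
    static int countCalls(String code) {
        int n = 0;
        for (int i = code.indexOf('('); i >= 0; i = code.indexOf('(', i + 1)) {
            n++;
        }
        return n;
    }

    static String summarize(String code) {
        // The two analyses share no mutable state, so they may run concurrently.
        CompletableFuture<Integer> stmts =
                CompletableFuture.supplyAsync(() -> countStatements(code));
        CompletableFuture<Integer> calls =
                CompletableFuture.supplyAsync(() -> countCalls(code));
        // The report depends on both results; the wiring states that dependency.
        return stmts.thenCombine(calls, (s, c) -> s + " statements, " + c + " calls")
                    .join();
    }
}
```

If both analyses instead updated fields on one shared scene object, the
scheduler could not know whether they were safe to overlap.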
Of course the problem becomes one of efficiency: how do I manage two
different versions of a big data structure without making two copies?
Sounds like a database problem to me.
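
One possible answer, sketched in Java under heavy simplification: each
version stores only its changes and points at its parent for everything
else. SceneVersion is a hypothetical class for illustration (method bodies
are plain strings), not something Soot provides, and this is only one of
several ways to share structure between versions.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical versioned store: a version records only what changed since its parent.
final class SceneVersion {
    private final SceneVersion parent; // null for the root version
    private final Map<String, String> delta = new HashMap<>(); // method name -> body

    private SceneVersion(SceneVersion parent) {
        this.parent = parent;
    }

    // Builds the root version from a full set of method bodies.
    static SceneVersion root(Map<String, String> bodies) {
        SceneVersion v = new SceneVersion(null);
        v.delta.putAll(bodies);
        return v;
    }

    // Returns a new version with one body replaced; this version is left unchanged.
    SceneVersion with(String method, String body) {
        SceneVersion v = new SceneVersion(this);
        v.delta.put(method, body);
        return v;
    }

    // Looks in the newest delta first, then falls back to older versions.
    String body(String method) {
        for (SceneVersion v = this; v != null; v = v.parent) {
            String b = v.delta.get(method);
            if (b != null) {
                return b;
            }
        }
        return null; // unknown method
    }
}
```

The trade-off is that lookups get slower as the chain of versions grows,
so a real design would need some way to compact old versions.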