[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: projected changes to CFG classes
>
> this release will include an exception handler among
> the heads of the graph when the first Unit in the
> method may throw an exception caught by the handler.
vranganath> What do you mean by this? How can the control
vranganath> reach the handler before reaching the unit it is
vranganath> protecting? I remember from our earlier
vranganath> discussion that a control flow edge from the
vranganath> predecessor of the "excepting" statement to the
vranganath> handler needs to exist for various
vranganath> reasons. However, in this case, as there aren't
vranganath> any predecessors, there can be no edge. This
vranganath> does not imply that the successor of the head
vranganath> should be made a head as well because this means
vranganath> that the control flow graph has 2 entry points
vranganath> which is untrue!
If you want the method's single entry point, use
getBody().getUnits().getFirst(), not getHeads().
I don't deny that the possibility of multiple heads is weird, but
it's necessary for the same reason that the edges to handlers
from the predecessors of excepting instructions are necessary. To
ensure that dataflow analyses are correct, we want a CFG edge
from A to B to mean that A _completes_ execution before B starts
execution.
Consider this Java source:
static int i = 0;
static int sideShowExhibit() {
try {
i = 3/0;
} catch (ArithmeticException t) {
return i;
}
return 0;
}
which produces the following jimple:
static int sideShowExhibit()
{
java.lang.ArithmeticException r0, $r1;
int $i0, $i1;
label0:
$i0 = 3 / 0; //statement A
<TwoHeadedMethod: int i> = $i0;
label1:
goto label3;
label2:
$r1 := @caughtexception; //statement B
r0 = $r1;
$i1 = <TwoHeadedMethod: int i>;
return $i1;
label3:
return 0;
catch java.lang.ArithmeticException from label0 to label1
with label2;
}
If statement B were not a head of the graph, but instead there
was an edge to statement B from statement A, it would appear that
B could only be reached after A completes successfully. That
would lead a dataflow analysis to conclude $i0 is necessarily
assigned a new value before the handler is reached. In fact,
$i0's value is necessarily unchanged if the handler is reached.
This may be a good argument for treating exceptional edges in the
CFG differently from unexceptional edges, but I'm not in a
position to alter every existing Soot analysis to make the
distinction (imagine the yelps about breaking compatibility if we
tried that!).
vranganath> This may break other analysis relying on the fact
vranganath> that shape of the control flow paths represented
vranganath> in the UnitGraph represent the paths in the
vranganath> method.
The problem is what, precisely, you mean by a "path". From this
remark, and our previous discussions, it seems that for your
purposes the meaning of an edge from A to B should be that B is
the next statement to execute after A _starts_ (rather that
completes) execution. You could use the combination of
ExceptionalUnitGraph.getUnexceptionalSuccsOf() and
ExceptionalUnitGraph.getExceptionDests() to provide a set of
edges consistent with that interpretation.
You could also make a good case that Soot's CFGs should follow
the convention of starting each method with a dummy entry node
that does not correspond to any program instruction. Then the
graph could have a single nominal entry point whose successors
were all the Units which might complete first. But UnitGraph
nodes have always been defined to be Units in the underlying IR
and I'm not sure what the consequences would be of introducing
new placeholder instructions now (even adding nops would require
ensuring that none of the program transformations removes an initial
nops from a method until after you're through generating graphs
from it).
If it's any consolation, you have to work pretty hard to
manufacture a two-headed example, so they're unlikely to come up
in real code. It's no accident that my example takes no
parameters and has only constant values on the right hand side of
its first instruction; any parameters would result in jimple
identity statements appearing before the label marking the start
of the protected code, while putting a reference to a static
field on the rhs would produce a jimple load of the static into a
local ahead of the division.