
Re: Jimple Canonical Form



Navindra Umanee wrote:
Venkatesh Prasad Ranganath <vranganath@cox.net> wrote:

Is there a canonical form of Jimple?  Say, can one be assured that
if a class is represented as Jimple in two different runs in which
the same options were used (no analysis or transformations will be
applied) then the Jimple representation will be "alike"?  Given a
Stmt object can one be assured that the order of the children will
be the same in two different runs?  The Stmt object will be at a
distance of n statements from the start of the method in the method
bodies in both runs.


Personally I wouldn't say there is a "canonical" form per se, in that
we don't have rigid guarantees on what the final Jimple will look like
although the structure is largely determined by the bytecode. (I'm
sure someone else can confirm/deny/elaborate.)

However, the original authors of Soot went to great pains to ensure
that the output is *deterministic*. So you can be reasonably
confident that you should get the same Jimple output, all else being
equal.


Perhaps it would help if you can give a concrete example of what you
are trying to do...


Sorry, I used the wrong term; I meant "deterministic". The question, then, is to what extent the creation of Jimple is deterministic. As you said, all else being equal across the two runs, can the user assume the following?
1> Identical identifiers represent the same variable, i.e., $r0 in both bodies of a method identifies the same variable.
2> Given a Stmt object such as AssignStmt(Local($r1) = Local($r2) + Local($r3)), one can obtain the use/def boxes as a List via a getUseAndDefBoxes() call, and the order of the boxes in that list will be the same for the same statement in different bodies of the same method.
3> The order of the statements in different bodies of a method is identical.
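
To make these three assumptions concrete, here is a minimal sketch of the kind of check I have in mind. It does not touch Soot at all: it assumes each body has already been dumped to a list of statement strings (one string per Stmt, in body order), and the class name JimpleRunDiff and the method firstDifference are hypothetical names made up for this example. If locals, operand order, and statement order are all stable across runs, the two dumps should match position by position.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Soot: compares two textual dumps of the
// same method body, one statement string per list element.
final class JimpleRunDiff {

    // Returns the index of the first statement at which the two dumps differ,
    // or -1 if they are identical. If one dump is a strict prefix of the
    // other, the length of the shorter dump is returned.
    static int firstDifference(List<String> runA, List<String> runB) {
        int common = Math.min(runA.size(), runB.size());
        for (int i = 0; i < common; i++) {
            // Ignore leading/trailing whitespace, which is not meaningful here.
            if (!runA.get(i).trim().equals(runB.get(i).trim())) {
                return i;
            }
        }
        return runA.size() == runB.size() ? -1 : common;
    }

    public static void main(String[] args) {
        // Made-up statement strings, for illustration only.
        List<String> runA = Arrays.asList("$r0 := @this: Foo", "$r1 = $r2 + $r3", "return");
        List<String> runB = Arrays.asList("$r0 := @this: Foo", "$r1 = $r3 + $r2", "return");
        System.out.println(firstDifference(runA, runA));
        System.out.println(firstDifference(runA, runB));
    }
}
```

A swapped operand order in the second statement, as in runB above, is reported as a difference at index 1, which is exactly the kind of instability that assumption 2 is asking about.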


The main concern here is whether, given a class file and a mechanism to generate Jimple from it, the two together can be used for baselining. A simple example: if I have a Java-to-Jimple compiler, how do I validate that the translation is correct? A simple approach is to compile the Java file into a class file, construct a Jimple representation from it, and use this as the baseline. If the translator generates a Jimple representation with the same structure, that would be a reasonable validation. For this to work, one must be able to compare the Jimple bodies in the first place. That is possible if there is a canonical representation or a deterministic class-file-to-Jimple translation.
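
As a rough illustration of the baseline comparison, here is a sketch that compares two bodies while tolerating a consistent renaming of locals, in case the translator and the class-file route pick different names. Everything in it is an assumption on my part: the bodies are plain lists of statement strings, locals are recognised by a simple pattern (an optional "$", lowercase letters, then digits, as in $r0 or i1), and the names JimpleBaseline, normalize, and sameUpToRenaming are hypothetical. It is not a Soot API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Soot: compares two textual method bodies
// up to a consistent renaming of locals.
final class JimpleBaseline {

    // Assumption: a local is an optional '$', lowercase letters, then digits,
    // and does not sit inside a longer identifier or a qualified name.
    private static final Pattern LOCAL =
            Pattern.compile("(?<![A-Za-z0-9_$.])\\$?[a-z]+[0-9]+\\b");

    // Rewrites every local to v0, v1, ... in order of first appearance
    // within the body, so that equivalent bodies normalise to equal text.
    static List<String> normalize(List<String> stmts) {
        Map<String, String> names = new HashMap<>();
        List<String> out = new ArrayList<>();
        for (String stmt : stmts) {
            Matcher m = LOCAL.matcher(stmt.trim());
            StringBuffer sb = new StringBuffer();
            while (m.find()) {
                String canon = names.get(m.group());
                if (canon == null) {
                    canon = "v" + names.size();
                    names.put(m.group(), canon);
                }
                m.appendReplacement(sb, Matcher.quoteReplacement(canon));
            }
            m.appendTail(sb);
            out.add(sb.toString());
        }
        return out;
    }

    // True if the candidate body equals the baseline after normalisation.
    static boolean sameUpToRenaming(List<String> baseline, List<String> candidate) {
        return normalize(baseline).equals(normalize(candidate));
    }
}
```

With this, a baseline of "$r0 = $r1 + $r2" followed by "return $r0" would be considered equal to a candidate of "a0 = b1 + c2" followed by "return a0", but not to one that returns b1 instead. Statement order and operand order still have to agree, so this only relaxes assumption 1 above, not 2 or 3.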

I hope I am making sense and this helps.

--

Venkatesh Prasad Ranganath,
Dept. Computing and Information Science,
Kansas State University, US.
web: http://www.cis.ksu.edu/~rvprasad