[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: this poorly optimised bytecode causes soot to crash



>>>>> "olhotak" == Ondrej Lhotak <olhotak@sable.mcgill.ca> writes:

    olhotak> Looks like a coffi bug.

    olhotak> dup_x2 has two different behaviours, depending on
    olhotak> what is on the top of the stack. If the top of the
    olhotak> stack is a 64-bit value, it copies both halves of it
    olhotak> down. But if it is a 32-bit value, the copies only
    olhotak> the top 32-bits of the stack down.

Your diagnosis is probably correct, but this explanation 
is a little misleading (or at least it misled me :-)).

The value at the _top_ of the stack must be a 32-bit value (or
"category 1 computational type" in the language of the VM spec)
for dup_x2 to pass verification. It's the value _underneath_ the
top of the stack which can be either two 32-bit values or one
64-bit value (a "category 2 computational type").  I.e., dup_x2
either turns

    32-bit A                       32-bit A
    32-bit B       into            32-bit B
    32-bit C                       32-bit C
       .                           32-bit A
       .                              .
       .                              .
                                      .
or it turns

    32-bit X                       32-bit X
    64-bit Y                       64-bit Y
       .                           32-bit X
       .                               .
       .                               .


The part of soot/coffi/CFG.java you quote is not necessarily
incorrect, since I notice that the code for pushing a long is:

         case ByteCode.LLOAD:
            typeStack = typeStack.push(LongType.v());
            typeStack = typeStack.push(Long2ndHalfType.v());
            break;

which suggests that the typeStack pushes separate types for each
half of a 64-bit value, which would be preserved by the code that
you cited.  Something more subtle is going wrong in the stack
simulation.

I tried to produce jasmin input corresponding to Rohan's example:

    .method public static test()Ljava/lang/Long;
    .limit stack 10
    .limit locals 0
       invokestatic TestCase/f()J
       new java/lang/Long
       dup_x2
       dup_x2
       pop
       invokespecial java/lang/Long/<init>(J)V
       areturn
    .end method

And here's the resulting Jimple that Coffi feeds as input to
phase jb.in. It looks like a straightforward, almost correct
translation of the stack code to equivalent locals:

    public static java.lang.Long test()
    {
        .unknown $stack0, $stack2, $stack1, $stack3, $stack4;

        $stack0 = TestCase.f();
        $stack2 = new java.lang.Long;
        $stack0 = $stack2;		;AAA
        $stack1 = $stack0;		;BBB
        $stack3 = $stack2;
        $stack1 = $stack3;		;CCC
        $stack2 = $stack1;		;DDD
        $stack4 = $stack3;
        nop;
        specialinvoke $stack1.<init>($stack2);
        return $stack0;
    }

The only problem is the order of the statements that I have
labeled.  AAA should come after BBB, not before.  Similarly, CCC
should follow DDD, rather than precede it.

I'll make an attempt to figure out where this happens in the
stack simulation code.

    olhotak> Are there other bytecodes with this kind of weird
    olhotak> behaviour that depends on the top of stack?

pop2, dup2, dup2_x1, dup2_x2 

These ones really do handle the _top_ of the stack as either two
32-bit values or one 64-bit value. dup2_x2 also adds dup_x2's
treatment of the next item(s) down as either two 32-bit values or
one 64-bit value.  

soot/coffi/Coffi.java doesn't include any explicit special
handling for any of them, but again, it probably does not need
to, since the two halves of 64-bit values are represented
separately on the type stack.

-- 
John Jorgensen		jjorge1@cs.mcgill.ca