Functions
void * bitc_emit_procedure_object (void *stubP, void *envP) __attribute__((unused))
Architecture-specific run-time function that allocates and initializes procedure objects.
Inner procedures are hoisted to top-level by the implementation. The hoisted procedure may or may not expect to receive a closure record pointer depending on which identifiers were closed over. If it does not, then its behavior is exactly like that of a top-level procedure, and no further action is required.
When an inner procedure requires a closure record, it is rewritten during hoisting in such a way as to render the closure record type and the associated procedure argument explicit. For such procedures, the compiler allocates the required closure record on the heap at the procedure construction site and emits a call to bitc_emit_procedure_object() to generate a heap-allocated trampoline function. It then returns the address of the generated trampoline function as the address of the inner procedure.
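As a rough illustration of the hoisting rewrite, the sketch below shows in C what an inner procedure that closes over two variables might look like once its closure record type and closure argument are explicit. The names (`linear_env`, `linear_hoisted`, `make_linear_env`) are hypothetical, `malloc` stands in for the collector's allocator, and the trampoline step is left out.

```c
#include <stdlib.h>

/* Before hoisting (pseudocode; standard C has no nested procedures):
 *
 *   make_linear(scale, offset) {
 *     linear(x) { return x * scale + offset; }
 *     return linear;
 *   }
 */

/* Closure record: one field per identifier the inner procedure closes over. */
typedef struct linear_env {
  long scale;
  long offset;
} linear_env;

/* Hoisted procedure: the closure record pointer is now an explicit first
 * parameter, matching the convention the code generator assumes. */
static long linear_hoisted(void *envP, long x) {
  const linear_env *env = envP;
  return x * env->scale + env->offset;
}

/* Construction site: allocate and fill the closure record on the heap.
 * malloc() stands in for the collector's allocator; a real construction
 * site would next hand this record to bitc_emit_procedure_object(). */
static linear_env *make_linear_env(long scale, long offset) {
  linear_env *env = malloc(sizeof *env);
  if (env != NULL) {
    env->scale = scale;
    env->offset = offset;
  }
  return env;
}
```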
Two optimizations on this are possible:
The current BitC code generator assumes that the closure pointer will be injected as the first parameter of the hoisted inner procedure. On architectures that pass parameters in registers, this requires a non-trivial, architecture-dependent rewriting of the call frame. Rather than try to implement a generic foreign function interface library, which has stymied many other implementations, we let the C compiler do the heavy lifting here. For each hoisted procedure, we emit a helper procedure that looks like:
ReturnType
hoistedProcStub(Ty1 arg1, ... TyN argN) {
  void *theClosureEnv;                  /* deposited by the trampoline */
  BITC_GET_CLOSURE_ENV(theClosureEnv);  /* fetch the closure record pointer */
  return hoistedProc(theClosureEnv, arg1, ... argN);
}
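The helper pattern can be exercised in plain C if the register transfer is replaced by something portable. The sketch below substitutes a thread-local variable for the transfer register (a different technique from the one the runtime uses) and defines a hypothetical `SKETCH_GET_CLOSURE_ENV` that reads it; `call_through_trampoline` plays the role of the assembly stub by depositing the closure pointer and then entering the helper. All names here are illustrative.

```c
/* Closure record for an inner procedure that captured one int. */
typedef struct adder_env {
  int addend;
} adder_env;

/* Hoisted procedure with the closure pointer as its first parameter. */
static int adder_hoisted(void *env, int x) {
  return ((adder_env *)env)->addend + x;
}

/* Substitute transfer buffer: a thread-local slot instead of a register.
 * Because it is per-thread, concurrent callers do not interfere. */
static _Thread_local void *sketch_closure_slot;

/* Hypothetical portable stand-in for BITC_GET_CLOSURE_ENV. */
#define SKETCH_GET_CLOSURE_ENV(var) ((var) = sketch_closure_slot)

/* Helper stub in the shape shown above: recover the closure pointer,
 * then call the hoisted procedure with it prepended. */
static int adder_stub(int x) {
  void *theClosureEnv;
  SKETCH_GET_CLOSURE_ENV(theClosureEnv);
  return adder_hoisted(theClosureEnv, x);
}

/* Plays the trampoline's role: deposit the closure pointer, then transfer
 * control to the stub. (A real trampoline jumps rather than calls.) */
static int call_through_trampoline(void *env, int (*stub)(int), int x) {
  sketch_closure_slot = env;
  return stub(x);
}
```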
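With this helper in place, what we need from the trampoline code is an assembly stub that deposits the closure record pointer where BITC_GET_CLOSURE_ENV will find it and then jumps to hoistedProcStub.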
Finally, since closure records may be garbage collected, we add one further constraint: the closure record pointer embedded in the heap-allocated trampoline should appear at an address that is naturally aligned for pointers according to the requirements of the target architecture.
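One way to enforce this constraint at build time is a static assertion on the union that overlays the trampoline bytes. A plausible motivation is that collectors commonly look for pointers only at pointer-aligned words, though this page does not say so. In the sketch below, `sketch_procedure`, `SKETCH_ENV_OFFSET` and the offset of 8 are assumptions standing in for bitc_Procedure and whatever offset the assembled template actually produces.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical offset at which the assembled template places its
 * full-width immediate (for example, after NOP padding). */
#define SKETCH_ENV_OFFSET 8
#define SKETCH_CODE_BYTES 21

/* Union overlaying the trampoline bytes with the closure pointer slot,
 * in the spirit of bitc_Procedure. */
typedef union sketch_procedure {
  uint8_t code[SKETCH_CODE_BYTES];
  struct {
    uint8_t prefix[SKETCH_ENV_OFFSET];  /* instruction bytes before the immediate */
    void *ptr;                          /* closure record pointer */
  } env;
} sketch_procedure;

/* If SKETCH_ENV_OFFSET were not pointer-aligned, the compiler would insert
 * padding before env.ptr and the first check would fail at compile time. */
_Static_assert(offsetof(sketch_procedure, env.ptr) == SKETCH_ENV_OFFSET,
               "env.ptr must overlay the immediate without padding");
_Static_assert(SKETCH_ENV_OFFSET % _Alignof(void *) == 0,
               "closure pointer must be naturally aligned");
```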
While other implementations are possible, the preferred method to use in the stub trampoline is to use a register as a transfer buffer for the closure record pointer. This method is compatible with concurrent runtimes, and is generally faster than reading/writing a global location. Look for a register that is call-clobbered but not used for parameters. If return values are returned via registers on your target, the return value register is usually a good choice. Registers that are tied up in the stack frame fabrication, such as the stack pointer, frame pointer, or return register (a.k.a. link register) are probably not good choices.
Once you have chosen your register, there are two possible strategies for implementing your trampoline: the Load-Immediate method and the Procedure-Object-Relative method.
Load-Immediate Method
This method is preferred, but it is only feasible on architectures that can load a full-register-width immediate constant from a naturally aligned position in the instruction stream. It is currently used on IA32 and X86_64, and we expect that a variant will eventually be used on Coldfire. The main reason to prefer it is that it makes no D-space (data) references, which on most architectures eliminates a load-delay stall.
Aside: This approach also preserves the possibility of a copying collector that can blindly relocate procedure objects, although the current implementation does not use a copying collector. A copying implementation is also possible for the Procedure-Object-Relative method (below), but in that case the collector must know how to rewrite the stubs.
To use this approach, write some assembly code that uses that full-width immediate load to load a recognizable constant value:
ldimm %rclosure,$0xdeadbeef ;; closure record to register
jmp $0xfeedface ;; jump to stub
On CISC architectures it may be necessary to insert NOP instructions before the immediate load so that your constant ends up naturally aligned. Now use objdump --disassemble to determine what byte sequence results from these instructions. Find where the 0xdeadbeef constant ended up, and define the bitc_Procedure union so that the env.ptr field overlays that location. Your version of bitc_emit_procedure_object() will copy this byte sequence into the code leg of the union, replace $0xfeedface with the passed stub procedure address stubP, and then store the passed environment pointer value envP into the env.ptr slot.
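Locating the placeholder can be automated once the assembled bytes are in hand. The helper below is a hypothetical sketch that assumes a little-endian 32-bit target such as IA32: it scans a byte template for the 0xdeadbeef marker and reports its offset, so the natural-alignment requirement can be checked directly. The test bytes are `movl $0xdeadbeef,%eax` (opcode B8) followed by `jmp rel32` (opcode E9), with the displacement zeroed for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns the offset of the first little-endian occurrence of `marker`
 * in code[0..len), or -1 if it does not occur. */
static ptrdiff_t find_marker_le32(const uint8_t *code, size_t len,
                                  uint32_t marker) {
  for (size_t i = 0; i + 4 <= len; i++) {
    uint32_t v = (uint32_t)code[i]
               | (uint32_t)code[i + 1] << 8
               | (uint32_t)code[i + 2] << 16
               | (uint32_t)code[i + 3] << 24;
    if (v == marker)
      return (ptrdiff_t)i;
  }
  return -1;
}
```

Without padding the marker lands at offset 1, which is misaligned; three leading NOPs (0x90) move it to offset 4.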
Procedure-Object-Relative Method
On architectures that do not have an easy way to load a naturally aligned immediate value, there is generally some instruction sequence that synthesizes constant values, such as loadhi followed by add. Since we want the closure record pointer itself to sit at a naturally aligned address, the preferred solution is to synthesize a PC-relative load:
ldc %rclosure,$proc-address ;; load address of proc object
mov %rclosure,4(%rclosure) ;; fetch from env.ptr slot
jmp $0xfeedface ;; jump to stub
.long 0xdeadbeef ;; closure record ptr
This method is currently used on SPARC and SPARC64. It is probably needed on MIPS as well. In this implementation, it becomes the responsibility of the garbage collector to patch the instructions emitted by the LDC pseudo-op if the procedure object is relocated.
As before, use objdump --disassemble to determine what byte sequence results from these instructions. Find where the 0xdeadbeef constant ended up, and define the bitc_Procedure union so that the env.ptr field overlays that location. Also patch the offset used in the mov instruction, replacing the "4" with the actual offset of env.ptr. Your version of bitc_emit_procedure_object() will copy this byte sequence into the code leg of the union, replace $proc-address with the heap address of the procedure object and $0xfeedface with the passed stub procedure address stubP, and store the passed environment pointer value envP into the env.ptr slot.
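On SPARC the constant-synthesis pair is `sethi` followed by `or`. The sketch below builds such a pair targeting %g1 using the standard SPARC V8 instruction formats, and shows the field-level rewrite a relocating collector would need to apply to the instructions that materialize $proc-address. The helper names are hypothetical, and this is not the runtime's actual LDC expansion, which this page does not show.

```c
#include <stdint.h>

/* SPARC register number for %g1. */
enum { SPARC_G1 = 1 };

/* sethi %hi(value), rd -- format 2: op=00, rd, op2=100, imm22. */
static uint32_t sparc_sethi(uint32_t value, unsigned rd) {
  return (rd << 25) | (4u << 22) | (value >> 10);
}

/* or rs1, %lo(value), rd -- format 3: op=10, rd, op3=000010, rs1, i=1,
 * simm13 holding the low 10 bits of value. */
static uint32_t sparc_or_lo(uint32_t value, unsigned rs1, unsigned rd) {
  return (2u << 30) | (rd << 25) | (0x02u << 19) | (rs1 << 14) | (1u << 13)
         | (value & 0x3FFu);
}

/* Rewrites a sethi/or pair in place so that it synthesizes new_value,
 * leaving opcode and register fields untouched. This is the kind of fix-up
 * a collector would apply after relocating a procedure object. */
static void sparc_patch_constant(uint32_t insn[2], uint32_t new_value) {
  insn[0] = (insn[0] & ~0x3FFFFFu) | (new_value >> 10);
  insn[1] = (insn[1] & ~0x1FFFu) | (new_value & 0x3FFu);
}

/* Recovers the constant that a sethi/or pair produces. */
static uint32_t sparc_decode_constant(const uint32_t insn[2]) {
  return ((insn[0] & 0x3FFFFFu) << 10) | (insn[1] & 0x3FFu);
}
```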
IA32
The IA32 implementation uses the Load-Immediate method, storing the closure record pointer into %eax. No other register is touched, and %eax is reserved on this architecture for use as the return register. Note that %eax is considered call-clobbered even when the called procedure returns void.
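A host-side sketch of the IA32 fill step might look like the following. It assumes a 13-byte layout (three NOPs, `movl $imm32,%eax` with the immediate at offset 4, then `jmp rel32`) that may differ from the actual bytes in i386/make_procedure_object.c, and it writes into a plain buffer using 32-bit target addresses so that it can be tested on any host. The names are hypothetical; a real version would obtain the object from the collector and write through the union.

```c
#include <stdint.h>
#include <string.h>

enum { IA32_PROC_OBJECT_SIZE = 13 };

/* nop; nop; nop; movl $0xdeadbeef,%eax; jmp <rel32> */
static const uint8_t ia32_template[IA32_PROC_OBJECT_SIZE] = {
  0x90, 0x90, 0x90,              /* NOP padding so the immediate is aligned */
  0xB8, 0xEF, 0xBE, 0xAD, 0xDE,  /* movl $imm32,%eax (imm32 at offset 4) */
  0xE9, 0x00, 0x00, 0x00, 0x00   /* jmp rel32 (displacement at offset 9) */
};

/* Stores v at p in little-endian byte order. */
static void put_le32(uint8_t *p, uint32_t v) {
  p[0] = (uint8_t)v;
  p[1] = (uint8_t)(v >> 8);
  p[2] = (uint8_t)(v >> 16);
  p[3] = (uint8_t)(v >> 24);
}

/* Fills `code`, which will live at target address `self`, so that it loads
 * env into %eax and jumps to stub. */
static void ia32_fill_proc_object(uint8_t code[IA32_PROC_OBJECT_SIZE],
                                  uint32_t self, uint32_t stub, uint32_t env) {
  memcpy(code, ia32_template, IA32_PROC_OBJECT_SIZE);
  put_le32(code + 4, env);
  /* rel32 is measured from the end of the jmp; unsigned wraparound yields
   * the correct two's-complement displacement. */
  put_le32(code + 9, stub - (self + IA32_PROC_OBJECT_SIZE));
}
```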
PowerPC
The PowerPC implementation uses a position-independent variant of the Procedure-Object-Relative method, storing the closure record pointer into , which is defined as call-clobbered on this architecture. r2, which is defined as available on Mach-O systems, is also clobbered by this sequence.
The code sequence is slightly tricky. It takes advantage of the fact that the branch and link instruction (BL) writes the address of the instruction following the BL instruction into the link register, and uses BL to collect the address at which the procedure object is executing. It then exploits the delay slot of the BL to implement the actual load of the closure pointer.
Thanks to Paul Snively for assistance in defining and debugging this sequence.
SPARC
The SPARC implementation uses a position-independent variant of the Procedure-Object-Relative method, storing the closure record pointer into %g1, which is defined as call-clobbered on this architecture. No other register is touched by this sequence.
The code sequence is slightly tricky. It takes advantage of the fact that the jump and link instruction (JMPL) can write the address of the JMPL instruction into a designated register (in this case %g1), and uses JMPL to collect the address at which the procedure object is executing. It then exploits the delay slot of the JMPL to implement the actual load of the closure pointer.
On its face this sequence looks inefficient, since it involves both a PC store and a D-space load. In practice, however, it is the canonical PC-relative load sequence on this architecture, and is therefore specifically optimized in SPARC hardware implementations.
Thanks to Jonathan Adams for assistance in defining and debugging this sequence.
SPARC64
The SPARC64 implementation uses a position-independent variant of the Procedure-Object-Relative method, ultimately storing the closure record pointer into %g1. The %g4 register is clobbered while loading the address of the stub procedure. Both %g1 and %g4 are defined as call-clobbered on this architecture.
The code sequence uses the same delay slot trick that is used in the 32-bit SPARC implementation. SPARC64 implementations are multi-issue, so this sequence is not quite as expensive as it looks. Moreover, because SPARC64 has no jump-immediate or call-immediate instruction with a 64-bit span, this code sequence is specifically optimized in SPARC64 hardware implementations.
Thanks to Jonathan Adams for assistance in defining and debugging this sequence.
x86_64
The X86_64 implementation uses the Load-Immediate method, storing the closure record pointer into %rax. No other register is touched, and %rax is reserved on this architecture for use as the return register. Note that %rax is considered call-clobbered even when the called procedure returns void.
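On x86_64 the full-width load is `movabs $imm64,%rax` (REX.W prefix 0x48, opcode 0xB8), and its 8-byte immediate needs 8-byte alignment. This page does not show how the real X86_64 stub reaches the stub procedure. The sketch below assumes a `jmp rel32`, which keeps the trampoline free of D-space references but only works when the stub lies within ±2 GiB of the procedure object, so it reports failure otherwise. The layout and names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

enum { X64_PROC_OBJECT_SIZE = 21 };

static const uint8_t x64_template[X64_PROC_OBJECT_SIZE] = {
  0x90, 0x90, 0x90, 0x90, 0x90, 0x90,  /* NOPs: move the imm64 to offset 8 */
  0x48, 0xB8,                          /* movabs $imm64,%rax */
  0xEF, 0xBE, 0xAD, 0xDE, 0, 0, 0, 0,  /* imm64 placeholder at offset 8 */
  0xE9, 0x00, 0x00, 0x00, 0x00         /* jmp rel32, displacement at offset 17 */
};

/* Stores the low nbytes of v at p in little-endian order. */
static void put_le(uint8_t *p, uint64_t v, int nbytes) {
  for (int i = 0; i < nbytes; i++)
    p[i] = (uint8_t)(v >> (8 * i));
}

/* Returns 0 on success, or -1 if stub is out of rel32 range of self. */
static int x64_fill_proc_object(uint8_t code[X64_PROC_OBJECT_SIZE],
                                uint64_t self, uint64_t stub, uint64_t env) {
  int64_t rel = (int64_t)(stub - (self + X64_PROC_OBJECT_SIZE));
  if (rel < INT32_MIN || rel > INT32_MAX)
    return -1;
  memcpy(code, x64_template, X64_PROC_OBJECT_SIZE);
  put_le(code + 8, env, 8);
  put_le(code + 17, (uint64_t)rel, 4);
  return 0;
}
```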
void* bitc_emit_procedure_object (void *stubP, void *envP)
Architecture-specific run-time function that allocates and initializes procedure objects.
Definition at line 52 of file i386/make_procedure_object.c.
References GC_ALLOC.