Skip to content

Commit

Permalink
Fix identity comparison for boxed values
Browse files Browse the repository at this point in the history
Previously, the identity comparison for boxed values worked by
comparing the addresses of the boxes regardless of their contents. This
was leading to confusing behaviour, for example

  let a: U32 = 0
  let b: U32 = 0
  let boxa: Any = a
  let boxb: Any = b
  a is b // True.
  boxa is boxb // False.

Now, the identity of a boxed value (either a numeric primitive or a
tuple) depends on its content rather than its address. This affects the
`is`, `isnt` and `digestof` operators. Several general changes were
made to the code generator in order to make the identity comparison of
a boxed value as efficient as possible.

When comparing a possibly boxed value to an unboxed value, the type of
the box is checked first by looking at its type descriptor. If the
types of the values are the same, the boxed value is unboxed and the
comparison is done on the raw objects.

When comparing two possibly boxed values, the check depends on the
underlying types.

- First, the addresses are compared. If they are the same, the objects
  have the same identity regardless of whether they are boxed values or
  not
- If the addresses are different, we check whether the objects are two
  boxed values of the same type. This is done by checking the type IDs
  with a method described afterwards
  - If the values are two boxed numbers, the identity check is done
    with `memcmp` without unboxing
  - If the values are two boxed tuples, the identity check is done by
    calling a type-specific `__is` function through the vtable. This
    function unboxes the tuples and does the identity check
  - If the values aren't boxed or if they don't have the same type,
    they don't have the same identity.

When computing the digest of a possibly boxed value, we first check
whether the value is boxed by checking the type ID with the same method
as for the identity comparison. If the value is boxed, a type-specific
`__digestof` function is called through the vtable. This function
unboxes the value and computes the digest.

For both identity comparison and digest computing, the compiler
generates the checks depending on the possible subtypes of the base
types that it is looking at. This means that checking for the identity
of an object that cannot be boxed (because the interface type has no
numeric primitive/tuple as subtypes) doesn't involve checking whether
it is boxed.

This commit changes how type IDs are assigned. Previously, they were
assigned incrementally as types were reached without considering boxed
types. Now, type IDs are assigned as

- Object type IDs:  1, 3, 5, 7, 9, ...
- Numeric type IDs: 0, 4, 8, 12, 16, ...
- Tuple type IDs:   2, 6, 10, 14, 18, ...

There are two reasons for this

- Speed. Checking whether a value is boxed or not consists in checking
  if its type ID is a multiple of 2, and further discriminating between
  a boxed number and a boxed tuple consists in checking if it is a
  multiple of 4. In machine code, this boils down to checking whether
  the first or second bit of the type ID are set, respectively
- Forward compatibility. We won't have to change this method when
  dynamic code loading is added to Pony

The `__is` and `__digestof` functions introduce a change to the
reachability algorithm. These functions are automatically added to
numeric primitives and tuples, are generated by the compiler and are
marked as "internal". An internal method isn't added to subtypes and
since it doesn't exist before reachability analysis, it is added to
supertypes in order to be visible when generating identity checks.

Closes #1567.
  • Loading branch information
Benoit Vey committed Mar 21, 2017
1 parent f5dc43d commit 680f389
Show file tree
Hide file tree
Showing 16 changed files with 983 additions and 88 deletions.
21 changes: 21 additions & 0 deletions src/libponyc/codegen/codegen.c
Original file line number Diff line number Diff line change
Expand Up @@ -763,6 +763,26 @@ static void init_runtime(compile_t* c)
// i32 pony_personality_v0(...)
type = LLVMFunctionType(c->i32, NULL, 0, true);
c->personality = LLVMAddFunction(c->module, "pony_personality_v0", type);

// i32 memcmp(i8*, i8*, intptr)
params[0] = c->void_ptr;
params[1] = c->void_ptr;
params[2] = c->intptr;
type = LLVMFunctionType(c->i32, params, 3, false);
value = LLVMAddFunction(c->module, "memcmp", type);
#if PONY_LLVM >= 309
LLVMAddAttributeAtIndex(value, LLVMAttributeFunctionIndex, nounwind_attr);
LLVMAddAttributeAtIndex(value, LLVMAttributeFunctionIndex, readonly_attr);
LLVMAddAttributeAtIndex(value, 1, readonly_attr);
LLVMAddAttributeAtIndex(value, 2, readonly_attr);
#else
LLVMAddFunctionAttr(value, LLVMNoUnwindAttribute);
LLVMAddFunctionAttr(value, LLVMReadOnlyAttribute);
value = LLVMGetParam(value, 0);
LLVMAddAttribute(value, LLVMReadOnlyAttribute);
value = LLVMGetParam(value, 1);
LLVMAddAttribute(value, LLVMReadOnlyAttribute);
#endif
}

static bool init_module(compile_t* c, ast_t* program, pass_opt_t* opt)
Expand Down Expand Up @@ -1010,6 +1030,7 @@ bool codegen_gen_test(compile_t* c, ast_t* program, pass_opt_t* opt)

reach(c->reach, main_ast, c->str_create, NULL, opt);
reach(c->reach, env_ast, c->str__create, NULL, opt);
reach_done(c->reach, c->opt);

if(opt->limit == PASS_REACH)
return true;
Expand Down
1 change: 1 addition & 0 deletions src/libponyc/codegen/codegen.h
Original file line number Diff line number Diff line change
Expand Up @@ -174,6 +174,7 @@ typedef struct compile_t
LLVMValueRef none_instance;
LLVMValueRef primitives_init;
LLVMValueRef primitives_final;
LLVMValueRef numeric_sizes;

LLVMTypeRef void_type;
LLVMTypeRef ibool;
Expand Down
24 changes: 21 additions & 3 deletions src/libponyc/codegen/gendesc.c
Original file line number Diff line number Diff line change
Expand Up @@ -311,7 +311,7 @@ static LLVMValueRef make_vtable(compile_t* c, reach_type_t* t)
pony_assert(index != (uint32_t)-1);
pony_assert(vtable[index] == NULL);

if(t->primitive != NULL)
if((t->primitive != NULL) && !m->internal)
vtable[index] = make_unbox_function(c, t, m);
else
vtable[index] = make_desc_ptr(m->func, c->void_ptr);
Expand Down Expand Up @@ -372,7 +372,7 @@ void gendesc_type(compile_t* c, reach_type_t* t)
const char* desc_name = genname_descriptor(t->name);
uint32_t traits = trait_count(t, NULL, NULL);
uint32_t fields = 0;
uint32_t vtable_size = 0;
uint32_t vtable_size = t->vtable_size;

if(t->underlying == TK_TUPLETYPE)
fields = t->field_count;
Expand Down Expand Up @@ -444,10 +444,23 @@ void gendesc_init(compile_t* c, reach_type_t* t)

void gendesc_table(compile_t* c)
{
uint32_t len = c->reach->next_type_id;
uint32_t object_id_max = c->reach->object_type_count * 2;
uint32_t numeric_id_max = (c->reach->numeric_type_count * 4) + 1;
uint32_t tuple_id_max = (c->reach->tuple_type_count * 4) + 3;

uint32_t len = object_id_max;
if(len < numeric_id_max)
len = numeric_id_max;
if(len < tuple_id_max)
len = tuple_id_max;

size_t size = len * sizeof(LLVMValueRef);
LLVMValueRef* args = (LLVMValueRef*)ponyint_pool_alloc_size(size);

LLVMValueRef null = LLVMConstNull(c->descriptor_ptr);
for(size_t i = 0; i < len; i++)
args[i] = null;

reach_type_t* t;
size_t i = HASHMAP_BEGIN;

Expand Down Expand Up @@ -498,6 +511,11 @@ LLVMValueRef gendesc_fetch(compile_t* c, LLVMValueRef object)
return desc;
}

LLVMValueRef gendesc_typeid(compile_t* c, LLVMValueRef object)
{
return desc_field(c, gendesc_fetch(c, object), DESC_ID);
}

LLVMValueRef gendesc_trace(compile_t* c, LLVMValueRef object)
{
return desc_field(c, gendesc_fetch(c, object), DESC_TRACE);
Expand Down
2 changes: 2 additions & 0 deletions src/libponyc/codegen/gendesc.h
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,8 @@ void gendesc_table(compile_t* c);

LLVMValueRef gendesc_fetch(compile_t* c, LLVMValueRef object);

LLVMValueRef gendesc_typeid(compile_t* c, LLVMValueRef object);

LLVMValueRef gendesc_trace(compile_t* c, LLVMValueRef object);

LLVMValueRef gendesc_dispatch(compile_t* c, LLVMValueRef object);
Expand Down
1 change: 1 addition & 0 deletions src/libponyc/codegen/genexe.c
Original file line number Diff line number Diff line change
Expand Up @@ -386,6 +386,7 @@ bool genexe(compile_t* c, ast_t* program)
fprintf(stderr, " Reachability\n");
reach(c->reach, main_ast, c->str_create, NULL, c->opt);
reach(c->reach, env_ast, c->str__create, NULL, c->opt);
reach_done(c->reach, c->opt);

if(c->opt->limit == PASS_REACH)
return true;
Expand Down
Loading

0 comments on commit 680f389

Please sign in to comment.