Guarantee generated jit code is acyclic
This change lets Blink provide a much stronger guarantee that
asynchronous signals will actually be delivered.
jart committed Mar 13, 2023
1 parent b210154 commit b775993
Showing 9 changed files with 292 additions and 38 deletions.
69 changes: 52 additions & 17 deletions README.md
@@ -10,7 +10,7 @@ This project contains two programs:
different operating systems and hardware architectures. It's designed to
do the same thing as the `qemu-x86_64` command, except that

1. Blink is 213kb in size (112kb with optional features disabled),
1. Blink is 217kb in size (115kb with optional features disabled),
whereas qemu-x86_64 is a 4mb binary.

2. Blink will run your Linux binaries on any POSIX system, whereas
@@ -728,6 +728,8 @@ JIT path formation be visualized. Blink currently only enables the JIT
for programs running in long mode (64-bit) but we may support JITing
16-bit programs in the future.

#### Lockless Hash Table for Generated Functions

Blink stores generated functions by virtual address in a multithreaded
lockless hash table. The hottest operation in the codebase is reading
from this hash table, using a function called `GetJitHook`. Since it'd
@@ -738,22 +740,55 @@ starts off at a reasonable size and grows gradually with the memory
requirements. This design is the primary reason Blink usually uses 40%
less peak resident memory than Qemu.
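
To illustrate the read path, here is a minimal sketch (not Blink's exact code) of a lockless open-addressed lookup over two atomic arrays. The hash constant and probing scheme are hypothetical, and the real `GetJitHook` in `blink/jit.c` additionally rechecks a `keygen` generation counter so that readers racing a table rehash simply retry:

```c
#include <stdatomic.h>
#include <stdint.h>

struct Hooks {
  unsigned n;                  // table capacity, always a power of two
  _Atomic(uintptr_t) *virts;   // keys: guest virtual addresses
  _Atomic(uintptr_t) *funcs;   // values: host code addresses (the real
                               // table packs these into 32-bit encodings)
};

static uintptr_t LookupHook(struct Hooks *h, uint64_t virt, uintptr_t dflt) {
  unsigned spot, step;
  for (spot = (unsigned)(virt * 0x9e3779b1u), step = 0;; spot += ++step) {
    spot &= h->n - 1;
    uintptr_t key = atomic_load_explicit(h->virts + spot, memory_order_acquire);
    if (key == virt)  // found an installed hook for this guest address
      return atomic_load_explicit(h->funcs + spot, memory_order_relaxed);
    if (!key) return dflt;  // hit an empty slot: no hook is installed
  }
}
```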

The amount of JIT memory Blink can use is currently limited by the ISA
displacement. On Aarch64 that's roughly 22mb of JIT memory. Blink is
able to tell when JIT memory is running out (>90% usage) and reacts by
forcing the oldest blocks of generated code to retire. Retired blocks
have all their hash entrypoints removed, but they can't be unmapped
since other threads might still be executing on them. Blink handles this
using a circular queue that reuses the least-recently retired blocks.

Blink doesn't use indirect branches in generated JIT code and instead
favors the use of near branches. Since generated JIT code will usually
call statically compiled functions, we need to ensure that JIT memory
and Blink's executable image both reside at nearby locations in
the host virtual address space. Blink accomplishes that by defining a
gigantic static array that's part of the executable's `.bss` section.
This ensures Blink will always have access to JIT memory, and that it
won't interfere with any system's dynamic linker.
#### Acyclic Generated Code

While JIT paths always end at branching instructions, Blink will still
weave paths together. If a function is generated and the destination
address for its final branch instruction points to a location where
another JIT path has already been installed, then Blink will generate
code that lets the generated function tail call the second one by
jumping into its function body past the prologue. This lets Blink avoid
dropping back into the main interpreter loop as much as possible.

Blink keeps track of the connections between JIT functions, in order to
ensure that JIT'd code remains acyclic. If Blink detects that adding a
new connection would introduce a cycle, it will generate code that drops
back into the main interpreter loop instead. This is important because
Blink only checks for asynchronous signals and other status changes from
the main interpreter loop. By checking for cycles during path
generation, Blink avoids having to hoist that costly status-checking
code into generated JIT code, so the only branches the JIT emits are
the ones specified by the guest executable.
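
The bookkeeping is per 4096-byte guest page: each page object records the intra-page edges between path entry offsets, and a bounded depth-first walk over those edges rejects any new edge that would close a loop. Below is a simplified sketch of that check (offsets are page-relative; the real `IsJitPageCyclic` in `blink/jit.c` further bounds the recursion depth and tracks visited offsets):

```c
#include <stdbool.h>

// Sketch only: returns true if adding the edge src -> dst would create a
// cycle, i.e. if a path from dst back to src already exists in the page's
// edge list. Assumes the existing edge list is itself acyclic.
struct Edge { short src, dst; };

static bool WouldCreateCycle(const struct Edge *edges, int n,
                             short src, short dst) {
  if (dst == src) return true;  // the new edge would be a self-loop
  for (int i = 0; i < n; ++i)
    if (edges[i].src == dst &&  // follow every edge leaving dst
        WouldCreateCycle(edges, n, src, edges[i].dst))
      return true;
  return false;
}
```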

#### Reliable JIT Memory Mapping

Blink uses a small memory model that favors near branch instructions.
Costly operations such as indirect function calls through a generated
Procedure Linkage Table (PLT) are avoided. In order to do that, Blink
needs to ensure JIT memory is mapped at a virtual address that's near
its executable image; otherwise JIT code wouldn't be able to call into
normal code. However, it's not possible to mmap() memory near the
executable in a portable way. Dynamic linkers oftentimes surround a
loaded executable with dynamic library images, and there's no portable
API for probing this region that won't risk destroying system mappings.

The trick Blink uses to solve this problem is to create a gigantic global
static array that's loaded as part of the `.bss` section. This memory is
then protected or remapped appropriately to turn it into memory that's
suitable for JIT.
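
A minimal sketch of that trick, under stated assumptions (the array name, size, and helper are hypothetical, and real code must also honor W^X platforms by toggling between writable and executable protections around code generation):

```c
#include <stdalign.h>
#include <stddef.h>
#include <sys/mman.h>

#define kJitMemorySize (16 * 1024 * 1024)

// Lives in .bss, so the linker places it near the executable's own code,
// guaranteeing that near branches can reach statically compiled functions.
static alignas(4096) unsigned char g_jit_memory[kJitMemorySize];

// Carve out a block of JIT memory; offset and size are assumed to be
// page-aligned. Returns 0 if the kernel refuses the protection change.
static void *AcquireJitBlock(size_t offset, size_t size) {
  void *p = g_jit_memory + offset;
  if (mprotect(p, size, PROT_READ | PROT_WRITE | PROT_EXEC) == -1) return 0;
  return p;
}
```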

#### Avoiding JIT OOM

Since Blink doesn't generate a PLT or use indirect branches, branching
is limited to the maximum displacement of the host instruction set. On
Aarch64 that means Blink has roughly 22mb of JIT memory per process.
Since large programs will oftentimes need more than that, Blink will
begin retiring blocks of generated code once memory reaches a certain
threshold (>90% of blocks used). When a block is retired, all hooks
associated with the pages it touches are removed. However, we can't
guarantee that no thread is still executing on them. Blink solves that
using a freelist queue, where retired blocks are placed at the back so
it takes as long as possible before they're reused.
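
A sketch of that queue discipline, using hypothetical names (the point is only the ordering: a freshly retired block goes to the back, and reuse always takes from the front, i.e. the block retired longest ago):

```c
#include <stddef.h>

struct RetiredBlock { struct RetiredBlock *next; /* block bookkeeping */ };

struct RetireQueue { struct RetiredBlock *head, *tail; };

// Called after a block's hooks are removed: append to the back so every
// previously retired block gets reused before this one does.
static void RetireBlock(struct RetireQueue *q, struct RetiredBlock *b) {
  b->next = NULL;
  if (q->tail) q->tail->next = b; else q->head = b;
  q->tail = b;
}

// Called when new JIT memory is needed: pop the block that has been
// retired the longest, maximizing the odds no thread still runs inside it.
static struct RetiredBlock *ReuseOldestRetired(struct RetireQueue *q) {
  struct RetiredBlock *b = q->head;
  if (b && !(q->head = b->next)) q->tail = NULL;
  return b;
}
```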

### Virtualization

149 changes: 141 additions & 8 deletions blink/jit.c
@@ -234,6 +234,14 @@ static struct JitJump *NewJitJump(void) {
return jj;
}

static struct JitPage *NewJitPage(void) {
struct JitPage *jj;
if ((jj = (struct JitPage *)calloc(1, sizeof(struct JitPage)))) {
dll_init(&jj->elem);
}
return jj;
}

static struct JitBlock *NewJitBlock(void) {
struct JitBlock *jb;
if ((jb = (struct JitBlock *)calloc(1, sizeof(struct JitBlock)))) {
@@ -264,6 +272,11 @@ static void FreeJitJump(struct JitJump *jj) {
free(jj);
}

static void FreeJitPage(struct JitPage *jp) {
free(jp->edges.p);
free(jp);
}

static void FreeJitFreed(struct JitFreed *jf) {
free(jf->data);
free(jf);
@@ -440,6 +453,7 @@ int InitJit(struct Jit *jit, uintptr_t opt_staging_function) {
jit->staging = EncodeJitFunc(opt_staging_function);
unassert(!pthread_mutex_init(&jit->lock, 0));
jit->hooks.n = n = RoundupTwoPow(kJitInitialHooks);
STATISTIC(jit_hash_capacity = MAX(jit_hash_capacity, n));
unassert(virts = (_Atomic(uintptr_t) *)calloc(n, sizeof(*virts)));
unassert(funcs = (_Atomic(int) *)calloc(n, sizeof(*funcs)));
atomic_store_explicit(&jit->hooks.virts, virts, memory_order_relaxed);
@@ -477,6 +491,10 @@ int DestroyJit(struct Jit *jit) {
dll_remove(&jit->jumps, e);
FreeJitJump(JITJUMP_CONTAINER(e));
}
while ((e = dll_first(jit->pages))) {
dll_remove(&jit->pages, e);
FreeJitPage(JITPAGE_CONTAINER(e));
}
UNLOCK(&jit->lock);
unassert(!pthread_mutex_destroy(&jit->lock));
free(jit->hooks.funcs);
@@ -627,6 +645,7 @@ static unsigned RehashJitHooks(struct Jit *jit) {
atomic_store_explicit(&jit->hooks.virts, virts2, memory_order_release);
atomic_store_explicit(&jit->hooks.funcs, funcs2, memory_order_relaxed);
atomic_store_explicit(&jit->hooks.n, n2, memory_order_release);
STATISTIC(jit_hash_capacity = MAX(jit_hash_capacity, n2));
EndUpdate(&jit->keygen, kgen);
// leak old table so failed reads won't segfault from free munmap
RetireJitHeap(jit, virts, n1 * sizeof(*virts));
@@ -635,11 +654,6 @@ static unsigned RehashJitHooks(struct Jit *jit) {
return n2;
}

// @assume jit->lock
static bool IncrementJitHookCount(struct Jit *jit) {
return true;
}

// @assume jit->lock
static bool SetJitHookUnlocked(struct Jit *jit, u64 virt, int cas,
intptr_t funcaddr) {
@@ -695,7 +709,10 @@ static bool SetJitHookUnlocked(struct Jit *jit, u64 virt, int cas,
STATISTIC(++jit_hooks_installed);
}
}
if (!key) ++jit->hooks.i;
if (!key) {
++jit->hooks.i;
STATISTIC(jit_hash_elements = MAX(jit_hash_elements, jit->hooks.i));
}
kgen = BeginUpdate(&jit->keygen);
atomic_store_explicit(virts + spot, virt, memory_order_release);
atomic_store_explicit(funcs + spot, func, memory_order_relaxed);
@@ -728,6 +745,7 @@ uintptr_t GetJitHook(struct Jit *jit, u64 virt, uintptr_t dflt) {
_Atomic(int) *funcs;
_Atomic(uintptr_t) *virts;
unsigned n, kgen, hash, spot, step;
COSTLY_STATISTIC(++jit_hash_lookups);
hash = HASH(virt);
do {
kgen = atomic_load_explicit(&jit->keygen, memory_order_relaxed);
@@ -745,12 +763,58 @@ uintptr_t GetJitHook(struct Jit *jit, u64 virt, uintptr_t dflt) {
if (!key) {
return dflt;
}
STATISTIC(++jit_collisions);
COSTLY_STATISTIC(++jit_hash_collisions);
}
} while (ShallNotPass(kgen, &jit->keygen));
return res;
}

// @assume jit->lock
static struct JitPage *GetJitPage(struct Jit *jit, i64 page) {
bool lru;
struct Dll *e;
struct JitPage *jp;
unassert(!(page & 4095));
lru = false;
for (e = dll_first(jit->pages); e; e = dll_next(jit->pages, e)) {
jp = JITPAGE_CONTAINER(e);
if (jp->page == page) {
if (!lru) {
STATISTIC(++jit_pages_hits_1);
} else {
STATISTIC(++jit_pages_hits_2);
dll_remove(&jit->pages, e);
dll_make_first(&jit->pages, e);
}
return jp;
}
lru = true;
}
return 0;
}

// @assume jit->lock
static bool IsJitPageCyclic(struct JitPage *jp, short visits[kJitDepth],
int depth, short dst) {
int i;
if (depth == kJitDepth) {
return true;
}
for (i = 0; i < depth; ++i) {
if (dst == visits[i]) {
return true;
}
}
visits[depth++] = dst;
for (i = 0; i < jp->edges.i; ++i) {
if (jp->edges.p[i].src == dst &&
IsJitPageCyclic(jp, visits, depth, jp->edges.p[i].dst)) {
return true;
}
}
return false;
}

// @assume jit->lock
static void ResetJitPageBlockStage(struct JitBlock *jb, struct JitStage *js,
i64 page) {
@@ -829,6 +893,15 @@ static void ResetJitPageHooks(struct Jit *jit, i64 virt, i64 end) {
STATISTIC(AVERAGE(jit_page_resets_average_hooks, found));
}

// @assume jit->lock
static void ResetJitPageObject(struct Jit *jit, i64 page) {
struct JitPage *jp;
if ((jp = GetJitPage(jit, page))) {
dll_remove(&jit->pages, &jp->elem);
FreeJitPage(jp);
}
}

// @assume jit->lock
static int ResetJitPageUnlocked(struct Jit *jit, i64 page) {
unsigned gen;
@@ -843,6 +916,7 @@ static int ResetJitPageUnlocked(struct Jit *jit, i64 page) {
gen = BeginUpdate(&jit->pagegen);
ResetJitPageHooks(jit, virt, end);
ResetJitPageBlocks(jit, page);
ResetJitPageObject(jit, page);
EndUpdate(&jit->pagegen, gen);
jit->lastreset = resetcode;
return 0;
@@ -886,6 +960,7 @@ static void ForceJitBlockToRetire(struct Jit *jit) {
page = jb->pages.p[i - 1];
ResetJitPageHooks(jit, page, page + 4096);
ResetJitPageBlocks(jit, page);
ResetJitPageObject(jit, page);
unassert(jb->pages.i < i);
}
break;
@@ -967,7 +1042,7 @@ static bool AppendJitBlockPage(struct JitBlock *jb, i64 virt) {
if (n2 >= 2) {
n2 += n2 >> 1;
} else {
n2 = 16;
n2 = 8;
}
if ((p2 = (i64 *)realloc(p2, n2 * sizeof(*p2)))) {
jb->pages.p = p2;
@@ -1102,6 +1177,7 @@ static void FixupJitJumps(struct Dll *list, uintptr_t addr) {
struct JitJump *jj;
for (e = dll_first(list); e; e = e2) {
STATISTIC(++jumps_applied);
STATISTIC(++path_connected);
e2 = dll_next(list, e);
jj = JITJUMP_CONTAINER(e);
u.q = 0;
@@ -1262,6 +1338,62 @@ bool RecordJitJump(struct JitBlock *jb, u64 virt, int addend) {
return true;
}

static bool RecordJitEdgeImpl(struct Jit *jit, i64 src, i64 dst) {
int n2;
i64 page;
struct JitPage *jp;
struct JitPageEdge *p2;
struct JitPageEdge edge;
short visits[kJitDepth];
unassert((src & -4096) == (dst & -4096));
// get object associated with this memory page
page = src & -4096;
if (!(jp = GetJitPage(jit, page))) {
if (!(jp = NewJitPage())) return false;
dll_make_first(&jit->pages, &jp->elem);
jp->page = page;
}
// determine if adding this edge would introduce a cycle
visits[0] = src & 4095;
if (IsJitPageCyclic(jp, visits, 1, dst & 4095)) {
STATISTIC(++jit_cycles_avoided);
return false;
}
// no cycles detected, so record the new edge in our dag
if (jp->edges.i == jp->edges.n) {
p2 = jp->edges.p;
n2 = jp->edges.n;
if (n2 > 1) {
n2 += n2 >> 1;
} else {
n2 = 8;
}
if ((p2 = (struct JitPageEdge *)realloc(p2, n2 * sizeof(*p2)))) {
jp->edges.p = p2;
jp->edges.n = n2;
} else {
return false;
}
}
edge.src = src & 4095;
edge.dst = dst & 4095;
jp->edges.p[jp->edges.i++] = edge;
STATISTIC(jit_max_edges_per_page = MAX(jit_max_edges_per_page, jp->edges.i));
unassert(src != dst);
return true;
}

/**
* Records JIT edge or returns false if it'd create a cycle.
*/
bool RecordJitEdge(struct Jit *jit, i64 src, i64 dst) {
bool res;
LOCK(&jit->lock);
res = RecordJitEdgeImpl(jit, src, dst);
UNLOCK(&jit->lock);
return res;
}

static void DiscardGeneratedJitCode(struct JitBlock *jb) {
jb->index = jb->start;
}
@@ -1371,6 +1503,7 @@ bool FinishJit(struct Jit *jit, struct JitBlock *jb) {
*/
bool AbandonJit(struct Jit *jit, struct JitBlock *jb) {
JIT_LOGF("abandoning jit path in block %p at %#" PRIx64, jb, jb->virt);
STATISTIC(++path_abandoned);
AbandonJitJumps(jb);
AbandonJitHook(jit, jb);
DiscardGeneratedJitCode(jb);