rustgc_paper.ltx

%&rustgc_paper_preamble
\endofdump

\begin{document}

\begin{abstract}
\noindent Rust is a non-Garbage Collected (GCed) language, but the lack of GC
makes expressing data-structures that require shared ownership awkward,
inefficient, or both. In this paper we explore a new design for, and implementation of, GC in
Rust, called \ourgc. Unlike previous
approaches to GC in Rust, \ourgc maps existing Rust destructors to
finalizers: this makes GC in Rust natural to use but introduces surprising
soundness, performance, and ergonomic problems. \ourgc provides solutions for
each of these problems.
\end{abstract}

\maketitle


\section{Introduction}

\begin{figure}[t]
\lstinputlisting[
  language=Rust,
  firstline=6,
  basicstyle=\footnotesize\ttfamily,
  caption={An \ourgc example, showing \lstinline{Gc<T>}
    and destructors as finalizers. We create a type \lstinline{GcNode} which
    models a graph: it stores an 8 bit integer value and a possibly-null
    reference (via Rust's standard \lstinline{Option} type) to a neighbouring node
    (line 1). We add a normal Rust destructor which \ourgc is able to use as a
    finalizer when \lstinline{GcNode} is used inside \lstinline{Gc<T>} (line 2).
    Inside \lstinline{main} we create the first GCed node in the graph (line 5).
    We use Rust's normal \lstinline{RefCell} type to allow the node to be mutated
    (using the \lstinline{RefCell::borrow\_mut} method which dynamically checks
    for mutation that would undermine Rust's static rules)
    and a cycle created directly back to itself (line 6). We then create a second cyclic graph (lines 7
    and 8), immediately assigning it to the \lstinline{gc1} variable (line 9):
    this copies, rather than moves, the \lstinline{Gc<T>}.
    This cause the first cyclic graph \lstinline{GcNode\{value: 1, ..\}}
    to no longer be reachable, so after forcing a collection (line 10) that node
    can be collected. Its finalizer is then scheduled to be run, causing
    \lstinline{drop 1} to be printed out at a later point; when it has completed the GC
    heap memory can be reclaimed. The print statement outputs \lstinline{2 2} (line
    11).},
  label={fig:first_example}
]{listings/first_example.rs}
\end{figure}

Amongst the ways one can classify programming languages are whether they
are Garbage Collected (GCed) or not: GCed languages enable implicit memory management;
non-GCed languages require explicit memory management (e.g~\lstinline{C}'s \lstinline{malloc} /
\lstinline{free} functions). Rust's use of affine types~\citep[p.~5]{pierce04advanced}
and ownership does not
fit within this classification: it is not GCed but it has implicit memory management.
Most portions of most Rust programs are as
succinct as a GCed equivalent, but ownership is too inflexible to express
\emph{shared ownership} for data-structures that require multiple owners
(e.g.~doubly linked lists).
Workarounds such as reference counting impose an extra burden on the programmer,
make mistakes more likely, and often come with a performance penalty.

In an attempt to avoid such problems, there are now a number of GCs for Rust
(e.g.~\cite{manish15rustgc, coblenz21bronze, gcarena, boa, shifgrethor}). Most
introduce a user-visible type \lstinline{Gc<T>} which takes a value $v$
of type \lstinline{T} and moves $v$ to the `GC heap'. The \lstinline{Gc<T>}
value itself is a wrapper around a pointer to $v$ on the GC heap.
\lstinline{Gc<T>} can be \emph{cloned} (i.e.~duplicated) and
\emph{dereferenced} to a value of type \lstinline{&T} (i.e.~a type-safe pointer) at will by the user. When no
\lstinline{Gc<T>} wrappers pointing to $v$
can be found, indirectly or directly, from the
program's \emph{roots} (e.g.~variables on the stack),
then the GC heap memory for $v$ can be reclaimed.

It has proven hard to find a satisfying design and implementation for a GC for
Rust, as perhaps suggested by the number of attempts to do so.
We identify two fundamental challenges
for GC for Rust: how to give \lstinline{Gc<T>} an idiomatic and complete
API; and how to make \emph{finalizers} (i.e.~the code that is run just before a
value is collected by the GC) safe, performant, and ergonomic. We show that
\emph{conservative} GC (i.e.~treating each reachable machine word as
a potential pointer) is necessary and sufficient to solve the API challenge,
but the finalization challenge is more involved.

Normal Rust code uses \emph{destructors} (i.e.~code which is run just before a
value is reclaimed by Rust's implicit memory management) extensively. Although
finalizers and destructors may seem to be synonyms, existing GCs for Rust
cannot reuse destructors as finalizers:
the latter must be manually implemented for each type that needs it.
Unfortunately, even this is trickier than it appears:
it is not possible to implement a finalizer for
\lstinline{Gc<T>} if \lstinline{T} is an external library; some parts of
destructors are automatically created by the Rust compiler, but
hand-written finalizers must duplicate those parts manually; and users
can accidentally cause a type's finalizer to be run more than once. In
short, finalization in existing GCs for Rust is unpalatable.

GCs for Rust are not alone in requiring manually written finalizers.
In a close cousin to our work,
a GC proposal for C++, the reuse of destructors as finalizers was ruled out due to
seemingly insoluble problems~\cite[p.~32]{boehm09garbage}, which we divide
into four categories:
(1) finalizers are prohibitively slower than destructors;
(2) finalizers can be run prematurely;
(3) running finalizers on the same thread as a paused mutator can cause race conditions and deadlocks;
(4) some safe destructors are not safe finalizers.
All are, at least to some degree, classical GC problems; all are exacerbated
in some way by Rust; and none, with the partial exception of (2), has
existing solutions.

In this paper we show that it is possible to reuse most
Rust destructors as finalizers in a satisfying way. We introduce novel solutions to the
long-standing problems this implies by making use of some of Rust's unusual
static guarantees. We thus gain a better GC for
Rust and solutions to open GC problems. Our solutions, in order, are:
(1) \emph{finalizer elision} statically optimises away finalizers if the
underlying destructor duplicates the GC's work;
(2) \emph{premature finalizer prevention} automatically inserts barriers to prevent
the GC from being `tricked' into collecting values before they are dead;
(3) we run finalizers on a separate thread; and
(4) \emph{finalizer safety analysis}
extends Rust's static analyses to reject programs whose destructors are not
provably safe to be used as finalizers.

These solutions are implemented as part of \ourgc, a new GC for Rust:
an example of its use is shown in \cref{fig:first_example}.
Although \ourgc is not production ready, it has good enough performance
(across a number of benchmarks, its performance is \jake{geomean over whole suite}) and
other polish (e.g.~good quality error messages) that we believe it shows
a plausible path forwards for those who may wish to follow it. Furthermore,
although \ourgc is necessarily tied to Rust, we suspect that most of the
techniques in this paper, particularly those related to finalization, will
generalise to other ownership-based languages.

This paper is divided into four main parts: background (\cref{sec:background});
\ourgc's basic design (\cref{sec:alloy_design}); destructor and finalizer challenges
and solutions (\cref{sec:destructor challenges,sec:elision,sec:premature_finalize_prevention,sec:fsa}); and evaluation
(\cref{sec:evaluation}). The first three parts have the challenge that our work
straddles two areas that seem almost mutually exclusive: GC and Rust. We have
tried to provide sufficient material for readers expert in one of these
areas to gain adequate familiarity with the other, without boring either, but we explicitly encourage
readers to skip material they are already comfortable with.


\section{Background}
\label{sec:background}

\subsection{Does Rust need a GC?}

\begin{figure}[t]
\lstinputlisting[
  language=Rust,
  firstline=5,
  basicstyle=\footnotesize\ttfamily,
  caption={
    A version of~\cref{fig:first_example} using Rust's standard reference
    counting type \lstinline{Rc<T>}. To avoid memory leaks we use \emph{weak}
    references between nodes (line 1). We again create two cyclic graphs (lines
    6--9) using \lstinline{Rc::downgrade} to create weak references (lines 7 and
    0). Since \lstinline{Rc<T>} is not copyable, we must use a manual
    \lstinline{clone} call to have both the \lstinline{rc1} and \lstinline{rc2}
    variables point to the same cyclic graph (line 10). Accessing a neighbour
    node becomes a delicate dance requiring upgrading the weak reference (line 11).
    The need to downgrade \lstinline{Rc<T>} to \lstinline{Weak<T>} and upgrade
    (which may fail, hence the \lstinline{unwrap}) back to \lstinline{Rc<T>}
    creates significant extra complexity relative to~\cref{fig:first_example}: compare
    line 11 in \cref{fig:first_example} to (the much more complex) lines 10-12
    in this \lstinline{Rc} example.
  },
  label={fig:rc_example}
]{listings/rc_example.rs}
\end{figure}

Rust uses affine types and \emph{ownership}
to statically guarantee that: a value has a
single owner (e.g.~a variable); an owner can \emph{move} (i.e.~permanently
transfer the ownership of) a value to another owner; and
when a value's owner goes out of scope, the value's destructor
is run and its backing memory reclaimed. An owner can pass \emph{references} to a value
to other code, subject to the following static restrictions: there can be
multiple immutable references (`\lstinline{&}') to a value or a single
mutable reference (`\lstinline{&mut}'); and references cannot outlast the owner.
These rules allow many Rust programs to be as succinct as their equivalents
in GCed languages. This suggests that the search for a good GC for Rust may be
intellectually stimulating but of little practical value.

However, there are many programs which need to express data structures
which do not fit into the restrictions of affine types and
ownership. These are often described as `cyclic data-structures', though
that sometimes gives the incorrect impression that only data structures
such as doubly linked lists are of interest. In reality, programs as diverse as
interpreters for dynamically typed languages have need to use such types.
In this paper we use the more abstract term `shared ownership', which includes,
but is not limited to, cyclic data-structures.

A common way of working around these limitations
is the reference counting type \lstinline{Rc<T>} in Rust's
standard library. For many data-structures, this is a reasonable
solution, but some forms of shared ownership require
juggling strong and weak counts. This complicates the program
(see~\cref{fig:rc_example}) and makes
it easy for values to live shorter or longer than intended.

Another workaround is to store values in a vector and use
integer indices into that vector. Such indices are morally closer to
machine pointers than normal Rust references: the indices can become
stale, dangle, or may never have been valid in the first place. The programmer
must also manually deal with issues such as detecting unused values,
compaction, and so on. In other words, the programmer
ends up writing a partial GC themselves. A variant of this idea are
\emph{arenas}, which gradually accumulate multiple values but free all of them in one go: values
can no longer be reclaimed too early, though arenas tend to unnecessarily
increase the lifetime of objects.

A type-based approach is
\lstinline{GhostCell}~\cite{yanovski21ghostcell}, which uses \emph{branding} to
statically ensure that only one part of a program can access a
shared ownership data-structure at a time.
However, this implicitly prevents multiple owners (e.g.~in different threads)
from reading or mutating different parts of the structure.

Although it is easily overlooked, some workarounds (e.g.~\lstinline{Rc<T>})
rely on using \emph{unsafe} Rust (i.e.~parts of the language, often involving
pointers, that are not fully statically checked by the compiler). Pragmatically,
we assume that widely used code, even if technically unsafe, has been pored
over sufficiently that it is trustworthy in practise. However,
`new' solutions that a programmer implements using
unsafe Rust are unlikely to immediately reach the same level of trustworthiness.

While we do not believe that every Rust program would be improved by GC, the
variety of workarounds already present in Rust code, and the difficultly
of creating new ones, suggests that a substantial subset might benefit from GC.


\subsection{GC terminology}

GC is a venerable field and has accumulated terminology that can seem
unfamiliar or unintuitive. We mostly use the same terminology
as~\cite{jones23garbage}, the major parts of which we define here.

A program which uses GC is split between the \emph{mutator} (the user's program) and
the \emph{collector} (the GC itself). At any given point in time, a particular thread is either
running as a mutator or a collector. In our context, all threads
run as a collector at least sometimes, with some threads always running as a collector.
Tracing and reclamation is performed during a \emph{collection} phase. Our
collections always \emph{stop-the-world}, where all threads running
mutator code are paused while collection occurs.

A \emph{tracing} GC is one that scans memory looking
for reachable objects from a program's roots: objects that are not reachable
from the roots can then be \emph{reclaimed}. In contrast, a reference counting GC does
not scan memory, and thus cannot free objects that form a cycle. As is common
in most GC literature, henceforth we use `GC' to mean `tracing GC' (and not
reference counting or the like) unless we explicitly say otherwise.

We refer to memory which is allocated via \lstinline{Gc<T>} as being on
the \emph{GC heap}. We use the term `GC value' to refer both to the pointer wrapped in a
\lstinline{Gc<T>} and the underlying value on the GC heap, even though multiple
pointers / wrappers can refer to a single value on the GC heap, unless doing so
would lead to ambiguity.

We use `\ourgc' to refer to the combination of: our extension to the Rust
language; our modifications to the \lstinline{rustc} compiler; and our
integration of the Boehm-Demers-Weiser GC (\boehm) into the runtime of programs
compiled with our modified \lstinline{rustc}.


\section{\ourgc: Design and Implementation}
\label{sec:alloy_design}

In this section we outline \ourgc's basic design and implementation choices --
the rest of the paper then goes into detail on the more advanced aspects.


\subsection{Basic Design}
\label{sec:basic design}

\ourgc provides a \lstinline{Gc<T>} type that exposes an API modelled on the
\lstinline{Rc<T>} type from Rust's standard library, because
\lstinline{Rc<T>}: is conceptually
similar to \lstinline{Gc<T>}; widely used in Rust code, and its API
familiar; and that API reflects long-term experience about what Rust programmers
need.

When a user calls \lstinline{Gc::new(v)}, the value \lstinline{v} is
moved to the GC heap: the \lstinline{Gc<T>} value returned to the user is a
simple wrapper around a pointer to \lstinline{v}'s new address. The same underlying GCed value
may thus have multiple, partly overlapping, references active at any point.
To avoid undermining
Rust's ownership system, this means that dereferencing a \lstinline{Gc<T>} must
produce an immutable (i.e.~`\lstinline{&}') reference to the underlying value.
If the user wishes to mutate the underlying value, they must use other Rust
types that enable \emph{interior mutability} (e.g.~\lstinline{RefCell<T>} or
\lstinline{Mutex<T>}).

One feature that \ourgc explicitly adopts is \lstinline{Rc<T>}'s
ability to be transformed into a raw pointer (\lstinline{into_raw}) and
back (\lstinline{from_raw}). Crucially, transforming a \lstinline{Rc<T>} / \lstinline{Gc<T>}
into a pointer consumes the high-level wrapper: the
resulting pointer is thus not constrained by the wrapper's lifetime.
Though many programmers do not directly
encounter these functions, they are a vital link between Rust's high and
low-level features (e.g.~being used for the C Foreign Function Interface
(FFI) alongside the ability to cast Rust references to pointer types and back).
We believe that a viable GC for Rust must include this same
ability, but doing so has a profound impact because Rust allows raw pointers to be
converted to the integer type \lstinline{usize} and back\footnote{Although
it is outside the scope of this paper, it would
be preferable for Rust to have different types for `data width' and
`address'. Modern C, for example, captures this difference with the \lstinline{size_t} and
\lstinline{uintptr_t} types respectively. Rust now has a provenance lint to
nudge users in this general direction, but the \lstinline{as}
keyword still allows arbitrary conversions.}.

\label{conservative_gc}
Having acknowledged that pointers can be disguised as integers, it is then
inevitable that \ourgc must be a conservative GC: if a machine word's integer
value, when considered as a pointer, falls within a GCed block of memory,
then that block itself is considered reachable (and is transitively scanned).
Since a conservative GC cannot know if a word is really a pointer, or just happens to be a sequence of
bits that look like a valid pointer, this over-approximates the
\emph{live set} (i.e.~the blocks that the GC will not reclaim). Typically
the false detection rate is too low to be a worry (see e.g.~a Java study which measures
it at under 0.01\% of live objects~\cite{shahriyar14fast}).

Conservative GC occupies a grey zone in programming language semantics: in most
languages, and most compiler's internal semantics, conservative GC is, formally
speaking, unsound; and furthermore some languages (including Rust) allow
arbitrary `bit fiddling' on pointers that temporarily
obscures the address they are referring to. Despite this, conservative GC is widely used,
including in the two most widespread web browsers: Chrome uses it in its Blink
rendering engine~\citep{ager13oilpan} and Safari uses it in its JavaScript VM
JavaScriptCore~\citep{pizlo17riptide}. Even in 2024, we lack good alternatives
to conservative GC: there is no cross-platform API for precise GC; and while
some compilers such as LLVM provide some support for GC
features~\cite{llvm14statepoints}, we have found them incomplete and buggy.
Despite the potential soundness worries, conservative GC thus remains a widely
used technique.

\label{gc_is_copyable}
Conservative GC enables \ourgc to make a useful ergonomic improvement over
most other GCs for Rust whose \lstinline{Gc<T>} is only \emph{cloneable}. Such types can be duplicated, but doing
so requires executing arbitrary user code. To make the possible run-time cost of this clear, Rust has
no direct syntax for cloning: users must explicitly call \lstinline{Rc::clone(&v)}
to duplicate a value \lstinline{v}. In contrast, since \ourgc's \lstinline{Gc<T>} is just a wrapper around a pointer it
is not just cloneable but also \emph{copyable}: duplication only requires copying
bytes (i.e.~no arbitrary user code need be executed). Copying is implied by assignment
(i.e. \lstinline{w = v}),
reducing the need for a function call that cloning requires entirely\footnote{The lengthier
syntax \lstinline{y = Gc::clone(&v)} is available, since every copyable type is
also cloneable.}. This is not just a syntactic convenience but also reflects an underlying
semantic difference: duplicating a \lstinline{Gc<T>} in \ourgc is is a cheaper and simpler operation
than most other GCs for Rust which which tend to rely in part on reference counting.


\subsection{Basic Implementation}

The most visible aspect of \ourgc is its fork, and extension of, the standard
Rust compiler \rustc. We forked \rustc~\rustcversion and have
added or changed approximately 3,150 \laurie{needs updating? also split apart the LoC for tests} Lines of Code (LoC). Users must opt-in
to GC by enabling the allocator and assigning it to an arbitrarily-named variable:

\begin{lstlisting}[language=Rust, numbers=none, basicstyle=\footnotesize\ttfamily]
#![feature(gc)] use std::gc::GcAllocator;
#[global_allocator] static A: GcAllocator = GcAllocator
\end{lstlisting}

\ourgc uses \boehm as the underlying conservative GC, because it is the
most widely ported conservative GC we know of. Although we adapted \boehm
for \ourgc, we expect that our techniques will apply to other conservative GCs with
relative ease.

We made the following changes to \boehm. First, we disabled \boehm's parallel collector
because, for reasons we don't fully understand, it worsens \ourgc's
performance. Second, \boehm cannot scan pointers stored in thread locals
because these are platform dependent. Fortunately, \rustc uses LLVM's
thread local storage implementation, which stores such pointers in the
\lstinline{PT_TLS} segment of the ELF binary: we modified \boehm to scan
this ELF segment during each collection. Third,
\boehm dynamically intercepts thread creation calls so that it can
then can scan their stacks, but (for bootstrapping
reasons) is unable to do so in our context: we explicitly changed \ourgc
to register new threads with \boehm.

We made one conceptually larger -- though only 60LoC -- change to \boehm,
making it run all finalizers on a
separate thread (see~\cref{sec:general_challenges,sec:fsa_finalizer_thread} for
motivation). We lazily create a thread dedicated to this once the collector has
first identified objects that need finalizing. The finalizer thread then runs
until the end of the program waiting for further work;
the collector notifies the finalizer thread of additional work via a condition
variable.


\section{Destructors and Finalizers}
\label{sec:destructor challenges}

In many GCed languages, `destructor' and `finalizer' are used as synonyms, as
both terms refer to code run when a value's lifetime has ended. In existing GCs
for Rust, these two terms refer to different hierarchies of code (i.e.~destructors
and finalizers are fundamentally different). As we will see later, in \ourgc, finalizers can
often be thought of as a subset of destructors. In this section we pick apart
these differences, before describing the challenges of using destructors as
finalizers.

When a value in Rust is \emph{dropped} (i.e.~the value's owner went out of lexical
scope) its destructor is automatically run. Rust destructors are formed of two
parts, run in the following order: a user-defined \emph{drop method}; and
automatically inserted \emph{drop glue}. Drop methods are optional; users
can provide one for a type by implementing the \lstinline{Drop} trait's \lstinline{drop}
method. Drop glue recursively calls destructors of contained types (e.g.~fields
in a \lstinline{struct}). Although it is common usage to conflate `destructor' in
Rust with drop methods, drop glue is an integral part of a Rust destructor:
we therefore use `destructor' as the umbrella term for both drop methods and drop glue.

Rust's destructors enable a style of programming that originated in C++ called RAII (Resource
Acquisition Is Initialization)~\cite[Section~14.4]{stroustrup97c++}: when a
value is dropped, the underlying resources it possesses (e.g.~file handles or heap memory)
are released. Drop methods are not only used frequently, but programmers regularly
implement their own drop methods.

Finalizers are an important concept for most non-Rust GCs, subsuming the notion of destructors.
However, existing GCs for Rust are forced to have separate notions of destructors and finalizers.
Where the former have the \lstinline{Drop} trait, the later typically have
a \lstinline{Finalize} trait. If a user type needs to be finalized then
the user must provide an implementation of the \lstinline{Finalize} trait for that type.
However, doing so introduces a number of problems: (1) external libraries are
unlikely to provide finalizers and so placing their types in a \lstinline{Gc<T>}
tends to break RAII expectations; (2) Rust's \emph{orphan
rule}~\cite[Section~6.12]{rustlangref} prevents one implementing traits for
types defined in external libraries (i.e.~unless a library's types were
designed to support \lstinline{Gc<T>}, those types cannot be directly GCed);
(3) one cannot automatically replicate drop glue for finalizers; and (4) one
cannot replicate \rustc's refusal to allow calls to the equivalent of
\lstinline{Drop::drop}.

Programmers can work around problems \#1 and \#2 in various ways. For example,
they can wrap external library types in \emph{newtypes} (zero-cost wrappers)
and implement finalizers on those instead~\cite[Section~19.3]{klabnik18rust}.
Doing so is tedious but not conceptually difficult.

Problem \#3 has partial solutions: for example, ~\cite{manish15rustgc} uses the
\lstinline{Trace} macro to generate \emph{finalizer glue} (the finalizer equivalent of drop glue) for
\lstinline{struct} fields. This runs into an unsolvable variant of problem \#2:
types in external libraries will not implement this trait and cannot be
recursively scanned for finalizer glue.

Problem \#4 is impossible to solve in Rust as-is. One cannot define a function
that can never be called --- what use would such a function have? It might seem
tempting to have the \lstinline{finalize} method take ownership of the value,
but \lstinline{Drop::drop} does not do so because that would not allow drop
glue to be run afterwards.


\subsection{General Challenges When Using Destructors as Finalizers}
\label{sec:general_challenges}

We have stated as our aim that \ourgc should use destructors as finalizers.
Above we explained some Rust-specific challenges --- but there are several
non-Rust-specific challenges too! Fundamentally, finalizers and destructors
have different, and sometimes incompatible, properties. The best
guide to these differences, and the resulting problems, is~\cite{boehm03destructors},
supplemented by later work by some of the same authors on support
for GC in the C++ specification~\cite{boehm09garbage}\footnote{These features
were added to the C+11 specification, but do not seem to have been implemented by
compilers. C++23 removed these features.}.

An obvious difference between destructors and finalizers is when both
are run. Where C++ and Rust define
precisely when a destructor will be run\footnote{Mostly. Rust's `temporary
lifetime extension' delays destruction, but for how long is currently
unspecified.}, finalizers run at an unspecified point in time. Although
we often assume this happens after the last reference to a GCed value is lost,
finalizers can run \emph{prematurely}, that is before the equivalent
destructor~\cite[section~3.4]{boehm03destructors}.

A less obvious difference relates to where destructors and finalizers are run.
Destructors run in the same thread as the last owner of a value.
However, running finalizers in the same thread as the last owner of the value
can lead to race conditions and deadlocks if a finalize tries to access a resource that the mutator
expects to have exclusive access too~\cite[section~3.3]{boehm03destructors}.
When such problems affect destructors in normal Rust code, it is the clear result of programmer error, since they should
have taken into account the predictable execution point of destructors. However, since
finalizers have no such predictable execution point, there is no way for finalizers
to safely access shared resources if they are run on the same thread.
The only way to avoid this to run
finalizers on a non-mutator thread: however, not all Rust types / destructors
are safe to run on another thread.

Finally, finalization and cycles are not happy bedfellows: finalizers
can reference other GCed values that are partly, or wholly, `finalized' and may
even have had their backing memory reused. A related, but
distinct, problem is the ability of finalizers to \emph{resurrect} values by
copying the reference passed to the finalizer and storing it somewhere.


\subsection{The Challenge of Finalizers for \ourgc}

At this point we hope to have convinced the reader of two general points: a
viable GC for Rust needs to be able to use existing destructors as finalizers
whenever possible; and that finalizers, even in existing GCs, cause
various problems.

Over time, finalizers have come to be viewed with increasing suspicion. Java,
for example, has deprecated, and intends eventually removing, per-type
finalizers: instead it has introduced deliberately less flexible per-object `cleaners', whose API
prevents problems such as object resurrection~\cite{goetz21deprecated}. It
is important to differentiate such mechanisms from the \lstinline{Finalize} trait that many existing GCs for
Rust force users to manually implement: cleaners impose restrictions
to make finalizers less problematic where existing \lstinline{Finalize} traits
mostly do not.

Our desire that \ourgc should use existing Rust destructors as finalizers whenever
possible may seem out of reach. As alluded to above, in a proposal for GC
for C++, the desire was ruled out as the
problems were thought insoluble~\cite[p.~32]{boehm09garbage}. We break these
problems down into four:
(1) finalizers are prohibitively slower than destructors;
(2) finalizers can be run prematurely;
(3) running finalizers on the same thread as a paused mutator can cause race conditions and deadlocks;
(4) some safe destructors are not safe finalizers.

Fortunately for us, Rust's unusual static guarantees, suitably expanded by
\ourgc, allow us to address each problem in novel, satisfying, ways. In the following
section, we tackle these problems in the order above, noting that we tackle problems
\#1 and \#2 separately, and \#3 and \#4 together.


\section{Finalizer Elision}
\label{sec:elision}

Finalizers are much slower to run than destructors --- running every Rust destructor
as a GC finalizer in \ourgc causes a significant slowdown \laurie{this is a case where a `worse of' might make sense, and i think it's 4x or similar?}
in our benchmark suite. In this section we show how to sidestep much of this overhead.

A variety of factors contribute to the finalizer performance overhead, including:
a queue of finalizers
must be maintained, whereas destructors can be run immediately; and finalizers run
some time after the last access of a value making cache misses more likely.
Most of these factors are inherent to any GC and
our experience of using and working on \boehm -- a mature, widely used GC -- does
not suggest that it is missing optimisations which would overcome all of this overhead.

Instead, whenever possible, \ourgc \emph{elides} finalizers so that they do not need to be run at all.
We are able to do this because many Rust destructors
only do work that a GC would do anyway and have no meaningful effect when run as finalizers.

Consider the standard Rust type \lstinline{Box<T>} which heap allocates space for a value;
when a \lstinline{Box<T>} value is dropped, the heap allocation will be freed.
We can then make two observations. First,
\lstinline{Box<T>}'s drop method solely consists of a \lstinline{deallocate}
call. Second, while we informally say
that \lstinline{Box<T>} allocates on the `heap' and \lstinline{Gc<T>} allocates
on the `GC heap', all allocations in \ourgc are made through \boehm and stored in
a single heap.

When used as a finalizer,
\lstinline{Box<T>}'s drop method is thus unneeded, as the underlying memory will
naturally be freed by \boehm anyway\footnote{Indeed, since the drop method only
calls \lstinline{deallocate}, any resulting finalizer would cause the underlying
allocation to live longer than if the finalizer was not present!}
This means that there is no need to run a finalizer for a type such as
\lstinline{Gc<Box<u8>>} and we can statically elide it. However,
just because a `top-level''s drop method might be unnecessary does not
mean that transitively reachable drop methods via drop glue are unnecessary. For example,
\lstinline{Gc<Box<Arc<u8>>} (where \lstinline{Arc<T>} is the thread-safe
version of \lstinline{Rc<T>}) requires a finalizer because
\lstinline{Arc<T>}'s drop method needs to decrement a reference count.


\subsection{Implementing finalizer elision}

\begin{algorithm}[t]
\caption{Finalizer Elision}
\label{alg:elision}
\begin{algorithmic}
\Function{RequiresFinalizer}{$T$}
    \If {\Call{Impls}{$T$, $Drop$} \AND \NOT \Call{Impls}{$T$, \textit{DropMethodFinalizerElidable}}}
        \State \Return{true}
    \EndIf
  \ForEach {\textit{field} $\in T$}
    \If{\Call{RequiresFinalizer}{\textit{field}}}
        \State \Return{true}
    \EndIf
\EndFor
\State \Return{false}
\EndFunction
\end{algorithmic}
\end{algorithm}

\label{needs_finalizer_intrinsic}
Finalizer elision statically determines which type's destructors do
not require corresponding finalizers and elides them. It does so conservatively,
and deals correctly with drop glue.

\label{dropmethodfinalizerelidable}
The high-level \cref{alg:elision} says that any type which implements
the \lstinline{Drop} trait requires finalization unless it also implements the
new \lstinline{DropMethodFinalizerElidable} \emph{marker trait} (i.e.~a
trait without methods). This trait can be
used by a programmer to signify that a type's drop method need
not be called if the type is placed inside a \lstinline{Gc<T>}. The `Drop'
part of the trait name is deliberate (i.e.~it is not a simplification of
`destructor'): it allows the programmer to reason about a type locally.
If the type has a transitively reachable field
whose type implements the \lstinline{Drop} trait but not the
\lstinline{DropMethodFinalizerElidable} trait, then then the top-level type
still requires finalization. This not only checks drop glue, but allows the
programmer to reason about type parameters without knowing
the concrete types passed to those parameters.

Even though neither normal Rust destructors or \ourgc finalizers are guaranteed
to run, a program whose destructors or finalizers never run would probably not
be usable (leaking resources such as memory, deadlocking, and so on). We
therefore make \lstinline{DropMethodFinalizerElidable} an unsafe
trait, because implementing it inappropriately is likely to lead to undesired
-- though not incorrect! -- behaviour at run-time. To implement it one therefore
needs to opt in to unsafe Rust:

\begin{lstrustsmall}
unsafe impl<T> DropMethodFinalizerElidable for Box<T> {}
\end{lstrustsmall}

\ourgc modifies the standard Rust library to implement
\lstinline{DropMethodFinalizerElidable} on the following types: \lstinline{Box<T>},
\lstinline{Vec<T>}, \lstinline{RawVec<T>}, \lstinline{VecDeque<T>},
\lstinline{LinkedList<T>}, \lstinline{BTreeMap<K, V>}, \lstinline{BTreeSet<T>},
\lstinline{HashMap<K, V>}, \lstinline{HashSet<T>},
\lstinline{RawTable<K, V>}\footnote{This is contained in the separate \lstinline{hashbrown} crate,
which is then included in Rust's standard library.}, and \lstinline{BinaryHeap<T>}. Fortunately,
not only are these types' drop methods compatible with \lstinline{DropMethodFinalizerElidable},
but they are extensively used in real Rust code: they enable significant numbers of
finalizers to be elided.

\begin{figure}[t]
  \begin{lstlisting}[
    language=Rust,
    numbers=none,
    basicstyle=\footnotesize\ttfamily,
    label={listing:elision_in_rustc},
    caption={
      A simplified view of how finalizers are elided inside \ourgc. The new compiler intrinsic
      \lstinline{needs_finalizer} returns true if a finalizer is required for a
      type. The \lstinline{Gc<T>} type uses this intrinsic to ensure that the
      value is registered as requiring a finalizer. Because \lstinline{needs_finalizer}
      is a \lstinline{const} function, with optimisations turned on, it is inlined
      and the branch optimised away. In other words, the seemingly dynamic, branching code
      in \lstinline{Gc::new} turns into static, branchless code.
    }]
impl<T> Gc<T> {
  pub fn new(value: T) -> Self {
    if needs_finalizer::<T>() { Gc<T>::new_with_finalizer(value) }
    else { Gc<T>::new_without_finalizer(value) }
    ...
  }
}
\end{lstlisting}
\end{figure}

\cref{listing:elision_in_rustc} shows how we use \ourgc's \texttt{const} compiler intrinsinc
\lstinline{needs_finalizer} to turn \cref{alg:elision} into reality.
\lstinline{Gc::new} uses this intrinsic to decide
whether a finalizer must be registered or not. Although this looks like
a dynamic check, in fully optimised builds it turns into a static
check: \lstinline{needs_finalizer} is
evaluated at compile-time and its result can be inlined into \lstinline{Gc::new};
this then allows the associated conditional to be removed too. In other words --
compiler optimisations allowing -- the `does this specific type require a
finalizer?' checks have no run-time cost at all.


\section{Premature Finalizer Prevention}
\label{sec:premature_finalize_prevention}

\begin{figure}[tb]
\begin{lstlisting}[
  language=Rust,
  basicstyle=\footnotesize\ttfamily,
  caption={An example of possible premature
    finalization. We create a new struct \lstinline{S}
    (line 1) with a drop method that sets the wrapped integer to zero (line 2). In the
    main method, we move an instance of the struct into a
    \lstinline{Box<T>}, which we then move into a \lstinline{Gc<T>} (line 4). We
    obtain a Rust reference to the inner integer (line 5), which at
    run-time will be a pointer to the \lstinline{Box<T>}. At this
    point, the compiler can determine that the \lstinline{Gc<T>} is no longer
    used and overwrite \lstinline{root}'s pointer (which may be in a register). If
    a collection then occurs, a finalizer can run \lstinline{S}'s drop method,
    causing the program to print `0' instead of the expected `1' (line 7).},
    label={fig:premature_finalization}]
struct S { value: u8 }
impl Drop for S { fn drop(&mut self) { self.value = 0; } }

fn main()  {
  let root = Gc::new(Box::new(S{ value: 1 }));
  let inner: &u8 = &**root.value;
  force_gc();
  println!("{}", *inner);
}
\end{lstlisting}
\end{figure}

Most of us assume that finalizers are always run later than the
equivalent destructor would have run, but they can sometimes run
before~\cite[section~3.4]{boehm03destructors}, undermining soundness.
Such premature finalization is also possible in \ourgc as described thus far
(see~\cref{fig:premature_finalization}). In this section we
show how to prevent, and then optimise, premature finalization.

There are two aspects to premature finalization. First, language
specifications often do not define, or do not precisely define, when the earliest point that a value can
be finalized is. While this means that, formally, there is no `premature' finalization,
it seems unlikely that language designers anticipated some of the resulting
implementation surprises (see e.g.~this example in
Java~\cite{shipilev20local}). Second, most compiler optimisations are
`GC unaware', so optimisations such as scalar
replacement can change the point in a program when GCed values appear to be
finalizable.

In our context, it is trivial to define premature finalization as a (dynamic) finalizer
for a \lstinline{Gc<T>} value running before the (static) \lstinline{Gc<T>} owner
has gone out of scope. Similar to the high-level proposal mooted
in~\cite[Solution~1]{boehm07optimization}, we must ensure that the dynamic
lifetime of a reference derived from a \lstinline{Gc<T>} matches or
exceeds the lifetime of the \lstinline{Gc<T>} itself.

Our solution relies on adjusting \lstinline{Gc<T>}'s drop method to \emph{keep
alive} a GCed value for at least the static lifetime of the \lstinline{Gc<T>} itself. In
other words, we ensure that the conservative GC will always see a pointer
to a GCed value while the corresponding \lstinline{Gc<T>} is in-scope.

However, there is a major problem to overcome: copyable types such as
\lstinline{Gc<T>} are forbidden from having destructors. The fundamental
challenge we have to solve is that each copied value will have a destructor
called on it, which has the potential for any shared underlying value to be destructed
more than once. \ourgc explicitly allows \lstinline{Gc<T>} -- but no other
copyable type -- to have a destructor, but to ensure it doesn't cause surprises
in the face of arbitrary numbers of copies, the destructor must be idempotent.
Our task is made easier because \lstinline{Gc<T>} naturally has no drop glue from
Rust's perspective: \lstinline{Gc<T>} consists of a field with a pointer type,
and such types are opaque from a destruction perspective.

We therefore only need to make sure that \lstinline{Gc<T>}'s drop method
is idempotent. Fortunately, this is sufficient for our purposes: we want the drop
method to inhibit finalization but that does not require run-time side effects.
To achieve this, we use a memory fence in the form of a keep-alive
function. Simplifying somewhat, memory fences can be thought of as having both
compile-time and run-time effects:
they prevent compilers from reordering computations around a given address;
and the manner in which this is done also also ensures that CPUs
cannot reorder computations relevant to that address.
Keep-alive functions are highly platform (compiler, operating
system, and CPU) specific: we use \boehm's \lstinline{GC_reachable_here}
function\footnote{This is actually a macro, and our fork of \boehm provides a
function \lstinline{GC_keep_alive} to expose the macro to Rust.}
which abstracts over these details. \lstinline{Gc<T>}'s
implementation of the \lstinline{Drop} trait therefore looks as follows:

\begin{lstlisting}[language=Rust, basicstyle=\footnotesize\ttfamily]
impl<T: ?Sized> Drop for Gc<T> {
   fn drop(&mut self) { unsafe { bdwgc::GC_keep_alive(self as *mut u8) }; }
}
\end{lstlisting}


\subsection{Optimising premature finalizer prevention}
\label{sec:optimising_prem}

The drop method we add to \lstinline{Gc<T>} fully prevents premature
finalization. It also naturally solves a performance problem with the suggested solution
for C++ in~\cite[Solution~1]{boehm07optimization}, which requires keeping alive
all pointers, no matter their type, for their full scope. By definition, our
solution only keeps alive \lstinline{Gc<T>} values: the compiler is free to
optimise values of other types as it so wishes. However, our approach still
imposes a performance overhead on code that uses
\lstinline{Gc<T>}: in the worst case in our benchmark suite, the overhead is
\somrsastbarriers{bnaive}{worstperf}{percent}.

Fortunately, we can optimise premature finalizer prevention further by
piggy-backing on finalizer elision: if a finalizer can be
elided, there is no finalizer to be called prematurely, and no need for
such values to be kept alive. Intuitively, we want to avoid generating drop methods
for \lstinline{Gc<T>} types that do not require finalization,
but this is difficult to do directly inside \rustc. Instead,
we suppress calls to the drop methods of such types: the two approaches
are functionally equivalent, though ours
does put an extra burden on dead-code elimination in the compiler tool-chain.

We add a new pass
\lstinline{RemoveElidableDrops} to
\rustc's Mid-Level Intermediate Representation (MIR) processing. MIR is best
thought of as the main IR inside \rustc: it contains the complete set of
functions in the program, where each function consists of a sequence of basic
blocks. Simplifying somewhat, function and drop method calls are represented as
different kinds of \emph{terminators} on basic blocks. Terminators
reference both a callee and a successor basic block.

The \lstinline{remove_elidable_drops} pass iterates over a program's MIR,
identifies drop method terminators which reference elidable finalizers (using
the \lstinline{needs_finalizer} intrinsic from
\cref{needs_finalizer_intrinsic}), and turns them into `goto' terminators
to the successor basic basic block. \cref{alg:barrier_removal} in the appendix
gives a more formal version of this algorithm.


\section{Finalizer Safety Analysis}
\label{sec:fsa}

In this section we address two high-level problems: running finalizers on the same thread
as a paused mutator can cause race conditions and deadlocks; and some safe destructors are not
safe finalizers. Addressing the former problem is conceptually simple --
finalizers must be run on a separate thread -- but we must ensure that doing so
is sound. We therefore consider this a specific instance of the latter problem,
treating both equally in this section.

In this section we introduce Finalizer Safety Analysis (FSA), which prevents
a type \lstinline{T}'s destructor being used as a finalizer if it is not provably
safe to do so.

\jake{Now that FSA steps through function calls, maybe when explaining it we
should say somewhere that really it's a form of abstract interpretation. We're
creating a sound approximation over the CFG (MIR) of a Rust program to reason
about finalization properties without actually executing it. It's probably then
obvious to a PL audience that this must be imprecise in order to be decidable.}

\subsection{Rust references}
\label{sec:fsa_rust_references}

\lstinline{Gc<T>} can store, directly or indirectly, normal Rust
references (i.e.~\lstinline{&} and \lstinline{&mut}), at which point it is
subject to Rust's normal borrow checker rules and cannot outlive the reference.
However, finalizers implicitly extend the lifetime of a GCed value,
including any stored reference: accessing a reference in a finalizer could
undermine Rust's borrow checking rules.

The simplest way of avoiding this problem would be to forbid \lstinline{Gc<T>}
from storing, directly or indirectly, references. This might seem to be
no great loss: storing references in a \lstinline{Gc<T>} largely nullifies
the `GCness' of \lstinline{Gc<T>}. However, we found the result hard to use,
making even simple tasks such as gradually refactoring existing code
to use \lstinline{Gc<T>} painful.

A moderate relaxation -- indeed, how we first designed this -- is to recognise
that only types that need a finalizer can possibly have problems with
references, and to forbid such types from storing references in
\lstinline{Gc<T>}. For example, if there is no drop method for
\lstinline|struct S {x: &u8}| then its destructor is safe to use as a
finalizer. We can relax matters further by recognising that \lstinline{S} could
safely have a drop method provided it uses no references at all. However,
these relaxations still proved frustrating to use with existing code.

The eventual rule we alighted upon for FSA is to allow a destructor for a type
\lstinline{T} to be used as a finalizer provided the destructor's drop methods
do not obtain references derived from \lstinline{T}'s fields (including fields
reachable from its attributes). Using Rust's terminology, we
forbid \emph{projections} (which include a struct's fields, indexes into a
vector, and so on) to reference types. Any other references that are created
in a drop method are by definition safe to use, as they either exist only
for the duration of the drop method (references to variables on the stack)
or will exist for the remainder of the program (references to global variables).

Forbidding all projections in this way over-approximates the safe set of
destructors. For example, if a drop method creates a new
value and tries to obtain a reference to a field in that new value, then FSA will
raise an error, even though that reference can only exist for the duration of
the drop method. It may be possible to relax our rule further, but we found that it becomes
rapidly harder to define and article what is safe, and to implement the resulting design in \rustc.

\jake{FSA has changed quite a bit since the last write-up in this section, below I will detail the key differences from before}

\subsection{New FSA rules}

\subsubsection{Bypassing FSA}

The \lstinline{FinalizerSafe} auto-trait no longer exists. This means there is
now no user-implementable auto-trait that can be implemented on types to bypass
FSA. The only way to now bypass FSA is to wrap a type inside
\lstinline{FinalizerUnchecked}.

The use of \lstinline{FinalizerUnchecked} works at both the top level, and
indirectly when used in a field. For example, the following would be allowed in
\ourgc:

\begin{lstlisting}[language=Rust, numbers=none]
struct S { x: *const u8 } // not Send + Sync because of the raw ptr
impl Drop for S { fn drop(&mut self) { let _ = *self.x }; }

Gc::new(unsafe { FinalizerUnchecked::new(S)});
\end{lstlisting}

\lstinline{S} is not safe to use in a finalizer because its drop method
dereferences the non-Send non-Sync raw pointer field \lstinline{x}. However, by
passing a wrapped \lstinline{FinalizerUnchecked} value of \lstinline{S} to
\lstinline{Gc::new} we have explicitly disabled FSA for this value. Note that
this is an unsafe operation because it could allow unsoundness.

This also works indirectly. For example:

\begin{lstlisting}[language=Rust, numbers=none]
struct T { y: FinalizerUnchecked<S> }

Gc::new(T { y: unsafe { FinalizerUnchecked::new(S)} });
\end{lstlisting}

Here, \lstinline{FinalizerUnchecked} is used to prevent FSA from checking the
drop method of \lstinline{S} only. In the above example, \lstinline{T}'s
finalizer will be allowed. However, if \lstinline{T} had an unsafe drop method.
FSA would still throw an error. This is useful for making sure that where FSA
is too conservative, it need only be disabled in the places it needs to be, and
other errors can't accidentally sneak through.

\lstinline{FinalizeUnchecked} has another benefit over the old approach of
implementing \lstinline{FinalizerSafe} for a specific type: it only affects a
particular instance of a type. In other words, implementing
\lstinline{FinalizerSafe} (or indeed \lstinline{Send + Sync}) for types so that
they can be finalized in GC means that users must be sure all uses of that type
are safe. I personally found this very difficult to be confident with --
especially over larger codebases (i.e. alacritty).

\subsubsection{FSA Entry-points}

Previously, we hard-coded \lstinline{Gc::new} to be the ``entry-point'' which
FSA would use to check a type. This was fine at the time, because
\lstinline{Gc::new<T>} was the only safe way to construct a Gcable object of
type \lstinline{T}. However, we have since extended the GC API to support
\lstinline{from/into} conversions, which are used extensively in some of the
\lstinline{Rc}'d programs that we have retrofitted.

We now define an internal rustc-only attribute
(\lstinline{#[rustc_fsa_entry_point]}) to replace this. The first stage of FSA
was to traverse the whole-program MIR looking for \lstinline{Gc::new}
constructors. Instead, we now traverse the whole-program MIR looking for
functions which have the \lstinline{#[rustc_fsa_entry_point]}. Constructor-like
functions inside the GC standard library (such as \lstinline{Gc::new} /
\lstinline{Gc::from}) are annotated with this attribute.

\subsubsection{The ReferenceFree auto-trait is back}

The viral properties of auto-traits are really useful for the kind of analyses
we want to do in FSA. E.g. with Send and Sync, we can know for certain if
\lstinline{T} is thread-safe provided it implements these autotraits -- we
don't have to step into T's fields and also check those because if they don't
the negative impl would bubble up and affect T too.

For FSA, this means that if we construct a \lstinline{Gc<T>} where
\lstinline{T} implements \lstinline{Send} + \lstinline{Sync} we don't need to
check drop methods for thread-safety for any projection into \lstinline{T}'s
fields.

However, thread-safety is not the entire story. We would still need to check
that drop methods do not project into fields of \lstinline{T} which are
reference types. For reasons we have already explained, both regular Rust
references and \lstinline{Gc} references cannot be used inside a finalizer. We
provide negative impls of the \lstinline{ReferenceFree} auto trait for
\lstinline{Gc<T>} and \lstinline{&/&mut T} types. This now lets us short-circuit
checking all projections for finalizer safety in drop methods, provided that
the top-level type implements all three auto-traits (\lstinline{Send + Sync + ReferenceFree}).

\lstinline{ReferenceFree} is a standard library internal type, and, like
\lstinline{Freeze}, must not be implemented by the user for their own types
(though due to the way rustc re-exports standard library items, we can't
actually prevent this, just as they can't prevent the same with
\lstinline{Freeze}).

\subsubsection{Traversing through function calls}


FSA can now automatically look into functions which are called inside drop
methods to also check that their implementation is finaliser safe. This greatly
improves the ergonomics of FSA. First it means the number of FSA errors that we
encounter is dramatically reduced. For alacritty and SWS, we go from 110 errors
(though likely more, as this is the ceiling at which point rustc no longer
emits any more errors) to 2 and 1 respectively. Almost all of those errors were
because a finalizer called something benign (e.g. unwrap()) which FSA could not
look into and therefore had to emit an error. This has a tangible impact on the
safety of a program: in the remaining places where types do need user
annotation, the user can cast a much narrower net, and the risk of them
disabling FSA entirely because of error fatigue is reduced.

Second, when an error is raised, it is more useful to the user. The error will
now pinpoint the exact line of code from within whatever function where the
problem arises, rather than the line of a function call which may or may not be
safe. Of course, some function calls (e.g. function pointers or external calls
where no MIR is available) still cannot be checked, so FSA emits an error at
the call for these. However, these are now issues in their own right which do
need user intervention, rather than spurious errors about potentially unsafe
function calls which could trivially be allowed automatically.

Function traversal in FSA is implemented iteratively using a queue. When
visiting the MIR for a drop method, we check whether the terminator of each
basic block is a call to a new function. If it is, we add that function to a
queue of functions to be checked.

Because functions in Rust can have generic type signatures, we must try to
enqueue the concrete monomorphized function definition of a call. Unfortunately
this is not always possible — for some functions, only the polymorphic function
definition may be available at the point where we perform FSA. In such
instances, we enqueue the polymorphic definition, however, this is more likely
to emit a false-positive FSA error, because \lstinline{Send + Sync + ReferenceFree} auto
trait bounds may not be present on the generic type, but are present on the
concrete type which is actually used. This will never cause a false-negative.

Functions in the queue are checked for the same potential misuses as the drop
method itself and can be thought of as breadth-first-search over a subset of
the call graph reachable from a drop method (though more natural, recursion is
not used in favour of a queue-based iterative approach to prevent stack
overflow and so that we can easily cache previously visited functions. This
caching also prevents non-termination where recursive calls are used in the
user program.)

Sometimes when attempting to resolve the function definition for a particular
callsite, the MIR for the function body is not available. This can happen for a
number of reasons, but most commonly it’s because the call is to either: a) a
function which is dynamically dispatched using Rusts trait object mechanism; b)
an external function which is dynamically linked later on; or c) a rust
internal function (i.e compiler intrinsic). In these cases, FSA must emit an
error because it can’t know that those functions are safe to call from within a
finalizer.

However, for c) we noticed that many safe intrinsics were called from within a
finalizer which FSA couldn’t know about. We thus audited each intrinsic and
annotated those which are safe with a special \lstinline{rustc_fsa_safe_fn} attribute. Some
intrinsics (e.g. transmute) allow users to use values in ways which could be
unsound in a finalizer, and thus are never safe.


\subsection{Cycles and Finalization}
\label{sec:cycles_and_finalization}

\begin{figure}[tb]
\lstinputlisting[
  language=Rust,
  basicstyle=\footnotesize\ttfamily,
  firstline=5,
  caption={An example of the problems that come from mixing cycles
  and finalization. The salient difference from~\cref{fig:first_example} is that
  the drop method sets the value of a field inside \lstinline{nbr} (line 4). Running this
  program on a strong memory model machine is likely to print either \lstinline{2 0} or \lstinline{1 0}
  depending on whether \lstinline{gc1} or \lstinline{gc2} is finalized first.
  The `seemingly expected' output of \lstinline{1 2} or \lstinline{2 1} would
  never be printed: whichever GCed value is finalized first changes what the
  other GCed value sees in its finalizer. As that implies, this example is
  unsound: whichever finalizer runs second leads to undefined behaviour.},
  label={fig:finalizer_cycle}
]{listings/finalization_cycle.rs}
\end{figure}

One of the main motivations for GCs is that they solve problems with cyclic
data structures. However, finalizers can be unsound if they access
state shared within members of a cycle. \cref{fig:finalizer_cycle} shows an example of undefined
behaviour when two GCed values create a cycle and both their
finalizers reference the other GCed value. Whichever order the finalizers are
run in, at least one of the finalizers will see the other GCed value as partly
or wholly `finalized'.

Most languages and systems we are aware of assume that users either don't run into
this problem (finalization cycles are considered rare in GCed
languages~\citep[p.~229]{jones23garbage}) or know how to deal
with it when they do (e.g.~refactoring the types into parts that do and don't
require finalization~\cite[p.~11]{boehm03destructors}). There is no fully
automatic solution to this problem, though some GCs
offer weak references, which allow users to detect when finalization cycles
have been broken, though they still have to deal with the consequences
manually.

We wanted to provide users with static guarantees that their destructors will
not behave unexpectedly when used as finalizers in a cycle. A first attempt at
enforcing such a property might
seem to be that a \lstinline{Gc<T>} cannot have, directly or indirectly, fields of type
\lstinline{Gc<T>}. This would indeed prevent the mistakes we want to catch but
at the cost of not allowing \lstinline{Gc<T>} with a finalizer to specify shared ownership!

Our approach is therefore very different to normal Rust type checks: instead of
checking an entire type, we check only that the type's destructor does not,
directly or indirectly, access a \lstinline{Gc<T>}. This is implicitly a non-strict
subset of the normal type check, allowing GCed types to express cycles so long
as their destructor(s) do not access other GC types.

To make this check easier to implement, we introduce
an \emph{auto trait}~\cite[Section.~11]{rustlangref},
a kind of marker trait that the compiler propagates automatically.
An auto trait \lstinline{A}
will be automatically implemented for a type \lstinline{T} unless
one of the following is true: there is an explicit \emph{negative
implementation} of \lstinline{A} for \texttt{T}; or \texttt{T}
contains a field that is not itself \lstinline{A}. Informally, we
say that a negative implementation of an auto-trait \emph{pollutes} containing
types.

Our new auto trait is \lstinline{FinalizerSafe} with a single
a negative implementation on \lstinline{Gc<T>}:

\begin{lstrustsmall}
impl<T> !FinalizerSafe for Gc<T> {}
\end{lstrustsmall}

This naturally handles transitively reachable code, allowing FSA itself to only
check that a destructor's direct field accesses are \lstinline{FinalizerSafe}.


\subsection{Destructors need to be runnable on a finalizer thread}
\label{sec:fsa_finalizer_thread}

\begin{figure}[t]
\lstinputlisting[
  language=Rust,
  basicstyle=\footnotesize\ttfamily,
  firstline=10,
  caption={How destructors can cause deadlocks when used as
  finalizers. The mutator creates a reference-counted mutex (line 6),
  placing a copy in a \lstinline{GcNode} that immediately goes out of scope
  (line 7). The mutator then acquires the lock (line 8) but before it
  can release the lock a GC cycle occurs and the \lstinline{GcNode}'s
  finalizer run (line 9). If the finalizer is run on the same thread
  as the mutator, then it will fail to grab the lock (line 2) and cause
  a deadlock as the mutator cannot continue and release the lock.},
  label={fig:finalizer_deadlock}
]{listings/finalizer_deadlock.rs}
\end{figure}

Virtually all destructors access state, whether that is in the GCed value they are
attached to, or global state. When used as finalizers, accessing such state can
be problematic, turning sound destructors into unsound finalizers. For
example, acquiring a lock can cause deadlocks if the mutator has already
acquired that lock and the finalizer is run on the same thread as the
mutator~\cite[Section~3.3]{boehm03destructors}. Locks are not the only problem:
data races of other kinds are also possible when destructors are run as
finalizers and can interleave with mutator code~\cite{niko13destructors}.

The solution to these problems is to ensure that finalizers are never
interleaved with mutator code on the same thread. The most general, and
practical, way to achieve this is to run finalizers on a thread(s) that never
runs mutator code. While such a thread(s) could perform other non-mutator work,
it is simplest to think of a thread dedicated solely to running finalizers and
hence we refer to it as the \emph{finalizer thread}.

We must therefore ensure that it is safe to run a type's destructor on the
finalizer thread. A conservative definition is that \lstinline{Gc<T>} is
is safe to use if \lstinline{T} implements both of Rust's existing \lstinline{Send} (is a type safe to be
moved permanently from one thread to another?) and \lstinline{Sync} (is a type
safe to be accessed from more than one thread simultaneously?) auto traits.
Note that it is not sufficient for \lstinline{T} to implement
\lstinline{Send} alone, because a finalizer may access
state (e.g.~stored in an \lstinline{Arc}) shared between a GCed value and
non-GCed value. If the finalizer runs while the non-GCed value is still in
scope, the mutator and finalizer threads may access
the underlying value simultaneously.


\subsubsection{Relaxing the \lstinline{Send + Sync} restriction}

Requiring that finalization is only possible for types that implement both \lstinline{Send} and
\lstinline{Sync} can occasionally be frustrating, in part because more types
implement \lstinline{Send} than \lstinline{Sync}. Experimentally, FSA
does not check that the top-level types implement \lstinline{Send} than \lstinline{Sync},
instead extending the destructor-checking approach of
\cref{sec:fsa_finalizer_thread}: destructors
may only access fields that implement both \lstinline{Send} and
\lstinline{Sync}.

This relaxed check is undoubtedly more dangerous than that of
\cref{sec:fsa_finalizer_thread}, which is why we mark it as experimental. We
believe that it is compatible with Rust's semantics (at least at the time of
writing), and our examination of the compiler tool-chain
(i.e.~\rustc and LLVM) showed that it does not implement the analyses and
optimisations that could undermined our check. However, there are
other languages and implementations where our check would not be valid
(for example our assumption would be unsafe
in Java due to synchronisation removal~\cite{wang06escape}). We leave as an
open question to others as to whether Rust should deliberately permit or forbid
such checks in its semantics.

\laurie{in our benchmark suite, how often does this relaxation make a difference?}\jake{I will post a write-up here once I have the numbers from all the benchmarks}


\subsection{Bypassing FSA checks when necessary}

FSA is deliberately conservative. In particular, when checking whether drop
methods satisfy certain properties, FSA naturally checks whether code
reachable from those drop methods satisfies those properties. Currently,
function calls which are inlined are checked in detail, but all other function
calls are considered \emph{opaque} and fail FSA's checks. Calls to
non-inlined normal Rust functions are not checked due to
a minor limitation inside \ourgc. Other kinds of function call are fundamentally
hard or impossible to check, specifically: calls to externally linked (e.g.~an FFI
call to C) function; or dynamic dispatch calls in \emph{trait objects}, where we can't
tell which of many possible implementations of a method will actually be
called.

In some cases, users are able to prove to their satisfaction that the resulting
false positives are incorrect (e.g.~because the C function they are calling
has no side effects). Depending on the error they receive, they can then
implement the unsafe traits \lstinline{FinalizerSafe}, \lstinline{Send}, or
\lstinline{Sync} appropriately. \laurie{is this still relevant?}
A challenge in all three cases is that
implementing a trait implicitly guarantees that all dynamic values
satisfy the static trait implementation.
One way to sidestep this problem is to provide a newtype that allows only a
subset of values to be declared as safe to use in a finalizer: the newtype
implements the \lstinline{FinalizerSafe}, \lstinline{Send}, and \lstinline{Sync}
traits, implicitly blessing the type it wraps. We found a particular need for
this when interfacing with external GC-unaware libraries. \ourgc thus provides
a standard newtype \lstinline{FinalizerUnchecked<T>} which allows allows users to inform
FSA that certain uses of
a type \lstinline{T} satisfy its properties. We still want to require the user
to explicitly opt-in to unsafe Rust when using
\lstinline{FinalizerUnchecked<T>}, since incorrect use is dangerous,
but Rust structs cannot be marked as unsafe directly. Instead we mark
the struct's constructor (\lstinline{FinalizerUnchecked::new}) as
unsafe.


\subsection{Putting it all together}

In essence, FSA integrates the seemingly separate parts presented above into one:
as shown in \cref{alg:fsa} it iterates over
every function in \rustc's MIR and checks whether types used in a
\lstinline{Gc<T>} satisfy FSA. To simplify the presentation, we ignored
caching in the algorithm though the actual implementation does perform
caching (e.g.~the \lstinline{needs_finalizer} intrinsinc) .

The implementation of FSA also gives good quality error messages. Rather
than just inform a user that `your drop method has not passed FSA', \ourgc
pinpoints precisely which field or line in a drop method caused FSA to fail
(via the \lstinline{Emit*Error} functions):
\lstinline{EmitReferenceError} informs the user when a reference in a type
is used in a way that violates FSA (see~\cref{sec:fsa_rust_references}); and
\lstinline{EmitFinalizerUnsafeError} when a drop method
contains code which is unsafe (e.g.~references a \lstinline{Gc<T>} type, an
opaque function, etc.). \cref{fig:finalizer_cycle_err} shows an example of
the errors reported by \ourgc: note that it pinpoints the line within a
drop method that caused an FSA error.

\begin{figure}[tb]
  \begin{center}
\lstinputlisting[
  language={},
    basicstyle=\fontsize{7.6pt}{7pt}\ttfamily,
    xleftmargin=6pt,
    numbers=none,
    caption={The compiler error produced by \ourgc for the example
    in \cref{fig:finalizer_cycle}. This extends \rustc's normal error messages:
    note that both the use of a type with a FSA-incompatible drop method (`has
    a drop method') and the line in the drop method (`caused by the
    expression') are highlighted to the user.},
    label={fig:finalizer_cycle_err}
]{listings/finalization_cycle.stderr}
  \end{center}
\end{figure}

\section{Performance Evaluation}
\label{sec:evaluation}

\begin{table}[t]
\begin{center}
\begin{tabular}{lll}
\toprule
  Benchmark & Version & Description \\
\midrule
  \binarytrees & Debian CLBG \binarytreesversion  & Heap allocation microbenchmark \\
  \grmtools & \grmtoolsversion  & Lexer / parser library \\
  % \minimoka & \minimokaversion  & Concurrent cache library \\
  \regexredux & Debian CLBG \regexreduxversion  & Regular expression matching \\
  \sws & \swsversion  & Web server\\
  \midrule
  \somrsast & git \#\somrsversion  & SOM AST VM\\
  \somrsbc & git \#\somrsversion  & SOM bytecode VM\\
  \yksom & git \#\yksomversion  & SOM bytecode VM\\
\bottomrule
\end{tabular}
\end{center}

  \vspace{5pt}
  \caption{Our suite of benchmarks: we take these Rust programs and alter them
  to use different memory allocation strategies (\ourgc, \lstinline{Rc<T>},
  etc.). The top half of the table shows a traditional suite of benchmarks. The
  bottom half shows three SOM (a programming language) VMs. For all three SOM
  VMs, we use the same SOM version of the `Are We Fast Yet?' benchmark suite
  (git\#\awfyversion)\cite{marr16cross}; each SOM VM is thus a proxy for 26 additional benchmarks.}
\label{tab:benchmarks}
\end{table}

To understand the performance of \ourgc, and the various aspects that make it
up, we carried out several performance experiments. In this section we explain
our methodology and our experimental results.


\subsection{Methodology}

In order to understand the performance of \ourgc we ran three experiments:

\vspace{6pt}
\begin{tabular}{ll}
  \Egcrc & the relative performance of \texttt{Gc<T>} over other allocation strategies (e.g.~\texttt{Rc<T>}) \\
  \Eelision & the performance gains of finalizer elision \\
  \Eprem & the costs of premature finalizer prevention and its optimisation \\
\end{tabular}
\vspace{10pt}

There is no existing benchmark suite for GCs for Rust. Even if such a suite did exist, it
may not have been suitable for our purposes because in experiment \Egcrc we
want to compare programs using existing shared ownership approaches such as \lstinline{Rc<T>} against
\lstinline{Gc<T>}. We searched through roughly the 100 most popular
Rust libraries on \lstinline{crates.io} (the \emph{de facto} standard Rust
package system) looking for suitable candidates. Many that use
shared ownership strategies do not use them in performance
critical areas (\laurie{give two examples}): any resulting benchmark would be
unlikely to show anything of interest to this paper. Some relied on
unusual and/or unsafe features of their memory strategy that were too difficult
or time-consuming to adapt (e.g.~\emph{RustPython}, \emph{Rkyv}).

\cref{tab:benchmarks} shows our eventual benchmark suite. \binarytrees and \regexredux
are well-known microbenchmarks. \binarytrees is both allocation intensive
and sufficiently simple that we could port it to a variety of shared ownership strategies.
\grmtools is a parsing library which uses \lstinline{Rc<T>} extensively: the
core library is around 18KLoC, excluding dependencies.
%\minimoka \laurie{uses what?};
To benchmark \grmtools, we used 4 large Java source files (collectively
around 28KLoc of source code).
\sws is built on top of the \emph{Hyper} HTTP library, which itself uses the
\emph{tokio} async runtime and uses \lstinline{Arc<T>} extensively: the
core server is around 16KLoC, with \emph{Hyper} around 27KLoC, and
tokio around 90KLoC (though \sws uses only a subset of each library).
To benchmark \sws, we use the standard \emph{wrk} HTTP benchmarking
tool which fires as many requests as possible at \sws in one minute.
Unlike our other benchmarks, we thus report requests per second for \sws.

Our suite also includes three SOM VMs. SOM is a small, but complete,
language in the Smalltalk mould, which has a wide variety of implementations. \somrsast and
\somrsbc are existing implementations and both relatively faithful ports of a Java
SOM VMs into Rust that use \lstinline{Rc<T>}. \yksom is a from-scratch rewrite
for this paper that is designed to be more idiomatic Rust code. Each SOM VM is
around 5KLoC. Crucially, the SOM VMs enable us to run the same `Are We Fast Yet?'
benchmark suite~\cite{marr16cross}, turning what appears to be 1 benchmark into 26
(mostly micro) benchmarks.

We compare \ourgc against three other shared ownership approaches: for most
experiments against \lstinline{Rc<T>} and \lstinline{Arc<T>} (the two being
very similar to oneanother); and for \binarytrees against \rustgc (a user
library for GC for Rust~\cite{manish15rustgc}) and \typedarena (a non-GC memory
arena~\laurie{is this \url{https://github.com/thomcc/rust-typed-arena}?}).
We used the other shared ownership approaches in their
standard configurations.

One confounding factor we have controlled for is the underlying memory
allocator. \boehm's core dates to the 1980s and allocators have made
many advances since. A quick experiment showed us that modern system
allocators are noticeably faster than \boehm: comparing \ourgc with \boehm
to \rustc with a modern system allocator would be uncomfortably
close to apples and oranges. We therefore use \boehm as the allocator
for both \ourgc and \rustc.

We ran each benchmark in our suite 30 times reporting 99\% confidence intervals.
For the non-SOM benchmarks reporting timings we measured wall-clock time of a process using
the standard Unix \lstinline{time}. We ran SOM benchmarks using
the conventional \emph{rebench} tool~\cite{rebench18marr}, which internally
measures wall-clock time using \lstinline{time}.
We ran all benchmarks on a \benchmarkcpu with \benchmarkram,
running \benchmarkos. We turned off turbo boost and hyper-threading,
as both can interfere with the kind of benchmarking we wish to perform.

Our experiment is fully repeatable and downloadable from
\url{https://github.com/softdevteam/rustgc_paper_experiment/}.


\subsection{Results}

\begin{table}[t]
\scalebox{0.7}{%
  \input{table_overview}
}
  \vspace{5pt}
  \caption{Results for the \Egcrc experiment, comparing
  benchmarks running on \ourgc against other shared ownership approaches. With
  the exception of \binarytrees, which we ported to several approaches, most
  compare \ourgc with one other shared ownership approach. All benchmarks
  were run 30 times with wall-clock times reported, except \sws which reports
  requests per second. Individual benchmarks have symmetric 99\% confidence
  intervals notated with $\pm$; summary benchmarks have asymmetric 99\%
  confidence intervals notated with square brackets.
  }
  \label{table:Ercgc}
\end{table}

\begin{figure}[t]
    \centering
    \includegraphics[width=.6\textwidth]{plots/elision.pdf}
    \caption{
    Summary results of \Eelision (the detailed breakdown for each SOM
    benchmark can be found in \cref{fig:Eelision_breakdown}. \emph{Naive} turns
    off finalizer elision. As these results clearly show, all three SOM VMs run
    significantly faster with finalizer elision turned on.}
\label{fig:Eelision}
\end{figure}

\begin{figure}[t]
    \centering
    \includegraphics[width=.6\textwidth]{plots/barriers.pdf}
    \caption{
    Summary results of \Eprem (the detailed breakdown for each SOM benchmark
    can be found in \cref{fig:Eprem_breakdown}). \emph{Naive} shows premature
    finalizer prevention turned on for all types; \emph{Opt} shows optimised
    premature finalizer prevention (see \cref{sec:optimising_prem}); and
    \emph{None} shows premature finalizer prevention turned off. \emph{None} is
    unsound, but provides the ultimate ceiling for \emph{Opt}'s performance.
    Note that we turned finalizer elision off in this experiment \laurie{because}.
    }
\label{fig:Eprem}
\end{figure}

\cref{table:Ercgc} shows the results for \Egcrc. \ourgc is often marginally
slower than other shared ownership approaches, but, with two exceptions, never
by more than 8\%. The first exception is \binarytrees, whose batch approach to
allocation is perfectly suited to \typedarena: it waits until the end of the
benchmark to deallocate all memory in one go. Interestingly, despite being
4$\times$ slower than \typedarena, \ourgc is 5.5$\times$ faster than \rustgc
and 1.5$\times$ faster than \lstinline{Rc<T>} on \binarytrees. The second
exception is \somrsast where \ourgc is over 2$\times$ slower, because
one destructor is not elidable. We could safely mark this destructor as \lstinline{DropMethodFinalizerElidable}
(see \cref{dropmethodfinalizerelidable}), but feel that would unduly
bias our results in \ourgc's favour. Overall, while such a sparse comparison can never be fully satisfying, a reasonable
interpretation of this data is that \ourgc does not appear to be prohibitively
slow.

\cref{fig:Eelision} shows the summary results for \Eelision
(\cref{fig:Eelision_breakdown} in the appendix shows each SOM benchmark).
Finalizer elision improves the performance of each VM, though by varying
amounts. Perhaps because it was built with \ourgc in mind, \yksom benefits the
most from finalizer elision.

\cref{fig:Eprem} shows the summary results for \Eprem. Unsurprisingly, turning
on premature finalization prevention everywhere slows performance noticeably
relative to removing some or all premature finalization prevention drop
methods. However, to our surprise, it is often difficult to distinguish
`optimised' premature finalizer prevention from no premature finalizer
prevention at all. Although we do think this shows that premature finalizer
prevention optimisation is highly effective, we suspect that some inevitable
noise in benchmark results might slightly flatter the optimisation.

% \begin{figure}[t!]
%     \centering
%     \includegraphics[width=1\textwidth]{plots/som_rs_perf.pdf}
%     \caption{Results from the \somrs micro-benchmark experiment, where the SOM
%     benchmark suite is run to compare two configurations of \somrs: \somrsrcbdwgc, where SOM objects are managed with RC but use the BDWGC's allocator with GC disabled; and \somrsgc, where SOM objects are managed using \ourgc's GC library, which uses the BDWGC's allocator. Each benchmark is run for 100 process
%     executions, where the error bars represent 99\% confidence intervals.}
% \label{graph:som_rs_finalizers}
% \end{figure}


\section{Threats to validity}

We cannot state with certainty that the solutions we have presented are
complete or correct. We have tried to mitigate this by relatively extensive
testing. We have created a number of new tests inside our \rustc fork (11 files which constitute 39 tests).
We also ported a number of existing programs to \ourgc --
including programs that we did not expect to make good benchmarks -- since some
GC problems only manifest at scale. However, we could undoubtedly have performed wider
testing, investigated fuzzing, and so on: it is inevitable that there will be
bugs in \ourgc that are entirely our fault. Some of these bugs might have
coloured our performance results.

Our definition of the problems, the design of our solutions, and the way we
implemented them are presented in a fairly informal manner. Though we must
admit that this partly reflects our backgrounds, a truly formal definition
would be challenging for two reasons. First, at the time of
writing, Rust's semantics remain imprecise and incomplete, though this
situation is gradually improving. Second, the problematic
aspects of GC we tackle in this paper have resulted from careful analysis of
running systems (e.g.~\cite{boehm03destructors}): to the best of our knowledge,
no formalism of GC exists which fully covers these aspects. We suspect this is
a mixture of: a lack of research on this problem; the challenging
temporal aspects of GC; and, possibly, a poor fit with existing formalisms.
The closest analogue we know of is the evolving research literature on memory
models which, in a sense, modern GC is a superset of.

Any performance judgements we make are necessarily contingent on our particular
benchmark suite. We cannot say, for example, that finalizer elision will always
improve performance by the \laurie{X\%} we see in our experiment: there
undoubtedly exist reasonable, non-pathological, programs which will see
performance changes beyond those our results suggest. Although the \Egcrc
experiment requires a very specific benchmark suite, our other experiments
would probably be a good fit for a standard benchmarking suite.
Since the benchmark suite in this paper is the largest for any GC in Rust \laurie{is that true of
Bronze?}, it may serve as a partial basis for such a suite in the future.

Using \boehm as the allocator for all benchmarks has the advantage of removing
`pure' allocator performance as a confounding factor, but does mean that some
of the performance characteristics of benchmarks will be changed (e.g.~the
impact on CPU caches). A generic, modern conservative GC, using the insights of
recent non-GC allocators, would probably give a better idea of the true
performance of \ourgc relative to other shared ownership strategies. We are
not aware of a production quality modern, generic conservative GC being
available right now, though we hope one does become available: it would
be interesting to rerun our experiments if and when that occurs.


\section{Related work}
\label{sec:related_work}

In this paper we hope to have given sufficient background on GC and the use
of destructors and finalizers in general. In this section we mostly survey the major
parts of the GC for Rust landscape more widely. Our survey is
inevitably incomplete, in part because this is a rapidly evolving field
(a number of changes have occurred since the most recent equivalent
survey we are aware of is~\cite{manish21arena}).
We also cover some relevant non-Rust GC work not mentioned elsewhere.

Early versions of Rust had `managed pointers' (using the \lstinline{@T} syntax)
which were intended to represent GC types~\cite{manish21arena}. The core
implementation used reference counting though there were several, sometimes
short-lived, cycle detectors~\cite{hoare22cycles}. Managed pointer support was removed\footnote{In commit
\url{https://github.com/rust-lang/rust/commit/ade807c6dcf6dc4454732c5e914ca06ebb429773}.}
around a year before the first stable release of Rust. This was not the end
of the story for `GC as a core part of Rust', with core Rust developers exploring
the problem space in more detail~\citep{felix15specifying, felix16roots,
manish16gc}. Over time these efforts dwindled, and those interested in GC for Rust largely moved from anticipating
\rustc support to expecting to have to do everything in user-level libraries.

One of the earliest user-level GC for Rust libraries is
\rustbacon~\citep{rustbacon}, an implementation of the well-known GC
algorithm~\citep{bacon01concurrent}. This provides a type \lstinline{Cc<T>}
which is similar in intention to \ourgc's \lstinline{Gc<T>}, though using
reference counting and manually invoked cycle detection via the
\lstinline{collect_cycles} function. In order for this function to correctly
trace fields, users must manually implement a \texttt{Trace} trait which
traverses a type's fields. Destructors
are used as finalizers: to avoid the problems with Rust references we solved in
\cref{sec:fsa_rust_references}, \rustbacon imposes a \lstinline{T: 'Static}
lifetime bound on the type parameter passed to \lstinline{Cc<T>}. Simplifying
slightly, this means that any references in such a type must be valid for the
remaining lifetime of the program, a severe restriction. Unlike our approach to the access
of already-finalized values (\cref{sec:cycles_and_finalization}), it can only detect
such accesses at runtime, leading to a (safe) Rust \lstinline{panic}. Unlike most other GCs for
Rust, destruction in \rustbacon can be
seen as deterministic: GCed values outside cycles have their destructors run as soon as their reference count is
0; GCed values inside cycles at the end of the \lstinline{collect_cycles} function.

Probably the best known GC for Rust is \rustgc~\cite{manish15rustgc} (partly
covered in \cref{sec:destructor challenges}). \rustgc's \lstinline{Gc<T>} provides
a similar API to \ourgc, with the notable exception that its \lstinline{Gc<T>}
is not, and cannot be, copyable, thus always requiring calls to
\lstinline{Gc::clone}. Although, like \ourgc, \rustgc allows \lstinline{Gc<T>} values to
be converted into pointers, its lack of conservative GC means that users
must ensure that a \lstinline{Gc<T>} wrapper is kept alive for the entire
lifetime of pointers derived from it. Similarly to \rustbacon,
GCed values are reference counted, with occasional
tracing sweeps to identify cycles, though \rustgc performs cycle detection
automatically (i.e.~it doesn't require manual calls to a function such as
\lstinline{collect_cycles}). Drop methods are not used as finalizers: if a
finalizer is required, a manual implementation of the \lstinline{Finalize} trait
must be provided; finalizer glue can be largely, though not fully (see
\cref{sec:destructor challenges}), automatically created by the provided
\lstinline{Trace} macro. \rustgc detects accesses to already-finalized values
dynamically at run-time, panicking if they occur. Unlike \rustbacon, these accesses
are detected by recording \laurie{the next bit doesn't make sense} per-state what the collector's state is in: if the
collector is in a `sweep' phase, any access of a \lstinline{Gc<T>} leads to a
panic. We have not yet verified whether cross-thread collection / sweeping can
evade this check.

An example of moving beyond reference counting in a GC for Rust is
\shifgrethor~\cite{shifgrethor}. It requires \lstinline{Gc} values to
be created by a \lstinline{Root<'root>}: the resulting \lstinline{Gc<'root, T>}
is then tied to the lifetime of the
\lstinline{Root<'root>}. This allows roots to be precisely identified, but
requires explicitly having access to a \lstinline{Root<'root>} whenever a
\lstinline{Gc<'root, T>} is used. As with \rustgc, \shifgrethor requires users
to manually implement a \lstinline{Finalize} trait, though \shifgrethor's is
more restrictive: not only can other GCed values not be accessed (implicitly
solving the same problem as \cref{sec:cycles_and_finalization}) but any other
type without the same \lstinline{'root} lifetime as the GCed value is
forbidden. This means that many seemingly safe finalizers require
implementing the unsafe \lstinline{UnsafeFinalize} trait. We view
\shifgrethor as proof that accurately tracking GC roots
in normal Rust without reference counting is possible, though it cannot
deal with references being converted into pointers and \lstinline{usize}s.

A different means of tackling the root-finding problem is \textsc{GcArena}~\citep{gcarena},
which uses branding in a similar way to
\lstinline{GhostCell}s (see \cref{sec:background}). In essence, users provide
a special `root' type which is the only place where roots can be stored.
Mutating the heap can only be done in the context of functions that are
passed a branded reference to the GCed heap. Once such a function has completed,
\textsc{GcArena} is in full control of the GC heap, and knows that
only the root type needs to be scanned for roots. This provides strong
guarantees about when GC references are available but it is possible for code to keep executing in
an arena for too long, at which point the system can be starved of resources
even if much of the arena is no longer used. \textsc{GcArena} was originally
part of the \emph{Piccolo} VM  (which was itself previously called
\emph{Luster}), a Lua VM written in Rust. Such VMs have a frequently executed
main loop which is a natural point for a program to relinquish references
to the GCed heap, but this is not true of many other GCed programs.

One of several extensions of \rustgc is \bronze, which aims to improve
upon \rustgc in two ways~\citep{coblenz21bronze}. First, \bronze tries
to solve the root-finding problem by using LLVM's \lstinline{gc.root} intrinsic
at function entries to generate stack-maps (a run-time mechanism for accurately
tracking active pointers). This rules out the false positives that are
inevitable in conservative GC. However, \bronze
does not track nested references: if a \lstinline{Gc<T>} is used as a field in a struct, it
will not be tracked by the GC, leading to undefined behaviour. Second, \bronze tries to give GC in
Rust similar semantics to non-ownership languages such as Java. It does this by
allowing shared mutation, undermining Rust's borrow checker, and also leading
to undefined behaviour. See \cref{lst:bronze_unsound} for an
example of a problematic program in \bronze.

Chrome's rendering engine \emph{Blink} uses the conservative GC
\textsc{Oilpan}. It has the interesting property that it has two classes of
finalizers. `Full finalizers' are similar to finalizers in \ourgc, running on a
finalizer thread at an indeterminate future point, but with the difference that
they can only reference parts of a GCed value. To mitigate this,
`pre-finalizers' are run by the collector on the same thread as mutator
as soon as an object as recognised as unused, and can
access all of a GCed value. Pre-finalizers are necessary, but not encouraged, because they
implicitly pause the stop-the-world phase of the collector.
This reflects the fact that latency is a fundamental concern
for a rendering engine: \ourgc currently makes no pretences to being low latency.


\section{Conclusions}

We introduced a novel design for GC in Rust that, by taking advantage of Rust's
unusual static guarantees, also solves some classical GC finalizer problems. We
implemented our design in a fork of \rustc, and showed that it has a fairly
small performance overhead. We do not claim that \ourgc is the ultimate design
for GC in Rust, but it does
show how much can be achieved if one is willing to adjust and extend the
language's design and \rustc.


\bibliographystyle{ACM-Reference-Format}
\bibliography{bib}

\clearpage

\appendix

\section*{Further details}

In this appendix, we show additional formal algorithms and additional plots.
None of these are necessary to understand the main part of the paper, but
interested readers may find that they help flesh out some details.

\begin{algorithm}[h]
\caption{Removing elidable drops}
\begin{algorithmic}
\Function{RemoveElidableDrops}{$func$}
      \ForEach {$basic\_block \in func$}
            \If {\Call{IsDropTerminator}{$basic\_block.terminator.kind$}}
                \State $ty \gets$ \Call{GetTypeOfDroppedValue}{$block.terminator$}
                \If {\Call{IsGcType}{$ty$}}
                    \If {\NOT \Call{RequiresFinalizer}{$ty$}}
                        \State \Call{ReplaceTerminator}{$basic\_block$}
                    \EndIf
                \EndIf
            \EndIf
      \EndFor
\EndFunction
\Function{ReplaceTerminator}{$basic\_block$}
    \State $drop\_func \gets$ \Call{GetDropFunc}{$basic\_block.terminator$}
    \State $last\_block \gets$ \Call{GetLastBasicBlock}{$drop\_func$}
    \State $block.terminator \gets last\_block.terminator$
\EndFunction
\end{algorithmic}
  \label{alg:barrier_removal}
\end{algorithm}

\begin{algorithm}[t]
  \captionof{algorithm}{Finalizer safety analysis}
  \label{alg:fsa}
  \begin{algorithmic}[1]
    \Function{FinalizerSafetyAnalysis}{\textit{func}}
      \ForEach {\textit{basic\_block} $\in$ \textit{func}}
            \State $t \gets$ \textit{basic\_block.terminator}
            \If {\NOT \Call{IsGcConstructorCall}{$t$}}
                \State \textbf{continue}
            \EndIf
            \State \textit{ty} $\gets$ \Call{GetTyOfGcValue}{$t$}
            \If {\Call{isFinalizerUnchecked}{\textit{ty}} \OR \NOT \Call{RequiresFinalizer}{\textit{ty}}}
              \State \textbf{continue}
            \EndIf
            \If { \NOT \Call{ImplementsTrait}{\textit{ty}, \textit{ReferenceFree}}}
            \State \Call{EmitReferenceError}{}
            \Else
            \ForEach {$drop\_method \in$ \Call{GetDropGlue}{\textit{ty}}}
              \If { \NOT \Call{IsMIRAvailable}{\textit{drop\_method}}}
                  \State \Call{EmitFinalizerUnsafeError}{}
              \EndIf
              \State \Call{CheckDropMethodSafety}{\textit{drop\_method}}
            \EndFor
            \EndIf
      \EndFor
\EndFunction
\State{}
    \Function{CheckDropMethodSafety}{\textit{drop}}
    \State $ty \gets$ \Call{GetTyOfDropValue}{\textit{drop}}
    \If {\Call{IsFinalizerSafe}{\textit{ty}}}
      \State \Return
    \EndIf
      \ForEach {\textit{basic\_block} $\in$ \textit{drop}}
        \ForEach {$\textit{statement} \in \textit{basic\_block}$}
            \ForEach {$\textit{projection} \in \textit{statement}$}
              \If{\NOT \Call{IsFinalizerSafe}{\textit{projection.element}}}
                  \State \Call{EmitFinalizerUnsafeError}{}
              \EndIf
            \EndFor
        \EndFor
      \EndFor

\EndFunction
\State{}
  \Function{IsFinalizerSafe}{\textit{ty}}
    \State \Return{\Call{Impls}{\textit{ty}, $Send$} \AND {\Call{Impls}{\textit{ty}, \textit{Sync}}} \AND {\Call{Impls}{\textit{ty}, \textit{FinalizerSafe}}}}
\EndFunction
\end{algorithmic}
\end{algorithm}

\section*{Extra plots}

\clearpage

\subsection{\sws}

\begin{table}[t]
  \input{plots/sws_perf_table}
  \caption{\sws \Egcrc experiment. Higher is better for everything except latency and request time.  30 pexecs. 99\% CIs.}
\end{table}

\begin{table}[t]
  \input{plots/sws_elision_table}
  \caption{\sws \Eelision experiment. Higher is better for everything except latency and request time.  30 pexecs. 99\% CIs.}
\end{table}

\begin{table}[t]
  \input{plots/sws_barriers_table}
  \caption{\sws \Epremopt experiment. Higher is better for everything except latency and request time.  30 pexecs. 99\% CIs.}
\end{table}

\begin{figure}[t!]
    \centering
    \begin{subfigure}{\textwidth}
    \includegraphics[width=1\textwidth]{plots/som_rs_ast_elision.pdf}
      \caption{\somrsast}
    \end{subfigure}
    \begin{subfigure}{\textwidth}
    \includegraphics[width=1\textwidth]{plots/som_rs_bc_elision.pdf}
    \caption{\somrsbc}
    \end{subfigure}
    \begin{subfigure}{\textwidth}
    \includegraphics[width=1\textwidth]{plots/yksom_elision.pdf}
    \caption{\yksom}
    \end{subfigure}
    \caption{Results of a performance of \Eelision using two configurations: naive
    finalization (where no optimisation is performed); and \ourgc's finalizer
    elision optimisation (where finalizers which are used only to deallocate
    memory are removed). Each benchmark is run for \benchmarkpexecs process executions, where
    the error bars represent 99\% confidence intervals.}
\label{fig:Eelision_breakdown}
\end{figure}

\begin{figure}[t!]
    \centering
    \begin{subfigure}{\textwidth}
    \includegraphics[width=1\textwidth]{plots/som_rs_ast_barriers.pdf}
      \caption{\somrsast}
    \end{subfigure}
    \begin{subfigure}{\textwidth}
    \includegraphics[width=1\textwidth]{plots/som_rs_bc_barriers.pdf}
      \caption{\somrsbc}
    \end{subfigure}
    \begin{subfigure}{\textwidth}
    \includegraphics[width=1\textwidth]{plots/yksom_barriers.pdf}
    \caption{\yksom}
    \end{subfigure}
     \caption{Results showing the performance of \Epremopt on the SOM VMs, each
     of which use three configurations: \textit{None} where there are no
     compiler barriers (which is unsound); \textit{All}, where every single
     \lstinline{Gc} reference has a corresponding barrier; and \textit{None},
     where barriers are optimised away using \ourgc's optimisation (section
     \jake{XYZ}) Each benchmark is run for \benchmarkpexecs process executions, where
    the error bars represent 99\% confidence intervals.}
\label{fig:Eprem_breakdown}
\end{figure}

\begin{figure}[t!]
    \centering
    \includegraphics[width=1\textwidth]{plots/alacritty_barriers.pdf}
     \caption{Results showing the performance of \Epremopt on \alacritty, each
     of which use three configurations: \textit{None} where there are no
     compiler barriers (which is unsound); \textit{All}, where every single
     \lstinline{Gc} reference has a corresponding barrier; and \textit{None},
     where barriers are optimised away using \ourgc's optimisation (section
     \jake{XYZ}) Each benchmark is run for \benchmarkpexecs process executions, where
    the error bars represent 99\% confidence intervals.}
\label{fig:alacritty_Eprem_breakdown}
\end{figure}

\begin{figure}[!t]
\begin{lstlisting}[
  language=Rust,
  basicstyle=\footnotesize\ttfamily,
  caption={An example of unsoundness in Bronze (directly inspired by
    \url{https://github.com/mcoblenz/Bronze/issues/2\#issue-1021510658}) based on
    its ability to allow
    aliased mutable references. Here, we obtain two mutable references to the
    same underlying vector (lines 5-6), before using the second reference to
    resize the vector, which forces its backing store to be reallocated in
    memory (line 15). Later, when we try to access an element through the first
    reference, it no longer points to valid memory (line 15).
    },
  label={lst:bronze_unsound}
]
fn main() {
  let mut gr1 = GcRef::new(vec![1u16,2,3]);
  let mut gr2 = gr1.clone();

  let ref1 = gr1.as_mut();
  let ref2 = gr2.as_mut();

  // ref1 and ref2 now reference the same object:
  ref1.push(4);
  ref2.push(5);
  ref1.push(6);

  let ref1elem0 = ref1.get_mut(0).unwrap();
  // Force reallocation of the underlying vec
  ref2.resize(1024, 0);
  // Now this writes to deallocated memory
  *ref1elem0 = 42;
}
\end{lstlisting}
\end{figure}

\end{document}