arch/riscv: make RISC-V CSR accesses unsafe #3549

lschuermann · 2023-07-18T20:14:13Z

Pull Request Overview

Accessing RISC-V CSRs is an inherently unsafe and risky operation: given they exercise large control over the state of the RISC-V hart, they can be used to manipulate the current privilege mode, manage PMP enforcement, etc.

Writing to CSRs can violate Rust's memory safety rules in a variety of ways, e.g., by manipulating PMP rules and cooperating with a userspace process. In addition, Tock's system call handler can be trivially used for that: it is sufficient for some untrusted code to store a non-zero value (pointer to some normally inaccessible memory) into mscratch, and a value to be written to the target memory address in some register. The code must then cause a trap, e.g., by issuing an illegal instruction. The context switch code will incorrectly interpret this trap as a fault of a process, and proceed to write the previous register file to a "stack" indicated by mscratch. The kernel will continue execution in the main loop, leaking the previous stack. The untrusted code has successfully overwritten an arbitrary memory location.

The only reasonable solution is to mark all CSR accesses as unsafe. While CSR reads are generally less problematic, even a CSR read of a non-existant CSR can cause a trap, which would be yet another way for some untrusted code to violate the kernel's liveness (apart from looping & panicing). Because of this, and given we provide access to the ReadWriteRiscvCsr types which inherently allow writing to CSRs, this marks CSR register type construction and CSR register type accesses through the riscv::csr::CSR struct as unsafe operations.

Testing Strategy

This pull request was tested by compiling.

TODO or Help Wanted

As a follow up, we should move the CSR definitions to the riscv-csr crate. It does not make any sense to have them split between the riscv and riscv-csr crates as they are right now.

Documentation Updated

~~Updated the relevant files in /docs,~~ or no updates are required.

Formatting

Ran make prepush.

Accessing RISC-V CSRs is an inherently unsafe and risky operation: given they exercise large control over the state of the RISC-V hart, they can be used to manipulate the current privilege mode, manage PMP enforcement, etc. Writing to CSRs can be used to violate Rust's memory safety rules. This is especially true in the context of Tock, where we can construct a simple example by looking at our system call handler: it is sufficient for some untrusted code to store a non-zero value (pointer to some normally inaccessible memory) into `mscratch`, and a value to be written to that memory in a fixed register. The code must then cause a trap, e.g., by issuing an illegal instruction. The context switch code will incorrectly interpret this trap as a fault of a process, and proceed to write the previous register file to a "stack" indicated by `mscratch`. The kernel will continue execution in the main loop, leaking the previous stack. The untrusted code has successfully overwritten an arbitrary memory location. The only reasonable solution is to mark all CSR accesses as `unsafe`. While CSR reads are generally less problematic, even a CSR read of a non-existant CSR can cause a trap, which would be yet another way for some untrusted code to violate the kernel's liveness (apart from looping & panicing). Because of this, and given we provide access to the `ReadWriteRiscvCsr` types which inherently allow writing to CSRs, this marks CSR register type construction and CSR register type accesses through the `riscv::csr::CSR` struct as unsafe operations.

lschuermann · 2023-07-18T20:16:08Z

arch/riscv/src/csr/mod.rs

@@ -549,4 +550,569 @@ impl CSR {
            _ => unreachable!(),
        }
    }
+
+    // CSR accessor methods


These accessors are generated through regex-parsing the above struct and templating a generic accessor method. It's not nice to have to vendor this, but I don't really know of a better way to achieve similar semantics without inserting ~1kLOC of auto-generated methods.

bradjc · 2023-07-18T20:53:05Z

I'm not sure I understand why CSRs are different from any other registers. I agree that reading a nonexistent one is bad, and if that is possible in chip code without unsafe, then that is bad. But I thought all of the construction was handled in the arch crate so that we only exposed valid CSRs.

Code like capsules (the most untrusted inside the kernel) has never had access to CSRs or any other registers. So that's not a problem. Chip code does, of course, and if chip code is trying to use mscratch to trick the core kernel into writing stack data to some location, well that seems bad! But chip code has always had the ability to just use DMA to write whatever wherever (assuming there is a DMA peripheral) or whatever other tricks can be done with a specific MCU's peripherals.

So, it's not clear to me why our policy of "using the registers (er, CSRs) is safe, it's creating them that is not" doesn't apply here.

Is there a case here to use modules instead somehow? Like could we only make certain CSRs "public" (to chips) and the rest not? It seems like chips need to modify interrupts, but not much else.

lschuermann · 2023-07-18T21:19:54Z

So, it's not clear to me why our policy of "using the registers (er, CSRs) is safe, it's creating them that is not" doesn't apply here.

@bradjc It does, and that's exactly what I'm trying to enforce / implement with this PR. Previously anyone could've just run ReadWriteRiscvCsr::new(), as that was a safe operation. This changes this constructor to be unsafe.

Technically, this implements a slightly different version of that paradigm, as arch/riscv/src/csr/mod.rs defines a CSR struct which holds an instance of all CSRs as ReadWriteRiscvCsr types. Those types are created using unsafe, which is exactly what we want. However, if we kept those fields pub, other code could then access these types without unsafe, even untrusted code. Instead, this makes these register types private members of this struct, and introduces unsafe accessor methods, acting as a proxy for our "creating registers is unsafe" paradigm.

We can (and probably) should change this to just define functions that construct CSR register types, like so:

pub unsafe fn mscratch() -> ReadWriteRiscvCsr<usize, mscratch::mscratch::Register, MSCRATCH> {
    ReadWriteRiscvCsr::<usize, mscratch::mscratch::Register, MSCRATCH>::new()
}

Code which wants to access this register would then call e.g., (unsafe { riscv::csr::mscratch() }).get(), ensuring that register creation is the unsafe operation, and all further operations are then safe.

We can't just once use unsafe to create all of these registers and them make them safely accessible through a public API. This would allow any untrusted code having access to that API to violate safety & soundness. For instance, there would be nothing stopping a safe function in a board crate to completely break safety invariants by a call to riscv::csr::mscratch(), if this function were marked safe and is exposed publicly.

lschuermann · 2023-07-18T21:27:55Z

Like could we only make certain CSRs "public" (to chips) and the rest not? It seems like chips need to modify interrupts, but not much else.

Unfortunately, given we work across crate boundaries here, we can either export things publicly in the API (making CSRs accessible to every crate depending on this one), or not exporting things at all (making CSRs inaccessible to e.g., rv32i or chip crates).

Relying on the fact that a certain other crate (e.g., capsules) does not depend on riscv is not a particularly good way of containing unsafety. On the one hand, crates generally can have both safe and unsafe code in them (as do board crates, for instance). When calling into a safe function, this function must not be able to perform any unsound actions without using unsafe itself, and having a "safe" Rust function / struct be able to cause unsoundness violates this invariant. On the other hand, crates can expose such APIs to other crates which don't even depend on them: for instance, a board can pass a reference to a ReadWriteRiscvCsr to a capsule as an arbitrary &dyn WriteableRegister -- unfortunately, a crate A not depending on B does not mean that A will never be able to use B's APIs.

bradjc · 2023-07-19T13:50:21Z

Code which wants to access this register would then call e.g., (unsafe { riscv::csr::mscratch() }).get(), ensuring that register creation is the unsafe operation, and all further operations are then safe.

I thought I agreed with this, but it isn't so much that creating the type is unsafe, it's choosing the CSR number that is unsafe. By only exposing mscratch() and not csr::new(), we are already handling the unsafeness of choosing the "address" of the "register". So why does the constructor need to be unsafe?

Relying on the fact that a certain other crate (e.g., capsules) does not depend on riscv is not a particularly good way of containing unsafety.

I disagree, I think it's actually one of the best mechanisms we have. If we had to assume that capsules for instance had complete access to any function in any chip code (even perhaps chip crates for a different MCU) then I think we would have to architect things very differently.

If a board main.rs wants to pass in accessing a CSR to a capsule via some other function, well, that's its prerogative. And it's not that there wouldn't be any unsafe in the call chain (constructing the CSR would still be unsafe), it's just that the unsafe wouldn't be in the board's main.rs.

Unfortunately, given we work across crate boundaries here, we can either export things publicly in the API (making CSRs accessible to every crate depending on this one), or not exporting things at all (making CSRs inaccessible to e.g., rv32i or chip crates).

It might be messy, but is it really not possible to expose certain CSRs publicly, and others not?

Also, if the concern is that the low level arch crate relies on a very specific use of mscratch, and trusts that mscratch is not modified elsewhere, then why don't we just special case mscratch somehow (either unsafe or capability)?

lschuermann · 2023-07-19T14:52:38Z

By only exposing mscratch() and not csr::new(), we are already handling the unsafeness of choosing the "address" of the "register". So why does the constructor need to be unsafe?

Unfortunately, it's the other way around in terms of where unsoundness actually occurs: it's fine to create a ReadWriteRiscvCsr type over an arbitrary "CSR address", if this type would not allow interacting with (reading/writing) the CSR at all. It's those read/write operations which can actually cause undefined behavior as defined by Rust.

Our "using registers is safe, creating them is not" paradigm is nonetheless legal, but only when we add another restriction: that the caller of the unsafe code (i.e. the register's constructor) must guarantee that this register is not used in an unsound fashion. To illustrate this, take the following example:

fn main() {
    use std::ops::Deref;
    let a: Box<u8> = Box::new(42);
    let b: &u8 = unsafe { &*(a.deref() as *const u8) }; // unsafe, but not unsound in isolation
    std::mem::drop(a);                                  // "safe", but unsound
    println!("Box value: {}", *b);                      // "safe", but unsound
}

In this example, creating a reference from a pointer is clearly an unsafe operation and needs to be marked as such. However, by invoking this unsafe block, the caller further guarantees that the resulting value is used in a safe fashion. We break this invariant through the safe calls to std::mem::drop, and by accessing the reference b.

Applying this logic to registers / CSRs in Tock, when we create a register (e.g., the mscratch CSR) we must further ensure that it is only used in a sound fashion or pass this proof burden along to other code, through either of the following:

We can achieve the former by creating the register through an unsafe {} block and then only using it in the local function, or using visibility modifiers to prevent other out-of-crate code from accessing the created register instance.
Or we can mark the accessor function of that register unsafe as well. Then, we pass the obligation of sound usage of this type onto the function caller.

For regular Tock drivers, we encapsule this unsafety by creating the StaticRef<$REGISTERS_STRUCT> type. Because the same code which uses unsafe to construct the StaticRef then moves these registers into a trusted driver, which continues to uphold the required invariants, we satisfy the requirements of the first approach.

For CSRs, the story is different: by constructing CSR registers through an unsafe {} block and providing them as a public struct member or through a non-unsafe accessor method, we're opening the gateway for safe Rust code to use these constructs to produce unsoundness, which is illegal.

Relying on the fact that a certain other crate (e.g., capsules) does not depend on riscv is not a particularly good way of containing unsafety.

I disagree, I think it's actually one of the best mechanisms we have.
[...]
If a board main.rs wants to pass in accessing a CSR to a capsule via some other function, well, that's its prerogative. And it's not that there wouldn't be any unsafe in the call chain (constructing the CSR would still be unsafe), it's just that the unsafe wouldn't be in the board's main.rs.

In hindsight I perhaps should have phrased this differently. I fully agree with and embrace the idea that our dependency structure provides many guarantees, which are ultimately also relevant for safety.

However, we need to distinguish functional safety from the concepts of safety and soundness as defined by Rust. For example, it's clearly undesirable for a capsule to call Chip::sleep (for lack of a better example), but this will not be able to violate Rust's memory safety invariants. On the other hand, reconfiguring the MPU, or invoking the trap handler, can absolutely cause unsoundness. For these functions, Rust provides us with two options: validate all required invariants internally, which is infeasible in these functions, or mark the functions as unsafe.

The Rust reference on unsafe is fairly explicit about this. An unsafe operation is "infectious", in the sense that the only way to expose an unsafe operation through a safe API is by explicitly checking all required invariants before invoking the unsafe code. As we can't check these invariants when creating a ReadWriteRiscvCsr, the unsafe should propagate up to the board's main.rs in the above example. When the main.rs file then uses this unsafe constructor in an unsound fashion, it itself is incorrect. However, currently it would be the riscv crate's fault, as it safely exposed an unsafe API.

bradjc · 2023-07-19T15:31:20Z

we're opening the gateway for safe Rust code to use these constructs to produce unsoundness, which is illegal.

I don't agree with this as I don't see how this is different from what we do with chip registers. I think either we are doing this in both cases, or neither, but I don't see how CSRs are accessible from any less trusted code than StaticRef registers are.

I will say that because the kernel depends on mscratch, it does seem like it falls into a special case where accessing it is never safe, and all code which touches it should have to be unsafe due to the memory safety concerns you raise. But, that seems orthogonal to it being a CSR, and this would be true if riscv gave us some other way to access it.

I guess what I'm saying is I think we should fix mscratch, but I'm not sure about changing all CSRs.

it's fine to create a ReadWriteRiscvCsr type over an arbitrary "CSR address", if this type would not allow interacting with (reading/writing) the CSR at all.

This seems true, but perhaps slightly pedantic. Why let someone create something they can't use? It seems more rational to only allow things to be created that can be used. Like its ok for a child to hold a sucker, just not eat it, and then getting mad at them when they inevitably eat it.

jrvanwhy · 2024-03-14T00:14:40Z

unsafe in Rust exists for the purpose of providing memory safety. "Only trusted crates depend on this crate" is not a valid reason to make a function safe. A function needs to actually be incapable of causing memory safety to be marked safe.

mscratch is a good example of a CSR that really should be unsafe to modify, as it can be used to write arbitrary memory (even though that write is through a different piece of code).

On the other hand, liveness is not a memory safety issue. loop {} is the most memory-safe thing you could execute in Rust. A memory-safe function that makes the Tock kernel hang is a great time to use tools like "only core kernel crates depend on this" and // Please handle with care.

We should to do one of the following:

Go through each CSR one-by-one. If a CSR operation can be safe, we should document why, and if a CSR operation is unsafe then we should mark it unsafe and document the safety invariants.
Mark all the CSR operations as unsafe, with a // Safety: The caller is responsible for making sure they doesn't cause undefined behavior with this comment.

Given how much work the first option is, doing the second seems reasonable.

bradjc · 2024-03-15T21:28:41Z

The mscratch CSR in Tock is a global pointer variable in a language that doesn't really have global variables. The normal encapsulation rules that we expect from Rust don't apply, making it hard to ensure correct behavior.

I still contend the correct fix is scoping: the mscratch CSR should only be accessible from the file/function that we trust to set it correctly. If we could enable that, then setting the CSR hardware register is not unsafe, only using the stored value is unsafe.

Allowing mscratch to be set within any file (like a global variable), but guarding it behind unsafe, approximates the behavior we actually want. There is no valid reason for any code outside of the context switch riscv code to touch mscratch, and therefore none of that code should be able to, unsafe or not. If mscratch wasn't a hardware register but instead was just a normal Rust variable, then we could use Rust to control access using pub (or the lack of pub). We wouldn't, however, then explicitly create a public, yet unsafe, setter function, as there would be no reason to.

While thinking this through, I don't understand why we don't just delete the mscratch CSR from the csr crate. That implicitly guards it behind unsafe (as code would need asm to access the CSR), and also makes it no longer a public global variable.

Maybe it's worth figuring out mscratch first before deciding on the remaining CSRs? I still don't understand the argument that they are unsafe. We have a MPU object in cortex-m which only provides safe accesses, for example.

github-actions bot added tock-libraries This affects libraries supported by the Tock project risc-v RISC-V architecture WG-OpenTitan In the purview of the OpenTitan working group. labels Jul 18, 2023

lschuermann commented Jul 18, 2023

View reviewed changes

alistair23 approved these changes Jul 19, 2023

View reviewed changes

github-actions bot assigned alevy Feb 26, 2024

alevy added the waiting-on-author label Mar 15, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

arch/riscv: make RISC-V CSR accesses unsafe #3549

arch/riscv: make RISC-V CSR accesses unsafe #3549

lschuermann commented Jul 18, 2023

lschuermann Jul 18, 2023

bradjc commented Jul 18, 2023

lschuermann commented Jul 18, 2023 •

edited

Loading

lschuermann commented Jul 18, 2023 •

edited

Loading

bradjc commented Jul 19, 2023

lschuermann commented Jul 19, 2023 •

edited

Loading

bradjc commented Jul 19, 2023

jrvanwhy commented Mar 14, 2024 •

edited

Loading

bradjc commented Mar 15, 2024

arch/riscv: make RISC-V CSR accesses unsafe #3549

Are you sure you want to change the base?

arch/riscv: make RISC-V CSR accesses unsafe #3549

Conversation

lschuermann commented Jul 18, 2023

Pull Request Overview

Testing Strategy

TODO or Help Wanted

Documentation Updated

Formatting

lschuermann Jul 18, 2023

Choose a reason for hiding this comment

bradjc commented Jul 18, 2023

lschuermann commented Jul 18, 2023 • edited Loading

lschuermann commented Jul 18, 2023 • edited Loading

bradjc commented Jul 19, 2023

lschuermann commented Jul 19, 2023 • edited Loading

bradjc commented Jul 19, 2023

jrvanwhy commented Mar 14, 2024 • edited Loading

bradjc commented Mar 15, 2024

lschuermann commented Jul 18, 2023 •

edited

Loading

lschuermann commented Jul 18, 2023 •

edited

Loading

lschuermann commented Jul 19, 2023 •

edited

Loading

jrvanwhy commented Mar 14, 2024 •

edited

Loading