(cortex-m) unexpected kernel panic after thread exit #20812
Description
I've noticed an unexpected kernel panic after a thread exits. I've traced it down to sched_switch in core/sched.c retrieving an invalid active thread: inside sched_task_exit, the sched_active_thread pointer is set to NULL, and sched_switch does not check whether the retrieved thread pointer points to a valid thread.
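To make the sequence concrete, here is a minimal host-side model of it. It is a sketch, not RIOT code: model_thread_t, model_active_thread, model_task_exit() and model_switch_hits_null() are hypothetical stand-ins for thread_t, sched_active_thread, sched_task_exit() and the start of sched_switch(), and the model only checks the pointer instead of dereferencing it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for RIOT's thread_t (NOT the real type). */
typedef struct {
    int16_t pid;
    uint16_t priority;
} model_thread_t;

/* Models the global sched_active_thread pointer. */
static model_thread_t *model_active_thread;

/* Models sched_task_exit(): once the thread has exited,
 * no thread is active any more. */
static void model_task_exit(void)
{
    model_active_thread = NULL;
}

/* Models the start of sched_switch(): returns 1 if the unguarded code
 * would dereference a NULL active thread (the crash in this issue),
 * 0 otherwise. The model never dereferences the pointer itself. */
static int model_switch_hits_null(void)
{
    return model_active_thread == NULL;
}
```

Pointing model_active_thread at a thread and then calling model_task_exit() makes model_switch_hits_null() report 1, i.e. the next scheduler run would read a NULL active thread.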
Steps to reproduce the issue
I have created a small application that triggers the problem (tested on an STM32 NUCLEO-F401RE; the problem was initially seen on an EFR32). It uses the shell module because the problem only shows up when the scheduler is invoked after the thread has exited. After starting the application, enter a character in the console/terminal to invoke the scheduler.
In core/sched.c, I've added an assertion to make the problem reproducible (without it, things sometimes happen to work).
void sched_switch(uint16_t other_prio)
{
    thread_t *active_thread;
    uint16_t current_prio;
    int on_runqueue;

    active_thread = thread_get_active();
    assert(active_thread != NULL);   /* added to make the problem reproducible */
    current_prio = active_thread->priority;
    on_runqueue = (active_thread->status >= STATUS_ON_RUNQUEUE);

    DEBUG("sched_switch: active pid=%" PRIkernel_pid " prio=%" PRIu16 " on_runqueue=%i "
          ", other_prio=%" PRIu16 "\n",
          active_thread->pid, current_prio, on_runqueue,
          other_prio);
    /* ... (rest of sched_switch) ... */
main.c

#include <stdint.h>
#include <stdio.h>

#include "shell.h"
#include "thread.h"

static char second_thread_stack[THREAD_STACKSIZE_MAIN];

/* No custom commands: the shell is only used to invoke the scheduler on input. */
static const shell_command_t shell_commands[] = {
    { NULL, NULL, NULL },
};

static void *second_thread(void *arg)
{
    (void)arg;
    puts("2nd: starting");
    puts("2nd: exiting");
    puts("Any character entered in the shell should now trigger the panic.");
    /* Returning from the thread function ends the thread. */
    return NULL;
}

int main(void)
{
    int result = 0;

    puts("main: starting");

    /* Lower priority value than main, i.e. more important than main. */
    kernel_pid_t second_pid = thread_create(
        second_thread_stack,
        sizeof(second_thread_stack),
        THREAD_PRIORITY_MAIN - 1,
        THREAD_CREATE_WOUT_YIELD,
        second_thread,
        NULL,
        "nr2");

    /* thread_create() returns a negative error code (not only -1) on failure. */
    if (second_pid < 0) {
        puts("main: Error creating 2nd thread.");
        result = -1;
    }

    if (result == 0) {
        char line_buf[SHELL_DEFAULT_BUFSIZE];
        shell_run(shell_commands, line_buf, SHELL_DEFAULT_BUFSIZE);
    }

    return result;
}
Expected results
After typing into the console, I would expect the system to keep running ;)
Actual results
After pressing Enter in the console, I get the following panic and stack trace.
> 2nd: starting
2nd: exiting
core/sched.c:288 => *** RIOT kernel panic:
FAILED ASSERTION.
ISR stack overflowed
Stack pointer corrupted, reset to top of stack
active thread: 2
FSR/FAR:
CFSR: 0x00008200
HFSR: 0x40000000
DFSR: 0x00000008
AFSR: 0x00000000
BFAR: 0xffffffff
Misc
EXC_RET: 0xfffffff1
Inside isr -13
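To help read the FSR/FAR part of the dump, the following host-side sketch decodes a subset of the ARMv7-M fault status bits (CFSR is the concatenation of UFSR in bits 31:16, BFSR in bits 15:8 and MMFSR in bits 7:0; HFSR.FORCED is bit 30). The bit table is taken from the ARMv7-M architecture and is deliberately incomplete; decode_cfsr() and hfsr_forced() are hypothetical helpers, not part of RIOT.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    uint32_t mask;
    const char *name;
} fault_bit_t;

/* Subset of ARMv7-M CFSR bits: MMFSR bits 0-7, BFSR bits 8-15, UFSR bits 16-31. */
static const fault_bit_t cfsr_bits[] = {
    { 1u << 0,  "IACCVIOL" },
    { 1u << 1,  "DACCVIOL" },
    { 1u << 7,  "MMARVALID" },
    { 1u << 8,  "IBUSERR" },
    { 1u << 9,  "PRECISERR" },
    { 1u << 10, "IMPRECISERR" },
    { 1u << 12, "STKERR" },
    { 1u << 15, "BFARVALID" },
    { 1u << 16, "UNDEFINSTR" },
    { 1u << 17, "INVSTATE" },
    { 1u << 24, "UNALIGNED" },
    { 1u << 25, "DIVBYZERO" },
};

/* Stores the names of the set CFSR bits (in table order) into names[]
 * and returns how many were stored, never more than max. */
static size_t decode_cfsr(uint32_t cfsr, const char **names, size_t max)
{
    size_t n = 0;
    for (size_t i = 0; i < sizeof(cfsr_bits) / sizeof(cfsr_bits[0]) && n < max; i++) {
        if (cfsr & cfsr_bits[i].mask) {
            names[n++] = cfsr_bits[i].name;
        }
    }
    return n;
}

/* HFSR.FORCED (bit 30): a configurable fault was escalated to HardFault. */
static int hfsr_forced(uint32_t hfsr)
{
    return (hfsr & (1u << 30)) != 0;
}
```

For the values in this dump, decode_cfsr(0x00008200, ...) yields PRECISERR and BFARVALID (a precise data bus fault), and hfsr_forced(0x40000000) is true, i.e. the bus fault was escalated to a HardFault.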
Potential Fix
I've changed sched_switch to check whether the active thread is valid, so that it copes with the active thread having exited.
void sched_switch(uint16_t other_prio)
{
    thread_t *active_thread = thread_get_active();
    uint16_t current_prio = 0;
    int on_runqueue = 0;

    /* sched_task_exit() sets sched_active_thread to NULL, so the active
     * thread may be gone: only dereference it after checking. */
    if (active_thread != NULL) {
        current_prio = active_thread->priority;
        on_runqueue = (active_thread->status >= STATUS_ON_RUNQUEUE);
    }

    DEBUG("sched_switch: active pid=%" PRIkernel_pid " prio=%" PRIu16 " on_runqueue=%i "
          ", other_prio=%" PRIu16 "\n",
          active_thread != NULL ? active_thread->pid : KERNEL_PID_UNDEF,
          current_prio,
          on_runqueue,
          other_prio);

    if ((active_thread == NULL) || !on_runqueue || (current_prio > other_prio)) {
        if (irq_is_in()) {
            /* ... (rest of sched_switch) ... */
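The switch condition in the proposed fix can be isolated as a pure function and checked on the host. This is a sketch under assumptions: model_thread_t and model_should_switch() are hypothetical, on_runqueue stands in for status >= STATUS_ON_RUNQUEUE, and, as in RIOT, a lower priority value means a more important thread. The point it illustrates is that the pointer is only dereferenced after the NULL check.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for thread_t; NOT RIOT's definition. */
typedef struct {
    uint16_t priority;   /* lower value = more important, as in RIOT */
    int on_runqueue;     /* models status >= STATUS_ON_RUNQUEUE */
} model_thread_t;

/* Returns 1 if the scheduler should switch away from `active` in favour
 * of a thread with priority `other_prio`, 0 otherwise. A NULL active
 * thread (it has exited) always leads to a switch. */
static int model_should_switch(const model_thread_t *active, uint16_t other_prio)
{
    if (active == NULL) {
        return 1;
    }
    return !active->on_runqueue || (active->priority > other_prio);
}
```

A NULL active thread always yields a switch, which is what the scheduler needs once the active thread has exited; otherwise the original condition is unchanged.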
I don't know whether sched_switch must be able to deal with this case, or whether sched_task_exit shouldn't set sched_active_thread to NULL. The doc comment of thread_get_active suggests the former, since it explicitly allows a NULL return. In that case, we need to check whether other functions also cannot cope with a NULL active thread, and potentially add assertions to help find such cases in the future.
/**
* @brief Returns a pointer to the Thread Control Block of the currently
* running thread
*
* @return Pointer to the TCB of the currently running thread, or `NULL` if
* no thread is running
*/
static inline thread_t *thread_get_active(void)
....
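Given that contract, every caller of thread_get_active() has to choose between treating NULL as a bug or tolerating it. The sketch below shows both options on the host; model_get_active(), model_get_active_checked() and model_active_pid_or_undef() are hypothetical helpers (not RIOT API), and MODEL_PID_UNDEF stands in for KERNEL_PID_UNDEF.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for thread_t; NOT RIOT's definition. */
typedef struct {
    int pid;
} model_thread_t;

#define MODEL_PID_UNDEF 0   /* stands in for KERNEL_PID_UNDEF */

static model_thread_t *model_active_thread;

/* Models thread_get_active(): may return NULL when no thread is running. */
static model_thread_t *model_get_active(void)
{
    return model_active_thread;
}

/* Option 1 (hypothetical): call sites that require a running thread
 * assert, so a NULL return fails at the call site instead of faulting later. */
static model_thread_t *model_get_active_checked(void)
{
    model_thread_t *t = model_get_active();
    assert(t != NULL);
    return t;
}

/* Option 2 (hypothetical): call sites that can tolerate "no thread" map
 * NULL to an undefined PID, like the DEBUG() call in the proposed fix. */
static int model_active_pid_or_undef(void)
{
    model_thread_t *t = model_get_active();
    return (t != NULL) ? t->pid : MODEL_PID_UNDEF;
}
```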
Versions
RIOT version: master (5267300)
Operating System Environment
----------------------------
Operating System: "Ubuntu" "22.04.4 LTS (Jammy Jellyfish)"
Kernel: Linux 6.8.0-39-generic x86_64 x86_64
System shell: /usr/bin/dash (probably dash)
make's shell: /usr/bin/dash (probably dash)
Installed compiler toolchains
-----------------------------
native gcc: gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
arm-none-eabi-gcc: arm-none-eabi-gcc (Arm GNU Toolchain 13.3.Rel1 (Build arm-13.24)) 13.3.1 20240614
Installed compiler libs
-----------------------
arm-none-eabi-newlib: "4.4.0"
Installed development tools
---------------------------
cmake: cmake version 3.22.1
doxygen: 1.9.1
git: git version 2.39.2
make: GNU Make 4.3
openocd: Open On-Chip Debugger 0.12.0+dev-00682-gefe902219 (2024-08-13-14:06)
python3: Python 3.10.12