
(cortex-m) unexpected kernel panic after thread exit #20812

@Sanderhuisman

Description

I'm seeing an unexpected kernel panic after a thread exits. I traced it to sched_switch in core/sched.c using an invalid active-thread pointer: sched_task_exit sets sched_active_thread to NULL, and sched_switch does not check whether the pointer it retrieves refers to a valid thread before dereferencing it.

Steps to reproduce the issue

I have created a small application that triggers the problem (tested on an STM32 NUCLEO-F401RE, problem initially seen on an EFR32). It uses the shell module as the problem is triggered when the scheduler is invoked. After starting the application, enter a character in the console/terminal to invoke the scheduler.

Inside core/sched.c I've added an assertion to make the problem show up reliably (without it, things sometimes happen to work).

void sched_switch(uint16_t other_prio)
{
    thread_t *active_thread;
    uint16_t current_prio;
    int on_runqueue;

    active_thread = thread_get_active();
    assert(active_thread != NULL);

    current_prio = active_thread->priority;
    on_runqueue = (active_thread->status >= STATUS_ON_RUNQUEUE);

    DEBUG("sched_switch: active pid=%" PRIkernel_pid " prio=%" PRIu16 " on_runqueue=%i "
          ", other_prio=%" PRIu16 "\n",
          active_thread->pid, current_prio, on_runqueue,
          other_prio);

main.c

#include <stdint.h>
#include <stdio.h>

#include "shell.h"
#include "thread.h"

char second_thread_stack[THREAD_STACKSIZE_MAIN];

static const shell_command_t shell_commands[] = {
  {NULL, NULL, NULL},
};

void *second_thread(void *arg)
{
    (void) arg;

    puts("2nd: starting");

    puts("2nd: exiting");
    puts("Any character entered in the shell should now trigger the panic.");

    return NULL;
}

int main(void)
{
    int result = 0;

    puts("main: starting");

    kernel_pid_t second_pid = thread_create(
      second_thread_stack,
      sizeof(second_thread_stack),
      THREAD_PRIORITY_MAIN - 1,
      THREAD_CREATE_WOUT_YIELD,
      second_thread,
      NULL,
      "nr2");
    /* thread_create() reports failure with a negative errno value */
    if (second_pid < 0)
    {
        puts("main: Error creating 2nd thread.");
        result = -1;
    }

    if (result == 0)
    {
        char line_buf[SHELL_DEFAULT_BUFSIZE];
        shell_run(shell_commands, line_buf, SHELL_DEFAULT_BUFSIZE);
    }

    return result;
}

Expected results

After accessing the console, I would expect the system to stay alive ;)

Actual results

After pressing Enter in the console, I get the following panic and register dump.

> 2nd: starting
2nd: exiting
core/sched.c:288 => *** RIOT kernel panic:
FAILED ASSERTION.


ISR stack overflowed
Stack pointer corrupted, reset to top of stack
active thread: 2
FSR/FAR:
 CFSR: 0x00008200
 HFSR: 0x40000000
 DFSR: 0x00000008
 AFSR: 0x00000000
 BFAR: 0xffffffff
Misc
EXC_RET: 0xfffffff1
Inside isr -13
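
For anyone reading the register dump above, here is a minimal sketch of decoding the relevant bits. It assumes the ARMv7-M fault status layout (BusFault flags in bits 8 to 15 of CFSR, FORCED in bit 30 of HFSR). The macro and function names are hypothetical helpers for illustration and are not part of RIOT.

```c
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Selected ARMv7-M fault status bits (names local to this sketch). */
#define CFSR_PRECISERR   (UINT32_C(1) << 9)   /* BusFault: precise data bus error */
#define CFSR_IMPRECISERR (UINT32_C(1) << 10)  /* BusFault: imprecise data bus error */
#define CFSR_BFARVALID   (UINT32_C(1) << 15)  /* BusFault: BFAR holds the fault address */
#define HFSR_FORCED      (UINT32_C(1) << 30)  /* HardFault escalated from another fault */

/* Hypothetical helper: true if any bit of `mask` is set in `reg`. */
static bool fault_bit_set(uint32_t reg, uint32_t mask)
{
    return (reg & mask) != 0;
}

/* Hypothetical helper: print a short summary of the bits decoded above. */
static void print_fault_summary(uint32_t cfsr, uint32_t hfsr, uint32_t bfar)
{
    if (fault_bit_set(hfsr, HFSR_FORCED)) {
        puts("HardFault was escalated from a configurable fault (FORCED)");
    }
    if (fault_bit_set(cfsr, CFSR_PRECISERR)) {
        puts("BusFault: precise data bus error");
    }
    if (fault_bit_set(cfsr, CFSR_IMPRECISERR)) {
        puts("BusFault: imprecise data bus error");
    }
    if (fault_bit_set(cfsr, CFSR_BFARVALID)) {
        printf("BFAR is valid: 0x%08" PRIx32 "\n", bfar);
    }
}
```

Calling `print_fault_summary(0x00008200, 0x40000000, 0xffffffff)` with the values from the dump reports a forced HardFault, a precise bus error, and a valid BFAR. Since the dump also reports an ISR stack overflow, the fault address may stem from the panic path rather than from the original NULL access.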

Potential Fix

I've changed sched_switch to check whether the active thread pointer is valid, so that it can cope with the active thread having exited.

void sched_switch(uint16_t other_prio)
{
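    thread_t *active_thread = thread_get_active();
    /* active_thread is NULL after the active thread has exited, so guard
     * every dereference; current_prio is not evaluated in that case */
    uint16_t current_prio = (active_thread != NULL) ? active_thread->priority : 0;
    int on_runqueue = (active_thread != NULL) &&
                      (active_thread->status >= STATUS_ON_RUNQUEUE);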

    DEBUG("sched_switch: active pid=%" PRIkernel_pid " prio=%" PRIu16 " on_runqueue=%i "
        ", other_prio=%" PRIu16 "\n",
        active_thread != NULL ? active_thread->pid : KERNEL_PID_UNDEF,
        current_prio,
        on_runqueue,
        other_prio);

    if ((active_thread == NULL) || !on_runqueue || (current_prio > other_prio)) {
        if (irq_is_in()) {

I don't know whether sched_switch must be able to deal with this case or whether sched_task_exit shouldn't set sched_active_thread to NULL. The documentation comment on thread_get_active suggests the former. In that case we need to check whether other functions also fail to handle a NULL active thread, and potentially add assertions to help find such cases in the future.

/**
 * @brief   Returns a pointer to the Thread Control Block of the currently
 *          running thread
 *
 * @return  Pointer to the TCB of the currently running thread, or `NULL` if
 *          no thread is running
 */
static inline thread_t *thread_get_active(void)
....
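
To make the first option concrete, here is a host-side sketch of a yield decision that tolerates a NULL active thread. It models only the condition in sched_switch. `mock_thread_t`, `MOCK_STATUS_ON_RUNQUEUE`, and `should_yield` are hypothetical stand-ins, not RIOT's types or API, and the semantics (a lower numeric value means higher priority, statuses at or above the threshold are on the runqueue) are assumed from the snippets above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical threshold; the real value lives in RIOT's thread status enum. */
#define MOCK_STATUS_ON_RUNQUEUE 9

/* Hypothetical, reduced stand-in for a thread control block. */
typedef struct {
    uint16_t priority;  /* lower value = higher priority (assumed) */
    int status;         /* >= MOCK_STATUS_ON_RUNQUEUE means runnable */
} mock_thread_t;

/* Decide whether the scheduler should be asked to switch threads.
 * The pointer is checked before any field is read, so a NULL active
 * thread (for example after the thread has exited) is well-defined. */
static bool should_yield(const mock_thread_t *active, uint16_t other_prio)
{
    if (active == NULL) {
        /* Nothing is running, so let the scheduler pick the next thread. */
        return true;
    }

    bool on_runqueue = (active->status >= MOCK_STATUS_ON_RUNQUEUE);

    /* Yield if the active thread is not runnable or has lower priority. */
    return !on_runqueue || (active->priority > other_prio);
}
```

Checking the pointer first keeps the decision well-defined after a thread exit. Whether yielding is the right response in that state, or whether sched_task_exit should avoid leaving the pointer NULL, is the open question above.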

Versions

RIOT version: master (5267300)

Operating System Environment
----------------------------
         Operating System: "Ubuntu" "22.04.4 LTS (Jammy Jellyfish)"
                   Kernel: Linux 6.8.0-39-generic x86_64 x86_64
             System shell: /usr/bin/dash (probably dash)
             make's shell: /usr/bin/dash (probably dash)

Installed compiler toolchains
-----------------------------
               native gcc: gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
        arm-none-eabi-gcc: arm-none-eabi-gcc (Arm GNU Toolchain 13.3.Rel1 (Build arm-13.24)) 13.3.1 20240614

Installed compiler libs
-----------------------
     arm-none-eabi-newlib: "4.4.0"

Installed development tools
---------------------------
                    cmake: cmake version 3.22.1
                  doxygen: 1.9.1
                      git: git version 2.39.2
                     make: GNU Make 4.3
                  openocd: Open On-Chip Debugger 0.12.0+dev-00682-gefe902219 (2024-08-13-14:06)
                  python3: Python 3.10.12

Labels

Area: core, Type: bug
