[BUG] Deadlock when ending task scheduler on POSIX #1217

Open
denravonska opened this issue Jan 3, 2025 · 10 comments
Labels
bug Something isn't working

Comments

@denravonska

denravonska commented Jan 3, 2025

Describe the bug
We have a unit test runner that spawns a FreeRTOS task that runs our test suite and then calls vTaskEndScheduler to allow the main function to exit. This works most of the time but we noticed that there's an occasional deadlock.

Target

  • Development board: Host
  • Instruction Set Architecture: x64
  • IDE and version: Visual Studio Code 1.96.2
  • Toolchain and version: gcc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0
  • FreeRTOS commit: e55bde2

Host

  • Host OS: Ubuntu
  • Version: 24.04

To Reproduce
Example code:

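#include <FreeRTOS.h>
#include <task.h>
#include <stdio.h>

void Task(void *)
{
    // End the scheduler from inside the task so that
    // vTaskStartScheduler() returns in main().
    vTaskEndScheduler();
    vTaskDelete(nullptr);
}

int main(int argc, char ** argv)
{
    xTaskCreate(Task, "MainTask", 8192, nullptr, 6, nullptr);
    vTaskStartScheduler();

    // Reached only after the scheduler has been ended.
    printf("Done\n");
    return 0;
}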

Running this in a loop helps trigger the issue. For me it triggers faster if I switch to another terminal while the loop is running:

while /bin/true; do ./test ; done

Looking at the threads we can see that the main task is stuck trying to take a mutex:

(gdb) info threads
  Id   Target Id                                             Frame 
* 1    Thread 0x78b84772ae40 (LWP 4087203) "Scheduler"       0x000078b846045fb8 in __GI___sigtimedwait (set=set@entry=0x78b84400ae60, info=info@entry=0x7ffd41908d10, timeout=timeout@entry=0x0)
    at ../sysdeps/unix/sysv/linux/sigtimedwait.c:31
  2    Thread 0x78b83f8006c0 (LWP 4087207) "Scheduler timer" 0x000078b8460ecadf in __GI___clock_nanosleep (clock_id=clock_id@entry=0, flags=flags@entry=0, req=req@entry=0x78b83f7ffb40, rem=rem@entry=0x0)
    at ../sysdeps/unix/sysv/linux/clock_nanosleep.c:78
  3    Thread 0x78b842e006c0 (LWP 4087204) "MainTask"        futex_wait (private=0, expected=2, futex_word=0x5080000000a0) at ../sysdeps/nptl/futex-internal.h:146
  
(gdb) thread 3
[Switching to thread 3 (Thread 0x78b842e006c0 (LWP 4087204))]
#0  futex_wait (private=0, expected=2, futex_word=0x5080000000a0) at ../sysdeps/nptl/futex-internal.h:146
warning: 146	../sysdeps/nptl/futex-internal.h: No such file or directory
(gdb) bt
#0  futex_wait (private=0, expected=2, futex_word=0x5080000000a0) at ../sysdeps/nptl/futex-internal.h:146
#1  __GI___lll_lock_wait (futex=futex@entry=0x5080000000a0, private=0) at ./nptl/lowlevellock.c:49
#2  0x000078b8460a00f1 in lll_mutex_lock_optimized (mutex=0x5080000000a0) at ./nptl/pthread_mutex_lock.c:48
#3  ___pthread_mutex_lock (mutex=mutex@entry=0x5080000000a0) at ./nptl/pthread_mutex_lock.c:93
#4  0x000062f66ed88eb7 in event_signal (ev=0x5080000000a0) at ../third-party/freertos/repo/portable/ThirdParty/GCC/Posix/utils/wait_for_event.c:104
#5  0x000062f66ed886cf in vPortCancelThread (pxTaskToDelete=<optimized out>) at ../third-party/freertos/repo/portable/ThirdParty/GCC/Posix/port.c:445
#6  0x000062f66ed774a0 in prvDeleteTCB (pxTCB=pxTCB@entry=0x62f66efdf720 <xIdleTaskTCB.3>) at ../third-party/freertos/repo/tasks.c:6445
#7  0x000062f66ed78726 in vTaskDelete (xTaskToDelete=<optimized out>) at ../third-party/freertos/repo/tasks.c:2316
#8  0x000062f66ed79fea in vTaskEndScheduler () at ../third-party/freertos/repo/tasks.c:3797
#9  0x000062f66eceeb26 in Task () at ../test/src/main.cpp:12
#10 0x000062f66ed881b0 in prvWaitForStart (pvParams=pvParams@entry=0x62f66eff0928 <ucHeap+65512>) at ../third-party/freertos/repo/portable/ThirdParty/GCC/Posix/port.c:465
#11 0x000078b84705ea42 in asan_thread_start (arg=0x78b846ef9000) at ../../../../src/libsanitizer/asan/asan_interceptors.cpp:234
#12 0x000078b84609ca94 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:447
#13 0x000078b846129c3c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78

What's interesting is that, if I interpret this correctly, the mutex owner no longer exists: __owner is LWP 4087205, which does not appear in the thread list above.

(gdb) print ev.mutex
$3 = {__data = {__lock = 2, __count = 0, __owner = 4087205, __nusers = 1, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, 
  __size = "\002\000\000\000\000\000\000\000\245]>\000\001", '\000' <repeats 26 times>, __align = 2}
denravonska added the bug label Jan 3, 2025
@rawalexe
Member

rawalexe commented Jan 7, 2025

Hello @denravonska,
Thank you for your report. I'll forward this to the team and have a look.

@rawalexe
Member

rawalexe commented Jan 7, 2025

[Screenshot: 2025-01-07 at 3:20:21 PM]

How long are you waiting? I modified the code a bit and tried it, but I cannot reproduce the issue.

@denravonska
Author

Tried it with this script:

#!/bin/bash

counter=0

while true; do
   echo -n "$counter "
   ./test.out
   let counter++
done

Hung on run:

  • 9
  • 5481
  • 390
  • 1264
  • 735

@denravonska
Author

Adding the config we're using if it helps.
FreeRTOSConfig.h.txt

@rawalexe
Member

rawalexe commented Jan 10, 2025

I tried to build with the config and got a build error:

./FreeRTOS.h:2674:10: error: #error If configGENERATE_RUN_TIME_STATS is defined then portCONFIGURE_TIMER_FOR_RUN_TIME_STATS must also be defined. portCONFIGURE_TIMER_FOR_RUN_TIME_STATS should call a port layer function to setup a peripheral timer/counter that can then be used as the run time counter time base.
 2674 |         #error If configGENERATE_RUN_TIME_STATS is defined then portCONFIGURE_TIMER_FOR_RUN_TIME_STATS must also be defined.  portCONFIGURE_TIMER_FOR_RUN_TIME_STATS should call a port layer function to setup a peripheral timer/counter that can then be used as the run time counter time base.
./FreeRTOS.h:2679:14: error: #error If configGENERATE_RUN_TIME_STATS is defined then either portGET_RUN_TIME_COUNTER_VALUE or portALT_GET_RUN_TIME_COUNTER_VALUE must also be defined. See the examples provided and the FreeRTOS web site for more information.
 2679 |             #error If configGENERATE_RUN_TIME_STATS is defined then either portGET_RUN_TIME_COUNTER_VALUE or portALT_GET_RUN_TIME_COUNTER_VALUE must also be defined.  See the examples provided and the FreeRTOS web site for more information.

I can define the value, but what do you have for it?

@rawalexe
Member

Can you provide the whole application and email it to me at rawalexe@amazon.com?

@denravonska
Author

> I tried to build with the config and got a build error:
>
> ./FreeRTOS.h:2674:10: error: #error If configGENERATE_RUN_TIME_STATS is defined then portCONFIGURE_TIMER_FOR_RUN_TIME_STATS must also be defined. portCONFIGURE_TIMER_FOR_RUN_TIME_STATS should call a port layer function to setup a peripheral timer/counter that can then be used as the run time counter time base.
>  2674 |         #error If configGENERATE_RUN_TIME_STATS is defined then portCONFIGURE_TIMER_FOR_RUN_TIME_STATS must also be defined.  portCONFIGURE_TIMER_FOR_RUN_TIME_STATS should call a port layer function to setup a peripheral timer/counter that can then be used as the run time counter time base.
> ./FreeRTOS.h:2679:14: error: #error If configGENERATE_RUN_TIME_STATS is defined then either portGET_RUN_TIME_COUNTER_VALUE or portALT_GET_RUN_TIME_COUNTER_VALUE must also be defined. See the examples provided and the FreeRTOS web site for more information.
>  2679 |             #error If configGENERATE_RUN_TIME_STATS is defined then either portGET_RUN_TIME_COUNTER_VALUE or portALT_GET_RUN_TIME_COUNTER_VALUE must also be defined.  See the examples provided and the FreeRTOS web site for more information.
>
> I can define the value, but what do you have for it?

That's really weird. We don't define portCONFIGURE_TIMER_FOR_RUN_TIME_STATS at all, and I've verified that the FreeRTOSConfig.h gets included.

I have sent you a binary built with the following:

gcc -static -o freeze -O2 -g -ggdb \
    -I ../third-party/freertos/config  \
    -I $FREERTOS_ROOT/include \
    -I $FREERTOS_ROOT/portable/ThirdParty/GCC/Posix \
    src/main.cpp \
    $FREERTOS_ROOT/*.c \
    $FREERTOS_ROOT/portable/ThirdParty/GCC/Posix/port.c \
    $FREERTOS_ROOT/portable/ThirdParty/GCC/Posix/utils/wait_for_event.c \
    $FREERTOS_ROOT/portable/MemMang/heap_4.c

where ../third-party/freertos/config is the location of the above config and src/main.cpp is the above example. After sending it I noticed that it also freezes with -O0, so I can provide a binary of that build as well if it helps with debugging.

@denravonska
Author

denravonska commented Jan 10, 2025

I did some more testing with the config from examples/template_configuration and I am getting the freeze there as well. I had to modify my example by reducing the stack size and priority.

#include <FreeRTOS.h>
#include <task.h>
#include <stdio.h>

void vApplicationStackOverflowHook( TaskHandle_t xTask, char *pcTaskName)
{
    printf("OVERFLOW!\n");
}

void Task(void *)
{
    vTaskEndScheduler();
    vTaskDelete(nullptr);
}

int main(int argc, char ** argv)
{
    xTaskCreate(Task, "MainTask", 256, nullptr, 4, nullptr);
    vTaskStartScheduler();

    printf("Done\n");
    return 0;
}

Edit: I've also switched laptops and I get it on Arch in addition to Ubuntu.

@rawalexe
Member

The document that you emailed me isn't the correct one. Can you send me a valid zip or tar file?

@denravonska
Author

denravonska commented Jan 15, 2025

I'm not sure what you mean. It's a minimal binary that has the deadlock issue. If you need the source code rather than the binary it's in the post above.
