Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix xTaskNotifyWait & ulTaskNotifyTake determinism. #625

Closed
wants to merge 2 commits into from

Conversation

karver8
Copy link

@karver8 karver8 commented Feb 13, 2023

Description

Fixes the bug described in this issue:
#612

One unorthodox thing being done here, which I feel should be pointed out, is that I am calling vTaskSuspendAll() from inside a critical section. This is done to minimize the performance impact of this change, since one of the main benefits of task notifications is that they are faster than the other primitives. The only worry I see here is that it may cause issues with certain ports if they do tricky things inside portSOFTWARE_BARRIER() or portMEMORY_BARRIER(), but I cannot think of any examples.

Test Steps

As suggested by Richard Barry, adding configASSERT( ulCriticalNesting == 0UL ), into vListInsert(), which is a condition that should always be met, will cause it to be triggered by a call to xTaskNotifyWait() or ulTaskNotifyTake() if the task blocks. This of course only works on ports that have the variable ulCriticalNesting.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@karver8 karver8 requested a review from a team as a code owner February 13, 2023 22:20
@karver8 karver8 force-pushed the deterministic-task-notify branch from 5ff4250 to eeed340 Compare February 13, 2023 22:21
@Mancent Mancent linked an issue Mar 5, 2023 that may be closed by this pull request
This was linked to issues Mar 25, 2023
Closed
Closed
Closed
- Delayed task list is no longer walked while interrupts are disabled.

- Fixed comment that could lead to similar mistakes in the future.

- Minimize performance impact.
@karver8 karver8 force-pushed the deterministic-task-notify branch from eeed340 to 8daadaf Compare April 2, 2023 23:44
@karver8
Copy link
Author

karver8 commented Apr 3, 2023

Let me know if there is anything I need to do to move this forward. It looks like unit tests need to be updated in the other repository, these should be pretty simple changes but I am not sure how that would link up to this PR.

@n9wxu
Copy link
Member

n9wxu commented May 5, 2023

The code seems to work on an RP2040 but fails unit tests. These failures need to be checked and updated as required.

@karver8
Copy link
Author

karver8 commented May 6, 2023

@n9wxu Yes I have looked at the failures and I know how to fix them, but how do I synchronize changes to the unit test repo with changes in this repository? I can submit a PR in the FreeRTOS repo to fix the unit tests, but it looks like that will break the tests until the FreeRTOS-kernel submodule is updated?

@@ -3137,8 +3137,8 @@ void vTaskPlaceOnEventList( List_t * const pxEventList,
{
configASSERT( pxEventList );

/* THIS FUNCTION MUST BE CALLED WITH EITHER INTERRUPTS DISABLED OR THE
* SCHEDULER SUSPENDED AND THE QUEUE BEING ACCESSED LOCKED. */
/* THIS FUNCTION MUST BE CALLED WITH THE SCHEDULER SUSPENDED AND THE QUEUE
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was this comment changed? The original "Interrupts disabled" should probably be "inside a critical section", but I think it's still valid.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed it because this function calls vListInsert(), which shouldn't be called while interrupts are disabled as far as I am aware. I noted this comment in my original issue. But I can change it back if you think this is not correct.

@Mancent Mancent linked an issue May 21, 2023 that may be closed by this pull request
Comment on lines +4758 to +4760
/* Suspend the scheduler before enabling interrupts. */
vTaskSuspendAll();
taskEXIT_CRITICAL();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strictly speaking, the API doesn't allow calling vTaskSuspendAll() from inside a critical section. It probably works, but it would be the only place it happens in the FreeRTOS codebase.

The intent behind the larger construct being proposed here seems to be to avoid suspending/resuming the scheduler if possible, which in turn keeps ulTaskGenericNotifyTake() as lightweight as possible. Maybe a different construct could be used instead. For example, the idle task uses a different construct to avoid suspending/resuming the scheduler for tickless idle. A similar approach might work here -- specifically, checking the notification value while "unprotected" to determine whether to suspend the scheduler, and then verifying the notification value after suspending the scheduler if suspending is warranted.

aggarg pushed a commit that referenced this pull request Oct 17, 2023
This PR fixes the bug described in the following issue:
#612.
This was originally contributed in the following PR:
#625.

The implementation suspends the scheduler before exiting the
critical section (i.e. before enabling interrupts). If we do not do
so, a notification sent from an ISR, which happens after exiting
the critical section and before suspending the scheduler, will
get lost. The sequence of events will be:

1. Exit critical section.
2. Interrupt - ISR calls xTaskNotifyFromISR which adds the task to
    the Ready list.
3. Suspend scheduler.
4. prvAddCurrentTaskToDelayedList moves the task to the delayed
    or suspended list.
5. Resume scheduler does not touch the task (because it is not on
    the pendingReady list), effectively losing the notification from
    the ISR.

The same does not happen when we suspend the scheduler before
exiting the critical section. The sequence of events in this case will
be:

1. Suspend scheduler.
2. Exit critical section.
3. Interrupt - ISR calls xTaskNotifyFromISR which adds the task to
    the pendingReady list as the scheduler is suspended.
4. prvAddCurrentTaskToDelayedList adds the task to delayed or
    suspended list. Note that this operation does not nullify the add
    to pendingReady list done in the above step because a different
    list item, namely xEventListItem, is used for adding the task to the
    pendingReady list. In other words, the task still remains on the
    pendingReady list.
5. Resume scheduler moves the task from pendingReady list to the Ready list.

------------

Co-authored-by: Jacob Carver <karver8@github.com>
@kar-rahul-aws
Copy link
Member

Hi @karver8
We had to create a new PR #833 because the kernel code has diverged too much since the time this PR was created. We have ensured to mark you as the commit author and mentioned this in PR 833 description. We are closing this PR as PR 833 has been merged. Thank you for your valuable contribution.
Regards.

@karver8
Copy link
Author

karver8 commented Oct 22, 2023

Thanks! @kar-rahul-aws

n9wxu pushed a commit to n9wxu/FreeRTOS-Kernel that referenced this pull request Oct 26, 2023
This PR fixes the bug described in the following issue:
FreeRTOS#612.
This was originally contributed in the following PR:
FreeRTOS#625.

The implementation suspends the scheduler before exiting the
critical section (i.e. before enabling interrupts). If we do not do
so, a notification sent from an ISR, which happens after exiting
the critical section and before suspending the scheduler, will
get lost. The sequence of events will be:

1. Exit critical section.
2. Interrupt - ISR calls xTaskNotifyFromISR which adds the task to
    the Ready list.
3. Suspend scheduler.
4. prvAddCurrentTaskToDelayedList moves the task to the delayed
    or suspended list.
5. Resume scheduler does not touch the task (because it is not on
    the pendingReady list), effectively losing the notification from
    the ISR.

The same does not happen when we suspend the scheduler before
exiting the critical section. The sequence of events in this case will
be:

1. Suspend scheduler.
2. Exit critical section.
3. Interrupt - ISR calls xTaskNotifyFromISR which adds the task to
    the pendingReady list as the scheduler is suspended.
4. prvAddCurrentTaskToDelayedList adds the task to delayed or
    suspended list. Note that this operation does not nullify the add
    to pendingReady list done in the above step because a different
    list item, namely xEventListItem, is used for adding the task to the
    pendingReady list. In other words, the task still remains on the
    pendingReady list.
5. Resume scheduler moves the task from pendingReady list to the Ready list.

------------

Co-authored-by: Jacob Carver <karver8@github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

> - [ ] ``` H G M https://docs.github.com/articles/keeping-your-account-and-data-secure/
5 participants