Written by Dr. Johan Kraft, CEO and founder of Percepio
The central idea underlying an RTOS with a fixed-priority scheduler is that a high-priority task should be scheduled ahead of one with lower priority. If necessary, the RTOS can even pre-empt the running task, forcing it to yield the CPU to a higher-priority task. Yet, as a developer, you must watch out for programming pitfalls that can result in a higher-priority task having to wait for a lower-priority task. This condition is known as priority inversion.
Priority inversions can occur in conjunction with a mutex, message queue or other type of synchronization object. The best way to describe the problem is probably to step through an example.
In the timeline diagram below, captured with Tracealyzer, we have a low-priority task (green) executing. It takes a binary semaphore to protect some critical section and continues to execute code within the critical section. When the high-priority task (red) enters the ready state, the RTOS pre-empts the green task and lets the red run. The red task tries to grab the same binary semaphore but is blocked as the low-priority green task is holding it.
So far, everything is fine – this is expected behaviour. In general, the green task would now run and quickly release the semaphore, at which time it is again pre-empted, and the red task can obtain the semaphore and proceed. This time, however, an inversion occurs instead. For some reason, maybe a timed wait that has expired, a medium-priority (orange) task has entered the ready state and is allowed to execute ahead of the green task. As the orange task has no knowledge of the contested semaphore, it happily runs to completion. Only then does the green task finally run so that it can release the semaphore and hand over execution to the red, high-priority task.
So, the high-priority task was blocked and had to wait for an indeterminate time while a medium-priority task ran to completion. That is priority inversion at work.
It is important to realize that the three tasks involved here were essentially helpless. None of them could have done anything to avoid the inversion, at least not without some support from the RTOS. Luckily, such support is available in many RTOSes in the form of mutexes with priority inheritance. Priority inheritance means that if a high-priority task blocks while attempting to obtain a mutex that is currently held by a lower-priority task, then the priority of the task holding the mutex is temporarily raised to that of the blocked task. In our scenario, when the red task was blocked, the green task would have been elevated to red priority. The orange task could then not have run ahead of it, so the green task would have released the mutex promptly and the red task would have proceeded.
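To make the effect of priority inheritance concrete, here is a minimal tick-based simulation of the three-task scenario. It is only a sketch under stated assumptions: the task parameters (release times, work amounts), the single lock, and the function name `high_task_finish_time` are all hypothetical, and the model is far simpler than a real RTOS scheduler.

```c
#include <assert.h>
#include <stdbool.h>

enum { LOW, MED, HIGH, NUM_TASKS };

struct sim_task {
    int priority;    /* larger number = higher priority */
    int release;     /* tick at which the task becomes ready */
    int work;        /* ticks of CPU time the task needs */
    bool needs_lock; /* true if all of its work is inside the critical section */
};

/* Simulates a fixed-priority pre-emptive scheduler, one tick at a time.
 * Returns the tick at which the HIGH task completes, or -1 on timeout.
 * All task parameters below are made-up values for illustration. */
int high_task_finish_time(bool priority_inheritance)
{
    const struct sim_task tasks[NUM_TASKS] = {
        [LOW]  = { .priority = 1, .release = 0, .work = 3, .needs_lock = true  },
        [MED]  = { .priority = 2, .release = 2, .work = 5, .needs_lock = false },
        [HIGH] = { .priority = 3, .release = 1, .work = 2, .needs_lock = true  },
    };
    int done[NUM_TASKS] = { 0 };
    int owner = -1; /* index of the task holding the lock, -1 if free */

    for (int now = 0; now < 100; now++) {
        /* Highest priority among tasks currently blocked on the lock. */
        int boost = 0;
        for (int i = 0; i < NUM_TASKS; i++) {
            bool ready = now >= tasks[i].release && done[i] < tasks[i].work;
            if (ready && tasks[i].needs_lock && owner != -1 && owner != i &&
                tasks[i].priority > boost) {
                boost = tasks[i].priority;
            }
        }

        /* Pick the runnable task with the highest effective priority. */
        int pick = -1, pick_prio = 0;
        for (int i = 0; i < NUM_TASKS; i++) {
            if (now < tasks[i].release || done[i] >= tasks[i].work)
                continue; /* not released yet, or already finished */
            if (tasks[i].needs_lock && owner != -1 && owner != i)
                continue; /* blocked: another task holds the lock */
            int prio = tasks[i].priority;
            if (priority_inheritance && i == owner && boost > prio)
                prio = boost; /* holder inherits the waiter's priority */
            if (prio > pick_prio) {
                pick = i;
                pick_prio = prio;
            }
        }
        if (pick == -1)
            continue; /* nothing runnable: idle tick */

        if (tasks[pick].needs_lock) {
            assert(owner == -1 || owner == pick); /* lock is never stolen */
            owner = pick;
        }
        done[pick]++;
        if (done[pick] == tasks[pick].work) {
            if (owner == pick)
                owner = -1; /* release the lock on completion */
            if (pick == HIGH)
                return now + 1;
        }
    }
    return -1;
}
```

With these assumed numbers, `high_task_finish_time(false)` returns 10, because the high-priority task has to wait for the medium-priority task to run to completion, while `high_task_finish_time(true)` returns 5, because the lock holder inherits the high priority and releases the lock before the medium-priority task gets to run.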
Priority inheritance does not really cure priority inversion; it just minimises its effect in some situations. Hard real-time applications should still be carefully designed so that priority inversion does not happen in the first place.
Generally, avoid blocking on shared resources whenever possible. As an example, if your task writes data to a message queue that might become full, you could instead size the queue so that it does not fill up under normal load and, as an extra precaution, write in a non-blocking manner and check the return value for any failed writes.
Likewise, instead of scattering multiple critical sections that share a mutex all over the code, you can create a “server” task that takes requests from “client” tasks via a message queue, in a non-blocking manner, and performs all direct operations on the resource itself. The server can send any responses via other message queues that are owned by the client tasks and specified in the requests.
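The two suggestions above can be sketched together in a few lines of C. This is a single-threaded stand-in, not a real RTOS API: the `queue` type, `queue_try_put`, `queue_try_get`, `server_poll`, and the capacity of 4 are all hypothetical names and values chosen for illustration. In a real application you would use the message queue primitive of your RTOS with a zero timeout, and the server loop would live in its own task.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_CAPACITY 4 /* assumed size; choose it for your worst-case load */

struct queue; /* forward declaration so a message can name its reply queue */

struct message {
    int value;
    struct queue *reply_to; /* queue owned by the client, or NULL for no reply */
};

struct queue {
    struct message slots[QUEUE_CAPACITY];
    size_t head;  /* index of the oldest message */
    size_t count; /* number of messages currently stored */
};

/* Non-blocking write: reports failure instead of waiting when the queue is full. */
bool queue_try_put(struct queue *q, struct message m)
{
    assert(q != NULL);
    if (q->count == QUEUE_CAPACITY)
        return false;
    q->slots[(q->head + q->count) % QUEUE_CAPACITY] = m;
    q->count++;
    return true;
}

/* Non-blocking read: reports failure instead of waiting when the queue is empty. */
bool queue_try_get(struct queue *q, struct message *out)
{
    assert(q != NULL && out != NULL);
    if (q->count == 0)
        return false;
    *out = q->slots[q->head];
    q->head = (q->head + 1) % QUEUE_CAPACITY;
    q->count--;
    return true;
}

/* The shared resource. Only the server code below touches it. */
static int shared_total;

/* One pass of the server: handle every pending request and answer on the
 * queue named in the request. Returns the number of replies that could not
 * be delivered because the client's reply queue was full. */
int server_poll(struct queue *requests)
{
    int dropped = 0;
    struct message req;

    while (queue_try_get(requests, &req)) {
        shared_total += req.value; /* the only direct access to the resource */
        struct message reply = { .value = shared_total, .reply_to = NULL };
        if (req.reply_to != NULL && !queue_try_put(req.reply_to, reply))
            dropped++; /* never block on a slow client; just count the failure */
    }
    return dropped;
}
```

A client fills in a `struct message` with its own reply queue in `reply_to`, calls `queue_try_put` on the request queue, and checks the returned `bool`. No client ever holds a lock on the resource, so no client can be caught in a priority inversion over it. Failed writes show up as return values that the caller has to handle explicitly.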
This is part of a series from Zephyr Project member Percepio:
- RTOS Debugging: Dealing with Timing Issues
- RTOS debugging: When the CPU has too much on its hands
- RTOS Debugging: Chasing the jitter bug
If you have any questions or comments, please feel free to reach out to the Zephyr community on the Zephyr Discord Channel.