Deadlock

A two-process deadlock. Source: Harvard 2018.

Deadlock is a problem that can occur when resources are shared among multiple processes. Suppose process P1 is waiting for a resource R1 currently being used by process P2. Meanwhile, P2 is waiting for resource R2 that's being used by P1. Neither process is able to proceed. This is an example of deadlock.

Shared resources could be files, database tables, memory, peripherals (printers, tape drives), network, etc. While we commonly talk of deadlocks among processes, it could be among threads, users, transactions, computers in a distributed system, etc.

While it's possible to design systems completely free of deadlocks, it's inefficient not to share resources or avoid multiprocessing. Instead, there are techniques to avoid, detect and recover from deadlocks.

Discussion

Could you explain deadlocks with an example? In real life, traffic deadlocks can occur at intersections. In computer science, the classic example used to illustrate deadlock is the Dining Philosophers Problem. Five philosophers are seated at a round dining table. There are five forks, each placed between two philosophers. Philosophers alternate between eating and thinking. To eat, a philosopher has to acquire both forks on either side of her plate but acquire them one at a time. Once a philosopher successfully gets both forks, she eats for a while, then puts down both forks and thinks for a while. Now her neighbouring philosophers can acquire one of the released forks and start eating if they already have another fork. Deadlock can occur when each philosopher acquires one fork and waits indefinitely for another fork. Thus, no philosopher can start to eat. They'll wait forever, neither eating nor thinking. In this analogy, philosophers are the processes. Forks are the shared resources. If resource sharing is poorly coordinated, deadlocks can occur. The system stalls. No useful work gets done.
What are some areas in computer science where deadlocks can occur? Deadlocks can occur in database applications. For example, two transactions have locks on two different tables but also require access to the other table. This happens because operations within one transaction executes between operations of the other transaction, something that's common in concurrent systems. The problem is worse when databases are distributed. Deadlocks can occur in multithreaded applications or multiprocessing systems with shared resources. For example, one process is reading a file and then attempts to print the file. However, another process is using the printer and suddenly wants access to the file held by the first process. Both processes are now deadlocked. Exclusive access to a resource is typically implemented as a lock or mutex. The term mutex is common when accessing critical sections of code. The term lock is common in databases.
Which are the necessary conditions for a deadlock to arise? There are four necessary conditions for a deadlock:
- Mutual Exclusion: A resource can't be assigned to more than one process at a time.
- Hold-and-Wait: A process that's holding a resource is also waiting for another resource.
- No Pre-emption: Process that's holding a resource can't be forced to or doesn't voluntarily give up that resource.
- Circular Wait: Each process is waiting for a resource held by another process. This condition implies the hold-and-wait condition.
How do I analyze or visualize a deadlock? Resource allocation graphs help us understand or analyze deadlocks. Resources and processes are nodes of the graphs. Directed edges connect the nodes. A directed edge from a resource to a process implies that the resource is assigned to that process. A directed edge from a process to a resource implies that the process is requesting that resource. Where there are multiple instances of a resource, a request edge terminates at the edge of the resource box. On the other hand, an assignment edge always terminates at an instance. Each instance is represented as a dot within the resource box. In the figure we note two examples. Both have circular waits but only one of them is a deadlock. In sub-figure (b), P1 and P3 are waiting for R1 and R2 respectively. However, instances of R1 and R2 will become available once released by P2 and P4 respectively. There's no deadlock since P2 and P4 are not part of the circular wait and neither do they fulfil the hold-and-wait condition.
Which are some techniques to handle a deadlock? The four common ways to handle deadlocks are:
- Prevention: System is designed with constraints to prevent at least one of the conditions required for a deadlock.
- Avoidance: Algorithms decide at runtime to deny a resource request if subsequent requests might lead to a deadlock. Decisions may be conservative resulting in lower resource utilization.
- Detection: Periodically check for deadlocks. Frequent checks imply a performance overhead. Infrequent checks imply deadlocks are not caught soon enough. Use triggers such as a drop in CPU utilization to start a check. Perhaps check when a resource request can't be granted.
- Recovery: Take action to break one of the conditions for a deadlock. One of the processes could be terminated, called victim selection. Or the process is forced to release resources it holds.
Deadlock prevention is easier than deadlock detection. For example, in the Dining Philosophers Problem, a simple rule will prevent deadlocks: given that the forks are numbered, each philosopher must first pick the lower-numbered fork before attempting to pick the higher-numbered fork.
Which are some concepts closely related to deadlock? While we often talk about processes blocked on resources, there are also communication deadlocks. In these cases, processes are waiting for messages from other processes. If two processes are waiting on each other for messages, they're in a communication deadlock. For generalization, we could consider messages as resources. Communication deadlocks are possible in distributed databases where a transaction requests some nonlocal data and is blocked until a response arrives. Sometimes the system is not in a deadlock but work is not getting or progressing slowly. For instance, a process is unable to obtain a resource for long periods of time while other processes easily obtain the same resource, perhaps because they have a higher priority. This is called starvation. Another problem is when processes are doing something, such as continually responding to each other or changing states, but not getting their tasks done. This is not a deadlock since processes are still running. It's called a livelock.
As a developer, what best practices should I follow to prevent deadlocks? To prevent deadlocks, request all necessary resources together. In other words, a process is not allowed to obtain one resource and then request another. If for some reason a process holds a resource and its request for another resource is denied, it must give up the first resource. Another best practice is to impose a linear order in which resources must be requested. For example, always request file access before a printer access. Impose a constraint that a process can hold at most one lock at a time. It's also been said, "never call other people's code while holding a lock", unless that code doesn't acquire a lock. To share resources, prefer alternatives to mutexes or locks. The resource could be replicated. Use a higher-level pattern to synchronize operations. For instance, the pipeline pattern serializes accesses without using mutexes. To prevent communication deadlocks, at least one process must use non-blocking IO. An alternative is to use separate threads for send and receive operations, so that the process can still send messages while blocked on the receive thread.

Milestones

E. J. Braude publishes an IBM technical report titled An algorithm for the detection of system deadlocks.