Python’s GIL, Multithreading and Multiprocessing
Programming languages aren’t necessarily controversial, but Python’s Global Interpreter Lock (GIL) is a hotly debated topic. The GIL allows only one thread to execute Python code at a time, so a multithreaded Python program effectively does its Python work on a single CPU core, which can cause bottlenecks and delays. So why do we still need the GIL, and what’s the workaround?
What Is the GIL?
Both the GIL and Python itself predate the multicore world we live in today. The first multicore processors appeared in the early 2000s, about 10 years after the birth of Python. Before adding cores to a machine became standard practice, the main way to speed up a system was to make its single CPU faster. This meant that at the time of Python’s creation, the language was designed with a single CPU in mind.
Here’s where the GIL came (and still comes) in handy. Python’s memory management isn’t thread-safe. This means two threads can’t modify the same memory at the same time without risking data corruption. Even back when all CPU-bound tasks ran on a single CPU, it was imperative not only that there was order, but that the developer, rather than the CPU or luck, determined the order of operations. With the GIL acting as one giant lock, the thread of execution stays aligned with the developer’s plans and memory safety is preserved.
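The idea of one lock guarding shared memory can be sketched at the application level. This is only an analogy: the GIL lives inside the interpreter and protects its internal state, while the threading.Lock below protects a hypothetical shared counter. The names counter, lock, and increment are assumptions made for illustration.

```python
import threading

counter = 0
lock = threading.Lock()  # one lock guarding the shared counter (analogy only)


def increment(times):
    """Add 1 to the shared counter `times` times, one thread at a time."""
    global counter
    for _ in range(times):
        # Only the thread holding the lock may update the counter.
        with lock:
            counter += 1


# Four threads all update the same variable.
threads = [threading.Thread(target=increment, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 40000
```

Because every update happens while the lock is held, no increment is lost, whichever order the threads run in.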
But now we live in a multicore world, and Python is the language of choice for many compute-heavy operations that might be better served by many CPUs. So why not remove the GIL altogether? Doing so would require fundamental, breaking changes: removing the GIL means changing how Python handles memory.
Python Multiprocessing and Multithreading
Multiprocessing and multithreading are two ways to split a program’s work into smaller units that run concurrently: either threads within a single process or multiple separate processes.
Multithreading
For I/O-intensive tasks, multithreading is a solid option. Multithreading is when a single process runs multiple threads concurrently. The threads share the same memory space within their parent process. Because of the GIL, only one of them executes Python code at any given moment, but a thread that is waiting on I/O releases the GIL so the others can keep working. Compared with starting separate processes, multithreading saves system memory, and for I/O-bound work it can improve speed and application responsiveness. Responsive UIs are a common use case.
Python supports multithreading via its threading module. In a simple two-thread setup, numbers() and letters() each represent a task that is separated into its own thread. The next step is to create a Thread object for each, with the target function to execute. The start() method starts each thread, and the join() method waits for the threads to finish executing before the program moves forward.
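A minimal sketch of that two-thread setup might look like the following. The bodies of numbers() and letters() are assumptions made for illustration, as is the results dictionary used to collect their output.

```python
import threading

results = {}  # shared memory: both threads write into the same dict


def numbers():
    # Hypothetical task: collect the numbers 1 through 5.
    results["numbers"] = list(range(1, 6))


def letters():
    # Hypothetical task: collect the letters a through e.
    results["letters"] = list("abcde")


# Create one Thread object per task, with the target function to execute.
t1 = threading.Thread(target=numbers)
t2 = threading.Thread(target=letters)

# start() launches each thread.
t1.start()
t2.start()

# join() blocks until each thread has finished.
t1.join()
t2.join()

print(results)
```

Both threads write into the same dictionary without any copying, which is what sharing the parent process’s memory space means in practice.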
There are downsides to multithreading. It is still bound by the GIL, so CPU-bound threads don’t run in parallel. The code itself can be more challenging to read, leading to tougher troubleshooting and debugging. Threads also can’t be forcibly interrupted from the outside once they are running; they have to finish or stop on their own.
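Since a running thread can’t be stopped from the outside, one common workaround is to have the thread check a flag and exit on its own. The sketch below uses threading.Event for that flag. The names stop_requested and worker, and the timing values, are assumptions made for illustration.

```python
import threading
import time

stop_requested = threading.Event()  # flag the worker checks on each loop


def worker():
    # Keep working until the main thread asks us to stop.
    while not stop_requested.is_set():
        # Stand-in for a small unit of work; wait() returns early
        # as soon as the flag is set.
        stop_requested.wait(0.01)


t = threading.Thread(target=worker)
t.start()

time.sleep(0.05)       # let the worker run briefly
stop_requested.set()   # ask the worker to stop
t.join(timeout=1)      # wait for it to exit on its own

print(t.is_alive())
```

The worker is never interrupted. It notices the request and returns, which is why the loop has to check the flag often enough to respond promptly.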
Multiprocessing
Multiprocessing is how Python developers can work outside the limitations of the GIL. Each process runs its own Python interpreter with its own GIL, which makes multiprocessing a great choice for CPU-intensive tasks. On multicore hardware it can be more efficient than running everything on a single processor. Multiprocessing is similar to threading, but the workload is spread across multiple CPUs, with each additional CPU adding computing power. The trade-off is memory: each process has its own memory space, so multiprocessing uses more memory than threads, and objects are not shared automatically. An inter-process communication (IPC) mechanism must be used to share objects between processes.
Python also has a module that supports multiprocessing, named multiprocessing. Though it works differently from threading, the code looks very similar.
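To show how close the two APIs are, here is a minimal sketch that runs the same hypothetical numbers() and letters() tasks in separate processes. Because processes don’t share memory, the results travel back through a multiprocessing.Queue, which is one form of IPC. The function bodies and the run_both() helper are assumptions made for illustration.

```python
import multiprocessing


def numbers(queue):
    # Hypothetical task: send the numbers 1 through 5 back to the parent.
    queue.put(("numbers", list(range(1, 6))))


def letters(queue):
    # Hypothetical task: send the letters a through e back to the parent.
    queue.put(("letters", list("abcde")))


def run_both():
    """Run both tasks in their own processes and collect the results."""
    queue = multiprocessing.Queue()  # IPC channel between the processes
    p1 = multiprocessing.Process(target=numbers, args=(queue,))
    p2 = multiprocessing.Process(target=letters, args=(queue,))

    p1.start()
    p2.start()

    # Read both results before joining so the children can flush the queue.
    results = dict(queue.get() for _ in range(2))

    p1.join()
    p2.join()
    return results


if __name__ == "__main__":
    # The guard keeps child processes from re-running this on import.
    print(run_both())
```

Process takes the same target argument and offers the same start() and join() methods as Thread. The main differences are the explicit queue and the `if __name__ == "__main__":` guard.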
Fear not!
Even with the GIL, you won’t have to process billions of data points one at a time with multithreading and multiprocessing on the job. A quick Google search will reveal thousands of results about the GIL and who does or doesn’t want it gone. It is arguably Python’s most controversial topic.