Python Parallel Processing using Multiprocessing
INTRODUCTION:
In the world of Python programming, executing tasks efficiently is essential for building high-performance applications. Asynchronous programming, threading, and multiprocessing are powerful techniques for achieving concurrency and parallelism in your code, improving its speed and responsiveness. However, knowing when to use threads, async, or processes can be decisive, and depends on factors such as available memory and CPU cores.
Here, we will discuss and run multiprocessing in Python, which lets us execute similar (repeating) tasks in parallel.
Sequential Processing:
Sequential processing involves executing tasks one after another, following a specific order. The next process starts only after the completion of the previous one, and the same flow goes on. Sequential processing utilizes less CPU and memory compared to parallel processing.
The main disadvantage of this approach is that it takes more time, since only a single task is being executed at any given point in time.
Code for Sequential Processing:
import time

my_list = [1, 2, 3, 4, 5, 6]

def my_func(num):
    print(f"Inside function for task {num}")
    print("Sleeping for 4 seconds")
    time.sleep(4)
    print(f"Completed execution of task {num}\n")

for num in my_list:
    my_func(num)
In the sequential code above, we define a Python function that takes one parameter. The function prints its input and then sleeps for 4 seconds; you can replace the function body with your own work. Execution happens sequentially, and each task takes 4 seconds because of the sleep.
The output looks like this:
Sequential Execution Interpretation:
We have 6 elements in the list 'my_list', and the function my_func() is called once for each element. Since each call takes 4 seconds, the whole list takes 6 × 4 = 24 seconds.
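You can verify this arithmetic by timing the loop yourself. The sketch below is a minimal, assumed variant of the article's code that uses a much shorter sleep (0.1 seconds instead of 4) so it finishes quickly, but the reasoning is identical: six sequential calls take roughly six times the per-call duration.

```python
import time

my_list = [1, 2, 3, 4, 5, 6]
SLEEP = 0.1  # shortened from 4 s so the demo runs quickly

def my_func(num):
    time.sleep(SLEEP)  # stand-in for the real work

start = time.perf_counter()
for num in my_list:
    my_func(num)
elapsed = time.perf_counter() - start

# Six sequential calls take roughly 6 * SLEEP seconds
print(f"elapsed: {elapsed:.2f}s")
```

With the original 4-second sleep, the same measurement would report about 24 seconds.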
Parallel Processing:
We will now execute the same function my_func in parallel using multiprocessing. For this, we import the multiprocessing package and create a pool of worker processes using the "Pool" class. When work is submitted to the pool object, it is distributed across the number of simultaneous processes specified when the pool was created.
Code:
from multiprocessing import Pool

if __name__ == "__main__":
    pool = Pool(3)
    pool.map(my_func, my_list)
    pool.close()
    pool.join()
The output looks like this:
Parallel Execution Interpretation:
We created a pool of 3 processes, so 3 tasks run simultaneously, one per element. Once those 3 executions are complete, the next 3 are triggered for the remaining elements. The first 3 elements therefore take 4 seconds and the next 3 a further 4 seconds, making a total of 8 seconds.
Points to remember:
1. You can adjust the pool size according to the available cores, keeping in mind the other processes the OS is running. You can check the number of available cores with the os.cpu_count() function.
2. You can update the code in my_func() to accommodate your own logic for the required tasks, e.g. data extraction, file movement, table updates, etc.
3. You can call os.getpid() inside my_func() to check the process ID of each worker process.