At my current company we rely heavily on Airflow for job orchestration and scheduling. From time to time we need to show these airflow task logs to our customers. The problem arises when the same task has multiple threads, each thread throws up its logs whenever it can which causes the logs to go into a frenzy. If there are more than 20-30 threads then the logs look like β
With that established, in this blog post we will look into how we can club the logs in each thread together and maybe print them in a ordered fashion.
Thread safe logging
Lets first create a sample code for thread safe logging
The current output for this will be
python main.py
2023-01-04 08:28:45 Hello from thread 0
2023-01-04 08:28:45 Hello from thread 1
2023-01-04 08:28:45 Hello from thread 2
2023-01-04 08:28:45 Hello from thread 3
2023-01-04 08:28:45 Hello from thread 4
2023-01-04 08:28:45 Hello from thread 5
2023-01-04 08:28:45 Hello from thread 6
2023-01-04 08:28:45 Hello from thread 7
2023-01-04 08:28:45 Hello from thread 8
2023-01-04 08:28:45 Hello from thread 9
2023-01-04 08:28:48 Hello from thread 0
2023-01-04 08:28:48 Hello from thread 1
2023-01-04 08:28:48 Hello from thread 4
2023-01-04 08:28:48 Hello from thread 2
2023-01-04 08:28:48 Hello from thread 3
2023-01-04 08:28:48 Hello from thread 5
2023-01-04 08:28:48 Hello from thread 6
2023-01-04 08:28:48 Hello from thread 7
2023-01-04 08:28:48 Hello from thread 9
2023-01-04 08:28:48 Hello from thread 8
As you can see the order for the threads is not maintained.
To go around this issue, we will first have to capture all the logs and then print them out sequentially. There are three ways to go about this
- Write each thread log into a seperate file
- Write each thread log into a seperate stream
- Write each thread log into the same I/O stream with thread number formatted in front of the log so that we can filter it later using thread logs
Letβs write our logs into separate stream and then print them out later.
Moving logs from console to our stream
Now we have captured all the logs to a string buffer.
Next step would be to group these logs by threads. We can do that by placing them in different files or using thread name as key for a dictionary instead of a io stream.
Note: based on your use case you can continue using string buffers. My original idea was to use different buffers ( file or string ) and then at the end aggregate these. While writing this up through of another data structure that would fit better for my use case which is given below.
Adding logs to a grouped queue
The output for the above will be
2023-01-04 09:21:02,136 - __main__ - INFO - Hello from thread 0
2023-01-04 09:21:05,139 - __main__ - INFO - Hello from thread 0
2023-01-04 09:21:02,140 - __main__ - INFO - Hello from thread 1
2023-01-04 09:21:05,143 - __main__ - INFO - Hello from thread 1
2023-01-04 09:21:02,141 - __main__ - INFO - Hello from thread 2
2023-01-04 09:21:05,143 - __main__ - INFO - Hello from thread 2
2023-01-04 09:21:02,141 - __main__ - INFO - Hello from thread 3
2023-01-04 09:21:05,144 - __main__ - INFO - Hello from thread 3
2023-01-04 09:21:02,142 - __main__ - INFO - Hello from thread 4
2023-01-04 09:21:05,145 - __main__ - INFO - Hello from thread 4
2023-01-04 09:21:02,142 - __main__ - INFO - Hello from thread 5
2023-01-04 09:21:05,145 - __main__ - INFO - Hello from thread 5
2023-01-04 09:21:02,142 - __main__ - INFO - Hello from thread 6
2023-01-04 09:21:05,145 - __main__ - INFO - Hello from thread 6
2023-01-04 09:21:02,142 - __main__ - INFO - Hello from thread 7
2023-01-04 09:21:05,147 - __main__ - INFO - Hello from thread 7
2023-01-04 09:21:02,143 - __main__ - INFO - Hello from thread 8
2023-01-04 09:21:05,146 - __main__ - INFO - Hello from thread 8
2023-01-04 09:21:02,143 - __main__ - INFO - Hello from thread 9
2023-01-04 09:21:05,146 - __main__ - INFO - Hello from thread 9
As you can see they are coming up nicely in groups of their threads.
Now another issue will arise here for the main thread. We want the main thread logs to come before the threads execute and after the threads execute in order.
Apart from this, once the main thread is printed or any thread is printed and you want to add more logs to it, we must ensure that we flush it. For now I am only adding the flush for mainThread.
So the final output comes as we wanted it to be
2023-01-04 09:27:25,036 - __main__ - INFO - Message 0
2023-01-04 09:27:25,036 - __main__ - INFO - Message 1
2023-01-04 09:27:25,036 - __main__ - INFO - Message 2
2023-01-04 09:27:25,036 - __main__ - INFO - Message 3
2023-01-04 09:27:25,036 - __main__ - INFO - Message 4
2023-01-04 09:27:25,036 - __main__ - INFO - Message 5
2023-01-04 09:27:25,036 - __main__ - INFO - Message 6
2023-01-04 09:27:25,036 - __main__ - INFO - Message 7
2023-01-04 09:27:25,036 - __main__ - INFO - Message 8
2023-01-04 09:27:25,036 - __main__ - INFO - Message 9
2023-01-04 09:27:25,036 - __main__ - INFO - Hello from thread 0
2023-01-04 09:27:28,038 - __main__ - INFO - Hello from thread 0
2023-01-04 09:27:25,037 - __main__ - INFO - Hello from thread 1
2023-01-04 09:27:28,039 - __main__ - INFO - Hello from thread 1
2023-01-04 09:27:25,037 - __main__ - INFO - Hello from thread 2
2023-01-04 09:27:28,039 - __main__ - INFO - Hello from thread 2
2023-01-04 09:27:25,037 - __main__ - INFO - Hello from thread 3
2023-01-04 09:27:28,039 - __main__ - INFO - Hello from thread 3
2023-01-04 09:27:25,037 - __main__ - INFO - Hello from thread 4
2023-01-04 09:27:28,039 - __main__ - INFO - Hello from thread 4
2023-01-04 09:27:25,038 - __main__ - INFO - Hello from thread 5
2023-01-04 09:27:28,039 - __main__ - INFO - Hello from thread 5
2023-01-04 09:27:25,038 - __main__ - INFO - Hello from thread 6
2023-01-04 09:27:28,039 - __main__ - INFO - Hello from thread 6
2023-01-04 09:27:25,038 - __main__ - INFO - Hello from thread 7
2023-01-04 09:27:28,041 - __main__ - INFO - Hello from thread 7
2023-01-04 09:27:25,038 - __main__ - INFO - Hello from thread 8
2023-01-04 09:27:28,039 - __main__ - INFO - Hello from thread 8
2023-01-04 09:27:25,038 - __main__ - INFO - Hello from thread 9
2023-01-04 09:27:28,041 - __main__ - INFO - Hello from thread 9
2023-01-04 09:27:28,041 - __main__ - INFO - Message 0
2023-01-04 09:27:28,041 - __main__ - INFO - Message 1
2023-01-04 09:27:28,041 - __main__ - INFO - Message 2
2023-01-04 09:27:28,041 - __main__ - INFO - Message 3
2023-01-04 09:27:28,041 - __main__ - INFO - Message 4
2023-01-04 09:27:28,041 - __main__ - INFO - Message 5
2023-01-04 09:27:28,041 - __main__ - INFO - Message 6
2023-01-04 09:27:28,041 - __main__ - INFO - Message 7
2023-01-04 09:27:28,041 - __main__ - INFO - Message 8
2023-01-04 09:27:28,041 - __main__ - INFO - Message 9