At Mapado, we use Celery, a Python distributed task queue, to queue, distribute and execute all kinds of tasks. For example, if you’d like to submit an activity to Mapado that is already online on another website (Facebook included), you can paste its URL into our form, and we will pre-fill the form for you. Try it, it’s pretty awesome!

What could seem like “magic” is in fact quite the opposite. When you paste the URL, you simply send a Celery task to our servers, which will be assigned to a Celery worker. The latter will then download the page, analyze and datamine it in order to extract the activity information, and finally return it to you. The entire process is encapsulated in a Celery task, built on top of external (private and public) packages. If, for any reason, the process fails and returns an error, we want to know quickly what failed, in order to fix it. To do so, we need logging. A lot of it.

Logging what’s happening in your tasks

Unfortunately, Celery does not provide us with a simple way to use a specific logger for each task. The closest thing I could find was a module-specific logger, via the celery.utils.log.get_task_logger function.
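
Here is a minimal sketch of what that looks like; the add and subtract tasks and the broker URL are just placeholders for illustration:

```python
# tasks.py
from celery import Celery
from celery.utils.log import get_task_logger

app = Celery('tasks', broker='redis://localhost:6379/0')

# A single module-level logger, shared by every task defined in this module.
logger = get_task_logger(__name__)

@app.task
def add(x, y):
    logger.info('Adding %s + %s', x, y)
    return x + y

@app.task
def subtract(x, y):
    logger.info('Subtracting %s - %s', x, y)
    return x - y
```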

For example, if the logger was configured to send all messages to a 'celery.log' file, both task calls would be logged in the same file. In that trivial example, it does not really matter, but in a real-life case, things can get messy quite rapidly. Why? Because we may also want our log file to capture all messages, above a sufficiently high level, sent by external dependencies. In that case, grouping log messages coming from several different stacks in the same file makes debugging confusing and hard, and your developers sad.

Defining one logger per task

Using a subclass

One easy way around this problem would be to define a single Celery task per module. This way, a module-level logger would also be a task-specific logger. It would also turn your project into an unmaintainable file soup as it grows bigger. Let’s not go that way.

The first part of the solution is knowing that a Celery task can be defined either as a function or as a class. The previous add task can also be defined as such:
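
Something along these lines (a sketch; app is the Celery application defined earlier, and the explicit registration call is only needed on recent Celery versions):

```python
from celery import Task

class AddTask(Task):
    """Class-based equivalent of the add task."""

    def run(self, x, y):
        return x + y

# With recent Celery versions, class-based tasks must be registered explicitly.
add_task = app.register_task(AddTask())
```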

You can then subclass the  celery.Task  class, overriding its  __init__  method.
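A sketch of that idea, where the logger name is built from the module’s __name__ and the class name:

```python
import logging

from celery import Task

class AddTask(Task):
    def __init__(self):
        super().__init__()
        # __name__ is the module name, e.g. 'myapp.tasks', so the
        # logger ends up being named 'myapp.tasks.AddTask'.
        self.log = logging.getLogger(
            '%s.%s' % (__name__, self.__class__.__name__)
        )

    def run(self, x, y):
        self.log.info('Adding %s + %s', x, y)
        return x + y
```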

This would however have to be duplicated in each Task definition, as the __name__ attribute depends on the module in which the task is defined. Quite cumbersome…

Using a class decorator

To fix this problem, we define a register_task_logger class decorator that instantiates a logger at the class instance level. In other words, an instance of the decorated class will have access to a logger via its self.log attribute. This decorator takes the module name of the decorated class as an argument.

NB: if you’re not familiar with python decorators, I’d suggest you read this excellent article.
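
The original snippet is not reproduced here, but a sketch of such a decorator could look like this:

```python
import logging

def register_task_logger(module_name):
    """Class decorator adding a 'log' attribute to instances of the
    decorated class, with a logger named '<module_name>.<ClassName>'."""
    def decorator(cls):
        original_init = cls.__init__

        def __init__(self, *args, **kwargs):
            original_init(self, *args, **kwargs)
            self.log = logging.getLogger('%s.%s' % (module_name, cls.__name__))

        cls.__init__ = __init__
        return cls

    return decorator
```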

Now, you just have to decorate each task.
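
For example (assuming the decorator lives in a myapp.decorators module — adapt the import to your own layout):

```python
# myapp/tasks.py
from celery import Task

from myapp.decorators import register_task_logger

@register_task_logger(__name__)
class AddTask(Task):
    def run(self, x, y):
        self.log.info('Adding %s + %s', x, y)
        return x + y

@register_task_logger(__name__)
class SubtractTask(Task):
    def run(self, x, y):
        self.log.info('Subtracting %s - %s', x, y)
        return x - y

# Register the task instances with your app, as above,
# if your Celery version requires it.
```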

Logging configuration

Finally, configure your loggers so that the myapp.tasks.AddTask logger logs into the 'addtask.log' file, and the myapp.tasks.SubtractTask logger logs into the 'subtracttask.log' file. I use the logging.config.dictConfig function for that.
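
A possible configuration along those lines (handler classes and levels are illustrative, not the exact settings we use):

```python
import logging.config

LOGGING = {
    'version': 1,
    'disable_existing_loggers': False,
    'handlers': {
        'addtask_file': {
            'class': 'logging.FileHandler',
            'filename': 'addtask.log',
        },
        'subtracttask_file': {
            'class': 'logging.FileHandler',
            'filename': 'subtracttask.log',
        },
    },
    'loggers': {
        'myapp.tasks.AddTask': {
            'handlers': ['addtask_file'],
            'level': 'INFO',
        },
        'myapp.tasks.SubtractTask': {
            'handlers': ['subtracttask_file'],
            'level': 'INFO',
        },
    },
}

logging.config.dictConfig(LOGGING)
```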

And there you have it! Task specific logging.