New job + Python descriptors

So, there's been a bit of a hiatus in my blogging activity, which has coincided with a change in my job.

I'm no longer employed by the university - as of the start of August I've been working as a founder of a startup. We're still in stealth mode, so output here will be work-related, but not too revealing, initially at least. I think it's probably safe to say that there will be much more Python than Fortran from now on!

Anyway, I thought it good practice to start writing English again, after several weeks of nothing but Python. Naturally, these English words will concern Python …

So today I first used Python descriptors in anger. The particular pattern used I hadn't seen before, so I thought I'd write about it.

The problem I faced was how to nicely deal with an object which is expensive to initialize, which there should only be one of, and which is used by a number of other objects. If I were writing in Java, this would be a classic use-case for a Singleton, with some form of delayed initialization. How to do it in a more Pythonic way, though?

The easiest way to get Singleton-ish behaviour is probably to have the ExpensiveObject defined in its own module, with one instance instantiated as a module-level variable, and thus initialized on module import. This means that any other objects which need access to it can simply have a class attribute pointing at it.

elsewhere.py:
class ExpensiveObject(object):
    ...

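# Created once, as soon as this module is imported.
expensive_instance = ExpensiveObject()

user.py:
from elsewhere import expensive_instance

class ObjectUser(object):
    # Class attribute shared by every ObjectUser instance.
    reference = expensive_instance
    ...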

This doesn't delay instantiation, though - the instance is created as soon as elsewhere is first imported, which is no later than the moment the ObjectUser definition is processed. Since expensive_instance isn't always needed, it's annoying to always have to create it.

To avoid that, clearly we need to remove the expensive_instance from elsewhere and replace the ObjectUser attribute reference with a function call.
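The plainest form of that function call is a module-level accessor that builds the object on first use and caches it. This is only a minimal sketch: get_expensive_instance and the instances_created counter are hypothetical names, and ExpensiveObject here is a stand-in for the real class.

```python
class ExpensiveObject(object):
    """Stand-in for the real, costly-to-build object."""
    instances_created = 0

    def __init__(self):
        # Count constructions so the laziness can be observed.
        ExpensiveObject.instances_created += 1


# Nothing is built at import time; the slot starts out empty.
_expensive_instance = None


def get_expensive_instance():
    """Hypothetical accessor: build on first call, reuse afterwards."""
    global _expensive_instance
    if _expensive_instance is None:
        _expensive_instance = ExpensiveObject()
    return _expensive_instance
```

Callers then write get_expensive_instance() instead of reading an attribute, which works but loses the attribute syntax.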

We could do this in ObjectUser by overriding its __getattr__ appropriately, to do the normal trick where we check whether expensive_instance is defined on this object, and if not, put it there:

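def __getattr__(self, name):
    # Only called when normal attribute lookup has already failed.
    if name == 'reference':
        from elsewhere import expensive_instance
        # Cache on the class so later lookups never reach __getattr__.
        setattr(type(self), name, expensive_instance)
        return expensive_instance
    raise AttributeError(name)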

which has a few problems.

  • Firstly, this check runs on every attribute lookup that falls through to __getattr__ (and on every single access if it were done in __getattribute__ instead), which is an unnecessary price.

  • Secondly, if we are doing lots of __getattr__ tricks for other attributes as well, it's messy to have them all in the same method.

  • Thirdly, we've set expensive_instance to be a class attribute, which means that every class for which we do this will get its own expensive_instance.
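How large the first cost is depends on which hook is overridden: __getattr__ runs only after normal lookup fails, whereas __getattribute__ runs on every access. The sketch below makes that visible; Probe, EveryAccess, calls and hits are illustrative names, not part of the code above.

```python
class Probe(object):
    """Records the names that fall through to __getattr__."""
    present = 1

    def __init__(self):
        self.calls = []

    def __getattr__(self, name):
        # Reached only when normal lookup has failed.
        self.calls.append(name)
        raise AttributeError(name)


class EveryAccess(object):
    """Counts every attribute access made on an instance."""
    present = 1
    hits = 0

    def __getattribute__(self, name):
        # Runs for every attribute access on an instance.
        EveryAccess.hits += 1
        return object.__getattribute__(self, name)
```

Reading Probe().present never touches __getattr__, while each read of EveryAccess().present increments hits.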

We could solve the last two issues with inheritance - have a small class (ExpensiveFactory?) which does nothing but override __getattr__ for the attribute of interest. This isolates the __getattr__ logic for this attribute, and makes sure that only one copy of expensive_instance is instantiated (as a class variable of ExpensiveFactory).

class ObjectUser(ExpensiveFactory):  # listing object first would break the MRO
    ...

But we still haven't solved the first problem (the cost of __getattr__), and we've introduced another: if any of these child classes want to override __getattr__, they have to remember to call super() all the way up the inheritance hierarchy (see Python's Super considered harmful).
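To make the super() hazard concrete, here is a small sketch of the inheritance approach. The body of ExpensiveFactory is my guess at what such a class would look like, object() stands in for the expensive object, and the 'nickname' attribute is invented purely for illustration.

```python
class ExpensiveFactory(object):
    def __getattr__(self, name):
        if name == 'reference':
            value = object()  # stand-in for the expensive object
            # Cache on the factory class so there is a single shared copy.
            ExpensiveFactory.reference = value
            return value
        raise AttributeError(name)


class ObjectUser(ExpensiveFactory):
    def __getattr__(self, name):
        if name == 'nickname':  # hypothetical extra attribute
            return 'user'
        # Leaving out this delegation would silently break 'reference'.
        return super(ObjectUser, self).__getattr__(name)
```

Every subclass that defines its own __getattr__ has to repeat that last line, and nothing warns you if one of them forgets.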

Anyway - so this (I think) was exactly the reason that descriptors were invented. Instead of ExpensiveFactory, we have ExpensiveDescriptor:

elsewhere.py:
class ExpensiveDescriptor(object):
    # Stored on the descriptor class, so all descriptor instances share it.
    _expensive_instance = None

    def __get__(self, instance, owner):
        # ExpensiveObject is defined in this module, so no import is needed.
        # Build it on first access only, then reuse it.
        if self.__class__._expensive_instance is None:
            self.__class__._expensive_instance = ExpensiveObject()
        return self.__class__._expensive_instance

user.py:
import elsewhere

class ObjectUser(object):
    expensive_instance = elsewhere.ExpensiveDescriptor()
    ...

Whenever ObjectUser().expensive_instance is first accessed, the descriptor's __get__ method is invoked and an ExpensiveObject is created - but not before then. Later accesses return the same object.

This happens for ObjectUser, and any classes which inherit from it, without any further interference in them.

And __get__ is only invoked for this one attribute, so accessing any other attribute costs nothing extra.

And, of course, since _expensive_instance is a class attribute of the Descriptor, there should only ever be one created.
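These claims are easy to check in a single file. The sketch below repeats the descriptor alongside a stand-in ExpensiveObject; the created counter, ChildUser and OtherUser are hypothetical additions, there only to show the laziness and the sharing.

```python
class ExpensiveObject(object):
    """Stand-in for the real object; counts how often it is built."""
    created = 0

    def __init__(self):
        ExpensiveObject.created += 1


class ExpensiveDescriptor(object):
    _expensive_instance = None

    def __get__(self, instance, owner):
        cls = type(self)
        if cls._expensive_instance is None:
            # First access anywhere: build the single shared object.
            cls._expensive_instance = ExpensiveObject()
        return cls._expensive_instance


class ObjectUser(object):
    expensive_instance = ExpensiveDescriptor()


class ChildUser(ObjectUser):
    pass  # inherits the descriptor without any extra code


class OtherUser(object):
    # A second, unrelated class with its own descriptor instance.
    expensive_instance = ExpensiveDescriptor()
```

Defining these classes builds nothing. The first read of expensive_instance on any of them creates the object, and every later read, from any of the three classes, returns that same object.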

Actually, you could have ExpensiveDescriptor manipulating the module attribute elsewhere.expensive_instance - this would let you get at the expensive_instance from anywhere in the code without having to go through an object - but only after it had been instantiated by one of the accessing objects. Might or might not be useful, depending on your use cases.
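A rough sketch of that variant, squeezed into one file so that the module-level name lives alongside the descriptor. In the real layout it would be elsewhere.expensive_instance, and ExpensiveObject is again a stand-in.

```python
class ExpensiveObject(object):
    """Stand-in for the real, costly-to-build object."""


# Module-level slot: None until some descriptor access fills it.
expensive_instance = None


class ExpensiveDescriptor(object):
    def __get__(self, instance, owner):
        global expensive_instance
        if expensive_instance is None:
            # First access through any owning class builds the object.
            expensive_instance = ExpensiveObject()
        return expensive_instance


class ObjectUser(object):
    expensive_instance = ExpensiveDescriptor()
```

Code that reads the module attribute directly has to cope with it still being None, which is the "only after it had been instantiated" caveat in practice.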

Anyway, so that's why descriptors are brilliant! For more reading, try:
