Tuesday, February 26, 2013

Rapid defines for Django

It's very popular to use choices in Django models, i.e. to enumerate the possible choices, showing them to the user as friendly strings but storing the actual values as integers in the DB. So we start with the following:
FILE_TYPE_CHOICES = (
    (10, "csv"),
    (20, "xml"),
    (30, "txt"),
)

from django.db.models import Model, CharField, IntegerField

class DataFile(Model):
    name = CharField(max_length=256)
    file_type = IntegerField(choices=FILE_TYPE_CHOICES, default=10)
The above does the job but is obviously ugly, especially the default=10 part. You will certainly want to access those defines elsewhere in the code by name rather than by magic number.

So let's split the choices part into a separate file - defines.py:

class FileTypes(object):
    csv = 10
    xml = 20
    txt = 30
    
    # Django expects a sequence of (value, label) pairs
    choices = (
        (csv, "csv"),
        (xml, "xml"),
        (txt, "txt"),
    )
Now we can rewrite our model like this:
from django.db.models import Model, CharField, IntegerField
from .defines import FileTypes

class DataFile(Model):
    name = CharField(max_length=256)
    file_type = IntegerField(choices=FileTypes.choices, default=FileTypes.csv)
That does the job, but the defines.py file still has a strong degree of silliness. For instance, you will certainly also want a dictionary ("dchoices") to quickly convert integer values back into their names, to show user-friendly errors for example. So quickly enough, your defines file will start looking like this:
class FileTypes(object):
    csv = 10
    xml = 20
    txt = 30
    
    # (value, label) pairs for the model field
    choices = (
        (csv, "csv"),
        (xml, "xml"),
        (txt, "txt"),
    )

    # reverse lookup: stored value -> label
    dchoices = {
        csv: "csv",
        xml: "xml",
        txt: "txt",
    }
I.e. the same data is duplicated 3 times!!

The choices and dchoices parts definitely need to be automated. The simple solution is to create get_choices and get_dchoices methods. But I would like to do it using properties, mainly because by the time I ran into this problem I already had a load of defines built that way, i.e. with choices and dchoices attributes defined and maintained manually and used massively.
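
For comparison, the "simple solution" mentioned above could look roughly like the sketch below. It is written for Python 3, and the isinstance(value, int) filter is my own assumption to keep the methods themselves out of the result; it is not something the original defines file did.

```python
class FileTypes(object):
    csv = 10
    xml = 20
    txt = 30

    @classmethod
    def get_choices(cls):
        # Collect (value, name) pairs from the public integer attributes.
        # Assumption: every define is a plain int; anything else is skipped.
        return tuple(
            (value, name)
            for name, value in vars(cls).items()
            if not name.startswith("_") and isinstance(value, int)
        )

    @classmethod
    def get_dchoices(cls):
        # Map each stored integer back to its name.
        return dict(cls.get_choices())
```

The downside is exactly the one described above: every existing use of FileTypes.choices would have to be rewritten as FileTypes.get_choices().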

Class properties - the problem

So how do we define properties on classes? The property decorator does not work for classes, only for class instances:
>>> class A(object):
...  @property
...  def m(self):
...   return "ice-cream"
... 
>>> A.m
<property object at 0x7ff432406f18> # But we want "ice-cream"!!!
>>> 
OK, so property wraps our m function in an object that implements the descriptor protocol and assigns the resulting object to A under the m attribute.

Two questions arise:

  1. Why doesn't it work for classes (i.e. only for instances)?
  2. And in particular, why does A.m return the property object?
To answer the first question, let's look at the getter part of the descriptor protocol. The __get__ signature is (from the Python data model):
object.__get__(self, instance, owner)

Called to get the attribute of the owner class (class attribute access) or of an instance of that class (instance attribute access). owner is always the owner class, while instance is the instance that the attribute was accessed through, or None when the attribute is accessed through the owner. This method should return the (computed) attribute value or raise an AttributeError exception.

OK, so when we access A.m, Python sees that m is a descriptor that is being accessed through the owner, so the property's __get__ method is called as __get__(None, A).
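
To see this for yourself, here is a tiny sketch with a made-up descriptor (Show is a hypothetical name, not part of Python) whose __get__ simply hands back the arguments it was called with:

```python
class Show(object):
    """Hypothetical descriptor that reports the arguments __get__ receives."""

    def __get__(self, instance, owner):
        return (instance, owner)


class A(object):
    m = Show()


a = A()
via_class = A.m      # accessed through the owner: instance is None
via_instance = a.m   # accessed through an instance: instance is a
```

Here via_class is (None, A) and via_instance is (a, A), which matches the description quoted from the data model.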

Now let's look at property's __get__ implementation. From the Python source tree, file Objects/descrobject.c (I've added my comments):

static PyObject *
property_descr_get(PyObject *self, PyObject *obj, PyObject *type)
{
    propertyobject *gs = (propertyobject *)self;

    /* obj is None - return self */
    if (obj == NULL || obj == Py_None) {
        Py_INCREF(self);
        return self;
    }

    /* else check if registered getter exists and call it */
    if (gs->prop_get == NULL) {
        PyErr_SetString(PyExc_AttributeError, "unreadable attribute");
        return NULL;
    }

    return PyObject_CallFunction(gs->prop_get, "(O)", obj);
}
Now we can answer both questions 1 and 2:
  1. It does not work because None is passed to __get__ as the instance
  2. The property code returns itself if it gets None as the instance
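
For those who do not read C, the getter logic above can be sketched in Python roughly as follows. ReadOnlyProperty is a hypothetical, simplified stand-in of mine (no setter, no deleter), not the real implementation:

```python
class ReadOnlyProperty(object):
    """Simplified, read-only stand-in for the built-in property."""

    def __init__(self, fget=None):
        self.fget = fget

    def __get__(self, instance, owner=None):
        if instance is None:
            # Accessed through the class: return the descriptor itself.
            return self
        if self.fget is None:
            # No getter registered.
            raise AttributeError("unreadable attribute")
        return self.fget(instance)


class A(object):
    @ReadOnlyProperty
    def m(self):
        return "ice-cream"
```

With this class, A.m gives back the ReadOnlyProperty object, while A().m gives "ice-cream", the same behaviour we saw with the real property.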
To those who kept reading till here, the solution to the problem is below.

Class properties - the solution

The root of the problem is that when a descriptor is accessed through the owner, the instance is None, and the property code is "hard-coded" to look only at the instance. So let's just move the property code up in the hierarchy, i.e. to the metaclass!
>>> class A(object):
...  class __metaclass__(type):
...   @property
...   def m(cls):
...    return "ice-cream"
... 
>>> A.m
'ice-cream' # Yeeeha!!
>>> 
What happens? m does not belong to A anymore, but to its metaclass. So when we access A.m, A is passed to __get__ as the instance and type(A) as the owner, and the property code handles it properly.
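
Note that the nested class __metaclass__ form is Python 2 syntax. On Python 3 the same trick would, as far as I can tell, be spelled with the metaclass keyword. A minimal sketch (MetaA is just a name I picked):

```python
class MetaA(type):
    @property
    def m(cls):
        return "ice-cream"


class A(metaclass=MetaA):
    pass


class_value = A.m  # looked up on type(A), i.e. MetaA

# Attributes of the metaclass are not visible through instances of A.
try:
    A().m
    instance_has_m = True
except AttributeError:
    instance_has_m = False
```

So class_value is "ice-cream", but the property exists only on the class: A().m raises AttributeError.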

Disclaimer: The last paragraph reflects my best understanding of the attribute resolution process, which I feel I need to study more thoroughly.

Back to our original problem - now we can change our defines file as follows:

class MetaChoices(type):
    @property
    def choices(cls):
        if hasattr(cls, "_choices"): # caching
            return cls._choices
        choices = list()
        for k, v in cls.__dict__.items():
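            # skip private names (including the caches) and any leftover manual choices/dchoices
            if not k.startswith("_") and k not in ("choices", "dchoices"):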
                choices.append( (v, k) )
        cls._choices = choices
        return choices

    @property
    def dchoices(cls):
        if hasattr(cls, "_dchoices"):
            return cls._dchoices
        dchoices = dict()
        for value, name in cls.choices:  # choices holds (value, name) pairs
            dchoices[value] = name
        cls._dchoices = dchoices
        return dchoices


class FileTypes(object):
    __metaclass__ = MetaChoices
    csv = 10
    xml = 20
    txt = 30
Finally, let's enjoy!
>>> from defines import FileTypes
>>> FileTypes.csv
10
>>> FileTypes.choices
[(20, 'xml'), (10, 'csv'), (30, 'txt')]
>>> FileTypes.dchoices[FileTypes.csv]
'csv'
>>> 
This solution can be further enhanced to respect definition order, support verbose names, gettext, etc.
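
As a rough idea of the definition-order part, here is a sketch that assumes Python 3.6 or newer, where the class namespace keeps the order in which attributes were defined. It leaves out caching, verbose names and gettext, so treat it as a starting point only:

```python
class MetaChoices(type):
    @property
    def choices(cls):
        # vars(cls) preserves definition order on Python 3.6+.
        return [
            (value, name)
            for name, value in vars(cls).items()
            if not name.startswith("_")
        ]

    @property
    def dchoices(cls):
        # Reverse lookup: stored value -> name.
        return dict(cls.choices)


class FileTypes(metaclass=MetaChoices):
    csv = 10
    xml = 20
    txt = 30
```

With this version FileTypes.choices comes out as [(10, 'csv'), (20, 'xml'), (30, 'txt')], in the order the defines were written.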