3 мая 2012 г.

Automatic filtering in SQLAlchemy: motivation

Server side code of web project usually has 3 layers:

  • data classes mapped to relational database,
  • request handlers for each URL pattern,
  • templates used to render pages.

Simple request handlers contain code like the following:

item = session.query(Entry).get(item_id)
items = session.query(Entry)[:limit]

When Entry class has public attribute and objects should be shown when Entry.public is True only (the simplest example of publicity condition; in real life it might be composite and even involve related tables) we have to include this condition in queries:

item = session.query(Entry).filter_by(public=True, id=item_id).scalar()
items = session.query(Entry).filter_by(public=True)[:limit]

Note, that we already violate DRY principle (the same condition should be used every time we query Entry), but it’s still not problem. Now let’s add relation to some Child class that has similar condition for publicity. If we pass only item or items to template, we have to be careful using their data:

{% for child in item.children %}…{% endfor %}
must be replaced with
{% for child in item.children %}
{% if child.public %}…{% endif %}
{% endfor %}

In real life it becomes even more complex: a simple test for empty list is already not so simple. Do we have other options? Yes, we can pass each relation as separate variable and move filtering to the code. This will prevent mess in templates, but this won’t prevent us from using relations directly by mistake. Do you think this shouldn’t happen? We are lazy, and I doubt anybody will define separate variable for relation that doesn’t have publicity condition (yet). But life changes and eventually we might need this condition. Now one developer adds new field to the database, changes all related request handlers and (if he is a responsible person) even templates. Simultaneously (or even later, since people remember code patterns they often used) other person adds usage of this relation unfiltered in some other place and we have unpublished data leaked to public. International scandal, world war III begins (joke).

In fact, templates developer shouldn’t care about publicity of data. Unpublished data must not reach templates.
Constructing some data structures specially for templates leads to verbose request handler code instead for concise single line:
item = session.query(Entry).filter_by(public=True, id=item_id).scalar()
data = {‘id’: item.id,
        ‘title’: item.title,
        ‘date’: item.date,
        ‘body’: item.body}
data[‘children’] = children = []
for child in item.children:
    if not child.public:
    child_data = {‘id’: child.id,
                  ‘title’: child.title,
                  ‘data’: child.data,
                  ‘body’: child.body}
    if child.author and child.author.public:
        child_data[‘author’] = author = {‘id’: child.author.id,
                                         ‘name’: child.author.name}
        if child.author.company and child.author.company.public:
            author[‘company’] = {‘id’: child.author.company.id,
                                 ‘title’: child.author.company.title}

Here is statistics from one big project where I’m involved in development. The numbers below cover public segment only (internal services like editor interface are not included).

  • 458 templates
  • 6 databases with 210 tables
  • 135 mapped classes, 5 of them are bases for inheritance trees
  • Data for 63 mapped classes must not go to public unless some condition is met (15 of them indirectly through inheritance). Those are only conditions that can’t be applied when replicating data from internal segment to public without significant impact on performance (changing state field of parent object would trigger publication or deletion of a huge list of children; using publication time in future requires some scheduler to trigger publication), the rest is filtered out before reaching database for public sites.

Having we can’t change relations behavior in request handler (this breaks ORM’s single object for each identity rule) I see the following 2 ways to solve the problem:

  • define separate mapped classes for public site,
  • instruct session to filter all ORM queries.

Both ways have problems and require separate analysis.