The Django Fandango Farrago – Looking at Django’s Physical Data Model Design

I’m sure I’m not the only Oracle Developer who, over the years, has conjured a similar mental image during a planning meeting for a new web-based application…

…and we’re going to use an ORM

If you want the full gory details as to why this is so troubling from an Oracle database perspective, it is a topic I have covered at length previously.

This time, however, things are different.

Yes, I am somewhat limited in my choice of database due to the hardware my application will run on (Raspberry Pi).

Yes, Django is a logical choice for a framework as I’m developing in Python.

But, here’s the thing, I plan to do a bit of an audit of the database code that Django spits out.

That’s right Django, No-one expects the Spanish Inquisition !

Donde esta el Base de datos ?!

I know, this is a character from Blackadder and not Monty Python, but I’ve often regretted the fact that there never seems to be a vat of warm marmalade around (or some kind of gardening implement for that matter), when you enter those all important application architecture discussions at the start of a project.

As a result, one or two further Blackadder references may have crept in to the remainder of this post…

What we’re looking at

The Application I’m developing is as described in my previous post and we’ll be using SQLite as the database for our application.

What I’ll be covering here is :

  • The physical data model we want to implement for our DVD Library Application
  • Using Django to generate the data Model
  • Installation and use of the SQLite command line
  • Tweaking our code to improve the model

We’re not too concerned about performance at this point. The application is low-volume in terms of both data and traffic.

I’ll point out aspects of the code that have a potential performance impact as and when they come up (and I notice them), but performance optimisation is not really the objective here.

The main aim is to ensure that we maximise the benefits of using a relational database by ensuring data integrity.

The target model

By default, Django applies a synthetic key to each table it creates. I have indulged this proclivity in the model that follows, although it’s something I will return to later on.

The application I’m building is a simple catalogue of DVDs and Blu-Rays we have lying around the house.

The main table in this application will be TITLE, which will hold details of each Title we have on Disk.

Note that the Unique Key for this table is the combination of TITLE_NAME, YEAR_RELEASED and MEDIA_FORMAT. Yes I do have some films on both DVD and Blu-Ray.

As for the relationships :

  • a film/tv SERIES may have one, or more than one TITLE
  • a TITLE may belong to one or more CATEGORY
  • a CATEGORY may apply to one or more TITLE

So, in addition to our main data table, TITLE, we need two reference tables – SERIES and CATEGORY. We also need a join table between CATEGORY and TITLE to resolve the many-to-many relationship between them.

Each of the tables will have a Synthetic Key, which makes storing of Foreign Key values simple. However, Synthetic Key values alone are no guarantee of the uniqueness of a record (beyond that of the key itself), so these tables will also require unique constraints on their Natural Keys to prevent duplicate records being added.

The final data model should ideally look something like this :

Fun with Synthetic Keys

The first tables we’re going to generate are the CATEGORY and SERIES reference tables.

As we’re using Django, we don’t need to type any SQL for this.

Instead, we need to go to the project directory and create a file called models.py.

So, if we’re using the installation I setup previously…

cd ~/dvds/dvds
nano models.py

…and now we can define the CATEGORY object like this :

from django.db import models

class Category(models.Model) :
    # Object properties defined here map directly to database columns.
    category_name = models.CharField(max_length = 50)

    def __str__(self):
        return self.category_name

We now need to tell Django to implement (migrate) this definition to the database so…

cd ~/dvds
./manage.py makemigrations dvds
./manage.py migrate

Now, if this were a common or garden Python article, we’d be heading over to the Python interactive command line (possibly via another Monty Python reference). The fact is though that I’m getting withdrawal symptoms from not writing any SQL so, we’re going to install a CLI for SQLite.

Incidentally, if hacking around on the command line is not your idea of “a big party”, you can always go down the route of obtaining an IDE for SQLite – SQLite Studio seems as good as any for this purpose.

If like me however, you regard the command line as an opportunity for “a wizard-jolly time”…

sudo apt-get install sqlite3

…and to access the CLI, we can now simply run the following :

cd ~/dvds
sqlite3 db.sqlite3

Django will have created the table using the application name as a prefix. So, in SQLite, we can see the DDL used to generate the table by running …

.schema dvds_category

The output (reformatted for clarity) is :

CREATE TABLE "dvds_category"
(
    "id" integer NOT NULL PRIMARY KEY AUTOINCREMENT,
    "category_name" varchar(50) NOT NULL
);

The problem with this particular table can be demonstrated easily enough (incidentally, a Blokebuster is the opposite of a Chick Flick, in case you’re wondering)…

insert into dvds_category(category_name) values ('BLOKEBUSTER');

insert into dvds_category(category_name) values ('BLOKEBUSTER');

select *
from dvds_category;

1|BLOKEBUSTER
2|BLOKEBUSTER

As is evident, the Unique Key on category_name has not been implemented. Without this, the Synthetic Key on the table (the ID column) does nothing to prevent the addition of what are, in effect, duplicate records.

After tidying up…

delete from dvds_category;
.quit

…we need to re-visit the Category class in models.py…

from django.db import models

class Category(models.Model) :
    # Object properties defined here map directly to database columns.
    # Note that Django creates a synthetic key by default so no need to
    # specify one here
    category_name = models.CharField(max_length = 50, unique = True)

    def __str__(self):
        return self.category_name

This time, we’ve told Django that category_name has to be unique as well. So, when we migrate our change…

cd ~/dvds
./manage.py makemigrations dvds
./manage.py migrate

…and check the DDL that Django has used this time…

sqlite3 db.sqlite3
.schema dvds_category

…we can see that Django has added a Unique Constraint on the category_name…

CREATE TABLE "dvds_category"
(
    "id" integer NOT NULL PRIMARY KEY AUTOINCREMENT,
    "category_name" varchar(50) NOT NULL UNIQUE
);

…meaning that we now no longer get duplicate category_names in the table…

insert into dvds_category(category_name) values('BLOKEBUSTER');
insert into dvds_category(category_name) values('BLOKEBUSTER');
Error: UNIQUE constraint failed: dvds_category.category_name

It’s worth noting here that some RDBMS engines create a Unique Index to enforce a Primary Key. Were this the case for this table, you’d end up with two indexes on a two-column table. This is would not be the most efficient approach in terms of performance or storage.

Assuming that’s not a problem, we can move on and add the Series object to models.py as it’s structure is similar to that of Category…

from django.db import models

class Category(models.Model) :
    # Object properties defined here map directly to database columns.
    # Note that Django creates a synthetic key by default so no need to
    # specify one here
    category_name = models.CharField(max_length = 50, unique = True)

    def __str__(self):
        return self.category_name

class Series(models.Model) :
    series_name = models.CharField( max_length = 50, unique = True)

    def __str__(self) :
        return self.series_name

…and deploy it…

cd ~/dvds
./manage.py makemigrations dvds
./manage.py migrate

…which should result in a table that looks like this in SQLite :

CREATE TABLE "dvds_series"
(
    "id" integer NOT NULL PRIMARY KEY AUTOINCREMENT,
    "series_name" varchar(50) NOT NULL UNIQUE
);

Fun as it is messing around in the database on a command line, it’s not very frameworky…

DML using the Admin Interface

It’s a fairly simple matter to persuade Django to provide an interface that allows us to manage the data in our tables.

Step forward admin.py. This file lives in the same directory as models.py and, for our application as it stands at the moment, contains :

from django.contrib import admin
from .models import Category, Series

#Tables where DML is to be managed via admin
admin.site.register(Category)
admin.site.register(Series)

Save this and then just run :

./manage.py migrations

Now, if we run the server…

./manage.py runserver

…we can navigate to the admin site (appending /admin to the development server URL…

You can then connect using the credentials of the super user you created when you setup Django initially.

Once connected, you’ll notice that Django admin has a bit of an issue with pluralising our table names

We’ll come back to this in a mo. First though, let’s add some Category records…

Click the Add icon next to “Categorys” and you’ll see …

Once we’ve added a few records, we can see a list of Categories just by clicking on the name of the table in the Admin UI :

This list appears to be sorted by most recently added Category first. It may well be that we would prefer this listing to be sorted in alphabetical order.

We can persuade Django to implement this ordering for our tables, as well as correctly pluralizing our table names by adding a Meta class for each of the corresponding classes in models.py :

from django.db import models

class Category(models.Model) :
    # Object properties defined here map directly to database columns.
    # Note that Django creates a synthetic key by default so no need to
    # specify one here
    category_name = models.CharField(max_length = 50, unique = True)

    def __str__(self):
        return self.category_name

    class Meta :
        # set the default behaviour to be returning categories in alphabetical order by category_name
        ordering = ["category_name"]
        # Define the correct plural of "Category". Among other places, this is referenced in the Admin application
        verbose_name_plural = "categories"

class Series(models.Model) :
    series_name = models.CharField( max_length = 50, unique = True)

    def __str__(self) :
        return self.series_name

    class Meta :
        ordering = ["series_name"]
        verbose_name_plural = "series"

Once we migrate these changes :

./manage.py makemigrations dvds
./manage.py migrate

…and restart the dev server…

./manage.py runserver

…we can see that we’ve managed to cure the Admin app of it’s speech impediment…

…and that the Category records are now ordered alphabetically…

It’s worth noting that specifying the ordering of records in this way will cause an additional sort operation whenever Django goes to the database to select from this table.

For our purposes the overhead is negligible. However, this may not be the case for larger tables.

So far, we’ve looked at a couple of fairly simple reference data tables. Now however, things are about to get rather more interesting…

Foreign Keys and other exotic database constructs

The Title object (and it’s corresponding table) are at the core of our application.

Unsurprisingly therefore, it’s the most complex class in our models.py.

In addition to the Referential Integrity constraints that we need to implement, there are also the media_type and bbfc_certificate fields, which can contain one of a small number of static values.

We also need to account for the fact that Django doesn’t really do composite Primary Keys.

I’m going to go through elements of the code for Title a bit at a time before presenting the final models.py file in it’s entirety.

To start with then, we’ll want to create a couple of choices lists for Django to use to validate values for some of the columns in the Title table…

# Allowable values Lists
MEDIA_FORMAT_CHOICES = (
    ('BR', 'Blu Ray'),
    ('DVD', 'DVD'),
)

# British Board of Film Classification Certificates
# as per the official BBFC site - http://www.bbfc.co.uk
BBFC_CHOICES = (
    ('U', 'U'),
    ('PG', 'PG'),
    ('12A', '12A'),
    ('15', '15'),
    ('18', '18'),
    ('R18', 'R18'),
)

In a database, you would expect these valid values to be implemented by check constraints. Django however, goes it’s own way on this. I’d infer from the lack of resulting database constraints that the Choices Lists will work so long as you always populate/update your underlying tables via the Django application itself.

Incidentally, it is possible to reference these name/value pairs in Django templates should the need arise, something I will cover in a future post. It is for this reason that I’ve declared them outside of the classes in which they’re used here.

As with choices, the same appears to apply to the check we’ve added to ensure that we don’t get a silly value for the year a film was released, which necessitates …

from django.core.validators import MinValueValidator
...
year_released = models.IntegerField(validators=[MinValueValidator(1878)]) # Movies invented around 1878.

Our first attempt at the Title class looks like this :

class Title( models.Model) :

    # For optional fields, blank = True means you can leave the field blank when entering the record details
    # null = True means that the column is nullable in the database
    title_name = models.CharField( max_length = 250)
    year_released = models.IntegerField(validators=[MinValueValidator(1878)]) # Movies invented around 1878.
    bbfc_certificate = models.CharField("BBFC Certificate", max_length = 3, choices = BBFC_CHOICES)
    media_format = models.CharField("Format", max_length = 3, choices = MEDIA_FORMAT_CHOICES)
    director = models.CharField(max_length = 100, null=True, blank=True)
    synopsis = models.CharField( max_length = 4000, null = True, blank = True)
    series = models.ForeignKey( Series, on_delete = models.CASCADE, null = True, blank = True)
    number_in_series = models.IntegerField(null = True, blank = True)
    categories = models.ManyToManyField(Category, blank = True)

    class Meta :
        ordering = ["series", "number_in_series", "title_name"]

    def __str__(self) :
        return self.title_name

Hang on, haven’t I forgotten something here ? Surely I need some way of implementing the Natural Key on this table ?

You’re right. However this omission is deliberate at this stage, for reasons that will become apparent shortly.

Yes, this is part of a plan “so cunning you could brush your teeth with it”.

Even without this key element, there’s quite a lot going on here. In the main class :

  • the year_released cannot be before 1878
  • the bbfc_certificate and media_format columns are associated with their choices lists using the choices option
  • we’ve specified that series as type models.ForeignKey
  • we’ve specified categories as the somewhat intriguing type models.ManyToManyField

In the Meta class, we’ve stipulated a multi-column ordering clause. Note that the default ordering appears to put nulls last. Therefore Title records that have null series and number_in_series values will appear first.

When we plug this into our models.py and apply the changes…

./manage.py makemigrations dvds
./manage.py migrate

…then check in the database…

sqlite3 db.sqlite3
.tables dvds_title%
dvds_title             dvds_title_categories

…we can see that Django has created not one, but two new tables.

In addition to the DVDS_TITLE table, which we may have expected and which looks like this :

CREATE TABLE dvds_title (
    id               INTEGER        NOT NULL
                                    PRIMARY KEY AUTOINCREMENT,
    title_name       VARCHAR (250)  NOT NULL,
    year_released    INTEGER        NOT NULL,
    bbfc_certificate VARCHAR (3)    NOT NULL,
    media_format     VARCHAR (3)    NOT NULL,
    director         VARCHAR (100),
    synopsis         VARCHAR (4000),
    number_in_series INTEGER,
    series_id        INTEGER        REFERENCES dvds_series (id)
);

…Django has been smart enough to create a join table to resolve the many-to-many relationship between TITLE and CATEGORY :

CREATE TABLE dvds_title_categories (
    id          INTEGER NOT NULL
                        PRIMARY KEY AUTOINCREMENT,
    title_id    INTEGER NOT NULL
                        REFERENCES dvds_title (id),
    category_id INTEGER NOT NULL
                        REFERENCES dvds_category (id)
);

Whilst Django can’t resist slapping on a gratuitous Synthetic Key, it is at least clever enough to realise that a composite key is also required. To this end, it also creates an Unique Index on DVDS_TITLE_CATEGORIES :

CREATE UNIQUE INDEX dvds_title_categories_title_id_96178db6_uniq ON dvds_title_categories (
    title_id,
    category_id
);

So, it seems that Django can handle composite keys after all. Well, not quite.

Remember that we still need to add a unique key to TITLE as we’ve modelled it to have a Natural Key consisting of TITLE_NAME, YEAR_RELEASED and MEDIA_FORMAT.

We can do that easily enough, simply by adding a unique_together clause to Title’s Meta class in models.py :

class Meta :
    ordering = ["series", "number_in_series", "title_name"]
    # Natural Key for a Title record is title_name, year_released and media_format - we have some films on DVD AND Blu-Ray.
    unique_together = ('title_name', 'year_released', 'media_format',)

If we now apply this change…

./manage.py makemigrations dvds
./manage.py migrate

…we can see that Django has added the appropriate index…

CREATE UNIQUE INDEX dvds_title_title_name_ae9b05c4_uniq ON dvds_title (
    title_name,
    year_released,
    media_format
);

The really wacky thing about all this is that, if we had used the unique_together function in the first place, Django would not have created the Unique Key on the DVDS_TITLE_CATEGORIES table. However, as we’ve added Title’s Natural Key in a separate migration, Django leaves the Unique Key on DVDS_TITLE_CATEGORIES in place.

Irrespective of how practical the Synthetic Key on DVDS_TITLE may be, the fact is, it is defined as the Primary Key for that table. As DVDS_TITLE_CATEGORIES is a Join Table then, in relational terms, it should itself have a Natural Key consisting of the Primary Keys of the two tables it’s joining.

Anyway, our final models.py looks like this :

from django.db import models
from django.core.validators import MinValueValidator

# Allowable values Lists
MEDIA_FORMAT_CHOICES = (
    ('BR', 'Blu Ray'),
    ('DVD', 'DVD'),
)

# British Board of Film Classification Certificates
# as per the official BBFC site - http://www.bbfc.co.uk
BBFC_CHOICES = (
    ('U', 'U'),
    ('PG', 'PG'),
    ('12A', '12A'),
    ('15', '15'),
    ('18', '18'),
    ('R18', 'R18'),
)

class Category(models.Model) :
    # Object properties defined here map directly to database columns.
    # Note that Django creates a synthetic key by default so no need to
    # specify one here
    category_name = models.CharField(max_length = 50, unique = True)

    def __str__(self):
        return self.category_name

    class Meta :
        # set the default behaviour to be returning categories in alphabetical order by category_name
        ordering = ["category_name"]
        # Define the correct plural of "Category". Among other places, this is referenced in the Admin application
        verbose_name_plural = "categories"

class Series(models.Model) :
    series_name = models.CharField( max_length = 50, unique = True)

    def __str__(self) :
        return self.series_name

    class Meta :
        ordering = ["series_name"]
        verbose_name_plural = "series"

class Title( models.Model) :

    # For optional fields, blank = True means you can leave the field blank when entering the record details
    # null = True means that the column is nullable in the database
    title_name = models.CharField( max_length = 250)
    year_released = models.IntegerField(validators=[MinValueValidator(1878)]) # Movies invented around 1878.
    bbfc_certificate = models.CharField("BBFC Certificate", max_length = 3, choices = BBFC_CHOICES)
    media_format = models.CharField("Format", max_length = 3, choices = MEDIA_FORMAT_CHOICES)
    director = models.CharField(max_length = 100, null=True, blank=True)
    synopsis = models.CharField( max_length = 4000, null = True, blank = True)
    series = models.ForeignKey( Series, on_delete = models.CASCADE, null = True, blank = True)
    number_in_series = models.IntegerField(null = True, blank = True)
    categories = models.ManyToManyField(Category, blank = True)

    class Meta :
        ordering = ["series", "number_in_series", "title_name"]
        # Natural Key for a Title record is title_name, year_released and media_format - we have some films on DVD AND Blu-Ray.
        unique_together = ('title_name', 'year_released', 'media_format',)

    def __str__(self) :
        return self.title_name

We also want to add Title to admin.py so that we can perform DML on the table in the admin application. Hence our final admin.py looks like this :

from django.contrib import admin
from .models import Category, Series, Title

#Tables where DML is to be managed via admin
admin.site.register(Category)
admin.site.register(Series)
admin.site.register(Title)

Conclusion

Django makes a pretty decent fist of implementing and maintaining a Relational Data Model without the developer having to write a single line of SQL.

Of course, as with any code generator, some of it’s design decisions may not be those that you might make if you were writing the code by hand.

So, if the data model and it’s physical implementation is important to your application, then it’s probably worth just checking up on what Django is up to in the database.

稿源:The Anti-Kyte (源链) | 关于 | 阅读提示

本站遵循[CC BY-NC-SA 4.0]。如您有版权、意见投诉等问题,请通过eMail联系我们处理。
酷辣虫 » 后端存储 » The Django Fandango Farrago – Looking at Django’s Physical Data Model Design

喜欢 (0)or分享给?

专业 x 专注 x 聚合 x 分享 CC BY-NC-SA 4.0

使用声明 | 英豪名录