Schema Management Under Source Control

by Omer Gertel

Deploying new features that require changes to the database is a technical challenge in any application. To quote Jeff Atwood from Coding Horror: “When I ask development teams whether their database is under version control, I usually get blank stares.”

The database is a critical part of your application. If you deploy version 2.0 of your application against version 1.0 of your database, what do you get? A broken application. And that’s why your database should always be under source control right next to your application code. You deploy the app, and you deploy the database. Like peanut butter and chocolate, they are two great tastes that taste great together.

 

The Database Is Under Source Control

At Kaltura, we have kept the schema definition scripts under source control for a while. This allows you to checkout the latest Kaltura Community Edition, and install a new working server easily. Terrific! Enjoy your new Kaltura deployment.

But what can you do if you already have an existing Kaltura server running? How can we help you get the new and exciting features added to the latest Kaltura version? Upgrading to the latest version of the code and the database is not enough. We also need to migrate your existing data to the new version, change configuration files and even update 3rd party softwares we are using in the server.

In Eagle we created an update script that runs all SQL and PHP updates on the server and all SQL updates on our Kaltura Data Warehouse (DWH), which is a good start for our future upgrade process.

Our upgrade script needs to know which version of schema is already installed, and then understand and automatically apply the updates to the schema. Luckily, it does. As part of the Kaltura development cycle we deploy features to a variety of servers all the time. To deploy a feature to QA, integration, and finally into our SaaS offering, as easy as possible, we developed a method that allows us to automatically update a server to a newer version of Kaltura, and you get to benefit from this as well.

 

Automatic Change Management That Doesn’t Hurt

In the Kaltura open-source spirit, I’m going to describe how we manage schema changes easily in the DWH.

First, we have put in place a strong convention for defining schemas. The initial schema definition is stored in the dwh/ddl directory. Inside, we place all changes under dwh/ddl/migrations/ and create a directory named  DATE_VERSION_DESCRIPTION  that contains the changes, where VERSION is a positive number used to define the order of the change.

For example:
20110329_5053_reaggregate_storage_usage

Second, we have created a table in the database called version_management, which holds the list of changes we have already applied for this deployment.

Third, we have written an update to automatically discover and apply new changes:

  1. Gets the latest version of the code and schema from the source control
  2. Checks the database for a table called version_management, and finds the latest applied changes from the database.
  3. Finds new changes in the dwh/ddl/migration directory that have a version number larger than the latest applied change in the database.

For each new change:

  • Execute all the SQL scripts inside the change, by alphabetical order. If the order matters, we make sure to number them accordingly.
  • Insert a new row into the version_management table with the new change id and the file name. Tracking file names in the database allows us to add more scripts to a change before we finish development and tests. This is also a good way to recover from failure, if such occurs.

 

This simple mechanism allows Kaltura to deploy new features and fixes (PHP and SQL) without hassles, by running a single script that keeps all our internal environments in sync. If you are using a database, you should have a way of updating it seamlessly.
The updating mechanism is easy to develop (just look at our code) and totally worth your while. What are your thoughts about this mechanism?
Let us know how you think we should upgrade the Kaltura Server so we can make the upgrade process easier.

2 Responses to “Schema Management Under Source Control”

  1. Previously, I did similar things using SQLObject, and these days I’m using django-south. What you’re doing seems very similar to south.
    My question to you: what practices and development flows do you employ to make sure this works well in an environment with multiple developers? E.g. how do you manage people doing changes to the same table at the same time, merges between branches, etc. It’s possible you aren’t doing anything special to handle these cases, but still it would be interesting to know.

  2. Trackbacks

Leave a Reply