Morph: Database Schema Migrations Made Easy

Morph is a new database migration tool designed to improve Mattermost schema migrations. It can be used programmatically as a library or from a CLI. Morph stores the schema version in a table where all applied migrations are persisted. It’s a flexible open source library that works with any driver interface implementation and any source that provides migration scripts. The engine uses a dependency injection pattern so that any implementation can be plugged into the library, and the interfaces of those dependencies are kept minimal to avoid confusion.

Before we examine how Morph simplifies database schema migration in Mattermost, let’s take a look at why database schema migrations are important and other options for managing them.

Why database schema migrations matter

As an application evolves, the data flowing inside inevitably evolves along with it. If you have an application that persists the data in a relational database, the way you store the data will probably change over time. 

The definition or structure of your data is called the database schema. The schema is a key factor in relational databases, as it guarantees that the data being read from the database will have a certain structure.

As your application changes, you need to find a way to keep track of the differences. For source code, this is pretty straightforward: using a version control system and publishing releases with semantic versioning is a proven approach that works for many teams. There are alternative versioning strategies as well, but in essence they all serve a similar purpose.

Tracking schema changes over time

So, how do you track your schema changes over time?

Since we follow the good practice of using idempotent schema migrations in our repository, we can run a migration over and over again without ending up in a dirty state. In the event of a failure, the application can still choose to roll back the migration.
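
To make the idea of idempotency concrete, here is a minimal, hypothetical example of a migration step that can be re-run safely. The table and column names are made up for illustration and are not part of Mattermost’s schema.

package main

import "database/sql"

// applyIdempotentStep runs a migration statement that is safe to execute
// repeatedly: re-running it leaves the schema in the same state instead of
// failing or leaving a dirty state behind.
func applyIdempotentStep(db *sql.DB) error {
    _, err := db.Exec(`CREATE TABLE IF NOT EXISTS user_preferences (
        user_id BIGINT NOT NULL,
        name    TEXT   NOT NULL,
        value   TEXT,
        PRIMARY KEY (user_id, name)
    )`)
    return err
}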

Additionally, you may want to have a more granular schema version and store every applied migration. For example, imagine a scenario where you’re planning to introduce two new migrations for the new release (e.g., version number 18 and 19). At the last minute, you decide to pull the 18th migration. In this case, you need to rename the 19th migration to the 18th, or rename the 18th migration to the 20th.

Instead of using existing tools for tracking schema changes, you can also tie schema versioning to the application version. Although this seems to make sense at a glance, there’s a major difference once you dig deeper into the motivations for versioning. In a nutshell, you don’t need to make the same promises as you would for a public API, since the database is (generally) not intended for public use. As such, you don’t need to follow major, minor, and patch semantics. Still, you may want to track the changes for several reasons, including:

  • Tracking version history. You can see when a table or column was changed and who changed it. You can also see how the change is implemented by looking at the differences between the two versions.
  • Restoring a previous version. If there’s an error in any version, the current version is broken, or if you simply like a previous version better, you can replace the current version with a different one.

Assuming we have enough reasons to control versions of our schema, let’s take a look at how we can manage this.

How to manage schema changes

Since we decided versioning the schema is a good idea, let’s think about how to do it.

One option is to use existing tools developed specifically to track schema migrations, like golang-migrate, which is one of the most popular projects of its kind. Unfortunately, migrations can fail at any time for many reasons. In a failure scenario, golang-migrate leaves the migration state dirty and requires manual steps to fix the issue, whereas we want to be able to recover automatically.

Let’s start with tracking the version and then continue with how to apply the changes for each version. Since the database already stores data, it can store the schema version itself. Two possible approaches are listed below (a rough sketch of both follows the list):

  • Creating a table in the database and storing the latest schema version.
  • Creating a table in the database and storing every applied migration.
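
Here is that sketch. The table and column names are illustrative only, not Morph’s actual schema; they just show the two bookkeeping shapes side by side.

package main

import "database/sql"

// createVersionTables sketches both approaches described above.
func createVersionTables(db *sql.DB) error {
    // Approach 1: keep only the latest schema version in a single row.
    if _, err := db.Exec(`CREATE TABLE IF NOT EXISTS schema_version (
        version BIGINT NOT NULL
    )`); err != nil {
        return err
    }

    // Approach 2: keep one row per applied migration, so the full history
    // of what has been applied is visible in the database itself.
    _, err := db.Exec(`CREATE TABLE IF NOT EXISTS applied_migrations (
        version    BIGINT PRIMARY KEY,
        name       TEXT NOT NULL,
        applied_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
    )`)
    return err
}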

Generally, both approaches can do the job. But once you decide to migrate down to a version other than the latest one, you will need to either write a new script and add it as a new forward migration or migrate down to every version from the latest to the one you ultimately want to use. 

The latter wouldn’t be a big problem if you map the application version to the schema version. To illustrate this, let’s consider an example.

Let’s say your application version is 2.1 and it introduced three migration steps on top of 2.0 (e.g., schema versions 10 to 13). If you decide to downgrade the application version, the process is pretty straightforward. But let’s imagine you just want to skip one migration in between that causes some pain. You can’t simply manage each migration on its own; you need to think about the migrations as a whole. This can keep you on the safe side, but you pay for it by losing a bit of flexibility.

If you track each applied migration instead, you add another dimension to the version control. With this approach, you know exactly which migrations have actually been applied just by looking at the version table.

What is the best method for schema migration?

For the SQL statements that are going to be applied to the schema, there are two major ways of implementing the migrations:

  • Add these statements to your source code and execute them programmatically from your application code.
  • Add these statements as separate script files and apply each file as a migration step.

The first approach is very hard to track. Since the code lives inside your application, it’s generally mixed with high-level programming language logic and requires your application to be running in order to be applied. You also don’t necessarily add the backward migration logic while adding a new one, because if you later decide to roll back to an earlier version, that logic would only exist in the newer version. In essence, you cannot decouple schema management from your application, which can cause you a lot of pain.

The latter approach, having the statements in separate files, comes with a huge advantage: flexibility. You can test the scripts in your database console right away, or you can use another tool to apply the migrations, since it doesn’t require the application to be running. Other advantages include easy navigation among scripts and fewer context switches while reading a migration, since the script contains no high-level programming language semantics.

How Morph simplifies database schema migration

At this point, it seems like using a schema migration tool is a good way of managing this matter. This is exactly where Morph enters the equation. Simply put, Morph is a schema migration tool built to help improve Mattermost schema migrations.

The Morph engine can be initialized as follows:

// driver is a Driver implementation (MySQL, PostgreSQL, or SQLite) and
// src is a Source that provides the parsed migration scripts.
engine, err := morph.New(context.Background(), driver, src)
if err != nil {
    return err
}
defer engine.Close()

// Apply every pending migration in order.
if err = engine.ApplyAll(); err != nil {
    return err
}

It’s worth mentioning that MySQL, PostgreSQL, and SQLite driver implementations are provided within the project. The driver interface is defined as follows:

type Driver interface {
    Ping() error 
    Close() error
    Apply(migration *models.Migration, saveVersion bool) error
    AppliedMigrations() ([]*models.Migration, error)
    SetConfig(key string, value interface{}) error
}

The Morph engine expects the migration scripts to be parsed and accessible through an interface, source.Source, which is kept minimal:

type Source interface {
    Migrations() (migrations []*models.Migration)
}
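
To show how small that contract is, here is a hypothetical custom source that simply hands back a pre-built slice of migrations. The sliceSource name is made up, the import path is assumed from the library name, and how the slice gets populated (for example, by the built-in file source) is out of scope here.

package main

// The import path below is assumed based on the library name.
import "github.com/mattermost/morph/models"

// sliceSource is a hypothetical source.Source implementation that wraps
// migrations that were already parsed elsewhere.
type sliceSource struct {
    migrations []*models.Migration
}

// Migrations satisfies the source.Source interface.
func (s sliceSource) Migrations() []*models.Migration {
    return s.migrations
}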

models.Migration is a simple struct that contains the migration name, version, direction, and the migration statements as an io.ReadCloser, which can essentially be a file. The struct can be initialized from files, and the built-in file source expects those files to follow a specific format.

The migration files should have up and down versions. The tool expects each migration to be reversible, and the naming of the migration should be in the following format:

{order_number}_{migration_name}.{up|down}.sql

The first part determines the order in which the migrations should be applied, and the next part, up to the .up.sql or .down.sql suffix, is the migration name.

For example:

0000000001_create_user.up.sql
0000000001_create_user.down.sql
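
For illustration, the decomposition could be sketched roughly as follows. This is not Morph’s actual parser; it only shows how an order number, a name, and a direction fall out of such a file name.

package main

import (
    "fmt"
    "strconv"
    "strings"
)

// parseMigrationFilename breaks "0000000001_create_user.up.sql" into its
// order number (1), name ("create_user"), and direction ("up").
func parseMigrationFilename(filename string) (order uint64, name, direction string, err error) {
    base := strings.TrimSuffix(filename, ".sql")
    direction = "up"
    if strings.HasSuffix(base, ".down") {
        direction = "down"
    }
    base = strings.TrimSuffix(strings.TrimSuffix(base, ".up"), ".down")

    parts := strings.SplitN(base, "_", 2)
    if len(parts) != 2 {
        return 0, "", "", fmt.Errorf("unexpected file name: %s", filename)
    }
    order, err = strconv.ParseUint(parts[0], 10, 64)
    return order, parts[1], direction, err
}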

Apart from the essential arguments, Morph can be initialized with a locking option, which enables a lock mechanism (mimicking advisory locking) that is built into the tool. The reason we avoid using advisory locks is that we use PgBouncer on our cloud infrastructure, and for scaling reasons we had to use transaction pooling mode instead of session pooling.

Transaction pooling prevents using session-level advisory locks. Hence, we had to implement our own locking feature to overcome this. In essence, this locking mechanism is being used in cases where multiple applications could execute the migrations at the same time. If the database driver implements the following interface, Morph can be used safely in multi-application deployments:

type Locker interface {
    Lock(context.Context) error
    Unlock() error
}

It essentially allows an instance to create a mutex value with an expiration time in the database and refresh it until the instance finishes applying the migrations. Other instances keep waiting until the mutex is either removed from the database or expires.

The main difference here is that the lock is not held for a single statement; it is held for the entire migration process. This mechanism operates at a level above sessions, which is why it keeps working where session-bound advisory locks would fail.
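
As a conceptual sketch of this pattern (not Morph’s actual implementation; the table and column names are hypothetical and the SQL assumes PostgreSQL), an instance could try to claim the mutex row like this:

package main

import (
    "context"
    "database/sql"
    "time"
)

// tryLock attempts to claim a single mutex row with an expiry. It succeeds
// only if the row does not exist yet or the previous holder's lease has
// expired. The holder keeps refreshing expires_at until the migrations
// finish; other instances poll until they succeed.
func tryLock(ctx context.Context, db *sql.DB, owner string, ttl time.Duration) (bool, error) {
    res, err := db.ExecContext(ctx, `
        INSERT INTO schema_lock (id, owner, expires_at)
        VALUES (1, $1, $2)
        ON CONFLICT (id) DO UPDATE
            SET owner = EXCLUDED.owner, expires_at = EXCLUDED.expires_at
            WHERE schema_lock.expires_at < NOW()`,
        owner, time.Now().Add(ttl))
    if err != nil {
        return false, err
    }
    n, err := res.RowsAffected()
    return n == 1, err
}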

The Morph CLI provides the same functionality from the command line. Feel free to install it and experiment; we think command discovery should be sufficient to start using it productively!

Summary

Schema migrations enable versioning as the database schema evolves. Making schema changes to databases is serious business as it means changing your customers’ data structures — and that always comes with some risks. We consider it a good practice to have easy recovery paths and to handle changes as a unit. 

Morph is used in our Channels product, and we have started to expand its usage to other products, too. We are continuously improving it as requirements change and emerge.

If this seems like a tool that may benefit your project, we’d love for you to give Morph a try and tell us what you think. The README and API docs should get you started, but if you run into any trouble, feel free to create an issue or reach out to us on the Mattermost community server!

Lastly, I’d like to thank Agniva and Miguel for their contributions to the Morph project, and Zef for helping me express this journey much better. Also, special thanks to our SRE team for helping iron out the bugs introduced with the new system, which created a couple of interesting issues when we first rolled out this feature.

Ibrahim is a software engineer on the platform team at Mattermost. He is a keen open source enthusiast and maintains several open source developer tools to boost developer experience.