Cassandra Version Control

Nikhilbhide
11 min readJan 3, 2021

In this article, you will find out -

  • Which are the tools available in the market for Cassandra version control?
  • How these tools differ in terms of features?
  • Is superior functionality the main factor to consider? Or are aspects like vendor and implementer support equally important?

Database version control is a key feature that assists database schema versioning. It's important to track, version, and deploy database changes. This article presents the comparison matrix for the following tools

  1. Cassandra-Migrate by Cobliteam
  2. Cassandra-migration-tool-java by smartcat-labs
  3. Liquibase-Cassandra by Liquibase

Comparison Matrix

Cassandra-Migrate, Cassandra-migration-tool-java, and golang-migrate provide basic functionalities of version control. These functionalities include schema upgrades and schema reset to baseline. This approach works for small systems and prototypes. In the case of schema migration failure, either the rollback needs to be done manually. Version control is not straightforward and auditing is completely missing. The changesets applied by the team is lost and there is no way to keep the history of the changes. In addition, there is no active development happening in these tools, and the probability of getting support for desirable features such as automatic rollback, rollback by tag, date, or counts is minimal.

By providing the following features, Liquibase scores heavily over other tools -

  • Flexible schema change — JSON, SQL, XML
  • Rollbacks — Undo database changes, either automatically or via custom rollback SQL
  • Context-dependent logic — Use contexts and preconditions to fine-tune script execution
  • Auto-generate SQL — Automatically generate SQL scripts

Liquibase-Cassandra

The liquibase-cassandra support has just become GA. The current extension supports rollback on the basis of tags. It also needs a few schema changes to get this feature to work. However, these are minimal changes and they are manageable. By default, whenever Liquibase is instantiated on the Cassandra keyspace then “databasechangelog” Table is created. When rollback on the basis of tag is issued, the rollback operation executes the query that expects the partition_id to be used in the criteria. Because of a bug, the table’s data modeling is a bit different. The workaround is to drop the table and recreate it with the correct data model. Once it's done then rollback on the basis of count or changeset_id is supported without any problem.

Current Limitations

As of now, Liquibase-Cassandra rollback support is limited to changeset_id and count. Tag/Date based rollback is still under development. It implies that whatever changes that are done can be rolled back sequentially up to a certain version or you can rollback only a particular change by specifying the relevant changeset_id. All of the changes need to specify in the changeset file and that’s how changes are applied. The rollback actions need to be defined in the rollback section. The changeset remains the same for the rollback command; however, rollback just executes the DML/DDL queries from the rollback section. This execution is done sequentially from top to bottom and there is no correlation with the ordering of changeset ids.

Setup

Liquibase-Cassandra needs the following tools in order to work with Cassandra

The Liquibase project structure requires two mandatory files — liquibase.properties and changelog.xml.

changelog.xml

The root of all Liquibase changes is the changelog file. Liquibase uses a changelog to list all changes, in order, made to Cassandra keyspace. It is similar to a ledger that contains a record of keyspace changes (changesets). Liquibase uses this changelog record to audit your database and execute any changes not yet applied to the keyspace.

Running Liquibase update command initializes the Liquibase support for a keyspace specified in the liquibase.properties file.

The structure of the changelog.xml is as follows -

TagDescriptionpreConditionsPre-conditions required to execute the changelog.propertyValue to set the property to, if not set by another means. changeSetThe changesets to execute. includeAdditional files containing changesets to execute.contextContext to be appended (using AND) to all changesets

The changeset requires two mandatory attributes id and author.

id — This is the unique identifier that uniquely identifies the changeset

author — author -of the changeset

When the Liquibase migrator runs, it parses the changelog tag. It first checks any preconditions specified. If any of the preconditions fail, the Liquibase will exit with an error message explaining what failed. Preconditions are useful for both documenting and enforcing expectations and assumptions of the changelog write. If all preconditions are met, Liquibase will then begin running changeset and include tags in the order they appear in the changelog file.

liquibase.properties

This is the configuration file that consists of the environment properties such as the location of changelog.xml file, Cassandra connection string, name of the keyspace, credentials of Cassandra, and driver name.

The liquibase project requires to be initialized on the relevant keyspace.

It creates two tables in the relevant keyspace. The first one is databasechangelog and databasechangeloglock. These tables are used internally by the Liquibase. However, there is a problem with the structure of the databasechangelog table. As per the data model defined by the cassandra-liquibase plugin, the partition key is id. The rollback operation fires a query that expects the author and filename columns to be part of the clustering key.

This needs to be fixed as follows. As a part of the Liquibase project setup, first create the dummy changeset as follows. This changeset just creates the dummy table. The dummy table represents a workaround in order to initialize Liquibase so that relevant tables get created in the keyspace.

The databasechangelog table contains the change log for the dummy table creation changeset. This indicates that Liquibase update is working as expected.

The next step is to drop databasechangelog table and create it with the following query.

Then just run the Liquibase rollback command to perform the rollback operation. This time around, rollback operation is successfully executed.

Now the Liquibase-Cassandra setup is successfully done and the project is ready for performing genuine changeset operations.

Liquibase Change Set Execution

DDL changes followed by a complete rollback

In this scenario, we are going to create new tables using Liquibase.

Once Liquibase performs the changeset operation, two tables — “orders” and “products” get created.

In databasechangelog table, two changesets with ids “1” and “2” are created. The senescing is as per the execution order.

Now we rollback these DDL changes by applying the rollback command. Note that rollback count is 2. As there are two changesets, it's absolutely essential to specify count as 2. Both of the changesets are rolled back with a unique key — file_name::changeset_id::author_name.

This rollback removes both the tables — “orders” and “products”.

Lastly, databasechangelog table will not have any changeset_id as all changesets are removed.

DDL changes followed by a selective rollback

In this scenario, we are rolling back only one changeset. The relevant changeset_id needs to be specified in the changelog.xml and the rollback count needs to be 1. We are going to apply the same changesets as applied in the section. So after applying the Liquibase update, both of the tables — “products” and “orders” get created.

In order to rollback changeset_id “2”, the change log file should contain relevant changeset_id. Rollback command “liquibase rollbackcount” is issued to rollback the applied changeset.

The rollback is successfully applied and from the databasechangelog the entry pertaining to changeset_id = “2” is removed.

Finally, “orders” table is deleted from the keyspace.

DDL (parent) + DML (child) changes followed by entire DML rollback

In this scenario, we are creating tables and inserting records in the tables. Then we are rolling back all of the changes. Note that whenever there is a dependency, the ordering of the changeset ids plays a huge role. In this case, changeset_id 3 is dependent on changeset_id 1 and changeset_id 4 is dependent on changeset_id 2. There the ordering should be reversed at the time of rollback operation. In the CD pipeline, this can be done by ansible (or any other script) to perform this reversal transparently.

First of all, let's understand the changesets in detail. We are having the first two changesets with DDL queries to create tables. In addition, we are applying changeset_id “3”. This changeset is a DML statement that inserts a couple of products (with names “Heavy Lift Arms” and “Precision Action Arms”).

Liquibase table databasechangelog contains the log of all changesets. The order is as per the execution of changeset.

Along with two tables “products” and “orders”, two records get added into the products table.

Now we are going to demonstrate the complete rollback for DML changeset_id “3” and one DDL changeset_id “2”. However, while performing its required to ensure that only relevant changeset ids are present in the corresponding changelog file. Also, count needs to be specified. The following snippet with “liquibase rollbackcount 2” showcases the same. Note that changesets with identifiers changelog.xml::2::nikhil and changelog.xml::3::nikhil are rolled back.

Eventually, the following is the resulted snapshot of databasechangelog table.

You can notice that orders table is removed from the keyspace and two records are removed from products table.

DDL + DML changes followed by a selective rollback

In this scenario, we are creating tables and inserting data into the tables. Then we are rolling back only one of the DML changes. This DML change is a non-edge scenario in which records added in between are removed through a rollback script. It indicates that Liquibase can support such scenarios without any impinge on the rest of the changeset ids.

Following are the additional changesets on the top of the changeset ids “1” and “2” i.e. orders and products table creation changesets. Description of changeset ids “3”, “4”, and “5” is as follows -

After applying changesets through Liquibase update, following is the state of the products table. This table contains all 6 records.

Now we are going to roll back one of the change. For this hypothesis, we are removing changeset_id “4” which is non-edge DML changeset_id. Remember that name is the primary key so that’s the partition key — a unique key from the perspective of data uniqueness and validation. The expectation is to get the next state of products table with 4 records. Following should be the changelog file in order to accomplish this rollback. The file needs to have a set of changeset ids that need to be rolled back.

The above changeset configuration is rolled back using “liquibase rollbackcount” command.

Finally, the products table has 4 records without two of the records added by changeset_id “4”.

With this scenario, we have successfully rolled back the select DML changeset amid multiple DDL and DML changesets.

Liquibase conditional changeset execution

In this scenario, the execution is conditional based and any changeset_id execution can be halted or continued based on the condition fulfillment. In this example, we are first checking whether the products table exists. In this case, we have toggled off the actual value. For demonstration purposes, we are performing an assertion on the “exptectResult” as “0”.

The ordering is as follows -

First changeset_id “1” gets executed and “products” table gets created.

Liquibase execution runner starts executing the next changeset_id “2” and before it executes the actual Sql, it checks the condition. The condition is based on the query count result and this count result is compared with the expectedResult. If this is true, then only the changeset_id is executed otherwise it's discarded.

The application of update is bound to fail as the condition is not satisfied. As onFail is “HALT”, the execution of Liquibase is stopped, preventing the creation of the records. This is highly useful as any mutation of the data should have a set of preconditions.

Dry run using update rollback

updateTestingRollback is typically used when you want to test rollback functionality while deploying changesets in your changelog sequentially. Run updateTestingRollback only when all pending changelogs have been verified as ready to be deployed as you cannot specify changesets to exclude.

In this scenario, rollback is first tested before executing the changeset. The operation works in a specific manner of “execution”-”rollback”-”execution”. This is extremely useful for a dry run of schema migration or a smoke test. In this example, for correct execution, we have rectified the precondition validation. Instead of expectedResult “1”, its “0” now. Because of this change, the insert query gets executed.

Only changeset_id “2” is tested for rolled back as this is the pending change. As all roll back changes are executed, all changesets are successfully applied.

Conclusion

After meticulous analysis, it can be concluded that Liquibase-Cassandra is the best C* version control available right now. Albeit Liquibase is not supporting Cassandra by direct means, it supports Cassandra through the JDBC connector. By JDBC connector, Liquibase is supporting many Java-based databases. It implies that not all Liquibase features can be used with Cassandra. However, even with limited features, Liquibase-Cassandra manages to emerge as the best tool for C* version control, owing to the features such as automatic rollback, selective changeset rollback, count-based rollback, changeset auditing, and conditional changeset execution.

--

--

Nikhilbhide

Principal Software Architect (Microservices, Big Data, Cloud, and AI/ML)