If your application is successful, there may come a time when you’re on an unsupported version of a dependency. In the case of the Heroku Platform API, this dependency was a very old version of Active Record. Due to the complexity involved in the upgrade, this core piece of infrastructure had been pegged at version 2.3.18, which was released in March 2013. We’re happy to announce that we’ve overcome the challenge and are now running Active Record 4.2.4, the latest version, in production. In this post, we’ll describe the technical challenges we faced in the upgrade process and take a look at how your organization could benefit from upgrading legacy software dependencies.
As mentioned, we were stuck with a very old version of Active Record for several years. In fact, in mid-2013, we attempted to upgrade to Active Record 3.2.13 but ended up abandoning the effort, because micro-benchmarking showed that this version of Active Record was 40% slower. As time went by, staying off the latest version of Active Record gradually became a maintenance burden for us. Active Record 2 support ended in late 2013 and we had to backport security patches and features to our codebase. For features like filters, we had to build our own query-building infrastructure instead of using the one from Active Record 3+. Adequate Record was introduced in Active Record 4 and we hoped it could improve our app’s performance. With all that in mind, we decided to give the upgrade another shot.
We first upgraded from the latest Active Record 2.x (v2.3.18) to the latest Active Record 3.0.x (v3.0.20). This greatly reduced the risks of the whole upgrade process because Active Record 3.0.x preserves all of the APIs of Active Record 2.x. We needed only minimal effort to move up to 3.x, and most of our Active Record code continued to work. The work took one engineer about two weeks, full time. This strategy was different from the effort we made in mid-2013, when we tried to upgrade directly to Active Record 3.2.13.
Next, we upgraded from Active Record 3.0.20 to the latest Active Record 4.2.4. We didn’t take small steps this time because most of our effort went into getting rid of the deprecated Active Record 2 syntax. It’s also not as hard to go from Active Record 3.1 to Active Record 4. Some of the gems we use, like acts_as_paranoid, don’t have great support for Active Record 3.1.x, so skipping Active Record 3.1.x+ also freed us from fixing these gems to work with a particular version of Active Record. Besides, upgrading to the latest Active Record version allowed us to cut down the number of Active Record patches we had introduced, since most of them are now part of the latest Active Record version. This step was more demanding than the previous one and took two engineers working on it full time for about two weeks.
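Much of the work in removing Active Record 2 syntax is simply finding it. The following is a minimal sketch, not the tooling we used, of a scanner that flags a few well-known Active Record 2-era calls, such as find(:all, ...) with an options hash (superseded by where, order and limit) and named_scope (superseded by scope). The pattern list and the helper name deprecated_calls are assumptions made for illustration, and the list is far from exhaustive.

```ruby
# Hypothetical, non-exhaustive patterns for Active Record 2-era calls.
DEPRECATED_PATTERNS = {
  "find(:all/:first/:last, ...)" => /\.find\(\s*:(all|first|last)\b/,
  "named_scope"                  => /\bnamed_scope\b/,
  ":conditions option"           => /:conditions\s*=>/
}.freeze

# Returns an array of [line_number, label] pairs, one per pattern match.
def deprecated_calls(source)
  hits = []
  source.each_line.with_index(1) do |line, number|
    DEPRECATED_PATTERNS.each do |label, pattern|
      hits << [number, label] if line =~ pattern
    end
  end
  hits
end

# Made-up model code mixing old and new syntax.
sample = <<~RUBY
  class App < ActiveRecord::Base
    named_scope :active, :conditions => { :deleted_at => nil }
    def self.recent
      where(:deleted_at => nil).order("created_at DESC").limit(10)
    end
  end
RUBY

deprecated_calls(sample).each { |number, label| puts "line #{number}: #{label}" }
```

Regular expressions will miss dynamically built calls, so a scan like this can only complement the test suite, not replace it.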
The upgrade to Active Record 3.2.13 that we attempted in mid-2013 resulted in a drop in performance. This time we were prepared to accept a temporary drop because our end goal was to migrate all the way to the latest Active Record, which presumably would give us acceptable performance due to Adequate Record. However, as we describe below, performance didn’t drop from Active Record 2.3.18 to 3.0.20, and it didn’t improve as we had hoped when we migrated from Active Record 3.0.20 to 4.2.4.
Upgrading such a large group of components is hard. Upgrading such a large group of components with no downtime is even harder. The Heroku CLI, dashboard, add-ons and other parts of the ecosystem all interact with the Heroku Platform API, so we have to maintain high availability. In this section, we’re going to share how we rolled out the upgrade with zero downtime.
We worked on a feature branch as we upgraded the code. Because the upgrade process took a relatively long time, we kept syncing the feature branch with the master branch by rebasing often. We also cleaned up code as we upgraded. Some fixes were committed to the master branch directly and were then merged into the upgrade branch upon each rebase.
The first thing we did to the code was to bump Active Record and Active Support to the targeted version. It broke quite a lot of code but that was expected. The next thing we did was to fix enough code to get our Ruby console to work. For us, this step involved upgrading gems and patches that depended on Active Record and Active Support. It also involved fixing code that relied on removed APIs and adjusting the loading order of some code. To fix the console, we started it, saw what errors it raised, fixed the errors and repeated the steps until it booted up. A nice benefit of having a running console first was that we could experiment with arbitrary code in the console, which saved us a lot of time when debugging in subsequent steps. As the last step, we let our automated tests drive the upgrade process. We fixed code to get tests to pass. One small tip for this step: for a multi-layered app, we recommend upgrading in the order of models, services, controllers and then the rest. Fixing models first, then the layers above them one by one, is the most effective path because fixing a lower layer may end up fixing a couple of issues in an upper layer.
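The layer-by-layer order can be sketched as a tiny driver that reports the lowest failing layer, so that it gets fixed before anything above it. This is an illustrative sketch only: the directory names, the helper first_failing_layer and the use of RSpec in the commented-out runner are all assumptions, not a description of our setup.

```ruby
# Layers ordered from lowest to highest. These directory names are
# hypothetical; adjust them to your project's layout.
LAYERS = %w[spec/models spec/services spec/controllers spec].freeze

# Runs each layer through +runner+ (any callable that returns true on success)
# and returns the first layer that fails, or nil when every layer passes.
def first_failing_layer(layers, runner)
  layers.find { |layer| !runner.call(layer) }
end

# A real runner might shell out to the test tool (assumed here to be RSpec):
#   runner = ->(dir) { system("bundle", "exec", "rspec", dir) }
# A stub runner with made-up results keeps this sketch self-contained.
stub_results = { "spec/models" => true, "spec/services" => false }
stub_runner = ->(dir) { stub_results.fetch(dir, true) }

failing = first_failing_layer(LAYERS, stub_runner)
puts failing ? "fix #{failing} first" : "all layers pass"
```

Because find stops at the first failure, higher layers are not even run until the layers below them are green.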
To test the upgraded code, we mostly relied on our existing automated tests. We don’t have 100% test coverage but our tests were good enough and helped us to upgrade most of the code correctly. We also added tests and cleaned up tests where necessary. Besides, as part of the Heroku infrastructure, we run a set of acceptance tests against our API after each deployment to verify the deployment doesn’t break anything. We find them extremely useful for detecting problems and we recommend them for risky code changes like an Active Record upgrade.
To verify that the upgraded code worked, not only did we start the server locally, but we also deployed it to a remote server. At Heroku, each engineer can deploy a copy of Heroku to a server for testing purposes (their own “private” Heroku). This provides us with the flexibility to check for potential errors before contaminating staging or production environments. We deployed, ran the acceptance tests, manually tested a couple of typical scenarios, found missed bugs, wrote tests to reproduce the bugs, fixed them, and repeated again until all errors were resolved.
To further reduce the risk of the upgraded code, we deployed it to a staging environment and left it running for a few days. Our staging environment simulates our production environment and most importantly, it has a lot of production-like data in the database. Letting risky changes bake in a production-like environment gave enough time for “hidden” bugs to pop up. For example, by chance, due to a database incident in our staging environment, we discovered issues in our patch to Active Record that auto-reconnects to a database. If we had already deployed the upgraded code to production, this might have caused us a production incident.
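To make the auto-reconnect idea concrete, here is a minimal sketch of a retry-with-reconnect wrapper. It is not our actual patch: ConnectionLost is a hypothetical stand-in for whatever error the database driver raises, and with_reconnect and its reconnect callable are names invented for this example.

```ruby
# Hypothetical stand-in for a driver-level "connection lost" error.
class ConnectionLost < StandardError; end

# Runs the block; on ConnectionLost, calls +reconnect+ and retries.
# The block runs at most +attempts+ times before the error is re-raised.
def with_reconnect(reconnect:, attempts: 3)
  tries = 0
  begin
    yield
  rescue ConnectionLost
    tries += 1
    raise if tries >= attempts
    reconnect.call
    retry
  end
end

# Simulate a query that fails twice and then succeeds.
calls = 0
reconnects = 0
result = with_reconnect(reconnect: -> { reconnects += 1 }) do
  calls += 1
  raise ConnectionLost if calls < 3
  :ok
end
puts "result=#{result} calls=#{calls} reconnects=#{reconnects}"
```

A wrapper like this is only safe around work that can be repeated as a whole. Retrying in the middle of an open transaction would silently drop the statements that ran before the connection was lost, which is the kind of edge case that tends to surface only in a production-like environment.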
At Heroku, we take production deployment seriously. For any risky change, we first do a partial rollout. We deployed the upgraded code to a small percentage of traffic. Doing a partial rollout not only allowed us to look for missed errors in a real production environment, but also minimized the risks. It also allowed us to monitor performance characteristics of the upgraded code side-by-side with the pre-upgraded code (more details below). Because we worked hard to reduce the risk of our changes ahead of time, the deployment to production went smoothly with no downtime, although on one occasion, we rolled back immediately due to a bug related to a missed case for
Based on our abandoned upgrade to Active Record 3.2.13 in mid-2013, we thought we would need to pay a performance penalty when upgrading from Active Record 2 to 3. To our surprise, the performance of Active Record 3.0.20 was about 18% better than Active Record 2.3.18 for us. The graph below shows overall request latency. The lower the latency, the better.
We thought we would have better performance when upgrading from Active Record 3 to 4 due to Adequate Record. Surprisingly, the performance of Active Record 4.2.4 was 13% worse than Active Record 3.0.20. The graph below shows overall request latency.
Our initial conclusion is that our request latency is dominated by database query time, so the performance of the ORM doesn’t matter much. We also concluded that micro-benchmarking is not an accurate way to measure the overall performance of an application. Neither our concern about paying a performance penalty when upgrading from Active Record 2 to 3 nor our hope for better performance from Active Record 3 to 4 held up. This also brings up an interesting lesson learned: always benchmark changes with your app in production.
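A side-by-side comparison of this kind boils down to comparing latency percentiles between the two groups of servers. The sketch below shows the arithmetic under stated assumptions: the helper names are invented, the percentile uses the simple nearest-rank method, and the sample latencies are made-up numbers, not our production measurements.

```ruby
# Nearest-rank percentile of an array of latency samples.
def percentile(samples, pct)
  sorted = samples.sort
  rank = (pct / 100.0 * sorted.size).ceil - 1
  sorted[[rank, 0].max]
end

# Relative change from baseline to candidate, in percent.
# A positive value means the candidate is slower.
def percent_change(baseline, candidate)
  ((candidate - baseline) / baseline.to_f * 100).round(1)
end

# Made-up request latencies in milliseconds for the two groups.
baseline_ms  = [90, 100, 95, 120, 110]   # pre-upgrade code
candidate_ms = [105, 110, 100, 130, 125] # upgraded code

[50, 95].each do |pct|
  before = percentile(baseline_ms, pct)
  after  = percentile(candidate_ms, pct)
  puts "p#{pct}: #{before}ms -> #{after}ms (#{percent_change(before, after)}%)"
end
```

Comparing both groups over the same time window matters, because it keeps traffic mix and database load roughly equal between them.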
Overall, we are very happy with the upgrade. It cleans up many of our patches to Active Record and it gives us many features like the query-building infrastructure. Besides, the outdated Active Record no longer received security updates, and we are now using a supported version. Also, performance has stayed at the same level as it was before the upgrade. Future Active Record updates will be easier and quicker for us. Training new employees will also be easier now that we’re on the “current” Active Record version. If you’re thinking about upgrading a legacy dependency in your codebase, the sooner you upgrade, the easier it will be to maintain.