Earlier this month, the OpenSSL project team announced that three days later it would be releasing a new version of OpenSSL to address a high-severity security defect. In the end, this vulnerability resulted in another non-event for our customers, but we thought it might be useful and informative to share the process we went through to prepare for the issue.
The announcement from the OpenSSL project team said only that a vulnerability would be patched; the specifics were kept under embargo to limit the likelihood of an attack before the patch was released. It’s difficult to gauge the potential impact of a vulnerability when you don’t know the details, so a good place to start is by identifying where you have exposure and how critical those systems are. Heroku’s exposure to OpenSSL vulnerabilities is much smaller than you might expect. Many of our customers use SSL in the applications we host, but the SSL endpoint in those cases is an ELB provided by our underlying infrastructure provider, AWS. We were already in communication with Amazon in case the ELBs needed patching, and were prepared to work with them on that rollout.
We considered cases where an application acts as an SSL client for one reason or another, and the likelihood that an attacker could act as a man-in-the-middle. We also had to evaluate the encrypted connections to our database servers. Until we knew what kind of bug had been discovered in OpenSSL, we had to consider some worst-case scenarios; for example, a remote code execution (RCE) vulnerability that could put customer data at risk. We finished our analysis with action items to update our stack images and an agreed-upon priority for various systems.
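The kind of ranking described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the tooling we actually used: the severity weights, the 1 to 3 criticality scale, the owner names, and the `Exposure` and `prioritize` names are all hypothetical. Each exposed system is scored against the most severe bug class that could plausibly apply to it, and the list is sorted so the worst case comes first.

```python
from dataclasses import dataclass

# Illustrative weights for worst-case bug classes; the numbers are assumptions.
SEVERITY = {"remote_code_execution": 3, "man_in_the_middle": 2}


@dataclass(frozen=True)
class Exposure:
    system: str
    owner: str             # hypothetical team or party responsible for patching
    criticality: int       # assumed scale: 1 = low, 3 = high
    plausible_bugs: tuple  # bug classes that could affect this system

    def worst_case(self) -> int:
        # Score against the most severe bug class that could apply.
        return self.criticality * max(SEVERITY[b] for b in self.plausible_bugs)


def prioritize(inventory):
    # Highest worst-case score first; sorted() is stable, so ties keep input order.
    return sorted(inventory, key=lambda e: e.worst_case(), reverse=True)


# Hypothetical inventory loosely modeled on the systems named in this post.
INVENTORY = [
    Exposure("app acting as SSL client", "platform team", 2, ("man_in_the_middle",)),
    Exposure("SSL endpoint (ELB)", "infrastructure provider", 3, ("remote_code_execution",)),
    Exposure("database connections", "data team", 3,
             ("remote_code_execution", "man_in_the_middle")),
]

if __name__ == "__main__":
    for e in prioritize(INVENTORY):
        print(f"{e.worst_case():>2}  {e.system}  ->  {e.owner}")
```

Once the real details are published, the same inventory can be re-scored by dropping the bug classes that turned out not to apply.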
After the initial triage we started assigning the action items we identified to various engineering teams within Heroku. All of the engineers at Heroku were notified that the patch was coming, and the security team reached out to managers in various groups to make sure that explicit responsibility for carrying out the action items was assigned ahead of time.
Much of what we needed to do was already outlined in our normal incident response process; it was just a matter of applying that process to this incident. As part of that process, we developed a communication plan. We also explicitly laid out how we would communicate with each other during the response, what criteria would make this an all-hands-on-deck incident, and who would get paged when it was time to swing into action. This is all part of our normal incident response process, just done in advance.
The security team was tasked with searching for any new information about the vulnerability and being awake early on the day the patch was scheduled to be released. Members of all the teams were notified to be available in case this turned out to be an all-hands-on-deck moment.
The updated version of OpenSSL and details about the vulnerability came out around 8am Central time on Thursday, July 9th. The security team saw the notification and began to re-triage the vulnerability with the new information provided. It turned out this was not an RCE and it was not a server-side vulnerability. That ruled out the worst-case scenarios (and the largest amount of remediation work), and we decided that we didn’t need to trigger an all-hands.
A couple of hours after the details came out, the Ubuntu security team released a bulletin indicating that neither of the Long Term Support (LTS) versions of Ubuntu was affected by this vulnerability. Since all of Heroku production runs on LTS, this meant that we dodged the bullet and no further action was necessary on our part.
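For a quick first look at a single machine, a check like the following can compare the OpenSSL release that Python is linked against with the releases an advisory lists as affected. It is a minimal sketch, not a substitute for the distribution's bulletin: distributions often backport fixes without changing the upstream version string, so a match or a miss here is only a hint. The helper names are hypothetical, and the `AFFECTED` values are example entries that should be replaced with the list from the actual advisory.

```python
import re
import ssl


def openssl_release(version_string: str) -> str:
    # Extract the release from a string such as "OpenSSL 1.0.2c 12 Jun 2015".
    match = re.search(r"\b(\d+\.\d+\.\d+[a-z]*)\b", version_string)
    if match is None:
        raise ValueError(f"unrecognized version string: {version_string!r}")
    return match.group(1)


def is_listed_as_affected(version_string: str, affected_releases) -> bool:
    # True only if the exact release appears in the advisory's list.
    return openssl_release(version_string) in affected_releases


if __name__ == "__main__":
    # Example values only; take the real list from the advisory being triaged.
    AFFECTED = {"1.0.1n", "1.0.1o", "1.0.2b", "1.0.2c"}
    # ssl.OPENSSL_VERSION is the library the Python interpreter is linked against.
    print(ssl.OPENSSL_VERSION, is_listed_as_affected(ssl.OPENSSL_VERSION, AFFECTED))
```

The check only covers the library that the interpreter itself loads; other binaries on the same host may link a different copy.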
We didn’t put a burdensome amount of time and effort into preparing for this vulnerability, but it did distract from the regular work we had planned for the week. In the end, we performed no work to remediate the vulnerability. Was it all just a waste? Absolutely not.
Without a crystal ball to tell us which vulnerabilities will directly affect us and what their impact on our customers will be, this kind of preparation makes sense. By planning for an event of unknown severity, we were able to exercise our response process and spend some time reviewing our areas of exposure and our communication plans. It cost us some time and effort, but had there been something we needed to address, we would have been able to do so promptly, keeping customer apps and data safe.