Whether you went to school, attended a boot camp, or learned to code on your own, there is a good chance that supporting production code in the middle of the night was not something you dreamed about.
Yet, here we are.
When I started my career, I was handed a pager and was required to keep it around in case something went wrong. At first, the incessant, rhythmic, piercing three beeps filled me with anxiety and a little excitement. As I learned to handle issues, the sound became a nuisance, not just for me but for my wife and family as well. The scrolling marquee on the pager gave no real indication of the problem. Getting to the nearest computer, I would log in and call into a conference line to find out what kind of night I was in for. Sometimes, the issue was obvious and something I had seen previously. Other times, my night was spent triaging, diagnosing, patching and/or coordinating.
There are folks who love the adrenaline of off-hours support. If you are one of them, congratulations! I am certainly jealous. The rest of us need to manage the stress, anxiety and inconvenience. There is great reward for being the hero in the middle of the night, but handling issues poorly can have consequences. How’s that for lowering your stress level?
Here’s what you can do to keep your heart rate down.
Show up
I’ll let you in on a secret: 99% of production support is showing up. Ghosting on your rotation slot can seriously damage your reputation. I don’t only mean leaving your phone off or forgetting your laptop; I also mean seeing an issue and hoping that someone else will deal with it if you stay silent. Showing up means owning an issue as quickly as possible. A simple reply to the alert like “I’m on it” will calm anyone else who may be monitoring with you or who was woken up by the alert.
It really is that simple.
Overcommunicate
Now you need to dig in a bit without getting lost. Give updates every few minutes to let folks know the steps you are taking to diagnose the issue. Use the tools that you know to get as much information as you can. If your organization has a documented process, start there. Next, look at the logs, stack traces, machine performance, commit history, and so on to understand the problem.
As you do, let people know the actions you are taking at each step and remember these two rules:
- Do not think out loud when reporting your status; instead, be as direct and succinct as possible
- Do not ask questions of the mythical “anyone”; instead, ask a specific person
Your goal in communication is to determine criticality through severity and impact. Severity is the nature of the problem, whereas impact is the number of users or services affected by it. For example, if a service is down, that might be a big deal, but if that service is not being used by anyone, maybe the issue can wait until the morning. If you can communicate the criticality of the issue clearly, you help yourself and the organization determine the next steps.
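If it helps to see that reasoning written down, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the `Severity` levels, the `Issue` fields, and the three outcomes are hypothetical and not taken from any real paging tool or process. Your organization's own definitions should win.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    """Nature of the problem (hypothetical levels, for illustration only)."""
    DEGRADED = 1
    PARTIAL_OUTAGE = 2
    SERVICE_DOWN = 3


@dataclass
class Issue:
    summary: str
    severity: Severity
    affected_users: int  # impact: how many users or services are hit


def criticality(issue: Issue) -> str:
    """Combine severity and impact into a suggested next step."""
    if issue.affected_users == 0:
        # Even a full outage can wait if nobody is using the service.
        return "wait until morning"
    if issue.severity >= Severity.PARTIAL_OUTAGE:
        return "act now"
    return "monitor"


def status_update(issue: Issue) -> str:
    """Build one direct status line: no thinking out loud."""
    return (f"{issue.summary}: severity={issue.severity.name}, "
            f"affected={issue.affected_users}, next={criticality(issue)}")


if __name__ == "__main__":
    print(status_update(Issue("Checkout API down", Severity.SERVICE_DOWN, 120)))
    print(status_update(Issue("Legacy report service down", Severity.SERVICE_DOWN, 0)))
```

The point is not the thresholds, which are made up, but that a status update naming both severity and impact lets whoever reads it agree or disagree with your call quickly.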
Escalate when needed
When in doubt, escalate. Escalate if you have followed the process and either:
- Cannot make a call on criticality
- Have deemed it critical but do not know the next steps
Before your support rotation, make sure that you have the contact info of the key folks in your organization. The middle of the night is a bad time to discover that you do not know how to reach them, especially if it seems the world is on fire! When you escalate, take all the information that you have communicated and boil it down to the most important details. Go through the steps you have taken and anything you have ruled out as the root cause of the issue.
It’s not that bad!
When you get the call, email or Slack message that something has gone wrong in production, take a deep breath. Remember that showing up is the most important thing you can do. Once you have put your hand in the air to own the issue, dig in using the tools that you know and keep communication alive. Prepare yourself for off-hours support by getting the phone numbers of key folks within your organization to escalate to in case you need help.
Production support will put you in the limelight, and when it is done right, you can get a standing ovation.
You’ll probably also get an encore!