Many companies, both small and large, fall far short of acceptable when it comes to their on-call rotations. Quite a few have only one person in their “rotation”, if you can even call that a rotation.
So why do so many companies make major mistakes with their on-call? Most of the time it comes down to ignorance of best practices; other times it is budget related. Many managers who set up the on-call program at their companies have never been in an on-call rotation, and even more have never been in a single-person one. As a result they do not understand how a healthy on-call rotation should work, and they put their employees in situations that hurt the company and, more importantly, the employee.
Let’s take a look at what employees have to deal with when they are on-call. You have to keep your laptop with you at all times so you can respond to incidents in a timely fashion; most companies have a 5-minute response time objective. Because of this restriction on time and mobility, many on-call employees cannot attend their kid’s soccer game on a weeknight or go on a weekend ski trip with friends, and dinner dates with a significant other are out too. Sometimes even simple daily things, like walking the dog for 30 minutes in the morning, are impossible without risking a slow response to an incident.
Risks of a single-person on-call rotation
Long-term burnout
Decreased employee happiness and motivation
Lower work quality during business hours
Concentration of important operational knowledge in one person
Periods of time without coverage
Many of these risks can be mitigated by adding another one to three people to the on-call rotation. It is fine to include a new employee, because once multiple people are available to be on-call you can add a secondary level of escalation before needing to bring a whole team into the incident. Sites like pageanengineer.com provide a way to notify that second level of response should the primary be unable to respond, whatever the reason for the gap in coverage.
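To make the idea of a secondary level of escalation concrete, here is a minimal sketch of how a page might move from the primary to the secondary when nobody acknowledges in time. It is illustrative only: the `Responder` class, the `escalate` function and the 5-minute timeouts are assumptions made for this example, not the API of any particular paging service.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Responder:
    """Hypothetical escalation level; not tied to any real paging tool."""
    name: str
    ack_timeout_minutes: int  # how long to wait before escalating further


def escalate(
    levels: Sequence[Responder],
    acknowledges: Callable[[Responder], bool],
) -> Tuple[Optional[Responder], int]:
    """Page each level in order until someone acknowledges.

    Returns the responder who acknowledged (or None if nobody did) and the
    minutes spent waiting on levels that did not respond.
    """
    waited = 0
    for responder in levels:
        if acknowledges(responder):
            return responder, waited
        # Nobody answered at this level: wait out the timeout, then move on.
        waited += responder.ack_timeout_minutes
    return None, waited


# Assumed two-level rotation with a 5-minute acknowledgement window each.
rotation = [
    Responder("primary", ack_timeout_minutes=5),
    Responder("secondary", ack_timeout_minutes=5),
]

# Simulate the primary being out of coverage (for example, no cell signal).
unavailable = {"primary"}
responder, waited = escalate(rotation, lambda r: r.name not in unavailable)
print(responder.name, waited)  # secondary 5
```

With a single-person rotation the list has only one entry, so an unavailable primary means the function returns nobody at all, which is exactly the gap in coverage described above.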
While it may cost extra to pay properly for on-call rotations, the benefits can be seen right away in the lower stress level of the person who was previously solely responsible. Pay matters: many sysadmins, SREs and DevOps Specialists/Engineers consider on-call pay, as well as the number of people in the rotation, an important factor in accepting a job. Done properly, on-call is not a major negative, and even the best people out there find it acceptable.
When a single person is on-call, their ability to think clearly at 2 AM, when their phone goes off during the night for the third time this week, is diminished. This shows up in a variety of ways, the primary one for the business being that the incident is likely to have a higher MTTR (mean time to recovery). That can mean lost business, angry customers and a broken SLA. With more than one person in the rotation, it is easier to bring in a second person when an issue is challenging or when the primary is more tired than normal after a prolonged incident the previous night.
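For readers who have not worked with MTTR before, it is simply the average time between an incident starting and being resolved. The sketch below shows the arithmetic; the function name and the incident timestamps are made up for illustration and do not come from real data.

```python
from datetime import datetime, timedelta


def mean_time_to_recovery(incidents):
    """Average of (resolved - detected) over (detected, resolved) pairs."""
    if not incidents:
        raise ValueError("need at least one incident")
    durations = (resolved - detected for detected, resolved in incidents)
    total = sum(durations, timedelta())
    return total / len(incidents)


# Hypothetical week of night-time pages for one on-call engineer.
incidents = [
    (datetime(2024, 1, 8, 2, 0), datetime(2024, 1, 8, 2, 40)),     # 40 minutes
    (datetime(2024, 1, 10, 2, 15), datetime(2024, 1, 10, 3, 15)),  # 60 minutes
    (datetime(2024, 1, 11, 1, 30), datetime(2024, 1, 11, 2, 50)),  # 80 minutes
]

print(mean_time_to_recovery(incidents))  # 1:00:00
```

Tracking this number before and after adding people to the rotation is one way to check whether the change is paying off for the business.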
What does all this mean? Hopefully you now understand some of the risks and the different things at play when your on-call rotation is a single person. Adding another person to that rotation makes a world of difference in the life of the person on-call, and it helps the company too: a less stressed person does better work both during incidents and during the normal work day.