Are Production Server Reboots Standard Changes?

I attended a meeting recently with a customer of mine and a potential new vendor. The new vendor was there to pitch his configuration and setup service offerings for a specific ITSM toolset.

My customer has already had one bad experience with an ITSM tool configuration vendor who promised one thing and delivered much less. He ended up with a tool that’s minimally used and not configured to match his business needs. He’s looking for a vendor that can understand his business needs and priorities and help him get his tool configured and working in a short time frame.

My role with this customer is to help him adopt and stay aligned with ITIL best practices. The meeting went well, and I felt like the vendor showed my customer something that would really help him get maximum benefit from his chosen ITSM toolset.

Then the topic of standard changes came up. My customer asked for examples of standard changes. The vendor responded, “Server reboots are an example of standard changes.”

Things like this often make my head want to explode. I’m fairly good at controlling my emotions in a professional setting, and it definitely wasn’t the time or the place to get into a theoretical debate about what is and what is not a standard change. Discretion is the better part of valor, and I chose to not sidetrack the meeting into a pointless argument about standard changes.

However, the topic is important to me and to this particular customer. Standard changes are described by ITIL v3 as low-risk, regularly occurring activities that are well understood and repeatable. Standard changes are often associated with service requests. They are simply a way to document regularly occurring activities that involve some aspect of change, but are very low risk, low cost and repeatable.

There certainly is room for debate about what is and what isn’t a standard change. The point of the best practice is to communicate that controlling the risk of and documenting standard changes is a good idea, not to specifically tell you what is and what isn’t a standard change in your environment. Therefore, what is and what isn’t a standard change, like many things described by ITIL, is highly context-driven.

The bulk of my experience is in financial services, in IT environments that are critical to daily business operations involving billions of dollars. In such environments, regularly rebooting a server as a precautionary measure is slothful. It’s seen as not pursuing some underlying issue to root cause, and then permanently correcting that root cause. In such environments, a server reboot is what ITIL calls a workaround, in that the reboot temporarily addresses the symptoms of a problem but does not permanently correct it. In such environments, these are not typically low-risk activities and therefore would not be considered standard changes. This is not to say that we didn’t have many people who would’ve loved to make such a thing a standard change; we did, but the needs of the business wouldn’t allow it.

One issue with deeming server reboots as standard changes is that whatever situation the reboot is intended to address, which is usually clearing some memory-related issue, might immediately recur upon reboot. I have seen this very thing happen. A server is rebooted to clear a memory leak, and almost immediately upon reboot, the server is again unusable because of the memory leak. The reboot only addresses the symptom temporarily, which is why it really is a workaround.

Making these types of things standard changes tends to encourage bad administrative behaviors and lets the effects of poor development linger in organizations for years. Think of the situation where server administrator John creates some automated process to reboot a server, doesn’t document it, and then leaves the company. Whoever follows John (and then follows that person) is often left to figure out why a process is in place to reboot the server.

This doesn’t even mention the impact of a regular reboot on the business. Who knows (without asking) what impact it has on the business, how the business has adjusted and changed over time, or even whether the business is taking some workaround actions of its own to ensure that the application hosted on the server is working properly?

The point is, server reboots are much more likely to be workarounds than they are to be standard changes. A workaround is an action taken to temporarily address the effect of an incident. Workarounds are sometimes called “temporary fixes”, which means that a workaround’s effect lasts for a limited time (it could be 1 minute, it could be 10 years). Furthermore, workarounds can be done preemptively. In most cases where I’ve seen organizations regularly reboot servers, what they are in fact doing is preemptively carrying out a workaround.

The Change Management process, where possible, should work to turn normal changes into standard changes. The Change Advisory Board should not review the same changes every week; an effective Change Management process figures out how to lower the risk of those changes and turn some (not all) of those normal change activities into low-risk standard changes.

ITIL is useful in many of the methods, techniques, and constructs that it describes. Declaring something as a standard change means that the activity is low-risk, well understood, and routine. Regular virus definition updates are a good example of a standard change. A regular reboot of a server to preemptively correct a memory leak in an application heavily used by the business is not a standard change. It is a potentially high-risk situation that, if called a standard change, understates the impact on the business and undermines a critical part of the change management process.
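As a rough illustration, the three criteria above can be sketched in Python. Everything named here (`ChangeCandidate`, `classify`, and the three boolean fields) is hypothetical and not part of ITIL or any ITSM tool, and the sketch ignores emergency changes. The true/false assessments are judgment calls that each organization makes in its own context; the values below simply mirror the two examples in this post.

```python
from dataclasses import dataclass
from enum import Enum


class ChangeType(Enum):
    STANDARD = "standard"
    NORMAL = "normal"


@dataclass(frozen=True)
class ChangeCandidate:
    """A proposed activity, with assessments supplied by the organization."""
    name: str
    low_risk: bool         # risk to the business, as judged in context
    well_understood: bool  # documented, with a known cause and outcome
    routine: bool          # occurs regularly and is repeatable


def classify(candidate: ChangeCandidate) -> ChangeType:
    # An activity qualifies as a standard change only if it meets
    # all three criteria; anything else goes through normal review.
    if candidate.low_risk and candidate.well_understood and candidate.routine:
        return ChangeType.STANDARD
    return ChangeType.NORMAL


# Illustrative assessments only; a different organization could judge these differently.
virus_definitions = ChangeCandidate(
    name="Regular virus definition update",
    low_risk=True,
    well_understood=True,
    routine=True,
)
preemptive_reboot = ChangeCandidate(
    name="Regular reboot to clear an application memory leak",
    low_risk=False,         # the application is heavily used by the business
    well_understood=False,  # the root cause of the leak has not been corrected
    routine=True,
)

for candidate in (virus_definitions, preemptive_reboot):
    print(f"{candidate.name}: {classify(candidate).value}")
```

Note that being routine is not enough on its own: the reboot happens regularly, yet it fails the other two tests and so falls back to normal change review.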

Much, if not most, of what ITIL describes is contextual in nature. How one organization applies the concept of a standard change might not be the same as how another organization applies the same concept. However, one thing is clear. Any activity that is potentially high-risk to the business is not a standard change, and high-risk activities should not be managed through a process designed to handle low-risk, repeatable activities. To answer the question posed in this post, in most organizations, production server reboots are not standard changes.

Join the Conversation

7 comments

  1. Benjamin Reed

    THANK YOU!!! I’m sharing this with those with whom I often have the pleasure of debating this subject.

  2. Soluna

    Without further clarification, the vendor was wrong. The circumstances of the reboot will determine its classification. I could define a Standard Change where a server is patched and rebooted once a month during a prescribed maintenance window. This is outside of normal service hours, can be documented, and approved as a Standard Change.

    The example the author uses, a situation where the server is being rebooted during an incident, is a workaround. Too often I’ve seen staff close the process because the symptom had been resolved. However, the underlying problem (say an application memory leak) was never resolved. The workaround will resolve the incident, but a good ITIL shop will hand the issue to Problem Management for an RCA. Problem Management (in the memory leak example) would find the leak and create an RFC for getting that bug patched.

    Great Article!

  3. ayush

    I think, from my point of view, a server reboot can be considered a standard change, as it affects a lot of users and therefore must be a problem that should be sorted out using change management.

  4. Bill

    What are people’s thoughts about a server build being a standard change, assuming there is a “standard” process defined for how a server is built? The server could be a Linux server, database server, etc.

  5. Fran Zabawa

    Great article – interesting as we’re creating the ‘list’ of standard changes, the first item on our list is ‘Server Reboot’. I agree with your perspective that a reboot is a workaround, not corrective. Thank you!

  6. Saurabh

    Good explanation, but I have a small doubt:

    “Does a JVM/application service (NOT SERVER) restart require a change?”

    If everything is a change, then incidents and service requests will lose their importance. Don’t you think we are duplicating the effort of raising a break-fix incident and then raising a standard CR or ECR, as most of the time incidents are for application issues that end up in a JVM/application refresh?

    Let me know your thoughts on this.

  7. Michael Scarborough

    I think you might be confusing a “change” with a “change ticket”. A change is some modification to the environment. A change ticket is evidence or history of a change. Not all changes necessarily have associated change tickets; this is an organizational difference. Change management is about understanding the risk associated with change and making good decisions about that risk.

    According to the best practice, all changes are within the scope of change management. Therefore restarting a JVM as you describe is theoretically a change, just the same as changing a light bulb is a change. Now, does restarting a JVM or changing a light bulb require a change ticket (evidence of the change)? That is an organizational decision that can only be answered by a specific organization, based on how it views the level of risk associated with those activities. ITIL doesn’t tell you specifically which activities require a change ticket; rather, it tells you that there is risk associated with changing things, and you need to handle that risk according to a consistent, predictable process.

    Additionally, while what you’ve mentioned might be handled through an incident management process, it could also be that the workaround required to restore service (restarting the JVM) requires a change. Again, this is an organizationally dependent scenario. In some organizations this would be a change, in others it would not be.

    In practice I’ve seen this exact situation, and I’ve seen it handled in different ways. I come from financial services, and in most financial services companies we would have an incident to represent this, and the change process would evaluate the risk of restarting the JVM. In other organizations it would all be handled through the incident process. In still other organizations this situation would involve neither incident nor problem management.

    In summary, ITIL is good at giving you options for how to handle situations, not definitive approaches that describe exactly how to respond to a specific situation.