Few events bring a company together like a server outage. When the Exchange Server would go down at Microsoft, our group would race towards the pool table in building 25. We’d play a few games before checking to see if email was still down. Most of the time, IT was able to bring the server back online, and we’d return to work. When that didn’t happen, we’d head home for the day. Email was and still is the lifeblood of most organizations, and outages are no laughing matter.
Server Maintenance Matters
The cloud has helped reduce these outages, but it certainly doesn’t stop them from happening. And even if you’ve moved portions of your infrastructure to the cloud, there’s a good chance you’re still running some servers on premises. This week I’d like to discuss how you can best keep these servers running smoothly. Over the last couple of years, I’ve helped two companies consolidate their servers. We removed some old hardware and replaced it with new Xeon-based servers running Windows Server. I learned a lot in that process, including about proper server maintenance and what happens when it’s not implemented. I hope you can learn from my experience.
Keep your OS Updated
This seems obvious. A no-brainer. And yet all it takes is a nefarious piece of malware like the WannaCry worm to get everyone’s attention. WannaCry did most of its damage to unpatched Windows 7 computers, but it also attacked a number of servers running Windows Server 2003. There are actually a couple of issues at play here. First, you want to make sure you’re running a version of Windows that Microsoft supports with regular patch releases. You then need to keep that OS up to date. I spoke with a lot of people who had no idea that Microsoft had dropped support for consumers running Windows 7.
I’ve talked to too many people in IT who, once they get their servers running properly, don’t want to touch them. A few take this approach to the extreme by turning off the Windows Update service, which is a recipe for disaster. Yes, testing patches in a VM takes time, and Microsoft has released buggy patches in the past. But that doesn’t mean you shouldn’t work towards keeping all your systems up to date. You might not run into issues for years, but running unpatched servers will eventually catch up to you.
The newest versions of Windows give users more control over how and when updates are applied. If you’re interested in how Microsoft rolls out updates for Windows Server 2016, check out this article from Redmond Magazine.
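One way to keep patching from slipping is to track when each server was last updated and flag the ones that have gone too long. The Python sketch below is only an illustration: the server names, the dates, and the 30-day threshold are all made up, and in practice the dates would come from whatever inventory or patch management tool you already use.

```python
from datetime import date, timedelta


def overdue_servers(last_patched, today, max_age_days=30):
    """Return the names of servers last patched more than max_age_days ago."""
    cutoff = today - timedelta(days=max_age_days)
    return sorted(name for name, patched in last_patched.items() if patched < cutoff)


# Hypothetical inventory: server name -> date the last patch was installed.
inventory = {
    "mail01": date(2017, 5, 20),
    "files01": date(2017, 2, 3),
    "sql01": date(2016, 11, 15),
}

# Lists the servers whose last patch is older than the 30-day threshold.
print(overdue_servers(inventory, today=date(2017, 6, 1)))
```

A report like this doesn’t patch anything for you, but it makes the forgotten servers hard to ignore.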
Physically Clean Your Server
But I keep my server in a closed cabinet! That’s a really good start. If you’re lucky enough to work for a company that provides server racks, cabinets and a proper environment for all the company’s servers, then kudos to your CEO. Even if your company provides all that, your servers can inhale dirt and dust that can degrade performance and reliability. Today’s hot-running CPUs and GPUs will actually downclock themselves if they don’t have adequate cooling.
Good quality servers generally have powerful fans to keep air moving over and around critical components. But all that power means the fans can suck dirt and dust into the case. A few years ago, I went to a dentist’s office to help him upgrade his server. He told me he never removed it from a fancy glass enclosure he kept at the back of his office. The server ran his patient management software and was occasionally rebooting during the day. I asked him when the last time was that he cleaned the case filters. I had my answer when he simply stared back at me. When I removed the server from the enclosure, I found his case filters full of gunk to the point the server was throttling itself due to the heat inside the case.
I’ve used compressed air to clean both desktop and rackmount server cases. Be careful when shooting compressed air through the fans that you don’t damage them. Make sure you remove and clean all the filters if your server case has them. Some of the newer cases have filters that pull out from the bottom in addition to top- and rear-mounted ones.
Virtualization Helps Server Maintenance
Do you remember the days of the backup server? I spoke with a day trader recently who is still a fan of the backup server, in spite of the extra costs and administration. Thankfully, we live at a time when you can virtualize nearly any server. In fact, you’d be wise to virtualize every server you can. Why? Because it’s so easy to spin up a backup VM today. Consolidating multiple servers running on older hardware by virtualizing them on newer hardware will nearly always result in improved uptime.
I understand that not all servers can be virtualized. Sometimes licensing, performance and hardware issues prevent it. That still leaves a lot of opportunity to virtualize the servers where it makes sense. For a list of servers you shouldn’t virtualize, see this article from Contel Bradford on the Recovery Zone.
Check Logs for Hardware Errors
Bad components can bring a server to its knees if left unchecked. Hardware errors often show up after POST and after Windows has started all its services. Check the system logs for hardware issues as a part of your server maintenance strategy. You may find that updating a driver for a GPU or RAID card fixes the issue. If the error persists, you’ll need to replace the component.
RAID Controllers, like this model from LSI, run very hot
It’s not a bad idea to remove any PCI-E cards or drives you’re not using. Server hardware is built to run 24/7. I don’t see a lot of issues with CPUs, boards, and RAM. Even today’s GPUs tend to run for years without issues. But I see a fair share of power supplies, fans and expansion cards fail over time. RAID cards are notorious for running hot, which shortens their lifespan. It never hurts to keep an eye on system errors as well. But I’ve found that hardware errors, when left unchecked, are far more likely to take a server down.
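If you export the System log to a CSV file, a short script can pull out the entries most likely to point at failing hardware. The Python sketch below makes several assumptions: the column names ("Level", "Source") follow a typical Event Viewer CSV export and may differ on your system, the list of hardware-related sources is a starting point rather than a complete list, and the sample rows are illustrative data, not real log output.

```python
import csv
import io

# Assumed set of event sources that often indicate hardware trouble.
# Extend it with the sources your own RAID card or NIC drivers log under.
HARDWARE_SOURCES = {"disk", "whea-logger"}


def hardware_events(csv_text, sources=HARDWARE_SOURCES, levels=("Error", "Critical")):
    """Return rows whose level and source suggest a hardware problem."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        row for row in reader
        if row["Level"] in levels and row["Source"].lower() in sources
    ]


# Illustrative sample in the assumed export format.
sample_log = """Level,Date and Time,Source,Event ID
Error,2017-06-01 02:14:09,disk,7
Information,2017-06-01 02:15:00,Service Control Manager,7036
Warning,2017-06-01 03:00:41,disk,51
Critical,2017-06-01 04:22:10,WHEA-Logger,18
"""

for event in hardware_events(sample_log):
    print(event["Date and Time"], event["Source"], event["Event ID"])
```

Warnings are skipped by default to keep the report short. Pass levels=("Warning", "Error", "Critical") if you’d rather catch problems earlier.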
Verify Your Backups
So you’ve scheduled server backups. Each week you confirm the backup service is running properly. But are you taking the extra time to verify your backups actually work? Verifying the integrity of your backups is often the most overlooked step of a server backup process. How do you do this? Well, you’ll want to run a number of test recoveries until you feel comfortable with your process. Going forward, spot checks may be adequate.
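One simple form of test recovery is to restore a folder to a scratch location and compare every file against the original using checksums. The Python sketch below shows the idea using only the standard library. The function names are my own, and it assumes the source folder hasn’t changed since the backup was taken; otherwise files edited in the meantime will show up as mismatches.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large files don't have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_restore(source_dir, restored_dir):
    """Return (relative path, problem) pairs for missing or differing files."""
    source_dir, restored_dir = Path(source_dir), Path(restored_dir)
    problems = []
    for original in sorted(source_dir.rglob("*")):
        if not original.is_file():
            continue
        relative = original.relative_to(source_dir)
        restored = restored_dir / relative
        if not restored.is_file():
            problems.append((relative.as_posix(), "missing"))
        elif sha256_of(original) != sha256_of(restored):
            problems.append((relative.as_posix(), "mismatch"))
    return problems
```

An empty list means every file in the source folder came back intact. Anything else tells you exactly which files to investigate before you need that backup for real.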
If you’re outsourcing your backups to a cloud provider, you will want to understand how they go about verifying backups. Elements such as the backup location, schedule and recovery times are all critical to maintaining a solid backup plan. You should have a firm understanding of all of these elements, whether it’s your team or a third party providing the service.
I’ve mentioned this before, but you want to use tested and trusted solutions when your reputation is on the line. Companies such as StorageCraft offer a line of backup solutions that work with all types of servers, including products for Exchange and virtualized environments.
Many factors contribute to keeping your servers running smoothly and with as little drama as possible. Some of the simplest tips are the ones people most often overlook. One would think that keeping your server off the floor would be obvious. Yet I still visit companies where one or more servers are sitting on the floor. Finding the proper home for your server should be task #1.
Do you have any tips for server maintenance?