ControlUp: Do more with less!

Recently I worked with a customer to get the number of helpdesk tickets down through automation and self-healing. In this blog post, I want to share how we did this with ControlUp and how it can help your organization.

The problem

One of the biggest issues at the customer (a managed service company with 1300+ VMs) is servers with full disks. When this occurs, an application on the server stops responding or the server itself crashes. This, of course, leads to user impact and tickets at the service desk. ControlUp does give a warning when a disk is almost full, but the service desk is overloaded with e-mails from monitoring (about one a minute), so no action is taken on these mails.

Current resolution

The current resolution is to wait until a service desk ticket is logged. Then someone investigates why the application or server is unavailable. When it turns out that a disk is full, there is a document with some guidelines on what to do, such as cleaning the disk with a disk cleaner, deleting log files, etc. This document only describes what to do on Windows Server 2012 R2. After that, the application is started up again and the ticket is closed.

Issues with the current solution

  • It’s not proactive
  • Users are affected by the issues
  • Manual action required to fix the issue

The solution

The customer had a list of guidelines describing what needed to be cleaned, zipped, or logged when a disk is full. These included deleting temp files, removing old profiles, measuring the software distribution folder, zipping old IIS log files and keeping them for 90 days, performing different cleanup actions on different types of servers, and so on. So I wrote a PowerShell script that performs all the actions on the customer’s guidelines list. That was the easy part (well, the script got pretty big, but still).

The real problem was, of course, the number of helpdesk tickets and the user impact. To solve this, we needed something that can automatically run the script when the disk is full. You probably guessed it: this is where we used ControlUp automation. We created a script action in ControlUp’s script library that runs the new PowerShell script. Now admins can easily run the script on VMs in ControlUp with one click and get the script’s log file back as output.

This is great, but we wanted to run the script automatically. For this, we used ControlUp Triggers. In a trigger, we define when the action needs to be executed. We chose to fire the trigger when the free disk space drops below 6 GB.

When that happens, the trigger executes an action of type Script Action, for which we selected the new PowerShell script. By checking “send script execution output to email recipients”, we send the output of the script (the cleaning log) to the helpdesk. This way they know what was cleaned and how much free space the disk ended up with.
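
To make one of the cleanup guidelines concrete, here is a minimal sketch of the "zip old IIS log files and keep them for 90 days" step, plus a free-space check that mirrors the 6 GB trigger threshold. It is written in Python purely for illustration and is not the customer's PowerShell script. The function names, the `*.log` pattern, and the 7-day "old" cutoff are assumptions; only the 90-day retention and the 6 GB threshold come from the text.

```python
import os
import shutil
import time
import zipfile
from pathlib import Path

SECONDS_PER_DAY = 86400


def disk_is_low(path, threshold_gb=6):
    """Return True when the volume holding `path` has less free space than the threshold."""
    return shutil.disk_usage(path).free < threshold_gb * 1024**3


def archive_old_logs(log_dir, zip_after_days=7, keep_days=90, now=None):
    """Zip *.log files older than zip_after_days, then delete zips older than keep_days.

    zip_after_days is an assumed cutoff for what counts as an "old" log.
    Returns the number of bytes freed in log_dir.
    """
    now = time.time() if now is None else now
    log_dir = Path(log_dir)
    freed = 0

    # Step 1: compress each old log into its own archive and remove the original.
    for log in sorted(log_dir.glob("*.log")):
        stat = log.stat()
        if now - stat.st_mtime <= zip_after_days * SECONDS_PER_DAY:
            continue  # still recent, leave it alone
        archive = log.with_suffix(".zip")
        with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
            zf.write(log, arcname=log.name)
        # Give the archive the log's timestamps so retention is based on log age.
        os.utime(archive, (stat.st_atime, stat.st_mtime))
        freed += stat.st_size - archive.stat().st_size
        log.unlink()

    # Step 2: enforce retention by deleting archives older than keep_days.
    for archive in sorted(log_dir.glob("*.zip")):
        stat = archive.stat()
        if now - stat.st_mtime > keep_days * SECONDS_PER_DAY:
            freed += stat.st_size
            archive.unlink()

    return freed
```

A caller would typically check `disk_is_low(...)` first and then write the value returned by `archive_old_logs(...)` to the cleaning log, so that the helpdesk can see how much space was recovered.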

ROI

Of course, the return on investment (ROI) is important in this project. The customer got over 40 disk-related tickets every week, each of which took 5 to 10 minutes to solve, costing the customer more than 4 hours of helpdesk time each week. After the automatic fix, this is down to 4 tickets a week, so around 25 minutes of helpdesk time. This frees up more than 3.5 hours each week for the helpdesk. I would call that an excellent ROI, especially considering that there is also less negative user impact.
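
The arithmetic behind these figures can be checked with a few lines of Python. The average of 6.25 minutes per ticket is an inference (25 minutes spread over 4 tickets), not a number supplied by the customer, though it does fall within the stated 5 to 10 minutes. Because the customer had "over 40" tickets a week, the results are lower bounds.

```python
def weekly_minutes(tickets_per_week, minutes_per_ticket):
    """Helpdesk minutes spent per week on disk-related tickets."""
    return tickets_per_week * minutes_per_ticket


# Assumption: average handling time implied by "4 tickets, around 25 minutes".
MINUTES_PER_TICKET = 6.25

before = weekly_minutes(40, MINUTES_PER_TICKET)  # before the automatic fix
after = weekly_minutes(4, MINUTES_PER_TICKET)    # after the automatic fix
saved_hours = (before - after) / 60

print(f"Before: {before / 60:.2f} h/week")
print(f"After:  {after:.0f} min/week")
print(f"Saved:  {saved_hours:.2f} h/week")
```

With these inputs the helpdesk spends a little over 4 hours a week before the fix and 25 minutes after it, a saving of 3.75 hours, which matches the "more than 3.5 hours" above.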

Conclusion

The title of this blog is “Do more with less”, and that’s exactly what we did! We did more by automatically fixing full disks: there is no more negative user impact, because application servers no longer stall because of a full disk. The “with less” part is easy: the customer’s helpdesk no longer needs to worry about these full disks and is getting significantly fewer tickets.

The possibilities of self-healing monitoring automation are, of course, endless. For example, I also created a trigger that, when IIS uses more than 90% CPU for 30 seconds, automatically restarts the web application that consumes the most CPU. I’m really curious what you think are smart ways to minimize helpdesk tickets. Leave them down in the comments!

I hope this was informative. For questions or comments, you can always leave a reaction in the comment section or contact me: