Friday, July 27, 2012

How to implement physical server OS snapshots

In one of my assignments, I needed to help a quality assurance department find a way to restore the operating systems of an entire QA environment to their original state. This ensured that the QA process started from a clean OS baseline on every test set iteration. There were about 40 systems spanning various flavors, versions, CPU architectures (32-bit and 64-bit), file system types, and combinations of UNIX (Solaris SPARC/i386, AIX, LPAR, HP-UX), Linux (Red Hat, SuSE), Mac OS X, and Windows Server. All of these systems needed to return to their original state after QA engineers finished testing each iteration of a software release. Virtualization and VMware/ESX snapshots would help somewhat; however, the QA process required testing on physical servers as well as on virtual machines.

Physical Snapshots - task requirements

  • Commonality
    • works the same way across all platforms
    • uses the same interface
  • Supportability
    • uses supported, native OS methods
  • Complexity
    • easy to set up
    • easy to use
    • easy to maintain (patch, add / remove features)
    • requires only a basic skill set
  • Reliability
    • not susceptible to network outages
    • no single point of failure that affects restores for all systems
    • preserves snapshot integrity
  • Speed
    • close to the time it takes to restore a virtual machine
  • Cost
    • low maintenance
    • should not tie up scarce physical QA machines

At first, I had a few options and ideas in mind; however, none of them offered a universal approach to reimaging. Bare-metal restore software was very costly and required an additional server to perform the restore (one server per OS). Imagine, too, the network traffic and load that would be generated if 40 systems needed reimaging at the same time. Multiply that by 10 QA engineers with 40 servers each, and reimaging would consume enormous network resources. No single open source tool could handle everything. Reliability and speed of the restore process were also important concerns.
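To put the network-load argument into rough numbers, here is a back-of-the-envelope sketch in Python. The per-server image size and the shared link capacity are hypothetical assumptions, not figures from the original environment, and the estimate is a best case that ignores protocol overhead and contention, so real transfers would take longer.

```python
# Hypothetical inputs: the image size and link speed are assumptions,
# not measurements from the QA environment described in this post.
ENGINEERS = 10
SERVERS_PER_ENGINEER = 40
IMAGE_SIZE_GB = 20          # assumed average OS image size per server
SHARED_LINK_GBIT_S = 10     # assumed capacity of the shared network path


def reimage_load(engineers, servers_each, image_gb, link_gbit_s):
    """Return (systems, total GB transferred, best-case hours over one shared link)."""
    systems = engineers * servers_each
    total_gb = systems * image_gb
    seconds = total_gb * 8 / link_gbit_s  # convert GB to Gbit, then divide by link rate
    return systems, total_gb, seconds / 3600


if __name__ == "__main__":
    systems, total_gb, hours = reimage_load(
        ENGINEERS, SERVERS_PER_ENGINEER, IMAGE_SIZE_GB, SHARED_LINK_GBIT_S
    )
    print(f"{systems} systems, {total_gb} GB, about {hours:.1f} h on the shared link")
```

With these assumed inputs the script prints `400 systems, 8000 GB, about 1.8 h on the shared link`, and that is for a single simultaneous reimaging round over an otherwise idle network.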

In the next series of posts, I will go over the steps for implementing snapshot recovery on virtually any physical server OS. Even if a particular OS is not covered here, you will understand the approach well enough to implement such a mechanism on it. Also, feel free to send me anything that happens to be missing, to help complete the collection.
