Saturday, July 28, 2012

How to implement physical server snapshots on Apple Mac OS X 10.x

Concept

Here is how it works:
  • Mac OS X physical snapshots can be configured with a single disk
  • Partition your disk into at least 2 partitions, "MASTER" and "CLONE"
  • Install a clean Mac OS X on MASTER. This will become your "MASTER" disk.
  • Include required packages (see Prerequisites, below)
  • Perfect the OS and install any software that you would like to have as part of the standard OS
  • Set up the snapshot scripts and create the snapshot
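
Once the disk is partitioned, it helps to confirm which device identifiers the two volumes received, since the scripts later in this post hard code them. The following is a minimal sketch, not part of the original setup. It assumes that `diskutil list` prints the volume name as its own whitespace-separated column and the identifier as the last column, and that volume names contain no spaces. The function name find_slice is hypothetical.

```shell
#!/bin/sh
# Sketch: look up the device path of a named volume in
# `diskutil list`-style text read from standard input.
# Assumption: the volume name is its own column and the
# identifier (e.g. disk0s2) is the last column of the line.
find_slice() {
    awk -v name="$1" '
        {
            for (i = 1; i < NF; i++) {
                if ($i == name) { print "/dev/" $NF; exit }
            }
        }'
}

# Hypothetical usage on a live system:
#   diskutil list | find_slice MASTER
#   diskutil list | find_slice CLONE
```

If the printed paths differ from /dev/disk0s2 and /dev/disk0s3, adjust the MASTER and CLONE variables in the scripts accordingly.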

Banner text

The banner is displayed during the reimaging process to anyone who tries to connect to the system via SSH, console, or Telnet before the snapshot recovery has finished. Create /etc/banner_default with the following text:
WARNING WARNING WARNING WARNING WARNING WARNING WARNING WARNING
 
       THE SYSTEM IS BOOTED TO AN ORIGINAL SNAPSHOT DISK
               SNAPSHOT RECOVERY IS IN PROGRESS
 
       TO PRESERVE INTEGRITY OF THE SYSTEM DO NOT LOGIN!

Enable banner for SSH

To enable banner for SSH, simply uncomment "Banner" in /etc/ssh/sshd_config
# no default banner path
Banner /etc/banner
NOTE: During the reimaging process /etc/banner_default is copied to /etc/banner
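
If you prefer to script the change rather than edit the file by hand, something along these lines may work. This is only a sketch: the function name enable_banner is hypothetical, it takes the file path as an argument so it can be tried on a copy first, and it assumes the commented-out directive looks like "#Banner <something>".

```shell
#!/bin/sh
# Sketch: enable the Banner directive in an sshd_config-style file.
# $1 is the file to edit; try it on a copy before touching the real one.
enable_banner() {
    conf="$1"
    tmp="${conf}.tmp.$$"
    # Turn a commented-out or existing Banner line into the active one.
    sed 's|^[#[:space:]]*Banner[[:space:]].*$|Banner /etc/banner|' "$conf" > "$tmp" || return 1
    # If the file had no Banner line at all, append one.
    grep -q '^Banner /etc/banner$' "$tmp" || echo 'Banner /etc/banner' >> "$tmp"
    # Write back in place so the original file keeps its permissions.
    cat "$tmp" > "$conf" && rm -f "$tmp"
}

# Hypothetical usage:
#   cp /etc/ssh/sshd_config /tmp/sshd_config.test
#   enable_banner /tmp/sshd_config.test
```

sshd normally has to reload its configuration before the change takes effect.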

Scripts and entries

You will need to add scripts and append to standard configuration files as part of this process. Feel free to customize as you wish.

Append /etc/profile

This is a simple notification mechanism for a user who tries to log in to the system during the snapshot restore.
#…
PATH=$PATH:/usr/local/bin
export PATH
#…

#master (YOUR ORIGINAL SNAPSHOT DISK)
MASTER="/dev/disk0s2"

if [ "$(/usr/sbin/bless --getBoot)" = "${MASTER}" ]; then
 clear
 cat /etc/banner_default
 while true
  do
   printf "Are you sure you still wish to login? (y or n): "
   read CONFIRM
   case $CONFIRM in
     y|Y|YES|yes|Yes) break ;;
     n|N|no|NO|No)
     echo Aborting - you entered $CONFIRM
     exit
     ;;
    *) echo Please enter only y or n
   esac
  done
else
/bin/rm -f /etc/banner
fi

Create /usr/local/bin/restore_snapshot

Create the file /usr/local/bin/restore_snapshot with execute permissions (500). This is the script you run to request a snapshot restore.
#!/bin/sh
#master (YOUR ORIGINAL SNAPSHOT DISK)
MASTER="/dev/disk0s2"
#clone  (YOUR TEST DISK)
CLONE="/dev/disk0s3"

# Make sure only root can run our script
if [ "$(id -u)" != "0" ]; then
   echo "This script must be run as root" 1>&2
   exit 1
fi

if [ "$(/usr/sbin/bless --getBoot)" = "${CLONE}" ]
then
        echo "System is booted to the secondary disk"
        echo "Changing boot device priority to boot to the master disk for snapshot restore..."
        echo "INFO: Cancel by CTRL-C in 15 seconds..."
        sleep 15
        set -x
        /usr/sbin/bless --device ${MASTER} --setBoot
        set +x
        echo "Rebooting..."
        /sbin/reboot
else
        /usr/local/bin/resnapshot
fi

MAC OS X /usr/local/bin/resnapshot

Create /usr/local/bin/resnapshot (chmod 500). This script is called during OS boot and checks whether a snapshot restore must start after the reboot.
#!/bin/sh

#master (YOUR ORIGINAL SNAPSHOT DISK)
MASTER="/dev/disk0s2"
#clone  (YOUR TEST DISK)
CLONE="/dev/disk0s3"

# Make sure only root can run our script
if [ "$(id -u)" != "0" ]; then
   echo "This script must be run as root" 1>&2
   exit 1
fi

echo "INFO: You can still cancel by pressing CTRL-C within 15 seconds..."
sleep 15

if [ "$(/usr/sbin/bless --getBoot)" = "${MASTER}" ]
then
        echo "INFO: System is booted to the master image disk."
        echo "INFO: Placing banner."
        set -x
        cp /etc/banner_default /etc/banner
        set +x

        echo "INFO: Restoring snapshot. The system will reboot automatically, if successful."
        echo "INFO: Please wait..."
        echo "INFO: Placing FirstReboot into startup"
        /bin/mkdir -p /Library/StartupItems/FirstBoot
        echo "INFO: Creating FirstReboot script"
        /usr/bin/touch /Library/StartupItems/FirstBoot/FirstBoot
        echo "INFO: Changing Permissions on FirstReboot script"
        /bin/chmod +x /Library/StartupItems/FirstBoot/FirstBoot
        echo "INFO: Populating on FirstReboot script"
        echo "/bin/rm -rf /Library/StartupItems/FirstBoot" > /Library/StartupItems/FirstBoot/FirstBoot
        echo "/sbin/reboot" >> /Library/StartupItems/FirstBoot/FirstBoot
        # Older asr versions take single-dash options, newer ones double-dash
        /usr/sbin/asr -h 2>&1 | grep '\-\-erase'
        if [ $? -eq 0 ]; then
                ASRCMD="/usr/sbin/asr -source ${MASTER} -target ${CLONE} --erase --updatebless --noprompt"
        else
                ASRCMD="/usr/sbin/asr -source ${MASTER} -target ${CLONE} -erase -updatebless -noprompt"
        fi

echo $ASRCMD
set -x
${ASRCMD}

        if [ $? -eq 0 ]; then
                echo "INFO: Restore succeeded! Setting boot to clone..."
                /usr/sbin/bless --device ${CLONE} --setBoot
                /usr/sbin/bless --getBoot
                echo "INFO: Removing Local FirstReboot directory"
                /bin/rm -rf /Library/StartupItems/FirstBoot
                echo "Rebooting Now..."
                /sbin/reboot
        else
                echo "INFO: Snapshot restore failed. Aborting..."
                exit 1
        fi
else
        echo "INFO: System is booted to the clone. Exiting..."
        set -x
        /bin/rm -f /etc/banner
        /usr/sbin/diskutil unmount ${MASTER}
        set +x
fi
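
The script above probes the asr help text to decide between the double-dash and single-dash option styles. The same idea can be isolated into small functions, which makes it easier to try out without running asr at all. This is a sketch only; the function names asr_option_prefix and build_asr_args are hypothetical and not part of the original script.

```shell
#!/bin/sh
# Sketch: choose the asr option prefix from help text on standard input.
# Prints "--" when the help text mentions --erase, otherwise "-".
asr_option_prefix() {
    if grep -q -e '--erase'; then
        printf '%s\n' '--'
    else
        printf '%s\n' '-'
    fi
}

# Hypothetical helper: build the asr argument list from the detected
# prefix. -source and -target keep a single dash, as in the script above.
build_asr_args() {
    p="$1"; src="$2"; dst="$3"
    echo "-source $src -target $dst ${p}erase ${p}updatebless ${p}noprompt"
}

# Hypothetical usage:
#   prefix=$(/usr/sbin/asr -h 2>&1 | asr_option_prefix)
#   build_asr_args "$prefix" /dev/disk0s2 /dev/disk0s3
```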

Mac OS X job scheduling

defaults write /Library/LaunchDaemons/com.globalitadmins.resnapshot Label com.globalitadmins.resnapshot
defaults write /Library/LaunchDaemons/com.globalitadmins.resnapshot ProgramArguments -array "/usr/local/bin/resnapshot" 
defaults write /Library/LaunchDaemons/com.globalitadmins.resnapshot RunAtLoad -bool true
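
The defaults write commands above build a launchd property list key by key. As an alternative, the same job can be written out as an XML plist in one step. The sketch below assumes the Label is chosen to match the plist file name, which is a convention rather than something the original commands require; the function name write_resnapshot_plist is hypothetical.

```shell
#!/bin/sh
# Sketch: write the launchd job as an XML property list.
# $1 is the output path. The program path comes from the commands above;
# the Label matching the file name is an assumption.
write_resnapshot_plist() {
    cat > "$1" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.globalitadmins.resnapshot</string>
    <key>ProgramArguments</key>
    <array>
        <string>/usr/local/bin/resnapshot</string>
    </array>
    <key>RunAtLoad</key>
    <true/>
</dict>
</plist>
EOF
}

# Hypothetical usage (as root):
#   write_resnapshot_plist /Library/LaunchDaemons/com.globalitadmins.resnapshot.plist
```

Writing to a scratch path first lets you inspect the file before placing it in /Library/LaunchDaemons.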

Create first snapshot

To initiate the first cloning process, simply execute:

/usr/local/bin/restore_snapshot

Mac OS X Restore Snapshot Sample Output

mini106-4:bin root# restore_snapshot 
INFO: You can still cancel by pressing CTRL-C within 15 seconds...
INFO: System is booted to the master image disk.
INFO: Placing banner.
+ cp /etc/banner_default /etc/banner
+ set +x
INFO: Restoring snapshot. The system will reboot automatically, if successful.
INFO: Please wait...
INFO: Placing FirstReboot into startup
INFO: Creating FirstReboot script
INFO: Changing Permissions on FirstReboot script
INFO: Populating on FirstReboot script
+ /usr/sbin/asr -source /dev/disk0s2 -target /dev/disk0s3 --erase --updatebless --noprompt
        Validating target...done
        Validating source...done
        Erasing target device /dev/disk0s3...done
        Validating sizes...done
        Copying    ....10....20....30....40....50....60....70....80....90....100
+ '[' 0 -eq 0 ']'
+ echo 'INFO: Restore succeeded! Setting boot to clone...'
INFO: Restore succeeded! Setting boot to clone...
+ /usr/sbin/bless --device /dev/disk0s3 --setBoot
+ /usr/sbin/bless --getBoot
/dev/disk0s3
+ echo 'INFO: Removing Local FirstReboot directory'
INFO: Removing Local FirstReboot directory
+ /bin/rm -rf /Library/StartupItems/FirstBoot
+ echo 'Rebooting Now...'
Rebooting Now...
+ /sbin/reboot

And you are all set!

Important commands

#CURRENT BOOT DISK: bless --getBoot
#ALL DISK PARTITIONS: diskutil list

#PARTITION WITH GUI: USE DISK UTILITY
#MOUNT MASTER WHEN BOOTED ON TEST:
#diskutil mount /dev/disk0s2
#UNMOUNT MASTER WHEN BOOTED ON TEST:
#diskutil unmount /dev/disk0s2

Friday, July 27, 2012

How to implement physical server snapshots on AIX 5.2, 5.3, 6.1, 7.1

Concept

Here is the short description of how the process works.
  • Make sure your system has at least 2 physical hard drives (hdisk0 and hdisk1).
  • Install clean AIX OS on hdisk0. This will become your "MASTER" disk.
  • Include required packages (see Prerequisites, below)
  • Perfect the OS, install any software that you would like to have as part of the standard OS
  • Set up the snapshot scripts and create the snapshot

Prerequisites

  • All AIX versions require 2 hard disks. One disk, which we will call "MASTER", holds the baseline OS. The other, which we will call "CLONE", is the disk that MASTER recovers the snapshot to, effectively overwriting it every time you need to restore your snapshot.
  • For LPARs, allocate 2 virtual disks in VIOS.
  • All commands must be executed as root.
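
Before going further, you may want to confirm that the system really sees two disks. The sketch below parses `lspv`-style output from standard input; it assumes each physical volume is listed on its own line starting with its name (hdisk0, hdisk1, ...). The function names count_hdisks and has_two_disks are hypothetical.

```shell
#!/bin/sh
# Sketch: count hdisk entries in `lspv`-style output read from stdin.
# Assumption: each physical volume is one line starting with its name.
count_hdisks() {
    awk '$1 ~ /^hdisk[0-9]+$/ { n++ } END { print n + 0 }'
}

# Succeeds when at least two disks are listed.
has_two_disks() {
    [ "$(count_hdisks)" -ge 2 ]
}

# Hypothetical usage on AIX:
#   lspv | has_two_disks || echo "need both hdisk0 and hdisk1"
```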

AIX 5.2 software

bos.alt_disk_install
OR install fileset (lslpp -L bos.alt_disk_install.rte)

AIX 5.3 software

bos.alt_disk_copy
OR install filesets (lslpp -L bos.alt_disk_copy.rte):
 bos.alt_disk_install.boot_images
 bos.alt_disk_install.rte
 bos.msg.en_US.alt_disk_install.rte

AIX 6.1, 7.1 software

  • All required packages are present in the default installation
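
To check the filesets listed above in one go, a small loop around lslpp can help. This is a sketch that relies on `lslpp -L <fileset>` exiting with a non-zero status when the fileset is not installed, which is my understanding of its behavior; verify it on your AIX level. The function name missing_filesets is hypothetical.

```shell
#!/bin/sh
# Sketch: print each of the given filesets that is not installed.
# Relies on `lslpp -L <fileset>` exiting non-zero for a missing fileset.
missing_filesets() {
    for fs in "$@"; do
        lslpp -L "$fs" >/dev/null 2>&1 || echo "$fs"
    done
}

# Hypothetical usage on AIX 5.3:
#   missing_filesets bos.alt_disk_install.rte bos.alt_disk_install.boot_images
```

An empty result means every fileset passed in was found.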

Banner text

The banner is displayed during the reimaging process to anyone who tries to connect to the system via SSH, console, or Telnet before the snapshot recovery has finished. Create /etc/banner_default with the following text:
WARNING WARNING WARNING WARNING WARNING WARNING WARNING WARNING
 
       THE SYSTEM IS BOOTED TO AN ORIGINAL SNAPSHOT DISK
               SNAPSHOT RECOVERY IS IN PROGRESS
 
       TO PRESERVE INTEGRITY OF THE SYSTEM DO NOT LOGIN!

Enable banner for SSH

To enable banner for SSH, simply uncomment "Banner" in /etc/ssh/sshd_config
# no default banner path
Banner /etc/banner
NOTE: During the reimaging process /etc/banner_default is copied to /etc/banner

Scripts and entries

You will need to add scripts and append standard configuration files as part of this process. Feel free to customize as you wish.

Append /etc/inittab

The line below checks, at boot, whether the user requested a snapshot restore before the last reboot.
resnap:2:once:/etc/rc.resnapshot >/dev/console 2>&1
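
Instead of editing /etc/inittab by hand, the entry can be added with the AIX inittab utilities lsitab and mkitab. The following is a sketch, assuming lsitab exits non-zero when the identifier is absent; the function name add_resnap_entry is hypothetical.

```shell
#!/bin/sh
# Sketch: add the resnap record with the AIX inittab tools.
# lsitab looks up a record by identifier; mkitab appends a new record.
add_resnap_entry() {
    entry='resnap:2:once:/etc/rc.resnapshot >/dev/console 2>&1'
    if lsitab resnap >/dev/null 2>&1; then
        echo "resnap entry already present"
    else
        mkitab "$entry"
    fi
}

# Hypothetical usage (as root on AIX):
#   add_resnap_entry
```

Checking first keeps the function safe to run more than once.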

Append /etc/profile

This is a simple notification mechanism for a user who tries to log in to the system during the snapshot restore.
#Note: MASTER DISK hdisk0 is intentionally hard coded here
PATH=$PATH:/usr/local/bin
export PATH
if [ "$(bootinfo -b)" = "hdisk0" ]; then
 clear
 cat /etc/banner_default
 while true
  do
   echo "Are you sure you still wish to login? (y or n) :\c"
   read CONFIRM
   case $CONFIRM in
     y|Y|YES|yes|Yes) break ;;
     n|N|no|NO|No)
     echo Aborting - you entered $CONFIRM
     exit
     ;;
    *) echo Please enter only y or n
   esac
  done
fi

Create /usr/local/bin/restore_snapshot

Create the file /usr/local/bin/restore_snapshot with execute permissions (500). This is the script you run to request a snapshot restore.
#!/bin/sh

#Note: MASTER DISK hdisk0 is intentionally hard coded
# Make sure only root can run our script
if [ "$(id -u)" != "0" ]; then
   echo "This script must be run as root" 1>&2
   exit 1
fi

if [ "$(bootinfo -b)" != "hdisk0" ]
then
        echo "System is booted to the secondary hdisk"
        echo "Changing boot device priority to boot to the master hdisk for snapshot restore..."
        set -x
        bootlist -m normal hdisk0     
        set +x
        echo "Rebooting..."
        /usr/sbin/shutdown now -r
else
        /etc/rc.resnapshot
fi

AIX 5.2 /etc/rc.resnapshot

Create /etc/rc.resnapshot (chmod 555). This script will be called during OS boot. It will check whether the snapshot restore must start after the reboot.
#!/bin/sh

#Note: MASTER DISK hdisk0 is intentionally hard coded
#Note: CLONE DISK: hdisk1 is intentionally hard coded

if [ "$(/usr/sbin/bootinfo -b)" = "hdisk0" ];then
        echo "INFO: System is booted to the master image hdisk."
        echo "INFO: Placing banner."
        set -x
        cp /etc/banner_default /etc/banner
        set +x
        echo "INFO: Removing altinst_rootvg"
        set -x
        /usr/sbin/alt_disk_install -X altinst_rootvg
        set +x
        echo "INFO: Restoring snapshot. If successful, the system will reboot automatically."
        echo "INFO: Please wait..."
        set -x
        /usr/sbin/alt_disk_install -C hdisk1
        if [ $? -eq 0 ]; then
                echo "rootvg clone operation succeeded!"
                echo "Rebooting..."
                /usr/sbin/reboot
        else
                echo "rootvg clone operation failed!"
                echo "Aborting..."
        fi
        set +x
else
        set -x
        /usr/bin/rm -f /etc/banner
        set +x
fi

AIX 5.3, 6.1, 7.1 /etc/rc.resnapshot

Create /etc/rc.resnapshot (chmod 555)
#!/bin/sh

#Note: MASTER DISK hdisk0 is intentionally hard coded
#Note: CLONE DISK: hdisk1 is intentionally hard coded

if [ "$(/usr/sbin/bootinfo -b)" = "hdisk0" ];then
        echo "INFO: System is booted to the master image hdisk."
        echo "INFO: Placing banner."
        set -x
        cp /etc/banner_default /etc/banner
        set +x

        echo "INFO: Removing altinst_rootvg"
        set -x
        /usr/sbin/alt_rootvg_op -X altinst_rootvg
        set +x
        echo "INFO: Restoring snapshot. If successful, the system will reboot automatically."
        echo "INFO: Please wait..."
        set -x
        /usr/sbin/alt_disk_copy -d hdisk1 -r
        set +x
else
        set -x
        /usr/bin/rm -f /etc/banner
        set +x
fi

Create first snapshot

Please note that you do not "create" the initial snapshot per se.
Your initial snapshot is your current OS disk, hdisk0. What you are doing
is copying the good disk hdisk0 onto the second disk, hdisk1, which then
becomes the disk where you do your testing. Once testing is finished,
you simply execute the restore_snapshot script to request a restore of
the snapshot. The restore process reboots the OS to the master disk and
automatically starts overwriting the test disk with the clean OS image.
Once that process finishes, the system reboots once again and you are
back on the clone disk.
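
The cycle described above comes down to two scripts, each checking which disk the system booted from. The sketch below models that decision as a lookup so the flow can be read at a glance. It does not run any AIX commands, and the function name next_step and the wording of the printed steps are hypothetical.

```shell
#!/bin/sh
# Sketch: given a script name and the current boot disk, print what the
# workflow does next. hdisk0 is MASTER and hdisk1 is CLONE, as in this post.
next_step() {
    case "$1:$2" in
        restore_snapshot:hdisk0) echo "run /etc/rc.resnapshot now" ;;
        restore_snapshot:*)      echo "set bootlist to hdisk0 and reboot" ;;
        rc.resnapshot:hdisk0)    echo "clone hdisk0 to hdisk1 and reboot" ;;
        rc.resnapshot:*)         echo "remove /etc/banner and continue booting" ;;
        *)                       echo "unknown script: $1" >&2; return 1 ;;
    esac
}

# Hypothetical usage on AIX:
#   next_step restore_snapshot "$(bootinfo -b)"
```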

To initiate the first cloning process, simply execute:

/usr/local/bin/restore_snapshot

And you are all set!

Important commands

#Display AIX disk information
#CURRENT BOOT DISK:      bootinfo -b
#ALL DISKS:              lspv
#DISK DETAILS:           lscfg -vl hdisk0

#MOUNT MASTER WHEN BOOTED ON TEST
#alt_rootvg_op -W -d hdisk0

#UNMOUNT MASTER WHEN BOOTED ON TEST
#alt_rootvg_op -S
#(alternatively specify -t to rebuild the boot image;
# not recommended for minor changes)

How to implement physical server OS snapshot

In one of my assignments, I needed to give a quality assurance department a way to restore the operating systems of an entire QA environment to their original state. This was needed to ensure that the QA process used a clean OS baseline for each test iteration. There were about 40 different systems, covering various flavors, versions, CPU architectures (32-bit and 64-bit), file system types, and combinations of UNIX (Solaris SPARC/i386, AIX, LPAR, HP-UX), Linux (Red Hat, SuSE), Mac OS X, and Windows Server. All of these systems needed to return to their original state after QA engineers finished testing each iteration of a software release. Virtualization and VMware ESX snapshots would help a little, but the QA process required testing on physical servers as well as virtual machines.

Physical Snapshots - task requirements

  • Commonality
    • works the same way across all platforms
    • uses the same interface
  • Supportability
    • Uses supported, native OS methods
  • Complexity
    • easy to setup
    • easy to use
    • easy to maintain (patch, add / remove features)
    • requires simple skill set
  • Reliability
    • not susceptible to network outages
    • no single point of failure that affects restores for all systems
    • preserves snapshot integrity
  • Speed
    • Close to what it takes to recover a virtual machine
  • Cost
    • Low maintenance
    • Should not tie up scarce physical QA machines

At first I had a few options and ideas in mind, but none of them offered a universal approach to reimaging. Bare-metal restore software was very costly and required another server to perform the restore (one server per OS). Also imagine the network traffic and load that would be generated if 40 systems went into reimaging at the same time. Multiply that by 10 QA engineers with 40 servers each, and reimaging would consume enormous network resources. No single open source tool was capable of handling everything. The reliability and speed of the restore process were another important concern.

In the next series of posts, I will go over the "how to" steps for implementing snapshot recovery on virtually any physical server OS. If an OS is not covered here, you will still understand the approach well enough to implement such a mechanism on it. Also, feel free to send in anything that happens to be missing to complete the collection.