VMware have just released a set of free (yup, totally free!) training videos for Site Recovery Manager (SRM) on their website:
http://blogs.vmware.com/education/2012/09/free-site-recovery-manager-training.html
This is a great resource for those wishing to deploy SRM and I would urge all to take a look through the videos before starting your deployments.
It’s supposed to be automatic, but actually you have to push this button. –John Brunner
Monday, 24 September 2012
Wednesday, 19 September 2012
Migrating VMs running on VSS to VDS in a Production ESX cluster (Migrate Virtual Machine Networking…)
With one of our ESXi 5.0 clusters growing to 12 hosts, and
our networking team constantly wanting to deploy new VLANs like they are going
out of fashion, it was time to implement a distributed vSwitch (vDS) on the
cluster to reduce the administrative overhead of adding every port group
to each vSwitch on each host (not really the case, but it’s always
good to keep up the professional ribbing of those poor network guys,
eh?).
The process to deploy a new vDS to a cluster is pretty
straightforward and you can follow it from within vCenter here: VMware KB
Once the vDS was created, the next step was to create each of the VLAN
port groups on it. Here I simply
set up each VLAN with the same name as currently used on each VSS (you can
do this because the name has ‘(dvswitch)’ appended to it anyway, which keeps these separate
from the existing port groups when selecting network connectivity while editing
VMs) and the same VLAN ID.
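Before migrating anything, it can be worth confirming that every VSS port group really does have a twin on the VDS. A minimal sketch in Python, assuming you have already exported the port group names and VLAN IDs from both switches into two dictionaries (the names and IDs below are hypothetical, and nothing here talks to vCenter):

```python
def missing_or_mismatched(vss_port_groups, vds_port_groups):
    """Return VSS port groups that lack a same-named, same-VLAN VDS port group."""
    problems = {}
    for name, vlan_id in vss_port_groups.items():
        if name not in vds_port_groups:
            problems[name] = "missing on VDS"
        elif vds_port_groups[name] != vlan_id:
            problems[name] = f"VLAN mismatch: VSS {vlan_id}, VDS {vds_port_groups[name]}"
    return problems


# Hypothetical inventory: port group name -> VLAN ID
vss = {"VLAN10-Servers": 10, "VLAN20-DMZ": 20, "VLAN30-Backup": 30}
vds = {"VLAN10-Servers": 10, "VLAN20-DMZ": 21}

for name, reason in missing_or_mismatched(vss, vds).items():
    print(f"{name}: {reason}")
```

An empty result means every existing port group has a matching destination, so the bulk migration has somewhere to put each VM.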
Next I moved 2 of the 4 x 1Gb adaptors from each VSS into
the dvuplink ports on the VDS. This
allowed the existing VSS port groups to continue servicing network requests
for all of the running VMs while also allowing me to start moving VMs from the VSS
to the VDS.
To migrate the VMs from their current port groups on each
VSS to the newly created port groups on the VDS you can use the excellent
Migrate Virtual Machine Networking utility which manages the bulk modifications
to VMs.
To do this, simply go to the networking screen in the VI
client, right-click the new VDS and choose ‘Migrate Virtual Machine
Networking…’
Next select the source network from the drop-down list
(this is the current port group that you want to move VMs off) and then
select the destination network (the corresponding port group on the dvSwitch).
Click Next and you can now select all or some of the VMs
to be migrated. If you select all of the
VMs you’ll be able to sit back and watch as each VM is modified in turn and
moved over. It really is simple and best
of all results in no network outage to the running VMs.
I migrated several hundred VMs across our various vlans
to the distributed switch without one little blip!
Then it was just a matter of going through each of the
hosts and cleaning up the old port groups and vSwitches which were no longer being
used.
Sunday, 16 September 2012
vSphere VM deployment customizations
A small but annoying thing had recently started to happen to our deployments of Windows 2008 R2 VMs in our production environment. Whenever we deployed a new VM using our pre-saved customization specification, the VM would be deployed as expected except that it was not joined to our production domain.
The image would be customised, the server name changed, IP settings applied, administrator password set etc but it would no longer join the vm to our windows domain.
Alarmingly, although the option was set within the specification, no errors were recorded for this in the logs on the newly deployed VM (these can be found at c:\windows\temp\vmware-imc\guestcust.log), which I would have expected.
The answer, it turned out, was very simple. The customization had been modified to have domain\username in the username field of the domain customization properties. Although this looks perfectly reasonable in a Windows environment, the field actually needs to contain just the username of the domain account which will be joining the VM to the domain.
After changing the pre-saved customization to just the account name and re-entering the password I fired off a test deployment and voilà, 1 windows vm deployed and sitting on our production domain as before!
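If you keep customization values in a script or spreadsheet before entering them into vCenter, a tiny check can catch this before it bites. A minimal sketch in Python; the function name and the account `PROD\svc-join` are hypothetical, and it only handles the domain\username form described above:

```python
def join_account_name(value):
    """Return the bare account name from 'DOMAIN\\user' or plain 'user'."""
    value = value.strip()
    # Keep only what follows the last backslash, if there is one
    return value.rsplit("\\", 1)[-1]


print(join_account_name("PROD\\svc-join"))
print(join_account_name("svc-join"))
```

Both calls print the bare account name, which is the form the username field expects.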
Wednesday, 12 September 2012
Virtual Machine disk consolidation fails with I/O error on change tracking file
A vm was displaying the warning that 'Virtual machine disks consolidation is needed' which is a nice feature of vSphere 5 which now actively tells you about this issue (It's always been there in previous releases but never highlighted in this way until 5.0).
We often get this issue because we use a snapshot-based backup technology to back up our VMs each day and, for some reason or other, the remove-snapshot process sometimes does not complete properly. We are then left in a situation where the snapshots are removed but the snapshot files are still present and referenced in the VM. See the following VMware KB article for details: Consolidating snapshots in vSphere 5
Usually this is a simple process of right clicking the vm, selecting 'snapshot > consolidate' to have the snapshot child disk files consolidated back to the parent disk file but in this case the consolidation failed with the error message: 'A general system error occurred: I/O error accessing change tracking file'.
After some investigation I found that our backup system had a lock on one of the files and so I was able to release the file from the backup software and then re-run the consolidation which completed and all was good again!
The troubleshooting steps to identify the locked file can be found here: Investigating virtual machine file locks on ESX/ESXi
Previously I've also been able to resolve the issue of not being able to consolidate vm disks by creating a clone of the troubled vm and bringing it up as the active vm and then deleting the old one. Not always possible though in a production environment!
Tuesday, 11 September 2012
vCenter Operations Manager not displaying Risk or Efficiency data
So I finally made the upgrade from CapacityIQ and deployed VMware's new vCenter Ops Manager in its place. The upgrade process of deploying the new appliance was straightforward and error free.
During the installation process you have the option to import your old data and settings from the CapacityIQ appliance into the new vCOM database so you don't lose any of the existing trending information etc.
This process worked like a charm, but after a few days or so I noticed that the Risk and Efficiency data never populated on the dashboard screen and I was not able to get any Capacity or Trending information.
After looking at a few blogs and the excellent VMware Communities I was still not able to find why this was not working and so logged a support call. The answer was simple and when thinking about it, obvious.
The below was the summary provided to me from support:
To calculate time remaining and capacity
remaining metrics, there are overall 5 resources that we consider
* cpu
* memory
* disk IO
* disk space
* network IO
However, these 5 resources do not apply to all object types. So under the hood,
we actually consider a subset of the applicable resources for each object type.
For example, for datastore object we consider only selected resources out of
disk space and disk IO resources ; for vm object we consider only selected
resources out of cpu, memory, disk space resources; for host and up, we
consider selected resources out of all resources. For a given object type, if
all the applicable resources are unchecked (i.e., none are selected), the
metric calculation module is unable to figure of the metric dependency and
unable to calculate the time remaining or vm remaining values.
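The rule in the support summary can be modelled in a few lines. This is only an illustrative sketch of the rule as stated above, not how vCenter Operations Manager is actually implemented; the dictionary layout and function name are my own:

```python
# Applicable resources per object type, as listed in the support summary
APPLICABLE = {
    "datastore": {"disk space", "disk IO"},
    "vm": {"cpu", "memory", "disk space"},
    "host": {"cpu", "memory", "disk IO", "disk space", "network IO"},
}


def can_calculate(object_type, selected):
    """True if at least one applicable resource is selected for this object type."""
    return bool(APPLICABLE[object_type] & set(selected))


# Example selection with only CPU and memory ticked
selected = {"cpu", "memory"}
for object_type in APPLICABLE:
    print(object_type, can_calculate(object_type, selected))
```

Under this reading, a selection of only CPU and memory leaves datastore objects with nothing applicable selected, so their time remaining and capacity remaining values cannot be calculated.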
Now in CapacityIQ we never cared about disk capacity as a factor in our host capacity reports, as we run several SANs which are attached to our ESXi cluster and this space is only carved up and added to the environment on a per-need basis. We were mainly concerned about CPU and memory, so the disk settings were not selected in CapacityIQ and did not come across to the new vCOM deployment when we imported the settings and data from CapacityIQ.
In our case, simply adding 'Disk Space capacity and usage' and/or 'Disk I/O capacity and usage' in the "Capacity & Time Remaining" configuration panel solved the problem!
When the Analytics process next ran on the system (1am by default) the Risk and Efficiency areas populated and all was well.
The support guy did mention that this is being fixed in version 5.6 so that at least one of the applicable resources for each object type is checked, but for now it's a manual process.
Labels:
CapacityIQ,
vCenter Operations Manager,
vCOM,
VMware
Thursday, 6 September 2012
VMware SRM 5 recovery plan environment scripts
How to create a recovery plan script in SRM5 that will perform different tasks depending on whether the recovery plan is in test mode or recovery mode.
It's pretty easy to add scripts to recovery plans in SRM5 to perform all sorts of tasks in recovered environments or VMs but what if you need to have the script do something different when it is run in a test scenario like add some test environment specific routes or add some host file entries to allow recovered VMs to talk to one another in a non-production LAN (no DNS or Gateways exist for example)? Well thanks to SRM5 you can make use of some environment variables which are injected into the recovered VMs by the SRM service in order to do just that!
The main variable to look at here is VMware_RecoveryMode. This variable is set to either test or recovery depending on how the recovery plan is being run at the time, so your script can reference it and act differently according to its value.
A basic example of this can be found in the below script which is a simple batch file...
@echo off
REM VMware_RecoveryMode is set by SRM to either test or recovery.
REM Quoting both sides avoids a syntax error if the variable is not defined.
IF /I "%VMware_RecoveryMode%"=="test" (Goto TestRun) Else (Goto OtherRun)
:TestRun
REM Capture this VM's IPv4 address (the last one ipconfig lists) and strip the spaces
for /f "delims=: tokens=2" %%a in ('ipconfig ^| findstr /R /C:"IPv4 Address"') do (set tempip=%%a)
set tempip=%tempip: =%
REM Add persistent routes for the test networks via the local address
route add 10.10.1.0 mask 255.255.255.0 %tempip% -p
route add 10.10.2.0 mask 255.255.255.0 %tempip% -p
Echo Routes Applied to Test environment on %date% at %time% >> c:\srm\srmlog.txt
REM No DNS in the test LAN, so add a hosts file entry
Echo 10.99.53.13 server1.company.com >> %windir%\system32\drivers\etc\hosts
Echo Host file entries Applied on %date% at %time% >> c:\srm\srmlog.txt
EXIT
:OtherRun
IF /I "%VMware_RecoveryMode%"=="recovery" (Goto RecoveryRun) Else (Echo an unexpected result occurred on %date% at %time% >> c:\srm\srmlog.txt)
EXIT
:RecoveryRun
Echo Recovery started on %date% at %time% >> c:\srm\srmlog.txt
EXIT
This script checks whether the recovery mode is 'test' and, if it is, runs the commands under the :TestRun section.
If the mode is not test, it checks whether it is in recovery mode and, if so, runs the commands under the :RecoveryRun section.
If for some reason it doesn't see either test or recovery in the variable, it just writes a simple log entry to the C:\srm folder of the VM running the script.
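The same three-way branch can be expressed in other scripting languages too. A minimal sketch in Python, assuming Python happens to be installed on the recovered VM (which is not part of the setup described here); the function name and the returned action strings are placeholders of my own:

```python
import os


def choose_action(env):
    """Map the SRM recovery mode to the branch the script should take."""
    mode = env.get("VMware_RecoveryMode", "").strip().lower()
    if mode == "test":
        return "apply test routes and hosts entries"
    if mode == "recovery":
        return "log recovery start"
    # Variable missing or holding anything else
    return "log unexpected mode"


print(choose_action(os.environ))
```

Passing the environment in as a dictionary makes the branch easy to try out on a workstation before the script goes anywhere near a recovery plan.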
There are other environment variables available to play with too, which can be found in the SRM administrator's guide, so hop over to VMware and check it out.