
Thursday, 28 November 2013

vCloud Director sysprep files

Had some fun running up a vCD server this past week, so I thought I'd post a quick memo about the following change between vCD 5.1 and vCD 5.5 regarding sysprep files.

I had been following some excellent blogs on the vCD 5.1 install process from Kendrick Coleman (Install vCD 5.1 & vCD Networking) and applying them to my vCD 5.5 installation. When I tried to follow the process to copy the sysprep files over to the vCD cell, I hit a snag: there was no script to run to generate the required sysprep package. It turns out that 5.5 has improved this process; now you simply create the directories, place the sysprep files into them and away you go. Not even a service restart is required to start customizing older OSes through vCD.

The folder locations in vCD 5.5 should be as follows (extract taken from the VMware installation document for vCD 5.5, which I should have read more keenly, it seems!):

Procedure:

  1. Log in to the target server as root.
  2. Change directory to $VCLOUD_HOME/guestcustomization/default/windows.
    [root@cell1 /]# cd /opt/vmware/vcloud-director/guestcustomization/default/windows
  3. Create a directory named sysprep.
    [root@cell1 /opt/vmware/vcloud-director/guestcustomization/default/windows]# mkdir sysprep
  4. For each guest operating system that requires Sysprep binary files, create a subdirectory of
    $VCLOUD_HOME/guestcustomization/default/windows/sysprep.
    Subdirectory names are specific to a guest operating system and are case sensitive.
    • Windows 2003 (32-bit) should be called svr2003
    • Windows 2003 (64-bit) should be called svr2003-64
    • Windows XP (32-bit) should be called xp
    • Windows XP (64-bit) should be called xp-64
  5. Copy the Sysprep binary files to the appropriate location on each vCloud Director server in the server group.
  6. Ensure that the Sysprep files are readable by the user vcloud.vcloud.
    Use the Linux chown command to do this.
    [root@cell1 /]# chown -R vcloud.vcloud $VCLOUD_HOME/guestcustomization
When the Sysprep files are copied to all members of the server group, you can perform guest customization
on virtual machines in your cloud. You do not need to restart vCloud Director after the Sysprep files are copied.
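The steps above can be sketched as a short shell function. This is a sketch only: the /tmp/sysprep-src staging path is a hypothetical location where you have already unpacked the Sysprep binaries, and the chown step needs to run as root on the cell.

```shell
# Sketch: create the case-sensitive sysprep directories under a given
# vCloud home (default /opt/vmware/vcloud-director) and copy binaries in.
# The /tmp/sysprep-src staging path is an assumption for illustration.
setup_sysprep_dirs() {
    local vcloud_home="${1:-/opt/vmware/vcloud-director}"
    local sysprep_dir="$vcloud_home/guestcustomization/default/windows/sysprep"
    local os
    for os in svr2003 svr2003-64 xp xp-64; do     # names are case sensitive
        mkdir -p "$sysprep_dir/$os"
        # Copy staged binaries for this OS if present (hypothetical path)
        if [ -d "/tmp/sysprep-src/$os" ]; then
            cp -r "/tmp/sysprep-src/$os/." "$sysprep_dir/$os/"
        fi
    done
    # Make everything readable by the vcloud user (needs root on the cell)
    chown -R vcloud.vcloud "$vcloud_home/guestcustomization" 2>/dev/null || true
}
```

Run `setup_sysprep_dirs` on each cell in the server group; only create the subdirectories for the guest OSes you actually need to customize.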

So there you go...simple if you read the manuals properly in the first place :)

Thursday, 15 November 2012

Storage vMotion Error: The method is disabled by 'SYMC-FULL dd-mm-yyyy...'

I had this error come up the other day whilst trying to SvMotion one of our vms over to a new storage array.

Now this is actually one of those obvious and helpful error messages that you get every now and then, and just by looking at it I could see what had caused the issue.

We use Backup Exec with the AVVI agent to perform backups of some of our production vms.
The AVVI agent allows us to perform SAN-to-tape backups off-host, which means we don't need to do anything special with regard to backup configuration on any of our ESXi hosts. The configuration process is a simple case of enabling the option within the Backup Exec media servers and presenting the ESXi datastores to them (with the same LUN IDs etc.), and that's pretty much it. Most of our vms also run the Backup Exec Remote Agent for their OS (Windows or Linux), which allows granular file recovery from our image-based backups. That's a nice feature, although less useful when backing up to tape rather than disk, as the recovery process still needs to extract the full vmdk from the tape before recovering the individual files to the vm or elsewhere.

A good guide for setting this configuration up can be found on Symantec's website here:

Now, what usually happens when a backup job is run on a vm using this method is:

• The BE job starts on the media server and talks to vCenter to take a snapshot of the vm's vmdk
• Once completed, the vm is now running from the snapshot and the original vmdk is static and only read by the vm
• BE then gets the ESXi host and guest virtual machine information it needs from vCenter
• BE then opens a connection with the ESXi server to ask for the virtual machine metadata
• BE then tells vCenter to disable Storage vMotion for that VM to ensure the backup can complete successfully
• Using the vStorage APIs, Backup Exec then opens a direct data connection to the 'unknown' SAN volumes which have been presented to it, and the virtual machine data is offloaded directly to the media server for backup
• Once the backup process has completed, the snapshot is deleted, BE disconnects from the ESXi host and tells vCenter to enable Storage vMotion again for the vm
• The backup job then completes


The error above is caused by Storage vMotion having been disabled by Backup Exec to run the backup. After the backup job completes, the call to vCenter either doesn't get made or fails, and so the vm is stuck with its Storage vMotion disabled.

The trouble with this is that you often don't know it's an issue until you go to perform a Storage vMotion, or until vms inside an SDRS cluster fail to migrate to other datastores.

You can, however, identify these vms by performing a lookup within the vCenter database, as described in this VMware KB article:

Luckily this is a known issue and there are two very easy ways to address it.
The first, and often easiest, is to shut down the vm and remove it from the inventory. Then browse the datastore where it resides, locate the vmx file and add it to the inventory again.
This approach gives the vm a new id within vCenter, which removes any customised settings and allows it to SvMotion again.
It does mean, however, that you will need downtime on your vm, although very short, in order to resolve this.

The other approach, as detailed by VMware in the KB above, is to manually edit the settings for the affected vm within the vCenter DB. Whilst this does not require a vm outage, it does require vCenter to be stopped whilst you access the DB, and in some environments (those with vCloud Director, SRM, Lab Manager etc.) that is more impacting than one vm being shut down for a couple of minutes. Finding a quiet evening or weekend to shut the vm down is therefore my preferred approach, and it can be easily scripted anyway to save those long hours from building up!
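If you do script the remove/re-add workaround, a rough sketch using ESXi's vim-cmd could look like the below. This is a sketch only: the VM id and vmx path are placeholders, and it prints each command rather than running it (clear the RUN prefix to execute it on a host).

```shell
# Dry-run sketch of the unregister/re-register workaround via vim-cmd.
# VMID and VMX are placeholders; RUN=echo just prints each command.
RUN="${RUN:-echo}"
VMID=42                                         # from: vim-cmd vmsvc/getallvms
VMX="/vmfs/volumes/datastore1/myvm/myvm.vmx"    # placeholder path

$RUN vim-cmd vmsvc/power.shutdown "$VMID"   # graceful guest shutdown
$RUN vim-cmd vmsvc/unregister "$VMID"       # remove from inventory
$RUN vim-cmd solo/registervm "$VMX"         # re-add; the VM gets a new id
```

Re-registering gives the vm a new id, so check vim-cmd vmsvc/getallvms again afterwards and power it back on against that new id.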

This is not restricted to Symantec, by the way. I have seen this issue with Veeam backup software too, and as yet I'm not aware of any definitive solution to prevent it from happening from time to time. It pays to keep an eye on this if you are running a similar backup technology in your environment.


Thursday, 4 October 2012

Setting Storage Path alerts

Since vSphere 4.0 there has been a large increase in the available alarms; not only do many come pre-configured, but you can now create an alarm for almost anything within vCenter.

It still surprises me, though, that some quite essential monitoring areas are not included within the default set of pre-configured alarms. One such alarm is a Storage Path Redundancy alarm, which will let you know when you have lost paths to your SAN storage and which datastores are affected. This is a very simple alarm to set up, but also pretty essential to virtually all vSphere implementations these days, I'd imagine.

To set up the alarm select the vCenter server in the vSphere client and then go to the 'Alarms' tab.
Select 'Definitions' to see a list of all currently configured alarms and then right click in the section to create a new alarm.
Give the alarm a name ('Degraded Storage Paths' for example) and change the Monitor to 'Hosts' and then choose 'Monitor for specific events occurring on this object, for example, VM powered On'.
On the 'Triggers' tab click 'Add' and then change the Event type to 'Degraded Storage Path Redundancy'.
Next select the 'Actions' tab and Add an action to be performed when this event occurs.  This can either be an email alert perhaps to the storage team or even a task for the ESXi host to perform.
Click 'OK' and the alarm is set.

It's also worth creating another alarm to go along with this one, which alerts when one of the ports goes offline too. That way you get notification of either path redundancy loss or a full port connectivity loss, which will help in troubleshooting the issue being experienced.

To set this up, simply create another rule as above but this time set the trigger to be 'Lost Storage Path Redundancy' and set whatever actions you would like.

There are many other good alarms to set depending on what monitoring solutions you may or may not have in place for your virtual environment, so it's always good to look through the list of available alarms and check that you have everything you need configured before you need it... they're not going to do much if you've created them after the event!

Monday, 24 September 2012

Free VMware SRM training videos

VMware have just released a set of free (yup, totally free!) training videos for Site Recovery Manager (SRM) on their website:

http://blogs.vmware.com/education/2012/09/free-site-recovery-manager-training.html

This is a great resource for those wishing to deploy SRM and I would urge all to take a look through the videos before starting your deployments.


Sunday, 16 September 2012

vSphere VM deployment customizations

A small but annoying thing had started to happen to our deployments of Windows 2008 R2 vms in our production environment recently. Whenever we deployed a new vm using our pre-saved customization specification, the vm would be deployed as expected except that it was not joined to our production domain.
The image would be customised, the server name changed, IP settings applied, administrator password set etc., but it would no longer join the vm to our Windows domain.

Alarmingly, although the option was set within the specification, there were no errors recorded for this in the logs on the newly deployed vm (these can be found at c:\windows\temp\vmware-imc\guestcust.log), which I would have expected.

The answer, it turned out, was very simple. The customization had been modified to have domain\username in the username field of the domain customization properties. Although this looks perfectly reasonable in a Windows environment, the field actually needs to contain just the username of the domain account which will be joining the vm to the domain.

After changing the pre-saved customization to just the account name and re-entering the password, I fired off a test deployment and voilà: one Windows vm deployed and sitting on our production domain as before!

Wednesday, 12 September 2012

Virtual Machine disk consolidation fails with I/O error on change tracking file

A vm was displaying the warning 'Virtual machine disks consolidation is needed'. This is a nice feature of vSphere 5, which now actively tells you about the issue (it has always existed in previous releases but was never highlighted in this way until 5.0).

We often get this issue as we use a snapshot backup technology to back up our vms each day. For some reason or other, the remove-snapshot process sometimes does not complete properly, and we end up in a situation where the snapshots are removed but the snapshot files are still present and referenced by the vm. See the following VMware KB article for details: Consolidating snapshots in vSphere 5

Usually this is a simple process of right-clicking the vm and selecting 'Snapshot > Consolidate' to have the snapshot child disk files consolidated back into the parent disk file, but in this case the consolidation failed with the error message: 'A general system error occurred: I/O error accessing change tracking file'.

After some investigation I found that our backup system had a lock on one of the files, so I released the file from the backup software and re-ran the consolidation, which completed, and all was good again!
The troubleshooting steps to identify the locked file can be found here: Investigating virtual machine file locks on ESX/ESXi
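For reference, the KB's lock check boils down to dumping the file's metadata on an ESXi host. The snippet below is a dry-run sketch: the vmdk path is a placeholder, and the RUN prefix just prints the command (clear it to execute on a host).

```shell
# Dry-run sketch of the KB's lock check; the vmdk path is a placeholder.
RUN="${RUN:-echo}"
$RUN vmkfstools -D /vmfs/volumes/datastore1/myvm/myvm-000001-ctk.vmdk
# On a real host, the owner field in the output contains the MAC address
# of the ESXi host (or, here, the backup server) holding the lock.
```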

Previously I've also been able to resolve the issue of being unable to consolidate vm disks by creating a clone of the troubled vm, bringing it up as the active vm and then deleting the old one. Not always possible in a production environment, though!

Tuesday, 11 September 2012

vCenter Operations Manager not displaying Risk or Efficiency data

So I finally made the upgrade from CapacityIQ and deployed VMware's new vCenter Operations Manager in its place. The upgrade process of deploying the new appliance was straightforward and error-free.

During the installation process you have the option to import your old data and settings from the CapacityIQ appliance into the new vCOM database so you don't lose any of the existing trending information etc.

This process worked like a charm, but after a few days I noticed that the Risk and Efficiency data never populated on the dashboard screen, and I was not able to get any capacity or trending information.

After looking at a few blogs and the excellent VMware Communities I was still not able to find out why this was not working, so I logged a support call. The answer was simple and, when you think about it, obvious.
Below is the summary provided to me by support:

To calculate time remaining and capacity remaining metrics, there are overall 5 resources that we consider
 
* cpu
* memory
* disk IO
* disk space
* network IO
 
However, these 5 resources do not apply to all object types. So under the hood, we actually consider a subset of the applicable resources for each object type. For example, for a datastore object we consider only selected resources out of the disk space and disk IO resources; for a vm object we consider only selected resources out of the cpu, memory and disk space resources; for hosts and up, we consider selected resources out of all resources. For a given object type, if all the applicable resources are unchecked (i.e., none are selected), the metric calculation module is unable to figure out the metric dependency and unable to calculate the time remaining or vm remaining values.
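The logic support described can be sketched as a toy shell model (this is purely illustrative, not vCOps code; the resource-to-object mappings follow support's examples above):

```shell
# Toy model: a metric is computable only if at least one applicable
# resource for that object type is selected.
selected="cpu memory"          # what our CapacityIQ import carried over

computable() {                 # $1 = space-separated applicable resources
    for r in $1; do
        case " $selected " in *" $r "*) return 0 ;; esac
    done
    return 1
}

computable "disk_space disk_io" && echo "datastore: ok" || echo "datastore: empty"
computable "cpu memory disk_space" && echo "vm: ok" || echo "vm: empty"
```

With only CPU and memory selected, the datastore check comes back empty, which is exactly why the Risk and Efficiency panes had nothing to show.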


Now, in CapacityIQ we never cared about disk capacity as a factor in our host capacity reports, as we run several SANs attached to our ESXi clusters and this space is only carved up and added to the environment on a per-need basis. We were primarily concerned with CPU and memory, so the disk settings were not selected in CapacityIQ and did not come across to the new vCOM deployment when we imported the settings and data.

In our case, simply adding 'Disk Space capacity and usage' and/or 'Disk I/O capacity and usage' in the "Capacity & Time Remaining" configuration panel solved the problem!
When the analytics process next ran on the system (1am by default), the Risk and Efficiency areas populated and all was well.

The support guy did mention that this is being fixed in version 5.6 so that at least one of the applicable resources for each object type is checked, but for now it's a manual process.


Thursday, 6 September 2012

VMware SRM 5 recovery plan environment scripts

How to create a recovery plan script in SRM5 that will perform different tasks depending on whether the recovery plan is in test mode or recovery mode.



It's pretty easy to add scripts to recovery plans in SRM5 to perform all sorts of tasks in recovered environments or VMs, but what if you need the script to do something different when it is run in a test scenario, like adding some test-environment-specific routes or some hosts file entries to allow recovered VMs to talk to one another on a non-production LAN (where no DNS or gateways exist, for example)? Well, thanks to SRM5 you can make use of some environment variables which are injected into the recovered VMs by the SRM service in order to do just that!

The main variable to look at here is VMware_RecoveryMode. This variable is set to either test or recovery depending on how the recovery plan is being run, and so can be referenced in your script to act differently according to its value.

A basic example of this can be found in the below script, which is a simple batch file...

@echo off
REM Branch on the SRM-injected environment variable: 'test' or 'recovery'
IF "%VMware_RecoveryMode%"=="test" (Goto TestRun) Else (Goto OtherRun)

:TestRun
REM Grab this VM's IPv4 address to use in the test-environment routes
for /f "delims=: tokens=2" %%a in ('ipconfig ^| findstr /R /C:"IPv4 Address"') do (set tempip=%%a)
set tempip=%tempip: =%

route -p add 10.10.1.0 mask 255.255.255.0 %tempip%
route -p add 10.10.2.0 mask 255.255.255.0 %tempip%

Echo Routes applied to test environment on %date% at %time% >> c:\srm\srmlog.txt

Echo 10.99.53.13 server1.company.com >> %windir%\system32\drivers\etc\hosts
Echo Host file entries applied on %date% at %time% >> c:\srm\srmlog.txt
EXIT

:OtherRun
IF "%VMware_RecoveryMode%"=="recovery" (Goto RecoveryRun) Else (Echo An unexpected result occurred on %date% at %time% >> c:\srm\srmlog.txt)
EXIT

:RecoveryRun
Echo Recovery started on %date% at %time% >> c:\srm\srmlog.txt
EXIT

This script checks whether the recovery mode is 'test' and, if it is, runs the commands under the :TestRun section.
If the mode is not test, it checks whether it is in recovery mode and, if so, runs the commands under the :RecoveryRun section.
If for some reason it sees neither test nor recovery in the variable, it just writes a simple log entry to the C:\srm folder of the VM running the script.

There are other environment variables available to play with too, which can be found in the SRM administrator's guide, so hop over to VMware and check it out.