Online Dynamic Hard Drive in Failed State

I ran into a new situation. Here’s the setup: I’m at a customer’s location performing a migration of VMs from one SAN array to another (svmotion). OK, not that hard, really. But as soon as you let your guard down, even on the simplest of things, that’s when you get bit by something or discover a new and challenging puzzle.

The migration consisted of converting VMs with physical mode RDMs to vmdks (again, not that hard). The key thing here is that the RDMs were in physical mode, so I needed to take the VM offline for the migration. Yeah, I know I could have converted them to virtual mode and done it online, but trust me when I say it was better to do it offline. After a grueling 24-hour move, the RDMs were now vmdks. YAY! Hoorah!

— Not so fast….

I had the customer boot up the VM for verification. Did the VM survive the move, and does your app perform normally? After a few minutes, his answer was ‘no’. Uh-oh. What is wrong?

Apparently, when the RDMs were created, they were added to Windows as a ‘Dynamic Hard Drive’ (what Windows calls a dynamic disk). Who knows whether another vendor did this later or whether it was set up that way originally. But when we opened “Disk Management”, the Dynamic Hard Drives were showing “Online” but in a failed state. What the heck does this mean? In this case, it just meant the drive had become unavailable and needed to be reactivated.

The fix? Believe it or not, right-click the drive and select “Reactivate Disk”. The 2 TB drive took all of about 2 seconds to reactivate, and everything started working EXACTLY as it should have.

Here’s the link that I followed to resolve the issue:
http://technet.microsoft.com/en-us/library/cc732026.aspx

SVMotion does not rename files – Duncan Epping

Duncan Epping posted just a few days ago on his blog, YellowBricks.com, about an issue with files not being renamed when you svmotion a VM. This is a royal pain.

Here’s a scenario of why this is aggravating. You have a VM named “TestVM”. When it was created, a folder named “TestVM” was created within a datastore, and the files inside it were labeled “TestVM.vmdk”, “TestVM.vmx”, etc. Suppose this VM was decommissioned and you wound up reusing it at a later time. Some admins would just rename the VM within vCenter and change the hostname of the virtual machine. Unfortunately, at the storage layer, the folder and files would still be named “TestVM”. This can be confusing if you have to do some cleanup in your datastores. You would come across a folder labeled “TestVM” and not know which VM it belongs to without going through each VM or running a PowerShell script to identify it. Like I said, a royal pain.
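To make the mismatch concrete, here is a minimal Python sketch of the kind of check such a script would do. It assumes you have already exported each VM's display name and its .vmx datastore path (in the usual "[datastore] folder/file.vmx" form) into a dictionary. The function name, the sample inventory, and the VM names in it are made up for illustration, and nothing here talks to vCenter.

```python
import re

# Datastore paths are assumed to look like "[datastore1] TestVM/TestVM.vmx".
_VMX_PATH = re.compile(
    r"^\[(?P<datastore>[^\]]+)\]\s*(?P<folder>[^/]+)/(?P<file>[^/]+\.vmx)$"
)


def find_renamed_vms(inventory):
    """Return (display_name, datastore, folder) for every VM whose
    datastore folder no longer matches its display name.

    `inventory` is a hypothetical export: {display_name: vmx_datastore_path}.
    """
    mismatches = []
    for name, vmx_path in sorted(inventory.items()):
        match = _VMX_PATH.match(vmx_path)
        if match is None:
            continue  # path is not in the expected single-folder shape; skip it
        if match.group("folder") != name:
            mismatches.append((name, match.group("datastore"), match.group("folder")))
    return mismatches


if __name__ == "__main__":
    # Made-up sample data: "WebServer01" is the reused and renamed "TestVM".
    sample = {
        "WebServer01": "[datastore1] TestVM/TestVM.vmx",
        "DB01": "[datastore2] DB01/DB01.vmx",
    }
    for name, datastore, folder in find_renamed_vms(sample):
        print(f"{name}: still lives in folder '{folder}' on {datastore}")
```

Anything this reports is a candidate for the svmotion rename described next.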

In the past, you could svmotion the VM to another datastore, and the svmotion process would rename the files. Unfortunately, this got left out of vSphere 5.0. Duncan’s blog gives a fix for this so that you can get back to renaming files with svmotion.

Link: http://www.yellow-bricks.com/2013/01/25/storage-vmotion-does-not-rename-files/

Maximum Switchover Timeout

I recently ran into an issue where I had to svmotion some rather large VMs (1-2 TB each) that stretched over multiple datastores. During the svmotion, the VMs would time out at various percentages, presenting this error.
[Screenshot: svmotion error]

Consulting with Prof. G (Google) turned up VMware KB article 1010045. That article states: “This timeout occurs when the maximum amount of time for switchover to the destination is exceeded. This may occur if there are a large number of provisioning, migration, or power operations occurring on the same datastore as the Storage vMotion. The virtual machine’s disk files are reopened during this time, so disk performance issues or large numbers of disks may lead to timeouts.” Yep, this was me. I was having to svmotion VMs from one datastore to another during a vSphere 5 upgrade.

The KB article discusses adding a timeout value, called “fsr.maxSwitchoverSeconds”, to the VM’s VMX file to prevent the timeout.

To modify the fsr.maxSwitchoverSeconds option using the vSphere Client:

1.) Open vSphere Client and connect to the ESX/ESXi host or to vCenter Server.
2.) Locate the virtual machine in the inventory.
3.) Power off the virtual machine.
4.) Right-click the virtual machine and click Edit Settings.
5.) Click the Options tab.
6.) Select the Advanced: General section.
7.) Click the Configuration Parameters button.

Note: The Configuration Parameters button is disabled when the virtual machine is powered on.

8.) From the Configuration Parameters window, click Add Row.
9.) In the Name field, enter the parameter name:

fsr.maxSwitchoverSeconds

10.) In the Value field, enter the new timeout value in seconds (for example: 150).
(I chose a value of 200.)
11.) Click the OK buttons twice to save the configuration change.
12.) Power on the virtual machine.
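
For reference, the end result of the steps above is a single key = "value" line in the VM’s VMX file. The following Python sketch shows what that change looks like when applied to the file’s text. It only manipulates a string: the function name and the sample VMX content are made up for illustration, and the KB procedure itself uses the vSphere Client rather than hand-editing the file. If you ever did edit a .vmx directly, assume the VM must be powered off first.

```python
import re


def set_vmx_option(vmx_text, key, value):
    """Return vmx_text with `key = "value"` set.

    Replaces the first existing entry for `key`, or appends a new line
    if the key is not present. Works on text only; it touches no files.
    """
    line = f'{key} = "{value}"'
    # Match an existing 'key = ...' entry on its own line.
    pattern = re.compile(r"^[ \t]*" + re.escape(key) + r"[ \t]*=.*$", re.MULTILINE)
    if pattern.search(vmx_text):
        return pattern.sub(lambda _m: line, vmx_text, count=1)
    # No existing entry: append, making sure the previous line is terminated.
    separator = "" if (not vmx_text or vmx_text.endswith("\n")) else "\n"
    return vmx_text + separator + line + "\n"


if __name__ == "__main__":
    # Made-up VMX fragment for illustration.
    sample = 'displayName = "TestVM"\nmemsize = "4096"\n'
    print(set_vmx_option(sample, "fsr.maxSwitchoverSeconds", 200))
```

Running the function a second time with a different value updates the existing line instead of adding a duplicate.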

From personal experience, this was a home run. It resolved my problem.