VirtuallyAware

Experiences in a Virtual World

Archive for February 2010

When “Not” to Kill the Cluster Services In Hyper-V – Update

In a recent article I wrote for SearchServerVirtualization, I described times when it is necessary to kill the cluster service on your Hyper-V cluster.  This was a last resort for when no other tool could manipulate the cluster, e.g. Failover Cluster Manager, cluster.exe, SCVMM, etc.  In the past, this has most commonly happened sporadically when many Hyper-V VSS writer backups were being performed by DPM.  That is a different issue and I won’t go into it here.  Recently I had the pleasure of a lockup on one of my Hyper-V nodes, which left me no way to manipulate the cluster resources.  Cluster.exe would show me the resources, but commands to move a VM resource to another node just hung or left the resource in a pending state on the affected node.  So after trying every option I could come up with, and dodging calls from angry end users trying to do their jobs, I decided it was time to kill the cluster service.  All went as expected.  The hung VMs on the affected node moved over to the remaining healthy nodes of the cluster and started up.  The problem node was still up and running, but with the cluster service stopped.

This is the solution I usually recommend for this circumstance, but after a recent experience with this exact situation, I am reconsidering.  My recommended solution now is to hard power the server off.  You could try to shut down the node, but in my experience the clussvc.exe process never lets go and blocks the shutdown.  The reason I now recommend the even bigger hammer of hard powering down the problem node is for the good of the VMs.  How can this be good?  Let me say, it takes some nuggets to power down a node that may have 10-30 VMs living on it, but if there were an easier way to get them functioning again, believe me, I would have tried it before reaching this conclusion.

One strange thing that happens when you just kill the cluster service is that the problem host never flushes the network layer.  Essentially, the VMs move to another node and start back up, but they come up with the message that a duplicate IP address exists on the network, or they acquire an APIPA address.  Even when you shut these VMs down, their addresses still ping, because the networking associated with the VMs is still responding on the problem host.  Rebooting the VMs will not help.  If you do just kill the cluster service, the only thing that will help is to reboot the problem host and then reboot the VMs that were originally on the problem host and have since moved to other nodes of the cluster.  These can be a little hard to track down if you have many VMs on a particular cluster, since they scatter to the remaining healthy nodes.  So in order to allow the VMs from the problem host to come up as cleanly as possible on a healthy node with full network capabilities, I now recommend hard powering down the problem host.  It may seem a bit harsh, but it can save you a lot of time.  Do it for the good of your VMs.
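Tracking down the scattered VMs gets easier if you have a record of which node owned each VM before the failure. Below is a minimal Python sketch of that bookkeeping. It assumes you have already captured two owner listings yourself (for example, copied from the cluster tools before and after the failover); the function name, VM names, and node names are all made up for illustration and are not part of any Microsoft tool.

```python
def vms_to_reboot(before, after, problem_node):
    """Map each VM that lived on problem_node to the node that owns it now.

    before / after: dicts of {vm_name: owner_node}, recorded by hand or by
    whatever inventory you already keep (hypothetical input format).
    """
    moved = {}
    for vm, old_owner in before.items():
        if old_owner == problem_node:
            # None means the VM does not appear in the post-failover listing.
            moved[vm] = after.get(vm)
    return moved


# Hypothetical ownership listings taken before and after the failover.
before = {"vm-sql01": "hv-node1", "vm-web01": "hv-node1", "vm-file01": "hv-node2"}
after = {"vm-sql01": "hv-node3", "vm-web01": "hv-node2", "vm-file01": "hv-node2"}

for vm, node in sorted(vms_to_reboot(before, after, "hv-node1").items()):
    print(f"{vm}: reboot on {node}")
```

The result is the list of VMs that need a reboot once the problem host has been restarted, along with where to find them.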

Questions? Easier way?  Did I miss something?  Pass it along and I will post it up.

-Rob

Written by VirtuallyAware

February 26, 2010 at 8:29 AM

Posted in Uncategorized

Storage Challenges with Hyper-V

A video question-and-answer session I did with Virsto Software a few weeks back about the storage challenges I have seen in my Hyper-V environment.

Video below -

 

 

Also check out Roger Johnson’s video.

 

Written by VirtuallyAware

February 19, 2010 at 12:00 AM

Posted in Hyper-V

KB962975 – Funny Resolution

Saw this come through from http://scug.be/blogs/scdpm/archive/2010/02/18/hotfix-dynamic-disk-hotfix.aspx.  It describes a hotfix for potential problems with Windows Server 2008 systems.  The hotfix is relevant, but the wording of the resolution gave me a chuckle.

“Note This hotfix does not resolve the problem that is described in this article after this problem occurs. This hotfix only prevents the problem if this hotfix is installed in advance. Apply this hotfix to systems that will probably experience the described problem in the future. Applying this hotfix to systems that are experiencing the described problem does not resolve this problem.”

So if I have had the problem it is too late to install the hotfix, but if I haven’t had the problem it will correct the problem I have never had?

Written by VirtuallyAware

February 18, 2010 at 2:56 PM

Posted in Uncategorized

Microsoft KB977165 (MS10-015) problems affect XP and 2003.

Mass rollout of KB977165 (MS10-015) within my organization happened today.  Saw one Windows Server 2003 system that would not boot and a few XP machines that kept reporting they needed the patch after it was already installed.  Very few systems were affected.  The Windows Server 2003 system happened to be a VM, so I was able to restore the previous night’s DPM backup and work through the patches one by one, taking snapshots in between patches.  MS10-015 was the culprit.  Restoring VMs makes me love the technology more every time I have to do it.

Note:  When you are taking multiple snapshots, be aware of the amount of disk space you have on the snapshot volume.  Also remember to delete the point-in-time snapshots and shut down the server so that a proper merge process can proceed.
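As a rough illustration of that disk space check, here is a small Python sketch that reports how much room is left on a volume before you take another snapshot. The 20% threshold and the function name are my own assumptions, not a Microsoft recommendation; pick a threshold that fits the size of your differencing disks, and point it at your own snapshot volume.

```python
import shutil


def enough_room_for_snapshot(volume_path, min_free_fraction=0.20):
    """Return (ok, free_fraction) for the volume that holds volume_path.

    min_free_fraction is an arbitrary safety margin chosen for this sketch.
    """
    usage = shutil.disk_usage(volume_path)  # named tuple: total, used, free
    if usage.total == 0:
        return False, 0.0
    free_fraction = usage.free / usage.total
    return free_fraction >= min_free_fraction, free_fraction


# "." is a stand-in; substitute the path of your snapshot volume.
ok, free_fraction = enough_room_for_snapshot(".")
print(f"free: {free_fraction:.0%}, safe to snapshot: {ok}")
```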

Alternate approaches to getting your problem VMs or physical servers up and running again after hitting the MS10-015 issue can be found in many places.  Below is one that walks you through the steps quite nicely.

MS10-015 – KB977165 Causing BSOD For Some – How To Deal With The Issue

Workaround if not installing the patch.  Hopefully this is only a temporary fix until v2 of MS10-015 is released.

Microsoft Security Advisory Vulnerability in Windows Kernel could allow elevation of privilege

Written by VirtuallyAware

February 11, 2010 at 8:31 PM

Posted in Uncategorized
