Hotfix KB982210 – Windows Server 2008 R1/R2 Hang at Logon Bug Deep Dive

 

(WinSE Bug # 318935)

This Hotfix (KB982210), “The startup time increases every time after you backup the data on all disks of the computer in Windows Server 2008 if the computer runs some virtual machines,” came out on June 13th. It addresses a problem I battled and worked on with Microsoft for about a year.

Detailed Symptoms:

During the first logon after a reboot, the server will hang, most commonly at the “Welcome” screen. (Sometimes, if you log on immediately after being presented with the Ctrl-Alt-Del prompt, you will be able to get to the desktop before it freezes.)  This will happen whether you connect locally or through RDP.  While in this hung state, you can ping the server and connect to a UNC path, i.e. \\SERVERNAME\C$, but you will be unable to remotely manage the server, i.e. Eventvwr, Computer Management, WMI, WINRS etc…  After a period of time, which could be minutes to hours, the server will become fully responsive again.  Subsequent logons without rebooting the system will not reproduce the logon hang.

Standalone Hyper-V Hosts:

Standalone Hyper-V hosts will experience the symptoms as outlined above, but the VMs running on the host will not be affected.

Clustered Hyper-V Hosts:

When a node of a clustered Hyper-V host experiences the logon hang, the cluster will become unstable and VMs living on the affected host or on other hosts will begin to crash, triggering a high-availability reaction from the cluster.

Products or processes that will cause this problem:

Any product that mounts and dismounts a VHD.  Backup products or procedures that utilize host based Hyper-V VSS based backup solutions like DPM 2007/2010, or utilization of the Diskshadow utility, will produce the logon hang as the number of orphaned devices within the registry grows.  Commonly these solutions mount a Hyper-V VSS snapshot.

The screen print above shows the extent of the registry bloat of the SYSTEM Registry Hive on a Hyper-V host that has taken daily host based snapshots for the past year.

 

The screen shot above shows the SYSTEM Registry Hive of a normal Hyper-V host system that was just installed.
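
The hive size shown in the screen shots can also be checked from a script. The following is a minimal Python sketch, not a supported tool. The default hive path and the 50 MB threshold are assumptions chosen for illustration, and reading the size of the live hive may require an elevated prompt.

```python
import os

# Assumed default location of the SYSTEM hive on a standard install.
DEFAULT_HIVE = r"C:\Windows\System32\config\SYSTEM"


def hive_size_mb(path=DEFAULT_HIVE):
    """Return the size of a registry hive file in megabytes."""
    return os.path.getsize(path) / (1024 * 1024)


def is_bloated(size_mb, threshold_mb=50.0):
    """Flag a hive larger than an arbitrary, illustrative threshold."""
    return size_mb > threshold_mb
```

Calling `is_bloated(hive_size_mb())` on the host gives a quick yes/no answer. Pick a threshold that fits your own baseline, for example the hive size of a freshly installed host.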

Workaround:

For the last two months I have used the DevNodeClean utility workaround successfully to help clean out these orphaned devices from the registry to head off potential logon hangs.  I created a scheduled task utilizing the utility that ran every day as well as a shutdown script to ensure that when a system was rebooted and I attempted a logon, there would be no hang.
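
A daily scheduled task like the one described above could be registered with the built-in schtasks command. The sketch below only builds the argument list. The task name, start time and the location of DevNodeClean.exe are hypothetical, and no DevNodeClean command-line switches are assumed.

```python
import subprocess


def build_schtasks_args(exe_path, task_name="DevNodeClean-Daily", start_time="02:00"):
    """Build a schtasks command that runs exe_path once a day as SYSTEM."""
    return [
        "schtasks", "/Create",
        "/TN", task_name,    # task name (hypothetical)
        "/TR", exe_path,     # program to run; quote it if the path has spaces
        "/SC", "DAILY",      # run every day
        "/ST", start_time,   # 24-hour HH:mm start time
        "/RU", "SYSTEM",     # run under the local SYSTEM account
        "/F",                # overwrite an existing task with the same name
    ]


def register_task(exe_path):
    """Create the task; call this from an elevated prompt on the host."""
    subprocess.run(build_schtasks_args(exe_path), check=True)
```

For example, `register_task(r"C:\Tools\DevNodeClean.exe")` would create the task, assuming the utility was copied to that folder.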

You can reproduce this by creating a VHD and repeatedly mounting and dismounting the VHD.  If you use the DevNodeClean utility or a utility like DeviceRemover, you will see that each time you mount the VHD it leaves a distinct orphaned storage device.  Do enough of these and you will see the slow logon behavior that can cause instability.  For my testing I created a simple script that mounted and dismounted a small VHD file 4000 times.  Upon logon after a restart I would observe the hang.  After running the DevNodeClean and restarting again, there would be no logon hang or management issues.
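
A reproduction along these lines can be sketched in Python by driving diskpart. This is not the script used in the testing above, only an illustration under the assumption that diskpart's vdisk commands are available (Windows 7 / Windows Server 2008 R2). The VHD path is hypothetical. Run it only on a test system, since every cycle is expected to leave another orphaned device behind.

```python
import os
import subprocess
import tempfile


def diskpart_script(vhd_path):
    """Return diskpart commands that attach and then detach one VHD."""
    return "\n".join([
        f'select vdisk file="{vhd_path}"',
        "attach vdisk",
        "detach vdisk",
    ]) + "\n"


def mount_cycle(vhd_path, count):
    """Attach and detach vhd_path `count` times by running diskpart /s."""
    # Write the command script once and reuse it for every cycle.
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
        f.write(diskpart_script(vhd_path))
        script = f.name
    try:
        for _ in range(count):
            subprocess.run(["diskpart", "/s", script], check=True)
    finally:
        os.remove(script)
```

For example, `mount_cycle(r"C:\test\small.vhd", 4000)` would repeat the cycle 4000 times, matching the count mentioned above.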

This registry bloat, caused by the orphaned devices left behind by VHD mounts, and the enumeration of these orphaned devices at the first logon after a reboot are what is behind the “slow logon.”  Even though the DevNodeClean utility helps with logon time, it does not shrink the size of your SYSTEM Hive once the orphaned storage devices have been removed.

To shrink the hive, you will have to run the CHKREG utility:

1. Copy chkreg.exe to the root of C:\ on your server

2. Boot the server using your Windows PE boot disk

3. Copy “C:\Windows\System32\config\SYSTEM” to the root of C:\

4. Open up a command prompt and change to the root of C:\

5. Run the following commands waiting for each one to complete before continuing with the next:

Chkreg /F SYSTEM /R

Chkreg /F SYSTEM /C

6. The new file will have a .bak extension, i.e. SYSTEM.BAK

7. Make a backup copy of SYSTEM in the “C:\Windows\System32\config” folder

8. Copy SYSTEM.BAK to the “C:\Windows\System32\config” folder and remove the .BAK extension

9. After the copy has completed, reboot the server

 

I am currently testing the Hotfix on some of my Windows Server 2008 R2 with Hyper-V servers and will let you know of the long-term results.  I have confirmed that the Hotfix will not run on Windows Server 2008 R1 with Hyper-V even though the problem also exists on that platform.  The Hotfix will install on Windows 7.  Keeping the DevNodeClean utility procedure would seem to be the only course of action for Windows Server 2008 R1 at this time.  From my work with the Microsoft support team, this was classified as a Windows OS bug and not a Hyper-V R2 bug, which makes sense since it is reproducible on both Windows Server 2008 R1 and Windows 7 by mounting and unmounting a simple VHD file.

Errors in the KB982210 article:

Prerequisites:  To apply this hotfix, you must be running Windows Server 2008 R2.  Additionally, you must have the Hyper-V role installed. (Hotfix will install on Windows 7 obviously without the Hyper-V role)

Restart Requirement: You do not have to restart the computer after you apply this hotfix.  (Hotfix requests a restart.  Whether it is necessary is unknown.)

Windows Server 2008 R2 file information notes: For all supported IA-64-based versions of Windows Server 2008 R2 (The article states it is for systems running “Windows Server 2008 R2 and that has the Hyper-V Role installed.”  IA-64 does not have the Hyper-V role available.  This points to this being a Windows Server 2008 bug and not having a direct connection to Hyper-V.)

Title of the article: “The startup time increased every time after you backup the data on all the disks of the computer in Windows Server 2008 R2 if the computer runs some virtual machines.”  (There has to be a better title for this article.  Apart from its sheer length, the title is a bit confusing.  How about “Windows Server 2008 R1/R2 and Windows 7 hang at logon due to orphaned device registry bloat”)

 

-Rob


18 Responses to Hotfix KB982210 – Windows Server 2008 R1/R2 Hang at Logon Bug Deep Dive

  1. We battled with this case for several weeks on one of our VPS servers. Thank you for this detailed post. A quick addition –

    Under: Products or processes that will cause this problem:

    If MSSQL is also installed under the node, it will also fail to respond.

  2. If you have access to Microsoft on this case, this is really something that should be pushed out to all the servers. We have now confirmed this happening with 2 other VPS servers. The longer they are online, the more apparent it becomes once they get a chance to reboot.

    Some other events which I think are related and I’ve seen on every one of these cases –

    – VSS snapshots take longer and longer to complete.
    – (possible) sudden server crashes on some VSS snapshots (this one we’ll need to wait a month or 2 to see if it happens again)

  3. Just a quick heads up –

    We applied the hotfix to 3 of our R2 servers (we have many more). We haven’t rebooted 2 of them but one would hang at 32% complete on Configuring Windows Updates for about 10 minutes prior to shutting down the rest of the servers. On reboot, before logging in, it would attempt to do the same thing. So this happened 3 times before we went into safemode.

    Safemode would say “Failure Configuring Windows Updates. Reverting Changes. Do not turn off your computer.” Then it would reboot again. After 3 times, it let us into safemode. We removed the reboot.xml from the C:\windows\winsxs and were able to boot back into the system. But the issue was not resolved. The system still takes 1-2 hours to load up. Another one of our servers takes 3-4 hours to load up. All of these servers have been backing up the VMs every hour for several months.

    I think it’d be good to include how to obtain the devnodeclean utility as well in such cases. As right now, the hotfix is not working for us.

  4. James says:

    Hi, I am experiencing the problem above on one of our servers running Hyper V Server R2 and have run the devnodeclean utility, which removed many orphaned entries. Unfortunately my SYSTEM hive is still ~200GB and the server still hangs at the Welcome screen for around an hour. I attempted to compact and repair the hive using chkreg and the instructions above, however I ran into two issues. Firstly, chkreg would not run in Windows PE; I got an error to the effect that “the required subsystem is not present”. Secondly, I moved the offline copy of the SYSTEM hive to another server; chkreg /f SYSTEM /R ran OK but when running chkreg /F SYSTEM /C I get an error stating “failed to mount hive, error 0x5”. If you have any suggestions I would be much appreciative.

    • Hey James. Thanks for following. You can certainly try to reduce the size of the registry using those utilities, but let’s try this first.
      1. Install KB982210, Reboot
      2. Download and install Device Remover on the host (http://www.pro-it-education.de/software/deviceremover) Using this utility might be necessary since from my experience Devnodeclean does not remove all the orphaned devices it needs to in order to return the system to a correct boot time. Even if the Hive is still very large, it should boot correctly after removing the orphaned devices.
      3. Launch Device Remover
      4. Click on View
      5. Click on “Device Remove Device Display Mode”
      6. Click on “Show only Hidden/Detached Devices” This process could take some time if you have lots of orphaned devices. (Very important to make sure you have selected “Show only Hidden/Detached Devices” as you do not want to remove active devices.)
      7. Remove all devices in the following categories: Disk Drives, Storage Volumes, Storage Volume Shadow Copies and Storage Controllers. Click the check box for each of these and then click the “Remove all Checked” button on the middle right side.
      8. A dialogue box will pop up. Click on “Remove All Devices”. This process could take some time (hours) to complete depending on the number of devices to be removed.
      9. Reboot your server and test logon speed. At this point you should not see the Welcome hang.

      As far as chkreg, below are the steps that I followed that seemed to work for me after removing all orphaned devices.

      1. Download “chkreg.exe”
      2. Copy chkreg.exe to the root of C:\ on your server
      3. Boot the server using your Windows PE boot disk
      4. Once it comes up, copy “C:\Windows\System32\config\SOFTWARE” to the root of C:\
      5. Copy “C:\Windows\System32\config\SYSTEM” to the root of C:\
      6. Open up a command prompt and change to the root of C:\
      7. Run the following commands waiting for each one to complete before continuing with the next:

      Chkreg /F SOFTWARE /R

      Chkreg /F SOFTWARE /C

      Chkreg /F SYSTEM /R

      Chkreg /F SYSTEM /C

      8. The new files will have a .bak extension, i.e. SOFTWARE.BAK & SYSTEM.BAK
      9. Make backup copies of SOFTWARE and SYSTEM in the “C:\Windows\System32\config” folder
      10. Copy SOFTWARE.BAK and SYSTEM.BAK to the “C:\Windows\System32\config” folder and remove the .BAK extension
      11. After the copy has completed, reboot the server

      Rob

      • James says:

        Thanks for the advice Rob. I will have to wait until a suitable time to try but will give it a go. A possible issue may be that I did not run devnodeclean immediately before using chkreg, so many orphaned entries may have accumulated again. Will keep you posted. Thanks again.

  5. CypherBit says:

    Is there any news regarding Hyper-V nodes in a cluster that are all on R2 SP1? Any hotfixes or workarounds? I have a two node cluster and am getting to the point of a 200MB registry.

    Thank you in advance.

    • Stay tuned for another article coming up. It is still possible to get orphaned devices in the registry that can bloat the SYSTEM registry hive. To quickly check what the issue might be, use the DeviceRemover utility outlined in the original article and then look for “show only hidden/disabled devices” under the display mode. Look specifically for the storage and controller devices that may show up as not connected anymore. I have seen instances where the storage devices are removed automatically, but the controller devices tend to build up from my experience. Feel free to email me (big envelope on the home page) a screen shot if you have any questions and look for a new article with updated info in the next week or so.

  6. John Cheng says:

    I’m trying to use ChkReg to compact my registry but when I boot into my WinPE environment, ChkReg just errors out with a “cannot be run in Win32 mode”. I tried other boot disks and it reports “cannot run in DOS mode”. How do I get the WinPE disc that would work with ChkReg? I’ve even followed the directions from the MS website and loaded the Windows XP disc recovery mode, but it simply attempted and then failed to “fix” the registry without ever giving me a chance to enter the commands myself. Help!

    • What version of WinPE are you using? It has been a while since I ran this procedure, but I will give it a try again and see if the process is still valid. I wouldn’t see why it would change. Below are the steps directly from when I was working with Microsoft on the trouble cases.

      1. Download “chkreg.exe” from your workspace (Receive files from Microsoft):
      2. Copy chkreg.exe to the root of C:\ on your server
      3. Boot the server using your Windows PE boot disk
      4. Once it comes up, copy “C:\Windows\System32\config\SOFTWARE” to the root of C:\
      5. Copy “C:\Windows\System32\config\SYSTEM” to the root of C:\
      6. Open up a command prompt and change to the root of C:\
      7. Run the following commands waiting for each one to complete before continuing with the next:

      Chkreg /F SOFTWARE /R

      Chkreg /F SOFTWARE /C

      Chkreg /F SYSTEM /R

      Chkreg /F SYSTEM /C

      8. The new files will have a .bak extension, i.e. SOFTWARE.BAK & SYSTEM.BAK
      9. Make backup copies of SOFTWARE and SYSTEM in the “C:\Windows\System32\config” folder
      10. Copy SOFTWARE.BAK and SYSTEM.BAK to the “C:\Windows\System32\config” folder and remove the .BAK extension
      11. After the copy has completed, reboot the server
      12. Login and note the time it takes to get to the desktop

      • John Cheng says:

        I’ve used the WinPE that came with Windows 7 AIK. And I’ve also tried using Bart’s PE (made with Windows XP SP2 disc). But both environments load into a “Windows” environment with a command prompt. And Chkreg doesn’t like that environment (errors with “Cannot run in Win32 mode”). Maybe I’m using the wrong ChkReg.exe? But the only version I can find is the one Microsoft is offering which, when run, tries to install itself onto disk 6 of the Windows XP installation floppies. I basically just used an extraction tool to extract it without running the installation… So far the only environment that my Chkreg.exe seems to run on is the Windows XP Recovery Console, which didn’t work either because it tried to auto run ChkReg against the actual Windows installation instead of allowing me to run the program against a backed up registry hive file…

  7. CypherBit says:

    It appears the correct chkreg can be found in this post: http://social.technet.microsoft.com/Forums/en-US/winserverhyperv/thread/07f406c8-678e-4609-b535-7dc2073ecaa0/

    I just ran it and it greatly reduced my SYSTEM hive (from over 80MB to about 12MB).

    • John Cheng says:

      Hi CypherBit. I just tried the SkyDrive link in the URL you posted, but the zip file wouldn’t load. Would it be possible for you to email it to me? That would be GREATLY appreciated! My email is drummerjc at gmail dot com. Thank you!

    • John Cheng says:

      Never mind my request for the email. I just needed to append an “s” to make the link a https link. Skydrive is enforcing security policies so the original link without SSL doesn’t work. But I got the file now and will try it later. Thank you very much for your comment!

      • Glad it worked out. Let me know if you have any other issues. Feel free to email me directly if you would like. Click the envelope on the main VirtuallyAware.com page.

  8. Ingo says:

    In my environment, Win2008R2 with SP1, the Device Remover will not work; it always hangs after starting the remove process.
    For all who have similar problems, there is another tool on Codeplex to remove hidden devices:
    http://ghostbuster.codeplex.com

    Best regards
    Ingo

  9. John Smith says:

    just got DEVNODECLEAN from microsoft. It is taking around 10 seconds per node it deletes. We have 10,000 orphaned nodes! that’s 100,000 seconds – THAT’S OVER ONE DAY !!!!
    NOOOOOOOOOOOOOOOOOOOOOOOOOO!
