Hotfix KB982210 – Windows Server 2008 R1/R2 Hang at Logon Bug Deep Dive
(WinSE Bug # 318935)
This hotfix (KB982210), “The startup time increases every time after you backup the data on all disks of the computer in Windows Server 2008 if the computer runs some virtual machines,” came out on June 13th. It addresses a problem that I battled, working with Microsoft, for about a year.
During the first logon after a reboot, the server hangs, most commonly at the “Welcome” screen. (If you log on immediately after being presented with the Ctrl-Alt-Del prompt, you can sometimes get to the desktop before it freezes.) This happens whether you connect locally or through RDP. While the server is in this hung state, you can ping it and connect to a UNC path, e.g. \\SERVERNAME\C$, but you cannot manage it remotely with tools such as Eventvwr, Computer Management, WMI, or WINRS. After a period of time, anywhere from minutes to hours, the server becomes fully responsive again. Subsequent logons without a reboot do not reproduce the logon hang.
Standalone Hyper-V Hosts:
Standalone Hyper-V hosts will experience the symptoms outlined above, but the VMs running on the host will not be affected.
Clustered Hyper-V Hosts:
When a node of a clustered Hyper-V host experiences the logon hang, the cluster becomes unstable, and VMs living on the affected host or on other hosts begin to crash, triggering a high-availability reaction from the cluster.
Products or processes that will cause this problem:
Any product that mounts and dismounts a VHD. Backup products or procedures that use host-based Hyper-V VSS backups, such as DPM 2007/2010 or the Diskshadow utility, will produce the logon hang as the number of orphaned devices in the registry grows. These solutions commonly mount a Hyper-V VSS snapshot.
The screen print above shows the extent of the bloat in the SYSTEM registry hive on a Hyper-V host that has taken daily host-based snapshots for the past year.
The screen shot above shows the SYSTEM Registry Hive of a normal Hyper-V host system that was just installed.
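To compare one of your own hosts against the two screenshots, you can read the size of the SYSTEM hive file directly. The following is a minimal Python sketch, not part of the original troubleshooting. It assumes the default hive location, and the function name hive_size_mb is made up for illustration. Reading the config folder normally requires an elevated prompt.

```python
import os

# Default location of the SYSTEM hive on a standard installation (assumption:
# Windows is installed on C: in the default directory).
SYSTEM_HIVE = r"C:\Windows\System32\config\SYSTEM"


def hive_size_mb(path=SYSTEM_HIVE):
    """Return the on-disk size of a registry hive file in megabytes."""
    return os.path.getsize(path) / (1024 * 1024)


if __name__ == "__main__":
    if os.path.exists(SYSTEM_HIVE):
        print(f"SYSTEM hive: {hive_size_mb():.1f} MB")
    else:
        print("SYSTEM hive not found at the default path")
```

Logging this value once a day on a backup host should show whether the hive is still growing between cleanups.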
For the last two months I have successfully used the DevNodeClean utility as a workaround to clean these orphaned devices out of the registry and head off potential logon hangs. I created a scheduled task that ran the utility every day, as well as a shutdown script, to ensure that when a system was rebooted and I attempted a logon, there would be no hang.
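A daily task of this kind could be registered with the built-in schtasks command. The sketch below only builds and prints the command line; it is one possible setup, not the exact task I used. The install path C:\Tools\DevNodeClean.exe, the task name, and the 02:00 start time are all assumptions, and the utility is invoked without arguments because its switches are not covered here.

```python
def build_schtasks_command(exe_path,
                           task_name="DevNodeCleanDaily",
                           start_time="02:00"):
    """Build a schtasks command that runs exe_path once a day as SYSTEM.

    The task name and start time are illustrative defaults, not values
    taken from the original setup.
    """
    return [
        "schtasks", "/Create",
        "/TN", task_name,    # task name
        "/TR", exe_path,     # program to run (path without spaces assumed)
        "/SC", "DAILY",      # run every day
        "/ST", start_time,   # 24-hour HH:mm start time
        "/RU", "SYSTEM",     # run under the local SYSTEM account
        "/F",                # overwrite an existing task of the same name
    ]


if __name__ == "__main__":
    # Hypothetical install location for the utility.
    cmd = build_schtasks_command(r"C:\Tools\DevNodeClean.exe")
    # Print for review; paste into an elevated command prompt to create the task.
    print(" ".join(cmd))
```

Printing the command instead of executing it keeps the sketch safe to run anywhere; on the host itself the same list can be passed to subprocess.run from an elevated session.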
You can reproduce the logon hang by creating a VHD and repeatedly mounting and dismounting it. If you use the DevNodeClean utility or a utility like DeviceRemover, you will see that each mount of the VHD leaves a distinct orphaned storage device. Do enough of these and you will see the slow logon behavior that can cause instability. For my testing I created a simple script that mounted and dismounted a small VHD file 4000 times. Upon logon after a restart I would observe the hang. After running DevNodeClean and restarting again, there would be no logon hang or management issues.
This registry bloat caused by the orphaned VHD devices, and the enumeration of these orphaned devices at the first logon after a reboot, is what is behind the “slow logon.” Although the DevNodeClean utility helps with logon time, removing the orphaned storage devices does not shrink the size of your SYSTEM hive.
To shrink the hive, you will have to run the CHKREG utility:
1. Copy chkreg.exe to the root of C:\ on your server
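For anyone who wants to repeat the test, here is a minimal sketch of one way to generate the mount/dismount cycle. It is not the script from my testing. It assumes the diskpart vdisk commands that ship with Windows Server 2008 R2 and Windows 7, a VHD that already exists, and a hypothetical path C:\test.vhd. The function only produces diskpart script text, so nothing is mounted until you feed the output to diskpart yourself.

```python
def build_diskpart_script(vhd_path, cycles):
    """Return diskpart script text that attaches and detaches one VHD
    `cycles` times. The VHD is re-selected on every cycle so the script
    does not rely on focus surviving a detach."""
    if cycles < 1:
        raise ValueError("cycles must be at least 1")
    lines = []
    for _ in range(cycles):
        lines.append(f'select vdisk file="{vhd_path}"')
        lines.append("attach vdisk")
        lines.append("detach vdisk")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Preview two cycles; use 4000 to match the test described above.
    print(build_diskpart_script(r"C:\test.vhd", 2), end="")
```

Save the output to a text file and run it from an elevated prompt with diskpart /s followed by the file name. Use a test machine, since the whole point is to leave orphaned devices behind.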
2. Boot the server using your Windows PE boot disk
3. Copy “C:\Windows\System32\config\SYSTEM” to the root of C:\
4. Open up a command prompt and change to the root of C:\
5. Run the following commands waiting for each one to complete before continuing with the next:
Chkreg /F SYSTEM /R
Chkreg /F SYSTEM /C
6. The new files will have a .bak extension, i.e. SOFTWARE.BAK & SYSTEM.BAK
7. Make a backup copy of SYSTEM in the “C:\Windows\System32\config” folder
8. Copy SYSTEM.BAK to the “C:\Windows\System32\config” folder and remove the .BAK extension
9. After the copy has completed, reboot the server
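Steps 3 through 8 can also be written down as a command list for review before you boot into Windows PE. The sketch below is only an illustration. It assumes the system volume is still C: under Windows PE (it may not be), and the backup name SYSTEM.orig is a made-up choice. Python is not available in Windows PE, so generate the list on another machine and type or copy the commands across.

```python
import ntpath  # Windows path rules, usable on any platform


def build_chkreg_steps(config_dir=r"C:\Windows\System32\config",
                       work_dir="C:\\"):
    """Return the command lines for steps 3-8 of the procedure above."""
    live_hive = ntpath.join(config_dir, "SYSTEM")
    work_copy = ntpath.join(work_dir, "SYSTEM")
    compacted = ntpath.join(work_dir, "SYSTEM.BAK")
    backup = live_hive + ".orig"  # hypothetical backup name
    return [
        f'copy "{live_hive}" "{work_copy}"',     # step 3
        f'cd /d "{work_dir}"',                   # step 4
        "chkreg /F SYSTEM /R",                   # step 5, first command
        "chkreg /F SYSTEM /C",                   # step 5, second command
        f'copy "{live_hive}" "{backup}"',        # step 7
        f'copy /Y "{compacted}" "{live_hive}"',  # step 8
    ]


if __name__ == "__main__":
    for line in build_chkreg_steps():
        print(line)
```

Run the commands one at a time rather than as a batch, so that each chkreg pass finishes before the next starts.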
I am currently testing the hotfix on some of my Windows Server 2008 R2 with Hyper-V servers and will report back on the long-term results. I have confirmed that the hotfix will not run on Windows Server 2008 R1 with Hyper-V, even though the problem also exists on that platform. The hotfix will install on Windows 7. For now, keeping the DevNodeClean procedure in place appears to be the only course of action on Windows Server 2008 R1. From my work with the Microsoft support team, this was classified as a Windows OS bug and not a Hyper-V R2 bug, which makes sense since it is reproducible on both Windows Server 2008 R1 and Windows 7 by mounting and unmounting a simple VHD file.
Errors in the KB982210 article:
Prerequisites: To apply this hotfix, you must be running Windows Server 2008 R2. Additionally, you must have the Hyper-V role installed. (The hotfix will install on Windows 7, obviously without the Hyper-V role.)
Restart Requirement: You do not have to restart the computer after you apply this hotfix. (Hotfix requests a restart. Whether it is necessary is unknown.)
Windows Server 2008 R2 file information notes: For all supported IA-64-based versions of Windows Server 2008 R2 (The article states it is for systems running “Windows Server 2008 R2 and that has the Hyper-V Role installed.” IA-64 does not have the Hyper-V role available. This points to this being a Windows Server 2008 bug and not having a direct connection to Hyper-V.)
Title of the article: “The startup time increased every time after you backup the data on all the disks of the computer in Windows Server 2008 R2 if the computer runs some virtual machines.” (There has to be a better title for this article. Apart from its sheer length, the title is a bit confusing. How about “Windows Server 2008 R1/R2 and Windows 7 hang at logon due to orphaned device registry bloat”?)