Warning: don’t bother reading this unless you care about the intricacies of Windows Server, or you are running a Drobo attached to one.
Last August, I began to have problems with my Windows 2008 server. After OS updates, Active Directory would stop working. I would take out the updates one at a time until it worked again. This went on for several months. Last month, I was doing a routine restart after some maintenance that did not involve installing updates. On reboot, the OS indicated that it was installing updates, and when the system came up, Active Directory was toast.
After trying to fix it myself, I called Microsoft.
The first support engineer I worked with was a guy named Bob, who specializes in Active Directory problems. Bob had me produce an MPS (Microsoft Product Support) report and send it to him. From that report, he diagnosed the problem as the following: “The server service hangs for about 31 seconds during the computer start and leads to the failure to start of the DFS namespace service and the netlogon service.”
He said that troubleshooting the startup of the server service was outside of his skill set, and transferred me to Varun, in the networking group. I worked with Varun over the phone while he used remote desktop software to see what was going on with the server. He made some changes to the way the server service was invoked at startup to isolate it from other services. He asked me to download and install a dump program. With some difficulty, we got the program in place. Then I finally got the idea of what he was trying to do: he was trying to run the dump program while the server service was hung. I didn’t see any way this was going to work, since the hang was so short; by the time I could log in and run the dump program, the server service wouldn’t be hung anymore.
We proved that I was right by invoking the services snap-in right after a restart and finding that the server service wasn’t hung. Had I been able to get Varun to explain his strategy up front, we wouldn’t have had to waste the time setting up the dump program.
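The timing problem here can be illustrated with a small script. What follows is only a sketch of how one might log the server service's state once a second, rather than racing to open the services snap-in by hand; it is not what Varun or I actually did. It assumes Python is available on the server, that the script is launched early enough in startup (for example from a startup task, which is itself an assumption), and that the Server service's internal name is LanmanServer. The function names are hypothetical.

```python
import re
import subprocess
import time

# "sc query <service>" prints a line such as:
#     STATE              : 2  START_PENDING
STATE_RE = re.compile(r"STATE\s*:\s*\d+\s+(\w+)")


def parse_state(sc_output):
    """Return the state name from 'sc query' output, or None if absent."""
    match = STATE_RE.search(sc_output)
    return match.group(1) if match else None


def watch(service="LanmanServer", interval=1.0, max_polls=120):
    """Print a timestamped state once per interval until the service runs."""
    for _ in range(max_polls):
        result = subprocess.run(
            ["sc", "query", service], capture_output=True, text=True
        )
        state = parse_state(result.stdout)
        print(time.strftime("%H:%M:%S"), state)
        if state == "RUNNING":
            break
        time.sleep(interval)
```

Calling `watch()` on the server would print one line per second; a long run of START_PENDING lines would show how wide the window for taking a dump really is.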
Varun’s next idea was to take a complete dump. He pointed me at a web page that explained the procedure, and said that it sometimes resulted in the server getting into an infinite series of reboots.
I didn’t see why we were more likely to catch the server service in a hung condition with a complete dump than a partial one, and I was afraid that the complete dump operation would take my server completely down. Therefore, I decided I needed a safety net. I ordered another Windows 2008 server, set up Active Directory and DNS on it, and transferred the schema master role, the domain naming master role, the relative ID master role, the PDC emulator role, and the infrastructure master role to the new server. While I was at it, I installed DHCP on the new server, so I wouldn’t need the ailing server at all if worst came to worst.
Using dcpromo, I tried to remove Active Directory from the ailing server, but the removal process failed. I tried to uninstall Exchange Server, but that process failed as well. It looked to me like the Active Directory problems that still existed were preventing the removal of these programs.
Having little confidence in the efficacy of a full dump, I decided to bite the bullet and reinstall Windows Server 2008. After the installation, and the endless restarts for updates, I noticed a familiar-sounding event in the event log: 7022 “Server service hung on starting”. Alarm bells went off in my head. What anomaly could possibly cause the server service to hang in two different installations of the OS? And not only that, but to leave it alone when the OS is freshly installed, and make it hang only after some updates? You could blame the updates, but since it’s not just one update but many, that would be futile. There had to be some piece of unusual hardware that escaped Microsoft’s testing. What could that be? In my mind there was no question: the prime suspects were the Drobos.
I shut down the server, unplugged the USB cables from both Drobos, and hit the power button. No server service hang. I shut it down again, plugged in the Drobo with the four 2TB drives, and powered it on. The server service hung. I shut it down again, unplugged the 8TB Drobo and plugged in the one with four 1TB drives, and powered it up. The server service hung again.
So the presence of either Drobo at boot up is enough to hang the server service.
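The elimination test above can be written down as a tiny truth table. This is just a sketch in Python that restates the three boot trials from the text and checks the conclusion against them; the device labels and function names are made up for illustration.

```python
# Each trial: (set of Drobos attached at boot, did the server service hang?)
# The labels are hypothetical names for the two units described in the text.
TRIALS = [
    (frozenset(), False),                 # both Drobos unplugged
    (frozenset({"drobo_4x2tb"}), True),   # only the 8TB Drobo attached
    (frozenset({"drobo_4x1tb"}), True),   # only the 4TB Drobo attached
]


def consistent_with_any_drobo(trials):
    """True if the hang occurred exactly when at least one Drobo was attached."""
    return all(hung == bool(attached) for attached, hung in trials)


def sufficient_devices(trials):
    """Devices that caused a hang when attached on their own."""
    return sorted(
        next(iter(attached))
        for attached, hung in trials
        if hung and len(attached) == 1
    )
```

Here `consistent_with_any_drobo(TRIALS)` is true and `sufficient_devices(TRIALS)` lists both units, which matches the conclusion. Note that the trials never tested both Drobos attached to the fresh installation at once, so the table says nothing new about that case.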
Next stop: Drobo tech support.