Anybody remember Jerry Pournelle’s column in Byte Magazine? In the spirit of that long-running account of Jerry’s struggles with technology, here’s a blow-by-blow of my experiences with the Drobo boxes. You are welcome to enjoy a bit of schadenfreude. Conclusions in the next post.
I ordered the first box from Amazon. It arrived in two days with no shipping charges thanks to Amazon’s Prime program. The unit was well-packed, and the instructions were minimal, but seemed adequate: load the disks, connect the USB cable, connect the power supply, plug in the power supply, you’re on your way. I stuck four Samsung 1 terabyte drives in the Drobo, connected it up to a server with eight terabytes of spinning storage, and watched the light show as the box booted. After boot, the drive status lights alternated orange and green for a minute or so, then settled down to solid green, indicating that the Drobo was happy. I left it there for a day or so while I sent an email to Drobo’s tech support organization asking if their control program for the Drobo, called Drobo Dashboard, was compatible with 64-bit Windows Server 2008. Their web site indicated compatibility with Server 2003, but was silent about the newer version. After a day or so, I got back an answer: no problem.
So I loaded the latest version of Drobo Dashboard onto Smithers, the server, and used it to format the four-disk RAID drive. During the formatting process, Dashboard presented me with some choices. The first was what file system to use. There were two NTFS choices: a legacy version that offers compatibility with XP machines but can’t support partitions larger than 2 terabytes, and a newer version that removes that restriction by using an addressing technique called GUID Partition Tables, but will only work with Vista, Windows Server 2003, and WS 2008. I chose the GPT NTFS. [Don’t you love it? An acronym with one letter standing for another acronym, like SAS (Serial Attached SCSI). At least it’s not recursive, like GNU (GNU’s Not Unix). But I digress…]
Once I had picked the formatting, Dashboard asked me what to tell the OS about the size of the RAID disk. Curiously, the actual size, 2.7 terabytes (four terabytes, less a bit for conversion from 1000-byte kilobytes to 1024-byte kilobytes, less 25% for the parity disk in a 4 disk RAID 5, less a bit of Drobo-specific overhead), was not an option. You see, the Drobo lies to the OS about the amount of storage available. It does that so you can, over time, load larger disks without reformatting the partition. I can see how that would be useful if you were using the Drobo for online storage, but it doesn’t appear to offer any advantage when you’re using the Drobo for backup, as was my intention. I picked 4 terabytes for the virtual partition size, the smallest option that allowed me to have all the data on one partition. The Drobo formatted itself for a minute or so, and I was ready to add data.
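For the curious, the capacity arithmetic can be sketched in a few lines of Python. This is only a rough estimate: it assumes a plain single-parity layout across equal-size disks, treats a "1 terabyte" drive as 10^12 bytes, and leaves out the Drobo-specific overhead, which I don't have a figure for. The function name is made up for illustration.

```python
def usable_capacity_tib(num_disks, disk_bytes):
    """Estimate usable space, in binary terabytes (2**40 bytes),
    for a single-parity array of equal-size disks.

    Assumption: one disk's worth of space goes to parity, as in RAID 5.
    Drobo-specific overhead is NOT modeled here.
    """
    raw_bytes = num_disks * disk_bytes
    # One disk's worth of capacity is consumed by parity (25% with 4 disks).
    data_bytes = raw_bytes * (num_disks - 1) / num_disks
    # Convert from decimal bytes to binary terabytes, as the OS reports them.
    return data_bytes / 2**40


# Four drives sold as "1 terabyte" (10**12 bytes each).
estimate = usable_capacity_tib(4, 10**12)
print(f"{estimate:.2f} binary terabytes before Drobo overhead")
```

The result comes out a little above 2.7, which squares with the figure Dashboard reported once the Drobo takes its own cut.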
For both on-site and off-site backup, I use a folder-synching program called ViceVersa Pro. It allows many different strategies, including making and managing archive copies of different versions of the same file. I don’t ask much of it. For on-site backup, I just point it at workstation folders and have it replicate them to the server. For off-site backup, I point it at server folders and have it replicate them to the backup devices. TGRMN Software, the company that markets ViceVersa, also sells a program to schedule and manage VV backups, called VVEngine. VVEngine runs in the background, as a service, and it contains a web server, so you can monitor its progress and change schedules using a web browser. The combination of the two programs is remarkably flexible and easy to use.
To add data to the Drobo, I created ViceVersa backup descriptions and scheduled them manually, adding them to VVEngine’s scheduling list as I went, so the backups would run automatically after the first time. In this manner, I added about one and a half terabytes of data to the Drobo, which I imaginatively named “Drobo1”.
Things were going so well that I ordered another Drobo from Amazon. When it came, I set it up with what I thought were four 750 gigabyte disks, called it “Drobo2”, and added a little over a terabyte of data to it. I was feeling pretty happy with my backup situation when I noticed that Dashboard was telling me that one of the drives was a 1 TB drive. The way that the Drobo sets up the RAID, space on disks that are larger than the smallest disk is wasted.
So I decided to swap out the 1 TB drive for a 750 GB one. The Drobo should then realize what had happened, and rebuild the RAID with the old data and the new disk. I didn’t want to take the chance of file system corruption because of swapping out a drive while data was being transferred, so I brought up the Dashboard, and used it to put Drobo2 into Standby mode. The Drobo went into standby all right, but the Dashboard hung, and it hung so aggressively and so completely that I couldn’t launch any other programs, including the Task Manager, where I could see if something was using a lot of CPU cycles and kill it. The programs that were already running, like DNS and Exchange, seemed to be doing fine, so I decided to wait a few hours to see if things would sort themselves out.
Five hours later. No joy. I tried to shut down. Nothing seemed to happen, but I decided to give it some time, and went to bed.
Next morning, I had a blank green screen. It had gotten part way through the shutdown operation, and hung. I really didn’t want to take the chance of file system corruption on the main OS disk, but I couldn’t see any alternative to power-cycling the server. So I popped the front panel on the Dell 2900 and leaned on the power button until I heard the fans stop. I swapped the 750 GB drive for the 1 TB one in Drobo2, which was still in Standby. Then I punched the power button on the Dell 2900 again, and the machine hung at the Power-On Self-Test (POST). I waited a few minutes, while I considered what might be wrong. It didn’t seem too complicated; the only thing that had changed about Smithers’ hardware environment since the last time it booted was the addition of the Drobos. I pulled both USB cables to the Drobos. The server booted up. At no time had I turned off the power to the Drobos. After the server was running, I plugged the USB cables back in, and Drobo2 came out of standby. Drobo1, however, sported a dead black front panel. I disconnected and reconnected the USB cable, but couldn’t get it to come up. Not knowing what else to do, I power cycled Drobo1; it ran through its booting-up light show and looked like it was happy. I opened the Dashboard to make sure, and everything looked fine. I checked out Drobo2 with Dashboard, and it seemed to have rebuilt the RAID with the new disk. I congratulated myself on dodging a bullet, swore I’d never use the Dashboard to put a Drobo into standby, and went upstairs.
Later that day, I was checking on the backup status using VVEngine when I noticed that several folders had failed to back up to Drobo1. I fired up the file explorer and found that most, but not all, of the directories in that box were empty. I fiddled around at a command line window, and it looked like the data was gone. Dashboard, however, showed that the Drobo had the same amount of data as it had had the previous day. It looked like the file system was corrupted. I knew how to fix things: reformat and transfer all of the data again. However, I didn’t know how the file system had become corrupted, and how to keep the same thing from happening again.
It seemed like it was time to give Data Robotics tech support a try. I called the number on the web site, and, after less than three minutes, was connected to a real person with an American accent. I first explained what the problem was. That was pretty simple: missing files and folders on the Drobo. I then explained the sequence of events that led up to the problem. As you can see, that was a bit more complicated, but the tech seemed to get most of it.
His preliminary diagnosis: file system corruption due to the power cycling of Drobo1. He said that you should never power cycle Drobos without first putting them into standby mode. I explained that putting the Drobo into standby mode using Dashboard resulted in an OS hang, and asked for another way to do it. He said that powering down the computer to which the Drobo is attached or disconnecting the USB cable should do it, and that he emphatically recommended the former over the latter. I explained that I had done both, and that Drobo1 had refused to go into standby. I asked him what he would have done under those circumstances. He said that the only remaining option was power cycling, but that it was dangerous. That didn’t seem very satisfying.
We went on to troubleshoot what might have caused the problem. I rebooted the server several times, changing which Drobo was attached. It would boot fine with Drobo2 hooked up, but would hang with Drobo1. The tech asked me to go into the BIOS to see if there was a setting that would keep the computer from trying to boot from a USB device, but the boot sequence that was there didn’t include any USB devices, and there was no way to disable any USB features in the BIOS. The tech had me swap the USB ports and cables to the two Drobos, with no change in symptoms, thus ruling out USB ports and USB cables as a cause. Stumped, he consulted with the escalation techs, and gave me the tech support equivalent of “Take two aspirins and call me in the morning”: upgrade the firmware on the Drobo (to a version so new that even the automatic update facility in Dashboard can’t access it), load all your data, test for the boot hang, and let me know if it happens again. The tech had been unfailingly courteous and pretty knowledgeable throughout the entire hour-plus phone call.
I read the release notes for the new firmware, and didn’t see any fix listed that seemed to apply, so I was dubious, unnecessarily so, as it turns out. I reformatted and loaded all the data to the old drive set on Drobo1 (it took two days), and shut down the server. I then swapped out the four 1 TB disks for fresh ones, and rebooted. No POST hang. I formatted the new drives, and turned VVEngine loose on them to load the data. There appears to be only one problem: a 50 GB archive won’t transfer without error. I labeled the old drives, put silicone snuggies on them, wrapped them in a Domke lens wrap, and trotted off to the bank.
Late-breaking news: Over the weekend, a flock of disk errors showed up in the error log. The offending disk is Drobo1. The Error source is Disk. The Error number is 51. The message is “An error was detected on device \Device\Harddisk4\DR5 during a paging operation.” I’ll keep you all posted.