Backing up Photographic Images
In chemical photography, you have only one master image of each exposure. It’s stored on the film you put into your camera. If you value the images you can make from it, that master image is precious to you. Be it original negative or original transparency, any version of the image not produced from the master will allow reduced flexibility and/or deliver inferior quality. Photographers routinely obsess over the storage conditions of their negatives, in some cases spending tens of thousands of dollars to construct underground fire-resistant storage facilities with just the proper temperature and humidity.
One of the inherent advantages of digital photography is the ability to create perfect copies of the master image, and to distribute these copies in such a way that one will always be available in the event of a disaster. In the real world, this ability to create identical copies is only a potential advantage. I have never had a photographer tell me that his house burned down, but he didn’t lose his negatives because he kept the good ones in his safe deposit box, his climate-controlled bunker, or at his ex-wife’s house. On the other hand, I have talked to several people who have lost work through computer hardware or software errors or by their own mistakes. It takes some planning and effort to turn the theoretical safety of digital storage into something you can count on. The purpose of this article is to help you gain confidence in the safety of your digital images and sleep better at night.
Before I delve too deeply into the details of digital storage, I’d like to be clear about for whom I’m writing. I’ve been involved with computers for almost fifty years, and consider myself an expert. If you are similarly skilled, you don’t need my advice, and I welcome your opinions on the subject. Conversely, if you are frightened of computers, feel nervous about installing new hardware or software, and get confused when thinking about networking, you should probably stop reading right now; what I have to say will probably not make you sleep better at night, and may indeed push you in the other direction. My advice is for those between these two poles: comfortable with computers, but short of expert.
I wish this were a simpler article. I don’t have a one-size-fits-all answer, although I’ve included some cookbook advice for three classes of users at the end of this article. I’ve tried to avoid making this discussion more complicated than it has to be, but it’s a technical subject, and it needs to be covered in some detail if you are to make choices appropriate to your situation.
Okay, let’s get started. People occasionally ask me what techniques I recommend for archiving images. I tell them that I don’t recommend archiving images at all, but I strongly recommend backing them up. Let me explain the difference.
When you create an archive of an image, you make a copy of that image that you think is going to last a long time. You store the archived image in a safe place or several safe places and erase it from your hard drive. That’s called offline storage, as opposed to online storage, which stores the data on your computer or on some networked computer that you can access easily. If you want the image at some point in the future, you retrieve one of your archived images, load it onto your computer, and go on from there.
When you back up an image, you leave the image on your hard drive and make a copy of that image on some media that you can get at if your hard drive fails. Since the image is still on your hard drive, your normal access to the image is through that hard drive; you only need the backup if your hard drive (or computer, if you haven’t taken precautions) fails.
The key differentiation between the two approaches is that with archiving, you normally expect to access the image from the archived media, as opposed to only using the offline media in an emergency. A world of difference stems from this simple distinction.
I’m down on archiving for the following reasons:
Media life uncertainty. It is difficult – verging on impossible – to obtain trustworthy information on the average useful life of data stored on various media. Accelerated aging of media is a hugely inexact undertaking. Manufacturers routinely change their processes without informing their customers. The people with the best information usually have the biggest incentive to skew the results.
Media variations. Because of manufacturing and storage variations (CD or DVD life depends on the chemical and physical characteristics of the cases they are stored in, as well as temperature and humidity), you can’t be sure that one particular image on one particular archive will be readable when you need it.
File format evanescence. File formats come and go, and by the time you need to get at an image you may not have a program that can read that image’s file format.
Media evanescence. Media types come and go (remember 9-track magnetic tape, 3M cartridges, 8-inch floppies, magneto-optical disks, the old Syquest and Iomega drives?). By the time you need to get at an image you may not have a device that can read your media.
Inconvenient access. You may have a hard time finding the image you’re looking for amidst a pile of old disks or tapes.
Difficulty editing. Over time, your conception of an image worth keeping probably will change. But it’s a lot of trouble to load your archived images onto your computer, decide which ones you really want, and create new archives, so you probably won’t do that. You’ll just let the media pile up. Your biographers will love you for that, but it will make it harder to find a place to store the archives, and harder to find things when you need to.
If you follow my advice and go for the backup strategy, you keep all of your images on your hard disk(s). They will be stored somewhere else as well, but they’re always on one of your computers. There are many advantages to this approach.
Ease of access. You can always immediately get to any image. For large image collections, you will probably want to use a program that assists you in finding images. More on this later.
Handy backups. If you have some onsite redundancy (more on that later) you will probably never have to use your offsite backup copies.
Backups don’t need to last long. The life of the data stored on your backup copies is not an issue, since that data will be refreshed every few months. The life of rewritable backup media remains a factor, but a much smaller one since you will be changing backup media every three to seven years because of device obsolescence.
Evergreen file formats. If it is necessary to update file formats, you can do it automatically by applying a script to all the files you have stored on line.
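As a rough illustration of the kind of script meant here, the Python sketch below walks a folder tree, picks out files with a given extension, and hands each one to a conversion function. The function names are made up for this example, and `convert` is a hypothetical, user-supplied function; the actual format conversion depends entirely on the tools you have for the formats involved.

```python
from pathlib import Path
from typing import Callable, Iterable, List


def find_files(root: Path, extensions: Iterable[str]) -> List[Path]:
    """Return every file under root whose suffix matches one of the extensions."""
    wanted = {ext.lower() for ext in extensions}
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in wanted
    )


def convert_tree(
    root: Path,
    extensions: Iterable[str],
    convert: Callable[[Path], Path],
) -> List[Path]:
    """Apply convert() to each matching file and return the paths it produced.

    convert is a hypothetical function that writes a new file in the
    replacement format and returns its path. This sketch never deletes
    the originals; that decision is left to the caller.
    """
    # The file list is built before any conversion runs, so files created
    # by convert() are not picked up again in the same pass.
    return [convert(path) for path in find_files(root, extensions)]
```

Because all the images are online, a pass like this can run unattended over the whole collection, and the next backup cycle carries the converted files to the backup disks.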
The drawback to the online-storage-with-backup approach has traditionally been cost. But disk cost per byte has been plummeting at a greater-than-historical rate for the last ten or fifteen years, and it’s now so low that, for most serious photographers, it’s not an impediment to online storage of all your images.
Why are disk prices dropping so fast? A succession of technological breakthroughs (a couple of biggies were the giant magneto-resistive effect and perpendicular recording, for you technophiles) has dramatically increased the areal density – the amount of data that can be stored in a square inch or square centimeter. Greater areal density means more data on the same number of platters, or even on fewer platters, which makes disks cheaper per byte stored. Greater areal density allows smaller platters without sacrificing the amount of data stored, which makes disk drives cheaper. Greater areal density also increases performance: when the bits are all crammed together, more of them pass the head per unit time, and data transfer rates increase. When the tracks are closer together, seek times drop. Smaller drives also dissipate less power. Increasing areal density is a wonderful thing for computer users in general, and photographers in particular.
When an old technology is challenged by a newer one, the old technology often improves dramatically. This has been the situation with disk storage. Fifteen years ago, most technologists believed that by now, disk storage technology would have been substantially replaced by flash memory. That hasn’t happened, not because flash memory has failed to advance as predicted, but because disk storage has evolved faster than most people thought it would. This has been a good thing for photographers, especially in the last five years when the size of digitally captured images has been increasing quickly (but not as quickly as disk drive capacity). Today, leaving out scanning backs, the upper end of the size of digitally captured images is around 40 megapixels for medium format cameras and about 16 megapixels for 35mm sized cameras. Over the next five years, these numbers might double, but I wouldn’t expect much increase in image resolution above that, since we will be reaching the point of diminishing returns; we will have plenty of resolution for all but the largest size prints, and the resolution of the images will be closing in on the resolution of the lenses. If disk technology continues to advance in the next five years at anywhere near the rate of the last 10 years, the economics of storing all your images on disk will become more attractive as time goes by. Eventually, rotating magnetic memory will be replaced by some other kind of nonvolatile online mass storage, and what I have to say about storing your images on disk will probably apply equally well to that storage medium.
Let’s test my contention that disk storage has become sufficiently affordable by working out some examples with today’s pricing. If you have 10,000 raw images from a 16 megapixel camera, at 12 bits per pixel you have 240 gigabytes of data. You can buy a 500 gigabyte external disk for $140 to $200, so it will cost you about $80 to store all those images. Let’s say you are a prolific shooter and a terrible editor, with 100,000 raw images. Basic storage for those images is less than $1000. Maybe you do a lot of editing and love layers, so you’ve got to store big Photoshop or TIFF files. If your files are 500 megabytes apiece, you can store 1000 images on that under-two-hundred-buck disk.
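If you want to run the same arithmetic for your own collection, a short Python sketch follows. The function names are invented for this example, the $160 default price is simply an assumed figure inside the $140 to $200 range quoted above, and gigabytes are counted in the decimal way disk vendors use.

```python
import math

BYTES_PER_GB = 10**9  # decimal gigabytes, as disk vendors count them


def raw_collection_gb(image_count, megapixels, bits_per_pixel=12):
    """Size of a collection of uncompressed raw files, in gigabytes."""
    bytes_per_image = megapixels * 1_000_000 * bits_per_pixel / 8
    return image_count * bytes_per_image / BYTES_PER_GB


def disks_needed(collection_gb, disk_gb=500):
    """Whole disks required to hold the collection."""
    return math.ceil(collection_gb / disk_gb)


def prorated_cost(collection_gb, disk_gb=500, disk_price=160.0):
    """Cost of just the space the collection occupies (assumed disk price)."""
    return collection_gb / disk_gb * disk_price


size_gb = raw_collection_gb(10_000, 16)
print(f"{size_gb:.0f} GB, {disks_needed(size_gb)} disk(s), "
      f"about ${prorated_cost(size_gb):.0f} of disk space")
```

With 10,000 images from a 16 megapixel camera this gives 240 GB and a pro-rated cost a little under $80, in line with the figures above; changing the arguments shows how the numbers move for larger sensors or bigger collections.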
It’s not all good news. That’s just the beginning of what it will cost you to store your data dependably. Disk reliability has made great strides over the 50-year history of the device, with calculated mean time between failures now coming in at over 100 years, and field failure rates of possibly a half to a quarter of that. All the same, a sensible attitude towards a disk is to view it as a failure just waiting to happen, and to arrange things so that, when disks die, you can easily replace them and restore your data. A disk failure may be an occasion for a mildly elevated pulse, but it should not cause panic.
If you implement my backup philosophy in your own home or studio, you’re going to need three kinds of non-volatile storage. The first kind is your main disk storage. It should be fast, and large enough for all your images. The second is your backup disk storage. It should be equally capacious, but needn’t be fast. The third is your offsite image storage. I will discuss each in turn.
Main Disk Storage. This is the primary storage for your image collection. Your computer came with at least one internal hard disk. Depending on the size of that disk, you may be able to keep up with the growth of your collection of images simply by upgrading your computer every two or three years. If so, you needn’t worry about the details. If you find that you must add disk capacity, you have some decisions to make. The possibilities are:
Adding one or more internal hard disks. This is probably the cheapest way to go, especially if you don’t value your time highly. You’ll need to find out if you have room for the disk(s), and if there is adequate power and cooling capacity. You should also have a look at your motherboard and see what kind of disk interface it supports, and buy disks that are compatible with that interface. You may have several interfaces to choose from. If it’s a fairly new motherboard, it probably supports SATA, and you’ll want a SATA drive. It may not support the latest and greatest SATA version, so you might not get all the performance of which your fancy new disk is capable, but it will work. If all this sounds daunting, have a trusted repair shop do the work, or read on for less invasive alternatives.
Adding one or more external hard disks. External hard disks contain the same disk drives as the internal ones, but they come packaged in their own cases, with their own power supplies, and have different interfaces than internal drives. The cost premium for the extra hardware is surprisingly low. I recently saw a Seagate 750 GB raw hard disk on Amazon for $260. The external version was ten bucks cheaper, and came with a backup program. There are three interfaces commonly employed by external drives: USB 2.0, IEEE 1394 (aka Firewire), and eSATA. Maximum transfer rates are 480 Mb/s for USB, either 400 or 800 Mb/s for 1394, and 3000 Mb/s (3Gb/s) for eSATA. You may be suspicious of these rates; I know I was. In my testing, I was pleasantly surprised to find that each interface was capable of sustained disk transfers of the large files that comprise photographic images at rates within 20% to 25% of those quoted. You probably have either USB 2.0 or one of the IEEE 1394 variants on your computer. You can use those with marginally tolerable performance for large image files, and perfectly acceptable speed for small files. If you want to use an external disk with all the performance of an internal drive, you’ll need to use the eSATA interface, which probably means that you’ll have to install a peripheral adapter card into your computer.
Adding a network file server. There are two types of network file servers. The traditional way to build one is to take a more-or-less ordinary computer with an Ethernet port, and put a lot of disk storage on it. Most desktop operating systems support some form of file sharing these days, so you don’t need server software. If you decide to use server software, you will gain enhanced management options. You can buy a computer with the idea of making it a file server, or you can upgrade an old computer that you happen to have lying around; these days the file server role makes fewer demands on computer hardware than acting as a desktop machine. An alternative to using a standard computer as a file server is to buy a box designed from the ground up as a file server. These devices are referred to collectively as Network Attached Storage. A NAS box will typically be smaller, cheaper, more reliable, and less power-hungry than a file server of similar capacity built from off-the-shelf computer parts. You can buy a 2 TB NAS box for a few hundred dollars more than the cost of the disks alone. A downside of a NAS system is that you will usually have limited backup options. No matter which way you obtain your file server, you will have to live with the performance limitations of network access. You want your file server and your workstations to support gigabit Ethernet, which offers raw transfer rates of 1 Gb/s. Unlike the disk interfaces, you will never see anything close to this rate for actual disk transfers. You will probably obtain rates of about 100 Mb/s for photographic image transfers, or about a quarter the speed of the slowest of the external disk interfaces, and less than 4% of the speed of the fastest. That means that a 40 MB image will load in about 4 seconds, a 500 MB image in about a minute. These numbers get worse, but nowhere near ten times worse, if you use 100 Mb/s Ethernet.
You may want to copy images from the file server to your hard disk in bulk, work on them, and copy them back when you’re done.
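To see what these rates mean for your own file sizes, here is a small Python sketch. The table of effective rates is an assumption built from the figures above: 100 Mb/s for the file server, and 75% of the quoted maximum for each external interface (the low end of the "within 20% to 25%" observation). The names are made up for this example, and real transfers will vary with the disks and the operating system.

```python
def transfer_seconds(file_megabytes, effective_megabits_per_second):
    """Time to move a file of the given size at a sustained effective rate."""
    return file_megabytes * 8 / effective_megabits_per_second


# Assumed sustained rates in Mb/s, derived from the figures quoted in the text.
EFFECTIVE_RATES_MBPS = {
    "gigabit Ethernet file server": 100,
    "USB 2.0": 480 * 0.75,
    "IEEE 1394 (400)": 400 * 0.75,
    "IEEE 1394 (800)": 800 * 0.75,
    "eSATA": 3000 * 0.75,
}

for name, rate in EFFECTIVE_RATES_MBPS.items():
    small = transfer_seconds(40, rate)
    large = transfer_seconds(500, rate)
    print(f"{name:30s} 40 MB: {small:5.1f} s   500 MB: {large:5.1f} s")
```

At 100 Mb/s the sketch gives a bit over 3 seconds for a 40 MB file and 40 seconds for a 500 MB file, the same ballpark as the rounded figures above, which is why bulk copying to a local disk before an editing session can be worthwhile.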
Backup Disk Storage. My principal tenet for disk-based storage is that no single hardware failure, few double hardware failures, and no foreseeable software error should cause loss of data that cannot be recovered in a few minutes of reconstructive work. This means that onsite data must be stored in at least two places. The candidates are the same as for online storage: internal hard disks, external hard disks, and network file servers. However, we evaluate them differently when talking about backup. I think internal hard disks are a non-starter for backup. There are too many things that can go wrong inside a computer that can take out data on a hard disk, and many of those things can take out data on several hard disks. If you must have your backup storage on the same computer as your primary storage, at least put it on an external drive with its own power supply. If your primary computer has a meltdown, you can get running quickly after you replace it by just hooking your external hard disk up to the new computer. You get even more isolation from a single failure with network attached storage. Don’t worry about having a fast interface to your backup hard drive; the actual backups will be done in the background, possibly when you’re asleep, and you won’t care how long they take.
Offline Image Storage. The traditional way to back up disk drives has been magnetic tape. Unfortunately, magnetic tape has not advanced as rapidly as disk storage in the past fifteen years and is now the backup media of choice only for systems with substantially more than a terabyte of storage. Let’s consider backing up a single 750 GB drive. A 36 GB DAT tape drive will cost you almost $1000, and you’ll have to shuffle 20 tapes to do one backup. A Super DLT drive costs more than $2000, and you’ll have to swap in five cartridges. The big AIT drives are a little better deal at around $1600 for a unit that will let you back up that big disk into 5 cartridges. You can buy robots that will swap the tapes for you, but that drives the costs way up; remember that we’re backing up a drive that only costs about $250. What about disks? You already have a CD writer, but it would take more than a thousand CDs to do that backup. Even if the CDs were free, you wouldn’t go to the trouble. You could back up the drive to DVDs, but it would take 150 disks. Sony has just announced a BluRay read/write drive; even with 25 GB per disk, we’re talking 30 disks to do our backup. Several vendors have cartridge-based hard disk systems; these are more rugged than ordinary hard disks, but also more expensive and less capacious. In my opinion, the cheapest and most convenient backup for smaller hard disk systems is – wait for it – hard disks. Have one or two external 500 GB or 750 GB hard disks on a computer and make sure that this storage always contains a backup copy of all your images. Have an identical set of disks in reserve. Every month or so, take the external disk drive(s) to an offsite storage facility like a safe deposit box, and bring the reserve disk(s) home. The cost to back up our 750 GB disk? Less than three hundred bucks, a bargain.
Be careful when you transport your disks; they’re not as fragile as they used to be, but you still don’t want to subject them to any but the most gentle of mechanical shocks. If the road from your house to the bank is rutted and bumpy, you may want to consider tape backup, or just invest in a roll of bubble-wrap. Wrapping them in aluminum foil and tossing in a desiccant can’t hurt either.
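The media counts in the comparison above are easy to recompute for a different drive size. In the Python sketch below, the Blu-ray and DAT capacities come from the text, while the CD and DVD capacities (700 MB and 4.7 GB, single-layer discs) are assumptions; the names are invented for this example. The results land close to, though not exactly on, the rounded figures quoted above.

```python
import math

# Capacity per piece of media, in gigabytes.
MEDIA_CAPACITY_GB = {
    "CD": 0.7,        # assumed 700 MB disc
    "DVD": 4.7,       # assumed single-layer disc
    "Blu-ray": 25,    # figure from the text
    "DAT tape": 36,   # figure from the text
}


def media_needed(data_gb, capacity_gb):
    """Number of discs or cartridges required to hold data_gb of data."""
    return math.ceil(data_gb / capacity_gb)


for name, capacity in MEDIA_CAPACITY_GB.items():
    print(f"{name:10s} {media_needed(750, capacity):5d} pieces for 750 GB")
```

A second external hard disk of the same size as the primary needs exactly one piece of media, which is the whole argument for disk-to-disk backup in a nutshell.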
One intriguing possibility for offline image backup is Internet data storage. Until a couple of months ago, the only people offering this service were catering to people with only a few gigabytes of data to store, but now there is a service that offers a terabyte of storage for a thousand dollars a year; they will probably be followed by others. As a way to make your images accessible wherever you are, this could be a winner. However, your bandwidth to the Internet may be a problem, not for uploading the data, but for recovering it in the event of a disaster. At T1 speeds, 1.5 Mb/s, it would take about two months to restore a terabyte of data. If you tried downloading this much data continuously over a DSL or cable line, you’d probably get a notice from your ISP requesting that you take your business elsewhere.
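For those who want to check restore times against their own connection, here is a small Python sketch. It assumes the link delivers its full nominal rate around the clock with no protocol overhead, which is optimistic, and it counts a terabyte as 10^12 bytes; the function name is made up for this example.

```python
SECONDS_PER_DAY = 86_400


def restore_days(data_terabytes, link_megabits_per_second):
    """Days needed to download the data at a sustained nominal link rate."""
    bits = data_terabytes * 1e12 * 8
    seconds = bits / (link_megabits_per_second * 1e6)
    return seconds / SECONDS_PER_DAY


print(f"1 TB over a 1.5 Mb/s link: {restore_days(1, 1.5):.0f} days")
```

A terabyte over a nominal 1.5 Mb/s link works out to roughly two months of continuous downloading, so for a collection of this size an Internet service is better thought of as a last-resort copy than as the copy you plan to restore from.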
How to move your data around among disks. You could move images from your primary storage to your backup storage simply by dragging changed files over. I don’t recommend this; you want to use a method that keeps your backup data current automatically. You don’t want to have to remember which files you changed, and you probably don’t want to wait the time required to replace all files on the backup device every time you edit a few images. Most of all, you don’t want to have to think about backup during your normal working day; it should just happen. There are many backup programs that will perform disk-to-disk backup; one of them probably came with your operating system. I don’t recommend those either, if they use a compressed file format that’s not the native format of the OS. The kind of backup program you want goes under the name of file and folder synchronization program, or sometimes disk-to-disk backup program; an Internet search on “file sync” will yield a slew of candidates. The kind of synchronization you will be performing is one-way synchronization; that is, copying data from your primary image store to the backup image store. You can set up a file synchronization program to copy files from your primary disk whenever a file changes, or after waiting a set period of time. I recommend the latter choice, since it allows you to avoid having your machine bog down performing the backup while you’re working on your images. Some sync programs let you set different criteria for different file folders. Most of these programs also offer options that let you save the last few versions of each file, which can be useful if you overwrite a file with an edit that you later regret.
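To make the idea of one-way synchronization concrete, here is a minimal Python sketch of what such a program does on each pass. It is an illustration under stated assumptions, not a substitute for a real sync program: the function names are invented, a file counts as changed when its size differs or the primary copy is newer by more than a two-second slack (an assumed allowance for backup drives with coarse timestamps), files deleted from the primary are not removed from the backup, and no earlier versions are kept.

```python
import shutil
from pathlib import Path

MTIME_SLACK_SECONDS = 2.0  # assumed allowance for coarse timestamps on the backup drive


def needs_copy(src: Path, dst: Path) -> bool:
    """True when the backup copy is missing, a different size, or older."""
    if not dst.exists():
        return True
    s, d = src.stat(), dst.stat()
    return s.st_size != d.st_size or s.st_mtime - d.st_mtime > MTIME_SLACK_SECONDS


def sync_one_way(primary: Path, backup: Path) -> list:
    """Copy new or changed files from primary to backup; return what was copied."""
    copied = []
    for src in sorted(primary.rglob("*")):
        if not src.is_file():
            continue
        dst = backup / src.relative_to(primary)
        if needs_copy(src, dst):
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)  # copy2 also carries over the modification time
            copied.append(dst)
    return copied
```

Run on a timer, a pass like this touches only the files that changed since the last pass, and the backup stays in the operating system's native file format, so any file on it can be opened directly.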
Here’s a sure-fire way to ruin a day. You’ve been backing up to tape for years, but have never done a restore. You have a disk failure. No problem, you think, as you find and mount the backup tape. The backup tape proves to be corrupted. You find the previous tape. It’s bad too. Turns out all your backup tapes are bad. The moral: whatever backup software you use, you need to make sure that it works, and that it keeps working. This is one of the reasons I’m down on the backup programs that generate a monolithic file with all of your images rolled up into it. The only way to make sure you can get at your backup data is to do a trial restoration. I know lots of people who resist trial restorations, because they’re a pain. If you’ve got a terabyte of data in your backup, you need a terabyte of free disk space to do the restoration. You may not have that space available. Doing a trial restoration over your primary data is a move that makes most people a little queasy. Restoring a lot of data also takes a while.
The file-sync approach avoids all that. Your data is stored right out in the open, and you can open a few files and make sure they are uncorrupted. You can also use the file compare option of most file sync programs to compare all of your backup files to the primary ones. By the way, one of the nice things about most of the NAS systems is that you can have them send you an email if there’s a problem, plus a regular status report. That way you don’t have to remember to go out and check up on your storage; after a few weeks of looking at an email from your NAS box every morning, you’ll notice if it’s missing.
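The compare step can be sketched in a few lines of Python using the standard library's `filecmp` module. This is only an illustration of the idea, with an invented function name; a real sync program's compare option will have its own reporting and may use checksums instead of a byte-by-byte read.

```python
import filecmp
from pathlib import Path


def verify_backup(primary: Path, backup: Path) -> dict:
    """Compare every primary file with its backup copy, byte for byte."""
    filecmp.clear_cache()  # make sure a repeated check really re-reads the files
    missing, different = [], []
    for src in sorted(primary.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(primary)
        dst = backup / rel
        if not dst.is_file():
            missing.append(rel)
        elif not filecmp.cmp(src, dst, shallow=False):
            different.append(rel)
    return {"missing": missing, "different": different}
```

Two empty lists mean every primary file has an identical copy on the backup disk. A full comparison reads every byte on both disks, so it is a job for overnight rather than for the middle of an editing session.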
A word on compression. Avoid it, unless it’s part of the image file format. Lossless compression (the kind that doesn’t affect image quality) doesn’t work well on photographic images. It won’t damage the images, but you will find that the losslessly compressed files are only slightly smaller than their uncompressed versions. In addition, compression offers opportunity for new adventures in obsolescence.
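If you would like to check this on your own files, the Python sketch below measures how much a general-purpose lossless compressor shrinks a file. It uses the standard library's `zlib` as a stand-in for whatever compressor a backup program might apply, which is an assumption, and the function names are invented for this example. It reads the whole file into memory, so try it on a handful of images rather than the entire collection.

```python
import zlib
from pathlib import Path


def compression_ratio(data: bytes, level: int = 9) -> float:
    """Compressed size divided by original size; near 1.0 means little gain."""
    if not data:
        raise ValueError("cannot measure the compression ratio of empty data")
    return len(zlib.compress(data, level)) / len(data)


def file_compression_ratio(path: Path) -> float:
    """Compression ratio for the contents of one file."""
    return compression_ratio(path.read_bytes())
```

A ratio close to 1.0 on your image files means the compressor is buying almost nothing, and leaving it off keeps one more layer of software out of the path between you and your data.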
How to organize your images into folders. I wish I knew more about this. I would welcome an article on the subject from anyone with a well-reasoned point of view. I do have one recommendation. Keep raw and finished files in completely separate folder trees, since the kind of scripts you will want to run against the two kinds of files will be different.
How to get at the image you want. If you are incredibly organized, you may be able to create a directory tree that will let you find what you’re looking for just by sorting through folders. For the rest of us, some kind of image organizer is essential by the time the number of images in your collection hits quadruple digits. Image organizers let you assign keywords to images and retrieve them by searches. They let you group images together independently of the folders where the files reside. They let you rank and flag images. There are many organizers around, and there are several programs that include organizers and do much more. I have a perspective on choosing an organizer. Take the long view. Many organizers work by creating a database in proprietary format with the information that you enter about your images. You will invest a great deal of time in assigning rankings and keywords to images. You will upgrade computers and operating systems many times over the life of your image collection. Try to find an organizer that will be around for years, and will be updated to run on newer operating systems as they are introduced. If an organizer today runs under both Mac and Windows OSs, that’s a good sign. It also inspires confidence if the company selling the organizer has a track record of shipping quality products, supporting them well, providing transition paths when they introduce new products, and making money. If you can find an organizer that does all its work with IPTC tags rather than proprietary database entries you’ll have a chance of transporting your organized image collection to another organizer program, but even that isn’t a slam dunk.
Here are my recommendations for three specific situations.
One user, one computer, a few tens of thousands of raw files, and/or fewer than a thousand edited files. Lucky you, you don’t have such a big storage requirement that you need to mess around with tape. You should have a 500 gigabyte internal or fast external disk. Use striping if performance is more important to you than saving money. Get enough external storage to store your entire collection of images plus as many as you intend to make in the next year or so. Then get a backup set of external disks. Hook one set up to your computer and use a backup or file syncing program to make sure they automatically contain a copy of all your images. Every month or so, take the external disks with your backup images to the bank and put them in your safe-deposit box, retrieving the other set of backup disks. Take them home and hook them up to your computer, letting them automatically be filled with the latest versions of your images. Minimum cost: $150 for a 500 GB internal disk, $320 for two 500 GB external disks, for a total of about $500.
One user, two computers, a few tens of thousands of raw files, and/or fewer than a thousand edited files. You don’t need tape either. Network your computers together, and keep your entire image collection on each, using a file syncing program so you can edit any image on either computer and have your changes automatically written to the disk on the other computer. For minimum cost, use external disks on one of the computers to store your images. Then swap the external drives with the drives at the bank as above. Cost: same as above. For maximum performance, keep the images on internal disks on both computers, and use striping to minimize read and write times. Then put one or more external disks on one of the computers, and do the disk swapping trick. That’ll cost you a bit more, but you’ll have your images stored in four places at once, and you’ll have a fast environment on both computers.
One or more users, two or more computers, a few hundred thousand raw files, and/or well over a thousand edited files. Congratulations. You’re into this digital photography thing in a big way. There’s a cost to your success; when you’ve got more than a couple of terabytes of image storage you’re going to need tape backup. Network all your computers together, and designate one as a file server. You can do image editing on this computer if you need to, but it’s best to keep the server software stable, so you may want to leave the fancy graphics adapter and the big display off this machine, and dedicate it to serving files. You will want the images that you’re working on stored locally for performance, but make sure all the images stored on the workstations are copied automatically to the file server. Also make sure that all images on the file server are stored on two disks using a file syncing program. Go out and blow a few thousand dollars on an LTO3 tape drive and an armful of 400 GB cartridges. Regularly back up your images to tape and take the tapes to a secure offsite storage location. Minimum cost for the disks, tape drive, and tapes: about $10,000.
Well, there it is—a distillation of what I’ve learned about image data preservation in 17 years of dealing with digital photographs. I hope that it will be useful to you.
June 2007