Today we have a guest poster, Jack Hogan. Over on the DPR forum, a question has been asked, and argued endlessly: when faced with a 16-stop intra-scene dynamic range, what’s the dynamic range of an image captured with a 14-bit camera? Jack responded with a little Chautauqua on how a camera works that I thought deserved some more web ink. I have edited Jack’s words for clarity. Any errors are likely mine.

Take it away, Jack:

1. We are interested in **scene DR as it is projected onto the sensing plane of our camera**, where arriving photons are collected within a rectangular area typically but not necessarily divided into smaller squarish portions (pixels) and converted to photoelectrons in order to be counted and recorded in a file. The total number of e- is the same independently of the number of pixels within the sensing area, so clearly the more the pixels the fewer the e-/pixel. We call the number of photoelectrons so collected the ‘**signal**’. It is an ‘analog’ signal independent of bit depth.
2. Photons and photoelectrons arrive and are converted with random timing, so the signal is never perfectly clean but it is always somewhat noisy. The inherent SNR of the signal is well defined and equal to the square root of its count in photoelectrons – we call this shot noise.
3. The field of view, the size of the sensor and other sensor characteristics (pixel size and shape) are all needed to define what we normally call **scene DR**.
4. Pixels are typically square and of arbitrary dimensions: i.e. we could make them 1 nm^2, 1 mm^2, the size of the sensing area or whatever.
5. However pixels do have finite physical characteristics and therefore cannot record an infinite DR (nor can they practically be made as small or as large as we want). Current pixels can collect at most about 3000+ photoelectrons per micron^2. For a current 24MP full frame camera with 6 micron pixel pitch this translates into about 75,000 e-, after which pixels ‘fill up’ and top out (in photography we say they ‘clip’ or saturate). For a pixel of twice the area the saturation count would approximately double to 150k e- – of course in that case the camera’s ‘resolution’ would be halved to 12MP.
6. Designers choose camera characteristics (including pixel size) and photographers choose Exposure so that the brightest desirable highlights found in typical photographic situations do not exceed saturation count.
7. Therefore by the definition of **eDR*** the largest scene dynamic range that the 24MP FF camera in 5) can record is 16.2 stops = log2(75,000:1). This number depends on pixel size and is independent of bit depth.
8. Is 7) the dynamic range of the natural scene? No, that’s potentially much larger. It is the scene dynamic range as referenced to that specific camera with its pixel sizes and sensor characteristics. Scene dynamic range as viewed through the sensor of another camera with other pixel sizes and characteristics would be different. For instance the 12MP camera in 5) could record the same scene with an eDR* up to 17.2 stops = log2(150,000:1).
9. While collecting and digitizing the converted photoelectrons the camera electronics adds some noise to the signal, which we typically model as if it were all added at the same time that the collected photoelectrons roll off the pixels. We call this read noise. For modern DSCs this tends to be in the 2 to 50 e- range.
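
The arithmetic behind the 16.2 and 17.2 stop figures for the 24MP and 12MP full frame examples can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this sketch, and the 1 e- floor is an assumption that ignores read noise and takes the lower bound as the signal whose shot-noise SNR equals 1.

```python
import math

def shot_noise_snr(signal_e):
    """SNR of a signal limited only by shot noise: S / sqrt(S) = sqrt(S)."""
    return math.sqrt(signal_e)

def edr_stops(saturation_e, floor_e=1.0):
    """Engineering DR in stops: log2 of saturation count over the floor.

    floor_e=1.0 is an assumption: with shot noise only, a signal of 1 e-
    has an SNR of 1. Read noise is ignored here.
    """
    return math.log2(saturation_e / floor_e)

edr_24mp = edr_stops(75_000)    # 24MP full frame example
edr_12mp = edr_stops(150_000)   # pixels of twice the area

print(f"24MP: {edr_24mp:.1f} stops")
print(f"12MP: {edr_12mp:.1f} stops")
```

Doubling the saturation count adds exactly one stop, and nothing in the calculation depends on bit depth.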

Note that so far we have always spoken about the undigitized ‘analog’ signal only: we have not decided the bit depth at which to digitize it yet.

What procedure shall we use to prepare it for printing at 8×12 with maximum dynamic range and no visible loss of detail? We need to know the minimum number of pixels required for the average human to resolve all of the detail in the 8×12 print when viewed at standard distance. Let’s say that it is 8MP.

What would the scene eDR of your canyon be as seen through the pixel size of a camera of the same format as yours but with 8MP resolution?

Say you captured the canyon with the 8MP camera. If you used a 16MP camera of the same format instead of an 8MP camera, pixel area would be halved and you would have too many pixels for your 8×12. But is the scene DR information captured by both cameras roughly the same for your purposes? Yes, because both sensors sampled the same overall area. Sure, the 16MP camera will have recorded more spatial resolution, but as far as the number of e- counted and their inherent noise is concerned, the two recordings are virtually indistinguishable when viewed at the same size.

Here is an example with signal and shot noise SNR (see 2 above) of how it would work at the same exposure with everything equivalent other than pixel size. The half sized pixels would clearly see only half the photons arrive:

- Information from average pixel in 8MP recording: average signal 100 e-, SNR 10
- Information from average pixel in 16MP recording: average signal 50 e-, SNR 7
- Information from average pixel in 16MP recording binned 2:1 into 8MP: average signal 50+50=100 e-, SNR sqrt(7^2+7^2)=10. Same as if the pixels were twice as big.
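
The same three cases can be worked through in code. This is a minimal sketch that assumes shot noise only (no read noise) and identical pixels; `bin_pixels` is a hypothetical helper written for this illustration. Signals add linearly, while the noise of each pixel adds in quadrature.

```python
import math

def bin_pixels(signal_e, n):
    """Sum n identical pixels, each with shot noise sqrt(signal_e).

    Returns (total signal, total noise, SNR). Signals add linearly,
    independent noise adds in quadrature. Read noise is ignored.
    """
    per_pixel_noise = math.sqrt(signal_e)
    total_signal = n * signal_e
    total_noise = math.sqrt(n * per_pixel_noise ** 2)
    return total_signal, total_noise, total_signal / total_noise

big = bin_pixels(100, 1)     # one pixel of the 8MP camera
small = bin_pixels(50, 1)    # one pixel of the 16MP camera
binned = bin_pixels(50, 2)   # two 16MP pixels binned 2:1

for name, (s, noise, snr) in [("8MP", big), ("16MP", small), ("16MP binned", binned)]:
    print(f"{name}: signal {s} e-, noise {noise:.1f} e-, SNR {snr:.1f}")
```

Calling `bin_pixels(25, 4)` gives the same SNR of 10 for a sensor with quarter-area pixels binned 4:1, under the same assumptions.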

Does it make any difference whether we use the data from the 8MP recording or from the binned 2:1 16MP recording as far as the eDR of the print in the specified viewing conditions is concerned? Not really, the recorded scene DR information is effectively the same at these viewing conditions. Does it make a difference to the observer when viewing the 8×12 print? Not really.

Note that we have not decided on bit depth yet. [If you are interested in how a camera can capture information whose amplitude is below the least-significant bit of the analog to digital converter, take a look here.]

* This is how DxO defines engineering DR for their purposes (it’s a fairly well accepted definition):

Dynamic range is defined as the ratio between the highest and lowest gray luminance a sensor can capture. However, the lowest gray luminance makes sense only if it is not drowned by noise, thus this lower boundary is defined as the gray luminance for which the SNR is larger than 1
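
To see how read noise enters this definition, here is a rough sketch. It assumes a simple noise model that is not spelled out in the definition itself: shot noise and read noise are independent and add in quadrature, and the lower boundary is taken as the signal at which SNR equals exactly 1. The function names are invented for this illustration.

```python
import math

def snr(signal_e, read_noise_e):
    """SNR with shot noise sqrt(S) and read noise r added in quadrature."""
    return signal_e / math.sqrt(signal_e + read_noise_e ** 2)

def floor_signal(read_noise_e):
    """Signal at which SNR == 1.

    Setting S / sqrt(S + r^2) = 1 gives S^2 - S - r^2 = 0,
    whose positive root is S = (1 + sqrt(1 + 4 r^2)) / 2.
    """
    return (1 + math.sqrt(1 + 4 * read_noise_e ** 2)) / 2

def engineering_dr_stops(saturation_e, read_noise_e):
    """log2 of saturation count over the SNR = 1 signal level."""
    return math.log2(saturation_e / floor_signal(read_noise_e))

for r in (0, 2, 50):  # 2 to 50 e- is the read noise range quoted above
    print(f"read noise {r} e-: {engineering_dr_stops(75_000, r):.1f} stops")
```

With zero read noise this reduces to log2 of the saturation count, which is the 16.2 stop figure for the 24MP example; any read noise raises the floor and lowers the result.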

Dominik says

http://theory.uchicago.edu/~ejm/pix/20d/tests/noise/noise-p3.html#pixelsize Emil Martinec’s analysis leads to a somewhat different conclusion, and I tend to agree. I think Jack oversimplified by using SNR averaging:

“SNR sqrt(7^2+7^2)=10. Same as if the pixels were twice as big.”

It is only the noise part that adds geometrically, not the SNR…

Correct me if I’m wrong.

Jim says

Jack did a little shorthand there that makes it hard to see what’s happening. Try this:

Case 1: Signal = 100, Noise = sqrt(100) = 10; SNR = 10;

Case 2: Signal = 50, Noise = sqrt(50) ≈ 7; SNR = 50/7 ≈ 7;

Case 3: Signal = 50 + 50 = 100; Noise = sqrt(7*7 + 7*7) = sqrt(50 + 50) = sqrt(100) = 10; SNR = 100/10 = 10;
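
For anyone who would rather check the three cases by simulation than by algebra, here is a small Monte Carlo sketch using only the Python standard library. It assumes the photoelectron counts follow Poisson statistics and ignores read noise; the sampler uses Knuth's multiplication method, which is adequate for small means like these. The helper names and the sample count are arbitrary choices for this illustration.

```python
import math
import random
import statistics

def poisson(rng, mean):
    """Draw one Poisson sample with Knuth's multiplication method."""
    limit = math.exp(-mean)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def measured_snr(samples):
    """Mean divided by standard deviation of the simulated counts."""
    return statistics.fmean(samples) / statistics.pstdev(samples)

rng = random.Random(0)   # fixed seed so runs are repeatable
n = 10_000               # simulated pixels per case

case1 = [poisson(rng, 100) for _ in range(n)]                    # big pixel
case2 = [poisson(rng, 50) for _ in range(n)]                     # half-size pixel
case3 = [poisson(rng, 50) + poisson(rng, 50) for _ in range(n)]  # two binned

for name, data in [("Case 1", case1), ("Case 2", case2), ("Case 3", case3)]:
    print(f"{name}: mean {statistics.fmean(data):.1f} e-, SNR {measured_snr(data):.2f}")
```

The measured SNRs should land close to 10, 7 and 10, with small deviations from run-to-run sampling error.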

OutCast says

Number 2 does not describe noise at all. That is, not noise in any form described by information theory or statistics that I have ever heard of.

Photon noise is part of the signal. It is not related to measurement error. It is not related to parameter estimate uncertainty. The thing we wish to know (light amplitude over a period of time at a particular surface in space) is unlike all other signals. The signal behavior is unique. This uniqueness is why physicists assumed it was noise for decades.

Instead the phenomenon mis-labeled photon noise is an anharmonic modulation of the signal amplitude. Specifically, the electrical component of the electromagnetic radiation can be represented by an anharmonic model. Over time and under commonly encountered circumstances, Poisson statistics adequately models the amplitude modulation. But this does not require the phenomenon to be noise. While the magnetic component of the electromagnetic radiation is also modulated, this is irrelevant for pin-diode semiconductors. The source of the modulation is unknown. All we know is the amplitude modulation is inherent to the nature of light.

Put another way, if one had a perfect measurement device the signal we expect (a constant amplitude over time) would still not exist. So how can this be noise?

It is possible a model that mis-treats inherent, signal-amplitude modulation as noise will not affect parameter estimates (spatial, light-energy amplitudes) from the data due to coincidental mathematic axioms. At the same time there could be parameter estimation problems where these coincidences no longer obtain. For this reason all the properties of the signal should be included in the model’s terms for the signal instead of the terms for the noise.

Number 6 is very confusing. The camera designers know the signal can be limited by shutter time and/or aperture. Why would “Designers choose camera characteristics (including pixel size)” that limit “saturation count”? The exposure parameters are completely capable of limiting “saturation count” (I have no idea what you think you are counting in an analog data stream). Of course there is a practical ceiling for maximum “saturation count”. Clearly increasing “saturation count” by several orders of magnitude would require extraordinary shutter speeds and be of limited use for general purposes. However electronic shutters are becoming available for still cameras. While these shutters cannot handle quickly moving objects, they are practical in other situations. Soon current limits for “saturation count” will be obsolete for stationary or slowly moving subjects. Will camera designers still limit “saturation count” three years from now when numerous cameras employ hybrid mechanical/electronic shutters?

Jack Hogan says

Wow, Outcast, quite a comment, it sounds like you know what you are talking about. Don’t forget the context though: digital photography, where shot ‘noise’ (perhaps misnamed) is considered as such by practitioners and viewers alike. Google Photon Transfer for more.

With regard to your second point, the underlying question is whether the increase in DR perceived when downsampling to 8MP for standard distance viewing creates DR out of nothing. And, simplifying a bit (e.g. ignoring read noise for clarity), the answer is clearly no: the ‘count’ (the number of photoelectrons collected by the pixel during ETTR) is similar for the same displayed area whether one started with a same-format 8MP camera, a 16MP camera binned 2:1, or a 32MP camera binned 4:1. Binning in software is not so different from ‘binning’ in hardware with a larger pixel, except that with software one has more options to trade off one IQ variable against another if one so wishes.

Jack