The mysteries of bit depth and chroma subsampling

Colour depth or bit depth is either the number of bits used to define the colour of a single pixel in a bitmapped image or video frame buffer, or the number of bits used for each colour component of a single pixel. Chroma subsampling is the practice of encoding images with a lower resolution for chroma (colour) information than for luma (lightness). Together, the two determine what your footage looks like straight out of the camera.

Inexpensive video cameras, dSLRs, and system and compact cameras with video capabilities usually support 8-bit colour. More expensive cameras may support 10-bit internally but only 8-bit on the output port, while SDI-capable cameras normally output 10-bit colour or higher.

Chroma subsampling depends on the codec your device uses to record the footage. Many video encoding schemes apply chroma subsampling, with 4:4:4 reserved for high-end cameras and 4:2:0 for low-end devices.

The original clips used to show the difference between 4:2:0 and 4:2:2 subsampling

For a video standard such as H.265 (aka HEVC or High Efficiency Video Coding), bit depth specifies the number of bits used for each colour component. However, bit depth is not the only factor to take into account when you want to know how accurately your device describes colour. The other aspect is colour gamut, which defines the range of colours that can be represented.

Standards such as Rec. 709 and Rec. 2020 include a description of the colour gamut. These standards define — among other things — a colour space. How much of the gamut within that space your camera is capable of recording depends on its sensor.

If you’re shooting with an HDTV-capable camera, you’re recording to Rec. 709. If you’re shooting in S-Log, Log-C or any other Log specification, Rec. 2020 provides a colour space that allows for a larger gamut. A TV set that supports the Rec. 2020 colour space displays the larger number of colours the standard is capable of. Only recording devices that support Rec. 2020 will be able to show you exactly what your HDR footage will look like when it’s broadcast.

Cameras that record to a Log curve are all capable of rendering a larger number of colours than Rec. 709 allows for. But only high-end cameras will have a dynamic range of over 12 f-stops; lower-end cameras may be limited to 10 f-stops, for example. 1

When do you need 8-bit colour? When will only 10-bit do?

Eight bits per sample in 8-bit colour allows for 256 shades per colour component, which results in a total of 16.78 million colours. A system capable of 10-bit recording allows for 1,024 shades per primary colour, for a total of 1.07 billion colours. Colourists suggest grading footage in 10-bit colour with 4:2:2 chroma subsampling. Most content can look great in 8-bit colour, but once you start pushing contrast or work with subtle gradations, banding may occur.
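The arithmetic behind these figures is simple enough to check yourself; here’s a minimal Python sketch:

```python
def total_colours(bits_per_component: int) -> int:
    shades = 2 ** bits_per_component   # shades per colour component
    return shades ** 3                 # three components: R, G and B

print(f"{total_colours(8):,}")   # 16,777,216    -> ~16.78 million
print(f"{total_colours(10):,}")  # 1,073,741,824 -> ~1.07 billion
```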

Since external recorders like Atomos’ Ninja and Shogun monitor/recorders are capable of 10-bit colour recording, sending the sensor’s signal to an external recorder can result in better colour rendering and more headroom to colour grade, add effects, and so on. Even so, if the camera only sends 8-bit colour over its external port, the recorder’s 10-bit file will still contain only 16.78 million actual colours to work with.
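A sketch of why the bigger container adds no information. It assumes the common approach of left-shifting an 8-bit code into the 10-bit range; the exact mapping can differ per device:

```python
# Assumption for illustration: each 8-bit code is padded to 10 bits
# by a two-bit left shift (value * 4).
ten_bit_codes = {value << 2 for value in range(256)}

print(len(ten_bit_codes))       # 256 distinct levels, not 1,024
print(len(ten_bit_codes) ** 3)  # 16,777,216 colours, not 1.07 billion
```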

Chroma subsampling: do you need ProRes 444, 422 or will 420 do?

But there are other benefits to sending a video signal to an external monitor/recorder. For example, if the signal transmitted over the HDMI port is “clean”, it may mean the external device records exactly what the sensor “sees”, without any in-camera processing added to it.

Depending on the codec your camera uses, you can improve picture quality considerably by recording to an external recorder such as a Shogun or Ninja. The data your camera records internally is usually compressed to a lossy format to save bandwidth, so outputting to an external recording unit usually results in higher quality.

The observation that the human eye sees variations in brightness much better than colour differences is the foundation of chroma subsampling. Codecs that aim to optimise bandwidth store more luminance detail than colour detail. To achieve this, they divide the signal into a luma (Y′) component and two colour difference components (Cb and Cr). The notation for this scheme is Y′CbCr, which was known as YUV in analogue times. 2
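As a minimal sketch of that separation step (assuming Rec. 709 luma coefficients and full-range values in 0..1; real broadcast systems add offsets and quantisation on top):

```python
# Split a gamma-encoded R'G'B' pixel into luma plus two colour
# difference components, using the Rec. 709 luma coefficients.
def rgb_to_ycbcr(r: float, g: float, b: float):
    y = 0.2126 * r + 0.7152 * g + 0.0722 * b  # luma Y'
    cb = (b - y) / 1.8556                     # blue-difference chroma
    cr = (r - y) / 1.5748                     # red-difference chroma
    return y, cb, cr

print(rgb_to_ycbcr(0.5, 0.5, 0.5))  # a neutral grey: both chroma are zero
```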

The difference between 4:2:0 and 4:2:2 subsampling becomes really clear after some colour grading in post.

Chroma subsampling types are defined horizontally: the ratio is based on four luma values and takes the form “4:N1:N2”, where N1 is the number of chroma samples in the first row of a 4×2 pixel block and N2 the number in the second row. For example, the luma component Y′ can be given twice the bandwidth of the colour difference components Cb (blue) and Cr (red). This is what we call the 4:2:2 scheme, and it needs only two-thirds the bandwidth of 4:4:4.
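A quick sketch of that bandwidth arithmetic, counting samples over a 4×2 block:

```python
# Relative bandwidth of a 4:N1:N2 scheme versus 4:4:4: a 4x2 block
# carries 8 luma samples plus (N1 + N2) samples for each of the two
# chroma planes.
def relative_bandwidth(n1: int, n2: int) -> float:
    samples = 2 * 4 + 2 * (n1 + n2)
    full = 2 * 4 + 2 * (4 + 4)  # the 4:4:4 reference block
    return samples / full

print(relative_bandwidth(4, 4))  # 1.0    -> 4:4:4
print(relative_bandwidth(2, 2))  # ~0.67  -> 4:2:2, two-thirds of 4:4:4
print(relative_bandwidth(2, 0))  # 0.5    -> 4:2:0, half of 4:4:4
```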

As chroma subsampling effectively decreases resolution in the affected colour channels, it will be most visible near the edges of sharp colour transitions. The heavier the chroma subsampling (the lower the second and third figures), the less detail you’ll see in an image. An example is a moving shot of metal bars standing at a 30-degree angle. A 4:2:0 recording of this scene will show jaggies, while the 4:2:2 shot will show far fewer and the 4:4:4 shot will show none.
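To make that loss concrete, here is a toy simulation of the chroma half of 4:2:0, assuming NumPy and a simple box filter; real encoders use proper filters and chroma siting, but the resolution loss is the same in kind:

```python
import numpy as np

# Average each 2x2 block of a chroma plane to a single value, then
# stretch it back to full size -- the chroma treatment of 4:2:0.
def subsample_420(chroma: np.ndarray) -> np.ndarray:
    h, w = chroma.shape
    small = chroma.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
    return np.repeat(np.repeat(small, 2, axis=0), 2, axis=1)

# A diagonal colour edge, like the angled metal bars described above:
plane = np.triu(np.ones((8, 8)))
print(subsample_420(plane))  # the clean diagonal comes back as coarse
                             # 2x2 steps: the "jaggies" of 4:2:0
```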

View the high-quality (650MB) QuickTime video I made to show you how much worse 4:2:0 is compared to 4:2:2:

Shogun-Flame-Illustration.mov

To illustrate how badly 4:2:0 chroma subsampling works out compared to 4:2:2, I tried different setups. I tested with the latest Atomos Shogun Flame and a GoPro HERO4 set at 1080p 60fps. The composite video I’m presenting here has the two native recordings across the top half and the two colour corrected recordings across the bottom half. Colour correction was done in Final Cut Pro X using Color Finale. A brightness correction was applied so that the RGB parades of the two clips were similarly distributed, and a colour correction was applied to fix a blue cast, which was much heavier in the native GoPro HERO4 clip than in the clip shot with the Shogun Flame.

I found that:

  • Quality difference is most visible in low-light conditions, but it starts to show in shadow areas from approximately 2 to 3 f-stops below your target (correct) exposure.
  • The Shogun Flame records a considerably brighter and cleaner image than the native MP4 file from the HERO4. This could be camera/sensor specific and may not be repeatable with other, better cameras.
  • The native Shogun Flame clip looks about as noisy as the HERO4 native clip, but after colour correcting both, the HERO4 clip is much noisier than the Shogun Flame clip.
  • The Shogun Flame clip is much sharper than the HERO4 clip both before and after colour correcting.
  • The falling sticks in the native HERO4 clip look smeared and mottled, with noise on their edges. The same sticks in the Shogun Flame clip have clean edges.

The video doesn’t allow us to draw conclusions about the difference between 8-bit and 10-bit colour, as the HERO4 outputs 8-bit colour through its HDMI port.

Compressed video’s efficiency

While the above is somewhat theoretical, let’s take this to the practical level and focus on ProRes encoding, because a large range of hardware supports it — including Atomos’ full product range, many of Blackmagic Design’s products and Convergent Design’s monitor/recorders.

Starting with uncompressed video, a recording captures every frame in full colour. This is about 77% less efficient in terms of bandwidth and file size than ProRes 4444 XQ without an alpha channel. Put differently, the bitrate of uncompressed 1080p/30fps video in Mbit/sec is about 4.5 times that of ProRes 4444 XQ, which in turn is about 2.25 times that of ProRes 422 HQ.
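These ratios can be checked with a bit of arithmetic. A sketch, assuming Apple’s published target data rates for 1080p/30fps ProRes (roughly 500 Mbit/s for 4444 XQ and 220 Mbit/s for 422 HQ) and a 12-bit 4:4:4 uncompressed source, since the 4.5× figure only works out at 12 bits per component:

```python
WIDTH, HEIGHT, FPS = 1920, 1080, 30
BITS, COMPONENTS = 12, 3  # assumption: 12-bit 4:4:4 uncompressed

uncompressed_mbps = WIDTH * HEIGHT * FPS * COMPONENTS * BITS / 1e6
print(round(uncompressed_mbps))           # ~2239 Mbit/s
print(round(uncompressed_mbps / 500, 2))  # ~4.48x ProRes 4444 XQ
print(round(500 / 220, 2))                # ~2.27x ProRes 422 HQ
```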

As ProRes 422 HQ (the highest ProRes type the Atomos devices support) is a lossy codec, there is a loss of quality, but it’s virtually invisible, even if you heavily process your footage in post-production. For almost every type of movie shooting, ProRes 422 HQ will suffice.

A 4:4:4 recording gives you perfect quality, but it’s not supported by many recording devices or cameras. Cinema cameras like the AJA Cion, the ARRI Alexa (some models support ProRes 4444 XQ) and the Blackmagic Design Ursa (which also supports ProRes 4444 XQ) support it. The Sony F55 CineAlta takes a different approach: it shoots its highest quality in a proprietary RAW format. And you can upgrade another Sony camera, the PXW-FS5, to shoot in a RAW format that a Shogun Flame can record (for a hands-on experience with this, see Newsshooter).

As the human eye doesn’t notice when some colour detail is removed, most mid-range devices and video-capable dSLR cameras output 4:2:2 colour, as recorded in ProRes 422. The lowest-quality colour subsampling still used in movie cameras is 4:2:0; MPEG encoding schemes, such as the one used by GoPro action cameras, use it.

Conclusion

You can’t upgrade 4:2:0 footage (or lower) to a higher level of subsampling after the fact; the only way is to record in 4:2:2 or 4:4:4 in the first place. For most cameras this means using an external recording device, such as the Atomos Ninja, Samurai or Shogun. But there’s a caveat: while cameras equipped with SDI ports will output 10-bit 4:2:2 or better, many cameras have only HDMI-out ports. In rare cases, you might still be recording 4:2:0 footage into your recorder. This may happen when your camera is older than four years. 3 Luckily, there aren’t that many cameras left that limit output to 4:2:0 over HDMI.

Even with a GoPro, there are ample benefits to recording via HDMI-out, especially with footage that doesn’t contain much action or motion. These include a sharper image, more detail in dark areas and less jagged edges when objects are not 100% vertically or horizontally orientated.

  1. Still, claiming a camera has a 14-bit dynamic range simply because it has a 14-bit ADC (Analogue-to-Digital Converter) is misleading, because noise and the capacity of the pixel well to produce such a dynamic range have to be considered. (Source: Motion Video Products) ↩︎
  2. Y′CbCr colour spaces are defined by a mathematical coordinate transformation from an associated RGB colour space. If the underlying RGB colour space is absolute, the Y′CbCr colour space is an absolute colour space as well; conversely, if the RGB space is ill-defined, so is Y′CbCr. (Source: Wikipedia) ↩︎
  3. E.g. Panasonic GH2, Canon EOS 5D Mark III without firmware update ↩︎