DIGITAL VIDEO SIGNAL COMPRESSION
Overview and Tradeoff Dimensions
Digital Video Quality Assessment
Measuring the quality of a video sequence reconstructed from compressed imagery is an application-dependent task. While there may be well-defined criteria for acceptability of the compressed video sequence intended for machine use in a scientific or medical discipline, considerable research activity is still needed to close the gap between objective and subjective assessments of quality for video sequences intended for human viewing.
Human Visual Response
Assessment of video quality by human observers invited to view scenes encoded with varying parameters takes the human visual response (HVR) explicitly into account. Jain (1989) and the HyperPhysics Web site (Nave, 2001) describe key elements of the HVR.
Subjective evaluation requires a group of human observers (preferably not experts in image quality assessment) to view and rate video quality on a scale of impairments ranging from “not noticeable” to “extremely objectionable.” ITU-R BT.500-10, Methodology for the Subjective Assessment of the Quality of Television Pictures, recommends a specific system that prescribes the viewing conditions, the range of luminance presented to the viewer panel, the number and experience of viewers, monitor contrast, the selection of test materials, and the process for evaluating test results.
Among the methods it describes, the Double Stimulus Impairment Scale (DSIS) and the Double Stimulus Continuous Quality Scale (DSCQS) are particularly noteworthy. Subjective tests, however, are costly and not highly reproducible. ANSI T1.801.01-1995 provides a set of test scenes in digital format, while ANSI T1.801.02-1996 provides a dictionary of commonly used video quality impairment terms.
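As a rough illustration of how a panel's ratings for one test condition are reduced to a single figure, the sketch below computes a mean opinion score and the half-width of an approximate 95% confidence interval. It assumes numeric ratings on a five-grade scale and a normal approximation (factor 1.96); the function name and the ratings are hypothetical, and this is a sketch rather than the full procedure prescribed by the Recommendation.

```python
import math
from statistics import mean, stdev


def mean_opinion_score(ratings):
    """Return (mean score, half-width of an approximate 95% confidence interval).

    Assumes at least two numeric ratings (e.g. 1 = worst ... 5 = best) and a
    normal approximation, hence the factor 1.96.
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings")
    u = mean(ratings)
    s = stdev(ratings)  # sample standard deviation (n - 1 denominator)
    delta = 1.96 * s / math.sqrt(len(ratings))
    return u, delta


# Hypothetical ratings from a panel of 10 non-expert viewers for one test condition.
ratings = [4, 5, 4, 3, 4, 4, 5, 3, 4, 4]
u, d = mean_opinion_score(ratings)
print(f"MOS = {u:.2f} +/- {d:.2f}")
```

Reporting the interval alongside the mean makes the limited reproducibility of subjective tests visible: a small panel or widely scattered ratings produce a wide interval.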
Objective evaluation techniques range from simple test metrics that do not take the HVR into account, such as the peak signal-to-noise ratio (PSNR) often quoted by researchers, to the vision system model metric developed by the Sarnoff Corporation (Sarnoff Corporation, 2001), which compares maps of just-noticeable differences (JNDs) between the original and compressed video sequences, one frame at a time.
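PSNR is simple precisely because it ignores the HVR: it depends only on the mean squared error (MSE) between corresponding pixels, via PSNR = 10 log10(MAX^2 / MSE). A minimal per-frame sketch, assuming 8-bit samples (peak value 255) and frames stored as lists of rows; the function name is illustrative:

```python
import math


def psnr(original, compressed, max_value=255):
    """Peak signal-to-noise ratio in dB between two equally sized frames.

    Frames are lists of rows of pixel values. Returns math.inf when the
    frames are identical (zero mean squared error).
    """
    if len(original) != len(compressed) or any(
        len(a) != len(b) for a, b in zip(original, compressed)
    ):
        raise ValueError("frames must have the same dimensions")
    sq_err = 0
    count = 0
    for row_o, row_c in zip(original, compressed):
        for p, q in zip(row_o, row_c):
            sq_err += (p - q) ** 2
            count += 1
    mse = sq_err / count
    if mse == 0:
        return math.inf
    return 10 * math.log10(max_value ** 2 / mse)


# Tiny hypothetical 2 x 2 frames: two pixels are off by one gray level.
orig = [[100, 100], [100, 100]]
comp = [[101, 99], [100, 100]]
print(f"{psnr(orig, comp):.2f} dB")
```

Because every pixel error is weighted equally, two reconstructions with the same PSNR can look very different to a viewer, which is the gap that HVR-based metrics try to close.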
Annex A of ANSI T1.801.03-1996 lists a set of objective test criteria that may be used to measure video quality in one-way video systems; these objective tests are closely related to known features of the HVR. Webster et al. (1993) presented a scheme for combining subjective and objective assessments on test scenes, based on objectively generated impairments.
Rate Distortion Relationships
The Shannon (1948) rate distortion bound refers to the minimum average bit rate required to encode a data source for a given average distortion level. If the data can be perfectly reconstructed, the bit rate at zero distortion is equal to the source entropy, a measure of the information contained in the source.
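The zero-distortion end of the bound can be checked numerically for a simple case. The sketch below assumes a memoryless source and uses the textbook closed form for a Bernoulli(p) source under Hamming (bit-error) distortion, R(D) = H(p) - H(D) for 0 <= D <= min(p, 1 - p) and R(D) = 0 beyond that. This example source is an illustration, not taken from the text; at D = 0 it reduces to the source entropy H(p). Function names are hypothetical.

```python
import math
from collections import Counter


def entropy_bits(symbols):
    """Empirical entropy -sum p log2 p of a memoryless source, in bits/symbol."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def binary_entropy(p):
    """H(p) in bits for a Bernoulli(p) source."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def rate_distortion_binary(p, d):
    """R(D) for a Bernoulli(p) source with Hamming distortion D."""
    if not (0 <= p <= 1) or d < 0:
        raise ValueError("need 0 <= p <= 1 and d >= 0")
    if d >= min(p, 1 - p):
        return 0.0  # the decoder can simply output the more likely symbol
    return binary_entropy(p) - binary_entropy(d)


p = 0.2
print(binary_entropy(p))               # entropy of the source
print(rate_distortion_binary(p, 0.0))  # zero distortion: equals the entropy
print(rate_distortion_binary(p, 0.05)) # tolerating 5% bit errors lowers the rate
```

The curve decreases monotonically from H(p) at D = 0 to zero at D = min(p, 1 - p), which is the tradeoff that lossy video coders exploit.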
Principles of Digital Video Compression
Digital video compression operates by eliminating redundant spatial, temporal, hyperspectral, statistical, or psychovisual information. A brief inspection of Figure 1a shows considerable overlap between frames 1232 and 1233 but a discontinuous change of scene (likely the result of editing) between frames 1231 and 1232. All the frames shown exhibit strong local spatial correlation; the motion of the astronauts exhibits temporal correlation; the viewer’s attention is focused on the motion of the astronauts and the expressions on their faces; and the space suit color contrasts well with the background. Figure 1b illustrates the well-known video artifact of “dropped frames” that appears when this movie is compressed at an average data rate below the original 293 kbps.
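The contrast between strongly overlapping consecutive frames and a scene cut can be quantified with a simple frame-difference measure: correlated frames differ little, while a cut produces a sudden jump. A minimal sketch, assuming grayscale frames stored as lists of rows; the threshold value and function names are hypothetical, and practical scene-cut detectors are more elaborate:

```python
def mean_abs_diff(frame_a, frame_b):
    """Mean absolute pixel difference between two equally sized frames."""
    total = 0
    count = 0
    for row_a, row_b in zip(frame_a, frame_b):
        for a, b in zip(row_a, row_b):
            total += abs(a - b)
            count += 1
    return total / count


def find_scene_cuts(frames, threshold=30.0):
    """Return indices i where frame i differs from frame i - 1 by more than
    the (hypothetical) threshold, i.e. where a new scene likely starts."""
    return [
        i
        for i in range(1, len(frames))
        if mean_abs_diff(frames[i - 1], frames[i]) > threshold
    ]


# Synthetic 4 x 4 frames: f0 -> f1 is a small change, f1 -> f2 is a cut.
f0 = [[10] * 4 for _ in range(4)]
f1 = [[12] * 4 for _ in range(4)]
f2 = [[200] * 4 for _ in range(4)]
print(find_scene_cuts([f0, f1, f2]))  # -> [2]
```

A low difference between neighbors is exactly the temporal redundancy an interframe coder removes; at a cut there is little to predict from, so the new frame is typically coded on its own.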
Compression of video sequences intended for human viewing may combine lossy and lossless compression steps. Local spatial correlation may be removed by intraframe coding techniques such as transform coding (for example, the discrete cosine transform) followed by quantization of the transform coefficients.
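The transform-and-quantize step can be sketched for a single 8 x 8 block using the orthonormal two-dimensional DCT-II, the block transform used in JPEG, MPEG-1, and MPEG-2. The uniform step size of 16 is an arbitrary assumption (real coders use per-coefficient quantization matrices), the function names are illustrative, and the direct quadruple loop favors clarity over speed:

```python
import math

N = 8  # block size


def dct2(block):
    """Orthonormal 2-D DCT-II of an N x N block (direct form, for clarity)."""
    def c(k):
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = c(u) * c(v) * s
    return out


def quantize(coeffs, step=16):
    """Uniform scalar quantization (the lossy step): integer levels."""
    return [[round(c / step) for c in row] for row in coeffs]


# A smooth, spatially correlated block: a gentle brightness ramp.
block = [[100 + 2 * x + y for y in range(N)] for x in range(N)]
levels = quantize(dct2(block))
nonzero = sum(1 for row in levels for q in row if q != 0)
print(levels[0][:3], levels[1][:3])
print("nonzero levels:", nonzero, "of", N * N)
```

For this smooth block nearly all the energy lands in the DC coefficient and two low-frequency coefficients, so quantization leaves only a few nonzero levels for a subsequent lossless entropy-coding stage.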
Temporal correlation may be removed by interframe coding techniques that estimate motion vectors between successive frames (for example, by block matching) and then code only the difference between each frame and its motion-compensated prediction. Spectral correlation may be removed by applying temporal decorrelation or modeling techniques to the spectral dimension of hyperspectral imagery. Coding redundancy may be removed through careful design of compression codes, while psychovisual redundancy may be addressed by techniques that drop frames and increase the bit rate in regions containing human faces, sharply contrasting regions, and the trajectories of distracting objects that move rapidly through the peripheral field of view.
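Block matching is a common way to estimate the motion vectors used in interframe coding. The sketch below performs an exhaustive (full) search that minimizes the sum of absolute differences (SAD). The block size, search range, and function names are assumptions for illustration; practical encoders use faster search strategies and sub-pixel refinement.

```python
def sad(cur, ref, bx, by, dx, dy, bs):
    """Sum of absolute differences between the bs x bs block of `cur` whose
    top-left corner is (row by, col bx) and the block of `ref` displaced by
    (dy rows, dx columns)."""
    total = 0
    for r in range(bs):
        for c in range(bs):
            total += abs(cur[by + r][bx + c] - ref[by + dy + r][bx + dx + c])
    return total


def block_match(cur, ref, bx, by, bs=4, search=2):
    """Full-search block matching: return ((dx, dy), sad) for the displacement
    within +/- search pixels that minimizes SAD."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidate blocks that would fall outside the reference frame.
            if not (0 <= by + dy <= h - bs and 0 <= bx + dx <= w - bs):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, bs)
            if best is None or cost < best[1]:
                best = ((dx, dy), cost)
    return best


# Synthetic 12 x 12 frames: the content of `ref` moves down 2 rows and
# right 1 column to produce `cur`.
ref = [[y * 12 + x for x in range(12)] for y in range(12)]
cur = [[(y - 2) * 12 + (x - 1) for x in range(12)] for y in range(12)]
print(block_match(cur, ref, bx=4, by=4))  # -> ((-1, -2), 0)
```

The returned vector points from the current block to its best match in the reference frame; with a perfect match the residual is zero, so only the motion vector needs to be coded for that block.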