Video Condensation by Ribbon Carving

Team: Z. Li, W. Liu, H.-Y. Wu, P. Ishwar, J. Konrad
Funding: National Science Foundation (CISE-CNS-NOSS)
Status: Ongoing (2008-…)

Background: Efficient browsing of long video sequences is a key tool in visual surveillance, e.g., for post-event video forensics, but it can also be used to review motion pictures and home videos. While frame skipping (fixed or adaptive) is straightforward to implement, its performance is quite limited. More efficient techniques, such as video summarization and video montage, have been developed, but they lose either the temporal or the semantic context of events. A recently proposed method called video synopsis performs even better; however, it involves multiple processing stages and is fairly complex.

[Figures: vertical ribbon; horizontal ribbon]

Summary: In search of an effective and efficient video browsing algorithm, we were inspired by image seam carving, a method for content-aware still-image resizing. In this method, the vertical and horizontal seams (connected paths) with the lowest cost, e.g., the sum of luminance gradient magnitudes along the seam, are removed recursively until the target image size is reached. Building on this idea, we developed a novel approach to video synopsis that we call video condensation. Our approach extends the concept of an image seam to a video ribbon, a 3-D surface that is rigid either horizontally (vertical ribbon) or vertically (horizontal ribbon). This structure permits the use of dynamic programming, as originally proposed for seam carving. Furthermore, the ribbon model is flexible and permits easy adjustment of the trade-off between the temporal condensation ratio and the anachronism of events. Although our approach permits the use of 3-D video gradients, the most interesting results have been obtained with costs derived from motion labels (moving/static) computed from the video by background subtraction. The method is novel in the way information is removed from the space-time video volume, is conceptually simple, and is relatively easy to implement.
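
To make the ribbon idea concrete, here is a minimal sketch of one carving step, not the authors' implementation. It assumes a cost volume stored as a NumPy array of shape (T, H, W), and that a ribbon that is rigid along one spatial axis can be found by summing the cost along that axis and running the seam-carving dynamic program on the resulting 2-D map. The function names, the `rigid_axis` argument, and the `max_step` bound on how far the ribbon may shift in time between neighboring positions are all hypothetical names introduced for illustration.

```python
import numpy as np

def min_cost_seam(cost, max_step=1):
    """Lowest-cost connected path through a 2-D cost map, one column per row."""
    rows, cols = cost.shape
    acc = cost.astype(float)                  # accumulated cost (astype copies)
    back = np.zeros((rows, cols), dtype=int)  # best predecessor column
    for r in range(1, rows):
        for c in range(cols):
            lo, hi = max(0, c - max_step), min(cols, c + max_step + 1)
            k = lo + int(np.argmin(acc[r - 1, lo:hi]))
            back[r, c] = k
            acc[r, c] += acc[r - 1, k]
    # Backtrack from the cheapest end point.
    seam = np.empty(rows, dtype=int)
    seam[-1] = int(np.argmin(acc[-1]))
    for r in range(rows - 1, 0, -1):
        seam[r - 1] = back[r, seam[r]]
    return seam, float(acc[-1, seam[-1]])

def min_cost_ribbon(cost_volume, rigid_axis, max_step=1):
    """Ribbon rigid along rigid_axis (1 = y, 2 = x) of a (T, H, W) cost volume.

    Returns one time index per position along the other spatial axis.
    """
    plane = cost_volume.sum(axis=rigid_axis)  # shape (T, n)
    return min_cost_seam(plane.T, max_step)   # rows: position, columns: time

def remove_ribbon(video, seam, rigid_axis):
    """Delete one sample per pixel along time, shortening the video by a frame."""
    T = video.shape[0]
    keep = np.ones((T, seam.size), dtype=bool)
    keep[seam, np.arange(seam.size)] = False
    keep = np.broadcast_to(np.expand_dims(keep, axis=rigid_axis), video.shape)
    # Put time last so each pixel's surviving samples are contiguous.
    vt, kt = np.moveaxis(video, 0, -1), np.moveaxis(keep, 0, -1)
    out = vt[kt].reshape(vt.shape[:-1] + (T - 1,))
    return np.moveaxis(out, -1, 0)
```

In this sketch, `max_step=0` forces the ribbon to be a single whole frame, so carving reduces to dropping the cheapest frame, while larger values let the ribbon bend further in time. Repeating the find-and-remove step shortens the video by one frame per ribbon.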

Results: The method is efficient and effective, as we demonstrate below on motor and pedestrian traffic videos. The first video below compares condensation results for three different cost functions:
  • magnitude of spatio-temporal luminance gradient (left),
  • magnitude of temporal luminance derivative (center),
  • activity (motion) labels (right).
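
The three cost functions listed above can be sketched as follows for a grayscale video stored as a NumPy array of shape (T, H, W). This is an illustrative approximation only: the function names are hypothetical, and the background model used here (a per-pixel temporal median with a fixed `threshold`) is an assumption standing in for whatever background subtraction algorithm was actually used.

```python
import numpy as np

def spatiotemporal_gradient_cost(video):
    """Magnitude of the 3-D (t, y, x) luminance gradient."""
    gt, gy, gx = np.gradient(video.astype(float))
    return np.sqrt(gt ** 2 + gy ** 2 + gx ** 2)

def temporal_derivative_cost(video):
    """Magnitude of the temporal luminance derivative."""
    return np.abs(np.gradient(video.astype(float), axis=0))

def activity_cost(video, threshold=25.0):
    """Binary moving/static labels from a simple background subtraction.

    Assumption: the background is the per-pixel temporal median, and a pixel
    is labeled moving when it differs from it by more than `threshold`.
    """
    video = video.astype(float)
    background = np.median(video, axis=0)
    return (np.abs(video - background) > threshold).astype(float)
```

On a synthetic clip in which a uniformly bright block slides over a uniform dark background, both luminance-based costs evaluate to zero in the interior of the block, whereas the activity cost equals 1 there.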
Note the presence of distortions in the condensed videos obtained using luminance-based cost functions (early in each sequence) and lack of such distortions for the activity-based case. The spatio-temporal gradient produces high cost in static areas with high spatial detail (background) and low cost in uniform moving areas (white T-shirt, brown jacket), thus allowing ribbon cuts through moving objects leading to object splitting. The temporal gradient produces zero cost in static areas and also in uniformly-colored moving areas, again leading to object splitting. However, activity labels obtained by background subtraction prevent ribbon cuts through moving areas and lead to intact objects. Below are shown video condensation results (MPEG-4 videos) obtained using the activity-based cost function.
  • Highway: original video, condensed video (8.06:1 ratio)
  • Overpass: original video, condensed video (2.87:1 ratio)
  • Sidewalk: original video, condensed video (2.32:1 ratio)

Note 1: In the Highway and Overpass condensed videos, one may occasionally notice moving vertical seams with a luminance/hue discontinuity. These artifacts are due to abrupt illumination changes (sun, clouds) and to automatic exposure/gain control in the camera, both of which are clearly visible in the original videos when moving the slider. If many ribbons are removed from a static segment of the video, temporally distant frames are joined together, creating a luminance/hue discontinuity. Although such artifacts can be mitigated by pre-processing techniques, we have not attempted to do so here.

Note 2: Occasionally, objects appear to jump forward abruptly or to disappear and reappear after a fraction of a second. This is due to frame skipping in the original video (an imperfect video capture process) and is not an artifact of the condensation algorithm. The effects of frame skipping may be perceptually more prominent in the condensed video because events happen “more quickly” than in the original video, but they are present in both.

Note 3: Some condensed frames are devoid of moving objects but are not removed. This is because our background subtraction algorithm is imperfect and produces false positives in some frames, which prevents carving.

Publications: