Joint Space-Time Video Segmentation and Analysis

Team: M. Ristivojevic, J. Konrad
Collaborators: M. Barlaud, F. Precioso, University of Nice, France
Funding: National Science Foundation (CISE-CCR-SPS, International Collaboration USA-France)
Status: Completed (2001-2006)

Background: Traditional video processing methods use two image frames at a time to analyze dynamics such as motion and occlusions. This makes it impossible to impose temporal continuity constraints on the estimated quantities (e.g., motion labels, occlusion labels), and thus the final estimates are often incoherent in time.

Summary: We explored a new framework based on the joint treatment of many image frames (e.g., 20-30). As a form of joint space-time processing, this framework is essentially three-dimensional (3-D), since its domain is the x-y-t space of image sequences. This approach results in more reliable video segmentation, detection of occlusion effects, and identification of various dynamic events. In particular, we developed a video segmentation method based on an active-surface model and a level-set solution. Applied to both synthetic and natural image sequences, this method produces object tunnels in the x-y-t space, which we have used successfully to identify certain occlusion events and to measure the time instants of object occlusion, disappearance, entry, etc. More details and experimental results are available here.
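To make the idea of reading event times off an object tunnel concrete, the following is a minimal sketch, not the active-surface/level-set method itself. It assumes that a segmentation is already available as a boolean x-y-t volume (here a synthetic moving square), and the function name `tunnel_events` and all sizes are hypothetical choices made for illustration.

```python
import numpy as np

def tunnel_events(mask):
    """Find entry and exit frames of an object from its x-y-t tunnel.

    mask: boolean array of shape (T, H, W); True marks object pixels.
    Returns the per-frame object area, the frames at which the object
    first appears, and the first frames at which it is absent again.
    """
    # Cross-sectional area of the tunnel at each time instant t.
    area = mask.reshape(mask.shape[0], -1).sum(axis=1)
    present = area > 0
    # A change from absent to present (or back) marks an event.
    change = np.diff(present.astype(np.int8))
    entries = (np.flatnonzero(change == 1) + 1).tolist()
    exits = (np.flatnonzero(change == -1) + 1).tolist()
    return area, entries, exits

# Synthetic sequence (assumed sizes): 20 frames of 32x32 pixels,
# with a 6x6 square that is visible in frames 5..14 and moves right.
T, H, W = 20, 32, 32
mask = np.zeros((T, H, W), dtype=bool)
for t in range(5, 15):
    x = 2 + t
    mask[t, 10:16, x:x + 6] = True

area, entries, exits = tunnel_events(mask)
print("entry frames:", entries)
print("exit frames:", exits)
```

Because the whole volume is examined at once, entry and disappearance show up as the two ends of the tunnel along the t axis. Partial occlusions would instead appear as a drop in the per-frame area, which this sketch computes but does not classify.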

Object tunnel in x-y-t space for the video sequence “Akiyo”, which depicts a human upper body in slight motion; the surface follows the body outline through time.

Publications: