Publications

Conference paper abstracts

K. Guo, P. Ishwar, and J. Konrad, "Action recognition from video by covariance matching of silhouette tunnels," in Proc. Brazilian Symp. on Computer Graphics and Image Proc., Oct. 2009, [PDF: 325KB].

Action recognition is a challenging problem in video analytics due to event complexity, variations in imaging conditions, and intra- and inter-individual action-variability. Central to these challenges is the way one models actions in video, i.e., action representation. In this paper, an action is viewed as a temporal sequence of local shape-deformations of centroid-centered object silhouettes, i.e., the shape of the centroid-centered object silhouette tunnel. Each action is represented by the empirical covariance matrix of a set of 13-dimensional normalized geometric feature vectors that capture the shape of the silhouette tunnel. The similarity of two actions is measured in terms of a Riemannian metric between their covariance matrices. The silhouette tunnel of a test video is broken into short overlapping segments and each segment is classified using a dictionary of labeled action covariance matrices and the nearest neighbor rule. On a database of 90 short video sequences this attains a correct classification rate of 97%, which is very close to the state-of-the-art, at almost 5-fold reduced computational cost. Majority-vote fusion of segment decisions achieves 100% classification rate.
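As a rough illustration of the covariance-matching pipeline described in the abstract above, the following Python sketch computes a covariance descriptor, compares two descriptors, and classifies by the nearest neighbor rule with majority-vote fusion. The abstract does not say which Riemannian metric is used; the affine-invariant metric (square root of the summed squared log generalized eigenvalues) is assumed here, and all function names and the small ridge term are hypothetical.

```python
from collections import Counter

import numpy as np


def covariance_descriptor(features):
    """Empirical covariance of an (n_samples, d) array of feature vectors."""
    features = np.asarray(features, dtype=float)
    # Assumption: a small ridge keeps the matrix positive definite.
    return np.cov(features, rowvar=False) + 1e-9 * np.eye(features.shape[1])


def riemannian_distance(a, b):
    """Assumed metric: sqrt(sum_i ln^2 lambda_i), lambda_i eigenvalues of a^-1 b."""
    # Whiten b by a^(-1/2) so that a symmetric eigen-solver can be used.
    w, v = np.linalg.eigh(a)
    a_inv_sqrt = (v / np.sqrt(w)) @ v.T
    lam = np.linalg.eigvalsh(a_inv_sqrt @ b @ a_inv_sqrt)
    return float(np.sqrt(np.sum(np.log(lam) ** 2)))


def classify_segment(cov, dictionary):
    """Nearest-neighbor rule; dictionary is a list of (label, covariance) pairs."""
    return min(dictionary, key=lambda item: riemannian_distance(cov, item[1]))[0]


def majority_vote(labels):
    """Fuse per-segment decisions into a single label for the whole video."""
    return Counter(labels).most_common(1)[0][0]
```

In this sketch each overlapping segment of a test silhouette tunnel would be passed through `covariance_descriptor` and `classify_segment`, and the resulting labels fused with `majority_vote`.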

M. Ristivojević and J. Konrad, "Multi-frame motion detection for active/unstable cameras," in Proc. Brazilian Symp. on Computer Graphics and Image Proc., Oct. 2009, [PDF: 232KB].

Network cameras, extensively used in video surveillance, often allow pan-tilt-zoom functionality and are also subject to wind load and mount vibrations, thus causing video frame misalignment. Although algorithms for motion detection, a basic block of most visual surveillance systems, are relatively mature for fixed cameras, they usually perform poorly for active and/or vibrating cameras. The issue is particularly severe for algorithms using multiple video frames jointly. In this paper, we extend our earlier work on multiple-frame motion detection to the case of active and unstable cameras. Our method accounts for spatially-affine, inter-frame transformations that can vary in time, uses a variational formulation and applies a level-set solution. We present ground-truth and real-data experimental results and show significant improvements over earlier methods.

P.-M. Jodoin, V. Saligrama, and J. Konrad, "Implicit active-contouring with MRF," in Proc. Int. Conf. on Image Analysis and Recognition, July 2009, [PDF: 1,820KB].

In this paper, we present a new image segmentation method based on energy minimization for iteratively evolving an implicit active contour. Methods for active contour evolution are important in many applications ranging from video post-processing to medical imaging, where a single object must be chosen from a multi-object collection containing objects sharing similar characteristics. Level set methods have played a fundamental role in many of these applications. These methods typically involve minimizing functionals over the infinite-dimensional space of curves and can be quite cumbersome to implement. The development of Markov random field (MRF) based algorithms, ICM and graph-cuts, over the last decade has led to fast, robust and simple implementations. Nevertheless, the main drawback of current MRF methods is that they are intended for global segmentation of objects. We propose a new MRF formulation that combines the computational advantages of MRF methods and enforces active contour evolution. Advantages of the method include the ability to segment color images into an arbitrary number of classes; a single parameter that controls region boundary smoothness; and a fast, easy implementation that can handle images with widely varying characteristics.

P.-M. Jodoin, J. Konrad, and V. Saligrama, "Modeling background activity for behavior subtraction," in ACM/IEEE Int. Conf. Distributed Smart Cameras, Sept. 2008, [PDF: 6,660KB].

The detection of events that differ from what is considered normal is, arguably, the most important task for camera-based surveillance. Clearly, the definition of normal behavior differs from one application to another, and, therefore, approaches to its detection differ as well. In the case of intrusion monitoring, simple motion detection may be sufficient, such as that based on background luminance/color modeling. However, in more complex scenarios, such as the detection of abandoned luggage, more advanced approaches have been developed, often relying on object path modeling. In this paper, we describe a new model for representing normality. Our model, which we call a behavior image, is low-dimensional and based on dynamics of luminance/color profiles; however, it does not require explicit estimation of object paths. The process of estimating visual abnormality is then a simple comparison of training and observed behavior images, which we call behavior subtraction. We describe a new practical implementation of our model that is based on average activity. It is easy to program and requires little processing power and memory. Moreover, it is robust to motion detection errors, such as those resulting from parasitic background motion (e.g., heavy rain/snow, camera jitter). Most importantly, however, the method is not content-specific, and, therefore, is applicable to the monitoring of humans, cars or other objects in both uncluttered and highly-cluttered scenes. We support these claims by including various experimental results, from urban traffic, through sport scenes, to natural environments.

E. Ermis, V. Saligrama, P.-M. Jodoin, and J. Konrad, "Abnormal behavior detection and behavior matching for networked cameras," in ACM/IEEE Int. Conf. Distributed Smart Cameras, Sept. 2008, [PDF: 2,338KB].

We consider a change detection problem in video surveillance applications and propose busy-idle rates, meaningful and easy to compute features of foreground objects, to characterize the behavior profile of a given pixel. We use these features to model the typical behavior that is observed in training sequences. Using a small number of samples for each pixel we generate behavior clusters, wherein pixels with similar behavior profiles fall into the same cluster. We then generate probabilistic models corresponding to behavior clusters, and use these models to perform abnormal behavior detection. We also develop geometry independence results based on busy-idle rates. Simply stated, a set of objects observed by multiple cameras, under certain conditions, generates similar busy-idle profiles in each camera, and this holds true regardless of the camera orientation with respect to the scene. We demonstrate this result via real-world camera networks. Based on the premise of geometry independence, we use busy-idle rates and bring a novel approach to behavior matching problems, where the segments of an image frame that exhibit similar behavior profiles are matched across cameras. This novel approach deviates from geometry-based methods, and greatly simplifies the behavior matching problem. The simulation results indicate that even for a simple statistic, such as the mean busy-idle rate, behavior matching can be performed, which underlines the efficacy and robustness of our approach.

J. McHugh, J. Konrad, V. Saligrama, P.-M. Jodoin, and D. Castanon, "Motion detection with false discovery rate control," in Proc. IEEE Int. Conf. Image Processing, Oct. 2008, [PDF: 380KB].

Visual surveillance applications such as object identification, object tracking, and anomaly detection require reliable motion detection as an initial processing step. Such detection is often accomplished by means of background subtraction, which can be as simple as thresholding of the intensity difference between a movement-free background and the current frame. However, more effective background subtraction methods employ probabilistic modeling of the background followed by probability thresholding. In this case, the balance between false positives and false negatives (misses) is controlled by a threshold that needs to be adjusted heuristically depending on object sparsity. In this paper, we propose a different detection method that is based on false discovery rate control, a multiple-comparison procedure that applies thresholding in significance-score rather than probability space. The proposed approach allows explicit control of false positives and automatically adapts to object sparsity. The new method offers a qualitative improvement in real scenarios as well as a measurable performance gain over non-adaptive techniques when tested on synthetic sequences.
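To make the idea of thresholding in significance-score space concrete, here is a minimal Python sketch of false discovery rate control. The abstract does not name the specific procedure, so the standard Benjamini-Hochberg step-up rule is assumed; the function name and the idea of feeding it one p-value per pixel (computed under some background model) are illustrative assumptions.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return one boolean per p-value: True where 'background' is rejected."""
    n = len(p_values)
    # Indices of the p-values sorted in ascending order.
    order = sorted(range(n), key=lambda i: p_values[i])
    # Step-up rule: largest 1-based rank k with p_(k) <= alpha * k / n.
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / n:
            cutoff = rank
    # Everything ranked at or below the cutoff is declared a detection.
    detected = [False] * n
    for idx in order[:cutoff]:
        detected[idx] = True
    return detected
```

Because the cutoff depends on how many small p-values are present, the effective threshold moves with the number of moving pixels, which is one way to read the abstract's remark about adapting to object sparsity.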

P.-M. Jodoin, J. Konrad, V. Saligrama, and V. Veilleux-Gaboury, "Motion detection with an unstable camera," in Proc. IEEE Int. Conf. Image Processing, Oct. 2008, [PDF: 1,780KB].

Fast and accurate motion detection in the presence of camera jitter is known to be a difficult problem. Existing statistical methods often produce abundant false positives since jitter-induced motion is difficult to differentiate from scene-induced motion. Although frame alignment by means of camera motion compensation can help resolve such ambiguities, the additional steps of motion estimation and compensation increase the complexity of the overall algorithm. In this paper, we address camera jitter by applying background subtraction to scene dynamics instead of scene photometry. In our method, an object is assumed moving if its dynamical behavior is different from the average dynamics observed in a reference sequence. Our method is conceptually simple, fast, requires little memory, and is easy to train, even on videos containing moving objects. It has been tested and performs well on indoor and outdoor sequences with strong camera jitter.

E. Ermis, V. Saligrama, P.-M. Jodoin, and J. Konrad, "Motion segmentation and abnormal behavior detection via behavior clustering," in Proc. IEEE Int. Conf. Image Processing, Oct. 2008, [PDF: 962KB].

We consider a change detection problem in video surveillance applications and propose busy-idle rates, meaningful and easy to compute features, to characterize the behavior profile of a given pixel. We describe the geometry independence property of these features, and use them to model the typical behavior that is observed in training sequences. Using a small number of samples for each pixel we generate behavior clusters, wherein pixels with similar behavior profiles fall into the same cluster. We then generate probabilistic models corresponding to behavior clusters, and use these models to perform abnormal behavior detection.

P.-M. Jodoin, V. Saligrama, and J. Konrad, "Behavior subtraction," in Proc. SPIE Visual Communications and Image Process., vol. 6822, pp. 10.1-10.12, Jan. 2008, [PDF: 2,107KB].

Network video cameras, invented in the last decade or so, today permit pervasive, wide-area visual surveillance. However, due to the vast amounts of visual data that such cameras produce, human-operator monitoring is not possible and automatic algorithms are needed. One monitoring task of particular interest is the detection of suspicious behavior, i.e., identification of individuals or objects whose behavior differs from behavior usually observed. Many methods based on object path analysis have been developed to date (motion detection followed by tracking and inferencing) but they are sensitive to motion detection and tracking errors and are also computationally complex. We propose a new surveillance method capable of abnormal behavior detection without explicit estimation of object paths. Our method is based on a simple model of video dynamics. We propose one practical implementation of this general model via temporal aggregation of motion detection labels. Our method requires little processing power and memory, is robust to motion segmentation errors, and is general enough to monitor humans, cars or any other moving objects in uncluttered as well as highly-cluttered scenes. Furthermore, on account of its simplicity, our method can provide performance guarantees. It is also robust in harsh environments (jittery cameras, rain/snow/fog).
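The temporal aggregation of motion detection labels mentioned above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: binary motion labels per pixel are taken as given, activity is aggregated as a sliding-window mean, the training behavior is summarized by its per-pixel maximum, and a pixel is flagged when the observed activity exceeds that level by a threshold. The function names, the maximum, and the threshold are hypothetical choices, not details taken from the paper.

```python
def average_activity(labels, window):
    """Sliding-window mean of the binary motion labels of one pixel."""
    return [sum(labels[t:t + window]) / window
            for t in range(len(labels) - window + 1)]


def background_behavior(training_labels, window):
    """Per-pixel maximum average activity observed during training."""
    return [max(average_activity(pixel, window)) for pixel in training_labels]


def behavior_subtraction(observed_labels, background, window, threshold=0.0):
    """Flag pixels whose most recent activity exceeds the training level."""
    flags = []
    for pixel, trained in zip(observed_labels, background):
        # Activity over the last `window` observed frames of this pixel.
        current = sum(pixel[-window:]) / window
        flags.append(current - trained > threshold)
    return flags
```

Here `training_labels` and `observed_labels` are lists with one label sequence per pixel; no object paths are estimated at any point.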

T. Little, P. Ishwar, and J. Konrad, "A wireless video sensor network for autonomous coastal sensing," in Proc. Conf. on Coastal Environmental Sensing Networks (CESN), Apr. 2007, [PDF: 2,425KB].

We describe an architecture and prototype for a low-power and low-cost video sensor unit suitable for deployment in remote coastal sensing applications. Our design is based on the premise that if a complete video sensor unit can be constructed for less than $50 then it is possible to deploy a very large number of units providing area coverage measured in kilometers.

A. Jain and J. Konrad, "Crosstalk in automultiscopic 3-D displays: Blessing in disguise?," in Proc. SPIE Stereoscopic Displays and Virtual Reality Systems, vol. 6490, pp. 12.1-12.12, Jan. 2007, [PDF: 372KB].

Most 3-D displays suffer from interocular crosstalk, i.e., the perception of an unintended view in addition to the intended one. The resulting "ghosting" at high-contrast object boundaries is objectionable and interferes with depth perception. In automultiscopic (no glasses, multiview) displays using microlenses or a parallax barrier, the effect is compounded since several unintended views may be perceived at once. However, we recently discovered that crosstalk in automultiscopic displays can also be beneficial. Since spatial multiplexing of views in order to prepare a composite image for automultiscopic viewing involves sub-sampling, prior anti-alias filtering is required. To date, anti-alias filter design has ignored the presence of crosstalk in automultiscopic displays. In this paper, we propose a simple multiplexing model that takes crosstalk into account. Using this model we derive a mathematical expression for the spectrum of a single view with crosstalk, and we show that it leads to reduced spectral aliasing compared to the crosstalk-free case. We then propose a new criterion for the characterization of an ideal anti-alias pre-filter. In the experimental part, we describe a simple method to measure optical crosstalk between views using a digital camera. We use the measured crosstalk parameters to find the ideal frequency response of the anti-alias filter and we design practical digital filters approximating this response. Having applied the designed filters to a number of multiview images prior to multiplexing, we conclude that, due to their increased bandwidth, the filters lead to visibly sharper 3-D images without increasing aliasing artifacts.

S. Ince, J. Konrad, and C. Vázquez, "Spline-based intermediate view reconstruction," in Proc. SPIE Stereoscopic Displays and Virtual Reality Systems, vol. 6490, pp. 0F.1-0F.12, Jan. 2007, [PDF: 1,002KB].

Intermediate view reconstruction is an essential step in content preparation for multiview 3D displays and free-viewpoint video. Although many approaches to view reconstruction have been proposed to date, most of them share the need to model and estimate scene depth first, and follow with the estimation of unknown-view texture using this depth and other views. The approach we present in this paper follows this path as well. First, assuming a reliable disparity (depth) map is known between two views, we present a spline-based approach to unknown-view texture estimation, and compare its performance with standard disparity-compensated interpolation. A distinguishing feature of the spline-based reconstruction is that all virtual views between the two known views can be reconstructed from a single disparity field, unlike in disparity-compensated interpolation. In the second part of the paper, we concentrate on the recovery of reliable disparities especially at object boundaries. We outline an occlusion-aware disparity estimation method that we recently proposed; it jointly computes disparities in visible areas, inpaints disparities in occluded areas and implicitly detects occlusion areas. We then show how to combine occlusion-aware disparity estimation with spline-based view reconstruction presented earlier, and we experimentally demonstrate its benefits compared to occlusion-unaware disparity-compensated interpolation.

R. Lau, S. Ince, and J. Konrad, "Compression of still multi-view images for 3-D automultiscopic spatially-multiplexed displays," in Proc. SPIE Stereoscopic Displays and Virtual Reality Systems, vol. 6490, pp. 0O.1-0O.9, Jan. 2007, [PDF: 3,690KB].

Automultiscopic (no glasses, multiview) displays are becoming a viable alternative to 3-D displays with glasses. However, since these displays require multiple views, the needed transmission bit rate as well as storage space are of concern. In this paper, we describe results of our research on the compression of still multiview images for display on lenticular or parallax-barrier screens. In one approach, we examine compression of multiplexed images that, unfortunately, have relatively low spatial correlation and thus are difficult to compress. We also study compression/decompression of individual views followed by multiplexing at the receiver. However, instead of using full-resolution views, we apply compression to band-limited and downsampled views in the so-called "N-tile format". Using lower resolution images is acceptable since multiplexing at the receiver involves downsampling from full view resolution anyway. We use three standard compression techniques: JPEG, JPEG-2000 and H.264. While both JPEG standards work with still images and can be applied directly to an N-tile image, H.264, a video compression standard, requires N images of the N-tile format to be treated as a short video sequence. We present numerous experimental results indicating that the H.264 approach achieves significantly better performance than the other three approaches studied.

P.-M. Jodoin, M. Mignotte, and J. Konrad, "Background subtraction framework based on local spatial distributions," in Proc. Int. Conf. on Image Analysis and Recognition, pp. 370-380, Sept. 2006, [PDF: 456KB].

Most statistical background subtraction techniques are based on the analysis of temporal color/intensity distributions. However, learning statistics on a series of time frames can be problematic, especially when no frames absent of moving objects are available or when the available memory isn't sufficient to store the series of frames needed for learning. In this paper, we propose a framework that allows common statistical motion detection methods to use spatial statistics gathered on one frame instead of a series of frames as is usually the case. This simple and flexible framework is suitable for various applications including the ones with a mobile background such as when a tree is shaken by wind or when the camera jitters. Three statistical background subtraction methods have been adapted to the proposed framework and tested on different synthetic and real image sequences.

P.-M. Jodoin, M. Mignotte, and J. Konrad, "Light and fast statistical motion detection method based on ergodic model," in Proc. IEEE Int. Conf. Image Processing, Oct. 2006, [PDF: 204KB].

In this paper, we propose a light and fast pixel-based statistical motion detection method based on a background subtraction procedure. The statistical representation of the background relies on its spatial color distributions herein modeled by a mixture of Gaussians. The Gaussian parameters are obtained after segmenting one reference frame with an unsupervised Bayesian approach whose parameter estimation step is ensured by the K-means and Iterated Conditional Estimation (ICE) algorithms. Since the motion detection function only depends on a global mixture of M Gaussians, only a few bits per pixel need to be stored in memory. Our method achieves real-time performance especially when look up tables are used to store pre-calculated data. Results have been obtained on synthetic and real video sequences and compared with other statistical methods.

L. Oddsson, J. Konrad, S. Williams, R. Karlsson, and S. Ince, "A rehabilitation tool for functional balance using altered gravity and virtual reality," in 5-th Int. Workshop on Virtual Rehabilitation, Aug. 2006, [PDF: 680KB].

The current project is driven by the need for effective and functional treatment of various categories of patients with gait and balance problems. Furthermore, early treatment and mobilization of patients with hip fractures, a common consequence of falls especially in the elderly population, is critical for a successful outcome. Gait training in these populations of patients using partial body weight support (BWS) on a treadmill, a technique that involves unloading the subject through a harness, improves walking better than training with full weight bearing. One problem with the BWS technique that is not commonly acknowledged is that the supporting harness decreases the need for natural postural control. The harness provides an external support, partly eliminating associated postural adjustments that are required during independent gait. We have developed a tool that can refine the concept of BWS training by allowing natural associated postural adjustments to occur. While in a supine position in a 90 deg tilted environment built around a modified hospital bed, subjects wear a backpack frame that is freely moving on air-bearings (cf. puck on an air hockey table) and attached through a cable to a pneumatic cylinder that provides a load to emulate G-like loads. Various exercise devices can be used including a treadmill, stepper and bicycle. Veridical visual input is provided through two 3-D automultiscopic displays that allow glasses-free 3-D vision representing a virtual surrounding environment that may be acquired from sites chosen by the patient. A group of 12 healthy subjects were exposed to a combination of strength and balance training in such a tilted environment over a period of 4 weeks. Measures of both isokinetic strength and balance assessed in an upright position showed statistically significant improvements after training, with postural measures indicating less reliance on visual and/or increased use of somatosensory cues.

N. Božinović and J. Konrad, "Modeling motion for spatial scalability," in Proc. IEEE Int. Conf. Acoustics Speech Signal Processing, May 2006, [PDF: 378KB].

The dramatic proliferation of visual displays, from cell phones, through video iPods, PDAs, and notebooks, to high-quality HDTV screens, has raised the demand for a video compression scheme capable of decoding a "once-encoded" video at a range of supported video resolutions and with high quality. A promising solution to this problem has been recently proposed in the form of wavelet video coding based on motion-compensated temporal filtering (MCTF); scalability is naturally supported while efficiency is comparable to state-of-the-art hybrid coders. However, although rate (quality) and temporal scalability are natural in mainstream "t+2D" wavelet video coders, spatial scalability suffers from drift problems. In the light of the recently proposed "2D+t+2D" modification, which targets spatial scalability performance, we present a framework for the modeling of spatially-scalable motion that is well matched to this new structure. We propose a motion estimation scheme in which motion fields at different spatial scales are jointly estimated and coded. In addition, at lower spatial resolutions, we extend the block-wise constant motion model to a higher-order model based on cubic splines, effectively creating a "mixture motion model" that combines different models at different supported spatial scales. This advanced spatial modeling of motion significantly improves the coding efficiency of motion at low resolutions and leads to an excellent compression performance of the overall coder; spatial scalability performance of the proposed scheme approaches that of a non-scalable coder.

N. Božinović, J. Konrad, W. Zhao, and C. Vázquez, "On the importance of motion invertibility in MCTF/DWT video coding," in Proc. IEEE Int. Conf. Acoustics Speech Signal Processing, vol. II, pp. 49-52, Mar. 2005, [PDF: 149KB].

Motion-compensated temporal filtering implemented using lifting is an effective and efficient temporal decomposition tool that facilitates video compression competitive with the current standards. As recently shown, however, in order that a lifting-based motion-compensated discrete wavelet transform indeed implement the intended filtering along motion trajectories, motion transformation must be invertible and motion composition between frames must be well-defined. A departure from these conditions results in the application of sub-optimal subband decomposition filters which, in turn, degrades coding performance, even if prediction-step energy is minimized during motion estimation. In this paper, we study the impact of motion field invertibility error on the coding performance of an MCTF/DWT video coder. We propose two new motion field inversion methods and compare them to previously reported inversion techniques. We also compare coding results for all inversion algorithms with those of coding based on triangular meshes that are inherently invertible. Our results show that a significant improvement in coding performance is possible with more accurate motion field inversion.

S. Ince and J. Konrad, "Geometry-based estimation of occlusions from video frame pairs," in Proc. IEEE Int. Conf. Acoustics Speech Signal Processing, vol. II, pp. 933-936, Mar. 2005, [PDF: 752KB].

The knowledge of occlusions and newly-exposed areas, a natural consequence of changing object juxtaposition in a 3-D scene, can be effectively used to improve video coding efficiency, video rate conversion quality and view interpolation fidelity. Although various occlusion estimation methods have been proposed to date, most of them are not robust or are computationally complex. In this paper, we study two simple, well-known occlusion estimation methods, one based on a photometric mismatch between two frames of an image sequence and the other on a geometric mismatch. We demonstrate their weaknesses and propose a new geometric method that exhibits good robustness to noise in the data while maintaining low computational complexity.

J. Konrad and N. Božinović, "Importance of motion in motion-compensated temporal discrete wavelet transforms," in Proc. SPIE Image and Video Communications and Process., vol. 5685, pp. 354-365, Jan. 2005, [PDF: 368KB].

Discrete wavelet transforms (DWTs) applied temporally under motion compensation (MC) have recently become a very powerful tool in video compression, especially when implemented through lifting. A recent theoretical analysis has established conditions for perfect reconstruction in the case of transversal MC-DWT, and also for the equivalence of lifted and transversal implementations of MC-DWT. For Haar MC-DWT these conditions state that motion must be invertible, while for higher-order transforms they state that motion composition must be a well-defined operator. Since many popular motion models do not obey these properties, thus inducing errors (prior to compression), it is important to understand the impact of motion non-invertibility or quasi-invertibility on the performance of video compression. In this paper, we present new experimental results of a study aiming at a quantitative evaluation of such impact in the case of block-based motion. We propose a new metric to measure the degree to which two motion fields are not inverses of each other. Using this metric we investigate several motion inversion schemes, from simple temporal sample-and-hold, through spatial nearest-neighbor, to advanced spline-based inversion, and we compare compression performance of each method to that of independently-estimated forward and backward motion fields. We observe that compression performance monotonically improves with the reduction of the proposed motion inversion error, up to 1-1.5dB for the advanced spline-based inversion. We also generalize the problem of "unconnected" pixels by extending it to both update and prediction steps, as opposed to the update step only used in conventional methods. Initial tests show favorable results compared to previously reported techniques.

S. Ince and J. Konrad, "Recovery of a missing color component in stereo images (or helping NASA find little green martians)," in Proc. SPIE Stereoscopic Displays and Virtual Reality Systems, vol. 5664, pp. 127-138, Jan. 2005, [PDF: 9,234KB].

The current exploration of Mars by the National Aeronautics and Space Administration (NASA) has produced a lot of images of its surface. Two rovers, "Spirit" and "Opportunity", are each equipped with a pair of high-resolution cameras, called "PanCam". While most commercial cameras are sensitive to three spectral bands, typically red (R), green (G) and blue (B), the "PanCam" is sensitive to many more bands since it was designed to deliver additional information to geologists. This is achieved by means of a filter wheel in front of each camera lens. It turns out that slightly different filters are used in the two cameras; while the left camera is equipped with red, green and blue filters, among others, the right camera does not have a green filter on its color wheel. Therefore, since the G component of the right image is missing, currently it is not possible to view a 3D image of the Mars surface in color. In this paper, we develop a method to reconstruct one missing color component of an image given its remaining color components and all three components of the other image of a stereo pair. The method relies on disparity-compensated prediction. In the first step, a disparity field is estimated using the two available components (R and B). In the second step, the missing component is recovered using disparity-compensated prediction from the same component (G) in the other image of the stereo pair. In ground-truth experiments, we have obtained high PSNR values of the reconstruction error confirming efficacy of the approach. Similar reconstructions using images transmitted by the rovers yield a comfortable 3D experience when viewed with shutter glasses.

S. Laurent, W. Karl, J. Konrad, J. Wilson, J. Baumgardner, and M. Mendillo, "Design of a high-definition imaging (HDI) analysis technique adapted to challenging environments," in Proc. SPIE Applications of Digital Image Process., vol. 5558, pp. 676-687, Nov. 2004, [PDF: 806KB].

This paper presents a highly automated, more accurate approach to High Definition Imaging (HDI) using low signal-to-noise digital videos recorded at ground-based telescopes. The HDI approach involves the acquisition of a video sequence (10^3-10^5 fields) taken through a turbulent atmosphere followed by three-step post-processing. The specific goal is to be able to reproduce expert results, while limiting human interaction, to study both surface features and the atmospheres of planets and moons. The telescopes used here are preferably small and not equipped with Adaptive Optics. The three steps include registration, selection and restoration. First, registration, based on a template, is performed to find the exact position of each object. Then only higher-quality frames are selected by a criterion based on a measure of the blur in a region of interest around that object. The best quality frames are then shifted and added together to create an effective time exposure under ideal observing conditions. The last step is to remove distortions in the image, caused by the atmosphere and the optical equipment, through a regularized deconvolution of instrument and residual atmospheric blur. This procedure is done first in the white light domain, and then the registration information obtained there is applied to spectral data.
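The selection and shift-and-add steps of the post-processing chain described above can be sketched as follows. This is a toy Python illustration, not the paper's procedure: registration is assumed to have already produced an integer (dy, dx) offset per frame, the blur measure is replaced by a simple sum of squared neighbor differences, and the final regularized deconvolution is omitted. All names and the `keep` parameter are hypothetical.

```python
def sharpness(frame):
    """Assumed quality score: sum of squared neighbor differences."""
    rows, cols = len(frame), len(frame[0])
    total = 0.0
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:
                total += (frame[r][c + 1] - frame[r][c]) ** 2
            if r + 1 < rows:
                total += (frame[r + 1][c] - frame[r][c]) ** 2
    return total


def shift_and_add(frames, offsets, keep):
    """Average the `keep` sharpest frames after undoing their offsets.

    offsets[i] = (dy, dx) is the displacement of the object in frame i
    relative to the reference position; samples that would come from
    outside a frame are skipped rather than padded.
    """
    ranked = sorted(range(len(frames)),
                    key=lambda i: sharpness(frames[i]), reverse=True)[:keep]
    rows, cols = len(frames[0]), len(frames[0][0])
    acc = [[0.0] * cols for _ in range(rows)]
    cnt = [[0] * cols for _ in range(rows)]
    for i in ranked:
        dy, dx = offsets[i]
        for r in range(rows):
            for c in range(cols):
                rr, cc = r + dy, c + dx
                if 0 <= rr < rows and 0 <= cc < cols:
                    acc[r][c] += frames[i][rr][cc]
                    cnt[r][c] += 1
    return [[acc[r][c] / cnt[r][c] if cnt[r][c] else 0.0
             for c in range(cols)] for r in range(rows)]
```

The output plays the role of the "effective time exposure" that a deconvolution step would then restore.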

N. Božinović, J. Konrad, T. André, M. Antonini, and M. Barlaud, "Motion-compensated lifted wavelet video coding: toward optimal motion/transform configuration," in Signal Process. XII: Theories and Applications (Proc. Twelfth European Signal Process. Conf.), pp. 1975-1978, Sept. 2004, [PDF: 120KB].

Various coding schemes based on a lifting implementation of the discrete wavelet transform applied along motion trajectories have recently gained a lot of interest in the video processing community as strong candidates to succeed current state-of-the-art hybrid coders. Still, there are a number of very important issues, including the choice of a particular wavelet transform and motion model, that have significant impact on the overall coding performance and will determine the usefulness of this class of coders. In this paper, we classify and discuss different motion/transform configurations that are being used in motion-compensated lifting-based wavelet transforms. Our results show that coder performance changes significantly for different combinations of motion models and transforms used.

L. Oddsson, C. Wall III, P. Meyer, and J. Konrad, "A virtual environment with simulated gravity for balance rehabilitation of bedridden patients and frail individuals," in XV-th Congress of the International Society of Electrophysiology and Kinesiology, p. 55, June 2004.

Rehabilitation of physical function and balance in frail individuals and bedridden patients is a challenge for the therapist. Early ambulation following hip fracture has been shown to be directly predictive of extended survival indicating the importance of effective interventions that improve physical function and balance and thereby minimize bed time. Such interventions should preferably involve whole body exercises that challenge coordination and motor function. We have built a 90 deg tilted room environment where a subject "stands" in a supine position while strapped to a frictionless device through a backpack frame and harness that allows free motion in the frontal plane, similar to upright standing. The device is attached to a weight stack through a series of pulleys, which provides a variable gravity-like force that the subject must balance against to remain "upright" in the tilted environment. The room contains common physical objects that are visually "polarized" (well defined "up" and "down", e.g. a chair) to convey to the subject the perception of being upright in a 1-g environment. Healthy subjects, who trained their balance in this supine position on 10 occasions over a two-week period, showed dramatic improvements in upright balance performance including a 50% increase in time to balance on a half cylinder on one leg and a 30% decrease in COP sway velocity while standing on one leg. We expect frail individuals and bedridden patients to be able to safely perform functional balance training in the tilted environment that would transfer to improved function and mobility in an upright position when negotiating gravity. We plan a portable version of this system that would incorporate recently available autostereoscopic 3-D displays, a technique that allows 3-D immersion without the use of glasses, to provide "windows" of a virtual environment around the subject instead of the currently used physical room. 
A 5-camera digital image acquisition system, called the Pentacam, is being developed to capture 3-D images that can be tailored to the preferences of different individuals. For example, images could be acquired from sites that are familiar to the subject, including their own or a relative's indoor or outdoor home environment.

M. Ristivojević and J. Konrad, "Joint space-time image sequence segmentation: object tunnels and occlusion volumes," in Proc. IEEE Int. Conf. Acoustics Speech Signal Processing, vol. III, pp. 9-12, May 2004, [PDF: 540KB].

Spatial segmentation of image sequences is usually computed based on motion between two frames. Some recent approaches extend this to joint segmentation in space-time; the resulting 3-D segmentation (in x-y-t space) can be interpreted as a volume ``carved out'' by a moving object in the image sequence domain, or the so-called ``object tunnel''. In this paper, we extend this concept to explicit modeling of occlusion events in the x-y-t space. In addition to the modeling of object evolution, we also model occluded and newly-exposed areas in the background and in the object by means of ``occlusion volume'', a new space-time concept. We propose a variational formulation of the problem that we solve using the multiphase level set method. We show experimental results for synthetic and natural image sequences.

N. Božinović and J. Konrad, "Mesh-based motion models for wavelet video coding," in Proc. IEEE Int. Conf. Acoustics Speech Signal Processing, vol. III, pp. 141-144, May 2004, [PDF: 101KB].

Discrete wavelet transforms implemented using lifting along motion trajectories are effective and efficient temporal decomposition tools that facilitate video compression competitive with the current standards. As recently shown, however, in order that a lifting-based motion-compensated wavelet transform be equivalent to its transversal (standard) implementation, motion transformation must be invertible and motion composition between frames must be well-defined. In this paper, we discuss various mesh-based motion models that satisfy requirements of invertibility and composition, and thus are suitable for use in motion-compensated lifting-based wavelet transforms. We propose a new mesh configuration that preserves regularity of the mesh structure but provides better motion compensation compared to previously-reported mesh topologies, particularly in the proximity of image boundaries. Our results show that an improvement in motion compensation and overall compression performance is possible with only a fractional increase in motion overhead bit-rate.

T. André, M. Cagnazzo, M. Antonini, M. Barlaud, N. Božinović, and J. Konrad, "(N,0) motion-compensated lifting-based wavelet transform," in Proc. IEEE Int. Conf. Acoustics Speech Signal Processing, vol. III, pp. 121-124, May 2004, [PDF: 64KB].

Motion compensation has been widely used in both DCT- and wavelet-based video coders for years. The recent success of the temporal wavelet transform based on motion-compensated lifting suggests that a high-performance, scalable wavelet video coder may soon outperform the best DCT-based coders. As recently shown, however, motion-compensated lifting does not exactly implement its transversal equivalent unless certain conditions on motion are satisfied. In this paper, we review those conditions, and we discuss their importance. We derive a new class of temporal transforms, the so-called 1-N transversal or (N,0) lifting transforms, that are particularly interesting if those conditions on motion are not satisfied. We compare experimentally the 1-3 and 5-3 motion-compensated wavelet transforms for the ubiquitous block-motion model used in all video compression standards. For this model, the 1-3 transform outperforms the 5-3 transform due to the need to transmit additional motion information in the latter case. This interesting result, however, does not extend to motion models satisfying the transversal/lifting equivalence conditions.
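As a rough illustration of the 1-3 transform, the sketch below applies a (2,0) lifting step along time: a two-sided predict step and no update step, so the low band is simply the even-indexed frames. It rests on several assumptions that are not taken from the paper: motion compensation is omitted, each frame is reduced to one scalar sample, borders use symmetric extension, and the function names are hypothetical.

```python
def forward_1_3(frames):
    """(2,0) lifting: predict odd frames from their even neighbours, no update."""
    low = frames[0::2]  # 1-tap low-pass: even frames pass through unchanged
    high = []
    for i in range(1, len(frames), 2):
        left = frames[i - 1]
        # Mirror the left neighbour when there is no right neighbour.
        right = frames[i + 1] if i + 1 < len(frames) else frames[i - 1]
        high.append(frames[i] - 0.5 * (left + right))  # 3-tap high-pass
    return low, high


def inverse_1_3(low, high):
    """Undo the predict step; exact because the even frames were never modified."""
    frames = []
    for i, h in enumerate(high):
        left = low[i]
        right = low[i + 1] if i + 1 < len(low) else low[i]
        frames += [left, h + 0.5 * (left + right)]
    if len(low) > len(high):  # odd number of input frames
        frames.append(low[-1])
    return frames
```

Because there is no update step, no motion field from the high band back to the low band is needed in this structure, which is consistent with the abstract's remark about the additional motion information required by the 5-3 transform.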

J. Konrad and P. Agniel, "Non-orthogonal sub-sampling and anti-alias filtering for multiscopic 3-D displays," in Proc. SPIE Stereoscopic Displays and Virtual Reality Systems, vol. 5291, pp. 105-116, Jan. 2004, [PDF: 1,168KB].

Multiview passive 3-D displays, such as those based on lenticular or parallax-barrier technologies, require multiplexing of views into a single same-size RGB image. Thus, multiplexing of N views necessitates N:1 sub-sampling of each view and must be preceded by suitable lowpass filtering to prevent, or at least reduce, aliasing. Without such filtering, objectionable "jagged" edges, distorted textures, or Moire patterns are perceived although, admittedly, these effects are not as disturbing as in the case of single-view sub-sampling without multiplexing with other views. In this paper, unlike in our previous work, we consider anti-alias filtering derived from a non-orthogonal lattice. First, we approximate pixel layout for each view (sampling pattern) by a two-dimensional lattice; we find parameters of the lattice by minimizing a mismatch error between lattice and single-view points. Then, based on lattice parameters, we find frequency-domain specifications of the anti-alias filter. The filter has hexagonal passband and thus is non-separable. Although previously we designed such filters for floating-point implementations, here we opt for the more practical fixed-point arithmetic; the resulting filters can be easily implemented on ubiquitous fixed-point DSP chipsets. The fixed-point filters slightly depart from the desired magnitude specifications, but when applied to actual multiview images they produce almost indistinguishable results from those obtained by floating-point counterparts.

Y. Shi, J. Konrad, and W. Karl, "Multiple motion and occlusion segmentation with a multiphase level set method," in Proc. SPIE Visual Communications and Image Process., vol. 5308, pp. 189-198, Jan. 2004, [PDF: 2,059KB].

In this paper, we propose a new variational formulation for simultaneous multiple motion segmentation and occlusion detection in an image sequence. For the representation of segmented regions, we use the multiphase level set method proposed by Vese and Chan. This method allows an efficient representation of up to 2^L regions with L level-set functions. Moreover, by construction, it enforces a domain partition with no gaps and overlaps. This is unlike previous variational approaches to multiple motion segmentation, where additional constraints were needed. The variational framework we propose can incorporate an arbitrary number of motion transformations as well as occlusion areas. In order to minimize the resulting energy, we developed a two-step algorithm. In the first step, we use a feature-based method to estimate the motions present in the image sequence. In the second step, based on the extracted motion information, we iteratively evolve all level set functions in the gradient descent direction to find the final segmentation. We have tested the above algorithm on both synthetic- and natural-motion data with very promising results. We show here segmentation results for two real video sequences.
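The claim that L level-set functions represent up to 2^L regions with no gaps and no overlaps can be seen from a small sketch: the signs of the L functions at a pixel form an L-bit code, and each pixel therefore receives exactly one label. The sign convention (non-negative means "inside") and the names `region_index` and `label_image` are assumptions made for illustration.

```python
def region_index(phi_values):
    """Map the signs of L level-set values at one pixel to a label in [0, 2**L)."""
    index = 0
    for bit, phi in enumerate(phi_values):
        if phi >= 0:  # assumed convention: phi >= 0 means "inside" for this function
            index |= 1 << bit
    return index


def label_image(level_sets):
    """Build a label image from a list of L level-set images (lists of rows)."""
    rows, cols = len(level_sets[0]), len(level_sets[0][0])
    return [[region_index([phi[r][c] for phi in level_sets])
             for c in range(cols)]
            for r in range(rows)]
```

Every pixel gets one and only one code, so the partition property holds by construction and needs no extra constraint in the energy.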

M. Ristivojević and J. Konrad, "Joint space-time motion-based video segmentation and occlusion detection using multi-phase level sets," in Proc. SPIE Visual Communications and Image Process., vol. 5308, pp. 156-167, Jan. 2004, [PDF: 778KB].

Spatial video segmentation is usually performed based on motion between two frames. Some recent approaches extend this to joint segmentation in space-time; the resulting 3-D segmentation can be interpreted as a volume ``carved out'' by a moving object in the image sequence domain, or the so-called ``object tunnel''. In this paper, we extend this concept to explicit modeling of occlusion events in space-time. In addition to the modeling of object evolution, we also explicitly model occluded and newly-exposed areas in the background by means of ``occlusion volume'', a new space-time concept. A voxel belongs to occlusion volume if its intensity is consistent with past intensities along its motion trajectory but inconsistent with future intensities (reversed for ``exposed volume''). We propose a variational formulation of the problem that we solve using the multiphase level set method. We show encouraging experimental results for synthetic and natural image sequences.
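The voxel rule stated above can be written as a simple per-voxel test. This is only a thresholding sketch of the definition, not the variational, level-set solution proposed in the paper; the tolerance `tol`, the label strings and the function name are hypothetical.

```python
def classify_voxel(past, current, future, tol=10):
    """Label one voxel from intensities sampled along its motion trajectory.

    past / future: intensities at the previous / next point of the trajectory.
    current: intensity at the voxel itself.
    """
    past_ok = abs(current - past) <= tol
    future_ok = abs(current - future) <= tol
    if past_ok and future_ok:
        return "visible"    # consistent in both temporal directions
    if past_ok:
        return "occluded"   # matches the past but not the future
    if future_ok:
        return "exposed"    # matches the future but not the past
    return "undetermined"
```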

J. Konrad, "Transversal versus lifting approach to motion-compensated temporal discrete wavelet transform of image sequences: equivalence and tradeoffs," in Proc. SPIE Visual Communications and Image Process., vol. 5308, pp. 452-463, Jan. 2004, [PDF: 133KB].

Lifting-based implementations of various discrete wavelet transforms applied in the temporal direction under motion compensation have recently become a very powerful tool in video compression research. We present in this paper a theoretical analysis of motion compensation in both transversal and lifted implementations of such transforms. We derive conditions for perfect reconstruction in the case of motion-compensated transversal discrete wavelet transform. We also derive conditions on motion transformation assuring that a motion-compensated lifting scheme is exactly equivalent to its transversal counterpart. In general, these conditions require that motion transformation allow composition and be invertible. Unfortunately, many motion models do not obey these properties, thus inducing subband decomposition errors (prior to compression). We propose an alternative approach to motion compensation in the case of Haar transform. This new approach poses no constraints on motion; motion-compensated lifted Haar transform exactly implements its transversal implementation, and the latter obeys perfect reconstruction, both regardless of motion transformation used. This new approach, however, does not extend to the 5/3 or any higher-order discrete wavelet transform.
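For context, the sketch below shows the generic lifted Haar structure with motion-compensated predict and update steps. It does not reproduce the alternative approach proposed in the paper. It only illustrates the general property of lifting that the inverse recovers the input for any motion mapping, even a many-to-one mapping that is not invertible; whether the result equals the transversal transform is the separate question that the abstract addresses. Frames are 1-D lists, and motion is given as integer index maps `fwd` and `bwd`, which are assumptions made for illustration.

```python
def warp(frame, motion):
    """Motion-compensate a frame: output pixel x is read from frame[motion[x]]."""
    return [frame[m] for m in motion]


def mc_haar_forward(a, b, fwd, bwd):
    """Lifted Haar step on frames a, b with arbitrary index maps fwd, bwd."""
    high = [bx - px for bx, px in zip(b, warp(a, fwd))]            # predict
    low = [ax + 0.5 * ux for ax, ux in zip(a, warp(high, bwd))]    # update
    return low, high


def mc_haar_inverse(low, high, fwd, bwd):
    """Run the lifting steps backwards with the signs flipped."""
    a = [lx - 0.5 * ux for lx, ux in zip(low, warp(high, bwd))]
    b = [hx + px for hx, px in zip(high, warp(a, fwd))]
    return a, b
```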

R. Stasiński and J. Konrad, "Linear shift-variant filtering for POCS reconstruction of irregularly sampled images," in Proc. IEEE Int. Conf. Image Processing, vol. III, pp. 689-692, Sept. 2003, [PDF: 70KB].

The reconstruction of a regularly-sampled image from irregularly-spaced samples is a stumbling block in various video processing tasks. In the past, we have developed a POCS-based (projection onto convex sets) reconstruction method that applies two operators sequentially: bandwidth limitation and sample substitution. Although the method works well, we have observed an interesting paradox: wide-band filtering results in better-looking images, but lower PSNR values, than narrow-band filtering (increased blur). This can be explained by a too-short impulse response of the wide-band filter unable to ``fill-in'' the missing samples in sparsely populated areas. In this paper, we propose an improved version of our algorithm where linear shift-invariant (LSI) filtering is replaced by linear shift-variant (LSV) filtering. The LSV filtering is implemented as a parallel bank of LSI filters, each with different bandwidth (impulse response). We demonstrate experimentally a significant reduction of the reconstruction error due to the new LSV filtering.
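The two alternating operators of the baseline method (bandwidth limitation followed by sample substitution) can be sketched in one dimension. Several simplifications here are assumptions rather than the authors' design: the known samples sit on a subset of the regular grid, the bandwidth limitation is an ideal low-pass filter applied through a direct DFT, and a single shift-invariant filter is used instead of the shift-variant bank proposed in the paper.

```python
import cmath


def dft(values, inverse=False):
    """Direct O(n^2) discrete Fourier transform (adequate for a short example)."""
    n = len(values)
    sign = 1 if inverse else -1
    out = [sum(values[m] * cmath.exp(sign * 2j * cmath.pi * k * m / n)
               for m in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out


def bandlimit(signal, cutoff):
    """Projection onto band-limited signals: zero all bins above `cutoff`."""
    n = len(signal)
    spectrum = dft(signal)
    kept = [c if min(k, n - k) <= cutoff else 0 for k, c in enumerate(spectrum)]
    return [c.real for c in dft(kept, inverse=True)]


def pocs_reconstruct(known, n, cutoff, iterations=60):
    """Alternate the two projections; `known` maps grid position -> sample value."""
    estimate = [0.0] * n
    for _ in range(iterations):
        estimate = bandlimit(estimate, cutoff)   # operator 1: bandwidth limitation
        for position, value in known.items():    # operator 2: sample substitution
            estimate[position] = value
    return estimate
```

With a narrow passband the filter's impulse response is long enough to fill gaps between distant samples, at the cost of blur, which is the trade-off that motivates switching between several bandwidths.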

N. Božinović and J. Konrad, "Scan order and quantization for 3D-DCT coding," in Proc. SPIE Visual Communications and Image Process., vol. 5150, pp. 1204-1215, July 2003, [PDF: 2,876KB].

Two types of coders dominate the field of video compression research today: well-established hybrid coders, which are at the core of all MPEG and H.26X standards, and emerging three-dimensional (3D) subband coders, largely inspired by the success of wavelet-based still image compression. However, there are surprisingly few results reported on 3D transform coding based on the discrete cosine transform (DCT). Even while exploiting all the beneficial properties of the DCT itself (forward/inverse symmetry, fast separable implementation, and excellent energy compaction), these coders under-perform when compared to competing hybrid coders, primarily due to the inefficient quantization, scanning and entropy coding used. In this paper, we study means of improving 3D-DCT coding by proposing adaptive scanning order and quantization of coefficients that are better matched to the 3D-DCT spectrum of a motion sequence. Our results show significant improvement in performance over previously reported techniques.

J. Konrad and M. Ristivojević, "Video segmentation and occlusion detection over multiple frames," in Proc. SPIE Image and Video Communications and Process., vol. 5022, pp. 377-388, Jan. 2003, [PDF: 1,242KB].

Spatial segmentation of image sequences is usually performed based on motion between two frames, and then followed by tracking. Some recent approaches extend this to joint segmentation in space-time; the resulting 3-D segmentation (in x-y-t space) can be interpreted as a volume ``carved out'' by a moving object in the image sequence domain. We call such volumes ``object tunnels''. In this paper, we propose a new approach to occlusion analysis and characterization that is based on object tunnels. It results from the observation that the object-tunnel wall for a fully visible object has a different shape than that for an object undergoing occlusion or exposure. Walls of tunnels associated with moving objects have tangent planes that are, in general, non-parallel to the time axis. When an object gets occluded or exposed by a static feature, part of the object tunnel wall stops evolving freely; its spatial coordinates remain fixed (static occlusion boundary) while the temporal coordinate increases linearly (time evolution). This forces part of the wall to be comprised of lines parallel to the time axis, each line defined by a single point on the occlusion boundary. In case this boundary is a straight-line edge, the occluding part of the wall becomes planar. We propose to detect occlusions by searching for such characteristic surfaces of object tunnel walls. We formulate the problem for planar occlusion walls based on a robust distance metric, and we show experimental results for various occlusion types on synthetic and camera-acquired image sequences.

A. Litvin, J. Konrad, and W. Karl, "Probabilistic video stabilization using Kalman filtering and mosaicking," in Proc. SPIE Image and Video Communications and Process., vol. 5022, pp. 663-674, Jan. 2003, [PDF: 1,357KB].

The removal of unwanted, parasitic vibrations in a video sequence induced by camera motion is an essential part of video acquisition in industrial, military and consumer applications. In this paper, we present a new image processing method to remove such vibrations and reconstruct a video sequence void of sudden camera movements. Our approach to separating unwanted vibrations from intentional camera motion is based on a probabilistic estimation framework. We treat estimated parameters of interframe camera motion as noisy observations of the intentional camera motion parameters. We construct a physics-based state-space model of these interframe motion parameters and use recursive Kalman filtering to perform stabilized camera position estimation. A six-parameter affine model is used to describe the interframe transformation, allowing quite accurate description of typical scene changes due to camera motion. The model parameters are estimated using a p-norm-based multi-resolution approach. This approach is robust to model mismatch and to object motion within the scene (which are treated as outliers). We use mosaicking in order to reconstruct undefined areas that result from motion compensation applied to each video frame. Registration between distant frames is performed efficiently by cascading interframe affine transformation parameters. We compare our method's performance with that of a commercial product on real-life video sequences, and show a significant improvement in stabilization quality for our method.
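The idea of treating measured camera motion as noisy observations of the intentional motion can be sketched with a scalar Kalman filter on one motion parameter, for example accumulated horizontal translation. The constant-velocity state model, the noise variances `q` and `r`, and the function name are assumptions for illustration; the paper's physics-based model and its six affine parameters are not reproduced here.

```python
def kalman_intentional_path(observed, q=1e-3, r=1.0):
    """Recursive estimate of the intentional camera path from a jittery one.

    State: [position, velocity] with constant-velocity dynamics.
    observed: measured camera position per frame (intentional motion + vibration).
    q, r: assumed process and measurement noise variances.
    """
    p, v = observed[0], 0.0
    p00, p01, p10, p11 = 1.0, 0.0, 0.0, 1.0  # state covariance
    path = []
    for z in observed:
        # Predict: x = F x and P = F P F^T + Q, with F = [[1, 1], [0, 1]].
        p = p + v
        p00, p01, p10, p11 = (p00 + p01 + p10 + p11 + q,
                              p01 + p11,
                              p10 + p11,
                              p11 + q)
        # Update with the observation z = position + noise (H = [1, 0]).
        s = p00 + r
        k0, k1 = p00 / s, p10 / s
        innovation = z - p
        p += k0 * innovation
        v += k1 * innovation
        p00, p01, p10, p11 = ((1 - k0) * p00,
                              (1 - k0) * p01,
                              p10 - k1 * p00,
                              p11 - k1 * p01)
        path.append(p)
    return path
```

In such a scheme each frame would then be warped by the difference between the filtered and the observed position, and the resulting undefined border areas are what mosaicking is used to fill.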

M. Kardouchi and J. Konrad, "Recovering large-amplitude disparity fields using adaptive interpolation," in Proc. SPIE Image and Video Communications and Process., vol. 5022, pp. 761-771, Jan. 2003, [PDF: 3,824KB].

Computing dense disparity fields from large-baseline stereo is a difficult problem because of the long-range correspondences involved. A typical solution to this problem is to use optical flow or block matching methods implemented over a hierarchy of resolutions. However, these approaches cannot easily cope with disparity discontinuities. Recently, we have proposed a novel approach that combines feature matching and Delaunay triangulation. In this approach, feature points are first extracted using an intensity corner detector, and then corresponding feature-point pairs are found using cross-correlation. These two steps result in a reliable but sparse map of disparity vectors. In order to compute a dense disparity field, the third step involves Delaunay triangulation followed by disparity interpolation based on an affine (planar) model. The resulting disparity fields are continuous everywhere, and thus are not realistic; typical stereo image pairs exhibit disparity discontinuities at object boundaries. To address this problem, in the past we subdivided some Delaunay triangles into smaller ones. Although this approach has significantly improved the rendition of disparity discontinuities, it did not always work reliably. In this paper, we propose an adaptive interpolation over Delaunay triangles. As before, the interpolation is distance-dependent, i.e., it accounts for the Euclidean distance between the position of the disparity under interpolation and the three vertices of a triangle. The distance-dependent weights, however, are now additionally adapted so that the interpolated, pixel-based disparities within each triangle afford discontinuities. The new method has been applied to natural stereoscopic images. The resulting dense disparity fields exhibit clear, although subtle, discontinuities at object boundaries, and are more realistic than disparity fields obtained by the prior approach.

J. Konrad and P. Agniel, "Artifact reduction in lenticular multiscopic 3-D displays by means of anti-alias filtering," in Proc. SPIE Stereoscopic Displays and Virtual Reality Systems, vol. 5006, pp. 336-347, Jan. 2003, [PDF: 2,279KB].

This paper addresses the issue of artifact visibility in automultiscopic 3-D lenticular displays. A straightforward extension of the two-view lenticular autostereoscopic principle to M views results in an M-fold loss of horizontal resolution due to the subsampling needed to properly multiplex the views. In order to circumvent the imbalance between the horizontal and vertical resolution, a tilt can be applied to the lenticules to orient them at a small angle to the vertical direction, as is done in the SynthaGram (TM) display from Stereographics Corp. In either case, to avoid aliasing, the subsampling should be preceded by suitable lowpass pre-filtering. Although for purely vertical lenticules a sufficiently narrowband lowpass horizontal filtering suffices, the situation is more complicated for diagonal lenticules; the subsampling of each view is no longer orthogonal, and more complex sampling models need to be considered. Based on multidimensional sampling theory, we have studied multiview sampling models based on lattices. These models approximate pixel positions on a lenticular automultiscopic display and lead to optimal anti-alias filters. In this paper, we report results for a separable approximation to non-separable 2-D anti-alias filters based on the assumption that the lenticule slant is small. We have carried out experiments on a variety of images and different filter bandwidths. We have observed that the theoretically-optimal bandwidth is too restrictive; aliasing artifacts disappear, but some image details are lost as well. Somewhat wider bandwidths result in images with almost no aliasing and largely preserved detail. For subjectively-optimized filters, the improvements, although localized, are clear and enhance the 3-D viewing experience.

J. Konrad and M. Ristivojević, "Joint space-time image sequence segmentation based on volume competition and level sets," in Proc. IEEE Int. Conf. Image Processing, vol. 1, pp. 573-576, Sept. 2002, [PDF: 209KB].

In this paper, we address the issue of joint space-time segmentation of image sequences. Typical approaches to such segmentation consider two image frames at a time, and perform tracking of individual segmentations across time. We propose to perform this segmentation jointly over multiple frames. This leads to a 3-D segmentation, i.e., a search for a volume ``carved out'' by a moving object in the (3-D) image sequence domain. We pose the problem in a Bayesian framework and use the MAP criterion. Under suitable structural and segmentation/motion models we convert MAP estimation to a functional minimization. The resulting problem can be viewed as volume competition, a 3-D generalization of region competition. We parameterize the unknown surface to be estimated, but rather than solving for it using an active-surface approach, we embed it into a higher-dimensional function and use the level-set methodology. We show experimental results for the simpler case of object motion against a still background although, given suitable models, the general formulation can handle complex motion too.

J. Konrad and N. Božinović, "Interpretation of uniform translational image motion: DCT versus FT," in Proc. IEEE Int. Conf. Image Processing, vol. 2, pp. 281-284, Sept. 2002, [PDF: 2,264KB].

We study properties of the discrete cosine transform (DCT) when applied to an image sequence formed by uniformly translating a still image. The Fourier transform (FT) applied to such a sequence has non-zero content only on a spatio-temporal frequency plane orthogonal to the direction of motion. We derive an equivalent spectrum for the DCT case. The spectrum function is more complicated than in the FT case and cannot be easily interpreted analytically. However, its numerical evaluation demonstrates that spectral occupancy in the DCT domain is limited to a narrow band around a plane similar to one in the FT case with two important differences: the plane is subject to folding, and the DCT coefficient amplitude is strongly attenuated for larger temporal ``frequencies''. We verify the theoretical derivations experimentally on images. The obtained result opens an interesting possibility for the computation of constant-velocity motion in the DCT domain. We demonstrate some preliminary results of motion estimation in the 3-D DCT domain by identifying directions of spectral occupancy with respect to transform coefficients.
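The Fourier-transform property quoted above can be checked numerically in a reduced setting. The sketch assumes one spatial dimension plus time, circular (wrap-around) translation by an integer number of pixels per frame, and a direct DFT; in this setting the occupied "plane" reduces to the line kt + v*kx = 0 (mod N). It covers only the FT case, not the DCT spectrum derived in the paper, and the function names are hypothetical.

```python
import cmath


def translating_sequence(f, v):
    """Frames s[t][x] = f[(x - v*t) mod N]: a still signal shifted v pixels per frame."""
    n = len(f)
    return [[f[(x - v * t) % n] for x in range(n)] for t in range(n)]


def dft2(s):
    """Direct 2-D DFT of an N x N sequence, indexed as spectrum[kt][kx]."""
    n = len(s)
    w = [cmath.exp(-2j * cmath.pi * k / n) for k in range(n)]
    return [[sum(s[t][x] * w[(kx * x + kt * t) % n]
                 for t in range(n) for x in range(n))
             for kx in range(n)]
            for kt in range(n)]


def energy_split(f, v):
    """Return (energy on the line kt + v*kx = 0 mod N, energy everywhere else)."""
    n = len(f)
    spectrum = dft2(translating_sequence(f, v))
    on_line = off_line = 0.0
    for kt in range(n):
        for kx in range(n):
            energy = abs(spectrum[kt][kx]) ** 2
            if (kt + v * kx) % n == 0:
                on_line += energy
            else:
                off_line += energy
    return on_line, off_line
```

For any test signal the off-line energy should be at the level of rounding error, so locating the occupied line in the spectrum reveals the velocity `v`.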

C. Vázquez, E. Dubois, and J. Konrad, "Reconstruction of irregularly-sampled images by regularization in spline spaces," in Proc. IEEE Int. Conf. Image Processing, vol. 3, pp. 405-408, Sept. 2002, [PDF: 802KB].

We are concerned with the reconstruction of a regularly-sampled image based on irregularly-spaced samples thereof. We propose a new iterative method based on a cubic spline representation of the image. An objective function taking into account the similarity to the known samples and the regularity of the function is minimized in order to obtain a good approximation. We apply the developed algorithm to motion-compensated image interpolation. Under motion compensation, the resulting sampling grids are irregular and require irregular/regular interpolation. We show experimental results on real-world images and we compare our results with other methods proposed in the literature.

R. Stasiński and J. Konrad, "Space-variable filtering for approximation of uniformly sampled image from samples on irregular grids," in 5-th Nordic Signal Proc. Symp., Oct. 2002, [PDF: 191KB].

The paper presents a space-variable POCS-based (projection onto convex sets) method for the reconstruction of a regularly-sampled image from its irregularly spaced samples. Such reconstruction is often needed in image processing and coding, for example in stereo vision and motion compensation. The proposed approach applies two operators sequentially: bandwidth limitation and sample substitution, and is based on our earlier work. The contribution of this paper is the space-variable implementation of the bandwidth limitation operator, which has been postulated previously. The operator is realized in the simplest possible way as a filter with two sets of coefficients; a measure of the local density of the irregular grid determines which set is used. The technique is computationally efficient, although at the cost of increased memory requirements. Experimental results demonstrate that the new technique is indeed much better in terms of PSNR, convergence speed, and visual quality than methods described previously.

R. Stasiński and J. Konrad, "Improved POCS-based image reconstruction from irregularly-spaced samples," in Signal Process. XI: Theories and Applications (Proc. Eleventh European Signal Process. Conf.), vol. 2, pp. 461-464, Sept. 2002, [PDF: 142KB].

This paper presents an enhanced POCS-based (projection onto convex sets) method for the reconstruction of a regularly-sampled image from its irregularly-spaced samples. Such a reconstruction is often needed in image processing and coding, for example when using motion compensation. The proposed approach applies two operators sequentially: bandwidth limitation and sample substitution, and is based on our earlier work. The contribution of this paper is a new, simpler implementation of the algorithm that allows for faster convergence, and provides better performance, although at the cost of increased memory requirements.

A.-R. Mansouri, T. Chomaud, and J. Konrad, "A comparative evaluation of algorithms for fast computation of level set PDEs with applications to motion segmentation," in Proc. IEEE Int. Conf. Image Processing, pp. 636-639, Oct. 2001, [PDF: 228KB].

We address the problem of fast computation of level set partial differential equations (PDEs) in the context of motion segmentation. Although several fast level set computation algorithms are known, some of them, such as the fast marching method, are not applicable to the video segmentation problem since the front being computed does not advance monotonically. We study narrow-banding and pyramidal schemes, as well as a combined pyramidal/narrow-banding scheme that leads to a 70-fold time gain over the single-resolution scheme.

R. Stasiński and J. Konrad, "POCS reconstruction of stereoscopic views," in Proc. Int. Conf. on Augmented, Virtual Environments and Three-Dimensional Imaging, pp. 41-44, May 2001, [PDF: 111KB].

This paper presents an application of POCS (projection onto convex sets) methodology to the reconstruction of intermediate stereoscopic views. The basic problem in such a reconstruction, resulting from disparity compensation, is that of the recovery of a regularly-sampled image from its irregularly-spaced samples. This problem also arises in other image processing and coding applications. The results reported here improve our previous POCS-based reconstruction method by locally adapting the algorithm to the density of image samples. We also extend the method to color images by implementing the method in the luminance-chrominance (Y-U-V) space.

M. Kardouchi, J. Konrad, and C. Vázquez, "Estimation of large-amplitude motion and disparity fields: Application to intermediate view reconstruction," in Proc. SPIE Visual Communications and Image Process., vol. 4310, pp. 340-351, Jan. 2001, [PDF: 801KB].

This paper describes a method for establishing dense correspondence between two images in a video sequence (motion) or in a stereo pair (disparity) in the case of large displacements. In order to deal with large-amplitude motion or disparity fields, multi-resolution techniques such as block matching and optical flow have been used in the past. Although quite successful, these techniques cannot easily cope with motion/disparity discontinuities as they do not explicitly exploit image structure. Additionally, their computational complexity is high; block matching requires examination of numerous vector candidates while optical flow-based techniques are iterative. In this paper, we propose a new approach that addresses both issues. The approach combines feature matching with Delaunay triangulation, and thus reliable long-range correspondences result while the computational complexity is not high (sparse representation). In the proposed approach, feature points are found first using a simple intensity corner detector. Then, correspondence pairs between two images are found by maximizing cross-correlation over a small window. Finally, the Delaunay triangulation is applied to the resulting points, and a dense vector field is computed by planar interpolation over Delaunay triangles. The resulting vector field is continuous everywhere, and thus does not reflect motion or depth discontinuities at object boundaries. In order to improve the rendition of such discontinuities, we propose to further divide Delaunay triangles whenever the displacement vectors within a triangle do not allow a good intensity match. The approach has been extensively tested on stereoscopic images in the context of intermediate view reconstruction where the quality of estimated disparity fields is critical for final image rendering.
The first results are very encouraging as the reconstructed images are of high quality, especially at object boundaries, and the computational complexity is lower than that of multi-resolution block matching.
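
The planar interpolation step mentioned in this abstract can be illustrated with a minimal sketch. It assumes a single triangle whose three vertices already carry displacement vectors; the function names and the sample numbers are hypothetical and are not taken from the paper.

```python
def barycentric(p, a, b, c):
    """Barycentric coordinates of point p with respect to triangle (a, b, c)."""
    det = (b[1] - c[1]) * (a[0] - c[0]) + (c[0] - b[0]) * (a[1] - c[1])
    w_a = ((b[1] - c[1]) * (p[0] - c[0]) + (c[0] - b[0]) * (p[1] - c[1])) / det
    w_b = ((c[1] - a[1]) * (p[0] - c[0]) + (a[0] - c[0]) * (p[1] - c[1])) / det
    return w_a, w_b, 1.0 - w_a - w_b


def interpolate_vector(p, vertices, vectors):
    """Planar (affine) interpolation of the vertex vectors at point p."""
    weights = barycentric(p, *vertices)
    dx = sum(w * v[0] for w, v in zip(weights, vectors))
    dy = sum(w * v[1] for w, v in zip(weights, vectors))
    return dx, dy


# Hypothetical triangle and vertex displacements (illustrative values only).
triangle = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]
displacements = [(2.0, 0.0), (6.0, 0.0), (2.0, 4.0)]
dense_vector = interpolate_vector((1.0, 1.0), triangle, displacements)
```

Because each triangle defines one affine field and neighboring triangles share edge values, the interpolated field is continuous across edges, which is consistent with the abstract's remark that discontinuities are not represented without further triangle subdivision.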

A.-R. Mansouri, A. Olivier, and J. Konrad, "Topology-independent region tracking with level sets," in Proc. IEEE Int. Conf. Image Processing, vol. 3, pp. 66-69, Sept. 2000, [PDF: 299KB].

This paper presents a new approach to the tracking of regions in an image sequence. Unlike most other methods, the proposed approach can handle topology changes, i.e., regions may split or merge. This flexibility is naturally embedded into a partial differential equation that solves a minimum description length (MDL) estimation problem. The basic estimation criterion consists of only two terms: the description length of the region shape mismatch and the description length of the region itself, but we show possible extensions to this basic formulation. We minimize the MDL criterion using the level set methodology that inherently accounts for topology changes. We show results for natural data with natural as well as synthetic motion.

C. Vázquez, J. Konrad, and E. Dubois, "Wavelet-based reconstruction of irregularly sampled images: Application to stereo imaging," in Proc. IEEE Int. Conf. Image Processing, vol. 2, pp. 319-322, Sept. 2000, [PDF: 150KB].

We are concerned with the reconstruction of a regularly-sampled image based on irregularly-spaced samples thereof. We propose a new iterative method based on a wavelet representation of the image. For this representation we use a biorthogonal spline wavelet basis implemented on an oversampled grid. We apply the developed algorithm to disparity-compensated stereoscopic image interpolation. Under disparity compensation, the resulting sampling grids are irregular and require the irregular/regular interpolation. We show experimental results on real-world images and we compare our results with other methods proposed in the literature.

R. Stasiński and J. Konrad, "POCS-based image reconstruction from irregularly-spaced samples," in Proc. IEEE Int. Conf. Image Processing, vol. 2, pp. 315-318, Sept. 2000, [PDF: 205KB].

This paper presents a method for the reconstruction of a regularly-sampled image from its irregularly-spaced samples. Such reconstruction is often needed in image processing and coding, for example when using motion compensation. The proposed approach is based on the theory of projections onto convex sets. Two projection operators are used: bandwidth limitation and sample substitution. The approach is similar to some methods presented in the literature in the past, but differs in the implementation. The bandwidth limitation is implemented in the frequency domain on an oversampled grid thus allowing substantial flexibility in spectrum shaping of the reconstructed image. Additionally, a fast Fourier transform algorithm specifically designed for irregularly-sampled images is used to reduce the computational complexity. A number of experimental results on natural images are presented.
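
As a rough illustration of alternating the two projection operators named in this abstract, the sketch below works on a 1-D signal. It makes simplifying assumptions that are not in the paper: the known samples sit at an irregular subset of a regular grid, the band limit is a hard cutoff, and a plain DFT replaces the specialized fast transform. All names are hypothetical.

```python
import cmath
import math


def dft(x, inverse=False):
    """Plain O(n^2) discrete Fourier transform (forward or inverse)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
                  for t in range(n))
        out.append(acc / n if inverse else acc)
    return out


def project_bandlimited(x, cutoff):
    """Projection 1: zero every frequency bin above the cutoff."""
    n = len(x)
    spectrum = dft(x)
    for k in range(n):
        if min(k, n - k) > cutoff:
            spectrum[k] = 0
    return [v.real for v in dft(spectrum, inverse=True)]


def pocs_reconstruct(samples, n, cutoff, iterations=300):
    """Alternate bandwidth limitation and sample substitution."""
    x = [0.0] * n
    for _ in range(iterations):
        x = project_bandlimited(x, cutoff)
        # Projection 2: put the known samples back in place.
        for index, value in samples.items():
            x[index] = value
    return x


# Hypothetical test signal: band-limited, known at 10 of 16 grid positions.
n = 16
truth = [0.5 + math.cos(2 * math.pi * t / n + 0.3) for t in range(n)]
known = [0, 1, 3, 4, 6, 7, 9, 11, 12, 14]
estimate = pocs_reconstruct({i: truth[i] for i in known}, n, cutoff=1)
```

Both constraint sets are convex, so the alternation converges to a signal satisfying both whenever such a signal exists, as it does in this toy setup.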

A.-R. Mansouri and J. Konrad, "Minimum description length region tracking with level sets," in Proc. SPIE Image and Video Communications and Process., vol. 3974, pp. 515-525, Jan. 2000, [PDF: 682KB].

This paper addresses the problem of tracking an arbitrary region in a sequence of images, given a pre-computed velocity field. Such a problem is of importance in applications ranging from video surveillance to video database search. The algorithm presented here formulates tracking as an estimation problem. We propose, as our estimation criterion, a precise description length measure that quantifies tracking performance. In this context, tracking is naturally formulated as minimum description length estimation. The solution to this estimation problem is given by particular evolution equations for the region boundary. The implicit representation of the region boundary by the zero level set of a smooth function yields an equivalent set of partial differential equations and the added benefit of topology independence; regions may split (e.g., for divergent velocity fields) or merge (e.g., for convergent velocity fields) during tracking, clearly a desirable feature in real-world applications. We illustrate the performance of the proposed algorithm on a number of real images with natural motion.

A.-R. Mansouri, B. Sirivong, and J. Konrad, "Multiple motion segmentation with level sets," in Proc. SPIE Image and Video Communications and Process., vol. 3974, pp. 584-595, Jan. 2000, [PDF: 1,079KB].

Motion segmentation of an image sequence is among the most difficult and important problems in video processing and compression, and in computer vision. In this paper, we consider the problem of segmenting an image into multiple regions possibly undergoing different motions. To this end we use level sets of functions evolving according to certain partial differential equations. Contrary to numerous other motion segmentation algorithms based on level sets, we compute accurate motion boundaries without relying on intensity boundaries as an accessory. This will be illustrated on examples where intensity boundaries are hardly visible and yet motion boundaries are accurately identified. The main benefit of the level set representation is in its ability to handle variations in the topology of the level sets. As a result, it is only necessary to know the total number of distinct motion classes and their parameters. We describe an automatic initialization procedure that is based on feature point correspondences and K-means clustering in a 6-parameter space of affine parameters. We illustrate the performance of the proposed algorithm on real images with both real and synthetic motion.

J. Konrad and Z.-D. Lan, "Dense disparity estimation from feature correspondences," in Proc. SPIE Stereoscopic Displays and Virtual Reality Systems, vol. 3957, pp. 90-101, Jan. 2000, [PDF: 1,167KB].

Stereoscopic disparity plays an important role in the processing and compression of 3-D imagery. For example, dense disparity fields are used to reconstruct intermediate (varying-viewpoint) images. Although for small camera baselines dense disparity can be reliably estimated using gradient-based methods, this is not the case for large baselines due to the violation of underlying assumptions (e.g., local intensity linearity). Block matching algorithms work better but they are likely to get trapped in a local minimum due to the increased search space. An appropriate method to estimate large disparities is by using feature (characteristic) points. However, since feature points are unique, they are also sparse. In this paper, we propose a disparity estimation method that combines the reliability of feature-based correspondence methods with the resolution of dense approaches. In the first step we find feature points in the left and right images using the Harris operator. In the second step, we select those feature points that allow one-to-one left-right correspondence based on a cross-correlation measure. In the third step, we use the computed correspondence points to control the computation of dense disparity via regularized block matching that minimizes matching and disparity smoothness errors. The approach has been tested on several large-baseline stereo pairs with encouraging initial results.

K. Belloulata, R. Stasiński, and J. Konrad, "Region-based image compression using fractals and shape-adaptive DCT," in Proc. IEEE Int. Conf. Image Processing, vol. 2, pp. 815-819, Oct. 1999, [PDF: 425KB].

The significant effort to provide region-based compression and functionality within the MPEG-4 standard is not paralleled in the still-image compression domain. In this paper, we propose an approach to fractal coding of still images that is truly region-based. Unlike previous fractal compression methods, the proposed approach compresses an image region-by-region based on a prior segmentation, very much like in MPEG-4; individual regions can be decoded without full image decoding. The method performs the domain/range block matching in the frequency domain using a shape-adaptive discrete cosine transform. Experimental results evaluating the performance of the approach are shown.

A.-R. Mansouri and J. Konrad, "Motion segmentation with level sets," in Proc. IEEE Int. Conf. Image Processing, vol. 2, pp. 126-130, Oct. 1999, [PDF: 1,324KB].

Motion segmentation is an important problem in video processing and compression, and in computer vision. It is usually performed by either first estimating a field of motion parameters and then segmenting it, or by applying joint motion estimation and segmentation. Motion segmentation methods often constrain the set of possible solutions by forcing motion discontinuities to coincide with intensity discontinuities. In this paper, we propose an iterative method for joint motion estimation and segmentation that is based on level sets. The motion within individual segments is parametric and the method does not use the intensity discontinuity constraint, but is shown to be accurate for images with both synthetic and natural motion compliant with the assumed motion models.

J. Konrad, "View reconstruction for 3-D video entertainment: issues, algorithms and applications," in Proc. Int. Conf. on Image Process. and its Applications, pp. 8-12, July 1999, [PDF: 241KB].

Significant advances in stereoscopic imaging in the last decade have led to viable applications in medicine, teleoperation and, more recently, in entertainment. Although the stereoscopic technology is still mostly analog, the migration to the digital domain is inevitable. Such a migration creates new challenges for stereoscopic video entertainment, but at the same time offers new opportunities. One particular challenge is the reconstruction of intermediate views (between the left and right cameras), which finds various applications. Below, several algorithms aiming at high-quality view reconstruction, recently developed at INRS, are described, and their relative merits are discussed. Since a practical implementation requires low complexity, results of a study of various models and parameters aiming at computational simplicity are reported.

J. Konrad, "Enhancement of viewer comfort in stereoscopic viewing: parallax adjustment," in Proc. SPIE Stereoscopic Displays and Virtual Reality Systems, vol. 3639, pp. 179-190, Jan. 1999, [PDF: 741KB].

One of the major deficiencies of stereoscopic visualization, viewer discomfort, can be caused by the non-robustness of human perception (hyper-sensitivity to 3-D) or by excessive 3-D cues in the viewed images. In order to minimize this discomfort, the amount of parallax (or "3D-ness") within each stereo pair needs to be reduced. Similarly to the case of "continuous look-around", parallax adjustment requires the knowledge of images from virtual cameras. In the case of parallel geometry, the virtual cameras are located on the line between the true cameras. Since in a general scenario no constraint should be posed on the complexity of the viewed scene, 3-D modeling techniques cannot be used. We evaluate the usefulness of parallax adjustment using two view reconstruction methods based on disparity-compensated linear interpolation: a quadtree method with block splitting adapted to object boundaries and a pixel-based (dense) method. For all but the most complex stereoscopic images tested (ITU-R 601 from CCETT and NHK), both algorithms performed very well, especially the pixel-based approach. In terms of the overall usefulness of parallax adjustment, the initial tests have shown a very favorable viewer response; the perceived depth was judged to vary smoothly from zero (one virtual camera) through natural 3-D (true cameras) to exaggerated 3-D (virtual cameras further apart than the true cameras - extrapolation). The adjustment was convincing although not completely free of distortions.

A.-R. Mansouri, A. Mitiche, and J. Konrad, "Selective image diffusion: application to disparity estimation," in Proc. IEEE Int. Conf. Image Processing, vol. 3, pp. 284-288, Oct. 1998, [PDF: 182KB].

Inverse problems encountered in image processing and computer vision are often ill-posed. Whether set in a Bayesian or energy-based context, such problems require prior assumptions expressed through an a priori probability or a regularization term, respectively. In some cases, the prior term exhibits partial dependence on the observations (e.g., images) that is often ignored to simplify modeling and computations. We briefly review methods that take this dependence into account and we propose a new formulation of the prior term that blends some other simple approaches. Similarly to others, we apply a linear transformation to the prior term but, in addition, we require that the eigenvalues of the transformation have specific properties. These properties are chosen so that diffusion is allowed only along the direction perpendicular to local image gradient. If the gradient magnitude is small, isotropic diffusion is performed. We apply this formulation to stereoscopic disparity estimation and we show several experimental results; improvements over a standard approach are clear.
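
The eigenvalue condition described in this abstract can be sketched with a 2x2 tensor that has eigenvalue 0 along the local image gradient and eigenvalue 1 perpendicular to it, falling back to the identity (isotropic diffusion) when the gradient is small. The hard threshold and the function names are assumptions made for illustration; the paper's actual eigenvalue choice is not stated here.

```python
def diffusion_tensor(gx, gy, threshold=1e-3):
    """2x2 tensor allowing diffusion only perpendicular to gradient (gx, gy)."""
    mag2 = gx * gx + gy * gy
    if mag2 <= threshold * threshold:
        # Small gradient: isotropic diffusion (identity tensor).
        return ((1.0, 0.0), (0.0, 1.0))
    # I - g g^T / |g|^2: eigenvalue 0 along g, eigenvalue 1 across it.
    return ((1.0 - gx * gx / mag2, -gx * gy / mag2),
            (-gx * gy / mag2, 1.0 - gy * gy / mag2))


def apply_tensor(tensor, vx, vy):
    """Multiply the 2x2 tensor by the vector (vx, vy)."""
    return (tensor[0][0] * vx + tensor[0][1] * vy,
            tensor[1][0] * vx + tensor[1][1] * vy)


# Hypothetical gradient (3, 4): flow along it is suppressed,
# flow along the perpendicular direction (-4, 3) passes unchanged.
tensor = diffusion_tensor(3.0, 4.0)
along = apply_tensor(tensor, 3.0, 4.0)
across = apply_tensor(tensor, -4.0, 3.0)
```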

R. Stasiński and J. Konrad, "Reduced-complexity shape-adaptive DCT for region-based image coding," in Proc. IEEE Int. Conf. Image Processing, vol. 3, pp. 114-118, Oct. 1998, [PDF: 177KB].

We propose a computationally-efficient variant of the shape-adaptive discrete cosine transform (SA-DCT) currently considered for MPEG-4. Although the SA-DCT complexity is acceptable for 8x8 blocks, it is very high when complete regions are processed at once. To reduce the SA-DCT complexity, we replace its 1-D DCT with a quasi-DCT algorithm and we assure that the quasi-DCT basis functions are very close to those of the DCT. Unlike in our previous approach, we carry out an optimization of the shape of low-index basis functions. We test the new method numerically and subjectively, and conclude that, in terms of energy compaction performance, the new method gains up to 0.5dB compared to our previous quasi-DCT approach.

L. Labelle, D. Lauzon, J. Konrad, and E. Dubois, "Arithmetic coding of a lossless contour-based representation of label images," in Proc. IEEE Int. Conf. Image Processing, vol. 1, pp. 261-265, Oct. 1998, [PDF: 119KB].

We propose a new method for the encoding of label images (also known as segmentation maps or alpha planes) that are often used to identify object location in region-based image and video coders. The method is contour-based and lossless with a contour model composed of two parts: a contour graph describing the topology of the contour network and a directional chain code to deal with the geometric part of the label image (internal contour points). The graph-based description of the topology is designed to minimize the cost of encoding the nodes, while the directional chain codes are compressed by arithmetic coding. The approach is flexible since separating the contour network into topological and geometrical parts allows the use of other lossless or lossy methods to encode the geometric part without changing the graph representation. The proposed method has been compared with an arithmetic encoder used in MPEG-4.

R. Stasiński and J. Konrad, "Fast quasi-DCT algorithm for shape-adaptive DCT image coding," in Signal Process. IX: Theories and Applications (Proc. Ninth European Signal Process. Conf.), pp. 1505-1508, Sept. 1998, [PDF: 215KB].

In this paper we develop a new variant of the shape-adaptive discrete cosine transform (SA-DCT) recently proposed by Sikora and Makai and currently considered for MPEG-4 as a texture compression engine. We are concerned with the computational complexity of the SA-DCT; although its complexity is acceptable in the context of 8x8 (boundary) blocks as proposed for MPEG-4, it is very high for a true region-based coding where complete regions (e.g., 100 by 100 pixels) need to be processed. We adapt the original SA-DCT scheme by replacing the usual DCT with a quasi-DCT for which some basis functions are identical and some similar to those of the DCT. We test the new method and compare it numerically in terms of the basis restriction error as well as subjectively on some natural images. We conclude that the new method's energy compaction performance is slightly inferior to that of the SA-DCT, but its computational complexity is highly reduced.

A. Mancini and J. Konrad, "Robust quadtree-based disparity estimation for the reconstruction of intermediate stereoscopic images," in Proc. SPIE Stereoscopic Displays and Virtual Reality Systems, vol. 3295, pp. 53-64, Jan. 1998, [PDF: 1343KB], [experimental results].

In stereoscopic/multiview video, the reconstruction of intermediate images is needed to assure continuous motion-parallax and/or comfortable 3-D perception. In this context, we propose a block-based disparity estimation followed by disparity-compensated linear interpolation. We progressively deal with deficiencies of the traditional block matching algorithms. First, we employ a spatial smoothness constraint for disparity to overcome inherent matching ambiguity in low-texture areas. Secondly, as a measure of matching error we use a robust function instead of the quadratic that is sensitive to outliers. We also extend the formulation to include color. Finally, we relax the rigidity of the block support for disparities by employing a quadtree block structure (blocks are allowed to split). The proposed algorithm is implemented in a hierarchical coarse-to-fine fashion with a Gaussian pyramid to reduce the computational burden. To correct luminance and color mismatches between images, a 3-component balancing similar to that proposed by MPEG-2's "Multiview Profile Ad Hoc Group" is used. We tested the proposed algorithm on stereoscopic video sequences acquired in natural surroundings by almost parallel cameras. In informal viewing, every feature of the algorithm listed above resulted in clear improvements of the reconstruction quality. Overall the reconstructed image quality was very good to excellent, depending on the image used.

A.-R. Mansouri and J. Konrad, "Block-based winner-takes-all reconstruction of intermediate stereoscopic images," in Proc. SPIE Visual Communications and Image Process., vol. 3309, pp. 922-933, Jan. 1998, [PDF: 2244KB].

This paper addresses the issue of the reconstruction of intermediate views from a pair of stereoscopic images. Such a reconstruction is needed for the enhancement of depth perception in stereoscopic systems, e.g., "continuous look-around" or adjustment of virtual camera baseline. The algorithm proposed here addresses the issue of blur; unlike typical reconstruction algorithms that perform averaging between disparity-compensated left and right images, the new algorithm uses non-linear filtering via a winner-takes-all strategy. The image under reconstruction is assumed to be a tiling by fixed-size blocks that come from various positions of either the left or right images using disparity compensation. The tiling map is modeled by a binary decision field while the disparity model is based on a smoothness constraint. The models are combined through a maximum a posteriori probability (MAP) criterion. The intermediate intensities, disparities and the binary decision field are estimated jointly using the expectation-maximization (EM) algorithm. The proposed algorithm is compared experimentally with a reference block-based algorithm employing linear filtering. Although the improvements are localized and often subtle, they demonstrate that a high-quality intermediate view reconstruction for complex scenes is feasible if the camera convergence angle is small.

C.-H. Yang and J. Konrad, "Motion-based video segmentation using continuation method and robust cost functions," in Proc. SPIE Visual Communications and Image Process., vol. 3309, pp. 774-785, Jan. 1998, [PDF: 1005KB].

We propose a new approach to spatial segmentation of video sequences that is based on motion attributes. The approach, similarly to some previous efforts, uses Markov random field models and maximum a posteriori probability estimation. Our approach is novel in three ways. First, we propose a general formulation for the joint motion estimation and segmentation of which the segmentation problem is a special case (piecewise-constant translational motion). Secondly, instead of the usual quadratic models (Gaussian likelihood) we propose a robust estimation criterion that eliminates the impact of outliers on the estimates. Thirdly, since solving the segmentation problem directly in the space of discrete labels is difficult (e.g., because of the high dependence on the initial state), we opt for a continuation method over a Gaussian pyramid. Thus, the estimation process starts as a motion estimation and then slowly converges towards a motion-based segmentation by "hardening" the smoothness constraint. The final result is a quasi-segmentation, i.e., the estimated vector field is continuous but almost piecewise constant, and must undergo subsequent quantization. We show experimental results on two natural image sequences; the resulting quasi-segmentations clearly extract moving objects. The method may serve as an initial stage for joint motion estimation and segmentation, or may produce final segmentations if suitable post-processing is applied.

E. Dubois, J. Konrad, and S. Cantet, "Estimation of nonlinear transfer curves for conversion of color images to a known color space," in Proc. IEEE Int. Conf. Image Processing, vol. 3, pp. 26-29, Oct. 1997, [PDF: 216KB].

This paper presents a supervised algorithm for estimating the unknown nonlinearity undergone by the three color components of an image in the image acquisition process. The algorithm is based on the rank-one hypothesis, which postulates that the linear tristimulus values in a region of uniform surface color lie on a straight line through the origin. An objective function is formulated whose minimization yields the estimate of the unknown nonlinearity. Images corrected with the estimated inverse nonlinearity are shown to exhibit chromatic properties that are much more piecewise constant than in the original image. This property will be very useful in quantization and segmentation applications.

R. Stasiński and J. Konrad, "DCT-based shape-adaptive transform for region-oriented image compression and manipulation," in Workshop on Image Analysis for Multimedia Interactive Services, (Louvain, Belgium), June 1997, [PDF: 174KB].

In the paper a new DCT-based shape-adaptive transform algorithm is presented. The transform is derived from the DCT algorithm flowgraph by substitution of operations in such a way that region and background samples are not mixed together. The computational complexity of the algorithm is of the same rank as that of the DCT and significantly lower than that of the state-of-the-art shape-adaptive transforms. Preliminary experiments show that the new algorithm performs better than the direct DCT (with extrapolation) and is only slightly inferior to the approaches of Gilge et al. and of Sikora and Makai.

R. Stasiński and J. Konrad, "A new approach to generation of shape-adaptive transforms," in Int. Workshop on Systems, Signals and Image Process., (Poznań, Poland), pp. 13-16, May 1997, [PDF: 147KB].

In the paper we describe a new approach to generation of orthogonal transforms that self-adapt to arbitrary shapes. The new algorithms are derived from flowgraphs of standard fast transform algorithms by a suitable modification of their substructures. For simplicity we show how to derive a shape-adaptive transform from the discrete Walsh-Hadamard transform (DWHT) flowgraph. We compare performance and computational complexity of new algorithms with those of several well-known approaches. It can be clearly seen that for DCT the proposed approach gives a very beneficial performance/complexity ratio compared to other well-known techniques.

J. Konrad and V.-N. Dang, "Coding-oriented video segmentation inspired by MRF models," in Proc. IEEE Int. Conf. Image Processing, vol. 1, pp. 909-912, Sept. 1996, [PDF: 242KB].

This paper presents an approach to the segmentation of video sequences that is inspired by Markov random field (MRF) models and is aimed at region-based video compression. Two goals of the segmentation algorithm are considered: to assure a rate-efficient partitioning of video sequences and to provide regions that are meaningful for human observers ("coding for content"). To address both issues we extend our earlier work; we incorporate a segmentation complexity measure to account for the rate allocated to region shape, we use a robust error criterion to reject outliers in the intensity residual and we incorporate a temporal consistency constraint to assure the continuity of segmentation in time. We demonstrate improvements in the segmentation for real videoconferencing sequences.

C. Stiller and J. Konrad, "A region-adaptive transform based on a stochastic model," in Proc. IEEE Int. Conf. Image Processing, vol. 2, pp. 264-267, Oct. 1995, [PDF: 182KB].

This paper is concerned with linear transforms for arbitrarily-shaped image segments. In contrast to other techniques described in the literature, the proposed transform is based upon a stochastic model of image covariance within the considered region. Emerging from a separable stationary Markov model proposed for rectangular regions, we derive a non-stationary Markov model with natural boundary conditions. We compute its eigentransform, which is the optimum linear transform under a broad variety of performance measures. For the special case of a rectangular region, the method yields the DCT basis functions. Simulation results for natural imagery are provided.

V.-N. Dang, A.-R. Mansouri, and J. Konrad, "Motion estimation for region-based video coding," in Proc. IEEE Int. Conf. Image Processing, vol. 2, pp. 189-192, Oct. 1995, [PDF: 325KB].

Region-based video compression has been a very active research area over the last few years. It has been viewed as a potential alternative to traditional schemes suffering from the "blockiness" of image intensities at very low bit rates. In this paper we present a new approach to region-based representation and estimation of motion. It is based on the observation that motion boundaries usually coincide with region boundaries. Thus, we first compute an intensity-based image partition and use it as an initial step in a 3-step algorithm: motion estimation for intensity-derived regions, motion-based region fusion and adjustment of region boundaries. We present experimental results for standard QCIF images and compare our method with block matching and dense motion field estimation. We also study the performance loss due to a lossy transmission of partition information.

J. Konrad, M. Zaremba, G. Chan, and M. Gaudreau, "Parallel computation of dense motion fields using a Hopfield network," in Proc. Scand. Conf. Image Analysis, SCIA'95, pp. 609-616, June 1995, [PDF: 254KB].

Motion of pixels in time-varying images plays an essential role in video compression. Therefore, to build practical video coders, motion estimation must be carried out in real time. Usually, simple motion models executed on a sequential processor achieve that goal; VLSI circuits implementing block matching are used in MPEG and H.261 coders. An alternative is to use more complex motion models that can be implemented on a parallel architecture, e.g., a single-instruction multiple-data (SIMD) system. In this paper, we study a different approach to the parallelization of motion estimation, an approach based on neural networks. We formulate the problem in the context of a Markov random field (MRF) model, derive a cost function for minimization and propose a solution method using a Hopfield network. We simulate the network on a sequential processor and compare its performance with a sequential algorithm based on the Gauss-Newton minimization.

C. Stiller and J. Konrad, "Eigentransforms for region-based image processing," in Proc. Int. Conf. on Consumer Electronics, pp. 286-287, June 1995, [PDF: 146KB].

Linear transforms such as the DCT are efficient for image compression. While known transforms that approximate the eigentransform are limited to rectangular regions, this paper proposes a model for construction of eigentransforms for arbitrarily-shaped image segments.

J. Konrad, A.-R. Mansouri, E. Dubois, V.-N. Dang, and J.-B. Chartier, "On motion modeling and estimation for very low bit rate video coding," in Proc. SPIE Visual Communications and Image Process., vol. 2501, pp. 262-273, May 1995, [PDF: 598KB].

In video coding at high compression rates, e.g., in very low bit rate coding, every transmitted bit carries a significant amount of information that is related either to motion parameters or to intensity residual. As demonstrated in the SIM-3 coding scheme, a more precise motion model leads to improved quality of coded images when compared with the H.261 coding standard. In this paper, we present some of our recent results on the modeling and estimation of motion for the compression and post-processing of typical videophone ("head-and-shoulders") image sequences. We describe a block-based motion estimation that attempts to optimize the overall bit budget for intensity residual, motion and overhead information. We compare simulation results for this scheme with full-search block matching in the context of the H.261 coding. Then, we discuss a region-based motion estimation that exploits segmentation maps obtained from an MDL-based (minimum description length) algorithm. We compare experimentally several algorithms for the compression of such maps. Finally, we describe motion-compensated interpolation that takes into account pixel acceleration. We show experimentally a major performance improvement of the constant-acceleration model over the usual constant-velocity models. This is a very promising technique for post-processing in the receiver to improve reconstruction of frames dropped in the transmitter.

L. Bonnaud, C. Labit, and J. Konrad, "Interpolative coding of image sequences using temporal linking of motion-based segmentation," in Proc. IEEE Int. Conf. Acoustics Speech Signal Processing, pp. 2265-2268, May 1995, [PDF: 365KB].

This paper presents a new temporal interpolation algorithm based on segmentation of images into polygonal regions undergoing affine motion. The goal of this work is to improve upon the block-based interpolation used in MPEG (B-Frames). In the first part, we describe the region-based framework and the temporal linking algorithm that jointly provide the segmentation and motion parameters. In the second part, we present various applications of the proposed algorithm to temporal interpolation (from interpolation to bidirectional motion-compensated prediction). We examine one of these schemes in detail, including the special processing of occlusion areas. We show images reconstructed from a synthetic image sequence and using the MSE criterion we compare quality with other schemes.

M. Chahine and J. Konrad, "Motion-compensated interpolation using trajectories with acceleration," in Proc. SPIE Digital Video Compression: Algorithms and Technology, vol. 2419, pp. 152-163, Feb. 1995, [PDF: 816KB].

This paper is primarily concerned with motion-compensated interpolation of video sequences using multiple images. Due to the extended temporal support of such motion compensation, a linear (constant-velocity) trajectory model is often inappropriate, for example due to insufficient temporal sampling. Recently, we have proposed a quadratic (constant-acceleration) trajectory model and a framework for the computation of its parameters. The approach is based on Markov random field (MRF) models that lead to a regularized formulation solved by multiresolution deterministic relaxation. In this paper, we demonstrate advantages of using accelerated motion over linear trajectories in a plausible application using natural data. We apply the estimated trajectories to motion-compensated interpolation over multiple frames of progressive and interlaced video sequences. The experimental results for "Miss America" (CIF) and "Femme et arbre" (interlaced) show, respectively, a 4 and 2 dB average improvement in the PSNR of the reconstruction error when quadratic trajectories are used instead of the linear ones. It is interesting to note that in "Miss America" the most significant improvements can be observed in the area of the mouth and the eyes, which are in fact likely to exhibit acceleration. We envisage an application of the proposed method to post-processing in very low bit rate video coding.

P. Treves and J. Konrad, "Motion estimation and compensation under varying illumination," in Proc. IEEE Int. Conf. Image Processing, vol. 1, pp. 373-377, Nov. 1994, [PDF: 343KB].

In this paper we propose a new approach to motion-compensated filtering of image sequences that contain time-varying illumination. There are two contributions in this paper. First, we propose a new method for the estimation of dense 2-D motion that is robust to the time-varying illumination often present in images. We define a structural model that is based on the assumption of intensity gradient constancy along motion trajectories. This is in contrast to the usual hypothesis of intensity constancy. Second, we apply the proposed approach to motion-compensated temporal interpolation. We compare the image reconstruction error obtained using the new approach with the error obtained for standard models.
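The difference between the two hypotheses can be illustrated with a toy 1-D example. This is only a sketch under stated assumptions, not the method of the paper: the signal, the circular shift, the purely additive brightness offset, and the helper names (`shift_and_brighten`, `gradient`, `matching_error`) are all hypothetical. With an additive illumination change, the intensity mismatch at the true displacement is non-zero, while the gradient mismatch vanishes.

```python
def shift_and_brighten(signal, shift, offset):
    """Circularly translate a 1-D signal by `shift` samples and add a constant offset."""
    n = len(signal)
    return [signal[(i - shift) % n] + offset for i in range(n)]


def gradient(signal):
    """Central-difference derivative with circular boundary handling."""
    n = len(signal)
    return [(signal[(i + 1) % n] - signal[(i - 1) % n]) / 2.0 for i in range(n)]


def matching_error(reference, target, shift):
    """Sum of squared differences between `reference` and `target` read at a displacement."""
    n = len(reference)
    return sum((target[(i + shift) % n] - reference[i]) ** 2 for i in range(n))


# Hypothetical data: frame1 is frame0 moved by 2 samples and brightened by 5.
frame0 = [0, 1, 4, 9, 4, 1, 0, 0]
frame1 = shift_and_brighten(frame0, shift=2, offset=5)

# Evaluate both matching criteria at the true displacement.
intensity_error = matching_error(frame0, frame1, 2)
gradient_error = matching_error(gradient(frame0), gradient(frame1), 2)

print(intensity_error, gradient_error)
```

The additive offset cancels in the derivative, which is why the gradient-based criterion is unaffected here. A multiplicative illumination change would not cancel in this way.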

M. Chahine and J. Konrad, "Estimation of trajectories for accelerated motion from time-varying imagery," in Proc. IEEE Int. Conf. Image Processing, vol. 2, pp. 800-804, Nov. 1994, [PDF: 274KB].

This paper is concerned with the estimation of trajectories for accelerated motion from image sequences. Unlike many other approaches that assume linear trajectories, we propose a quadratic model that incorporates both velocity and acceleration. This model corresponds better to practical applications, especially when the estimation is performed over several images, e.g., in motion-compensated processing with extended temporal support. This is because, over a longer time frame and in the presence of acceleration, a quadratic trajectory is capable of providing a better intensity match than a simple displacement. The algorithm for the estimation of dense accelerated motion fields is formulated in this paper using regularization, and the solution is based on deterministic relaxation implemented over a pyramid of resolutions. Extensive experimental results for test images with synthetic motion are presented.
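As a rough illustration of why a constant-velocity model drifts over several frames, the sketch below compares the two trajectory models for a single point. It assumes the standard kinematic form x(t) = x0 + v t + a t^2 / 2; the function names and the numeric values are hypothetical and are not taken from the paper.

```python
def linear_position(x0, velocity, t):
    """Position under a constant-velocity (linear) trajectory."""
    return x0 + velocity * t


def quadratic_position(x0, velocity, acceleration, t):
    """Position under a constant-acceleration (quadratic) trajectory."""
    return x0 + velocity * t + 0.5 * acceleration * t * t


# Hypothetical ground truth: a point starting at 0 with velocity 1 and acceleration 2
# (units: pixels and frames).
true_positions = [quadratic_position(0.0, 1.0, 2.0, t) for t in range(4)]

# A linear model whose velocity is taken from the first two frames only.
linear_velocity = true_positions[1] - true_positions[0]
linear_positions = [linear_position(0.0, linear_velocity, t) for t in range(4)]

# The position error of the linear model grows with temporal distance.
errors = [abs(p - q) for p, q in zip(true_positions, linear_positions)]
print(errors)
```

The linear model agrees with the true motion on the two frames it was derived from and then falls behind, which is the situation that arises with extended temporal support.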

J. Konrad and P. Treves, "Estimation of dense 2-D motion based on the constancy of intensity gradient," in Signal Process. VII: Theories and Applications (Proc. Seventh European Signal Process. Conf.), pp. 684-687, Sept. 1994, [PDF: 268KB].

This paper describes a new approach to the estimation of dense 2-D motion from image sequences. Unlike many other approaches that assume the constancy of image intensity along motion trajectories, we propose to use a higher-order model that permits a variation of such intensity. We define a structural model that is based on the assumption of intensity gradient constancy along motion trajectories. This model has been proposed before, but in formulations that require exact satisfaction of the intensity gradient constraint. Due to inherent noise, aliasing, etc. present in images, such a solution necessitates additional post-processing, for example smoothing. We propose a different approach that is based on simultaneous estimation and smoothing. We formulate the problem using regularization, where the assumptions of gradient constancy and of motion smoothness are combined into a single cost function. We minimize this function by an iterative method. We demonstrate estimation results for the original and for the "regularized" approach on natural image sequences.

H. Nicolas, J. Konrad, and C. Labit, "Joint estimation of motion and illumination variations for coding of image sequences," in Proc. Scandinavian Conf. Image Analysis, pp. 507-514, May 1993, [PDF: 156KB].

This paper describes a new approach to the problem of motion estimation for the coding of image sequences. The goal is to obtain an efficient description (parametrization) of temporal variations between two successive images in a sequence. To achieve this, we propose to use the standard hypothesis of luminance constancy along a motion trajectory while simultaneously introducing a polynomial representation of illumination variations. The estimation process consists of two iteratively alternating stages: a region-based estimation of apparent 2-D motion parameters and an estimation of 2-D illumination variations. Such an approach reduces the residual reconstruction error after motion compensation due to improved estimation of motion parameters.

E. Dubois and J. Konrad, "Motion estimation and motion-compensated filtering of video signals," in Proc. IEEE Int. Conf. Acoustics Speech Signal Processing, pp. 95-98, Apr. 1993, [PDF: 117KB].

This paper is concerned with methods to estimate 2-D motion in time-varying images for application to motion-compensated filtering. The approach is based on the minimization of objective functions that can be interpreted as energies of suitable Markov-Gibbs random fields. A flexible class of cost functions is described that can be applied in a wide variety of specific applications, including the estimation of motion trajectories over several image frames. The issues of minimizing the cost function and applications to motion-compensated filtering are then briefly addressed.

J. Radecki, J. Konrad, and E. Dubois, "Design of finite wordlength 2-D IIR filters using simulated annealing," in Signal Process. VI: Theories and Applications (Proc. Sixth European Signal Process. Conf.), pp. 953-956, Aug. 1992, [PDF: 188KB].

This paper proposes a new approach to the design of two-dimensional (2-D) infinite impulse response (IIR) filters with finite precision coefficients. An objective function is proposed which combines magnitude, phase, step response and stability errors. This function, being multidimensional and, in general, non-convex, is minimized using simulated annealing. Development of this method constitutes the first step in a feasibility study of the application of 2-D IIR filters to the processing of video signals. Initial results on the design of low-pass filters are very encouraging and compare favourably with similar finite impulse response (FIR) designs.

J. Konrad, "Use of colour in gradient-based estimation of dense two-dimensional motion," in Proc. Conf. Vision Interface VI'92, pp. 103-109, May 1992, [PDF: 539KB].

This paper presents a gradient-based approach to the multi-constraint estimation of dense two-dimensional (2-D) motion. The formulation is based on feature invariance along motion trajectories and applies a motion smoothness constraint to reduce ill-posedness. It permits the use of various image features as the input, for example intensity and colours, or sub-bands of a spectral decomposition. The proposed cost function is minimized using a sequence of quadratic approximations of the matching error and solving the resulting linear system by deterministic relaxation. The proposed algorithm is a generalization of the Horn and Schunck algorithm to the case of vector data. Results of application of the proposed technique to the estimation of 2-D motion from TV images are shown. The obtained motion fields are applied to motion-compensated temporal interpolation, resulting in significant but localized improvements.

J. Konrad, J. Radecki, and E. Dubois, "On the design of finite wordlength IIR filters for video applications," in Proc. IEEE Int. Conf. Acoustics Speech Signal Processing, vol. 4, pp. 341-344, Mar. 1992, [PDF: 135KB].

This paper addresses the problem of designing finite precision one-dimensional (1-D) infinite impulse response (IIR) digital filters for video processing. The design algorithm is based on simultaneous minimization of magnitude, phase and stability errors in a discrete space of solutions using simulated annealing. It is demonstrated that the approach results in filters characterized by a substantially reduced non-linearity of the phase response in filter pass band, which is critical in any video processing application. To reduce image degradations due to ripples of the filter step response, another error term is introduced into the cost function. It is demonstrated that this additional term permits significant reduction of step response overshoots, and thus the visibility of degradations in a filtered image. The designed IIR filters are compared with their finite impulse response (FIR) counterparts in terms of characteristic parameters as well as distortion visibility in processed images.

J. Radecki, J. Konrad, and E. Dubois, "Design of finite wordlength IIR filters with prescribed magnitude, group delay and stability properties using simulated annealing," in Proc. IEEE Int. Conf. Acoustics Speech Signal Processing, pp. 1637-1640, May 1991, [PDF: 145KB].

This paper investigates the problem of designing finite precision one-dimensional (1-D) infinite impulse response (IIR) filters with prescribed magnitude, phase and stability constraints. The design problem is formulated as the minimization of a cost function incorporating these conflicting requirements. The first two elements of the cost function express magnitude and group delay errors between the desired and the actual frequency responses of a filter, while the third one is related to its stability margin. This cost function is minimized using simulated annealing based on the Metropolis algorithm. Examples of several finite wordlength filters designed by the above method are presented and compared with Chebyshev and elliptic filters with rounded coefficients.
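The general idea of simulated annealing with the Metropolis acceptance rule over a discrete (finite wordlength) coefficient space can be sketched as follows. This is a minimal illustration under loud assumptions, not the design procedure of the paper: the toy cost (squared distance of quantized coefficients from hypothetical real-valued targets) stands in for the magnitude, group delay and stability terms, and the cooling schedule, step count and all names are made up for the example.

```python
import math
import random


def metropolis_accept(delta_cost, temperature, rng):
    """Always accept improvements; accept a worse move with probability exp(-delta/T)."""
    if delta_cost <= 0:
        return True
    return rng.random() < math.exp(-delta_cost / temperature)


def anneal(cost, initial, steps=2000, t_start=1.0, cooling=0.995, seed=0):
    """Minimize `cost` over integer vectors by perturbing one coefficient by +/-1 per step."""
    rng = random.Random(seed)
    current = list(initial)
    current_cost = cost(current)
    best, best_cost = list(current), current_cost
    temperature = t_start
    for _ in range(steps):
        candidate = list(current)
        k = rng.randrange(len(candidate))
        candidate[k] += rng.choice((-1, 1))  # move by one quantization step
        candidate_cost = cost(candidate)
        if metropolis_accept(candidate_cost - current_cost, temperature, rng):
            current, current_cost = candidate, candidate_cost
            if current_cost < best_cost:
                best, best_cost = list(current), current_cost
        temperature *= cooling  # geometric cooling (an assumed schedule)
    return best, best_cost


# Hypothetical setup: coefficients stored as integers with 4 fractional bits.
SCALE = 16
TARGETS = [0.37, -0.81, 0.12]


def toy_cost(quantized):
    """Squared error between the quantized coefficients and the assumed targets."""
    return sum((q / SCALE - t) ** 2 for q, t in zip(quantized, TARGETS))


start = [0, 0, 0]
best, best_cost = anneal(toy_cost, start)
print(best, best_cost)
```

Because the search moves only between representable coefficient values, the result is a finite wordlength solution by construction rather than a rounded version of a continuous optimum.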

E. Dubois and J. Konrad, "Review of techniques for motion estimation and motion compensation," in Proc. Int. Coll. Advanced Television Syst., pp. 3B.3.1-3B.3.19, June 1990.

J. Radecki, J. Konrad, and E. Dubois, "A comparison of simulated annealing and N-step Newton methods for designing 1-D and 2-D finite wordlength FIR filters," in Proc. Canadian Conf. Electr. Comp. Eng., pp. 53.3.1-53.3.4, Sept. 1990.

J. Konrad and E. Dubois, "A comparison of stochastic and deterministic solution methods in Bayesian estimation of 2-D motion," in Proc. European Conf. Computer Vision, pp. 149-160, Apr. 1990.

J. Konrad and E. Dubois, "Use of colour information in Bayesian estimation of 2-D motion," in Proc. IEEE Int. Conf. Acoustics Speech Signal Processing, pp. 2205-2208, Apr. 1990.

This paper is concerned with the extension of previous work on the Bayesian estimation of 2-D motion from image sequences by incorporating the colour cue into the estimation process. Instead of scalar image intensity, a three-component vector representation of colour images is used, thus allowing Y-C1-C2, RGB or other formats. The Maximum a Posteriori Probability estimation is shown to result in a three-term energy minimization. A white Gaussian noise model is used for the displaced pel differences of each image component, and a coupled vector-binary Markov random field model is used for displacement and discontinuity fields. The resulting criterion is optimized using the method of discrete state space simulated annealing. Improvements in the quality of estimated displacement fields due to additional colour information are demonstrated through several experimental results.

J. Konrad and E. Dubois, "Bayesian estimation of discontinuous motion in images using simulated annealing," in Proc. Conf. Vision Interface VI'89, pp. 51-60, June 1989.

J. Konrad and E. Dubois, "Multigrid Bayesian estimation of image motion fields using stochastic relaxation," in Proc. IEEE Int. Conf. Computer Vision, pp. 354-362, Dec. 1988.

J. Konrad, "Stochastic estimation of motion in television images," in Proc. 3-rd Conf. on Science and Technology "Signal Processing in Telecommunications, Control and Inspection", Poland, Sept. 1988 (in Polish).

J. Konrad and E. Dubois, "Estimation of image motion fields: Bayesian formulation and stochastic solution," in Proc. IEEE Int. Conf. Acoustics Speech Signal Processing, pp. 1072-1075, Apr. 1988.
