Publications

Journal paper abstracts

Z. Li, P. Ishwar, and J. Konrad, "Video condensation by ribbon carving," IEEE Trans. Image Process., vol. 18, pp. 2572-2583, Nov. 2009, [PDF: 2,617KB].

Efficient browsing of long video sequences is a key tool in visual surveillance, e.g., for post-event video forensics, but can also be used for fast review of motion pictures and home videos. While frame skipping (fixed or adaptive) is straightforward to implement, its performance is quite limited. Although more efficient techniques have been developed, such as video summarization and video montage, they lose either the temporal or semantic context of events. A recently-proposed method called video synopsis deals with some of these issues but involves multiple processing stages and is fairly complex. Video condensation, which we propose here, is novel in the way information is removed from the space-time video volume, and is conceptually simple and relatively easy to implement. We introduce the concept of a video ribbon inspired by that of a seam recently proposed for image resizing. We recursively carve ribbons out by minimizing an activity-aware cost function using dynamic programming. The ribbon model we develop is flexible and permits an easy adjustment of the compromise between temporal condensation ratio and anachronism of events. We also propose a sliding-window ribbon carving to handle streaming video and demonstrate the method's efficiency on motor and pedestrian traffic data.
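
The dynamic-programming step can be illustrated on a 2-D cost map, in the spirit of the image-resizing seam that the abstract names as its inspiration. The sketch below is not the authors' ribbon model: it only finds a minimum-cost, connected, top-to-bottom path through a small cost array. The function name `min_cost_seam` and the sample costs in the usage note are illustrative assumptions.

```python
from typing import List, Tuple


def min_cost_seam(cost: List[List[float]]) -> Tuple[float, List[int]]:
    """Return (total cost, column index per row) of the cheapest top-to-bottom seam.

    Each step may move to the same column or to a horizontally adjacent one.
    """
    rows, cols = len(cost), len(cost[0])
    acc = [row[:] for row in cost]                # accumulated cost table
    back = [[0] * cols for _ in range(rows)]      # back-pointers to the previous row

    for r in range(1, rows):
        for c in range(cols):
            candidates = [k for k in (c - 1, c, c + 1) if 0 <= k < cols]
            best = min(candidates, key=lambda k: acc[r - 1][k])
            acc[r][c] = cost[r][c] + acc[r - 1][best]
            back[r][c] = best

    # Pick the cheapest end point in the last row, then trace the path back up.
    end = min(range(cols), key=lambda c: acc[rows - 1][c])
    total = acc[rows - 1][end]
    seam = [end]
    for r in range(rows - 1, 0, -1):
        seam.append(back[r][seam[-1]])
    seam.reverse()
    return total, seam
```

For the cost map [[5, 1, 5], [5, 5, 1], [5, 1, 5]] the function returns a total cost of 3 along columns [1, 2, 1]. In the paper the carved structure is a ribbon in the space-time volume and the cost is activity-aware; here the cost values are simply given.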

J. McHugh, J. Konrad, V. Saligrama, and P.-M. Jodoin, "Foreground-adaptive background subtraction," IEEE Signal Process. Lett., vol. 16, pp. 390-393, May 2009, [PDF: 256KB].

Background subtraction is a powerful mechanism for detecting change in a sequence of images that finds many applications. The most successful background subtraction methods apply probabilistic models to background intensities evolving in time; non-parametric and mixture-of-Gaussians models are but two examples. The main difficulty in designing a robust background subtraction algorithm is the selection of a detection threshold. In this paper, we adapt this threshold to varying video statistics by means of two statistical models. In addition to a non-parametric background model, we introduce a foreground model based on small spatial neighborhood to improve discrimination sensitivity. We also apply a Markov model to change labels to improve spatial coherence of the detections. The proposed methodology is applicable to other background models as well.
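
As a point of reference for the thresholding problem described above, the following sketch shows a fixed-threshold, non-parametric (kernel density) background test for a single pixel. It is the kind of baseline the abstract sets out to improve on, not the foreground-adaptive method itself: there is no foreground model and no Markov model here. The function names, the Gaussian kernel width `sigma`, and the default threshold are assumptions chosen for illustration.

```python
import math
from typing import Sequence


def background_likelihood(value: float, history: Sequence[float], sigma: float = 5.0) -> float:
    """Gaussian kernel density estimate of `value` from past background intensities."""
    norm = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    total = sum(norm * math.exp(-0.5 * ((value - h) / sigma) ** 2) for h in history)
    return total / len(history)


def is_foreground(value: float, history: Sequence[float],
                  threshold: float = 1e-3, sigma: float = 5.0) -> bool:
    """Label the pixel as changed when its background likelihood falls below a fixed threshold."""
    return background_likelihood(value, history, sigma) < threshold
```

With a history of [100, 102, 98, 101, 99], an intensity of 100 is kept as background while 200 is flagged as foreground. The fixed `threshold` is exactly the quantity that is hard to choose in practice, which is the difficulty the abstract addresses.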

S. Ince and J. Konrad, "Occlusion-aware view interpolation," EURASIP J. Image and Video Process., vol. 2008, Article ID 803231, 15 pages, 2008, doi:10.1155/2008/803231, [PDF: 6,929KB].

View interpolation is an essential step in content preparation for multiview 3D displays, free-viewpoint video and multiview image/video compression. It is performed by establishing a correspondence among views, followed by interpolation using the corresponding intensities. However, occlusions pose a significant challenge, especially if few input images are available. In this paper, we identify challenges related to disparity estimation and view interpolation in the presence of occlusions. We then propose an occlusion-aware intermediate view interpolation algorithm that uses four input images to handle the disappearing areas. The algorithm consists of three steps. First, all pixels in the view to be computed are classified in terms of their visibility in the input images. Then, disparity for each pixel is estimated from different image pairs depending on the computed visibility map. Finally, luminance/color of each pixel is adaptively interpolated from an image pair selected by its visibility label. Extensive experimental results show striking improvements in interpolated image quality over occlusion-unaware interpolation from two images and very significant gains over occlusion-aware spline-based reconstruction from four images, on both synthetic and real images. Although improvements are obvious only in the vicinity of object boundaries, this should be useful in high-quality 3D applications, such as digital 3D cinema and ultra-high resolution multi-view autostereoscopic displays, where distortions at depth discontinuities are highly objectionable, especially if they vary with viewpoint change.

S. Ince and J. Konrad, "Occlusion-aware optical flow estimation," IEEE Trans. Image Process., vol. 17, pp. 1443-1451, Aug. 2008, [PDF: 1,222KB].

Optical flow can be reliably estimated between areas visible in two images, but not in occlusion areas. If optical flow is needed in the whole image domain, one approach is to use additional views of the same scene. If such views are unavailable, an often-used alternative is to extrapolate optical flow in occlusion areas. Since the location of such areas is usually unknown prior to optical flow estimation, this is usually performed in three steps. First, occlusion-ignorant optical flow is estimated, then occlusion areas are identified using the estimated (unreliable) optical flow, and, finally, the optical flow is corrected using the computed occlusion areas. This approach, however, does not permit interaction between optical flow and occlusion estimates. In this paper, we permit such interaction by proposing a variational formulation that jointly computes optical flow, implicitly detects occlusions and extrapolates optical flow in occlusion areas. The extrapolation mechanism is based on anisotropic diffusion and uses the underlying image gradient to preserve structure, such as optical flow discontinuities. Our results show significant improvements in the computed optical flow fields over other approaches, both qualitatively and quantitatively.

P.-M. Jodoin, M. Mignotte, and J. Konrad, "Statistical background subtraction using spatial cues," IEEE Trans. Circuits Syst. Video Technol., vol. 17, pp. 1758-1763, Dec. 2007, [PDF: 571KB].

Most statistical background subtraction techniques are based on the analysis of temporal color/intensity distribution. However, learning statistics on a series of time frames can be problematic, especially when no frame absent of moving objects is available or when the available memory isn't sufficient to store the series of frames needed for learning. In this paper, we propose a spatial variation to the traditional temporal framework. The proposed framework allows statistical motion detection with methods trained on one background frame instead of a series of frames as is usually the case. Our framework includes two spatial background subtraction approaches suitable for different applications. The first approach is meant for scenes having a non-static background due to noise, camera jitter or animation in the scene (e.g., waving trees, fluttering leaves). This approach models each pixel with two PDFs: one unimodal PDF and one multimodal PDF, both trained on one background frame. In this way, the method can handle backgrounds with static and non-static areas. The second spatial approach is designed to use as little processing time and memory as possible. Based on the assumption that neighboring pixels often share similar temporal distribution, this second approach models the background with one global mixture of M Gaussians.

J. Konrad and M. Halle, "3-D displays and signal processing: An answer to 3-D ills?," IEEE Signal Process. Mag., vol. 24, pp. 97-111, Nov. 2007, [PDF: 566KB].

Three-dimensional (3-D) perception is an intrinsic part of the human experience. While most people gain the majority of their spatial information through vision, and approximately 90% of the population benefit from stereopsis, display systems have historically reproduced only two-dimensional depth cues. Over the last 150 years, many attempts have been made to exploit stereopsis in various 3-D displays; while several achieved limited commercial success, none have attained equal status to their 2-D counterparts. Today, novel electronic display technologies, powerful microprocessors, and advanced signal processing algorithms are about to open a new era for 3-D displays. Signal processing specifically focused on 3-D imaging will, in large part, determine the viability of these emerging 3-D display systems. In this paper, we overview today's main electronic 3-D display technologies from a signal processing perspective. We describe the underlying physics, and point out benefits and deficiencies of various displays. We discuss the general role of signal processing and provide specific examples of signal processing helping address certain display deficiencies. We highlight challenges awaiting signal processing in quest of the ultimate 3-D experience.

L. Oddsson, R. Karlsson, J. Konrad, S. Ince, S. Williams, and E. Zemkova, "A rehabilitation tool for functional balance using altered gravity and virtual reality," Journal of NeuroEngineering and Rehabilitation, vol. 4 (25), July 2007.

M. Mendillo, S. Laurent, J. Wilson, J. Baumgardner, J. Konrad, and W. Karl, "The sources of sodium escaping from Io revealed by spectral high definition imaging," Nature, vol. 448, pp. 330-332, July 2007.

P. McNerney, J. Konrad, and M. Betke, "Block-based MAP disparity estimation under alpha-channel constraints," IEEE Trans. Circuits Syst. Video Technol., vol. 17, pp. 785-789, June 2007, [PDF: 2,962KB].

Disparity estimation belongs to the most important, but difficult, problems in image processing and computer vision. Its importance stems from a wide range of applications, while its difficulty is related to ill-posedness. To date, numerous disparity estimation algorithms have been developed. In this paper, we consider a particular case of disparity estimation based on two views and a known alpha channel partitioning each view into foreground and background. The main idea is to use this partitioning in order to enhance disparity estimation in the foreground object close to its boundary. We propose a block-based disparity model with two alpha-channel constraints: a photometric one, disabling invalid intensity/color matches, and a geometric one, preventing disparity smoothing between foreground and background. We incorporate these constraints into a Bayesian framework using the maximum a posteriori probability criterion. We experimentally demonstrate improvements in the estimated disparities at foreground object boundaries, and show examples of image relighting using these disparities.

J. Konrad, "Videopsy: Dissecting visual data in space-time," IEEE Comm. Mag., vol. 45, pp. 34-42, Jan. 2007, [PDF: 977KB].

Network camera, made possible by recent advances in the integration of sensing, compression and communication hardware, is a new video source that can be easily deployed and remotely managed. Unobtrusively located along highways, at airports or in office buildings such cameras can form a visual sensor network, or camera web, an extremely rich source of visual information. In its infancy today, camera web deployment will likely accelerate in the future and one can expect visual sensing devices to eventually become as ubiquitous as electric bulbs. While the capturing hardware has evolved tremendously, hardware and algorithms necessary for effective analysis and efficient communication of multi-camera data clearly lag. In this paper, I overview one particular aspect of visual data analysis, namely space-time video segmentation that is often a pre-requisite for motion estimation, video compression, event detection, scene understanding, etc. I introduce the concept of object tunnel, a 3-D surface in space-time through which a video object travels, and the associated concept of occlusion volume. I present examples of object tunnels and occlusion volumes on surveillance data that, upon further processing, may lead to automatic event detection or scene understanding. Finally, I describe challenges in extending video analysis algorithms to visual sensor networks, and I outline some approaches possible.

M. Ristivojević and J. Konrad, "Space-time image sequence analysis: object tunnels and occlusion volumes," IEEE Trans. Image Process., vol. 15, pp. 364-376, Feb. 2006, [PDF: 2,013KB].

We address the issue of image sequence analysis jointly in space and time. While typical approaches to such an analysis consider two image frames at a time, we propose to perform this analysis jointly over multiple frames. We concentrate on spatio-temporal segmentation of image sequences and on analysis of occlusion effects therein. The segmentation process is three-dimensional (3-D); we search for a volume carved out by each moving object in the image sequence domain, or "object tunnel", a new space-time concept. We pose the problem in a variational framework by using only motion information (no intensity edges). The resulting formulation can be viewed as volume competition, a 3-D generalization of region competition. We parameterize the unknown surface to be estimated, but rather than using an active-surface approach, we embed it into a higher-dimensional function and apply the level-set methodology. We first develop simple models for the detection of moving objects over static background; no motion models are needed. Then, in order to improve segmentation accuracy, we incorporate motion models for objects and background. We further extend the method by including explicit models for occluded and newly-exposed areas that lead to "occlusion volumes", another new space-time concept. Since in this case multiple volumes are sought, we apply a multiphase variant of the level-set method. We present various experimental results for synthetic and natural image sequences.

J. Konrad and P. Agniel, "Subsampling models and anti-alias filters for 3-D automultiscopic displays," IEEE Trans. Image Process., vol. 15, pp. 128-140, Jan. 2006, [PDF: 921KB].

A new type of 3-D display recently introduced on the market holds great promise for the future of 3-D visualization, communication and entertainment. This so-called automultiscopic display can deliver multiple views without glasses, thus allowing a limited "look-around" (correct motion-parallax). Central to this technology is the process of multiplexing several views into a single viewable image. This multiplexing is a complex process involving irregular subsampling of the original views. If not preceded by lowpass filtering, it results in aliasing that leads to texture as well as depth distortions. In order to eliminate this aliasing, we propose to model the multiplexing process with lattices, find their parameters and then design optimal anti-alias filters. To this effect, we use multi-dimensional sampling theory and basic optimization tools. We derive optimal anti-alias filters for a specific automultiscopic monitor using three models: orthogonal lattice, non-orthogonal lattice and union of shifted lattices. In the first case, the resulting separable low-pass filter offers significant aliasing reduction that is further improved by a hexagonal-passband lowpass filter for the non-orthogonal lattice model. A more accurate model is obtained using a union of shifted lattices, but due to the complex nature of repeated spectra, practical filters designed in this case offer no additional improvement. We also describe a practical method to design finite-precision, low-complexity filters that can be implemented using modern graphics cards.

R. Stasiński and J. Konrad, "POCS reconstruction of irregularly-sampled images based on oversampling and linear space-variant filtering," Sampling Theory in Signal and Image Processing, vol. 5, pp. 37-58, Jan. 2006, [PDF: 470KB].

Image reconstruction from irregularly-spaced samples is becoming a pivotal element of advanced video processing and compression tasks. Typically, irregular sample positions are due to the process of motion compensation, and can result in areas void of data (divergent motion, occlusion areas). Since sample positions do not obey constraints required by irregular-sampling theorems, alternative, for example approximate, reconstruction methods are needed. In this paper, we describe an image reconstruction method from irregularly-spaced samples based on the theory of projection onto convex sets (POCS). Similarly to other POCS-based image reconstruction methods, our approach applies two projection operators: bandwidth limitation and sample substitution. Unlike other methods, however, our algorithm is implemented on an oversampled lattice. Although the method performs well, it can be optimized to deal efficiently only with either densely- or sparsely-sampled image areas, but not with both types of area simultaneously. In order to address this issue, we propose to replace the usual linear space-invariant filtering with linear space-variant filtering. We develop a filter adaptation strategy that selects a suitable filter depending on the local density of irregularly-spaced input samples. We further improve the method by adapting filter bandwidth to the progress of image reconstruction. We experimentally demonstrate the efficacy of the method on disparity compensation in the context of stereoscopic 3-D imaging.

N. Božinović and J. Konrad, "Motion analysis in 3D DCT domain and its application to video coding," Signal Process., Image Commun., vol. 20, pp. 510-528, July 2005, [PDF: 1,950KB], 2004-2005 EURASIP Image Communication Best Paper Award.

Global, constant-velocity, translational motion in an image sequence induces a characteristic energy footprint in the Fourier-transform (FT) domain; the spectrum is limited to a plane with orientation defined by the direction of motion. By detecting these spectral occupancy planes, methods have been proposed to estimate such global motion. Since the discrete cosine transform (DCT) is a ubiquitous tool of all video compression standards to date, we investigate in this paper properties of motion in the DCT domain. We show that global, constant-velocity, translational motion in an image sequence induces in the DCT domain spectral occupancy planes, similarly to the FT domain. Unlike in the FT case, however, these planes are subject to spectral folding. Based on this analysis, we propose a motion estimation method in the DCT domain, and we show that results comparable to standard block matching can be obtained. Moreover, by realizing that significant energy in the DCT domain concentrates around a folded plane, we propose a new approach to video compression. The approach is based on 3D DCT applied to a group of frames, followed by motion-adaptive scanning of DCT coefficients (akin to "zig-zag" scanning in MPEG coders), their adaptive quantization, and final entropy coding. We discuss the design of the complete 3D DCT coder and we carry out a performance comparison of the new coder with ubiquitous hybrid coders.

C. Vázquez, E. Dubois, and J. Konrad, "Reconstruction of irregularly-sampled images in spline spaces," IEEE Trans. Image Process., vol. 14, pp. 713-725, June 2005, [PDF: 2,813KB].

This paper presents a novel approach to the reconstruction of images from irregularly-spaced samples. This problem is often encountered in digital image processing applications. Non-recursive video coding with motion compensation, spatio-temporal interpolation of video sequences and generation of new views in multi-camera systems are three possible applications. We propose a new reconstruction algorithm based on a spline model for images. We use regularization since this is an ill-posed inverse problem. We minimize a cost function composed of two terms: one related to the approximation error and the other related to the smoothness of the modeling function. All the processing is carried out in the space of spline coefficients; this space is discrete although the problem itself is of a continuous nature. The coefficients of regularization and approximation filters are computed exactly by using the explicit expressions of B-spline functions in the time domain. The regularization is carried out locally while the computation of the regularization factor accounts for the structure of the irregular sampling grid. The linear system of equations obtained is solved iteratively. Our results show a very good performance in motion-compensated interpolation applications.

A.-R. Mansouri and J. Konrad, "Multiple motion segmentation with level sets," IEEE Trans. Image Process., vol. 12, pp. 201-220, Feb. 2003, [PDF: 4,753KB].

Segmentation of motion in an image sequence is one of the most challenging problems in image processing, while at the same time one that finds numerous applications. To date, a wealth of approaches to motion segmentation have been proposed. Many of them suffer from the local nature of the models used. Global models, such as those based on Markov random fields, perform, in general, better. In this paper, we propose a new approach to motion segmentation that is based on a global model. The novelty of the approach is twofold. First, inspired by recent work of other researchers we formulate the problem as that of region competition, but we solve it using the level set methodology. The key features of a level set representation, as compared to active contours, often used in this context, are its ability to handle variations in the topology of the segmentation and its numerical stability. The second novelty of the paper is the formulation in which, unlike in many other motion segmentation algorithms, we do not use intensity boundaries as an accessory; the segmentation is purely based on motion. This permits accurate estimation of motion boundaries of an object even when its intensity boundaries are hardly visible. Since occasionally intensity boundaries may prove beneficial, we extend the formulation to account for the coincidence of motion and intensity boundaries. In addition, we generalize the approach to multiple motions. We discuss possible discretizations of the evolution (PDE) equations and we give details of an initialization scheme so that the results could be duplicated. We show numerous experimental results for various formulations on natural images with either synthetic or natural motion.

R. Stasiński and J. Konrad, "Improved POCS reconstruction of stereoscopic views," Signal Process., Image Commun., vol. 17, pp. 689-704, Oct. 2002, [PDF: 314KB].

This paper presents an application of the projection onto convex sets (POCS) framework to the reconstruction of intermediate stereoscopic views. Such views are needed in 3-D viewing in order to simulate the so-called "look-around" as well as to adjust the perceived depth (interocular adjustment). The basic problem in the above reconstruction is that of the recovery of a regularly-sampled image from its irregularly-spaced samples due to disparity compensation. This problem also arises in other image processing and coding applications, such as multiple-frame motion compensation or video frame rate conversion. In our POCS-based approach to view reconstruction, two projection operators are used: bandwidth limitation and sample substitution. The bandwidth limitation can be implemented in the original domain by means of lowpass FIR filtering but we opt for a frequency-domain implementation by means of windowing. The results reported here improve our original POCS-based reconstruction method by locally adapting the algorithm to the density of image samples. We also extend the method to color images through an implementation in the luminance-chrominance space.
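
The two projection operators named above, bandwidth limitation and sample substitution, can be sketched in one dimension. This is a simplified illustration under stated assumptions, not the paper's algorithm: the known samples are assumed to sit on positions of a regular grid (with some positions missing) rather than at truly irregular locations, the bandwidth limitation is an ideal rectangular window applied in a naive DFT, and there is no local adaptation to sample density. The names `band_limit` and `pocs_reconstruct` are hypothetical.

```python
import cmath
import math
from typing import Dict, List


def dft(x: List[float]) -> List[complex]:
    """Naive O(n^2) discrete Fourier transform, adequate for a short example."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum: List[complex]) -> List[float]:
    """Inverse DFT; the real part is returned because the input signal is real."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]


def band_limit(x: List[float], cutoff: int) -> List[float]:
    """Projection 1: zero every DFT bin whose frequency index exceeds `cutoff`."""
    n = len(x)
    spectrum = dft(x)
    for k in range(n):
        if min(k, n - k) > cutoff:
            spectrum[k] = 0
    return idft(spectrum)


def pocs_reconstruct(known: Dict[int, float], n: int, cutoff: int,
                     iterations: int = 300) -> List[float]:
    """Alternate sample substitution and bandwidth limitation, starting from zeros."""
    x = [0.0] * n
    for _ in range(iterations):
        for index, value in known.items():   # Projection 2: restore the known samples
            x[index] = value
        x = band_limit(x, cutoff)
    return x
```

As a usage example, take one period of a cosine on 16 grid points, drop the samples at positions 1, 4, 6, 9, 12 and 14, and call `pocs_reconstruct(known, 16, cutoff=2)`; the iteration fills the missing positions because the cosine is consistent with both constraint sets.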

K. Belloulata and J. Konrad, "Region-by-region fractal image compression," IEEE Trans. Image Process., vol. 11, pp. 351-362, Apr. 2002, [PDF: 300KB].

Region-based functionality offered by the MPEG-4 video compression standard is also appealing for still images, for example to permit object-based queries of a still-image database. A popular method for still-image compression is fractal coding. However, traditional fractal image coding uses rectangular range and domain blocks. Although new schemes have been proposed that merge small blocks into irregular shapes, the merging process does not, in general, produce semantically-meaningful regions. We propose a new approach to fractal image coding that permits region-based functionalities; images are coded region by region according to a previously-computed segmentation map. We use rectangular range and domain blocks, but divide boundary blocks into segments belonging to different regions. Since this prevents the use of standard dissimilarity measure, we propose a new measure adapted to segment shape. We propose two approaches: one in the spatial and one in the transform domain. While providing additional functionality, the proposed methods perform similarly to other tested methods in terms of PSNR but often result in images that are subjectively better. Due to the limited domain-block codebook size, the new methods are faster than other fractal coding methods tested. The results are very encouraging and show the potential of this approach for various internet and still-image database applications.

J. Konrad, "Visual communications of tomorrow: natural, efficient and flexible," IEEE Comm. Mag., vol. 39, pp. 126-133, Jan. 2001, [PDF: 242KB].

In the last decade, we have witnessed a phenomenal growth of communication and information technologies. These technologies have greatly simplified and even enriched our daily lives; cellular telephony and the Internet are probably the most striking examples. A particularly promising, and at the same time challenging, aspect of both technologies is the transmission and use of visual information. In this paper, I overview the state of visual communication at the end of 20th century, discuss today's challenges and outline some future directions.

A.-R. Mansouri and J. Konrad, "Bayesian winner-take-all reconstruction of intermediate views from stereoscopic images," IEEE Trans. Image Process., vol. 9, pp. 1710-1722, Oct. 2000, [PDF: 1,210KB].

This paper presents a new algorithm for the reconstruction of intermediate views from a pair of still stereoscopic images. The algorithm is designed to address the issue of blur caused by linear filtering often employed in such reconstruction. The proposed algorithm is block-based and to reconstruct the intermediate views employs non-linear disparity-compensated filtering by means of a winner-take-all strategy. The reconstructed image is modeled as a tiling by fixed-size blocks coming from various positions (disparity compensation) of either the left or right images, while the tiling map itself is modeled by a binary decision field. In addition to that, an observation model relating the left and right images via a disparity field, and a disparity field model are used. All models are probabilistic and are combined into a maximum a posteriori probability criterion. The intermediate intensities, disparities and the binary decision field are estimated jointly using the expectation-maximization algorithm. The new approach is compared experimentally on complex natural images with a reference block-based algorithm employing linear filtering. Although the improvements are localized and often subtle, they demonstrate that a high-quality intermediate view reconstruction for complex scenes is feasible.

J. Konrad, B. Lacotte, and E. Dubois, "Cancellation of image crosstalk in time-sequential displays of stereoscopic video," IEEE Trans. Image Process., vol. 9, pp. 897-908, May 2000, [PDF: 242KB].

Stereoscopic visualization systems based on liquid crystal shutter (LCS) eyewear and cathode-ray tube (CRT) displays provide today the best overall quality of 3-D images and therefore have a dominant position in commercial as well as professional markets. Due to the CRT and LCS characteristics, however, such systems suffer from perceptual crosstalk ("shadows") at object boundaries that can reduce, and at times inhibit, the ability to perceive depth. In this paper, we propose a method to reduce such crosstalk. We present a simple model for intensity leak, we assess model parameters for a time-sequential LCS/CRT system and we propose a computationally-efficient algorithm to eliminate the crosstalk. Since the full crosstalk elimination implies an unacceptable image degradation (reduction of contrast), we study the trade-off between crosstalk elimination and image contrast. We describe experiments on synthetic and natural stereoscopic images and we discuss informal subjective viewing of processed images. Overall, the viewer response has been very positive; 3-D perception of many objects became either much easier or even effortless. Since the proposed algorithm can be easily implemented in real time (only linear scaling and table look-up are needed), we believe that it can be successfully used today in various stereoscopic applications suffering from image crosstalk. This is particularly true in view of the continuously increasing CPU and graphics power of modern PCs.

F. Dufaux and J. Konrad, "Robust, efficient and fast global motion estimation for video coding," IEEE Trans. Image Process., vol. 9, pp. 497-501, Mar. 2000, [PDF: 212KB].

In this paper, we propose an efficient, robust, and fast method for the estimation of global motion from image sequences. The method is generic in that it can accommodate various global motion models, from a simple translation to an 8-parameter perspective model. The algorithm is hierarchical and consists of three stages. In the first stage, a low-pass image pyramid is built. Then, an initial translation is estimated with full-pixel precision at the top of the pyramid using a modified n-step search matching. In the third stage, a gradient descent is executed at each level of the pyramid starting from the initial translation at the coarsest level. Due to the coarse initial estimation and the hierarchical implementation, the method is very fast. To increase robustness to outliers, we replace the usual formulation based on a quadratic error criterion with a truncated quadratic function. We have applied the algorithm to various test sequences within an MPEG-4 coding system. From the experimental results we conclude that global motion estimation provides significant performance gains for video material with camera zoom and/or pan. The gains result from a reduced prediction error and a more compact representation of motion. We also conclude that the robust error criterion can introduce additional performance gains without increasing computational complexity.
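
The robust error criterion can be illustrated with a small translation-only search. The sketch below is a simplification under stated assumptions: it replaces the modified n-step search with an exhaustive search over integer shifts, works on a single resolution level (no pyramid), and omits the gradient-descent refinement and the higher-order motion models. The function names and the `trunc` parameter are hypothetical.

```python
from typing import List, Tuple


def truncated_quadratic(error: float, trunc: float) -> float:
    """Quadratic penalty that saturates at trunc**2, limiting the influence of outliers."""
    return min(error * error, trunc * trunc)


def estimate_translation(ref: List[List[float]], cur: List[List[float]],
                         max_shift: int, trunc: float) -> Tuple[int, int]:
    """Return the integer (dy, dx) for which cur[y][x] best matches ref[y + dy][x + dx].

    The score is the mean truncated-quadratic error over the overlapping pixels.
    """
    rows, cols = len(ref), len(ref[0])
    best_score = None
    best_shift = (0, 0)
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            total, count = 0.0, 0
            for y in range(rows):
                for x in range(cols):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < rows and 0 <= xx < cols:
                        total += truncated_quadratic(cur[y][x] - ref[yy][xx], trunc)
                        count += 1
            score = total / count
            if best_score is None or score < best_score:
                best_score, best_shift = score, (dy, dx)
    return best_shift
```

Because each pixel contributes at most `trunc**2` to the score, a few pixels that do not follow the global motion (for example, a small independently moving object) cannot dominate the estimate, which is the motivation the abstract gives for the truncated quadratic.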

R. Stasiński and J. Konrad, "A new class of fast shape-adaptive orthogonal transforms and their application to region-based image compression," IEEE Trans. Circuits Syst. Video Technol., vol. 9, pp. 16-34, Feb. 1999, [PDF: 344KB].

Region-based approaches to image and video compression have been very actively explored in the last few years. It is widely expected that they will result in rate/quality gains and expanded functionalities. In such approaches one of the essential problems is the representation of luminance and color in arbitrarily-shaped regions. For rectangular blocks extracted from natural images the discrete cosine transform (DCT) has been found to perform close to the eigentransform. Although for arbitrarily-shaped regions orthogonalization-based procedures have been shown to perform very well, their computational complexity and memory requirements are prohibitive for today's technology. Therefore, other approaches are currently being investigated, with particular attention paid to low implementation complexity. In this paper, we propose a new class of orthogonal transforms that self-adapt to arbitrary shapes. The new algorithms are derived from flowgraphs of standard fast transform algorithms by a suitable modification of certain butterfly operators. First, we show how to derive a shape-adaptive transform from the discrete Walsh-Hadamard transform (DWHT) flowgraph. Then, we discuss modifications needed to arrive at a DCT-based shape-adaptive transform. We give implementation details of this transform and compare its computational complexity with several well-known approaches. We also evaluate the energy compaction performance of the new transform for both synthetic and natural data. We conclude that the proposed DCT-based shape-adaptive transform gives a very beneficial compaction/complexity ratio compared to other well-known approaches. The complexity of the new method does not exceed the complexity of two non-adaptive DCTs on a circumscribing rectangle, and therefore, unlike other tested methods with comparable energy compaction, it is suitable for large regions. This property should prove very valuable in the future when true region-based image/video compression methods are developed.
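
For readers unfamiliar with the flowgraph that the abstract starts from, the following is a minimal sketch of the standard (non-adaptive) fast Walsh-Hadamard transform, built from two-point butterflies. It is not the shape-adaptive transform of the paper, and the modified butterfly operators are not reproduced here. The function name `fwht` and the orthonormal scaling are assumptions made for illustration.

```python
import math

def fwht(x):
    """Orthonormal fast Walsh-Hadamard transform (natural ordering).

    Illustrative baseline only: this is the standard flowgraph, without
    the shape-adaptive butterfly modifications described in the paper.
    """
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    y = [float(v) for v in x]
    h = 1
    while h < n:
        # Each stage combines pairs of samples that are h positions apart.
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = y[i], y[i + h]
                y[i], y[i + h] = a + b, a - b  # two-point butterfly
        h *= 2
    scale = 1.0 / math.sqrt(n)  # makes the transform orthonormal
    return [v * scale for v in y]
```

With this scaling the transform preserves signal energy and is its own inverse, so applying `fwht` twice returns the input up to rounding.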

C. Stiller and J. Konrad, "Estimating motion in image sequences: A tutorial on modeling and computation of 2D motion," IEEE Signal Process. Mag., vol. 16, pp. 70-91, July 1999, [PDF: 929KB], 2001 IEEE Signal Processing Magazine Award.

This paper addresses the estimation of 2D motion (optical flow) from sequences of images and is intended for readers involved in video processing and compression as well as computer vision. Motion estimation is one of the key techniques helping solve various problems encountered when dealing with image sequences; redundancy elimination in digital video or tracking of moving objects are but two interesting tasks. Due to a strong correlation of image intensities in the direction of motion, operations such as prediction, interpolation or filtering are most efficient when applied along motion trajectories. To compute these trajectories, underlying models need to be specified, an estimation criterion must be selected, and a search strategy must be implemented. In the paper, we discuss various motion representations and the associated regions of support, as well as models that relate motion parameters to image data. Then, we concentrate on various estimation criteria: from simple ones comprising the displaced frame difference only to complex Bayesian criteria involving multiple terms. Finally, we address search strategies. We describe matching- and gradient-based schemes, deterministic and stochastic relaxation methods including simulated annealing, as well as other deterministic approaches such as "highest confidence first" and mean field techniques. We sketch multiresolution and multiscale strategies and point out their benefits. No experimental results are included in the paper; however, a substantial body of literature is cited, and interested readers are referred to earlier work of the authors and of other researchers.
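
As a concrete instance of the simplest combination mentioned above (a displaced-frame-difference criterion with a matching-based search), the following is a minimal sketch of exhaustive block matching. It is an illustration under stated assumptions, not code from the tutorial: the function name `block_match`, the sum-of-absolute-differences cost, and the default block and search sizes are all hypothetical choices.

```python
def block_match(prev, curr, top, left, block=4, search=2):
    """Exhaustive block matching for one block of `curr`.

    Returns ((dy, dx), cost), where (dy, dx) is the displacement into
    `prev` that minimizes the sum of absolute displaced frame differences.
    Frames are lists of rows of numbers.
    """
    rows, cols = len(prev), len(prev[0])
    best, best_cost = (0, 0), float("inf")
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            r, c = top + dy, left + dx
            # Skip candidates that fall outside the previous frame.
            if r < 0 or c < 0 or r + block > rows or c + block > cols:
                continue
            cost = sum(
                abs(curr[top + i][left + j] - prev[r + i][c + j])
                for i in range(block)
                for j in range(block)
            )
            if cost < best_cost:
                best, best_cost = (dy, dx), cost
    return best, best_cost
```

The criterion here uses the displaced frame difference only; the Bayesian criteria discussed in the paper add further terms, such as a smoothness prior on the motion field, which this sketch omits.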

M. Ben Slima, J. Konrad, and A. Barwicz, "Improvement of stereo disparity estimation through balanced filtering: the sliding-block approach," IEEE Trans. Circuits Syst. Video Technol., vol. 7, no. 6, pp. 913-920, 1997, [PDF: 939KB].

In a typical disparity (or motion) estimation algorithm developed for inter-image prediction, an interpolation of intensities is applied to one of the two images used. Therefore, non-filtered intensities of the image being predicted are compared with lowpass-filtered intensities of the other image of the stereo pair. Consequently, noise and detail suppression in the two images are unequal. In this paper we propose to apply the same (balanced) filtering to both images. In addition to image smoothing that helps avoid unreliable intensity matches, the lowpass filter is used to carry out intensity interpolation at the same time; the computation of sub-pixel attributes is consistent with lowpass filtering of both images, unlike arbitrary linear or cubic interpolation applied to one image only. The proposed approach lends itself naturally to a multiresolution implementation. We apply the new approach to stereo disparity estimation based on sliding blocks. Using synthetic and natural data we experimentally compare the new approach with the traditional sliding-block method. For standard stereoscopic images we demonstrate up to 2.4 dB reduction of disparity-compensated prediction error over the traditional sliding-block method.

J. Konrad, J. Radecki, and E. Dubois, "The application of two-dimensional finite-precision IIR filters to enhanced NTSC coding," IEEE Trans. Circuits Syst. Video Technol., vol. 6, pp. 355-374, Aug. 1996, [PDF: 857KB].

The goal of this paper is to study the application of two-dimensional (2-D) finite-precision infinite impulse response (IIR) filters to enhanced NTSC coding. It is well known that suitable two- or three-dimensional digital filtering greatly improves the quality of NTSC pictures by suppressing the interference between the luminance Y and the chrominances I, Q. Thus far, 2-D and 3-D finite impulse response (FIR) filters have been used to reduce or eliminate these cross effects. To achieve good performance, however, they require many coefficients. Since, in general, IIR filters need fewer coefficients to approximate a given magnitude response, we investigate here the possibility of applying 2-D IIR filters to NTSC encoding/decoding. We also study the feasibility of using digital filters for NTSC channel filtering; this would permit a digital-only encoder. To design suitable filters, we use a recently proposed method based on multiple constraint optimization and simulated annealing. We propose a new implementation structure for the IIR filters that differs from the zero-phase FIR structure. We simulate the full NTSC coding chain, and compare the resulting images for both filter types.

M. Chahine and J. Konrad, "Estimation and compensation of accelerated motion for temporal sequence interpolation," Signal Process., Image Commun., vol. 7, pp. 503-527, Nov. 1995, [PDF: 898KB].

This paper makes two contributions to the area of motion-compensated processing of image sequences. The first contribution is the development of a framework for the modeling and estimation of dense 2-D motion trajectories with acceleration. To this end, Gibbs-Markov models are proposed and linked together by the maximum a posteriori probability (MAP) criterion, and the resulting objective function is minimized using multiresolution deterministic relaxation. Accuracy of the method is demonstrated by measuring the mean-squared error of estimated motion parameters for images with synthetic motion. The second contribution is the demonstration of a significant gain resulting from the use of trajectories with acceleration in motion-compensated temporal interpolation of videoconferencing/videophone images. An even higher gain is demonstrated when the accelerated motion trajectory model is augmented with occlusion and motion discontinuity models. The very good performance of the method suggests a potential application of the proposed framework in the next generation of video coding algorithms.

J. Radecki, J. Konrad, and E. Dubois, "Design of multidimensional finite-wordlength FIR and IIR filters by simulated annealing," IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process., vol. 42, pp. 424-431, June 1995, [PDF: 413KB].

This paper describes a new approach to the design of multidimensional (M-D) finite-wordlength digital filters with specifications in the frequency and spatial domains. The approach is based on stochastic optimization and extends previous work on finite impulse response (FIR) filters in two ways: by inclusion of spatial constraints and by application to the case of infinite impulse response (IIR) filters. The formulation proposed is based on a multiple-term objective function that, in addition to magnitude constraints, also includes step response, group delay and stability constraints. Our attention to these characteristics stems from the application of such filters to video processing that we are actively pursuing. Since filter coefficients are of finite precision and since the objective function is multivariable, non-differentiable and likely to have multiple minima, we use simulated annealing for optimization. We show numerous examples of the design of practical filters such as channel and luminance/chrominance separation filters used in the NTSC system. We demonstrate the impact of coefficient precision as well as of group delay and step response constraints on filter parameters.
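
To illustrate the general idea of annealing over finite-wordlength coefficients, the following is a much-simplified sketch: a one-dimensional symmetric FIR lowpass filter whose taps are integer multiples of 2^-bits, optimized against a magnitude-only squared-error cost. It is not the multidimensional, multiple-term formulation of the paper (no step response, group delay or stability terms). All names, the cutoff of 0.4π, the cooling schedule and the default parameters are assumptions chosen for illustration.

```python
import math
import random

def response(coeffs, w):
    """Zero-phase response of a symmetric odd-length FIR filter.

    coeffs[0] is the centre tap; coeffs[k] is the tap at offsets +k and -k.
    """
    return coeffs[0] + 2.0 * sum(
        c * math.cos(k * w) for k, c in enumerate(coeffs) if k > 0
    )

def cost(q, bits, grid, desired):
    """Squared magnitude error of the filter with integer taps q * 2**-bits."""
    scale = 2.0 ** -bits
    coeffs = [v * scale for v in q]
    return sum((response(coeffs, w) - d) ** 2 for w, d in zip(grid, desired))

def anneal(n_half=4, bits=6, steps=4000, t0=1.0, alpha=0.998, seed=0):
    """Simulated annealing over integer (finite-wordlength) coefficients."""
    rng = random.Random(seed)
    grid = [math.pi * i / 31 for i in range(32)]
    desired = [1.0 if w <= 0.4 * math.pi else 0.0 for w in grid]  # ideal lowpass
    q = [0] * (n_half + 1)
    cur = cost(q, bits, grid, desired)
    best_q, best = q[:], cur
    t = t0
    for _ in range(steps):
        cand = q[:]
        # Perturb one randomly chosen coefficient by one least-significant bit.
        cand[rng.randrange(len(cand))] += rng.choice((-1, 1))
        c = cost(cand, bits, grid, desired)
        # Metropolis rule: always accept improvements, sometimes accept worse.
        if c <= cur or rng.random() < math.exp((cur - c) / t):
            q, cur = cand, c
            if cur < best:
                best_q, best = q[:], cur
        t *= alpha  # geometric cooling
    return best_q, best
```

Because the search moves only on the quantized coefficient grid, no gradient is needed, which matches the abstract's remark that the objective is non-differentiable; the acceptance of occasional uphill moves is what lets the search leave local minima.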

J. Konrad and E. Dubois, "Bayesian estimation of motion vector fields," IEEE Trans. Pattern Anal. Machine Intell., vol. 14, pp. 910-927, Sept. 1992, [PDF: 1,403KB].

This paper presents a new approach to the estimation of two-dimensional motion vector fields from time-varying images. The approach is stochastic, both in its formulation and in the solution method. The formulation involves the specification of a deterministic structural model, along with stochastic observation and motion field models. Two motion models are proposed: a globally smooth model based on vector Markov random fields and a piecewise smooth model derived from coupled vector-binary Markov random fields. Two estimation criteria are studied. In the Maximum A Posteriori Probability (MAP) estimation the a posteriori probability of motion given data is maximized, while in the Minimum Expected Cost (MEC) estimation the expectation of a certain cost function is minimized. The MAP estimation is performed via simulated annealing, while the MEC algorithm performs iteration-wise averaging. Both algorithms generate sample fields by means of stochastic relaxation implemented via the Gibbs sampler. Two versions are developed, one for a discrete state space, the other for a continuous state space. The MAP estimation is incorporated into a hierarchical environment to deal efficiently with large displacements. Numerous experimental results of application of these algorithms to natural and computer-generated images with natural and synthetic motion are shown.
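
The two criteria named above can be written compactly as follows. The notation is assumed for illustration and is not taken from the paper: d denotes the motion field, g the observed images, and C a generic cost function.

```latex
% MAP: maximize the posterior; by Bayes' rule the evidence p(g) drops out.
\hat{d}_{\mathrm{MAP}}
  = \arg\max_{d} \, p(d \mid g)
  = \arg\max_{d} \, p(g \mid d)\, p(d)

% MEC: minimize the posterior expectation of a cost function C.
\hat{d}_{\mathrm{MEC}}
  = \arg\min_{\hat{d}} \, E\!\left[\, C(d, \hat{d}) \mid g \,\right]
```

Here p(g | d) plays the role of the observation model and p(d) that of the motion field model.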

J. Konrad and E. Dubois, "Comparison of stochastic and deterministic solution methods in Bayesian estimation of 2D motion," Image Vis. Comput., vol. 9, pp. 215-228, Aug. 1991, [PDF: 1,451KB].

This paper discusses the estimation of two-dimensional (2-D) motion from spatio-temporally sampled image sequences. It concentrates on the optimization aspect of the problem formulated through a Bayesian framework based on Markov random field (MRF) models. First, the Maximum A Posteriori Probability (MAP) formulation for motion estimation over discrete and continuous state spaces is reviewed along with the solution method using simulated annealing (SA). Then, instantaneous "freezing" is applied to the stochastic algorithms, resulting in well-known deterministic methods. The stochastic algorithms are compared with their deterministic approximations over image sequences with natural data and synthetic as well as natural motion.
