Journal paper abstracts
Z. Li, P. Ishwar, and J. Konrad, "Video condensation by ribbon
carving," IEEE Trans. Image Process., Oct. 2008 (submitted), [PDF: 714KB].
Efficient browsing of long video sequences is a key tool in visual
surveillance, e.g., for post-event video forensics, but can also be used for
review of motion pictures and home videos. While frame skipping (fixed or
adaptive) is straightforward to implement, its performance is quite
limited. More efficient techniques have been developed, such as video
summarization and video montage, but they lose either the temporal or semantic
context of events. A recently-proposed method called video synopsis provides
even better performance; however, it involves multiple processing stages and is
fairly complex. Video condensation, which we propose here, is novel in the way
information is removed from the space-time video volume, is conceptually simple
and relatively easy to implement. We introduce the concept of a video ribbon
inspired by that of a seam recently proposed for image resizing. We recursively
carve ribbons out by minimizing an activity-aware cost function using dynamic
programming. The ribbon model we develop is flexible and permits an easy
adjustment of the compromise between temporal condensation ratio and
anachronism of events. We demonstrate ribbon carving efficiency on motor and
pedestrian traffic videos.
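
To illustrate only the dynamic-programming idea mentioned in this abstract, the sketch below finds a minimum-cost vertical seam in a 2-D cost map, the image-resizing notion that inspired ribbons. It is not the ribbon-carving algorithm itself: the function name `min_cost_vertical_seam` and the toy cost map are assumptions made for this example, and arbitrary numbers stand in for the activity-aware cost.

```python
def min_cost_vertical_seam(cost):
    """Return (seam, total): one column index per row of a minimum-cost
    8-connected top-to-bottom seam, and the summed cost along it."""
    rows, cols = len(cost), len(cost[0])
    acc = [[float(v) for v in cost[0]]]   # accumulated cost, row by row
    back = [[0] * cols]                   # back-pointers to the previous row
    for r in range(1, rows):
        acc_row, back_row = [], []
        for c in range(cols):
            lo, hi = max(c - 1, 0), min(c + 1, cols - 1)
            # cheapest of the (up to) three neighbours in the row above
            best = min(range(lo, hi + 1), key=lambda k: acc[r - 1][k])
            back_row.append(best)
            acc_row.append(cost[r][c] + acc[r - 1][best])
        acc.append(acc_row)
        back.append(back_row)
    # backtrack from the cheapest cell in the last row
    end = min(range(cols), key=lambda k: acc[-1][k])
    seam = [end]
    for r in range(rows - 1, 0, -1):
        seam.append(back[r][seam[-1]])
    seam.reverse()
    return seam, acc[-1][end]


# Hypothetical cost map: low values mark "inactive" pixels.
toy_cost = [[1, 9, 9],
            [9, 1, 9],
            [9, 9, 1]]
seam, total = min_cost_vertical_seam(toy_cost)
```

Removing the returned seam and repeating the search is the recursive carving pattern; in the video setting the carved object would be a surface through the space-time volume rather than a curve through an image.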
M. Ristivojević, J. Konrad, and M. Barlaud, "Multi-frame
motion detection for unstable cameras," IEEE Trans. Circuits Syst. Video
Technol., Sept. 2008 (submitted), [PDF: 247KB].
Network cameras, extensively used in surveillance, often allow pan-tilt-zoom
and are also subject to wind load and mount vibrations, thus causing video
frame misalignment. Although algorithms for motion detection, a basic block of
most visual surveillance systems, are relatively mature for fixed cameras, they
perform poorly for active and vibrating cameras. The issue is particularly
severe for algorithms using multiple video frames jointly. In this paper, we
extend our earlier work on multiple-frame motion detection to the case of such
unstable cameras. Our method accounts for spatially-affine and
temporally-varying inter-frame transformations, uses a variational formulation
and applies a level-set solution. We present ground-truth and real-data
experimental results and show significant improvements over earlier methods.
J. McHugh, J. Konrad, V. Saligrama, and P.-M. Jodoin,
"Foreground-adaptive background subtraction," IEEE Signal Process. Lett., Sept. 2008 (submitted), [PDF: 257KB].
Background subtraction is a powerful mechanism for detecting change in a
sequence of images that finds many applications. The most successful background
subtraction methods apply probabilistic models to background intensities
evolving in time; non-parametric and mixture-of-Gaussians models are but two
examples. The main difficulty in designing a robust background subtraction
algorithm is the selection of a detection threshold. In this paper, we adapt
this threshold to varying video statistics by means of two statistical
models. In addition to a non-parametric background model, we propose a
foreground model based on a small spatial neighborhood. This improves the
method's discrimination sensitivity. We also apply a Markov random field model
to spatially characterize the change labels computed. This results in a
spatially-variable detection threshold and improved spatial coherence of the
detections. The proposed methodology is applicable to other background models
as well.
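
A minimal sketch of the baseline this abstract builds on, namely a non-parametric (kernel density) background model with a fixed detection threshold. The function names, the Gaussian kernel, and the numeric values are assumptions for illustration; the foreground model and the Markov random field that adapt the threshold are not reproduced here.

```python
import math


def background_likelihood(value, history, bandwidth):
    """Gaussian kernel density estimate of `value` from past samples
    observed at the same pixel (the non-parametric background model)."""
    norm = 1.0 / (math.sqrt(2.0 * math.pi) * bandwidth)
    total = sum(math.exp(-0.5 * ((value - h) / bandwidth) ** 2) for h in history)
    return norm * total / len(history)


def is_foreground(value, history, bandwidth, threshold):
    """Label the pixel as changed when the background likelihood is low."""
    return background_likelihood(value, history, bandwidth) < threshold


# Hypothetical intensity history of one pixel and a fixed global threshold.
history = [100, 102, 98, 101, 99]
labels = [is_foreground(v, history, bandwidth=2.0, threshold=0.01)
          for v in (100, 160)]
```

The fixed `threshold` is exactly the quantity whose selection the abstract identifies as the main difficulty; making it vary per pixel is what the proposed models are for.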
S. Ince and J. Konrad, "Occlusion-aware view interpolation,"
EURASIP J. Image and Video Process., Oct. 2008 (in print), [PDF: 2,279KB].
View interpolation is an essential step in content preparation for multiview 3D
displays, free-viewpoint video and multiview image/video compression. It is
performed by establishing a correspondence among views, followed by
interpolation using the corresponding intensities. However, occlusions pose a
significant challenge, especially if few input images are available. In this
paper, we identify challenges related to disparity estimation and view
interpolation in the presence of occlusions. We then propose an occlusion-aware
intermediate view interpolation algorithm that uses four input images to handle
the disappearing areas. The algorithm consists of three steps. First, all
pixels in the view to be computed are classified in terms of their visibility in
the input images. Then, disparity for each pixel is estimated from different
image pairs depending on the computed visibility map. Finally, luminance/color
of each pixel is adaptively interpolated from an image pair selected by its
visibility label. Extensive experimental results show striking improvements in
interpolated image quality over occlusion-unaware interpolation from two images
and very significant gains over occlusion-aware spline-based reconstruction
from four images, on both synthetic and real images. Although improvements
are obvious only in the vicinity of object boundaries, this should be useful in
high-quality 3D applications, such as digital 3D cinema and ultra-high
resolution multi-view autostereoscopic displays, where distortions at depth
discontinuities are highly objectionable, especially if they vary with
viewpoint change.
S. Ince and J. Konrad, "Occlusion-aware optical flow
estimation," IEEE Trans. Image Process., vol. 17, pp. 1443-1451,
Aug. 2008, [PDF: 1,222KB].
Optical flow can be reliably estimated between areas visible in two images, but
not in occlusion areas. If optical flow is needed in the whole image domain,
one approach is to use additional views of the same scene. If such views are
unavailable, an often-used alternative is to extrapolate optical flow in
occlusion areas. Since the location of such areas is usually unknown prior to
optical flow estimation, this is usually performed in three steps. First,
occlusion-ignorant optical flow is estimated, then occlusion areas are
identified using the estimated (unreliable) optical flow, and, finally, the
optical flow is corrected using the computed occlusion areas. This approach,
however, does not permit interaction between optical flow and occlusion
estimates. In this paper, we permit such interaction by proposing a variational
formulation that jointly computes optical flow, implicitly detects occlusions
and extrapolates optical flow in occlusion areas. The extrapolation mechanism
is based on anisotropic diffusion and uses the underlying image gradient to
preserve structure, such as optical flow discontinuities. Our results show
significant improvements in the computed optical flow fields over other
approaches, both qualitatively and quantitatively.
P.-M. Jodoin, M. Mignotte, and J. Konrad, "Statistical
background subtraction using spatial cues," IEEE Trans. Circuits Syst. Video Technol., vol. 17, pp. 1758-1763, Dec. 2007, [PDF: 571KB].
Most statistical background subtraction techniques are based on the analysis of
temporal color/intensity distribution. However, learning statistics on a series
of time frames can be problematic, especially when no frame free of moving
objects is available or when the available memory isn't sufficient to store the
series of frames needed for learning. In this paper, we propose a spatial
variation to the traditional temporal framework. The proposed framework allows
statistical motion detection with methods trained on one background frame
instead of a series of frames as is usually the case. Our framework includes
two spatial background subtraction approaches suitable for different
applications. The first approach is meant for scenes having a non-static
background due to noise, camera jitter or animation in the scene (e.g., waving
trees, fluttering leaves). This approach models each pixel with two PDFs: one
unimodal PDF and one multimodal PDF, both trained on one background frame. In
this way, the method can handle backgrounds with static and non-static
areas. The second spatial approach is designed to use as little processing time
and memory as possible. Based on the assumption that neighboring pixels often
share a similar temporal distribution, this second approach models the background
with one global mixture of M Gaussians.
J. Konrad and M. Halle, "3-D displays and signal processing:
An answer to 3-D ills?," IEEE Signal Process. Mag., vol. 24, pp.
97-111, Nov. 2007, [PDF: 566KB].
Three-dimensional (3-D) perception is an intrinsic part of the human
experience. While most people gain the majority of their spatial information
through vision, and approximately 90% of the population benefit from
stereopsis, display systems have historically reproduced only two-dimensional
depth cues. Over the last 150 years, many attempts have been made to exploit
stereopsis in various 3-D displays; while several achieved limited commercial
success, none have attained equal status to their 2-D counterparts. Today,
novel electronic display technologies, powerful microprocessors, and advanced
signal processing algorithms are about to open a new era for 3-D displays.
Signal processing specifically focused on 3-D imaging will, in large part,
determine the viability of these emerging 3-D display systems.
In this paper, we overview today's main electronic 3-D display technologies
from a signal processing perspective. We describe the underlying physics, and
point out benefits and deficiencies of various displays. We discuss the general
role of signal processing and provide specific examples of signal processing
helping address certain display deficiencies. We highlight challenges awaiting
signal processing in the quest for the ultimate 3-D experience.
L. Oddsson, R. Karlsson, J. Konrad, S. Ince, S. Williams, and E.
Zemkova, "A rehabilitation tool for functional balance using altered
gravity and virtual reality," Journal of NeuroEngineering and
Rehabilitation, vol. 4 (25), July 2007.
M. Mendillo, S. Laurent, J. Wilson, J. Baumgardner, J. Konrad, and W.
Karl, "The sources of sodium escaping from Io revealed by spectral
high definition imaging," Nature, vol. 448, pp. 330-332, July
2007.
P. McNerney, J. Konrad, and M. Betke, "Block-based MAP
disparity estimation under alpha-channel constraints," IEEE Trans. Circuits Syst. Video Technol., vol. 17, pp. 785-789, June 2007, [PDF: 2,962KB].
Disparity estimation is among the most important, but also most difficult, problems in
image processing and computer vision. Its importance stems from a wide range
of applications, while its difficulty is related to ill-posedness. To date,
numerous disparity estimation algorithms have been developed. In this paper,
we consider a particular case of disparity estimation based on two views and a
known alpha channel partitioning each view into foreground and background. The
main idea is to use this partitioning in order to enhance disparity estimation
in the foreground object close to its boundary. We propose a block-based
disparity model with two alpha-channel constraints: a photometric one,
disabling invalid intensity/color matches, and a geometric one, preventing
disparity smoothing between foreground and background. We incorporate these
constraints into a Bayesian framework using the maximum a posteriori
probability criterion. We experimentally demonstrate improvements in the
estimated disparities at foreground object boundaries, and show examples of
image relighting using these disparities.
J. Konrad, "Videopsy: Dissecting visual data in space-time,"
IEEE Comm. Mag., vol. 45, pp. 34-42, Jan. 2007, [PDF: 977KB].
The network camera, made possible by recent advances in the integration of sensing,
compression and communication hardware, is a new video source that can be
easily deployed and remotely managed. Unobtrusively located along highways, at
airports or in office buildings, such cameras can form a visual sensor
network, or camera web, an extremely rich source of visual
information. In its infancy today, camera web deployment will likely accelerate
in the future and one can expect visual sensing devices to eventually become as
ubiquitous as electric bulbs. While the capturing hardware has evolved
tremendously, hardware and algorithms necessary for effective analysis and
efficient communication of multi-camera data clearly lag. In this paper, I
overview one particular aspect of visual data analysis, namely space-time video
segmentation that is often a pre-requisite for motion estimation, video
compression, event detection, scene understanding, etc. I introduce the concept
of an object tunnel, a 3-D surface in space-time through which a video
object travels, and the associated concept of occlusion volume. I
present examples of object tunnels and occlusion volumes on surveillance data
that, upon further processing, may lead to automatic event detection or scene
understanding. Finally, I describe challenges in extending video analysis
algorithms to visual sensor networks, and I outline some possible approaches.
M. Ristivojević and J. Konrad, "Space-time image sequence
analysis: object tunnels and occlusion volumes," IEEE Trans. Image
Process., vol. 15, pp. 364-376, Feb. 2006, [PDF: 2,013KB].
We address the issue of image sequence analysis jointly in space and time.
While typical approaches to such an analysis consider two image frames at a
time, we propose to perform this analysis jointly over multiple frames. We
concentrate on spatio-temporal segmentation of image sequences and on analysis
of occlusion effects therein. The segmentation process is three-dimensional
(3-D); we search for a volume carved out by each moving object in the image
sequence domain, or "object tunnel", a new space-time concept. We pose the
problem in a variational framework by using only motion information (no intensity
edges). The resulting formulation can be viewed as volume competition, a 3-D
generalization of region competition. We parameterize the unknown surface to be
estimated, but rather than using an active-surface approach, we embed it into a
higher-dimensional function and apply the level-set methodology. We first
develop simple models for the detection of moving objects over static
background; no motion models are needed. Then, in order to improve segmentation
accuracy, we incorporate motion models for objects and background. We further
extend the method by including explicit models for occluded and newly-exposed
areas that lead to "occlusion volumes", another new space-time concept. Since
in this case multiple volumes are sought, we apply a multiphase variant of the
level-set method. We present various experimental results for synthetic and
natural image sequences.
J. Konrad and P. Agniel, "Subsampling models and anti-alias
filters for 3-D automultiscopic displays," IEEE Trans. Image
Process., vol. 15, pp. 128-140, Jan. 2006, [PDF: 921KB].
A new type of 3-D display recently introduced on the market holds great promise
for the future of 3-D visualization, communication and entertainment. This
so-called automultiscopic display can deliver multiple views without glasses,
thus allowing a limited "look-around" (correct motion-parallax). Central to
this technology is the process of multiplexing several views into a single
viewable image. This multiplexing is a complex process involving irregular
subsampling of the original views. If not preceded by lowpass filtering, it
results in aliasing that leads to texture as well as depth distortions. In
order to eliminate this aliasing, we propose to model the multiplexing process
with lattices, find their parameters and then design optimal anti-alias
filters. To this effect, we use multi-dimensional sampling theory and basic
optimization tools. We derive optimal anti-alias filters for a specific
automultiscopic monitor using three models: orthogonal lattice, non-orthogonal
lattice and union of shifted lattices. In the first case, the resulting
separable low-pass filter offers significant aliasing reduction that is further
improved by a hexagonal-passband lowpass filter for the non-orthogonal lattice
model. A more accurate model is obtained using a union of shifted lattices, but
due to the complex nature of repeated spectra, practical filters designed in
this case offer no additional improvement. We also describe a practical method
to design finite-precision, low-complexity filters that can be implemented
using modern graphics cards.
R. Stasiński and J. Konrad, "POCS reconstruction of
irregularly-sampled images based on oversampling and linear space-variant
filtering," Sampling Theory in Signal and Image Processing, vol. 5,
pp. 37-58, Jan. 2006, [PDF: 470KB].
Image reconstruction from irregularly-spaced samples is becoming a pivotal
element of advanced video processing and compression tasks. Typically,
irregular sample positions are due to the process of motion compensation, and
can result in areas void of data (divergent motion, occlusion areas). Since
sample positions do not obey constraints required by irregular-sampling
theorems, alternative, for example approximate, reconstruction methods are
needed. In this paper, we describe an image reconstruction method from
irregularly-spaced samples based on the theory of projection onto convex sets
(POCS). Similarly to other POCS-based image reconstruction methods, our approach
applies two projection operators: bandwidth limitation and sample
substitution. Unlike other methods, however, our algorithm is implemented on an
oversampled lattice. Although the method performs well, it can be optimized to
deal efficiently only with either densely- or sparsely-sampled image areas, but
not with both types of area simultaneously. In order to address this issue, we
propose to replace the usual linear space-invariant filtering with linear
space-variant filtering. We develop a filter adaptation strategy that selects
a suitable filter depending on the local density of irregularly-spaced input
samples. We further improve the method by adapting filter bandwidth to the
progress of image reconstruction. We experimentally demonstrate the efficacy of the
method on disparity compensation in the context of stereoscopic 3-D imaging.
N. Božinović and J. Konrad, "Motion analysis in 3D
DCT domain and its application to video coding," Signal Process., Image
Commun., vol. 20, pp. 510-528, July 2005, [PDF: 1,950KB], 2004-2005 EURASIP Image Communication Best Paper
Award.
Global, constant-velocity, translational motion in an image sequence induces a
characteristic energy footprint in the Fourier-transform (FT) domain; the spectrum
is limited to a plane with orientation defined by the direction of motion. By
detecting these spectral occupancy planes, methods have been proposed to
estimate such global motion. Since the discrete cosine transform (DCT) is a
ubiquitous tool of all video compression standards to date, we investigate in
this paper properties of motion in the DCT domain. We show that global,
constant-velocity, translational motion in an image sequence induces in the DCT
domain spectral occupancy planes, similarly to the FT domain. Unlike in the FT
case, however, these planes are subject to spectral folding. Based on this
analysis, we propose a motion estimation method in the DCT domain, and we show
that results comparable to standard block matching can be obtained. Moreover,
by realizing that significant energy in the DCT domain concentrates around a
folded plane, we propose a new approach to video compression. The approach is
based on 3D DCT applied to a group of frames, followed by motion-adaptive
scanning of DCT coefficients (akin to "zig-zag" scanning in MPEG coders),
their adaptive quantization, and final entropy coding. We discuss the design of
the complete 3D DCT coder and we carry out a performance comparison of the new
coder with ubiquitous hybrid coders.
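
The spectral occupancy plane in the FT domain, mentioned at the start of this abstract, follows from a short standard derivation. The symbols below (initial frame $I_0$, velocity $(v_x, v_y)$, frequencies $(f_x, f_y, f_t)$) are notation assumed for this note and are not taken from the paper:

```latex
I(x, y, t) = I_0(x - v_x t,\; y - v_y t)
\quad\Longrightarrow\quad
\hat{I}(f_x, f_y, f_t) = \hat{I}_0(f_x, f_y)\,\delta(f_t + v_x f_x + v_y f_y)
```

The spectrum can therefore be nonzero only on the plane $f_t + v_x f_x + v_y f_y = 0$, whose orientation is fixed by the velocity. The DCT-domain counterpart with spectral folding is the paper's contribution and is not derived here.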
C. Vázquez, E. Dubois, and J. Konrad, "Reconstruction of
irregularly-sampled images in spline spaces," IEEE Trans. Image
Process., vol. 14, pp. 713-725, June 2005, [PDF: 2,813KB].
This paper presents a novel approach to the reconstruction of images from
irregularly-spaced samples. This problem is often encountered in digital image
processing applications. Non-recursive video coding with motion compensation,
spatio-temporal interpolation of video sequences and generation of new views in
multi-camera systems are three possible applications. We propose a new
reconstruction algorithm based on a spline model for images. We use
regularization since this is an ill-posed inverse problem. We minimize a cost
function composed of two terms: one related to the approximation error and the
other related to the smoothness of the modeling function. All the processing is
carried out in the space of spline coefficients; this space is discrete
although the problem itself is of a continuous nature. The coefficients of
regularization and approximation filters are computed exactly by using the
explicit expressions of B-spline functions in the time domain. The
regularization is carried out locally while the computation of the
regularization factor accounts for the structure of the irregular sampling
grid. The linear system of equations obtained is solved iteratively. Our
results show a very good performance in motion-compensated interpolation
applications.
A.-R. Mansouri and J. Konrad, "Multiple motion segmentation
with level sets," IEEE Trans. Image Process., vol. 12, pp. 201-220,
Feb. 2003, [PDF: 4,753KB].
Segmentation of motion in an image sequence is one of the most challenging
problems in image processing, while at the same time one that finds numerous
applications. To date, a wealth of approaches to motion segmentation have been
proposed. Many of them suffer from the local nature of the models used. Global
models, such as those based on Markov random fields, perform, in general,
better. In this paper, we propose a new approach to motion segmentation that is
based on a global model. The novelty of the approach is twofold. First,
inspired by recent work of other researchers, we formulate the problem as that
of region competition, but we solve it using the level set
methodology. The key features of a level set representation, as compared to
active contours, often used in this context, are its ability to handle
variations in the topology of the segmentation and its numerical stability.
The second novelty of the paper is the formulation in which, unlike in many
other motion segmentation algorithms, we do not use intensity boundaries as an
accessory; the segmentation is purely based on motion. This permits accurate
estimation of motion boundaries of an object even when its intensity boundaries
are hardly visible. Since occasionally intensity boundaries may prove
beneficial, we extend the formulation to account for the coincidence of motion
and intensity boundaries. In addition, we generalize
the approach to multiple motions. We discuss possible
discretizations of the evolution (PDE) equations and we give details of an
initialization scheme so that the results could be duplicated. We show numerous
experimental results for various formulations on natural images with either
synthetic or natural motion.
R. Stasiński and J. Konrad, "Improved POCS reconstruction
of stereoscopic views," Signal Process., Image Commun., vol. 17, pp.
689-704, Oct. 2002, [PDF: 314KB].
This paper presents an application of the projection onto convex sets
(POCS) framework to the reconstruction of intermediate stereoscopic views. Such
views are needed in 3-D viewing in order to simulate the so-called
"look-around" as well as to adjust the perceived depth (interocular
adjustment). The basic problem in the above reconstruction is that of the
recovery of a regularly-sampled image from its irregularly-spaced samples due
to disparity compensation. This problem also arises in other image processing
and coding applications, such as multiple-frame motion compensation or video
frame rate conversion. In our POCS-based approach to view reconstruction, two
projection operators are used: bandwidth limitation and sample substitution.
The bandwidth limitation can be implemented in the original domain by means of
lowpass FIR filtering but we opt for a frequency-domain implementation by means
of windowing. The results reported here improve our original POCS-based
reconstruction method by locally adapting the algorithm to the density of image
samples. We also extend the method to color images through an implementation
in the luminance-chrominance space.
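
The two projection operators named in this abstract can be sketched on a 1-D signal as follows. This is an illustrative toy under stated assumptions, not the paper's implementation: the signal is 1-D, the known samples sit on grid positions, the bandwidth limitation is an ideal cut-off applied via a naive DFT, and all function names are hypothetical.

```python
import cmath
import math


def dft(x):
    """Naive O(n^2) discrete Fourier transform (adequate for a toy signal)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)) / n
            for t in range(n)]


def band_limit(x, cutoff):
    """Projection 1 (bandwidth limitation): zero every DFT bin whose
    frequency index exceeds `cutoff` in magnitude."""
    n = len(x)
    spectrum = dft(x)
    for k in range(n):
        if min(k, n - k) > cutoff:
            spectrum[k] = 0
    return [v.real for v in idft(spectrum)]


def pocs_reconstruct(known, n, cutoff, iterations=50):
    """Alternate the two projections; `known` maps sample index -> value."""
    x = [0.0] * n
    for _ in range(iterations):
        x = band_limit(x, cutoff)
        for i, v in known.items():   # Projection 2 (sample substitution)
            x[i] = v
    return x


# Toy test signal: band-limited, with 6 of 16 samples missing.
n = 16
signal = [0.5 + math.cos(2 * math.pi * t / n) for t in range(n)]
missing = {1, 4, 6, 9, 12, 14}
known = {t: signal[t] for t in range(n) if t not in missing}
estimate = pocs_reconstruct(known, n, cutoff=1)
```

Because both constraint sets are convex and the toy signal lies in their intersection, the alternation converges to it; the local adaptation to sample density described in the abstract is not part of this sketch.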
K. Belloulata and J. Konrad, "Region-by-region fractal image
compression," IEEE Trans. Image Process., vol. 11, pp. 351-362, Apr.
2002, [PDF: 300KB].
Region-based functionality offered by the MPEG-4 video compression standard is
also appealing for still images, for example to permit object-based queries of
a still-image database. A popular method for still-image compression is fractal
coding. However, traditional fractal image coding uses rectangular range and
domain blocks. Although new schemes have been proposed that merge small blocks
into irregular shapes, the merging process does not, in general, produce
semantically-meaningful regions. We propose a new approach to fractal image
coding that permits region-based functionalities; images are coded region by
region according to a previously-computed segmentation map. We use rectangular
range and domain blocks, but divide boundary blocks into segments belonging to
different regions. Since this prevents the use of a standard dissimilarity
measure, we propose a new measure adapted to segment shape. We propose two
approaches: one in the spatial and one in the transform domain. While
providing additional functionality, the proposed methods perform similarly to
other tested methods in terms of PSNR but often result in images that are
subjectively better. Due to the limited domain-block codebook size, the new
methods are faster than other fractal coding methods tested. The results are
very encouraging and show the potential of this approach for various internet
and still-image database applications.
J. Konrad, "Visual communications of tomorrow: natural,
efficient and flexible," IEEE Comm. Mag., vol. 39, pp. 126-133, Jan.
2001, [PDF: 242KB].
In the last decade, we have witnessed a phenomenal growth of communication and
information technologies. These technologies have greatly simplified and even
enriched our daily lives; cellular telephony and the Internet are probably the
most striking examples. A particularly promising, and at the same time
challenging, aspect of both technologies is the transmission and use of visual
information. In this paper, I overview the state of visual communication at the
end of the 20th century, discuss today's challenges and outline some future
directions.
A.-R. Mansouri and J. Konrad, "Bayesian winner-take-all
reconstruction of intermediate views from stereoscopic images," IEEE
Trans. Image Process., vol. 9, pp. 1710-1722, Oct. 2000, [PDF: 1,210KB].
This paper presents a new algorithm for the reconstruction of intermediate
views from a pair of still stereoscopic images. The algorithm is designed to
address the issue of blur caused by linear filtering often employed in such
reconstruction. The proposed algorithm is block-based and, to reconstruct the
intermediate views, employs non-linear disparity-compensated filtering by means
of a winner-take-all strategy. The reconstructed image is modeled as a tiling
by fixed-size blocks coming from various positions (disparity compensation) of
either the left or the right image, while the tiling map itself is
modeled by a binary decision field. In addition to that, an observation model
relating the left and right images via a disparity field, and a disparity
field model are used. All models are probabilistic and are combined into a
maximum a posteriori probability criterion. The intermediate intensities,
disparities and the binary decision field are estimated jointly using the
expectation-maximization algorithm. The new approach is compared experimentally
on complex natural images with a reference block-based algorithm employing
linear filtering. Although the improvements are localized and often subtle,
they demonstrate that a high-quality intermediate view reconstruction for
complex scenes is feasible.
J. Konrad, B. Lacotte, and E. Dubois, "Cancellation of image
crosstalk in time-sequential displays of stereoscopic video," IEEE Trans. Image Process., vol. 9, pp. 897-908, May 2000, [PDF: 242KB].
Stereoscopic visualization systems based on liquid crystal shutter (LCS)
eyewear and cathode-ray tube (CRT) displays provide today the best overall
quality of 3-D images and therefore have a dominant position in commercial as
well as professional markets. Due to the CRT and LCS characteristics, however,
such systems suffer from perceptual crosstalk ("shadows") at object
boundaries that can reduce, and at times inhibit, the ability to perceive
depth. In this paper, we propose a method to reduce such crosstalk. We present
a simple model for intensity leak, we assess model parameters for a
time-sequential LCS/CRT system and we propose a computationally-efficient
algorithm to eliminate the crosstalk. Since the full crosstalk elimination
implies an unacceptable image degradation (reduction of contrast), we study the
trade-off between crosstalk elimination and image contrast. We describe
experiments on synthetic and natural stereoscopic images and we discuss
informal subjective viewing of processed images. Overall, the viewer response
has been very positive; 3-D perception of many objects became either much
easier or even effortless. Since the proposed algorithm can be easily
implemented in real time (only linear scaling and table look-up are needed), we
believe that it can be successfully used today in various stereoscopic
applications suffering from image crosstalk. This is particularly true in view
of the continuously increasing CPU and graphics power of modern PCs.
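
The trade-off between crosstalk elimination and contrast can be illustrated with a deliberately simplified model. The sketch assumes a symmetric, linear intensity leak with a single coefficient `leak` and intensities normalized to [0, 1]; this model, the function names, and the value 0.1 are assumptions made here, and the paper's own leak model and measured parameters may differ.

```python
def perceived(shown_left, shown_right, leak):
    """Assumed leak model: each eye sees its own image plus a fraction
    `leak` of the image intended for the other eye."""
    return shown_left + leak * shown_right, shown_right + leak * shown_left


def precompensate(left, right, leak):
    """Solve L' + leak*R' = left and R' + leak*L' = right for one pixel pair."""
    det = 1.0 - leak * leak
    return (left - leak * right) / det, (right - leak * left) / det


def raise_black_level(value, leak):
    """Linearly map [0, 1] into [leak, 1] so that pre-compensated values
    never go negative; this is where contrast is given up."""
    return leak + (1.0 - leak) * value


# Worst case for crosstalk: a black pixel in one view, white in the other.
leak = 0.1
target_left = raise_black_level(0.0, leak)
target_right = raise_black_level(1.0, leak)
shown = precompensate(target_left, target_right, leak)
seen = perceived(shown[0], shown[1], leak)
```

Under these assumptions the viewer sees exactly the contrast-reduced targets, and the displayed values stay within [0, 1]; per pixel only a linear scaling and a small linear solve are needed, which could be tabulated.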
F. Dufaux and J. Konrad, "Robust, efficient and fast global
motion estimation for video coding," IEEE Trans. Image Process., vol.
9, pp. 497-501, Mar. 2000, [PDF: 212KB].
In this paper, we propose an efficient, robust, and fast method for the
estimation of global motion from image sequences. The method is generic in that
it can accommodate various global motion models, from a simple translation to
an 8-parameter perspective model. The algorithm is hierarchical and consists of
three stages. In the first stage, a low-pass image pyramid is built. Then, an
initial translation is estimated with full-pixel precision at the top of the
pyramid using a modified n-step search matching. In the third stage, a gradient
descent is executed at each level of the pyramid starting from the initial
translation at the coarsest level. Due to the coarse initial estimation and the
hierarchical implementation, the method is very fast. To increase robustness to
outliers, we replace the usual formulation based on a quadratic error criterion
with a truncated quadratic function. We have applied the algorithm to various
test sequences within an MPEG-4 coding system. From the experimental results we
conclude that global motion estimation provides significant performance gains
for video material with camera zoom and/or pan. The gains result from a reduced
prediction error and a more compact representation of motion. We also conclude
that the robust error criterion can introduce additional performance gains
without increasing computational complexity.
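
The effect of replacing a quadratic error criterion with a truncated one can be seen on a toy 1-D problem. The sketch uses one common form of truncation, which saturates the cost at the squared threshold; the abstract does not give the exact definition used in the paper, so this form, the function names, the data, and the grid search are all assumptions for illustration.

```python
def quadratic(residual):
    return residual * residual


def truncated_quadratic(residual, threshold):
    """Quadratic for small residuals, constant beyond the threshold, so a
    single outlier cannot dominate the total cost."""
    return min(residual * residual, threshold * threshold)


def best_offset(samples, candidates, cost):
    """Exhaustive search for the offset minimizing the summed cost."""
    return min(candidates, key=lambda c: sum(cost(s - c) for s in samples))


# Hypothetical 1-D displacements: four consistent values and one outlier.
samples = [1.0, 1.1, 0.9, 1.0, 10.0]
candidates = [i / 10 for i in range(101)]

plain = best_offset(samples, candidates, quadratic)
robust = best_offset(samples, candidates,
                     lambda r: truncated_quadratic(r, 1.0))
```

With the plain quadratic the estimate is pulled to the mean of all samples (2.8 here), whereas the truncated cost settles on the consistent group near 1.0. Evaluating the truncated cost adds only a comparison per residual.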
R. Stasiński and J. Konrad, "A new class of fast
shape-adaptive orthogonal transforms and their application to region-based
image compression," IEEE Trans. Circuits Syst. Video Technol., vol.
9, pp. 16-34, Feb. 1999, [PDF: 344KB].
Region-based approaches to image and video compression have been very actively
explored in the last few years. It is widely expected that they will result in
rate/quality gains and expanded functionalities. In such approaches one of the
essential problems is the representation of luminance and color in
arbitrarily-shaped regions. For rectangular blocks extracted from natural
images the discrete cosine transform (DCT) has been found to perform close to
the eigentransform. Although for arbitrarily-shaped regions
orthogonalization-based procedures have been shown to perform very well, their
computational complexity and memory requirements are prohibitive for today's
technology. Therefore, other approaches are currently being investigated and
particular attention is paid to low implementation complexity. In this paper,
we propose a new class of orthogonal transforms that self-adapt to arbitrary
shapes. The new algorithms are derived from flowgraphs of standard fast
transform algorithms by a suitable modification of certain butterfly
operators. First, we show how to derive a shape-adaptive transform from the
discrete Walsh-Hadamard transform (DWHT) flowgraph. Then, we discuss
modifications needed to arrive at a DCT-based shape-adaptive transform. We give
implementation details of this transform and compare its computational
complexity with several well-known approaches. We also evaluate the energy
compaction performance of the new transform for both synthetic and natural
data. We conclude that the proposed DCT-based shape-adaptive transform gives a
very beneficial compaction/complexity ratio compared to other well-known
approaches. The complexity of the new method does not exceed the complexity of
two non-adaptive DCTs on a circumscribing rectangle, and therefore, unlike
other tested methods with comparable energy compaction, it is suitable for
large regions. This property should prove very valuable in the future when
true region-based image/video compression methods are developed.
C. Stiller and J. Konrad, "Estimating motion in image
sequences: A tutorial on modeling and computation of 2D motion," IEEE
Signal Process. Mag., vol. 16, pp. 70-91, July 1999, [PDF: 929KB], 2001
IEEE Signal Processing Magazine Award.
This paper addresses the estimation of 2D motion (optical flow) from sequences
of images and is intended for readers involved in video processing and
compression as well as computer vision. Motion estimation is one of the key
techniques helping solve various problems encountered when dealing with image
sequences; redundancy elimination in digital video or tracking of moving
objects are but two interesting tasks. Due to a strong correlation of image
intensities in the direction of motion, operations such as prediction,
interpolation or filtering are most efficient when applied along motion
trajectories. To compute these trajectories, underlying models need to be
specified, an estimation criterion must be selected and a search strategy must be
implemented. In the paper, we discuss various motion representations and the
associated regions of support, as well as models that relate motion parameters
to image data. Then, we concentrate on various estimation criteria: from simple
ones comprising the displaced frame difference only to complex Bayesian
criteria involving multiple terms. Finally, we address search strategies. We
describe matching- and gradient-based schemes, deterministic and stochastic
relaxation methods including simulated annealing as well as other deterministic
approaches such as "highest confidence first" and mean field techniques. We
sketch multiresolution and multiscale strategies and point out their benefits.
No experimental results are included in the paper; however, a substantial body
of literature is cited; interested readers are referred to earlier work of the
authors and of other researchers.
M. Ben Slima, J. Konrad, and A. Barwicz, "Improvement of
stereo disparity estimation through balanced filtering: the sliding-block
approach," IEEE Trans. Circuits Syst. Video Technol., vol. 7, no. 6,
pp. 913-920, 1997, [PDF: 939KB].
In a typical disparity (or motion) estimation algorithm developed for
inter-image prediction, an interpolation of intensities is applied to one of
the two images used. Therefore, non-filtered intensities of the image being
predicted are compared with lowpass-filtered intensities of the other image of
the stereo pair. Consequently, noise and detail suppression in the two images
are unequal. In this paper we propose to apply the same (balanced)
filtering to both images. In addition to image smoothing that helps avoid
unreliable intensity matches, the lowpass filter is used to carry out intensity
interpolation at the same time; the computation of sub-pixel attributes is
consistent with lowpass filtering of both images, unlike arbitrary linear or
cubic interpolation applied to one image only. The proposed approach lends
itself naturally to a multiresolution implementation. We apply the new approach
to stereo disparity estimation based on sliding blocks. Using synthetic and
natural data we experimentally compare the new approach with the traditional
sliding-block method. For standard stereoscopic images we demonstrate a reduction of up to
2.4 dB in disparity-compensated prediction error over the traditional
sliding-block method.
J. Konrad, J. Radecki, and E. Dubois, "The application of
two-dimensional finite-precision IIR filters to enhanced NTSC coding,"
IEEE Trans. Circuits Syst. Video Technol., vol. 6, pp. 355-374,
Aug. 1996, [PDF: 857KB].
The goal of this paper is to study the application of two-dimensional (2-D)
finite-precision infinite impulse response (IIR) filters to enhanced NTSC
coding. It is well-known that suitable two- or three-dimensional digital
filtering greatly improves the quality of NTSC pictures by suppressing the
interference between the luminance Y and the chrominances I, Q. Thus far,
2-D and 3-D finite impulse response (FIR) filters have been used to reduce or
eliminate these cross effects. To achieve good performance, however, they
require many coefficients. Since, in general, IIR filters need fewer
coefficients to approximate a given magnitude response, we investigate here the
possibility of applying 2-D IIR filters to NTSC encoding/decoding. We also
study the feasibility of using digital filters for NTSC channel filtering; this
would permit a digital-only encoder. To design suitable filters, we use a
recently proposed method based on multiple constraint optimization and
simulated annealing. We propose a new implementation structure for the IIR
filters that differs from the zero-phase FIR structure. We simulate the full
NTSC coding chain, and compare the resulting images for both filter types.
M. Chahine and J. Konrad, "Estimation and compensation of
accelerated motion for temporal sequence interpolation," Signal Process.,
Image Commun., vol. 7, pp. 503-527, Nov. 1995, [PDF: 898KB].
This paper makes two contributions to the area of motion-compensated processing
of image sequences. The first contribution is the development of a framework for
the modeling and estimation of dense 2-D motion trajectories with acceleration.
To this end, Gibbs-Markov models are proposed and linked together by the maximum
a posteriori probability (MAP) criterion, and the resulting objective
function is minimized using multiresolution deterministic relaxation. Accuracy
of the method is demonstrated by measuring the mean-squared error of estimated
motion parameters for images with synthetic motion. The second contribution is the
demonstration of a significant gain resulting from the use of trajectories with
acceleration in motion-compensated temporal interpolation of
videoconferencing/videophone images. An even higher gain is demonstrated when
the accelerated motion trajectory model is augmented with occlusion and motion
discontinuity models. The very good performance of the method suggests a
potential application of the proposed framework in the next generation of video
coding algorithms.
J. Radecki, J. Konrad, and E. Dubois, "Design of
multidimensional finite-wordlength FIR and IIR filters by simulated
annealing," IEEE Trans. Circuits Syst. II, Analog Digit. Signal
Process., vol. 42, pp. 424-431, June 1995, [PDF: 413KB].
This paper describes a new approach to the design of multidimensional (M-D)
finite-wordlength digital filters with specifications in the frequency and
spatial domains. The approach is based on stochastic optimization and extends
previous work on finite impulse response (FIR) filters in two ways: by
inclusion of spatial constraints and by application to the case of infinite
impulse response (IIR) filters. The formulation proposed is based on a
multiple-term objective function that, in addition to magnitude constraints,
also includes step response, group delay and stability constraints. Our
attention to these characteristics stems from the application of such filters
to video processing that we are actively pursuing. Since filter coefficients
are of finite precision and since the objective function is multivariable,
non-differentiable and likely to have multiple minima, we use simulated
annealing for optimization. We show numerous examples of the design of
practical filters such as channel and luminance/chrominance separation filters
used in the NTSC system. We demonstrate the impact of coefficient precision as
well as of group delay and step response constraints on filter parameters.
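The idea of searching over finite-precision coefficients with simulated annealing can be sketched on a toy problem. The example below is a deliberately simplified 1-D FIR case with a magnitude-only objective; the paper treats multidimensional FIR and IIR filters with additional step response, group delay and stability terms, none of which are modeled here. The function names, the cooling schedule and all parameter values are hypothetical choices made for illustration.

```python
import math
import random


def quantize(value, bits):
    """Round value to the nearest multiple of 2**-bits (assumed fixed-point grid)."""
    step = 2.0 ** -bits
    return round(value / step) * step


def magnitude_error(coeffs, desired, freqs):
    """Sum of squared magnitude-response errors of a 1-D FIR filter."""
    err = 0.0
    for w, d in zip(freqs, desired):
        re = sum(c * math.cos(w * n) for n, c in enumerate(coeffs))
        im = -sum(c * math.sin(w * n) for n, c in enumerate(coeffs))
        err += (math.hypot(re, im) - d) ** 2
    return err


def anneal(initial, desired, freqs, bits=4, t0=1.0, cooling=0.95, steps=500, seed=0):
    """Simulated annealing over quantized coefficients; returns the best state visited."""
    rng = random.Random(seed)
    step = 2.0 ** -bits
    current = [quantize(c, bits) for c in initial]
    cost = magnitude_error(current, desired, freqs)
    best, best_cost = list(current), cost
    t = t0
    for _ in range(steps):
        # Perturb one coefficient by a single quantization step.
        cand = list(current)
        i = rng.randrange(len(cand))
        cand[i] += rng.choice((-step, step))
        cand_cost = magnitude_error(cand, desired, freqs)
        # Always accept improvements; accept worse states with Metropolis probability.
        if cand_cost <= cost or rng.random() < math.exp((cost - cand_cost) / t):
            current, cost = cand, cand_cost
            if cost < best_cost:
                best, best_cost = list(current), cost
        t *= cooling  # geometric cooling schedule (an assumption)
    return best, best_cost


# Toy lowpass target sampled at three frequencies.
freqs = [0.0, math.pi / 2, math.pi]
desired = [1.0, 0.5, 0.0]
coeffs, cost = anneal([0.0, 0.0, 0.0], desired, freqs)
print(coeffs, cost)
```

Because every move changes a coefficient by exactly one quantization step, the search never leaves the finite-precision grid, which is the property that makes gradient-based optimization unsuitable for this kind of objective.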
J. Konrad and E. Dubois, "Bayesian estimation of motion vector
fields," IEEE Trans. Pattern Anal. Machine Intell., vol. 14, pp.
910-927, Sept. 1992, [PDF: 1,403KB].
This paper presents a new approach to the estimation of two-dimensional motion
vector fields from time-varying images. The approach is stochastic, both in its
formulation and in the solution method. The formulation involves the
specification of a deterministic structural model, along with stochastic
observation and motion field models. Two motion models are proposed: a globally
smooth model based on vector Markov random fields and a piecewise smooth model
derived from coupled vector-binary Markov random fields. Two estimation
criteria are studied. In the Maximum A Posteriori Probability (MAP)
estimation the a posteriori probability of motion given data is
maximized, while in the Minimum Expected Cost (MEC) estimation the expectation
of a certain cost function is minimized. The MAP estimation is performed via
simulated annealing, while the MEC algorithm performs iteration-wise
averaging. Both algorithms generate sample fields by means of stochastic
relaxation implemented via the Gibbs sampler. Two versions are
developed, one for a discrete state space, the other for a continuous state
space. The MAP estimation is incorporated into a hierarchical environment to
deal efficiently with large displacements. Numerous experimental results of
application of these algorithms to natural and computer-generated images with
natural and synthetic motion are shown.
J. Konrad and E. Dubois, "Comparison of stochastic and
deterministic solution methods in Bayesian estimation of 2D motion,"
Image Vis. Comput., vol. 9, pp. 215-228, Aug. 1991, [PDF: 1,451KB].
This paper discusses the estimation of two-dimensional (2-D) motion from
spatio-temporally sampled image sequences. It concentrates on the optimization
aspect of the problem formulated through a Bayesian framework based on Markov
random field (MRF) models. First, the Maximum A Posteriori Probability
(MAP) formulation for motion estimation over discrete and continuous state
spaces is reviewed along with the solution method using simulated
annealing (SA). Then, instantaneous "freezing" is applied to the stochastic
algorithms resulting in well-known deterministic methods. The stochastic
algorithms are compared with their deterministic approximations over image
sequences with natural data and synthetic as well as natural motion.
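The relation between a stochastic update and its "frozen" deterministic counterpart can be sketched for a single site with a discrete set of candidate states. At temperature T > 0 the new state is drawn from the Gibbs distribution over local energies; at T = 0 the draw collapses to picking the lowest-energy state, a greedy update often called iterated conditional modes in the MRF literature. The abstract does not name the deterministic methods it obtains, so this correspondence and the function names below are assumptions for illustration.

```python
import math
import random


def gibbs_probabilities(energies, temperature):
    """Gibbs distribution over candidate states for temperature > 0."""
    lowest = min(energies)  # subtract the minimum for numerical stability
    weights = [math.exp(-(e - lowest) / temperature) for e in energies]
    total = sum(weights)
    return [w / total for w in weights]


def update_site(energies, temperature, rng):
    """Return the index of the new state for one site.

    temperature > 0: stochastic draw, as in the Gibbs sampler.
    temperature == 0: "frozen" update, i.e. the lowest-energy state.
    """
    if temperature <= 0.0:
        return min(range(len(energies)), key=energies.__getitem__)
    probs = gibbs_probabilities(energies, temperature)
    return rng.choices(range(len(energies)), weights=probs, k=1)[0]


# Local energies of three hypothetical candidate motion vectors at one site.
energies = [3.0, 1.0, 2.0]
rng = random.Random(0)
print(gibbs_probabilities(energies, 0.5))  # mass concentrates on the lowest energy
print(update_site(energies, 0.0, rng))     # frozen update picks index 1
```

The frozen update is cheaper and deterministic, but it can no longer escape local minima of the energy, which is the trade-off the comparison in the abstract examines.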
|