L. Oddsson, C. Wall III, P. Meyer, and J. Konrad, "A virtual
environment with simulated gravity for balance rehabilitation of bedridden
patients and frail individuals," in XV-th Congress of the International
Society of Electrophysiology and Kinesiology, p. 55, June 2004.
Rehabilitation of physical function and balance in frail individuals and
bedridden patients is a challenge for the therapist. Early ambulation following
hip fracture has been shown to be directly predictive of extended survival,
indicating the importance of effective interventions that improve physical
function and balance and thereby minimize bed time. Such interventions should
preferably involve whole body exercises that challenge coordination and motor
function. We have built a 90 deg tilted room environment where a subject
"stands" in a supine position while strapped to a frictionless device through a
backpack frame and harness that allows free motion in the frontal plane,
similar to upright standing. The device is attached to a weight stack through a
series of pulleys, which provides a variable gravity-like force that the
subject must balance against to remain "upright" in the tilted environment. The
room contains common physical objects that are visually "polarized" (well
defined "up" and "down", e.g. a chair) to convey to the subject the perception
of being upright in a 1-g environment. Healthy subjects, who trained their
balance in this supine position on 10 occasions over a two-week period, showed
dramatic improvements in upright balance performance including a 50% increase
in time to balance on a half cylinder on one leg and a 30% decrease in COP sway
velocity while standing on one leg. We expect frail individuals and bedridden
patients to be able to safely perform functional balance training in the tilted
environment that would transfer to improved function and mobility in an upright
position when negotiating gravity. We plan a portable version of this system
that would incorporate recently available autostereoscopic 3-D displays, a
technique that allows 3-D immersion without the use of glasses, to provide
"windows" of a virtual environment around the subject instead of the currently
used physical room. A 5-camera digital image acquisition system, called the
Pentacam, is being developed to capture 3-D images that can be tailored to the
preferences of different individuals. For example, images could be acquired
from sites that are familiar to the subject including their own or a relative's
indoor or outdoor home environment.
M. Ristivojević and J. Konrad, "Joint space-time image
sequence segmentation: object tunnels and occlusion volumes," in Proc. IEEE Int. Conf. Acoustics Speech Signal Processing, vol. III, pp.
9-12, May 2004, [PDF: 540KB].
Spatial segmentation of image sequences is usually computed based on motion
between two frames. Some recent approaches extend this to joint segmentation in
space-time; the resulting 3-D segmentation (in x-y-t space) can be interpreted
as a volume "carved out" by a moving object in the image sequence domain, or
the so-called "object tunnel". In this paper, we extend this concept to
explicit modeling of occlusion events in the x-y-t space. In addition to the
modeling of object evolution, we also model occluded and newly-exposed areas in
the background and in the object by means of "occlusion volume", a new
space-time concept. We propose a variational formulation of the problem that we
solve using the multiphase level set method. We show experimental results for
synthetic and natural image sequences.
N. Božinović and J. Konrad, "Mesh-based motion models
for wavelet video coding," in Proc. IEEE Int. Conf. Acoustics Speech
Signal Processing, vol. III, pp. 141-144, May 2004, [PDF: 101KB].
Discrete wavelet transforms implemented using lifting along motion trajectories
are effective and efficient temporal decomposition tools that facilitate video
compression competitive with the current standards. As recently shown,
however, in order that a lifting-based motion-compensated wavelet transform be
equivalent to its transversal (standard) implementation, motion transformation
must be invertible and motion composition between frames must be
well-defined. In this paper, we discuss various mesh-based motion models that
satisfy requirements of invertibility and composition, and thus are suitable
for use in motion-compensated lifting-based wavelet transforms. We propose a
new mesh configuration that preserves regularity of the mesh structure but
provides better motion compensation compared to previously-reported mesh
topologies, particularly in the proximity of image boundaries. Our results
show that an improvement in motion compensation and overall compression
performance is possible with only a fractional increase in motion overhead
bit-rate.
T. André, M. Cagnazzo, M. Antonini, M. Barlaud, N. Božinović,
and J. Konrad, "(N,0) motion-compensated lifting-based wavelet
transform," in Proc. IEEE Int. Conf. Acoustics Speech Signal
Processing, vol. III, pp. 121-124, May 2004, [PDF: 64KB].
Motion compensation has been widely used in both DCT- and wavelet-based video
coders for years. The recent success of temporal wavelet transform based on
motion-compensated lifting suggests that a high-performance, scalable wavelet
video coder may soon outperform the best DCT-based coders. As recently shown,
however, motion-compensated lifting does not exactly implement its
transversal equivalent unless certain conditions on motion are satisfied. In
this paper, we review those conditions, and we discuss their importance. We
derive a new class of temporal transforms, the so-called 1-N transversal or
(N,0) lifting transforms, that are particularly interesting if those conditions
on motion are not satisfied. We compare experimentally the 1-3 and 5-3
motion-compensated wavelet transforms for the ubiquitous block-motion model
used in all video compression standards. For this model, the 1-3 transform
outperforms the 5-3 transform due to the need to transmit additional motion
information in the latter case. This interesting result, however, does not
extend to motion models satisfying the transversal/lifting equivalence
conditions.
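The structural difference between the two transforms compared above can be sketched as follows. This is a minimal illustration, assuming zero motion (no motion compensation at all), a single decomposition level, and periodic boundary extension; the function names are hypothetical. The (2,0) case keeps the two-tap predict step of the 5/3 lifting scheme but drops the update step, so the low band is simply the even-indexed frames.

```python
import numpy as np

def lift_forward(x, update_weight):
    """One level of temporal lifting on an even-length 1-D signal (no motion).

    update_weight = 0.25 gives the 5/3 transform; 0.0 skips the update step,
    which is the (2,0) lifting / 1-3 transversal case.
    """
    even = np.asarray(x[0::2], dtype=float)
    odd = np.asarray(x[1::2], dtype=float)
    # Predict step: odd sample minus the mean of its two even neighbours
    # (periodic extension at the boundary, via np.roll).
    high = odd - 0.5 * (even + np.roll(even, -1))
    # Update step: add a fraction of the two neighbouring high-band samples.
    low = even + update_weight * (np.roll(high, 1) + high)
    return low, high

def lift_inverse(low, high, update_weight):
    """Undo the update step first, then the predict step."""
    even = low - update_weight * (np.roll(high, 1) + high)
    odd = high + 0.5 * (even + np.roll(even, -1))
    x = np.empty(even.size + odd.size)
    x[0::2], x[1::2] = even, odd
    return x

# Samples of one pixel followed along time (arbitrary test values).
signal = np.array([3.0, 5.0, 4.0, 8.0, 7.0, 6.0, 2.0, 1.0])
low20, high20 = lift_forward(signal, update_weight=0.0)   # (2,0)
low53, high53 = lift_forward(signal, update_weight=0.25)  # 5/3
```

Both variants share the same high band; only the low band differs. With motion compensation, the update step is the one that needs motion in the reverse direction, which is why omitting it changes the motion overhead.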
J. Konrad and P. Agniel, "Non-orthogonal sub-sampling and
anti-alias filtering for multiscopic 3-D displays," in Proc. SPIE
Stereoscopic Displays and Virtual Reality Systems, vol. 5291, pp.
105-116, Jan. 2004, [PDF: 1,168KB].
Multiview passive 3-D displays, such as those based on lenticular or
parallax-barrier technologies, require multiplexing of views into a single
same-size RGB image. Thus, multiplexing of N views necessitates N:1
sub-sampling of each view and must be preceded by suitable lowpass filtering to
prevent, or at least reduce, aliasing. Without such filtering, objectionable
"jagged" edges, distorted textures, or Moiré patterns are perceived although,
admittedly, these effects are not as disturbing as in the case of single-view
sub-sampling without multiplexing with other views. In this paper, unlike in
our previous work, we consider anti-alias filtering derived from a
non-orthogonal lattice. First, we approximate pixel layout for each view
(sampling pattern) by a two-dimensional lattice; we find parameters of the
lattice by minimizing a mismatch error between lattice and single-view
points. Then, based on lattice parameters, we find frequency-domain
specifications of the anti-alias filter. The filter has a hexagonal passband and
thus is non-separable. Although previously we designed such filters for
floating-point implementations, here we opt for the more practical fixed-point
arithmetic; the resulting filters can be easily implemented on ubiquitous
fixed-point DSP chipsets. The fixed-point filters slightly depart from the
desired magnitude specifications, but when applied to actual multiview images
they produce results almost indistinguishable from those obtained by
floating-point counterparts.
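The effect of moving from floating-point to fixed-point coefficients can be sketched as follows. This is only an illustration under stated assumptions: the 5x5 sheared-Gaussian kernel is an arbitrary non-separable stand-in (not the hexagonal-passband filter of the paper), and the 15 fractional bits and the function name are hypothetical choices.

```python
import numpy as np

def quantize_coefficients(coeffs, frac_bits=15):
    """Round filter coefficients to signed fixed-point integers (Q-format)."""
    scale = 1 << frac_bits
    q = np.round(coeffs * scale).astype(np.int64)
    # Saturate to the representable range of a signed (frac_bits + 1)-bit word.
    return np.clip(q, -scale, scale - 1)

# Arbitrary non-separable lowpass kernel used as a stand-in.
idx = np.arange(-2, 3)
i, j = np.meshgrid(idx, idx, indexing="ij")
h_float = np.exp(-(i**2 + i * j + j**2) / 2.0)
h_float /= h_float.sum()

FRAC_BITS = 15
h_int = quantize_coefficients(h_float, FRAC_BITS)
h_fixed = h_int / float(1 << FRAC_BITS)

# Compare magnitude responses on a 64x64 frequency grid.
mag_float = np.abs(np.fft.fft2(h_float, s=(64, 64)))
mag_fixed = np.abs(np.fft.fft2(h_fixed, s=(64, 64)))
max_deviation = np.max(np.abs(mag_float - mag_fixed))
```

Since each coefficient moves by at most half a least-significant bit, the magnitude response cannot deviate by more than the number of taps times 2^-(FRAC_BITS + 1), which is a small departure from the floating-point specification.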
Y. Shi, J. Konrad, and W. Karl, "Multiple motion and occlusion
segmentation with a multiphase level set method," in Proc. SPIE Visual
Communications and Image Process., vol. 5308, pp. 189-198, Jan. 2004, [PDF: 2,059KB].
In this paper, we propose a new variational formulation for simultaneous
multiple motion segmentation and occlusion detection in an image sequence. For
the representation of segmented regions, we use the multiphase level set method
proposed by Vese and Chan. This method allows an efficient representation of up
to 2^L regions with L level-set functions. Moreover, by construction, it
enforces a domain partition with no gaps or overlaps. This is unlike previous
variational approaches to multiple motion segmentation, where additional
constraints were needed. The variational framework we propose can incorporate
an arbitrary number of motion transformations as well as occlusion areas. In
order to minimize the resulting energy, we developed a two-step algorithm. In
the first step, we use a feature-based method to estimate the motions present
in the image sequence. In the second step, based on the extracted motion
information, we iteratively evolve all level set functions in the gradient
descent direction to find the final segmentation. We have tested the above
algorithm on both synthetic- and natural-motion data with very promising
results. We show here segmentation results for two real video sequences.
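The region representation mentioned above (up to 2^L regions from L level-set functions) can be sketched as follows. This is a minimal illustration of the sign-pattern labeling only, not of the energy minimization; the function name and the two circular level-set functions are hypothetical.

```python
import numpy as np

def region_labels(phis):
    """Map the sign pattern of L level-set functions to a label in [0, 2**L).

    phis has shape (L, H, W). Bit k of the label is set where phis[k] > 0,
    so every pixel receives exactly one label: no gaps, no overlaps.
    """
    labels = np.zeros(phis.shape[1:], dtype=int)
    for k, phi in enumerate(phis):
        labels |= (phi > 0).astype(int) << k
    return labels

# Two overlapping discs as level-set functions (positive inside each disc).
yy, xx = np.mgrid[0:32, 0:32]
phi1 = 8.0 - np.hypot(yy - 16, xx - 12)
phi2 = 8.0 - np.hypot(yy - 16, xx - 20)
labels = region_labels(np.stack([phi1, phi2]))
```

With L = 2 the four labels correspond to: outside both discs (0), inside the first only (1), inside the second only (2), and inside both (3).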
M. Ristivojević and J. Konrad, "Joint space-time
motion-based video segmentation and occlusion detection using multi-phase
level sets," in Proc. SPIE Visual Communications and Image Process.,
vol. 5308, pp. 156-167, Jan. 2004, [PDF: 778KB].
Spatial video segmentation is usually performed based on motion between two
frames. Some recent approaches extend this to joint segmentation in space-time;
the resulting 3-D segmentation can be interpreted as a volume "carved out" by
a moving object in the image sequence domain, or the so-called "object
tunnel". In this paper, we extend this concept to explicit modeling of
occlusion events in space-time. In addition to the modeling of object
evolution, we also explicitly model occluded and newly-exposed areas in the
background by means of "occlusion volume", a new space-time concept. A voxel
belongs to the occlusion volume if its intensity is consistent with past
intensities along its motion trajectory but inconsistent with future
intensities (reversed for "exposed volume"). We propose a variational
formulation of the problem that we solve using the multiphase level set method.
We show encouraging experimental results for synthetic and natural image
sequences.
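The voxel rule stated above (consistent with the past but not with the future means occluded, and the reverse means exposed) can be sketched on a toy x-t sequence. This is a simplified thresholding illustration, assuming a known integer motion field and a one-dimensional image; it is not the variational level-set formulation, and all names and values are hypothetical.

```python
import numpy as np

VISIBLE, OCCLUDED, EXPOSED = 0, 1, 2

def label_voxels(frames, motion, tol=1e-6):
    """Classify each (t, x) voxel by intensity consistency along its trajectory."""
    n_frames, width = frames.shape
    labels = np.full((n_frames, width), VISIBLE)
    for t in range(1, n_frames - 1):
        for x in range(width):
            v = motion[t, x]
            x_past, x_future = x - v, x + v
            if not (0 <= x_past < width and 0 <= x_future < width):
                continue
            past_ok = abs(frames[t, x] - frames[t - 1, x_past]) <= tol
            future_ok = abs(frames[t, x] - frames[t + 1, x_future]) <= tol
            if past_ok and not future_ok:
                labels[t, x] = OCCLUDED   # visible now, covered in the next frame
            elif future_ok and not past_ok:
                labels[t, x] = EXPOSED    # covered in the previous frame, visible now
    return labels

# Toy sequence: static textured background, bright object moving right by 1 pixel/frame.
N_FRAMES, WIDTH, LEFT, RIGHT = 6, 20, 3, 7
frames = np.empty((N_FRAMES, WIDTH))
motion = np.zeros((N_FRAMES, WIDTH), dtype=int)
for t in range(N_FRAMES):
    frames[t] = 10.0 + np.arange(WIDTH)
    frames[t, LEFT + t:RIGHT + t] = 200.0
    motion[t, LEFT + t:RIGHT + t] = 1

labels = label_voxels(frames, motion)
```

In this toy case the occluded voxels form a one-pixel-wide strip just ahead of the object and the exposed voxels a strip just behind it; stacked over time, these strips are the discrete counterparts of the occlusion and exposed volumes.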
J. Konrad, "Transversal versus lifting approach to
motion-compensated temporal discrete wavelet transform of image sequences:
equivalence and tradeoffs," in Proc. SPIE Visual Communications and Image
Process., vol. 5308, pp. 452-463, Jan. 2004, [PDF: 133KB].
Lifting-based implementations of various discrete wavelet transforms applied in
the temporal direction under motion compensation have recently become a very
powerful tool in video compression research. We present in this paper a
theoretical analysis of motion compensation in both transversal and lifted
implementations of such transforms. We derive conditions for perfect
reconstruction in the case of motion-compensated transversal discrete wavelet
transform. We also derive conditions on motion transformation assuring that a
motion-compensated lifting scheme is exactly equivalent to its transversal
counterpart. In general, these conditions require that motion transformation
allow composition and be invertible. Unfortunately, many motion models do not
obey these properties, thus inducing subband decomposition errors (prior to
compression). We propose an alternative approach to motion compensation in the
case of Haar transform. This new approach poses no constraints on motion;
the motion-compensated lifted Haar transform exactly implements its transversal
counterpart, and the latter obeys perfect reconstruction, both regardless of
motion transformation used. This new approach, however, does not extend to the
5/3 or any higher-order discrete wavelet transform.
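One property underlying this discussion can be sketched in a few lines: a lifting scheme is inverted by undoing its steps in reverse order, so it reconstructs perfectly whatever motion operator is placed inside the predict and update steps. The sketch below is the standard motion-compensated Haar lifting structure, not the alternative approach proposed in the paper, and it does not address equivalence with the transversal form. The clamped-shift `warp` is a hypothetical, deliberately non-invertible motion operator.

```python
import numpy as np

def warp(frame, shift):
    """Crude 1-D motion compensation: integer shift with border replication."""
    idx = np.clip(np.arange(frame.size) - shift, 0, frame.size - 1)
    return frame[idx]

def haar_lift_forward(f0, f1, shift):
    high = f1 - warp(f0, shift)            # predict f1 from motion-compensated f0
    low = f0 + 0.5 * warp(high, -shift)    # update f0 with the compensated residual
    return low, high

def haar_lift_inverse(low, high, shift):
    f0 = low - 0.5 * warp(high, -shift)    # undo the update step
    f1 = high + warp(f0, shift)            # undo the predict step
    return f0, f1

f0 = np.arange(8, dtype=float)
f1 = np.array([5.0, 1.0, 4.0, 4.0, 2.0, 9.0, 0.0, 3.0])
low, high = haar_lift_forward(f0, f1, shift=2)
r0, r1 = haar_lift_inverse(low, high, shift=2)

# The warp itself is not invertible: shifting forth and back loses border samples.
round_trip = warp(warp(f0, 2), -2)
```

The reconstruction is exact even though `warp` loses information, which illustrates why the difficulty lies in the subband decomposition itself rather than in invertibility of the lifted structure.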
R. Stasiński and J. Konrad, "Linear shift-variant filtering
for POCS reconstruction of irregularly sampled images," in Proc. IEEE
Int. Conf. Image Processing, vol. III, pp. 689-692, Sept. 2003, [PDF: 70KB].
The reconstruction of a regularly-sampled image from irregularly-spaced
samples is a stumbling block in various video processing tasks. In the past, we
have developed a POCS-based (projection onto convex sets) reconstruction method
that applies two operators sequentially: bandwidth limitation and sample
substitution. Although the method works well, we have observed an interesting
paradox: wide-band filtering results in better-looking images, but lower PSNR
values, than narrow-band filtering (increased blur). This can be explained by the
too-short impulse response of the wide-band filter, which is unable to "fill in" the
missing samples in sparsely populated areas. In this paper, we propose an
improved version of our algorithm where linear shift-invariant (LSI) filtering
is replaced by linear shift-variant (LSV) filtering. The LSV filtering is
implemented as a parallel bank of LSI filters, each with different bandwidth
(impulse response). We demonstrate experimentally a significant reduction of
the reconstruction error due to the new LSV filtering.
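The two alternating operators named above, bandwidth limitation and sample substitution, can be sketched in one dimension. This is a minimal shift-invariant illustration, assuming an ideal lowpass filter applied in the DFT domain and samples located on a subset of the regular grid; it does not implement the shift-variant filter bank, and the function name and test signal are hypothetical.

```python
import numpy as np

def pocs_reconstruct(values, known, cutoff, n_iter=200):
    """Alternate two projections: ideal lowpass filtering and sample substitution.

    values: length-N array, meaningful only where known is True.
    cutoff: highest DFT frequency index kept by the lowpass filter.
    """
    n = values.size
    passband = np.abs(np.fft.fftfreq(n) * n) <= cutoff
    x = np.where(known, values, 0.0)
    for _ in range(n_iter):
        spectrum = np.fft.fft(x)
        spectrum[~passband] = 0.0
        x = np.fft.ifft(spectrum).real   # projection 1: bandwidth limitation
        x[known] = values[known]         # projection 2: sample substitution
    return x

n = 128
t = np.arange(n)
truth = np.sin(2 * np.pi * 3 * t / n) + 0.5 * np.cos(2 * np.pi * 5 * t / n)
rng = np.random.default_rng(0)
known = rng.random(n) < 0.4              # roughly 40% of the samples are available

initial = np.where(known, truth, 0.0)
estimate = pocs_reconstruct(truth, known, cutoff=6)
err_initial = np.linalg.norm(initial - truth)
err_final = np.linalg.norm(estimate - truth)
```

Because the true signal lies in both constraint sets, each projection cannot increase the distance to it. The sparse-area problem described above appears when the lowpass filter's support is too short to bridge large gaps between known samples.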
N. Božinović and J. Konrad, "Scan order and
quantization for 3D-DCT coding," in Proc. SPIE Visual Communications
and Image Process., vol. 5150, pp. 1204-1215, July 2003, [PDF: 2,876KB].
Two types of coders dominate the field of video compression research today:
well-established hybrid coders, which are at the core of all MPEG and H.26X
standards, and emerging three-dimensional (3D) subband coders, largely inspired
by the success of wavelet-based still image compression. However, there are
surprisingly few results reported on 3D transform coding based on the discrete
cosine transform (DCT). Even while exploiting all the beneficial properties of
the DCT itself (forward/inverse symmetry, fast separable implementation, and
excellent energy compaction), these coders under-perform when compared to
competing hybrid coders primarily due to inefficient quantization, scanning and
entropy coding used. In this paper, we study means of improving 3D-DCT coding
by proposing adaptive scanning order and quantization of coefficients that are
better matched to 3D-DCT spectrum of a motion sequence. Our results show
significant improvement in performance over previously reported techniques.
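For reference, a fixed (non-adaptive) 3-D scan can be sketched as follows: coefficients of an n x n x n DCT block are visited in order of increasing index sum, a three-dimensional analogue of the familiar zigzag scan. This is only an assumed baseline for illustration; the adaptive scan order studied in the paper is not reproduced here, and the function name is hypothetical.

```python
from itertools import product

def scan_order_3d(n):
    """Return (u, v, w) coefficient indices sorted by u + v + w, ties broken lexicographically."""
    return sorted(product(range(n), repeat=3), key=lambda c: (sum(c), c))

order = scan_order_3d(4)
first_few = order[:4]   # DC coefficient, then the three lowest-frequency AC coefficients
```

A fixed order of this kind treats the spatial and temporal frequency axes symmetrically, whereas the spectrum of a motion sequence generally does not, which is the mismatch an adaptive order aims to remove.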
J. Konrad and M. Ristivojević, "Video segmentation and
occlusion detection over multiple frames," in Proc. SPIE Image and Video
Communications and Process., vol. 5022, pp. 377-388, Jan. 2003, [PDF: 1,242KB].
Spatial segmentation of image sequences is usually performed based on motion
between two frames, and then followed by tracking. Some recent approaches
extend this to joint segmentation in space-time; the resulting 3-D segmentation
(in x-y-t space) can be interpreted as a volume "carved out" by a moving
object in the image sequence domain. We call such volumes "object
tunnels". In this paper, we propose a new approach to occlusion analysis and
characterization that is based on object tunnels. It results from the
observation that the object-tunnel wall for a fully visible object has a different
shape than that for an object undergoing occlusion or exposure. Walls of
tunnels associated with moving objects have tangent planes that are, in
general, non-parallel to the time axis. When an object gets occluded or exposed
by a static feature, part of the object tunnel wall stops evolving freely; its
spatial coordinates remain fixed (static occlusion boundary) while the temporal
coordinate increases linearly (time evolution). This forces part of the wall to
be comprised of lines parallel to the time axis, each line defined by a single
point on the occlusion boundary. In case this boundary is a straight-line edge,
the occluding part of the wall becomes planar. We propose to detect occlusions
by searching for such characteristic surfaces of object tunnel walls. We
formulate the problem for planar occlusion walls based on a robust distance
metric, and we show experimental results for various occlusion types on
synthetic and camera-acquired image sequences.
A. Litvin, J. Konrad, and W. Karl, "Probabilistic video
stabilization using Kalman filtering and mosaicking," in Proc. SPIE
Image and Video Communications and Process., vol. 5022, pp. 663-674,
Jan. 2003, [PDF: 1,357KB].
The removal of unwanted, parasitic vibrations in a video sequence induced by
camera motion is an essential part of video acquisition in industrial, military
and consumer applications. In this paper, we present a new image processing
method to remove such vibrations and reconstruct a video sequence void of
sudden camera movements. Our approach to separating unwanted vibrations from
intentional camera motion is based on a probabilistic estimation framework. We
treat estimated parameters of interframe camera motion as noisy observations of
the intentional camera motion parameters. We construct a physics-based
state-space model of these interframe motion parameters and use recursive
Kalman filtering to perform stabilized camera position estimation. A
six-parameter affine model is used to describe the interframe transformation,
allowing quite accurate description of typical scene changes due to camera
motion. The model parameters are estimated using a p-norm-based
multi-resolution approach. This approach is robust to model mismatch and to
object motion within the scene (which are treated as outliers). We use
mosaicking in order to reconstruct undefined areas that result from motion
compensation applied to each video frame. Registration between distant frames
is performed efficiently by cascading interframe affine transformation
parameters. We compare our method's performance with that of a commercial
product on real-life video sequences, and show a significant improvement in
stabilization quality for our method.
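The filtering idea described above can be sketched for a single motion parameter. This is a minimal illustration, assuming a constant-velocity state model for one translational parameter and hand-picked noise variances; the six-parameter affine model, the robust estimation, and the mosaicking are not covered, and all names and values are hypothetical.

```python
import numpy as np

def kalman_smooth_path(observed, process_var=1e-3, obs_var=4.0):
    """Recursive Kalman filter with state [position, velocity] and noisy position observations."""
    F = np.array([[1.0, 1.0], [0.0, 1.0]])   # constant-velocity transition
    H = np.array([[1.0, 0.0]])               # only the position is observed
    Q = process_var * np.eye(2)
    R = np.array([[obs_var]])
    x = np.array([observed[0], 0.0])
    P = np.eye(2)
    path = []
    for z in observed:
        # Prediction step.
        x = F @ x
        P = F @ P @ F.T + Q
        # Correction step with the new observation.
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)
        x = x + K @ (np.array([z]) - H @ x)
        P = (np.eye(2) - K @ H) @ P
        path.append(x[0])
    return np.array(path)

def roughness(path):
    """Mean absolute second difference: a simple measure of frame-to-frame shakiness."""
    return np.mean(np.abs(np.diff(path, n=2)))

rng = np.random.default_rng(0)
frames = np.arange(200)
intentional = 0.5 * frames                       # slow intentional pan
observed = intentional + rng.normal(0.0, 2.0, frames.size)   # pan plus jitter
stabilized = kalman_smooth_path(observed)
correction = stabilized - observed               # per-frame shift to apply
```

The per-frame correction is what would be applied to each frame as motion compensation; the areas it leaves undefined at the frame borders are what the mosaicking step is meant to fill.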
M. Kardouchi and J. Konrad, "Recovering large-amplitude
disparity fields using adaptive interpolation," in Proc. SPIE Image and
Video Communications and Process., vol. 5022, pp. 761-771, Jan. 2003, [PDF: 3,824KB].
Computing dense disparity fields from large-baseline stereo is a difficult
problem because of long-range correspondences involved. A typical solution to
this problem is to use optical flow or block matching methods implemented over
a hierarchy of resolutions. However, these approaches cannot easily cope with
disparity discontinuities. Recently, we have proposed a novel approach that
combines feature matching and Delaunay triangulation. In this approach, first
feature points are extracted using an intensity corner detector, and then
corresponding feature-point pairs are found using cross-correlation. These two
steps result in a reliable but sparse map of disparity vectors. In order to
compute a dense disparity field, the third step involves Delaunay triangulation
followed by disparity interpolation based on an affine (planar) model. The
resulting disparity fields are continuous everywhere, and thus are not
realistic; typical stereo image pairs exhibit disparity discontinuities at
object boundaries. To address this problem, in the past we subdivided some
Delaunay triangles into smaller ones. Although this approach has significantly
improved the rendition of disparity discontinuities, it did not always work
reliably. In this paper, we propose an adaptive interpolation over Delaunay
triangles. As before, the interpolation is distance-dependent, i.e., accounts
for the Euclidean distance between the position of disparity under interpolation
and the three vertices of a triangle. The distance-dependent weights, however, are
now additionally adapted so that the interpolated, pixel-based disparities
within each triangle afford discontinuities. The new method has been applied to
natural stereoscopic images. The resulting dense disparity fields exhibit
clear, although subtle, discontinuities at object boundaries, and are more
realistic than disparity fields obtained by the prior approach.
J. Konrad and P. Agniel, "Artifact reduction in lenticular
multiscopic 3-D displays by means of anti-alias filtering," in Proc. SPIE Stereoscopic Displays and Virtual Reality Systems, vol. 5006, pp.
336-347, Jan. 2003, [PDF: 2,279KB].
This paper addresses the issue of artifact visibility in automultiscopic 3-D
lenticular displays. A straightforward extension of the two-view lenticular
autostereoscopic principle to M views results in an M-fold loss of horizontal
resolution due to the subsampling needed to properly multiplex the views. In
order to circumvent the imbalance between the horizontal and vertical
resolution, a tilt can be applied to the lenticules to orient them at a small
angle to the vertical direction, as is done in the
SynthaGram (TM) display from Stereographics Corp. In
either case, to avoid aliasing the subsampling should be preceded by suitable
lowpass pre-filtering. Although for purely vertical lenticules a sufficiently
narrowband lowpass horizontal filtering suffices, the situation is more
complicated for diagonal lenticules; the subsampling of each view is no longer
orthogonal, and more complex sampling models need to be considered. Based on
multidimensional sampling theory, we have studied multiview sampling models
based on lattices. These models approximate pixel positions on a lenticular
automultiscopic display and lead to optimal anti-alias filters. In this paper,
we report results for a separable approximation to non-separable 2-D anti-alias
filters based on the assumption that the lenticule slant is small. We have
carried out experiments on a variety of images, and different filter
bandwidths. We have observed that the theoretically-optimal bandwidth is too
restrictive; aliasing artifacts disappear, but some image details are lost as
well. Somewhat wider bandwidths result in images with almost no aliasing and
largely preserved detail. For subjectively-optimized filters, the improvements,
although localized, are clear and enhance the 3-D viewing experience.
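The simplest case discussed above, vertical lenticules with horizontal lowpass pre-filtering followed by M:1 subsampling and multiplexing, can be sketched as follows. This is an illustration under stated assumptions: grayscale views, whole-pixel (not RGB sub-pixel) interleaving, and a length-M box filter as a stand-in for a properly designed anti-alias filter. Slanted lenticules are not handled, and the function names are hypothetical.

```python
import numpy as np

def prefilter_horizontal(view, m):
    """Horizontal lowpass filtering with a length-m box kernel (stand-in filter)."""
    kernel = np.ones(m) / m
    return np.stack([np.convolve(row, kernel, mode="same") for row in view])

def multiplex_vertical(views):
    """Interleave M same-size views column by column: output column c comes from view c mod M."""
    m = len(views)
    filtered = [prefilter_horizontal(np.asarray(v, dtype=float), m) for v in views]
    out = np.empty_like(filtered[0])
    for k in range(m):
        out[:, k::m] = filtered[k][:, k::m]   # M:1 horizontal subsampling of view k
    return out

# Three constant-valued views make the interleaving pattern easy to read.
views = [np.full((4, 12), float(k)) for k in range(3)]
multiplexed = multiplex_vertical(views)
```

The output has the same size as each input view, so every view contributes only one column in M, which is the M-fold loss of horizontal resolution noted above.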
J. Konrad and M. Ristivojević, "Joint space-time image
sequence segmentation based on volume competition and level sets," in
Proc. IEEE Int. Conf. Image Processing, vol. 1, pp. 573-576, Sept.
2002, [PDF: 209KB].
In this paper, we address the issue of joint space-time segmentation
of image sequences. Typical approaches to such segmentation consider two image
frames at a time, and perform tracking of individual segmentations across
time. We propose to perform this segmentation jointly over multiple
frames. This leads to a 3-D segmentation, i.e., a search for a volume "carved
out" by a moving object in the (3-D) image sequence domain. We pose the
problem in a Bayesian framework and use the MAP criterion. Under suitable
structural and segmentation/motion models we convert MAP estimation to a
functional minimization. The resulting problem can be viewed as volume
competition, a 3-D generalization of region competition. We parameterize the
unknown surface to be estimated, but rather than solving for it using an
active-surface approach, we embed it into a higher-dimensional function and use
the level-set methodology. We show experimental results for the simpler case of
object motion against still background although, given suitable models, the
general formulation can handle complex motion too.
J. Konrad and N. Božinović, "Interpretation of uniform
translational image motion: DCT versus FT," in Proc. IEEE Int. Conf. Image Processing, vol. 2, pp. 281-284, Sept. 2002, [PDF: 2,264KB].
We study properties of the discrete cosine transform (DCT) when
applied to an image sequence formed by uniformly translating a still image. The
Fourier transform (FT) applied to such a sequence has non-zero content only on
a spatio-temporal frequency plane orthogonal to the direction of motion. We
derive an equivalent spectrum for the DCT case. The spectrum function is more
complicated than in the FT case and cannot be easily interpreted
analytically. However, its numerical evaluation demonstrates that spectral
occupancy in the DCT domain is limited to a narrow band around a plane similar
to one in the FT case with two important differences: the plane is subject to
folding, and the DCT coefficient amplitude is strongly attenuated for larger
temporal "frequencies". We verify the theoretical derivations experimentally
on images. The obtained result opens an interesting possibility for the
computation of constant-velocity motion in the DCT domain. We demonstrate some
preliminary results of motion estimation in the 3-D DCT domain by identifying
directions of spectral occupancy with respect to transform coefficients.
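The Fourier-domain property that serves as the reference case above can be checked numerically. The sketch below is a reduced illustration, assuming one spatial dimension, an integer velocity, and circular (wrap-around) translation so that the DFT result is exact; in this x-t setting the occupied "plane" becomes the line k_t + v k_x = 0 (modulo N). The DCT case is not reproduced, and the function name is hypothetical.

```python
import numpy as np

def off_line_energy_fraction(n, velocity, seed=0):
    """Fraction of 2-D DFT energy lying off the line k_t + velocity * k_x = 0 (mod n)."""
    rng = np.random.default_rng(seed)
    still = rng.random(n)
    # seq[t, x] = still[(x - velocity * t) mod n]: a uniformly translating 1-D image.
    seq = np.stack([np.roll(still, velocity * t) for t in range(n)])
    spectrum = np.fft.fft2(seq)
    k_t, k_x = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    on_line = (k_t + velocity * k_x) % n == 0
    energy = np.abs(spectrum) ** 2
    return energy[~on_line].sum() / energy.sum()

fraction = off_line_energy_fraction(n=32, velocity=3)
```

Up to floating-point rounding, all spectral energy sits on the motion line, so the velocity can be read from the orientation of the occupied frequencies.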
C. Vázquez, E. Dubois, and J. Konrad, "Reconstruction of
irregularly-sampled images by regularization in spline spaces," in Proc. IEEE Int. Conf. Image Processing, vol. 3, pp. 405-408, Sept. 2002, [PDF: 802KB].
We are concerned with the reconstruction of a regularly-sampled image based on
irregularly-spaced samples thereof. We propose a new iterative method based on
a cubic spline representation of the image. An objective function taking into
account the similarity to the known samples and the regularity of the function
is minimized in order to obtain a good approximation. We apply the developed
algorithm to motion-compensated image interpolation. Under motion compensation,
the resulting sampling grids are irregular and require the irregular/regular
interpolation. We show experimental results on real-world images and we compare
our results with other methods proposed in the literature.
R. Stasiński and J. Konrad, "Space-variable filtering for
approximation of uniformly sampled image from samples on irregular grids," in
5-th Nordic Signal Proc. Symp., Oct. 2002, [PDF: 191KB].
The paper presents a space-variable POCS-based (projection onto convex sets)
method for the reconstruction of a regularly-sampled image from its irregularly
spaced samples. Such reconstruction is often needed in image processing and
coding, for example in stereo vision and motion compensation. The proposed
approach applies two operators sequentially: bandwidth limitation and sample
substitution, and is based on our earlier work. The contribution of this paper
is the space-variable implementation of bandwidth limitation operator, which
has been postulated previously. The operator is realized in the simplest
possible way as a filter with two sets of coefficients; a measure of the local
density of the irregular grid determines which set is used. The technique is
efficient computationally although at the cost of increased memory
requirements. Experimental results demonstrate that indeed, the new technique
is much better in terms of PSNR, convergence speed, and visual quality than
methods described previously.
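The switching idea described above (two coefficient sets, selected by local sample density) can be sketched in one dimension. This is an illustration under stated assumptions: box kernels of two lengths stand in for the two coefficient sets, density is measured as the fraction of known samples in a sliding window, and the threshold is arbitrary. All names and parameter values are hypothetical.

```python
import numpy as np

def smooth(x, width):
    """Moving-average lowpass filter; a wider kernel gives a narrower passband."""
    return np.convolve(x, np.ones(width) / width, mode="same")

def space_variable_lowpass(x, known, short_width=3, long_width=9, window=9, threshold=0.5):
    """Use the short (wide-band) kernel where samples are dense, the long one elsewhere."""
    density = smooth(known.astype(float), window)   # local fraction of known samples
    use_short = density >= threshold
    y = np.where(use_short, smooth(x, short_width), smooth(x, long_width))
    return y, use_short

n = 64
known = np.zeros(n, dtype=bool)
known[:32] = True          # densely sampled half
known[32::4] = True        # sparsely sampled half: one sample in four
x = np.where(known, np.cos(2 * np.pi * np.arange(n) / n), 0.0)
y, use_short = space_variable_lowpass(x, known)
```

Both filtered versions of the signal are computed and one is picked per position, which mirrors the trade-off stated above: simple and fast, at the cost of extra memory.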
R. Stasiński and J. Konrad, "Improved POCS-based image
reconstruction from irregularly-spaced samples," in Signal Process. XI:
Theories and Applications (Proc. Eleventh European Signal Process. Conf.), vol. 2, pp. 461-464, Sept. 2002, [PDF: 142KB].
This paper presents an enhanced POCS-based (projection onto convex sets) method
for the reconstruction of a regularly-sampled image from its irregularly-spaced
samples. Such a reconstruction is often needed in image processing and coding,
for example when using motion compensation. The proposed approach applies two
operators sequentially: bandwidth limitation and sample substitution, and is
based on our earlier work. The contribution of this paper is a new, simpler
implementation of the algorithm that allows for faster convergence, and
provides better performance, although at the cost of increased memory
requirements.
A.-R. Mansouri, T. Chomaud, and J. Konrad, "A comparative
evaluation of algorithms for fast computation of level set PDEs with
applications to motion segmentation," in Proc. IEEE Int. Conf. Image
Processing, pp. 636-639, Oct. 2001, [PDF: 228KB].
We address the problem of fast computation of level set partial
differential equations (PDEs) in the context of motion segmentation. Although
several fast level set computation algorithms are known, some of them, such as
the fast marching method, are not applicable to the video segmentation problem
since the front being computed does not advance monotonically. We study
narrow-banding, pyramidal, and combined pyramidal/narrow-banding schemes; the
latter leads to a 70-fold time gain over the single-resolution scheme.
R. Stasiński and J. Konrad, "POCS reconstruction of
stereoscopic views," in Proc. Int. Conf. on Augmented, Virtual
Environments and Three-Dimensional Imaging, pp. 41-44, May 2001, [PDF: 111KB].
This paper presents an application of POCS (projection onto convex
sets) methodology to the reconstruction of intermediate stereoscopic views.
The basic problem in such a reconstruction, resulting from disparity
compensation, is that of the recovery of a regularly-sampled image from its
irregularly-spaced samples. This problem also arises in other image processing
and coding applications. The results reported here improve our previous
POCS-based reconstruction method by locally adapting the algorithm to the
density of image samples. We also extend the method to color images by
implementing the method in the luminance-chrominance (Y-U-V) space.
M. Kardouchi, J. Konrad, and C. Vázquez, "Estimation of
large-amplitude motion and disparity fields: Application to intermediate view
reconstruction," in Proc. SPIE Visual Communications and Image
Process., vol. 4310, pp. 340-351, Jan. 2001, [PDF: 801KB].
This paper describes a method for establishing dense correspondence between two
images in a video sequence (motion) or in a stereo pair (disparity) in case of
large displacements. In order to deal with large-amplitude motion or disparity
fields, multi-resolution techniques such as block matching and optical flow
have been used in the past. Although quite successful, these techniques cannot
easily cope with motion/disparity discontinuities as they do not explicitly
exploit image structure. Additionally, their computational complexity is high;
block matching requires examination of numerous vector candidates while optical
flow-based techniques are iterative. In this paper, we propose a new approach
that addresses both issues. The approach combines feature matching with
Delaunay triangulation, and thus reliable long-range correspondences result
while the computational complexity is not high (sparse representation). In the
proposed approach, feature points are found first using a simple intensity
corner detector. Then, correspondence pairs between two images are found by
maximizing cross-correlation over a small window. Finally, the Delaunay
triangulation is applied to the resulting points, and a dense vector field is
computed by planar interpolation over Delaunay triangles. The resulting vector
field is continuous everywhere, and thus does not reflect motion or depth
discontinuities at object boundaries. In order to improve the rendition of such
discontinuities, we propose to further divide Delaunay triangles whenever the
displacement vectors within a triangle do not allow good intensity match. The
approach has been extensively tested on stereoscopic images in the context of
intermediate view reconstruction where the quality of estimated disparity
fields is critical for final image rendering. The first results are very
encouraging as the reconstructed images are of high quality, especially at
object boundaries, and the computational complexity is lower than that of
multi-resolution block matching.
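The planar interpolation step over a single triangle can be sketched with barycentric coordinates. This is a minimal illustration, assuming the triangulation and the vertex disparities are already available; feature detection, matching, triangulation, and triangle subdivision are not shown, and the function names and numeric values are hypothetical.

```python
import numpy as np

def barycentric_weights(point, triangle):
    """Barycentric coordinates of a 2-D point with respect to a triangle of shape (3, 2)."""
    a, b, c = triangle
    edges = np.column_stack((b - a, c - a))
    w_b, w_c = np.linalg.solve(edges, point - a)
    return np.array([1.0 - w_b - w_c, w_b, w_c])

def interpolate_disparity(point, triangle, vertex_disparities):
    """Planar (affine) interpolation of the vertex disparities at the given point."""
    return barycentric_weights(point, triangle) @ vertex_disparities

triangle = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])
vertex_disparities = np.array([1.0, 5.0, 9.0])   # sparse disparities at matched features
d_mid = interpolate_disparity(np.array([1.0, 1.0]), triangle, vertex_disparities)
```

Because neighbouring triangles share vertices and edges, the interpolated field is continuous across triangle boundaries, which is exactly why the discontinuities at object boundaries need the additional subdivision step described above.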
A.-R. Mansouri, A. Olivier, and J. Konrad,
"Topology-independent region tracking with level sets," in Proc. IEEE
Int. Conf. Image Processing, vol. 3, pp. 66-69, Sept. 2000, [PDF: 299KB].
This paper presents a new approach to the tracking of regions in an image
sequence. Unlike most other methods, the proposed approach can handle topology
changes, i.e., regions may split or merge. This flexibility is naturally
embedded into a partial differential equation that solves a minimum description
length (MDL) estimation problem. The basic estimation criterion consists of
only two terms: the description length of the region shape mismatch and the
description length of the region itself, but we show possible extensions to
this basic formulation. We minimize the MDL criterion using the level set
methodology that inherently accounts for topology changes. We show results for
natural data with natural as well as synthetic motion.
C. Vázquez, J. Konrad, and E. Dubois, "Wavelet-based
reconstruction of irregularly sampled images: Application to stereo imaging,"
in Proc. IEEE Int. Conf. Image Processing, vol. 2, pp. 319-322,
Sept. 2000, [PDF: 150KB].
We are concerned with the reconstruction of a regularly-sampled image based on
irregularly-spaced samples thereof. We propose a new iterative method based on
a wavelet representation of the image. For this representation we use a
biorthogonal spline wavelet basis implemented on an oversampled grid. We apply
the developed algorithm to disparity-compensated stereoscopic image
interpolation. Under disparity compensation, the resulting sampling grids are
irregular and require the irregular/regular interpolation. We show experimental
results on real-world images and we compare our results with other methods
proposed in the literature.
R. Stasiński and J. Konrad, "POCS-based image
reconstruction from irregularly-spaced samples," in Proc. IEEE Int. Conf. Image Processing, vol. 2, pp. 315-318, Sept. 2000, [PDF: 205KB].
This paper presents a method for the reconstruction of a regularly-sampled
image from its irregularly-spaced samples. Such reconstruction is often needed
in image processing and coding, for example when using motion compensation. The
proposed approach is based on the theory of projections onto convex sets. Two
projection operators are used: bandwidth limitation and sample
substitution. The approach is similar to some methods presented in the
literature in the past, but differs in the implementation. The bandwidth
limitation is implemented in the frequency domain on an oversampled grid thus
allowing substantial flexibility in spectrum shaping of the reconstructed
image. Additionally, a fast Fourier transform algorithm specifically designed
for irregularly-sampled images is used to reduce the computational
complexity. A number of experimental results on natural images are presented.
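The alternation of the two projection operators named in the abstract above (bandwidth limitation and sample substitution) can be sketched in one dimension. This is an illustrative sketch only: it uses a plain DFT on the original grid rather than the oversampled grid and fast transform described in the paper, and the signal, sample positions, cutoff and function names are all hypothetical.

```python
import cmath
import math


def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(X):
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def band_limit(x, cutoff):
    """Projection 1: zero every DFT coefficient above the cutoff frequency index."""
    n = len(x)
    X = dft(x)
    kept = [X[k] if min(k, n - k) <= cutoff else 0.0 for k in range(n)]
    return [v.real for v in idft(kept)]


def substitute_samples(x, known):
    """Projection 2: restore the known sample values at their positions."""
    y = list(x)
    for i, v in known.items():
        y[i] = v
    return y


def pocs(known, n, cutoff, iterations=50):
    """Alternate the two projections, starting from the zero-filled samples."""
    x = substitute_samples([0.0] * n, known)
    for _ in range(iterations):
        x = substitute_samples(band_limit(x, cutoff), known)
    return x


def rms_error(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)) / len(a))


# Hypothetical band-limited test signal observed at irregular positions.
N = 16
truth = [1.0 + math.cos(2 * math.pi * t / N) + 0.5 * math.sin(4 * math.pi * t / N)
         for t in range(N)]
positions = [0, 1, 3, 4, 6, 7, 9, 11, 12, 14]
known = {i: truth[i] for i in positions}
estimate = pocs(known, N, cutoff=2)
```

Since the true signal lies in both constraint sets, each projection cannot increase the distance to it, so the estimate is never worse than the zero-filled starting point.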
A.-R. Mansouri and J. Konrad, "Minimum description length
region tracking with level sets," in Proc. SPIE Image and Video
Communications and Process., vol. 3974, pp. 515-525, Jan. 2000, [PDF: 682KB].
This paper addresses the problem of tracking an arbitrary region in a sequence
of images, given a pre-computed velocity field. Such a problem is of importance
in applications ranging from video surveillance to video database search. The
algorithm presented here formulates tracking as an estimation problem. We
propose, as our estimation criterion, a precise description length measure that
quantifies tracking performance. In this context, tracking is naturally
formulated as minimum description length estimation. The solution to this
estimation problem is given by particular evolution equations for the region
boundary. The implicit representation of the region boundary by the zero level
set of a smooth function yields an equivalent set of partial differential
equations and the added benefit of topology independence; regions may split
(e.g., for divergent velocity fields) or merge (e.g., for convergent velocity
fields) during tracking, clearly a desirable feature in real-world
applications. We illustrate the performance of the proposed algorithm on a
number of real images with natural motion.
A.-R. Mansouri, B. Sirivong, and J. Konrad, "Multiple motion
segmentation with level sets," in Proc. SPIE Image and Video
Communications and Process., vol. 3974, pp. 584-595, Jan. 2000, [PDF: 1,079KB].
Motion segmentation of an image sequence is among the most difficult and
important problems in video processing and compression, and in computer
vision. In this paper, we consider the problem of segmenting an image into
multiple regions possibly undergoing different motions. To this end we use
level sets of functions evolving according to certain partial differential
equations. Contrary to numerous other motion segmentation algorithms based on
level sets, we compute accurate motion boundaries without relying on intensity
boundaries as an accessory. This will be illustrated on examples where
intensity boundaries are hardly visible and yet motion boundaries are
accurately identified. The main benefit of the level set representation is in
its ability to handle variations in the topology of the level sets. As a
result, it is only necessary to know the total number of distinct motion
classes and their parameters. We describe an automatic initialization procedure
that is based on feature point correspondences and K-means clustering in a
6-parameter space of affine parameters. We illustrate the performance of the
proposed algorithm on real images with both real and synthetic motion.
J. Konrad and Z.-D. Lan, "Dense disparity estimation from
feature correspondences," in Proc. SPIE Stereoscopic Displays and Virtual
Reality Systems, vol. 3957, pp. 90-101, Jan. 2000, [PDF: 1,167KB].
Stereoscopic disparity plays an important role in the processing and
compression of 3-D imagery. For example, dense disparity fields are used to
reconstruct intermediate (varying-viewpoint) images. Although for small camera
baselines dense disparity can be reliably estimated using gradient-based
methods, this is not the case for large baselines due to the violation of
underlying assumptions (e.g., local intensity linearity). Block matching
algorithms work better but they are likely to get trapped in a local minimum
due to the increased search space. An appropriate method to estimate large
disparities is by using feature (characteristic) points. However, since feature
points are unique, they are also sparse. In this paper, we propose a disparity
estimation method that combines the reliability of feature-based correspondence
methods with the resolution of dense approaches. In the first step we find
feature points in the left and right images using the Harris operator. In the
second step, we select those feature points that allow one-to-one left-right
correspondence based on a cross-correlation measure. In the third step, we use
the computed correspondence points to control the computation of dense
disparity via regularized block matching that minimizes matching and disparity
smoothness errors. The approach has been tested on several large-baseline
stereo pairs with encouraging initial results.
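The second step above, selecting feature points that admit a one-to-one left-right correspondence under a cross-correlation measure, might be sketched as a mutual-best-match test. This is one plausible reading, not the authors' procedure: the use of normalized cross-correlation, the function names `ncc` and `mutual_matches`, and the tiny intensity windows are assumptions made for illustration.

```python
import math


def ncc(a, b):
    """Normalized cross-correlation of two equal-length intensity windows."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    da = [v - ma for v in a]
    db = [v - mb for v in b]
    denom = math.sqrt(sum(v * v for v in da) * sum(v * v for v in db))
    return sum(p * q for p, q in zip(da, db)) / denom if denom > 0 else 0.0


def mutual_matches(left_windows, right_windows):
    """Keep (i, j) only if j is the best match for i and i is the best match for j."""
    scores = [[ncc(l, r) for r in right_windows] for l in left_windows]
    best_r = [max(range(len(right_windows)), key=lambda j: row[j]) for row in scores]
    best_l = [max(range(len(left_windows)), key=lambda i: scores[i][j])
              for j in range(len(right_windows))]
    return [(i, j) for i, j in enumerate(best_r) if best_l[j] == i]


# Hypothetical windows around three feature points in each image; the right
# windows are brightness/contrast-changed copies of the left ones, reordered.
left = [[1, 2, 3, 9], [5, 1, 5, 1], [0, 0, 7, 0]]
right = [[10, 10, 17, 10], [2, 4, 6, 18], [8, 4, 8, 4]]
pairs = mutual_matches(left, right)
```

Normalizing by the window mean and energy makes the score insensitive to gain and offset differences between the two cameras, which is why the reordered, rescaled windows are still paired correctly here.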
K. Belloulata, R. Stasiński, and J. Konrad, "Region-based
image compression using fractals and shape-adaptive DCT," in Proc. IEEE
Int. Conf. Image Processing, vol. 2, pp. 815-819, Oct. 1999, [PDF: 425KB].
The significant effort to provide region-based compression and functionality
within the MPEG-4 standard is not paralleled in the still-image compression
domain. In this paper, we propose an approach to fractal coding of still
images that is truly region-based. Unlike previous fractal compression methods,
the proposed approach compresses an image region-by-region based on a prior
segmentation, very much like in MPEG-4; individual regions can be decoded
without full image decoding. The method performs the domain/range block
matching in frequency domain using a shape-adaptive discrete cosine
transform. Experimental results evaluating the performance of the approach are
shown.
A.-R. Mansouri and J. Konrad, "Motion segmentation with level
sets," in Proc. IEEE Int. Conf. Image Processing, vol. 2, pp.
126-130, Oct. 1999, [PDF: 1,324KB].
Motion segmentation is an important problem in video processing and
compression, and in computer vision. It is usually performed by either first
estimating a field of motion parameters and then segmenting it, or by applying
joint motion estimation and segmentation. Motion segmentation methods often
constrain the set of possible solutions by forcing motion discontinuities to
coincide with intensity discontinuities. In this paper, we propose an iterative
method for joint motion estimation and segmentation that is based on level
sets. The motion within individual segments is parametric and the method does
not use the intensity discontinuity constraint, but is shown to be accurate for
images with both synthetic and natural motion compliant with the assumed motion
models.
J. Konrad, "View reconstruction for 3-D video entertainment:
issues, algorithms and applications," in Proc. Int. Conf. on Image
Process. and its Applications, pp. 8-12, July 1999, [PDF: 241KB].
Significant advances in stereoscopic imaging in the last decade have led to
viable applications in medicine, teleoperation and, more recently, in
entertainment. Although the stereoscopic technology is still mostly analog, the
migration to the digital domain is inevitable. Such a migration creates new
challenges for stereoscopic video entertainment, but at the same time offers
new opportunities. One particular challenge is the reconstruction of
intermediate views (between the left and right cameras), that finds various
applications. Below, several algorithms aiming at high-quality view
reconstruction, recently developed at INRS, are described, and their relative
merits are discussed. Since a practical implementation requires low complexity,
results of a study of various models and parameters aiming at computational
simplicity are reported.
J. Konrad, "Enhancement of viewer comfort in stereoscopic
viewing: parallax adjustment," in Proc. SPIE Stereoscopic Displays and
Virtual Reality Systems, vol. 3639, pp. 179-190, Jan. 1999, [PDF: 741KB].
One of the major deficiencies of stereoscopic visualization, viewer discomfort,
can be caused by the non-robustness of human perception (hyper-sensitivity to
3-D) or by excessive 3-D cues in the viewed images. In order to minimize this
discomfort, the amount of parallax (or "3D-ness") within each stereo pair needs
to be reduced. Similarly to the case of "continuous look-around", parallax
adjustment requires the knowledge of images from virtual cameras. In the case
of parallel geometry, the virtual cameras are located on the line between the
true cameras. Since in a general scenario no constraint should be posed on the
complexity of the viewed scene, 3-D modeling techniques cannot be used. We
evaluate the usefulness of parallax adjustment using two view reconstruction
methods based on disparity-compensated linear interpolation: a quadtree method
with block splitting adapted to object boundaries and a pixel-based (dense)
method. For all but the most complex stereoscopic images tested (ITU-R 601 from
CCETT and NHK) both algorithms performed very well, especially the pixel-based
approach. In terms of the overall usefulness of parallax adjustment, the
initial tests have shown a very favorable viewer response; the perceived depth
was judged to vary smoothly from zero (one virtual camera) through natural 3-D
(true cameras) to exaggerated 3-D (virtual cameras further apart than the true
cameras - extrapolation). The adjustment was convincing although not completely
free of distortions.
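Disparity-compensated linear interpolation, on which both view reconstruction methods above are based, can be sketched on a single scanline. This is a sketch under stated assumptions rather than the paper's algorithm: the disparity is assumed to be given on the grid of the virtual view, `alpha` is the virtual camera position (0 at the left camera, 1 at the right), and the sign convention, border clamping and function names are hypothetical.

```python
def sample(row, x):
    """Linearly interpolate a scanline at real-valued position x (clamped to borders)."""
    x = min(max(x, 0.0), len(row) - 1.0)
    i = min(int(x), len(row) - 2)
    f = x - i
    return (1.0 - f) * row[i] + f * row[i + 1]


def intermediate_view(left, right, disparity, alpha):
    """Blend disparity-compensated left and right samples for a view at alpha."""
    out = []
    for x, d in enumerate(disparity):
        # Assumed convention: a point at x in the virtual view appears at
        # x + alpha*d in the left image and x - (1 - alpha)*d in the right.
        from_left = sample(left, x + alpha * d)
        from_right = sample(right, x - (1.0 - alpha) * d)
        out.append((1.0 - alpha) * from_left + alpha * from_right)
    return out


# Hypothetical scanlines: a bright bar shifted by a constant disparity of 2.
left_row = [0, 0, 0, 0, 10, 10, 0, 0, 0, 0]
right_row = [0, 0, 10, 10, 0, 0, 0, 0, 0, 0]
disparity_row = [2.0] * 10
middle = intermediate_view(left_row, right_row, disparity_row, 0.5)
```

With `alpha` set to 0 or 1 the sketch returns the left or right scanline unchanged, and intermediate values place the bar proportionally between its two observed positions.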
A.-R. Mansouri, A. Mitiche, and J. Konrad, "Selective image
diffusion: application to disparity estimation," in Proc. IEEE Int. Conf. Image Processing, vol. 3, pp. 284-288, Oct. 1998, [PDF: 182KB].
Inverse problems encountered in image processing and computer vision are often
ill-posed. Whether set in a Bayesian or energy-based context, such problems
require prior assumptions expressed through an a priori probability or a
regularization term, respectively. In some cases, the prior term exhibits
partial dependence on the observations (e.g., images) that is often ignored to
simplify modeling and computations. We briefly review methods that take this
dependence into account and we propose a new formulation of the prior term that
blends some other simple approaches. Similarly to others, we apply a linear
transformation to the prior term but, in addition, we require that the
eigenvalues of the transformation have specific properties. These properties
are chosen so that diffusion is allowed only along the direction perpendicular
to local image gradient. If the gradient magnitude is small, isotropic
diffusion is performed. We apply this formulation to stereoscopic disparity
estimation and we show several experimental results; improvements over a
standard approach are clear.
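The eigenvalue condition described above (diffusion only perpendicular to the local image gradient, isotropic when the gradient is weak) can be illustrated with a simple 2x2 tensor. This is a hedged sketch, not the paper's formulation: the specific choice of eigenvalues 0 and 1, the `threshold` parameter and the function names are assumptions.

```python
def diffusion_tensor(gx, gy, threshold=1.0):
    """2x2 tensor with eigenvalue 0 along the gradient and 1 across it.

    Falls back to the identity (isotropic diffusion) for weak gradients.
    """
    mag2 = gx * gx + gy * gy
    if mag2 < threshold * threshold:
        return [[1.0, 0.0], [0.0, 1.0]]
    return [[1.0 - gx * gx / mag2, -gx * gy / mag2],
            [-gx * gy / mag2, 1.0 - gy * gy / mag2]]


def apply_tensor(tensor, v):
    """Multiply a 2x2 tensor by a 2-vector."""
    return (tensor[0][0] * v[0] + tensor[0][1] * v[1],
            tensor[1][0] * v[0] + tensor[1][1] * v[1])


# Hypothetical gradient (3, 4): flow along it is suppressed, flow across it kept.
D = diffusion_tensor(3.0, 4.0)
along = apply_tensor(D, (3.0, 4.0))
across = apply_tensor(D, (-4.0, 3.0))
```

Here `along` is numerically zero and `across` is returned unchanged, so smoothing of the disparity field in this sketch does not cross an intensity edge.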
R. Stasiński and J. Konrad, "Reduced-complexity
shape-adaptive DCT for region-based image coding," in Proc. IEEE Int. Conf. Image Processing, vol. 3, pp. 114-118, Oct. 1998, [PDF: 177KB].
We propose a computationally-efficient variant of the shape-adaptive discrete
cosine transform (SA-DCT) currently considered for MPEG-4. Although the SA-DCT
complexity is acceptable for 8x8 blocks, it is very high when complete regions
are processed at once. To reduce the SA-DCT complexity, we replace its 1-D DCT
with a quasi-DCT algorithm and we assure that the quasi-DCT basis functions are
very close to those of the DCT. Unlike in our previous approach, we carry out
an optimization of the shape of low-index basis functions. We test the new
method numerically and subjectively, and conclude that, in terms of energy
compaction performance, the new method gains up to 0.5dB compared to our
previous quasi-DCT approach.
L. Labelle, D. Lauzon, J. Konrad, and E. Dubois, "Arithmetic
coding of a lossless contour-based representation of label images," in
Proc. IEEE Int. Conf. Image Processing, vol. 1, pp. 261-265, Oct.
1998, [PDF: 119KB].
We propose a new method for the encoding of label images (also known as
segmentation maps or alpha planes) that are often used to identify object
location in region-based image and video coders. The method is contour-based
and lossless with a contour model composed of two parts: a contour graph
describing the topology of the contour network and a directional chain code to
deal with the geometric part of the label image (internal contour points). The
graph-based description of the topology is designed to minimize the cost of
encoding the nodes, while the directional chain codes are compressed by
arithmetic coding. The approach is flexible since separating the contour
network into topological and geometrical parts allows the use of other lossless
or lossy methods to encode the geometric part without changing the graph
representation. The proposed method has been compared with an arithmetic
encoder used in MPEG-4.
R. Stasiński and J. Konrad, "Fast quasi-DCT algorithm for
shape-adaptive DCT image coding," in Signal Process. IX: Theories and
Applications (Proc. Ninth European Signal Process. Conf.), pp.
1505-1508, Sept. 1998, [PDF: 215KB].
In this paper we develop a new variant of the shape-adaptive discrete cosine
transform (SA-DCT) recently proposed by Sikora and Makai and currently
considered for MPEG-4 as a texture compression engine. We are concerned with
the computational complexity of the SA-DCT; although its complexity is
acceptable in the context of 8x8 (boundary) blocks as proposed for MPEG-4, it
is very high for a true region-based coding where complete regions (e.g.,
100 by 100 pixels) need to be processed. We adapt the original SA-DCT scheme by
replacing the usual DCT with a quasi-DCT for which some basis functions are
identical and some similar to those of the DCT. We test the new method and
compare it numerically in terms of the basis restriction error as well as
subjectively on some natural images. We conclude that the new method's energy
compaction performance is slightly inferior to that of the SA-DCT, but its
computational complexity is highly reduced.
A. Mancini and J. Konrad, "Robust quadtree-based disparity
estimation for the reconstruction of intermediate stereoscopic images," in
Proc. SPIE Stereoscopic Displays and Virtual Reality Systems, vol.
3295, pp. 53-64, Jan. 1998, [PDF: 1343KB], [experimental results].
In stereoscopic/multiview video, the reconstruction of intermediate images is
needed to assure continuous motion-parallax and/or comfortable 3-D perception.
In this context, we propose a block-based disparity estimation followed by
disparity-compensated linear interpolation. We progressively deal with
deficiencies of the traditional block matching algorithms. First, we employ a
spatial smoothness constraint for disparity to overcome inherent matching
ambiguity in low-texture areas. Secondly, as a measure of matching error we use
a robust function instead of the quadratic that is sensitive to outliers. We
also extend the formulation to include color. Finally, we relax the rigidity of
the block support for disparities by employing a quadtree block structure
(blocks are allowed to split). The proposed algorithm is implemented in a
hierarchical coarse-to-fine fashion with a Gaussian pyramid to reduce the
computational burden. To correct luminance and color mismatches between images,
a 3-component balancing similar to that proposed by MPEG-2's "Multiview Profile
Ad Hoc Group" is used. We tested the proposed algorithm on stereoscopic video
sequences acquired in natural surroundings by almost parallel cameras. In
informal viewing, every feature of the algorithm listed above resulted in clear
improvements of the reconstruction quality. Overall the reconstructed image
quality was very good to excellent, depending on the image used.
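The effect of replacing the quadratic matching error with a robust function, as described above, can be shown on two candidate blocks. The abstract does not name the robust function, so the Lorentzian used here, its `sigma` parameter and the residual values are assumptions chosen purely for illustration.

```python
import math


def quadratic_cost(residuals):
    """Sum of squared intensity differences (sensitive to outliers)."""
    return sum(e * e for e in residuals)


def lorentzian_cost(residuals, sigma=5.0):
    """A robust alternative: large residuals grow only logarithmically."""
    return sum(math.log(1.0 + 0.5 * (e / sigma) ** 2) for e in residuals)


# Hypothetical residuals for two candidate disparities of a 3x3 block:
# candidate A matches perfectly except for one outlier pixel,
# candidate B is moderately wrong at every pixel.
residuals_a = [0, 0, 0, 0, 0, 0, 0, 0, 40]
residuals_b = [10] * 9

quadratic_prefers_a = quadratic_cost(residuals_a) < quadratic_cost(residuals_b)
robust_prefers_a = lorentzian_cost(residuals_a) < lorentzian_cost(residuals_b)
```

In this example the quadratic cost rejects candidate A because of the single outlier (1600 versus 900), whereas the robust cost selects it.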
A.-R. Mansouri and J. Konrad, "Block-based winner-takes-all
reconstruction of intermediate stereoscopic images," in Proc. SPIE Visual
Communications and Image Process., vol. 3309, pp. 922-933, Jan. 1998, [PDF: 2244KB].
This paper addresses the issue of the reconstruction of intermediate views from
a pair of stereoscopic images. Such a reconstruction is needed for the
enhancement of depth perception in stereoscopic systems, e.g., "continuous
look-around" or adjustment of virtual camera baseline. The algorithm proposed
here addresses the issue of blur; unlike typical reconstruction algorithms that
perform averaging between disparity-compensated left and right images the new
algorithm uses non-linear filtering via a winner-takes-all strategy. The
image under reconstruction is assumed to be a tiling by fixed-size blocks that
come from various positions of either the left or right images
using disparity compensation. The tiling map is modeled by a binary decision
field while the disparity model is based on a smoothness constraint. The models
are combined through a maximum a posteriori probability (MAP)
criterion. The intermediate intensities, disparities and the binary decision
field are estimated jointly using the expectation-maximization (EM)
algorithm. The proposed algorithm is compared experimentally with a reference
block-based algorithm employing linear filtering. Although the improvements are
localized and often subtle, they demonstrate that a high-quality intermediate
view reconstruction for complex scenes is feasible if camera convergence angle
is small.
C.-H. Yang and J. Konrad, "Motion-based video segmentation
using continuation method and robust cost functions," in Proc. SPIE
Visual Communications and Image Process., vol. 3309, pp. 774-785, Jan.
1998, [PDF: 1005KB].
We propose a new approach to spatial segmentation of video sequences that is
based on motion attributes. The approach, similarly to some previous efforts,
uses Markov random field models and maximum a posteriori probability
estimation. Our approach is novel in three ways. First, we propose a general
formulation for the joint motion estimation and segmentation of which the
segmentation problem is a special case (piecewise-constant translational
motion). Secondly, instead of the usual quadratic models (Gaussian likelihood)
we propose a robust estimation criterion that eliminates the impact of outliers
on the estimates. Thirdly, since solving the segmentation problem directly in
the space of discrete labels is difficult (e.g., because of the high dependence
on the initial state), we opt for a continuation method over a Gaussian
pyramid. Thus, the estimation process starts as a motion estimation and then
slowly converges towards a motion-based segmentation by "hardening" the
smoothness constraint. The final result is a quasi-segmentation, i.e.,
the estimated vector field is continuous but almost piecewise constant, and
must undergo subsequent quantization. We show experimental results on two
natural image sequences; the resulting quasi-segmentations clearly extract
moving objects. The method may serve as an initial stage for joint motion
estimation and segmentation, or may produce final segmentations if suitable
post-processing is applied.
E. Dubois, J. Konrad, and S. Cantet, "Estimation of nonlinear
transfer curves for conversion of color images to a known color space," in
Proc. IEEE Int. Conf. Image Processing, vol. 3, pp. 26-29, Oct.
1997, [PDF: 216KB].
This paper presents a supervised algorithm for estimating the unknown
nonlinearity undergone by the three color components of an image in the image
acquisition process. The algorithm is based on the rank-one hypothesis, which
postulates that the linear tristimulus values in a region of uniform surface
color lie on a straight line through the origin. An objective function is
formulated whose minimization yields the estimate of the unknown
nonlinearity. Images corrected with the estimated inverse nonlinearity are
shown to exhibit chromatic properties that are much more piecewise constant
than in the original image. This property will be very useful in quantization
and segmentation applications.
R. Stasiński and J. Konrad, "DCT-based shape-adaptive
transform for region-oriented image compression and manipulation," in
Workshop on Image Analysis for Multimedia Interactive Services,
(Louvain, Belgium), June 1997, [PDF: 174KB].
In the paper a new DCT-based shape-adaptive transform algorithm is
presented. The transform is derived from the DCT algorithm flowgraph by
substitution of operations in such a way that region and background samples are
not mixed together. The computational complexity of the algorithm is of the
same rank as that of the DCT and significantly lower than that of the
state-of-the-art shape-adaptive transforms. Preliminary experiments show that
the new algorithm performs better than the direct DCT (with extrapolation) and
is only slightly inferior to the approaches of Gilge et al. and of Sikora
and Makai.
R. Stasiński and J. Konrad, "A new approach to generation of
shape-adaptive transforms," in Int. Workshop on Systems, Signals and
Image Process., (Poznań, Poland), pp. 13-16, May 1997, [PDF: 147KB].
In the paper we describe a new approach to generation of orthogonal transforms
that self-adapt to arbitrary shapes. The new algorithms are derived from
flowgraphs of standard fast transform algorithms by a suitable modification of
their substructures. For simplicity we show how to derive a shape-adaptive
transform from the discrete Walsh-Hadamard transform (DWHT) flowgraph. We
compare performance and computational complexity of new algorithms with those
of several well-known approaches. It can be clearly seen that for DCT the
proposed approach gives a very beneficial performance/complexity ratio compared
to other well-known techniques.
J. Konrad and V.-N. Dang, "Coding-oriented video segmentation
inspired by MRF models," in Proc. IEEE Int. Conf. Image
Processing, vol. 1, pp. 909-912, Sept. 1996, [PDF: 242KB].
This paper presents an approach to the segmentation of video sequences that is
inspired by Markov random field (MRF) models and is aimed at region-based video
compression. Two goals of the segmentation algorithm are considered: to assure
a rate-efficient partitioning of video sequences and to provide regions that
are meaningful for human observers ("coding for content"). To address both
issues we extend our earlier work; we incorporate a segmentation complexity
measure to account for the rate allocated to region shape, we use a robust
error criterion to reject outliers in the intensity residual and we incorporate
a temporal consistency constraint to assure the continuity of segmentation in
time. We demonstrate improvements in the segmentation for real
videoconferencing sequences.
C. Stiller and J. Konrad, "A region-adaptive transform based
on a stochastic model," in Proc. IEEE Int. Conf. Image Processing,
vol. 2, pp. 264-267, Oct. 1995, [PDF: 182KB].
This paper is concerned with linear transforms for arbitrarily-shaped image
segments. In contrast to other techniques described in the literature, the
proposed transform is based upon a stochastic model of image covariance within
the considered region. Emerging from a separable stationary Markov model
proposed for rectangular regions, we derive a non-stationary Markov model with
natural boundary conditions. We compute its eigentransform, which is the optimum
linear transform under a broad variety of performance measures. For the special
case of a rectangular region, the method yields the DCT basis
functions. Simulation results for natural imagery are provided.
V.-N. Dang, A.-R. Mansouri, and J. Konrad, "Motion estimation
for region-based video coding," in Proc. IEEE Int. Conf. Image
Processing, vol. 2, pp. 189-192, Oct. 1995, [PDF: 325KB].
Region-based video compression has been a very active research area over the
last few years. It has been viewed as a potential alternative to traditional
schemes suffering from the "blockiness" of image intensities at very low bit
rates. In this paper we present a new approach to region-based representation
and estimation of motion. It is based on the observation that motion boundaries
usually coincide with region boundaries. Thus, we first compute an
intensity-based image partition and use it as an initial step in a 3-step
algorithm: motion estimation for intensity-derived regions, motion-based region
fusion and adjustment of region boundaries. We present experimental results for
standard QCIF images and compare our method with block matching and dense
motion field estimation. We also study the performance loss due to a lossy
transmission of partition information.
J. Konrad, M. Zaremba, G. Chan, and M. Gaudreau, "Parallel
computation of dense motion fields using a Hopfield network," in Proc. Scand. Conf. Image Analysis, SCIA'95, pp. 609-616, June 1995, [PDF: 254KB].
Motion of pixels in time-varying images plays an essential role in video
compression. Therefore, to build practical video coders motion estimation must
be carried out in real time. Usually, simple motion models executed on a
sequential processor achieve that goal; VLSI circuits implementing block
matching are used in MPEG and H.261 coders. An alternative is to use more
complex motion models that can be implemented on a parallel architecture, e.g.,
single-instruction multiple-data (SIMD) system. In this paper, we study a
different approach to the parallelization of motion estimation, an approach
based on neural networks. We formulate the problem in the context of a Markov
random field (MRF) model, derive a cost function for minimization and propose a
solution method using a Hopfield network. We simulate the network on a
sequential processor and compare its performance with a sequential algorithm
based on the Gauss-Newton minimization.
C. Stiller and J. Konrad, "Eigentransforms for region-based
image processing," in Proc. Int. Conf. on Consumer Electronics, pp.
286-287, June 1995, [PDF: 146KB].
Linear transforms such as the DCT are efficient for image compression. While
known transforms that approximate the eigentransform are limited to rectangular
regions, this paper proposes a model for construction of eigentransforms for
arbitrarily-shaped image segments.
J. Konrad, A.-R. Mansouri, E. Dubois, V.-N. Dang, and J.-B.
Chartier, "On motion modeling and estimation for very low bit rate
video coding," in Proc. SPIE Visual Communications and Image
Process., vol. 2501, pp. 262-273, May 1995, [PDF: 598KB].
In video coding at high compression rates, e.g., in very low bit rate coding,
every transmitted bit carries a significant amount of information that is
related either to motion parameters or to intensity residual. As demonstrated
in the SIM-3 coding scheme, a more precise motion model leads to improved
quality of coded images when compared with the H.261 coding standard. In this
paper, we present some of our recent results on the modeling and estimation of
motion for the compression and post-processing of typical videophone
("head-and-shoulders") image sequences. We describe a block-based motion
estimation that attempts to optimize the overall bit budget for intensity
residual, motion and overhead information. We compare simulation results for
this scheme with full-search block matching in the context of the H.261
coding. Then, we discuss a region-based motion estimation that exploits
segmentation maps obtained from an MDL-based (minimum description length)
algorithm. We compare experimentally several algorithms for the compression of
such maps. Finally, we describe motion-compensated interpolation that takes
into account pixel acceleration. We show experimentally a major performance
improvement of the constant-acceleration model over the usual constant-velocity
models. This is a very promising technique for post-processing in the receiver
to improve reconstruction of frames dropped in the transmitter.
L. Bonnaud, C. Labit, and J. Konrad, "Interpolative coding of
image sequences using temporal linking of motion-based segmentation," in
Proc. IEEE Int. Conf. Acoustics Speech Signal Processing, pp.
2265-2268, May 1995, [PDF: 365KB].
This paper presents a new temporal interpolation algorithm based on
segmentation of images into polygonal regions undergoing affine motion. The
goal of this work is to improve upon the block-based interpolation used in MPEG
(B-Frames). In the first part, we describe the region-based framework and the
temporal linking algorithm that jointly provide the segmentation and motion
parameters. In the second part, we present various applications of the proposed
algorithm to temporal interpolation (from interpolation to bidirectional
motion-compensated prediction). We examine one of these schemes in detail,
including the special processing of occlusion areas. We show images
reconstructed from a synthetic image sequence and using the MSE criterion we
compare quality with other schemes.
M. Chahine and J. Konrad, "Motion-compensated interpolation
using trajectories with acceleration," in Proc. SPIE Digital Video
Compression: Algorithms and Technology, vol. 2419, pp. 152-163, Feb.
1995, [PDF: 816KB].
This paper is primarily concerned with motion-compensated interpolation of
video sequences using multiple images. Due to the extended temporal support of
such motion compensation, a linear (constant-velocity) trajectory model is often
inappropriate, for example due to insufficient temporal sampling. Recently, we
have proposed a quadratic (constant-acceleration) trajectory model and a
framework for the computation of its parameters. The approach is based on
Markov random field (MRF) models that lead to a regularized formulation solved
by multiresolution deterministic relaxation. In this paper, we demonstrate
advantages of using accelerated motion over linear trajectories in a plausible
application using natural data. We apply the estimated trajectories to
motion-compensated interpolation over multiple frames of progressive and
interlaced video sequences. The experimental results for "Miss America" (CIF)
and "Femme et arbre" (interlaced) show, respectively, a 4 and 2 dB average
improvement in the PSNR of the reconstruction error when quadratic trajectories
are used instead of the linear ones. It is interesting to note that in "Miss
America" the most significant improvements can be observed in the area of the
mouth and the eyes which are in fact likely to exhibit acceleration. We
envisage an application of the proposed method to post-processing in very low
bit rate video coding.
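The difference between the two trajectory models can be illustrated with a
minimal one-dimensional sketch. It is not the estimation method of the paper
(which relies on MRF models and multiresolution relaxation); the function names
and the sample trajectory below are assumptions made only for illustration.
Velocity and acceleration are recovered by finite differences from three
samples at t = -1, 0, 1, and the position at t = 0.5 is then interpolated under
each model.

```python
# Illustrative sketch only: names and sample values are hypothetical.

def linear_position(x0, v, t):
    """Constant-velocity trajectory: x(t) = x0 + v*t."""
    return x0 + v * t


def quadratic_position(x0, v, a, t):
    """Constant-acceleration trajectory: x(t) = x0 + v*t + 0.5*a*t^2."""
    return x0 + v * t + 0.5 * a * t * t


def fit_quadratic(x_prev, x_curr, x_next):
    """Recover (x0, v, a) from samples taken at t = -1, 0, 1."""
    v = (x_next - x_prev) / 2.0          # central difference
    a = x_next - 2.0 * x_curr + x_prev   # second difference
    return x_curr, v, a


# Samples of an accelerating point: x(t) = 2 + 3t + 2t^2 at t = -1, 0, 1.
samples = (1.0, 2.0, 7.0)
x0, v, a = fit_quadratic(*samples)

# Interpolate the position halfway between the frames at t = 0 and t = 1.
true_mid = 2 + 3 * 0.5 + 2 * 0.5 ** 2
quad_mid = quadratic_position(x0, v, a, 0.5)
lin_mid = linear_position(samples[1], samples[2] - samples[1], 0.5)

print("quadratic error:", abs(quad_mid - true_mid))
print("linear error:   ", abs(lin_mid - true_mid))
```

In this toy case the quadratic model reproduces the intermediate position,
while the linear model between the two nearest frames does not.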
P. Treves and J. Konrad, "Motion estimation and compensation
under varying illumination," in Proc. IEEE Int. Conf. Image
Processing, vol. 1, pp. 373-377, Nov. 1994, [PDF: 343KB].
In this paper we propose a new approach to motion-compensated filtering of
image sequences that contain time-varying illumination. There are two
contributions in this paper. First, we propose a new method for the estimation
of dense 2-D motion that is robust to time-varying illumination often present
in images. We define the structural model that is based on the assumption
of intensity gradient constancy along motion trajectories. This is in contrast
to the usual hypothesis of the intensity constancy. Secondly, we apply the
proposed approach to motion-compensated temporal interpolation. We compare the
image reconstruction error obtained using the new approach with the error
obtained for standard models.
M. Chahine and J. Konrad, "Estimation of trajectories for
accelerated motion from time-varying imagery," in Proc. IEEE Int. Conf. Image Processing, vol. 2, pp. 800-804, Nov. 1994, [PDF: 274KB].
This paper is concerned with the estimation of trajectories for accelerated
motion from image sequences. Unlike many other approaches that assume
linear trajectories, we propose a quadratic model that incorporates both
velocity and acceleration. This model corresponds better to practical
applications, especially when the estimation is performed over several images,
e.g., in motion-compensated processing with extended temporal support. This is
because, over a longer time frame and in the presence of
acceleration, a quadratic trajectory can provide a better intensity
match than a simple displacement. The algorithm for the estimation of dense
accelerated motion fields is formulated in this paper using
regularization and the solution is based on deterministic relaxation
implemented over a pyramid of resolutions. Extensive experimental results for
test images with synthetic motion are presented.
J. Konrad and P. Treves, "Estimation of dense 2-D motion
based on the constancy of intensity gradient," in Signal Process. VII:
Theories and Applications (Proc. Seventh European Signal Process. Conf.), pp. 684-687, Sept. 1994, [PDF: 268KB].
This paper describes a new approach to the estimation of dense 2-D motion from
image sequences. Unlike many other approaches that assume the constancy of
image intensity along motion trajectories, we propose to use a higher order
model that permits a variation of such intensity. We define the structural
model that is based on the assumption of intensity gradient constancy along
motion trajectories. This model has been proposed before, but in
formulations that require exact satisfaction of the intensity gradient
constraint. Due to inherent noise, aliasing, etc. present in images, such a
solution necessitates additional post-processing, for example
smoothing. We propose a different approach that is based on simultaneous
estimation and smoothing. We formulate the problem using regularization
where the assumptions of gradient constancy and of motion smoothness are
combined into a single cost function. We minimize this function by an iterative
method. We demonstrate estimation results for the original and for the
"regularized" approach on natural image sequences.
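One plausible form of such a single cost function is sketched below. The
abstract does not give the formula, so the symbols are assumptions: I is the
image intensity, d(x) the displacement at pixel x, N the set of neighbouring
pixel pairs, and lambda a weight that balances gradient constancy against
motion smoothness.

```latex
\[
E(\mathbf{d}) =
  \sum_{\mathbf{x}}
    \bigl\| \nabla I(\mathbf{x} + \mathbf{d}(\mathbf{x}),\, t+1)
          - \nabla I(\mathbf{x},\, t) \bigr\|^{2}
  + \lambda \sum_{\{\mathbf{x},\mathbf{y}\} \in \mathcal{N}}
    \bigl\| \mathbf{d}(\mathbf{x}) - \mathbf{d}(\mathbf{y}) \bigr\|^{2}
\]
```

The first term penalizes changes of the intensity gradient along the motion
trajectory and the second penalizes differences between neighbouring
displacement vectors, so estimation and smoothing are carried out jointly.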
H. Nicolas, J. Konrad, and C. Labit, "Joint estimation of
motion and illumination variations for coding of image sequences," in
Proc. Scandinavian Conf. Image Analysis, pp. 507-514, May 1993, [PDF: 156KB].
This paper describes a new approach to the problem of motion estimation for the
coding of image sequences. The goal is to obtain an efficient description
(parametrization) of temporal variations between two successive images in a
sequence. To achieve this, we propose to use the standard hypothesis of
luminance constancy along a motion trajectory while simultaneously introducing
a polynomial representation of illumination variations. The estimation process
consists of two iteratively alternating stages: a region-based estimation of
apparent 2D motion parameters and an estimation of 2D illumination
variations. Such an approach reduces the residual reconstruction error after
motion compensation due to improved estimation of motion parameters.
E. Dubois and J. Konrad, "Motion estimation and
motion-compensated filtering of video signals," in Proc. IEEE Int. Conf. Acoustics Speech Signal Processing, pp. 95-98, Apr. 1993, [PDF: 117KB].
This paper is concerned with methods to estimate 2-D motion in time-varying
images for application to motion-compensated filtering. The approach is based
on the minimization of objective functions that can be interpreted as energies
of suitable Markov-Gibbs random fields. A flexible class of cost functions is
described that can be applied in a wide variety of specific applications,
including the estimation of motion trajectories over several image frames. The
issues of minimizing the cost function and applications to motion-compensated
filtering are then briefly addressed.
J. Radecki, J. Konrad, and E. Dubois, "Design of finite
wordlength 2-D IIR filters using simulated annealing," in Signal
Process. VI: Theories and Applications (Proc. Sixth European Signal
Process. Conf.), pp. 953-956, Aug. 1992, [PDF: 188KB].
This paper proposes a new approach to the design of two-dimensional (2-D)
infinite impulse response (IIR) filters with finite precision coefficients. An
objective function is proposed which combines magnitude, phase, step response
and stability errors. This function, being multidimensional and, in general,
non-convex, is minimized using simulated annealing. Development of this
method constitutes the first step in a feasibility study of the application of
2-D IIR filters to the processing of video signals. Initial results on the
design of low-pass filters are very encouraging and compare favourably with
similar finite impulse response (FIR) designs.
J. Konrad, "Use of colour in gradient-based estimation of
dense two-dimensional motion," in Proc. Conf. Vision Interface
VI'92, pp. 103-109, May 1992, [PDF:
539KB].
This paper presents a gradient-based approach to the multi-constraint
estimation of dense two-dimensional (2-D) motion. The formulation is based on
feature invariance along motion trajectories and applies a motion smoothness
constraint to reduce ill-posedness. It permits the use of various image
features as the input, for example intensity and colours, or sub-bands of a
spectral decomposition. The proposed cost function is minimized using a
sequence of quadratic approximations of the matching error and solving the
resulting linear system by deterministic relaxation. The proposed algorithm is
a generalization of the Horn and Schunck algorithm to the case of vector data.
Results of application of the proposed technique to the estimation of 2-D
motion from TV images are shown. The obtained motion fields are applied to
motion-compensated temporal interpolation resulting in significant but
localized improvements.
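The idea of extending a Horn and Schunck style update to vector data can be
sketched for a single pixel as follows. This is a textbook-style derivation
under stated assumptions, not the algorithm of the paper: each feature channel
k contributes a linearized constraint g_k . d + gt_k = 0, and the local cost
sum_k (g_k . d + gt_k)^2 + lam * |d - d_bar|^2 is minimized in closed form,
where d_bar is the average displacement of the neighbours. The function name
and all parameters are hypothetical.

```python
# Illustrative sketch only: a per-pixel update for multi-channel
# (vector-valued) data; names and the exact cost are assumptions.

def update_pixel(grads, temporal, d_bar, lam):
    """Minimize sum_k (gx_k*u + gy_k*v + gt_k)^2 + lam*|(u, v) - d_bar|^2.

    grads    -- list of spatial gradients (gx_k, gy_k), one per channel
    temporal -- list of temporal derivatives gt_k, one per channel
    d_bar    -- neighbourhood average of the displacement, (u_bar, v_bar)
    lam      -- smoothness weight (lam >= 0)
    """
    a11 = a12 = a22 = b1 = b2 = 0.0
    for (gx, gy), gt in zip(grads, temporal):
        a11 += gx * gx
        a12 += gx * gy
        a22 += gy * gy
        b1 += gx * gt
        b2 += gy * gt

    # Normal equations: (A + lam*I) d = lam*d_bar - b
    m11, m22 = a11 + lam, a22 + lam
    r1 = lam * d_bar[0] - b1
    r2 = lam * d_bar[1] - b2
    det = m11 * m22 - a12 * a12
    if det == 0.0:
        raise ValueError("singular system: increase lam or add channels")
    u = (m22 * r1 - a12 * r2) / det
    v = (m11 * r2 - a12 * r1) / det
    return u, v


# Two channels with non-parallel gradients determine both motion components
# even without smoothing (lam = 0), unlike the single-channel case.
print(update_pixel([(1.0, 0.0), (0.0, 1.0)], [-1.0, -2.0], (0.0, 0.0), 0.0))
```

With a single channel the 2x2 matrix A has rank one, so the smoothness term is
needed to obtain a unique solution; additional channels with differently
oriented gradients reduce this dependence.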
J. Konrad, J. Radecki, and E. Dubois, "On the design of finite
wordlength IIR filters for video applications," in Proc. IEEE Int. Conf. Acoustics Speech Signal Processing, vol. 4, pp. 341-344, Mar.
1992, [PDF: 135KB].
This paper addresses the problem of designing finite precision one-dimensional
(1-D) infinite impulse response (IIR) digital filters for video processing. The
design algorithm is based on simultaneous minimization of magnitude, phase and
stability errors in a discrete space of solutions using simulated
annealing. It is demonstrated that the approach results in filters
characterized by a substantially reduced non-linearity of the phase response in
filter pass band, which is critical in any video processing application. To
reduce image degradations due to ripples of the filter step response, another
error term is introduced into the cost function. It is demonstrated that this
additional term permits significant reduction of step response overshoots, and
thus the visibility of degradations in a filtered image. The designed IIR
filters are compared with their finite impulse response (FIR) counterparts in
terms of characteristic parameters as well as distortion visibility in
processed images.
J. Radecki, J. Konrad, and E. Dubois, "Design of finite
wordlength IIR filters with prescribed magnitude, group delay and stability
properties using simulated annealing," in Proc. IEEE Int. Conf. Acoustics Speech Signal Processing, pp. 1637-1640, May 1991, [PDF: 145KB].
This paper investigates the problem of designing finite precision
one-dimensional (1-D) infinite impulse response (IIR) filters with prescribed
magnitude, phase and stability constraints. The design problem is formulated as
the minimization of a cost function incorporating these conflicting
requirements. The first two elements of the cost function express magnitude and
group delay errors between the desired and the actual frequency responses of a
filter, while the third one is related to its stability margin. This cost
function is minimized using simulated annealing based on the
Metropolis algorithm. Examples of several finite wordlength filters designed
by the above method are presented and compared with Chebyshev and elliptic
filters with rounded coefficients.
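A minimal sketch of simulated annealing with Metropolis acceptance over
quantized filter coefficients is given below. It is a toy illustration, not the
design procedure of the paper: the filter is a first-order section
H(z) = b / (1 - a z^-1), the cost contains only a magnitude error and a hard
stability check (no group delay term or stability margin), and the wordlength,
cooling schedule and all names are assumptions.

```python
# Illustrative sketch only: toy filter, cost and schedule are hypothetical.
import cmath
import math
import random

STEP = 2 ** -4  # assumed quantization step (4 fractional bits)


def magnitude(b_int, a_int, omega):
    """|H(e^{j*omega})| for H(z) = b / (1 - a z^-1) with quantized b, a."""
    b, a = b_int * STEP, a_int * STEP
    return abs(b / (1 - a * cmath.exp(-1j * omega)))


def cost(coeffs, target, freqs):
    """Squared magnitude error; unstable filters get infinite cost."""
    b_int, a_int = coeffs
    if abs(a_int * STEP) >= 1.0:  # pole on or outside the unit circle
        return float("inf")
    return sum((magnitude(b_int, a_int, w) - target(w)) ** 2 for w in freqs)


def anneal(initial, target, freqs, t0=1.0, alpha=0.95, steps=200, seed=0):
    """Search integer coefficient vectors; `initial` must be stable."""
    rng = random.Random(seed)
    current, current_cost = initial, cost(initial, target, freqs)
    best, best_cost = current, current_cost
    temp = t0
    for _ in range(steps):
        # Perturb one coefficient by a single quantization level.
        cand = list(current)
        cand[rng.randrange(len(cand))] += rng.choice((-1, 1))
        cand = tuple(cand)
        cand_cost = cost(cand, target, freqs)
        delta = cand_cost - current_cost
        # Metropolis rule: always accept improvements, sometimes accept worse.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, current_cost = cand, cand_cost
            if current_cost < best_cost:
                best, best_cost = current, current_cost
        temp *= alpha  # geometric cooling
    return best, best_cost


# Toy low-pass target on a coarse frequency grid.
freqs = [k * math.pi / 16 for k in range(17)]
target = lambda w: 1.0 if w <= math.pi / 4 else 0.0
best, best_cost = anneal((8, 0), target, freqs)
print("best quantized (b, a):", best[0] * STEP, best[1] * STEP)
```

Because the search moves directly on the grid of representable coefficients,
no rounding step is needed after the optimization, which is the main contrast
with designing in full precision and rounding afterwards.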
E. Dubois and J. Konrad, "Review of techniques for motion
estimation and motion compensation," in Proc. Int. Coll. Advanced
Television Syst., pp. 3B.3.1-3B.3.19, June 1990.
J. Radecki, J. Konrad, and E. Dubois, "A comparison of
simulated annealing and N-step Newton methods for designing 1-D and 2-D
finite wordlength FIR filters," in Proc. Canadian Conf. Electr. Comp. Eng., pp. 53.3.1-53.3.4, Sept. 1990.
J. Konrad and E. Dubois, "A comparison of stochastic and
deterministic solution methods in Bayesian estimation of 2-D motion," in
Proc. European Conf. Computer Vision, pp. 149-160, Apr. 1990.
J. Konrad and E. Dubois, "Use of colour information in
Bayesian estimation of 2-D motion," in Proc. IEEE Int. Conf. Acoustics Speech Signal Processing, pp. 2205-2208, Apr. 1990.
This paper is concerned with the extension of previous work on the Bayesian
estimation of 2-D motion from image sequences by incorporating the colour cue
into the estimation process. Instead of scalar image intensity, a
three-component vector representation of colour images is used, thus allowing
Y-C1-C2, RGB or other formats. The Maximum a Posteriori Probability estimation
is shown to result in a three-term energy minimization. A white Gaussian noise
model is used
for the displaced pel differences of each image component, and a coupled
vector-binary Markov random field model is used for displacement and
discontinuity fields. The resulting criterion is optimized using the method of
discrete state space simulated annealing. Improvements in the quality of
estimated displacement fields due to additional colour information are
demonstrated through several experimental results.
J. Konrad and E. Dubois, "Bayesian estimation of discontinuous
motion in images using simulated annealing," in Proc. Conf. Vision
Interface VI'89, pp. 51-60, June 1989.
J. Konrad and E. Dubois, "Multigrid Bayesian estimation of
image motion fields using stochastic relaxation," in Proc. IEEE Int. Conf. Computer Vision, pp. 354-362, Dec. 1988.
J. Konrad, "Stochastic estimation of motion in television
images," in Proc. 3-rd Conf. on Science and Technology "Signal
Processing in Telecommunications, Control and Inspection", Poland, Sept.
1988 (in Polish).
J. Konrad and E. Dubois, "Estimation of image motion fields:
Bayesian formulation and stochastic solution," in Proc. IEEE Int. Conf. Acoustics Speech Signal Processing, pp. 1072-1075, Apr. 1988.