Occupancy sensing using overhead fisheye cameras
One data modality leveraged by COSSY for indoor people counting is the output of an overhead, high-resolution RGB camera equipped with a fisheye lens (e.g., with a 360-deg by 180-deg field of view; an example is shown on the right).
Reliable detection of people in such images poses two challenges. First, if the camera’s optical axis is orthogonal to the room’s floor, standing people appear oriented radially with respect to the center of the field of view (FOV). This is unlike images captured by a side-mounted, standard surveillance camera (e.g., mounted high on a wall), in which standing people usually appear aligned with the image’s vertical axis. Second, a fisheye lens exhibits significant geometric distortions at the FOV periphery (e.g., objects become compressed). This effect is absent or minimal in standard surveillance cameras. Because of these challenges, people-detection methods developed for side-mounted, standard surveillance cameras perform poorly on overhead, fisheye cameras. To address this, we have developed several approaches to people detection in overhead, fisheye cameras:
- People counting using overhead fisheye cameras [Li et al., AVSS-2019]
This method applies YOLO (version 3) to a large, center-top window “under” which the image is rotated in 15-deg increments, and the per-rotation detections are combined via post-processing (a simplified sketch of this rotate-detect-merge idea appears after this list). The source code for this method is available for download from the link above.
- Rotation-Aware People Detection in overhead fisheye images: RAPiD [Duan et al., CVPRW-2020]
This method is an end-to-end solution that extends YOLO version 3 by adding a novel loss function for the bounding-box rotation angle and by suitably modifying the network architecture.
- RAPiD+REPP, RAPiD+FA and RAPiD+FGFA [Tezcan et al., WACV-2022]
These three methods are spatio-temporal extensions of RAPiD that enforce temporal coherence of people detections across neighboring video frames.
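To make the rotate-and-detect idea of [Li et al., AVSS-2019] concrete, below is a minimal sketch of the overall loop, not the released implementation. It assumes a generic `detect_people` callable (a stand-in for YOLO v3) that returns upright-person boxes for an image window, and it replaces the paper’s post-processing with simple greedy center-distance merging; the window size and merging radius are illustrative assumptions.

```python
# Sketch of the rotate-and-detect idea: rotate the fisheye image in 15-deg
# increments "under" a fixed center-top window, detect upright people in that
# window, map detections back to the original image, and merge duplicates.
import numpy as np
import cv2

def rotate_image(img, angle_deg):
    """Rotate the image about its center so a different radial sector becomes upright."""
    h, w = img.shape[:2]
    M = cv2.getRotationMatrix2D((w / 2, h / 2), angle_deg, 1.0)
    return cv2.warpAffine(img, M, (w, h)), M

def rotate_point_back(x, y, M):
    """Map a point found in the rotated image back to original-image coordinates."""
    M_full = np.vstack([M, [0.0, 0.0, 1.0]])     # 2x3 affine -> 3x3
    x0, y0, _ = np.linalg.inv(M_full) @ np.array([x, y, 1.0])
    return x0, y0

def detect_over_rotations(img, detect_people, window_frac=0.5, step_deg=15):
    """Run an upright-person detector on the center-top window of each rotation,
    then merge nearby detections into a single person each."""
    h, w = img.shape[:2]
    win_h, win_w = int(h * window_frac), int(w * window_frac)
    x_off = (w - win_w) // 2                     # window is centered horizontally, at the top
    candidates = []
    for angle in range(0, 360, step_deg):
        rot, M = rotate_image(img, angle)
        window = rot[0:win_h, x_off:x_off + win_w]
        for (cx, cy, bw, bh, score) in detect_people(window):
            # window coords -> rotated-image coords -> original-image coords
            gx, gy = rotate_point_back(cx + x_off, cy, M)
            candidates.append((gx, gy, score))
    return merge_by_distance(candidates, radius=40)

def merge_by_distance(candidates, radius):
    """Greedy merging: keep the highest-scoring detection within each radius."""
    kept = []
    for gx, gy, score in sorted(candidates, key=lambda c: -c[2]):
        if all((gx - kx) ** 2 + (gy - ky) ** 2 > radius ** 2 for kx, ky, _ in kept):
            kept.append((gx, gy, score))
    return kept                                  # len(kept) is the people count
```

The reason this works is the radial geometry described above: each 15-deg rotation brings a different sector of the fisheye image into an upright pose under the fixed window, so a detector trained on upright people can be reused without retraining.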
In large spaces, a single overhead fisheye camera may not be sufficient for reliable people counting, since people far from the camera may not be reliably detected by the above algorithms (they appear too small; see the video above). Employing multiple cameras can help address this difficulty; however, a person may then appear in the fields of view of several cameras and be counted multiple times, resulting in overcounting. To resolve this, person re-identification is needed so that each person is counted only once despite appearing in the view of multiple cameras. While person re-identification has been studied extensively for standard rectilinear cameras, it remains largely unexplored for overhead fisheye cameras, especially in the specific scenario of overlapping fields of view. This is our ongoing work, which we have explored from the standpoint of both algorithms and datasets:
- Geometry-Based Person Re-Identification in Fisheye Stereo [Bone et al., AVSS-2021]
A method that uses a person’s location, rather than appearance, to perform re-identification between two calibrated overhead fisheye cameras (a simplified location-matching sketch appears after this list). The paper also introduces a new fisheye-camera calibration method and a novel automated approach to calibration-data collection.
- Fisheye Re-Identification Dataset with Annotations (FRIDA) [Cokbas et al., AVSS-2022]
A new dataset for validating person re-identification algorithms for overhead fisheye cameras, together with a benchmark comparison of the geometry-based re-identification algorithm (above) against a state-of-the-art, appearance-based deep-learning algorithm.
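As a rough illustration of location-based (rather than appearance-based) re-identification across two overhead fisheye cameras with overlapping fields of view, the sketch below maps each detection to a floor coordinate and merges detections that land close together. It assumes an ideal equidistant fisheye projection (r = f·θ), a known mounting height and camera floor position, and a fixed distance gate; these are illustrative assumptions, not the calibration model or matching procedure of [Bone et al., AVSS-2021].

```python
# Sketch of location-based person matching between two overhead fisheye cameras.
# Each image detection is projected to a floor coordinate; detections from the
# two cameras that land within a distance gate of each other are treated as
# the same person, so that person is counted only once.
import math

def image_point_to_floor(u, v, cam):
    """Map an image point to a floor (x, y) position in world coordinates.
    cam: dict with image center (cx, cy), focal length f (pixels/radian),
    mounting height (m), and the camera's own floor position (x, y).
    Assumes an ideal equidistant projection r = f * theta."""
    dx, dy = u - cam["cx"], v - cam["cy"]
    r = math.hypot(dx, dy)
    theta = r / cam["f"]                         # angle from the downward optical axis
    if theta >= math.pi / 2:                     # at or beyond the horizon: unusable
        return None
    rho = cam["height"] * math.tan(theta)        # radial floor distance from the camera
    phi = math.atan2(dy, dx)
    return (cam["x"] + rho * math.cos(phi),
            cam["y"] + rho * math.sin(phi))

def count_people(dets_a, dets_b, cam_a, cam_b, gate_m=0.75):
    """Count people seen by two cameras with overlapping fields of view.
    dets_a / dets_b: lists of (u, v) detection centers in each camera's image.
    A pair of detections closer than gate_m meters on the floor is one person."""
    pts_a = [p for p in (image_point_to_floor(u, v, cam_a) for u, v in dets_a) if p]
    pts_b = [p for p in (image_point_to_floor(u, v, cam_b) for u, v in dets_b) if p]
    used_b, matches = set(), 0
    for xa, ya in pts_a:                         # greedy nearest-neighbor matching
        best, best_d = None, gate_m
        for j, (xb, yb) in enumerate(pts_b):
            d = math.hypot(xa - xb, ya - yb)
            if j not in used_b and d < best_d:
                best, best_d = j, d
        if best is not None:
            used_b.add(best)
            matches += 1
    # matched pairs are the same person; everyone else is unique to one view
    return len(pts_a) + len(pts_b) - matches
```

With such a mapping, each cross-camera match contributes a single person to the occupancy count, which directly addresses the overcounting problem described above; the published method handles the calibration and matching far more carefully than this sketch.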