Human-Aligned Bounding Boxes from Overhead Fisheye cameras dataset (HABBOF)

Motivation

Although there exist public people-detection datasets for fisheye images, they are annotated either by the point location of a person’s head or by a bounding box around a person’s body aligned with the image boundaries. However, due to the radial geometry of fisheye images, people standing under an overhead fisheye camera appear radially aligned, so their ground-truth bounding boxes should also be radially aligned. This dataset addresses this issue: all ground-truth bounding boxes are aligned with the person’s body rather than with the image boundaries.

Description

The Human-Aligned Bounding Boxes from Overhead Fisheye cameras (HABBOF) dataset was developed at the Visual Information Processing (VIP) Laboratory at Boston University and published in September 2019. The dataset contains 4 videos recorded by overhead-mounted fisheye cameras in two different rooms (a computer lab and a small conference room), with annotations for 5,837 frames in total. In these videos, 3 or 4 people perform daily activities such as standing, walking, sitting, and writing on a whiteboard. In some videos, lights are turned on and off and some furniture is moved, which increases realism and the level of difficulty for tasks such as people detection and tracking. More detailed information about each video is provided in the table below.

Video               Meeting 1        Meeting 2        Lab 1                  Lab 2
Scenario            Conference room  Conference room  Computer lab           Computer lab
Max. no. of people  3                3                4                      4
No. of frames       1,119            1,121            1,792                  1,805
Resolution          2,048 x 2,048    2,048 x 2,048    2,048 x 2,048          2,048 x 2,048
Frame rate          30 Hz            30 Hz            30 Hz                  12 Hz
Camera              Axis M3057-PLVE  Axis M3057-PLVE  Geovision GV-FER12203  Geovision GV-FER12203

Challenges:
  • Meeting 1: walking activity in the image center and at the periphery
  • Meeting 2: walking activity at the image periphery; strong occlusions
  • Lab 1: complex poses and walking at the image periphery; spatially-nonuniform illumination
  • Lab 2: close proximity between people; time-varying global illumination; spatially-nonuniform illumination


Data Format

The dataset consists of 5,837 frame-annotation pairs:

  • original video frame: <fr_id>.jpg
  • annotation text file: <fr_id>.txt

where <fr_id> is a 6-digit number.
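As an illustration, a minimal Python sketch for pairing frames with their annotation files is shown below; the flat folder layout and the helper name frame_annotation_pairs are assumptions made for this example, not part of the dataset specification.

import glob
import os

def frame_annotation_pairs(root):
    """Yield (frame, annotation) file paths, assuming <fr_id>.jpg and
    <fr_id>.txt sit side by side in one folder, e.g. 000001.jpg and
    000001.txt (an assumed layout, not a dataset guarantee)."""
    for txt in sorted(glob.glob(os.path.join(root, "*.txt"))):
        jpg = os.path.splitext(txt)[0] + ".jpg"
        if os.path.exists(jpg):  # keep only complete frame-annotation pairs
            yield jpg, txt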

Each line in the annotation text file describes a single bounding box as [object_class, x, y, w, h, R], where “object_class” is the class of the object (in the current version of the dataset, it is always “person”), “x” and “y” are the pixel coordinates of the center of the bounding box measured from the top-left image corner at [0,0], “w” and “h” are its width and height in pixels, and “R” is its clockwise rotation angle, in degrees, from the vertical axis pointing up. In order to avoid ambiguity, the bounding-box width is constrained to be less than or equal to its height (w <= h), and the rotation angle is constrained to the range -90 to +90 degrees. An example of ground-truth annotations for one frame is shown in the table below.

Class name  x     y     w    h    R
person      512   396   107  213  -41
person      980   229   89   198  0
person      1587  461   97   190  45
person      929   1601  220  316  61
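For illustration, the following minimal Python sketch parses such an annotation file and converts each rotated box to its four corner points. It assumes whitespace-separated fields in the [object_class, x, y, w, h, R] order described above; the function names load_annotations and box_corners are hypothetical, not part of the dataset.

import math

def load_annotations(path):
    """Parse one annotation file, assuming whitespace-separated fields
    in the order: object_class x y w h R."""
    boxes = []
    with open(path) as f:
        for line in f:
            parts = line.split()
            if len(parts) != 6:
                continue  # skip blank or malformed lines
            object_class = parts[0]
            x, y, w, h, r = map(float, parts[1:])
            boxes.append((object_class, x, y, w, h, r))
    return boxes

def box_corners(x, y, w, h, r_deg):
    """Return the four corners of a box centered at (x, y) with width w
    and height h, rotated clockwise by r_deg degrees from the vertical
    axis, in image coordinates (origin at top-left, y pointing down)."""
    theta = math.radians(r_deg)
    c, s = math.cos(theta), math.sin(theta)
    # In image coordinates (y axis down), a visually clockwise rotation
    # is applied with the matrix [[c, -s], [s, c]].
    return [(x + dx * c - dy * s, y + dx * s + dy * c)
            for dx, dy in ((-w / 2, -h / 2), (w / 2, -h / 2),
                           (w / 2, h / 2), (-w / 2, h / 2))]

For example, the first row above (R = -41) describes a box tilted 41 degrees counterclockwise from upright.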

Dataset Download

You may use this dataset for non-commercial purposes. If you publish any work reporting results using this dataset, please cite the following paper:

S. Li, M. O. Tezcan, P. Ishwar, and J. Konrad, “Supervised people counting using an overhead fisheye camera”, in Proc. IEEE Int. Conf. on Advanced Video and Signal-Based Surveillance (AVSS), Sep. 2019.

To access the download page, please complete the form below (tested only in Chrome).

HABBOF Download Form

Contact

Please contact [mtezcan] at [bu] dot [edu] if you have any questions.

Acknowledgements

The development of this dataset was supported in part by the Advanced Research Projects Agency-Energy (ARPA-E), within the U.S. Department of Energy, under agreement DE-AR0000944.

We would also like to thank students at Boston University for their participation in the recording and annotation of our dataset.