Human-Aligned Bounding Boxes from Overhead Fisheye cameras dataset (HABBOF)

Motivation

Although public people-detection datasets for fisheye images exist, they are annotated either by the point location of a person’s head or by a bounding box around the person’s body that is aligned with the image boundaries. However, due to the radial geometry of fisheye images, people standing under an overhead fisheye camera appear radially aligned, so their ground-truth bounding boxes should also be radially aligned. This dataset addresses this issue: all ground-truth bounding boxes are aligned with a person’s body rather than with the image boundaries.

Description

The Human-Aligned Bounding Boxes from Overhead Fisheye cameras (HABBOF) dataset was developed at the Visual Information Processing (VIP) Laboratory at Boston University and published in September 2019. The dataset contains 4 videos recorded by overhead-mounted fisheye cameras in two different rooms (a computer lab and a small conference room), together with annotations for 5,837 frames in total. In these videos, 3 or 4 people perform daily activities such as standing, walking, sitting, and writing on a whiteboard. In some videos, lights are turned on and off and some furniture is moved, which increases realism and the level of difficulty for tasks such as people detection and tracking. More detailed information about each video is provided in the table below.

Video	Meeting 1	Meeting 2	Lab 1	Lab 2
Scenario	Conference room	Conference room	Computer lab	Computer lab
Max. No. of people	3	3	4	4
No. of frames	1,119	1,121	1,792	1,805
Resolution and frame rate	2,048 x 2,048 at 30 Hz	2,048 x 2,048 at 30 Hz	2,048 x 2,048 at 30 Hz	2,048 x 2,048 at 12 Hz
Camera	Axis M3057 PLVE	Axis M3057 PLVE	Geovision GV-FER12203	Geovision GV-FER12203
Challenges	Walking activity in image center and at periphery	Walking activity at image periphery	Strong occlusions; complex poses and walking at image periphery; spatially-nonuniform illumination	Close proximity between people; time-varying global illumination; spatially-nonuniform illumination

Data Format

The dataset consists of 5,837 frame-annotation pairs:

original video frame: <fr_id>.jpg
annotation text file: <fr_id>.txt

where <fr_id> is a 6-digit number.
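
The pairing described above can be sketched in Python as follows. This is only a minimal illustration: it assumes that each frame and its annotation file sit in the same folder, which the description does not state, and the helper name list_frame_pairs is hypothetical.

```python
import re
from pathlib import Path

# A frame id is exactly six ASCII digits, e.g. "000123".
FRAME_ID = re.compile(r"[0-9]{6}")


def list_frame_pairs(folder):
    """Return (image_path, annotation_path) pairs that share a frame id.

    Assumption: <fr_id>.jpg and <fr_id>.txt are stored side by side in
    `folder`; adjust the lookup if the download is organized differently.
    """
    folder = Path(folder)
    pairs = []
    for image_path in sorted(folder.glob("*.jpg")):
        # Skip files whose name is not a 6-digit frame id.
        if not FRAME_ID.fullmatch(image_path.stem):
            continue
        annotation_path = image_path.with_suffix(".txt")
        # Keep the frame only if its annotation file is present.
        if annotation_path.exists():
            pairs.append((image_path, annotation_path))
    return pairs
```

Frames without a matching annotation file are skipped rather than reported, which keeps the sketch short; a stricter loader might raise an error instead.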

Each line in the annotation text file describes a single bounding box as [object_class, x, y, w, h, R], where “object_class” is the class of the object (in the current version of the dataset, it is always “person”), “x” and “y” are the coordinates of the center of the bounding box in pixels, measured from the top-left image corner at [0,0], “w” and “h” are its width and height in pixels, and “R” is its clockwise rotation angle from the vertical axis pointing up, in degrees. To avoid ambiguity, the bounding-box width is constrained to be less than or equal to its height (w<=h), and the rotation angle is constrained to lie between -90 and +90 degrees. An example of one ground-truth annotation is shown in the table and image below.

Class name	x	y	w	h	R
person	512	396	107	213	-41
person	980	229	89	198	0
person	1587	461	97	190	45
person	929	1601	220	316	61
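
As an illustration of the format above, the following Python sketch parses one annotation line and computes the four corners of the rotated box. It rests on stated assumptions: fields are taken to be separated by whitespace and/or commas (the exact delimiter is not specified here), and the names RotatedBox, parse_annotation_line, box_corners, and satisfies_constraints are hypothetical helpers, not part of the dataset.

```python
import math
from typing import NamedTuple


class RotatedBox(NamedTuple):
    object_class: str
    x: float  # center x in pixels, origin at the top-left image corner
    y: float  # center y in pixels, y axis pointing down
    w: float  # width in pixels
    h: float  # height in pixels
    r: float  # clockwise rotation from the upward vertical, in degrees


def parse_annotation_line(line):
    # Assumption: fields are separated by whitespace and/or commas.
    fields = line.replace(",", " ").split()
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}: {line!r}")
    x, y, w, h, r = (float(value) for value in fields[1:])
    return RotatedBox(fields[0], x, y, w, h, r)


def box_corners(box):
    """Return the corners in the order top-left, top-right, bottom-right,
    bottom-left of the unrotated box, after rotating about the center."""
    theta = math.radians(box.r)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    half_w, half_h = box.w / 2, box.h / 2
    offsets = ((-half_w, -half_h), (half_w, -half_h),
               (half_w, half_h), (-half_w, half_h))
    # With y pointing down, this matrix turns the box clockwise on screen.
    return [(box.x + dx * cos_t - dy * sin_t,
             box.y + dx * sin_t + dy * cos_t) for dx, dy in offsets]


def satisfies_constraints(box):
    # Checks the stated conventions: w <= h and -90 <= R <= +90 degrees.
    return box.w <= box.h and -90 <= box.r <= 90
```

For R = 0 the corners reduce to the usual axis-aligned box, x ± w/2 and y ± h/2; for example, the second row of the table gives corners at x = 935.5 and 1024.5, y = 130 and 328.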

Dataset Download

You may use this dataset for non-commercial purposes. If you publish any work reporting results using this dataset, please cite the following paper:

S. Li, M.O. Tezcan, P. Ishwar, and J. Konrad, “Supervised people counting using an overhead fisheye camera”, in Proc. IEEE Intern. Conf. on Advanced Video and Signal-Based Surveillance (AVSS), Sep. 2019.

To access the download page, please complete the form below (tested only in Chrome).

HABBOF Download Form (required fields: Name, Institution, Email)
Contact

Please contact [mtezcan] at [bu] dot [edu] if you have any questions.

Acknowledgements

The development of this dataset was supported in part by the Advanced Research Projects Agency – Energy (ARPA-E), within the Department of Energy, under agreement DE-AR0000944.

We would also like to thank students at Boston University for their participation in the recording and annotation of our dataset.