Human-Aligned Bounding Boxes from Overhead Fisheye cameras dataset (HABBOF)
Although public people-detection datasets for fisheye images exist, they are annotated either by the point location of a person’s head or by a bounding box around the person’s body that is aligned with the image boundaries. However, due to the radial geometry of fisheye images, people standing under an overhead fisheye camera appear radially aligned. Their ground-truth bounding boxes should therefore also be radially aligned. This dataset addresses this issue: all ground-truth bounding boxes are aligned with a person’s body rather than with the image boundaries.
The Human-Aligned Bounding Boxes from Overhead Fisheye cameras (HABBOF) dataset was developed at the Visual Information Processing (VIP) Laboratory at Boston University and published in September 2019. The dataset contains 4 videos recorded by overhead-mounted fisheye cameras in two different rooms (a computer lab and a small conference room), with annotations for 5,837 frames in total. In these videos, 3 or 4 people perform daily activities such as standing, walking, sitting, and writing on a whiteboard. In some videos, the lights are turned on and off and some furniture is moved, which increases realism and the level of difficulty for tasks such as people detection and tracking. More detailed information about each video is provided in the table below.
|Video|Meeting 1|Meeting 2|Lab 1|Lab 2|
|No. of frames|1,119|1,121|1,792|1,805|
|Resolution|2,048 x 2,048|2,048 x 2,048|2,048 x 2,048|2,048 x 2,048|
|Camera|Axis M3057 PLVE|Axis M3057 PLVE|Geovision
[Remaining table rows are only partially recoverable. Surviving cell fragments: "in image center"; "and at periphery"; "at image periphery"; "and walking at image periphery"; "Close proximity between people"; "Time-varying global illumination".]
The dataset consists of 5,837 frame-annotation pairs:
- original video frame: <fr_id>.jpg
- annotation text file: <fr_id>.txt
where <fr_id> is a 6-digit number.
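The pairing described above can be sketched in a few lines of Python. This is only an illustration: the function name `list_pairs` and the assumption that the `.jpg` and `.txt` files of a video sit side by side in one directory are hypothetical and not taken from the dataset documentation.

```python
from pathlib import Path
import re

# Frame IDs are 6-digit numbers, per the dataset description.
FRAME_ID = re.compile(r"\d{6}")


def list_pairs(video_dir):
    """Return sorted (frame_path, annotation_path) pairs found in video_dir.

    Assumption (not from the dataset page): frames and annotations are
    stored next to each other in the same directory.
    """
    video_dir = Path(video_dir)
    pairs = []
    for jpg in sorted(video_dir.glob("*.jpg")):
        if not FRAME_ID.fullmatch(jpg.stem):
            continue  # skip files that do not follow the <fr_id> pattern
        txt = jpg.with_suffix(".txt")
        if txt.exists():  # keep only frames that have an annotation file
            pairs.append((jpg, txt))
    return pairs
```

Calling `list_pairs("path/to/video")` on a directory laid out this way would return one tuple per annotated frame, ordered by frame ID.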
Each line in the annotation text file describes a single bounding box as [object_class, x, y, w, h, R], where “object_class” is the class of the object (in the current version of the dataset, it is always “person”), “x” and “y” are the coordinates of the center of the bounding box in pixels, measured from the top-left image corner at [0,0], “w” and “h” are its width and height in pixels, and “R” is its clockwise rotation angle, in degrees, from the vertical axis pointing up. To avoid ambiguity, the bounding-box width is constrained to be less than or equal to its height (w<=h), and the rotation angle is restricted to the range from -90 to +90 degrees. An example of one ground-truth annotation is shown in the table and image below.
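As an illustration of this format, the following Python sketch parses one annotation line and computes the four corner points of the rotated box. Several things here are assumptions rather than facts from the dataset documentation: the field separator (the sketch accepts whitespace and/or commas, with optional square brackets), and the names `Box`, `parse_line`, and `corners`, which are hypothetical. The rotation follows the stated convention: image y grows downward, and a positive R turns the box clockwise as seen in the image.

```python
import math
import re
from dataclasses import dataclass


@dataclass
class Box:
    object_class: str
    x: float  # center x in pixels, origin at the top-left image corner
    y: float  # center y in pixels
    w: float  # width in pixels (w <= h in the dataset)
    h: float  # height in pixels
    R: float  # clockwise rotation from the upward vertical, in degrees


def parse_line(line):
    """Parse one annotation line; the separator handling is an assumption."""
    tokens = [t for t in re.split(r"[,\s]+", line.strip().strip("[]")) if t]
    if len(tokens) != 6:
        raise ValueError(f"expected 6 fields, got {len(tokens)}: {line!r}")
    cls, *nums = tokens
    return Box(cls, *(float(n) for n in nums))


def corners(box):
    """Return the 4 corners as (x, y) tuples, starting from the corner that
    is top-left when R = 0 and proceeding clockwise."""
    t = math.radians(box.R)
    c, s = math.cos(t), math.sin(t)
    half_w, half_h = box.w / 2, box.h / 2
    offsets = [(-half_w, -half_h), (half_w, -half_h),
               (half_w, half_h), (-half_w, half_h)]
    # With y pointing down, [[c, -s], [s, c]] is a clockwise rotation
    # as seen in the image.
    return [(box.x + dx * c - dy * s, box.y + dx * s + dy * c)
            for dx, dy in offsets]
```

For example, `corners(parse_line("person 100 100 20 40 0"))` yields the axis-aligned corners (90, 80), (110, 80), (110, 120), (90, 120). With R = 90, the same box lies on its side, and its first corner moves to approximately (120, 90).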
You may use this dataset for non-commercial purposes. If you publish any work reporting results using this dataset, please cite the following paper:
S. Li, M.O. Tezcan, P. Ishwar, and J. Konrad, “Supervised people counting using an overhead fisheye camera”, in Proc. IEEE Intern. Conf. on Advanced Video and Signal-Based Surveillance (AVSS), Sep. 2019.
To access the download page, please complete the form below (tested only in Chrome).
Please contact [mtezcan] at [bu] dot [edu] if you have any questions.
The development of this dataset was supported in part by the Advanced Research Projects Agency – Energy (ARPA-E), within the Department of Energy, under agreement DE-AR0000944.
We would also like to thank students at Boston University for their participation in the recording and annotation of our dataset.