Challenging Events for Person Detection from Overhead Fisheye images (CEPDOF)

Motivation

In September 2019, we published HABBOF, the first people-detection dataset of overhead fisheye images annotated with rotated bounding boxes aligned with each person's body. Although very useful for developing and evaluating people-detection algorithms, that dataset contains fewer than 6,000 annotated frames, with at most 4 people visible at a time, and covers only two challenges (moving objects and lights off).

Therefore, we introduce a new dataset, Challenging Events for Person Detection from Overhead Fisheye Images (CEPDOF). The new dataset consists of 8 videos with up to 13 people visible at a time, over 25,000 annotated frames, and several challenging scenarios that are new compared to HABBOF (please see the description below for details). Furthermore, CEPDOF is annotated spatio-temporally, that is, bounding boxes of the same person carry the same ID in consecutive frames, so the dataset can also be used for additional vision tasks on overhead fisheye images, such as video-object tracking and human re-identification.

Description

The Challenging Events for Person Detection from Overhead Fisheye Images (CEPDOF) dataset was developed at the Visual Information Processing (VIP) Laboratory at Boston University and published in April 2020. It consists of 8 videos recorded by overhead-mounted fisheye cameras in a single room (a small classroom), with up to 13 people visible at a time and 25,504 annotated frames in total. The videos include a range of challenging scenarios: a crowded room, severe body occlusions, various (often unusual) body poses, head camouflage (e.g., hoods, hats), images of people on a projection screen, and low-light conditions with or without IR (infra-red) illumination. More detailed information about each video is provided in the table below.

Video sequence  | Scenario          | Description / Challenges                                                                                  | Max. no. of people | No. of frames | Resolution @ frame rate
Lunch meeting 1 | Common activities | People walking and sitting.                                                                               | 11                 | 1,201         | 2,048 x 2,048 @ 1 Hz
Lunch meeting 2 | Crowded scene     | More than 10 people sitting and having lunch.                                                             | 13                 | 3,000         | 2,048 x 2,048 @ 10 Hz
Lunch meeting 3 | Common activities | People walking and sitting.                                                                               | 10                 | 900           | 2,048 x 2,048 @ 1 Hz
Edge cases      | Edge cases        | People walking and sitting; extreme body poses; head camouflage; severe body occlusions.                  | 8                  | 4,201         | 2,048 x 2,048 @ 10 Hz
High activity   | Walking activity  | People walking in through one door and leaving through the other.                                         | 9                  | 7,202         | 1,080 x 1,080 @ 10 Hz
All-off         | Low light         | People walking and sitting; overhead lights off; camera IR filter removed; no IR illumination.            | 7                  | 3,000         | 1,080 x 1,080 @ 10 Hz
IRfilter        | Low light         | People walking and sitting; overhead lights off; camera IR filter in place; no IR illumination.           | 8                  | 3,000         | 1,080 x 1,080 @ 10 Hz
IRill           | Low light         | People walking and sitting; overhead lights off; camera IR filter removed; with IR illumination.          | 8                  | 3,000         | 1,080 x 1,080 @ 10 Hz

Data Format

Each video is stored in a separate folder, with one JPEG file for each frame. The annotations for each video are stored in a single JSON file in a format that largely follows the conventions used in the MS COCO dataset, except for the following differences:

  • Each bounding box is represented by five numbers: [cx, cy, w, h, d], where “cx” and “cy” are the coordinates of the center of the bounding box in pixels from the top-left image corner at [0,0], “w” and “h” are its width and height in pixels, and “d” is its clockwise rotation angle, in degrees, from the vertical axis pointing up. To avoid ambiguity, the bounding-box width is constrained to be less than or equal to its height (w <= h), and the rotation angle is constrained to lie between -90 and +90 degrees. An example of one ground-truth annotation superimposed on the original image is shown below.
  • Only one category, namely “person”, is used in the dataset.
  • Each annotated person is assigned a “person_ID”, which stays the same for a given person across consecutive frames.
  • No segmentation masks are provided.
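To illustrate the box encoding above, the following sketch converts a [cx, cy, w, h, d] annotation to its four corner points in image coordinates. This is not part of the official toolkit; it is a minimal example assuming standard image coordinates (x right, y down), in which a visually clockwise rotation corresponds to a positive mathematical rotation.

```python
import math

def rbox_to_corners(cx, cy, w, h, d):
    """Convert a CEPDOF rotated box [cx, cy, w, h, d] to its 4 corners.

    Assumes image coordinates (x to the right, y downward); d is the
    clockwise rotation in degrees, which in a y-down frame is applied
    with the rotation matrix [[cos, -sin], [sin, cos]].
    """
    theta = math.radians(d)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    corners = []
    # Offsets of the axis-aligned corners relative to the box center,
    # listed in clockwise order starting from the top-left corner.
    for dx, dy in [(-w / 2, -h / 2), (w / 2, -h / 2),
                   (w / 2, h / 2), (-w / 2, h / 2)]:
        corners.append((cx + dx * cos_t - dy * sin_t,
                        cy + dx * sin_t + dy * cos_t))
    return corners

# With d = 0 the result is the usual axis-aligned box.
print(rbox_to_corners(100, 50, 2, 4, 0))
```

For d = 0 this reduces to the familiar axis-aligned corners; for d = 90 a w x h box becomes an h x w box, which is why the w <= h constraint (together with d in [-90, 90]) makes the encoding unambiguous.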

Dataset Download

You may use this dataset for non-commercial purposes. If you publish any work reporting results using this dataset, please cite the following paper:

Z. Duan, M.O. Tezcan, H. Nakamura, P. Ishwar and J. Konrad, “RAPiD: Rotation-aware people detection in overhead fisheye images”, in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Omnidirectional Computer Vision in Research and Industry (OmniCV) Workshop, June 2020.

To access the download page, please complete the form below (tested only in Chrome).

CEPDOF Download Form

Contact

Please contact [mtezcan] at [bu] dot [edu] if you have any questions.

Acknowledgements

The development of this dataset was supported in part by the Advanced Research Projects Agency – Energy (ARPA-E), within the Department of Energy, under agreement DE-AR0000944. 

We would also like to thank Boston University students for their participation in the recording and annotation of our dataset.