Challenging Events for Person Detection from Overhead Fisheye images (CEPDOF)
In September 2019, we published HABBOF, the first people-detection dataset of overhead fisheye images annotated with rotated bounding boxes aligned with each person’s body. Although very useful for developing and evaluating people-detection algorithms, that dataset contained fewer than 6,000 annotated frames, with at most 4 people visible at a time and only two challenges (moving objects and lights off).
Therefore, we introduce a new dataset, Challenging Events for Person Detection from Overhead Fisheye Images (CEPDOF). Compared to HABBOF, the new dataset consists of 8 videos, with up to 13 people visible at a time, over 25,000 annotated frames, and several new challenging scenarios (please see the description below for details). Furthermore, CEPDOF is annotated spatio-temporally, that is, bounding boxes of the same person carry the same ID in consecutive frames, so it can also be used for additional vision tasks on overhead fisheye images, such as video-object tracking and human re-identification.
The Challenging Events for Person Detection from Overhead Fisheye Images (CEPDOF) dataset was developed at the Visual Information Processing (VIP) Laboratory at Boston University and published in April 2020. It consists of 8 videos recorded by overhead-mounted fisheye cameras in one room (a small classroom), with up to 13 people visible at a time and 25,504 annotated frames. The videos include a range of challenging scenarios: a crowded room, severe body occlusions, various (often unusual) body poses, head camouflage (e.g., hoods, hats), images of people on a projection screen, and low-light conditions with or without IR (infra-red) illumination. More detailed information about each video is provided in the table below.
| Video sequence | Scenario | Description/Challenges | Max. no. of people | No. of frames | Resolution & frame rate |
| --- | --- | --- | --- | --- | --- |
| Lunch meeting 1 | Common activities | People walking and sitting. | 11 | 1,201 | 2,048 × 2,048 @ 1 Hz |
| Lunch meeting 2 | Crowded scene | More than 10 people sitting and having lunch. | 13 | 3,000 | 2,048 × 2,048 @ 10 Hz |
| Lunch meeting 3 | Common activities | People walking and sitting. | 10 | 900 | 2,048 × 2,048 @ 1 Hz |
| Edge cases | Edge cases | People walking and sitting; extreme body poses, head camouflage, severe body occlusions. | 8 | 4,201 | 2,048 × 2,048 @ 10 Hz |
| High activity | Walking activity | People walking in through one door and leaving through the other. | 9 | 7,202 | 1,080 × 1,080 @ 10 Hz |
| All-off | Low light | People walking and sitting; overhead lights off, camera IR filter removed, no IR illumination. | 7 | 3,000 | 1,080 × 1,080 @ 10 Hz |
| IRfilter | Low light | People walking and sitting; overhead lights off, camera IR filter in place, no IR illumination. | 8 | 3,000 | 1,080 × 1,080 @ 10 Hz |
| IRill | Low light | People walking and sitting; overhead lights off, camera IR filter removed, with IR illumination. | 8 | 3,000 | 1,080 × 1,080 @ 10 Hz |
Each video is stored in a separate folder, with one JPEG file for each frame. The annotations for each video are stored in a single JSON file in a format that largely follows the conventions used in the MS COCO dataset, except for the following differences:
- Each bounding box is represented by five numbers: [cx, cy, w, h, d], where “cx” and “cy” are the pixel coordinates of the box center, measured from the top-left image corner at [0, 0]; “w” and “h” are its width and height in pixels; and “d” is its clockwise rotation angle, in degrees, from the vertical axis pointing up. To avoid ambiguity, the bounding-box width is constrained to be less than or equal to its height (w <= h), and the rotation angle is restricted to the range -90 to +90 degrees. An example of one ground-truth annotation superimposed on the original image is shown below.
- Only one category, namely “person”, is used in the dataset.
- Each annotated person is assigned a “person_ID” that is kept consistent for the same person across consecutive frames.
- No segmentation masks are provided.
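As an illustration of the box convention above, the following Python sketch converts a [cx, cy, w, h, d] annotation into its four corner points in pixel coordinates. The function name is our own and is not part of any official CEPDOF toolkit; it assumes image coordinates with the y-axis pointing down, as in the annotations.

```python
import math

def rotated_box_corners(cx, cy, w, h, d):
    """Return the four corners of a rotated bounding box [cx, cy, w, h, d].

    (cx, cy) is the box center in pixels, (w, h) its width and height,
    and d its clockwise rotation in degrees from the upward vertical axis.
    """
    theta = math.radians(d)
    c, s = math.cos(theta), math.sin(theta)
    # Corners of the axis-aligned box, relative to the center,
    # listed in order: top-left, top-right, bottom-right, bottom-left.
    offsets = [(-w / 2, -h / 2), (w / 2, -h / 2),
               (w / 2, h / 2), (-w / 2, h / 2)]
    # In image coordinates (y pointing down), this rotation matrix
    # turns the box clockwise as seen on screen.
    return [(cx + x * c - y * s, cy + x * s + y * c) for x, y in offsets]

# With d = 0 the box stays axis-aligned around its center.
corners = rotated_box_corners(100, 100, 20, 40, 0)
```

For d = 0 this reduces to the familiar axis-aligned box; the w <= h constraint together with d in [-90, +90] ensures each rotated box has a single, unambiguous parameterization.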
You may use this dataset for non-commercial purposes. If you publish any work reporting results using this dataset, please cite the following paper:
Z. Duan, M.O. Tezcan, H. Nakamura, P. Ishwar and J. Konrad, “RAPiD: Rotation-Aware People Detection in Overhead Fisheye Images”, in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Omnidirectional Computer Vision in Research and Industry (OmniCV) Workshop, June 2020.
To access the download page, please complete the form below (tested only in Chrome).
Please contact [mtezcan] at [bu] dot [edu] if you have any questions.
The development of this dataset was supported in part by the Advanced Research Projects Agency – Energy (ARPA-E), within the Department of Energy, under agreement DE-AR0000944.
We would also like to thank Boston University students for their participation in the recording and annotation of our dataset.