Fisheye Re-Identification Dataset with Annotations (FRIDA)

Motivation

Knowing the number and location of people in office and school buildings, stores and shopping malls, etc., is critical for public safety (fire, chemical hazards), spatial analytics (optimization of office or store space usage), HVAC energy reduction, and, recently, for pandemic management. While a number of approaches have been developed to detect and track people indoors, overhead fisheye cameras have recently emerged as a compelling alternative. With their vast field of view (180 × 360 degrees), fewer fisheye cameras are needed to cover a large space than commonly used standard rectilinear-lens cameras, thus reducing system deployment costs. However, in very large spaces a single overhead fisheye camera is insufficient for effective coverage: people very far away are projected onto just a few pixels at the field-of-view periphery, making detection close to impossible. Therefore, multiple fisheye cameras working in unison are needed for effective large-space coverage.

To date, we have published three fisheye-image datasets, HABBOF, CEPDOF and WEPDTOF, each captured by a single overhead fisheye camera. These datasets have proven useful for testing people-detection algorithms in small-to-medium-size spaces. To inspire research on people detection and tracking in large indoor spaces using multiple overhead fisheye cameras with overlapping fields of view, we introduce the Fisheye Re-Identification Dataset with Annotations (FRIDA). The dataset, the first of its kind, has been captured by 3 time-synchronized overhead fisheye cameras and has been annotated with consistent person IDs across all frames and all cameras. Therefore, FRIDA can be used independently for each camera or jointly across time-synchronized fisheye frames. One particularly interesting application of FRIDA is the testing of person re-identification (PRID) algorithms across time-synchronized frames, for example to avoid overcounting people in a large space.

Description

The Fisheye Re-Identification Dataset with Annotations (FRIDA) was developed at the Visual Information Processing (VIP) Laboratory at Boston University and published in October 2022. It consists of 4 videos recorded by 3 time-synchronized, ceiling-mounted fisheye cameras with fully-overlapping fields of view. FRIDA was recorded in a 2,000 sqft classroom (72 × 28 ft) with 20 different people moving around. In total, it contains 18,318 annotated frames with 243,439 bounding-box labels. The frames were captured by three time-synchronized Axis M3057-PLVE cameras at 2,048 × 2,048-pixel resolution and 1.5 frames/sec. The videos include a range of challenging scenarios: a crowded room, the same person captured at very different resolutions in different camera views, severe body occlusions, various (often unusual) body poses, and people entering/leaving the room. More information about each video and scenario is provided in the table below.

Video sequence  No. of frames  No. of bounding boxes  No. of people per frame  Scenarios/Challenges
Segment #1      7,017          66,810                 3-15                     People coming in and settling down; evenly distributed around the room; mostly sitting (lower bodies mostly occluded)
Segment #2      3,471          53,460                 13-18                    People walking around the room; significant occlusions
Segment #3      6,207          103,141                13-17                    Concentration of people in parts of the room; people standing and staying close to each other; people strongly occluding each other
Segment #4      1,623          20,028                 5-16                     People leaving the room; occasional occlusions at entry/exit points

Data Format

There are 3 folders in the dataset: one with full fisheye frames (JPEG), one with bounding-box images (JPEG) and one with annotations (JSON).

For testing a visual PRID algorithm, the folder with bounding-box images should suffice. These images were extracted from the full frames using the provided annotations. For PRID methods that require the locations of people, the annotations will be necessary. We also provide the full frames in case one would like to use FRIDA for purposes other than PRID, such as people detection, people tracking or background subtraction.

The annotations for each video segment and each camera view are stored in a single JSON file (4 segments × 3 camera views = 12 JSON files) in a format that follows the conventions used in the CEPDOF dataset.
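For readers unfamiliar with that format, the Python sketch below shows how such an annotation file might be read. The file path and all JSON keys ("images", "annotations", "image_id", "bbox", "person_id") are assumptions based on COCO-style conventions similar to CEPDOF's, not guarantees of this page; verify them against the files in your download.

```python
import json

# Minimal sketch of reading one FRIDA annotation file. The path and JSON keys
# below are assumptions (COCO-style layout); check the actual files.
with open("annotations/segment1_camera1.json") as f:  # hypothetical path
    data = json.load(f)

# Assumed COCO-style "images" list mapping image IDs to frame file names.
id_to_frame = {img["id"]: img["file_name"] for img in data["images"]}

for ann in data["annotations"]:
    # Overhead-fisheye datasets often use rotated boxes [cx, cy, w, h, angle];
    # this unpacking is an assumption, not a documented guarantee.
    cx, cy, w, h, angle = ann["bbox"]
    pid = ann["person_id"]  # person ID, consistent across frames and cameras
    print(f'{id_to_frame[ann["image_id"]]}: person {pid} at '
          f'({cx:.0f}, {cy:.0f}), {w:.0f}x{h:.0f}, {angle:.0f} deg')
```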

An example of ground-truth annotations superimposed on the original fisheye frames is shown below: these are 3 separate fisheye frames, each with its own set of annotations, placed side by side for ease of visualization. Although the three cameras are synchronized to the same NTP server, the acquisition times are not identical for a number of reasons; however, the differences never exceed 1 second.
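A practical consequence is that frames from the three cameras can be paired simply by their order within each per-camera sequence. The sketch below illustrates this; the directory layout and file extension are hypothetical, chosen only for illustration.

```python
from pathlib import Path

# Minimal sketch, assuming per-camera frame folders with identically ordered
# file names (hypothetical layout; adjust to the actual download structure).
# Since the cameras are NTP-synchronized, run at 1.5 frames/s, and differ by
# less than 1 s, pairing frames by sorted index yields triplets that are
# simultaneous to within roughly one frame period.
cam_dirs = [Path(f"frames/segment1/camera{i}") for i in (1, 2, 3)]
frame_lists = [sorted(d.glob("*.jpg")) for d in cam_dirs]

for view1, view2, view3 in zip(*frame_lists):
    # Each triplet gives the (approximately) same instant from all 3 cameras,
    # e.g., as input to cross-camera person re-identification.
    print(view1.name, view2.name, view3.name)
```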

Dataset Download

You may use this dataset for non-commercial purposes. If you publish any work reporting results using this dataset, please cite the following paper:

M. Cokbas, J. Bolognino, J. Konrad and P. Ishwar, “FRIDA: Fisheye Re-Identification Dataset with Annotations,” in Proc. 18th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), November 2022.

To access the download page, please complete the form below (tested only in Chrome).

FRIDA Download Form

Contact

Please contact [mcokbas] at [bu] dot [edu] if you have any questions.

Acknowledgements

The development of this dataset was supported in part by the Advanced Research Projects Agency-Energy (ARPA-E), within the Department of Energy, under agreement DE-AR0000944.

We would also like to thank Boston University students for their participation in the recording and annotation of our dataset.