Fisheye Re-Identification Dataset with Annotations (FRIDA)
Knowing the number and location of people in office and school buildings, stores, shopping malls, and similar spaces is critical for public safety (fire, chemical hazards), spatial analytics (optimizing office or store space usage), HVAC energy reduction and, recently, pandemic management. While a number of approaches have been developed to detect and track people indoors, overhead fisheye cameras have recently emerged as a compelling alternative. With their vast field of view (180 × 360 degrees), fewer fisheye cameras are needed to cover a large space than commonly-used standard rectilinear-lens cameras, thus reducing system deployment costs. However, in very large spaces a single overhead fisheye camera is insufficient for effective coverage: people far from the camera are projected onto just a few pixels at the periphery of the field of view, making detection nearly impossible. Therefore, multiple fisheye cameras working in unison are needed to cover a large space effectively.
To date, we have published three fisheye-image datasets, HABBOF, CEPDOF and WEPDTOF, each captured by a single overhead fisheye camera. These datasets have proven useful for testing people-detection algorithms in small-to-medium-size spaces. To inspire research on people detection and tracking in large indoor spaces using multiple overhead fisheye cameras with overlapping fields of view, we introduce the Fisheye Re-Identification Dataset with Annotations (FRIDA). The dataset, the first of its kind, was captured by 3 time-synchronized overhead fisheye cameras and has been annotated with consistent person IDs across all frames and all cameras. Therefore, FRIDA can be used independently for each camera or jointly across time-synchronized fisheye frames. One particularly interesting application of FRIDA is testing person re-identification (PRID) algorithms across time-synchronized frames, for example to avoid overcounting people in a large space.
The Fisheye Re-Identification Dataset with Annotations (FRIDA) has been developed at the Visual Information Processing (VIP) Laboratory at Boston University and published in October 2022. It consists of 4 videos recorded by 3 time-synchronized ceiling-mounted fisheye cameras that have fully-overlapping fields of view. FRIDA was recorded in a 2,000 sqft classroom (72 × 28 ft) with 20 different people moving around. It consists of 18,318 annotated frames with 242,809 bounding-box labels. The frames were captured by three time-synchronized Axis M3057-PLVE cameras at 2,048 × 2,048-pixel resolution and 1.5 frames/sec. The videos include a range of challenging scenarios: crowded room, the same person being captured at very different resolutions in different camera views, severe body occlusions, various (often unusual) body poses, entering/leaving the room. More information about each video and scenario is provided in the table below.
| Video segment | No. of frames | No. of bounding boxes | No. of people per frame | Scenarios/Challenges |
|---|---|---|---|---|
| Segment #1 | 7,017 | 66,810 | 3-15 | People coming in and settling down; evenly distributed around the room; mostly sitting (lower bodies mostly occluded) |
| Segment #2 | 3,471 | 53,460 | 13-18 | People walking around the room; significant occlusions |
| Segment #3 | 6,207 | 103,141 | 13-17 | Concentration of people in parts of the room; people standing and staying close to each other; people strongly occluding each other |
| Segment #4 | 1,623 | 20,028 | 5-16 | People leaving the room; occasional occlusions at entry/exit points |
There are 3 folders in the dataset: one with full fisheye frames (JPEG), one with bounding-box images (JPEG) and one with annotations (JSON).
For testing a visual PRID algorithm, the folder with bounding-box images should suffice; these images were extracted from the full frames using the provided annotations. PRID methods that require the locations of people will need the annotations. We also provide the full frames in case one would like to use FRIDA for purposes other than PRID, such as people detection, people tracking, or background subtraction.
The annotations for each video segment and each camera view are stored in a single JSON file (4 segments × 3 camera views = 12 JSON files) in a format that follows the conventions used in the CEPDOF dataset.
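As a starting point for working with the annotations, the sketch below parses a small CEPDOF-style JSON snippet and groups boxes by frame. The field names (`images`, `annotations`, `image_id`, `person_id`) and the rotated-box layout `[cx, cy, w, h, angle]` are assumptions based on the CEPDOF conventions mentioned above; they should be checked against the actual FRIDA JSON files before use.

```python
import json
from collections import defaultdict

# Hypothetical CEPDOF-style annotation snippet; field names and the
# rotated-box layout [cx, cy, w, h, angle] are assumptions, not taken
# from the actual FRIDA files.
sample = """
{
  "images": [{"id": 0, "file_name": "cam1_000001.jpg"}],
  "annotations": [
    {"image_id": 0, "person_id": 7,  "bbox": [1020.5, 310.2, 55.0, 120.0, 33.0]},
    {"image_id": 0, "person_id": 12, "bbox": [640.0, 980.0, 60.0, 130.0, -15.0]}
  ]
}
"""

data = json.loads(sample)

# Group boxes by image id so each frame's people can be looked up together.
boxes_by_image = defaultdict(list)
for ann in data["annotations"]:
    boxes_by_image[ann["image_id"]].append((ann["person_id"], ann["bbox"]))

for img in data["images"]:
    for pid, (cx, cy, w, h, angle) in boxes_by_image[img["id"]]:
        print(f"{img['file_name']}: person {pid} at center "
              f"({cx:.0f}, {cy:.0f}), rotated {angle:.0f} deg")
```

With the real files, `json.loads(sample)` would be replaced by `json.load(open(path))` for each of the 12 annotation files.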
An example of ground-truth annotation superimposed on the original fisheye frames is shown below. These are 3 separate fisheye frames with 3 separate sets of annotations shown next to each other for ease of visualization. While the three cameras are synchronized to the same NTP server, the acquisition times are not identical for a number of reasons; however, the differences do not exceed 1 second.
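Because the per-camera acquisition times can differ by up to a second, frames from different cameras must be paired before cross-camera PRID. One simple approach is nearest-timestamp matching; the sketch below assumes each camera provides a sorted list of capture times in seconds (the timestamps here are made up for illustration and are not from the dataset).

```python
import bisect

def nearest_frame(timestamps, t, tol=1.0):
    """Return the index of the timestamp closest to t, or None if the
    closest one is more than tol seconds away. timestamps must be sorted."""
    i = bisect.bisect_left(timestamps, t)
    candidates = [j for j in (i - 1, i) if 0 <= j < len(timestamps)]
    if not candidates:
        return None
    best = min(candidates, key=lambda j: abs(timestamps[j] - t))
    return best if abs(timestamps[best] - t) <= tol else None

# Hypothetical capture times (seconds) for two cameras at ~1.5 fps,
# offset by a fraction of a second as allowed by the dataset description.
cam1 = [0.00, 0.67, 1.33, 2.00]
cam2 = [0.40, 1.05, 1.72, 2.38]

# Pair each cam1 frame with its temporally closest cam2 frame.
pairs = [(i, nearest_frame(cam2, t)) for i, t in enumerate(cam1)]
print(pairs)
```

The 1-second tolerance mirrors the maximum skew stated above; frames with no counterpart within the tolerance are reported as unmatched (`None`).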
You may use this dataset for non-commercial purposes. If you publish any work reporting results using this dataset, please cite the following paper:
M. Cokbas, J. Bolognino, J. Konrad and P. Ishwar, “FRIDA: Fisheye Re-Identification Dataset with Annotations”, in 18th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), November 2022.
To access the download page, please complete the form below (tested only in Chrome).
FRIDA Download Form
Please contact [mcokbas] at [bu] dot [edu] if you have any questions.
The development of this dataset was supported in part by the Advanced Research Projects Agency – Energy (ARPA-E), within the Department of Energy, under agreement DE-AR0000944.
We would also like to thank Boston University students for their participation in the recording and annotation of our dataset.