Overview

This dataset comprises 164 recordings of untethered ants grasping objects. Each recording is:

The dataset was developed for 3D tracking of animal and object pose throughout the interaction, so a subset of the recordings is paired with:

Setup

Labelled diagram of the camera arrangement used to collect the video data in this dataset. The ant nests are visible in the bottom left of the image; in the top right, a computer monitor shows a live stream of the five camera views.


Objects

A series of images containing the different objects used in the experiment sessions. In the centre of the frame is an image of the ChArUco board that was used in the calibration videos. All images are taken from the same camera, so the ChArUco board can be used for scale.


Download

Example clips: 240905-1616_cam3_strawberry_session29.gif and 240905-1616_cam4_blueberry_session33.gif.

If you would like to download the full dataset or any part of it, follow the instructions here or here.

Example

We’ve included an example of how this video dataset is used with the catar repo for 3D pose extraction.

Dataset

Aside from the large quantity of synchronised videos and verbal descriptions, this dataset has several additional features. The dataset readme.txt contains further information on the file structure.

Annotations

To obtain the 3D pose and object masks, the videos were manually annotated in 2D. In addition, there are verbal descriptions and action tags for each experiment session.

Descriptive

The videos are separated into four categories based on the observed behaviour of interest. We were most interested in how the ants approached and sensed an object before grasping it, and in how this sensing behaviour influenced the likelihood of grasp success. We therefore organised the videos into the following categories:

One of the fields in the dataset_description.csv is “Reviewed for Upload?”. In experiments where this says “Yes”, the description and tag have been reviewed since July 2025; the others were last reviewed around December 2024. This time gap allowed more deliberation on which videos fit which categories and on the specific behaviours of interest to be described. Experiments that have not been reviewed for upload may therefore have less specific descriptions and incorrect behaviour tags.
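
As a quick illustration, the review status can be used to filter sessions before analysis. This is a minimal sketch assuming the CSV sits at the dataset root and that the field holds literal “Yes”/“No” strings; only the “Reviewed for Upload?” column name comes from the description above.

```python
import pandas as pd

# Load the per-session description table; the path relative to the
# dataset root is an assumption.
df = pd.read_csv("dataset_description.csv")

# Keep only the sessions whose descriptions and tags were re-reviewed
# before upload (reviewed since July 2025).
reviewed = df[df["Reviewed for Upload?"] == "Yes"]
print(f"{len(reviewed)} of {len(df)} sessions reviewed for upload")
```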

2D Pose and Object Mask
Annotated frame in SLEAP; masked video frame using SAM2.

2D Animal Pose

Skeleton Keypoints

Two illustrated ants with joint annotations showing the keypoint naming scheme used with SLEAP.

Using sleap.ai, around 500 video frames were manually annotated with the keypoints shown in the skeleton diagram above. The annotations are provided in the Annotations/sleap directory as an .slp project file. This folder also contains a model trained using a top-down approach.
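
For reference, the .slp project can be opened programmatically with the sleap Python package. This is a minimal sketch in which the project filename is an assumption; check the Annotations/sleap directory for the actual name.

```python
import sleap

# Load the SLEAP project; "ant_keypoints.slp" is a placeholder filename.
labels = sleap.load_file("Annotations/sleap/ant_keypoints.slp")

print(f"{len(labels.labeled_frames)} manually labelled frames")
print("Skeleton nodes:", [node.name for node in labels.skeletons[0].nodes])
```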

Object Mask

To extract a pixel occupancy mask for each object, the process was accelerated by using facebookresearch/sam2 to segment the objects. The pixel annotations used to train the SAM2 model are available in the Annotations/object_mask/input directory as .csv files.
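
To give a flavour of how SAM2 segments and propagates an object mask through a video, here is a minimal inference sketch using the public facebookresearch/sam2 video-predictor API. The config, checkpoint, frame directory, and click coordinates are all placeholders, and this shows prompting and propagation rather than the training step handled by SAM2_cluster.

```python
import numpy as np
import torch
from sam2.build_sam import build_sam2_video_predictor

# Config and checkpoint names follow the sam2 repo's released files.
predictor = build_sam2_video_predictor(
    "configs/sam2.1/sam2.1_hiera_l.yaml", "checkpoints/sam2.1_hiera_large.pt"
)

# init_state expects a directory of extracted JPEG frames.
state = predictor.init_state(video_path="frames/240905-1616_cam3")

# Seed the object with one positive click on the first frame (placeholder
# coordinates, not values from the dataset's annotation CSVs).
predictor.add_new_points_or_box(
    inference_state=state,
    frame_idx=0,
    obj_id=1,
    points=np.array([[480, 300]], dtype=np.float32),
    labels=np.array([1], dtype=np.int32),  # 1 = positive click
)

# Propagate the mask through the remaining frames.
with torch.inference_mode():
    for frame_idx, obj_ids, mask_logits in predictor.propagate_in_video(state):
        masks = (mask_logits > 0.0).cpu().numpy()  # boolean pixel-occupancy masks
```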

The code used to train the model on the University of Edinburgh computing cluster is also available at prolley-parnell/SAM2_cluster.

3D Animal Pose

Using the mokap repository, the 2D keypoints can be combined into 3D keypoint poses for the frames where the calibration parameters are good. The output of this process can be found in the Data/240905-1616/outputs/tracking directory as tracklets_*.pkl files.

To read these files, use the repo prolley-parnell/3d_ant_analysis.
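
The tracklet files are ordinary Python pickles, so a first inspection can be done without the analysis repo. This sketch only loads the raw object, since parsing the internal layout is what prolley-parnell/3d_ant_analysis is for.

```python
import pickle
from pathlib import Path

# Grab one tracklet file from the example session's tracking output.
tracklet_file = next(
    Path("Data/240905-1616/outputs/tracking").glob("tracklets_*.pkl")
)
with open(tracklet_file, "rb") as f:
    tracklets = pickle.load(f)

# Inspect the top-level type before assuming any particular structure.
print(tracklet_file.name, type(tracklets))
```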

Further research on how to extract a reliable 3D pose from multiple viewpoints is described in the Example section.

Object Pose

To generate the 3D object pose, the 2D pixel occupancy masks were combined with the camera parameters via voxel carving, and Iterative Closest Point (ICP) registration was then used to match the poses of sequential meshes. The code to achieve this is also in mokap.
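
As an illustration of the registration step, the sketch below aligns two carved meshes from consecutive frames with Open3D's point-to-point ICP. It is not the mokap implementation: the filenames, point-sample count, and correspondence distance are assumptions, and the meshes are assumed to have been converted to PLY for loading.

```python
import open3d as o3d

# Sample the carved meshes from frames t and t+1 into point clouds
# (placeholder filenames; convert the dataset's .dae meshes to .ply first).
target = o3d.io.read_triangle_mesh("carved_t0.ply").sample_points_uniformly(
    number_of_points=5000
)
source = o3d.io.read_triangle_mesh("carved_t1.ply").sample_points_uniformly(
    number_of_points=5000
)

# Point-to-point ICP; the correspondence distance depends on the voxel
# size used during carving.
result = o3d.pipelines.registration.registration_icp(
    source,
    target,
    max_correspondence_distance=0.002,
    estimation_method=o3d.pipelines.registration.TransformationEstimationPointToPoint(),
)
print(result.transformation)  # 4x4 rigid transform between the two frames
```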

The output of this pose tracking is in Data/240905-1616/outputs/segmentation. This folder contains the .toml file with pose transforms relative to the reference frame, as well as the .dae mesh of the reference frame.
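
The pose transforms can be read with the standard library's TOML parser. This sketch assumes a placeholder filename and makes no claim about the key names inside, so inspect the parsed dictionary first.

```python
import tomllib  # Python 3.11+; use the tomli package on older versions

# "poses.toml" is a placeholder; check the segmentation folder for the
# actual filename.
with open("Data/240905-1616/outputs/segmentation/poses.toml", "rb") as f:
    poses = tomllib.load(f)

print(list(poses.keys()))  # inspect before relying on specific fields
```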

Because the tracked objects have rotational symmetry and are partially occluded by ants, the pose tracking is unreliable in places.

An array of photographs showing the same ant viewed from five different cameras. Each image is annotated with the name of the camera taking the photo, and the object pixel mask is shown in blue.
Next to each photograph, the 3D volume created from the pixel mask is shown from the same perspective as the camera.
