Spoof in the Wild Dataset

More than 10K videos covering 3 types of 3D mask attacks, tailored for iBeta Level 2 certification

Check samples on Kaggle

Introduction

The Face Spoofing in the Wild Multi-Mask dataset provides a practical benchmark for strengthening liveness detection systems under real-world conditions. It brings together three realistic mask-attack modalities – cardboard cutouts, fabric/printed textile masks, and 3D latex masks – captured across varied subjects, cameras, viewpoints, and lighting. This diversity makes it especially useful for assessing, stress-testing, and fine-tuning passive PAD models, an important step toward iBeta Level 2 readiness

The key value is high variety

The dataset spans different actors, a large number of individual masks, varied lighting (indoor/outdoor, natural/artificial), backgrounds and locations, shooting angles and distances, different cameras and lenses, and accessories or facial features (glasses, headwear, beards). This range reduces overfitting to specific mask textures and improves generalization in passive liveness detection and attack type recognition

Types of attacks

Cardboard: thick cardboard with cut-out eyes. Flat face print; eyes cut out to allow for real eye movement. Flat image with hard edges around the outline
Textile: thin elastic mesh mask with face print. Nylon/elastic fabric mask with a full-color face print; the fabric fits snugly around the head and partially replicates the 3D shape
Latex: volumetric 3D latex masks. Full-size realistic masks with pronounced “skin” texture. Some have cut-out eyes and are complemented by external attributes

Why in-the-wild collection improves robustness

Crowd-sourced, in-the-wild data covers the natural long-tail diversity of real scenarios, so models trained on it generalize better and rely less on hidden shortcuts typical of controlled, in-office datasets

How in-the-wild fundamentally differs from in-house

Heterogeneous hardware. Smartphones/webcams across brands and generations → different sensors, optics, ISP pipelines, noise/exposure, stabilization, frame rates, and codecs
Unstaged conditions. Random lighting (daylight/artificial, flicker, backlight), backgrounds, reflections, shadows, and outdoor/indoor scenes
Human behavior. Natural poses, micro-movements, expressions, speed and amplitude of head turns, varying camera distance
Broad spectrum of PAIs/masks. Different materials, shapes, and application methods; real-world artifacts (glare, folds, misalignment); plus “imperfect” spoofing attempts
Demographic and cultural diversity. Skin tones, makeup, accessories (glasses, headwear), styles—rarely covered in office setups

In datasets collected in-house, common constants remain even when backgrounds and angles are varied artificially: the same camera pool, typical lighting and backgrounds, repeated rooms and collection crews. Models quickly latch onto these spurious cues (e.g., a characteristic white balance or wall texture), which harms transfer to real-world conditions

Because the sources of variability differ, in-the-wild and in-house datasets have weak overlap. This makes our dataset a valuable external test bed: if a model performs well here, the likelihood of failure in real production is substantially lower
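When using the dataset as an external test bed, it can help to report errors per attack type rather than as one pooled number. The sketch below is a minimal illustration, not the dataset's official evaluation tooling: it computes APCER (attack presentation classification error rate, the share of attack presentations accepted as bona fide, as defined in ISO/IEC 30107-3) for each mask type. The record layout `(attack_type, liveness_score)`, the score convention (higher means "more likely live"), and the `0.5` threshold are all assumptions made for the example.

```python
from collections import defaultdict


def apcer_by_attack_type(records, threshold=0.5):
    """Return {attack_type: APCER} for a list of attack-only records.

    Each record is a hypothetical (attack_type, liveness_score) pair.
    A score >= threshold means the model accepted the presentation as
    bona fide, which is an error because every record is an attack.
    """
    totals = defaultdict(int)
    accepted = defaultdict(int)
    for attack_type, score in records:
        totals[attack_type] += 1
        if score >= threshold:
            accepted[attack_type] += 1
    return {t: accepted[t] / totals[t] for t in totals}


def worst_case_apcer(per_type):
    """Return the highest APCER across attack types."""
    return max(per_type.values())


if __name__ == "__main__":
    # Made-up scores purely for illustration.
    scores = [
        ("cardboard", 0.10), ("cardboard", 0.20),
        ("textile", 0.70), ("textile", 0.30),
        ("latex", 0.90), ("latex", 0.80), ("latex", 0.20), ("latex", 0.10),
    ]
    per_type = apcer_by_attack_type(scores)
    print(per_type)
    print(worst_case_apcer(per_type))
```

Reporting the worst-performing attack type alongside the per-type values keeps a weak spot on one mask material from being averaged away by strong results on the others.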

File format and accessibility

Format: Videos are optimized for compatibility with mainstream ML frameworks
Resolution and frame rate: Videos are high-resolution with frame rates calibrated for capturing quick and realistic mask placements, ensuring precise data for model training

Legal & Compliance

We prioritize data privacy, ethical AI development, and regulatory compliance. The dataset is collected and processed in accordance with global data protection standards, including GDPR, to support legality, security, and responsible AI practices

Sample dataset

A sample version of this dataset is available on Kaggle. Leave a request for additional samples in the form below

Have a question?

Where does the data come from?

We collect data from our internal team. All information is further verified by our specialists

What is the process?

Once your enquiry has been sent, we will contact you to discuss the details and complete the necessary paperwork. Delivery time depends on the specific request and any additional requirements

Can you help us meet dataset disclosure requirements, GDPR and other regulatory controls?

Our unique selling point is providing legally clean datasets to our customers. We obtain consent from all participants to use their data for AI model development, and we can provide comprehensive reporting on the licensing, data collection, and privacy compliance of our datasets. Although approaches to regulating AI development and deployment differ across jurisdictions, we are able to serve customers launching AI products globally.

What makes this dataset suitable for iBeta certification?

The dataset follows iBeta testing protocols and includes diverse attack scenarios that mirror real-world spoofing attempts. It covers both passive and active liveness testing requirements with proper demographic representation and standardized capture conditions essential for certification preparation

What is the price of the dataset?

The price depends on your specific requirements. Please submit a request to receive a free consultation

Contact us

Tell us about yourself, and get access to free samples of the dataset
