Selfies and Paired ID Photos

Selfie & ID Photo Dataset

70k+ images and 6k+ people: 10-15 photos per ID

Introduction

The Selfies & ID Photos Face Recognition Dataset is a comprehensive, enterprise-grade collection designed for training robust face recognition, identity verification, and KYC systems. With 10-15 high-quality images per person (diverse selfies plus 2 official ID document photos), this dataset provides the depth and variety needed for production-ready AI models.

Why this dataset solves real production challenges

Enterprise Face Recognition systems fail in production not from algorithm weakness, but from training data gaps.
This dataset addresses the 3 critical gaps we identified from our client deployments:

  • Insufficient variation per identity
    Problem: Models trained on 1-3 photos/person fail when users change appearance
    Solution: 10-15 diverse images per person = robust to lighting, pose, accessories, time
  • Selfie-to-document matching gap 
    Problem: Models trained only on selfies can’t verify against official ID photos
    Solution: Same person in both casual selfies AND official documents (rare combination)
  • Demographic bias in production
    Problem: Models perform poorly on underrepresented ethnicities
    Solution: Balanced coverage across Caucasian, Asian, African, Latin American, and Arab populations
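
The selfie-to-document gap above can be illustrated with a minimal sketch of how verification pairs might be assembled from per-person image lists. The folder layout, the `selfies` / `id_photos` keys, and the `build_pairs` helper are hypothetical and chosen only for illustration; the actual dataset layout may differ.

```python
import random
from itertools import product


def build_pairs(images_by_person, impostors_per_selfie=1, seed=0):
    """Return (selfie, id_photo, label) triples; label 1 means same person."""
    rng = random.Random(seed)
    person_ids = sorted(images_by_person)
    pairs = []
    for pid in person_ids:
        record = images_by_person[pid]
        # Genuine pairs: every selfie against every ID photo of the same person.
        for selfie, id_photo in product(record["selfies"], record["id_photos"]):
            pairs.append((selfie, id_photo, 1))
        # Impostor pairs: each selfie against an ID photo of a different person.
        others = [p for p in person_ids if p != pid]
        if not others:
            continue
        for selfie in record["selfies"]:
            for _ in range(impostors_per_selfie):
                other = rng.choice(others)
                impostor_id = rng.choice(images_by_person[other]["id_photos"])
                pairs.append((selfie, impostor_id, 0))
    return pairs


# Hypothetical toy layout: file names are placeholders, not real dataset paths.
toy = {
    "p001": {"selfies": ["p001/s1.jpg", "p001/s2.jpg"],
             "id_photos": ["p001/id1.jpg", "p001/id2.jpg"]},
    "p002": {"selfies": ["p002/s1.jpg"],
             "id_photos": ["p002/id1.jpg", "p002/id2.jpg"]},
}
pairs = build_pairs(toy)
genuine = sum(label for _, _, label in pairs)
print(len(pairs), genuine)
```

Pairing every selfie with both ID photos keeps all genuine combinations, while the number of impostor pairs is controlled separately so the two classes can be balanced as needed.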

Dataset summary

  • 6,000+ real individuals (not synthetic/AI-generated)
  • 10-15 images per person – maximum training diversity
  • Selfies + Official ID Photos – unique combination for identity verification
  • Balanced demographics – ages 18-65
  • Multi-ethnic coverage – Caucasian, African, Asian, Latin American
  • Real-world conditions – diverse backgrounds, lighting, expressions

Composition

| Parameter | Value |
| --- | --- |
| Total Participants | 6,000+ unique individuals |
| Total Images | 70,000+ photos |
| Images per Person | 10-15 (selfies + 2 official ID photos) |
| Metadata Fields | Demographics, device info, temporal data |
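
As a quick sanity check, the composition figures can be cross-checked with simple arithmetic. This is only a sketch: the participant and image totals are stated as lower bounds ("6,000+", "70,000+"), so the results are indicative rather than exact.

```python
participants = 6_000           # lower bound from the composition figures
total_images = 70_000          # lower bound from the composition figures
min_per_person, max_per_person = 10, 15
id_photos_per_person = 2

# Range of totals implied by 10-15 images for each of 6,000 people.
low = participants * min_per_person
high = participants * max_per_person

# Indicative average number of images per person.
mean_per_person = total_images / participants

# Selfies per person once the 2 ID photos are set aside.
selfies_range = (min_per_person - id_photos_per_person,
                 max_per_person - id_photos_per_person)

print(low, high, round(mean_per_person, 1), selfies_range)
# prints: 60000 90000 11.7 (8, 13)
```

The stated 70,000+ total falls inside the implied 60,000 to 90,000 range, which corresponds to roughly 8-13 selfies per person alongside the 2 ID photos.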

Demographics

| Category | Coverage |
| --- | --- |
| Age Range | 18-65 years (wide distribution) |
| Gender | Balanced male/female split |
| Ethnicities | Caucasian, African, East Asian, South Asian, Latin American |

Structured Metadata Included

The dataset includes a file with structured metadata for each participant:

  • Demographics – Gender, ethnicity, age group for balanced training
  • Device Information – OS type (Android/iOS/Windows) and device model for multi-device analysis
  • Temporal Data – Year timestamps on historic photos for age-gap analysis
  • Photo Categories – Indoor/outdoor/lighting conditions for scenario-based filtering
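
The metadata categories above lend themselves to scenario-based filtering. The sketch below assumes the metadata is delivered as a CSV file; the file format, the column names (`participant_id`, `gender`, `ethnicity`, `age_group`, `os`, `photo_category`), and the sample rows are all hypothetical and may not match the actual file.

```python
import csv
import io
from collections import Counter

# Hypothetical rows for illustration; real column names and values may differ.
SAMPLE = """participant_id,gender,ethnicity,age_group,os,photo_category
p001,female,East Asian,18-25,Android,indoor
p002,male,African,26-35,iOS,outdoor
p003,female,Caucasian,36-45,Android,outdoor
p004,male,Latin American,26-35,Windows,indoor
"""


def load_rows(handle):
    """Read metadata rows into a list of dictionaries keyed by column name."""
    return list(csv.DictReader(handle))


def filter_rows(rows, **criteria):
    """Keep rows whose fields equal every given criterion."""
    return [r for r in rows if all(r.get(k) == v for k, v in criteria.items())]


rows = load_rows(io.StringIO(SAMPLE))

# Scenario-based filtering: outdoor photos taken on Android devices.
android_outdoor = filter_rows(rows, os="Android", photo_category="outdoor")

# Simple balance check: how many rows fall into each age group.
by_age_group = Counter(r["age_group"] for r in rows)

print([r["participant_id"] for r in android_outdoor])
print(by_age_group.most_common(1))
```

With a real file, `io.StringIO(SAMPLE)` would be replaced by an open file handle; the same counting approach can be applied to gender or ethnicity to check balance before training.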

Source and collection methodology

The Selfies & ID Photos Face Recognition Dataset was collected through a structured, multi-stage process involving a diverse group of participants recruited from multiple geographic regions. All data collection followed strict ethical guidelines, with full informed consent obtained from each participant prior to any image capture.

Use cases and applications

  • Face Recognition & Detection. Train robust face recognition and detection models using diverse selfies per person to identify individuals across varying lighting, poses, and environmental conditions in security, surveillance, and access control applications.
  • KYC & Identity Verification. Automate customer identity verification by matching live selfies against official ID document photos for banking onboarding, fintech applications, and regulatory compliance with AML/KYC requirements.
  • Biometric Authentication. Implement secure facial biometrics for mobile device unlocking, payment authorization, multi-factor authentication, and physical access control systems that target sub-second response times and 99%+ accuracy.
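
The selfie-to-ID matching step in the KYC use case is often framed as comparing two face embeddings against a similarity threshold. The sketch below assumes embeddings have already been produced by some face recognition model (not specified here); the example vectors, the `verify` helper, and the 0.6 threshold are illustrative assumptions, and a real threshold would be tuned on held-out genuine and impostor pairs.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def verify(selfie_emb, id_emb, threshold=0.6):
    """Return (is_match, score); the default threshold is an arbitrary placeholder."""
    score = cosine_similarity(selfie_emb, id_emb)
    return score >= threshold, score


# Toy 3-dimensional vectors standing in for real face embeddings.
selfie = [0.9, 0.1, 0.4]
id_same = [0.8, 0.2, 0.5]
id_other = [-0.3, 0.9, 0.1]

match_same, score_same = verify(selfie, id_same)
match_other, score_other = verify(selfie, id_other)
print(match_same, match_other)
```

Raising the threshold reduces false accepts at the cost of more false rejects, so the operating point is usually chosen per application.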

Download information

A sample version of this dataset is available on Kaggle. Leave a request for additional samples in the form below.

Have a question?

We collect data from our internal team. All information is further verified by our specialists.

Once your enquiry has been sent, we will contact you to discuss the details and complete the necessary paperwork. Delivery time for the dataset depends on the specific request and any additional requirements.

Our unique selling point is providing legally clean datasets to our customers. We obtain consent from all participants to use their data for AI model development. We are able to provide comprehensive reporting on the licensing, data collection, and privacy compliance of our datasets. Although approaches to regulating AI development and deployment vary across jurisdictions, we are able to serve customers worldwide who are launching global AI products.

The price depends on your specific requirements. Please submit a request to receive a free consultation.

Contact us

Tell us about yourself, and get access to free samples of the dataset 

Contacts

UAE, Ajman

© 2022 – 2025 Copyright protected.