Set of Pictures Dataset

Each individual has 3-5 different photos. The dataset covers more than 10M individuals of multiple ethnicities

Check samples on Kaggle

Dataset details
A large, multi-ethnic dataset of human faces for face recognition models (30M+ images). It targets the NIST 1:N (identification) and 1:1 (verification) face recognition tests. The dataset contains 10M+ individuals, each with at least 3 images of their face
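To illustrate the difference between the two test types, here is a minimal Python sketch. It assumes face embeddings have already been produced by some recognition model. The toy vectors, the 0.6 threshold, and the function names `cosine`, `verify`, and `identify` are hypothetical and are not part of the dataset or of any NIST protocol.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def verify(probe, reference, threshold=0.6):
    """1:1 verification: do two embeddings belong to the same person?"""
    return cosine(probe, reference) >= threshold

def identify(probe, gallery, threshold=0.6):
    """1:N identification: return the best-matching ID in the gallery, or None."""
    best_id, best_score = None, threshold
    for person_id, embedding in gallery.items():
        score = cosine(probe, embedding)
        if score >= best_score:
            best_id, best_score = person_id, score
    return best_id

# Toy 3-dimensional embeddings, made up for illustration only.
gallery = {"id_001": [1.0, 0.0, 0.0], "id_002": [0.0, 1.0, 0.0]}
probe = [0.9, 0.1, 0.0]
```

Calling `verify(probe, gallery["id_001"])` answers a yes/no question about one pair, while `identify(probe, gallery)` searches all enrolled IDs. Several images per individual make it possible to enroll one image and probe with another.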
What you receive

When you get our dataset, you can be sure it is high-quality and ready for use in training facial recognition models and other neural networks. The dataset guarantees:

  1. Unique IDs: Each entry has a unique identifier, so there are no duplicate IDs
  2. Non-redundant Photos: All photos in the dataset are unique, with no duplicates
  3. Correctly Matched Photos and IDs: Each person's photos are matched to that person's unique ID, and never to someone else's
  4. Consistent Photos of Individuals: All photos of the same person are grouped under the right ID, so images are not scattered
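The first two guarantees can be spot-checked on a received sample. The sketch below is one way to do that in Python. It assumes the data has been loaded as a list of `(person_id, [photo_bytes, ...])` tuples, which is a hypothetical layout chosen for illustration, and the `audit` function is not part of the dataset. It only detects byte-identical files. Guarantees 3 and 4 concern identity, so checking them would need a face recognition model rather than hashing.

```python
import hashlib
from collections import Counter

def audit(entries):
    """entries: list of (person_id, [photo_bytes, ...]) tuples (assumed layout).

    Returns (duplicate_ids, duplicate_photos); both are empty for a clean sample.
    """
    id_counts = Counter(person_id for person_id, _ in entries)
    duplicate_ids = sorted(pid for pid, n in id_counts.items() if n > 1)

    first_seen = {}        # SHA-256 digest -> ID where the photo first appeared
    duplicate_photos = []  # (first ID, repeated ID) for each repeated photo
    for person_id, photos in entries:
        for data in photos:
            digest = hashlib.sha256(data).hexdigest()
            if digest in first_seen:
                duplicate_photos.append((first_seen[digest], person_id))
            else:
                first_seen[digest] = person_id
    return duplicate_ids, duplicate_photos
```

Re-encoded or resized copies of the same image have different bytes, so catching those would require a perceptual hash or an embedding comparison instead of SHA-256.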
How we ensured the data is clean

Our careful cleaning process makes sure the dataset is accurate and reliable. The cleaning process includes the following steps:

  1. Initial Face Size Check:
    • We start by checking that each photo meets the minimum face size requirement, so that it is suitable for facial recognition
  2. Removing Duplicate Photos Within Each ID:
    • We identify and remove any duplicate photos within each folder, making sure all images in a folder are unique
  3. Validation of Minimum Number of Faces:
    • We check that each ID has at least three different photos with faces
    • This ensures enough data for each ID
  4. Duplicate Detection Across the Dataset:
    • We search for duplicate photos across the whole dataset
    • This step helps remove redundant images, including photos of celebrities, anime characters, and other irrelevant images
  5. Removing Non-conforming Data:
    • We remove any data that doesn’t meet our criteria, including images that fail the face size check, duplicate IDs and photos, and IDs with insufficient photos
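The five steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not the actual cleaning code: the `Photo` record, the `clean` function, and the 64-pixel `MIN_FACE_SIZE` are hypothetical. Face sizes and content hashes are assumed to be computed beforehand, duplicates are matched by exact hash, and a photo found under more than one ID is dropped from all of them. The minimum-photo check runs last so that it also covers IDs that lost photos in the cross-dataset step.

```python
from collections import defaultdict
from dataclasses import dataclass

MIN_FACE_SIZE = 64  # pixels; hypothetical threshold, not stated in the source
MIN_PHOTOS = 3      # from step 3: at least three photos per ID

@dataclass(frozen=True)
class Photo:
    digest: str     # content hash of the image file (assumed precomputed)
    face_size: int  # shorter side of the detected face box, in pixels

def clean(dataset):
    """dataset: dict mapping ID -> list of Photo. Returns a cleaned copy."""
    # Steps 1-2: keep photos with a large enough face, one per digest within an ID.
    per_id = {}
    for person_id, photos in dataset.items():
        unique = {}
        for photo in photos:
            if photo.face_size >= MIN_FACE_SIZE:
                unique.setdefault(photo.digest, photo)
        per_id[person_id] = list(unique.values())

    # Step 4: find digests that appear under more than one ID.
    owners = defaultdict(set)
    for person_id, photos in per_id.items():
        for photo in photos:
            owners[photo.digest].add(person_id)
    shared = {d for d, ids in owners.items() if len(ids) > 1}

    # Steps 3 and 5: drop shared photos, then drop IDs left with too few photos.
    cleaned = {}
    for person_id, photos in per_id.items():
        kept = [p for p in photos if p.digest not in shared]
        if len(kept) >= MIN_PHOTOS:
            cleaned[person_id] = kept
    return cleaned
```

Dropping a shared photo from every ID that holds it is a conservative choice: the same image filed under several IDs (for example, a celebrity picture) cannot be trusted to identify any of them.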

By following these steps, we ensure the dataset you receive is clean, reliable, and ready for use. This process minimizes errors and optimizes the dataset for training your facial recognition models, leading to better performance and more accurate results

Best used for:

Contact us

Tell us about yourself, and get access to free samples of the dataset