Call Center Speech Recognition Dataset

10,000 hours of real-world call center speech recordings in 7 languages

Introduction

10,000 hours of real-world call center speech recordings in 7 languages, with transcripts. Train speech recognition, sentiment analysis, and conversational AI models on authentic customer support audio. The recordings cover the support, sales, billing, finance, and pharma domains.

Dataset Features

Scale & Quality

  • 10,000 hours of inbound & outbound calls
  • Real-world field recordings – no synthetic audio
  • With transcripts and concise summaries

Audio Specifications

  • Format: Single-channel (mono) telephone speech
  • Sample rate: 8,000 Hz
  • Non-synthetic source audio

Languages (7)

English, Russian, Polish, French, German, Spanish, Portuguese

  • Non-English calls include English transcriptions
  • Additional languages available on request: Swedish, Dutch, Arabic, Japanese, etc.

Domains

Support, Billing/Account, Sales, Finance/Account Management, Pharma

  • Each call labeled by domain
  • Automatic (machine-generated) speaker role labels (Agent/Customer)

Purpose and Usage Scenarios

  • Automatic Speech Recognition, punctuation restoration, and speaker diarization on telephone speech
  • Intent detection, topic classification, and sentiment analysis from customer-service dialogs
  • Post-call concise summaries for QA/quality monitoring and CRM automation
  • Cross-lingual pipelines (original → English) and multilingual support models

Legal & Compliance

We prioritize data privacy, ethical AI development, and regulatory compliance. The Call Center Speech Recognition Dataset is collected and processed in full accordance with global data protection standards, including GDPR, ensuring legality, security, and responsible AI practices.

Sample dataset

A sample version of this dataset is available on Kaggle. To request additional samples, use the contact form.

Have a question?

Once your enquiry has been sent, we will contact you to discuss the details and complete the necessary paperwork. Delivery time depends on the specific request and any additional requirements.

Our unique selling point is providing legally clean datasets to our customers. We obtain consent from all participants to use their data for AI model development, and we can provide comprehensive reporting on the licensing, data collection, and privacy compliance of our datasets. Although approaches to regulating AI development and deployment vary, we are able to serve global customers seeking to launch global AI products.

The price depends on your specific requirements. Please submit a request to receive a free consultation.

Contact us

Tell us about yourself to get access to free samples of the dataset.

Contacts

UAE, Ajman

© 2022 – 2025 Copyright protected.