Call Center Speech Recognition Dataset

10,000 hours of real-world call center speech recordings in 7 languages


Introduction

10,000 hours of real-world call center speech recordings in 7 languages, with transcripts. Train speech recognition, sentiment analysis, and conversational AI models on authentic customer support audio. The recordings cover the support, sales, billing, finance, and pharma domains.

Dataset Features

Scale & Quality

10,000 hours of inbound & outbound calls
Real-world field recordings – no synthetic audio
With transcripts and concise summaries

Audio Specifications

Format: Single-channel (mono) telephone speech
Sample rate: 8,000 Hz
Non-synthetic source audio
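
The specification above can be checked on delivered files before training. The following is a minimal sketch, assuming the recordings arrive as PCM-encoded WAV files (the container and encoding are not stated on this page, so this is an assumption). The function names `check_call_audio` and `make_silent_wav` are hypothetical helpers written for this example; only Python's standard `wave` module is used.

```python
import io
import wave

EXPECTED_CHANNELS = 1      # single-channel (mono), per the spec above
EXPECTED_RATE_HZ = 8000    # 8,000 Hz sample rate, per the spec above


def check_call_audio(source):
    """Return (ok, details) for a PCM WAV file path or file-like object.

    Assumption: files are PCM WAV; other encodings are not handled here.
    """
    with wave.open(source, "rb") as wav:
        channels = wav.getnchannels()
        rate = wav.getframerate()
        frames = wav.getnframes()
    details = {
        "channels": channels,
        "sample_rate_hz": rate,
        "duration_s": frames / rate if rate else 0.0,
    }
    ok = channels == EXPECTED_CHANNELS and rate == EXPECTED_RATE_HZ
    return ok, details


def make_silent_wav(channels, rate, seconds):
    """Build an in-memory silent 16-bit PCM WAV, used only for the demo."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 16-bit samples
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * channels * rate * seconds)
    buf.seek(0)
    return buf


if __name__ == "__main__":
    # A synthetic clip stands in for a real recording in this demo.
    ok, details = check_call_audio(make_silent_wav(1, 8000, 2))
    print(ok, details)
```

Summing `duration_s` over all files is also a quick way to confirm the total number of hours received.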

Languages (7)

English, Russian, Polish, French, German, Spanish, Portuguese

Non-English calls include English transcriptions
Additional languages available on request: Swedish, Dutch, Arabic, Japanese, etc.

Domains

Support, Billing/Account, Sales, Finance/Account Management, Pharma

Each call labeled by domain
Automatic (machine-generated) speaker role labels (Agent/Customer)

Purpose and Usage Scenarios

Automatic Speech Recognition, punctuation restoration, and speaker diarization on telephone speech
Intent detection, topic classification, and sentiment analysis from customer-service dialogs
Post-call concise summaries for QA/quality monitoring and CRM automation
Cross-lingual pipelines (original → English) and multilingual support models

Legal & Compliance

We prioritize data privacy, ethical AI development, and regulatory compliance. Our Call Center Speech Recognition Dataset is collected and processed in full accordance with global data protection standards, including GDPR, ensuring legality, security, and responsible AI practices.

Sample dataset

A sample version of this dataset is available on Kaggle. To request additional samples, use the form below.

Have a question?

What is the process?

Once your enquiry has been sent, we will contact you to discuss the details and complete the necessary paperwork. Delivery time depends on the specific request and any additional requirements.

Can you help us meet dataset disclosure requirements, GDPR and other regulatory controls?

Our unique selling point is that we provide legally clean datasets to our customers. We obtain consent from all participants to use their data for AI model development. We can provide comprehensive reporting on the licensing, data collection, and privacy compliance of our datasets. Although approaches to regulating AI development and deployment differ from region to region, we are able to serve global customers seeking to launch global AI products.

What is the price of the dataset?

The price depends on your specific requirements. Please submit a request to receive a free consultation.

Contact us

Tell us about yourself, and get access to free samples of the dataset
