Call Center Speech Recognition Dataset
10,000 hours of real-world call center speech recordings in 7 languages
Check samples on Kaggle
Introduction
This dataset contains 10,000 hours of real-world call center speech recordings in 7 languages, with transcripts. Use it to train speech recognition, sentiment analysis, and conversational AI models on authentic customer support audio. It covers the support, sales, billing, finance, and pharma domains.
Dataset Features
Scale & Quality
- 10,000 hours of inbound & outbound calls
- Real-world field recordings – no synthetic audio
- With transcripts and concise summaries
Audio Specifications
- Format: Single-channel (mono) telephone speech
- Sample rate: 8,000 Hz
- Non-synthetic source audio
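As a quick sanity check, a downloaded recording can be compared against the specifications above. The sketch below assumes the files are delivered as WAV, which this page does not state; the function name and the example path are hypothetical. It uses only Python's standard `wave` module.

```python
import wave

# Values taken from the audio specifications listed above.
EXPECTED_RATE_HZ = 8000
EXPECTED_CHANNELS = 1  # single-channel (mono)


def matches_telephone_spec(path):
    """Return (ok, duration_seconds) for a WAV file.

    Assumption: recordings are stored as WAV; the page does not
    name the container format.
    """
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        duration = wav.getnframes() / rate
    ok = rate == EXPECTED_RATE_HZ and channels == EXPECTED_CHANNELS
    return ok, duration
```

Calling `matches_telephone_spec("call_0001.wav")` (a hypothetical file name) returns whether the file is 8,000 Hz mono, along with its length in seconds, which is useful for totaling hours per language or domain.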
Languages (7)
English, Russian, Polish, French, German, Spanish, Portuguese
- Non-English calls include English transcriptions
- Additional languages available on request: Swedish, Dutch, Arabic, Japanese, etc.
Domains
Support, Billing/Account, Sales, Finance/Account Management, Pharma
- Each call labeled by domain
- Automatic (machine-generated) speaker role labels (Agent/Customer)
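The domain and speaker role labels make it possible to select, for example, only the customer side of billing calls. The sketch below is illustrative only: the `Call` and `Segment` classes and all field names are hypothetical, since the actual delivery schema is not described on this page.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    role: str  # "Agent" or "Customer" (machine-generated label)
    text: str  # transcript text for this turn


@dataclass
class Call:
    domain: str    # e.g. "Support", "Sales", "Pharma"
    language: str  # e.g. "English", "Polish"
    segments: list = field(default_factory=list)


def customer_text(calls, domain):
    """Collect customer utterances from calls in one domain."""
    return [
        seg.text
        for call in calls
        if call.domain == domain
        for seg in call.segments
        if seg.role == "Customer"
    ]
```

Because the role labels are machine-generated, a downstream pipeline may want to spot-check a sample of the selected utterances before using them as training data.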
Purpose and Usage Scenarios
- Automatic Speech Recognition, punctuation restoration, and speaker diarization on telephone speech
- Intent detection, topic classification, and sentiment analysis from customer-service dialogs
- Concise post-call summaries for quality assurance monitoring and CRM automation
- Cross-lingual pipelines (original → English) and multilingual support models
Legal & Compliance
We prioritize data privacy, ethical AI development, and regulatory compliance. Our Call Center Speech Recognition Dataset is collected and processed in full accordance with global data protection standards, including GDPR, ensuring legality, security, and responsible AI practices.
Sample dataset
A sample version of this dataset is available on Kaggle. To request additional samples, use the form below.
Have a question?
Once your enquiry has been sent, we will contact you to discuss the details and complete the necessary paperwork. Delivery time depends on the specific request and any additional requirements.
Our unique selling point is that we provide legally clean datasets to our customers. We obtain consent from all participants to use their data for AI model development. We can provide comprehensive reporting on the licensing, data collection, and privacy compliance of our datasets. Although approaches to regulating AI development and deployment differ across jurisdictions, we are able to serve global customers seeking to launch global AI products.
The price depends on your specific requirements. Please submit a request to receive a free consultation.
Contact us
Tell us about yourself, and get access to free samples of the dataset
Didn't find what you were looking for?
Our collection includes many datasets for various requests
iBeta Level 1 Dataset
– 35,000+ videos
– 85+ participants
– zoom in and zoom out
iBeta Level 2 Dataset
– 25,000+ videos
– 3D masks
– iBeta Level 2
Display Replay Dataset for Liveness Detection
– 9,000+ videos
– 6,500+ participants
– Balanced mix of genders and ethnicities
Photo Print Dataset
– 7,000+ videos
– 10-20 seconds per video
– Mix of genders



