CoRal - Danish Conversational and Read-aloud Dataset

CoRal is a comprehensive Automatic Speech Recognition (ASR) dataset designed to capture the diversity of the Danish language across various dialects, accents, genders, and age groups. The primary goal of the CoRal dataset is to provide a robust resource for training and evaluating ASR models that can understand and transcribe spoken Danish in all its variations.

Key Features:

Dialect and Accent Diversity: The dataset includes speech samples from all major Danish dialects as well as multiple accents, ensuring broad geographical coverage and the inclusion of regional linguistic features.

Gender Representation: Both male and female speakers are well-represented, offering balanced gender diversity.

Age Range: The dataset includes speakers from a wide range of age groups, providing a comprehensive resource for age-agnostic ASR model development.

High-Quality Audio: All recordings are of high quality, ensuring that the dataset can be used for both training and evaluation of high-performance ASR models.
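Because the dataset is meant for evaluating ASR models across dialects, genders, and age groups, one natural check is to compare error rates per group. The following is a minimal sketch of that idea, not part of the CoRal tooling: it assumes word error rate (WER) as the metric, and the record fields `dialect`, `reference`, and `hypothesis`, as well as the example sentences, are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def word_errors(reference: str, hypothesis: str) -> tuple[int, int]:
    """Return (word-level edit distance, number of reference words)."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming Levenshtein distance over words,
    # keeping only the previous row of the table.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word deleted
                curr[j - 1] + 1,     # extra word inserted
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1], len(ref)


def wer_by_group(samples, group_key):
    """Aggregate WER per value of `group_key` (for example, dialect)."""
    errors = defaultdict(int)
    words = defaultdict(int)
    for sample in samples:
        e, n = word_errors(sample["reference"], sample["hypothesis"])
        errors[sample[group_key]] += e
        words[sample[group_key]] += n
    return {g: errors[g] / words[g] for g in errors if words[g] > 0}


# Hypothetical records: field names and texts are made up for illustration
# and do not reflect the actual CoRal schema.
samples = [
    {"dialect": "A", "reference": "jeg bor i byen", "hypothesis": "jeg bor i byen"},
    {"dialect": "B", "reference": "det er en god dag", "hypothesis": "det er god dag"},
]
result = wer_by_group(samples, "dialect")
```

Summing errors and reference words per group before dividing weights each group's WER by utterance length, so a few very short clips do not dominate the figure. The same function could be called with a gender or age-group key if such fields are available.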

Forbidden Use Cases:

Speech synthesis and biometric identification are not allowed using the CoRal dataset. For more information, see addition 4 in our license (https://huggingface.co/datasets/alexandrainst/coral/blob/main/LICENSE).

A research paper will be submitted soon, but until then, if you use the CoRal dataset in your research or development, please cite it as follows:

@dataset{coral2024,
  author = {Dan Saattrup Nielsen, Sif Bernstorff Lehmann, Simon Leminen Madsen, Anders Jess Pedersen, Anna Katrine van Zee and Torben Blach},
  title = {CoRal: A Diverse Danish ASR Dataset Covering Dialects, Accents, Genders, and Age Groups},
  year = {2024},
  url = {https://hf.co/datasets/alexandrainst/coral},
}

Data and Distribution(s)

Additional info

Field: Value
Landing page: https://huggingface.co/datasets/alexandrainst/coral
Author: Dan Saattrup Nielsen
Maintained by: Dan Saattrup Nielsen
Metadata last updated: September 19, 2024, 08:37 (UTC)
Metadata created: September 18, 2024, 13:05 (UTC)
Updated: 2024-09-13
URI: https://data.gov.dk/dataset/lang/79c44568-2a5e-4ff4-9430-6e22da1f432d
Release date: 2024-08-26
Publisher name: Alexandra Instituttet, Københavns Universitet, Digitaliseringsstyrelsen, Alvenir, Corti
Type: Corpora
Documentation