Ressourcer - sprogteknologi.dk

The Norwegian Colossal Corpus

"The Norwegian Colossal Corpus (NCC) is a collection of multiple smaller Norwegian corpuses suitable for training large language models. We have done extensive cleaning on the...
- JSON
Stortinget Speech Corpus version 1.0

The Stortinget Speech Corpus (SSC) is a 5000+ hours speech dataset for weak supervision ASR created from audio and aligned proceedings text from Stortinget, the Norwegian...
- JSONL

2 ressourcer fundet