2 ressourcer fundet

Organisationer: Nationalbiblioteket i Norge Tags: norsk

Filtrér resultater
  • "The Norwegian Colossal Corpus (NCC) is a collection of multiple smaller Norwegian corpuses suitable for training large language models. We have done extensive cleaning on the...
    • JSON
  • The Stortinget Speech Corpus (SSC) is a 5000+ hours speech dataset for weak supervision ASR created from audio and aligned proceedings text from Stortinget, the Norwegian...
    • JSONL