-
"The Norwegian Colossal Corpus (NCC) is a collection of multiple smaller Norwegian corpuses suitable for training large language models. We have done extensive cleaning on the...
-
The list contains the headwords in ODS (and ODS-S) available online at ordnet.dk/ods, together with the inflected forms registered for use by the dictionary's search function. The list is TAB-separated...
- HTML
-
Danish speech data from Alvenir, particularly suited to evaluating ASR models on Danish. The dataset consists of approx. 5 hours of speech recorded by 50 speakers between 20 and 60 years old....
-
This database was developed by Nordisk språkteknologi AS as a data foundation for speech recognition and dictation in Danish. In this version the data are structured in a new way, so that...
- TAR
-
Danish monolingual corpus of 3,708,693 sentences, with the content of www.retsinformation.dk.
-
The Leipzig Corpora Collection provides different tools and data for download, which are protected by copyright. For more details please refer to our terms of usage....
-
Danske Taler is a living collection that is constantly expanded with current speeches. We capture and transcribe the decisive and defining moments when politicians, debaters...
-
A billion-word corpus of Danish text. Split into many sections, and covering many dimensions of variation (spoken/written, formal/informal, modern/old, rigsdansk/dialect, and so...
- ZIP
-
Retsinformation.dk is the entry point to the joint state legal information system, which gives access to all laws, executive orders, circulars, etc. currently in force. There is also access...
-
EN-DA bilingual corpus made from PDF documents from the European Medicines Agency (EMEA), https://www.ema.europa.eu (February 2020). Attribution details: This dataset has...
-
Contents of the Nordic Co-operation web site http://www.norden.org downloaded and converted into a parallel corpus. This dataset has been created within the framework of the...
- ZIP
-
Contents of https://laegemiddelstyrelsen.dk were crawled, aligned on document and sentence level and converted into a parallel corpus. Contains 22699 translation units between...
- ZIP
-
Contents of https://www.vikingeskibsmuseet.dk were crawled, aligned on document and sentence level and converted into a parallel corpus. Contains 1939 translation units (EN-DA)....
- ZIP
-
COMING SOON: Crowdsourced speech corpus in a wide range of languages. It is not yet available for download for Danish, but the intention is to make it freely available.
-
Bilingual (EN-DA) corpus acquired from the website (https://ec.europa.eu/commission/presscorner/) of the EU portal (8th July 2020). Contains 6261 translation units (DA-EN).
-
Contents of https://www.dst.dk were crawled, aligned on document and sentence level and converted into a parallel corpus. This dataset has been created within the framework of...
-
Contents of https://natmus.dk/ were crawled, aligned on document and sentence level and converted into a parallel corpus. This dataset has been created within the framework of...
-
Contents of https://www.odense.dk/ were crawled, aligned on document and sentence level and converted into a parallel corpus. This dataset has been created within the framework...
-
Contents of https://slks.dk were crawled, aligned on document and sentence level and converted into a parallel corpus. This dataset has been created within the framework of the...
-
Contents of https://spillemyndigheden.dk/ were crawled, aligned on document and sentence level and converted into a parallel corpus. This dataset has been created within the...