-
28.000 stednavne i Danmark der har en stavemåde som er autoriseret af Kulturministeriet som gældende retskrivning. Navnene kan fremsøges via applikationen stednavne.info hvis...
- XLSX
-
135 mio parallelsætninger (1620 sprogpar - 85 sprog) fra Wikipedia. License: The mined data is distributed under the Creative Commons Attribution-ShareAlike license. Please cite...
- TSV
-
Parallel texts Danish-English from the Danish Ministry of Higher Education and Science, size: 120,000 words, topic: innovation, science This dataset has been created within the...
-
Parallel texts Danish-English from the Danish Ministry of Higher Education and Science, size 110,000 words, topic: research policy (Processed) This dataset has been created...
- TMX
-
It is generally assumed that addresses form up to 80% of the digital solutions used by a modern society. Access to accurate and up-to-date information on Denmark's addresses is...
- WMS
- XML
-
Danish Named Place data contain names on everything from the tree “Konge egen” and the city center to the peninsula Jutland. There are 140,000 Danish Named Places in total, all...
- GML
- WMS
- XML
-
The SemDax Corpus is a Danish human-annotated corpus relying on the combined wordnet and dictionary resources: DanNet and Den Danske Ordbog, and available through a CLARIN...
- XML
-
DK-CLARIN Reference Corpus of General Danish has been collected as part of DK-CLARIN project, WP2.1, 2008 - 2011. All texts are in XML TEIP5 format (TEIP5DKCLARIN-format), with...
- XML
-
Dette korpus indeholder n-grammer på dansk afledt af et korpus på 290 millioner ord med danske nyhedsarktikler fra aviserne Berlingske Tidende, Ekstrabladet og Politiken....
- TXT
-
The DanPASS corpus was developed for research and applied research purposes. It consists of of non-scripted monologues and dialogues, recorded by 27 speakers, comprising a total...
- BIN
- TXT
-
The Copenhagen Dependency Treebanks are a set of treebanks for Danish, English, Spanish and Italian. The purpose of the Copenhagen Dependency Treebank project is to create...
- TAG
- ATAG
-
Udtale af ord med bornholmsk dialekt. BCP-47: da-DK-bornholm.
- HTML
-
PAROLE-DK er et manuelt opmærket korpus som danner en de fakto-standard for POS-opmærkning af mange danske og udenlandske resurser. ePAROLE (udgivet i 2015) er en revideret...
- XML
- TXT
-
Liste med alle opslagsord og ordklasser.
- TXT
- HTML
-
Liste med alle opslagsord og ordklasser samt alle bøjede ordformer - 'fuldformsliste'. Må kun bruges integreret i sprogteknologiske produkter, dvs. stavekontroller, spil,...
- TXT
-
Digitalisering og opmærkning af trusselsbreve til projektet 'Truslers sprog og genre', der bygger på en innovativ kombination af sprogvidenskab og genrestudier med det formål at...
- XML
-
Medical spelling dictionary with terms in Danish, English and Latin This dataset has been created within the framework of the European Language Resource Coordination (ELRC)...
- TBX
-
BERT (Bidirectional Encoder Representations from Transformers) is a deep neural network model used in Natural Language Processing. The network learns the grammar and semantics...
- CKPT
-
The EUIPO Guidelines are the main point of reference for users of the European Union trade mark system and professional advisers who want to make sure they have the latest...
- TMX
-
Crowdsourced talekorpus på en lang række sprog. Korpusset er blevet skabt ved, at frivillige har doneret sætninger, oplæsninger af sætninger, samt validering af oplæsninger til...
- MP3