-
CST's lemmatiser fører hvert ord i en tekst tilbage til grundformen, lemmaet.
- C/C++
-
Stammer fra NST (Nordisk Språkteknologi) som gik konkurs i 2003. Er holdt ajour i den norske sprogbank i Nationalbiblioteket.
- TXT
-
Danish multi-task CNN trained on UD Danish DDT and DaNE. Assigns context-specific token vectors, POS tags, dependency parses and named entities. Sources: Danish Universal...
- Python
- Source code
-
ML Powered Danish Sentiment Model
- Python
- Source code
-
The Danish Universal Dependencies treebank (Johannsen et al., 2015, UD-DDT) is a conversion of the Danish Dependency Treebank (Buch-Kromann et al. 2003) based on texts from...
- coNLL-U
-
Open-source Python-pakke til dansk talegenkendelse (tale-til-tekst). DanSpeech har arbejdet på at udvikle generelle talegenkendelsesmodeller siden 2018. Projektet har levet som...
- Python
-
A toolkit for Part-of-Speech tagging and NER in DyNet. It has been tested on Danish, amongst other languages (for the UD POS tags in the UD_Danish-DDT version 1.1 and 2.3)...
- Python
-
The aligned corpus consists of press releases from the European Commission Press Relase Database (Rapid) harvested in 2009 and 2011 (http://europa.eu/rapid/search.htm). The...
- TXT
- TMX
-
DUDS Jens Bille’s Ballad Book belongs to a corpus of the oldest Danish ballad tradition. The corpus consists of 9 ballad books handed down from Renaissance ballad collectors...
- XML
-
Elektroniske versioner af størstedelen af Johannes V. Jensens udgivelser. I regi af CLARIN-projektet og i samarbejde med rettighedshaverne, gjorde Jensen Forum i 2011...
- HTML
-
Gruntvig's Works version 1,12. april 2018 contains N.F.S. Grundtvig's authorship. Corpus folder containing edited texts and OCR texts. Creator: Ravn, Kim Steen License:...
- XML
-
The Danish similarity dataset is a gold standard resource for evaluation of Danish word embedding models. The dataset consists of 99 word pairs rated by 38 human judges...
- CSV
-
The DK-CLARIN Parallel Financial Corpus comprises 4.3 M Danish and 4.8 M English tokens from translated (parallel) documents, mainly annual reports, of the period 2002-2010 from...
- XML
-
The LSP (Language for Special Purposes) corpus consists of texts from seven selected domains. The DK-CLARIN LSP corpus comprises 11 M tokens from the period 2000-2010,...
- XML
-
CST's tokeniserings- og segmenteringsprogram til tekst- og RTF-filer. Opdeler en tekst i ord og ordforbindelser
- HTML
-
En opmærket multimodal samling af samtaler på dansk hvor tolv deltagerpar taler sammen for at lære hinanden at kende. Deltagerne blev filmet mens de stod foran hinanden og talte...
- XML
-
CST's modificerede udgave af BRILL-taggeren POS-tagger i C/C++
- C/C++
-
This resource is an annotation of four NER types (PER, ORG, LOC, MISC) on top of the UD_Danish-DDT data. Status: published and freely available since summer 2019 Reference:...
- conll
-
The Leipzig Corpora Collection provides different tools and data for download, which are protected by copyright. For more details please refer to our terms of usage....
- TXT
-
MULINCO - MUltiLINgual Corpus of the University of COpenhagen. 7 eventyr af H.C.Andersen, tekster af Edgar Allen Poe, Saxos Danmarks historie og EU-traktater på flere sprog...