13 ressourcer fundet

Organisationer: Center for Sprogteknologi

Filtrér resultater
  • Binary wordlists for the CST lemmatizer as suplement to the rules of the lemmatizer. Works with both tagged and untagged input. Use: cstlemma -d NAME-OF-WORDLIST
    • HTML
  • The SemDax Corpus is a Danish human-annotated corpus relying on the combined wordnet and dictionary resources: DanNet and Den Danske Ordbog, and available through a CLARIN...
    • XML
  • The STO (SprogTeknologisk Ordbase) lexicon is a comprehensive computational lexicon of Danish developed for NLP/HLT applications. The syntax layer of the lexicon, presented here...
    • LMF
    • CSV
  • CST's lemmatiser fører hvert ord i en tekst tilbage til grundformen, lemmaet.
    • C/C++
  • The aligned corpus consists of press releases from the European Commission Press Relase Database (Rapid) harvested in 2009 and 2011 (http://europa.eu/rapid/search.htm). The...
    • TXT
    • TMX
  • The DK-CLARIN JRC-Acquis Parallel Corpus (da, en) is a part of the JRC-Acquis mulilingual parallel corpus, containing documents from The Acquis Communautaire (AC) which is the...
    • XML
  • The Danish similarity dataset is a gold standard resource for evaluation of Danish word embedding models. The dataset consists of 99 word pairs rated by 38 human judges...
    • CSV
  • The DK-CLARIN Parallel Financial Corpus comprises 4.3 M Danish and 4.8 M English tokens from translated (parallel) documents, mainly annual reports, of the period 2002-2010 from...
    • XML
  • The LSP (Language for Special Purposes) corpus consists of texts from seven selected domains. The DK-CLARIN LSP corpus comprises 11 M tokens from the period 2000-2010,...
    • XML
  • CST's tokeniserings- og segmenteringsprogram til tekst- og RTF-filer. Opdeler en tekst i ord og ordforbindelser
    • HTML
  • En opmærket multimodal samling af samtaler på dansk hvor tolv deltagerpar taler sammen for at lære hinanden at kende. Deltagerne blev filmet mens de stod foran hinanden og talte...
    • XML
  • CST's modificerede udgave af BRILL-taggeren POS-tagger i C/C++
    • C/C++
  • MULINCO - MUltiLINgual Corpus of the University of COpenhagen. 7 eventyr af H.C.Andersen, tekster af Edgar Allen Poe, Saxos Danmarks historie og EU-traktater på flere sprog...