Datasets - sprogteknologi.dk

Compilation of Danish-English parallel corpora resources used for training...

This bilingual corpus was built from a number of different corpora drawn from selected public and private sources and has been used to train NTEU (Neural Translation for the...
- TMX
COVID-19 EUR-LEX dataset. Bilingual (EN-DA)

Bilingual (EN-DA) corpus acquired from the website (https://eur-lex.europa.eu/legal-content) of the EU portal (9th July 2020). Contains 21238 translation units (DA-EN).
- TMX
COVID-19 EUROPARL dataset v2. Bilingual (EN-DA)

Bilingual (EN-DA) corpus acquired from the website (https://www.europarl.europa.eu/) of the European Parliament (9th May 2020). Contains 633 translation units (DA-EN).
- TMX
COVID-19 EU presscorner v2 dataset. Bilingual (EN-DA)

Bilingual (EN-DA) corpus acquired from the website (https://ec.europa.eu/commission/presscorner/) of the EU portal (8th July 2020). Contains 6261 translation units (DA-EN).
- TMX
COVID-19 EC-EUROPA v1 dataset. Bilingual (EN-DA)

Bilingual (EN-DA) corpus acquired from the website (https://ec.europa.eu/*coronavirus-response) of the EU portal (20th May 2020). Contains 2803 translation units (DA-EN).
- TMX
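
All of the resources listed above are distributed as TMX, an XML format in which each translation unit (`tu`) holds one language variant (`tuv`) per language, with the text in a `seg` element. The following is a minimal sketch of how such a file could be read into EN-DA sentence pairs using only the Python standard library. The function name `read_tmx_pairs` and the in-memory `SAMPLE` document are hypothetical and for illustration only; the actual files may use different language codes (for example `en-GB`) or extra metadata, which this sketch only partly accounts for.

```python
import io
import xml.etree.ElementTree as ET

# xml:lang lives in the XML namespace; ElementTree exposes it under this key.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Tiny hypothetical TMX document, not taken from any of the listed corpora.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<tmx version="1.4">
  <header/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Public health</seg></tuv>
      <tuv xml:lang="da"><seg>Folkesundhed</seg></tuv>
    </tu>
  </body>
</tmx>"""


def read_tmx_pairs(source, src_lang="en", tgt_lang="da"):
    """Yield (source, target) segment pairs from a TMX path or binary file object."""
    tree = ET.parse(source)
    for tu in tree.getroot().iter("tu"):
        segs = {}
        for tuv in tu.iter("tuv"):
            # Older TMX versions use a plain "lang" attribute instead of xml:lang.
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            seg = tuv.find("seg")
            if seg is not None:
                # itertext() also collects text nested inside inline markup.
                # Keep only the primary subtag, so "en-GB" is stored as "en".
                segs[lang.split("-")[0]] = "".join(seg.itertext()).strip()
        # Skip units that lack either language.
        if src_lang in segs and tgt_lang in segs:
            yield segs[src_lang], segs[tgt_lang]


pairs = list(read_tmx_pairs(io.BytesIO(SAMPLE)))
```

For a downloaded file, a path can be passed in place of the `BytesIO` object, e.g. `read_tmx_pairs("corpus.tmx")`, where `corpus.tmx` is a placeholder file name. `ET.parse` loads the whole file into memory, so for the larger corpora `ET.iterparse` may be a better fit.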

You can also access this registry via the API (see the API documentation).

5 language resources found