-
"The Norwegian Colossal Corpus (NCC) is a collection of multiple smaller Norwegian corpuses suitable for training large language models. We have done extensive cleaning on the...
- JSON
-
Contents of https://laegemiddelstyrelsen.dk were crawled, aligned on document and sentence level and converted into a parallel corpus. Contains 22699 translation units between...
- TMX
-
Contents of https://eng.mst.dk/ and https://mst.dk/ were crawled, aligned on document and sentence level and converted into a parallel corpus. This dataset has been created...
- TMX