Datasets - sprogteknologi.dk

Synthetic from Text Matching Long Tasks Danish

The purpose of this dataset is to pre- or post-train embedding models for Danish text matching tasks. The dataset consists of 100,000 samples generated with gemma-2-27b-it. The...
- Parquet
Synthetic from Classification Tasks Danish

The purpose of this dataset is to pre- or post-train embedding models for Danish text classification tasks. The dataset consists of 100,000 samples generated with...
- Parquet
Synthetic from Text Matching Short Tasks Danish

The purpose of this dataset is to pre- or post-train embedding models for Danish text matching tasks on short texts. The dataset consists of 100,000 samples generated with...
- Parquet
Synthetic from Retrieval Tasks Danish

The purpose of this dataset is to pre- or post-train embedding models for Danish retrieval tasks. The dataset consists of 100,000 samples generated with gemma-2-27b-it. The...
- Parquet
Synthetic from Unit Triple Tasks Danish

The purpose of this dataset is to pre- or post-train embedding models for Danish text similarity tasks. The dataset consists of 100,000 samples generated with gemma-2-27b-it....
- Parquet
Syntetisk dialog opsummering raw

Thanks to NVIDIA and Arrow Denmark for sponsoring the compute needed to generate this dataset. This dataset consists of 1,000,000 synthetic dialogs in Danish and a summary of each...
- Parquet

You can also access this registry using the API (see the API documentation).
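
As a rough illustration of API access, the sketch below assumes the registry exposes a CKAN-style Action API with a `package_search` endpoint under `/api/3/action/`. The page itself does not confirm this, so the base URL, the endpoint path, and the response layout are assumptions to check against the site's API documentation. The helper names (`build_search_url`, `extract_titles`, `search_datasets`) are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the registry serves a CKAN-style Action API under this base URL.
BASE_URL = "https://sprogteknologi.dk"


def build_search_url(query, rows=10, base_url=BASE_URL):
    """Build a CKAN-style package_search URL for a free-text query."""
    params = urlencode({"q": query, "rows": rows})
    return f"{base_url}/api/3/action/package_search?{params}"


def extract_titles(payload):
    """Return (title, [resource formats]) pairs from a package_search response.

    Assumes the CKAN response layout: {"success": ..., "result": {"results": [...]}}.
    """
    if not payload.get("success"):
        raise ValueError("API call did not succeed")
    return [
        (pkg.get("title", ""), [res.get("format", "") for res in pkg.get("resources", [])])
        for pkg in payload["result"]["results"]
    ]


def search_datasets(query, rows=10):
    """Fetch and parse search results (performs a network request)."""
    with urlopen(build_search_url(query, rows)) as response:
        return extract_titles(json.load(response))
```

If the assumptions hold, `search_datasets("synthetic danish")` would return the titles and file formats of matching entries, such as the Parquet datasets listed above. Keeping URL construction and response parsing in separate functions lets both be checked without a network connection.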