
Medical word embedding eval

In natural language processing, benchmarks are used to track progress and identify useful models. Currently, no benchmark for Danish clinical word embeddings exists. This paper describes the development of such a benchmark. It consists of ten datasets: eight intrinsic and two extrinsic. We also use the benchmark to evaluate word embeddings trained on text from the clinical domain, the general practitioner domain and the general domain. All the intrinsic tasks of the benchmark are publicly available. License: Creative Commons BY-SA 3.0
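
To make "intrinsic" evaluation concrete, here is a minimal sketch of one common kind of intrinsic task: scoring word pairs by cosine similarity and comparing the ranking with human ratings using Spearman correlation. Whether the eight intrinsic datasets in this benchmark use exactly this setup is an assumption. The function names, the toy vectors and the toy ratings below are hypothetical and do not come from the dataset.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) for constant input."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def spearman(x, y):
    """Spearman correlation computed as Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def evaluate_similarity(embeddings, rated_pairs):
    """Return (Spearman rho, number of pairs used); skips out-of-vocabulary pairs."""
    model_scores, human_scores = [], []
    for word_a, word_b, rating in rated_pairs:
        if word_a in embeddings and word_b in embeddings:
            model_scores.append(cosine(embeddings[word_a], embeddings[word_b]))
            human_scores.append(rating)
    return spearman(model_scores, human_scores), len(human_scores)


# Hypothetical toy data, NOT taken from the benchmark.
toy_embeddings = {
    "hjerte": [0.9, 0.1, 0.0],
    "puls": [0.8, 0.3, 0.1],
    "lunge": [0.5, 0.5, 0.1],
    "feber": [0.1, 0.9, 0.2],
    "stol": [0.0, 0.1, 0.9],
}
toy_pairs = [
    ("hjerte", "puls", 8.5),
    ("hjerte", "lunge", 6.0),
    ("hjerte", "feber", 3.0),
    ("hjerte", "stol", 0.5),
    ("hjerte", "ukendt", 5.0),  # "ukendt" has no vector, so this pair is skipped
]

rho, n_used = evaluate_similarity(toy_embeddings, toy_pairs)
print(f"Spearman rho = {rho:.3f} over {n_used} pairs")
```

In this toy example the model ranking agrees with the human ranking on all four usable pairs, so rho is 1 up to floating-point rounding. Real evaluations typically also report how many pairs were skipped, since low vocabulary coverage can make scores from different embeddings hard to compare.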

Data and resources

Keywords

Additional information

URI https://data.gov.dk/dataset/lang/1eb20906-1e1b-48f2-9bd0-107ab378ded5
Landing page https://huggingface.co/datasets/Den-Intelligente-Patientjournal/Medical_word_embedding_eval
Harvested by Datavejviser No
Publication date 07-03-2023
Last modified date 29-11-2024
Update frequency never
Coverage period  / 
Topic(s)
  • 16.05.07 Language and orthography
  • Health
  • Education, culture and sport
Access rights public
Conforms to
Provenance statement

Read more about the development of the dataset here: https://aclanthology.org/2023.nejlt-1.4/

Documentation