Danish Dynaword is a collection of Danish free-form text datasets from various domains. All of the datasets in Danish Dynaword are openly licensed and deemed permissible for training large language models.
Danish Dynaword is under continual development, so the dataset is actively updated as new datasets become available. The authors welcome contributions, including new sources, improved data filtering, and other enhancements. Please consult the contribution guidelines before contributing.
Please note that the license varies from dataset to dataset within the resource, and we advise users to check the license of each specific dataset they intend to use.