DanPASS corpus (Danish Phonetically Annotated Spontaneous Speech)

The DanPASS corpus was developed for research and applied purposes. It consists of non-scripted monologues and dialogues recorded with 27 speakers, comprising a total of 73,227 running words, corresponding to 9 h 46 min of speech.

The monologues were recorded as one-way communication with an unseen partner, in which the speaker performed three different tasks: describing a network of geometrical shapes in various colours, guiding the listener along four different routes on a virtual city map, and instructing the listener in how to build a house from its individual pieces. The dialogues are replicas of the HCRC Map Task.

The annotation was carried out in Praat. The sound files are segmented into prosodic phrases, words, and syllables. Each file is supplied, in separate interval tiers, with an orthographic representation, detailed part-of-speech tags, simplified part-of-speech tags, a phonemic notation, a semi-narrow phonetic notation, a symbolic representation of the pitch relation between each stressed syllable and the following post-tonic syllable, and a symbolic representation of the phrasal intonation.
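Because the annotation is stored in Praat interval tiers, the layers described above can be read programmatically. The following is a minimal sketch, not part of the corpus distribution, that extracts the interval tiers from a TextGrid in Praat's long text format using only the Python standard library. The tier names ("word", "pos") and the labels in the sample are hypothetical placeholders; the actual tier names and tag sets are documented on the corpus website.

```python
import re
from typing import Dict, List, NamedTuple


class Interval(NamedTuple):
    start: float  # start time in seconds
    end: float    # end time in seconds
    label: str    # annotation text of the interval


# Patterns for Praat's long ("verbose") TextGrid text format.
_ITEM = re.compile(r'item \[\d+\]:')
_CLASS = re.compile(r'class = "(\w+)"')
_NAME = re.compile(r'name = "((?:[^"]|"")*)"')
_INTERVAL = re.compile(
    r'intervals \[\d+\]:\s*'
    r'xmin = ([-+\d.eE]+)\s*'
    r'xmax = ([-+\d.eE]+)\s*'
    r'text = "((?:[^"]|"")*)"'
)


def parse_interval_tiers(textgrid: str) -> Dict[str, List[Interval]]:
    """Return {tier name: [Interval, ...]} for every interval tier."""
    tiers: Dict[str, List[Interval]] = {}
    # The first chunk is the file header; each later chunk is one tier.
    for chunk in _ITEM.split(textgrid)[1:]:
        cls = _CLASS.search(chunk)
        name = _NAME.search(chunk)
        if cls is None or name is None or cls.group(1) != "IntervalTier":
            continue  # skip point tiers and malformed chunks
        tiers[name.group(1)] = [
            # Praat writes a literal quote inside a label as two quotes.
            Interval(float(lo), float(hi), text.replace('""', '"'))
            for lo, hi, text in _INTERVAL.findall(chunk)
        ]
    return tiers


# Hypothetical two-tier sample; tier names and labels are made up.
SAMPLE = '''File type = "ooTextFile"
Object class = "TextGrid"

xmin = 0
xmax = 1.2
tiers? <exists>
size = 2
item []:
    item [1]:
        class = "IntervalTier"
        name = "word"
        xmin = 0
        xmax = 1.2
        intervals: size = 2
        intervals [1]:
            xmin = 0
            xmax = 0.5
            text = "den"
        intervals [2]:
            xmin = 0.5
            xmax = 1.2
            text = "røde"
    item [2]:
        class = "IntervalTier"
        name = "pos"
        xmin = 0
        xmax = 1.2
        intervals: size = 2
        intervals [1]:
            xmin = 0
            xmax = 0.5
            text = "DET"
        intervals [2]:
            xmin = 0.5
            xmax = 1.2
            text = "ADJ"
'''

tiers = parse_interval_tiers(SAMPLE)
# Pair each word with the tag that shares its time span.
pairs = [(w.label, p.label) for w, p in zip(tiers["word"], tiers["pos"])]
```

For a real file, the text would first be read from disk with the appropriate encoding (Praat can write TextGrids as UTF-8 or UTF-16, so the encoding may need to be checked per file). Pairing intervals with `zip` assumes both tiers share identical boundaries, which is an assumption of this sketch rather than a documented property of the corpus.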

An extensive description and documentation of the corpus and its numerous resources can be found at https://danpass.hum.ku.dk.

The corpus was presented at the 5th International Conference on Language Resources and Evaluation (LREC), Genoa, 24-26 May 2006.

Note that a password is required to open the sound files. Contact the publisher by email to request it.

Data and Distribution(s)

Additional info

Field: Value
Landing page: https://danpass.hum.ku.dk/
Metadata last updated: December 8, 2022, 15:16 (UTC)
Metadata created: May 20, 2020, 13:12 (UTC)
Subject: Education, culture and sport; Language and orthography
GUID: https://data.gov.dk/dataset/lang/78e6f282-9214-4d42-8dff-6cf3180ed4cc
Contact email: ninag@hum.ku.dk
Contact name: Nina Grønnum
Updated: 2016-03
Update frequency: IRREG
URI: https://data.gov.dk/dataset/lang/78e6f282-9214-4d42-8dff-6cf3180ed4cc
Publication date: 2006
Publisher name: KU, NorS
Type: Corpora
Usage: speech recognition; speech synthesis; language understanding