Diphones data

In 1999-2000, Warner, along with Roel Smits, James McQueen, and Anne Cutler, collected a very large quantity of data on how Dutch listeners perceive all possible strings of two Dutch sounds (all possible Dutch diphones) as the sounds unfold over time. For example, when a listener hears a sequence like /hɶy/, as in the Dutch word huis 'house', how much does the listener know about whether the first sound is an /h/ by 2/3 of the way through the /h/ sound? How much does the listener know by that point about what the following sound (the diphthong /ɶy/) might be? How does this compare to sequences like /pt/ or /aɛ/? Testing all possible Dutch diphones (more than 2000 of them, each gated to end at 6 time points in the sounds, resulting in over 12,000 stimuli) allows us to determine how information spreads through sounds over time. It also allows us to test questions about speech perception in a consistent way across all sounds of the language, which smaller studies cannot do. In total, the listeners provided almost 500,000 perceptual datapoints. This data also forms the input to the Shortlist-B Bayesian model of Spoken Word Recognition for Dutch. All data from this project is publicly available, and other researchers are welcome to use it to answer additional questions, as long as the data is cited.

Warner, McQueen, and Cutler later collected matched data for English, testing all possible diphones of American English. The English dataset is slightly larger, but the two datasets are comparable in size. This data is also publicly available.