Background. The frequency-following response (FFR) is a scalp-recorded electrophysiological potential reflecting phase-locked activity from neural ensembles in the auditory system. The FFR is often used to assess the robustness of subcortical pitch processing. Due to low signal-to-noise ratio at the single-trial level, FFRs are typically averaged across thousands of stimulus repetitions. Prior work using this approach has shown that subcortical encoding of linguistically-relevant pitch patterns is modulated by long-term language experience.
New method. We examine the extent to which a machine learning approach using hidden Markov modeling (HMM) can be utilized to decode Mandarin tone categories from scalp-recorded electrophysiological activity. We then assess the extent to which the HMM can capture biologically relevant effects (language experience-driven plasticity). To this end, we recorded FFRs to four Mandarin tones from 14 adult native speakers of Chinese and 14 adult native speakers of English. We trained an HMM to decode tone categories from FFR averages of varying sizes.
Results and comparisons with existing methods. Tone categories were decoded with above-chance accuracies using the HMM. The HMM-derived metric (decoding accuracy) revealed a robust effect of language experience, such that FFRs from native Chinese speakers yielded greater accuracies than those from native English speakers. Critically, the language experience-driven plasticity was captured with average sizes significantly smaller than those used in the extant literature.
Conclusions. Our results demonstrate the feasibility of using HMMs to assess the robustness of neural pitch encoding. Machine-learning approaches can complement extant analytical methods that capture auditory function and could reduce the number of trials needed to capture biological phenomena.
Pitch is critical to speech and music processing (Ladefoged & Maddieson, 1998; Patel, 2010). For example, speakers of tonal languages (e.g., Chinese) rely on phonologically relevant pitch patterns (i.e., lexical tones) to convey different word meanings (Gandour, 1983; Ladefoged & Maddieson, 1998). The main cues used for perceptual categorization of such lexical tones are pitch height and pitch direction (Gandour, 1994; Francis & Ciocca, 2003). The neural encoding of pitch is often assessed with the frequency-following response (FFR). The FFR is a scalp-recorded electrophysiological potential that reflects phase-locked activity from neural ensembles involved in the processing of low-level sound characteristics (Bidelman, 2015; Chandrasekaran & Kraus, 2010; Krishnan, Gandour, & Bidelman, 2010; Coffey, Herholz, Chepesiuk, Baillet, & Zatorre, 2016; Smith, Marsh, & Brown, 1975; Sohmer, Pratt, & Kinarti, 1977). Although the FFR has generally been considered to be generated entirely by auditory subcortical structures (e.g., Krishnan et al., 2005), recent evidence suggests a contribution from auditory cortex (Coffey et al., 2016). An important property of the FFR is that it captures the spectro-temporal correlates of pitch (e.g., the fundamental frequency, F0) with high fidelity (Chandrasekaran & Kraus, 2010; Krishnan, Xu, Gandour, & Cariani, 2004) (see Fig. 1).