Language identification from prosody without explicit features

Most current language identi cation (LID) systems make little or no use of prosodic information, despite the importance of prosody in LID by humans. The greatest obstacle has been that of nding an appropriate feature set which captures linguistically relevant prosodic information. The only system to attempt LID entirely on the basis of prosodic variables uses a set of over 200 features which are selected and combined in a task-speci c manner [12]. We apply a novel recurrent neural network model to the task of pairwise discrimination among languages. Network inputs are limited to delta-F0 and the rst di erence of the band limited amplitude envelope. Initial results are based on all pairwise combinations of English, German, Japanese, Mandarin and Spanish, with 90 speakers per language.