This article aims to describe key challenges of preparing and releasing audio material for spoken data and to propose solutions to these challenges. We draw on our experience of compiling the new London-Lund Corpus 2 (LLC-2), where transcripts are released together with the audio files. However, making the audio material publicly available required careful consideration of how to most effectively 1) align the transcripts with the audio and 2) anonymise personal information in the recordings. First, audio-to-text alignment was solved through the insertion of timestamps in front of speaker turns in the transcription stage, which, as we show in the article, may later be used as a valuable complement to more robust automatic segmentation. Second, anonymisation was done by means of a Praat script, which replaced all personal information with a sound that made the lexical information incomprehensible but retained the prosodic characteristics. The public release of the LLC-2 audio material is a valuable feature of the corpus that allows users to extend the corpus data relative to their own research interests and, thus, broaden the scope of corpus linguistics. To illustrate this, we present three studies that have successfully used the LLC-2 audio material.
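As a rough illustration of how turn-initial timestamps might be used for alignment, the sketch below reads a transcript in which each speaker turn begins with a timestamp and derives a coarse time window for every turn. The line format `[HH:MM:SS.ss] SPEAKER: text`, the `Turn` type and the `parse_turns` function are assumptions made for this example only; they are not the actual LLC-2 transcription format or tooling.

```python
import re
from typing import List, NamedTuple, Optional

# Hypothetical turn format, assumed for illustration only:
#   [HH:MM:SS.ss] SPEAKER: text of the turn
TURN_RE = re.compile(r"^\[(\d{2}):(\d{2}):(\d{2}(?:\.\d+)?)\]\s*(\w+):\s*(.*)$")


class Turn(NamedTuple):
    start: float           # turn onset in seconds
    end: Optional[float]   # onset of the next turn, or None for the last turn
    speaker: str
    text: str


def parse_turns(transcript: str) -> List[Turn]:
    """Extract speaker turns and their coarse time windows from a transcript."""
    starts = []
    for line in transcript.splitlines():
        match = TURN_RE.match(line.strip())
        if match is None:
            continue  # skip lines that do not begin a timestamped turn
        hours, minutes, seconds, speaker, text = match.groups()
        onset = int(hours) * 3600 + int(minutes) * 60 + float(seconds)
        starts.append((onset, speaker, text))

    turns = []
    for i, (onset, speaker, text) in enumerate(starts):
        # A turn is assumed to end where the next one begins (overlap is ignored).
        end = starts[i + 1][0] if i + 1 < len(starts) else None
        turns.append(Turn(onset, end, speaker, text))
    return turns


sample = "[00:00:01.50] A: hello there\n[00:00:03.25] B: hi\n"
for turn in parse_turns(sample):
    print(turn)
```

Windows of this kind are only turn-level, but they could in principle limit the stretch of audio that an automatic word- or phone-level segmenter has to search, which is one way the two approaches might complement each other.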