Details
Article: Improved Speech-to-Text Translation with the Fisher and Callhome Spanish–English Speech Translation Corpus
By: Post, Matt ; Kumar, Gaurav ; Lopez, Adam ; Karakos, Damianos ; Callison-Burch, Chris ; Khudanpur, Sanjeev
Type: Article from Proceedings
In collection: Proceedings of the 10th International Workshop on Spoken Language Translation (IWSLT 2013), Heidelberg, Germany: Dec. 5-6, 2013
Full text: Improved Speech-to-Text.pdf (11.79 MB)
Abstract: Research into the translation of the output of automatic speech recognition (ASR) systems is hindered by the dearth of datasets developed for that explicit purpose. For Spanish-English translation, in particular, most available parallel data exists only in vastly different domains and registers. In order to support research on cross-lingual speech applications, we introduce the Fisher and Callhome Spanish-English Speech Translation Corpus, supplementing existing LDC audio and transcripts with (a) ASR 1-best, lattice, and oracle output produced by the Kaldi recognition system and (b) English translations obtained on Amazon's Mechanical Turk. The result is a four-way parallel dataset of Spanish audio, transcriptions, ASR lattices, and English translations of approximately 38 hours of speech, with defined training, development, and held-out test sets. We conduct baseline machine translation experiments using models trained on the provided training data, and validate the dataset by corroborating a number of known results in the field, including the utility of in-domain (informal, conversational) training data, increased performance translating lattices (instead of recognizer 1-best output), and the relationship between word error rate and BLEU score.