Mining A Corpus Of Biographical Texts Using Keywords
By:
Conway, Mike
Type:
Article from Journal - e-Journal
In collection:
Literary and Linguistic Computing vol. 25 no. 1 (Apr. 2010), pages 23-35.
Article content
Using statistically derived keywords to characterize texts has become an important research method for digital humanists and corpus linguists in areas such as literary analysis and the exploration of genre difference. Keywords, along with the associated concepts of ‘keyness’ and ‘key-keyness’, have inspired conferences and workshops and many varied research papers, and are central to several modern corpus processing tools. In this article, we present evidence that (at least for the task of biographical sentence classification) frequent words characterize texts better than keywords or key-keywords. Using the naïve Bayes learning algorithm in conjunction with frequency-, keyword-, and key-keyword-based text representations to classify a corpus of biographical sentences, we discovered that the use of frequent words alone provided a classification accuracy better than either the keyword or key-keyword representations at a statistically significant level. This result suggests that (for the biographical sentence classification task at least) frequent words characterize texts better than keywords derived using more computationally intensive methods.
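The comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the author's implementation. The abstract does not say which keyness statistic was used, so the log-likelihood score here is an assumption (it is one commonly used choice), and key-keywords are not covered. All function names (`log_likelihood`, `frequent_words`, `keywords`, `train_nb`, `classify_nb`) are hypothetical, and the naïve Bayes classifier is a plain multinomial variant with add-one smoothing, chosen for brevity.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """Log-likelihood keyness (assumed statistic) for a word seen a times
    in a study corpus of c tokens and b times in a reference corpus of d tokens."""
    e1 = c * (a + b) / (c + d)  # expected count in the study corpus
    e2 = d * (a + b) / (c + d)  # expected count in the reference corpus
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2.0 * ll


def frequent_words(study_tokens, n):
    """Frequency-based representation: the n most common study-corpus words."""
    return [w for w, _ in Counter(study_tokens).most_common(n)]


def keywords(study_tokens, reference_tokens, n):
    """Keyword-based representation: the n words with the highest keyness,
    restricted to words relatively more frequent in the study corpus."""
    study, ref = Counter(study_tokens), Counter(reference_tokens)
    c, d = len(study_tokens), len(reference_tokens)
    scored = []
    for w, a in study.items():
        b = ref.get(w, 0)
        if a / c > b / d:
            scored.append((log_likelihood(a, b, c, d), w))
    # Highest score first; ties broken alphabetically for determinism.
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [w for _, w in scored[:n]]


def train_nb(sentences, labels, vocab):
    """Count vocabulary words per class for a multinomial naive Bayes model."""
    vocab = set(vocab)
    counts = {y: Counter() for y in set(labels)}
    priors = Counter(labels)
    for tokens, y in zip(sentences, labels):
        counts[y].update(t for t in tokens if t in vocab)
    return vocab, counts, priors


def classify_nb(model, tokens):
    """Return the most probable class, using add-one (Laplace) smoothing."""
    vocab, counts, priors = model
    total = sum(priors.values())
    best, best_score = None, -math.inf
    for y in sorted(counts):
        denom = sum(counts[y].values()) + len(vocab)
        score = math.log(priors[y] / total)
        for t in tokens:
            if t in vocab:  # words outside the chosen representation are ignored
                score += math.log((counts[y][t] + 1) / denom)
        if score > best_score:
            best, best_score = y, score
    return best
```

To compare representations in the spirit of the abstract, one would pass either `frequent_words(...)` or `keywords(...)` as the `vocab` argument of `train_nb` and measure accuracy on held-out sentences. Note that very common function words such as "in" can appear in the frequent-word list yet be excluded from the keyword list when they are equally common in the reference corpus.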