Language-independent text categorization by word N-gram using an automatic acquisition of words
By: Suzuki, Makoto; Yamagishi, Naohide; Tsai, Yi-Ching; Goto, Masayuki
Type: Article from proceedings
In collection: The 14th Asia Pacific Industrial Engineering and Management Systems Conference (APIEMS), 3-6 December 2013, Cebu, Philippines, pages 1-10.
Topics: Text Mining; Automatic Text Classification; Newspaper Articles; N-gram
Full text: 1056.pdf (532.49 KB)
Abstract
We previously proposed the accumulation method, a language-independent text classification method based on character N-grams. Because it uses character N-grams as index terms, the method does not depend on the structure of the language: as long as the documents are encoded in Unicode, the same algorithm can classify them. At APIEMS 2012, we reported document classification results using word N-grams; however, that approach lost the language independence that was an original strength of our method. In the present paper, we show that document classification with word N-grams is possible without losing language independence. The procedure is as follows. (Step 1) We acquire pseudo-words using an automatic word acquisition method that we proposed previously and that is itself language-independent. (Step 2) We build pseudo-word N-grams from the pseudo-words acquired in Step 1. (Step 3) We classify documents using the pseudo-word N-grams created in Step 2. We apply this procedure to data sets of newspaper articles in Japanese, Korean, and Chinese and report the results. Furthermore, we compare the classification results obtained with the proposed pseudo-word N-grams against those obtained with conventional word N-grams based on morphological analysis.
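The three steps in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not describe the automatic word acquisition method or the scoring used by the accumulation method, so both are replaced here by simple stand-ins. Step 1 is assumed to keep frequently repeated character substrings as pseudo-words, and Step 3 is assumed to sum relative N-gram frequencies per category. All function names and parameters (`acquire_pseudo_words`, `segment`, `max_len`, `min_count`, and so on) are hypothetical.

```python
from collections import Counter, defaultdict


def acquire_pseudo_words(texts, max_len=4, min_count=2):
    """Step 1 (stand-in): treat character substrings of length 2..max_len
    that occur at least min_count times as pseudo-words."""
    counts = Counter()
    for text in texts:
        for size in range(2, max_len + 1):
            for i in range(len(text) - size + 1):
                counts[text[i:i + size]] += 1
    return {s for s, c in counts.items() if c >= min_count}


def segment(text, vocab, max_len=4):
    """Greedy longest-match split into pseudo-words.
    Characters that match no pseudo-word become single-character tokens."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


def ngrams(tokens, n=2):
    """Step 2: build N-grams over pseudo-words instead of characters."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def train(labeled_texts, vocab, n=2):
    """Count pseudo-word N-grams per category."""
    profiles = defaultdict(Counter)
    for text, label in labeled_texts:
        profiles[label].update(ngrams(segment(text, vocab), n))
    return profiles


def classify(text, profiles, vocab, n=2):
    """Step 3 (stand-in scoring): sum each category's relative frequency
    for every N-gram in the document and pick the highest total."""
    grams = ngrams(segment(text, vocab), n)
    scores = {}
    for label, profile in profiles.items():
        total = sum(profile.values()) or 1
        scores[label] = sum(profile[g] / total for g in grams)
    return max(scores, key=scores.get)
```

The sketch never consults a dictionary or morphological analyzer, only Unicode characters, which is the property the abstract calls language independence. A possible use, with made-up training sentences:

```python
data = [("株価が上昇した", "economy"), ("株価が下落した", "economy"),
        ("試合に勝利した", "sports"), ("試合に敗北した", "sports")]
vocab = acquire_pseudo_words([text for text, _ in data])
profiles = train(data, vocab)
label = classify("株価が上昇", profiles, vocab)
```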