Detail
Multi-valued classification of text data based on ECOC approach considering parallel processing
By:
Ogihara, Tairiku
;
Mikawa, Kenta
;
Goto, Masayuki
Type:
Article from Proceeding
In collection:
The 14th Asia Pacific Industrial Engineering and Management Systems Conference (APIEMS), 3-6 December 2013 Cebu, Philippines
,
page 1-9.
Topics:
ECOC
;
Reed Muller code
;
RVM
;
document classification
;
multivalued classification
Fulltext:
1029.pdf
(573.82KB)
Article content
With the development of information technology, large volumes of document data are now handled in many fields. Such digital text data have become an important information source for industrial management. However, it is difficult for analysts to read all of the text data and classify it by hand, because the amount of data stored in databases is enormous. Automatic document classification has therefore become increasingly important for business analytics. In this research field, the Relevance Vector Machine (RVM), a probabilistic binary classifier with good performance, has been proposed. For multi-valued classification, on the other hand, it is known that a combination of an appropriate number of binary classifiers is more efficient than a single multi-valued classifier. However, there is room to improve classification accuracy and reduce computational complexity by studying how the binary classifiers are combined. In this study, we focus on multi-valued classification based on the Error-Correcting Output Codes (ECOC) approach, using the framework of coding theory. In coding theory, the Reed-Muller (RM) code and the BCH code are well known, and their effectiveness must be verified before they are used for multi-valued classification with the ECOC approach. We first investigate the performance of RM and BCH codes in the ECOC approach for multi-valued classification and show that the RM code is effective in terms of classification accuracy. However, an ECOC method that applies the RM code directly to multi-valued classification problems is not designed for parallel processing, so there is room to improve the computation time. Because the number of training data for each classifier is fixed in advance, the computational complexity of the learning phase cannot be reduced.
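The ECOC idea described above can be illustrated with a minimal sketch. It assumes a first-order Reed-Muller code RM(1, 3) and a hypothetical five-class problem; the function names (`rm1_codewords`, `hamming`, `decode`) and the way code words are assigned to classes are illustrative choices, not the construction used in the paper, which the abstract does not specify.

```python
from itertools import product

def rm1_codewords(m):
    """All code words of the first-order Reed-Muller code RM(1, m), length 2**m."""
    n = 2 ** m
    # Generator rows: the all-ones vector, then one row per bit of the column index.
    rows = [[1] * n] + [[(j >> i) & 1 for j in range(n)] for i in range(m)]
    words = []
    for msg in product([0, 1], repeat=m + 1):
        word = [sum(b * r[j] for b, r in zip(msg, rows)) % 2 for j in range(n)]
        words.append(tuple(word))
    return words

def hamming(a, b):
    """Number of positions in which two bit sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def decode(bits, code_matrix):
    """Return the class index whose code word is closest in Hamming distance."""
    return min(range(len(code_matrix)), key=lambda k: hamming(bits, code_matrix[k]))

codewords = rm1_codewords(3)

# Assumption: 5 classes, each assigned one nonzero code word (row = class).
code_matrix = codewords[1:6]
min_dist = min(
    hamming(a, b)
    for i, a in enumerate(code_matrix)
    for b in code_matrix[i + 1:]
)

# Simulate one binary classifier giving the wrong bit for a sample of class 2.
received = list(code_matrix[2])
received[0] ^= 1
predicted = decode(received, code_matrix)
```

Each column of `code_matrix` defines one binary classification task, and the eight predicted bits are decoded to the nearest class code word. Because the minimum distance here is 4, a single wrong bit is still decoded to the correct class. In this particular subset the first bit is 0 for every class, so a practical code matrix would drop or replace that column, since a classifier with a constant target learns nothing.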
On the other hand, ECOC has the advantage of enabling parallel processing, because the learning process of each binary classifier can be carried out independently. An aggregated output is obtained by combining the results of the individual classifiers in the prediction phase. That is, if the computational complexity of each classifier can be reduced, the total computation time can be reduced greatly through parallel processing. To reduce the computational cost of each classifier, we reduce the number of training data for each classifier by dividing the classification problem into several sub-problems. The proposed method makes use of the properties of the RM code and can control the number of training data. In the proposed method, small RM codes are concatenated to obtain efficient, high-accuracy code words.
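The independence of the binary learners can be sketched as follows. This is only an illustration under stated assumptions: the 4-class code matrix `CODE`, the toy 2-D data, and the nearest-centroid learner (a stand-in for the RVM) are all hypothetical, and the sub-problem division and RM code concatenation of the proposed method are not reproduced because the abstract does not give their details.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical 4-class code matrix: rows are class code words, columns are binary tasks.
CODE = [(0, 1, 0, 1), (0, 0, 1, 1), (1, 0, 1, 0), (1, 1, 0, 0)]

def mean(points):
    """Coordinate-wise mean of a list of equal-length tuples."""
    return tuple(sum(c) / len(points) for c in zip(*points))

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def train_column(j, X, y):
    """Fit a nearest-centroid stand-in (not an RVM) for binary task j."""
    pos = [x for x, k in zip(X, y) if CODE[k][j] == 1]
    neg = [x for x, k in zip(X, y) if CODE[k][j] == 0]
    return mean(pos), mean(neg)

def predict_bit(model, x):
    """Output 1 if x is closer to the positive centroid, else 0."""
    pos, neg = model
    return 1 if sq_dist(x, pos) < sq_dist(x, neg) else 0

def predict_class(models, x):
    """Aggregate the binary outputs and decode to the nearest class code word."""
    bits = [predict_bit(m, x) for m in models]
    return min(
        range(len(CODE)),
        key=lambda k: sum(b != c for b, c in zip(bits, CODE[k])),
    )

# Toy data: two points per class, placed near the corners of the unit square.
X = [(0.0, 0.0), (0.1, 0.0), (0.0, 1.0), (0.1, 1.0),
     (1.0, 1.0), (0.9, 1.0), (1.0, 0.0), (1.0, 0.1)]
y = [0, 0, 1, 1, 2, 2, 3, 3]

# Each column is trained independently, so the tasks can run concurrently.
with ThreadPoolExecutor() as pool:
    models = list(pool.map(lambda j: train_column(j, X, y), range(len(CODE[0]))))

predictions = [predict_class(models, x) for x in X]
```

Threads are used here only to keep the sketch short and portable. For CPU-bound learners in Python, a process pool or separate machines would be the more realistic choice, and the structure stays the same because no learner depends on another.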