A category classification algorithm for Indonesian and Malay news documents

Jaafar, J. and Indra, Z. and Zamin, N. (2016) A category classification algorithm for Indonesian and Malay news documents. Jurnal Teknologi, 78 (8-2). pp. 121-132.

Full text not available from this repository.

Official URL: https://www.scopus.com/inward/record.uri?eid=2-s2....


Text classification (TC) helps organize information by making content easier to understand and interpret. It deals with assigning labels to groups of similar textual documents. However, TC research on Asian-language documents is limited compared with research on English documents, and it is even more limited for news articles. Moreover, research on classifying textual documents in morphologically similar languages such as Indonesian and Malay is still scarce. Hence, the aim of this study is to develop an integrated generic TC algorithm that identifies the language of a news document and then classifies its category. Furthermore, a top-n feature selection method is used to improve TC performance and to overcome two challenges of classifying online news corpora: the rapid growth of online news documents and high computational time. Experiments were conducted using 280 Indonesian and 280 Malay online news documents from 2014-2015. The classification method achieved accuracy of up to 95.63% for language identification and 97.5% for category classification. The category classifier works optimally at n = 60, with an average computational time of 35 seconds. This highlights that the integrated generic TC has an advantage over manual classification and is suitable for Indonesian and Malay news classification. © 2016 Penerbit UTM Press. All rights reserved.
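
The two-stage flow described in the abstract (identify the language, then classify the category, each using only the top-n features) can be illustrated with a minimal sketch. The abstract does not say how features are ranked or how documents are scored, so everything below is an assumption made for illustration: features are ranked by raw term frequency per label, a document is assigned to the label whose feature set matches the most of its tokens, and the function names (`tokenize`, `top_n_features`, `classify`, `classify_news`) and the toy sentences are hypothetical rather than taken from the paper.

```python
from collections import Counter
import re


def tokenize(text):
    """Lowercase the text and keep runs of ASCII letters (a simplification)."""
    return re.findall(r"[a-z]+", text.lower())


def top_n_features(labelled_docs, n):
    """Return {label: set of the n most frequent tokens in that label's documents}.

    Ranking by raw frequency is an assumption; the paper may rank differently.
    """
    counts = {}
    for label, text in labelled_docs:
        counts.setdefault(label, Counter()).update(tokenize(text))
    return {
        label: {token for token, _ in counter.most_common(n)}
        for label, counter in counts.items()
    }


def classify(text, features):
    """Pick the label whose feature set matches the most tokens of the text."""
    tokens = tokenize(text)
    scores = {
        label: sum(token in feature_set for token in tokens)
        for label, feature_set in features.items()
    }
    return max(scores, key=scores.get)


# Toy training sentences, invented for illustration (not the paper's corpus).
language_docs = [
    ("indonesian", "pemerintah akan membahas kebijakan karena harga naik"),
    ("indonesian", "tim sepak bola itu bisa menang karena latihan"),
    ("malay", "kerajaan akan membincangkan dasar kerana harga naik"),
    ("malay", "pasukan bola sepak itu boleh menang kerana latihan"),
]
category_docs = [
    ("sports", "tim sepak bola itu bisa menang karena latihan"),
    ("sports", "pasukan bola sepak itu boleh menang kerana latihan"),
    ("economy", "pemerintah akan membahas kebijakan karena harga naik"),
    ("economy", "kerajaan akan membincangkan dasar kerana harga naik"),
]

# n = 60 is the value the abstract reports as optimal for the category
# classifier; the toy vocabulary is smaller, so every token is kept here.
N = 60
language_features = top_n_features(language_docs, N)
category_features = top_n_features(category_docs, N)


def classify_news(text):
    """Stage 1: identify the language. Stage 2: classify the category."""
    return classify(text, language_features), classify(text, category_features)
```

Calling `classify_news("pasukan bola itu boleh menang")` on this toy data returns a `(language, category)` pair. Restricting each label to its top-n tokens keeps the per-document scoring cost bounded as the corpus grows, which is the motivation the abstract gives for feature selection.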

Item Type: Article
Impact Factor: Cited by 4
ID Code: 25485
Deposited By: Ms Sharifah Fahimah Saiyed Yeop
Deposited On: 27 Aug 2021 13:02
Last Modified: 27 Aug 2021 13:02
