Improving Classification Accuracy of Scikit-learn Classifiers with Discrete Fuzzy Interval Values

Hishamuddin, M.N.F. and Hassan, M.F. and Tran, D.C. and Mokhtar, A.A. (2020) Improving Classification Accuracy of Scikit-learn Classifiers with Discrete Fuzzy Interval Values. In: UNSPECIFIED.

Full text not available from this repository.
Official URL: https://www.scopus.com/inward/record.uri?eid=2-s2....

Abstract

Understanding machine learning (ML) algorithms from scratch is time-consuming. Thus, many software and library packages such as Weka and Scikit-learn have been introduced to help researchers run simulations on a number of well-known classifiers. In ML, different classifiers perform differently, depending on factors such as the type of data used as input for the classification phase. It is therefore necessary to discretize continuous data for classifiers that perform better with discrete data. However, in data mining, depending solely on discretization is not enough, as real-world data can be large, imprecise and noisy. In addition, knowledge representation is necessary to help researchers better understand the data during the discretization process. Thus, the objective of this study is to observe the effect of fuzzy elements in the discretization phase on the classification accuracy of Scikit-learn classifiers. In this study, fuzzy logic is proposed to assist the existing discretization technique through fuzzy membership graphs, linguistic variables and discrete interval values. All classifiers in the Scikit-learn package were used during the classification phase with 10-fold cross-validation. The simulation results showed that using fuzzy logic to assist the discretization process slightly improved the classification accuracy of ensemble type classifiers such as Random Forest and Naive Bayes while slightly degrading the performance of other classifiers. © 2020 IEEE.
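
The full text is not available here, so the authors' exact procedure is unknown. The following is only a minimal sketch of the general idea the abstract describes: continuous features are mapped to discrete values through fuzzy membership functions for linguistic terms, and classifiers are then scored with 10-fold cross-validation. The Iris dataset, the three triangular membership functions ("low", "medium", "high"), the helper names triangular and fuzzy_discretize, and the choice of GaussianNB as the Naive Bayes variant are all illustrative assumptions, not details taken from the paper.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB


def triangular(x, a, b, c):
    """Triangular membership: 0 at a and c, rising to 1 at the peak b."""
    left = (x - a) / (b - a)
    right = (c - x) / (c - b)
    return np.clip(np.minimum(left, right), 0.0, 1.0)


def fuzzy_discretize(column):
    """Map a continuous column to 0 ("low"), 1 ("medium") or 2 ("high").

    Hypothetical helper: each value gets the linguistic term with the
    highest membership degree. Assumes the column is not constant.
    """
    lo, hi = column.min(), column.max()
    mid = (lo + hi) / 2.0
    span = hi - lo
    memberships = np.stack([
        triangular(column, lo - span, lo, mid),  # "low" peaks at the minimum
        triangular(column, lo, mid, hi),         # "medium" peaks at the midpoint
        triangular(column, mid, hi, hi + span),  # "high" peaks at the maximum
    ])
    return memberships.argmax(axis=0)


# Assumed example data; the paper's datasets are not named in the abstract.
X, y = load_iris(return_X_y=True)
X_fuzzy = np.column_stack(
    [fuzzy_discretize(X[:, j]) for j in range(X.shape[1])]
)

classifiers = {
    "Random Forest": RandomForestClassifier(random_state=0),
    "Naive Bayes": GaussianNB(),
}

# Compare 10-fold cross-validated accuracy on raw and discretized features.
for name, clf in classifiers.items():
    raw = cross_val_score(clf, X, y, cv=10).mean()
    fuzzy = cross_val_score(clf, X_fuzzy, y, cv=10).mean()
    print(f"{name}: raw={raw:.3f}, fuzzy-discretized={fuzzy:.3f}")
```

Whether the discretized features help or hurt in this sketch depends on the dataset and on the membership functions chosen, so its output should not be read as a reproduction of the reported results. For simplicity the membership ranges are computed on the full dataset before cross-validation; a stricter experiment would fit them on each training fold only.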

Item Type: Conference or Workshop Item (UNSPECIFIED)
Impact Factor: cited By 1
Uncontrolled Keywords: Computer software; Data mining; Decision trees; Fuzzy logic; Intelligent computing; Knowledge representation; Machine learning; 10-fold cross-validation; Classification accuracy; Continuous data; Data discretization; Discrete intervals; Discretization process; Fuzzy membership; Linguistic variable; Classification (of information)
Depositing User: Ms Sharifah Fahimah Saiyed Yeop
Date Deposited: 25 Mar 2022 03:04
Last Modified: 25 Mar 2022 03:04
URI: http://scholars.utp.edu.my/id/eprint/29869
