



Conference Proceedings

The International Conference on Emerging Technologies in Electronics, Computing and Communication 2022

(ICETECC'22)

Data Dimension Reduction Makes ML Algorithms Efficient

Wisal Khan1; Muhammad Turab2; Waqas Ahmad1; Syed Hasnat Ahmad3; Kelash Kumar4; Bin Luo1*
1School of Computer and Technology, Anhui University, Hefei 230039, People's Republic of China
2Department of Computer Systems Engineering, Mehran University of Engineering and Technology, Jamshoro, Pakistan
3Northwestern Polytechnical University, China
4Department of Electrical Engineering, Mehran University of Engineering & Technology, Hyderabad, Pakistan


ABSTRACT
Data dimension reduction (DDR) maps data from a high-dimensional space to a low-dimensional one. Various DDR techniques are used for image dimension reduction, such as Random Projections, Principal Component Analysis (PCA), the Variance approach, LSA-Transform, the Combined and Direct approaches, and the New Random Approach, while auto-encoders (AE) are used to learn an end-to-end mapping. In this paper, we demonstrate that DDR pre-processing not only speeds up learning algorithms but also improves their accuracy in both supervised and unsupervised learning. We first apply PCA-based DDR as a pre-processing step for supervised learning, and then explore AE-based DDR for unsupervised learning. For PCA-based DDR, we compare the accuracy and training time of the supervised learning algorithms before and after applying PCA. Similarly, for AE-based DDR, we compare the accuracy and time of the unsupervised learning algorithm before and after AE representation learning. The supervised learning algorithms are support-vector machines (SVM), Decision Trees with the Gini index, Decision Trees with entropy, and the Stochastic Gradient Descent classifier (SGDC); the unsupervised learning algorithm is K-means clustering. Two datasets, MNIST and Fashion-MNIST, are used. Our experiments show a substantial improvement in accuracy and a reduction in training time after pre-processing in both supervised and unsupervised learning.
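To make the PCA-based pre-processing step concrete, the following is a minimal sketch of comparing a supervised classifier's accuracy and training time before and after PCA. It is illustrative only: scikit-learn's small load_digits set stands in for the MNIST/Fashion-MNIST data used in the paper, and the number of retained components (30) is an assumed value, not the one reported in the experiments.

```python
# Sketch: PCA-based DDR pre-processing before a supervised classifier (SVM).
# Assumptions: load_digits is a stand-in dataset; n_components=30 is illustrative.
import time

from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

def fit_and_score(Xtr, Xte):
    """Train an SVM and return test accuracy plus wall-clock training time."""
    clf = SVC()
    start = time.perf_counter()
    clf.fit(Xtr, y_train)
    elapsed = time.perf_counter() - start
    return accuracy_score(y_test, clf.predict(Xte)), elapsed

# Baseline: classify on the raw pixel features.
acc_raw, t_raw = fit_and_score(X_train, X_test)

# Pre-processing: project onto a lower-dimensional PCA subspace, then classify.
pca = PCA(n_components=30).fit(X_train)
acc_pca, t_pca = fit_and_score(pca.transform(X_train), pca.transform(X_test))

print(f"raw features: accuracy={acc_raw:.3f}, train time={t_raw:.3f}s")
print(f"PCA features: accuracy={acc_pca:.3f}, train time={t_pca:.3f}s")
```

The same before/after comparison pattern applies to the other classifiers (Decision Trees, SGDC) and, with an auto-encoder in place of PCA, to the K-means experiments described above.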


