Big Data Dimensionality Reduction Technique with Deep Autoencoder

Authors

  • O.P. Nweke, Lecturer, Department of Computer Science, Rivers State University, Port Harcourt, Nigeria
  • D. Matthias, Lecturer, Department of Computer Science, Rivers State University, Port Harcourt, Nigeria
  • E.O. Bennett, Lecturer, Department of Computer Science, Rivers State University, Port Harcourt, Nigeria

Keywords:

Big data analysis, high-dimensional data, Deep Autoencoder, dimensionality reduction

Abstract

Dimensionality reduction is the process of reducing a large set of candidate variables to a smaller, more manageable set. High-dimensional data hindered the system's effectiveness in producing correct results (measured by the number of false-positive, false-negative, true-positive, and true-negative outcomes). Reducing high-dimensional data makes it easier to solve classification problems, produce precise visualizations, and transmit large volumes of data. To address the problems of high dimensionality and big data, this paper proposes a reduction technique, the Deep Autoencoder Network, for reducing high-dimensional data to low-dimensional data. The proposed technique was applied to big data for dimensionality reduction. The Deep Autoencoder decreased the runtime on big data and improved accuracy, achieving 78% accuracy on test data. Its output was evaluated against principal component analysis (PCA) and singular value decomposition (SVD), comparing both runtime and accuracy. The results show that the Deep Autoencoder outperforms PCA and SVD, with an accuracy of 78% and a runtime of 10 seconds, compared with 14% accuracy and 46 seconds for PCA and 12% accuracy and 53 seconds for SVD.
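
The comparison described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the synthetic data, the layer sizes (20-16-3-16-20), the tanh activations, the learning rate, and all function names below are assumptions chosen for illustration, and only NumPy is used. The sketch trains a small deep autoencoder by full-batch gradient descent, takes the bottleneck activations as the reduced representation, and computes a PCA projection via SVD as a baseline.

```python
import numpy as np

def make_data(n=200, d=20, k=3, seed=0):
    # Synthetic (assumed) data: d observed features driven by k latent factors.
    rng = np.random.default_rng(seed)
    latent = rng.normal(size=(n, k))
    X = np.tanh(latent @ rng.normal(size=(k, d))) + 0.05 * rng.normal(size=(n, d))
    return (X - X.mean(axis=0)) / X.std(axis=0)  # standardize each feature

def init_layers(sizes, rng):
    # One (weights, bias) pair for each pair of consecutive layer sizes.
    return [(rng.normal(0.0, np.sqrt(1.0 / m), size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(layers, X):
    # Activations of every layer; tanh everywhere except the linear output layer.
    acts = [X]
    last = len(layers) - 1
    for i, (W, b) in enumerate(layers):
        z = acts[-1] @ W + b
        acts.append(z if i == last else np.tanh(z))
    return acts

def train_autoencoder(X, sizes, epochs=500, lr=0.2, seed=0):
    # Full-batch gradient descent on the mean squared reconstruction error.
    layers = init_layers(sizes, np.random.default_rng(seed))
    losses = []
    for _ in range(epochs):
        acts = forward(layers, X)
        err = acts[-1] - X
        losses.append(float(np.mean(err ** 2)))
        grad = 2.0 * err / err.size  # gradient of the loss w.r.t. the output
        for i in reversed(range(len(layers))):
            W, b = layers[i]
            if i != len(layers) - 1:
                grad = grad * (1.0 - acts[i + 1] ** 2)  # tanh derivative
            gW = acts[i].T @ grad
            gb = grad.sum(axis=0)
            grad = grad @ W.T  # propagate using the weights before the update
            layers[i] = (W - lr * gW, b - lr * gb)
    return layers, losses

def encode(layers, X):
    # The bottleneck is the middle activation of the symmetric network.
    return forward(layers, X)[len(layers) // 2]

def pca_reduce(X, k):
    # PCA computed through the SVD of the centered data matrix.
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    Z = Xc @ Vt[:k].T
    mse = float(np.mean((Xc - Z @ Vt[:k]) ** 2))
    return Z, mse

X = make_data()
layers, losses = train_autoencoder(X, sizes=[20, 16, 3, 16, 20])
Z_ae = encode(layers, X)             # 3-dimensional autoencoder codes
Z_pca, pca_mse = pca_reduce(X, k=3)  # 3-dimensional PCA scores
print(Z_ae.shape, Z_pca.shape)
print(f"autoencoder MSE: {losses[0]:.3f} -> {losses[-1]:.3f}, PCA MSE: {pca_mse:.3f}")
```

The sketch compares reconstruction error only. Reproducing the accuracy and runtime figures reported above would additionally require the paper's dataset and a downstream classifier, neither of which is specified here.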

Published

2023-08-14

Section

Review Article