International Conference on Emerging Technologies in Electronics, Computing and Communication (ICETECC) 2022




Conference Proceedings

SPEECH EMOTION RECOGNITION USING DEEP LEARNING HYBRID MODELS

Jamsher Bhanbhro1*; Shahnawaz Talpur1; Asif Aziz Memon2
1Department of Computer Systems Engineering, Mehran University of Engineering and Technology, Jamshoro, Pakistan
2Department of Computer Science, Dawood University of Engineering and Technology, Karachi, Pakistan


ABSTRACT
Over the past decade, Speech Emotion Recognition (SER) has become essential to Human-Computer Interaction (HCI) and other complex speech processing systems. SER remains a complex and challenging task because emotional expression varies between speakers. The performance of an SER system depends heavily on the features extracted from the speech signal, and developing efficient feature extraction and classification models is still difficult. This study proposes hybrid deep learning models that extract the crucial features accurately and produce predictions with higher probabilities. The speech samples are first preprocessed using data improvement and dataset balancing techniques. A combination of stacked Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM) networks is then trained on the temporal features of the Mel spectrogram, with the CNN used to obtain spatial features and perform sequence encoding conversion. The study uses the RAVDESS dataset, which contains 1440 audio samples in a North American English accent. The model achieves an accuracy above 93.9% on this dataset when classifying emotions into one of eight categories. The model is generalized using Additive White Gaussian Noise (AWGN) and dropout techniques.
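As an illustration of the AWGN step mentioned in the abstract, the following is a minimal sketch of adding white Gaussian noise to a waveform at a chosen signal-to-noise ratio. It is not the authors' implementation: the abstract gives no noise level, so the 20 dB target, the function names `add_awgn` and `measured_snr_db`, and the synthetic sine tone standing in for a speech clip are all assumptions made for this example. Plain Python lists are used to keep the sketch dependency-free; a real pipeline would more likely use an array library.

```python
import math
import random


def add_awgn(signal, snr_db, seed=None):
    """Return a noisy copy of `signal` with white Gaussian noise at `snr_db` dB.

    Hypothetical helper: the SNR-based scaling is an assumption, not a
    detail taken from the paper.
    """
    rng = random.Random(seed)
    # Average power of the clean signal.
    signal_power = sum(x * x for x in signal) / len(signal)
    # Noise power that yields the requested SNR: SNR_dB = 10*log10(Ps / Pn).
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    sigma = math.sqrt(noise_power)
    # Zero-mean Gaussian noise, drawn independently for every sample.
    return [x + rng.gauss(0.0, sigma) for x in signal]


def measured_snr_db(clean, noisy):
    """Estimate the SNR (dB) of `noisy` relative to `clean`."""
    signal_power = sum(x * x for x in clean) / len(clean)
    noise_power = sum((n - c) ** 2 for c, n in zip(clean, noisy)) / len(clean)
    return 10.0 * math.log10(signal_power / noise_power)


# Synthetic stand-in for a speech clip: 1 s of a 220 Hz tone at 16 kHz.
sample_rate = 16000
clean = [0.5 * math.sin(2 * math.pi * 220 * t / sample_rate)
         for t in range(sample_rate)]

noisy = add_awgn(clean, snr_db=20.0, seed=0)
print(round(measured_snr_db(clean, noisy), 1))
```

Applying such a function to each training clip, possibly at several noise levels, produces additional noisy copies of the data, which is one common way noise injection is used to improve generalization.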


