Speech Emotion Recognition using Convolutional Neural Networks

Authors

  • Rajat Mittal

DOI:

https://doi.org/10.37591/joaira.v8i3.137

Keywords:

Spectrograms, CNN, emotion, classification, preprocessing, deep learning

Abstract

This work presents a set of techniques for speech emotion recognition based on spectrograms and a handcrafted deep learning architecture, the Convolutional Neural Network (CNN). Detecting emotional state is a crucial part of human-machine interaction research. Speech emotion recognition has passed many milestones toward making interaction between humans and machines more natural, but its results still need to improve. To that end, this study presents a three-layer, two-dimensional Convolutional Neural Network for the challenging task of detecting emotion from spectrograms of audio (speech) signals. We train and evaluate our model on eight emotions: Neutral, Calm, Happy, Sad, Angry, Fearful, Disgust and Surprised. The mean of the model's outputs gives the classification of a speech sample. On average, the proposed model achieves a weighted accuracy of 72%.
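
The abstract does not say which outputs are averaged. As an illustrative sketch only, the snippet below assumes the network emits one eight-way score vector per spectrogram segment of an utterance, averages those vectors, and picks the emotion with the highest mean score. The function name `classify_by_mean` and the example scores are hypothetical and are not taken from the paper.

```python
from statistics import fmean

# The eight emotion classes named in the abstract.
EMOTIONS = ["Neutral", "Calm", "Happy", "Sad",
            "Angry", "Fearful", "Disgust", "Surprised"]


def classify_by_mean(outputs):
    """Average a list of per-segment score vectors and return
    (predicted_label, mean_scores).

    Assumption: each inner list holds one score per emotion, in the
    order given by EMOTIONS.
    """
    if not outputs:
        raise ValueError("need at least one output vector")
    if any(len(vec) != len(EMOTIONS) for vec in outputs):
        raise ValueError("each output must have one score per emotion")
    # zip(*outputs) groups the scores of each emotion across segments.
    mean_scores = [fmean(column) for column in zip(*outputs)]
    best = max(range(len(EMOTIONS)), key=mean_scores.__getitem__)
    return EMOTIONS[best], mean_scores


# Hypothetical scores for three segments of one utterance.
example_outputs = [
    [0.05, 0.05, 0.60, 0.05, 0.10, 0.05, 0.05, 0.05],
    [0.10, 0.10, 0.30, 0.05, 0.35, 0.05, 0.03, 0.02],
    [0.05, 0.05, 0.50, 0.10, 0.20, 0.05, 0.03, 0.02],
]
label, mean_scores = classify_by_mean(example_outputs)
print(label)
```

In this made-up example the second segment on its own favours Angry, but averaging over all three segments yields Happy, which is the kind of smoothing that averaging outputs can provide.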

Published

2022-01-28