Speech Emotion Recognition using Convolutional Neural Networks


  • Rajat Mittal




Spectrograms, CNN, emotion, classification, preprocessing, deep learning


This work presents a set of techniques for speech emotion recognition based on spectrograms and a hand-designed deep learning architecture, the convolutional neural network (CNN). Detecting emotional state is a crucial part of human-computer interaction research. Many milestones have been reached in speech emotion recognition toward making interaction between humans and machines more natural, but the results still leave room for improvement. To that end, this study presents a three-layer, two-dimensional convolutional neural network for the challenging task of detecting emotion from spectrograms of audio (speech) signals. We train and evaluate our model on eight emotions: Neutral, Calm, Happy, Sad, Angry, Fearful, Disgust, and Surprised. The mean value of the model's outputs gives the classification of the speech. Our proposed model achieves, on average, a weighted accuracy of 72%.
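As an illustration of the averaging step mentioned above, the following minimal Python sketch shows one way a mean over outputs could produce a single label. It rests on assumptions not stated in the abstract: that the network emits one score vector per spectrogram segment, that each vector holds eight scores in the order the emotions are listed, and that the final label is the class with the highest mean score. The function name `classify_by_mean` and the example scores are hypothetical and are not results from this work.

```python
# Emotion labels in the order given in the abstract.
EMOTIONS = ["Neutral", "Calm", "Happy", "Sad",
            "Angry", "Fearful", "Disgust", "Surprised"]


def classify_by_mean(segment_outputs):
    """Average per-segment score vectors and pick the top class.

    segment_outputs: list of score vectors, one per spectrogram segment
    (assumed layout; the abstract does not specify it).
    Returns (predicted_label, mean_scores).
    """
    if not segment_outputs:
        raise ValueError("need at least one output vector")
    n_classes = len(EMOTIONS)
    for scores in segment_outputs:
        if len(scores) != n_classes:
            raise ValueError("each output must have one score per emotion")

    n = len(segment_outputs)
    # Element-wise mean across all segments.
    mean_scores = [sum(s[k] for s in segment_outputs) / n
                   for k in range(n_classes)]
    # Index of the class with the highest mean score.
    best = max(range(n_classes), key=lambda k: mean_scores[k])
    return EMOTIONS[best], mean_scores


# Hypothetical outputs for three segments of one utterance.
example_outputs = [
    [0.05, 0.05, 0.60, 0.05, 0.10, 0.05, 0.05, 0.05],
    [0.10, 0.10, 0.30, 0.10, 0.25, 0.05, 0.05, 0.05],
    [0.05, 0.05, 0.50, 0.05, 0.20, 0.05, 0.05, 0.05],
]

label, mean_scores = classify_by_mean(example_outputs)
print(label)
```

Averaging before taking the maximum lets segments with confident scores outweigh ambiguous ones, which is one plausible reading of the abstract's description; a majority vote over per-segment labels would be a different design.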