Application of ABM to Spectral Features for Emotion Recognition

ER (Emotion Recognition) from speech signals has attracted considerable attention in recent years. Feature extraction and feature selection are the most important processing steps in ER from speech signals. The aim of the present study is to select the most relevant subset of spectral features. The proposed method applies an optimization algorithm to select among the features obtained from speech signals. First, MFCC (Mel-Frequency Cepstrum Coefficients) were extracted from EmoDB. Seven statistical values (maximum, minimum, mean, standard deviation, skewness, kurtosis and median) were then computed from the MFCC. Feature selection was performed in two stages. In the first stage, ABM (Agent-Based Modelling), which has rarely been applied in this area, was applied to the extracted features. In the second stage, the Opt-aiNET optimization algorithm was applied to choose the agent group giving the best classification success. The last step of the study is classification: an ANN (Artificial Neural Network) with 10-fold cross-validation was used for classification and evaluation. The application was restricted to a narrow scope of three emotions. Classification accuracy increased after applying the proposed method, which shows promising performance with spectral features.


INTRODUCTION
features of a speech, an application for determining differences among patients who are depressive and have suicidality also exists [5]. One of the feature vectors that has been used recently is LPC (Linear Predictive Coding). Linear predictive analysis is important for characterizing the spectral features of a speech signal in the time domain [6][7]. MFCC can be calculated without the need for LPC because, while linear predictive analysis models the vocal tract, MFCC features model the human ear [7]. As a result, the MFCC method produces quite successful results in ER applications [7][8].
Milton et al. [6] used pitch, duration, energy, MFCC, LPC and AR (Autoregressive) parameter features, which include gain and reflection coefficients, to recognize emotion from speech. A single classifier or a combination of classifiers was applied to recognize emotions from the input features. Seven emotions (Anger, Boredom, Disgust, Fear, Happiness, Sadness and Neutral) were considered.
Lee et al. [9] introduced a hierarchical computational structure to recognize emotions. The structure maps an input speech utterance into one of multiple emotion classes through subsequent layers of binary classifications. The classification framework was evaluated on two different emotional databases using acoustic features, the AIBO database and the USC IEMOCAP database.
Multi-agent systems have been widely used in machine learning systems. Montano et al. [10] used a multi-agent system that learns to identify an appropriate agent to answer free-text queries and keyword searches for defense contracting. Navarro et al. [11] simulated an expert multi-agent system that can compose harmony following specific rules. Taha et al. [12] developed a novel agent-based design for Arabic speech recognition, in which recognition is defined as a multi-agent system where each agent has a specific goal and deals with that goal only.
Since this area is young, there are many ways to approach and improve it. The surveyed studies predominantly use MFCC and LPC, and MFCC is observed to be the more successful of the two. One of the most important stages in emotion recognition is feature selection. Although there are many methods for this step, the desired success rate has not yet been achieved. The aim of this study is to develop a new feature selection method to increase the success rate.
In the application stage, data from the Berlin Emotion Database [13] were used for ER. First, MFCC were obtained from the data. Because the extracted features were too large to feed to classifiers and differed in dimension from one recording to another, new features were calculated by extracting statistical values from them. To determine the features that yield a high accuracy level, the opt-aiNET optimization algorithm was applied on ABM. ABM with opt-aiNET is applied in this study for the first time in the literature. The selected features were given to the classifier, for which an ANN was used.
Classification results with and without ABM were compared. Classification accuracy was observed to increase when ABM was used.

The remainder of this paper is structured as follows. Section 2 gives information on the methods used. Section 3 briefly describes the database. Section 4 describes the experiments that assess the performance of the proposed method. The last section gives the conclusion.

METHOD
ER has attracted considerable attention in recent years, and it can be performed from facial expressions as well as from speech signals. The ER process realized in this study follows the steps depicted in the schematic figure. Although ER directly from the raw signal seems theoretically possible, the data dimension is very large, so the dimension must be reduced by using features extracted from the data through different methods.
From this point of view, feature extraction and feature selection are clearly the most important steps in emotion detection.
Many methods exist for feature extraction in speech processing. The best known and most widely used are MFCC, LPC, the F0 value, the wavelet transform and AR parameters.

Mel-Frequency Cepstrum Coefficients
MFCC are the spectral features most widely used to obtain features from speech signals.
The input signal is divided into frames, where N is the frame length in samples and M is the shift between consecutive frames (M < N). The first frame consists of the first N samples; the next frame starts M samples after the first, so consecutive frames overlap by N - M samples [14].
Then, each frame is windowed using the Hamming window function given in Equation (1).
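The framing and windowing steps described above can be sketched as follows. This is a minimal illustration, not the implementation used in the study: the frame length of 400 samples and the shift of 160 samples are assumed values (the text does not state N and M), the function names are hypothetical, and the window uses the standard Hamming definition w(n) = 0.54 - 0.46 cos(2πn/(N-1)) for 0 ≤ n ≤ N-1.

```python
import numpy as np

def frame_signal(signal, frame_len, frame_shift):
    """Split a 1-D signal into frames of frame_len samples (N),
    each starting frame_shift samples (M) after the previous one."""
    if not 0 < frame_shift <= frame_len:
        raise ValueError("expected 0 < frame_shift <= frame_len")
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one frame")
    n_frames = 1 + (len(signal) - frame_len) // frame_shift
    starts = frame_shift * np.arange(n_frames)
    return np.stack([signal[s:s + frame_len] for s in starts])

def hamming(frame_len):
    """Standard Hamming window of frame_len points."""
    n = np.arange(frame_len)
    return 0.54 - 0.46 * np.cos(2.0 * np.pi * n / (frame_len - 1))

# Assumed values: N = 400 and M = 160 (25 ms and 10 ms at 16 kHz).
signal = np.arange(1000, dtype=float)   # stand-in for a speech signal
frames = frame_signal(signal, frame_len=400, frame_shift=160)
windowed = frames * hamming(400)        # window applied to every frame
```

With these values, consecutive frames share N - M = 240 samples, which is the overlap described in the text.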
The windowed signal is passed through a FIR filter of first

Opt-aiNET Algorithm
The aiNET algorithm is a discrete immune network algorithm developed for clustering. The Opt-aiNET algorithm is a further development of it, adapted to optimization problems [15]. In this study, feature selection was performed by applying the opt-aiNET algorithm.

Agent-Based Modelling
Although the term agent has no exact definition, an agent can be defined as an object having a target, actions and a state in a particular environment [16]. It can also be described as a computer system that can act autonomously in order to fulfil a particular aim.
An autonomous agent is defined as a system that receives a (virtual or real) perception of the environment in which it exists, builds situational awareness from it, uses this perception in accordance with its aim to determine its subsequent behavior, and carries out the determined action in the environment [17]. Analogously, in this study the extracted feature groups are considered as agents and the optimization process is considered as the environment.
The action value of an agent is determined according to classification success. If the classification success is maximal, the state is sent to the agent as "1", and the agent is used for ER with the action value "1". Otherwise, the state is "0", the action value of the agent is also "0", and its feature group is not used in classification.
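The state and action rule above can be illustrated with a small sketch. The class name FeatureAgent and its method are hypothetical and only mirror the description in the text: the environment (the optimization process) sends a state to each agent, and the agent sets its action value accordingly. The example states correspond to the agent group reported later in the paper (Agent3, Agent5, Agent6 and Agent7).

```python
from dataclasses import dataclass

@dataclass
class FeatureAgent:
    """Hypothetical agent that stands for one feature group."""
    name: str
    state: int = 0    # sent by the environment (optimization process)
    action: int = 0   # 1: the feature group is passed to the classifier

    def perceive(self, state):
        # The agent acts only when the environment reports state 1.
        self.state = state
        self.action = 1 if state == 1 else 0
        return self.action

# Seven agents, one per statistical feature group.
agents = [FeatureAgent(f"Agent{i}") for i in range(1, 8)]

# States for the agent group reported as selected in the study.
states = (0, 0, 1, 0, 1, 1, 1)
for agent, state in zip(agents, states):
    agent.perceive(state)

active = [agent.name for agent in agents if agent.action == 1]
```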

MATERIAL
Many databases are used in ER problems. They can be classified in different ways: by the number of emotions included, the language in which they were recorded, whether they are public or private, and whether they were voiced by professionals or actors [18]. A list of the most important databases by spoken language is also given in [19]. Databases voiced by professionals are generally private, so access to them is expensive.
EmoDB, the "Berlin Database of Emotional Speech" [13], is a public and free database and was therefore chosen for our study. Its sentences are ones frequently used in everyday life [13]. The database consists of 535 audio segments sampled at 16 kHz. In this study, 520 of the 535 segments were used.

PROPOSED METHOD AND RESULTS
This section explains in detail how feature selection is performed using agent-based modelling and the Opt-aiNET optimization algorithm.
The application was performed with a narrow scope of three emotions per group. The first group of emotions is BNS (Boredom, Neutral and Sadness) and the second group is HAF (Happiness, Anger and Fear).
For both groups, statistical values were first calculated from the MFCC extracted from the emotion data. In the second stage of the study, ABM was used to select the features that increase classification accuracy.
The Opt-aiNET algorithm was used as the optimization algorithm for feature selection. Finally, the selected features were used as inputs to the ANN and emotion classification was performed.
In this study, the ANN classifier of Weka [20] (an open-source, publicly available toolbox for automatic classification) was used. The MLP (Multilayer Perceptron) algorithm was preferred. 10-fold cross-validation was applied to demonstrate the reliability of the results.
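The evaluation protocol can be illustrated with a plain 10-fold split. This sketch is not Weka's implementation; the function name, the shuffling seed and the lack of stratification are assumptions made for illustration. The sample count of 520 is the number of segments taken in the study, although each three-emotion experiment uses only a subset of them.

```python
import numpy as np

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation.
    Every sample is used for testing exactly once."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)   # shuffle once
    folds = np.array_split(order, k)     # k nearly equal parts
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate(
            [folds[j] for j in range(k) if j != i])
        yield train_idx, test_idx

splits = list(k_fold_indices(520, k=10))
# A classifier would be trained on train_idx and scored on test_idx
# in each of the ten rounds, and the ten scores averaged.
```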
First, 16 Mel coefficients were obtained from each recording; their lengths differ from recording to recording. In the first step, the data were reduced by taking statistics of each coefficient. The extracted statistical values are listed in Table 1.
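The data reduction step can be sketched as follows, assuming each recording yields an MFCC matrix with one row per frame and 16 columns. The function name is hypothetical, the random matrices stand in for real MFCC data, and the exact estimators are assumptions: the text does not say whether population or sample formulas were used, so population standard deviation, moment-based skewness and non-excess kurtosis are used here.

```python
import numpy as np

STATISTICS = ("maximum", "minimum", "mean", "standard deviation",
              "skewness", "kurtosis", "median")

def utterance_statistics(mfcc):
    """Reduce an (n_frames, n_coeffs) MFCC matrix to a fixed-size
    (7, n_coeffs) matrix, one row per statistic in STATISTICS."""
    mfcc = np.asarray(mfcc, dtype=float)
    mean = mfcc.mean(axis=0)
    std = mfcc.std(axis=0)                   # population standard deviation
    centered = mfcc - mean
    safe_std = np.where(std > 0, std, 1.0)   # avoid division by zero
    skewness = (centered ** 3).mean(axis=0) / safe_std ** 3
    kurtosis = (centered ** 4).mean(axis=0) / safe_std ** 4
    return np.vstack([mfcc.max(axis=0), mfcc.min(axis=0), mean, std,
                      skewness, kurtosis, np.median(mfcc, axis=0)])

# Two stand-in recordings of different lengths give same-sized features.
rng = np.random.default_rng(0)
short_utt = utterance_statistics(rng.normal(size=(120, 16)))
long_utt = utterance_statistics(rng.normal(size=(310, 16)))
feature_vector = short_utt.ravel()           # 7 x 16 = 112 values
```

Whatever the number of frames, every recording is mapped to the same 7 x 16 layout, which is what allows recordings of different lengths to be given to one classifier.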
The first group of emotions examined consists of AHF (Anger, Happiness and Fear; written HAF above). The steps of the process for AHF are shown in Fig. 3 (see also Fig. 4). The action value was determined according to the result of the Opt-aiNET algorithm and was used as a state value in the classification.
In the optimization step, the population is created first.
Because each individual in the population represents the group of agents, its length equals the number of agents; in this study, the length of an individual is therefore seven. Individuals consist of 0s and 1s: if the value for an agent is 1, the feature group represented by that agent is included in the classification; if it is 0, it is not.

For example, let the individual be (0,1,0,1,0,0,1). This vector means that the 2nd, 4th and 7th agents, namely the minimum, standard deviation and median values of the MFCC, are used.
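The decoding of an individual can be sketched as below. The function name and the column layout are assumptions for illustration: the features are taken to be stored statistic by statistic, so that each of the seven agents owns a block of 16 consecutive columns (one per Mel coefficient).

```python
import numpy as np

STATISTICS = ("maximum", "minimum", "mean", "standard deviation",
              "skewness", "kurtosis", "median")
N_COEFFS = 16   # Mel coefficients per statistic

def decode(individual):
    """Map a 0/1 individual to the selected statistics and to the
    feature columns they occupy (assumed block layout)."""
    if len(individual) != len(STATISTICS):
        raise ValueError("individual must have one bit per agent")
    names = [name for bit, name in zip(individual, STATISTICS) if bit == 1]
    # Repeat each bit over its block of columns and keep the set ones.
    columns = np.flatnonzero(np.repeat(individual, N_COEFFS))
    return names, columns

names, columns = decode((0, 1, 0, 1, 0, 0, 1))
# names lists the minimum, standard deviation and median groups;
# columns indexes the 3 x 16 = 48 features passed to the classifier.
```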
Classification success is used as the fitness function. Since the goal is to choose the agent (or agents) that maximizes classification accuracy, the Opt-aiNET optimization algorithm was employed. The individual that achieved the highest classification success in the optimization was selected for use in the subsequent stages.
The agent group chosen as a result of the trials consists of the feature groups Agent3, Agent5, Agent6 and Agent7.
Together, these features were named Dataset2. The selected features are shown in Table 2.
The obtained results are shown in Table 3.
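The search over individuals can be sketched with a simplified immune-inspired loop (cloning, fitness-dependent mutation, suppression of duplicates and insertion of random newcomers). This is not the full Opt-aiNET algorithm of [15], and all names and parameter values are assumptions. In the study the fitness is the classification success of the ANN; here a toy fitness with an arbitrary target mask stands in for it so that the example is self-contained.

```python
import math
import random

def immune_search(fitness, n_bits=7, pop_size=6, n_clones=5,
                  n_iter=30, seed=0):
    """Simplified clone-mutate-suppress search over 0/1 individuals."""
    rng = random.Random(seed)

    def random_cell():
        return tuple(rng.randint(0, 1) for _ in range(n_bits))

    population = [random_cell() for _ in range(pop_size)]
    for _ in range(n_iter):
        scores = [fitness(cell) for cell in population]
        lo, hi = min(scores), max(scores)
        survivors = []
        for cell, score in zip(population, scores):
            norm = (score - lo) / (hi - lo) if hi > lo else 1.0
            p_flip = 0.5 * math.exp(-2.0 * norm)  # fitter cells mutate less
            clones = [tuple(bit ^ (rng.random() < p_flip) for bit in cell)
                      for _ in range(n_clones)]
            # The parent survives unless one of its clones is better.
            survivors.append(max([cell] + clones, key=fitness))
        # Suppression: identical cells are merged into one.
        population = sorted(set(survivors), key=fitness, reverse=True)
        # Newcomers restore the population size and add diversity.
        while len(population) < pop_size:
            population.append(random_cell())
    return max(population, key=fitness)

# Toy stand-in for classification accuracy (arbitrary target mask).
ARBITRARY_TARGET = (1, 0, 1, 1, 0, 0, 1)

def toy_fitness(individual):
    matches = sum(a == b for a, b in zip(individual, ARBITRARY_TARGET))
    return matches / len(ARBITRARY_TARGET)

best = immune_search(toy_fitness)
```

With only seven agents there are 2^7 = 128 possible individuals, so the result of such a search can be checked against an exhaustive evaluation when the fitness function is cheap enough.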