Why emotion recognition

With recent innovation, emotion recognition software has advanced considerably, most notably in its ability to track facial expressions for emotions such as happiness, sadness, surprise, and anger.

Emotion recognition also works in concert with other facial recognition technologies and biometric image identification. Together, these technologies can be applied in many kinds of security cases. For example, authorities can use emotion recognition software to support investigative efforts concerning someone during an interview or interrogation.

Emotion detection continues to advance alongside other innovations such as natural language processing. This progress is largely made possible by the combination of ever more powerful processors, the scientific development of complex algorithms, and other associated technologies.

Emotion recognition is already widely used by companies to gauge consumer mood towards their product or brand, but the opportunities brought by this technology go beyond market research and digital advertising.

Automotive industry and emotion recognition

The automotive industry is also applying emotion recognition technology, as car manufacturers around the world increasingly focus on making cars more personal and safer to drive.

The latter is a particularly interesting area, and one that various companies have already taken steps toward testing and researching. In their pursuit of smarter car features, it makes sense for car manufacturers to use AI to help them understand human emotions. Using facial emotion detection, a smart car can alert the driver when he or she is feeling drowsy.

Emotion recognition in video game testing

Video games are designed with a specific target audience in mind and aim to evoke a particular behavior and set of emotions from the users. During the testing phase, users are asked to play the game for a given period, and their feedback is incorporated into the final product.

Using facial emotion recognition can aid in understanding which emotions a user is experiencing in real time while playing, without manually analyzing the complete video. Detecting emotions with technology is a challenging task, yet one where machine learning algorithms have shown great promise.

By using facial emotion recognition, businesses can process images and videos in real time for monitoring video feeds or automating video analytics, thus saving costs and making life better for their users. Our face analysis algorithm can identify seven emotional states in real time: happiness, sadness, disgust, surprise, anger, fear, and neutral. The goal is lightweight AI edge software solutions that bridge the gap between the online and real worlds.

Usually, emotions are not observed in isolation: one sees the other embedded in an environment that contributes additional contextual information [12]. Thus, the environment provides a frame for the perceptual recognition or the inference-based interpretation of emotional responses: when a person observes another in a situation that the observer regards as dangerous, he or she will likely attribute fear to the observed subject.

Some pragmatic situations partly derive their meaning from social conventions. There is evidence in the literature indicating that social contexts play an important role in emotion recognition. For example, Carroll and Russell [14] show that a semantic background story of a restaurant invitation can modify emotion recognition even when dealing with a typical Ekman face of a basic emotion like fear, anger, sadness, or disgust.

The restaurant story read by the participants clearly indicates that the relevant person is angry due to unfair treatment. After reading the story, participants immediately see a typical Ekman face of fear; due to the influence of the story, the evaluation is significantly shifted from fear to anger [for a discussion see 15]. Thus, we aim to further investigate the role of contextual information in emotion recognition.

To best mirror everyday situations, we will not use semantic stories but situate a facial expression within a natural visual scene. Before presenting our framework of investigation, we would like to situate the role of contextual information in our theory of the nature of emotion as a pattern of characteristic features.

We previously described the typical constitutive features of fear. Contextual factors were not mentioned, and we think that they should be seen as mere pragmatic features modulating an affective state, not as being constitutive. For example, it is possible to experience test anxiety just by thinking about an upcoming exam even if the context is a relaxed bar situation with some friends.

An analysis of this situation illustrates that test anxiety is an affective state for which the cognitive evaluation is constitutive (e.g., "The exam is very important for my future"). For being in such an affective state and having such an emotion, the context is important, but not constitutive. This is radically different for emotion recognition: to activate a typical pattern of an emotion, the typical context is essential for the observer.

Contextual factors include, most prominently, (a) the pragmatic context in which the emotion occurs and (b) knowledge of the person having the emotion. As argued, these factors do not belong to the aspects of emotion individuation: fear is fear, no matter whether an anxious person or a courageous person is afraid.

It remains fear no matter whether it occurs in an obviously dangerous setting or not. However, fear in another person is much easier to recognize in a setting that signals danger to the observer than in a seemingly safe setting. The fact that people make use of such contextual information does not mean that the pragmatic situation or aspects of the bearer's history or personality are part of the emotion, but these aspects are essential for emotion recognition (see also Wallbott et al.). Our aim is to investigate the role of pragmatic context information for the pattern recognition of emotions.

To do this, we followed in the footsteps of the literature reporting effects of visual contextual information on the recognition of facial expressions in healthy and psychiatric cohorts [16]. Recognizing stereotypical faces is a relatively easy task for which only simple pattern recognition skills are required. In everyday situations, however, contextual information is needed to activate the correct pattern of the emotion that the observed person has. How do we make use of contextual information? We have designed a study providing a more natural facial recognition task than some of the classic tasks.

Our material comprises photos depicting real-life situations like festivals, sport events, and vacations. The faces of the protagonists involved were prepared to present ambiguous expressions, such that without contextual information the emotions are rated differently by several raters as showing happy, disgusted, fearful, angry, surprised, or sad expressions.

Moreover, the faces are then presented embedded in different backgrounds so that the influence of context can be examined. Contextual information seems to elicit a widely shared impression that a person in this context has one specific emotion, such as feeling sad, happy, neutral, angry, or fearful.

Only when the context explicitly does not fit the emotions displayed in the facial expression is contextual information ignored. In contrast, when predicting negative emotions, contextual information gains in importance relative to personal information. Moreover, current mood seems to influence one's judgment of another's emotional state.

One part of our hypothesis is that ambiguous faces which are misinterpreted when shown in isolation become intelligible when viewed in the context of a given situation [18, 19; see also 20], which indicates an important role of context in emotion recognition.

In our study, we follow a more natural design by showing ambiguous facial expressions, which are rated differently by several individuals who only perceive the facial expressions in isolation.

In addition, each facial expression is put into different contexts so that the context influences the perception of the face and may bias our participants to perceive a certain emotion. By using this paradigm, we can investigate the participants' strategy of how to evaluate facial expressions on a perceptual basis.

This part of the study probes how much information about an emotional pattern the observer needs to feel certain about the evaluation of the other's emotions. A second hypothesis to be tested is the influence of the participants' own emotional situation: we focus on the observers' mood and their own prior experiences. Both features are relevant for emotion recognition in addition to contextual information. The participants' own emotional situation and their own experiences with life events eliciting strong emotions are reported.

For instance, a depressive person needs happier facial expressions to label them correctly [21]. When considering the role of one's own internal states, we must account for the effect of the mirror neuron system (MNS) in understanding others.

Iacoboni [24] describes the MNS as a pre-reflective, automatic mechanism of mirroring, which consists of the activation of the same neurons when performing an action and when observing it.

For example, this mirroring role of specific neurons has been shown for disgust, such that the same neurons activate when experiencing disgust and when perceiving it in the face of another person. Because mirror neuron activation is automatic and does not normally induce a conscious experience of the other's emotion but only an unconscious representation, the role of the MNS for emotion recognition is intensely debated [26]. The standard interpretation of the MNS-camp concerning social cognition is that the MNS produces the same emotional state in the observer as in the observed person, and that this state, despite being unconscious, is the central basis for projecting this internal state onto the observed person by attributing the correct emotion.

This process is described as low-level simulation by Goldman [28]. Low-level simulation is a subpersonal process initiating the projection of one's own registered state onto the other. This view is supported by the observation that muscle activity is related to activation of the MNS in the brain when seeing emotional faces [29; for further evidence see 28]. On the other hand, the activation of the MNS remains unconscious, and often the observer's conscious emotional situation is different from the other's.

This may happen, for example, when the observer is emotionally involved in a different project. Further criticism is developed in Gallagher [30] and Newen and Schlicht. For the purpose of this article, we accept that the MNS contributes to emotion recognition, but it is not sufficient to explain the result of the perception or evaluation, because the perceptual process of seeing an emotion needs more determinants and can allow even rich contents to interfere. Our main focus in discussing the MNS is the question of whether it plays a crucial role in determining the involvement of one's own emotional state in emotion recognition; here the evidence is mixed, and the role of the MNS is overstated by the MNS-camp, which claims that mirror neuron activation is one of the components of understanding others, including emotion recognition.

We may add further evidence pointing in different directions. On the one hand, there is the evidence reported by Adolphs et al.; on the other hand, there is evidence from investigating psychopaths showing that this group has a high sensitivity and reliability in recognizing fear in others while having a strongly flattened level of experiencing emotions, including fear. According to our view, these data can be reconciled by accounting for several independent and parallel mechanisms of emotion recognition [this multiplicity view is developed in Newen]. Furthermore, the result of a variety of mechanisms may be the two different types of understanding which we highlighted at the beginning of the article: empathic understanding and cognitive understanding.

Evaluating the other's mental state is often more reliable if it is not combined with equating it with one's own emotional state, the idea of decoupling. The better people are at keeping their own feelings (current feelings as well as history) apart from the other's, the better they are at recognizing other people's mental states. Empathic understanding of the other's emotion remains easier in cases when one has the same emotion.

Cognitive understanding of the other's emotion is easier if one does not undergo the same emotion but is in a neutral emotional state.

With questionnaires, we will assess the actual mood of our participants at the moment of testing, which could have an influence on assessing the emotions of others. The combination of the perceptual task with ambiguous faces in different contexts and the questionnaires assessing the participants' own mental state will contribute to the research question of whether we use knowledge about the situation or about the facial expression to ascribe emotions to others.

It is conceivable that different subjects use different strategies. However, we want to investigate the role of contextual information and one's own mood for emotion recognition, since this has not been tested so far in a combined design, while we focus on pattern recognition as the main mechanism for recognizing the emotions of others (see mechanism 3 above).

We searched the Internet and private databases to find pictures of faces with ambiguous emotional expressions. After pre-testing, we chose nine pictures with a high standard deviation, indicating no clear emotional state, for further study preparation. The face was cut out of the original context and pasted over a pre-rated emotional background (e.g., a festival or sports scene). Our participants (psychologically healthy individuals and people with psychiatric disorders) will view these pictures on a computer screen and be asked to label the facial expression as happy, sad, fearful, angry, surprised, or disgusted.

In this study, participants will be asked to feel with the person, without any time limitation, but with the instruction of giving short and intuitive answers, and then to choose the label which comes closest to their impression of the emotion of the other. Afterwards, the participants will be asked how certain they feel about their answer.

In the following questionnaire, the face will be presented in the original context and after rating the facial expression again, the participant will be asked what served as an indicator of the emotion: the facial expression, the eyes, or the context. By using this experimental design, we can compare whether and when subjects changed their minds about the facial expression of the person in the picture.

The participants can either change their opinion with different contextual information or keep their initial opinion. The result will enable us to clarify the role of contextual information in emotion recognition. Our working hypothesis is that contextual information is strongly integrated into the process of emotion recognition in everyday situations. If this turns out to be true, then one explanation is that perceptual processes of emotion recognition can be cognitively penetrated by contextual information, such that the observable input features (face, body posture, behavior, etc.) are interpreted in the light of that context.

Figure 1. It is likely that she is disgusted by and laughing about how her child eats. (Figure copyright Dr. Heinisch; private photos with permission and written consent for publication by the imaged person.)

Figure 2. The same cutout facial expression put into different contexts.

In the second path, an efficient channel attention network based on depthwise separable convolution is proposed to improve the linear bottleneck structure, reduce network complexity, and prevent overfitting.

By designing an efficient attention module, the depth of the feature map is combined with spatial information, focusing more on the extraction of important features and improving the accuracy of emotion recognition. Finally, the feature classification module classifies the fused features through a Softmax layer. The research in this paper is carried out on a PC platform running Ubuntu. The experiments are based on the PyTorch deep learning framework, and the programming language is Python 3.
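To make the second path concrete, here is a minimal, hypothetical PyTorch sketch of a depthwise separable convolution block followed by ECA-style efficient channel attention; the layer sizes, kernel size, and module names are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn


class DepthwiseSeparableConv(nn.Module):
    """Depthwise 3x3 convolution followed by a pointwise 1x1 convolution."""

    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, 3, stride, 1,
                                   groups=in_ch, bias=False)
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU6(inplace=True)

    def forward(self, x):
        return self.act(self.bn(self.pointwise(self.depthwise(x))))


class EfficientChannelAttention(nn.Module):
    """ECA-style attention: channel weights from a 1-D convolution over
    globally average-pooled channel descriptors, with no dimensionality
    reduction."""

    def __init__(self, k_size=3):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.conv = nn.Conv1d(1, 1, k_size, padding=k_size // 2, bias=False)

    def forward(self, x):                               # x: (B, C, H, W)
        w = self.pool(x)                                # (B, C, 1, 1)
        w = self.conv(w.squeeze(-1).transpose(1, 2))    # 1-D conv across C
        w = torch.sigmoid(w.transpose(1, 2).unsqueeze(-1))
        return x * w                                    # re-weight channels


# Example: one attention-augmented block of the second path.
block = nn.Sequential(DepthwiseSeparableConv(32, 64), EfficientChannelAttention())
print(block(torch.randn(1, 32, 48, 48)).shape)          # torch.Size([1, 64, 48, 48])
```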

In order to ensure the fairness of the comparison between the improved network and the baseline networks, the training parameters used in the experiments are exactly the same. All model training strategies adopt a learning-rate decay schedule; the initial learning rate is 0.…, one complete pass over all images in the training data set counts as an epoch, and a fixed number of epochs is set in the experiment.

In order to optimize the network faster, I use the Adam optimization algorithm. The data set used in this article is FER.
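Since the exact initial learning rate, decay schedule, and epoch count are truncated in the source, the following sketch only illustrates the described setup, Adam plus step learning-rate decay on dummy FER-sized data; every numeric value here is a placeholder assumption.

```python
import torch
from torch import nn, optim
from torch.optim.lr_scheduler import StepLR
from torch.utils.data import DataLoader, TensorDataset

# Dummy stand-ins for the FER data and the recognition network.
train_loader = DataLoader(
    TensorDataset(torch.randn(256, 1, 48, 48),           # 48x48 grayscale faces
                  torch.randint(0, 7, (256,))),           # 7 expression classes
    batch_size=32, shuffle=True)
model = nn.Sequential(nn.Flatten(), nn.Linear(48 * 48, 7))

criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=1e-3)       # placeholder initial LR
scheduler = StepLR(optimizer, step_size=20, gamma=0.1)    # placeholder decay

for epoch in range(60):                                   # placeholder epoch count
    for images, labels in train_loader:                   # one pass = one epoch
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
    scheduler.step()                                      # decay the learning rate
```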

Due to the small amount of data in the original facial expression data set, which is far from enough for data-driven deep learning, data augmentation is a very important operation. In the network training stage, in order to prevent the network from overfitting, I first apply a series of random transformations, including flipping, rotation, and cropping. I feed the augmented pictures into the network for recognition, average the results, and take the output class with the highest score as the corresponding expression. This method expands the size of the data set, makes the trained network model more generalizable and robust, and further improves the accuracy of recognition.
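A sketch of what that could look like with torchvision; the concrete transforms, crop size, and number of averaged views are assumptions, and `model` is any classifier accepting the cropped tensor.

```python
import torch
from torchvision import transforms

# Random transformations of the kind described: flipping, rotation, cropping.
augment = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(10),
    transforms.RandomCrop(44),        # crop 48x48 FER faces to 44x44
    transforms.ToTensor(),
])

def predict_with_averaging(model, pil_face, n_views=10):
    """Feed several augmented views of one face into the network, average
    the softmax scores, and return the class with the highest mean score."""
    model.eval()
    with torch.no_grad():
        views = torch.stack([augment(pil_face) for _ in range(n_views)])
        mean_scores = torch.softmax(model(views), dim=1).mean(dim=0)
    return int(mean_scores.argmax())
```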

The overall accuracy rate is used as the evaluation index of this study; it is the proportion of correctly classified samples, Accuracy = N_correct / N_total. In order to verify the reliability of the overall algorithm, comparative experiments were carried out on the FER data set against current state-of-the-art expression recognition networks to evaluate the performance of the proposed algorithm.

The experimental results are shown in Table 1, which compares the recognition rates of different methods. This article uses a lightweight network structure: compared with MobileNetV3 and Inception, which are also lightweight networks, the accuracy on the FER data set is improved with fewer model parameters.

The accuracy increased by 3.…. In order to avoid misjudging model performance when only the overall recognition rate is used as the evaluation index, we conduct detailed experiments on the recognition results of each type of expression through the confusion matrix, also called the error matrix.

Each row represents the predicted expression label, and each column represents the actual expression label. Using the confusion matrix, the recognition of each type of data can be clearly observed, and from the per-class recognition accuracy we can analyze the performance of the network model in more detail.
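As a small illustration of how such a matrix can be computed (following the paper's row-predicted/column-actual convention, with entries normalized per actual class; the class count and sample labels are made up):

```python
import numpy as np

def confusion_matrix(pred, actual, n_classes=7):
    """cm[i, j] = fraction of samples with actual class j predicted as
    class i; the diagonal therefore holds per-class recognition accuracy."""
    cm = np.zeros((n_classes, n_classes))
    for p, a in zip(pred, actual):
        cm[p, a] += 1
    return cm / np.maximum(cm.sum(axis=0, keepdims=True), 1)

# Tiny 3-class example: class 2 is recognized 3 times out of 4 (0.75).
pred   = [0, 1, 1, 2, 2, 2]
actual = [0, 1, 2, 2, 2, 2]
print(confusion_matrix(pred, actual, n_classes=3).round(2))
```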

Table 2 is the confusion matrix of the efficient channel attention model proposed in this paper for the recognition results on the FER test set. The bold data on the diagonal represent the recognition accuracy of each correctly classified expression class, the remaining entries are the proportions of misclassified expressions, and the last line is the average recognition accuracy over all expressions. For example, the neutral expression recognition accuracy in the lower right corner of the diagonal is 0.…. It can be seen that the recognition accuracy of happy and surprised expressions is high, with accuracy rates of 0.….

Finally, the average recognition rate of the model on the FER test set reached 0.…. To verify the influence of the feature fusion strategy on the performance of the proposed algorithm, an ablation experiment is set up, where add represents the feature addition strategy, mul represents the feature multiplication strategy, and C represents the feature concatenation strategy. The results of the ablation experiment are shown in Table 3: the feature concatenation strategy achieved the best results, and the feature addition strategy outperformed the feature multiplication strategy, which confirms the effectiveness of adopting the concatenation strategy in the proposed algorithm.
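A minimal sketch of the three fusion strategies compared in this ablation; the branch feature dimensions are hypothetical:

```python
import torch

def fuse(f1, f2, mode="concat"):
    """Fuse the two branch feature vectors by addition, multiplication,
    or concatenation; the ablation in Table 3 favors concatenation."""
    if mode == "add":
        return f1 + f2                    # element-wise addition
    if mode == "mul":
        return f1 * f2                    # element-wise multiplication
    return torch.cat([f1, f2], dim=1)     # channel-wise concatenation

# Hypothetical 128-d features from the Gabor branch and the attention branch:
gabor_feat = torch.randn(4, 128)
deep_feat = torch.randn(4, 128)
print(fuse(gabor_feat, deep_feat).shape)  # torch.Size([4, 256]) -> Softmax head
```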

In addition, the feature addition strategy is better than the feature multiplication strategy. Therefore, this proves that the proposed algorithm is effective in adopting the feature concat strategy. To verify the influence of the channel attention mechanism on the performance of the proposed algorithm, an ablation experiment is set up in this section. CA stands for the channel attention mechanism and SA stands for the spatial attention mechanism.

The results of this ablation experiment are shown in Table 4: the channel attention mechanism achieves better performance, which demonstrates the advantage of CA in facial emotion recognition.

In this paper, I propose a novel feature-fusion dual-channel expression recognition algorithm based on machine learning theory and the philosophy of emotion. The active facial expression region is first segmented from the original face image, and the features of this region are extracted using the Gabor transform, which concentrates on the detailed description of the local region in order to make full use of the detail features of the active facial expression region.
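As an illustration of this first path, here is a hypothetical OpenCV sketch that filters a segmented facial region with a small Gabor bank; the kernel parameters and the mean/std feature summary are assumptions, not the paper's exact settings.

```python
import cv2
import numpy as np

def gabor_features(region, thetas=(0, np.pi / 4, np.pi / 2, 3 * np.pi / 4)):
    """Filter a grayscale facial region with Gabor kernels at several
    orientations and summarize each response by its mean and std."""
    feats = []
    for theta in thetas:
        kernel = cv2.getGaborKernel(ksize=(9, 9), sigma=2.0, theta=theta,
                                    lambd=8.0, gamma=0.5, psi=0)
        resp = cv2.filter2D(region.astype(np.float32), cv2.CV_32F, kernel)
        feats.extend([resp.mean(), resp.std()])
    return np.array(feats)

# Hypothetical 48x48 active region (e.g., mouth area) cut from a face image:
region = np.random.randint(0, 256, (48, 48), dtype=np.uint8)
print(gabor_features(region).shape)       # (8,) = 4 orientations x 2 statistics
```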

To improve the linear bottleneck structure, reduce network complexity, and avoid overfitting, a channel attention network based on depthwise separable convolution is proposed in the second path. By designing an efficient attention module, the depth of the feature map is combined with spatial information, focusing more on the extraction of important features and improving the accuracy of emotion recognition.

On the FER data set, competitive performance was achieved. In future work, we will investigate the feasibility of real-time face recognition and will use Internet of Things technology to collect faces in real time for emotion recognition. Publicly available datasets were analyzed in this study. ZS was responsible for designing the framework of the entire manuscript, from topic selection to solution to experimental verification.

The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest. All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers.

Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
