Dissertation Abstract



No 129637
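Author (kanji) 蘇,鷺梅
Author (romanized)
Author (kana) スー,ルーメイ
Title (Japanese) 微細な特徴の解析による顔表情の早期認識に関する研究
Title (English) Early Facial Expression Recognition with Subtle Feature Analysis
Report number 129637
Report number 甲29637
Date conferred 2013.03.25
Degree category Doctorate by coursework
Degree Doctorate (Interdisciplinary Information Studies)
Diploma number 博学情第59号
Graduate school Graduate School of Interdisciplinary Information Studies
Department Interdisciplinary Information Studies
Examination committee Chief examiner: The University of Tokyo, Professor 佐藤,洋一
 The University of Tokyo, Professor 池内,克史
 The University of Tokyo, Professor 相澤,清晴
 The University of Tokyo, Project Associate Professor 森,武俊
 The University of Tokyo, Associate Professor 苗村,健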
Abstract

The early facial expression recognition task is to recognize facial expressions as early as possible. It differs significantly from the conventional facial expression recognition task, which aims at recognizing fully displayed facial expressions at the apex. Conventional methods often perform poorly on subtle facial expressions, so early facial expression recognition can make important contributions to techniques for natural human-computer interfaces and affective computing.

Three issues largely determine the performance of early facial expression recognition. First, facial deformation is subtle in the early stage of displaying an expression, which makes it difficult to extract facial features from 2D images or dense 3D data. Second, because of their low intensity, subtle facial features are very sensitive to noise, and the noise unavoidably affects the early recognition results. Third, it is difficult to develop an early facial expression classifier that achieves the conflicting goals of higher classification accuracy and earlier recognition at the same time.

This thesis develops techniques to resolve these three issues. First, a feature magnification method is proposed to extract discriminative facial features from subtle facial deformation. Second, to reduce the influence of noise on recognition performance, two feature refinement methods are developed by analyzing the characteristics of the noise present in subtle facial features. Third, an early classifier suited to the early facial expression classification problem is constructed by making maximal use of the expression information in subtle facial features.

The first part describes an expression category- and intensity-dependent feature magnification method for extracting subtle facial features. Because the facial deformations are so subtle, it is extremely difficult to extract features that discriminate between facial expression categories. By and large, existing facial feature extraction methods, ranging from simple feature descriptors such as pixel intensity differences and facial geometry variations to facial motion models such as optical flow and the active appearance model, extract obvious facial deformations. Because the motions in facial expressions are complex, an expression category- and intensity-dependent feature magnification approach is proposed that magnifies the captured subtle expressions into the corresponding exaggerated expressions. By examining pre-magnified subtle expressions, a magnification factor is prepared for each combination of expression category and intensity level. Based on these magnification factors, subtle facial expressions can be reliably transformed into much more discriminative exaggerated expressions, which significantly improves subtle expression recognition. Experiments confirmed that the expression category- and intensity-dependent feature magnification method outperforms methods without motion magnification and also works well on exaggerated facial expressions.
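
The magnification step can be illustrated with a minimal sketch. It assumes that features are flattened landmark coordinates, that magnification is a single scalar applied to the displacement from a neutral face, and that the category and intensity level used for the lookup have already been chosen by some earlier step. The table `FACTORS`, its values, and the function `magnify` are hypothetical and are not taken from the thesis.

```python
# Hypothetical magnification factors, indexed by (expression category, intensity level).
# The numbers are made up for illustration only.
FACTORS = {
    ("happiness", 1): 4.0,
    ("happiness", 2): 2.0,
    ("surprise", 1): 5.0,
    ("surprise", 2): 2.5,
}


def magnify(neutral, observed, category, level, factors=FACTORS):
    """Scale each coordinate's displacement from the neutral face by the selected factor."""
    k = factors[(category, level)]
    return [n + k * (o - n) for n, o in zip(neutral, observed)]


# A displacement of 0.1 in the first coordinate is stretched by the factor 4.0.
exaggerated = magnify([0.0, 1.0], [0.1, 1.0], "happiness", 1)
```

Lower intensity levels receive larger factors in this toy table, so that subtle and fuller displays of the same category are mapped to a comparable exaggerated range.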

The second part describes two subtle feature refinement approaches that reduce the influence of noise on subtle facial features. The low intensity of subtle facial expression features (deformations) makes them very sensitive to noise, and the noise can easily affect the early recognition result. Conventional facial expression recognition mainly focuses on obvious facial expressions and often ignores the influence of noise on feature classification. Existing feature refinement approaches, such as principal component analysis and filtering, cannot successfully reduce the influence of noise on subtle facial features, because they work only when the facial features account for most of the principal components in the spatial feature space or when the noise is concentrated at higher frequencies. In subtle facial expression recognition, the noise is likely to be nearly equal in intensity to the subtle facial features in the spatial domain and is likely to be distributed at lower frequencies as well. Therefore, two feature refinement methods were devised to enhance subtle facial features. The first is adaptive wavelet spectral subtraction, which refines subtle facial expression deformation spatio-temporally with an estimated noise model. In particular, a wavelet packet method is used to analyze the spatio-temporal characteristics of the noise in subtle features. To the best of our knowledge, this is the first effort to refine subtle facial features in the spatio-temporal domain. The estimated noise model is then used to adaptively reduce the noise not only at high frequencies but also at low frequencies.

The second refinement method is the LDA-based support vector machine, which combines the idea of linear discriminant analysis (LDA) with the support vector machine (SVM). The LDA-based SVM refines subtle features by compacting the noise and maximizing the class separability of the features, without requiring a noise model. This enlarges the margin of the SVM and consequently improves classification performance. The final goal of feature refinement is to improve classification performance, but feature refinement is generally performed independently, before feature classification, so its effect on classification cannot be evaluated directly. By integrating feature refinement with feature classification, the LDA-based SVM improves classification performance by directly reducing the influence of noise on classification. Experiments confirmed that these two feature refinement methods outperform other feature refinement methods by enhancing the discriminability of subtle facial expression features, and consequently yield correct recognition earlier.
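
The idea of subtracting an estimated noise level in both low- and high-frequency bands can be illustrated with a minimal sketch. It substitutes a one-level Haar transform of a 1D feature trajectory for the wavelet packet analysis described above, and takes the per-band noise levels as given rather than estimating them. All function names and parameters here are assumptions made for the example, not the thesis implementation.

```python
import math


def haar_forward(signal):
    """One-level Haar transform of an even-length sequence: (low band, high band)."""
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_inverse(approx, detail):
    """Invert haar_forward."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) / s)
        out.append((a - d) / s)
    return out


def subtract_noise(coeffs, noise_level):
    """Shrink each coefficient magnitude by noise_level, flooring at zero."""
    return [math.copysign(max(abs(c) - noise_level, 0.0), c) for c in coeffs]


def refine(signal, noise_approx, noise_detail):
    """Subtract a separate (assumed known) noise level in the low and high band."""
    approx, detail = haar_forward(signal)
    return haar_inverse(
        subtract_noise(approx, noise_approx),
        subtract_noise(detail, noise_detail),
    )


smoothed = refine([1.0, 1.2, 3.0, 2.8], noise_approx=0.0, noise_detail=0.5)
```

With `noise_detail=0.5`, the small frame-to-frame jitter in the example trajectory is removed while the larger change between its two halves is preserved. A nonzero `noise_approx` would additionally shrink the low-frequency band, which is the case that plain high-frequency filtering does not cover.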

In the third part, an early classifier based on early RankBoost is proposed for early facial expression classification. Previous early classifiers are based on structured frame classifiers, each of which is trained to classify similar frame features. Because subtle facial features resemble one another owing to the subtlety of the underlying facial changes, these frame classifiers are poor at finding discriminative features. To improve the performance of frame classifiers, temporal alignment or interpolation compensation is usually applied to expression sequences of different lengths and speeds so that similar frame features are warped together. However, such methods cannot be used in a real-time human-computer interface. An early classification method based on early RankBoost was therefore devised. It exploits the observation that, in most cases, facial expression intensity increases monotonically from neutral to apex. To find the most discriminative features of subtle facial expressions, frame rankers are introduced that learn the temporal variation of pairwise subtle facial expression features in accordance with their temporal order. The rank order of the features is given naturally by their frame order in the training sequences. A weight propagation strategy is then applied to boost the frame rankers into an early recognizer. The early RankBoost method can also learn and recognize facial expression sequences of different lengths and speeds without temporal alignment or interpolation compensation.
Experiments confirmed that the early classifier outperforms other early detection methods and gives promising results on both the Cohn-Kanade database and our own dataset, which was built using a high-speed motion capture system.
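
The use of frame order as a ranking signal can be sketched with one round of standard RankBoost on the frame pairs of a single sequence. The initial pair weighting in `early_weights`, which gives more weight to pairs that start early in the sequence, is an assumption standing in for the thesis's weight propagation strategy, and the threshold weak rankers in `stump` are likewise hypothetical.

```python
import math


def ordered_pairs(num_frames):
    """All (earlier, later) frame index pairs of one sequence."""
    return [(i, j) for i in range(num_frames) for j in range(i + 1, num_frames)]


def early_weights(pairs, decay=0.5):
    """Hypothetical initial distribution: pairs that start earlier weigh more."""
    raw = [decay ** i for i, _ in pairs]
    total = sum(raw)
    return [w / total for w in raw]


def stump(feature, threshold):
    """Weak ranker: 1.0 if the chosen feature exceeds the threshold, else 0.0."""
    return lambda frame: 1.0 if frame[feature] > threshold else 0.0


def boost_round(frames, pairs, weights, candidates):
    """One RankBoost round: pick the weak ranker with largest |r|, then reweight pairs."""
    best = None
    for h in candidates:
        # r measures how well h scores later frames above earlier ones.
        r = sum(w * (h(frames[j]) - h(frames[i])) for (i, j), w in zip(pairs, weights))
        if best is None or abs(r) > abs(best[1]):
            best = (h, r)
    h, r = best
    r = max(min(r, 1.0 - 1e-9), -1.0 + 1e-9)  # keep the logarithm finite
    alpha = 0.5 * math.log((1.0 + r) / (1.0 - r))
    # Correctly ordered pairs lose weight; the rest gain weight after normalization.
    new = [
        w * math.exp(alpha * (h(frames[i]) - h(frames[j])))
        for (i, j), w in zip(pairs, weights)
    ]
    z = sum(new)
    return h, alpha, [w / z for w in new]


# Toy sequence whose single feature grows from neutral toward apex.
frames = [[0.0], [0.1], [0.3], [0.6]]
pairs = ordered_pairs(len(frames))
weights = early_weights(pairs)
h, alpha, weights = boost_round(frames, pairs, weights, [stump(0, 0.05), stump(0, 0.45)])
```

Repeating `boost_round` and summing `alpha * h(frame)` over the rounds would give a score that tends to increase with frame order. Because the pairs come directly from frame indices, sequences of different lengths need no temporal alignment in this sketch.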

Summary of Examination

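This dissertation, entitled "Early Facial Expression Recognition with Subtle Feature Analysis" (微細な特徴の解析による顔表情の早期認識に関する研究), presents research on early facial expression recognition, which recognizes a facial expression at an earlier stage of its display by recognizing subtle expressions of low intensity. It consists of five chapters and is written in English.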

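Chapter 1, "Introduction", first describes the background and aims of the research. Specifically, it explains that whereas conventional facial expression recognition deals with expressions at or near their maximum intensity, this research deals with recognition at an early stage, immediately after the onset of an expression, by recognizing subtle expressions of low intensity. It then summarizes the practical significance of early facial expression recognition. On that basis, it organizes the main technical challenges of early facial expression recognition and outlines the four methods proposed to address them.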

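Chapter 2, "Subtle expression recognition based on feature magnification", proposes a method based on the idea of improving the accuracy of subtle expression recognition by magnifying the motion of facial feature points observed in subtle expressions. Because it is effective to magnify the feature-point motion in a way that makes the differences between expression categories clearer, the proposed method prepares magnification amounts in advance for each expression category and intensity level, based on the feature-point motion during expression display, and selects an appropriate magnification amount at recognition time to improve recognition accuracy. Experiments using a high-speed 3D motion capture system confirmed the effectiveness of the proposed method.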

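Chapter 3, "Subtle expression recognition based on feature refinement", addresses the problem that, when subtle expressions are to be recognized, recognition accuracy degrades because of facial motion unrelated to the expression and because of sensor noise. It takes two approaches to this problem. The first approach extends wavelet subtraction, a noise reduction technique developed in the field of speech signal processing, so that the threshold for each frequency component can be set adaptively, thereby effectively reducing expression-unrelated facial motion and observation noise. The second approach, based on ideas from statistical pattern recognition, combines linear discriminant analysis with support vector machine classification, making it possible to recognize facial expressions accurately while reducing the influence of expression-unrelated motion and noise.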

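Chapter 4, "Early facial expression classification with early RankBoost", focuses on the observation that expression intensity generally tends to increase monotonically while an expression is being displayed, and proposes a recognition method based on learning the ordinal relations of expression intensity with RankBoost. In particular, to achieve recognition at an earlier stage, it newly devises early RankBoost, an extension of RankBoost that assigns a higher misclassification cost to early input data, and verifies its effectiveness for early facial expression recognition through evaluation experiments on a public facial expression benchmark dataset and a 3D motion capture dataset.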

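Chapter 5, "Conclusions", briefly states the novelty and contributions of the methods and applications proposed in the dissertation and then discusses issues to be addressed in future work.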

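In summary, this dissertation addresses early facial expression recognition, the problem of recognizing subtle, low-intensity expressions at an early stage of their display, which conventional facial expression recognition has scarcely been able to handle. After organizing the associated technical challenges, it proposes four methods: magnification of subtle facial expression features; separation of subtle facial expression features from noise by adaptive wavelet subtraction; noise-robust subtle expression recognition through the integration of linear discriminant analysis and support vector machines; and early facial expression recognition with early RankBoost. Their effectiveness is verified experimentally, and the work makes a considerable contribution to interdisciplinary information studies.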

Accordingly, the examination committee judges that this dissertation merits the doctoral degree (Interdisciplinary Information Studies).
