Dissertation Abstract


No		126216
Author (kanji)		菅野,裕介
Author (Latin script)
Author (kana)		スガノ,ユウスケ
Title (Japanese)		注目領域獲得のための頭部姿勢および注視点推定
Title (English)		Head Pose and Gaze Estimation for Inferring Focus of Attention
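Report number		126216
Report number		甲26216
Date conferred		2010.03.24
Degree category		Course doctorate
Degree		Doctor of Philosophy (Information Science and Technology)
Diploma number		博情第283号
Graduate school		Graduate School of Information Science and Technology
Department		Department of Information and Communication Engineering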
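Examination committee		Chair: Associate Professor Takeshi Naemura, The University of Tokyo; Associate Professor Yoichi Sato, The University of Tokyo; Professor Katsushi Ikeuchi, The University of Tokyo; Professor Kiyoharu Aizawa, The University of Tokyo; Associate Professor Shunsuke Kamijo, The University of Tokyo; Associate Professor Toshihiko Yamasaki, The University of Tokyo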
Abstract		Head pose and gaze direction play significant roles in inferring human attention, and they also help us design more human-centered computer systems. In particular, camera-based remote sensing of head pose and gaze can lead to a wide range of applications. However, although many methods have been proposed, existing estimation techniques still have technical limitations: accurate estimation using only a monocular camera remains difficult, and existing methods often require a calibration procedure before estimation can begin. The goal of this thesis is to develop head pose and gaze estimation systems with minimal requirements: none of the proposed methods needs an active calibration stage or any equipment other than a camera. The first part describes a monocular method for tracking 3D head poses and facial actions. Using a multilinear face model that treats interpersonal and intrapersonal shape variations separately, real-time parameter estimation is performed by integrating two frameworks: particle filter-based tracking for time-dependent pose and facial action estimation, and incremental bundle adjustment for person-dependent shape estimation. This combination, together with the multilinear face model, enables real-time tracking of faces and facial actions without pre-learned individual face models. In the second part, an unconstrained gaze estimation method is presented that uses an online learning algorithm to let users move their heads freely in a casual desktop environment. The key assumption is that a user gazes at the cursor position when pressing a mouse button. The user's eye images and 3D head poses are continuously captured with the head pose estimation method described above. By using clicked positions as exemplars of gaze positions, the system collects learning samples for gaze estimation while the user operates the PC, without the user being aware of it.
The samples are adaptively clustered according to head pose, and the estimation parameters are incrementally updated. In this way, the method avoids a lengthy calibration stage before the gaze estimator can be used. One drawback of this method is that it cannot be applied to passive displays that involve no user interaction. To solve this problem, the last part presents another novel calibration-free gaze sensing framework based on visual saliency maps. The method uses visual saliency maps of video frames computed in a bottom-up manner. By relating the saliency maps to the appearance of the eyes of a person watching the video, the method automatically constructs a gaze estimator. To identify gaze points from saliency maps efficiently, the maps are aggregated to generate probability distributions of gaze points. The mapping between eye images and gaze points is then established by Gaussian process regression. The result is a gaze estimator that exempts users from active calibration and can be applied to any type of display device. Through these works, head pose and gaze estimation methods were made significantly more practical by reducing installation and setup costs. The proposed methods can be used with commonly available cameras, and their estimation procedures, which need no manual initialization, can be seamlessly integrated into everyday computer interaction. This opens the way for future investigation of attention-based application systems that enrich our daily lives with ubiquitous computing devices.
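The calibration-free pipeline of the last part (saliency maps aggregated into gaze probability distributions, then Gaussian process regression from eye appearance to gaze points) can be illustrated with a small, dependency-free sketch. It assumes that frames have already been grouped by similar eye appearance, that each eye image has been reduced to a short feature vector, and that a saliency map is a small grid of non-negative weights. The function and class names, the RBF kernel, the use of each map's expected value as the regression target, and the hyperparameters are all illustrative assumptions, not the thesis's implementation.

```python
import math


def aggregate_saliency(maps):
    """Average saliency maps (lists of rows) from frames assumed to share similar
    eye appearance, then normalise the sum into a gaze probability map."""
    h, w = len(maps[0]), len(maps[0][0])
    acc = [[0.0] * w for _ in range(h)]
    for m in maps:
        for r in range(h):
            for c in range(w):
                acc[r][c] += m[r][c]
    total = sum(sum(row) for row in acc)
    return [[v / total for v in row] for row in acc]


def expected_gaze(prob):
    """Mean (x, y) of a probability map, in grid-cell units (x = column, y = row)."""
    x = sum(p * c for row in prob for c, p in enumerate(row))
    y = sum(p * r for r, row in enumerate(prob) for p in row)
    return x, y


def rbf(a, b, length_scale):
    """Squared-exponential (RBF) kernel between two feature vectors."""
    d2 = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-0.5 * d2 / length_scale ** 2)


def cholesky(A):
    """Lower-triangular L with A = L L^T (A must be symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(A[i][i] - s) if i == j else (A[i][j] - s) / L[j][j]
    return L


def cho_solve(L, b):
    """Solve (L L^T) x = b by forward substitution, then back substitution."""
    n = len(L)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


class GazeGP:
    """GP regression (RBF kernel, constant mean) from eye features to a gaze point.
    Each output coordinate is regressed independently with a shared kernel."""

    def __init__(self, X, targets, length_scale=1.0, noise=1e-2):
        self.X, self.ls = X, length_scale
        dims = len(targets[0])
        self.mean = [sum(t[d] for t in targets) / len(targets) for d in range(dims)]
        K = [[rbf(a, b, length_scale) + (noise if i == j else 0.0)
              for j, b in enumerate(X)] for i, a in enumerate(X)]
        L = cholesky(K)
        # alpha_d = (K + noise * I)^-1 (y_d - mean_d) for each output dimension d
        self.alphas = [cho_solve(L, [t[d] - self.mean[d] for t in targets])
                       for d in range(dims)]

    def predict(self, x):
        """Posterior mean gaze point for a new eye feature vector x."""
        k = [rbf(x, xi, self.ls) for xi in self.X]
        return tuple(m + sum(ki * ai for ki, ai in zip(k, alpha))
                     for m, alpha in zip(self.mean, self.alphas))
```

Each group of frames would contribute one training pair, for example `GazeGP(group_features, [expected_gaze(aggregate_saliency(g)) for g in groups])`. Collapsing each probability map to its expected value is a simplification made here for brevity; a strongly multimodal saliency distribution would lose information under that reduction.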
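Examiners' summary		This dissertation, entitled "Head Pose and Gaze Estimation for Inferring Focus of Attention" (注目領域獲得のための頭部姿勢および注視点推定), presents a framework for estimating a person's focus of attention by proposing methods for the still-difficult problem of estimating a person's 3D head pose and gaze point using only a monocular camera. It consists of five chapters. Chapter 1, "Introduction", discusses the background and aims of the research, clarifies its position and advantages relative to related methods, and outlines the three methods proposed in the dissertation. Chapter 2, "Person-Independent Monocular Head Pose Estimation", first points out that existing methods have not achieved the accurate monocular 3D head pose estimation that is important for inferring the focus of attention. In response, the dissertation proposes a method that achieves accurate head pose estimation even under changing facial expressions, using a face shape model that describes deformations caused by differences between individuals separately from deformations caused by differences in expression. Specifically, the method combines prediction and tracking of time-dependent parameters with a motion model and person-specific fitting of the face shape model using information from multiple frames, so no per-person initialization is required to start tracking. Comparative experiments against estimation with a conventional shape model that does not separate these factors, as used in existing methods, show improved accuracy, particularly in the depth direction relative to the camera. Chapter 3, "Incremental Learning for Gaze Estimation", reviews the problems faced by the two approaches available for monocular gaze estimation: model-based and appearance-based methods. The dissertation proposes an appearance-based method, that is, an approach that learns the relationship between eye images and gaze coordinates by regression, and resolves two weaknesses of existing methods: high calibration cost and difficulty in handling head pose variation. The key is the assumption that, while operating a computer, a person looks at the point he or she clicks with the mouse. Using the technique of Chapter 2 to continuously track the person's head pose and eye images, the author realized a gaze estimator that calibrates and learns automatically from information extracted from the person's natural behavior. Furthermore, dynamic clustering of the training data according to head pose makes the appearance-based gaze estimator robust to head pose variation. The proposed method was implemented to capture actual computer operation events, and experiments in a real environment show that it achieves higher accuracy than existing monocular gaze estimation methods. Chapter 4, "Calibration-free Gaze Sensing using Saliency Maps", proposes, as a further generalization of the idea presented in Chapter 3, a new calibration-free gaze estimation framework grounded in the characteristics of human vision. Computational models of visual saliency, which predict which regions of an image people tend to look at, have long been studied, and this work is the first to use such a model to train a gaze estimator. Taking only pairs of video frames and images of the eyes watching them as input, the method treats the saliency map extracted from each frame as the probability distribution of the gaze coordinates corresponding to the eye image, and trains a regression-based gaze estimator. This makes it possible to estimate gaze points during video viewing without any explicit calibration data, and experiments with multiple video sources and subjects demonstrate the effectiveness of the method. Chapter 5, "Conclusions", summarizes the proposed methods along with their novelty and contributions, and discusses issues for future work. In summary, for the important problem of simple 3D head pose and gaze point estimation using only a monocular camera, this dissertation proposes a head pose estimation method based on a parameter-separated shape model, an incremental learning method for a gaze estimator that exploits natural user behavior, and a gaze estimation method based on visual saliency, and demonstrates their effectiveness in real environments, making a considerable contribution to information and communication engineering. Accordingly, this dissertation is judged to qualify for the degree of Doctor of Philosophy (Information Science and Technology).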
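The Chapter 3 idea summarized above, treating every mouse click as a free gaze label and grouping training data by head pose, can be sketched as a small online learner. This is a minimal illustration under stated assumptions: the names `ClickGazeLearner`, `pose_radius`, and `k` are hypothetical, head pose is taken to be a 3-vector of angles from some tracker, eye appearance is a pre-extracted feature vector, and the per-cluster estimator is a simple inverse-distance-weighted nearest-neighbour interpolation used here in place of the thesis's own appearance-based regression.

```python
import math


class ClickGazeLearner:
    """Hypothetical online gaze learner: each click is stored as a labelled sample
    (eye feature -> cursor position) inside a head-pose cluster."""

    def __init__(self, pose_radius=10.0, k=3):
        self.pose_radius = pose_radius  # max pose distance (assumed degrees) to join a cluster
        self.k = k                      # number of neighbours used for interpolation
        self.clusters = []              # each: {"centre": [...], "n": int, "samples": [...]}

    def _nearest_cluster(self, pose):
        best, best_d = None, float("inf")
        for c in self.clusters:
            d = math.dist(pose, c["centre"])
            if d < best_d:
                best, best_d = c, d
        return best, best_d

    def add_click(self, pose, eye_feature, cursor_xy):
        """Store one click sample; open a new cluster if no centre is close enough."""
        cluster, d = self._nearest_cluster(pose)
        if cluster is None or d > self.pose_radius:
            cluster = {"centre": list(pose), "n": 0, "samples": []}
            self.clusters.append(cluster)
        # Incrementally update the centre as the running mean of member poses.
        cluster["n"] += 1
        cluster["centre"] = [m + (p - m) / cluster["n"]
                             for m, p in zip(cluster["centre"], pose)]
        cluster["samples"].append((list(eye_feature), tuple(cursor_xy)))

    def estimate(self, pose, eye_feature):
        """Inverse-distance-weighted mean of the k nearest samples in the closest pose cluster."""
        cluster, _ = self._nearest_cluster(pose)
        if cluster is None:
            return None  # no clicks observed yet
        ranked = sorted(cluster["samples"],
                        key=lambda s: math.dist(eye_feature, s[0]))[: self.k]
        weights = [1.0 / (math.dist(eye_feature, f) + 1e-9) for f, _ in ranked]
        total = sum(weights)
        x = sum(w * xy[0] for w, (_, xy) in zip(weights, ranked)) / total
        y = sum(w * xy[1] for w, (_, xy) in zip(weights, ranked)) / total
        return x, y
```

In a running system, `add_click` would be called from a mouse-event hook with the current tracker output, and `estimate` once per frame. The `pose_radius` threshold controls how finely the training data is partitioned across head poses: a smaller radius gives more pose-specific clusters but fewer samples in each.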