Dissertation Abstract



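No. 126514
Author (kanji): オストルク,オゥグ
Author (Latin script): Ozturk,Ovgu
Author (kana): オストルク,オゥグ
Title (Japanese): 単一カメラ映像からの人の体と頭部の向きの推定と群衆の動き場の解析
Title (English): Human Body/Head Orientation Estimation and Crowd Motion Flow Analysis from a Single Camera
Report number: 126514
Report number: 甲26514
Date conferred: 2010.12.24
Degree category: Course doctorate
Degree type: Doctor of Science (博士(科学))
Diploma number: 博創域第640号
Graduate school: Graduate School of Frontier Sciences
Department: Department of Frontier Informatics
Dissertation committee: Chair: Professor 相澤,清晴, The University of Tokyo
 Professor 柴田,直, The University of Tokyo
 Professor 佐藤,洋一, The University of Tokyo
 Associate Professor 杉本,雅則, The University of Tokyo
 Associate Professor 山,俊彦, The University of Tokyo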
Abstract

Over the last few decades, automating the descriptive and statistical analysis of human behavior has become a significant research topic. With advances in video technology, many researchers have focused on detecting and analyzing human motion in camera footage. So far, work has concentrated on problems such as detecting humans in a scene, counting them, tracking their motion, and analyzing their trajectories. However, truly understanding human behavior and evaluating a scene requires more semantic analysis. As a next step, measuring people's visual focus of attention has therefore become a significant and challenging problem. Measuring focus of attention is useful in several ways. Finding the objects or places that people attend to can help us understand their intentions and monitor the security of an environment. Observing people who look at bulletin boards, or customers who walk around market stands, can provide information about recent trends and inform marketing strategies. Social interactions can be interpreted more meaningfully, human-computer interaction can be improved, and more intelligent autonomous systems can be built.

The visual focus of attention of a person is defined here as the direction in which they are heading or looking while they move. People show attention by walking toward a direction or by turning their head toward it. In a given scene, the orientation of a person's body and head therefore gives a hint about his or her visual focus of attention. When there are many people in the scene, as in extremely crowded scenes, it is not possible to analyze each person individually. In this case, the paths that people most often follow can tell us about their interests in the environment.

Much existing research addresses visual focus of attention estimation with multiple cameras, multiple sensors, or markers placed on the body. These approaches are often too impractical or expensive to deploy in ordinary public places. Our aim is to extract as much useful information as possible for human motion analysis in a public scene from a single camera. This is very challenging because of the articulation of the human pose and the limited data available. On the other hand, using only a single camera lets us build portable, low-cost systems with less complexity.

Our research focuses on two major problems. First, we developed a system that tracks people and estimates their body and head orientation. Second, we analyzed various crowd scenes and proposed a method for calculating the dominant motion flows that can handle very complex situations. At the beginning of our research, we studied human tracking and developed a real-time application that tracks multiple people simultaneously and detects their focus of attention by estimating their next steps. It was incorporated in a digital art project that was exhibited at Haneda Airport in Tokyo for one month.

First, we developed an interactive entertainment system that employs real-time multiple human tracking, to demonstrate possible applications and the importance of human tracking and motion understanding. The system was part of a digital public art project that presented technological advances through art in an airport. It tracks people and continuously visualizes their predicted future footsteps in front of them as they keep moving. A real-time multiple human tracking algorithm was developed and combined with a visualization process. The system can easily be installed in any indoor space. It does not disturb the natural flow of life, in the sense that it does not affect people's movements until they notice the displayed interactive foot shapes. Hundreds of passers-by visited the system over a period of one month. When they noticed the foot shapes, people showed surprise, excitement, and astonishment. They tried to discover where the foot shapes were coming from and why, and they played with the system by making various movements.

To analyze the focus of attention of a single person, we developed a system that tracks a person in the environment and estimates his or her body and head orientation during motion from a single top-view camera. We used the edge map of the Ω shape of the person's head-shoulder region to estimate the initial body orientation. Next, we estimated the new orientation by calculating the orientation change of the body and head. The displacement vectors of SIFT features were analyzed to calculate this orientation change. The experiments showed that the body and head orientation of a person can be estimated successfully with the proposed method. The orientation angle range was π/8 and the error was at most five degrees. The algorithm works for a single person under various motions in the scene (walking straight, turning the head to the right, turning around on the spot, turning right, etc.).
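
As a rough illustration of how an orientation change might be read off feature displacement vectors, the following minimal sketch fits a single in-plane rotation to matched point pairs by least squares. It is not taken from the dissertation: the function names are hypothetical, the SIFT detection and matching step is assumed to have been done elsewhere, and the rotation fit is a standard 2D Procrustes-style estimate swapped in for whatever analysis the author actually used.

```python
import math

def rotation_from_matches(prev_pts, curr_pts):
    """Least-squares in-plane rotation (radians) that maps prev_pts onto curr_pts.

    prev_pts, curr_pts: equal-length lists of (x, y) positions of the same
    features in two consecutive frames (matching is assumed to be given).
    """
    n = len(prev_pts)
    if n < 2 or n != len(curr_pts):
        raise ValueError("need at least two matched point pairs")
    # Centre both point sets so that pure translation (walking) cancels out.
    pcx = sum(x for x, _ in prev_pts) / n
    pcy = sum(y for _, y in prev_pts) / n
    ccx = sum(x for x, _ in curr_pts) / n
    ccy = sum(y for _, y in curr_pts) / n
    dot = cross = 0.0
    for (px, py), (cx, cy) in zip(prev_pts, curr_pts):
        ax, ay = px - pcx, py - pcy
        bx, by = cx - ccx, cy - ccy
        dot += ax * bx + ay * by      # accumulates |a||b| cos(angle)
        cross += ax * by - ay * bx    # accumulates |a||b| sin(angle)
    return math.atan2(cross, dot)

def update_orientation(theta, prev_pts, curr_pts):
    """Add the estimated rotation to the previous orientation, wrapped to [0, 2*pi)."""
    return (theta + rotation_from_matches(prev_pts, curr_pts)) % (2 * math.pi)
```

Centring the points first is a deliberate choice: a person walking straight produces displacement vectors that are all roughly equal, and these should yield a rotation estimate near zero rather than be mistaken for a turn.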

Next, complex crowd motions were studied to determine the focus of attention of crowds in a scene. One of the most useful pieces of information for analyzing crowd motion is the set of most frequently followed paths. It tells us about the tendencies of the people in the scene and gives usage statistics for its regions. We proposed two main algorithms for detecting the dominant motion flows in structured and unstructured (very complex) scenes. The first algorithm was developed to extract and represent the motion flows in local regions of the scene. We computed SIFT motion flows over short periods of time and accumulated those instantaneous flows over a long period to represent the motion in the scene. Then, we used a hierarchical clustering algorithm to classify the motion flows into meaningful groups, prioritizing orientation. The proposed system was tested on a set of challenging scenarios from real-world scenes and successfully detected the dominant motion flows. Furthermore, the system provided a flexible way to analyze the motion flows at various levels of detail; it also dealt successfully with local irregularities and detected motion flows in any part of the scene.
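
The idea of grouping accumulated flow vectors with orientation given priority can be sketched as a two-level clustering: flows are first grouped by direction, and each direction group is then split by position. This is only an illustrative sketch under stated assumptions, not the dissertation's algorithm. The flow format `(x, y, dx, dy)`, the function names, and both thresholds are hypothetical, and single-linkage clustering cut at a threshold (implemented here with union-find) stands in for the unspecified hierarchical clustering method.

```python
import math

def angle_diff(a, b):
    """Smallest absolute difference between two angles, in [0, pi]."""
    d = abs(a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)

def single_link_clusters(items, dist, threshold):
    """Single-linkage agglomerative clustering cut at `threshold`.

    Equivalent to the connected components of the graph that links
    every pair of items whose distance is at most `threshold`.
    """
    parent = list(range(len(items)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if dist(items[i], items[j]) <= threshold:
                parent[find(i)] = find(j)
    groups = {}
    for i, item in enumerate(items):
        groups.setdefault(find(i), []).append(item)
    return list(groups.values())

def dominant_flows(flows, max_angle=math.pi / 8, max_gap=50.0):
    """Cluster flows (x, y, dx, dy) by orientation first, then by position.

    Returns the clusters sorted from largest to smallest.
    """
    def direction(f):
        return math.atan2(f[3], f[2])

    clusters = []
    by_direction = single_link_clusters(
        flows, lambda f, g: angle_diff(direction(f), direction(g)), max_angle)
    for group in by_direction:
        clusters.extend(single_link_clusters(
            group, lambda f, g: math.hypot(f[0] - g[0], f[1] - g[1]), max_gap))
    return sorted(clusters, key=len, reverse=True)
```

Changing `max_angle` and `max_gap` gives coarser or finer groupings, which is one simple way to obtain different levels of detail from the same accumulated flows.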

Review Summary

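This dissertation, entitled "Human Body/Head Orientation Estimation and Crowd Motion Flow Analysis from a Single Camera" (単一カメラ映像からの人の体と頭部の向きの推定と群衆の動き場の解析), is written in English and consists of seven chapters. For video used in public spaces, in surveillance, digital signage, and interactive systems, it is desirable to extract more information from the video about the people present. The dissertation addresses the problem of analyzing the motion of individuals and groups of people from video captured by a single ceiling-mounted camera. Specifically, it studies three problems: (1) real-time tracking of people and its application to interactive entertainment, (2) estimation of a person's body and head orientation, and (3) detection of the motion of groups of people.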

Chapter 1, "Introduction", discusses the purpose, background, and structure of the dissertation and gives an overview of the problems it addresses.

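Chapter 2, "State of the Art", summarizes related work on the problems treated in this research and the contributions of the dissertation. It describes the current state of research on real-time tracking of multiple people for interactive applications, on body and head orientation estimation, and on the analysis of crowd motion flow fields, and then discusses the distinguishing features of the methods presented in the dissertation.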

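Chapter 3, "Real-time Multiple Human Tracking and Motion Flow Estimation: Future Footsteps", discusses a method for tracking, simultaneously and in real time, multiple people moving freely within the field of view of a single ceiling camera installed at a height, together with its application to interactive art. As the interactive artwork, a system was built that uses the real-time tracking results to project footprint shapes in front of each tracked person as "future footsteps"; the system was exhibited at Haneda Airport for one month. The chapter gives an overview of the system and describes a mechanism that, under the given conditions, can track multiple people in the same field of view in real time: blobs obtained by background subtraction are associated across frames, and the positions of the future footsteps are estimated by linear approximation of the movement history. It also summarizes the reactions of people observed during the airport exhibition. A comparison with recent interactive systems based on video tracking shows that this work is characterized by tracking many people without markers and by predicting their motion.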
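
Position estimation by linear approximation of the movement history can be sketched as follows. This is a minimal illustration rather than the exhibited system's code: the history format `(t, x, y)`, the function names, and the `steps` and `step_dt` parameters are assumptions, and an ordinary least-squares line fit is used as one plausible reading of "linear approximation".

```python
def fit_line(ts, vs):
    """Ordinary least-squares fit v = slope * t + intercept."""
    n = len(ts)
    if n < 2:
        raise ValueError("need at least two samples")
    t_mean = sum(ts) / n
    v_mean = sum(vs) / n
    var = sum((t - t_mean) ** 2 for t in ts)
    if var == 0:
        raise ValueError("timestamps must not all be equal")
    slope = sum((t - t_mean) * (v - v_mean) for t, v in zip(ts, vs)) / var
    return slope, v_mean - slope * t_mean

def predict_footsteps(history, steps=3, step_dt=1.0):
    """Extrapolate future (x, y) positions from a tracked blob's history.

    history: list of (t, x, y) samples for one tracked person, oldest first.
    Returns `steps` positions spaced `step_dt` apart after the last sample.
    """
    ts = [h[0] for h in history]
    sx, ix = fit_line(ts, [h[1] for h in history])
    sy, iy = fit_line(ts, [h[2] for h in history])
    t_last = ts[-1]
    return [(sx * (t_last + k * step_dt) + ix,
             sy * (t_last + k * step_dt) + iy)
            for k in range(1, steps + 1)]
```

Fitting over a short window of recent samples, rather than the last two positions alone, smooths out jitter in the blob centroids at the cost of reacting more slowly to sudden turns.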

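Chapter 4, "Body/Head Orientation Estimation while Tracking a Person", discusses a method that follows the motion of a person in a public space using a single ceiling camera and estimates the person's body and head orientation. Because the ceiling camera is mounted high, however, its video does not capture the target in sufficient detail. The method therefore tracks the person's motion, obtains the body orientation from the person's contour shape, and obtains changes in face orientation from the motion field. For tracking, the person is modeled as an ellipse whose motion is tracked continuously with a particle filter. Body orientation is discretized into seven directions and is determined by majority vote among the top-ranked results of context matching between the person's contour and previously collected example data. Head orientation changes are estimated by tracking the changes in motion obtained from SIFT features. The framework was evaluated on video of a public space at Haneda Airport and achieved good accuracy.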
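
The majority vote over top-ranked matches can be illustrated with a short sketch. It assumes the contour matching step has already produced a list of `(cost, orientation_bin)` pairs, one per example shape, where a lower cost means a better match. The function name, the value of `k`, and the tie-breaking rule are hypothetical and are not taken from the dissertation.

```python
from collections import Counter

def vote_orientation(matches, k=5):
    """Pick a discrete body orientation by majority vote over the k best matches.

    matches: list of (cost, orientation_bin) pairs; lower cost is better.
    Ties in the vote go to the bin whose best match has the lowest cost.
    """
    if not matches:
        raise ValueError("no matches to vote on")
    top = sorted(matches)[:k]                     # k lowest-cost matches
    counts = Counter(bin_ for _, bin_ in top)
    best = max(counts.values())
    tied = [bin_ for bin_, c in counts.items() if c == best]
    return min(tied, key=lambda b: min(cost for cost, bin_ in top if bin_ == b))
```

For example, with bins numbered 0 to 6 for the seven discretized directions, `vote_orientation([(0.1, 2), (0.2, 3), (0.25, 3), (0.4, 2), (0.5, 3), (0.9, 6)])` returns `3`, because three of the five best matches fall in bin 3.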

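Chapter 5, "Dominant Motion Flow Analysis In Structured/Unstructured Crowds", discusses the analysis of dominant motion flows for crowd motion that is reasonably orderly as well as for crowd motion that is not. SIFT features are used to perform tracking that focuses on individual people, and the resulting motions are then clustered hierarchically. Clustering that takes position into account yields locally dominant motions, and connecting these local motions finally yields the dominant motions over the wider scene; the chapter also shows that the level of detail can be controlled.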

Chapter 6, "Discussion and Future Work", discusses issues that extend beyond the work in the dissertation.

Chapter 7, "Conclusion", summarizes the results of the dissertation.

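In summary, this dissertation discusses, for video from a single camera, the tracking of people within the field of view, the estimation of body and head orientation, the analysis of crowd motion, and an interactive-art application based on footstep prediction, and demonstrates their effectiveness through evaluation. It makes a considerable contribution to the foundations of informatics.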

Accordingly, the committee finds that the degree of Doctor of Science (博士(科学)) may be conferred.
