Doctoral Thesis Abstract



No 125007
Author (kanji) 郝,佳
Author (romanized)
Author (kana) ハオ,ジャ
Title (Japanese) 方向性エッジ情報を用いた動きフィールド特徴表現に基づく観察者動作認識システム
Title (English) An Ego-Motion Detection System Employing Directional-Edge-Based Motion Field Representations
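Report No. 125007
Report No. 甲25007
Date of degree conferral 2009.03.23
Degree category Doctorate by coursework (課程博士)
Degree Doctor (Science)
Diploma No. 博創域第425号
Graduate school Graduate School of Frontier Sciences
Department Department of Frontier Informatics
Examination committee Chair: The University of Tokyo, Professor 柴田,直
 The University of Tokyo, Professor 相田,仁
 The University of Tokyo, Professor 高木,信一
 The University of Tokyo, Professor 相澤,清晴
 The University of Tokyo, Associate Professor 峯松,信明
 The University of Tokyo, Lecturer 山,俊彦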
Abstract

Benefiting from the remarkable progress of VLSI technology following Moore's law, computers now have computing power far exceeding that of humans. They have taken over most computational work from us and have contributed greatly to scientific computation. Despite this success in logical and dedicated computation, however, computers are still poor at flexible, intelligent processing such as "recognition." Research on human-like intelligence is therefore an indispensable direction for next-generation computer systems. Because visual information, and visual motion information in particular, is one of the most critical sources humans rely on for recognition, we explore machine intelligence by taking visual motion analysis as our point of departure.

As a prerequisite for any motion analysis system, ego-motion detection, which analyzes the motion of an observer relative to the environment, has attracted considerable research interest. It plays an essential role in navigation tasks such as automotive vehicle guidance and real-time robot control. The performance of such systems is determined mainly by two characteristics. The first is flexibility in adapting to disturbances, such as illumination changes and irregular observer motion caused by variations in speed or by bumping and shaking of the observer. The second is operating speed, which ensures real-time response. Because greater flexibility usually comes at the cost of more computation, striking a balance between these two aspects is of prime importance in building such systems.

Many approaches to ego-motion detection have been proposed, based on feature tracking, environment modeling, and so forth. Unfortunately, most of them are developed for specific tasks and assume comparatively ideal conditions, such as constant illumination and smooth motion, so their accuracy degrades under severe conditions. Moreover, most of these algorithms operate in the frequency or spatial domain, estimating motion models by solving complex equations with floating-point arithmetic. Because floating-point circuitry is complex, such algorithms are difficult to implement as VLSI systems and are therefore impractical for real-time use.

Biological systems, on the other hand, are robust against these problems. Since Hubel and Wiesel's discoveries in the visual cortex of animals, it has been well known that edge information plays an essential role in early visual processing. Because edge information is largely invariant to illumination, biological perception works well even under severe lighting conditions. In addition, for the motion interpretation stage, an "associative architecture" directly inspired by human physiology can carry out recognition without the large number of numerical computations required by conventional methods; this has already been demonstrated successfully in still-image recognition.

The purpose of this research is to accomplish a more complex recognition task, ego-motion detection, based on these biological principles: edge information is used as the very basis of the system, and the "recognition" step of motion analysis is executed in an "associative" manner. The system consists of two stages: extracting motion information from a scene, and interpreting the accumulated information. In the first stage, motion field generation, directional edge maps generated from the original gray-scale images serve as the input. Using edge information makes the system robust against dynamic illumination variation and weak texture, while histogram matching on the edge maps for local motion detection drastically reduces the computational cost compared with conventional block matching. In the second stage, motion characteristics are extracted from two perspectives of the motion field and represented by two kinds of feature vectors. These are used jointly in a hierarchical classification scheme for concise and efficient estimation of the motion pattern. Multi-clue template matching in this scheme makes the algorithm robust against motion ambiguity, for example in distinguishing tracking and panning motions in the same direction. Moreover, a new scheme introduced into the hierarchical classification resolves the motion field distortion caused by camera shaking during video capture.
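The first stage starts from directional edge maps rather than raw intensities. The Python sketch below shows one simple way to derive 1-bit horizontal- and vertical-edge maps from a gray-scale frame. The neighbour-difference gradient, the rank-based threshold and the function name `directional_edge_maps` are illustrative assumptions, not the edge operators used in the thesis.

```python
import numpy as np


def directional_edge_maps(gray, keep_ratio=0.1):
    """Return 1-bit (horizontal, vertical) edge maps of a gray-scale frame.

    Hypothetical sketch: gradients are plain neighbour differences and the
    threshold keeps roughly the strongest `keep_ratio` fraction of pixels.
    """
    g = np.asarray(gray, dtype=np.int32)
    # |I(x+1, y) - I(x-1, y)|: large across vertical edges.
    gx = np.zeros_like(g)
    gx[:, 1:-1] = np.abs(g[:, 2:] - g[:, :-2])
    # |I(x, y+1) - I(x, y-1)|: large across horizontal edges.
    gy = np.zeros_like(g)
    gy[1:-1, :] = np.abs(g[2:, :] - g[:-2, :])

    # Adaptive, rank-based threshold taken from the frame's own gradients:
    # multiplying the frame by a constant gain scales gradients and
    # threshold together (unless the threshold is clamped at 1).
    mag = np.sort(np.maximum(gx, gy), axis=None)
    idx = min(int(len(mag) * (1.0 - keep_ratio)), len(mag) - 1)
    thr = max(int(mag[idx]), 1)

    # Each pixel goes to its dominant direction; ties are dropped.
    vertical = (gx > gy) & (gx >= thr)
    horizontal = (gy > gx) & (gy >= thr)
    return horizontal.astype(np.uint8), vertical.astype(np.uint8)


# A vertical step edge: dark left half, bright right half.
img = np.zeros((16, 16), dtype=np.uint8)
img[:, :8] = 50
img[:, 8:] = 150
h, v = directional_edge_maps(img)
print(int(h.sum()), int(v.sum()))  # 0 32
```

Because the threshold is a rank statistic of the frame's own gradients, doubling the frame's brightness yields the same edge maps here, whereas a fixed absolute threshold would keep or lose edges depending on the lighting.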

All operations in both the motion field generation stage and the hierarchical motion pattern classification stage use integer or 1-bit arithmetic, that is, fixed-point operations, so the algorithm is easy to implement in compact hardware that meets real-time requirements.
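To illustrate what 1-bit arithmetic buys, the sketch below packs each row of a binary edge map into an integer; the edge projections used for matching then reduce to population counts, shifts and integer additions. The helpers `pack_rows`, `row_histogram` and `column_histogram` and the packing scheme are hypothetical, chosen for illustration rather than taken from the thesis's circuits.

```python
def pack_rows(bitmap):
    """Pack each row of a 0/1 edge map into one integer (bit x = column x)."""
    words = []
    for row in bitmap:
        word = 0
        for x, bit in enumerate(row):
            if bit:
                word |= 1 << x
        words.append(word)
    return words


def row_histogram(words):
    """Edge flags per row: one population count per packed row word."""
    return [bin(word).count("1") for word in words]


def column_histogram(words, width):
    """Edge flags per column, using only shifts, masks and integer adds."""
    return [sum((word >> x) & 1 for word in words) for x in range(width)]


edge_map = [[1, 0, 1, 1],
            [0, 0, 0, 0],
            [1, 1, 1, 1]]
words = pack_rows(edge_map)
print(row_histogram(words))        # [3, 0, 4]
print(column_histogram(words, 4))  # [2, 1, 2, 2]
```

No multiplication or floating-point value appears anywhere, which is the property that makes such projections cheap in dedicated hardware.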

The ego-motion detection system has also been applied to motion estimation for hand-held devices such as mobile phones. Digit-writing gesture recognition was taken as the target problem: the writing stroke is recorded from an image sequence taken by a moving camera. The automatic speed adaptation built into the motion detection system enables very robust stroke detection and eliminates the temporal stroke distortion caused by irregular writing speed. Because the writing stroke is correctly reconstructed by integrating the direction and magnitude of the detected motion, a feature vector for each digit is constructed by concatenating the feature distributions in each direction, and handwriting gesture recognition is then achieved by simple template matching. System performance was evaluated on digit-writing gestures with irregular writing speed, different users, and cursive writing. Recognition of handwritten Chinese characters was also attempted to examine the algorithm's potential for more complicated handwriting patterns. The results show that higher-level interfaces can be built from visual information alone, and that ego-motion detection can be expected to find broader use in motion estimation for mobile devices.
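A minimal sketch of the stroke pipeline follows, under several assumptions not stated in the source: per-frame camera displacements `(dx, dy)` are already available with pen-motion sign and y pointing down, the stroke is normalised to its bounding box, each direction contributes a 4 × 4 spatial map weighted by step length, and matching uses the L1 distance. The names `reconstruct_stroke`, `stroke_feature` and `match_digit` are hypothetical.

```python
import numpy as np


def reconstruct_stroke(motions):
    """Integrate per-frame pen displacements (dx, dy) into a trajectory.

    Assumes the displacements are already camera (pen) motion in pixels,
    with y pointing down, and that the stroke starts at the origin.
    """
    steps = np.asarray(motions, dtype=np.int64).reshape(-1, 2)
    start = np.zeros((1, 2), dtype=np.int64)
    return np.vstack([start, np.cumsum(steps, axis=0)])


def stroke_feature(points, grid=4):
    """Concatenate one grid x grid map per direction (right, left, down, up).

    Each step adds its length to the cell containing its midpoint, in the
    map of its dominant direction; the result is normalised to sum to 1.
    """
    pts = np.asarray(points, dtype=np.float64)
    lo = pts.min(axis=0)
    span = np.maximum(pts.max(axis=0) - lo, 1.0)  # avoid division by zero
    feat = np.zeros((4, grid, grid))
    for (x0, y0), (x1, y1) in zip(pts[:-1], pts[1:]):
        dx, dy = x1 - x0, y1 - y0
        if dx == 0 and dy == 0:
            continue
        if abs(dx) >= abs(dy):
            d = 0 if dx > 0 else 1   # right / left
        else:
            d = 2 if dy > 0 else 3   # down / up
        cx = min(int(((x0 + x1) / 2 - lo[0]) / span[0] * grid), grid - 1)
        cy = min(int(((y0 + y1) / 2 - lo[1]) / span[1] * grid), grid - 1)
        feat[d, cy, cx] += np.hypot(dx, dy)
    total = feat.sum()
    return (feat / total).ravel() if total > 0 else feat.ravel()


def match_digit(feature, templates):
    """Label of the template at the smallest L1 distance."""
    return min(templates, key=lambda k: float(np.abs(feature - templates[k]).sum()))


templates = {
    "1": stroke_feature(reconstruct_stroke([(0, 1)] * 10)),
    "7": stroke_feature(reconstruct_stroke([(1, 0)] * 6 + [(-1, 2)] * 5)),
}
# A "1" written at irregular speed: larger and smaller steps, same path.
query = stroke_feature(reconstruct_stroke([(0, 1), (0, 3), (0, 1), (0, 2), (0, 3)]))
print(match_digit(query, templates))  # 1
```

Since the feature records where the pen moves rather than how many frames each segment takes, the irregularly written "1" stays close to its template. This bounding-box normalisation is a simplification and is not the thesis's speed adaptation scheme.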

In this research, an ego-motion detection system employing directional-edge-based motion field representations has been developed and successfully applied to camera motion estimation and digit-writing gesture recognition. The use of directional edge information makes motion field generation independent of illumination. The hierarchical vector representation of the motion field resolves the problems of motion ambiguity and motion field distortion through simple vector processing. The speed adaptation scheme is very effective for correct camera motion detection and hence for correct stroke generation in trajectory recognition. The proposed algorithms make it possible to build real-time systems using dedicated VLSI chips developed for this processing, so flexibility and speed are achieved simultaneously. The system has been evaluated under various conditions, verifying that the directional-edge-based motion field representation algorithm achieves robust visual motion perception.

Summary of the Examination

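This thesis, entitled "An Ego-Motion Detection System Employing Directional-Edge-Based Motion Field Representations" (Japanese title: 方向性エッジ情報を用いた動きフィールド特徴表現に基づく観察者動作認識システム), presents research that, as a foundation for building VLSI systems capable of flexible, human-like recognition of moving images, developed an algorithm that generates motion fields quickly and accurately from directional edges, together with an ego-motion detection system built on it that is robust against disturbances. The thesis consists of seven chapters and is written in English.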

Chapter 1 is the introduction. It discusses the background of this research and describes the organization of the thesis.

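Chapter 2, entitled "Local Motion Detection," describes a method for accurately detecting the local motion in each part of a moving image. Inspired by biological information processing, it uses directional edge information rather than image luminance, achieving accurate motion detection with simple computation. Histograms of vertical or horizontal edges are built for a given region and matched between consecutive frames to obtain the motion in the x or y direction, respectively. In addition, an algorithm that adaptively determines the threshold for edge detection makes the motion detection robust against illumination changes.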
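The histogram matching step can be sketched as follows, assuming 1-bit edge maps of one region in two consecutive frames: projecting the vertical-edge map onto the x axis and the horizontal-edge map onto the y axis gives integer profiles, and the shift that minimises the sum of absolute differences between consecutive profiles is taken as the local motion. The search range, the fixed comparison window and the function names `edge_histogram`, `match_shift` and `local_motion` are assumptions for illustration.

```python
import numpy as np


def edge_histogram(edge_map, axis):
    """Project a 1-bit edge map: axis=0 gives counts per column (x profile),
    axis=1 gives counts per row (y profile)."""
    return np.asarray(edge_map, dtype=np.int32).sum(axis=axis)


def match_shift(prev_hist, curr_hist, max_shift):
    """Shift s minimising sum_x |prev[x] - curr[x + s]|.

    The sum runs over the same window for every s, so costs are comparable;
    ties are resolved in favour of the smallest |s|.
    """
    n = len(prev_hist)
    ref = prev_hist[max_shift:n - max_shift]
    best_s, best_cost = 0, None
    for s in sorted(range(-max_shift, max_shift + 1), key=abs):
        cand = curr_hist[max_shift + s:n - max_shift + s]
        cost = int(np.abs(ref - cand).sum())
        if best_cost is None or cost < best_cost:
            best_s, best_cost = s, cost
    return best_s


def local_motion(prev_v, curr_v, prev_h, curr_h, max_shift=4):
    """(dx, dy) of one region from vertical- and horizontal-edge maps."""
    dx = match_shift(edge_histogram(prev_v, 0), edge_histogram(curr_v, 0), max_shift)
    dy = match_shift(edge_histogram(prev_h, 1), edge_histogram(curr_h, 1), max_shift)
    return dx, dy


rng = np.random.default_rng(0)
prev_v = rng.integers(0, 2, size=(32, 32))
prev_h = rng.integers(0, 2, size=(32, 32))
curr_v = np.roll(prev_v, 3, axis=1)   # content moves 3 px to the right
curr_h = np.roll(prev_h, -2, axis=0)  # content moves 2 px up
print(local_motion(prev_v, curr_v, prev_h, curr_h, max_shift=5))  # (3, -2)
```

After the projection, matching two 1-D profiles of length n over 2R + 1 shifts costs O(nR) integer operations, compared with O(n²R²) for exhaustive 2-D block matching of an n × n block over the same search range.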

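Chapter 3, entitled "Motion Field Generation," describes how the local motion information obtained in the previous chapter is integrated into a useful motion field. An automatic speed adaptation function was proposed and incorporated into the system so that a consistent motion pattern is obtained even when the observer's speed varies over time. This function is important for achieving robust ego-motion detection.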
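How the automatic speed adaptation is implemented is not detailed in this summary. One plausible realisation, given purely as a hypothetical sketch, adapts the interval between the frames being compared so that the displacement seen by the matcher stays within a fixed range: when the observer slows down, frames further apart are compared, and when it speeds up, closer frames are used. `adapt_interval`, its thresholds and the toy `simulate` loop are all assumptions.

```python
def adapt_interval(interval, displacement, lo=2, hi=6, max_interval=8):
    """Return the frame interval to use for the next measurement.

    Hypothetical rule: keep the measured displacement (in pixels) between
    `lo` and `hi` so that the matcher always sees a usable amount of motion.
    """
    mag = max(abs(displacement[0]), abs(displacement[1]))
    if mag < lo and interval < max_interval:
        return interval + 1   # slow observer: compare frames further apart
    if mag > hi and interval > 1:
        return interval - 1   # fast observer: compare closer frames
    return interval


def simulate(speeds, interval=1):
    """Toy run: observer speed v (pixels/frame) yields displacement v * interval."""
    history = []
    for v in speeds:
        d = round(v * interval)
        history.append((interval, d))
        interval = adapt_interval(interval, (d, 0))
    return history


print(simulate([1] * 4))              # [(1, 1), (2, 2), (2, 2), (2, 2)]
print(simulate([5] * 4, interval=3))  # [(3, 15), (2, 10), (1, 5), (1, 5)]
```

In both runs the displacement settles inside the target range regardless of the observer's speed, which is the kind of consistency the chapter aims for; dividing the displacement by the interval still recovers a per-frame speed when it is needed.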

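Chapter 4, entitled "Hierarchical Motion Pattern Classification," describes a method for extracting two kinds of feature vectors, representing different perspectives, from the motion field obtained in the previous chapter, and an algorithm that uses them to classify and recognize various ego-motion patterns by hierarchical template matching. First, the Global Motion Vector, which represents the coarse motion of the whole scene, is used for a rough classification into forward/backward, left/right, and up/down motion of the observer; it cannot, however, make finer distinctions, such as whether a leftward or rightward motion is a translation or a rotation. The Component Distribution Vector, which represents fine motion, was therefore introduced to make this detailed discrimination possible. Using the latter, a method was also developed that classifies motion correctly by excluding disturbances such as pitching and rolling, which occur when the observer rides on a vehicle. This is an important result.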
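The two-level scheme can be illustrated with a toy classifier. Here the Global Motion Vector is assumed to be the normalised sums of positive and negative flow components, and the Component Distribution Vector the row-by-row profile of horizontal flow magnitude; these definitions, the templates and all function names are hypothetical, not the thesis's. The example relies on a standard optical-flow property: flow from a pure rotation does not depend on scene depth, whereas flow from a translation is larger for nearer points (assumed here to be the lower rows), so the row profile separates a pan from a track that share the same coarse class.

```python
import numpy as np


def _normalise(v, total=100):
    """Integer-only rescaling so the components sum to roughly `total`."""
    v = np.asarray(v, dtype=np.int64)
    s = int(v.sum())
    return v * total // s if s > 0 else v


def global_motion_vector(field):
    """Coarse descriptor of a (rows, cols, 2) integer field of (dx, dy)."""
    dx, dy = field[..., 0], field[..., 1]
    return _normalise([dx[dx > 0].sum(), -dx[dx < 0].sum(),
                       dy[dy > 0].sum(), -dy[dy < 0].sum()])


def component_distribution_vector(field):
    """Fine descriptor: how |dx| is distributed from top row to bottom row."""
    return _normalise(np.abs(field[..., 0]).sum(axis=1))


def nearest(vec, templates):
    """Label of the template at the smallest L1 (Manhattan) distance."""
    return min(templates, key=lambda k: int(np.abs(vec - templates[k]).sum()))


def classify_motion(field, coarse_templates, fine_templates):
    """Level 1: coarse class from the GMV; level 2: refine with the CDV."""
    coarse = nearest(global_motion_vector(field), coarse_templates)
    fine = fine_templates.get(coarse)
    if not fine:
        return coarse, None
    return coarse, nearest(component_distribution_vector(field), fine)


def make_field(row_dx, cols=4):
    """Synthetic field whose horizontal flow is constant along each row."""
    field = np.zeros((len(row_dx), cols, 2), dtype=np.int64)
    field[..., 0] = np.asarray(row_dx)[:, None]
    return field


COARSE = {
    "rightward flow": np.array([100, 0, 0, 0]),
    "leftward flow": np.array([0, 100, 0, 0]),
    "downward flow": np.array([0, 0, 100, 0]),
    "upward flow": np.array([0, 0, 0, 100]),
    "radial flow": np.array([25, 25, 25, 25]),
}
FINE = {
    "rightward flow": {
        "rotation (pan)": component_distribution_vector(make_field([2, 2, 2, 2])),
        "translation (track)": component_distribution_vector(make_field([1, 2, 3, 4])),
    },
}
print(classify_motion(make_field([3, 3, 3, 3]), COARSE, FINE))  # ('rightward flow', 'rotation (pan)')
print(classify_motion(make_field([2, 4, 6, 8]), COARSE, FINE))  # ('rightward flow', 'translation (track)')
```

The coarse stage alone would put both fields in the same class; only the fine descriptor, which ignores overall speed because of the normalisation, tells them apart. All arithmetic stays in integers.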

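Chapter 5, entitled "Ego-Motion Detection in Various Environments," reports a series of experiments using the algorithms developed in Chapters 2 to 4. Motions such as forward/backward movement, left/right translation, left/right rotation, and up/down rotation were detected in various environments, including under drastically changing illumination, with good results. The chapter also shows that motion is recognized correctly even when large pitching and rolling are superimposed on it, demonstrating the effectiveness of the algorithm.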

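Chapter 6, entitled "Application to Digit-Writing Hand Gesture Recognition," describes, as one application of the ego-motion detection method developed in this research, a way to input the digits 0 to 9 by moving the camera. The camera motion detected at each instant is integrated temporally and spatially to convert the writing stroke into a feature vector, and good recognition results were obtained by template matching. The chapter also shows that the method can handle complex input patterns such as Chinese characters (kanji).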

Chapter 7 presents the conclusions.

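In summary, this thesis generates motion fields from moving images quickly and accurately using directional edge information, converts them into two kinds of vector representations from macroscopic and microscopic perspectives, and, through hierarchical template matching that uses the two complementarily, realizes an ego-motion detection system that is robust against disturbances such as illumination changes and irregular observer motion. It makes a considerable contribution to the foundations of informatics.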

Accordingly, this thesis is judged acceptable as a dissertation for the degree of Doctor (Science).
