Doctoral Dissertation Abstract



No. 126223
Author: Burfoot, Daniel Crary (バールフット, ダニエル クレーリー)
Title (Japanese): 乱数欠陥の探索による統計的モデル化
Title (English): Statistical Modeling as a Search for Randomness Deficiencies
Report number: 甲26223
Date of degree: March 24, 2010
Degree category: Doctorate by coursework (課程博士)
Degree: Doctor of Information Science and Technology (博士(情報理工学))
Diploma number: 博情第290号
Graduate school: Graduate School of Information Science and Technology
Department: Department of Mechano-Informatics
Thesis examination committee (The University of Tokyo):
 Chief examiner: Professor Yasuo Kuniyoshi (國吉康夫)
 Professor Michitaka Hirose (廣瀬通孝)
 Professor Yoshihiko Nakamura (中村仁彦)
 Associate Professor Hisashi Kashima (鹿島久嗣)
 Associate Professor Taketoshi Mori (森武俊)
 Associate Professor Tatsuya Harada (原田達也)
Abstract

The problem of estimating a real distribution from a limited set of empirical data is of broad importance in statistical modeling and recognition. This thesis proposes an algorithm for the automatic construction of complex models. The algorithm works by dividing the data set into subsets, improving the current model ensemble on each subset, and then continuing recursively. The special features of the method are that it avoids the combinatorial explosion caused by combining several context functions; that it works when the context functions are not independent; and that its performance scales well on very large data sets. The thesis is divided into 11 chapters, described below, which present an overview of the basic concepts, a formal description of the method, an empirical evaluation on real-world data, and a discussion of the philosophical basis of the idea.

Chapter 1 (Introduction) presents the main concepts of the proposed approach to statistical modeling. The "Algorithmic Information Theory (AIT) view" of statistical modeling is developed and compared to the views of traditional statistics and of information theory. Two key conceptual barriers to the AIT view are discussed, and solutions are proposed. The basic operation of the PITA algorithm is described, and two example problems are given where PITA can be used. These examples illustrate two important challenges faced by any feature-based modeling algorithm: the feature combination problem and the feature selection problem. A brief motivation section argues that a new style of statistical thinking is needed for emerging applications of statistics in fields such as computer vision and speech recognition.

Chapter 2 (Model Ensemble Update Method) presents a method for applying a simultaneous update to an ensemble of statistical models. The method uses the CDF F(x) of the model Q(x) to transform the observed data points X: Y = F(X). This technique is called the Probability Integral Transform. If the model Q(x) matches the real distribution P(x) that generated the points, then the Y values are uniformly distributed. If the Y values are non-uniform (a randomness deficiency), this reveals an imperfection in the model Q(x) and suggests a way to update Q(x) to bring it closer to P(x). The parameters of the model update are simply the bin counts of a histogram of the Y values (called PIT values). Moreover, both the update and the improvement it yields are independent of the original data and models, and depend only on the histogram bin counts. Finally, it is shown that the PIT histogram update can be applied to an ensemble of models.
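As a hedged illustration (not the thesis's own code), the PIT transform and the histogram diagnostic described above can be sketched in Python. The model CDF, the data distribution, and the bin count below are illustrative choices, not those used in the thesis:

```python
import math
import numpy as np

def normal_cdf(x):
    # CDF F(x) of the model Q = N(0, 1), vectorised via math.erf.
    return np.array([0.5 * (1.0 + math.erf(v / math.sqrt(2.0))) for v in x])

def pit_density(y, n_bins=10):
    # Histogram of the PIT values over [0, 1], normalised so that a
    # perfect model (Q = P) gives density 1.0 in every bin.
    counts, _ = np.histogram(y, bins=n_bins, range=(0.0, 1.0))
    return counts * n_bins / counts.sum()

# Data actually drawn from P = N(0.5, 1), but modeled by Q = N(0, 1).
rng = np.random.default_rng(0)
data = rng.normal(0.5, 1.0, size=20_000)
y = normal_cdf(data)       # PIT values: Y = F(X)
density = pit_density(y)   # non-uniform => randomness deficiency
# Mass piles up in the high bins because Q places too little probability
# on large values; the bin counts themselves parameterise the update.
```

Here the non-uniform `density` plays the role of the PIT histogram from which the model correction is read off.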

Chapter 3 (PIT Value Analysis Algorithm) presents the core idea of the thesis, the PIT Value Analysis (PITA) algorithm. A key idea of the algorithm is to maintain a separate model distribution for each data point. Each data point is also assumed to have its own context. A binary context function can be used to separate the data set into two subsets, and the PIT histogram based model ensemble update can then be applied separately to each subset. The PITA algorithm uses a battery of context functions, and in each round selects the best context function by scoring the PIT histograms of the corresponding subsets. At the end of each round, the subsets corresponding to the best context function are updated separately using the PIT histogram model ensemble update method. The computational characteristics of the algorithm are discussed, with special reference to its ability to handle large data sets. The algorithm's key ability to be layered over an arbitrary initial model is also discussed. Finally, the so-called "Bin Overlap" problem is analyzed, and methods for dealing with it are presented.
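The per-round selection step can be sketched as follows. This is a hedged sketch, not the thesis's implementation: the scoring rule used here (the predicted codelength saving N·KL(histogram ∥ uniform), in bits) is one plausible choice consistent with the statement that the saving depends only on bin counts, and the context functions are represented simply as boolean masks over the data set:

```python
import numpy as np

def predicted_savings(counts):
    # Predicted codelength saving (bits) from re-encoding the PIT values
    # with the histogram code instead of the uniform code.
    counts = np.asarray(counts, dtype=float)
    n, b = counts.sum(), len(counts)
    nz = counts[counts > 0]
    return float(np.sum(nz * np.log2(nz * b / n)))

def select_context(y, masks, n_bins=8):
    # One PITA round: split the PIT values with each binary context
    # function and keep the split whose two subset histograms promise
    # the largest total saving.
    def score(mask):
        return sum(
            predicted_savings(
                np.histogram(y[s], bins=n_bins, range=(0.0, 1.0))[0])
            for s in (mask, ~mask))
    scores = [score(m) for m in masks]
    return int(np.argmax(scores)), scores

# Two hidden regimes: the model is biased in regime A, exact in regime B.
rng = np.random.default_rng(1)
n = 4_000
regime = rng.random(n) < 0.5
y = np.where(regime, rng.beta(2.0, 1.0, n), rng.random(n))
masks = [regime, rng.random(n) < 0.5]   # informative vs. uninformative
best, scores = select_context(y, masks)
# The informative context (index 0) is selected, because its subsets
# have the most non-uniform PIT histograms.
```

In the full algorithm this selection would be followed by the histogram update of each winning subset, and the split-and-update cycle would recurse.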

Chapter 4 (Alternative CDF Transformation Techniques) discusses a set of alternative model ensemble update methods. Each of these depends on a different set of statistics related to the PIT values, and has a method for predicting the codelength savings achieved by the update.

Chapter 5 (Related Work) compares the PITA algorithm to other well-known ideas in the machine learning and statistics literature, including AdaBoost and other boosting methods, the Maximum Entropy framework, and various Goodness of Fit tests in statistics.

Chapters 6-10 present applications of the PITA algorithm to various tasks: image compression, binary pattern classification, word morphology modeling, speech modeling, and motion recognition. The rigorous test of image compression indicates that the theory is basically sound, and encouraging compression rates are achieved. In the chapter on binary pattern classification, a modified form of the PITA algorithm is applied to a well-known machine learning benchmark dataset; the results are competitive with AdaBoost, a popular boosting algorithm. The word morphology chapter discusses problems that arise when dealing with non-numeric data and presents solutions to those problems. The speech modeling chapter shows that complex models of speech provide much better descriptions of the data than standard models such as the Laplacian or Gaussian. On the motion recognition task, the PITA algorithm achieves much better recognition performance than an HMM.

Chapter 11 (Conclusion) provides concluding remarks. It is emphasized that the PITA algorithm is just one example of a statistical modeling algorithm based on the search for randomness deficiencies; other algorithms, using other techniques to search for randomness deficiencies, can be defined. The specific advantages of the PITA algorithm are reiterated: its ability to construct high-performance complex models, its ability to be layered over other models, and its ability to scale to large data sets. Issues and lessons arising from each specific application are discussed. A particularly important lesson comes from the speech modeling application, where a model of about one million bits of complexity saved more than eighty million bits when encoding the data. This shows that when modeling raw data (in this case speech data), highly complex models can be constructed without overfitting. Finally, several directions for future work are mentioned. One goal is to develop a version of PITA that can exploit the idea of "hidden" abstractions. Another is to use the AIT view of statistical modeling to develop a theory of cortical function.

The specific contributions made by the thesis are as follows. (i) The realization that statistical modeling can be approached as a search for randomness deficiencies in encoded data (a bit string, in the information transmission setting), since such deficiencies indicate imperfect data compression. (ii) The realization that the encoded data need not be represented as a bit string, and in particular can be represented as a stream of [0,1]-distributed numbers called PIT values, with the search for randomness deficiencies still applicable. (iii) The derivation of a method for updating an ensemble of models based on a histogram of PIT values. (iv) A formula for predicting the codelength savings achieved by the model update, depending only on the PIT histogram bin counts. (v) A method for scoring a context function in terms of the scores of the PIT histograms it generates. (vi) An algorithm, PITA, that combines the model ensemble update method, a set of context functions, and the context-scoring method to construct complex conditional models Q(x|c), where x is an outcome and c is an arbitrary context value. (vii) The derivation of a number of other model ensemble update methods, along with corresponding predictions of the savings achieved. (viii) The realization that the PITA algorithm can be layered on top of an arbitrary existing model, resulting in an improved hybrid model. (ix) Experimental demonstrations of the PITA algorithm on a variety of tasks, including image compression, binary pattern classification, word morphology modeling, speech modeling, and motion modeling and recognition.
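The layering idea of contribution (viii) can be sketched concretely: if H is the (piecewise-linear) CDF learned from the PIT histogram, then H(F(x)) is again a valid CDF and defines the hybrid model. The base model, data distribution, bin count, and the KS-style uniformity measure below are illustrative assumptions, not taken from the thesis:

```python
import math
import numpy as np

def base_cdf(x):
    # Base model: standard normal CDF.
    return np.array([0.5 * (1.0 + math.erf(v / math.sqrt(2.0))) for v in x])

def layer_over(cdf, edges, hist_cdf):
    # Hybrid model CDF H(F(x)): the learned piecewise-linear correction
    # H is composed over the base CDF F.
    return lambda x: np.interp(cdf(x), edges, hist_cdf)

def ks_from_uniform(y):
    # Kolmogorov-Smirnov-style distance of values in [0, 1] from uniform.
    y = np.sort(y)
    return float(np.max(np.abs(y - np.arange(1, len(y) + 1) / len(y))))

rng = np.random.default_rng(2)
data = rng.normal(0.5, 1.0, size=20_000)   # true P = N(0.5, 1)
y = base_cdf(data)                          # PIT values under the base model

# Learn the correction H from a 16-bin histogram of the PIT values.
counts, edges = np.histogram(y, bins=16, range=(0.0, 1.0))
hist_cdf = np.concatenate(([0.0], np.cumsum(counts) / counts.sum()))
hybrid_cdf = layer_over(base_cdf, edges, hist_cdf)

# PIT values under the hybrid model are much closer to uniform.
y2 = hybrid_cdf(data)
```

The composition leaves the base model untouched, which is what allows PITA-style corrections to be stacked on top of a model produced by any other method.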

In summary, this thesis develops an approach to the construction of complex models from large data sets, based on a view of statistical modeling rooted in algorithmic information theory. The soundness of the method is demonstrated on a range of representative applications. Moreover, because the algorithm can be layered over an existing model to improve its performance, it has very wide applicability.

For the above reasons, the thesis makes several contributions to the field of information science. It is therefore accepted for the degree of Doctor of Philosophy.

Summary of Examination

Estimating the true probability distribution underlying a finite set of observed data is the foundation of a wide range of statistical recognition and learning methods. This thesis proposes a method for automatically constructing hierarchical statistical models of real-world data, whose joint distributions can be extremely complex, by recursively alternating between estimating and improving the distribution function and splitting the data by discriminative conditions. The method is characterized by suffering little from the combinatorial explosion caused by combining conditions or from the performance degradation caused by non-independent conditions, which makes it well suited to processing large, complex data sets. The thesis consists of 11 chapters: it introduces the basic concepts, presents the method, validates and evaluates it on real data, and adds academic discussion.

Chapter 1, "Introduction", presents the basic concepts and motivation of the thesis, relating statistical model estimation theory to algorithmic information theory. For the estimation of complex probabilistic structure, it presents the basic idea of recursively alternating model estimation and improvement with splitting by discriminative conditions; it identifies the feature selection problem and the feature combination problem as key challenges in this class of problems, and discusses an approach to solving them.

Chapter 2, "Model Ensemble Update Method", presents the basis of a method for sequentially updating a set of statistical models. Given an observed data set X and the true distribution P(x) that generated it, the probability integral transform Y = F(X) of X under the CDF F(x) of a candidate model Q(x) is uniformly distributed when Q(x) = P(x); hence Q(x) can be brought closer to P(x) by correcting F(x) on the basis of the non-uniformity (randomness deficiency) of Y. The chapter shows how the correction parameters for F(x) are obtained from a histogram of the Y values (called PIT values), and shows that in this case the codelength saving produced by the correction depends only on the bin counts of the PIT histogram, independently of the actual data and models. It also points out that this method can be applied to an ensemble of models.

Chapter 3, "PIT Value Analysis Algorithm", presents the central proposal of the thesis, the PIT Value Analysis (PITA) algorithm. A model distribution and a context function are assigned to each input variable. A context function is a binary discriminative condition whose value splits the input data set into two subsets; the model update method of Chapter 2 is applied to each subset separately. In each round, the context function with the greatest model-improvement effect is automatically selected from a pre-prepared battery of context functions on the basis of the PIT histograms, and the splitting and model updating above are repeated recursively. The chapter examines the computational cost of the method and argues that it is particularly well suited to large data sets. It also points out that the method can take the estimated model produced by an arbitrary existing method as its initial model and improve it further.

Chapter 4, "Alternative CDF Transformation Techniques", presents several further methods for realizing the model ensemble update, in addition to the one of Chapter 2. Each method corresponds to a different statistic of the PIT values, and for each an estimate of the codelength saving achieved by the correction is given.

Chapter 5, "Related Work", contrasts the proposed PITA algorithm with well-known existing methods in machine learning and statistics, discussing their relations and differences. In particular, it takes up boosting methods including AdaBoost, the maximum entropy method, and a range of goodness-of-fit tests.

Chapters 6-10 apply the PITA algorithm to a variety of problems for validation and evaluation: image compression, binary pattern classification, English word morphology modeling, speech modeling, and motion recognition. Applied to lossless image compression, the algorithm's soundness was confirmed and good compression rates were achieved. For binary pattern classification, the algorithm was applied to a well-known public machine learning benchmark and performed comparably to AdaBoost. In morphology modeling, problems arising when applying the method to non-numeric data were identified and solutions presented. In speech modeling, the complex models produced by the method were shown to have far better descriptive power than the standard Laplacian and Gaussian models. Finally, in motion recognition experiments on public motion capture data, the PITA algorithm was shown to have better overall recognition performance than the standard HMM; moreover, a way of layering PITA on top of the HMM was presented and shown to give further improvement.

Chapter 11, "Conclusion", summarizes the thesis and discusses its academic implications and future directions. It first positions the PITA algorithm as one instance of statistical model estimation based on the search for randomness deficiencies, and argues that other methods on the same foundation can be obtained by changing how randomness deficiencies are searched for. It then summarizes the strengths of the PITA algorithm: it can construct high-performance complex models with little risk of overfitting, it is well suited to large data sets, and it can be layered over models obtained by other methods. Finally, future directions are discussed, including possible applications to modeling brain function.

In summary, this thesis proposes a new method for the statistical modeling of large, complex data sets, a key problem in real-world information processing, grounded in an original analysis spanning algorithmic information theory and statistical model estimation theory, and demonstrates its soundness and effectiveness by applying and evaluating it on a variety of representative real-data problems. By presenting a way to improve performance by layering the method over arbitrary existing methods, it also broadens its range of effective application.

For these reasons, this thesis makes a substantial contribution to the field of mechano-informatics. It is therefore accepted as qualifying for the degree of Doctor of Information Science and Technology.
