Dissertation Abstract Details

Dissertation Abstract


No		129106
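Author (Kanji)		張,任遠
Author (Roman)
Author (Kana)		チョウ,ニンエン
Title (Japanese)		学習アルゴリズム実装のための全並列アナログVLSIアーキテクチャ
Title (English)		A Fully Parallel Analog VLSI Architecture for Implementing Learning Algorithms
Report No.		129106
Report No.		甲29106
Date of degree conferral		2013.03.25
Degree category		Doctorate by coursework (課程博士)
Degree type		Doctor of Engineering (博士(工学))
Diploma No.		博工第7997号
Graduate school		Graduate School of Engineering (工学系研究科)
Department		Department of Electrical Engineering and Information Systems (電気系工学専攻)
Dissertation committee		Chair: The University of Tokyo, Professor 柴田,直; The University of Tokyo, Professor 浅田,邦博; The University of Tokyo, Professor 坂井,修一; The University of Tokyo, Professor 廣瀬,明; The University of Tokyo, Associate Professor 池田,誠; The University of Tokyo, Associate Professor 三田,吉郎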
Abstract		Cognitive functions play an important role in real-world tasks such as text analysis, audio processing, and visual processing. In these tasks, the human brain far outperforms traditional very-large-scale integration (VLSI) processors and software programs, because the brain can learn from samples autonomously. Many machine learning algorithms have therefore been developed to realize such learning operations, and they were originally implemented in software. For reasons of power consumption and processing performance, a number of attempts have been made to implement machine learning algorithms in hardware, including graphics processing units (GPUs), field-programmable gate arrays (FPGAs), and VLSI circuits. Because many computations in machine learning algorithms are complex, implementation costs, including computing time and hardware utilization, are a major concern. Furthermore, because these algorithms typically require a large number of iterations, learning speed is also a critical issue. The challenge in hardware implementation of learning algorithms thus lies in achieving a high processing speed with limited hardware resources. In this thesis, a fully parallel architecture for implementing learning algorithms is proposed using analog VLSI circuits. Several analog circuits are designed to carry out complex functions such as the Gaussian function and the Euclidean distance. These computations can be performed in real time within a compact chip area. On the basis of these analog computational circuits, a generally applicable, fully parallel architecture is developed to implement several machine learning algorithms. Because the chaos of analog signals is used for learning instead of clock-based numerical iterations, the learning operation is accomplished autonomously and self-converges at high speed.
Furthermore, the explosion in chip area and interconnections that affects traditional parallel architectures can be prevented. To verify the proposed architecture, a support vector machine (SVM) was implemented in VLSI circuits and fabricated in a complementary metal-oxide-semiconductor (CMOS) technology. The SVM is one of the most important supervised machine learning algorithms and has been widely applied in pattern recognition tasks. A number of VLSI implementations have been developed to realize on-chip SVM learning. Because the kernel functions in SVM theory are expensive to compute with digital circuits, analog implementations of the SVM algorithm have been suggested in several works. These previous works had two problems. First, the traditional analog circuits they use generate a high-dimensional Gaussian function through single-dimension multipliers, and the error grows intolerably as the dimension increases. These works are therefore hard to apply to high-dimensional pattern classification. The second problem is the trade-off between learning speed and chip size. A high degree of processing parallelism yields a high speed but requires a large number of circuits. Moreover, the number of learning iterations, which is usually very large and does not depend on the hardware parallelism, has a marked effect on the learning speed. Conventional VLSI implementations employing clock-based iterations therefore spend much time on these iterations regardless of the degree of hardware parallelism. In this work, the proposed fully parallel implementation of the SVM was applied to an image recognition problem. An analog Gaussian generation circuit, which is robust against process variations, was developed for high-dimensional pattern vectors.
The center, height, and width of the generated Gaussian function can all be programmed easily. Furthermore, the chip-area-hungry part for high-dimensional Euclidean distance computation and the much smaller part for exponential computation are built separately. Only the exponential computing circuits need to be duplicated to achieve a high degree of parallelism. In this manner, a fully parallel learning SVM processor was built within a compact chip area in a standard 0.18 µm CMOS technology. Upon receiving high-dimensional pattern vectors, the learning process proceeded autonomously without any clock-based control and self-converged within a single clock cycle of the system (at 10 MHz). To confirm the learning and classification performance, 16 object images from a database were converted into 64-dimensional vectors and fed into the proposed SVM processor as learning samples. After self-learning, several other vectors were used as test patterns. According to the measurement results, the proposed SVM processor classified all the test patterns into the correct classes. The processing speed, chip area, and power consumption are improved compared with traditional approaches. As a generally applicable methodology, the proposed fully parallel architecture was also used to implement unsupervised machine learning algorithms. On the basis of the K-means mechanism, an important pattern clustering algorithm, a hardware-efficient version was developed and named the K-Quasi-Centers (KQC) method. In terms of clustering results, the proposed method has convergence performance similar to that of the original K-means algorithm. Using this modified clustering algorithm, the proposed analog fully parallel architecture can be applied to unsupervised pattern clustering problems. A proof-of-concept processor was designed for the categorization of 64-dimensional vectors.
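The separation of the Gaussian function into a distance stage and an exponential stage, described above for the SVM processor, can be illustrated numerically. The following Python sketch is not a circuit model and is not taken from the thesis: the function names, the toy samples, and the parameterization height · exp(−d² / (2 · width²)) are assumptions chosen for illustration only.

```python
import math

def squared_distances(samples):
    """Distance stage (assumed form): squared Euclidean distance for every sample pair."""
    n = len(samples)
    return [
        [sum((a - b) ** 2 for a, b in zip(samples[i], samples[j])) for j in range(n)]
        for i in range(n)
    ]

def gaussian_from_distances(sq_dist, height=1.0, width=1.0):
    """Exponential stage (assumed form): scalar map from squared distance to Gaussian value."""
    return [
        [height * math.exp(-d / (2.0 * width ** 2)) for d in row]
        for row in sq_dist
    ]

# Hypothetical toy data: three 2-dimensional samples.
samples = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
sq = squared_distances(samples)               # the costly part grows with the dimension
kernel = gaussian_from_distances(sq, 1.0, 1.0)  # the cheap part is one scalar map per pair
```

In this sketch the cost that grows with the vector dimension sits entirely in `squared_distances`, while `gaussian_from_distances` is a cheap per-pair operation, which mirrors the reason the abstract gives for duplicating only the exponential part.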
To verify the performance of the proposed processor, sixteen images of two kinds of objects selected from a real image database were converted into feature vectors and fed into the KQC clustering processor. According to the circuit simulation results, all the images were correctly categorized into their respective classes even with several different random initializations, and the categorization results self-converged faster than with conventional approaches. In the image processing applications above, the proposed architecture achieves a high processing speed and acceptable accuracy. However, the processing capacity of VLSI implementations is seriously limited by the chip size. One reasonable way to increase the number of learning samples is to apply an on-line learning strategy, which was originally developed for software programs. In this work, the efficiency and importance of each learning sample are evaluated after the learning operation. The most inefficient sample is discarded so that the learning processor can accept a new sample on-line. The learning operation is then repeated with the updated samples. In this manner, a fixed VLSI processor can be used for learning over a very large, even unpredictable, sample space. However, because on-line learning requires a large number of learning operations, this strategy is difficult to realize using software or traditional VLSI processors. With the proposed fully parallel architecture, the learning operations are accomplished at high speed, so the on-line learning strategy is particularly well suited to the proposed architecture. To verify the on-line learning performance, both the SVM and KQC algorithms were implemented using the analog fully parallel architecture for image classification and clustering problems, respectively.
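The replace-and-relearn loop of the on-line strategy described above can be sketched in a few lines of Python. The abstract does not say how sample efficiency is measured, so the score used here is purely an assumption: a sample counts as more important the closer it lies to a sample of the opposite class. The names `importance` and `online_update` and the one-dimensional toy data are likewise hypothetical.

```python
def importance(index, samples):
    """Assumed proxy score: negative distance to the nearest opposite-class sample.

    A real system would take this score from the learning result; the abstract
    does not specify the measure, so this proxy is illustrative only.
    """
    x, label = samples[index]
    nearest = min(
        (abs(x - other_x) for other_x, other_label in samples if other_label != label),
        default=float("inf"),
    )
    return -nearest

def online_update(samples, new_sample):
    """Discard the least important stored sample and put the new sample in its slot."""
    scores = [importance(i, samples) for i in range(len(samples))]
    worst = min(range(len(samples)), key=scores.__getitem__)
    updated = list(samples)
    updated[worst] = new_sample  # capacity stays fixed, as on a fixed-size chip
    return updated, worst

# Hypothetical stored samples: (feature, class label).
stored = [(-3.0, -1), (-1.0, -1), (1.0, +1), (4.0, +1)]
stored_after, replaced_index = online_update(stored, (0.5, +1))
```

The buffer size never changes, so a learner with fixed capacity can follow a stream of samples of any length; learning would be rerun on `stored_after` after each replacement.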
According to the circuit simulation results, the learning results are all correct when the samples received on-line are taken into account. Furthermore, a visual tracking system was built by combining FPGA boards with the analog SVM processor developed in this work. With the on-line learning SVM, object tracking performance was improved compared with conventional approaches. Besides pattern classification and clustering, another important machine learning task is data domain description. Data domain description has been found to offer an enhanced capability for pattern recognition. For instance, the SVM classification algorithm was originally devised for two-class classification problems, but real-world applications may require various numbers of classes, and in some applications only a single class of learning samples is available. To solve these problems, a data domain description theory (also called one-class classification), named support vector domain description (SVDD), was developed as an extension of SVM theory. The SVDD algorithm has been applied in software to some classification problems and even to unsupervised clustering problems. In this work, the SVDD algorithm has been implemented using the proposed analog fully parallel architecture. A proof-of-concept chip was built for 64-dimensional pattern recognition. A multiple-chip topology was proposed for multi-class recognition problems: to expand the number of classes, the number of chips can be freely increased. As an example, a three-class classification system employing three SVDD chips was built for real image recognition. After the on-chip learning session, several test images were fed into the system. According to the chip measurement results, all the test patterns were correctly recognized.
As an extension of analog VLSI implementations for soft-computing tasks, we discuss how CMOS supporting circuitry can interface a fabric of nano devices with the digital computing world. Using CMOS ring oscillators to emulate the behavior of nano oscillators, we demonstrate by circuit simulation how to produce an associative memory function and use it for image recognition.
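Review Summary		This dissertation, entitled "A Fully Parallel Analog VLSI Architecture for Implementing Learning Algorithms" (Japanese title: 学習アルゴリズム実装のための全並列アナログVLSIアーキテクチャ), summarizes the results of research on realizing intelligent VLSI systems with learning capability using analog circuit technology. It consists of six chapters and is written in English.
Chapter 1 is the introduction. It discusses the background of the research and describes the structure of the dissertation.
Chapter 2, entitled "Fully Parallel Support Vector Machine Processor Employing Analog Circuitry", describes an analog VLSI chip in which the boundary learning of a Support Vector Machine (SVM) is completed almost instantly by the dynamics of the circuit. SVMs with Gaussian kernels are widely used in recognition systems because of their high classification performance, but almost all of them are software implementations, for which real-time learning is not possible. Until now, analog circuit realizations had been limited to proof-of-principle chips handling two-dimensional vectors; this research demonstrates for the first time the operation of a fully parallel chip handling 64-dimensional vectors. Learning N samples with fully parallel computation needs the distance information between all samples simultaneously, so N² Gaussian circuits are required. In this research, the Gaussian function circuit is separated into a Euclidean distance computation part and an exponential function computation part. For the former, N circuits based on the translinear principle are arranged in parallel; the latter is realized by directly using the subthreshold characteristics of a single MOS transistor, and N² of them are arranged. Because the circuit area is occupied almost entirely by the N distance computation parts, the whole is extremely compact. As a result, a fully connected circuit was realized, and it was shown experimentally that learning completes within several microseconds. This is an important achievement.
Chapter 3, entitled "Fully Parallel K-Quasi-Center Clustering Processor Employing Analog Circuitry", describes a chip architecture that can execute K-means clustering learning, based on the configuration of the analog SVM chip developed in the previous chapter. It was realized simply by replacing the N² Gaussian circuits of the SVM chip with an array of N² current memories. Because all the distance information among the N learning samples is held on the chip, the sum of the distances between arbitrary samples can easily be obtained by current summation, by reconfiguring the switches as appropriate. This realizes the K-means learning function.
Chapter 4, entitled "On-line Learning Strategy Based on the Fully parallel Architecture", describes a circuit scheme that can perform on-line learning of large amounts of sample data arriving as a time series. Because hardware resources are finite, it is not possible to store all of the ever-increasing sample data on the chip and perform learning on them. On-line learning is therefore effective: samples arriving as a time series are learned immediately, only the samples important for determining the classification boundary are retained, and all the others are replaced with new data. The circuit developed in this research makes this possible because its fully parallel architecture enables high-speed learning. In this chapter, the idea is applied to both the SVM of Chapter 2 and the K-means processor of Chapter 3, and its effectiveness is shown by circuit simulation. The circuit was also incorporated into an object tracking system developed in other research, and it was demonstrated that real-time learning can follow changes in the shape of the tracked object as they occur and that tracking is performed effectively. This is important in practice.
Chapter 5, entitled "Support Vector Domain Description", describes an SVM scheme that can handle multi-class classification problems. Because the SVM is inherently a classifier that effectively determines the boundary between two classes, it is basically not well suited to multi-class classification. This chapter proposes realizing, on the basis of the analog VLSI chip architecture developed so far, a classifier in which each SVM efficiently determines the boundary between its own class and the other samples. Circuit simulation demonstrated that images are classified correctly.
Chapter 6 is the conclusion.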
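The clustering scheme summarized for Chapter 3, which works only from stored pairwise distances and sums of those distances, can be sketched in software as follows. This record does not define the K-Quasi-Centers update rule, so the sketch assumes that each quasi-center is the cluster member with the smallest summed distance to the other members. Under that assumption it is essentially a k-medoids-style iteration, and it may differ from the method in the thesis. The function names and toy data are hypothetical.

```python
def assign(dist, centers):
    """Label each sample with the index of its nearest quasi-center."""
    return [
        min(range(len(centers)), key=lambda k: dist[i][centers[k]])
        for i in range(len(dist))
    ]

def quasi_center_clustering(dist, centers, max_iter=100):
    """Cluster using only a precomputed pairwise distance matrix (assumed update rule)."""
    centers = list(centers)
    for _ in range(max_iter):
        labels = assign(dist, centers)
        new_centers = []
        for k, old in enumerate(centers):
            members = [i for i, lab in enumerate(labels) if lab == k]
            if not members:
                new_centers.append(old)  # keep the old center for an empty cluster
                continue
            # Summed distance to all members, analogous to summing stored distances.
            new_centers.append(min(members, key=lambda c: sum(dist[c][m] for m in members)))
        if new_centers == centers:
            break
        centers = new_centers
    return assign(dist, centers), centers

# Hypothetical 1-D toy data; the matrix plays the role of the stored distances.
values = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
dist = [[abs(a - b) for b in values] for a in values]
labels, centers = quasi_center_clustering(dist, centers=[0, 1])
```

Because every step reads only entries of `dist` and sums of them, no mean vector is ever computed, which is the property that lets the distance matrix be computed once and reused.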
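In summary, this dissertation presents research in which, to enable real-time execution of learning algorithms, an analog VLSI architecture was developed that holds the distance information among all samples on the chip and executes learning at high speed with fully parallel circuits, and its effectiveness was demonstrated by measurements of prototype circuits and by circuit simulation. It makes a considerable contribution to the advancement of electronic engineering. The dissertation is therefore accepted as qualifying for the degree of Doctor of Engineering (博士(工学)).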
UTokyo Repository link