Dissertation Abstract Details

Dissertation Abstract


No		127290
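Author (kanji)		ネビアロスカヤ,アレナ
Author (romanized)		Neviarouskaya,Alena
Author (kana)		ネビアロスカヤ,アレナ
Title (Japanese)		テキスト中の感情・判断・評価認識のための構成的アプローチ
Title (English)		Compositional Approach for Automatic Recognition of Fine-Grained Affect, Judgment, and Appreciation in Text
Report number		127290
Report number		甲27290
Date of degree conferral		2011.03.24
Degree category		Doctorate by coursework (課程博士)
Degree type		Doctor of Information Science and Technology (博士(情報理工学))
Diploma number		博情第328号
Graduate school		Graduate School of Information Science and Technology
Department		Department of Information and Communication Engineering (電子情報学専攻)
Examination committee		Chief examiner: Professor Keikichi Hirose, The University of Tokyo; Professor Mitsuru Ishizuka, The University of Tokyo; Professor Masaru Kitsuregawa, The University of Tokyo; Professor Kiyoharu Aizawa, The University of Tokyo; Associate Professor Takeshi Naemura, The University of Tokyo; Associate Professor Masashi Toyoda, The University of Tokyo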
Abstract		Sharing feelings and pleasant or painful impressions, showing sincere empathy or indifference, exchanging tastes and points of view, advancing moral values, and expressing praise or reprehension are indispensable for full and effective social interplay between people. With rapidly growing online sources (news, blogs, discussion forums, product or service reviews, social networks, etc.) aimed at encouraging and stimulating people's discussions of personal, public, or social issues, there is a great need to develop robust computational tools for analyzing people's preferences and attitudes. Sentiment or subjectivity analysis is now a rapidly developing field with a variety of emerging approaches targeting the recognition of sentiment reflected in written language. Automatic recognition of positive and negative opinions and classification of text using emotion labels have been gaining increasing attention from researchers. However, the recognition of fine-grained attitudes expressed in text has been ignored.
According to the Appraisal theory proposed by Martin and White (2005), attitude types define the specifics of the appraisal being expressed: (1) Affect - a personal emotional state or reaction. (2) Judgment - ethical appraisal of a person's character, behaviour, skills, etc. according to various normative principles. (3) Appreciation - aesthetic evaluation of semiotic and natural phenomena, events, artifacts, etc.
The main objectives of our research are: (1) Fine-grained classification of sentences using attitude types. Affect: nine emotions defined by Izard (1971): 'Anger', 'Disgust', 'Fear', 'Guilt', 'Interest', 'Joy', 'Sadness', 'Shame', and 'Surprise'. Judgment: positive and negative judgment, 'POS jud' and 'NEG jud'. Appreciation: positive and negative appreciation, 'POS app' and 'NEG app'. (2) A novel way of deep attitude analysis based on the compositional approach and the semantics of terms.
(3) Analysis of the strength of the attitude and determination of the level of confidence with which the attitude is expressed, in the interval [0.0, 1.0]. (4) Development of applications driven by the attitude-sensing system.
In the thesis, we first describe the Affect Analysis Model (AAM), a rule-based linguistic approach to classifying sentences using nine emotion labels or neutral. The proposed algorithm consists of five main stages: (1) symbolic cue analysis; (2) syntactic structure analysis; (3) word-level analysis; (4) phrase-level analysis; and (5) sentence-level analysis. We report the results of evaluating AAM on two data sets of sentences from diary-like blog posts. The averaged accuracy of our system is up to 81.5 percent in fine-grained emotion classification (nine emotion labels and neutral) and up to 89.0 percent in polarity-based classification.
Because lexicon-based systems depend strongly on the availability of sentiment-conveying terms in their databases, we address the problem of lexicon coverage by introducing original methods for building and expanding a sentiment lexicon (SentiFul) of sentiment-conveying words annotated with sentiment polarity, polarity scores, and weights. The main features of SentiFul are as follows: (1) it is built using not only methods exploring direct synonymy ('congratulate' [Pos=0.4] => 'compliment' [Pos=0.4]), antonymy ('reward' [Pos=0.2] => 'penalty' [Neg=0.2]), and hyponymy ('fault' [Neg=0.6] => 'betise' [Neg=0.6]) relations, but also innovative methods based on derivation and compounding with known lexical units; (2) it is larger than the existing lists of sentiment words; (3) it includes polarity scores, in contrast to most existing sentiment dictionaries, which do not assign a degree or strength of sentiment.
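
The relation-based expansion described above can be sketched as follows. This is a minimal illustration, not the SentiFul implementation: the `RELATIONS` table is a hypothetical in-memory stand-in for a WordNet lookup, and `expand` is a name introduced here. It assumes that synonyms and hyponyms inherit the seed's polarity and score while antonyms receive the opposite polarity with the same score, which matches the three examples quoted in the abstract.

```python
# Toy relation table standing in for a WordNet lookup (illustrative only).
RELATIONS = {
    "congratulate": {"synonym": ["compliment"]},
    "reward": {"antonym": ["penalty"]},
    "fault": {"hyponym": ["betise"]},
}

# Seed lexicon: word -> (polarity, score), using the scores quoted in the abstract.
SEEDS = {
    "congratulate": ("Pos", 0.4),
    "reward": ("Pos", 0.2),
    "fault": ("Neg", 0.6),
}

FLIP = {"Pos": "Neg", "Neg": "Pos"}


def expand(seeds, relations):
    """Return new lexicon entries derived from the seeds via lexical relations."""
    derived = {}
    for word, (polarity, score) in seeds.items():
        for relation, targets in relations.get(word, {}).items():
            # Assumption: antonyms reverse polarity; synonyms and hyponyms keep it.
            new_polarity = FLIP[polarity] if relation == "antonym" else polarity
            for target in targets:
                if target not in seeds:
                    derived[target] = (new_polarity, score)
    return derived


new_entries = expand(SEEDS, RELATIONS)
print(new_entries)
```

A real expansion would also need to reconcile conflicting scores when a word is reachable from several seeds; the sketch simply keeps the last one.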
The originality and main contribution lie in the elaborate patterns and rules for the derivation and compounding processes, which have not been considered before. We propose to distinguish the following types of affixes (used to derive new words) depending on the role they play with regard to sentiment features: (1) Propagating affixes preserve the sentiment features of the original lexeme and propagate them to the newly derived lexical unit. For example: 'en-' + 'rich' [Pos=0.2] => 'enrich' [Pos=0.2]; 'scary' [Neg=0.9] + '-fy' => 'scarify' [Neg=0.9]. (2) Reversing affixes change the orientation of the sentiment features of the original lexeme. For example: 'harm' [Neg=0.88] + '-less' => 'harmless' [Pos=0.88]; 'dis-' + 'honest' [Pos=0.1] => 'dishonest' [Neg=0.1]. (3) Intensifying affixes (e.g., 'super-' in 'superhero', 'over-' in 'overawe') and weakening affixes (e.g., 'semi-' in 'semisweet', 'mini-' in 'mini-recession') respectively increase and decrease the strength of the sentiment features of the original lexeme. A schematic illustration of our derivation and scoring algorithm is shown in Figure 1.
Besides derivation, we considered compounding, another important and highly productive process of forming new words, especially nouns and adjectives. We developed an algorithm for automatic extraction of new sentiment-related compounds from WordNet (Miller 1990), using words from SentiFul as seeds for sentiment-carrying base components and applying patterns of compound formation (for example, 'ill' [Neg=0.467] + 'famed' [Pos=0.475] => 'ill-famed' [Neg=0.467]; 'pain' [Neg=0.8] + 'killer' [Neg=0.35] => 'pain-killer' [Pos=0.575]; 'risk' [Neg=0.567] + 'free' [valence shifter] => 'risk-free' [Pos=0.567]). We assume that if a compound contains at least one base component that conveys sentiment features, we can predict the valence of this compound. The evaluations showed that the proposed methods achieve high accuracy in assigning dominant polarity labels and polarity scores to words.
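
The four affix roles described above can be sketched as a small scoring function. This is an illustration under stated assumptions, not the algorithm of Figure 1: the affix sets contain only the examples named in the abstract, `derive` is a hypothetical name, and the intensifying and weakening factors (1.5 and 0.5) are placeholders, since the abstract does not give the actual coefficients.

```python
# Affix roles follow the abstract; only the affixes it names are listed.
PROPAGATING = {"en-", "-fy"}
REVERSING = {"dis-", "-less"}
INTENSIFYING = {"super-", "over-"}
WEAKENING = {"semi-", "mini-"}

INTENSIFY_FACTOR = 1.5  # assumed value, not from the source
WEAKEN_FACTOR = 0.5     # assumed value, not from the source

FLIP = {"Pos": "Neg", "Neg": "Pos"}


def derive(polarity, score, affix):
    """Return (polarity, score) for a word derived from a scored base lexeme."""
    if affix in PROPAGATING:
        return polarity, score              # features carried over unchanged
    if affix in REVERSING:
        return FLIP[polarity], score        # orientation flipped, strength kept
    if affix in INTENSIFYING:
        return polarity, min(1.0, score * INTENSIFY_FACTOR)  # capped at 1.0
    if affix in WEAKENING:
        return polarity, score * WEAKEN_FACTOR
    raise ValueError(f"unknown affix: {affix}")


print(derive("Pos", 0.2, "en-"))     # 'rich' -> 'enrich'
print(derive("Neg", 0.88, "-less"))  # 'harm' -> 'harmless'
print(derive("Pos", 0.1, "dis-"))    # 'honest' -> 'dishonest'
```

Keeping the affix role separate from the scoring rule means new affixes can be added to a set without touching the function.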
The method based on compounding achieved the highest accuracy in assigning dominant positive or negative labels, followed by the methods based on hyponymy relations, the derivation process, synonymy relations, and antonymy relations (the last of which yielded noisy results).
In this thesis, we introduce a novel compositional linguistic approach for attitude recognition in text. We built a lexicon for fine-grained attitude analysis (AttitudeFul) that includes attitude-conveying terms (e.g., 'honorable' [POS jud: 0.3], 'unfriendly' [Sadness: 0.5; NEG jud: 0.5; NEG app: 0.5]), extensive sets of modifiers, contextual valence shifters, and modal operators, which contribute to robust analysis of contextual attitude and its strength. The architecture of the developed Attitude Analysis Model (@AM) is presented in Figure 2. During the 'Symbolic Cue Processing' stage, the system analyses the occurrences of emoticons, abbreviations and acronyms, interjections, question marks and exclamation marks, repeated punctuation, and capital letters. The analysis of the syntactic structure and functional dependencies of a sentence is performed by Connexor Machinese Syntax. At the 'Word Level Analysis' stage, the system looks up the sentence tokens in the AttitudeFul database and retrieves their annotations depending on the category. For an attitude-conveying word, the attitude features are represented as a vector of attitude strengths (intensities): a=[POS jud, NEG jud, POS app, NEG app, Anger, Disgust, Fear, Guilt, Interest, Joy, Sadness, Shame, Surprise]. For example: a('high-spirited')=[0.7 (POS jud),0,0,0,0,0,0,0,0,0.7 (Joy),0,0,0]. Several categories of modifiers are registered in the AttitudeFul database: adverbs of degree, adverbs of affirmation, negation words, adverbs of doubt, adverbs of falseness, prepositions, and condition operators.
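
The 13-element attitude vector described above can be sketched as a fixed label order plus a helper that fills in zeros. The label order and the 'high-spirited' values come from the abstract; the function names `attitude_vector` and `nonzero_labels` are hypothetical and introduced only for illustration.

```python
# Label order as given in the abstract for the attitude vector a.
LABELS = [
    "POS jud", "NEG jud", "POS app", "NEG app",
    "Anger", "Disgust", "Fear", "Guilt", "Interest",
    "Joy", "Sadness", "Shame", "Surprise",
]


def attitude_vector(annotations):
    """Build the 13-element strength vector from a {label: strength} mapping."""
    unknown = set(annotations) - set(LABELS)
    if unknown:
        raise ValueError(f"unknown attitude labels: {sorted(unknown)}")
    return [annotations.get(label, 0.0) for label in LABELS]


def nonzero_labels(vector):
    """Map a vector back to the labels that carry a non-zero strength."""
    return {label: value for label, value in zip(LABELS, vector) if value > 0}


a_high_spirited = attitude_vector({"POS jud": 0.7, "Joy": 0.7})
print(a_high_spirited)
print(nonzero_labels(a_high_spirited))
```

A fixed order keeps vectors for different words directly comparable, so later stages can combine them element by element.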
After the word-level annotations are retrieved from the database, the system turns to the analysis of high-level concepts, which play the key role in deciding the final attitude label of a sentence. The high-level concept of each noun in the sentence is determined based on: (1) Analysis of the sequence of hypernymic semantic relations of the noun in WordNet (Miller 1990). For example: 'student' => PERSON; 'miracle' => EVENT; 'decoration' => ARTIFACT. (2) Annotations from the Stanford Named Entity Recognizer (Stanford NER) (Finkel et al. 2005): PERSON, ORGANIZATION, and LOCATION.
Using the data from the 'Clause Splitter', the 'Formation Builder' module represents each clause as a set of formations: Subject formation (SF), Verb formation (VF), and Object formation (OF), each of which may consist of a main element (subject, verb, or object) and its attributives and complements. The 'Representation of Clause Dependencies' module is responsible for building a so-called 'relation matrix', which contains information about the dependencies between different clauses in compound, complex, or complex-compound sentences.
Words in a sentence are interrelated and, hence, each of them can influence the overall meaning and attitudinal bias of a statement. Our algorithm for attitude classification is designed based on the compositionality principle, according to which we determine the attitudinal meaning of a sentence by composing the pieces that correspond to lexical units or other linguistic constituent types, governed by the rules of polarity reversal, aggregation (fusion), propagation, domination, neutralization, and intensification, at various grammatical levels. In order to elaborate rules for attitude analysis based on the semantics of verbs, we investigated VerbNet (Kipper et al. 2007), the largest on-line verb lexicon, which is organized into verb classes characterized by syntactic and semantic coherence among the members of a class.
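
The hypernym-based lookup of high-level concepts described above can be sketched as a walk up a hypernym chain until a known top-level concept is reached. This is a toy illustration: the `HYPERNYM` links are simplified, hand-written stand-ins rather than actual WordNet data, and `high_level_concept` is a hypothetical name; a real system would query WordNet and handle multiple senses.

```python
# Simplified, hand-written hypernym links (child -> parent); illustrative only.
HYPERNYM = {
    "student": "enrollee",
    "enrollee": "person",
    "miracle": "happening",
    "happening": "event",
    "decoration": "artifact",
}

# Nodes treated as high-level concepts, mapped to the labels used in the abstract.
HIGH_LEVEL = {"person": "PERSON", "event": "EVENT", "artifact": "ARTIFACT"}


def high_level_concept(noun):
    """Follow hypernym links upward; return the first high-level concept found."""
    seen = set()
    node = noun
    while node is not None and node not in seen:
        if node in HIGH_LEVEL:
            return HIGH_LEVEL[node]
        seen.add(node)  # guard against accidental cycles in the toy data
        node = HYPERNYM.get(node)
    return None  # no high-level concept reachable


for noun in ("student", "miracle", "decoration"):
    print(noun, "=>", high_level_concept(noun))
```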
Based on a thorough analysis of the 270 first-level classes of VerbNet and their members, 73 verb classes (1) were found useful for the task of attitude analysis, and (2) were further classified into 22 classes differentiated by the role that their members play in attitude analysis and by the rules applied to them. For example, @AM classifies the sentence 'They prevented [verb of adverse attitude] the spread of disease' as positive appreciation, and 'My whole enthusiasm and excitement disappear [verb of disappearance] like a bubble touching a hot needle' as conveying negative emotion ('Sadness').
When @AM annotates a clause with several attitude types, either because a word has multiple annotations or because the clause contains words conveying different attitude types, the most appropriate final label is chosen based on the analysis of: (1) morphological tags of nominal heads and their premodifiers in the clause; (2) high-level concepts of nouns based on WordNet; and (3) high-level concepts of named entities based on the annotations from the Stanford NER. For example, @AM outputs different attitude labels for the following sentences, each containing only one attitude-conveying word, 'unfriendly' (a('unfriendly')=[0,0.5 (NEG jud),0,0.5 (NEG app),0,0,0,0,0,0,0.5 (Sadness),0,0]): 'I feel highly unfriendly attitude towards me', 'The salesperson was really unfriendly', and 'Plastic bags are environment unfriendly':
(1) I [NomFPP] feel highly [modifier: adverb of degree: 1.7] unfriendly [NEG aff (Sadness): 0.5; NEG jud: 0.5; NEG app: 0.5] attitude [WN: COGNITION] towards me [AccFPP] => 'NEG aff' ('Sadness'): 0.85.
(2) The salesperson [WN: PERSON] was really [modifier: adverb of degree: 1.55] unfriendly [NEG aff (Sadness): 0.5; NEG jud: 0.5; NEG app: 0.5] => 'NEG jud': 0.78.
(3) Plastic bags [WN: ARTIFACT] are environment [WN: STATE] unfriendly [NEG aff (Sadness): 0.5; NEG jud: 0.5; NEG app: 0.5] => 'NEG app': 0.5.
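
The three 'unfriendly' examples above can be reproduced with a small sketch. It is a deliberate simplification and not the @AM decision procedure: it assumes that a first-person experiencer selects the affect reading, a PERSON target selects judgment, and any other target selects appreciation, and that an adverb of degree multiplies the strength, capped at 1.0. The names `classify` and `UNFRIENDLY` and the target tag `FIRST_PERSON` are hypothetical. Under these assumptions the products 0.5 x 1.7 = 0.85 and 0.5 x 1.55 = 0.775 are consistent with the reported 0.85 and 0.78.

```python
# Annotations of 'unfriendly' from the abstract, keyed by attitude type.
UNFRIENDLY = {
    "aff": ("NEG aff (Sadness)", 0.5),
    "jud": ("NEG jud", 0.5),
    "app": ("NEG app", 0.5),
}


def classify(annotations, target, degree=1.0):
    """Pick one attitude label for a multiply-annotated word and scale its strength.

    target: simplified tag for what is being appraised (assumed tags).
    degree: coefficient of an adverb of degree, 1.0 when none is present.
    """
    if target == "FIRST_PERSON":
        kind = "aff"   # the speaker's own emotional state
    elif target == "PERSON":
        kind = "jud"   # appraisal of a person's character or behaviour
    else:
        kind = "app"   # evaluation of artifacts, events, phenomena
    label, strength = annotations[kind]
    return label, min(1.0, strength * degree)


print(classify(UNFRIENDLY, "FIRST_PERSON", degree=1.7))
print(classify(UNFRIENDLY, "PERSON", degree=1.55))
print(classify(UNFRIENDLY, "ARTIFACT"))
```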
Several aspects distinguish our Attitude Analysis Model from other systems. First, our method classifies individual sentences using fine-grained attitude labels (nine for different affective states, two for positive and negative judgment, and two for positive and negative appreciation), in contrast to other methods, which mainly focus on two sentiment categories (positive and negative) or six basic emotions. Next, our Attitude Analysis Model is based on the analysis of syntactic and dependency relations between words in a sentence; the compositionality principle; a novel linguistic approach based on rules elaborated for semantically distinct verb classes; and a method considering the hierarchy of concepts. In contrast to state-of-the-art approaches, the proposed compositional linguistic approach for automatic recognition of fine-grained affect, judgment, and appreciation in text (1) is domain-independent; (2) deals extensively with the semantics of terms, which allows accurate and robust automatic analysis of attitude type and broadens the coverage of sentences with complex contextual attitude; (3) processes sentences of different complexity, including simple, compound, complex (with complement and relative clauses), and complex-compound sentences; (4) handles not only correctly written text but also informal messages written in an abbreviated or expressive manner; and (5) encodes the strength of the attitude and the level of confidence with which the attitude is expressed through numerical values in the interval [0.0, 1.0].
The performance of our Attitude Analysis Model was evaluated on data sets of sentences from different domains. @AM achieved a high level of accuracy on sentences from personal stories about life experiences, fairy tales, and news headlines, outperforming other methods on several measures.
In fine-grained attitude classification (14 labels), our system achieved an averaged accuracy of 62.1 percent, and in coarse-grained classification (3 labels), 87.9 percent.
Using the Affect Analysis Model and the Attitude Analysis Model, we have developed several applications: AffectIM (an Instant Messaging application integrated with AAM), EmoHeart (an application of AAM in the 3D world Second Life), iFeel_IM! (an innovative real-time communication system with rich emotional and haptic channels), and a web-based @AM interface. We believe that the output of our systems can contribute to the robustness of the following socially beneficial and analytical applications: public opinion mining, deep understanding of a market and trends in consumers' subjective feedback, attitude-based recommendation systems, economic and political forecasting, affect-sensitive and empathic dialogue agents, emotionally expressive storytelling, and integration into online communication media and social networks.
Figure 1: The algorithm of derivation and scoring of the new words
Figure 2: Architecture of @AM
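Review summary		This dissertation, titled "Compositional Approach for Automatic Recognition of Fine-Grained Affect, Judgment, and Appreciation in Text," is written in English and consists of nine chapters. Broadly, Chapters 2-4 form Part I, "Recognition of fine-grained emotion in text"; Chapters 5-7 form Part II, "Recognition of affect, judgment, and appreciation in text"; and Chapter 8 forms Part III, "Applications."
Chapter 1, "Introduction," first explains the types of attitude that the author of a text expresses in social communication and that are to be recognized from the text. It then states the objectives of the research: (1) fine-grained recognition of the attitudes appearing in text, namely affect (anger, disgust, fear, guilt, interest, joy, sadness, shame, surprise), positive/negative judgment, and positive/negative appreciation; (2) a method for deep analysis of the author's attitude based on a compositional approach and the semantics of terms; (3) a method for analyzing the strength of the attitude; (4) a method for determining the author's level of confidence; and (5) development of application systems driven by the recognition of these attitudes. The chapter also covers the necessary background knowledge and related work. The target texts are in English.
Chapter 2, "Basis for Affective Text Classification," describes the nine affect categories above, which were proposed by psychologists, and the database that underlies affect recognition. The lexical database supporting the text affect recognition system includes affect-related adjectives, adverbs, nouns, verbs, modifiers, and interjections, as well as emoticons and abbreviations. The chapter also reviews prior research on methods for collecting and annotating such vocabulary.
Chapter 3, "Affect Analysis Model," describes the method for recognizing affect in text and gives examples of affect sensing in sentences of varying complexity. The method takes a linguistic rule-based approach over nine emotion labels. It comprises cue analysis of affect-related vocabulary, syntactic analysis, phrase-level analysis, and sentence-level analysis, and can process sentences ranging from simple ones to compound and complex ones.
Chapter 4, "Evaluation of the Affect Analysis Model Algorithm," evaluates the proposed method on two data sets of blog posts. It shows experimentally that the method outperforms existing methods in precision, recall, and F-score (the harmonic mean of precision and recall) for affect recognition.
Chapter 5, "Lexical Resources," presents the methods for building the sentiment lexicon (SentiFul) and the lexicon for attitude analysis (AttitudeFul). The SentiFul construction method is distinctive in that it not only exploits synonymy, antonymy, and hyponymy relations but also includes new methods based on derivation and compounding from known lexical units. The resulting sentiment lexicon contains 12,900 words, which is larger than existing lexicons, and it carries polarity strength information that many existing lexicons lack. AttitudeFul includes the following vocabulary sets for fine-grained attitude analysis: attitude-conveying terms, modifiers, terms that reverse polarity, and modal terms. These are used to recognize attitude and its strength.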
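Chapter 6, "Attitude Analysis Model (@AM)," describes the compositional approach, based on linguistic rules, that forms the core of recognizing fine-grained affect, judgment, and appreciation in text, including their strength. The method first recognizes fine-grained attitude labels for each sentence (nine affective states, positive/negative judgment, and positive/negative appreciation). (Whereas most existing methods recognize positive/negative sentiment or six emotions, the method here performs fine-grained recognition.) Recognition then relies on the compositionality principle, which composes meaning using the syntactic and dependency relations among the words in a sentence (rules of polarity reversal, aggregation, propagation, domination, neutralization, and intensification at various grammatical levels), together with a linguistic approach that uses semantically distinct verb classes and a method that considers concept hierarchies based on WordNet and NER (named entity recognizer).
Chapter 7, "Evaluation of the @AM Algorithm," describes the application systems developed using the models and methods of Chapters 3 and 6. The AffectIM system is an affective instant messaging system that uses avatars coupled with affect recognition from text; it generates the avatar's emotional behavior and expressions from the automatically recognized emotions. A validation experiment with 20 participants showed that the avatar's emotional expressions are comparable to those of a system in which users select emotions manually. @AM, the attitude analysis model, is also applied to an affective avatar-mediated chat system in the 3D virtual world Second Life, enabling automatic generation of the avatar's emotional expressions from chat text. In addition, the iFeel_IM! system conveys emotions from chat text haptically to human users wearing haptic devices. Together, these systems demonstrate the usefulness of the devised and developed attitude recognition methods in practical systems.
Chapter 9, "Discussion and Conclusions," summarizes the research results and discusses topics for future research.
In summary, this dissertation devises and develops a new compositional method, using the syntactic and dependency relations among words in text, for recognizing the fine-grained attitudes expressed in text (nine emotions, positive/negative judgment, and positive/negative appreciation), including their strength, as well as a new method for building the large lexicon, rich in associated information, that is used in the initial stage of recognition. It applies this recognition method to the construction of an affective instant messaging system with avatars whose emotional behavior and expressions are generated automatically from emotions recognized in text, and of an affective chat system in a 3D virtual world, among others, and demonstrates their usefulness in practical systems through user experiments. It thus makes a significant contribution to information and communication engineering.
Accordingly, this dissertation is judged acceptable as a dissertation for the degree of Doctor of Information Science and Technology.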
UTokyo Repository link		http://hdl.handle.net/2261/44001