
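No 126193
Author (kanji): 櫻庭,俊
Author (kana): サクラバ,シュン
Title (Japanese): 生体分子運動の縮約表現
Title (English): Reduced Ensemble Representation of Biomolecular System
Report number: 126193
Report number: 甲26193
Date degree conferred: 2010.03.24
Degree category: Course doctorate (課程博士)
Degree: Doctorate (Science)
Diploma number: 博創域第610号
Graduate school: Graduate School of Frontier Sciences
Department: Department of Computational Biology
Dissertation committee: Chair: The University of Tokyo, Professor 森下,真一
 The University of Tokyo, Professor 浅井,潔
 RIKEN, Group Director 泰地,真弘人
 The University of Tokyo, Associate Professor 中村,周吾
 The University of Tokyo, Associate Professor 北尾,彰朗
Abstract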

The dynamics of biomolecules are complex, and this complexity enables the complex functionality of biomolecular systems. One of the best ways to understand such complex systems is simply to see how they work. Molecular simulation lets us build a "sandbox" of proteins inside a computer, in which researchers can observe a protein's intrinsic motions or its response to changes in the system; we can see how it works. Combined with experimental techniques, it has become a mature and essential tool for protein science.

One problem with molecular simulation is how to understand the results. Proteins are high-dimensional systems, and simulating them can generate terabytes of data within a week. Methods for extracting information from this enormous data space are necessary, and the extracted result has to be human-understandable. Herein, I propose two approaches to this problem. One is a method to estimate the stability of a protein's substates without requiring a long sequential run. The other is a method to determine, automatically from the simulation, the best axes for representing a protein's dynamics, that is, its reaction coordinates.

1 Combining Multiple Non-equilibrium Dynamics Simulation Based on Markov Model

In molecular simulation, snapshots of the system are taken at intervals, and together they constitute a trajectory. The trajectory is regarded as a representative set of the molecule's conformations. However, even with today's hardware and simulation algorithms, there is still a large gap between the timescale of simulation and that of a protein's functional motion, such as enzyme activity or receptor binding. A possible workaround is to improve sampling with modified dynamics. These methods, usually called generalized ensemble methods, sample snapshots from simulations with modified dynamics; after the data are obtained, each snapshot of the trajectory is weighted according to the magnitude of the modification. The pair of trajectory and weights represents the protein's character. One drawback is that these methods rely on the equivalence of the statistical ensemble average and the time average. Because of this, for large systems they require a long equilibration time before researchers can start sampling snapshots. Thus generalized ensemble methods are at this moment impractical for biomolecules.
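The weighting step can be illustrated with a minimal sketch. It assumes the modification is an additive bias potential, so that a snapshot sampled with bias energy V_bias receives a weight proportional to exp(+V_bias/kT). The function name, the bias model, and the toy numbers are illustrative assumptions and are not taken from the thesis.

```python
import numpy as np

def reweighted_average(observable, bias_energy, kT=1.0):
    """Weighted average of an observable over snapshots from a biased run.

    Assumption: the modified dynamics sampled exp(-(U + V_bias)/kT), so each
    snapshot is reweighted by exp(+V_bias/kT) to recover the unbiased average.
    """
    observable = np.asarray(observable, dtype=float)
    bias_energy = np.asarray(bias_energy, dtype=float)
    # Subtract the maximum before exponentiating for numerical stability.
    log_w = bias_energy / kT
    w = np.exp(log_w - log_w.max())
    w /= w.sum()  # normalized weights, one per snapshot
    return float(np.sum(w * observable)), w

# Toy data (hypothetical): two snapshots, the second sampled under a larger bias.
avg, weights = reweighted_average([0.0, 3.0], [0.0, np.log(2.0)])
```

With zero bias on every snapshot the function reduces to a plain time average, which is the case where the trajectory alone represents the ensemble.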

In this research I pursued a new analysis method that calculates the statistical ensemble of the system but does not require total equilibration of the system. Starting from the assumption that the dynamics are Markovian, the dynamics of the system can be represented by a transition matrix M. The equilibrium probability density π can be obtained by solving the left eigenproblem of M, since π is the left eigenvector of M corresponding to eigenvalue 1.
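As a small illustration of the eigenvector relation, the sketch below computes the equilibrium distribution (called `pi` in the code) of a two-state transition matrix with NumPy. It assumes a row-stochastic convention, where M[i, j] is the probability of moving from state i to state j, and an irreducible chain; the matrix values are made up for the example.

```python
import numpy as np

def stationary_distribution(M):
    """Return pi with pi @ M = pi for a row-stochastic transition matrix M."""
    M = np.asarray(M, dtype=float)
    # Left eigenvectors of M are right eigenvectors of M transposed.
    eigvals, eigvecs = np.linalg.eig(M.T)
    # Pick the eigenvalue closest to 1.
    k = np.argmin(np.abs(eigvals - 1.0))
    pi = np.real(eigvecs[:, k])
    # Normalizing by the sum also fixes the arbitrary sign of the eigenvector.
    return pi / pi.sum()

# Hypothetical two-state transition matrix (rows sum to 1).
M = np.array([[0.9, 0.1],
              [0.2, 0.8]])
pi = stationary_distribution(M)
```

For this toy matrix the result satisfies `pi @ M == pi` up to rounding, with two thirds of the probability in the first state.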

A simple form for computing the equilibrium probability distribution was therefore obtained. Extending this form to combine multiple simulations, I developed a method called the Multiple Markovian transition Matrix Method (MMMM), based on the error analysis of eigenproblems. MMMM was compared with other state-of-the-art methods for determining ensemble values, and it gave better results in cases where equilibration cannot be expected. The method was also tested on a peptide energy landscape as an application, where it gave a proper transition-state structure. With these results, I showed that the method is applicable even to a practical case.
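To make the idea of combining several short runs concrete, here is a deliberately naive sketch that pools transition counts from multiple discretized trajectories into one row-normalized matrix. This is not the MMMM estimator, whose error-analysis-based construction is not spelled out in this abstract; the function name, the state discretization, and the toy trajectories are assumptions for illustration only.

```python
import numpy as np

def pooled_transition_matrix(trajectories, n_states):
    """Row-normalized transition counts pooled over several discrete trajectories."""
    counts = np.zeros((n_states, n_states))
    for traj in trajectories:
        # Count transitions only within a trajectory, never across two runs.
        for i, j in zip(traj[:-1], traj[1:]):
            counts[i, j] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    if np.any(row_sums == 0):
        raise ValueError("some state has no outgoing transitions; add more data")
    return counts / row_sums

# Three short hypothetical runs over two states; none needs to be equilibrated.
trajs = [[0, 0, 1, 1], [1, 0, 0], [1, 1, 0]]
M_hat = pooled_transition_matrix(trajs, n_states=2)
```

Because only local transitions are counted, each run can start anywhere; the left eigenvector of the pooled matrix with eigenvalue 1 then gives an estimate of the equilibrium probabilities.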

2 Decomposing Protein into Components with Correlations

In section 1, the reaction coordinates were assumed to be given, but finding proper reaction coordinates is a non-trivial task. Also, because of the curse of dimensionality, the above approach does not scale well: the number of states blows up as the dimension increases. One of the reasons is that proteins consist of strongly coupled components. If a protein had many uncoupled components, each component could be analyzed in a divide-and-conquer fashion. In reality, finding uncoupled components is a non-trivial task.

In order to identify the correlated modes and to decompose proteins into uncoupled components, I borrowed an idea from the field of signal processing. In this thesis the independent subspace analysis (ISA) method is introduced to the concept of biomolecular collective motion. A linear projection of the coordinates x is considered: s = Ax. With ISA, the projected vector s and its rotation matrix A can be determined so as to minimize the number of strongly correlated modes. A procedure based on the subspace joint approximate diagonalization of eigenmatrices (SJADE) algorithm is employed to perform ISA. ISA/SJADE dissects multiple dimensions into irreducible "blocks", in which the projections onto the dimensions of the same block are strongly correlated, while there is no or very small correlation across different blocks. With ISA/SJADE, the result of a 100-ns MD simulation of T4 lysozyme was analyzed for testing purposes. The result showed that the modes determined by ISA explain long-range correlations of modes, and the method successfully found the modes that are strongly correlated with functionally important residues reported from mutation experiments. From these results, ISA is shown to be a powerful technique for analyzing protein dynamics.
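The block structure can be illustrated with a simplified sketch that takes already projected modes and groups them into blocks. It is not the SJADE algorithm: as a stand-in dependence measure it uses the absolute correlation between squared, mean-centered projections, and it links modes whose dependence exceeds a threshold. The function name, the threshold, and the synthetic data are assumptions for illustration.

```python
import numpy as np

def correlated_blocks(s, threshold=0.3):
    """Group columns of s into blocks of mutually dependent modes.

    Dependence is measured by the absolute correlation between squared,
    mean-centered projections (a crude higher-order statistic).
    """
    s = np.asarray(s, dtype=float)
    sq = (s - s.mean(axis=0)) ** 2
    dep = np.abs(np.corrcoef(sq, rowvar=False))
    n = dep.shape[0]
    linked = dep > threshold
    blocks, seen = [], set()
    for start in range(n):
        if start in seen:
            continue
        # Depth-first search over the thresholded dependence graph.
        stack, block = [start], []
        seen.add(start)
        while stack:
            i = stack.pop()
            block.append(i)
            for j in range(n):
                if linked[i, j] and j not in seen:
                    seen.add(j)
                    stack.append(j)
        blocks.append(sorted(block))
    return blocks

# Synthetic modes (hypothetical): 0 and 1 lie on a ring, so they are linearly
# uncorrelated yet strongly dependent; 2 and 3 are independent of everything.
rng = np.random.default_rng(0)
theta = rng.uniform(0.0, 2.0 * np.pi, size=5000)
s = np.column_stack([
    np.cos(theta),
    np.sin(theta),
    rng.normal(size=5000),
    rng.uniform(-1.0, 1.0, size=5000),
])
blocks = correlated_blocks(s)
```

The ring example shows why a second-order criterion alone is not enough: the first two modes have near-zero linear correlation, and only a higher-order measure places them in the same block.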

Figure 1: Energy landscape of the 5-residue peptide Met-enkephalin, obtained with MMMM. Representative structures for each characteristic point are also shown on the right side.

Figure 2: Arrow representation of SJADE mode 5 and the residues that are strongly correlated with it. The presented residues are confirmed to be strongly correlated with the motion of mode 5.
