Dissertation Abstract



No 121864
Author (Japanese) エネス ヨハン ウァルター
Author (Roman) EHNES Jochen Walter
Author (Kana) エネス ヨハン ウァルター
Title (Japanese) プロジェクテッド・リアリティー : 複数のビデオプロジェクタを協調利用した環境の拡張現実感に関する研究
Title (English) Projected Reality : Augmenting the Environment with a Network of Controllable Video Projectors
Report No. 121864
Report No. 甲21864
Date of Degree Conferral 2006.09.29
Degree Type Doctorate by coursework (課程博士)
Degree Doctor of Engineering (博士(工学))
Diploma No. 博工第6394号
Graduate School Graduate School of Engineering
Department Department of Advanced Interdisciplinary Studies (先端学際工学専攻)
Thesis Examination Committee Chair: Professor 廣瀬 通孝, The University of Tokyo
 Professor 伊福部 達, The University of Tokyo
 Professor 堀 浩一, The University of Tokyo
 Associate Professor 広田 光一, The University of Tokyo
 Lecturer 谷川 智洋, The University of Tokyo
Abstract

A lot of research on Augmented Reality (AR) has been done aiming to support people doing complex tasks. The main idea is that computer-generated views of virtual objects are overlaid onto the view of the real world. These virtual objects could mark the position of a button to press, show the correct position a lever of a machine should be in, or show where certain parts need to be inserted next. The possibilities seem endless. However, AR still has not made the step out of the laboratory into widespread applications. I believe that this is mainly due to the way AR has been implemented so far. Most AR systems use Head-Mounted Displays (HMDs) to overlay the virtual objects onto the real world. While HMDs certainly have been improved since their invention by Sutherland in the late 1960s, they are still rather cumbersome and heavy to wear, or their field of view is too small to augment the whole environment.

Besides these physical reasons, social reasons also speak against their use: they block eye contact, a very important part of non-verbal communication. This isolates the users wearing them from other people in the same room. Another fact that is often ignored by developers of AR systems is that HMDs make their wearer appear like a 'geek'. While this does not matter for the 'typical AR developer', it creates resistance among the intended users against accepting the new technology. Since this happens on a subconscious level, the resulting rejection is very hard to counter.

Consequently, it became my goal to develop a system that uses video projectors in a very flexible way, so that objects can be augmented even while users move them around. Since the range of a single projector is limited and projectors can only project onto surfaces that face them, it is necessary to have several projection units in a room, each controlled by a computer. A software architecture had to be developed that enables these projection units to communicate with each other in order to find the best-suited projection system to augment surfaces in real time. This architecture should scale up to global deployments with many thousands of projection systems while being powerful enough to allow for complex interactions involving several projection systems in a certain area. Furthermore, this architecture should hide all the networking and roaming functionality from the developers of AR applications, which run on these AR systems just as conventional applications run on an operating system. Ideally, application developers should not even have to think about the fact that their applications may be using several projection units at runtime. The involvement of additional projection units should happen completely transparently and automatically.

The first milestone on the way to that goal was to develop a single prototype of an AR projection system. This AR projection system is based on the pan- and tilt-able projector AV4 from Active Vision. Not only can the pan and tilt of the projector be controlled from the computer, but also the zoom, focus, and several other projector parameters. While projecting the augmentation certainly is an important functionality, it is equally important to know the exact position and orientation of the objects that are to be augmented. Video tracking was the obvious choice, as it integrates well into the projection system and has similar limitations: both work only within a certain range, and the projection surface has to be visible from the point of view of the projector and the camera. To provide this functionality, I mounted a camera on top of the projector so that the unit always 'sees' what is in front of its projector. The ARToolKit tracking library is used to detect markers in the captured video frames and to calculate their position and orientation relative to the camera. This information can be evaluated further to control the projector's orientation and focus. While this functionality is sufficient to project simple graphics onto an object, the behavior was hard-coded in the first prototype.
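
As a concrete illustration, the per-frame behavior of such a unit might look roughly like the sketch below. The names (camera, PanTiltProjector-style control calls, detect_markers, render_augmentation) are hypothetical stand-ins rather than the prototype's actual code, and detect_markers abstracts the ARToolKit marker detection and pose estimation step.

    # Sketch of the prototype's control loop; all names are illustrative assumptions.
    import math

    def control_loop(camera, projector, detect_markers, render_augmentation):
        """Track markers in the camera image and keep the projector aimed at them."""
        while True:
            frame = camera.capture()                        # grab the current video frame
            for marker in detect_markers(frame):            # marker.position: (x, y, z) in camera space
                projector.point_at(marker.position)         # adjust pan and tilt
                distance = math.dist((0.0, 0.0, 0.0), marker.position)
                projector.set_focus_for_distance(distance)  # refocus for the new distance
                render_augmentation(marker)                 # behavior hard-coded in the first prototype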

The next step, consequently, was to develop an Application Programming Interface (API) in order to make the development of applications easier. Only in this way did more complex behaviors become possible. This API is based on my concept of Projected Applications.

The basic idea of Projected Applications (PA) is as follows:

The AR projection units themselves have no knowledge of how to augment different objects. This information is coded in 'Projected Applications', which are analogous to applications in a GUI environment. However, while conventional applications interact with the user via windows and widgets on a computer's screen, projected applications use tracked objects for interaction. The AR projection systems provide means to identify tracked objects as well as to measure their positions and orientations. They also project the output onto these tracked objects or onto other fixed, known surfaces such as walls (as coded in the projected application). The AR system can be seen as an operating system that loads and executes projected applications and provides an abstraction layer for these applications to communicate with the user and to control the hardware.
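
A minimal sketch of what a Projected Application interface along these lines could look like is given below; all class and method names are illustrative assumptions, not the actual API described in the thesis.

    # Illustrative sketch of a Projected Application base class (hypothetical names).
    from abc import ABC, abstractmethod

    class ProjectedApplication(ABC):
        """Loaded and executed by an AR projection unit, much like an
        application running on top of an operating system."""

        @abstractmethod
        def markers_of_interest(self):
            """IDs of the tracked objects this application augments."""

        @abstractmethod
        def update(self, tracked_objects):
            """Called every frame with the measured poses of the tracked objects."""

        @abstractmethod
        def draw(self, surface):
            """Render the augmentation for one surface (a tracked object or a known wall)."""

        def get_state(self):
            """Serializable state, so the application can roam between projection units."""
            return {}

        def set_state(self, state):
            """Restore a previously saved state after roaming to another unit."""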

Based on that API, the system can be extended to use several projection units. For this third milestone, an application server was introduced to store and manage the projected applications. However, the application server does not only serve the applications to the AR projection systems. More importantly, it also maintains the state of the projected applications. It therefore has to ensure that the state is only modified by one AR system at a time, which means that only one projection unit may execute an application and augment the corresponding object at any given moment.

Once a projection unit detects a marker in the image captured by its camera, it sends the ID of the marker to the application server. In reply, the application server sends the application and, if available, the display rights and state of the application back to the projection unit. Now the projection system starts the new application. If the unit was granted the display rights and received the last state of the application, it initializes the application with that state and starts to project the augmentations. If the display rights were not available at that moment, the system does nothing but try to follow the object while waiting to be granted the display rights.

If a system that owns the display rights for an application can no longer detect the relevant marker, it returns the display rights together with the current state (encoded in a container object) to the application server. It can reapply for the display rights once it detects the marker again.

In the meantime, the application server may send the state as well as the display rights to another projection system.
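
On the server side, the bookkeeping for this simple hand-over scheme could be as small as the following sketch; class and method names are illustrative assumptions, since the thesis text does not specify the actual message format.

    # Illustrative sketch of the application server's display-rights bookkeeping.
    class ApplicationServer:
        def __init__(self, applications):
            self.applications = applications   # marker ID -> projected application code
            self.states = {}                   # marker ID -> last reported state container
            self.holder = {}                   # marker ID -> unit currently holding display rights

        def on_marker_detected(self, unit, marker_id):
            """A projection unit reports a newly detected marker; reply with the
            application and, if free, the display rights and the last state."""
            app = self.applications[marker_id]
            if self.holder.get(marker_id) is None:
                self.holder[marker_id] = unit
                return app, True, self.states.get(marker_id)
            return app, False, None            # another unit is already augmenting

        def on_rights_returned(self, unit, marker_id, state):
            """The holding unit lost sight of the marker and returns rights and state."""
            if self.holder.get(marker_id) is unit:
                self.holder[marker_id] = None
                self.states[marker_id] = state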

While this simple method of managing the display rights requires only a minimal amount of network bandwidth and is sufficient if the ranges of the projection systems do not overlap, it is not satisfactory if more than one system could perform the augmentation at the same time. In that case, the system that detected the marker first gets the display rights and keeps them until the marker disappears from its camera's view. However, it would be better if the system with the best view of the object projected the augmentation. Furthermore, the management of the display rights had to become more dynamic and find a new projection system to take over before the active one loses the object and the augmentation disappears completely.

Consequently, the next milestone was to extend the management of the display rights and make it more dynamic. The application server can now actively withdraw display rights (together with the applications' states) from a projection system. This way it can hand the display rights (and states) to better-suited projection systems at any time, long before the augmentations disappear completely on the systems that held the display rights before.

However, in order to decide which system gets the task of augmenting certain objects, the application server needs to know which system is best suited to perform the job. This depends on three main criteria: distance, the direction of the surface normal, and a free line of sight. The quality can be considered a weighted combination of these criteria. However, the weighting may be quite task- or application-specific.

In order to keep the task of the application server simple, I introduced a scalar quality value. The system with the highest quality value is considered the best and consequently should perform the augmentation. Since the criteria of quality can be very task-specific, it is up to the application developer to define the function that calculates this quality value. However, a default function should work reasonably well in most cases.
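
As an illustration only (the thesis does not state this particular formula), a default quality function combining the three criteria might look like the following sketch; the weights and the functional form are assumptions.

    # Illustrative default quality function; weights and form are assumptions.
    import math

    def default_quality(distance, surface_angle, line_of_sight_clear, preferred_distance=2.0):
        """distance in meters, surface_angle in radians between the surface normal
        and the direction towards the projector, line_of_sight_clear as a boolean."""
        if not line_of_sight_clear:
            return 0.0                                         # occluded surfaces cannot be augmented
        distance_term = 1.0 / (1.0 + abs(distance - preferred_distance))
        orientation_term = max(0.0, math.cos(surface_angle))   # 1 when the surface faces the projector
        return 0.5 * distance_term + 0.5 * orientation_term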

The AR projection systems regularly send the application server updates of the quality values of all applications they host.

Consequently, the application server can easily compare the quality values of the different applications running on the available AR projection systems and can in turn ensure that every application runs on the optimal projection system.
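
Combined with the reported quality values, the reassignment logic on the application server could then be sketched as follows; the hysteresis margin and all names are illustrative assumptions, and the server object matches the sketch given earlier.

    # Illustrative sketch of quality-based reassignment of display rights.
    def reassign(server, marker_id, reported_qualities, margin=0.1):
        """reported_qualities maps each projection unit to its latest quality value
        for the application belonging to marker_id."""
        best_unit = max(reported_qualities, key=reported_qualities.get)
        current = server.holder.get(marker_id)
        if current is None or current not in reported_qualities:
            server.holder[marker_id] = best_unit
            return
        # Switch only when the candidate is clearly better, so that the augmentation
        # does not flicker between two units of almost equal quality.
        if reported_qualities[best_unit] > reported_qualities[current] + margin:
            state = current.withdraw(marker_id)    # server actively pulls rights and state
            best_unit.grant(marker_id, state)
            server.holder[marker_id] = best_unit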

I performed a user study that compared different functions for calculating a quality value with the way human test subjects evaluated the quality of projections onto a test object at different distances and at different angles towards the projector.

As a result, I found a function whose dependence on distance and orientation better matches the projection quality as perceived by humans.

While the system so far worked quite well for projected applications that use only one tracked object and augment only one of its surfaces, it became clear that the restriction to project only from the projection unit on which the PA is running is a severe limitation. Many objects of daily life are more like a box with several surfaces than a board with only a front side. While it is possible to project onto several surfaces of an object using one projection unit, this only works as long as they all face that projection unit. In order to augment an object from different sides, it has to be possible to augment it using different projection units at the same time, with each surface being augmented by the unit that fits best. In order to make that possible while still keeping the applications' states consistent, I developed the concept of Hydra Applications. Analogous to the creature from ancient Greek mythology, which had one body and many heads, the projected application runs on only one unit, the first one that requested the application from the application server, and any further units requesting the application get a reference to that first unit. The additional units send their tracking information to the executing system and in return receive information about what to project onto which surface. Based on the Model-View-Controller design pattern, the communication between the different units is hidden in the Hydra Controller, which is part of the system. Application developers only have to implement the model, i.e. the behavior of the application, which is executed on only one system, and the view objects, which represent the different surfaces the application can augment. Besides enabling the augmentation of several surfaces of an object, this approach also makes it possible for a single application to augment several objects that may be in range of different projection units. Another possibility introduced by this feature is that users may wear stereo glasses tracked by one system while another system projects stereoscopic images with the correct perspective for the users' eyes onto surfaces, creating the impression of a three-dimensional augmentation.
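
To illustrate the division of labor, a Hydra application might be structured roughly as below. The class names and the drawing call are hypothetical; only the model/view split and the system-provided Hydra Controller follow the description above.

    # Illustrative sketch of the Hydra model/view split (hypothetical names).
    class BoxLabelModel:
        """The model: runs on exactly one unit, the first one that requested
        the application from the application server."""
        def __init__(self):
            self.labels = {"front": "open here", "side": "fragile"}
            self.visible = set()

        def update(self, tracking_reports):
            """tracking_reports: surface name -> pose, merged from all units
            by the system's Hydra Controller."""
            self.visible = set(tracking_reports)

    class FaceView:
        """A view: represents one surface and is rendered on whichever unit
        is currently best suited to project onto that surface."""
        def __init__(self, face):
            self.face = face

        def draw(self, projector, model):
            if self.face in model.visible:
                projector.draw_text(model.labels[self.face])   # hypothetical drawing call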

Summary of the Examination

 This research built an augmented reality system based on video projection that can replace head-mounted displays (HMDs) for a certain class of applications. Augmented reality is highly anticipated as a technology for supporting workers performing complex tasks. Conventionally, however, it has required wearing an HMD or a wearable computer for long periods every day, which has hindered the spread of augmented reality technology. When overlaying information onto objects, applications of augmented reality in typical industrial environments often project the information directly onto the object in question. The greatest advantage of such systems is that workers can see the information overlaid on the target object without wearing or holding any device. Workers can therefore carry out their tasks without being constrained by the augmented reality system. Furthermore, because communication channels such as eye contact and gaze, which are problematic with HMDs, are not occluded, workers can engage in natural non-verbal communication with one another.

 In contrast to this research, many prior studies project video directly onto objects in front of a fixed projector, so those systems are restricted to a predetermined projection surface or projection volume. This research uses a video projector with a pan-tilt rotation mechanism to enlarge the usable projection volume. Although prior work using controllable projectors exists, this research is a pioneering system for overlaying information onto both stationary and moving objects. Furthermore, by adopting video-based tracking instead of magnetic tracking, the objects onto which information is projected can be moved freely.

 To extend the workable area so that the system remains usable even when the objects to be augmented are located farther away, the projection-based augmented reality system was extended to use multiple projectors. The system architecture was built in two layers: one that allows the system to grow to a global scale comparable to the World Wide Web, and one on top of it that provides high-performance cooperation among local projector units. On a global scale, one conceivable use of the system is to deliver digital content together with physical magazines, books, and newspapers; for such printed media, the latest information that could not be included at printing time can be displayed as an overlay. Cooperation among local projector units is very important for optimizing the quality of the projected information. The developed augmented reality system decides at runtime which projector unit is best suited to augment each projection surface.

To decide which projector unit provides the best projection based on a perceptual criterion, a sensory evaluation experiment was conducted, and an appropriate evaluation function was estimated that computes the quality of the projected image as perceived by the subjects as a function of projection distance and projection angle.

 Another important achievement is the development of an application programming interface (API) that makes it comparatively easy to implement applications that project images onto objects. The API framework not only hides the tasks related to controlling the projector units from application developers, but also provides the functionality needed to let applications roam between different projector units. The second version of the API, 'Hydra Applications', provides more advanced capabilities: using this framework, application developers can easily implement applications that augment different surfaces of an object using several projector units. The framework handles the networking and multithreading tasks transparently, hiding them from the developer. As a result, developing a Hydra application has become about as simple as developing an application that runs on a single unit. Furthermore, because only application-specific data structures (which hold only the relevant information rather than the various representation data used for display) are transferred over the network, the network bandwidth required to update additional projector units is kept lower than in cluster VR systems, which must keep the presentation consistent by transferring OpenGL commands over the network or by using replicated scene graphs.

 Finally, several example applications were developed to demonstrate the potential of projection-based augmented reality systems. The 'Volume Experience' application showed a new way of handling volume data in real space. 'Stereo Box' showed that projection-based augmented reality systems are not limited to presenting 2D information on object surfaces. In conclusion, HMDs remain necessary only for applications that display virtual objects in empty space, that is, in places where no projection surface exists.

Therefore, this thesis is judged to pass as a dissertation submitted for the degree of Doctor of Engineering.
