JP2020042575A

JP2020042575A - Information processing apparatus, positioning method, and program

Info

Publication number: JP2020042575A
Application number: JP2018169820A
Authority: JP
Inventors: 創輔山尾; Sosuke Yamao
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2018-09-11
Filing date: 2018-09-11
Publication date: 2020-03-19

Abstract

To provide an information processing apparatus, a positioning method, and a program that allow positioning to be performed easily even for an image with few distinctive features.

SOLUTION: An information processing apparatus includes: an inertial sensor that outputs acceleration data of the information processing apparatus; a gravity direction estimating unit that estimates a first gravity direction in a world coordinate system based on the acceleration data; a model position estimating unit that changes a position and a posture of a model in a camera coordinate system according to the first gravity direction relative to the camera coordinate system, converts the position and the posture of the model in the camera coordinate system into a position and a posture of the model in a model coordinate system, and calculates a first conversion matrix for converting the position and the posture of the model in the model coordinate system into a position and a posture of the model in the world coordinate system; a model drawing unit that converts the position and the posture of the model in the camera coordinate system into a position of the model in an image coordinate system and draws an input image and the model in the image coordinate system; and a display unit that displays the input image and the model according to the drawing result of the model drawing unit.

SELECTED DRAWING: Figure 1

Description

The present invention relates to an information processing apparatus, a positioning method, and a program.

In recent years, AR (Augmented Reality) technology has been attracting attention. The term AR refers, for example, both to a technology in which a computer augments the real environment perceived by a person and to the augmented real environment itself. In AR, for example, a virtual object can be projected into the real world, with the real world as the base, to present video in which part of the real world is augmented. AR is sometimes contrasted with VR (Virtual Reality), which is based on a virtual space into which the real world does not enter.

In AR technology, a virtual object is superimposed on the real world and changed three-dimensionally in accordance with the movement of a camera. For example, an information processing apparatus to which AR technology is applied determines, in each image frame, the position of the virtual object to be superimposed on the real world according to the position (or direction) and orientation (or rotation) of the camera. Obtaining the information that converts the real-world coordinate system into the image coordinate system, which is used when the position of the virtual object is projected onto the image coordinate system according to the position and orientation of the camera, may be referred to as, for example, "positioning".

One positioning method is, for example, the marker-based positioning method. In this method, for example, a binary (for example, black and white) rectangular marker is prepared in advance, and the information processing apparatus images the marker and performs positioning by, for example, detecting the straight lines of the marker in the captured image. Because a known marker is used, the information processing apparatus can easily detect straight lines and the like in the image and can easily perform positioning.

However, the marker-based positioning method requires a marker to be prepared separately. For this reason, its installation cost may be higher than that of methods that do not use a marker.

There are therefore markerless positioning techniques. For example, there is a technique that observes an object in various orientations to obtain learning data, stores the learning data in a memory or the like, and estimates, on a learning basis, the three-dimensional 6D (six degrees of freedom) position and orientation of a subject from a single-viewpoint RGB (Red, Green, Blue) image captured by a camera.

There is also a technique in which a camera is moved to a position where a virtual image from a virtual camera fixed in a virtual coordinate system substantially matches a real video image of an object in real space, and the position of a virtual model in the virtual coordinate system is then mapped to the position of the object in the real coordinate system.

This technique is said to provide an improved system and method for co-registration of a virtual image of an object with the actual position of the object.

Wadim Kehl et al., "SSD-6D: Making RGB-Based 3D Detection and 6D Pose Estimation Great Again", ICCV (IEEE (The Institute of Electrical and Electronics Engineers, Inc.) International Conference on Computer Vision), 2017

Published Japanese Translation of PCT Application No. 2009-501609 (JP-T-2009-501609)

However, with the technique that estimates the position and orientation of a subject on a learning basis from a single-viewpoint RGB image, it may be difficult to estimate the position and orientation of the object when the subject appears as an image with poor visual features. An "image with poor visual features" is, for example, an image in which the number of high-contrast corner points or edge points is below a threshold, or in which the number of high-contrast surface patterns is below a threshold. For images of such objects, the number of features that can be extracted by image processing tends to be below the threshold. This technique estimates the position and orientation of the subject based on learning data stored in a memory, and when the number of features is below the threshold, the learning data read from the memory for the RGB image may be of low accuracy. Consequently, for an "image with poor visual features", the accuracy of the estimated position and orientation of the subject may be low.

In the technique that substantially matches a virtual image from a fixed virtual camera with a real video image, the virtual image is displayed in a fixed manner, so the user has to move the camera to one predetermined position and orientation in order to achieve the match. With this technique, the movement of the camera is therefore restricted during positioning, and the user may not be able to perform positioning easily.

An object of the present disclosure is therefore to provide an information processing apparatus, a positioning method, and a program that allow positioning to be performed easily even for an image with poor visual features.

One disclosure is an information processing apparatus including: an inertial sensor that outputs acceleration data of the information processing apparatus; a gravity direction estimation unit that estimates a first gravity direction in a world coordinate system from the acceleration data; a model position estimation unit that changes the position and orientation of a model in a camera coordinate system according to the first gravity direction relative to the camera coordinate system, converts the position and orientation of the model in the camera coordinate system into the position and orientation of the model in a model coordinate system, and calculates a first transformation matrix that converts the position and orientation of the model in the model coordinate system into the position and orientation of the model in the world coordinate system; a model drawing unit that converts the position and orientation of the model in the camera coordinate system into the position of the model in an image coordinate system and draws an input image and the model in the image coordinate system; and a display unit that displays the input image and the model according to the drawing result of the model drawing unit.

According to one disclosure, positioning can be performed easily even for an image with poor visual features.

FIG. 1 is a diagram illustrating a configuration example of an information processing apparatus.
FIG. 2 is a diagram illustrating an example of the coordinate systems.
FIG. 3 is a diagram illustrating an example of the relationship between the viewpoint position of the camera and the gravity vector g_c.
FIG. 4 is a flowchart illustrating an operation example.
FIG. 5 is a flowchart illustrating an example of the initialization processing.
FIG. 6 is a diagram illustrating an example of the relationship between the world coordinate system and the virtual camera coordinate system.
FIGS. 7A and 7B are diagrams illustrating an example of the relationship among the coordinate systems.
FIG. 8 is a flowchart illustrating an example of the calculation processing of the 3D model.
FIGS. 9A and 9B are diagrams illustrating an example of the initialization processing.
FIG. 10A is a diagram illustrating an example of the relationship between the viewpoint position of the camera and the two points P_{1,c} and P_{2,c}, and FIG. 10B is a diagram illustrating an example of the relationship between the two points P_{1,c} and P_{2,c} when the viewpoint position of the camera has moved.
FIGS. 11A and 11B are diagrams illustrating display examples of a target object and a 3D model.
FIG. 12 is a diagram illustrating an example of the relationship between the viewpoint position of the camera and the two points P_{1,c} and P_{2,c}.
FIGS. 13A and 13B are diagrams illustrating an example of the relationship among the coordinate systems.
FIGS. 14A and 14B are diagrams illustrating display examples of a target object and a 3D model.
FIGS. 15A and 15B are diagrams illustrating display examples of a target object and a 3D model.
FIGS. 16A and 16B are diagrams illustrating display examples of a target object and a 3D model.
FIG. 17 is a diagram illustrating a hardware configuration example of the information processing apparatus.
FIG. 18 is a diagram illustrating a configuration example of an information processing system.

Hereinafter, embodiments for carrying out the present invention will be described. The following embodiments do not limit the disclosed technology. The embodiments can be combined as appropriate as long as their processing contents do not contradict each other.

[First Embodiment]
<Configuration example of information processing apparatus>
FIG. 1 is a diagram illustrating a configuration example of the information processing apparatus 100.

The information processing apparatus 100 is, for example, a smartphone, a game device, an equipment inspection and management device, or a navigation device.

The information processing apparatus 100 according to the first embodiment uses AR technology to display a model image (also called a virtual model or a 3D (3-Dimension) model; hereinafter sometimes referred to as a "3D model") in the real world. The information processing apparatus 100 performs positioning using the 3D model. In doing so, the information processing apparatus 100 can perform positioning with respect to the target object using the 3D model from various camera viewpoints, even when the target object is an object with poor visual features.

In the first embodiment, "positioning" means, for example, calculating information that links two coordinate systems. More specifically, "positioning" in the first embodiment means, for example, calculating a matrix T_wm that converts the model coordinate system into the world coordinate system. "Positioning" may also be referred to as, for example, registration. Details, including the coordinate systems, will be described later.

After positioning, the 3D model is displayed on the display unit 109 in a state in which it coincides with the target object, as shown in, for example, FIG. 14B and FIG. 16B. The user can therefore inspect and manage equipment, for example by checking from the positional relationship between the 3D model displayed on the display unit 109 and the target object whether the target object has changed. Alternatively, after positioning, the information processing apparatus 100 can display a three-dimensional navigation image or a three-dimensional game character on the display unit 109.

As shown in FIG. 1, the information processing apparatus 100 includes an imaging unit 101, an inertial sensor 102, a storage unit 103, a self-position estimation unit 104, a gravity direction estimation unit 105, an initialization processing unit 106, a model position estimation unit 107, a model drawing unit 108, a display unit 109, a recognition start determination unit 110, and an object position recognition unit 111.

The imaging unit 101 captures an image that includes the target object, treats the captured image as an input image, and outputs image data of the input image. The imaging unit 101 stores the image data in the storage unit 103. The image data is, for example, RGB image data having R, G, and B (Red, Green, Blue) planes.

The inertial sensor 102 measures the acceleration of the information processing apparatus 100 and outputs the measured acceleration as acceleration data. The inertial sensor 102 stores the acceleration data in the storage unit 103. The inertial sensor 102 may be, for example, an acceleration sensor or a gyro sensor.

The storage unit 103 is, for example, a memory, and stores the RGB image data, the acceleration data, 3D model data, and various setting values. The 3D model data includes, for example, position information indicating the position of the 3D model in the world coordinate system and the RGB data at that position. The various setting values include, for example, two arbitrary points P_{1,m} and P_{2,m} of the 3D model in the virtual camera coordinate system. Coordinate systems such as the world coordinate system and the virtual camera coordinate system will be described later.

The self-position estimation unit 104 estimates the position and orientation of the real camera (for example, the imaging unit 101) in the world coordinate system based on the RGB image data read from the storage unit 103. For example, the self-position estimation unit 104 uses SLAM (Simultaneous Localization and Mapping) to calculate camera parameters representing the position and orientation of the real camera from the RGB image data of a plurality of image frames. The self-position estimation unit 104 also calculates, for example, a matrix containing the camera parameters. This matrix can serve, for example, as the transformation matrix T_cw from the world coordinate system to the real camera coordinate system. Details will be described in the operation example. The self-position estimation unit 104 outputs the transformation matrix T_cw and the like to the model drawing unit 108 and the model position estimation unit 107.

The gravity direction estimation unit 105 estimates the gravity direction in the world coordinate system based on the acceleration data read from the storage unit 103. For example, the gravity direction estimation unit 105 estimates the gravity direction from the acceleration data using an equation stored in an internal memory. Details will be described in the operation example. The gravity direction estimation unit 105 outputs the estimated gravity direction to the model position estimation unit 107.
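As an illustration only, one common way to obtain a gravity direction from accelerometer readings is to low-pass filter the samples and normalize the result. The sketch below assumes this approach; the equation actually used by the gravity direction estimation unit 105 is not given in this section, and the function name, the smoothing factor `alpha`, and the sample values are hypothetical. The result is expressed in the sensor frame, and the sign convention depends on the sensor.

```python
import math


def estimate_gravity_direction(samples, alpha=0.9):
    """Return a unit vector approximating the gravity direction.

    samples: sequence of (ax, ay, az) accelerometer readings (sensor frame).
    alpha:   smoothing factor of the exponential low-pass filter (assumed value).
    """
    samples = list(samples)
    if not samples:
        raise ValueError("at least one acceleration sample is required")
    g = list(samples[0])  # initialize the filter with the first reading
    for a in samples[1:]:
        # Keep the slowly varying component, which is dominated by gravity.
        g = [alpha * gi + (1.0 - alpha) * ai for gi, ai in zip(g, a)]
    norm = math.sqrt(sum(gi * gi for gi in g))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero acceleration vector")
    return tuple(gi / norm for gi in g)


# Hypothetical readings from a device held roughly upright.
readings = [(0.1, -9.7, 0.2), (0.0, -9.8, 0.1), (-0.1, -9.9, 0.0)]
direction = estimate_gravity_direction(readings)
```

A larger `alpha` suppresses hand motion more strongly but makes the estimate react more slowly when the device is tilted.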

The initialization processing unit 106 sets the virtual camera coordinate system using the 3D model data and the various setting values read from the storage unit 103. The various setting values include, for example, two arbitrary points P_{1,m} and P_{2,m} on the 3D model, the Euclidean distance L between the two points P_{1,m} and P_{2,m}, the normal vector n_c of a plane containing the two points P_{1,m} and P_{2,m}, and the vertically downward vector g_m. The initialization processing unit 106 then calculates unit line-of-sight vectors r_1 and r_2 in the virtual camera coordinate system. The initialization processing unit 106 outputs the calculated unit line-of-sight vectors r_1 and r_2 and the various numerical values in the virtual camera coordinate system to the model position estimation unit 107. Details will be described in the operation example.

The model position estimation unit 107 estimates the position and orientation of the 3D model in the model coordinate system using the 3D model data read from the storage unit 103, the gravity direction, the unit line-of-sight vectors r_1 and r_2, the transformation matrix T_cw from the world coordinate system to the real camera coordinate system, and the like. Specifically, for example, the model position estimation unit 107 calculates a transformation matrix T_wm that converts the position and orientation of the 3D model in the model coordinate system into the position and orientation of the 3D model in the world coordinate system. The model position estimation unit 107 also performs processing such as converting the position and orientation of the 3D model in the world coordinate system into the position and orientation of the 3D model in the camera coordinate system, or into the position and orientation of the 3D model in the model coordinate system. Details will be described in the operation example. The model position estimation unit 107 outputs the position and orientation of the 3D model in the camera coordinate system and the like to the model drawing unit 108, and outputs the position and orientation of the 3D model in the model coordinate system, the transformation matrix T_wm, and the like to the object position recognition unit 111.

The conversion of the position and orientation of the 3D model into another coordinate system is performed using, for example, a transformation matrix. The components of the transformation matrix contain the numerical values needed to convert to the position and orientation in the other coordinate system. In the following description, it is assumed that a single transformation matrix can convert the position and orientation of the 3D model in one coordinate system into the position and orientation of the 3D model in another coordinate system. The conversion of the position and orientation of the 3D model from one coordinate system to another by a transformation matrix may also be described as converting the 3D model data in one coordinate system into the 3D model data in another coordinate system. The position and orientation of the 3D model may be referred to as, for example, 3D model data.
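The idea of a single matrix carrying both rotation and translation can be sketched with 4x4 homogeneous matrices. The example below is a minimal illustration, not the patent's implementation: the helper names and the numeric values chosen for T_wm are assumptions.

```python
def make_transform(rotation, translation):
    """Build a 4x4 homogeneous matrix from a 3x3 rotation and a 3-vector."""
    rows = [list(rotation[i]) + [translation[i]] for i in range(3)]
    return rows + [[0.0, 0.0, 0.0, 1.0]]


def apply_transform(t, point):
    """Map a 3D point through a 4x4 homogeneous matrix."""
    p = list(point) + [1.0]
    return tuple(sum(t[i][k] * p[k] for k in range(4)) for i in range(3))


def invert_rigid(t):
    """Invert a rigid transform: R becomes R^T and the translation becomes -R^T * t."""
    r_t = [[t[j][i] for j in range(3)] for i in range(3)]
    trans = [-sum(r_t[i][k] * t[k][3] for k in range(3)) for i in range(3)]
    return make_transform(r_t, trans)


# Assumed example: the model frame is rotated 90 degrees about Z and shifted by (1, 2, 3).
rot_z_90 = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
T_wm = make_transform(rot_z_90, [1.0, 2.0, 3.0])  # model -> world
T_mw = invert_rigid(T_wm)                         # world -> model

p_world = apply_transform(T_wm, (1.0, 0.0, 0.0))
p_model = apply_transform(T_mw, p_world)
```

Because the transform is rigid, its inverse can be formed from the transposed rotation rather than by a general matrix inversion.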

The model drawing unit 108 draws the 3D model on the input image. Specifically, the model drawing unit 108 performs the following processing. The model drawing unit 108 converts the 3D model data in the camera coordinate system into 3D model data in the image coordinate system using a projection matrix T_p. The model drawing unit 108 then draws the RGB image data and the 3D model data in the image coordinate system. In doing so, the model drawing unit 108 replaces the RGB image data at the position of the 3D model in the image coordinate system with the image data of the 3D model. The model drawing unit 108 outputs the RGB image data and the image data of the 3D model in the image coordinate system to the display unit 109 and the recognition start determination unit 110.
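The two steps above, projection into the image coordinate system and replacement of input pixels by model pixels, can be sketched as follows. The sketch assumes a simple pinhole model with focal length `f` and principal point `(cx, cy)`; the actual contents of the projection matrix T_p are not given in this section, and the function names and values are hypothetical.

```python
def project_to_image(point_c, f, cx, cy):
    """Project a 3D point in camera coordinates onto the image plane (pinhole model)."""
    x, y, z = point_c
    if z <= 0.0:
        raise ValueError("point must lie in front of the camera (z > 0)")
    return (f * x / z + cx, f * y / z + cy)


def overlay(image, model_pixels):
    """Return a copy of the image in which model pixels replace the input pixels.

    image:        list of rows, each row a list of (r, g, b) tuples.
    model_pixels: dict mapping integer (u, v) image coordinates to (r, g, b).
    """
    out = [row[:] for row in image]
    for (u, v), rgb in model_pixels.items():
        if 0 <= v < len(out) and 0 <= u < len(out[0]):
            out[v][u] = rgb  # pixels outside the image are ignored
    return out


# Assumed values: a model vertex 4 units in front of the camera.
u, v = project_to_image((1.0, 2.0, 4.0), f=100.0, cx=320.0, cy=240.0)

black = (0, 0, 0)
frame = [[black, black], [black, black]]
composited = overlay(frame, {(1, 0): (255, 0, 0)})
```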

The display unit 109 displays the input image and the 3D model based on the RGB image data and the image data of the 3D model output from the model drawing unit 108. The position of the 3D model displayed on the display unit 109 changes according to the gravity direction in the camera coordinate system. While watching the 3D model change in this way on the display unit 109, the user moves the information processing apparatus 100 (or the imaging unit 101) so that the target object in the input image and the 3D model coincide, thereby performing "positioning". FIG. 14B and FIG. 16B show examples of the target object and the 3D model displayed on the display unit 109 when they coincide.

Returning to FIG. 1, the recognition start determination unit 110 determines whether to start positioning with respect to the target object based on, for example, whether the user has pressed an operation button of the information processing apparatus 100. When the recognition start determination unit 110 determines that the user has requested the start of positioning, it notifies the object position recognition unit 111 accordingly.

When the object position recognition unit 111 receives, for example, the notification that the start of positioning has been determined, it receives from the model position estimation unit 107 the transformation matrix T_wm from the model coordinate system to the world coordinate system as of the time of the notification. Using this transformation matrix T_wm, the object position recognition unit 111 converts the 3D model data in the model coordinate system into 3D model data in the world coordinate system, and converts the 3D model data in the world coordinate system into 3D model data in the camera coordinate system. The object position recognition unit 111 outputs the 3D model data in the camera coordinate system to the model drawing unit 108. The model drawing unit 108 converts this 3D model data into 3D model data in the image coordinate system using the projection matrix T_p and draws it in the image coordinate system. The display unit 109 displays the input image and the 3D model according to the drawing result.
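The flow described here, in which T_wm is captured once at the start notification and then combined with the per-frame T_cw, can be sketched as follows. The class and method names are hypothetical and the matrices are assumed example values; this is an illustration of the data flow, not the apparatus's actual implementation.

```python
def matmul(a, b):
    """Multiply two 4x4 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def apply_transform(t, point):
    """Map a 3D point through a 4x4 homogeneous matrix."""
    p = list(point) + [1.0]
    return tuple(sum(t[i][k] * p[k] for k in range(4)) for i in range(3))


class ObjectPositionRecognizer:
    """Hypothetical helper that keeps the T_wm received when positioning starts."""

    def __init__(self):
        self.t_wm = None

    def on_start_notification(self, t_wm):
        # Copy the matrix so later estimator updates do not alter the captured value.
        self.t_wm = [row[:] for row in t_wm]

    def model_to_camera(self, t_cw, points_m):
        """Convert model-frame points to camera-frame points via the world frame."""
        if self.t_wm is None:
            raise RuntimeError("positioning has not been started")
        t_cm = matmul(t_cw, self.t_wm)  # model -> world -> camera
        return [apply_transform(t_cm, p) for p in points_m]


def translation(tx, ty, tz):
    """Build a pure-translation homogeneous matrix (assumed example input)."""
    return [[1.0, 0.0, 0.0, tx], [0.0, 1.0, 0.0, ty],
            [0.0, 0.0, 1.0, tz], [0.0, 0.0, 0.0, 1.0]]


recognizer = ObjectPositionRecognizer()
recognizer.on_start_notification(translation(1.0, 0.0, 0.0))
points_c = recognizer.model_to_camera(translation(0.0, 0.0, 5.0), [(0.0, 0.0, 0.0)])
```

Only T_cw changes from frame to frame in this sketch, so the model stays fixed in the world while the camera moves.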

<About each coordinate system>
FIG. 2 is a diagram illustrating an example of the coordinate systems. The first embodiment uses five coordinate systems: the world coordinate system (X, Y, Z), the virtual camera coordinate system (X_v, Y_v, Z_v), the real camera coordinate system (X_c, Y_c, Z_c), the model coordinate system (X_m, Y_m, Z_m), and the image coordinate system (x, y).

The origin O_w of the world coordinate system is located at an arbitrary position in the world coordinate system (X, Y, Z). The target object may be located at a fixed position in the world coordinate system (X, Y, Z). The 3D model is also located at a specific position in the world coordinate system (X, Y, Z).

The origin O_v of the virtual camera coordinate system (X_v, Y_v, Z_v) is located at an arbitrary position in the world coordinate system, and the virtual camera coordinate system (X_v, Y_v, Z_v) is defined with respect to the origin O_v. The origin O_v is, for example, the viewpoint position of the virtual camera.

Similarly, the origin O_c of the real camera coordinate system (X_c, Y_c, Z_c) is located at an arbitrary position in the world coordinate system, and the real camera coordinate system (X_c, Y_c, Z_c) is centered on the origin O_c. The origin O_c is, for example, the viewpoint position of the real camera. Hereinafter, the origins O_v and O_c may be referred to as the viewpoint positions of the virtual camera and the real camera, respectively.

The position and orientation of the 3D model in the virtual camera coordinate system (X_v, Y_v, Z_v) differ from the position and orientation of the 3D model in the real camera coordinate system (X_c, Y_c, Z_c) because the two viewpoint positions differ. In addition, the viewpoint position O_c of the real camera coordinate system (X_c, Y_c, Z_c) can move within the world coordinate system (X, Y, Z).

The origin o of the image coordinate system (x, y) lies on the line segment connecting the origin O_c of the real camera coordinate system (X_c, Y_c, Z_c) and the center of the 3D model. In the world coordinate system (X, Y, Z), the image coordinate system (x, y) is located between the real camera coordinate system (X_c, Y_c, Z_c) and the 3D model. The distance between the viewpoint position O_c of the real camera and the origin o of the image coordinate system (x, y) is called, for example, the focal length f, and remains constant in the world coordinate system.

Further, there is a model coordinate system (X_m, Y_m, Z_m) whose origin O_m is at a specific position on the 3D model (in the example of FIG. 2, the upper right corner in the drawing).

As shown in FIG. 2, a gravity vector g_w in the world coordinate system (X, Y, Z) acts on the target object. In FIG. 2, the gravity vector g_w acts in the negative Y-axis direction and is therefore written as -g_w. The gravity vector g_w also acts on the 3D model.

In the first embodiment, a gravity vector g_c exists in the real camera coordinate system (X_c, Y_c, Z_c).

FIG. 3 is a diagram illustrating an example of the relationship between the viewpoint position O_c of the real camera and the gravity vector g_c in the real camera coordinate system (X_c, Y_c, Z_c). As shown in FIG. 3, when the viewpoint position O_c is moved so as to look up at the 3D model, the gravity vector (-g_c) tilts toward the viewpoint position O_c. On the other hand, in FIG. 2, even if the viewpoint position O_c is moved toward the 3D model, the direction of the gravity vector (-g_c) hardly changes.

That is, the direction of the gravity vector g_c in the real camera coordinate system (X_c, Y_c, Z_c) does not change even if the viewpoint position O_c of the real camera moves in the world coordinate system (X, Y, Z), as long as the camera keeps the same orientation with respect to the direction of the gravity vector g_w (the gravity direction). On the other hand, when the viewpoint position O_c of the real camera moves so as to change its orientation with respect to the direction of the gravity vector g_w (the gravity direction), the direction of the gravity vector g_c changes. Thus, the direction of the gravity vector g_c may change as the viewpoint position O_c of the real camera moves. In the first embodiment, when the direction of the gravity vector g_c changes due to such movement of the viewpoint position O_c, the position and orientation of the 3D model in the real camera coordinate system are changed in accordance with that change. Details are described in the operation example.

In the following, the real camera coordinate system may be referred to simply as the camera coordinate system.

<Operation example>
FIG. 4 is a flowchart illustrating an operation example of the information processing apparatus 100.

When the information processing apparatus 100 starts processing (S10), it performs initialization processing (S11).

FIG. 5 is a flowchart illustrating an operation example of the initialization processing. The initialization processing is performed by, for example, the initialization processing unit 106, which calculates the unit line-of-sight vectors r_1 and r_2 in the virtual camera coordinate system.

When the initialization processing unit 106 starts the initialization processing (S110), it determines two arbitrary points P_{1,v} and P_{2,v} (P_{1,v} ≠ P_{2,v}) of the 3D model in the virtual camera coordinate system (S111).

FIG. 6 is a diagram illustrating an example of the relationship between the world coordinate system (X, Y, Z) and the virtual camera coordinate system (X_v, Y_v, Z_v). As shown in FIG. 6, the two arbitrary points P_{1,v} and P_{2,v} are two arbitrary points on the 3D model in the virtual camera coordinate system (X_v, Y_v, Z_v). For example, the initialization processing unit 106 may determine the two points P_{1,v} and P_{2,v} by reading their position coordinates, which are stored in the storage unit 103, from the storage unit 103.

Returning to FIG. 5, the initialization processing unit 106 also determines the Euclidean distance L between the two points P_{1,v} and P_{2,v} (S111). For example, the initialization processing unit 106 may calculate L from the position coordinates of the two points P_{1,v} and P_{2,v}, or may determine L by reading a stored value from the storage unit 103.

Furthermore, the initialization processing unit 106 determines the normal vector n_v of a suitable plane containing the two points P_{1,v} and P_{2,v} (S111). As shown in FIG. 6, the normal vector n_v may be the normal vector of a face of the 3D model. For example, the initialization processing unit 106 may calculate the normal vector n_v based on the position coordinates of the points P_{1,v} and P_{2,v}, or may determine it by reading information on the normal vector n_v from the storage unit 103.

Returning to FIG. 5, the initialization processing unit 106 further determines the vertically downward vector g_v in the virtual camera coordinate system (X_v, Y_v, Z_v) (S111). The vertically downward vector g_v can be, for example, the gravity vector in the virtual camera coordinate system (X_v, Y_v, Z_v), as shown in FIG. 6. For example, the initialization processing unit 106 may determine it by reading information on the vertically downward vector g_v from the storage unit 103. Like the gravity vector g_c, the vertically downward vector g_v in the virtual camera coordinate system (X_v, Y_v, Z_v) may also change depending on the viewpoint position O_v of the virtual camera, for example as shown in FIG. 3.

Returning to FIG. 5, the initialization processing unit 106 next defines a virtual camera that observes the two points P_{1,v} and P_{2,v} from a certain position and orientation (S112). For example, as shown in FIG. 6, the initialization processing unit 106 defines the virtual camera by reading an arbitrary position O_v in the world coordinate system (X, Y, Z) from the storage unit 103 and setting it as the viewpoint position of the virtual camera.

Returning to FIG. 5, the initialization processing unit 106 next calculates the unit line-of-sight vectors r_1 and r_2 (r_1 ≠ r_2) in the virtual camera coordinate system (X_v, Y_v, Z_v) (S113). For example, on each of the line segments connecting the position coordinates (0, 0, 0) of the viewpoint position O_v of the virtual camera with the position coordinates of the two points P_{1,v} and P_{2,v} in the virtual model coordinate system, the initialization processing unit 106 may calculate the position coordinates of the point whose distance from the viewpoint is 1, and use those position coordinates as the components of the unit line-of-sight vectors r_1 and r_2.

Then, the initialization processing unit 106 ends the initialization processing (S114). The initialization processing unit 106 outputs the calculated unit line-of-sight vectors r_1 and r_2, the determined normal vector n_v, and the vertically downward vector g_v to the model position estimating unit 107.
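
The quantities determined in S111 to S113 can be illustrated with a minimal Python sketch. It is not taken from the specification: the function name initialize and the third point third_point_v, used only to fix a plane containing the two points, are hypothetical, and the viewpoint O_v is assumed to be the origin (0, 0, 0) of the virtual camera coordinate system.

```python
import numpy as np

def initialize(p1_v, p2_v, third_point_v):
    """Return (L, n_v, r1, r2) for two model points given in virtual camera coordinates."""
    p1_v = np.asarray(p1_v, dtype=float)
    p2_v = np.asarray(p2_v, dtype=float)
    third_point_v = np.asarray(third_point_v, dtype=float)

    # Euclidean distance L between the two points (S111).
    L = np.linalg.norm(p2_v - p1_v)

    # Normal n_v of a plane containing both points. The plane is fixed here by a
    # hypothetical third point on the same model face (an assumption of this sketch).
    n_v = np.cross(p2_v - p1_v, third_point_v - p1_v)
    n_v = n_v / np.linalg.norm(n_v)

    # Unit line-of-sight vectors from the viewpoint O_v = (0, 0, 0) (S113).
    r1 = p1_v / np.linalg.norm(p1_v)
    r2 = p2_v / np.linalg.norm(p2_v)
    return L, n_v, r1, r2
```

For instance, initialize([0, 0, 2], [1, 0, 2], [0, 1, 2]) describes two points on a face that is 2 units in front of the virtual camera and perpendicular to its Z_v axis.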

Returning to FIG. 4, the information processing apparatus 100 next acquires RGB image data and acceleration data (S12). For example, the imaging unit 101 stores the RGB image data of a captured input image in the storage unit 103, and the inertial sensor 102 stores the acceleration data measured when the input image was captured in the storage unit 103.

Note that the information processing apparatus 100 performs the processing from S12 to S17 for each image frame of the images captured by the imaging unit 101. Accordingly, the information processing apparatus 100 acquires RGB image data for each image frame and acquires acceleration data from the inertial sensor 102 for each image frame.

Next, the information processing apparatus 100 estimates the position and orientation of the camera (S13). For example, the self-position estimating unit 104 uses SLAM to acquire camera parameters and estimate the position and orientation of the real camera, and thereby calculates a transformation matrix T_cw containing the acquired camera parameters.

Here, SLAM will be described. SLAM is, for example, a technique that extracts and tracks feature points across a plurality of (two-dimensional) images captured by the same camera, and thereby simultaneously recognizes the three-dimensional structure around the camera and calculates the position and orientation of the camera.

As the SLAM processing, the self-position estimating unit 104 performs, for example, the following processing.

That is, the self-position estimating unit 104 first reads the RGB image data from the storage unit 103 and extracts feature points from the plurality of images (or image frames) represented by the RGB image data. For example, the self-position estimating unit 104 extracts feature points from each image using a known method such as SIFT (Scale Invariant Feature Transform) or SURF (Speeded Up Robust Features).

Next, the self-position estimating unit 104 matches the feature points extracted from each image across the images. At this time, the self-position estimating unit 104 may perform the matching using the known method that was used for the feature point extraction.

Then, the self-position estimating unit 104 calculates the three-dimensional coordinates of the feature points based on the matching result, and calculates the camera parameters corresponding to each image from the calculated three-dimensional coordinates. The camera parameters include, for example, the position coordinates of the camera and the rotation angles of its coordinate axes. The self-position estimating unit 104 calculates the transformation matrix T_cw containing these camera parameters. Because the transformation matrix T_cw contains, for example, the position coordinates (or position) of the camera and the rotation angles (or orientation) of its coordinate axes, it can serve as a transformation matrix that converts an arbitrary position and orientation in the world coordinate system (X, Y, Z) into a position and orientation in the camera coordinate system (X_c, Y_c, Z_c), whose origin is the viewpoint position of the camera.
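
As an illustration of how a matrix such as T_cw can combine the camera position and orientation, the following Python sketch builds a 4x4 homogeneous world-to-camera transform. The specification does not give the matrix layout, so the layout, the function names make_T_cw and transform_point, and the parameters R_cw (rotation from world axes to camera axes) and camera_position_w are assumptions made for this sketch.

```python
import numpy as np

def make_T_cw(R_cw, camera_position_w):
    """Build a 4x4 world-to-camera transform from a rotation and the camera position."""
    R_cw = np.asarray(R_cw, dtype=float)
    c = np.asarray(camera_position_w, dtype=float)
    T = np.eye(4)
    T[:3, :3] = R_cw
    # Translation chosen so that the camera position maps to the camera origin O_c.
    T[:3, 3] = -R_cw @ c
    return T

def transform_point(T, p):
    """Apply a 4x4 homogeneous transform to a 3D point."""
    p_h = np.append(np.asarray(p, dtype=float), 1.0)
    return (T @ p_h)[:3]
```

With this layout, a world point is first shifted by the camera position and then rotated into the camera axes, so the viewpoint position itself always maps to (0, 0, 0).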

FIG. 7A is a diagram illustrating an example of the relationship among the coordinate systems. By calculating the position and orientation of the camera using SLAM, the self-position estimating unit 104 calculates the transformation matrix T_cw from the world coordinate system (X, Y, Z) to the real camera coordinate system (X_c, Y_c, Z_c).

An example of the SLAM processing has been described above. SLAM includes, for example, EKF-based SLAM using an EKF (Extended Kalman Filter) and SLAM using a particle filter. In the first embodiment, any SLAM method may be used.

Returning to FIG. 4, the information processing apparatus 100 next calculates the gravity direction (S14). For example, the gravity direction estimating unit 105 performs the following processing.

That is, the gravity direction estimating unit 105 calculates the inclination (θ, ψ, φ) with respect to each axis direction from the acceleration data (Ax, Ay, Az) along each axis of the camera coordinate system (X_c, Y_c, Z_c), read from the storage unit 103, using equations (1) to (3).

Then, the gravity direction estimating unit 105 estimates the gravity direction based on the inclination (θ, ψ, φ). For example, when the inclination is (0, -1, 0), the gravity direction estimating unit 105 estimates that the gravity direction lies along the -Y_c axis of the camera coordinate system (X_c, Y_c, Z_c), and when the inclination is (1, 0, 0), it estimates that the gravity direction lies along the X_c axis of the camera coordinate system (X_c, Y_c, Z_c). The gravity direction estimating unit 105 treats the estimated gravity direction as the gravity direction in the world coordinate system (X, Y, Z), and thereby obtains the direction of the gravity vector g_w (the gravity direction) in the world coordinate system (X, Y, Z).

For example, the gravity direction estimating unit 105 stores equations (1) to (3) in an internal memory, reads them out during this processing, and calculates the inclination by substituting the acceleration data read from the storage unit 103 into equations (1) to (3). The gravity direction estimating unit 105 then estimates the direction of the gravity vector g_w based on the inclination.
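
Equations (1) to (3) are not reproduced in this text. The Python sketch below therefore uses the commonly used static-tilt formulas for a three-axis accelerometer as a stand-in, and they may differ from the equations of the specification. The function names tilt_angles and gravity_direction are hypothetical. The sketch also assumes that the device is at rest and that the reading (Ax, Ay, Az) points along gravity, as in the (0, -1, 0) example above; on many devices the sign is reversed.

```python
import math

def tilt_angles(ax, ay, az):
    """Common static-tilt angles (radians) computed from one accelerometer reading."""
    theta = math.atan2(ax, math.hypot(ay, az))  # X axis relative to the horizontal plane
    psi = math.atan2(ay, math.hypot(ax, az))    # Y axis relative to the horizontal plane
    phi = math.atan2(math.hypot(ax, ay), az)    # Z axis relative to the vertical
    return theta, psi, phi

def gravity_direction(ax, ay, az):
    """Unit vector along the measured acceleration, used here as the gravity direction."""
    norm = math.sqrt(ax * ax + ay * ay + az * az)
    if norm == 0.0:
        raise ValueError("acceleration reading is zero; gravity direction is undefined")
    return (ax / norm, ay / norm, az / norm)
```

A reading of (0, -9.8, 0) gives a unit direction along the -Y_c axis, which corresponds to the first example in the text.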

The above is the method of calculating the gravity direction.

Next, the information processing apparatus 100 performs calculation processing of the position and orientation of the 3D model (hereinafter sometimes referred to as "3D model calculation processing") (S15).

FIG. 8 is a flowchart illustrating an operation example of the 3D model calculation processing.

When the model position estimating unit 107 starts the 3D model calculation processing (S150), it rotates the normal n_m so that the direction of the gravity vector g_c in the camera coordinate system coincides with the direction of the vertically downward vector g_v in the virtual model coordinate system (S151).

FIGS. 9A and 9B are diagrams for explaining the processing of S151. In this processing, the normal vector n_v in the virtual camera coordinate system (X_v, Y_v, Z_v) is rotated to calculate the normal vector n_c in the camera coordinate system (X_c, Y_c, Z_c). In doing so, the model position estimating unit 107 uses the gravity vector g_c in the camera coordinate system (X_c, Y_c, Z_c) and the vertically downward vector g_v in the virtual camera coordinate system. The model position estimating unit 107 performs, for example, the following calculation.

That is, letting v be the rotation axis and θ be the rotation angle by which the normal vector n_v is rotated about the rotation axis v into the normal vector n_c, the model position estimating unit 107 calculates the rotation axis v and the normal vector n_c as follows.

Next, the model position estimating unit 107 calculates a rotation matrix R that rotates by the angle θ about the rotation axis v, using the following equation.

Then, using the rotation matrix R, the model position estimating unit 107 calculates the normal vector n_c in the camera coordinate system (X_c, Y_c, Z_c) with the following equation.
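
Because the equations for the rotation axis, the rotation angle, and the rotation matrix are not reproduced in this text, the following Python sketch shows one standard way to carry out such a step, and it may differ in detail from the specification. It assumes that the axis v is taken along the cross product of g_v and g_c, that θ is the angle between them, and that R is built with Rodrigues' rotation formula. The function name rotation_aligning is hypothetical.

```python
import numpy as np

def rotation_aligning(g_v, g_c):
    """Rotation matrix R that turns the direction of g_v into the direction of g_c."""
    g_v = np.asarray(g_v, dtype=float)
    g_c = np.asarray(g_c, dtype=float)
    g_v = g_v / np.linalg.norm(g_v)
    g_c = g_c / np.linalg.norm(g_c)

    axis = np.cross(g_v, g_c)
    s = np.linalg.norm(axis)      # sin(theta)
    c = float(np.dot(g_v, g_c))   # cos(theta)

    if s < 1e-12:
        if c > 0.0:
            return np.eye(3)      # already aligned, no rotation needed
        # Opposite directions: rotate 180 degrees about any axis perpendicular to g_v.
        helper = np.array([1.0, 0.0, 0.0]) if abs(g_v[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
        v = np.cross(g_v, helper)
        v = v / np.linalg.norm(v)
        return 2.0 * np.outer(v, v) - np.eye(3)

    v = axis / s                  # unit rotation axis
    K = np.array([[0.0, -v[2], v[1]],
                  [v[2], 0.0, -v[0]],
                  [-v[1], v[0], 0.0]])
    # Rodrigues' formula: R = I + sin(theta) K + (1 - cos(theta)) K^2
    return np.eye(3) + s * K + (1.0 - c) * (K @ K)
```

Under these assumptions the normal in camera coordinates would be obtained as n_c = R @ n_v, where n_v is the normal vector determined during initialization.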

Note that the model position estimating unit 107 calculates the gravity vector g_c from, for example, the transformation matrix T_cw obtained in the camera position estimation processing (S13 in FIG. 4) and the gravity vector g_w obtained in the gravity direction calculation processing (S14), using the following equation.
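
The equation for g_c is not reproduced in this text. A minimal Python sketch of one plausible reading is given below: gravity is a direction rather than a position, so only the rotational part of T_cw is applied to g_w. That reading, the 4x4 homogeneous layout of T_cw, and the function name gravity_in_camera are assumptions of this sketch.

```python
import numpy as np

def gravity_in_camera(T_cw, g_w):
    """Express the world gravity direction g_w in camera coordinates as a unit vector."""
    R_cw = np.asarray(T_cw, dtype=float)[:3, :3]  # rotational part only
    g_c = R_cw @ np.asarray(g_w, dtype=float)
    return g_c / np.linalg.norm(g_c)
```

Because the translational part of T_cw is ignored, moving the camera without rotating it leaves g_c unchanged, while rotating the camera changes g_c, which matches the behavior described for FIG. 2 and FIG. 3.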

For example, the model position estimating unit 107 stores equations (4) to (8) in an internal memory, reads equations (4) to (8) from the internal memory during this processing, and calculates the normal vector n_c by, for example, substituting the vertically downward vector g_v, the gravity vector g_c, and so on into equations (4) to (8).

Returning to FIG. 8, the model position estimating unit 107 next calculates the scales t_1 and t_2 of the fixed lines of sight (S152).

FIG. 10A is a diagram illustrating an example of the scales t_1 and t_2. The scales t_1 and t_2 represent, for example, the factors by which the unit line-of-sight vectors r_1 and r_2 obtained in the initialization processing are extended from the viewpoint position O_c of the camera coordinate system (X_c, Y_c, Z_c) to reach the two points P_{1,c} and P_{2,c} on the 3D model, which are separated by the distance L.

The model position estimating unit 107 calculates the scales t_1 and t_2 using, for example, the following equations.

Here, α is given by the following equation.

For example, the model position estimating unit 107 stores equations (9) to (11) in the internal memory, reads them from the internal memory during the processing, and obtains the scales t_1 and t_2 by substituting the normal vector n_c obtained in S151 and so on into equations (9) to (11).

Returning to FIG. 8, the model position estimating unit 107 next calculates the two points P_{1,c} and P_{2,c} of the 3D model in the camera coordinate system using the following equation (S153).

P_{1,c} = t_1 r_1, P_{2,c} = t_2 r_2 ... (12)
For example, the model position estimating unit 107 stores equation (12) in the internal memory, reads it from the internal memory during the processing, and obtains the two points P_{1,c} and P_{2,c} by substituting the scales t_1 and t_2 calculated in S152 into equation (12).
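
Equations (9) to (11) are not reproduced in this text, so the Python sketch below is only one way the scales could be obtained, under two stated assumptions: the segment between the two points lies in a plane whose normal is n_c, so that n_c · (t_1 r_1 - t_2 r_2) = 0, and the two points are separated by the distance L. Under these assumptions t_2 = α t_1 with α = (n_c · r_1) / (n_c · r_2), and t_1 = L / |r_1 - α r_2|. The function names line_of_sight_scales and points_in_camera are hypothetical. Only the last step, P = t r, is taken directly from equation (12).

```python
import numpy as np

def line_of_sight_scales(r1, r2, n_c, L):
    """Scales (t1, t2) under the planarity and distance assumptions stated above."""
    r1 = np.asarray(r1, dtype=float)
    r2 = np.asarray(r2, dtype=float)
    n_c = np.asarray(n_c, dtype=float)
    denom = float(np.dot(n_c, r2))
    if abs(denom) < 1e-12:
        raise ValueError("r2 is parallel to the plane; the scale is undefined")
    alpha = float(np.dot(n_c, r1)) / denom   # ratio t2 / t1 from the planarity condition
    span = np.linalg.norm(r1 - alpha * r2)
    if span < 1e-12:
        raise ValueError("lines of sight coincide; the scale is undefined")
    t1 = L / span                            # from |t1 r1 - t2 r2| = L
    return t1, alpha * t1

def points_in_camera(t1, r1, t2, r2):
    """Equation (12): P_{1,c} = t_1 r_1 and P_{2,c} = t_2 r_2."""
    return t1 * np.asarray(r1, dtype=float), t2 * np.asarray(r2, dtype=float)
```

In this sketch a change in n_c changes α and therefore both scales, which is consistent with the statement that the scales depend on the normal vector n_c.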

Here, the relationship between the two arbitrary points P_{1,c} and P_{2,c} in the camera coordinate system (X_c, Y_c, Z_c) and the viewpoint position O_c in the camera coordinate system (X_c, Y_c, Z_c) will be described.

FIG. 10B is a diagram illustrating an example of this relationship. As described above, the direction of the gravity vector g_c in the camera coordinate system (X_c, Y_c, Z_c) changes when the camera viewpoint position O_c changes its orientation with respect to the direction of the gravity vector g_w (the gravity direction).

For example, consider the case where the viewpoint position of the camera moves from O_c to O'_c, as shown in FIG. 10B. This is exactly the case where the viewpoint position O_c moves, relative to the 3D model, in the Y-axis direction of the world coordinate system (X, Y, Z), that is, toward the space above the 3D model.

In this case, as in FIG. 3, the camera changes its orientation with respect to the direction of the gravity vector g_w, so the direction of the gravity vector g_c in the camera coordinate system (X_c, Y_c, Z_c) changes. Therefore, the gravity vector g_c when the camera viewpoint is at O_c differs from the gravity vector g_c when the camera viewpoint is at O'_c. Because of this difference, as equations (4) and (5) show, the rotation axis v and the rotation angle θ for the viewpoint O_c differ from the rotation axis v and the rotation angle θ for the viewpoint O'_c. When the rotation axis v and the rotation angle θ differ because of the difference in camera viewpoint, the rotation matrix R of equation (6) also differs and, as a result, so does the normal vector n_c. Since the scales t_1 and t_2 involve the normal vector n_c, as equations (9) to (11) show, the scales t_1 and t_2 also differ between the two viewpoints. Because of this difference in the scales t_1 and t_2, when the camera viewpoint moves from O_c to O'_c as shown in FIG. 10B, the two points P_{1,c} and P_{2,c} in the camera coordinate system (X_c, Y_c, Z_c) move to the two points P'_{1,c} and P'_{2,c}, respectively. Consequently, from the viewpoint position O'_c, for example, the upper surface of the 3D model appears larger than from the viewpoint position O_c.

FIGS. 11A and 11B are diagrams illustrating display examples of the 3D model when the viewpoint position of the camera is changed from O_c to O'_c. Compared with FIG. 11A, FIG. 11B shows the upper surface of the 3D model displayed larger.

Thus, in the first embodiment, the position and orientation of the 3D model in the camera coordinate system (X_c, Y_c, Z_c) (for example, the two points P_{1,c} and P_{2,c}) are changed in accordance with the gravity direction as seen from the camera coordinate system (for example, the gravity vector g_c). With this relationship, the information processing apparatus 100 changes the position and orientation of the 3D model in accordance with the position and orientation of the camera viewpoint O_c, as shown in FIGS. 11A and 11B.

Returning to FIG. 8, the model position estimating unit 107 next calculates the transformation matrix T_mc, which converts the coordinate system from the camera coordinate system (X_c, Y_c, Z_c) to the model coordinate system (X_m, Y_m, Z_m), based on the gravity vector g_c in the camera coordinate system (X_c, Y_c, Z_c), the normal vector n_c, and the model width W (S154). The place of the transformation matrix T_mc among the coordinate systems as a whole is, for example, as shown in FIG. 7B. The model width W represents, for example, the distance from the point P_{1,c} in the camera coordinate system to the origin O_m of the model coordinate system, as shown in FIG. 12, and corresponds to the length of the 3D model in the X_m-axis direction of the model coordinate system.

For example, the model position estimating unit 107 may calculate a matrix in which every component of the transformation matrix T_mc contains all or part of the gravity vector g_c, the normal vector n_c, and the model width W. Alternatively, the model position estimating unit 107 may calculate a matrix in which some components of the transformation matrix T_mc are numerical values and the other components contain all or part of the gravity vector g_c, the normal vector n_c, and the model width W. Alternatively, the model position estimating unit 107 may store the transformation matrix T_mc in, for example, an internal memory and obtain the transformation matrix T_mc by substituting the gravity vector g_c, the normal vector n_c, and the model width W into all or some of its components.

Note that the model width W may be stored in, for example, the storage unit 103 or the internal memory of the model position estimating unit 107, and the model position estimating unit 107 may read it from the storage unit 103 or the internal memory to calculate the transformation matrix T_mc.

Returning to FIG. 8, the model position estimating unit 107 next calculates the transformation matrix T_wm, which converts the coordinate system from the model coordinate system (X_m, Y_m, Z_m) to the world coordinate system (X, Y, Z), based on the position of the camera in the world coordinate system and the two points P_{1,c} and P_{2,c} on the 3D model (S155).

FIG. 12 is a diagram illustrating an example of the relationship between the camera coordinate system (X_c, Y_c, Z_c) and the model coordinate system (X_m, Y_m, Z_m). In this processing, the transformation matrix T_wm is calculated for converting the two points P_{1,c} and P_{2,c} on the 3D model from points in the camera coordinate system (X_c, Y_c, Z_c) to points in the model coordinate system (X_m, Y_m, Z_m). In effect, this calculates the transformation matrix T_wm for the case where the viewpoint is changed from the camera viewpoint position O_c to the 3D model viewpoint position O_m and the two points on the 3D model are converted to the two points P_{1,m} and P_{2,m} in the model coordinate system.

As shown in FIG. 7B, the transformation matrix T_mc from the camera coordinate system (X_c, Y_c, Z_c) to the model coordinate system (X_m, Y_m, Z_m) was calculated in the processing of S154. In addition, the transformation matrix T_cw from the world coordinate system (X, Y, Z) to the camera coordinate system (X_c, Y_c, Z_c) was calculated in the self-position estimation processing (S13 in FIG. 4). This processing uses that relationship to calculate the transformation matrix T_wm from the model coordinate system (X_m, Y_m, Z_m) to the world coordinate system (X, Y, Z).

That is, the model position estimating unit 107 calculates the transformation matrix T_wm using equation (13).

For example, the model position estimating unit 107 reads equation (13) from the internal memory and substitutes into it the transformation matrix T_cw calculated in S13 and the transformation matrix T_mc calculated in S154, thereby obtaining the transformation matrix T_wm.
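
Equation (13) itself is not reproduced in this text. As a minimal sketch, assuming that T_cw (world to camera) and T_mc (camera to model) are 4x4 homogeneous rigid transforms, the stated directions imply that T_wm (model to world) can be formed as T_cw^{-1} T_mc^{-1}. The helper names `matmul`, `invert_rigid`, and `model_to_world` are hypothetical and do not come from the publication.

```python
def matmul(a, b):
    """Multiply two 4x4 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def invert_rigid(t):
    """Invert a 4x4 rigid transform [R | p] as [R^T | -R^T p].

    Assumption: the upper-left 3x3 block is a pure rotation.
    """
    r_t = [[t[j][i] for j in range(3)] for i in range(3)]  # transpose of R
    p = [t[i][3] for i in range(3)]
    new_p = [-sum(r_t[i][k] * p[k] for k in range(3)) for i in range(3)]
    return [r_t[i] + [new_p[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


def model_to_world(t_cw, t_mc):
    """Compose model -> camera (T_mc^{-1}) with camera -> world (T_cw^{-1})."""
    return matmul(invert_rigid(t_cw), invert_rigid(t_mc))
```

Under these assumptions, applying T_cw, then T_mc, to a world point and then applying the resulting T_wm returns the original point, which is a quick consistency check on the composition order.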

Returning to FIG. 8, on completing S155, the model position estimating unit 107 ends the 3D model calculation processing (S156).

This concludes the description of the 3D model calculation processing (S15 in FIG. 4).

Note that the model position estimating unit 107 uses the transformation matrix T_cw received from the self-position estimating unit 104 to convert the 3D model data in the world coordinate system (X, Y, Z), read from the storage unit 103, into 3D model data in the camera coordinate system (X_c, Y_c, Z_c). The model position estimating unit 107 outputs the 3D model data in the camera coordinate system (X_c, Y_c, Z_c) to the model drawing unit 108.

The model position estimating unit 107 also uses the transformation matrix T_mc calculated in S154 to convert the 3D model data in the camera coordinate system (X_c, Y_c, Z_c) into 3D model data in the model coordinate system (X_m, Y_m, Z_m). The model position estimating unit 107 outputs the 3D model data in the model coordinate system (X_m, Y_m, Z_m) and the transformation matrix T_wm calculated in S155 to the object position recognition unit 111.

Returning to FIG. 4, the information processing apparatus 100 next draws the 3D model on the camera video (S16). For example, the model drawing unit 108 performs the following processing.

That is, as shown in FIG. 13(A), the model drawing unit 108 uses the projection matrix T_p to convert the 3D model data in the camera coordinate system (X_c, Y_c, Z_c), received from the model position estimating unit 107, into 3D model data in the image coordinate system (x, y). The model drawing unit 108 then draws the RGB data (or camera video) read from the storage unit 103 and the converted 3D model data in the image coordinate system (x, y). In doing so, it draws the 3D model by replacing the RGB image data of the input image at the position of the 3D model in the image coordinate system (x, y) with the image data of the 3D model. The model drawing unit 108 outputs the drawing result to the display unit 109 and the recognition start determination unit 110. The display unit 109 displays, according to the drawing result, an image in which the 3D model appears in the camera video.
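
The pixel replacement described above can be sketched as follows. This is only an illustration under stated assumptions: the image is a list of rows of RGB tuples, the projected model is given as a list of integer (x, y) pixel positions, and the model is drawn in a single color. The function name `overlay_model` and its parameters are hypothetical.

```python
def overlay_model(image, model_pixels, color):
    """Return a copy of `image` with the model's pixels overwritten.

    image        : list of rows, each row a list of (R, G, B) tuples
    model_pixels : iterable of (x, y) positions covered by the projected model
    color        : (R, G, B) value used for the model (an assumption; a real
                   renderer would use per-pixel model image data)
    """
    out = [row[:] for row in image]  # copy rows so the input stays unchanged
    height = len(out)
    width = len(out[0]) if out else 0
    for x, y in model_pixels:
        # Skip model pixels that fall outside the camera image.
        if 0 <= y < height and 0 <= x < width:
            out[y][x] = color
    return out
```

Copying the frame before drawing keeps the stored camera image intact, so the same frame can be redrawn when the model pose changes.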

For example, FIGS. 11(A), 11(B), and 14(A) show examples of the camera video and the 3D model displayed on the display unit 109. The camera video includes the target object (a tissue box in the example of FIG. 11(A)), and "positioning" is performed by the user moving the imaging unit 101 (or the information processing apparatus 100) so that the 3D model coincides with the target object.

Returning to FIG. 4, the information processing apparatus 100 next determines whether the user has performed a determination operation (S17). The determination operation is, for example, pressing an operation button of the information processing apparatus 100 when the user judges that the target object and the 3D model coincide in the video shown on the display unit 109. For example, FIG. 14(B) shows an example of the 3D model displayed on the display unit 109; when the user judges that the 3D model and the target object coincide, the user presses a predetermined operation button. The recognition start determination unit 110 determines that the determination operation has been performed when it receives from the operation button a signal indicating that the button was pressed (Yes in S17), and determines that the determination operation has not been performed when it does not receive that signal (No in S17). When the determination operation has not been performed, the information processing apparatus 100 returns to S12 and repeats the processing from S12 to S17, as shown in FIG. 4. Instead of an operation button, the determination operation may be performed, for example, by operating a touch panel displayed on the display unit 109.

When the information processing apparatus 100 determines that the determination operation has been performed (Yes in S17), it calculates the position and orientation of the target object (S18). Specifically, the object position recognition unit 111 performs "positioning" by, for example, acquiring the transformation matrix T_wm received from the model position estimating unit 107 at the time the determination operation is performed. The transformation matrix T_wm at that time is, for example, the matrix obtained when the position and orientation of the target object in the world coordinate system and the position and orientation of the 3D model in the world coordinate system are in a certain correspondence (or mapping). One such correspondence is that the target object and the 3D model coincide in the world coordinate system. The object position recognition unit 111 can thus be said to acquire, from the model position estimating unit 107, the transformation matrix T_wm that holds when this correspondence is established.
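
The way T_wm is captured at the moment of the determination operation can be illustrated with a small sketch. It assumes that each pass through S12 to S17 yields the current T_wm together with a flag saying whether the determination signal was received; the function name `latch_alignment` and this pairing are hypothetical simplifications.

```python
def latch_alignment(passes):
    """Return the T_wm of the first pass in which the user confirmed.

    passes : iterable of (t_wm, confirmed) pairs, one per S12-S17 iteration.
    Returns None if the determination operation never occurred.
    """
    for t_wm, confirmed in passes:
        if confirmed:
            # The matrix held at this moment is taken as the positioning result.
            return t_wm
    return None
```

Later passes are ignored once a confirmation is seen, which mirrors the description that the matrix in effect at the determination operation is the one used for S18.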

Then, as shown in FIG. 13(B), the object position recognition unit 111 performs coordinate conversions in the order (1) to (3), using the transformation matrix T_wm acquired through "positioning", the transformation matrix T_cw received from the model position estimating unit 107, and the projection matrix T_p.

Specifically, the information processing apparatus 100 performs, for example, the following processing. The object position recognition unit 111 converts the 3D model data in the model coordinate system (X_m, Y_m, Z_m), received from the model position estimating unit 107, into 3D model data in the world coordinate system (X, Y, Z) using the transformation matrix T_wm received from the model position estimating unit 107 at the time of the determination operation ((1) in FIG. 13(B)). The object position recognition unit 111 then uses the transformation matrix T_cw received from the self-position estimating unit 104 to convert the 3D model data in the world coordinate system (X, Y, Z) into 3D model data in the camera coordinate system (X_c, Y_c, Z_c) ((2) in FIG. 13(B)), and outputs the 3D model data in the camera coordinate system to the model drawing unit 108. The model drawing unit 108 uses the projection matrix T_p to convert the 3D model data in the camera coordinate system (X_c, Y_c, Z_c), received from the object position recognition unit 111, into 3D model data in the image coordinate system (x, y) ((3) in FIG. 13(B)). The model drawing unit 108 then draws the converted 3D model data and the RGB image data read from the storage unit 103 in the image coordinate system (x, y).
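
Steps (1) and (2) of this chain can be sketched as follows, assuming that T_wm and T_cw are 4x4 homogeneous matrices stored as nested lists and that the 3D model data is a list of vertices. The names `apply_transform` and `model_to_camera` are hypothetical.

```python
def apply_transform(t, point):
    """Apply a 4x4 homogeneous transform to a 3D point (x, y, z)."""
    x, y, z = point
    return tuple(t[i][0] * x + t[i][1] * y + t[i][2] * z + t[i][3]
                 for i in range(3))


def model_to_camera(vertices_m, t_wm, t_cw):
    """Carry model-coordinate vertices into the camera coordinate system."""
    # Step (1): model coordinate system -> world coordinate system via T_wm.
    vertices_w = [apply_transform(t_wm, v) for v in vertices_m]
    # Step (2): world coordinate system -> camera coordinate system via T_cw.
    return [apply_transform(t_cw, v) for v in vertices_w]
```

Because T_wm is fixed at the determination operation while T_cw is re-estimated for every frame, only step (2) changes as the camera moves in this sketch.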

Note that, by using the projection matrix T_p, the model drawing unit 108 converts a point (X_c, Y_c, Z_c) in the camera coordinate system into a point (x, y) in the image coordinate system according to x = -f X_c / Z_c and y = -f Y_c / Z_c. For example, the model drawing unit 108 stores the projection matrix T_p in an internal memory, reads it at processing time, and applies it to the 3D model data in the camera coordinate system (X_c, Y_c, Z_c) to obtain the 3D model data in the image coordinate system (x, y).
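
The two projection formulas can be written directly as a short function. This sketch assumes that f is the focal length of the camera (this section does not define it) and applies the formulas per point rather than through a matrix T_p; the name `project_point` is hypothetical.

```python
def project_point(point_c, f):
    """Project a camera-coordinate point to image coordinates.

    Implements x = -f * X_c / Z_c and y = -f * Y_c / Z_c.
    `f` is assumed to be the focal length.
    """
    x_c, y_c, z_c = point_c
    if z_c == 0:
        raise ValueError("point lies in the plane Z_c = 0 and cannot be projected")
    return (-f * x_c / z_c, -f * y_c / z_c)
```

For instance, with f = 2.0 the point (1.0, 2.0, -4.0) maps to (0.5, 1.0).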

As shown in FIG. 13(B), the coordinate conversions are performed in the order (1) to (3): the object position recognition unit 111 generates 3D model data in the world coordinate system (X, Y, Z), and the subsequent conversions take this 3D model data from the world coordinate system (X, Y, Z) through the camera coordinate system (X_c, Y_c, Z_c) to the image coordinate system (x, y). Thus, after "positioning", the 3D model data reaches the image coordinate system (x, y) by way of the world coordinate system (X, Y, Z), and is therefore displayed on the display unit 109 in correspondence with the world coordinate system. For example, after "positioning" is performed in FIG. 14(B), the target object (the tissue box) and the 3D model are displayed on the display unit 109 in a coinciding state. Even when the position and orientation of the camera change, the 3D model remains displayed in coincidence with the target object.

FIGS. 15(A) to 16(B) show examples of 3D models of a different shape. In these examples the target object is a "server device". As shown in FIGS. 15(A) to 16(B), the position and orientation of the 3D model change according to the gravity vector g_c.

As described above, the position and orientation of the 3D model in the first embodiment change according to the gravity vector g_c. Compared with a case in which the 3D model is fixed on the display unit 109, the user therefore has more freedom in moving the camera, and "positioning" can be performed easily according to the position and orientation of the camera.
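
How a gravity vector in the camera coordinate system might be obtained can be sketched as below. The sketch assumes that g_c is the world-coordinate gravity direction rotated by the rotation part of T_cw, with the translation ignored because a direction has no position. The function name `gravity_in_camera` is hypothetical.

```python
def gravity_in_camera(t_cw, g_world):
    """Rotate a world-coordinate gravity direction into camera coordinates.

    Only the upper-left 3x3 rotation block of the 4x4 matrix T_cw is used;
    the translation column does not affect a direction vector.
    """
    return tuple(sum(t_cw[i][k] * g_world[k] for k in range(3))
                 for i in range(3))
```

Under this assumption, translating the camera without rotating it leaves g_c unchanged, while tilting the camera changes g_c.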

In addition, during "positioning" in the information processing apparatus 100, the processing from S14 to S18 does not involve detecting feature points in the image data of the 3D model or detecting feature points of the target object. Accordingly, even for an image of a target object with few distinctive visual features, such as the "tissue box" shown in FIG. 11(A) or the "server device" shown in FIG. 15(A), accurate "positioning" is possible because a 3D model that changes according to the gravity vector g_c is used.

[Other Embodiments]
FIG. 17 is a diagram illustrating an example hardware configuration of the information processing apparatus 100.

The information processing apparatus 100 further includes a camera 120, a memory 121, a CPU (Central Processing Unit) 122, a ROM (Read Only Memory) 123, and a RAM (Random Access Memory) 124.

The memory 121 corresponds to, for example, the storage unit 103 in the first embodiment.

The CPU 122 reads a program stored in the ROM 123, loads it into the RAM 124, and executes the loaded program. The CPU 122 thereby implements the functions of the self-position estimating unit 104, the gravity direction estimating unit 105, the initialization processing unit 106, the model position estimating unit 107, the model drawing unit 108, the display unit 109, the recognition start determination unit 110, and the object position recognition unit 111. In this sense, the CPU 122 corresponds to, for example, these units.

Instead of the CPU 122, another processor or controller may be used, such as an MPU (Micro Processing Unit), a DSP (Digital Signal Processor), or an FPGA (Field Programmable Gate Array).

FIG. 18 is a diagram illustrating another configuration example of the information processing apparatus 100. In the example of FIG. 18, an imaging device 130 is provided outside the information processing apparatus 100 and captures an image that includes the target object. The imaging device 130 is, for example, a camera, and includes the imaging unit 101 and the inertial sensor 102. The imaging device 130 is movable and can be carried to various places by the user. The RGB data captured and the acceleration data measured by the imaging device 130 can be transmitted to the information processing apparatus 100 by wire or wirelessly. As shown in FIG. 18, the information processing system 10 includes the information processing apparatus 100 and the imaging device 130.

The embodiments described above are summarized in the following supplementary notes.

(Supplementary Note 1)
An information processing apparatus comprising:
an inertial sensor that outputs acceleration data of the information processing apparatus;
a gravity direction estimating unit that estimates a first gravity direction in a world coordinate system from the acceleration data;
a model position estimating unit that changes the position and orientation of a model in a camera coordinate system according to the first gravity direction relative to the camera coordinate system, converts the position and orientation of the model in the camera coordinate system into the position and orientation of the model in a model coordinate system, and calculates a first transformation matrix that converts the position and orientation of the model in the model coordinate system into the position and orientation of the model in the world coordinate system;
a model drawing unit that converts the position and orientation of the model in the camera coordinate system into a position of the model in an image coordinate system, and draws an input image and the model in the image coordinate system; and
a display unit that displays the input image and the model according to a drawing result of the model drawing unit.

(Supplementary Note 2)
The information processing apparatus according to Supplementary Note 1, further comprising an object position recognition unit that acquires, from the model position estimating unit, the first transformation matrix obtained when the model and a target object included in the input image are in a correspondence relationship in the world coordinate system.

(Supplementary Note 3)
The information processing apparatus according to Supplementary Note 2, wherein the object position recognition unit acquires, from the model position estimating unit, the first transformation matrix obtained when the model and the target object coincide in the world coordinate system.

(Supplementary Note 4)
The information processing apparatus according to Supplementary Note 2, wherein the object position recognition unit acquires the first transformation matrix from the model position estimating unit upon receiving a signal indicating a determination operation by a user.

(Supplementary Note 5)
The information processing apparatus according to Supplementary Note 2, wherein
the object position recognition unit uses the first transformation matrix to convert the position and orientation of the model in the model coordinate system into the position and orientation of the model in the camera coordinate system via the world coordinate system, and
the model drawing unit draws the model and the input image in the image coordinate system based on the position and orientation of the model converted into the camera coordinate system by the object position recognition unit.

(Supplementary Note 6)
The information processing apparatus according to Supplementary Note 1, wherein
the model position estimating unit converts the first gravity direction into a second gravity direction in the camera coordinate system and uses the second gravity direction as the first gravity direction relative to the camera coordinate system, and
the second gravity direction changes when the viewpoint position in the camera coordinate system moves in a direction in which the first gravity direction changes, and does not change when the viewpoint position in the camera coordinate system moves in a direction in which the first gravity direction does not change.

(Supplementary Note 7)
The information processing apparatus according to Supplementary Note 6, further comprising a self-position estimating unit that estimates the position and orientation of an imaging unit or an imaging device in the world coordinate system based on image data of the input image and, based on the estimated position and orientation, calculates a second transformation matrix for conversion from the world coordinate system to the camera coordinate system, wherein
the model position estimating unit converts the first gravity direction into the second gravity direction using the second transformation matrix.

(Supplementary Note 8)
The information processing apparatus according to Supplementary Note 1, further comprising an initialization processing unit that sets a virtual model coordinate system based on any two points on the model, the Euclidean distance between the two points, a first normal vector of a plane containing the two points, and a vertically downward vector, and calculates unit line-of-sight vectors from the viewpoint position of a virtual camera in the virtual model coordinate system to the two points.

(Supplementary Note 9)
The information processing apparatus according to Supplementary Note 8, further comprising a self-position estimating unit that estimates the position and orientation of an imaging unit or an imaging device in the world coordinate system based on image data of the input image, and calculates a second transformation matrix for conversion from the world coordinate system to the camera coordinate system, wherein
the model position estimating unit converts the first gravity direction into the second gravity direction using the second transformation matrix, calculates, based on the second gravity direction and the unit line-of-sight vectors, a third transformation matrix that converts the position and orientation of the model in the camera coordinate system into the position and orientation of the model in the model coordinate system, and performs that conversion using the third transformation matrix.

(Supplementary Note 10)
The information processing apparatus according to Supplementary Note 9, wherein the model position estimating unit converts the first normal vector into a second normal vector in the camera coordinate system based on a gravity vector having the first gravity direction and on the vertically downward vector, calculates two points of the model in the camera coordinate system based on the second normal vector and the unit line-of-sight vectors, and calculates the third transformation matrix based on the two points of the model in the camera coordinate system, the second normal vector, and the length of the model in the X-axis direction of the model coordinate system.

(Supplementary Note 11)
The information processing apparatus according to Supplementary Note 10, wherein, where the second transformation matrix is T_cw and the third transformation matrix is T_mc, the model position estimating unit calculates the first transformation matrix T_wm using equation (14) read from an internal memory.

(Supplementary Note 12)
A positioning method comprising:
outputting acceleration data of an information processing apparatus;
estimating a first gravity direction in a world coordinate system from the acceleration data;
changing the position and orientation of a model in a camera coordinate system according to the first gravity direction relative to the camera coordinate system, converting the position and orientation of the model in the camera coordinate system into the position and orientation of the model in a model coordinate system, and calculating a first transformation matrix that converts the position and orientation of the model in the model coordinate system into the position and orientation of the model in the world coordinate system;
converting the position and orientation of the model in the camera coordinate system into a position of the model in an image coordinate system, and drawing an input image and the model in the image coordinate system; and
displaying the input image and the model according to a drawing result.

(Supplementary Note 13)
A program to be executed by a computer of an information processing apparatus, the program causing the computer to execute processing comprising:
outputting acceleration data of the information processing apparatus;
estimating a first gravity direction in a world coordinate system from the acceleration data;
changing the position and orientation of a model in a camera coordinate system according to the first gravity direction relative to the camera coordinate system, converting the position and orientation of the model in the camera coordinate system into the position and orientation of the model in a model coordinate system, and calculating a first transformation matrix that converts the position and orientation of the model in the model coordinate system into the position and orientation of the model in the world coordinate system;
converting the position and orientation of the model in the camera coordinate system into a position of the model in an image coordinate system, and drawing an input image and the model in the image coordinate system; and
displaying the input image and the model according to a drawing result.

10: Information processing system
100: Information processing apparatus
101: Imaging unit
102: Inertial sensor
103: Storage unit
104: Self-position estimating unit
105: Gravity direction estimating unit
106: Initialization processing unit
107: Model position estimating unit
108: Model drawing unit
109: Display unit
110: Recognition start determination unit
111: Object position recognition unit
120: Camera
122: CPU
130: Imaging device

Claims

An information processing apparatus comprising:
an inertial sensor that outputs acceleration data of the information processing apparatus;
a gravity direction estimating unit that estimates a first gravity direction in a world coordinate system from the acceleration data;
a model position estimating unit that changes the position and orientation of a model in a camera coordinate system according to the first gravity direction relative to the camera coordinate system, converts the position and orientation of the model in the camera coordinate system into the position and orientation of the model in a model coordinate system, and calculates a first transformation matrix that converts the position and orientation of the model in the model coordinate system into the position and orientation of the model in the world coordinate system;
a model drawing unit that converts the position and orientation of the model in the camera coordinate system into a position of the model in an image coordinate system, and draws an input image and the model in the image coordinate system; and
a display unit that displays the input image and the model according to a drawing result of the model drawing unit.

The information processing apparatus according to claim 1, further comprising an object position recognition unit that acquires, from the model position estimating unit, the first transformation matrix obtained when the model and a target object included in the input image are in a correspondence relationship in the world coordinate system.

The information processing apparatus according to claim 2, wherein the object position recognition unit acquires, from the model position estimating unit, the first transformation matrix obtained when the model and the target object coincide in the world coordinate system.

The information processing apparatus according to claim 1, wherein
the model position estimating unit converts the first gravity direction into a second gravity direction in the camera coordinate system and uses the second gravity direction as the first gravity direction relative to the camera coordinate system, and
the second gravity direction changes when the viewpoint position in the camera coordinate system moves in a direction in which the first gravity direction changes, and does not change when the viewpoint position in the camera coordinate system moves in a direction in which the first gravity direction does not change.

A positioning method comprising:
outputting acceleration data of an information processing apparatus;
estimating a first gravity direction in a world coordinate system from the acceleration data;
calculating a first transformation matrix for changing the position and orientation of a model in a camera coordinate system according to the first gravity direction relative to the camera coordinate system, converting the position and orientation of the model in the camera coordinate system into the position and orientation of the model in a model coordinate system, and converting the position and orientation of the model in the model coordinate system into the position and orientation of the model in the world coordinate system;
converting the position and orientation of the model in the camera coordinate system into a position of the model in an image coordinate system, and drawing an input image and the model in the image coordinate system; and
displaying the input image and the model according to a drawing result.
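The steps of the method can be sketched roughly as follows. This is an illustration under stated assumptions, not the implementation described in the specification: the gravity direction is taken as the normalized mean of the acceleration samples, the camera axes are assumed to coincide with the world axes (so no world-to-camera rotation is shown), the model is aligned by rotating an assumed model "down" axis onto the gravity direction, and drawing is reduced to a pinhole projection. All function names and the intrinsic parameters `fx`, `fy`, `cx`, `cy` are hypothetical.

```python
import math

def normalize(v):
    """Scale a 3-vector to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    return [c / n for c in v]

def estimate_gravity(samples):
    """Average the acceleration samples and return the unit direction (assumed sign convention)."""
    mean = [sum(s[i] for s in samples) / len(samples) for i in range(3)]
    return normalize(mean)

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def rotation_aligning(src, dst):
    """Rotation matrix that maps unit vector src onto unit vector dst (Rodrigues form)."""
    v = cross(src, dst)
    c = dot(src, dst)
    if c < -1.0 + 1e-9:
        # Opposite vectors need an arbitrary perpendicular axis; not handled in this sketch.
        raise ValueError("src and dst are antiparallel")
    k = 1.0 / (1.0 + c)
    vx = [[0.0, -v[2], v[1]], [v[2], 0.0, -v[0]], [-v[1], v[0], 0.0]]
    # R = c*I + [v]x + k * v v^T, equivalent to I + [v]x + k*[v]x^2 for unit inputs.
    return [[(c if i == j else 0.0) + vx[i][j] + k * v[i] * v[j]
             for j in range(3)] for i in range(3)]

def apply(r, p, t):
    """Rotate point p by r, then translate by t."""
    return [sum(r[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)]

def project(p_cam, fx, fy, cx, cy):
    """Pinhole projection of a camera-space point (z > 0) to image coordinates."""
    x, y, z = p_cam
    return (fx * x / z + cx, fy * y / z + cy)

# Hypothetical acceleration samples, already expressed in camera coordinates.
samples = [[0.1, 9.7, 0.2], [-0.1, 9.9, 0.0], [0.0, 9.8, 0.1]]
g_cam = estimate_gravity(samples)

# Align the model's assumed "down" axis with gravity and place it 2 units ahead.
r_model_to_cam = rotation_aligning([0.0, 0.0, 1.0], g_cam)
vertex_cam = apply(r_model_to_cam, [0.0, 0.0, 0.5], [0.0, 0.0, 2.0])

# Image position at which the vertex would be drawn over the input image.
u, v = project(vertex_cam, fx=800.0, fy=800.0, cx=320.0, cy=240.0)
```

With the gravity direction fixing two of the three rotational degrees of freedom, only the rotation about the gravity axis and the translation remain to be adjusted when matching the model to the target object.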

A program executed by a computer of an information processing apparatus, the program causing the computer to execute a process comprising:
outputting acceleration data of the information processing apparatus;
estimating a first gravity direction in a world coordinate system from the acceleration data;
calculating a first transformation matrix for changing the position and orientation of a model in a camera coordinate system according to the first gravity direction relative to the camera coordinate system, converting the position and orientation of the model in the camera coordinate system into the position and orientation of the model in a model coordinate system, and converting the position and orientation of the model in the model coordinate system into the position and orientation of the model in the world coordinate system;
converting the position and orientation of the model in the camera coordinate system into a position of the model in an image coordinate system, and drawing an input image and the model in the image coordinate system; and
displaying the input image and the model according to a drawing result.