JP2004235934A

JP2004235934A - Calibration processing device, calibration processing method, and computer program

Info

Publication number: JP2004235934A
Application number: JP2003021605A
Authority: JP
Inventors: Ikoku Go; 偉国呉
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 2003-01-30
Filing date: 2003-01-30
Publication date: 2004-08-19
Anticipated expiration: 2023-01-30
Also published as: JP4297197B2; JP4238586B2; JP2009071844A

Abstract

[Problem] To provide an apparatus and a method for performing calibration in a multi-viewpoint image capturing system with high accuracy and efficiency.
[Solution] The cameras to be calibrated photograph a luminous sphere from different directions. The sphere center position is extracted as a feature point from each captured image frame, feature points are associated on the basis of the extracted sphere center positions, and a fundamental matrix is calculated from the feature point correspondence data. The calibration jig is a luminous sphere, which is hardly affected by the lighting environment and the like. The sphere center positions are detected and associated in real time while the F matrix is estimated. The error and positional relationship between the estimated F matrix and the corresponding points are then evaluated. From this evaluation the apparatus indicates how many corresponding points are needed and where in three-dimensional space the sphere should be placed, so that a more accurate F matrix is calculated efficiently.
[Selected drawing] FIG. 11

Description

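[0001]
[Technical Field of the Invention]
The present invention relates to a calibration processing device, a calibration processing method, and a computer program. More specifically, the device, method, and program calculate parameters for one of two uses. The first is adjusting multi-view image capturing cameras, which generate multi-view images by photographing a subject with a plurality of cameras arranged at different positions. The second is correcting the images acquired by such cameras.
[0002]
[Prior Art]
In recent years, image data whose viewpoint can be moved freely, such as panoramic and omnidirectional images, has come into wide use. In some realized systems, images of a subject captured from a plurality of viewpoint positions and line-of-sight directions are stored on a storage medium such as a DVD or CD. When the stored images are displayed on a CRT, a liquid crystal display, or the like, the user operates a controller to move the viewpoint freely and observe the subject. In other systems, such images are distributed over a communication system such as the Internet. The user then operates the mouse of a PC or the like to display the image seen from a preferred viewpoint position and line-of-sight direction.
[0003]
Computer processing power has improved, and diverse video media playback devices have developed. As a result, huge amounts of polygon data and video data (content) that were previously difficult to handle can now be processed. Multiple cameras with different viewpoints capture video data of the real world (the target). A computer, video equipment, or the like can process this data to generate and present, in real time, video from an arbitrary viewpoint requested by the user.
[0004]
Generating and presenting arbitrary-viewpoint video on a computer from images captured from a plurality of viewpoints has two requirements. (1) All cameras must see the same region (the object of interest). (2) The positional relationships among the cameras must be obtainable.
[0005]
Multiple cameras with different viewpoints can capture multi-view video while focused on the same object. From this video, image processing can generate virtual-viewpoint video as if a virtual camera at an arbitrary position between the real cameras had captured it. Accurate virtual-viewpoint video, however, requires the positional relationship between the real cameras to be determined accurately. Geometry parameters express the positional relationship between the real cameras. They are expressed as a matrix called the fundamental matrix or F matrix.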
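[0006]
Many methods for obtaining the fundamental matrix, which expresses the positional relationship between real cameras, have been proposed. In one known method, several (adjacent) cameras simultaneously observe a checker pattern placed in three-dimensional space. Feature points are extracted from the checker pattern images, for example the intersections of the black-and-white pattern. The fundamental matrix is then estimated by associating those feature points (see, for example, Non-Patent Document 1).
[0007]
This method has the following problems. (1) At an actual shooting site, illumination effects such as reflected light prevent corresponding points from always being obtained accurately from the captured checker pattern images. It is therefore not easy to estimate an accurate fundamental matrix (hereinafter, the F matrix). (2) When many cameras with different viewpoints observe the same checker pattern plate, they often do not squarely face the plate. The pattern is then deformed in the observed images, which makes the feature point positions difficult to detect accurately.
[0008]
(3) When many cameras are arranged to surround the same object (subject), the captured checker pattern is deformed. Moreover, a planar checker pattern cannot be captured by all cameras at the same time, even when the cameras surround it through 360 degrees. The multi-view cameras must then be divided into several groups, and the checker pattern must be captured in several sessions, which costs a great deal of labor and time. (4) Camera calibration has no clear criterion for two judgments: whether enough checker pattern images have been recorded to estimate the F matrix, and whether the spatial positions of those patterns are appropriate. As a result, the accuracy of the calculated F matrix is not guaranteed.
[0009]
[Non-Patent Document 1]
R. Y. Tsai: An Efficient and Accurate Camera Calibration Technique for 3D Machine Vision, Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Miami Beach, FL, pp. 364-374 (1986)
[0010]
[Problems to Be Solved by the Invention]
The present invention has been made in view of the above problems. Its object is to provide a camera calibration device, a camera calibration method, and a computer program that achieve fast and accurate camera calibration. They do so by accurately calculating the geometry parameters, which express the positional relationship between the real cameras of a multi-view video capturing system.
[0011]
The present invention uses a luminous sphere, which is hardly affected by the lighting environment and the like, as the calibration jig. The sphere center positions are detected and associated in real time while the F matrix is estimated. The error and positional relationship between the estimated F matrix and the corresponding points are then evaluated. From this evaluation the invention indicates how many corresponding points are required and where in three-dimensional space the sphere should be placed. A more accurate F matrix is thereby estimated automatically.
[0012]
The shape of a sphere does not depend on the observing camera's viewpoint, so all cameras can observe it from various viewpoint positions. A multi-view video capturing system with a synchronization function observes and captures the sphere images. Camera calibration can run while these images are recorded on a recording medium such as the memory or hard disk of each personal computer. Image processing on each personal computer extracts the coordinates of the sphere center (the feature point) in each frame image. The fundamental matrix is then estimated by associating images using the extracted feature points. The accuracy of the estimated F matrix is also evaluated. The system then automatically presents the number of corresponding points and their appropriate positions, that is, the spatial positions of the sphere. The invention thus provides a device and a method for fast, accurate, automatic estimation of inter-camera parameters.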
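[0013]
[Means for Solving the Problems]
A first aspect of the present invention is
a calibration processing device that calculates parameters for adjusting multi-view image capturing cameras or for correcting images acquired by those cameras, characterized by comprising:
an image input unit that inputs video data from a plurality of cameras that have photographed a moving sphere from different viewpoint directions;
a feature point extraction unit that extracts a sphere center position as a feature point from a plurality of captured image frames constituting the video data of the plurality of cameras input by the image input unit;
an association processing unit that associates the sphere center positions extracted by the feature point extraction unit as feature points in corresponding frames of the respective cameras; and
an F matrix calculation unit that calculates a fundamental matrix as a calibration parameter on the basis of the feature point correspondence data associated by the association processing unit.
[0014]
Further, in one embodiment of the calibration processing device of the present invention, the sphere is a luminous sphere. The feature point extraction unit calculates the sphere center position on the basis of edge information of the sphere captured in a frame.
[0015]
Further, in one embodiment of the calibration processing device of the present invention, the F matrix calculation unit sets epipolar lines based on the calculated F matrix. It then calculates the distances between those epipolar lines and the sphere center positions corresponding to the feature points, either individually or as an average distance. If the calculated distance exceeds a predetermined threshold, the unit recalculates the F matrix from feature point correspondence data that includes new feature points.
[0016]
Further, in one embodiment of the calibration processing device of the present invention, the F matrix calculation unit determines how the sphere center positions, that is, the feature points, are distributed within the captured image frames. If the feature points are unevenly distributed, the unit adds feature points at positions that remove the unevenness. It then recalculates the F matrix from feature point correspondence data that includes these added points.
[0017]
Further, in one embodiment of the calibration processing device of the present invention, the feature point extraction unit calculates the center position from the edge information only when the edge of the entire sphere is present in the captured frame.
[0018]
Further, in one embodiment of the calibration processing device of the present invention, calculating the ratios of the nine elements of the F matrix requires at least eight points. The feature point extraction unit acquires sphere center position data for at least these eight points from a plurality of captured image frames of each camera. The association processing unit associates the eight or more extracted sphere center positions. The F matrix calculation unit then calculates the element values of the fundamental matrix, as a calibration parameter, from the eight or more associated feature point correspondences.
[0019]
Further, a second aspect of the present invention is
a calibration processing method that calculates parameters for adjusting multi-view image capturing cameras or for correcting images acquired by those cameras, characterized by comprising:
an image input step of inputting video data from a plurality of cameras that have photographed a moving sphere from different viewpoint directions;
a feature point extraction step of extracting a sphere center position as a feature point from a plurality of captured image frames constituting the video data of the plurality of cameras input in the image input step;
an association processing step of associating feature points on the basis of the sphere center positions extracted in the feature point extraction step; and
an F matrix calculation step of calculating a fundamental matrix as a calibration parameter on the basis of the feature point correspondence data associated in the association processing step.
[0020]
Further, in one embodiment of the calibration processing method of the present invention, the sphere is a luminous sphere. The feature point extraction step calculates the sphere center position on the basis of edge information of the sphere captured in a frame.
[0021]
Further, in one embodiment of the calibration processing method of the present invention, the F matrix calculation step sets epipolar lines based on the calculated F matrix. It then calculates the distances between those epipolar lines and the sphere center positions serving as feature points, either individually or as an average distance. If the calculated distance exceeds a predetermined threshold, the step recalculates the F matrix from feature point correspondence data that includes new feature points.
[0022]
Further, in one embodiment of the calibration processing method of the present invention, the F matrix calculation step determines how the sphere center positions, that is, the feature points, are distributed within the captured image frames. If the distribution is uneven, the step adds feature points at positions that remove the unevenness. It then recalculates the F matrix from feature point correspondence data that includes these additional points.
[0023]
Further, in one embodiment of the calibration processing method of the present invention, the feature point extraction step calculates the center position from the edge information only when the edge of the entire sphere is present in the captured frame.
[0024]
Further, in one embodiment of the calibration processing method of the present invention, calculating the ratios of the nine elements of the F matrix requires at least eight points. The feature point extraction step acquires sphere center position data for at least these eight points from a plurality of captured image frames of each camera. The association processing step associates the eight or more extracted sphere center positions. The F matrix calculation step then calculates the values of the elements of the fundamental matrix, as a calibration parameter, from the eight or more associated feature point correspondences.
[0025]
Further, a third aspect of the present invention is
a computer program that calculates parameters for adjusting multi-view image capturing cameras or for correcting images acquired by those cameras, characterized by comprising:
an image input step of inputting video data from a plurality of cameras that have photographed a moving sphere from different viewpoint directions;
a feature point extraction step of extracting a sphere center position as a feature point from a plurality of captured image frames constituting the video data of the plurality of cameras input in the image input step;
an association processing step of associating the sphere center positions extracted in the feature point extraction step; and
an F matrix calculation step of calculating a fundamental matrix as a calibration parameter on the basis of the feature point correspondence data associated in the association processing step.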
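[0026]
[Operation]
According to the configuration of the present invention, each camera photographs a luminous sphere during calibration, that is, during the calculation of parameters for adjusting the multi-view image capturing cameras or correcting their images. The sphere can therefore be detected and its center position obtained easily, even at shooting sites with unstable lighting. Associating the sphere center positions as feature points then allows the F matrix to be calculated accurately. The fundamental matrix serves as the geometry parameters expressing the positional relationship between the real cameras of a multi-view video capturing system. Because it can be calculated accurately, calibration becomes both fast and accurate.
[0027]
Further, according to the configuration of the present invention, a luminous sphere is used as the calibration jig. The F matrix is estimated while the sphere center positions are detected and associated in real time. The error and positional relationship between the estimated F matrix and the corresponding points are then evaluated. From this evaluation the number of required corresponding points and the positions in three-dimensional space for the sphere are indicated. A more accurate F matrix can thus be calculated efficiently.
[0028]
Further, according to the configuration of the present invention, a multi-view video capturing system with a synchronization function observes and captures the sphere images. Camera calibration can run while these images are recorded on a recording medium such as the memory or hard disk of each personal computer. Image processing on, for example, a PC can carry out all of the following steps:
extracting the sphere center coordinates (feature points) in each frame image;
estimating the fundamental matrix by associating images using the extracted feature points;
evaluating the accuracy of the estimated F matrix;
determining and presenting the number and appropriate positions of corresponding points.
Fast and accurate calibration can therefore be performed on a simple system such as a PC.
[0029]
The computer program of the present invention can be provided to a general-purpose computer system capable of executing various program codes. It is provided in a computer-readable form, either on a storage medium such as a CD, FD, or MO or over a communication medium such as a network. Providing the program in a computer-readable form lets the computer system carry out the processing defined by the program.
[0030]
Other objects, features, and advantages of the present invention will become apparent from the more detailed description based on the embodiments described later and the accompanying drawings. In this specification, a system is a logical set of a plurality of devices, and the devices of each configuration are not necessarily in the same housing.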
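[0031]
[Embodiments of the Invention]
The camera calibration device, the camera calibration method, and the computer program of the present invention are described in detail below with reference to the drawings.
[0032]
First, with reference to FIG. 1, multi-view camera capture and the generation of virtual-viewpoint video from real video captured by the multi-view cameras are described. A plurality of cameras arranged around a subject 100 photograph it, each from a different viewpoint. Only as many real images are acquired as there are cameras; no real image exists for a position between cameras. Images between the cameras, that is, virtual-viewpoint images, are therefore generated by image processing based on the real images.
[0033]
For example, if no camera is placed between camera A 101 and camera B 102, the image between them is generated from their two real images 111 and 112. This can be done by view interpolation, a process that generates images that were never actually captured.
[0034]
View interpolation is a technique that generates, from images of a plurality of cameras, the image seen from a viewpoint where no actual camera exists. It is described, for example, in [S. M. Seitz and C. R. Dyer, "View Morphing," Proc. SIGGRAPH 96, ACM, 1996, pp. 21-30]. View interpolation makes it possible to generate an image for a viewpoint with no camera from images actually acquired by a plurality of cameras.
[0035]
Consider generating a virtual-viewpoint camera image 121 as if a virtual camera 113 between camera A 101 and camera B 102 had captured it. This requires obtaining the differences in characteristics and the relative positional relationship between camera A 101 and camera B 102. Various corrections based on these parameters must then be applied when generating the virtual-viewpoint image. In other words, the differences between the cameras that capture the real images must be accurately measured and compensated, which is camera calibration.
[0036]
As shown in FIG. 2, virtual-viewpoint video generation uses two kinds of correction based on calibration parameters. The first is distortion correction based on lens distortion parameters: a distortion coefficient, a distortion center, and an aspect ratio. The second is camera internal/external parameter correction. It uses internal parameters [A], consisting of scale factors, skew, and image center data, and external parameters [T, R], consisting of a translation [T] and a rotation [R]. For the distortion parameters, the cameras to be calibrated generally photograph a checker pattern, straight lines, or the like. The adjustment parameters between the cameras are then estimated from the captured images.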
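[0037]
The camera internal and external parameters [A, T, R] are expressed by a matrix representing the geometry parameters, which indicate the positional relationship between the real cameras. This matrix is called the fundamental matrix (F matrix) and is given by the following expression (Equation 1).
[0038]
[Equation 1]

[0039]
Several methods for obtaining the fundamental matrix between cameras given by the above expression have already been proposed. A general method is briefly described here.
[0040]
FIG. 3 shows a general method of estimating the F matrix parameters. First, as shown in FIG. 3(a), camera 1 (231) and camera 2 (232) photograph a calibration jig several times. The jig is an image pattern, for example a checker pattern, placed in three-dimensional space. Feature points are then extracted from the captured images I1 (241, the real image of camera 1) and I2 (242, the real image of camera 2). For example, the checker pattern intersections m11, m12, ... and m21, m22, ... are extracted as feature points.
[0041]
After feature point extraction, the F matrix can be estimated by associating the extracted feature points m1i and m2i (i = 1, 2, ..., N), as shown in FIG. 3(b) or the following expression.
[0042]
[Equation 2]

[0043]
In the above expression, the elements of the F matrix are obtained by calculating f as the eigenvector corresponding to the smallest eigenvalue of W^T W.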
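Equation 2 is not reproduced in this copy. This sketch assumes the common convention that each correspondence satisfies m2i^T F m1i = 0, with homogeneous coordinates m = (x, y, 1)^T. Stacking one such row per correspondence gives W f = 0 with f = (f11, f12, ..., f33)^T. The unit vector minimizing |W f| is then the eigenvector of W^T W with the smallest eigenvalue. The Python/NumPy sketch below shows that linear estimate. The function names are hypothetical. The coordinate normalization and rank-2 step are standard refinements of the eight-point method that the text does not mention.

```python
import numpy as np


def _normalize(points):
    """Translate points to zero mean and scale them to mean distance sqrt(2) (Hartley normalization)."""
    pts = np.asarray(points, dtype=float)
    centroid = pts.mean(axis=0)
    scale = np.sqrt(2.0) / np.mean(np.linalg.norm(pts - centroid, axis=1))
    T = np.array([[scale, 0.0, -scale * centroid[0]],
                  [0.0, scale, -scale * centroid[1]],
                  [0.0, 0.0, 1.0]])
    homog = np.column_stack([pts, np.ones(len(pts))])
    return (T @ homog.T).T, T


def estimate_fundamental(m1, m2):
    """Estimate F with m2_h^T F m1_h = 0 from N >= 8 matched points (two N x 2 arrays)."""
    if len(m1) < 8 or len(m1) != len(m2):
        raise ValueError("need at least 8 matched points")
    p1, T1 = _normalize(m1)
    p2, T2 = _normalize(m2)
    x1, y1 = p1[:, 0], p1[:, 1]
    x2, y2 = p2[:, 0], p2[:, 1]
    # One row of W per correspondence, so that W f = 0 with f = (f11, f12, ..., f33).
    W = np.column_stack([x2 * x1, x2 * y1, x2,
                         y2 * x1, y2 * y1, y2,
                         x1, y1, np.ones_like(x1)])
    # f = eigenvector of W^T W for the smallest eigenvalue (eigh sorts eigenvalues ascending).
    _, eigvecs = np.linalg.eigh(W.T @ W)
    F = eigvecs[:, 0].reshape(3, 3)
    # Enforce rank 2 (standard refinement, not described in the text).
    U, S, Vt = np.linalg.svd(F)
    S[2] = 0.0
    F = U @ np.diag(S) @ Vt
    # Undo the normalization and fix the arbitrary scale to unit Frobenius norm.
    F = T2.T @ F @ T1
    return F / np.linalg.norm(F)
```

Given two N x 2 arrays of matched sphere centers with N >= 8, `estimate_fundamental(m1, m2)` returns a unit-norm F. Its overall sign and scale are arbitrary, which is why only the ratios of the nine elements are meaningful.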
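[0044]
FIG. 4 shows a camera calibration procedure that uses a checker pattern as the calibration jig to estimate the F matrix.
[0045]
First, in step S101, camera 1 and camera 2 photograph the checker pattern serving as the calibration jig from different viewpoints.
[0046]
Next, in step S102, feature points m1(x, y) and m2(x, y) are extracted from the checker pattern images I1(x, y) and I2(x, y) captured by camera 1 and camera 2. In step S103, the extracted feature points m1(x, y) and m2(x, y) are associated with each other.
[0047]
Association is a matching process that puts into correspondence the pixels of several images of the same object. Commonly used association methods include pixel-based, area-based, and feature-based matching. Pixel-based matching searches the other image directly for the point corresponding to a point in one image. Area-based matching searches the other image using the local image pattern around the point. Feature-based matching extracts features such as intensity edges from the images and associates the images using only those features. These methods associate pixels on the basis of feature points extracted from a plurality of images.
[0048]
Finally, in step S104, the F matrix is calculated from the acquired corresponding points using the above expression (Equation 2). The extracted feature points m1i and m2i (i = 1, 2, ..., N) are associated, and f is calculated as the eigenvector corresponding to the smallest eigenvalue of W^T W. This gives the F matrix, that is, the fundamental matrix (= F parameters).
[0049]
The above method, however, has several problems. Feature point extraction accuracy drops when the camera does not squarely face the checker pattern plate. Depending on the camera arrangement, the cameras may be unable to observe the checker pattern simultaneously. In addition, associating feature points between images depends on the quality of the captured images and takes processing time.
[0050]
To solve these problems, the present invention uses a moving luminous sphere as the calibration tool instead of a planar checker pattern. Several cameras are installed to photograph the luminous sphere from different directions, and their videos are recorded synchronously. The fundamental matrix (= F parameters) is calculated from the recorded images, and camera calibration is performed. This configuration avoids the loss of extraction accuracy that comes from photographing a planar checker pattern from different directions. It also avoids the problem that the cameras cannot all observe the checker pattern at once. Camera calibration can therefore be performed efficiently and accurately.
[0051]
FIG. 5 shows a configuration example in which a moving luminous sphere is used as the calibration tool and photographed from different viewpoints. FIG. 6 shows a configuration example of a device that records the videos of the cameras synchronously.
[0052]
As shown in FIG. 5, the cameras 301-1 to 301-n to be calibrated are arranged around a luminous sphere 300 serving as the calibration tool. The luminous sphere 300 is moved while video is captured at, for example, 30 frames per second. The figure shows the images (one frame each) 302-1, 302-2, and 302-n acquired by camera 1 (301-1), camera 2 (301-2), and camera n (301-n).
[0053]
A luminous sphere as shown in FIG. 5 has four advantages:
(1) the sphere can be detected easily in images observed against any background and in any environment;
(2) cameras at any viewpoint position observe the sphere with the same (circular) shape;
(3) the center position of the sphere in an image can be estimated accurately;
(4) corresponding points between images are easy to process using sphere centers extracted from time-series images.
[0054]
Estimating the F matrix requires the values of the nine elements (f11, f12, ..., f33) that make up the F matrix in Equation 2 above. The F matrix is defined only up to scale, so this amounts to finding the ratios of the nine elements, that is, solving for eight unknowns. Relations for many corresponding points (at least eight) must therefore be set up.
[0055]
In the configuration of the present invention, the moving luminous sphere is photographed, and the sphere center is extracted as a feature point in each frame along the time axis. The feature points from these frames are used for association between images as if they came from a single captured image. For example, capturing at 30 frames per second for 5 seconds gives 30 x 5 = 150 frames per camera. From these, 150 feature points (sphere center positions) can be extracted.
[0056]
This feature point extraction and association is performed automatically by the synchronous recording device shown in FIG. 6. The device records the videos of multiple cameras and has a network synchronization mechanism.
[0057]
The cameras 301 (cameras 1 to n) photograph the luminous sphere 300 serving as the calibration tool from different viewpoints. Each receives the horizontal and vertical synchronization signals from the sync signal generator 320 through its external sync input terminal. As a result, the multi-view videos observed by cameras 1 to n are output as synchronized videos.
[0058]
Each camera outputs a video signal such as a Y/C signal, consisting of a luminance signal (Y) and a color-difference signal (C) derived from the RGB components. Each camera feeds this signal to its own A/D converter 330. The A/D converter 330 converts the video signal from the camera 301 into digital data and, if necessary, into compressed data such as Motion JPEG. It then outputs the processed data to the image monitor 360 and to the client PCs (PC1 to PCn) 350, which serve as image recording processing devices.
[0059]
The client PCs (PC1 to PCn) 350 serving as image recording processing devices each have a recording medium 356, such as a hard disk, DVD, or CD, for recording data.
[0060]
The video signal input from the A/D converter 330 to a client PC (PC1 to PCn) 350 is input to the video signal processing unit 355 in the client PC. The video signal processing unit 355 is, for example, an IEEE 1394 DV capture board.
[0061]
Each of PC1 to PCn, which make up the client PCs 350, records the DV video from its camera on its internal hard disk 356. It does so with DV capture software that runs on a general-purpose PC. The client PCs (PC1 to PCn) 350 and the server PC 370 are connected through a network 380 and through the network card 357 installed in each PC as its network interface. When the server PC 370 sends a recording start command to the client PCs, all client PCs 350 start recording at once.
[0062]
FIG. 7 shows a configuration example of the video signal processing unit 355 in the client PC 350, implemented, for example, as an IEEE 1394 DV capture board.
[0063]
The IEEE 1394 port (hereinafter, 1394 port) 394 and the LINK/PHY 395 take in the digital video (DV) signal processed by the A/D converter 330. The 1394 port 394 is the input port for the DV signal and passes it to the LINK/PHY 395. The LINK/PHY 395 then temporarily stores the DV image data in the RAM 392, which serves as a buffer.
[0064]
The RAM 392 functions as image data storage means with a buffer that holds the captured DV image data. Under the control of the CPU 391, the image data temporarily held in the RAM 392 is output through the storage I/F 396 to the storage means 356, such as an HDD or DVD, and stored there. As described above, the client PCs (PC1 to PCn) 350 and the server PC 370 are connected through the network cards 357 and the network 380. When the server PC 370 sends a recording start command, all client PCs 350 start recording at the same time, that is, they begin storing data in the storage means 356.
[0065]
The CPU 391 controls the entire device with respect to capturing and storing image data. It functions as image data control means, controlling the overall flow of DV image data according to a program stored in the ROM 393.
[0066]
The program stored in the ROM 393 is, for example, DV capture software that runs on a general-purpose PC. It is executed in response to the recording start command from the server PC, and the client PCs start recording simultaneously.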
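[0067]
As described above, the reference video signal (sync signal) from the sync signal generator 320 is input to the external sync terminals of the cameras 301-1 to 301-n. The video output of each camera is converted into a digital video signal by the A/D converter 330. It is then stored, through the capture board, on a recording medium such as the memory or hard disk built into the personal computer 350. The time each personal computer takes to actually start recording varies with its execution environment and other factors. The recorded camera videos are therefore usually not synchronized. To fix this, while each camera video is being saved, the number of the frame being recorded is reported to the server over the network as measurement point data. After recording, the camera images are synchronized using the measurement point data collected on the server.
[0068]
The synchronization adjustment performed by the server PC is described in detail with reference to FIG. 8. First, in step S301, the server PC reads the frame number that each client PC reported recording at each measurement point. This is, for example, the data shown in data example (a) of FIG. 8.
[0069]
Here it is assumed that the server PC has acquired, from the client PCs (PC1 to PCn), the recorded frame numbers at four measurement points (measurement points 1 to 4).
[0070]
FIG. 8(a) lists the recorded frame number of each PC at each measurement point, as input from the client PCs. For example, PC1 saved the following frame numbers in response to the "save current recording frame number" command from the server PC:
at measurement point 1, recorded frame = 6;
at measurement point 2, recorded frame = 10;
at measurement point 3, recorded frame = 14;
at measurement point 4, recorded frame = 19.
[0071]
Next, in step S302, the server PC chooses as the reference PC the PC with the largest processing delay. Here that is PC2, which is processing frame No. 3 at measurement point 1. The server PC then calculates the frame offset of each PC from the reference PC.
[0072]
This yields the frame offsets of the video recorded in each PC at each measurement point, shown in FIG. 8(b). In the example of FIG. 8, PC2 is the reference, so its offset is 0 at every measurement point. The offset of PC1, for example, is obtained at each measurement point as (recorded frame of PC1) - (recorded frame of PC2). The offsets of PC1 are therefore:
at measurement point 1, offset = 6 - 3 = 3;
at measurement point 2, offset = 10 - 7 = 3;
at measurement point 3, offset = 14 - 11 = 3;
at measurement point 4, offset = 19 - 15 = 4.
[0073]
For the other PCs, PC3 to PCn, the difference in recorded frame number from PC2 is likewise obtained as the offset.
[0074]
Next, in step S303, the offset of each PC from the reference PC (PC2) is determined. For example, the offsets of each PC are averaged over the measurement points, and the average is rounded to the nearest integer.
[0075]
FIG. 8(c) shows the average offset of each PC over the measurement points. The offset of PC1 relative to PC2 is
(3 + 3 + 3 + 4) / 4 = 3.25.
[0076]
FIG. 8(d) shows these averages rounded to integers.
The offset of PC1 relative to PC2 is 3, and
the offset of PCn relative to PC2 is 2.
In this way, the offset of each client PC from the reference PC (PC2) is determined.
[0077]
Next, in step S304, the server PC adjusts the synchronization of each client PC on the basis of the calculated offsets.
[0078]
The adjustment unifies the frame numbers processed by the client PCs at each of times 1 to 4. As shown in FIG. 8(e), the frames are set as follows:
at time 1, the recorded frame of all PCs = 3;
at time 2, the recorded frame of all PCs = 7;
at time 3, the recorded frame of all PCs = 11;
at time 4, the recorded frame of all PCs = 15.
[0079]
For example, the frame numbers of PC1 are determined to differ by 3 from those of the reference PC (PC2). Each PC1 frame number minus 3 is therefore set as the new frame number of PC1. Likewise, the frame numbers of PCn differ by 2 from those of the reference PC. Each PCn frame number minus 2 is therefore set as the new frame number of PCn at each time.
[0080]
Performing this synchronization adjustment for each client PC enables accurate frame-to-frame synchronization.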
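Steps S302 to S304 can be sketched in Python as follows, using the PC1 and PC2 frame numbers of FIG. 8 as data. The function names are hypothetical. Rounding is implemented as round-half-up to match the rounding described in the text, because Python's built-in `round` rounds halves to even.

```python
import math
from typing import Dict, List


def pick_reference(records: Dict[str, List[int]]) -> str:
    """S302: the most delayed PC is the one with the smallest frame number at point 1."""
    return min(records, key=lambda pc: records[pc][0])


def mean_offsets(records: Dict[str, List[int]], reference: str) -> Dict[str, float]:
    """S302-S303: mean over measurement points of (frame of PC) - (frame of reference PC)."""
    ref = records[reference]
    return {pc: sum(f - r for f, r in zip(frames, ref)) / len(ref)
            for pc, frames in records.items()}


def rounded_offsets(records: Dict[str, List[int]], reference: str) -> Dict[str, int]:
    """S303: round half up, unlike Python's round-half-to-even."""
    return {pc: math.floor(m + 0.5) for pc, m in mean_offsets(records, reference).items()}


def realign(frames: List[int], offset: int) -> List[int]:
    """S304: shift a PC's frame numbers onto the reference PC's numbering."""
    return [f - offset for f in frames]


records = {"PC1": [6, 10, 14, 19], "PC2": [3, 7, 11, 15]}
reference = pick_reference(records)             # "PC2"
offsets = rounded_offsets(records, reference)   # {"PC1": 3, "PC2": 0}
```

Averaging over several measurement points before rounding smooths out a single late report, such as the 4-frame difference at measurement point 4.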
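[0081]
The above processing allows the camera videos to be acquired in full synchronization. For each synchronized frame, the calibration device of the present invention associates the luminous sphere centers as feature points. It then obtains the F matrix according to the procedure described with reference to FIG. 4.
[0082]
FIG. 9 shows time-series images (t = 1, ..., k) of the luminous sphere observed by one camera (camera 1, 301-1). The sphere center is extracted automatically in every frame of all camera videos, and the centers are associated. This allows the F matrix to be estimated quickly and accurately.
[0083]
In the configuration of the present invention, the center point of the luminous sphere is the feature point used for association. It is obtained in each frame captured by each camera. As described above, the elements of the F matrix require at least eight feature point correspondences. Feature points are therefore extracted from at least eight frames captured by the cameras being calibrated. The determination of the sphere center, which carries out this feature point extraction, is described with reference to FIG. 10.
[0084]
The processing flow of FIG. 10 obtains the center pixel position of the sphere from one frame image. This determination of the sphere center is how feature points are extracted in the present invention. First, in step S501, a captured image frame of the luminous sphere is input.
[0085]
In step S502, the edge of the sphere is detected in the input image. If any edge pixel touches one of the four sides of the frame, part of the sphere is judged to be missing, and no feature point is extracted for that frame (time). This determination is step S503.
[0086]
If the entire luminous sphere is observed in the input frame (step S503: Yes), step S504 determines the initial center position of the sphere by projecting the edge pixel coordinates. Specifically, the differences between the x and y coordinates of the pixels on the left and right edges of the sphere are projected onto the X and Y axes. This gives distributions over X and Y. The positions of their maxima are taken as the initial center (xs, ys) and the size (radius) of the sphere.
[0087]
In step S505, the estimated initial center and size are fitted to a circle model, and edge pixels that deviate greatly from the model are removed as outliers. Finally, in step S506, the remaining edge pixels are fitted to the circle model to determine the center position (and size) of the sphere.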
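Below is a minimal sketch of steps S503 to S506 in Python with NumPy. It assumes the edge pixels of the sphere have already been detected, because the text does not specify the edge detector. It also makes two substitutions, which are assumptions. First, the initial circle of step S504 comes from an algebraic least-squares (Kasa) circle fit, not the projection method. Second, the outlier tolerance `outlier_tol` (in pixels) is a hypothetical parameter.

```python
import numpy as np


def fit_circle(points):
    """Algebraic least-squares (Kasa) fit of x^2 + y^2 + D x + E y + F = 0; returns (cx, cy, r)."""
    pts = np.asarray(points, dtype=float)
    x, y = pts[:, 0], pts[:, 1]
    A = np.column_stack([x, y, np.ones_like(x)])
    b = -(x ** 2 + y ** 2)
    (D, E, F), *_ = np.linalg.lstsq(A, b, rcond=None)
    cx, cy = -D / 2.0, -E / 2.0
    return cx, cy, float(np.sqrt(cx ** 2 + cy ** 2 - F))


def sphere_center(edge_points, width, height, outlier_tol=2.0):
    """Return (cx, cy, r) of the sphere image, or None if the sphere touches the frame border."""
    pts = np.asarray(edge_points, dtype=float)
    if len(pts) < 3:
        return None
    xs, ys = pts[:, 0], pts[:, 1]
    # S503: an edge pixel on any of the four frame sides means part of the sphere is cut off.
    if xs.min() <= 0 or ys.min() <= 0 or xs.max() >= width - 1 or ys.max() >= height - 1:
        return None
    # S504 (substitute): initial circle from all edge pixels.
    cx, cy, r = fit_circle(pts)
    # S505: drop edge pixels far from the initial circle as outliers.
    resid = np.abs(np.hypot(xs - cx, ys - cy) - r)
    inliers = pts[resid <= outlier_tol]
    if len(inliers) < 3:
        return None
    # S506: refit the circle model on the remaining edge pixels.
    return fit_circle(inliers)
```

The algebraic fit is chosen here because it needs only one linear least-squares solve. A geometric (distance-minimizing) fit could replace it in step S506 if higher accuracy is needed.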
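[0088]
FIG. 11 shows an example in which the sphere center positions are detected in time-series images observed by adjacent cameras and treated as if they were feature points extracted from a single image.
[0089]
Suppose an F matrix is needed to calibrate camera 1 (501) against camera 2 (502), which photograph a luminous sphere 500 from different directions. As described above, several feature points (at least eight) must then be extracted from several frames.
[0090]
In the configuration of the present invention, the feature point is the center position of the luminous sphere in the captured image, extracted in each frame. The frames t = 1 to k of camera 1 (501) and camera 2 (502) have undergone the synchronization adjustment described with reference to FIG. 6. Corresponding frames are therefore synchronized images. Running the sphere center determination of FIG. 10 on the k frames t = 1 to k yields k feature points per camera. As noted above, however, the center cannot be obtained when the sphere extends beyond the frame, so at most k feature points are obtained.
[0091]
As described above, obtaining and associating at least eight feature points determines the nine-element F matrix. At least eight frames in which the sphere center can be obtained are therefore selected from the synchronized frames of the two cameras to be calibrated. Their feature points (sphere center positions) are obtained and combined, which yields the feature point distribution data 511 and 512 shown in FIG. 11.
[0092]
The feature point distribution data 511 shows the distribution of the feature points (sphere center positions) in frames t = 1, 2, ..., k captured by camera 1 (501). p1(1) is the sphere center position in frame t = 1, p1(2) is that in frame t = 2, and p1(k) is that in frame t = k.
[0093]
The feature point distribution data 512 shows the distribution of the feature points (sphere center positions) in frames t = 1, 2, ..., k captured by camera 2 (502). p2(1) is the sphere center position in frame t = 1, p2(2) is that in frame t = 2, and p2(k) is that in frame t = k.
[0094]
The corresponding feature points of the distribution data 511 and 512 are p1(1) and p2(1), p1(2) and p2(2), ..., and p1(k) and p2(k). They are associated according to the flow of FIG. 4 described earlier, and the F matrix is obtained by applying the above expression (Equation 2).
[0095]
FIG. 12 is a flowchart of the procedure for obtaining the sphere center position, as a feature point, from each frame of a multi-frame time-series image. Each step of the flowchart is described below. Here, m is the number of the camera subject to feature point extraction, and k is the number of frames captured by camera m.
[0096]
In step S701, the images Im(t) (t = 1, 2, ..., k) observed (captured) by the m-th camera (m = 1, 2, ..., N) are input to the calibration processing device.
[0097]
In step S702, t = 1 is set as the initial frame so that the frames are processed in order. In step S703, the camera image Im(t) is read. The sphere in that frame is then detected and its center position estimated by the procedure of FIG. 10.
[0098]
If the processed image shows no sphere, or does not contain the entire sphere (step S704: No), the sphere center position is not estimated, as described above. Instead, in step S712, Pm(t) = Null is stored in the memory of the calibration processing device as the frame's data, indicating that no center position was obtained.
[0099]
If the entire sphere is detected in the image, its center position is estimated by the method of FIG. 10 described above. In step S705, the result is stored in memory as the center position data Pm(t) for that frame.
[0100]
Here, Pm(t) denotes the sphere center position in frame t (t = 1, ..., k) of the m-th camera's video. Step S706 checks whether all frames have been processed, that is, whether t < k still holds. If unprocessed frames remain, step S711 updates t to t + 1, and the estimation and storage from step S703 onward are repeated.
[0101]
The processing of FIG. 12 is executed for all N cameras. This gives the center positions Pm(t) of the ball (luminous sphere) over many frames from all camera videos.
[0102]
The cameras to be calibrated may be adjacent or far apart. Their frames, excluding any in which no center position was obtained, are used to build the sets of feature point distribution data shown in FIG. 11. The feature points of corresponding frames in this data are then associated, and the F matrix is calculated. This calculation uses the same method described with reference to FIGS. 3 and 4. Once the F matrix is determined, it can be used to correct the images captured by each camera. Accurate virtual-viewpoint camera images can then be generated from the images the cameras actually captured.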
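The sketch below shows the per-frame loop of FIG. 12 and the pairing of synchronized frames described above. `detect` stands for any single-frame detector that returns the sphere center or `None`, for example one following FIG. 10. `None` plays the role of Pm(t) = Null. All names are hypothetical.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Point = Tuple[float, float]


def centers_per_frame(frames: Sequence, detect: Callable[[object], Optional[Point]]) -> List[Optional[Point]]:
    """S702-S712: run the detector on frames t = 1..k; None marks a frame with no usable center."""
    return [detect(frame) for frame in frames]


def build_correspondences(p1: Sequence[Optional[Point]],
                          p2: Sequence[Optional[Point]]) -> Tuple[List[Point], List[Point]]:
    """Pair synchronized frames, skipping any frame where either camera has no center."""
    m1: List[Point] = []
    m2: List[Point] = []
    for c1, c2 in zip(p1, p2):
        if c1 is not None and c2 is not None:
            m1.append(c1)
            m2.append(c2)
    return m1, m2


def has_enough(m1: Sequence[Point], minimum: int = 8) -> bool:
    """S805: at least eight correspondences are needed for the F matrix."""
    return len(m1) >= minimum
```

A pair is usable only when both cameras saw the whole sphere in that frame. That is why the two lists are filtered together rather than separately.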
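[0103]
FIG. 13 is a block diagram of the functional configuration of the calibration processing device of the present invention. The device's processing can be carried out under a computer program, specifically under the control of a CPU serving as the control means. FIG. 13 shows the processes executed by the CPU as separate blocks, and thus depicts the functions of the device. A specific hardware configuration is described later.
[0104]
The block diagram of FIG. 13 is described. The image input unit 601 inputs image data of the photographed luminous sphere. Calibration is often meant to adjust images between two cameras. In that case, frames from two cameras that photographed the sphere from different viewpoints are input. These frames are acquired with the configuration described with reference to FIG. 6, and the two cameras' frames are synchronized.
[0105]
The feature point extraction unit 602 obtains the center of the luminous sphere in each frame as a feature point, following the procedure of FIGS. 10 to 12. The association processing unit 603 then associates the feature points. It works from the feature point distribution data built from each camera's images, for example the data 511 and 512 shown in FIG. 11. This association is performed by, for example, pixel-based, area-based, or feature-based matching.
[0106]
After association, the F matrix calculation unit 604 calculates the F matrix from the correspondence data. It follows the processing described with reference to FIGS. 3 and 4, that is, the above expression (Equation 2).
[0107]
The obtained F matrix is output to the virtual-viewpoint image generation unit 605. This unit calibrates, that is, corrects, the images of the two cameras on the basis of the F matrix. From the corrected images, it generates images of a virtual camera between the two cameras, images that were never actually captured.
[0108]
The calculated F matrix may also be used to adjust the cameras themselves. Whether to correct the acquired images or adjust the cameras is a matter of choice.
[0109]
As described above, several cameras with different viewpoints can capture multi-view video of the same object. Generating and presenting a virtual-viewpoint video between them, as if a virtual camera had captured it, requires the positional relationship between the real cameras to be obtained accurately. The fundamental matrix expresses that relationship and gives the accurate positional relationship between the two cameras. The captured images can then be corrected and composited accurately on that basis. This makes it possible to generate accurate images for a virtual camera between the two cameras, images that were never actually captured.
[0110]
Next, the procedure by which the calibration processing device of the present invention calculates the optimal F matrix is described with reference to the flowchart of FIG. 14.
[0111]
In step S801, the luminous sphere is placed where all cameras to be calibrated can observe it, for example where their optical axes intersect, and is moved around. In step S802, each camera photographs the moving luminous sphere. The frames of each camera are acquired as synchronized frames, as described with reference to FIGS. 5 to 8.
[0112]
In step S803, the sphere center position is detected as a feature point in the images captured by each camera, according to the procedure described with reference to FIGS. 9 and 10.
[0113]
In step S804, the detected sphere center positions serve as the feature points of each camera's images. The images of the cameras to be calibrated are then associated according to the procedure described with reference to FIGS. 11 and 12.
[0114]
Step S805 checks whether eight corresponding points have been obtained. As described above, the ratios of the nine elements of the F matrix require eight feature point correspondences. If there are fewer than eight, the process returns to step S801. The luminous sphere is moved to a new position, and the cameras to be calibrated capture more images.
[0115]
Once eight or more corresponding points are available, the process proceeds to step S806. The F matrix is calculated from the correspondences according to Equation 2, as described with reference to FIGS. 2 to 4. In one capture session the cameras record, for example, at 30 frames per second for several seconds. This yields 100 or more frames, from which feature points are extracted.
[0116]
The feature points to be associated are therefore selected at random from the 100 or more available. The selected points are associated, and the F matrix is calculated from them. Feature point association, however, relies only on correlation between images, so it often goes wrong. Such errors prevent an accurate F matrix from being obtained. Wrongly associated feature point pairs must therefore be removed before an accurate F matrix can be obtained.
[0117]
The processing from step S807 onward removes feature points judged unsuitable for calculating the F matrix, such as wrongly associated ones.
[0118]
In step S807, epipolar lines are calculated using the calculated F matrix. In step S808, the distances (errors) are calculated between the epipolar lines and the corresponding points used to estimate the F matrix. If the error exceeds a predetermined threshold, the sphere is moved, its center is detected, and new corresponding points are obtained.
[0119]
Steps S807 and S808 are described with reference to FIGS. 15 and 16. They generate the epipolar lines and calculate the distances (errors) between those lines and the corresponding points used to estimate the F matrix. Generation of epipolar lines is described first. An epipolar line is determined by the positional relationship between two cameras and is commonly derived in disparity detection by the stereo method.
[0120]
The principle of the stereo method is briefly described. Two or more cameras photograph the same object from two or more viewpoints (different line-of-sight directions). The stereo method associates pixels across the resulting images to find the position of the measured object in three-dimensional space. For example, a reference camera and a detection camera photograph the same object from different viewpoints. The distance to the object is then measured by triangulation.
[0121]
FIG. 15 illustrates the principle of the stereo method. The reference camera (Camera 1) and the detection camera (Camera 2) photograph the same object from different viewpoints. Consider finding the depth of the point "mb" in the image captured by the reference camera.
[0122]
The detection camera photographs the same object from a different viewpoint. In its image, the object seen at point "mb" by the reference camera appears somewhere on a straight line, for example at "m1", "m2", or "m3". This straight line is called the epipolar line Lp.
[0123]
The point "mb" of the reference camera thus appears on the "epipolar line" in the detection camera's image. Consider a point P on the straight line containing P1, P2, and P3. As long as P lies on the reference camera's line of sight, it appears at the same observation point "mb" in the reference image, whatever its depth, that is, its distance from the reference camera. In the detection camera's image, by contrast, P appears on the epipolar line at a position that depends on its distance from the reference camera.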
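[0124]
In the configuration of the present invention, feature points are extracted from the images captured by the two cameras to be calibrated. In step S806, the F matrix is calculated from these points. The F matrix expresses the positional relationship between the two cameras, and epipolar lines can be set from it.
[0125]
Setting the epipolar lines is described with reference to FIG. 16. FIGS. 16(a) and 16(b) show together the feature points (sphere centers) obtained in several frames captured by the cameras to be calibrated. They correspond to the feature point distribution data 511 and 512 of FIG. 11. FIG. 16(b) also shows epipolar lines L1, L2, ..., Li set from the F matrix obtained in step S806 of FIG. 14. These lines are set from, for example, the eight feature points associated to calculate the F matrix.
[0126]
Consider an F matrix estimated from the corresponding points m1i (i = 1, ..., P) and m2i (i = 1, ..., P) in the images of cameras 1 and 2. For each feature point (corresponding point) m1i in the camera 1 image, Equations 3 and 4 below give the epipolar line Li in the camera 2 image. Similarly, each feature point in the camera 2 image has an epipolar line in the camera 1 image.
[0127]
[Equation 3]

[0128]
Based on the above expression (Equation 3), one straight line (epipolar line) given by the following expression (Equation 4) is set.
[0129]
[Equation 4]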
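Equations 3 and 4 are not reproduced in this copy. This sketch assumes the same convention as the usual epipolar constraint m2^T F m1 = 0. The epipolar line of m1i in the camera 2 image is then li = (ai, bi, ci)^T = F m1i, that is, the line ai x + bi y + ci = 0. The distance Di used later is the perpendicular distance |ai x2i + bi y2i + ci| / sqrt(ai^2 + bi^2). The NumPy sketch computes both. The function names are hypothetical.

```python
import numpy as np


def epipolar_lines(F, points, transpose=False):
    """Lines (a, b, c) in the other image: F m for camera-1 points, F^T m for camera-2 points."""
    pts = np.asarray(points, dtype=float)
    homog = np.column_stack([pts, np.ones(len(pts))])
    M = F.T if transpose else F
    return (M @ homog.T).T


def point_line_distances(lines, points):
    """Perpendicular distance D_i from point i to line i (a x + b y + c = 0)."""
    pts = np.asarray(points, dtype=float)
    a, b, c = lines[:, 0], lines[:, 1], lines[:, 2]
    return np.abs(a * pts[:, 0] + b * pts[:, 1] + c) / np.hypot(a, b)
```

For example, a pure horizontal translation with identity intrinsics gives F = [[0, 0, 0], [0, 0, -1], [0, 1, 0]]. Every epipolar line is then horizontal, so D_i is simply the vertical offset between m2i and the row of m1i.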

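[0130]
The feature points used to calculate the F matrix, however, were selected at random from the many feature points in the many frames captured by the two cameras. If the F matrix is accurate, the feature points lie on the epipolar lines set from it. If feature points were not extracted or associated accurately, they lie off those lines. A large deviation means the association is inaccurate, and so is the F matrix.
[0131]
The accuracy of the F matrix is measured by the distance Di between each feature point and its epipolar line. As shown in FIG. 16, the distances D1, D2, ..., Di are obtained between the epipolar lines L1, L2, ..., Li set from the F matrix and the feature points m21, m22, ..., m2i. As corresponding points, these feature points should lie on those lines.
[0132]
In step S809, each calculated distance D1, D2, ..., Di is compared with a predetermined threshold. Any feature point m2i whose distance exceeds the threshold is handled in step S810. There, its feature point pair, that is, the pair m1i, m2i of camera 1 and camera 2, is removed from the association targets. Then, in step S806, the F matrix is recalculated from the association data of the remaining feature points.
[0133]
If all the distances D1, D2, ..., Di are at or below the threshold in step S809, step S811 calculates their average and compares it with a predetermined second threshold. If the average is equal to or larger than the second threshold, the process returns to step S806. The F matrix is then recalculated from the association data of other available feature points.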
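[0134]
If the average of D1, D2, ..., Di is below the second threshold in step S811, the process proceeds to step S812, which checks the distribution of the feature points. This check is described with reference to FIG. 17. If the feature points to be associated are concentrated in one part of the frame, the exact positional relationship between the cameras is hard to obtain. An accurate F matrix is then hard to calculate. A more accurate F matrix needs feature points spread over the whole frame. The feature point distribution is therefore checked. If it is uneven, new feature points are set at distant positions and the F matrix is calculated again.
[0135]
As shown in FIG. 17, the coordinate distribution (the mean of x and y) is obtained for each camera by the following expression (Equation 5). It uses the coordinates of the corresponding points m1i (i = 1, ..., P) in the camera 1 image and m2i (i = 1, ..., P) in the camera 2 image.
[0136]
[Equation 5]

[0137]
If either camera's coordinate distribution lies far from its image center, (xc1, yc1) or (xc2, yc2), the feature points are judged to be unevenly distributed, and new feature points are added. Specifically, the distance between each camera's mean and its image center is compared with a predetermined third threshold. If the distance exceeds the threshold, the distribution is judged to be biased. The process then proceeds from step S813 to step S814, where the required feature point positions are set. These additional feature points are placed where they reduce the bias of the coordinate distribution given by Equation 5, for example at the points m1k and m2k shown in FIG. 17.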
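Equation 5 is not reproduced in this copy. This sketch reads it, as the text suggests, as the mean x and mean y of the corresponding points in each image. On that reading, the bias test of steps S812 to S814 can be sketched in Python as below. The function names and the `thresh3` value are hypothetical. The suggested target, the mean reflected through the image center, is only one simple choice of a position that reduces the bias. It is not the rule given in the text.

```python
import math
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float]


def mean_position(points: Sequence[Point]) -> Point:
    """Equation 5 as read here: the mean x and mean y of the corresponding points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def suggest_extra_point(points: Sequence[Point], center: Point, thresh3: float) -> Optional[Point]:
    """Return None if the points are balanced, else a hypothetical target for a new sphere image."""
    mx, my = mean_position(points)
    cx, cy = center
    if math.hypot(mx - cx, my - cy) <= thresh3:
        return None                          # S813: no bias, keep the current F matrix
    return (2 * cx - mx, 2 * cy - my)        # S814: position that pulls the mean back toward the center


# Four points clustered in the upper-left of a 640 x 480 image.
cluster = [(100.0, 100.0), (110.0, 100.0), (100.0, 110.0), (110.0, 110.0)]
target = suggest_extra_point(cluster, (320.0, 240.0), 50.0)   # (535.0, 375.0)
```

In practice the operator would move the sphere until its image appears near the suggested target in both cameras (step S815).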
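[0138]
In step S815, the sphere is moved to a position where the indicated feature points can be acquired. The process then returns to step S802. The cameras photograph the sphere again, and feature point association and F matrix calculation are repeated with the additional feature points included.
[0139]
If step S813 finds no bias in the feature points, the obtained F matrix becomes the final F matrix and processing ends.
[0140]
In this way, the following sequence is repeated until the evaluated error falls below a threshold: image observation → sphere detection → sphere center estimation → association → F matrix estimation → epipolar line calculation → calculation of the errors between corresponding points and epipolar lines → error evaluation. Once the error falls below the threshold, disparity information between the images is used to check whether the corresponding points are biased in position, that is, whether the sphere's spatial positions are biased. If they are, the spatial position where the sphere should be placed is presented, and the iteration is repeated. As a result, accurate camera calibration is possible even in on-site shooting where the lighting environment and the like are unstable.
[0141]
As described above, the procedure of FIG. 14 associates the sphere center positions obtained from the sphere images captured by the cameras to be calibrated, and calculates the F matrix. Within this procedure, errors in choosing the feature points to associate, or in the association itself, can be corrected. When feature points are unevenly distributed, the distribution is checked and feature points are added where they remove the bias. A new F matrix is then calculated, so a more accurate F matrix can be obtained. This makes possible camera calibration based on an accurate F matrix, correction of the acquired images, and accurate virtual-viewpoint images generated from the corrected images.
[0142]
Next, a specific hardware configuration example of the calibration processing device according to the present invention is described with reference to FIG. 18. The CPU (Central Processing Unit) 901 is a processor that executes an OS (Operating System) and the processing programs described with reference to the flowcharts above. The ROM (Read-Only Memory) 902 stores programs executed by the CPU 901 and fixed data used as calculation parameters. The RAM (Random Access Memory) 903 serves as storage and work area for the programs the CPU 901 runs and for parameters that change as they run. The HDD 904 controls the hard disk and stores various data and programs on it and reads them back.
[0143]
The bus 910 is a PCI (Peripheral Component Interconnect) bus or the like. It enables data transfer between the modules, and with the input/output devices through the input/output interface 911.
[0144]
The input unit 905 consists of an image data input unit, a keyboard, a pointing device, and the like. It inputs image data acquired by the cameras to be calibrated, as well as various commands and data for the CPU 901. The output unit 906 is a display such as a CRT or liquid crystal display. It shows captured images, images from feature point extraction, and, once the F matrix is calculated, virtual-viewpoint images generated from the camera images.
[0145]
The communication unit 907 communicates with other devices. For example, it inputs the videos of the multiple cameras captured by the synchronous acquisition system shown in FIG. 6. On the basis of the input images, the F matrix calculation, virtual-viewpoint image generation, and other processing run under the control of the CPU 901, which serves as the control unit. The image data to be processed need not come through the communication unit. It may also come through an A/V input unit in the input unit 905. Alternatively, image data stored on a removable recording medium 909 connected to the drive 908, such as an HDD, CD, or DVD, may be processed.
[0146]
The drive 908 records to and reproduces from a removable recording medium 909. Examples are a flexible disk, a CD-ROM (Compact Disc Read Only Memory), an MO (Magneto-optical) disk, a DVD (Digital Versatile Disc), a magnetic disk, and a semiconductor memory. The drive reads programs or data from each removable recording medium 909 and stores programs or data on it.
[0147]
The present invention has been described in detail with reference to specific embodiments. It is obvious, however, that those skilled in the art can modify or substitute the embodiments without departing from the gist of the present invention. The present invention has been disclosed by way of example and should not be interpreted restrictively. To determine the gist of the present invention, refer to the claims set out at the beginning.
[0148]
The series of processes described in the specification can be executed by hardware, software, or a combination of both. For software execution, a program recording the processing sequence can be installed in the memory of a computer built into dedicated hardware and run there. It can also be installed and run on a general-purpose computer capable of various kinds of processing.
[0149]
For example, the program can be recorded in advance on a hard disk or ROM (Read Only Memory) as a recording medium. It can also be stored (recorded), temporarily or permanently, on a removable recording medium. Examples are a flexible disk, a CD-ROM (Compact Disc Read Only Memory), an MO (Magneto-optical) disk, a DVD (Digital Versatile Disc), a magnetic disk, and a semiconductor memory. Such a removable recording medium can be provided as so-called packaged software.
[0150]
Besides being installed from a removable recording medium as described above, the program can be transferred to the computer in other ways. It can be sent wirelessly from a download site, or by wire over a network such as a LAN (Local Area Network) or the Internet. The computer receives the transferred program and installs it on a built-in recording medium such as a hard disk.
[0151]
The various processes described in the specification need not run in the described order. They may also run in parallel or individually, depending on the processing capability of the executing device or as needed.
[0152]
[Effects of the Invention]
As described above, according to the configuration of the present invention, each camera photographs a luminous sphere during calibration, that is, during the calculation of parameters for adjusting the multi-view image capturing cameras or correcting their images. The sphere can therefore be detected and its center position obtained easily, even at shooting sites with unstable lighting. Associating the sphere center positions as feature points then allows the F matrix to be calculated accurately. The fundamental matrix serves as the geometry parameters expressing the positional relationship between the real cameras of a multi-view video capturing system. Because it can be calculated accurately, calibration becomes both fast and accurate.
[0153]
Further, according to the configuration of the present invention, a luminous sphere, which is hardly affected by the lighting environment and the like, is used as the calibration jig. The F matrix is estimated while the sphere center positions are detected and associated in real time. The error and positional relationship between the estimated F matrix and the corresponding points are then evaluated. From this evaluation the number of required corresponding points and the positions in three-dimensional space for the sphere are indicated. A more accurate F matrix can thus be calculated efficiently.
[0154]
Further, according to the configuration of the present invention, a multi-view video capturing system with a synchronization function observes and captures the sphere images. Camera calibration can run while these images are recorded on a recording medium such as the memory or hard disk of each personal computer. Image processing on, for example, a PC can carry out all of the following steps:
extracting the sphere center coordinates (feature points) in each frame image;
estimating the fundamental matrix by associating images using the extracted feature points;
evaluating the accuracy of the estimated F matrix;
determining and presenting the number and appropriate positions of corresponding points.
Fast and accurate calibration can therefore be performed on a simple system such as a PC.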
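[Brief Description of the Drawings]
[FIG. 1] A diagram explaining multi-view camera capture and the generation of virtual-viewpoint video using real video captured by multi-view cameras.
[FIG. 2] A diagram explaining the various parameters applied in calibration processing.
[FIG. 3] A diagram showing a general F parameter estimation method.
[FIG. 4] A diagram showing a camera calibration procedure that uses a checker pattern as the calibration jig to estimate the F matrix.
[FIG. 5] A diagram showing a configuration example in which a luminous sphere is used as the calibration tool and photographed from different viewpoints.
[FIG. 6] A diagram explaining a configuration for synchronously recording multiple camera videos with a network synchronization mechanism.
[FIG. 7] A diagram explaining a configuration example of a video signal processing unit consisting of a DV capture board.
[FIG. 8] A diagram explaining details of the synchronization adjustment executed by the server PC.
[FIG. 9] A diagram explaining time-series images (t = 1, ..., k) of the luminous sphere observed by one camera.
[FIG. 10] A diagram explaining the procedure for determining the center position of the luminous sphere, executed as feature point extraction.
[FIG. 11] A diagram explaining how sphere center positions detected in time-series images from adjacent cameras are treated as feature points extracted from a single image.
[FIG. 12] A flowchart of the procedure for obtaining the sphere center position, as a feature point, from each frame of a multi-frame time-series image.
[FIG. 13] A block diagram explaining the functional configuration of the calibration processing device of the present invention.
[FIG. 14] A diagram explaining the procedure by which the calibration processing device of the present invention calculates the optimal F matrix.
[FIG. 15] A diagram explaining epipolar lines.
[FIG. 16] A diagram explaining the generation of epipolar lines and the calculation of the distances (errors) between those lines and the corresponding points used for F matrix estimation.
[FIG. 17] A diagram explaining the feature point distribution determination.
[FIG. 18] A diagram explaining a specific hardware configuration example of the calibration processing device according to the present invention.
[Description of Reference Numerals]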
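100 Subject
101, 102 Camera
111, 112 Real image
113 Virtual-viewpoint camera
121 Virtual-viewpoint camera image
301 Camera
302 Real image
320 Sync signal generator
330 A/D converter
350 Client PC
355 Video signal processing unit
356 Storage means
357 Network card
360 Image monitor
370 Server PC
380 Network
391 CPU
392 RAM
393 ROM
394 1394 port
395 LINK/PHY
396 Storage I/F
500 Luminous sphere
501, 502 Camera
511, 512 Feature point distribution data
601 Image input unit
602 Feature point extraction unit
603 Association processing unit
604 F matrix calculation unit
605 Virtual-viewpoint image generation unit
901 CPU
902 ROM
903 RAM
904 HDD
905 Input unit
906 Output unit
907 Communication unit
908 Drive
909 Removable recording medium
910 Bus
911 Input/output interface
[0001]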
TECHNICAL FIELD OF THE INVENTION
The present invention relates to a calibration processing device, a calibration processing method, and a computer program. More specifically, the device, method, and program calculate parameters for one of two uses. The first is adjusting multi-view image capturing cameras, which generate multi-view images by photographing a subject with a plurality of cameras arranged at different positions. The second is correcting the images acquired by such cameras.
[0002]
[Prior art]
In recent years, image data in which the viewpoint can be moved freely, such as panoramic and omnidirectional images, has come into wide use. For example, systems have been realized in which images of a subject captured from a plurality of viewpoint positions and line-of-sight directions are stored on a storage medium such as a DVD or CD, and, when the stored images are displayed on a CRT or liquid crystal display device, the user operates a controller to move the viewpoint to any position and observe the subject. Display systems have also been built in which images of a subject captured from a plurality of viewpoint positions and line-of-sight directions are distributed via a communication network such as the Internet, and a user operates a mouse on a PC or the like to display the image for a desired viewpoint position and line-of-sight direction.
[0003]
With the improvement of computer processing power and the development of various video media playback devices, it has become possible to process huge amounts of polygon data and video data (content) that were previously difficult to handle. As a result, computers and video devices can now process real-world (target) video data captured by multiple cameras with different viewpoints and generate and present, in real time, video from an arbitrary viewpoint at the user's request.
[0004]
To generate and present arbitrary-viewpoint video by computer processing of images captured from a plurality of viewpoints, (1) all cameras must be able to see the same area (target), and (2) the positional relationship between the cameras must be obtainable.
[0005]
From multi-view video captured by a plurality of cameras with different viewpoints aimed at the same object, image processing can generate virtual viewpoint video as if taken by a virtual camera at an arbitrary position between the real cameras. To generate highly accurate virtual viewpoint video, however, the positional relationship between the real cameras must be determined accurately. The parameters indicating this positional relationship are called geometry parameters, and they are expressed as a matrix called the fundamental matrix, or F matrix.
[0006]
Many methods have been proposed for finding the fundamental matrix that represents the positional relationship between the real cameras. In one known method, several (adjacent) cameras simultaneously observe a checker pattern in three-dimensional space, feature points (for example, the intersections of the black and white squares) are extracted from the checker pattern images and associated with each other, and the fundamental matrix is estimated (see, for example, Non-Patent Document 1).
[0007]
However, (1) at an actual shooting site, corresponding points cannot always be determined accurately from the captured checker pattern images because of illumination light (for example, reflections), so the positional relationship between the cameras (expressed as the F matrix) is not easy to estimate. (2) In addition, when many cameras with different viewpoints observe the same checker pattern plate, the cameras often do not face the plate directly, which makes it difficult to detect feature point positions in the checker pattern images accurately.
[0008]
(3) Further, when many cameras are arranged to surround the same object (subject), the photographed checker pattern is deformed. Moreover, with a planar checker pattern, cameras arranged to surround the subject through 360 degrees cannot all photograph the same checker pattern at the same time. The multi-view cameras must then be divided into several groups and the checker pattern photographed several times, which takes considerable labor and time. (4) Further, camera calibration lacks a clear criterion for determining whether enough checker pattern images have been recorded to estimate the F matrix and whether the spatial positions of the checker pattern are appropriate, so the accuracy of the resulting F matrix is not guaranteed.
[0009]
[Non-patent document 1]
R. Y. Tsai: An Efficient and Accurate Camera Calibration Technique for 3D Machine Vision, Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 364-374 (1986)
[0010]
[Problems to be solved by the invention]
The present invention has been made in view of the above problems. An object of the present invention is to provide a camera calibration device, a camera calibration method, and a computer program that calculate, with high accuracy, the geometry parameters indicating the positional relationship between the real cameras of a multi-view video capture system, thereby enabling fast and accurate camera calibration.
[0011]
The present invention uses as a calibration jig a light-emitting sphere, which is less susceptible to the lighting environment and similar conditions, and estimates the F matrix while detecting and associating the sphere center position in real time. By evaluating the error and positional relationship between the estimated F matrix and the corresponding points, it indicates the number of necessary corresponding points and the positions at which the sphere should be placed in three-dimensional space, so that a more accurate F matrix is estimated automatically.
[0012]
Since the shape of a sphere does not depend on the viewpoint from which it is observed, it can be observed by all cameras at various viewpoint positions. In addition, camera calibration can be performed while sphere images are observed and photographed by a multi-viewpoint video capture system with a synchronization function and recorded on a recording medium such as the memory or hard disk of each personal computer. Image processing on each personal computer extracts the coordinate position (feature point) of the sphere center in each frame image, and the fundamental matrix is estimated by associating the extracted feature points between images. The accuracy of the estimated F matrix is then evaluated, and the number of corresponding points and their appropriate positions, that is, the spatial positions of the sphere, are presented automatically. The present invention thus provides an apparatus and a method that realize this objective estimation.
[0013]
[Means for Solving the Problems]
According to a first aspect of the present invention,
a calibration processing device that performs parameter calculation applied to adjustment between multi-view image capturing cameras or to correction of images acquired by the multi-view image capturing cameras, the device comprising:
an image input unit that inputs video data from a plurality of cameras photographing a moving sphere from different viewpoint directions;
a feature point extraction unit that extracts the sphere center position as a feature point from each of the captured image frames constituting the video data of the plurality of cameras input by the image input unit;
an association processing unit that associates the sphere center positions extracted as feature points by the feature point extraction unit across corresponding frames of the cameras; and
an F matrix calculation unit that calculates a fundamental matrix as a calibration parameter based on the feature point correspondence data associated by the association processing unit.
[0014]
Further, in one embodiment of the calibration processing device of the present invention, the sphere is a light-emitting sphere, and the feature point extraction unit calculates the sphere center position based on edge information of the sphere captured in a frame.
[0015]
Further, in one embodiment of the calibration processing device of the present invention, the F matrix calculation unit sets epipolar lines based on the calculated F matrix and calculates the distance between each epipolar line and the sphere center position serving as the corresponding feature point, either individually or as an average distance. When the calculated distance exceeds a predetermined threshold, the unit recalculates the F matrix based on feature point correspondence data that include new feature points.
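The distance check described above can be sketched as follows. This is a minimal illustration, assuming matched sphere centers are given as N×2 pixel arrays and using the standard point-to-line distance |m2ᵀ F m1| / √(a² + b²), where (a, b, c) = F m1. The function names and the 1-pixel default threshold are hypothetical, not values from the patent.

```python
import numpy as np

def epipolar_distances(F, pts1, pts2):
    """Pixel distance from each point in image 2 to the epipolar line of its match.

    pts1, pts2: (N, 2) arrays of matched sphere centres; F maps image-1 points
    to image-2 epipolar lines (l = F @ m1).
    """
    ones = np.ones((len(pts1), 1))
    m1 = np.hstack([np.asarray(pts1, float), ones])  # homogeneous coordinates
    m2 = np.hstack([np.asarray(pts2, float), ones])
    lines = m1 @ F.T                                 # row i = (a, b, c) of F @ m1_i
    num = np.abs(np.sum(lines * m2, axis=1))         # |m2_i^T F m1_i|
    return num / np.hypot(lines[:, 0], lines[:, 1])

def needs_more_points(F, pts1, pts2, threshold_px=1.0):
    """True when the mean epipolar distance exceeds the (hypothetical) threshold."""
    return float(np.mean(epipolar_distances(F, pts1, pts2))) > threshold_px
```

Using the mean distance corresponds to the "average distance" variant in the text. Checking `epipolar_distances` element by element corresponds to the "individually" variant.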
[0016]
Furthermore, in one embodiment of the calibration processing device of the present invention, the F matrix calculation unit determines the distribution of the sphere center positions, that is, the feature points, in the captured image frames. When the feature points are found to be unevenly distributed, the unit recalculates the F matrix based on feature point correspondence data that include feature points additionally set at positions that eliminate the uneven distribution.
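One simple way to test for uneven distribution is to divide the image into a grid and look for cells that contain no feature points. The 3 × 3 grid and the function name below are illustrative assumptions. This passage does not specify how the patent judges the distribution.

```python
import numpy as np

def empty_cells(points, width, height, grid=(3, 3)):
    """Grid cells (row, col) of a width x height image that contain no feature point."""
    rows, cols = grid
    counts = np.zeros((rows, cols), dtype=int)
    for x, y in points:
        if 0 <= x < width and 0 <= y < height:          # ignore points outside the image
            counts[int(y * rows / height), int(x * cols / width)] += 1
    return [(r, c) for r in range(rows) for c in range(cols) if counts[r, c] == 0]
```

A non-empty result would suggest regions where the sphere should be moved before the F matrix is re-estimated.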
[0017]
Further, in one embodiment of the calibration processing device of the present invention, the feature point extraction unit performs the edge-based center position calculation only when the edge of the entire sphere is present in the captured frame.
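Edge-based center estimation with the full-edge condition can be illustrated as follows. The sketch thresholds the bright sphere, skips frames in which the sphere touches the image border, and fits a circle to the boundary pixels by algebraic least squares (the Kasa fit). The threshold value, the border test, and the choice of circle fit are assumptions for illustration, not the procedure of FIG. 10.

```python
import numpy as np

def sphere_center(gray, threshold=200):
    """Estimate the luminous sphere's image centre, or None if its edge is incomplete."""
    mask = np.asarray(gray) >= threshold                 # bright sphere pixels
    if not mask.any():
        return None
    # Sphere touching the border means its whole edge is not visible: skip the frame.
    if mask[0, :].any() or mask[-1, :].any() or mask[:, 0].any() or mask[:, -1].any():
        return None
    # Edge pixels: bright pixels with at least one dark 4-neighbour.
    inner = mask[1:-1, 1:-1]
    interior = inner & mask[:-2, 1:-1] & mask[2:, 1:-1] & mask[1:-1, :-2] & mask[1:-1, 2:]
    edge = np.zeros_like(mask)
    edge[1:-1, 1:-1] = inner & ~interior
    ys, xs = np.nonzero(edge)
    if len(xs) < 3:
        return None
    # Algebraic circle fit: x^2 + y^2 + D x + E y + F = 0, centre = (-D/2, -E/2).
    A = np.column_stack([xs, ys, np.ones_like(xs)]).astype(float)
    b = -(xs.astype(float) ** 2 + ys.astype(float) ** 2)
    sol = np.linalg.lstsq(A, b, rcond=None)[0]
    return (-sol[0] / 2.0, -sol[1] / 2.0)
```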
[0018]
Further, in one embodiment of the calibration processing device of the present invention, the feature point extraction unit acquires, from the captured image frames of the plurality of cameras, at least the eight sphere center positions required to determine the ratios of the nine elements of the F matrix. The association processing unit associates the eight or more sphere center positions extracted by the feature point extraction unit, and the F matrix calculation unit calculates the element values of the fundamental matrix as a calibration parameter based on the eight or more associated feature point correspondences.
[0019]
Further, a second aspect of the present invention provides
a calibration processing method for performing parameter calculation applied to adjustment between multi-viewpoint image capturing cameras or to correction of images acquired by the multi-viewpoint image capturing cameras, the method comprising:
an image input step of inputting video data from a plurality of cameras photographing a moving sphere from different viewpoint directions;
a feature point extraction step of extracting the sphere center position as a feature point from each of the captured image frames constituting the video data of the plurality of cameras input in the image input step;
an association processing step of associating feature points based on the sphere center positions extracted in the feature point extraction step; and
an F matrix calculation step of calculating a fundamental matrix as a calibration parameter based on the feature point correspondence data associated in the association processing step.
[0020]
Further, in one embodiment of the calibration processing method of the present invention, the sphere is a light-emitting sphere, and the feature point extraction step calculates the sphere center position based on edge information of the sphere captured in a frame.
[0021]
Further, in one embodiment of the calibration processing method of the present invention, the F matrix calculation step sets epipolar lines based on the calculated F matrix and calculates the distance between each epipolar line and the sphere center position serving as the feature point, either individually or as an average distance. When the calculated distance exceeds a predetermined threshold, the F matrix is recalculated based on feature point correspondence data that include new feature points.
[0022]
Further, in one embodiment of the calibration processing method of the present invention, the F matrix calculation step determines the distribution of the sphere center positions, that is, the feature points, in the captured image frames. When the feature points are found to be unevenly distributed, the F matrix is recalculated based on feature point correspondence data that include feature points additionally set at positions that eliminate the uneven distribution.
[0023]
Further, in one embodiment of the calibration processing method according to the present invention, the feature point extraction step performs the edge-based center position calculation only when the edge of the entire sphere is present in the captured frame.
[0024]
Further, in one embodiment of the calibration processing method of the present invention, the feature point extraction step acquires, from the captured image frames of the plurality of cameras, at least the eight sphere center positions required to determine the ratios of the nine elements of the F matrix. The association processing step associates the eight or more sphere center positions extracted in the feature point extraction step, and the F matrix calculation step calculates the element values of the fundamental matrix as a calibration parameter based on the eight or more associated feature point correspondences.
[0025]
Further, a third aspect of the present invention provides
a computer program for performing parameter calculation applied to adjustment between multi-viewpoint image capturing cameras or to correction of images acquired by the multi-viewpoint image capturing cameras, the computer program comprising:
an image input step of inputting video data from a plurality of cameras photographing a moving sphere from different viewpoint directions;
a feature point extraction step of extracting the sphere center position as a feature point from each of the captured image frames constituting the video data of the plurality of cameras input in the image input step;
an association processing step of associating the sphere center positions extracted in the feature point extraction step; and
an F matrix calculation step of calculating a fundamental matrix as a calibration parameter based on the feature point correspondence data associated in the association processing step.
[0026]
[Action]
According to the configuration of the present invention, calibration processing calculates parameters applied to adjustment between multi-viewpoint image capturing cameras or to correction of images acquired by those cameras, and in this processing a luminous sphere is photographed by each camera. The sphere can therefore be detected, and its center position obtained, easily even at a shooting site where the lighting environment is unstable. Executing the association process with the sphere center positions as feature points then enables highly accurate calculation of the F matrix. In other words, the fundamental matrix, which is the geometry parameter indicating the positional relationship between the real cameras of the multi-view video capture system, can be calculated with high accuracy, enabling fast and accurate calibration.
[0027]
Furthermore, according to the configuration of the present invention, the luminous sphere is used as a calibration jig and the F matrix is estimated while the sphere center position is detected and associated in real time. Evaluating the error and positional relationship between the estimated F matrix and the corresponding points makes it possible to indicate the number of required corresponding points and the positions at which the sphere should be placed in three-dimensional space, so that a more accurate F matrix is calculated efficiently.
[0028]
Furthermore, according to the configuration of the present invention, camera calibration can be performed while sphere images are observed and photographed by a multi-view video capture system with a synchronization function and recorded on a recording medium such as the memory or hard disk of each personal computer. Image processing on each personal computer can extract the coordinate position (feature point) of the sphere center in each frame image, estimate the fundamental matrix by associating the extracted feature points between images, evaluate the accuracy of the estimated F matrix, and determine and present the number of corresponding points and their appropriate positions. All of this can run on, for example, a PC, so fast and accurate calibration can be executed on a simple system such as a PC.
[0029]
The computer program of the present invention can be provided in a computer-readable format to a general-purpose computer system capable of executing various program codes. Delivery can be via a storage medium such as a CD, FD, or MO, or via a communication medium such as a network. Providing the program in a computer-readable format allows processing according to the program to be realized on the computer system.
[0030]
Further objects, features, and advantages of the present invention will become apparent from a more detailed description based on embodiments of the present invention described below and the accompanying drawings. In this specification, the term “system” refers to a logical set of a plurality of devices, and is not limited to a device having each component in the same housing.
[0031]
BEST MODE FOR CARRYING OUT THE INVENTION
Hereinafter, a camera calibration device, a camera calibration method, and a computer program of the present invention will be described in detail with reference to the drawings.
[0032]
First, with reference to FIG. 1, the multi-view camera photographing process and the generation of virtual viewpoint video from real images captured by the multi-view cameras will be described. The subject 100 is photographed by a plurality of cameras arranged around it, each capturing the subject 100 from a different viewpoint. Real images are obtained only at the positions where cameras are placed, and no real image exists for a viewpoint between two cameras. An image between the cameras, that is, a virtual viewpoint image, is therefore generated by image processing based on the real images.
[0033]
For example, when no camera is placed between the camera A101 and the camera B102, an image between them can be generated from the two real images 111 and 112 of these cameras. This is done by synthesizing an image that was not actually photographed, a process known as view interpolation.
[0034]
View interpolation is a technique for generating, from images taken by a plurality of cameras, an image as seen from a viewpoint where no actual camera exists. It is described, for example, in S. M. Seitz and C. R. Dyer, "View Morphing," Proc. SIGGRAPH 96, ACM, 1996, pp. 21-30. With view interpolation, an image for a viewpoint without a camera can be generated from images actually acquired by a plurality of cameras.
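In the View Morphing paper, linear interpolation of corresponding image points yields a physically valid intermediate view in the special case of parallel views, meaning the same internal parameters and the camera moved parallel to the image plane. General views must first be prewarped into that configuration. The sketch below illustrates only this parallel case. The pinhole helper `project` and the other names are hypothetical and serve only to check the result.

```python
import numpy as np

def project(K, C, X):
    """Pinhole projection with identity rotation and camera centre C (hypothetical helper)."""
    Xc = np.asarray(X, float) - np.asarray(C, float)  # points in camera coordinates
    m = Xc @ K.T
    return m[:, :2] / m[:, 2:3]

def interpolate_view(pts_a, pts_b, s):
    """Linearly interpolate matched points between view A (s = 0) and view B (s = 1)."""
    return (1.0 - s) * np.asarray(pts_a, float) + s * np.asarray(pts_b, float)

K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
X = np.array([[0.5, 0.2, 4.0], [-1.0, 0.3, 6.0]])
pa, pb = project(K, [0, 0, 0], X), project(K, [1, 0, 0], X)
mid = interpolate_view(pa, pb, 0.3)  # same points a camera at x = 0.3 would see
```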
[0035]
For example, to generate a virtual viewpoint camera image 121 between the camera A101 and the camera B102 as if it had been photographed by the virtual camera 113, the differences in characteristics and the relative positional relationship between the camera A101 and the camera B102 must be obtained. Various corrections must then be applied during virtual viewpoint image generation using parameters that express this information. In other words, the differences in the characteristics of the cameras capturing the real images must be grasped accurately; that is, camera calibration must be performed.
[0036]
As shown in FIG. 2, the corrections that use calibration parameters to generate a virtual viewpoint image fall into two groups. The first is correction based on distortion parameters, namely a distortion coefficient, a distortion center, and an aspect ratio. The second is correction based on camera internal and external parameters: internal parameters [A], consisting of scale factors, skew (torsion), and the image center, and external parameters [T, R], consisting of a translation [T] and a rotation [R]. The camera distortion parameters are generally estimated by photographing a checker pattern, straight lines, or the like with the cameras to be calibrated and estimating the adjustment parameters between the cameras from the captured images.
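The three distortion parameters named above (distortion coefficient, distortion center, aspect ratio) can be illustrated with a single-coefficient radial model. This model is a common simplification assumed here for illustration; the patent does not give its distortion formula in this passage, and the function name is hypothetical.

```python
def distort(x, y, k, cx, cy, aspect=1.0):
    """Map an undistorted pixel to its distorted position (hypothetical one-coefficient model).

    k: distortion coefficient, (cx, cy): distortion centre,
    aspect: factor that makes vertical offsets comparable to horizontal ones.
    """
    dx, dy = x - cx, (y - cy) * aspect        # offsets from the centre, square-pixel units
    factor = 1.0 + k * (dx * dx + dy * dy)    # radial scaling grows with r^2
    return cx + dx * factor, cy + dy * factor / aspect
```

With k > 0 points move outward (pincushion distortion); with k < 0 they move inward (barrel distortion). The distortion center itself never moves.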
[0037]
The camera internal and external parameters [A, T, R] determine the geometry parameters that indicate the positional relationship between the real cameras. These are expressed as a matrix called the fundamental matrix (F matrix), given by the following equation (Equation 1).
[0038]
(Equation 1)
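A standard textbook way to express the fundamental matrix through the internal parameters A and the external parameters R, T is $F = A_2^{-T}[T]_\times R A_1^{-1}$. This form assumes that camera-2 coordinates relate to camera-1 coordinates by $X_2 = R X_1 + T$ and that image points satisfy $m_2^{T} F m_1 = 0$. The patent's own notation for Equation 1 may differ. The sketch builds F this way and checks the epipolar constraint on one projected point; the function names are illustrative.

```python
import numpy as np

def skew(t):
    """Cross-product matrix [t]_x, so that skew(t) @ v == np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

def fundamental_from_params(A1, A2, R, T):
    """F with m2^T F m1 = 0, assuming camera-2 coordinates X2 = R @ X1 + T."""
    return np.linalg.inv(A2).T @ skew(T) @ R @ np.linalg.inv(A1)

A = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
c, s = np.cos(0.1), np.sin(0.1)
R = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])  # rotation about the y axis
T = np.array([-1.0, 0.0, 0.1])
F = fundamental_from_params(A, A, R, T)
X1 = np.array([0.3, -0.2, 5.0])
m1 = A @ X1
m2 = A @ (R @ X1 + T)
m1, m2 = m1 / m1[2], m2 / m2[2]   # pixel coordinates in homogeneous form
```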

[0039]
Several methods have already been proposed for obtaining a fundamental matrix between cameras represented by the above equation. A general method for obtaining a fundamental matrix will be briefly described.
[0040]
FIG. 3 shows a general method of estimating the F matrix parameters. First, as shown in FIG. 3A, an image pattern serving as a calibration jig (for example, a checker pattern) placed in three-dimensional space is photographed several times by the camera 1 (231) and the camera 2 (232). Feature points are then extracted from the image I1 (241), photographed by the camera 1 (231), and the image I2 (242), photographed by the camera 2 (232). For example, the intersections m11, m12, ... and m21, m22, ... of the checker pattern are extracted as feature points.
[0041]
After the feature points are extracted, the F matrix can be estimated by associating the extracted feature points m1i with m2i (i = 1, 2, ..., N), as shown in FIG. 3B and in the following equation.
[0042]
(Equation 2)

[0043]
In the above equation, the elements of the F matrix are obtained by computing f as the eigenvector of $W^{T}W$ corresponding to its smallest eigenvalue.
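The linear estimate described above can be sketched in Python with NumPy. Each correspondence contributes one row of W, so that $W f = 0$ with f holding the nine elements of F, and f is taken as the eigenvector of $W^{T}W$ for the smallest eigenvalue. The coordinate normalization (Hartley's normalization) is an addition not mentioned in the text; it keeps $W^{T}W$ well conditioned when pixel coordinates are used. The function names are illustrative.

```python
import numpy as np

def _normalize(pts):
    """Move points to their centroid and scale the mean distance to sqrt(2)."""
    c = pts.mean(axis=0)
    s = np.sqrt(2.0) / np.mean(np.linalg.norm(pts - c, axis=1))
    T = np.array([[s, 0.0, -s * c[0]], [0.0, s, -s * c[1]], [0.0, 0.0, 1.0]])
    homog = np.column_stack([pts, np.ones(len(pts))]) @ T.T
    return homog, T

def estimate_fundamental(pts1, pts2):
    """Linear F estimate from N >= 8 matches (N x 2 pixel arrays), m2^T F m1 = 0."""
    if len(pts1) < 8 or len(pts1) != len(pts2):
        raise ValueError("need two equal-length arrays of at least 8 points")
    h1, T1 = _normalize(np.asarray(pts1, float))
    h2, T2 = _normalize(np.asarray(pts2, float))
    # Row i is kron(m2_i, m1_i), so W @ F.ravel() = m2_i^T F m1_i for every match.
    W = np.einsum("ni,nj->nij", h2, h1).reshape(len(h1), 9)
    _, eigvecs = np.linalg.eigh(W.T @ W)   # eigenvalues are returned in ascending order
    Fn = eigvecs[:, 0].reshape(3, 3)       # eigenvector of the smallest eigenvalue
    return T2.T @ Fn @ T1                  # undo the normalisation
```

The result is defined only up to scale and sign, so comparisons should normalize it first.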
[0044]
FIG. 4 is a diagram showing a camera calibration procedure when a checker pattern is used as a calibration jig in order to actually estimate the F matrix.
[0045]
First, in step S101, a checker pattern as a calibration jig is photographed by the camera 1 and the camera 2 from different viewpoints.
[0046]
Next, in step S102, feature points m1 (x, y) and m2 (x, y) are extracted from the checker pattern images I1 (x, y) and I2 (x, y) taken by the cameras 1 and 2. In step S103, the extracted feature points m1 (x, y) and m2 (x, y) are associated with each other.
[0047]
The association process is a matching process that associates pixels in a plurality of images of the same object. Commonly used association methods include pixel-based matching, area-based matching, and feature-based matching. Pixel-based matching searches the other image directly for the point corresponding to a point in one image. Area-based matching searches the other image using the local image pattern around the point. Feature-based matching extracts features such as light/dark edges from the images and establishes correspondence using only those features. With these methods, pixels are associated on the basis of feature points extracted from the images.
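Area-based matching can be illustrated as follows. The sketch compares a small window around a point in one image with windows along the same row of the other image and keeps the offset with the smallest sum of squared differences (SSD). Restricting the search to one row (as for rectified images), the window size, and the search range are assumptions for illustration.

```python
import numpy as np

def match_area(img1, img2, x, y, half=3, search=range(-10, 11)):
    """Horizontal offset dx minimising the SSD between (2*half+1)^2 windows on row y."""
    patch = img1[y - half:y + half + 1, x - half:x + half + 1].astype(float)
    best_dx, best_cost = None, np.inf
    for dx in search:
        xs = x + dx
        if xs - half < 0 or xs + half + 1 > img2.shape[1]:
            continue                                   # window would leave the image
        cand = img2[y - half:y + half + 1, xs - half:xs + half + 1].astype(float)
        cost = np.sum((patch - cand) ** 2)             # sum of squared differences
        if cost < best_cost:
            best_dx, best_cost = dx, cost
    return best_dx
```

On textureless or repetitive regions the SSD minimum becomes ambiguous. This is one reason the sphere-center approach below avoids dense matching altogether.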
[0048]
Finally, in step S104, the F matrix is calculated from the obtained corresponding points using the above equation (Equation 2). That is, the extracted feature points m1i and m2i (i = 1, 2, ..., N) are associated, and f is computed as the eigenvector of $W^{T}W$ corresponding to its smallest eigenvalue. This yields the F matrix, that is, the fundamental matrix (= F parameter).
[0049]
However, with the above method, feature point extraction accuracy drops when the camera does not face the checker pattern plate. In addition, depending on how the cameras are arranged, not all cameras can observe the checker pattern at the same time. The correspondence of feature points between images also depends on the quality of the captured images, which increases processing time.
[0050]
To solve these problems, the present invention uses a moving luminous sphere instead of a planar checker pattern as the calibration tool. A plurality of cameras photograph the luminous sphere from different directions, and the video of each camera is recorded synchronously. The fundamental matrix (= F parameter) is then calculated from the recorded images, and camera calibration is performed. This configuration avoids the loss of feature point extraction accuracy that occurs when a planar checker pattern is photographed from different directions. It also removes the problem that the checker pattern cannot be observed by all cameras simultaneously, so camera calibration can be executed efficiently and with high accuracy.
[0051]
FIG. 5 shows a configuration example in which a moving luminous sphere is used as the calibration tool and is photographed from different viewpoints. FIG. 6 shows a configuration example of a device that records the video of each camera synchronously.
[0052]
As shown in FIG. 5, cameras 301-1 to 301-n to be calibrated are arranged around a luminous sphere 300 serving as the calibration tool. The luminous sphere 300 is photographed while it is moved, for example at 30 frames/sec. The figure shows the acquired images (one frame each) 302-1, 302-2, and 302-n of camera 1 (301-1), camera 2 (301-2), and camera n (301-n).
[0053]
Using a luminous sphere as shown in FIG. 5 has four advantages: (1) the sphere can be detected easily in images observed against any background and in any environment; (2) the sphere appears with the same shape to a camera at any viewpoint position; (3) the center position of the sphere in the image can be estimated accurately; and (4) corresponding points between images can easily be established using the sphere centers extracted from the time-series images.
[0054]
To estimate the F matrix, its nine elements ($f_{11}, f_{12}, \ldots, f_{33}$) must be determined. Because the F matrix is defined only up to a scale factor, this is equivalent to determining the ratios of the nine elements, that is, solving for eight unknowns. Relational expressions must therefore be set up for many corresponding points (at least eight).
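The counting argument can be checked numerically. W has nine columns, and with exact correspondences the true F always lies in its null space. The null space therefore shrinks to one dimension (F up to scale) only once at least eight independent rows are available. The sketch below reports the null-space dimension of W for different numbers of correspondences. The function names and the synthetic normalized image coordinates are hypothetical.

```python
import numpy as np

def design_matrix(pts1, pts2):
    """Rows kron(m2, m1) so that W @ F.ravel() = 0 for exact matches."""
    h1 = np.column_stack([pts1, np.ones(len(pts1))])
    h2 = np.column_stack([pts2, np.ones(len(pts2))])
    return np.einsum("ni,nj->nij", h2, h1).reshape(len(h1), 9)

def null_space_dim(pts1, pts2, rel_tol=1e-8):
    """Number of independent solutions f of W f = 0 (9 minus the numerical rank of W)."""
    s = np.linalg.svd(design_matrix(pts1, pts2), compute_uv=False)
    return 9 - int(np.sum(s > rel_tol * s[0]))

# Synthetic exact matches: identity intrinsics, small rotation plus translation.
rng = np.random.default_rng(2)
c, s = np.cos(0.1), np.sin(0.1)
R = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
X = rng.uniform([-1, -1, 4], [1, 1, 8], size=(12, 3))
X2 = X @ R.T + np.array([-1.0, 0.2, 0.1])
p1, p2 = X[:, :2] / X[:, 2:], X2[:, :2] / X2[:, 2:]
```

With 7 matches the null space is two-dimensional, so F is not determined. From 8 matches on it is one-dimensional, and additional exact matches do not change that.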
[0055]
In the configuration of the present invention, the moving luminous sphere is photographed and its center position is extracted as a feature point in each frame along the time axis. The feature points extracted from these frames are treated as if they had been extracted from a single captured image and are used for association between images. For example, shooting at 30 frames/second for 5 seconds gives each camera 30 × 5 = 150 frames, from which 150 feature points (sphere center positions) can be extracted.
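Treating per-frame sphere centers as if they came from one image amounts to pairing, frame by frame, the centers detected by two synchronized cameras. The sketch assumes each camera's detections are stored in a dict keyed by frame number, with None where no complete sphere was found. The data layout and function name are hypothetical.

```python
def build_correspondences(centers_cam1, centers_cam2):
    """Pair sphere centres detected in the same synchronised frame by two cameras.

    centers_camX: dict frame_number -> (x, y), or None when no centre was found.
    """
    pairs = []
    for frame in sorted(centers_cam1.keys() & centers_cam2.keys()):
        c1, c2 = centers_cam1[frame], centers_cam2[frame]
        if c1 is not None and c2 is not None:   # both cameras saw the whole sphere
            pairs.append((c1, c2))
    return pairs
```

At 30 frames/second for 5 seconds this yields up to 150 pairs, comfortably above the minimum of eight needed for the F matrix.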
[0056]
In order to automatically perform such feature point extraction and association, a synchronous recording apparatus for a plurality of camera images provided with a network synchronization mechanism as shown in FIG. 6 is applied.
[0057]
The cameras 1 to n (301) photograph the luminous sphere 300, the calibration tool, from different viewpoints. Each camera receives the synchronization signals (horizontal and vertical) from the synchronization signal generator 320 at its external synchronization input terminal. The cameras 1 to n then output the multi-viewpoint videos they observe as synchronized images.
[0058]
Each camera outputs a video signal such as a Y/C signal, which consists of a luminance signal (Y) and a color-difference signal (C) derived from the RGB color component signals. Each camera's video signal is input to the A/D converter 330 provided for that camera. The A/D converter 330 converts the video signal from the camera 301 into digital data and, if necessary, compresses it (for example, to Motion JPEG). It then outputs the processed data to the client PCs (PC1 to PCn) 350, which serve as image recording processing devices, and to the image monitor 360.
[0059]
The client PC (PC1 to PCn) 350 as an image recording processing device includes a recording medium 356 for recording data, for example, a hard disk, a DVD, a CD, and the like.
[0060]
The video signal input from the AD converter 330 to the client PC (PC1 to PCn) 350 is input to the video signal processing unit 355 in the client PC (PC1 to PCn) 350. The video signal processing unit 355 is constituted by, for example, a 1394 DV capture board.
[0061]
The PCs 1 to n constituting the client PCs 350 record the DV video from their respective cameras on the hard disk 356 in each PC using DV capture software that runs on a general-purpose PC. The client PCs (PC1 to PCn) 350 and the server PC 370 are connected via the network 380 through a network card 357, a network interface mounted in each PC. When the server PC 370 outputs a recording start command, the client PCs 350 can start recording simultaneously.
[0062]
FIG. 7 shows a configuration example of the video signal processing unit 355 formed by, for example, a 1394 DV capture board in the client PC 350.
[0063]
The IEEE 1394 port (hereinafter, 1394 port) 394 and the LINK/PHY 395 take in the digital video (DV) signal processed by the A/D converter 330. The 1394 port 394 is an input port that receives the DV signal and passes it to the LINK/PHY 395. The LINK/PHY 395 temporarily stores the input DV image data in the RAM 392, which serves as a buffer.
[0064]
The RAM 392 functions as an image data storage unit including a buffer that stores the captured DV image data. Under the control of the CPU 391, the image data temporarily held in the RAM 392 is output via the storage unit I/F 396 to the storage means 356, such as an HDD or DVD, and stored there. As described above, the client PCs (PC1 to PCn) 350 and the server PC 370 are connected via the network 380 through the network card 357 attached to each PC as a network interface. The server PC 370 can therefore output a recording start command to each client PC 350 so that recording starts simultaneously on all of them, that is, data storage begins at the same time in the storage means 356 such as an HDD or DVD.
[0065]
The CPU 391 controls the entire apparatus for capturing and storing image data, and functions as image data control means for controlling the entire flow of DV image data. The CPU 391 controls the flow of image data according to a program stored in the ROM 393.
[0066]
The program stored in the ROM 393 is, for example, DV capture software that can be executed on a general-purpose PC. The program is executed in response to a recording start command from the server PC, so that the client PCs start recording simultaneously.
[0067]
As described above, the reference video signal (synchronization signal) from the synchronization signal generator 320 is input to the external synchronization terminal of each of the cameras 301-1 to 301-n. The video output signal of each camera is converted into a digital video signal by the AD converter 330 and stored, through the capture board, in a recording medium such as the memory or hard disk built into the personal computer 350. Because the time until image recording actually starts differs with the execution environment of each personal computer, the recorded camera images are usually not synchronized. Therefore, while each camera video is being saved, the number of the image frame currently being recorded is reported to the server over the network as measurement point data, and the recorded videos are synchronized with each other using the measurement point data collected on the server.
[0068]
The synchronization adjustment process executed by the server PC is described in detail with reference to FIG. 8. First, in step S301, the server PC reads the recording frame number of each PC at each measurement point, as reported by each client PC. An example of this data is shown in FIG. 8A.
[0069]
Here, it is assumed that the server PC has acquired the recording frame numbers at four measurement points (measurement points 1 to 4) from a plurality of client PCs (PC1 to PCn).
[0070]
The list in FIG. 8A of the recording frame numbers reported by each client PC at each measurement point shows, for example, the following for PC1:
At measurement point 1, recording frame = 6
At measurement point 2, recording frame = 10
At measurement point 3, recording frame = 14
At measurement point 4, recording frame = 19
These are the frame numbers that PC1 stored at each measurement point in response to the "save current recording frame number" command from the server PC.
[0071]
Next, in step S302, the server PC sets as the reference PC the PC with the largest processing delay, in this case PC2, whose frame No. at measurement point 1 is 3, and calculates the frame shift amount of each PC relative to the reference PC.
[0072]
As a result of this processing, the frame shift amount of the video recorded on each PC at each measurement point is obtained, as shown in FIG. 8B. In the example of FIG. 8, PC2 is the reference, so its shift amount is 0 at every measurement point. The shift amount of PC1, for example, is obtained at each measurement point as (recording frame of PC1) − (recording frame of PC2), which gives:
At measurement point 1, shift amount = 6 − 3 = 3
At measurement point 2, shift amount = 10 − 7 = 3
At measurement point 3, shift amount = 14 − 11 = 3
At measurement point 4, shift amount = 19 − 15 = 4
[0073]
For the other PCs, PC3 to PCn, the difference in recording frame number from PC2 is likewise obtained as the shift amount.
[0074]
Next, in step S303, the shift amount of each PC relative to the reference PC (PC2) is determined. This is done, for example, by averaging the shift amounts of each PC over the measurement points and rounding the average to an integer.
[0075]
FIG. 8C shows the average of the shift amounts of each PC over the measurement points. The average shift amount of PC1 relative to PC2 is
(3 + 3 + 3 + 4) / 4 = 3.25.
[0076]
FIG. 8D shows these averages rounded to integers:
The shift amount of PC1 with respect to PC2 is 3.
The shift amount of PCn with respect to PC2 is 2.
In this way, the shift amount of each client PC relative to the reference PC (PC2) is determined.
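Steps S302 and S303 can be sketched in Python as follows. This is a minimal sketch, assuming each client PC's reported frame numbers are available as one list per PC; the function name `estimate_frame_shifts` is hypothetical, "rounding" is taken to mean round-half-up, and the PCn row in the example is made up for illustration (only the PC1 and PC2 values come from FIG. 8).

```python
import math

def estimate_frame_shifts(frames):
    """frames: mapping PC name -> recording frame numbers at measurement points 1..M.

    Returns (reference_pc, shifts), where shifts[pc] is the rounded average number
    of frames by which `pc` is ahead of the reference PC.
    """
    # S302: the PC with the smallest frame number at measurement point 1 started
    # recording last (largest processing delay), so it becomes the reference.
    ref = min(frames, key=lambda pc: frames[pc][0])
    shifts = {}
    for pc, numbers in frames.items():
        # Shift at each measurement point = frame(pc) - frame(reference).
        diffs = [n - r for n, r in zip(numbers, frames[ref])]
        # S303: average over the measurement points, then round half up.
        shifts[pc] = math.floor(sum(diffs) / len(diffs) + 0.5)
    return ref, shifts


# PC1 and PC2 values follow FIG. 8; the PCn row is a hypothetical example.
measured = {
    "PC1": [6, 10, 14, 19],
    "PC2": [3, 7, 11, 15],
    "PCn": [5, 9, 13, 17],
}
ref, shifts = estimate_frame_shifts(measured)
print(ref, shifts)
```

For PC1 the average is 3.25, which rounds to 3, matching FIG. 8D.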
[0077]
Next, in step S304, the server PC executes a synchronization adjustment process of each client PC based on the calculated shift amount of each PC.
[0078]
This adjustment unifies the processing frame numbers of all client PCs at each of times 1 to 4. As shown in FIG. 8E, the frames are set as follows:
At time 1, the recording frame of all PCs = 3
At time 2, the recording frame of all PCs = 7
At time 3, the recording frame of all PCs = 11
At time 4, the recording frame of all PCs = 15
[0079]
For example, the shift of PC1's frame numbers relative to those of the reference PC (PC2) has been determined to be 3, so the value obtained by subtracting 3 from each frame No. of PC1 is set as the new frame No. of PC1. Likewise, the shift of PCn relative to PC2 has been determined to be 2, so the value obtained by subtracting 2 from each frame No. of PCn is set as the new frame No. of PCn at each time.
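The renumbering in step S304 amounts to subtracting each PC's rounded shift from its frame numbers. A minimal sketch with hypothetical helper names, assuming frame numbers are plain integers and using the rounded shifts of FIG. 8D:

```python
def to_reference_frame(pc, frame_no, shifts):
    """Renumber a frame recorded on `pc` onto the reference PC's frame numbering."""
    return frame_no - shifts[pc]

def frame_on_pc(pc, ref_frame_no, shifts):
    """Frame number on `pc` recorded at the same instant as `ref_frame_no` on the reference PC."""
    return ref_frame_no + shifts[pc]

# Rounded shifts from FIG. 8D (PC2 is the reference PC).
shifts = {"PC1": 3, "PC2": 0, "PCn": 2}
for t, ref_frame in enumerate([3, 7, 11, 15], start=1):
    # For each time 1..4, which local frame of every PC shows the same instant.
    print(t, {pc: frame_on_pc(pc, ref_frame, shifts) for pc in shifts})
```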
[0080]
Executing the synchronization adjustment processing in each client PC in this manner enables accurate inter-frame synchronization.
[0081]
Through the above processing, the images of all cameras can be acquired in complete synchronization. The calibration device of the present invention associates the centers of the light-emitting sphere, used as feature points, across these synchronized image frames and obtains the F matrix according to the procedure described with reference to FIG.
[0082]
FIG. 9 shows time-series images (t = 1, ..., k) of the light-emitting sphere observed by one camera (camera 1, 301-1). By automatically extracting the center of the sphere in each frame of every camera's images and associating these centers, the F matrix can be estimated quickly and with high accuracy.
[0083]
In the configuration of the present invention, the center point of the light-emitting sphere is the feature point used for the association processing, and it is obtained in each frame of each camera's images. As described above, at least eight feature points must be associated to determine the elements of the F matrix, so feature points are extracted from at least eight frames captured by the cameras being calibrated. The process of determining the center position of the light-emitting sphere, performed as the feature point extraction process, is described with reference to FIG.
[0084]
The procedure for determining the center point of the light-emitting sphere, executed as the feature point extraction process in the configuration of the present invention, is described with reference to the processing flow shown in FIG. 10, which obtains the center pixel position of the sphere from one frame image. First, in step S501, a captured image frame of the light-emitting sphere is input.
[0085]
In step S502, the edge of the sphere is detected in the input image. In step S503, if any edge pixel lies on one of the four borders of the frame image, part of the sphere is judged to be missing, and no feature point is extracted from that frame (time).
[0086]
When the entire light-emitting sphere appears in the input frame (step S503: Yes), the initial center position of the sphere is determined in step S504 by projection processing based on the coordinates of the edge pixels. Specifically, the differences between the x and y coordinates of the pixels on opposite edges of the sphere (for example, its left and right edges) are projected onto the X axis and the Y axis to obtain distributions along X and Y, and the positions of their maxima give the initial center (xs, ys) and size (radius) of the sphere.
[0087]
In step S505, the estimated initial center and size define a circle model, and edge pixels that deviate greatly from this model are deleted as outliers. Finally, in step S506, the circle model is fitted to the edge pixels remaining after outlier removal to determine the center position (and size) of the sphere.
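Steps S502 to S506 can be sketched in Python with NumPy. This is a minimal sketch, not the patent's implementation: it assumes the bright sphere has already been segmented into a binary mask (for example by thresholding), treats 4-neighbour boundary pixels as the detected edge, uses a hypothetical pixel tolerance `outlier_tol` for step S505, and uses an algebraic (Kåsa) least-squares circle fit as the circle model in step S506, a technique the text does not name.

```python
import numpy as np

def sphere_center(mask, outlier_tol=2.0):
    """Return (xc, yc, r) of the sphere in a binary mask, or None if unusable.

    mask: 2-D bool array, True where the light-emitting sphere is.
    """
    mask = np.asarray(mask, dtype=bool)
    h, w = mask.shape
    # S502: edge pixels = sphere pixels with at least one 4-neighbour outside it.
    p = np.pad(mask, 1, constant_values=False)
    interior = p[:-2, 1:-1] & p[2:, 1:-1] & p[1:-1, :-2] & p[1:-1, 2:]
    ys, xs = np.nonzero(mask & ~interior)
    if xs.size == 0:
        return None  # no sphere in this frame
    # S503: an edge on any image border means part of the sphere is cut off.
    if xs.min() == 0 or ys.min() == 0 or xs.max() == w - 1 or ys.max() == h - 1:
        return None
    # S504: projection. Chord width per row (right edge - left edge) and
    # chord height per column (bottom edge - top edge); the maxima give the
    # initial centre and radius.
    rows = np.flatnonzero(mask.any(axis=1))
    left = mask[rows].argmax(axis=1)
    right = w - 1 - mask[rows, ::-1].argmax(axis=1)
    widths = right - left
    cols = np.flatnonzero(mask.any(axis=0))
    top = mask[:, cols].argmax(axis=0)
    bottom = h - 1 - mask[::-1, cols].argmax(axis=0)
    heights = bottom - top
    y0 = rows[widths == widths.max()].mean()
    x0 = cols[heights == heights.max()].mean()
    r0 = max(widths.max(), heights.max()) / 2.0
    # S505: drop edge pixels far from the initial circle (outliers).
    keep = np.abs(np.hypot(xs - x0, ys - y0) - r0) <= outlier_tol
    if keep.sum() < 3:
        return None
    x = xs[keep].astype(float)
    y = ys[keep].astype(float)
    # S506: least-squares fit of x^2 + y^2 + D x + E y + G = 0 to the inliers.
    A = np.column_stack([x, y, np.ones_like(x)])
    coef, *_ = np.linalg.lstsq(A, -(x ** 2 + y ** 2), rcond=None)
    D, E, G = coef
    xc, yc = -D / 2.0, -E / 2.0
    return xc, yc, np.sqrt(xc ** 2 + yc ** 2 - G)

# Synthetic sphere image (hypothetical): radius 10 centred at (40, 30).
yy, xx = np.mgrid[0:64, 0:80]
disk = (xx - 40) ** 2 + (yy - 30) ** 2 <= 10 ** 2
print(sphere_center(disk))
```

Returning None plays the role of "no feature point for this frame" in step S503.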
[0088]
FIG. 11 illustrates a processing example in which the center position of the sphere is detected in each of the time-series images observed by adjacent cameras, and the detected centers are treated as if they were feature points extracted from a single image.
[0089]
Assume that an F matrix is to be obtained for calibrating (adjusting) camera 1 (501) and camera 2 (502), which photograph the light-emitting sphere 500 from different directions. In this case, as described above, a plurality of feature points (at least eight) must be extracted from a plurality of frames.
[0090]
In the configuration of the present invention, the feature point is the center position of the light-emitting sphere in the captured image, and this center position is extracted from each frame. The frames t = 1 to k captured by camera 1 (501) and camera 2 (502) have undergone the synchronization adjustment processing described with reference to FIG. 6, so each pair of corresponding frames is synchronized. Executing the sphere center determination process described with reference to FIG. 10 on the k frames t = 1 to k yields feature points for each camera. As described above, however, the center cannot be obtained when the sphere extends beyond the frame image, so k is the maximum number of feature points.
[0091]
As described above, by determining at least eight feature points and executing the association process, the nine elements of the F matrix can be determined. From the synchronized frame images, at least eight frames in which the center of the light-emitting sphere can be obtained are selected, and their feature points (sphere center positions) are obtained. Combining these into one image yields the feature point distribution data 511 and 512 shown in FIG. 11.
[0092]
The feature point distribution data 511 shows the distribution of the feature points (sphere center positions) in the frames t = 1, 2, ..., k of camera 1: p1(1) is the sphere center in the frame at t = 1, p1(2) is the sphere center in the frame at t = 2, and p1(k) is the sphere center in the frame at t = k.
[0093]
The feature point distribution data 512 shows the distribution of the feature points (sphere center positions) in the frames t = 1, 2, ..., k of camera 2: p2(1) is the sphere center in the frame at t = 1, p2(2) is the sphere center in the frame at t = 2, and p2(k) is the sphere center in the frame at t = k.
[0094]
For the corresponding feature points of the feature point distribution data 511 and 512, that is, for each pair p1(1) and p2(1), p1(2) and p2(2), ..., p1(k) and p2(k), the association processing is executed according to the processing flow of FIG. 4 described above, and the F matrix can be obtained by applying Equation 2 above.
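Equation 2 itself is not reproduced in this section. Assuming it is the usual epipolar constraint m2ᵀ F m1 = 0 solved linearly from eight or more correspondences, the computation can be sketched with the normalized eight-point algorithm, a standard technique the text does not name. This is a minimal NumPy sketch; `estimate_f` and the synthetic cameras below are hypothetical.

```python
import numpy as np

def _normalise(pts):
    """Hartley normalisation: centroid to the origin, mean distance sqrt(2)."""
    c = pts.mean(axis=0)
    s = np.sqrt(2.0) / np.linalg.norm(pts - c, axis=1).mean()
    T = np.array([[s, 0.0, -s * c[0]],
                  [0.0, s, -s * c[1]],
                  [0.0, 0.0, 1.0]])
    homog = np.column_stack([pts, np.ones(len(pts))])
    return homog @ T.T, T

def estimate_f(p1, p2):
    """Estimate F with m2^T F m1 = 0 from matched sphere centres p1[i] <-> p2[i]."""
    p1 = np.asarray(p1, dtype=float)
    p2 = np.asarray(p2, dtype=float)
    if p1.shape != p2.shape or len(p1) < 8:
        raise ValueError("need at least 8 matched points")
    h1, T1 = _normalise(p1)
    h2, T2 = _normalise(p2)
    # One row per correspondence: outer(m2, m1) flattened, so A @ F.ravel() = 0.
    A = np.einsum("ni,nj->nij", h2, h1).reshape(-1, 9)
    _, _, vt = np.linalg.svd(A)
    Fn = vt[-1].reshape(3, 3)              # null vector of A = F up to scale
    u, s, vt = np.linalg.svd(Fn)
    s[2] = 0.0                             # enforce rank 2
    F = T2.T @ (u @ np.diag(s) @ vt) @ T1  # undo the normalisation
    return F / np.linalg.norm(F)

# Synthetic check (hypothetical cameras and sphere positions, no noise).
rng = np.random.default_rng(0)
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
a = 0.2
R = np.array([[np.cos(a), 0.0, np.sin(a)], [0.0, 1.0, 0.0], [-np.sin(a), 0.0, np.cos(a)]])
t = np.array([-1.0, 0.1, 0.2])
X = rng.uniform([-1.0, -1.0, 4.0], [1.0, 1.0, 8.0], size=(20, 3))
x1 = X @ K.T
x1 = x1[:, :2] / x1[:, 2:]                 # sphere centres seen by camera 1
x2 = (X @ R.T + t) @ K.T
x2 = x2[:, :2] / x2[:, 2:]                 # sphere centres seen by camera 2
F = estimate_f(x1, x2)
print(F)
```

Normalizing the pixel coordinates before the SVD keeps the linear system well conditioned; F is only defined up to scale, so it is returned with unit norm.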
[0095]
FIG. 12 is a flowchart of the procedure for obtaining the sphere center position as a feature point from each frame of a time-series image consisting of a plurality of frames. Each step of the flowchart is described below. Here, m denotes the number of the camera subject to feature point extraction, and k denotes the number of image frames captured by camera m.
[0096]
In step S701, the observed (captured) images Im(t) (t = 1, 2, ..., k) of the m-th camera (m = 1, 2, ..., N) are input to the calibration processing device.
[0097]
In step S702, t is set to 1 as the initial value of the frame to be processed, since the processing is executed frame by frame. In step S703, the camera image Im(t) is read, and the sphere in that frame is detected and its center position estimated according to the procedure described above with reference to FIG.
[0098]
If no sphere image is detected in the frame being processed, or if the whole sphere is not included (step S704: No), the sphere center estimation is not executed, as described above. Instead, in step S712, the calibration processing device stores Pm(t) = Null in memory as the result, that is, as frame-specific data indicating that no center position was determined.
[0099]
When the whole sphere is detected in the image, its center position is estimated according to the method of FIG. 10 described above, and in step S705 the result is stored as the center position data Pm(t) for that frame.
[0100]
Here, Pm(t) denotes the sphere center position in frame t (t = 1, ..., k) of the m-th camera's images. In step S706, it is determined whether all frames have been processed, that is, whether t < k. If unprocessed frames remain, t is updated to t + 1 in step S711, and the sphere center estimation and storage from step S703 onward are executed again.
[0101]
The process shown in FIG. 12 is executed for all N cameras, so that the center positions Pm(t) of the light-emitting sphere in a plurality of frames are obtained from the images of every camera.
[0102]
From the captured frames of the adjacent or distant cameras being calibrated, the frames in which no center position was determined are excluded, and the remaining frames are used to construct the feature point distribution data shown in FIG. 11. The F matrix is then calculated by associating the feature points of corresponding frames in this data. The F matrix calculation based on feature point association is executed by the same method as described above with reference to FIGS. Once the F matrix is determined, the captured image of each camera can be corrected using it, and a highly accurate virtual viewpoint camera image can be generated from the images actually captured by the plurality of cameras.
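The per-frame loop of FIG. 12 and the exclusion of frames without a center can be sketched as follows. This is a minimal Python sketch with hypothetical names: `detect` stands for any per-frame sphere-center detector that returns an (x, y) tuple or None, and Python's None stands in for Pm(t) = Null.

```python
def collect_centres(frames, detect):
    """FIG. 12 loop (S702 to S712): Pm(t) for t = 1..k, with None standing in for Null."""
    centres = []
    for frame in frames:                 # t = 1, 2, ..., k
        centres.append(detect(frame))    # (x, y), or None if the sphere is missing or cut off
    return centres

def build_correspondences(c1, c2):
    """Pair the centres of two synchronized cameras, skipping frames where either is None."""
    return [(t, a, b)
            for t, (a, b) in enumerate(zip(c1, c2), start=1)
            if a is not None and b is not None]

# Stand-in detector results (hypothetical): each "frame" already holds its centre.
cam1 = collect_centres([(10, 20), None, (30, 40), (50, 60), (70, 80)], detect=lambda f: f)
cam2 = collect_centres([(12, 21), (22, 31), None, (52, 61), (72, 81)], detect=lambda f: f)
pairs = build_correspondences(cam1, cam2)
print([t for t, _, _ in pairs])  # [1, 4, 5]
```

Keeping the frame index t with each pair makes it easy to trace a rejected correspondence back to its frame later.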
[0103]
FIG. 13 is a block diagram illustrating the functional configuration of the calibration processing device according to the present invention. The various processes executed in the calibration processing device can be implemented as processes according to a computer program; specifically, they are executed under the control of a CPU serving as the control unit. FIG. 13 divides the processing executed by the CPU into individual blocks to explain the functions of the calibration processing apparatus of the present invention. A specific hardware configuration is described later.
[0104]
The block diagram of FIG. 13 will be described. The image input unit 601 inputs image data obtained by photographing the light-emitting sphere. Because the calibration process usually aims to adjust images between two cameras, frame images of two cameras photographing the sphere from different viewpoints are input. These frame images are obtained with the configuration described above with reference to FIG. 6, and the frame images of the cameras are synchronized image data.
[0105]
The feature point extraction unit 602 obtains the center position of the light-emitting sphere in each frame as a feature point according to the procedure described with reference to FIGS. The association processing unit 603 associates feature points based on feature point distribution data (for example, the feature point distribution data 511 and 512 shown in FIG. 11) containing a plurality of feature points obtained from the images captured by each camera. This association processing is executed by, for example, pixel-based matching, area-based matching, or feature-based matching.
[0106]
When the associating process is executed, the F matrix calculating unit 604 executes the F matrix calculating process based on the associating data. The F matrix calculation processing based on the association data is executed according to the processing described above with reference to FIGS. 3 and 4, and the F matrix is obtained according to the above-described equation (Equation 2).
[0107]
The obtained F matrix is output to the virtual viewpoint image generation unit 605, which calibrates, that is, corrects, the images acquired by the two cameras based on the F matrix and generates an image for a virtual viewpoint camera at which no image was actually captured.
[0108]
Note that calibration in the sense of adjusting the cameras themselves may also be executed based on the calculated F matrix. Whether to correct the acquired images or to adjust the cameras themselves is optional.
[0109]
As described above, creating an arbitrary virtual viewpoint video between real cameras, as if it had been shot by a virtual camera, from multi-view video captured of the same object by multiple cameras with different viewpoints requires accurately determining the positional relationship between the real cameras. The fundamental matrix, as a parameter indicating this positional relationship, gives an accurate relationship between two cameras, and captured images can be corrected and synthesized with high accuracy based on it. A highly accurate image can thus be generated for a virtual viewpoint camera between the two cameras that was not actually used for shooting.
[0110]
Next, with reference to the flowchart shown in FIG. 14, a description will be given of an optimal F matrix calculation processing procedure executed by the calibration processing apparatus of the present invention.
[0111]
In step S801, the light-emitting sphere is first placed at a position observable by every camera to be calibrated; for example, it is placed and moved where the optical axes of the cameras intersect. In step S802, the moving sphere is photographed by each camera. As described with reference to FIGS. 5 to 8, the frames captured by the cameras are acquired as synchronized frames.
[0112]
In step S803, the center position of a sphere as a feature point is detected from the image captured by each camera. The detection of the sphere center position is performed according to the procedure described with reference to FIGS.
[0113]
In step S804, the detected sphere center positions are used as the feature points of each camera image, and the images of the cameras to be calibrated are associated with each other. The feature point association is executed according to the procedure described with reference to FIGS.
[0114]
In step S805, it is determined whether the number of corresponding points has reached eight. As described above, eight feature points must be associated in order to calculate the ratios of the nine elements of the F matrix. If there are fewer than eight corresponding points, the process returns to step S801, the sphere is moved to a new position, and images are again captured by the cameras to be calibrated.
[0115]
If there are eight or more corresponding points, the process proceeds to step S806, and the F matrix is calculated from the corresponding feature points according to Equation 2, as described with reference to FIGS. Note that in a single capture session each camera records, for example, at 30 frames per second for several seconds, so 100 or more frames are acquired, and feature points are extracted from all of these frames.
[0116]
Therefore, the feature points to be associated are selected at random from the 100 or more available feature points, the association processing is performed on the selected points, and the F matrix is calculated from the associated feature points. However, because the association is based on correlation between images, feature points are often mismatched, and such errors prevent an accurate F matrix from being obtained. Mismatched combinations of feature points must therefore be excluded to obtain an accurate F matrix.
[0117]
The processing in step S807 and subsequent steps is processing for eliminating feature points determined to be inappropriate for the calculation of the F matrix, such as feature points that have been incorrectly associated.
[0118]
In step S807, an epipolar line is calculated using the calculated F matrix. Further, in step S808, the distance (error) between the corresponding point used in the F matrix estimation and the epipolar line is calculated. If the error is larger than a predetermined threshold, the position of the sphere is changed, its center is detected, and a new corresponding point is obtained.
[0119]
The processing of steps S807 and S808, that is, the generation of epipolar lines and the calculation of the distance (error) between each corresponding point used in the F matrix estimation and its epipolar line, is described with reference to FIGS. 15 and 16. First, the generation of an epipolar line is described. An epipolar line is a line determined by the positional relationship between two cameras, and it generally arises in parallax detection by the stereo method.
[0120]
The principle of the stereo method is briefly described here. In the stereo method, pixels are associated among multiple images of the same object captured from two or more viewpoints (different line-of-sight directions) by a plurality of cameras, in order to determine the position of the measurement object in three-dimensional space. For example, the same object is photographed from different viewpoints by a reference camera and a detection camera, and the distance to the measurement object is measured from the two images by the principle of triangulation.
[0121]
FIG. 15 is a diagram illustrating the principle of the stereo method. The reference camera (Camera 1) and the detection camera (Camera 2) photograph the same object from different viewpoints. Consider finding the depth of a point “mb” in an image captured by a reference camera.
[0122]
An object that appears at point "mb" in the image captured by the reference camera appears, in the image of the detection camera photographing the same object from a different viewpoint, at one of the points "m1", "m2", or "m3" on a single straight line, depending on its depth. This straight line is called the epipolar line Lp.
[0123]
The point "mb" in the reference camera image thus corresponds to a straight line, the epipolar line, in the image obtained by the detection camera. As long as the imaged point P (any point on the straight line through P1, P2, and P3) lies on the line of sight of the reference camera, it appears at the same observation point "mb" on the reference image regardless of its depth, that is, its distance from the reference camera. In the image captured by the detection camera, on the other hand, point P appears on the epipolar line at a position that depends on the distance between the reference camera and P.
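This behavior can be checked numerically. The following is a minimal NumPy sketch, assuming pinhole cameras with hypothetical intrinsics K and a hypothetical pose (R, t) of the detection camera relative to the reference camera. It uses the standard relation F = K⁻ᵀ [t]× R K⁻¹, which the text does not state. Points P1, P2, and P3 at different depths on the reference camera's line of sight all project to mb, while their images in the detection camera are distinct and lie on the single line F·mb.

```python
import numpy as np

def skew(v):
    """Cross-product matrix [v]x, so that skew(v) @ w == np.cross(v, w)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def project(P, X):
    """Pinhole projection of 3-D point X with a 3x4 camera matrix P."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

# Hypothetical set-up: reference camera at the origin, detection camera at (R, t).
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
a = 0.2
R = np.array([[np.cos(a), 0.0, np.sin(a)], [0.0, 1.0, 0.0], [-np.sin(a), 0.0, np.cos(a)]])
t = np.array([-1.0, 0.0, 0.1])
P_ref = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P_det = K @ np.hstack([R, t[:, None]])

mb = np.array([350.0, 260.0])
ray = np.linalg.solve(K, np.append(mb, 1.0))   # line of sight through mb (depth 1)
points = [z * ray for z in (2.0, 4.0, 8.0)]    # P1, P2, P3 at different depths
m_ref = [project(P_ref, X) for X in points]    # all coincide with mb
m_det = [project(P_det, X) for X in points]    # m1, m2, m3: different positions

Kinv = np.linalg.inv(K)
F = Kinv.T @ skew(t) @ R @ Kinv                 # maps mb to the epipolar line Lp
line = F @ np.append(mb, 1.0)                   # a*x + b*y + c = 0 in the detection image
dist = [abs(line @ np.append(m, 1.0)) / np.hypot(line[0], line[1]) for m in m_det]
print(dist)
```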
[0124]
In the configuration of the present invention, feature points are extracted based on the images captured by the two cameras to be calibrated, and the F matrix is calculated in step S806 based on the extracted feature points. The F matrix is a parameter indicating a positional relationship between the two cameras, and an epipolar line can be set based on the positional relationship parameter.
[0125]
The setting of epipolar lines will be described with reference to FIG. FIGS. 16A and 16B collectively show the feature points (sphere centers) obtained in a plurality of frames captured by the cameras being calibrated, corresponding to the feature point distribution data 511 and 512 shown in FIG. FIG. 16B also shows epipolar lines L1, L2, ..., Li set based on the F matrix obtained in step S806 of the flow shown in FIG. 14. These epipolar lines are set based on, for example, the eight feature points associated to calculate the F matrix.
[0126]
For example, using the F matrix estimated from the corresponding points m1i (i = 1, ..., p) in the camera 1 image and m2i (i = 1, ..., p) in the camera 2 image, the epipolar line Li on the camera 2 image is obtained for each feature point (corresponding point) m1i (i = 1, ..., p) in the camera 1 image by the following equations (Equation 3) and (Equation 4). Similarly, an epipolar line on the camera 1 image is obtained for each feature point in the camera 2 image.
[0127]
[Equation 3]

[0128]
Based on the above equation (Equation 3), one straight line (epipolar line) shown in the following equation (Equation 4) is set.
[0129]
(Equation 4)

[0130]
The feature points used for the F matrix calculation are selected at random from the many feature points found in the many frames captured by the two cameras. If the F matrix has been calculated correctly, each feature point lies on the epipolar line set based on that F matrix. If feature points have not been extracted or associated accurately, however, they are displaced from the epipolar line. A large displacement indicates that the feature point association is inaccurate, and the F matrix is then judged to be inaccurate as well.
[0131]
As an index of the accuracy of the F matrix, the distance Di between each feature point and its epipolar line is calculated. As shown in FIG. 16, the distances D1, D2, ..., Di between the epipolar lines L1, L2, ..., Li set based on the F matrix and the feature points m21, m22, ..., m2i are obtained.
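Equations 3 and 4 are not reproduced here. Assuming they take the usual form, in which the epipolar line of m1i in the camera 2 image has the coefficients F·m1i, the distance Di can be computed as in this minimal NumPy sketch (the function name and the rectified-pair example are hypothetical):

```python
import numpy as np

def epipolar_distances(F, p1, p2):
    """D_i: distance in image 2 from m2i to the epipolar line L_i = F m1i."""
    h1 = np.column_stack([np.asarray(p1, dtype=float), np.ones(len(p1))])
    h2 = np.column_stack([np.asarray(p2, dtype=float), np.ones(len(p2))])
    lines = h1 @ F.T                 # row i: (a_i, b_i, c_i) of a*x + b*y + c = 0
    return np.abs((lines * h2).sum(axis=1)) / np.hypot(lines[:, 0], lines[:, 1])

# Rectified pair (hypothetical): epipolar lines are horizontal, so D_i = |y2 - y1|.
F_rect = np.array([[0.0, 0.0, 0.0], [0.0, 0.0, -1.0], [0.0, 1.0, 0.0]])
print(epipolar_distances(F_rect, [[10, 5], [40, 7]], [[20, 8], [35, 7]]))  # [3. 0.]
```

Dividing by the norm of (a, b) turns the algebraic residual m2ᵀ F m1 into a distance in pixels, so a single threshold can be applied to all Di.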
[0132]
In step S809, each of the calculated distances D1, D2, ..., Di is compared with a predetermined threshold. If any feature point m2i has a distance larger than the threshold, that pair of feature points, that is, the pair m1i and m2i of camera 1 and camera 2, is excluded from the association targets in step S810, and the F matrix is calculated again in step S806 from the association data of the other associable feature points.
[0133]
If all of the calculated distances D1, D2, ..., Di are equal to or less than the predetermined threshold in step S809, the average of D1, D2, ..., Di is calculated in step S811 and compared with a predetermined second threshold. If the average distance is equal to or greater than the second threshold, the F matrix is calculated again in step S806 from the association data of other associable feature points.
[0134]
If the average of D1, D2, ..., Di is less than the second threshold in step S811, the process proceeds to step S812, where the distribution of the feature points is evaluated. This evaluation is described with reference to FIG. When the associated feature points are concentrated in one part of the captured image frame, it is difficult to obtain an accurate positional relationship between the cameras, that is, to calculate an accurate F matrix. A more accurate F matrix is obtained when the associated feature points are spread over the entire captured image frame. The feature point distribution is therefore checked; if it is biased, a new feature point is set at a position that reduces the bias, and the F matrix is calculated again.
[0135]
As shown in FIG. 17, for the coordinates of the corresponding points m1i (i = 1, ..., p) on the camera 1 image and of the corresponding points m2i (i = 1, ..., p) on the camera 2 image, the respective coordinate distributions (average values of x and y) are obtained by the following expression (Expression 5).
[0136]
(Equation 5)

[0137]
If the coordinate distribution of either camera obtained by the above equations lies far from the image center (xc1, yc1) or (xc2, yc2), the feature point distribution is judged to be biased, and a new feature point is added. More specifically, the distance between each camera's coordinate distribution value and its image center (xc1, yc1) or (xc2, yc2) is compared with a predetermined third threshold. If the distance is larger, the distribution is judged to be biased, and the process advances from step S813 to step S814, where the position of the required feature point is indicated. The additional feature point is set at a position that reduces the bias of the coordinate distribution obtained by Equation 5, for example at the points m1k and m2k shown in FIG.
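The bias test of steps S812 to S814 can be sketched per camera as follows. This is a minimal NumPy sketch with several assumptions. Equation 5 is taken to be the mean x and y of the corresponding points. The image center is taken as ((w − 1)/2, (h − 1)/2), and the comparison with the third threshold `th3` uses the Euclidean distance. The suggested position, which mirrors the centroid about the image center, is a hypothetical heuristic for "a position that reduces the bias" and not the patent's rule.

```python
import numpy as np

def check_distribution(points, image_size, th3):
    """S812-S814 for one camera: is the mean of the corresponding points far from the image centre?"""
    pts = np.asarray(points, dtype=float)
    w, h = image_size
    centre = np.array([(w - 1) / 2.0, (h - 1) / 2.0])   # assumed definition of (xc, yc)
    mean = pts.mean(axis=0)                              # assumed form of Equation 5
    offset = mean - centre
    biased = bool(np.hypot(offset[0], offset[1]) > th3)
    # Hypothetical hint for S814: place the sphere so that its centre appears
    # on the opposite side of the image centre from the current centroid.
    suggestion = centre - offset if biased else None
    return biased, mean, suggestion

# Points clustered in the upper-left of a 640x480 image (hypothetical data).
biased, mean, hint = check_distribution([[100, 100], [120, 80], [90, 110]], (640, 480), th3=50.0)
print(biased, mean, hint)
```

In practice the check would be run for camera 1 and camera 2 separately, and a new sphere position requested if either is biased.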
[0138]
In step S815, the sphere is moved to a position where the indicated feature point can be obtained. The process then returns to step S802: the sphere is photographed again by the cameras, the additional feature point is added, and the feature point association and F matrix calculation are executed.
[0139]
If it is determined in step S813 that the feature points are not biased, the process ends with the obtained F matrix as the final F matrix.
[0140]
The processing sequence image observation → sphere detection → sphere center estimation → association → F matrix estimation → epipolar line calculation → calculation of the error between corresponding points and epipolar lines → error evaluation is thus repeated until the evaluated error falls below a certain threshold. Once the error is below the threshold, the disparity information between the images is used to determine whether the positions of the corresponding points (that is, the spatial positions of the sphere) are biased. If they are, the spatial position at which the sphere should be placed is presented, and the above iteration is performed again. As a result, highly accurate camera calibration is possible even for on-site shooting where the lighting environment is unstable.
[0141]
As described above, the processing procedure of FIG. 14 associates, as feature points, the sphere center positions obtained from the images captured by the cameras to be calibrated and calculates the F matrix from them. In this procedure, errors can be corrected even when unsuitable feature points are selected for association or the association itself is wrong. Moreover, even when the feature points are unevenly distributed, the bias can be checked, feature points can be added at positions that reduce it, and a new F matrix can be calculated, so a more accurate F matrix can be obtained. Camera calibration or correction of acquired images based on a high-precision F matrix, and generation of high-precision virtual viewpoint images based on that correction, therefore become possible.
[0142]
Next, a specific hardware configuration example of the calibration processing device according to the present invention will be described with reference to FIG. 18. A CPU (Central Processing Unit) 901 is a processor that executes the OS (Operating System) and the processing programs described with reference to the flowcharts above. A ROM (Read-Only Memory) 902 stores programs executed by the CPU 901 and fixed data such as operation parameters. A RAM (Random Access Memory) 903 is used as a storage area and work area for programs executed by the CPU 901 and for parameters that change as needed during program processing. The HDD 904 controls a hard disk and stores and reads various data and programs to and from it.
[0143]
The bus 910 is configured by a PCI (Peripheral Component Interconnect) bus or the like, and enables data transfer with each module and each connected device via the input/output interface 911.
[0144]
The input unit 905 includes an image data input unit, a keyboard, a pointing device, and the like. It inputs image data captured by the cameras to be calibrated, as well as various commands and data, to the CPU 901. The output unit 906 is, for example, a CRT or liquid crystal display that shows captured images, feature point extraction images, or virtual viewpoint images generated, after the F matrix has been calculated, from images acquired by the cameras.
[0145]
The communication unit 907 communicates with other devices. For example, it receives the images of a plurality of cameras acquired by the synchronous image acquisition system shown in FIG. 6. Based on the input images, the F matrix calculation processing, virtual viewpoint image generation processing, and other processing described above are executed under the control of the CPU 901 serving as a control unit. The image data to be processed need not be input through the communication unit; it may also be input through an A/V input unit configured in the input unit 905, or image data stored on the HDD or on a removable recording medium 909 such as a CD or DVD mounted in the drive 908 may be input as the image to be processed.
[0146]
The drive 908 records to and reproduces from a removable recording medium 909 such as a flexible disk, CD-ROM (Compact Disc Read-Only Memory), MO (Magneto-Optical) disk, DVD (Digital Versatile Disc), magnetic disk, or semiconductor memory. It reads programs and data from each removable recording medium 909 and stores programs and data on it.
[0147]
The present invention has been described in detail with reference to specific embodiments. However, it is obvious that those skilled in the art can modify or substitute the embodiments without departing from the spirit of the present invention. That is, the present invention has been disclosed by way of example and should not be construed as limiting. To determine the gist of the present invention, the claims should be taken into consideration.
[0148]
Note that the series of processes described in the specification can be executed by hardware, software, or a combination of both. When the processing is executed by software, a program recording the processing sequence can be installed in memory in a computer built into dedicated hardware and executed there, or installed and executed on a general-purpose computer capable of executing various kinds of processing.
[0149]
For example, the program can be recorded in advance on a hard disk or ROM (Read Only Memory) serving as a recording medium. Alternatively, the program can be stored (recorded) temporarily or permanently on a removable recording medium such as a flexible disk, CD-ROM (Compact Disc Read-Only Memory), MO (Magneto-Optical) disk, DVD (Digital Versatile Disc), magnetic disk, or semiconductor memory. Such a removable recording medium can be provided as so-called packaged software.
[0150]
Besides being installed on the computer from a removable recording medium as described above, the program can be transferred wirelessly from a download site to the computer, or transferred by wire via a network such as a LAN (Local Area Network) or the Internet. The computer receives the program transferred in this way and installs it on a recording medium such as a built-in hard disk.
[0151]
The various processes described in the specification may be executed not only in chronological order according to the description but also in parallel or individually according to the processing capability of the device that executes the processes or as necessary.
[0152]
【Effects of the Invention】
As described above, according to the configuration of the present invention, in calibration processing that calculates parameters used to adjust multi-viewpoint image capturing cameras or to correct images acquired by them, each camera photographs a luminous sphere. The sphere can therefore be detected easily and its center position obtained even at shooting sites where the lighting environment is unstable, and by calculating the F matrix from these sphere center positions as feature points, the F matrix can be obtained with high accuracy. In other words, the fundamental matrix, a geometric parameter expressing the positional relationship between the real cameras of a multi-viewpoint video capturing system, can be calculated accurately, enabling fast and accurate calibration.
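The F matrix calculation from sphere-center correspondences can be illustrated with the normalized eight-point algorithm, a standard linear method consistent with the requirement in the claims of at least eight points to determine the ratios among the nine elements of F. Whether the patent uses exactly this variant (Hartley normalization followed by rank-2 enforcement) is an assumption; `eight_point` and `_normalize` are hypothetical names, and the convention x2ᵀ F x1 = 0 is assumed.

```python
import numpy as np

def _normalize(pts):
    """Hartley normalization: move centroid to origin, mean distance sqrt(2)."""
    pts = np.asarray(pts, dtype=float)
    centroid = pts.mean(axis=0)
    mean_dist = np.mean(np.linalg.norm(pts - centroid, axis=1))
    s = np.sqrt(2) / mean_dist
    T = np.array([[s, 0.0, -s * centroid[0]],
                  [0.0, s, -s * centroid[1]],
                  [0.0, 0.0, 1.0]])
    homog = np.hstack([pts, np.ones((len(pts), 1))])
    return (T @ homog.T).T, T

def eight_point(pts1, pts2):
    """Estimate F (x2^T F x1 = 0) from >= 8 matched sphere centers."""
    if len(pts1) < 8 or len(pts1) != len(pts2):
        raise ValueError("need at least 8 matched points")
    x1, T1 = _normalize(pts1)
    x2, T2 = _normalize(pts2)
    # Each correspondence gives one linear equation in the 9 entries of F.
    A = np.column_stack([
        x2[:, 0] * x1[:, 0], x2[:, 0] * x1[:, 1], x2[:, 0],
        x2[:, 1] * x1[:, 0], x2[:, 1] * x1[:, 1], x2[:, 1],
        x1[:, 0], x1[:, 1], np.ones(len(x1)),
    ])
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)                   # least-squares null vector
    U, S, Vt = np.linalg.svd(F)
    F = U @ np.diag([S[0], S[1], 0.0]) @ Vt    # enforce rank 2
    F = T2.T @ F @ T1                          # undo normalization
    return F / np.linalg.norm(F)               # F is defined only up to scale
```

Normalizing before the SVD keeps the entries of A at similar magnitudes, which is what makes the linear estimate numerically stable with pixel coordinates.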
[0153]
Further, according to the configuration of the present invention, a luminous sphere, which is hardly affected by the lighting environment, is used as the calibration jig. The sphere center is detected and associated in real time while the F matrix is estimated, and the error and positional relationship between the estimated F matrix and the corresponding points are evaluated. On that basis, the number of corresponding points required and the positions in three-dimensional space at which the sphere should be placed are indicated, so a more accurate F matrix can be calculated efficiently.
[0154]
Furthermore, according to the configuration of the present invention, camera calibration can be performed while sphere images are observed and captured with a multi-viewpoint video capturing system that has a synchronization function, and recorded on a recording medium such as memory or a hard disk on each personal computer. On each personal computer, the coordinates of the sphere center (the feature point) in each frame can be extracted by image processing, and the fundamental matrix can be estimated by associating the images using the extracted feature points. Evaluation of the estimated matrix's accuracy, and determination and presentation of the required number and appropriate positions of corresponding points, can be executed on, for example, a PC, so fast and accurate calibration can be carried out on a simple system such as a PC.
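Extracting the sphere center from edge information could, for example, be done with an algebraic least-squares (Kåsa) circle fit to the edge pixels of the sphere's image. This fitting method is a swapped-in illustration and not necessarily the procedure of FIG. 10. The inside-the-frame check mirrors the condition in the claims that the center is computed only when the edge of the entire sphere appears in the frame. `fit_circle` and `sphere_center` are hypothetical names.

```python
import numpy as np

def fit_circle(edge_pts):
    """Algebraic (Kasa) least-squares circle fit to edge pixels -> (cx, cy, r)."""
    pts = np.asarray(edge_pts, dtype=float)
    u, v = pts[:, 0], pts[:, 1]
    # Model: u^2 + v^2 + D*u + E*v + G = 0, which is linear in (D, E, G).
    A = np.column_stack([u, v, np.ones_like(u)])
    b = -(u ** 2 + v ** 2)
    (D, E, G), *_ = np.linalg.lstsq(A, b, rcond=None)
    cx, cy = -D / 2.0, -E / 2.0
    r = np.sqrt(cx ** 2 + cy ** 2 - G)
    return cx, cy, r

def sphere_center(edge_pts, width, height):
    """Return the center only if the whole fitted circle lies inside the frame."""
    if len(edge_pts) < 3:
        return None                     # a circle needs at least 3 edge points
    cx, cy, r = fit_circle(edge_pts)
    inside = (cx - r >= 0 and cy - r >= 0
              and cx + r <= width - 1 and cy + r <= height - 1)
    return (cx, cy) if inside else None
```

Frames in which the sphere is cut off by the image border return `None` and would simply contribute no feature point.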
[Brief description of the drawings]
FIG. 1 is a diagram illustrating a multi-viewpoint camera photographing process and a process of generating a virtual viewpoint video using a real video taken by a multi-viewpoint camera.
FIG. 2 is a diagram illustrating various parameters applied to a calibration process.
FIG. 3 is a diagram illustrating a general F parameter estimation method.
FIG. 4 is a diagram showing a camera calibration procedure when a checker pattern is used as a calibration jig to estimate an F matrix.
FIG. 5 is a diagram showing an example of a configuration in which a luminous sphere used as a calibration jig is photographed from different viewpoints.
FIG. 6 is a diagram illustrating a synchronous recording configuration of a plurality of camera images provided with a network synchronization mechanism.
FIG. 7 is a diagram illustrating a configuration example of a video signal processing unit configured by a DV capture board.
FIG. 8 is a diagram illustrating details of a synchronization adjustment process executed by the server PC.
FIG. 9 is a diagram illustrating a time-series image (t = 1,..., K) of a light emitting sphere observed by one camera.
FIG. 10 is a diagram illustrating a procedure of a process of determining a center point position of a light emitting sphere executed as a feature point extraction process.
FIG. 11 is a diagram illustrating a process of detecting the center position of each sphere from a time-series image observed by an adjacent camera and setting the position as a feature point extracted from one image.
FIG. 12 is a flowchart illustrating a processing procedure for obtaining a sphere center position as a feature point from each image frame based on a time-series image including a plurality of frames.
FIG. 13 is a block diagram illustrating a functional configuration of a calibration processing device according to the present invention.
FIG. 14 is a diagram for explaining an optimal F matrix calculation processing procedure executed by the calibration processing apparatus of the present invention.
FIG. 15 is a diagram illustrating an epipolar line.
FIG. 16 is a diagram illustrating generation of an epipolar line and calculation processing of a distance (error) between a corresponding point used for F matrix estimation and the epipolar line.
FIG. 17 is a diagram illustrating a feature point distribution determination process.
FIG. 18 is a diagram illustrating a specific hardware configuration example of the calibration processing device according to the present invention.
[Explanation of symbols]
100 Subject
101, 102 Cameras
111, 112 Photographed images
113 Virtual viewpoint camera
121 Virtual viewpoint camera image
301 Camera
302 Live-action image
301 Camera
320 Synchronization signal generator
330 A/D converter
350 Client PC
355 Video signal processing unit
356 Storage means
357 Network card
360 Image monitor
370 Server PC
380 Network
391 CPU
392 RAM
393 ROM
394 1394 port
395 LINK/PHY
396 Storage means
500 Luminous sphere
501, 502 Cameras
511, 512 Feature point distribution data
601 Image input unit
602 Feature point extraction unit
603 Association processing unit
604 F matrix calculation unit
605 Virtual viewpoint image generation unit
901 CPU
902 ROM
903 RAM
904 HDD
905 Input unit
906 Output unit
907 Communication unit
908 Drive
909 Removable recording medium
910 Bus
911 I/O interface

Claims

1. A calibration processing device that performs parameter calculation processing applied to adjustment between multi-viewpoint image capturing cameras or to correction of images acquired by the multi-viewpoint image capturing cameras, the device comprising:
an image input unit that inputs video data from a plurality of cameras capturing a moving sphere from different viewpoint directions;
a feature point extraction unit that extracts a sphere center position as a feature point from a plurality of captured image frames constituting the video data of the plurality of cameras input in the image input unit;
an association processing unit that associates, between corresponding frames of the cameras, the sphere center positions extracted as feature points by the feature point extraction unit; and
an F matrix calculation unit that calculates a fundamental matrix as a calibration parameter based on the feature point correspondence data associated in the association processing unit.

2. The calibration processing device according to claim 1, wherein the sphere is a luminous sphere, and the feature point extraction unit is configured to calculate the sphere center position based on edge information of the sphere captured in a frame.

3. The calibration processing device according to claim 1, wherein the F matrix calculation unit sets epipolar lines based on the calculated F matrix, calculates the distances between the epipolar lines and the sphere center positions serving as feature points, either individually or as an average distance, and, when the calculated distance is larger than a predetermined threshold, recalculates the F matrix based on feature point correspondence data that includes new feature points.

4. The calibration processing device according to claim 1, wherein the F matrix calculation unit determines the distribution of the sphere center positions, that is, of the feature points, in the captured image frames, and, when uneven distribution of the feature points is confirmed, recalculates the F matrix based on feature point correspondence data that includes feature points additionally set at positions that eliminate the uneven distribution.

5. The calibration processing device according to claim 1, wherein the feature point extraction unit is configured to calculate the center position based on edge information only when the edge of the entire sphere is present in the captured frame.

6. The calibration processing device according to claim 1, wherein the feature point extraction unit acquires, from the captured image frames of each camera, at least eight sphere center positions, the number required to calculate the ratios among the nine elements of the F matrix; the association processing unit associates the eight or more sphere center positions extracted by the feature point extraction unit; and the F matrix calculation unit calculates the element values of the fundamental matrix as a calibration parameter based on the at least eight items of feature point correspondence data.

7. A calibration processing method for performing parameter calculation processing applied to adjustment between multi-viewpoint image capturing cameras or to correction of images acquired by the multi-viewpoint image capturing cameras, the method comprising:
an image input step of inputting video data from a plurality of cameras that photograph a moving sphere from different viewpoint directions;
a feature point extraction step of extracting a sphere center position as a feature point from a plurality of captured image frames constituting the video data of the plurality of cameras input in the image input step;
an association processing step of associating feature points based on the sphere center positions extracted in the feature point extraction step; and
an F matrix calculation step of calculating a fundamental matrix as a calibration parameter based on the feature point correspondence data associated in the association processing step.

8. The calibration processing method according to claim 7, wherein the sphere is a luminous sphere, and in the feature point extraction step, the sphere center position is calculated based on edge information of the sphere captured in a frame.

9. The calibration processing method according to claim 7, wherein in the F matrix calculation step, epipolar lines are set based on the calculated F matrix, the distances between the epipolar lines and the sphere center positions serving as feature points are calculated individually or as an average distance, and, when the calculated distance is larger than a predetermined threshold, the F matrix is recalculated based on feature point correspondence data that includes new feature points.

10. The calibration processing method according to claim 7, wherein in the F matrix calculation step, the distribution of the sphere center positions, that is, of the feature points, in the captured image frames is determined, and, when uneven distribution of the feature points is confirmed, the F matrix is recalculated based on feature point correspondence data that includes feature points additionally set at positions that eliminate the uneven distribution.

11. The calibration processing method according to claim 7, wherein in the feature point extraction step, the center position is calculated based on edge information only when the edge of the entire sphere is present in the captured frame.

12. The calibration processing method according to claim 7, wherein: in the feature point extraction step, at least eight sphere center positions, the number required to calculate the ratios among the nine elements of the F matrix, are acquired from the captured image frames of each camera; in the association processing step, the eight or more sphere center positions extracted in the feature point extraction step are associated; and in the F matrix calculation step, the element values constituting the fundamental matrix as a calibration parameter are calculated based on the eight or more items of associated feature point correspondence data.

13. A computer program for performing parameter calculation processing applied to adjustment between multi-viewpoint image capturing cameras or to correction of images acquired by the multi-viewpoint image capturing cameras, the program comprising:
an image input step of inputting video data from a plurality of cameras that photograph a moving sphere from different viewpoint directions;
a feature point extraction step of extracting a sphere center position as a feature point from a plurality of captured image frames constituting the video data of the plurality of cameras input in the image input step;
an association processing step of associating the sphere center positions extracted in the feature point extraction step; and
an F matrix calculation step of calculating a fundamental matrix as a calibration parameter based on the feature point correspondence data associated in the association processing step.