JP7596105B2

JP7596105B2 - Viewing state estimation device, robot system, viewing state estimation method, and viewing state estimation program

Info

Publication number: JP7596105B2
Application number: JP2020162380A
Authority: JP
Inventors: 祐太星; 勇太萩尾; 真利奈上村; 豊金子
Original assignee: Japan Broadcasting Corp
Current assignee: Japan Broadcasting Corp
Priority date: 2020-09-28
Filing date: 2020-09-28
Publication date: 2024-12-09
Anticipated expiration: 2040-09-28
Also published as: JP2022055029A

Description

The present invention relates to a device, a method, and a program for estimating the state of a television viewer.

Conventionally, research has been conducted on robots that watch television programs and other video together with a viewer, and in particular on techniques for controlling the robot's behavior according to the viewer's viewing state.
Techniques for estimating the viewing state include, for example, installing a camera in the room where the television is watched in order to detect the direction the viewer is facing, or having the viewer wear a glasses-type gaze direction acquisition device to acquire the viewer's gaze direction data.

Furthermore, as an example of control according to the estimated viewing state, Patent Document 1 proposes a device that detects the user's gaze direction, determines the display position at which an image projection device projects video, and applies geometric correction to the displayed image, thereby displaying an image that is easy for the user to see.
Patent Document 2 proposes a device that detects the viewing state from an image that includes a viewer watching content, and associates that state with the elapsed time from the start of the content being viewed.

Patent Document 1: JP 2017-55178 A
Patent Document 2: Japanese Patent No. 6614547

Among the conventional methods for estimating the viewing state, estimating the gaze direction from video of the viewer in a specific location, such as a laboratory with cameras installed on the ceiling, on a wall, or on top of the television, is carried out only experimentally and for a limited period. It is therefore difficult to estimate the viewer's gaze direction in an everyday viewing environment with this method.
Having the viewer wear a glasses-type gaze direction acquisition device on the head also differs from everyday viewing conditions; wearing the device feels unnatural and places a burden on the viewer. This method is therefore also difficult to use in an everyday viewing environment.

In the method of Patent Document 1, imaging devices are installed at the four corners of the ceiling of a room, and the user's gaze direction is estimated from the images captured by those devices. This requires installing imaging devices in the room, which is difficult in a home. In addition, multiple imaging devices are needed to capture the entire room.

In the method of Patent Document 2, the viewing state is determined from the viewer's vital information extracted from a camera image that includes the viewer. However, the camera must be installed at a location such as the top of the display, which differs from everyday viewing conditions. In addition, this method cannot capture information such as the viewer facing another person, rather than the display, while talking. Vital information from periods when the viewer is not actually watching is therefore reflected as well, and the viewing state cannot be determined appropriately.

An object of the present invention is to provide a viewing state estimation device, a viewing state estimation method, and a viewing state estimation program with which a robot can estimate a viewer's viewing state without using additional devices such as cameras.

The viewing state estimation device according to the present invention includes: a panoramic image unit that acquires an omnidirectional panoramic image synthesized from images captured around a robot; a distance panoramic image unit that generates a distance panoramic image whose pixel values are the distance data corresponding to each pixel of the panoramic image; a television detection unit that detects a television position from the panoramic image; a viewer detection unit that detects a face position of a viewer from the panoramic image; a distance acquisition unit that acquires the distances at the television position and at the face position from the distance panoramic image; a viewing direction detection unit that identifies the positional relationship among the robot, the viewer, and the television by calculating the angle between the television and the viewer as seen from the robot based on the size of the panoramic image, the television position, and the face position, and that acquires, from the panoramic image, a viewing direction image located at the viewer's viewing direction angle based on the viewer's face direction angle obtained from the image at the face position; and a viewing state determination unit that detects an object included in the viewing direction image and determines the state of the viewer based on the type of the object.

The viewing direction detection unit may approximate the distance from the viewer to the viewing direction position by the sum of the distance from the robot to the viewer and the distance from the robot to the viewing direction position.

The viewing state determination unit may calculate the viewing state based on statistical information on the viewer's state within a fixed period of time.

The viewing state determination unit may calculate, as the viewing state, a viewing degree indicating the proportion of time during which the television is being watched.

The viewing state determination unit may determine, as the viewing state, one of a plurality of states including a state of looking at the television and a state of looking at another person.

A robot system according to the present invention includes the viewing state estimation device, and an operation control unit that controls the operation of the robot according to the result of comparing the viewing degree output from the viewing state estimation device with a predetermined threshold.

The operation control unit may change the control of the robot in stages based on a plurality of thresholds.

A robot system according to the present invention includes the viewing state estimation device, and an operation control unit that controls the operation of the robot according to which of the plurality of states is output from the viewing state estimation device.
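The viewing degree and the staged, threshold-based control described above can be sketched as follows. This is only an illustration under stated assumptions: the state labels, the threshold values, and the action names are hypothetical and do not come from the specification.

```python
from collections import Counter


def viewing_degree(states, target="tv"):
    """Fraction of state samples in the window that equal `target`."""
    if not states:
        return 0.0
    return Counter(states)[target] / len(states)


def select_action(degree, thresholds=((0.7, "comment_on_program"),
                                      (0.3, "short_remark"))):
    """Return the action of the first threshold that `degree` reaches.

    `thresholds` must be sorted from the highest limit to the lowest.
    The limits and action names are hypothetical examples.
    """
    for limit, action in thresholds:
        if degree >= limit:
            return action
    return "stay_quiet"


# One hypothetical time window of per-frame state decisions.
window = ["tv", "tv", "other_person", "tv", "elsewhere"]
degree = viewing_degree(window)
print(degree, select_action(degree))
```

Keeping the thresholds in a sorted tuple makes it easy to add further stages without changing the selection logic.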

In the viewing state estimation method according to the present invention, a computer executes: a panoramic image generation step of acquiring an omnidirectional panoramic image synthesized from images captured around a robot; a distance panoramic image generation step of generating a distance panoramic image whose pixel values are the distance data corresponding to each pixel of the panoramic image; a television detection step of detecting a television position from the panoramic image; a viewer detection step of detecting a face position of a viewer from the panoramic image; a distance acquisition step of acquiring the distances at the television position and at the face position from the distance panoramic image; a viewing direction detection step of identifying the positional relationship among the robot, the viewer, and the television by calculating the angle between the television and the viewer as seen from the robot based on the size of the panoramic image, the television position, and the face position, and of acquiring, from the panoramic image, a viewing direction image located at the viewer's viewing direction angle based on the viewer's face direction angle obtained from the image at the face position; and a viewing state determination step of detecting an object included in the viewing direction image and determining the state of the viewer based on the type of the object.

The viewing state estimation program according to the present invention causes a computer to function as the viewing state estimation device.

According to the present invention, a robot can estimate a viewer's viewing state without using additional devices such as cameras.

FIG. 1 is a diagram illustrating a usage scene of a robot incorporating the viewing state estimation device according to an embodiment.
FIG. 2 is a block diagram showing the functional configuration of the viewing state estimation device according to the embodiment.
FIG. 3 is a diagram illustrating synthesis position data according to the embodiment.
FIG. 4 is a diagram showing the functional configuration of the distance panoramic image unit according to the embodiment.
FIG. 5 is a diagram illustrating the operation of the distance image synthesis unit according to the embodiment.
FIG. 6 is a diagram illustrating the operation of the overlapping section calculation unit according to the embodiment.
FIG. 7 is a diagram illustrating the distance image database according to the embodiment.
FIG. 8 is a diagram illustrating the distance panoramic image database according to the embodiment.
FIG. 9 is a diagram showing the functional configuration of the viewer detection unit according to the embodiment.
FIG. 10 is a diagram showing the functional configuration of the viewing direction detection unit according to the embodiment.
FIG. 11 is a diagram showing the positional relationship among the robot, the television, and the viewer according to the embodiment.
FIG. 12 is a diagram explaining the method of calculating the angle between the television and the viewer according to the embodiment.
FIG. 13 is a diagram explaining the face direction angle according to the embodiment.
FIG. 14 is a diagram showing the positional relationship among the robot, the viewer, and the viewing direction position according to the embodiment.
FIG. 15 is a diagram illustrating the method of acquiring the viewing direction image according to the embodiment.
FIG. 16 is a diagram showing the functional configuration of the viewing state determination unit according to the embodiment.
FIG. 17 is a diagram showing an example of calculating the viewing degree according to the embodiment.

An example of an embodiment of the present invention is described below.
FIG. 1 is a diagram illustrating a usage scene of a robot 1 incorporating a viewing state estimation device 10 according to the present embodiment.

The robot 1 is placed, for example, on a tabletop beside a viewer watching television. In addition to the viewing state estimation device 10, the robot 1 includes an imaging unit 20 and a distance detection unit 30, and further includes an operation control unit 40 that performs operations such as speech according to the viewing state estimated by the viewing state estimation device 10.

The viewing state estimation device 10 acquires image data of the robot's surroundings from the imaging unit 20 and distance data from the distance detection unit 30. Using the method described below, it obtains the distance from the robot 1 to the television, the distance from the robot 1 to the viewer, and the viewer's viewing direction, and then estimates the viewing state by acquiring an image in the viewer's viewing direction.

The imaging unit 20 is a camera mounted on the robot 1 for acquiring images. It may be rotated horizontally by a motor to capture the surroundings of the robot 1.
The imaging unit 20 is not limited to acquiring images while rotating; for example, a camera array consisting of a plurality of cameras may be mounted on the robot 1 to acquire the images.

The distance detection unit 30 acquires distance data based on infrared light emitted from an irradiation unit and the light that is reflected by an object and reaches a light receiving unit. The infrared irradiation method is, for example, a pattern method or a time-of-flight (ToF) method. Whenever the imaging unit 20 acquires an image, the distance detection unit 30 acquires, in time synchronization with it, distance data for the direction corresponding to each pixel of the image.

Here, the viewing state indicates a category of viewing direction, for example, looking at the television, looking at another person, or looking at something else.
The operation control unit 40 changes the operation of the robot 1, such as its speech, according to the viewing state estimated by the viewing state estimation device 10.

FIG. 2 is a block diagram showing the functional configuration of the viewing state estimation device 10 according to the present embodiment.
The viewing state estimation device 10 is an information processing device (computer) that includes a control unit, a storage unit, an input/output interface, and the like. It operates as the following functional units when the control unit executes software (the viewing state estimation program) stored in the storage unit.
The viewing state estimation device 10 includes a panoramic image unit 11, a distance panoramic image unit 12, a viewer detection unit 13, a television detection unit 14, a distance acquisition unit 15, a viewing direction detection unit 16, and a viewing state determination unit 17.

The panoramic image unit 11 stitches together the multiple images of the surroundings of the robot 1 acquired by the imaging unit 20 to generate a panoramic image covering all horizontal directions as seen from the robot 1. The open-source OpenCV Stitcher class can be used as software for generating the panoramic image, but the generation method is not limited to this.

The panoramic image unit 11 also outputs synthesis position data Dc, which records how the multiple images were combined.
FIG. 3 is a diagram illustrating the synthesis position data Dc in the present embodiment.
The synthesis position data Dc consists of the numbers of the images that were joined when the panoramic image was generated and the coordinates of the positions at which they were joined.
For example, the origin (0, 0) of the second image is joined at coordinates (150, 10) of the first image, and the origin (0, 0) of the first image is joined at coordinates (x_n, y_n) of the n-th image.
In this example, the number of the distance image acquired synchronously with each image is also associated with it.

The distance panoramic image unit 12 uses the distance data acquired by the distance detection unit 30 to generate, in the same way as the panoramic image generated by the panoramic image unit 11, a distance panoramic image whose pixel values are distance data.

FIG. 4 is a diagram showing the functional configuration of the distance panoramic image unit 12 in the present embodiment.
The distance panoramic image unit 12 includes a distance image synthesis unit 121 and an overlapping section calculation unit 122. It receives the synthesis position data Dc as input and outputs distance panoramic image data Dp.

The distance image synthesis unit 121 uses the synthesis position data Dc input from the panoramic image unit 11 to combine the multiple distance images obtained from the distance detection unit 30 at the same positions as the multiple images combined by the panoramic image unit 11.
Here, the distance data corresponding to each pixel of the captured images is stored in a distance image database 12A. The distance image database 12A may be stored in the storage unit of the viewing state estimation device 10, or may be provided in a shared storage unit accessible to each unit of the robot 1.

In each section (pixel) where the combined distance images overlap, the overlapping section calculation unit 122 averages the overlapping distance values to determine the distance data for that pixel.
The method of determining the distance data is not limited to this; the value from any one of the distance images may be chosen as the representative.

FIG. 5 is a diagram illustrating the operation of the distance image synthesis unit 121 in the present embodiment.
For example, suppose the imaging unit 20 acquires image 1 and image 2 in that order, and image 2 is joined at synthesis position P_c1(150, 0) in image 1. The distance image synthesis unit 121 then combines distance image 1 and distance image 2, which were acquired at the same time as images 1 and 2, in the same way, and maps the distance data read from the distance image database 12A to each pixel.

FIG. 6 is a diagram illustrating the operation of the overlapping section calculation unit 122 in the present embodiment.
For example, when distance image 1 and distance image 2 are combined as in FIG. 5 above, suppose that the distance data at coordinates P_1(150, 159) in distance image 1 is 1.50 and the distance data at coordinates P_2(0, 159) in distance image 2 is 1.60. In this case, the overlapping section calculation unit 122 takes the average value (1.50 + 1.60) / 2 = 1.55 as the distance data at position (150, 159) in the distance panoramic image.
Alternatively, the overlapping section calculation unit 122 may adopt the distance data 1.60 at P_2(0, 159) in distance image 2 as the distance data for the distance panoramic image.
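The placement of distance images at their synthesis positions and the averaging of overlapping pixels can be sketched as below. This is a minimal illustration, assuming each distance image is held as a dictionary from pixel coordinates to distance; the function name and data layout are hypothetical, not taken from the specification.

```python
def composite_distance_images(images, offsets):
    """Place distance images on a shared canvas and average overlapping pixels.

    images  -- list of dicts mapping (x, y) in image coordinates to a distance
    offsets -- list of (ox, oy): where each image's origin lands on the canvas
    """
    sums, counts = {}, {}
    for image, (ox, oy) in zip(images, offsets):
        for (x, y), dist in image.items():
            key = (x + ox, y + oy)  # canvas (panorama) coordinates
            sums[key] = sums.get(key, 0.0) + dist
            counts[key] = counts.get(key, 0) + 1
    # Pixels covered once keep their value; overlapping pixels are averaged.
    return {key: sums[key] / counts[key] for key in sums}


# Values from the example: image 2 is joined at (150, 0) in image 1.
image1 = {(150, 159): 1.50}
image2 = {(0, 159): 1.60}
panorama = composite_distance_images([image1, image2], [(0, 0), (150, 0)])
print(round(panorama[(150, 159)], 2))
```

To use a representative value instead of the average, the accumulation step could simply keep the last (or first) value written to each canvas position.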

In the same way, the overlapping section calculation unit 122 determines the distance data corresponding to each pixel over the entire overlapping section.
The determined distance data is stored in a distance panoramic image database 12B in the storage unit.

FIG. 7 is a diagram illustrating the distance image database 12A in the present embodiment.
In the distance image database 12A, for each distance image number n_d, a position in the distance image (coordinates x_d, y_d) is associated with distance data l_d, so that the distance data of every pixel in every distance image is stored.

FIG. 8 is a diagram illustrating the distance panoramic image database 12B in the present embodiment.
The distance panoramic image database 12B stores distance panoramic image data consisting of the image number n_p of the panoramic image, a position (coordinates x_p, y_p), and distance data l_p.

The viewer detection unit 13 detects the viewer's face in the panoramic image generated by the panoramic image unit 11 and obtains the face position on the panoramic image.

FIG. 9 is a diagram showing the functional configuration of the viewer detection unit 13 in the present embodiment.
The viewer detection unit 13 includes a person detection unit 131 and a face detection unit 132.

The person detection unit 131 performs person detection on the panoramic image acquired from the panoramic image unit 11. For person detection, open-source software such as OpenCV (the Haar cascade detector with the fullbody model) or Faster R-CNN can be used, for example, but the detection method is not limited to these.
The person detection unit 131 outputs a viewer flag f_h = 1 if it detects a person, and a viewer flag f_h = 0 if it does not.

When the person detection unit 131 detects a person, that is, a viewer (f_h = 1), the face detection unit 132 performs face detection on the panoramic image generated by the panoramic image unit 11 and obtains the face position on the panoramic image. When no viewer is detected (f_h = 0), the face detection unit 132 skips face detection and processing proceeds to acquiring a new image from the imaging unit 20, which speeds up processing.
For face detection, open-source software such as OpenCV (the Haar cascade detector) or OpenFace can be used, for example, but the detection method is not limited to these.
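The gating between person detection and face detection can be sketched as follows. The two detector arguments are hypothetical stand-ins for whatever detector is actually used (for example a Haar cascade or Faster R-CNN); only the flag logic is illustrated, and the return layout is an assumption.

```python
def detect_viewer_face(panorama, detect_person, detect_face):
    """Run face detection only when a person was found in the panorama.

    detect_person -- callable returning True if a person is present
    detect_face   -- callable returning a face box or None
    Returns (f_h, f_f, face_box).
    """
    f_h = 1 if detect_person(panorama) else 0
    if f_h == 0:
        # No viewer: skip face detection so the next image can be fetched sooner.
        return f_h, 0, None
    face_box = detect_face(panorama)
    f_f = 1 if face_box is not None else 0
    return f_h, f_f, face_box


# Stub detectors used purely for illustration.
result = detect_viewer_face(
    panorama="dummy image",
    detect_person=lambda img: True,
    detect_face=lambda img: ((960, 160), (1040, 240)),
)
print(result)
```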

The face position is detected as a rectangular frame around the face, and the face detection unit 132 calculates the center point P_f(x_f, y_f) from the start point P_sf(x_sf, y_sf) and the end point P_ef(x_ef, y_ef) of the frame. The face center point P_f is used to obtain the distance at the face position.
The face detection unit 132 outputs a face flag f_f = 1 if a face is detected, and a face flag f_f = 0 if no face is detected.
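A center point can be derived from the start and end points of a rectangular frame as in the sketch below. The specification does not give the formula, so the midpoint with integer rounding is an assumption, and the function name and the sample box are hypothetical.

```python
def box_center(start, end):
    """Midpoint of a rectangle given its start and end corner points.

    Integer division keeps the result on the pixel grid (an assumption).
    """
    (x_s, y_s), (x_e, y_e) = start, end
    return ((x_s + x_e) // 2, (y_s + y_e) // 2)


# Hypothetical face frame on the panoramic image.
p_f = box_center((960, 160), (1040, 240))
print(p_f)
```

The same helper would apply unchanged to the rectangular frame detected for the television.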

The television detection unit 14 performs television detection on the panoramic image generated by the panoramic image unit 11 and obtains the television position on the panoramic image.
For television detection, open-source software such as Faster R-CNN can be used, for example, but the detection method is not limited to this.

The television position is detected as a rectangular frame around the television, and the television detection unit 14 calculates the center point P_tv(x_tv, y_tv) from the start point P_stv(x_stv, y_stv) and the end point P_etv(x_etv, y_etv) of the frame. The television center point P_tv is used to obtain the distance at the television position.
The television detection unit 14 outputs a television flag f_tv = 1 if a television is detected, and a television flag f_tv = 0 if no television is detected.

The distance acquisition unit 15 acquires, from the distance panoramic image, the distance data at the detected positions of the viewer and the television.
Specifically, the distance acquisition unit 15 obtains the face center point P_f from the face detection unit 132 and the television center point P_tv from the television detection unit 14, and reads, from the distance data stored in the distance panoramic image database 12B, the distance d_f from the robot 1 to the viewer's face and the distance d_tv from the robot 1 to the television.
For example, when the face center point P_f is (1000, 200), the distance data l_d at point (1000, 200) in the distance panoramic image is read, and d_f = l_d. When the television center point P_tv is (3000, 150), the distance data l_d at point (3000, 150) in the distance panoramic image is read, and d_tv = l_d.
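The lookup of d_f and d_tv can be sketched as a simple index over rows shaped like those of the distance panoramic image database 12B (image number, x, y, distance). The in-memory representation, the function names, and the distance values are hypothetical.

```python
def build_distance_index(rows):
    """Index rows (n_p, x_p, y_p, l_p) of one panorama by pixel position."""
    return {(x_p, y_p): l_p for _n_p, x_p, y_p, l_p in rows}


def distance_at(index, point):
    """Return the distance stored at `point`, or None if no data exists there."""
    return index.get(point)


# Hypothetical rows; only the two center points from the example are filled in.
rows = [(1, 1000, 200, 1.2), (1, 3000, 150, 2.5)]
index = build_distance_index(rows)
d_f = distance_at(index, (1000, 200))   # robot to viewer's face
d_tv = distance_at(index, (3000, 150))  # robot to television
print(d_f, d_tv)
```

Returning None for a missing pixel leaves it to the caller to decide how to handle positions where the sensor produced no distance value.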

The viewing direction detection unit 16 identifies the positional relationship among the robot 1, the viewer, and the television by obtaining the distance from the robot 1 to the viewer's face, the distance from the robot 1 to the television, and the angle between the viewer and the television as seen from the robot 1. It then obtains the viewing direction and the viewing direction image from the viewer's face direction angle.

FIG. 10 is a diagram showing the functional configuration of the viewing direction detection unit 16 in the present embodiment.
The viewing direction detection unit 16 includes a television-viewer angle calculation unit 161, a television-viewer distance calculation unit 162, a robot-television angle calculation unit 163, a face direction angle acquisition unit 164, a viewing direction angle calculation unit 165, and a viewing direction image acquisition unit 166.

FIG. 11 is a diagram showing the positional relationship among the robot 1, the television, and the viewer in the present embodiment.
In the triangle formed by the robot 1 (point A), the viewer (point B), and the television (point C), the side lengths BC = r, AB = r_1, and AC = r_2 are fixed.
Likewise, the angle between the viewer and the television as seen from the robot 1, ∠BAC = θ_r, the angle between the robot 1 and the television as seen from the viewer, ∠ABC = θ_h, and the angle between the robot 1 and the viewer as seen from the television, ∠ACB = θ_tv, are fixed.
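When two sides of triangle ABC and the angle between them are known, the remaining side follows from the law of cosines, r² = r_1² + r_2² − 2·r_1·r_2·cos θ_r. The sketch below illustrates that relation only; this passage does not state how the embodiment computes r, so applying the law of cosines here is an assumption, and the function name and sample values are hypothetical.

```python
import math


def tv_viewer_distance(r_1, r_2, theta_r_deg):
    """Side BC of triangle ABC from sides AB = r_1, AC = r_2 and angle BAC.

    Uses the law of cosines; the angle is given in degrees.
    """
    theta_r = math.radians(theta_r_deg)
    return math.sqrt(r_1 ** 2 + r_2 ** 2 - 2 * r_1 * r_2 * math.cos(theta_r))


# Hypothetical distances in meters with a right angle at the robot.
r = tv_viewer_distance(3.0, 4.0, 90.0)
print(round(r, 3))
```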

In this positional relationship, when the viewer's face is turned in a direction offset by the face direction angle θ_h' from the direction of the robot 1, the viewer is assumed to be looking at a viewing direction position (point D). Here it is assumed that AD = AC = r_2. In addition, BD = r'.
In this case, the angle between the viewer and the viewing direction position as seen from the robot 1 is ∠BAD = θ_r'.

The television-viewer angle calculation unit 161 calculates the angle θ_r between the television and the viewer as seen from the robot 1.
FIG. 12 is a diagram explaining the method of calculating the angle θ_r between the television and the viewer in the present embodiment.

First, the television-viewer angle calculation unit 161 calculates the number of pixels d_1 between the television and the viewer in the panoramic image from the face center point P_f(x_f, y_f) and the television center point P_tv(x_tv, y_tv) as follows:
d_1 = |x_tv - x_f|
Next, the television-viewer angle calculation unit 161 uses the size of the panoramic image (X_p, Y_p) to calculate the number of pixels d_2 between the television and the viewer measured the other way around the 360-degree panoramic image, as follows:
d_2 = |X_p - d_1|

The television-viewer angle calculation unit 161 then compares the two pixel counts d_1 and d_2 between the television and the viewer and takes the smaller one:
d = min(d_1, d_2)

The television-viewer angle calculation unit 161 treats the x-axis size X_p of the panoramic image as the full 360 degrees around the robot 1 and, from the number of pixels d between the television and the viewer, calculates the angle θ_r between the television and the viewer as seen from the robot 1 as follows:
θ_r = d × a
a = 360 / X_p
where a is the angle in degrees per pixel.

For example, when the x-axis size of the panoramic image is X_p = 4320, the x-coordinate of the television position is x_tv = 3541, and the x-coordinate of the face position is x_f = 713, the angle θ_r between the television and the viewer as seen from the robot 1 is calculated as follows:
d_1 = 3541 - 713 = 2828
d_2 = 4320 - 2828 = 1492
d = d_2 = 1492
θ_r = 1492 × 360 / 4320 ≈ 124 degrees
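The angle calculation above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name tv_viewer_angle and its parameter names are assumptions introduced here, not identifiers from the embodiment.

```python
def tv_viewer_angle(x_tv: float, x_f: float, panorama_width: float) -> float:
    """Angle in degrees between the television and the viewer as seen from the robot.

    x_tv, x_f: x-coordinates of the television center and the face center
    in the panoramic image; panorama_width: X_p, which spans 360 degrees.
    """
    d1 = abs(x_tv - x_f)            # pixel count measured directly
    d2 = abs(panorama_width - d1)   # pixel count measured the other way around
    d = min(d1, d2)                 # use the shorter of the two arcs
    return d * 360.0 / panorama_width


# Values from the worked example in the text.
theta_r = tv_viewer_angle(x_tv=3541, x_f=713, panorama_width=4320)
print(round(theta_r, 2))  # 124.33
```

Taking the minimum of d_1 and d_2 keeps the result in the range 0 to 180 degrees regardless of where the seam of the panoramic image falls.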

The television-viewer distance calculation unit 162 calculates the distance r between the television and the viewer from the angle θ_r between the television and the viewer as seen from the robot 1, the distance r_1 = d_f from the robot 1 to the viewer, and the distance r_2 = d_tv from the robot 1 to the television, as follows:
r = √(r_1^2 + r_2^2 - 2 r_1 r_2 cos θ_r)

The robot-television angle calculation unit 163 calculates the angle θ_h between the robot 1 and the television as seen from the viewer, using the law of cosines, as follows:
r_2^2 = r^2 + r_1^2 - 2 r r_1 cos θ_h
θ_h = cos^-1[(r^2 + r_1^2 - r_2^2) / (2 r r_1)]
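The two law-of-cosines steps above (the side r opposite the robot, then the angle θ_h at the viewer) might be coded as follows. The function names, and the clamping of the cosine value against floating-point rounding, are assumptions added for illustration; the distances in the usage lines are hypothetical values in meters.

```python
import math


def tv_viewer_distance(r1: float, r2: float, theta_r_deg: float) -> float:
    """Side BC (r) from sides AB = r1, AC = r2 and the angle theta_r at A."""
    theta_r = math.radians(theta_r_deg)
    return math.sqrt(r1**2 + r2**2 - 2.0 * r1 * r2 * math.cos(theta_r))


def robot_tv_angle(r: float, r1: float, r2: float) -> float:
    """Angle theta_h at B (the viewer), in degrees, between the robot and the television."""
    cos_h = (r**2 + r1**2 - r2**2) / (2.0 * r * r1)
    # Assumption: clamp to [-1, 1] so rounding error cannot break acos.
    cos_h = max(-1.0, min(1.0, cos_h))
    return math.degrees(math.acos(cos_h))


# Hypothetical distances: viewer 1.5 m and television 2.0 m from the robot.
r = tv_viewer_distance(r1=1.5, r2=2.0, theta_r_deg=124.33)
theta_h = robot_tv_angle(r, r1=1.5, r2=2.0)
print(r, theta_h)
```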

The face direction angle acquisition unit 164 estimates the face direction angle θ_h' of the viewer as seen from the robot 1 based on the face image detected by the face detection unit 132, and acquires it together with the time T.

FIG. 13 is a diagram illustrating the face direction angle θ_h' in this embodiment.
The face direction angle θ_h' takes the state (A), in which the viewer's face directly faces the robot 1, as its reference, and is the angle from the reference direction (y-axis) to the direction in which the viewer's face is turned (y_h-axis), as shown in (B).
To estimate the face direction angle θ_h', open source software such as OpenFace can be used, for example, but the estimation method is not limited to this.

The viewing direction angle calculation unit 165 calculates the viewing direction angle θ_r' between the viewer and the position the viewer is looking at (the viewing direction position), as seen from the robot 1, using the law of cosines as follows:
r_2^2 = r'^2 + r_1^2 - 2 r' r_1 cos θ_h'
r'^2 = r_1^2 + r_2^2 - 2 r_1 r_2 cos θ_r'
Eliminating r'^2 between these two equations gives:
θ_r' = cos^-1[(r_1 / r_2) - (r' / r_2) cos θ_h']
Here, the distance r' from the viewer to the viewing direction position can be obtained, for example, by the following approximation.

FIG. 14 is a diagram showing the positional relationship between the robot 1, the viewer, and the viewing direction position in this embodiment.
Here, the robot 1 (point A) is placed, for example, on a table between the viewer (point B) and the television, and, with the viewing direction position denoted as point D, point A is assumed to lie sufficiently close to line segment BD.
In this case, let E be the foot of the perpendicular dropped from point A onto line segment BD, and let BE = r_1' and DE = r_2'. Then
r' = |r_1^2 - r_2^2| / |r_1' - r_2'|
and, applying the approximation
r_1' ≈ r_1, r_2' ≈ r_2
to this expression, r' is calculated as
r' = r_1 + r_2
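As a rough sketch, the viewing direction angle θ_r' can be computed from r_1, r_2, and θ_h' with r' defaulting to the approximation r_1 + r_2. The function name, the optional r_dash argument, and the clamping of the cosine value to [-1, 1] are assumptions made for this illustration; with the approximated r' the expression inside cos^-1 can fall outside that interval, and the embodiment does not state how such cases are handled.

```python
import math
from typing import Optional


def viewing_direction_angle(r1: float, r2: float, theta_h_dash_deg: float,
                            r_dash: Optional[float] = None) -> float:
    """Viewing direction angle theta_r' in degrees, as seen from the robot.

    r1: robot-to-viewer distance, r2: robot-to-television distance
    (AD = AC = r2 is assumed), theta_h_dash_deg: face direction angle.
    """
    if r_dash is None:
        r_dash = r1 + r2  # approximation r' = r_1 + r_2 from the text
    cos_r = r1 / r2 - (r_dash / r2) * math.cos(math.radians(theta_h_dash_deg))
    # Assumption: clamp so that acos stays defined when the approximation overshoots.
    cos_r = max(-1.0, min(1.0, cos_r))
    return math.degrees(math.acos(cos_r))


# Hypothetical input: viewer 1.5 m and television 2.0 m from the robot,
# face turned 20 degrees away from the robot.
print(viewing_direction_angle(1.5, 2.0, 20.0))
```

Passing an exact r' (when it is known) reproduces the law-of-cosines result without the approximation.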

The viewing direction image acquisition unit 166 calculates the viewing direction position P_v(x_v) from the viewing direction angle θ_r', the television-viewer angle θ_r, and the television center point P_tv(x_tv, y_tv) in the panoramic image as follows, and acquires the viewing direction image:
x_v = x_tv - (number of pixels corresponding to θ_r' - θ_r)

At this time, the viewing direction image acquisition unit 166, for example, converts the horizontal angle of view θ_c of the imaging unit 20 into a pixel count d_c and acquires an image covering a range centered on the viewing direction position P_v(x_v), such as (x_v - (d_c / 2), x_v + (d_c / 2)) or (x_v - (d_c / 2), x_v + (d_c / 2) - 1):
d_c = (X_p / 360) × θ_c
Alternatively, if the size (number of pixels) of the image captured by the imaging unit 20 is known, the viewing direction image acquisition unit 166 may use that number of pixels as d_c.

FIG. 15 is a diagram illustrating a method for acquiring a viewing direction image in this embodiment.
For example, when the horizontal angle of view of the camera is θ_c = 40, the x-axis size of the panoramic image is X_p = 4320, the x-coordinate of the television center point is x_tv = 3800, and θ_r' - θ_r = 0, then d_c = (4320 / 360) × 40 = 480 and x_v = 3800 - 0 = 3800, so the viewing direction image acquisition unit 166 acquires the image in the range (3800 - 240, 3800 + 240) = (3560, 4040).
Likewise, when θ_r' - θ_r = 35, then d_c = (4320 / 360) × 40 = 480 and x_v = 3800 - ((4320 / 360) × 35) = 3380, so the viewing direction image acquisition unit 166 acquires the image in the range (3380 - 240, 3380 + 240) = (3140, 3620).
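The range calculation in these two examples can be reproduced with a short sketch. The function and parameter names are assumptions introduced for illustration, and the sketch does not handle a range that runs past the left or right edge of the panoramic image, which the text does not address.

```python
def viewing_direction_range(x_tv: float, delta_deg: float,
                            fov_deg: float, panorama_width: float):
    """Horizontal pixel range (left, right) of the viewing direction image.

    delta_deg: theta_r' - theta_r in degrees; fov_deg: horizontal angle of
    view theta_c of the camera; panorama_width: X_p (spans 360 degrees).
    """
    px_per_deg = panorama_width / 360.0
    d_c = px_per_deg * fov_deg            # width of the crop in pixels
    x_v = x_tv - px_per_deg * delta_deg   # viewing direction position
    return (x_v - d_c / 2.0, x_v + d_c / 2.0)


# The two worked examples from the text.
print(viewing_direction_range(3800, 0, 40, 4320))   # (3560.0, 4040.0)
print(viewing_direction_range(3800, 35, 40, 4320))  # (3140.0, 3620.0)
```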

Furthermore, the viewing direction image acquisition unit 166 may calculate the viewing direction position P_v(x_v) using the viewer's face center point P_f(x_f, y_f) instead of the television center point P_tv, as follows:
x_v = x_f - (number of pixels corresponding to θ_r')

The viewing state determination unit 17 acquires viewing direction images over a certain period of time and processes them statistically to determine the viewing state, such as whether or not the viewer is watching television.
In this embodiment, a viewing degree is defined as the viewing state.
The viewing degree is an index of how much the user is watching during television viewing: a larger value indicates that the user is watching the television more, and a smaller value indicates that the user is watching it less.

FIG. 16 is a diagram showing the functional configuration of the viewing state determination unit 17 in this embodiment.
The viewing state determination unit 17 includes a viewing direction object detection unit 171 and a viewing degree calculation unit 172, and outputs a viewing degree I_w when a viewing direction image is input.

The viewing direction object detection unit 171 performs object detection on the input viewing direction image and extracts keywords. For object detection, open source software such as Faster-RCNN can be used, for example, but the detection method is not limited to this.

The viewing degree calculation unit 172 uses the keywords extracted by the viewing direction object detection unit 171 to calculate a viewing degree I_w(T) as an index of the degree to which the viewer is actually watching television.

FIG. 17 is a diagram showing an example of calculating the viewing degree I_w(T) in this embodiment.
At time T, when the viewing direction object detection unit 171 extracts a keyword denoting a video viewing device, such as "TV", "television", or "monitor", the viewing degree calculation unit 172 sets the viewing state to "television". When a person watching television together with the viewer is detected, the viewing degree calculation unit 172 sets the viewing state to "other person". In all other cases, the viewing state is defined as "other".

In this embodiment, the viewing degree I_w is defined as the proportion of detections within a certain time T_f whose viewing state is "television". In the example of FIG. 17, the viewing degree is I_w = 0.6, which shows that the viewer is paying attention to the television while occasionally shifting his or her gaze to another person or to other objects.
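A minimal sketch of this classification and ratio follows. The keyword sets, the "person" label, and the ten-sample detection sequence are hypothetical values chosen only so that the result matches I_w = 0.6; the actual sequence in FIG. 17 is not reproduced here.

```python
from collections import Counter

# Assumed keyword sets; an object detector's real label names may differ.
TV_KEYWORDS = {"tv", "television", "monitor"}
PERSON_KEYWORDS = {"person"}


def classify(keywords):
    """Map the keywords detected at one time T to a viewing state."""
    kws = {k.lower() for k in keywords}
    if kws & TV_KEYWORDS:
        return "television"
    if kws & PERSON_KEYWORDS:
        return "other person"
    return "other"


def viewing_degree(states):
    """Proportion of detections whose viewing state is "television"."""
    if not states:
        return 0.0
    return Counter(states)["television"] / len(states)


# Hypothetical detections over the time window T_f.
detections = [["tv"], ["tv"], ["person"], ["tv"], ["sofa"],
              ["monitor"], ["tv"], ["person"], ["television"], ["cup"]]
states = [classify(d) for d in detections]
print(viewing_degree(states))  # 0.6
```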

In the example of FIG. 17, the operation control unit 40 may, for example, set the threshold for the robot to talk to the viewer to 0.5 and, because the viewing degree I_w = 0.6 is at or above this threshold, control the robot 1 to talk to the viewer.
In addition, when the calculated viewing degree I_w is, for example, 0.8, that is, when the proportion of the "television" viewing state is particularly high, the viewer is likely to be concentrating on the television, so the operation control unit 40 may control the robot 1 to refrain from talking to the viewer.

Conversely, when the proportion of the "television" viewing state is low, for example when I_w is 0.3, meaning that the viewer looks toward the television for only about 3 minutes out of every 10, the operation control unit 40 may control the robot 1 to talk to the viewer or to make gestures in order to arouse the viewer's interest in the television.
For example, when the viewer is not looking toward the television, the robot 1 looks around at the viewer and the surroundings while making utterances that encourage engagement with the television, such as "I'd really like to go to this place." or "Shall we change the channel?"
In this way, thresholds may be set in stages, and the operation control unit 40 may control the robot 1 as appropriate for each viewer, for example, by having it only speak when I_w is high and having it speak with gestures when I_w is low.
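One possible way to combine the staged thresholds described above is sketched below. The action names and the way the 0.5 and 0.8 examples are merged into a single decision function are assumptions for illustration; the embodiment leaves the exact staging open.

```python
def choose_action(i_w: float, speak_threshold: float = 0.5,
                  focus_threshold: float = 0.8) -> str:
    """Pick a robot action from the viewing degree I_w using staged thresholds."""
    if i_w >= focus_threshold:
        return "refrain"             # viewer is concentrating on the television
    if i_w >= speak_threshold:
        return "speak"               # talk to the viewer, speech only
    return "speak_with_gesture"      # low viewing degree: speech plus gestures


for i_w in (0.8, 0.6, 0.3):
    print(i_w, choose_action(i_w))
```

The thresholds are passed as parameters so that they can be tuned for each viewer.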

In addition, when multiple categories indicating the viewing direction, such as "television", "other person", and "other", are acquired as the viewing state, the operation control unit 40 may change the operation of the robot 1 according to the category. For example, the frequency of each utterance type concerning the content of a television program, such as disclosure, question, confirmation, information, and response, may be adjusted as follows.

When the viewing direction is mostly "television", the viewer is considered to be gazing at the television, so the operation control unit 40 lowers the frequency of questions and confirmations, which call for an answer and would disturb the viewer's attention.
When the viewing direction is mostly "other person", the viewer is considered to be communicating frequently with other people, so the operation control unit 40 lowers the frequency of all utterance types so as not to interfere with that communication.
When the viewing direction is mostly "other", the viewer is considered to be neither gazing at the television nor communicating with other people, so the operation control unit 40 raises the frequency of disclosures and questions to stimulate interest in television viewing.
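The three rules above could be expressed as a small lookup on the most frequent viewing state. This is a sketch under stated assumptions: treating "mostly" as the most common state in the window, and the scaling factors 0.5 and 1.5, are hypothetical choices that the embodiment does not specify.

```python
from collections import Counter

UTTERANCE_TYPES = ("disclosure", "question", "confirmation", "information", "response")


def adjust_frequencies(states, base, low=0.5, high=1.5):
    """Scale per-type utterance frequencies by the dominant viewing state."""
    freq = dict(base)
    if not states:
        return freq
    dominant = Counter(states).most_common(1)[0][0]
    if dominant == "television":
        for t in ("question", "confirmation"):
            freq[t] *= low       # avoid utterances that demand an answer
    elif dominant == "other person":
        for t in freq:
            freq[t] *= low       # stay out of the conversation
    else:
        for t in ("disclosure", "question"):
            freq[t] *= high      # try to draw attention back to the television
    return freq


base = {t: 1.0 for t in UTTERANCE_TYPES}
print(adjust_frequencies(["television"] * 6 + ["other"] * 4, base))
```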

According to this embodiment, the viewing state estimation device 10 detects the viewer's face and the television from an omnidirectional panoramic image of the surroundings of the robot 1, and calculates the angle between the television and the viewer as seen from the robot 1 from the distance between the two on the image and the size of the panoramic image. Furthermore, the viewing state estimation device 10 measures the distances from the robot 1 to the detected television and viewer, thereby specifying the positional relationship between the robot 1, the viewer, and the television. Then, based on the viewer's face direction angle obtained from the image at the face position, the viewing state estimation device 10 detects an object included in the viewing direction image located at the viewer's viewing direction angle in the panoramic image, and determines the viewer's state based on the type of this object.

Therefore, the viewing state estimation device 10 can estimate the viewing state of the viewer by detecting objects in the viewing direction image based on the panoramic image, using the robot 1 placed on a table or the like, without installing an imaging device such as a camera on a ceiling or elsewhere in the home and without having the viewer wear a gaze direction acquisition device.
As a result, the robot 1 can act according to the viewing state, for example, by refraining from talking to the viewer while the viewer is watching television, and by making utterances and gestures that encourage engagement with the television when the viewer is not watching it.

The viewing state estimation device 10 approximates the distance from the viewer to the viewing direction position by the sum of the distance from the robot 1 to the viewer and the distance from the robot 1 to the viewing direction position, and can thereby easily identify the viewing direction position and estimate the viewing state.

The viewing state estimation device 10 calculates the viewing state based on statistical information about the viewer's state over a certain period of time, which reduces the effect of calculation errors in the viewing direction position and improves the reliability of the determined viewing state.

The viewing state estimation device 10 calculates, as the viewing state, the viewing degree indicating the proportion of time the television is being watched, so the operation control unit 40 can grasp the degree to which the viewer is actually gazing at the television and adapt the operation of the robot 1 to the viewer's state according to this degree.
In doing so, the operation control unit 40 can easily control the operation of the robot 1 based on the result of comparing the calculated viewing degree with a predetermined threshold.
Furthermore, by changing the control in stages based on a plurality of thresholds, the operation control unit 40 can set variations in the operation of the robot 1 that suit the viewer's state.

The viewing state estimation device 10 determines, as the viewing state, one of a plurality of states including a state of looking at the television and a state of looking at another person, so the operation control unit 40 can grasp the type of object the viewer is looking at and adapt the operation of the robot 1 to the viewer's state according to this type.

Although an embodiment of the present invention has been described above, the present invention is not limited to that embodiment. Furthermore, the effects described for the embodiment are merely a list of the most favorable effects arising from the present invention, and the effects of the present invention are not limited to those described for the embodiment.

In the above-described embodiment, the viewing state estimation device 10 estimates the type of object the viewer is looking at by object detection on the viewing direction image, but whether or not the viewer is watching television may instead be determined based on the viewer's face direction angle.
Specifically, for example, the viewer may be determined to be facing the television when the face direction angle θ_h' satisfies the condition
θ_h - α < θ_h' < θ_h + α
Here, α is an adjustment angle set based on the size of the television and its distance from the viewer (for example, half the visual angle that the television subtends at the viewer).
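A sketch of this face-direction test follows. The helper half_visual_angle is one possible way to derive α from the television width and the viewer-television distance, assuming the viewer sits in front of the center of the screen; both function names and that derivation are assumptions, not part of the embodiment.

```python
import math


def half_visual_angle(tv_width: float, distance: float) -> float:
    """Half the angle (degrees) subtended by a television of the given width
    at the given viewing distance (assumed centered in front of the viewer)."""
    return math.degrees(math.atan((tv_width / 2.0) / distance))


def is_facing_tv(theta_h: float, theta_h_dash: float, alpha: float) -> bool:
    """True when theta_h - alpha < theta_h' < theta_h + alpha (all in degrees)."""
    return theta_h - alpha < theta_h_dash < theta_h + alpha


# Hypothetical setup: 1.2 m wide television viewed from 2.5 m.
alpha = half_visual_angle(tv_width=1.2, distance=2.5)
print(alpha, is_facing_tv(theta_h=50.0, theta_h_dash=55.0, alpha=alpha))
```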

In addition, in the above-described embodiment, the viewing direction angle θ_r' is calculated based on the face direction angle θ_h', but the calculation method is not limited to this.
For example, when θ_h' cannot be obtained properly, θ_r' may be calculated without using θ_h', from the relation r'^2 = r_1^2 + r_2^2 - 2 r_1 r_2 cos θ_r', as:
θ_r' = cos^-1[(r_1^2 + r_2^2 - r'^2) / (2 r_1 r_2)]

In the above-described embodiment, the distance from the robot 1 to the viewing direction position is assumed to be equal to the distance r_2 from the robot 1 to the television, but the assumed condition is not limited to this.
For example, depending on the sign or the range of values of θ_h', r_1, r_2, or another value may be used for this distance, as appropriate to the situation.

This embodiment has mainly described the configuration and operation of the viewing state estimation device 10, but the present invention is not limited to this and may be configured as a method or a program, including each component, for estimating the viewing state.

Furthermore, the functions of the viewing state estimation device 10 may be realized by recording a program for realizing those functions on a computer-readable recording medium and having a computer system read and execute the program recorded on the recording medium.

The term "computer system" here includes an OS and hardware such as peripheral devices. The term "computer-readable recording medium" refers to portable media such as flexible disks, magneto-optical disks, ROMs, and CD-ROMs, as well as storage devices such as hard disks built into computer systems.

Furthermore, the "computer-readable recording medium" may include a medium that dynamically holds a program for a short time, such as a communication line used when a program is transmitted via a network such as the Internet or a communication line such as a telephone line, and a medium that holds a program for a certain period of time, such as volatile memory inside a computer system serving as a server or client in that case. The above program may realize only part of the functions described above, or may realize the functions described above in combination with a program already recorded in the computer system.

LIST OF SYMBOLS
1 Robot
10 Viewing state estimation device
11 Panoramic image unit
12 Distance panoramic image unit
12A Distance image database
12B Distance panoramic image database
13 Viewer detection unit
14 Television detection unit
15 Distance acquisition unit
16 Viewing direction detection unit
17 Viewing state determination unit
20 Imaging unit
30 Distance detection unit
40 Operation control unit
121 Distance image synthesis unit
122 Overlapping section calculation unit
131 Person detection unit
132 Face detection unit
161 Television-viewer angle calculation unit
162 Television-viewer distance calculation unit
163 Robot-television angle calculation unit
164 Face direction angle acquisition unit
165 Viewing direction angle calculation unit
166 Viewing direction image acquisition unit
171 Viewing direction object detection unit
172 Viewing degree calculation unit

Claims

A viewing state estimation device comprising:
a panoramic image unit that acquires an omnidirectional panoramic image synthesized from images captured around a robot;
a distance panoramic image unit that generates a distance panoramic image in which distance data corresponding to each pixel of the panoramic image is used as a pixel value;
a television detection unit that detects a television position from the panoramic image;
a viewer detection unit that detects a face position of a viewer from the panoramic image;
a distance acquisition unit that acquires distances at the television position and the face position from the distance panoramic image;
a viewing direction detection unit that specifies a positional relationship between the robot, the viewer, and the television by calculating an angle between the television and the viewer as seen from the robot based on a size of the panoramic image, the television position, and the face position, and acquires a viewing direction image at a viewing direction angle of the viewer from the panoramic image based on a face direction angle of the viewer obtained from the image at the face position; and
a viewing state determination unit that detects an object included in the viewing direction image and determines a state of the viewer based on a type of the object.

The viewing state estimation device according to claim 1, wherein the viewing direction detection unit approximates the distance from the viewer to the viewing direction position by the sum of the distance from the robot to the viewer and the distance from the robot to the viewing direction position.

The viewing state estimation device according to claim 1 or claim 2, wherein the viewing state determination unit calculates the viewing state based on statistical information on the state of the viewer within a certain period of time.

The viewing state estimation device according to claim 3, wherein the viewing state determination unit calculates a viewing degree indicating the proportion of the television being watched as the viewing state.

The viewing state estimation device according to claim 3, wherein the viewing state determination unit determines, as the viewing state, which of a plurality of states applies, the plurality of states including a state in which the viewer is watching the television and a state in which the viewer is watching another person.

A robot system comprising:
the viewing state estimation device according to claim 4; and
an operation control unit that controls the operation of the robot based on a result of comparing the viewing degree output from the viewing state estimation device with a predetermined threshold.

The robot system according to claim 6, wherein the operation control unit has a plurality of the predetermined thresholds and changes the control of the robot stepwise according to a result of comparing the viewing degree with the plurality of thresholds.

A robot system comprising:
the viewing state estimation device according to claim 5; and
an operation control unit that controls an operation of the robot in accordance with the category, among the plurality of states, output from the viewing state estimation device.

A viewing state estimation method in which a computer executes:
a panoramic image generating step of acquiring an omnidirectional panoramic image synthesized from images captured around a robot;
a distance panoramic image generating step of generating a distance panoramic image in which distance data corresponding to each pixel of the panoramic image is used as a pixel value;
a television detection step of detecting a television position from the panoramic image;
a viewer detection step of detecting a face position of a viewer from the panoramic image;
a distance acquisition step of acquiring distances at the television position and the face position from the distance panoramic image;
a viewing direction detection step of calculating an angle between the television and the viewer as seen from the robot based on a size of the panoramic image, the television position, and the face position, thereby specifying a positional relationship between the robot, the viewer, and the television, and acquiring a viewing direction image at a viewing direction angle of the viewer from the panoramic image based on a face direction angle of the viewer obtained from the image at the face position; and
a viewing state determination step of detecting an object included in the viewing direction image and determining a state of the viewer based on a type of the object.

A viewing state estimation program for causing a computer to function as the viewing state estimation device according to any one of claims 1 to 5.