JPH099202A

JPH099202A - Index generation method, index generation device, indexing device, indexing method, video minutes generation method, frame editing method and frame editing device

Info

Publication number: JPH099202A
Application number: JP8142477A
Authority: JP
Inventors: Benkatetsushiyu Purasado Kee; ベンカテッシュプラサドケー
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 1995-06-23
Filing date: 1996-06-05
Publication date: 1997-01-10
Anticipated expiration: 2016-06-05
Also published as: JP3608758B2

Abstract

(57) Abstract

Problem: To realize content-based indexing of video records by using high-level objects in a video scene.

Solution: A news icon 320 found by searching the frames is stored as video label 1. When image content is used, frames whose image content is similar to that of news icon 320 are indexed to video label 1. Because news icon 320 contains a picture of a pony, frames 406-412, each of which contains at least part of the pony, are indexed to that video label. The text content of icon 320 ("PONY TALE") and the associated audio content can also be used to judge the similarity of frames to be indexed.

Description

Detailed Description of the Invention

[0001]

Field of the Invention

The present invention relates to video recording technology, and more particularly to techniques for index generation, indexing, and editing of video records.

[0002]

Background of the Invention

Video technology has advanced to the point where video databases have become commonplace in applications such as television news video and desktop video conferencing. As video databases have developed, however, the need has grown for more efficient methods of retrieving specific video portions from them. Many current retrieval methods for video databases rely on time stamps. With time stamping, a person who knows the date and time of a particular video portion can retrieve that portion. People often do not know the date and time, however, even when they know something about the content of the video portion. Interest in content-based video indexing methods is therefore increasing.

[0003] Some existing content-based indexing methods apply a similarity technique to low-level or intermediate-level objects, such as pixels or pixel regions. For example, one pixel-based method first converts each frame into a histogram that represents the number of pixels at each luminance level. A similarity technique, such as a correlation function, is then applied to the histograms to determine whether two frames "match". A pixel-region-based method first represents each frame as a number of pixel regions of uniform luminance level, then encodes that representation, and finally applies a similarity technique to the encoded representation. However, it may be preferable to be able to search directly for high-level objects in a video scene rather than to recognize low-level or intermediate-level objects. For example, it would be efficient if a user could query a video database with a high-level query such as "list all frames that contain a red sports car". Such a method is not yet feasible in a broad context, but it would be desirable to provide high-level object recognition even if only in a limited context.
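The pixel-based method described above can be illustrated with a minimal Python sketch. It assumes that a frame is a list of rows of integer luminance values, that the correlation function is the Pearson correlation, and that a match is declared above a fixed threshold; the function names, the threshold value, and the frame representation are illustrative assumptions and do not come from the specification.

```python
from math import sqrt


def luminance_histogram(frame, levels=256):
    """Count how many pixels of the frame fall at each luminance level."""
    hist = [0] * levels
    for row in frame:
        for pixel in row:
            hist[pixel] += 1
    return hist


def correlation(h1, h2):
    """Pearson correlation between two equal-length histograms."""
    n = len(h1)
    m1, m2 = sum(h1) / n, sum(h2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(h1, h2))
    var1 = sum((a - m1) ** 2 for a in h1)
    var2 = sum((b - m2) ** 2 for b in h2)
    if var1 == 0 or var2 == 0:
        return 0.0  # a flat histogram carries no usable shape information
    return cov / sqrt(var1 * var2)


def frames_match(f1, f2, threshold=0.9, levels=256):
    """Declare a match when the histogram correlation reaches the threshold.

    The threshold of 0.9 is an arbitrary illustrative value.
    """
    h1 = luminance_histogram(f1, levels)
    h2 = luminance_histogram(f2, levels)
    return correlation(h1, h2) >= threshold
```

Because a histogram discards pixel positions, two frames with the same pixels in a different arrangement match exactly, which illustrates why such low-level methods cannot answer queries about high-level objects.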

[0004] Another consequence of the development of video databases is an increased need for more efficient video editing methods. With video databases, there is almost no need to physically cut and splice film when editing video. Instead, video stored in a database can be edited electronically on a computer by entering commands that manipulate the frames. In some cases, however, editing video in this way may be impractical or undesirable. For example, the editor may not be able to be physically near the database in which the video is stored, or may not have access to a computer system or suitable software. It is therefore desirable to provide an editing technique that allows an editor to efficiently edit video stored in a database without directly accessing the database. In particular, it is desirable to let the editor edit video by hand-drawing editing commands on a hard copy that represents the frames to be edited, with the marked-up hard copy then being interpreted automatically so that the video can be edited afterwards, and at a different location if desired.

[0005]

Problems to Be Solved by the Invention

Accordingly, an object of the present invention is to provide new means for index generation, indexing, video minutes generation, and frame editing that remedy the deficiencies of the prior art described above and meet the needs described above.

[0006]

Means for Solving the Problems

According to the present invention, there is provided a method and apparatus for generating an index of a record having audio and video content. The index consists of a plurality of labels. The record consists of a plurality of frames. Some of those frames each contain at least one of a plurality of icons. The index generation method comprises the steps of: 1) generating the plurality of labels using the plurality of icons; and 2) indexing each frame that does not contain one of the icons to one of the labels if the content of that frame matches the content of the icon assigned to that label.

[0007] The present invention also provides a method and apparatus that use human affect to index a record having video content depicting at least one person. The record has a plurality of frames, some of which each contain one of a plurality of human affects. The method comprises the steps of: 1) determining which of the frames contain an affect; 2) storing each frame that depicts an affect as one of a plurality of labels (one label per frame); and 3) for each frame that depicts an affect, indexing each of the other frames corresponding to that frame to the label generated from that frame.

[0008] The present invention also provides a method of generating video minutes of a record of a video teleconference. The video teleconference has a plurality of participants. The record has a plurality of frames, some of which each depict one of a plurality of significant movements made by one of the participants. The method comprises the steps of: 1) storing each frame that shows a significant movement as one of a plurality of labels (one label is generated from each frame that shows a significant movement); and 2) for each frame that depicts a significant movement, indexing each of the other frames corresponding to that frame to the label generated from that frame.

[0009] The present invention also provides a method of generating a video index of a recorded news broadcast. The recorded news broadcast is associated with certain preview footage. The recorded news broadcast consists of a plurality of news frames, and the preview footage consists of a plurality of preview frames. Both the news frames and the preview frames have audio and video content. The index consists of a plurality of labels. The method comprises the steps of: 1) storing the preview frames in memory; 2) identifying each preview frame that is repeated at least a predetermined number of times within the preview footage; and 3) generating the plurality of labels from the preview frames that are repeated at least the predetermined number of times within the preview footage.
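The three steps above can be sketched as follows. This is a simplified illustration under stated assumptions: each preview frame is reduced to a hashable signature (a hypothetical stand-in for whatever frame comparison is actually used), and "repeated" is taken to mean exact equality of signatures. The function name and the default repeat count are illustrative only.

```python
from collections import Counter


def labels_from_previews(preview_signatures, min_repeats=2):
    """Return one label per preview frame that recurs at least min_repeats times.

    preview_signatures: frame signatures in broadcast order (hypothetical
    representation; a real system would compare frames by similarity).
    """
    stored = list(preview_signatures)        # step 1: store the preview frames
    counts = Counter(stored)                 # step 2: count repetitions
    labels = []
    for signature in stored:                 # step 3: one label per repeated frame
        if counts[signature] >= min_repeats and signature not in labels:
            labels.append(signature)
    return labels
```

Labels are returned in order of first appearance, so a summary built from them follows the order in which the stories were previewed.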

[0010] The present invention also provides a method and apparatus for editing a plurality of frames of a video record. Each of those frames is shown on a static display. The method comprises the steps of: 1) examining the static display for editing symbols hand-drawn by a user; 2) recognizing the editing symbols hand-drawn on the static display; 3) associating each editing symbol hand-drawn on the static display with one of a plurality of editing commands, based on a table of editing symbols that represent editing commands; and 4) modifying the frames of the video record according to the editing commands associated with the editing symbols hand-drawn on the static display.
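Steps 3) and 4) amount to a table lookup followed by a frame-list transformation. The sketch below assumes that symbol recognition (steps 1 and 2) has already produced a mapping from frame position to recognized symbol. The symbols "X" and "=" and the commands "delete" and "duplicate" are hypothetical examples; the text above does not specify any particular symbol set.

```python
# Hypothetical table of editing symbols and the commands they represent.
EDIT_SYMBOL_TABLE = {
    "X": "delete",     # cross out a frame to remove it
    "=": "duplicate",  # mark a frame to repeat it once
}


def apply_edits(frames, recognized_symbols, table=EDIT_SYMBOL_TABLE):
    """Apply hand-drawn editing symbols to a list of frames.

    frames: frames in playback order.
    recognized_symbols: dict mapping frame position -> recognized symbol.
    Frames with no symbol, or with a symbol missing from the table, are kept.
    """
    edited = []
    for position, frame in enumerate(frames):
        symbol = recognized_symbols.get(position)
        command = table.get(symbol) if symbol is not None else None
        if command == "delete":
            continue
        edited.append(frame)
        if command == "duplicate":
            edited.append(frame)
    return edited
```

Keeping the symbol table as data separates recognition from editing, so the hard copy can be scanned at one location and the resulting commands applied to the database elsewhere.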

[0011] The above and other features of the present invention will become apparent from the accompanying drawings and the detailed description that follows.

[0012]

Embodiments of the Invention

A content-based video indexing and editing method according to the present invention is described below. In the following description, numerous specific examples are set forth for purposes of explanation so that the invention can be thoroughly understood. It will be apparent to those skilled in the art, however, that the invention can be practiced without these specific examples. In other instances, well-known structures and devices are shown as block diagrams so as not to obscure the invention unnecessarily.

[0013] The term "video" is used frequently in the following description. In this specification, "video" is defined as a time-ordered sequence of related images that, when displayed in rapid succession, conveys the motion of a subject or other animation. Such a sequence is commonly called a moving picture.

[0014] FIG. 1 shows a computer system 1 in which the present invention is implemented. Computer system 1 comprises a central processing unit (CPU) 10, a memory 20, a data storage device 30 (for example, a magnetic disk or CD-ROM), a printer 40, a digital video chip (DVC) 50, a video monitor 60, a keyboard 70, a mouse 80, a scanner 90, and a video input unit (VIU) 100, all coupled by a system bus 110. As shown in FIG. 3, VIU 100 includes a frame grabber 150 that receives video data from video sources, namely a laser disc player 120, a video camera 140, and a video cassette recorder (VCR) 130, or from a remote source that sends a signal RS. The remote source is, for example, an RF source (such as a television cable or antenna) or an ISDN source. Memory 20 stores video data formatted into individual frames, as shown in FIG. 2. In FIG. 3, frame grabber 150 receives video data from VCR 130, video camera 140, laser disc player 120, or the remote source, formats the data into individual frames, and supplies the formatted video data to memory 20 via system bus 110. The present invention may be implemented by software code stored in data storage device 30 or by a dedicated chip, shown as DVC 50 in FIG. 1. As described further below, mouse 80 may be replaced by another equivalent cursor control device such as a light pen, stylus, or trackball. Certain functions of monitor 60 and mouse 80 may also be combined by using a monitor with a touch screen.

[0015] The present invention is applicable to audiovisual records of television news broadcasts. FIG. 4 shows a schedule covering several hours of broadcasting by a television station. The schedule ends with a main news broadcast 200. Before main news broadcast 200, various programs 210, such as program A, program B, and program C, are scheduled. Commercials 212 are inserted between programs 210. In addition, each program 210 is preceded by a short news preview 214, about 10 to 30 seconds long, in which a news anchor (the person who presents the news) characteristically speaks a phrase beginning "It is time for the 11 o'clock news ...". Main news broadcast 200 is also preceded by the anchor's announcement of headlines 216.

[0016] News video provides a relatively narrow context in which certain high-level objects can be recognized easily. FIG. 5 shows a frame 300 of a news broadcast. Frame 300 consists of high-level objects, namely a news icon 320, an anchor 330, and a news logo 340. News icon 320 is a pictorial representation of the subject of the news story that follows. These high-level objects are relatively easy to detect because they occur frequently in the context of news video and appear at predictable positions within the frame. Detection is generally performed by searching the approximate object region (ROR) 310 associated with each object. Many object detection methods are well known in the field of video indexing, and any of them can be used for the detection.

News Icon

[0017] The first embodiment uses news icons to index news video. It is based on the assumption that an important story in a news broadcast is usually preceded by a related news icon, and that this news icon is usually displayed next to the face of anchor 330, as shown in FIG. 5. FIG. 6 shows a frame sequence of a news video consisting of frames 401-416. Frames 401, 402, and 405 each depict news icon 320, which depicts a pony and contains the text "PONY TALE". News icon 320 is related to the news story about the pony depicted in frames 406-412. Similarly, frames 403 and 404 contain a news icon 321 that depicts a man wearing a hat. News icon 321 is related to the content of frames 414-416. According to the present invention, the available news footage (that is, frames 404-416) is searched for news icons. Each news icon found is selected as a "video label", to which other frames are indexed. In this description, indexing is defined as establishing a correspondence between the audio content, the video content, or both, of certain frames and a particular video label, such that the content can be retrieved in a defined manner by referring to that video label. A video label is analogous to a keyword used in a text search.

[0018] After the search for news icons, the remaining frames are indexed to the video labels using a well-known similarity technique. Examples of similarity techniques that can be used are correlation functions and principal component analysis. An icon summary is then generated that presents an enlarged view of each video label (news icon). FIG. 7 shows an example of an icon summary 345, which presents video labels 350-359; video labels 350 and 351 correspond to news icons 320 and 321, respectively. Icon summary 345 can be printed as a hard copy using printer 40, displayed on monitor 60, or both.

[0019] FIG. 8 is a flowchart showing a method 600 of indexing news video based on news icons. Method 600 is described with reference to FIGS. 6 and 8 in connection with frames 401-416. First, the first frame of the video to be searched is retrieved from memory 20 (step 602). When frame 401 is searched for a news icon (step 604), news icon 320 is detected in frame 401. News icon 320 is therefore stored in memory 20 as video label 350 (step 618). Next, the remaining frames 402-416 are examined with a well-known similarity technique to determine whether their content matches the content of news icon 320 (steps 620-624). If the content of a frame matches the content of news icon 320, the frame is indexed to video label 350 (that is, to news icon 320). Referring to FIG. 6, because news icon 320 contains a picture of a pony, applying the similarity technique will assign a high similarity to the content of news icon 320 for frames 406-412, each of which shows at least part of the pony. After the similarity technique has been applied to each of frames 402-416, frames 401-416 are searched again to determine whether another news icon is present. When frame 403 is examined, news icon 321, which shows a man wearing a hat, is detected and stored as video label 351. The similarity technique is applied again, and as a result frames 414-416, which show the man wearing a hat, are assigned a high similarity to the content of news icon 321. Consequently, frames 406-412 are indexed to video label 350, which corresponds to news icon 320, while frames 414-416 are indexed to video label 351, which corresponds to news icon 321.
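The overall flow of method 600 can be sketched in Python as follows. This is an illustration under stated assumptions rather than the flowchart of FIG. 8 itself: frames are dictionaries with an "id" key, and icon detection and the similarity technique are passed in as hypothetical callables `detect_icon` and `similarity`. Following the summary in paragraph [0006], only frames that contain no icon are indexed to a label, and the threshold value is arbitrary.

```python
def build_index(frames, detect_icon, similarity, threshold=0.5):
    """Build a mapping from video label (icon) to the ids of indexed frames.

    detect_icon(frame) -> icon or None      (hypothetical icon detector)
    similarity(icon, frame) -> float        (hypothetical similarity technique)
    """
    index = {}
    for frame in frames:
        icon = detect_icon(frame)
        if icon is None or icon in index:
            continue  # no icon here, or this icon is already a video label
        # A new icon becomes a video label; compare every icon-free frame to it.
        index[icon] = [
            other["id"]
            for other in frames
            if detect_icon(other) is None and similarity(icon, other) >= threshold
        ]
    return index
```

Looking up a label in the returned dictionary then retrieves the frames of the corresponding story, which is the sense in which a video label behaves like a keyword in a text search.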

[0020] Often, the video content of a given frame is not in itself clearly related to the content of the news icon associated with the story. In such cases, a similarity technique that examines only the video content of the frame will be unable to index the frame to the appropriate news icon. The audio content associated with such a frame, however, is usually tied, for the viewer, to the video content of the subject of the story. In practice, the audio content will contain words that correspond closely to the text in the news icon. The method described here therefore uses not only video but also the audio associated with the news video and, if available, text. Text is often used to provide subtitle or closed-caption services for the hearing impaired and newswire services. Any frame of video can therefore have three modalities: video, audio, and text. When the similarity technique fails to detect the visual pony (CV) in a frame, audio and text are used as auxiliary criteria for detecting a content match.

[0021] FIG. 9 shows frames 701-707 and news icon 320, which depicts a pony. Suppose that method 600 is searching for matches to the content C of news icon 320. Because news icon 320 contains a picture of a pony, the video content of any of frames 701-707 that shows a pony can be denoted "CV". Similarly, the word "PONY" spoken in the audio content associated with frames 701-707 can be denoted "CA", and the word "PONY" appearing in the text content of frames 701-707 can be denoted "CT". The presence of CV, CA, or CT in any of frames 701-707 in FIG. 9 means that a video, audio, or text "PONY", respectively, is present in that frame. Thus, in FIG. 9, frames 701, 702, and 707 contain not only video of the pony but also audio and text of the words "PONY" and/or "TALE". Frames 704 and 705 contain only video of the pony, whereas frame 706 contains only audio of the word "PONY", the word "TALE", or both.

[0022] An overview of the multi-modal content detection method is shown in the flowchart of FIG. 10. First, a similarity technique is applied to determine whether video of the pony is present in a frame (step 802). This produces a value SV that represents the similarity between the video content of the target frame and the video content of the news icon. If the similarity SV exceeds a threshold (step 804), the content matches (step 806). In that case, the frame being examined is considered to correspond to the news icon, so the frame is indexed to the corresponding video label (step 806). If the similarity SV does not exceed the threshold (step 804), the audio content of the frame being examined is converted to text by any existing speech-to-text conversion method (step 807). Next, the converted audio content and all text content of the target frame are compared with all text contained in the news icon, which determines an audio similarity SA and a text similarity ST, respectively (step 808). If necessary, the comparison used to generate SA and ST may be extended so that the audio and text content of the target frame is compared not only with all text in the news icon but also with all audio content (converted to text) or text content of the frame that contains the news icon. For example, when the news icon itself contains no text at all, the audio content (converted to text) or text content of the frame that contains the news icon can be used.

[0023] Refer again to FIG. 9. Because news icon 320 contains the words "PONY TALE", when the video footage is searched for similar content, the audio and text of the corresponding frames are compared for matches with the word "PONY" and the word "TALE". Weight values WV, WA, and WT are then assigned to the similarities SV, SA, and ST of the video, audio, and text content of each frame, respectively (step 810). By combining the weighted similarities, an overall similarity between the content of the news icon and the content of the target frame can be determined, and this overall similarity is compared with a threshold (step 810). If the overall similarity does not exceed the predetermined threshold, no match is detected and the target frame is not indexed to the video label corresponding to the news icon (step 812). If the threshold is exceeded, a match is detected and the target frame is indexed to the video label that represents the news icon (step 806).
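The decision logic of steps 802-812 can be sketched as follows. Two points are assumptions made for illustration: the weighted similarities are combined as a weighted sum (the text says only that they are combined), and text similarity is measured as the fraction of icon words found in the frame's text. All function names, weights, and thresholds are hypothetical.

```python
def text_similarity(icon_text, frame_text):
    """Fraction of the icon's words that also occur in the frame's text."""
    icon_words = set(icon_text.upper().split())
    if not icon_words:
        return 0.0
    frame_words = set(frame_text.upper().split())
    return len(icon_words & frame_words) / len(icon_words)


def multimodal_match(s_v, s_a, s_t,
                     visual_threshold=0.8,
                     overall_threshold=0.5,
                     weights=(0.5, 0.25, 0.25)):
    """Return True if the frame should be indexed to the icon's video label.

    s_v, s_a, s_t: video, audio, and text similarities (SV, SA, ST).
    weights: (WV, WA, WT); the values here are arbitrary examples.
    """
    if s_v > visual_threshold:          # steps 802-806: video alone suffices
        return True
    w_v, w_a, w_t = weights             # step 810: weight and combine
    overall = w_v * s_v + w_a * s_a + w_t * s_t
    return overall > overall_threshold  # match (806) or no match (812)
```

For a frame like frame 706 of FIG. 9, which has only audio of the icon's words, a weak video similarity can still be outweighed by the audio similarity if the weights allow it; the choice of weights therefore determines how much the auxiliary modalities can compensate for missing video evidence.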

Affective Indexing

[0024] The method described below is called "affective indexing". It exploits the fact that people, when speaking or listening, tend to make various gestures, change their facial expressions, change the loudness of their voices, or do several of these at once. These behaviors may be called "affects". Here, an "affect" is an action or reaction of a person that indicates the person's mood or response to others, or that corresponds closely to the content of what the person is saying. The presence of a certain affect in an audiovisual record implies that significant information follows immediately after the frame in which the affect appears. In affective indexing, therefore, the affects of the people captured in an audiovisual record are identified and used to index the record, as described in detail below.

[0025] Affective indexing can be used to index a record of a single speaker. It is likely to be particularly effective, however, when applied to a record of a video teleconference between two or more participants. When affective indexing is used to index a video teleconference, the result is a video index that amounts to "video minutes" of the conference. A number of known techniques exist for detecting changes in the audio level associated with a video record and for detecting relative movement between two video frames (such as changes of facial expression or gestures). The details of such techniques are not important for understanding the present invention and are not described here. One such known technique is used to identify the frames of the teleconference video that contain affects as described above. Those frames are then used as video labels, to which the remaining frames are indexed. A summary similar to the one shown in FIG. 7, presenting the frames used as video labels, is then generated as a hard copy or on monitor 60. This summary can be used as the "video minutes" of the teleconference. That is, the summary presents the important moments and events of the teleconference in much the same way that written "minutes" are commonly used to provide a record of a meeting or gathering.

[0026] FIG. 11 shows an overview of the method 900 for indexing a teleconference video to generate video minutes. First, one frame is retrieved from the memory 20 (step 902). The retrieved frame is searched by a known detection technique for the presence of an emotion (step 904). The emotion searched for in the method 900 is some movement of a human subject, but, as mentioned earlier, the search could easily be extended to identify significant changes in audio level. When a significant movement is found (step 906), the frame in which it was found is used as a video label (step 910). If the last frame of the data file has not been reached (step 914), the next frame is retrieved (step 916) and examined for significant movement (step 904). If no meaningful change is detected between this frame and the preceding frame, that is, if no significant movement is detected (step 906), the frame is indexed to the most recently selected video label (step 912). If, however, the frame contains a new significant movement, a new video label is generated from that frame (step 910). Consequently, all frames lying between two frames that contain different significant movements are indexed to the video label made from the first of those two frames. Once every frame has either been used to generate a video label or been indexed to a video label, a summary of the video labels is generated as the video "minutes" of the teleconference (step 918).
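The loop of method 900 can be sketched as follows. This is only an illustrative sketch under stated assumptions: the function name `index_by_motion` and the callback `has_significant_motion` are hypothetical, the callback stands in for whichever known motion-detection technique is used, and frames that precede the first video label are simply left unindexed because the description does not say how they are handled.

```python
from typing import Any, Callable, Dict, List, Optional, Sequence, Tuple

Frame = Any  # placeholder type; a real system would use image data


def index_by_motion(
    frames: Sequence[Frame],
    has_significant_motion: Callable[[Optional[Frame], Frame], bool],
) -> Tuple[List[int], Dict[int, List[int]]]:
    """Return (labels, index): label frame positions and the frames indexed to each."""
    labels: List[int] = []
    index: Dict[int, List[int]] = {}
    current: Optional[int] = None  # most recently selected video label

    for i, frame in enumerate(frames):            # steps 902, 914, 916
        previous = frames[i - 1] if i > 0 else None
        if has_significant_motion(previous, frame):   # steps 904, 906
            current = i                           # step 910: new video label
            labels.append(i)
            index[i] = []
        elif current is not None:
            index[current].append(i)              # step 912: index to latest label
        # Assumption: frames before the first label remain unindexed.

    return labels, index                          # input to the summary of step 918
```

With a toy detector that flags a change of at least 5 between consecutive integer "frames", the sequence `[0, 0, 9, 9, 9, 1, 1]` yields labels at positions 2 and 5, with positions 3 and 4 indexed to the first label and position 6 indexed to the second.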

[0027] This indexing method 900 can also be combined with another kind of indexing called attention-driven indexing. Attention-driven indexing is based on the fact that significant movements by two or more participants in a video teleconference are often closely related in time to a significant exchange of information. A video teleconference may be recorded using multiple cameras, each focused on a different participant. Accordingly, FIG. 12 shows a screen 950 on which teleconference video from multiple sources is displayed simultaneously. In FIG. 12, windows 961 to 964 display the recorded images of participants 965 to 968, respectively. In attention-driven indexing, a motion vector indicating the magnitude and direction of movement is calculated periodically for each of the participants 965 to 968. A high degree of similarity between the motion vectors associated with two or more participants at a given time signifies a "coherent movement" by those participants. The occurrence of coherent movement in two or more simultaneous video frames (associated with two or more sources) is used to index the remaining frames. That is, a video label may be generated from any of the simultaneous frames shown in window 961, 962, 963, or 964 that coincide with the coherent movement. A summary of the video labels can be generated by the method described above.
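As an illustration of the coherence test, the sketch below compares per-participant motion vectors pairwise. The description does not specify a similarity measure, so cosine similarity, the threshold of 0.9, and the names `cosine_similarity` and `coherent_pairs` are all assumptions made for this example.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple

Vector = Tuple[float, float]  # (dx, dy) motion of one participant at one instant


def cosine_similarity(u: Vector, v: Vector) -> float:
    """Cosine of the angle between two motion vectors; 0.0 if either is zero."""
    norm_u = math.hypot(u[0], u[1])
    norm_v = math.hypot(v[0], v[1])
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # a participant who is not moving cannot be coherent
    return (u[0] * v[0] + u[1] * v[1]) / (norm_u * norm_v)


def coherent_pairs(
    vectors: Dict[int, Vector], threshold: float = 0.9
) -> List[Tuple[int, int]]:
    """Return pairs of sources whose simultaneous motion vectors are similar."""
    pairs = []
    for (a, u), (b, v) in combinations(sorted(vectors.items()), 2):
        if cosine_similarity(u, v) >= threshold:
            pairs.append((a, b))
    return pairs
```

Keying the vectors by window number, for example `{961: (1.0, 0.0), 962: (2.0, 0.1), 963: (0.0, 1.0), 964: (0.0, 0.0)}`, reports only windows 961 and 962 as moving coherently; a frame from either window could then serve as the video label for that instant.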

[0028] Preview indexing. A third method according to the invention uses the preview scenes 214 shown in FIG. 4. This method rests on the observation that the preview scenes 214 are generally repeated over a period of several hours preceding the main news broadcast 200, so that frequently repeated frames can be used as video labels to which other frames are indexed. These video labels can be used to index other frames of the preview scenes 214, of the main news broadcast (scene) 200, or of both.

[0029] FIG. 13 is a flowchart outlining a method 1100 for indexing the main news broadcast 200 using the preview scenes 214. A preview frame is compared with the remaining preview scenes by a known similarity technique (step 1104). The number of content "matches" is then compared with a predetermined threshold (step 1106). If the number of matches exceeds the threshold, the frame is used as a video label (step 1108). Text or audio that is associated with the frame and immediately follows it can be saved and indexed to the video label. Next, the frames of the main news broadcast (scene) 200 are examined for a content match with the preview frame used as the video label and are indexed in the manner described earlier (steps 1110 to 1120). Finally, after all frames of the main news broadcast 200 have been indexed, a summary of the video labels representing the preview frames is generated (step 1124).
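A minimal sketch of method 1100 follows. It is a simplification under stated assumptions: the names `select_preview_labels` and `index_news` are hypothetical, the `matches` callback stands in for the known similarity technique, a preview frame that already matches an existing label is skipped so that each repeated scene yields one label, and each news frame is indexed to the first label it matches rather than by the fuller procedure referred to in the text.

```python
from typing import Any, Callable, Dict, List, Sequence

Frame = Any
Matcher = Callable[[Frame, Frame], bool]


def select_preview_labels(
    preview_frames: Sequence[Frame], matches: Matcher, threshold: int
) -> List[Frame]:
    """Steps 1104 to 1108: keep preview frames repeated more than `threshold` times."""
    labels: List[Frame] = []
    for i, frame in enumerate(preview_frames):
        if any(matches(frame, label) for label in labels):
            continue  # assumption: one label per repeated scene
        count = sum(
            1
            for j, other in enumerate(preview_frames)
            if j != i and matches(frame, other)
        )
        if count > threshold:           # step 1106
            labels.append(frame)        # step 1108
    return labels


def index_news(
    news_frames: Sequence[Frame], labels: Sequence[Frame], matches: Matcher
) -> Dict[int, List[int]]:
    """Steps 1110 to 1120: index each news frame to the first label it matches."""
    index: Dict[int, List[int]] = {k: [] for k in range(len(labels))}
    for n, frame in enumerate(news_frames):
        for k, label in enumerate(labels):
            if matches(frame, label):
                index[k].append(n)
                break
    return index
```

Using string tags as stand-in frames and equality as the matcher, the preview sequence `["pony", "ad", "pony", "fire", "pony", "fire"]` with a threshold of 1 produces the single label `"pony"`, and the news sequence `["anchor", "pony", "pony", "fire"]` has positions 1 and 2 indexed to it.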

[0030] Video editing. The invention also encompasses a technique that allows an editor to edit or retrieve indexed video by hand-drawing edit symbols on a hard copy of a frame sequence. Returning to FIG. 7, the icon summary 345 is output by the computer system onto a sheet of paper or other material that can be read by the scanner 90. Alternatively, the icon summary 345 may simply be displayed on the monitor 60. Suppose the user wishes to view the video and audio associated with the video label 351 in the summary 345. The user hand-draws a circle 501 around the video label 351. The marked summary is fed into the scanner 90, where it is digitized and loaded into the memory 20. If the summary 345 is only displayed on the monitor 60, the user may draw the circle around the video label with a light pen or a touch-screen monitor, if one is available. The computer system 1 uses symbol recognition logic to interpret the hand-drawn symbol 501 as a selection by the user, and retrieves and plays back the corresponding portion of the recorded video. Because the computer system 1 stores in advance the X-Y coordinates of each video label on the hard copy, it can determine the appropriate video label from each hand-drawn symbol. Alternatively, a two-dimensional bar code or similar identifying pattern may be provided in the side channel (that is, the margin) of the hard copy as a means of determining the physical position of each video label on the hard copy.
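The coordinate lookup can be illustrated with a short sketch. It assumes that the stored X-Y data take the form of one bounding box per video label and that the symbol recognition logic supplies the points of the hand-drawn circle; the `Box` class, the `label_for_mark` function, and the centroid rule are hypothetical choices for this example, not details given in the description.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Box:
    """Stored position of one video label on the hard copy (page coordinates)."""
    x0: float
    y0: float
    x1: float
    y1: float

    def contains(self, x: float, y: float) -> bool:
        return self.x0 <= x <= self.x1 and self.y0 <= y <= self.y1


def label_for_mark(
    mark_points: Sequence[Tuple[float, float]], label_boxes: Dict[int, Box]
) -> Optional[int]:
    """Return the label whose stored box contains the centroid of the drawn mark."""
    if not mark_points:
        return None
    cx = sum(x for x, _ in mark_points) / len(mark_points)
    cy = sum(y for _, y in mark_points) / len(mark_points)
    for label_id, box in label_boxes.items():
        if box.contains(cx, cy):
            return label_id
    return None  # the mark does not fall on any known label
```

For instance, with boxes stored for labels 350 and 351, a circle whose sampled points surround the second box resolves to 351, and the system would then retrieve the video portion indexed to that label.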

[0031] Refer now to FIG. 14. The user can obtain a frame display 1200, on paper (or displayed on the monitor 60), that represents a time-ordered sequence of enlarged frames 1201 to 1216. Suppose the user wishes to edit this frame sequence by deleting frames 1202 to 1204 and frames 1213 to 1216. Suppose further that the user wishes to replace frame 1205 with frame 1201. The user therefore hand-writes a delete symbol 1217 over the frames to be deleted and a cut/paste symbol 1218 over frames 1201 and 1205. The marked frame display is then fed into the scanner 90, where it is digitized and interpreted by the logic provided. Based on the interpreted edit commands, the edited video sequence 1220 is generated as shown in FIG. 15.
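Once the symbols have been interpreted, applying the two commands of this example to a frame list might look like the sketch below. The semantics are an assumption: "cut/paste" is taken to remove the source frame from its original position and overwrite the target frame with it. The function name `apply_edits` is hypothetical, and the result has not been checked against FIG. 15.

```python
from typing import Iterable, List, Sequence, Tuple


def apply_edits(
    frames: Sequence[int],
    delete: Iterable[int],
    cut_paste: Sequence[Tuple[int, int]],
) -> List[int]:
    """Apply delete and cut/paste commands to a time-ordered list of frame ids.

    cut_paste holds (source, target) pairs: the source is cut from its place
    and pasted over the target (assumed semantics).
    """
    replaced_by = {target: source for source, target in cut_paste}
    removed = set(delete) | {source for source, _ in cut_paste}

    edited: List[int] = []
    for frame in frames:
        if frame in replaced_by:
            edited.append(replaced_by[frame])  # target overwritten by source
        elif frame not in removed:
            edited.append(frame)               # untouched frame is kept
    return edited
```

Applied to frames 1201 to 1216 with the deletions and the single cut/paste described above, this sketch yields frame 1201 followed by frames 1206 to 1212.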

[0032] FIG. 16 shows a table of edit symbols that the user can hand-draw on a hard copy (or the monitor 60) representing a frame display. Standard symbol recognition methods well known in the art can be used to recognize the hand-drawn symbols. Examples of the editing functions that can be performed using the edit symbol table of FIG. 16 are: deleting frames, cutting and pasting a frame, cutting and inserting a frame, rotating a frame left or right by a specified angle, blurring or sharpening a frame, and generating a blank frame of a chosen RGB value. It will be apparent that the edit symbol table of FIG. 16 is not exhaustive and can readily be extended to include other editing functions without departing from the scope of the invention.

[0033] Table 1 summarizes the editing functions that can be performed with the edit symbol table shown in FIG. 16.

[0034]

[Table 1]

[0035] FIGS. 17 and 18 show an example in which some of the edit commands of FIG. 16 are applied to another frame display 1400. FIGS. 18 and 19 show the output (the edited frames) obtained by interpreting and executing the edit commands. In FIG. 17, a symbol 1414 is drawn over frame 1401, signifying a "rotate left 45 degrees" command. Accordingly, the image of frame 1401 rotated 45 degrees to the left appears in FIG. 18. Another symbol 1415 is drawn over frames 1402 to 1404 and frames 1409 to 1412, indicating that those frames should be saved to a new (separate) file. The new file shown in FIG. 19 therefore contains frames 1431 to 1437, which are identical to frames 1402 to 1404 and 1409 to 1412. An edit symbol signifying an "enlarge 200%" command is drawn over frame 1405. An edit symbol 1416 signifying a "reduce 50%" command is drawn over frame 1406. Edit commands can be combined, as shown for frames 1406 and 1408: an arrow drawn from frame 1406 to frame 1408 indicates that the reduced result of frame 1406 is to be superimposed (pasted) on frame 1408. The result can be seen in FIG. 18.

[0036] FIG. 20 shows another frame display 1500 on which different edit symbols are drawn. FIG. 21 shows the resulting output frame sequence 1520. A symbol 1518 is drawn over frames 1501 to 1503, indicating that frame 1501 is to be cut from its current position and inserted before frame 1503. Another symbol 1519 is drawn over frames 1504, 1507, and 1510, indicating that frame 1504 is to be copied and inserted before frame 1510.

[0037] To allow commands to be combined, a set of rules governing the order of precedence of commands must be developed, just as in mathematics. For example, a rule such as "all scaling commands are executed before any other command" might be applied. Precedence rules tailored to a particular application or to the needs of particular users may also be developed.
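A precedence rule set of this kind could be expressed as a simple priority table, as in the sketch below. Only the "scaling first" rule comes from the example in the text; the command names, the relative order of the other commands, and the `order_commands` function are hypothetical. Because Python's `sorted` is stable, commands of equal priority keep the order in which they were drawn.

```python
from typing import Dict, List, Sequence, Tuple

Command = Tuple  # (name, *arguments), e.g. ("scale", 1406, 0.5)

# Lower number = executed earlier. Only "scale first" is taken from the text;
# the remaining entries are illustrative placeholders.
PRIORITY: Dict[str, int] = {"scale": 0, "rotate": 1, "paste": 2}


def order_commands(commands: Sequence[Command]) -> List[Command]:
    """Sort interpreted edit commands by precedence, keeping ties in drawn order."""
    lowest = len(PRIORITY)  # unknown commands run last
    return sorted(commands, key=lambda command: PRIORITY.get(command[0], lowest))
```

For the combined command on frames 1406 and 1408, this ordering ensures that the 50% reduction of frame 1406 is carried out before its result is pasted onto frame 1408.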

[0038] The edit symbol table and the corresponding method described above thus provide the editor with an editing technique by which video stored in a database can be edited efficiently without directly accessing that database. That is, the editor can edit video by hand-drawing edit commands on a hard copy representing the frames to be edited; by having the marked hard copy interpreted automatically, the video can be edited afterwards, at a different location if desired.

[0039] Although the invention has been described with reference to specific embodiments, it will be apparent that various modifications and changes may be made without departing from the spirit and scope of the invention as set forth in the claims. The specification and drawings are therefore to be regarded as illustrative of the invention and not as limiting it.

[0040]

[Effects of the Invention] As described in detail above, according to the invention, content-based index generation and indexing of records such as news broadcasts can be performed using icons, which are high-level objects within a video scene such as the news icons seen in news broadcasts, and using preview frames that are repeated before the main news of a news broadcast. By using audio and/or text information in addition to the video of a frame for index generation and indexing, appropriate indexing becomes possible even for frames that are difficult to index from video alone. By using human emotions, index generation and indexing of video records depicting humans become possible, and the "video minutes" of a video teleconference record can be obtained. Frame editing can be performed easily, without accessing the video database, simply by marking edit symbols on a frame display that represents, on paper or the like, the frames of the video record to be edited. These and many other effects are obtained.

[Brief description of drawings]

FIG. 1 is a block diagram showing a computer system that implements an embodiment of the invention.

FIG. 2 is an explanatory diagram of a memory that stores frames of video data.

FIG. 3 is a block diagram showing a video input unit (VIU) for transferring frames of video data to a bus.

FIG. 4 is a diagram showing a schedule of several hours of television broadcasting by a television station.

FIG. 5 is a diagram showing one frame of a news broadcast.

FIG. 6 is a diagram showing a frame sequence of a news video.

FIG. 7 is a diagram showing an icon summary representing video labels.

FIG. 8 is a flowchart showing a method of indexing a news video based on news icons.

FIG. 9 is a diagram showing a sequence of video frames and a news icon that is compared with the content of those frames.

FIG. 10 is a flowchart showing a method of performing multimodal content detection based on video, audio, and text.

FIG. 11 is a flowchart showing a method of indexing a teleconference video to generate video minutes of the teleconference.

FIG. 12 is a diagram showing a video display, used for a video teleconference, depicting four participants.

FIG. 13 is a flowchart outlining a method of indexing a news video using preview scenes.

FIG. 14 is a diagram showing a frame display representing a sequence of video frames on which edit commands have been hand-drawn.

FIG. 15 is a diagram showing the output sequence of video frames edited according to the hand-drawn edit commands shown in FIG. 14.

FIG. 16 is a diagram showing a table of edit symbols that can be hand-drawn on a frame display.

FIG. 17 is a diagram showing a frame display representing a sequence of video frames on which edit commands have been hand-drawn.

FIG. 18 is a diagram showing an output sequence of video frames generated according to the hand-drawn edit commands shown in FIG. 17.

FIG. 19 is a diagram showing an output sequence of video frames generated according to the hand-drawn edit commands shown in FIG. 17.

FIG. 20 is a diagram showing a frame display representing a sequence of video frames on which edit commands have been hand-drawn.

FIG. 21 is a diagram showing the output sequence of video frames edited according to the hand-drawn edit commands shown in FIG. 20.

[Explanation of symbols]

1 Computer system
10 Central processing unit (CPU)
20 Memory
30 Data storage device
40 Printer
50 Digital video chip (DVC)
60 Video monitor
70 Keyboard
80 Mouse
90 Scanner
100 Video input unit (VIU)
110 System bus
120 Laser disc player
130 Video cassette recorder (VCR)
140 Video camera
150 Frame grabber
200 Main news broadcast (scene)
210 Program
212 Commercial
214 News preview (scene)
216 Headline
300 Frame
310 Object region
320 News icon
330 Anchor
340 News logo
404 to 416 Frames
345 Icon summary
350 to 359 Video labels
701 to 707 Frames
961 to 964 Windows
965 to 968 Participants
1200 Frame display
1201 to 1216 Frames
1217 to 1218 Edit symbols
1301 to 1313 Edit symbols
1400 Frame display
1401 to 1412 Frames
1414 to 1416 Edit symbols
1500 Frame display
1501 to 1516 Frames
1518, 1519 Edit symbols

Continuation of the front page: (51) Int. Cl.⁶, identification code, internal reference number, FI, technical display location: G06F 15/62 P

Claims

1. A method of generating an index of a record having video content, wherein the index is composed of a plurality of labels, the record is composed of a plurality of frames, and some of the frames each include at least one icon of a plurality of icons, the method comprising: generating the plurality of labels using the icons; and indexing one frame of the plurality of frames to one label of the plurality of labels when the content of that frame matches the content of the icon associated with that label.

2. The index generation method according to claim 1, wherein the record is a record of a television news broadcast, some of the icons are news icons, and the record has audio content.

3. The index generation method according to claim 2, further comprising the step of generating an icon summary including the labels.

4. The index generation method according to claim 2, further comprising measuring a similarity between the content of the one frame and the content of one icon of the plurality of icons, wherein the content of the one frame matches the content of the icon associated with the one label when the similarity between the content of the frame and the content of the icon exceeds a predetermined threshold.

5. A method of generating a video index of an audio-video record of a television news broadcast, wherein the index is composed of a plurality of video labels and the record is composed of a plurality of frames having audio and video content, the method comprising: (a) searching the plurality of frames for a news icon; (b) storing the news icon as one video label of the plurality of video labels when the news icon is found; (c) measuring a similarity between the content of each frame following the frame containing the news icon and the content of the news icon; (d) determining, based on the similarity, which frames match the news icon; (e) indexing each frame that matches the news icon to the video label; and (f) repeating steps (a) to (e) to generate a plurality of video labels so that substantially all of the frames not containing a news icon are indexed to one of the video labels.

6. The index generation method according to claim 5, further comprising a step of generating an icon summary including the video labels.

7. The index generation method according to claim 5, wherein the indexing step (e) includes associating the audio and video content of each frame that matches the news icon with the video label.

8. The index generation method according to claim 5, wherein the similarity is measured by a correlation method.

9. The index generation method according to claim 5, wherein the similarity is based on the video content of the frames.

10. The index generation method according to claim 9, wherein the similarity is further based on the audio content of the frames.

11. The index generation method according to claim 10, wherein the news icon has text content, and the step (c) of measuring the similarity includes converting the audio content of a frame to text and comparing it with the text content of the news icon to determine an audio component of the similarity.

12. The index generation method according to claim 9, wherein the similarity is further based on the text content of the frames.

13. The index generation method according to claim 12, wherein the news icon has text content, and the step (c) of measuring the similarity includes comparing the text content of each frame with the text content of the news icon to determine a text component of the similarity.

14. A device for generating a video index of an audio-video record, wherein the index is composed of a plurality of labels and the record is composed of a plurality of frames having video content, the device comprising: means for searching the frames for an icon; means for storing the icon as one of the plurality of labels when the icon is found; means for measuring a similarity between the content of each frame following the frame containing the icon and the content of the icon; means for determining, based on the similarity, which frames match the icon; and means for indexing each frame that matches the icon to that label.

15. The index generation device according to claim 14, wherein the audio-video record is a record of a television news broadcast and the icon is a news icon.

16. The index generation device according to claim 14, further comprising means for generating an icon summary including the labels.

17. The index generation device according to claim 14, wherein the indexing means includes means for associating, with the label, the audio and video content of each frame of the plurality of frames that matches the icon.

18. The index generation device according to claim 14, wherein the similarity is based on the video content of the frames.

19. The index generation device according to claim 18, wherein the similarity is further based on the audio content of the frames.

20. The index generation device according to claim 19, wherein the icon has text content, and the means for measuring the similarity includes means for converting the audio content of a frame to text and comparing it with the text content of the icon to determine an audio component of the similarity.

21. The index generation device according to claim 18, wherein the similarity is further based on the text content of the frames.

22. The index generation device according to claim 21, wherein the icon has text content, and the means for measuring the similarity includes means for comparing the text content of a frame with the text content of the icon to determine a text component of the similarity.

23. An apparatus for indexing an audio-video record of a television news broadcast, comprising: a memory for storing a plurality of frames having audio and video content, at least some of which each include at least one news icon of a plurality of news icons; and processor logic, coupled to the memory, that generates a plurality of video labels from the plurality of news icons, compares the content of each frame of the plurality of frames that does not include a news icon with the content of each of the news icons, and indexes each frame whose content matches the content of one of the news icons to the video label corresponding to that news icon.

24. The indexing apparatus of claim 23, wherein the processor logic generates an icon summary of the labels.

25. A method of indexing a record having video content depicting one or more humans, wherein the record comprises a plurality of frames, some of which each depict one emotion of a plurality of emotions relating to at least one of the humans, the method comprising: determining which frames contain an emotion; storing each frame of the plurality of frames that depicts an emotion as one label of a plurality of labels, one label per frame; and, for each frame that depicts an emotion, indexing each other frame that corresponds to that frame to the label generated from the frame depicting the emotion.

26. The indexing method of claim 25, wherein the plurality of emotions include a plurality of significant movements by at least one human.

27. The indexing method of claim 25, wherein the record further includes audio content, each frame of the record has an audio level, and the plurality of emotions include a significant change in the audio level of a frame of the plurality of frames.

28. The indexing method of claim 25, wherein each label of the plurality of labels corresponds to one frame, among the plurality of frames, that contains an emotion.

29. The indexing method of claim 25, further comprising generating a summary of the plurality of labels.

30. The indexing method of claim 25, wherein the step of indexing includes, for each frame containing an emotion, determining which of the subsequent frames match that frame.

31. The indexing method of claim 25, wherein the step of determining comprises: searching the frames following a first frame containing one emotion for a second frame containing another emotion; indexing the frames between the first frame and the second frame to the label corresponding to the first frame when the second frame is found; and indexing the frames following the first frame to that label when the second frame is not found.

32. The indexing method of claim 25, wherein the step of determining includes: measuring a similarity between the content of a frame containing an emotion and the content of the frames following that frame; and determining, based on the similarity, which frames match the frame containing the emotion.

33. The indexing method of claim 32, wherein the similarity is based on the video content of the frames.

34. A method of generating a video minutes of a record of an audiovisual conference between a plurality of participants, wherein the record comprises a plurality of frames having video content, wherein (a) a plurality of frames are searched. Searching for a first significant motion by one or more participants, and (b) storing the frame as a video label when a frame containing the first significant motion is found; c) indexing into the video label each frame that matches the frame containing the first significant motion, among frames following the frame containing the first significant motion, and (d). Repeating steps (a) to (c) to generate a plurality of video labels so that substantially all of the plurality of frames are indexed into one video label. Oh proceedings generation method.

35. The method of generating video minutes according to claim 34, further comprising the step of generating an icon summary consisting of the frames stored as video labels in step (b).

36. The method of generating video minutes according to claim 34, further comprising the step of determining which of the frames following the frame containing the significant motion match the frame containing the significant motion.

37. The video minutes generation method according to claim 34, wherein the significant movement is a synchronized movement of two or more participants.

38. The method of generating video minutes according to claim 37, further comprising the steps of: calculating a plurality of motion vectors, each indicating the motion of one participant; measuring the similarity between two or more of the motion vectors; and detecting a synchronized movement when the similarity between the two or more motion vectors exceeds a predetermined threshold.
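
The detection of synchronized movement in claim 38 can be sketched as follows. The sketch assumes one 2-D motion vector per participant and uses cosine similarity as the similarity measure; the claim specifies neither, so both choices, along with the function names and the default threshold, are assumptions for illustration.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two motion vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def synchronized_pairs(
    motion_vectors: Sequence[Sequence[float]],
    threshold: float = 0.9,  # assumed value for the "predetermined threshold"
) -> List[Tuple[int, int]]:
    """Return the participant pairs whose motion vectors are similar enough."""
    return [
        (i, j)
        for (i, u), (j, v) in combinations(enumerate(motion_vectors), 2)
        if cosine_similarity(u, v) > threshold
    ]
```

With the vectors `[(1.0, 0.0), (0.9, 0.1), (-1.0, 0.0)]`, only participants 0 and 1 move in nearly the same direction, so only that pair is reported as synchronized.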

39. An apparatus for indexing a record having video content depicting one or more human beings, the record having a plurality of frames, some of which each depict one of a plurality of emotions of the human beings, the apparatus comprising: a memory for storing the frames; and processor logic, coupled to the memory, for finding the frames containing the plurality of emotions, generating a plurality of labels from those frames, and indexing substantially all frames that do not contain an emotion into the labels according to their correspondence with the frames containing the emotions.

40. The indexing device of claim 39, wherein the processor logic further produces a summary of labels.

41. A method of generating a video index for a recorded news broadcast, the index comprising a plurality of labels, wherein the recorded news broadcast includes a plurality of news frames and is associated with a preview scene composed of a plurality of preview frames, and the preview frames and the news frames have audiovisual content, the method comprising the steps of: storing the plurality of preview frames in a memory; identifying, among the plurality of preview frames, each preview frame that is substantially repeated a predetermined number of times or more in the preview scene; generating a plurality of labels from the preview frames that are repeated the predetermined number of times or more in the preview scene; and indexing, among the plurality of news frames, each news frame that is substantially identical to one of the repeated preview frames into the label corresponding to that repeated preview frame.

42. The method of claim 41, further comprising the step of generating a video summary of labels.

43. The index generation method according to claim 41, further comprising the step of indexing, among the plurality of preview frames, each preview frame that is substantially the same as a selected preview frame repeated the predetermined number of times or more in the preview scene into the label corresponding to the selected preview frame.

44. A method of generating a video index for a recorded news broadcast, the index comprising a plurality of video labels, wherein the recorded news broadcast comprises a plurality of news frames and corresponds to a preview scene composed of a plurality of preview frames, and the preview frames and the news frames have audiovisual content, the method comprising the steps of: (a) selecting one preview frame among the plurality of preview frames; (b) counting the number of preview frames that are substantially the same as the selected preview frame; (c) if the number of preview frames substantially the same as the selected preview frame exceeds a predetermined number, storing the selected preview frame as one of the plurality of video labels; and (d) repeating steps (a) through (c) so that substantially all of the plurality of preview frames are selected in step (a), thereby generating the plurality of video labels.
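
Steps (a) through (d) of claim 44 can be sketched in Python as below. The comparison function `same`, the parameter `min_repeats`, and the function name are hypothetical; the sketch also makes two assumptions that the claim does not state: the count in step (b) includes the selected frame itself, and a frame already represented by a stored label is skipped so that no label is stored twice.

```python
from typing import Callable, List, Sequence, TypeVar

F = TypeVar("F")


def generate_video_labels(
    preview_frames: Sequence[F],
    same: Callable[[F, F], bool],  # hypothetical "substantially the same" test
    min_repeats: int,  # the "predetermined number" of the claim
) -> List[F]:
    """Return the preview frames that are stored as video labels."""
    labels: List[F] = []
    for candidate in preview_frames:  # steps (a) and (d): select every frame in turn
        if any(same(candidate, label) for label in labels):
            continue  # assumption: do not store a second label for the same content
        # Step (b): count the frames substantially the same as the candidate
        # (the candidate itself is included in the count).
        count = sum(1 for other in preview_frames if same(candidate, other))
        if count > min_repeats:  # step (c): store the candidate as a video label
            labels.append(candidate)
    return labels
```

For instance, with frames identified by content name, `generate_video_labels(["pony", "anchor", "pony", "weather", "pony", "anchor"], lambda a, b: a == b, 1)` keeps "pony" and "anchor" as labels and drops "weather", which appears only once.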

45. The index generation method of claim 44, further comprising the step of determining which preview frame in the plurality of preview frames is substantially the same as the selected preview frame.

46. The index generating method according to claim 45, wherein the determining step is based on the video content of the preview frame.

47. The index generating method according to claim 46, wherein the determining step is further based on the audio content of the preview frame.

48. The index generation method according to claim 46, wherein the preview frame has a text content, and the step of determining is further based on the text content of the preview frame.

49. The method of claim 44, further comprising the step of generating a summary of video labels.

50. The index generation method of claim 44, further comprising the step of indexing, into a video label, each preview frame among the plurality of preview frames that has substantially the same content as the selected preview frame.

51. A device for generating a video index for a recorded news broadcast, the index comprising a plurality of labels, wherein the recorded news broadcast includes a plurality of news frames and corresponds to a preview scene composed of a plurality of preview frames, and the preview frames and the news frames have audiovisual content, the device comprising: (a) means for selecting one preview frame among the plurality of preview frames; (b) means for counting the number of preview frames substantially the same as the selected preview frame; (c) means for determining whether the number of preview frames substantially the same as the selected preview frame exceeds a predetermined number; (d) means for storing the selected preview frame as one of a plurality of video labels if the number of preview frames substantially the same as the selected preview frame exceeds the predetermined number; and (e) means for generating the plurality of labels by repeating the operations of (a) through (c) so that all of the plurality of preview frames are selected by means (a).

52. The index generating device according to claim 51, further comprising means for generating a summary including a plurality of labels.

53. The index generating device of claim 51, further comprising means for indexing, into each label, each preview frame among the plurality of preview frames that is substantially the same as the selected preview frame.

54. A device for generating a video index of a recorded news broadcast, the index comprising a plurality of labels, wherein the recorded news broadcast includes a plurality of news frames and is associated with a preview scene composed of a plurality of preview frames, and the preview frames and the news frames have audiovisual content, the device comprising: a memory for storing the plurality of preview frames; and processor logic, coupled to the memory, for identifying each preview frame that is repeated a predetermined number of times or more among the plurality of preview frames and generating a plurality of video labels from each identified preview frame.

55. A method of editing a plurality of frames of a video record, each frame being represented in a frame display, the method comprising the steps of: inspecting the frame display for the presence of edit symbols drawn by the user on the frame display; recognizing the edit symbols drawn on the frame display; associating each of the edit symbols drawn on the frame display with one of a plurality of edit commands based on a table of edit symbols representing the edit commands; and modifying the frames of the video record according to the edit commands associated with the edit symbols drawn on the frame display.
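
The table lookup and frame modification of claim 55 can be illustrated with a short Python sketch. Symbol recognition itself is outside the sketch: it assumes a hypothetical recogniser has already produced `recognized`, a mapping from frame position to the symbol drawn on that frame. The symbols "X" and "=", the two commands, and all names are invented for the example and do not come from the claims.

```python
from typing import Callable, Dict, List

Frames = List[str]
EditCommand = Callable[[Frames, int], Frames]


def delete_frame(frames: Frames, i: int) -> Frames:
    """Remove the frame at position i."""
    return frames[:i] + frames[i + 1:]


def duplicate_frame(frames: Frames, i: int) -> Frames:
    """Insert a copy of the frame at position i directly after it."""
    return frames[: i + 1] + frames[i:]


# Hypothetical edit symbol table: each symbol represents one edit command.
EDIT_SYMBOL_TABLE: Dict[str, EditCommand] = {
    "X": delete_frame,
    "=": duplicate_frame,
}


def apply_edit_symbols(frames: Frames, recognized: Dict[int, str]) -> Frames:
    """Modify the frames according to the commands tied to the drawn symbols."""
    result = list(frames)
    # Work from the last marked frame to the first so that earlier
    # positions remain valid after a deletion or duplication.
    for position in sorted(recognized, reverse=True):
        command = EDIT_SYMBOL_TABLE.get(recognized[position])
        if command is not None:  # symbols missing from the table are ignored
            result = command(result, position)
    return result
```

Keeping the symbol-to-command association in a table, as the claim recites, means a new edit command can be supported by adding one table entry without touching the recognition or modification steps.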

56. The frame editing method according to claim 55, further comprising the step of providing an edit symbol table.

57. The frame editing method according to claim 55, wherein the frame display is displayed on a substantially paper-like object.

58. The frame editing method according to claim 55, wherein the frame display is displayed on a video monitor, and the edit symbol is drawn on the frame display using a cursor control device.

59. The frame editing method according to claim 58, wherein the video monitor has a touch screen, and the edit symbol is drawn on the touch screen by using the touch screen.

60. An apparatus for editing a plurality of frames of a video record, each frame being represented in a frame display, the apparatus comprising: means for inspecting the frame display for the presence of edit symbols drawn by the user on the frame display; means for recognizing the edit symbols drawn on the frame display; means for associating each edit symbol drawn on the frame display with one edit command among a plurality of edit commands based on a table of edit symbols representing the edit commands; and means for modifying the frames of the video record according to the edit commands associated with the edit symbols drawn on the frame display.

61. The frame editing device according to claim 60, further comprising means for providing an edit symbol table.

62. The frame editing device according to claim 60, wherein the frame display is displayed on a substantially paper-like object.

63. The frame editing device according to claim 60, wherein the frame display is displayed on a video monitor, and the edit symbol is drawn freehand on the frame display by using a cursor control device.

64. The frame editing device according to claim 63, wherein the video monitor has a touch screen, and the edit symbol is drawn on the touch screen by physically applying a line drawing tool to the touch screen.

65. An apparatus for editing a plurality of frames of a video record, each frame being visually represented on a frame display, the apparatus comprising: a memory for storing the frame display; an input device, coupled to the memory, for receiving the frame display and supplying it to the memory; and processor logic, coupled to the memory, for inspecting the frame display for the presence of edit symbols hand-drawn by the user on the frame display, recognizing the edit symbols drawn on the frame display, associating each of the edit symbols drawn on the frame display with one of a plurality of edit commands based on a table of edit symbols, and modifying the frames of the video record according to the edit commands associated with the edit symbols drawn on the frame display.

66. The frame editing device according to claim 65, wherein the input device is a scanner for inputting and digitizing the frame display, and the edit symbol is drawn on the frame display before the scanner inputs the frame display.

67. The index generating method according to claim 5, further comprising a step (c)(1) of measuring a second similarity between the content of a frame and the content of a frame containing the news icon, wherein the step (d) of determining which frame matches the news icon is further based on the second similarity.

68. The index generating device according to claim 14, further comprising means for measuring a second similarity between the content of a frame and the content of a frame containing the icon, wherein the means for determining which frame matches the icon makes that determination further based on the second similarity.