JPH0887501A

JPH0887501A - Topic level control method and topic structure recognition device in topic structure recognition

Info

Publication number: JPH0887501A
Application number: JP6223151A
Authority: JP
Inventors: Atsushi Takeshita; 敦竹下; Takashi Inoue; 孝史井上; Tamaki Saito; 珠喜斎藤
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: NTT Inc
Priority date: 1994-09-19
Filing date: 1994-09-19
Publication date: 1996-04-02
Anticipated expiration: 2017-09-30
Also published as: JP3329352B2

Abstract

(57) [Abstract] [Purpose] To keep the topic level in a topic structure recognition result from becoming deeper than necessary, so that a topic structure that is easy for humans to understand can be output. [Constitution] Two tables are used: a topic level increase/decrease table for raising or lowering the topic level value according to conditions, and a topic level setting table for setting the topic level to a new value, regardless of its current value, when a predetermined condition is satisfied. First, the topic level is updated according to the topic level increase/decrease table (step 201). Then, if a condition in the topic level setting table is satisfied, the topic level is set to the value that the topic level setting table associates with that condition (step 203).

Description

Detailed Description of the Invention

[0001]

[Industrial Field of Application] The present invention relates to a method and device for topic structure recognition in natural language analysis, and in particular to a topic level control method for determining topic levels and to a topic structure recognition device to which this topic level control method is applied.

[0002]

[Prior Art] It has been confirmed experimentally that when people are presented with text or dialogue data and given the task "find the blocks in this text or dialogue data that are about the same thing, and state what that 'same thing' is," they answer with the same structure, without individual differences. The experiment is described, for example, in Takeshita et al., "A Study of Human Communication from the Viewpoint of Topic Structure Recognition," IEICE 1993 Autumn Conference, D-62 (p. 6-64). A structure grasped by humans in this way is called a "topic structure." Because a topic structure is nested, each topic can be represented by a "topic word" that denotes the topic, a "topic level" that gives the nesting depth, and a "topic scope" that states from which sentence to which sentence the topic continues within the text or dialogue data. In what follows, the text or dialogue data subject to topic structure analysis is called language data.

[0003] FIG. 1 shows an example of a topic structure for language data about telecommunication policy. The language data starts at sentence 0 and continues at least to sentence 770. The topic with the topic word "communication service" has topic level 1, and its topic scope runs from sentence 0 to sentence 770. To simplify the description, a topic is referred to below by its topic word, as in 'the "communication service" topic.'

[0004] Within the "communication service" topic there are two topics with topic level 2, "new service" and "conventional service." The "new service" topic has a topic scope from sentence 125 to sentence 431, and the "conventional service" topic has a topic scope from sentence 432 to sentence 770. In addition, the "new service" topic contains a child topic "service A," and the "conventional service" topic contains a child topic "service B"; their topic scopes are sentences 301 to 431 and sentences 521 to 770, respectively.
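The FIG. 1 example can be written down as a small data structure. The following Python sketch is illustrative only: the class name `Topic`, its field names, and the helper functions are assumptions, and level 3 for "service A" and "service B" is inferred from their being child topics of level-2 topics rather than stated in the text.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Topic:
    word: str    # topic word
    level: int   # topic level (nesting depth, 1 = outermost)
    start: int   # first sentence of the topic scope (inclusive)
    end: int     # last sentence of the topic scope (inclusive)
    children: List["Topic"] = field(default_factory=list)


def build_fig1() -> Topic:
    """Build the topic structure described for FIG. 1."""
    root = Topic("communication service", 1, 0, 770)
    new = Topic("new service", 2, 125, 431)
    old = Topic("conventional service", 2, 432, 770)
    # Level 3 is an assumption: the text only calls these "child topics".
    new.children.append(Topic("service A", 3, 301, 431))
    old.children.append(Topic("service B", 3, 521, 770))
    root.children += [new, old]
    return root


def is_well_nested(t: Topic) -> bool:
    """Check that each child is one level deeper and lies inside its parent's scope."""
    return all(
        c.level == t.level + 1
        and t.start <= c.start <= c.end <= t.end
        and is_well_nested(c)
        for c in t.children
    )


def outline(t: Topic) -> List[str]:
    """Render the structure as indented table-of-contents lines."""
    lines = [f"{'  ' * (t.level - 1)}{t.word} [{t.start}-{t.end}]"]
    for c in t.children:
        lines.extend(outline(c))
    return lines
```

Calling `outline(build_fig1())` gives one line per topic, indented by topic level, which is the table-of-contents use of a topic structure that the later paragraphs discuss.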

[0005] Recognizing such a topic structure by computer is called topic structure recognition. Several methods for recognizing topic structure have been proposed. Here, the topic structure recognition method described in Takeshita, "A Video Retrieval System Using Topic Structure Recognition," IPSJ SIG on Information Media, 94-IM-15-1, is briefly explained. FIG. 2 is a block diagram showing an example configuration of the topic structure recognition device used in this method, FIG. 3 is a flowchart showing the topic structure recognition processing in this method, and FIG. 4 shows an example of the flow of the processing that follows the topic structure recognition preprocessing.

[0006] The conventional topic structure recognition device shown in FIG. 2 comprises a data input unit 701 to which language data is input, a processing unit 702 that executes the various kinds of processing, a display unit 703 that displays results, a storage unit 704 that holds processing results and data needed during processing, and a dictionary/rule unit 705 that stores the dictionaries and rules used in topic structure recognition processing. The storage unit 704 contains a language data storage unit 710, which stores the preprocessed language data, and a topic structure storage unit 711, which holds intermediate and final processing results. The topic structure storage unit 711 in turn contains a base expansion storage unit 712, a semantic expansion storage unit 713, and an integrated topic storage unit 714. The dictionary/rule unit 705 contains a preprocessing dictionary 721, semantic expansion processing rules 722, base expansion processing rules 723, and integration processing rules 724.

[0007] When topic structure recognition is performed with this device, topic structure recognition preprocessing 740 is first applied to the input language data 730, as shown in FIG. 3. The first step of the preprocessing 740 is morphological analysis 741 of the input language data 730. Morphological analysis 741 splits the character string of the input language data 730 into a sequence of words and identifies the part of speech of each word and the inflected form of each inflecting word. The second step of the preprocessing 740 is simple-sentence segmentation 742, which takes the result of morphological analysis as input. Simple-sentence segmentation 742 divides a sentence that contains several predicates, such as a sentence with an embedded clause or a compound sentence, into simple sentences that each contain exactly one predicate. The third step of the preprocessing 740 is salient noun phrase extraction 743, which takes the result of simple-sentence segmentation 742 as input and extracts the most strongly emphasized noun phrase in each simple sentence. The fourth step of the preprocessing 740 is block recognition 744, which recognizes blocks corresponding to paragraphs in a text. Each of these steps of the topic structure recognition preprocessing 740 is executed by the processing unit 702 using the preprocessing dictionary 721 in the dictionary/rule unit 705, and the results are stored in the language data storage unit 710 in the storage unit 704.

[0008] When the topic structure recognition preprocessing 740 is complete, the processing of topic development is carried out separately as base expansion processing 750 and semantic expansion processing 760. Here, base expansion means topic development that is indicated explicitly by clue phrases such as "まず" ("first") and "次に" ("next"), by chapter structure, by itemized lists, and the like. Semantic expansion means the development of topics that are presented and advanced in a non-explicit form within each topic of the base expansion.

[0009] First, as shown in FIG. 3, base expansion processing 750 performs three steps in sequence: determination of topic establishment sections 751, determination of topic words 752, and determination of topic scopes and topic levels 753. A topic establishment section is a section in which a topic is presented and established. In the determination of topic words 752, the salient noun phrases in each topic establishment section are taken as topic word candidates, and the candidate with the highest priority is selected as the topic word. In the determination of topic scopes and topic levels 753, processing is based on structures such as itemized lists. Base expansion processing 750 is executed by the processing unit 702 using the base expansion processing rules 723 in the dictionary/rule unit 705, and the result is stored in the base expansion storage unit 712 contained in the topic structure storage unit 711 in the storage unit 704.

[0010] FIG. 4 shows a concrete example of the base expansion processing 750. First, the start of the language data and the vicinity of clue phrases such as "first" and "next" are determined to be the topic establishment sections of the base expansion. Then, in the determination of topic words 752, "communication service" is selected as the topic word from the first topic establishment section, "new service" from the second, and "conventional service" from the third.

[0011] After base expansion processing 750, semantic expansion processing 760 is executed. Like base expansion processing 750, semantic expansion processing 760 consists of three steps: determination of topic establishment sections 761, determination of topic words 762, and determination of topic scopes and topic levels 763. Semantic expansion processing 760 is executed by the processing unit 702 using the semantic expansion processing rules 722 in the dictionary/rule unit 705 together with the result of base expansion processing 750, and its result is stored in the semantic expansion storage unit 713 contained in the topic structure storage unit 711 in the storage unit 704.

[0012] In the example shown in FIG. 4, paragraphs longer than a certain length are selected as topic establishment sections, and "service A" and "service B" are selected as their topic words. The topic scope of each is taken to run from the start of its topic establishment section to the start of the next topic establishment section of the base expansion. For the semantic expansion of text, the topic levels are all set to the same level, namely level 1.

[0013] Finally, integration processing 770 of the base expansion and the semantic expansion is performed, and as a result the topic structure 780 of the entire language data is output. Integration processing 770 takes as input the topic structures produced by base expansion processing 750 and semantic expansion processing 760 and is executed by the processing unit 702 using the integration processing rules 724 in the dictionary/rule unit 705. In the example shown in FIG. 4, integration yields a topic structure 780 like the one shown in FIG. 1.

[0014] In both the base expansion and the semantic expansion, the rules for determining topic establishment sections, topic words, topic scopes, and topic levels (the semantic expansion processing rules 722 and the base expansion processing rules 723) differ depending on the communication mode of the language data, such as dialogue, monologue, or written text. Differences in topic development style and topic structure recognition rules across communication modes, and the results of topic structure recognition experiments, are described in Takeshita et al., "A Study of Human Communication from the Viewpoint of Topic Structure Recognition," IEICE 1993 Autumn Conference, D-62 (p. 6-64).

[0015]

[Problems to Be Solved by the Invention] With the conventional topic structure recognition technique described above, however, the start of a topic nesting is easy to recognize, whereas the end of a nesting is difficult to recognize. As a result, the topic level in the recognition result tends to grow deeper the further one goes toward the end of the language data. When the recognition result is used, for example, as a table of contents for human readers, it becomes hard to follow, or topics are treated at levels very different from the topic levels that humans perceive. This tendency is especially pronounced when the language data is long or covers a wide range of topics.

[0016] As an example, consider the record (minutes) of an assembly session. FIG. 5 is an example of such a record (a representative question) and is typical of language data that contains many wide-ranging topics. FIG. 6 shows the result of recognizing the topic structure of the language data of FIG. 5 with the conventional method described above and outputting it in the form of a table of contents. Unlike an ordinary table of contents, the output shown here gives not only each chapter title but also the character strings that precede and follow the place where the chapter title appears in the language data. In the example of FIG. 6 as well, the nesting of chapters grows deeper toward the latter half of the table of contents, and the final entry, "5.11.1.11.1.2.2.3.3.10 PKO bill," is nested ten levels deep. Once the topic level becomes this deep, it is difficult for a human to understand how the chapters relate, and the output loses much of its value as a table of contents.

[0017] An object of the present invention is to provide a topic level control method and a topic structure recognition device that keep the topic level in the topic structure recognition result from becoming deeper than necessary and can output a topic structure that is easy for humans to understand.

[0018]

[Means for Solving the Problems] The topic level control method of the present invention is a method of determining topic levels in topic structure recognition processing that recognizes the topic structure of language data using rules prepared in advance. In this method, a topic level increase/decrease table describing the relationship between conditions and increases or decreases of the topic level, and a topic level setting table describing pairs of a condition and a new topic level, are prepared in advance, and the topic level is determined using the topic level increase/decrease table and the topic level setting table.

[0019] The topic structure recognition device of the present invention has an input unit for inputting language data, a dictionary/rule unit that stores rules for topic structure recognition, a processing unit that performs processing using the rules in the dictionary/rule unit, a storage unit that stores the results produced by the processing unit, and a display unit that displays the processing results of the processing unit. Clue phrases are classified into three types: nesting-start type, topic-shift type, and nesting-end type. The dictionary/rule unit includes a nesting-start clue phrase table in which the words and parts of speech of nesting-start clue phrases are registered, a topic-shift clue phrase table in which the words and parts of speech of topic-shift clue phrases are registered, a nesting-end clue phrase table in which the words and parts of speech of nesting-end clue phrases are registered, a topic level increase/decrease table describing the relationship between conditions that include clue phrase type information and increases or decreases of the topic level, and a topic level setting table describing pairs of a condition and a topic level value. The storage unit includes a language data storage unit that stores information about the language data input from the input unit and a topic structure storage unit that stores information about the topic structure. The language data storage unit includes a word information table that stores the character string and part-of-speech information of each word in the language data, a simple sentence information table that stores information about the words, clue phrase, and clue phrase type of each simple sentence of the language data, and an utterance turn information table that stores information including the speaker and starting simple sentence number of each utterance turn. The topic structure storage unit includes a table that stores information including the topic establishment section, which is the range in which a topic is presented and established, the topic word, the topic level, and the topic scope.

[0020]

[Operation] By using a topic level increase/decrease table that describes increases and decreases of the topic level and a topic level setting table that describes pairs of a condition and a new topic level value, the present invention makes it possible to keep the topic levels of the recognized topic structure from becoming deep. Consequently, when the topic structure is used as a table of contents, for example, the user can follow the flow of the discussion in the language data more easily.

[0021]

[Embodiment] An embodiment of the present invention is described below with reference to the drawings. FIG. 7 is a block diagram showing the configuration of a topic structure recognition device according to one embodiment of the present invention. This device differs from the conventional topic structure recognition device shown in FIG. 2 particularly in the tables contained in the dictionary/rule unit 105.

[0022] [Configuration of the topic structure recognition device of this embodiment] The topic structure recognition device of this embodiment comprises a data input unit 101 to which language data is input, a processing unit 102 that executes the various kinds of processing, a display unit 103 that displays results, a storage unit 104 that holds processing results and data needed during processing, and a dictionary/rule unit 105 that stores the dictionaries and rules used in topic structure recognition processing. The storage unit 104 contains a language data storage unit 110, which stores the preprocessed language data, and a topic structure storage unit 111, which holds intermediate and final processing results. The language data storage unit 110 contains a simple sentence information table 115, a word information table 116, and an utterance turn information table 117, and the topic structure storage unit 111 contains a base expansion storage unit 112, a semantic expansion storage unit 113, and an integrated topic storage unit 114. The dictionary/rule unit 105 contains a preprocessing dictionary 121, semantic expansion processing rules 122, base expansion processing rules 123, integration processing rules 124, a topic level increase/decrease table 125, a topic level setting table 126, a nesting-start clue phrase table 127, a topic-shift clue phrase table 128, and a nesting-end clue phrase table 129.

[0023] When the topic structure of language data is analyzed with this device, processing follows the same flow as the conventional processing shown in FIG. 3, but the method of determining topic levels in base expansion processing differs. With this device, topic levels are determined using the topic level increase/decrease table 125 and the topic level setting table 126. These tables 125 and 126 are described in detail later. In brief, the topic level increase/decrease table 125 describes how much the topic level is changed, relative to its current value, when a given condition holds, and the topic level setting table 126 describes the value to which the topic level is forcibly changed (an absolute change) when a specific condition holds. The clue phrase tables 127 to 129 are used to evaluate the conditions listed in tables 125 and 126. The procedure by which the topic structure recognition device of this embodiment analyzes topic structure is described below, focusing on the method of determining topic levels.

[0024] [Overall flow of topic level determination in base expansion] The flowchart of FIG. 8 shows the procedure for determining the topic level in base expansion. First, for a topic establishment section of the base expansion, the topic level is updated according to the topic level increase/decrease table 125 (step 201). Next, it is determined whether a condition in the topic level setting table 126 is satisfied (step 202). If not, topic level determination ends as is. If so, the topic level is updated according to the topic level setting table 126 (step 203), and topic level determination ends.
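The two-stage flow of FIG. 8 (a relative update, then an optional absolute override) might be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the function name `decide_topic_level`, the `Context` dictionary, the representation of setting-table rows as (condition, level) pairs, and the "first matching rule wins" policy are all illustrative choices. The example rule that resets the level when the speaker changes is hypothetical; this part of the description does not say which conditions the topic level setting table 126 contains.

```python
from typing import Callable, Dict, List, Tuple

Context = Dict[str, object]                          # facts about the current section
SettingRule = Tuple[Callable[[Context], bool], int]  # (condition, new absolute level)


def decide_topic_level(
    level: int,
    ctx: Context,
    delta_of: Callable[[Context], int],
    setting_table: List[SettingRule],
) -> int:
    """Return the topic level for the current topic establishment section."""
    # Step 201: relative update taken from the increase/decrease table.
    level += delta_of(ctx)
    # Step 202: check the setting table; first matching rule wins (assumption).
    for condition, new_level in setting_table:
        if condition(ctx):
            # Step 203: overwrite the level regardless of its current value.
            return new_level
    return level


# Hypothetical rule, for illustration only: reset to level 1 on a new speaker.
EXAMPLE_SETTING_TABLE: List[SettingRule] = [
    (lambda ctx: bool(ctx.get("speaker_changed")), 1),
]
```

Because step 203 ignores the current value, a single matching condition can undo any depth that accumulated through repeated increments, which is how the setting table limits runaway nesting.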

[0025] [Configuration of each table] FIG. 9 shows an example configuration of the topic level increase/decrease table 125. In this embodiment, the topic level is raised or lowered according to the transition pattern of clue phrase types from the immediately preceding clue phrase to the current clue phrase, and the topic level increase/decrease table 125 describes the relationship between these transition patterns and the amount by which the topic level changes. In this embodiment, clue phrases such as "first" and "next" are classified into three types: "nesting-start type," "topic-shift type," and "nesting-end type." Table 1 shows examples of clue phrases belonging to each type.

[0026]

[Table 1]

As shown in FIG. 9, if the current clue phrase is of the nesting-start type, the topic level is increased by 1 regardless of the type of the immediately preceding clue phrase. If the current clue phrase is of the topic-shift or nesting-end type and the immediately preceding clue phrase is of the nesting-start or topic-shift type, the topic level is left unchanged. If the current clue phrase is of the topic-shift or nesting-end type and the immediately preceding clue phrase is of the nesting-end type, the topic level is decreased by 1.
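The transition rules described for FIG. 9 fit in a nine-entry lookup keyed by (previous type, current type). The Python sketch below is one possible encoding. The constant names, the `trace_levels` helper, and the handling of the very first clue phrase (which has no predecessor) are assumptions, since the text does not say what happens in that case. No lower bound is applied to the level, because none is stated.

```python
from typing import Dict, List, Optional, Tuple

NEST_START, TOPIC_SHIFT, NEST_END = "nest_start", "topic_shift", "nest_end"
TYPES = (NEST_START, TOPIC_SHIFT, NEST_END)

# (previous clue phrase type, current clue phrase type) -> change in topic level
LEVEL_DELTA: Dict[Tuple[str, str], int] = {}
for prev in TYPES:
    # Current is nesting-start: +1 regardless of the previous type.
    LEVEL_DELTA[(prev, NEST_START)] = 1
    for cur in (TOPIC_SHIFT, NEST_END):
        # Current is topic-shift or nesting-end: -1 after nesting-end, else 0.
        LEVEL_DELTA[(prev, cur)] = -1 if prev == NEST_END else 0


def trace_levels(clue_types: List[str], initial_level: int = 1) -> List[int]:
    """Apply the table to a sequence of clue phrase types, returning each level."""
    level = initial_level
    prev: Optional[str] = None
    levels: List[int] = []
    for cur in clue_types:
        if prev is None:
            # Assumption: with no predecessor, only nesting-start changes the level.
            delta = 1 if cur == NEST_START else 0
        else:
            delta = LEVEL_DELTA[(prev, cur)]
        level += delta
        levels.append(level)
        prev = cur
    return levels
```

For instance, the sequence nesting-start, topic-shift, nesting-end, topic-shift, nesting-start, beginning at level 1, yields the levels 2, 2, 2, 1, 2: the level drops only on the clue phrase that follows a nesting-end.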

[0027] FIGS. 10(a) to 10(c) show the configurations of the nesting-start clue phrase table 127, the topic-shift clue phrase table 128, and the nesting-end clue phrase table 129, respectively. The values in the figures are those that arise while processing the example language data shown in FIG. 5. The dictionary/rule unit 105 contains these clue phrase tables 127 to 129 in order to store the three kinds of clue phrases shown in Table 1. Each of the clue phrase tables 127 to 129 has a clue phrase number field, a field for the words of the clue phrase, and a field for the part of speech of each word. For example, in the nesting-start clue phrase table 127, the clue phrase with clue phrase number 0 consists of the single word "まず" ("first"), and its part of speech is adverb.
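A clue phrase table with these three fields could be modelled as below. This Python sketch is illustrative: only entry 0 of the nesting-start table ("まず", adverb) comes from the text. The `CluePhrase` class, the `find_clue` matcher, and the placement of "次に" in the topic-shift table are assumptions, and the nesting-end table is left empty because no entry for it is given here.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class CluePhrase:
    number: int             # clue phrase number
    words: Tuple[str, ...]  # words that make up the clue phrase
    pos: Tuple[str, ...]    # part of speech of each word


# Entry 0 is taken from the description; everything else is hypothetical.
NEST_START_TABLE = [CluePhrase(0, ("まず",), ("adverb",))]
TOPIC_SHIFT_TABLE = [CluePhrase(0, ("次に",), ("adverb",))]  # assumed type and POS
NEST_END_TABLE: List[CluePhrase] = []                        # no entry given in the text

TABLES: Dict[str, List[CluePhrase]] = {
    "nest_start": NEST_START_TABLE,
    "topic_shift": TOPIC_SHIFT_TABLE,
    "nest_end": NEST_END_TABLE,
}


def find_clue(tokens: List[Tuple[str, str]]) -> Optional[Tuple[str, int]]:
    """Return (type, clue phrase number) of the first clue phrase found.

    tokens is a list of (word, part of speech) pairs for one simple sentence.
    Both the words and their parts of speech must match the table entry.
    """
    for i in range(len(tokens)):
        for clue_type, table in TABLES.items():
            for cp in table:
                window = tokens[i:i + len(cp.words)]
                if window == list(zip(cp.words, cp.pos)):
                    return clue_type, cp.number
    return None
```

Matching on part of speech as well as surface form is what the separate part-of-speech field makes possible: a word that is spelled like a clue phrase but tagged differently by morphological analysis would not match.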

【００２８】一方、上述したように、言語データ記憶部
１１０には単文情報テーブル１１５と単語情報テーブル
１１６が設けられている。これらのテーブル１１５,１
１６の構成例が、図１１(a),(b)にそれぞれ示されてい
る。単文情報テーブル１１５は、述語を１つだけ持つ単
位である単文に関する情報を記述するためのものであっ
て、各単文番号のフィールドと、各単文に含まれる手掛
かり句をその手掛かり句番号で記述するフィールドと、
その手掛かり句のタイプを記述するフィールドと、その
単文が属する発話番（その単文が何番目の話者によるも
のかを示す）を表わすフィールドと、その単文の開始お
よび終了の単語番号を表わすフィールド（単語範囲フィ
ールド）とによって構成されている。単文番号は、その
単文が言語データ中の何番目の単文であるかを０から始
まる連続番号で示したものである。手掛かり句と手掛か
り句タイプのフィールドにおける値"−１"は、手掛かり
句が存在しないことを示している。単語情報テーブル１
１６には、言語データ中でその単語が何番目の単語であ
るかを示す単語番号フィールドと、その単語を記述する
ためのフィールドと、その単語の品詞等の情報を記述す
るためのフィールドとが設定されている。上述したよう
に、話題構造認識前処理（図３参照）において形態素解
析と単文区切り処理とが行なわれており、これらの処理
結果が単文情報テーブル１１５と単語情報テーブル１１
６に格納されることになる。なお、言語データ記憶部１
１０には、発話番情報テーブル１１７も設けられている
が、この発話番情報テーブル１１７は、話者の発話の順
序を示す発話番番号フィールドと、話者名が記述される
話者フィールドと、その話者の単文がどの単文番号の単
文から始まるかを示す開始単文フィールドとによって、
構成されている。On the other hand, as described above, the language data storage unit 110 holds the simple sentence information table 115 and the word information table 116. Configuration examples of these tables 115 and 116 are shown in FIGS. 11(a) and 11(b), respectively. The simple sentence information table 115 describes information about simple sentences, a simple sentence being a unit that has exactly one predicate. It consists of a simple sentence number field, a field that records the clue phrase contained in each simple sentence by its clue phrase number, a field that records the type of that clue phrase, a field that gives the utterance number to which the simple sentence belongs (indicating which speaker, in order, produced the simple sentence), and a field that gives the start and end word numbers of the simple sentence (the word range field). The simple sentence number is a sequential number starting from 0 that indicates the position of the simple sentence in the language data. The value "-1" in the clue phrase and clue phrase type fields indicates that no clue phrase is present. The word information table 116 has a word number field indicating the position of the word in the language data, a field for the word itself, and a field for information such as the part of speech of the word. As described above, morphological analysis and simple sentence segmentation are performed in the topic structure recognition preprocessing (see FIG. 3), and their results are stored in the simple sentence information table 115 and the word information table 116. The language data storage unit 110 also holds the utterance number information table 117, which consists of an utterance number field indicating the order of the speakers' utterances, a speaker field containing the speaker name, and a start simple sentence field indicating the simple sentence number at which that speaker's simple sentences begin.
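
The three tables of the language data storage unit 110 can be pictured as simple record types. The following Python sketch is illustrative only: the class and field names are hypothetical, and storing the clue phrase type as an integer code is an assumption (the text only states that -1 marks a missing clue phrase). The sample rows use values quoted in the description: simple sentence 0 spans word numbers 0 to 12 and has no clue phrase, and utterance number 0 belongs to the speaker 議長 (chairperson) and starts at simple sentence 0.

```python
from dataclasses import dataclass
from typing import Tuple

NO_CLUE = -1  # sentinel used in the clue phrase and clue phrase type fields


@dataclass
class SimpleSentenceRow:          # one row of the simple sentence information table 115
    sentence_no: int              # sequential number starting from 0
    clue_phrase_no: int           # NO_CLUE when no clue phrase is present
    clue_phrase_type: int         # integer code is an assumption; NO_CLUE when absent
    utterance_no: int
    word_range: Tuple[int, int]   # (start word number, end word number), inclusive


@dataclass
class WordRow:                    # one row of the word information table 116
    word_no: int
    word: str
    part_of_speech: str


@dataclass
class UtteranceRow:               # one row of the utterance number information table 117
    utterance_no: int
    speaker: str
    start_sentence: int


def has_clue(row: SimpleSentenceRow) -> bool:
    """True when the simple sentence contains a clue phrase."""
    return row.clue_phrase_no != NO_CLUE


sentence0 = SimpleSentenceRow(0, NO_CLUE, NO_CLUE, 0, (0, 12))
utterance0 = UtteranceRow(0, "議長", 0)
print(has_clue(sentence0))  # False
```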

【００２９】各単文での手掛かり句の検出は、単文情報
テーブル１１５の単語範囲の値を取り出して、単語情報
テーブル１１６上でのこの範囲内に、３種類の各手掛か
り句テーブル１２７〜１２９に含まれる項目が存在する
かどうかを調べることにより行なわれる。例えば、単文
番号０の単語範囲は単語番号０から１２であるから、単
語情報テーブル１１６の単語番号０から１２の範囲内
に、手掛かり句テーブル１２７〜１２９に含まれる項目
が含まれているかどうかを調べる。単文番号０の場合、
手掛かり句は含まれていないので、単文番号０に係る手
掛かり句と手掛かり句タイプの各フィールド値は、共に
−１としている。もし、手掛かり句が検出されれば、そ
の手掛かり句番号を手掛かり句フィールドに記録し、そ
のタイプを手掛かり句タイプフィールドに記述する。A clue phrase in each simple sentence is detected by taking the word range value from the simple sentence information table 115 and checking whether any entry of the three clue phrase tables 127 to 129 occurs within that range of the word information table 116. For example, since the word range of simple sentence number 0 is word numbers 0 to 12, word numbers 0 to 12 of the word information table 116 are checked for entries contained in the clue phrase tables 127 to 129. Simple sentence number 0 contains no clue phrase, so its clue phrase and clue phrase type fields are both set to -1. If a clue phrase is detected, its clue phrase number is recorded in the clue phrase field and its type in the clue phrase type field.
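
The detection step described above amounts to scanning a word range for any entry of the three clue phrase tables. The sketch below shows one way to do this in Python under stated assumptions: the type codes, the function name `detect_clue`, and the sample word list are hypothetical, and only entry 0 of the nesting start table (「まず」, adverb) is taken from the description. Returning the first match in table order is also an assumption, since the text does not say what happens when several clue phrases occur in one simple sentence.

```python
from typing import List, Tuple

Word = Tuple[str, str]  # (surface form, part of speech)

NEST_START, TOPIC_CHANGE, NEST_END = 0, 1, 2  # hypothetical type codes
NO_CLUE = -1

# Each entry is (clue phrase number, word sequence of the clue phrase).
CLUE_TABLES = {
    NEST_START: [(0, [("まず", "副詞")])],  # entry quoted in the description
    TOPIC_CHANGE: [],                        # contents not given in this section
    NEST_END: [],                            # contents not given in this section
}


def detect_clue(words: List[Word], start: int, end: int) -> Tuple[int, int]:
    """Search words[start..end] (inclusive) and return (clue phrase number, type)."""
    for clue_type, entries in CLUE_TABLES.items():
        for number, phrase in entries:
            n = len(phrase)
            for i in range(start, end - n + 2):
                if words[i:i + n] == phrase:
                    return number, clue_type
    return NO_CLUE, NO_CLUE  # both fields are -1 when nothing is found


# Hypothetical word information table contents for one simple sentence.
words = [("まず", "副詞"), ("議題", "名詞"), ("を", "助詞"), ("確認", "名詞"), ("する", "動詞")]
print(detect_clue(words, 0, 4))  # (0, 0)
print(detect_clue(words, 1, 4))  # (-1, -1)
```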

【００３０】図１２は、話題レベル設定テーブル１２６
の構成例を示している。話題レベル設定テーブル１２６
は、条件フィールドと新しい話題レベルを記述するフィ
ールドとによって構成されている。条件フィールドに記
されているいずれかの条件が満たされた場合には、その
満足した条件に対応する値（満足した条件の右側の欄に
記載された値）に話題レベルが設定される。上述の話題
レベル増減テーブル１２５による話題レベルの遷移はそ
れまでの話題レベルを基準にした増減であるのに対し
て、この話題レベル設定テーブル１２６による話題レベ
ルの更新では、それまでの話題レベル値とは無関係に話
題レベルの絶対値が指定される。図１２に示した例で
は、話題レベルが５を越えた場合には話題レベルが１に
戻される。また、話者交替が起こり、かつ新しい発話番
が３単文以上から構成されており、かつ今回の話者交替
より前に５単文以上存在する場合に、話題レベルが１に
戻される。また、もし２つ以上の条件が同時に満たされ
た場合には、例えば、話題レベル設定テーブル１２６の
中で上位に記述されているものを採用するようにすれば
よい。FIG. 12 shows a configuration example of the topic level setting table 126. The topic level setting table 126 consists of a condition field and a field describing the new topic level. When any condition in the condition field is satisfied, the topic level is set to the value corresponding to that condition (the value in the column to the right of the satisfied condition). Whereas the topic level increase / decrease table 125 changes the topic level relative to its previous value, the topic level setting table 126 specifies an absolute topic level regardless of the previous value. In the example shown in FIG. 12, when the topic level exceeds 5, it is returned to 1. The topic level is also returned to 1 when a speaker change occurs, the new utterance number consists of three or more simple sentences, and five or more simple sentences precede this speaker change. If two or more conditions are satisfied at the same time, the one listed higher in the topic level setting table 126 may be adopted, for example.
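
The topic level setting table can be sketched as an ordered list of (condition, new level) pairs in which the first satisfied row wins. This Python fragment is a minimal illustration, not the patent's implementation; `Context`, `SETTING_TABLE`, and `apply_setting_table` are hypothetical names, and the two rows mirror the FIG. 12 example as described in the text.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Context:
    level: int                    # topic level after the increase / decrease step
    speaker_changed: bool
    new_utterance_sentences: int  # simple sentences in the new utterance number
    sentences_before_change: int  # simple sentences preceding this speaker change


Rule = Tuple[Callable[[Context], bool], int]  # (condition, new topic level)

# Rows in table order; a row listed higher takes priority.
SETTING_TABLE: List[Rule] = [
    (lambda c: c.level > 5, 1),
    (lambda c: c.speaker_changed
     and c.new_utterance_sentences >= 3
     and c.sentences_before_change >= 5, 1),
]


def apply_setting_table(ctx: Context) -> int:
    """Return the absolute level of the first satisfied row, else keep the level."""
    for condition, new_level in SETTING_TABLE:
        if condition(ctx):
            return new_level
    return ctx.level


print(apply_setting_table(Context(6, False, 0, 0)))  # 1
print(apply_setting_table(Context(5, False, 0, 0)))  # 5
print(apply_setting_table(Context(3, True, 3, 5)))   # 1
```

Note that a level of exactly 5 does not trigger the first row, because the condition is a strict "greater than 5".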

【００３１】次に、基盤展開記憶部１１２、単文情報テ
ーブル１１５および発話番情報テーブル１１７のそれぞ
れと話題レベル設定テーブル１２６に記述された条件と
の関係について、図１３を用いて説明する。図中の値
は、図５に示した言語データ例に対する処理過程での値
である。Next, the relationship between the conditions described in the topic level setting table 126 and each of the basic expansion storage unit 112, the simple sentence information table 115, and the utterance number information table 117 is described with reference to FIG. 13. The values in the figure are those obtained while processing the language data example shown in FIG. 5.

【００３２】基盤展開記憶部１１２には、話題番号ごと
に、話題が提示・確立される話題確立区間の開始と終了
をそれぞれ示す単文番号を記述するフィールド（話題確
立区間フィールド）と、話題語を記述するフィールド
と、話題レベルを記述するフィールドと、話題スコープ
をその開始および終了の単語番号で記述するフィールド
とが含まれる。上述したように発話番情報テーブル１１
７には、発話番番号フィールドと、話者を記述するフィ
ールドと、発話番が開始する単文番号を記述するフィー
ルドとが含まれ、図示した例では、発話番０の話者は
「議長」であり、単文０から開始し、終了は次の発話番
１の開始単文の１つ前である単文１となっている。The basic expansion storage unit 112 contains, for each topic number, a field giving the simple sentence numbers at which the topic establishment section (the section in which a topic is presented and established) starts and ends (the topic establishment section field), a field for the topic word, a field for the topic level, and a field giving the topic scope by its start and end word numbers. As described above, the utterance number information table 117 contains an utterance number field, a speaker field, and a field giving the simple sentence number at which the utterance number starts. In the illustrated example, the speaker of utterance number 0 is the "chairperson"; the utterance starts at simple sentence 0 and ends at simple sentence 1, the simple sentence immediately before the start simple sentence of the next utterance number 1.
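
Because the utterance number information table stores only start simple sentences, the end of an utterance has to be derived from the start of the next one. The following Python sketch shows that derivation; the function name `utterance_span` and the `last_sentence` parameter are hypothetical, and the dictionary holds only the rows that the description quotes or implies (utterance number 1 starting at simple sentence 2 follows from utterance number 0 ending at simple sentence 1).

```python
from typing import Dict, Optional, Tuple


def utterance_span(starts: Dict[int, int], utterance_no: int,
                   last_sentence: Optional[int] = None) -> Tuple[int, int]:
    """Return (first, last) simple sentence numbers of an utterance number."""
    first = starts[utterance_no]
    next_start = starts.get(utterance_no + 1)
    if next_start is not None:
        return first, next_start - 1  # ends one before the next start simple sentence
    if last_sentence is None:
        raise ValueError("no following utterance: pass the last simple sentence number")
    return first, last_sentence


# Partial table: utterance number -> start simple sentence.
starts = {0: 0, 1: 2, 6: 785, 7: 1047}
print(utterance_span(starts, 0))  # (0, 1)
print(utterance_span(starts, 6))  # (785, 1046)
```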

【００３３】図１２に示した話題レベル設定テーブル１
２６の条件にある「話題レベル＞５」が満たされている
かどうかは、基盤展開記憶部１１２の話題レベルのフィ
ールドの値を調べることにより判定することが可能であ
る。また、話題レベル設定テーブル１２６の条件にある
「話者交替が発生したとき」が満たされているかどうか
は、基盤展開記憶部１１２に記録されている話題確立区
間の開始単文について、単文情報テーブル１１５の発話
番フィールドを調べることにより判定できる。また、話
題レベル設定テーブル１２６の条件にある「新しい発話
番に３単文以上含まれている」と「言語データ開始時か
ら話者交替までに５単文以上含まれている」が満たされ
ているかどうかは、発話番情報テーブル１１７の開始単
文フィールドの値から判定できる。Whether the condition "topic level > 5" in the topic level setting table 126 shown in FIG. 12 is satisfied can be determined by checking the value of the topic level field in the basic expansion storage unit 112. Whether the condition "when a speaker change occurs" is satisfied can be determined by looking up, in the utterance number field of the simple sentence information table 115, the start simple sentence of the topic establishment section recorded in the basic expansion storage unit 112. Whether the conditions "the new utterance number contains three or more simple sentences" and "five or more simple sentences occur between the start of the language data and the speaker change" are satisfied can be determined from the values of the start simple sentence field of the utterance number information table 117.

【００３４】［言語データ例を用いた説明］次に、図１
３に示された例を用いて、本実施例における基盤展開で
の話題レベル制御方法を詳細に説明する。以下の説明に
おいて、単文番号がＮの単文のことを簡単のために単文
＃Ｎと記載することにする。[Explanation Using Example of Language Data] Next, the topic level control method for basic expansion in this embodiment is described in detail using the example shown in FIG. 13. In the following description, the simple sentence whose simple sentence number is N is written as simple sentence #N for brevity.

【００３５】話題番号４５の話題レベルは、既に基盤展
開記憶部１１２に記憶されており、その値は４である。
また、話題番号４５の話題確立区間は単文＃７２０から
開始する。単文情報テーブル１１５によると、この単文
＃７２０は発話番４に属する。The topic level of topic number 45 is already stored in the basic expansion storage unit 112, and its value is 4. The topic establishment section of topic number 45 starts at simple sentence #720. According to the simple sentence information table 115, simple sentence #720 belongs to utterance number 4.

【００３６】話題番号４６の話題レベルはまだ決定され
ていないので（図示空欄になっている）、本発明にした
がって決定する。まず、話題レベル増減テーブル１２５
にしたがって、話題レベルを更新する。話題番号４６の
話題確立区間は単文＃８２２から＃８２６までであり、
単文情報テーブル１１５のその範囲を見ると、単文＃８
２２に入れ子開始型の手掛かり句が存在する。図９に示
されるようにこの話題レベル増減テーブル１２５によれ
ば、今回の手掛かり句が入れ子開始型であれば前回の手
掛かり句が何であっても話題レベルは＋１されるので、
話題レベルは５となる。Since the topic level of topic number 46 has not yet been determined (it is blank in the figure), it is determined according to the present invention. First, the topic level is updated according to the topic level increase / decrease table 125. The topic establishment section of topic number 46 runs from simple sentence #822 to #826, and that range of the simple sentence information table 115 shows that simple sentence #822 contains a nesting start type clue phrase. As shown in FIG. 9, according to the topic level increase / decrease table 125, if the current clue phrase is of the nesting start type, the topic level is incremented by 1 whatever the previous clue phrase was, so the topic level becomes 5.

【００３７】次に、話題レベル設定テーブル１２６の条
件が満たされているかどうかを調べる。「話題レベル＞
５」という条件は、話題レベルが５であるので、満たさ
れていない。話題番号４６の話題確立区間は単文＃８２
２から開始するが、単文情報テーブル１１５によると、
この単文は発話番６に属する。前述したように、話題番
号４５は発話番４に属するので、話者交替が起きている
ことがわかる。また、発話番情報テーブル１１７による
と、発話番号６は単文＃７８５から開始しており、次の
発話番号７の開始単文＃１０４７の１つ前の単文＃１０
４６まで続くので、「新しい発話番に３単文以上含まれ
ている」という条件も満たされる。また、発話番号６は
単文＃７８５から開始であるので、「言語データ開始時
から話者交替までに５単文以上含まれている」という条
件も満たされる。これらにより、図１２に示した話題レ
ベル設定テーブル１２６の２番目の条件が満たされてい
ることが分かるので、最終的に話題番号４６の話題レベ
ルは１となる。Next, it is checked whether any condition of the topic level setting table 126 is satisfied. The condition "topic level > 5" is not satisfied because the topic level is 5. The topic establishment section of topic number 46 starts at simple sentence #822, which belongs to utterance number 6 according to the simple sentence information table 115. As noted above, topic number 45 belongs to utterance number 4, so a speaker change has occurred. According to the utterance number information table 117, utterance number 6 starts at simple sentence #785 and continues to simple sentence #1046, the one immediately before the start simple sentence #1047 of the next utterance number 7, so the condition "the new utterance number contains three or more simple sentences" is also satisfied. Since utterance number 6 starts at simple sentence #785, the condition "five or more simple sentences occur between the start of the language data and the speaker change" is satisfied as well. The second condition of the topic level setting table 126 shown in FIG. 12 is therefore satisfied, and the topic level of topic number 46 finally becomes 1.
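
The worked example for topic number 46 can be retraced numerically. The Python sketch below uses only values quoted in the description; the variable names are hypothetical, and counting the simple sentences before the speaker change as the start simple sentence number of utterance number 6 (that is, simple sentences #0 to #784) is an interpretation of the text.

```python
# Values quoted in the description (FIG. 13 example).
prev_level = 4               # topic level of topic number 45
prev_topic_utterance = 4     # simple sentence #720 belongs to utterance number 4
topic_utterance = 6          # simple sentence #822 belongs to utterance number 6
utterance_start = 785        # start simple sentence of utterance number 6
next_utterance_start = 1047  # start simple sentence of utterance number 7

# Step 1: increase / decrease table.
# Simple sentence #822 holds a nesting start type clue phrase, so the level rises by 1.
level_after_step1 = prev_level + 1

# Step 2: setting table conditions, derived from the tables.
speaker_changed = topic_utterance != prev_topic_utterance
new_utterance_len = next_utterance_start - utterance_start  # #785..#1046
sentences_before_change = utterance_start                   # #0..#784 (interpretation)

level = level_after_step1
if level > 5:
    level = 1  # first row: not triggered, since 5 is not greater than 5
elif speaker_changed and new_utterance_len >= 3 and sentences_before_change >= 5:
    level = 1  # second row: triggered

print(level_after_step1, new_utterance_len, level)  # 5 262 1
```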

【００３８】本実施例で示した話題レベル制御方法を用
いて図５に例示した言語データに対して話題構造認識を
行なった結果が、図１４に示されている。図１４に示す
目次では、章立ての入れ子は最高でも４重であり、さら
に妥当な箇所で入れ子レベルが戻されているので、従来
の方法による図６の目次と比較すると、人間にとって非
常に分かりやすいものとなっている。FIG. 14 shows the result of topic structure recognition performed on the language data illustrated in FIG. 5 using the topic level control method of this embodiment. In the table of contents shown in FIG. 14, chapter nesting is at most four levels deep, and the nesting level is returned at appropriate places, so it is far easier for a person to follow than the table of contents of FIG. 6 produced by the conventional method.

【００３９】[0039]

【発明の効果】以上説明したように本発明は、話題レベ
ル増減テーブルと話題レベル設定テーブルとを用いるこ
とにより、話題構造認識結果における話題レベルが深く
なるのを抑制することが可能となるという効果がある。
会議録や講義録のように長い言語データに対して非常に
有効であり、その中でも特に会議録のように１つの発話
番にたくさんの文が含まれているような言語データに有
効である。本発明により、認識された話題構造はユーザ
にとって理解しやすいものとなる。As described above, the present invention uses the topic level increase / decrease table and the topic level setting table, and thereby has the effect of preventing the topic level in the topic structure recognition result from becoming too deep. It is very effective for long language data such as conference proceedings and lecture records, and especially for language data, such as conference proceedings, in which a single utterance number contains many sentences. With the present invention, the recognized topic structure becomes easy for the user to understand.

[Brief description of drawings]

【図１】人間による話題構造認識の例である。FIG. 1 is an example of topic structure recognition by a human.

【図２】従来の話題構造認識装置の一例の構造を示すブ
ロック図である。FIG. 2 is a block diagram showing a structure of an example of a conventional topic structure recognition device.

【図３】従来の話題構造認識のための処理を示すフロー
チャートである。FIG. 3 is a flowchart showing a conventional process for topic structure recognition.

【図４】従来の話題構造認識における前処理以降の例で
ある。FIG. 4 shows an example after preprocessing in conventional topic structure recognition.

【図５】言語データの一例を示す図である。FIG. 5 is a diagram showing an example of language data.

【図６】図５に示す言語データに対して従来の話題構造
認識方法を適用して話題構造を抽出した結果を示す図で
ある。FIG. 6 is a diagram showing the result of extracting a topic structure by applying a conventional topic structure recognition method to the language data shown in FIG. 5.

【図７】本発明の一実施例の話題構造認識装置の構造を
示すブロック図である。FIG. 7 is a block diagram showing the structure of a topic structure recognition device according to an embodiment of the present invention.

【図８】図７の装置による基盤展開での話題レベル決定
処理を示すフローチャートである。FIG. 8 is a flowchart showing the topic level determination process in basic expansion performed by the apparatus of FIG. 7.

【図９】話題レベル増減テーブルの構成例を示す図であ
る。FIG. 9 is a diagram showing a configuration example of a topic level increase / decrease table.

【図１０】(a)〜(c)はそれぞれ、入れ子開始型手掛かり
句テーブル、話題転換型手掛かり句テーブルおよび入れ
子終了型手掛かり句テーブルの構成例を示す図である。FIGS. 10(a) to 10(c) are diagrams showing configuration examples of the nesting start type clue phrase table, the topic conversion type clue phrase table, and the nesting end type clue phrase table, respectively.

【図１１】(a),(b)はそれぞれ単文情報テーブルおよび
単語情報テーブルの構成例を示す図である。FIGS. 11(a) and 11(b) are diagrams showing configuration examples of the simple sentence information table and the word information table, respectively.

【図１２】話題レベル設定テーブルの構成例を示す図で
ある。FIG. 12 is a diagram showing a configuration example of a topic level setting table.

【図１３】基盤展開記憶部、単文情報テーブルおよび発
話番情報テーブルのそれぞれと話題レベル設定テーブル
に記述された条件との関係を説明する図である。FIG. 13 is a diagram illustrating the relationship between the conditions described in the topic level setting table and each of the basic expansion storage unit, the simple sentence information table, and the utterance number information table.

【図１４】図７の装置を用い本発明の方法を適用して行
なった話題構造認識結果の例を示す図である。FIG. 14 is a diagram showing an example of a topic structure recognition result obtained by applying the method of the present invention using the apparatus of FIG. 7.

[Explanation of symbols]

101 データ入力部 (data input unit)
102 処理部 (processing unit)
103 表示部 (display unit)
104 記憶部 (storage unit)
105 辞書・規則部 (dictionary / rule section)
110 言語データ記憶部 (language data storage unit)
111 話題構造記憶部 (topic structure storage unit)
112 基盤展開記憶部 (basic expansion storage unit)
113 意味的展開記憶部 (semantic expansion storage unit)
114 統合話題記憶部 (integrated topic storage unit)
115 単文情報テーブル (simple sentence information table)
116 単語情報テーブル (word information table)
117 発話番情報テーブル (utterance number information table)
121 前処理用辞書 (preprocessing dictionary)
122 意味的展開処理規則 (semantic expansion processing rule)
123 基盤展開処理規則 (basic expansion processing rule)
124 統合処理規則 (integrated processing rule)
125 話題レベル増減テーブル (topic level increase / decrease table)
126 話題レベル設定テーブル (topic level setting table)
127 入れ子開始型手掛かり句テーブル (nesting start type clue phrase table)
128 話題転換型手掛かり句テーブル (topic conversion type clue phrase table)
129 入れ子終了型手掛かり句テーブル (nesting end type clue phrase table)
201〜203 ステップ (steps)

Claims

[Claims]

1. A topic level control method for determining a topic level in topic structure recognition processing that recognizes the topic structure of language data using rules prepared in advance, wherein a topic level increase / decrease table describing the relationship between conditions and increases or decreases in the topic level and a topic level setting table describing pairs of a condition and a new topic level are prepared in advance, and the topic level is determined using the topic level increase / decrease table and the topic level setting table.

2. The topic level control method according to claim 1, wherein the topic structure recognition processing: performs topic structure recognition preprocessing that includes dividing the language data into simple sentences, each a unit having only one predicate, extracting a clue phrase from each simple sentence, and identifying the type of the clue phrase; then separates topic development into basic expansion, which is explicitly indicated, and semantic expansion, which develops within the basic expansion, and processes them using the basic expansion processing rule and the semantic expansion processing rule, respectively; for the basic expansion, sequentially determines the topic establishment section in which a topic is presented and established, the topic word in the topic establishment section, and the topic level and topic scope that indicate the nesting of topics; next, for the semantic expansion, sequentially determines the topic establishment section, the topic word, the topic scope, and the topic level; and thereafter integrates the processing results of the basic expansion and the semantic expansion using the integrated processing rule.

3. The topic level control method according to claim 1, wherein the topic level is updated according to the topic level increase / decrease table, it is then determined whether a condition of the topic level setting table is satisfied, and, if the condition is satisfied, the topic level is updated according to the topic level setting table.

4. The topic level control method according to claim 3, wherein the condition of the topic level setting table includes information on speaker change.

5. The topic level control method according to claim 3, wherein the condition of the topic level setting table includes information about the value of the topic level updated by the topic level increase / decrease table.

6. A topic structure recognition device comprising: an input unit for inputting language data; a dictionary / rule section that stores rules for topic structure recognition; a processing unit that performs processing using the rules of the dictionary / rule section; a storage unit that stores the results of the processing unit; and a display unit that displays the processing results of the processing unit, wherein clue phrases are classified into a nesting start type, a topic conversion type, and a nesting end type; the dictionary / rule section includes a nesting start type clue phrase table in which the words and parts of speech of nesting start type clue phrases are registered, a topic conversion type clue phrase table in which the words and parts of speech of topic conversion type clue phrases are registered, a nesting end type clue phrase table in which the words and parts of speech of nesting end type clue phrases are registered, a topic level increase / decrease table describing the relationship between conditions that include clue phrase type information and increases or decreases in the topic level, and a topic level setting table describing pairs of a condition and a topic level value; the storage unit includes a language data storage unit that stores information about the language data input from the input unit and a topic structure storage unit that stores information about the topic structure; the language data storage unit includes a word information table that stores information about the character string and part of speech of each word contained in the language data, a simple sentence information table that stores information about the words, clue phrase, and clue phrase type contained in each simple sentence of the language data, and an utterance number information table that stores information including the speaker and the start simple sentence number of each utterance number; and the topic structure storage unit includes a table that stores information including the topic establishment section, which is the range in which a topic is presented and established, the topic word, the topic level, and the topic scope.

7. The topic structure recognition device according to claim 6, wherein the dictionary / rule section includes: a preprocessing dictionary for topic structure recognition preprocessing that includes dividing the language data into simple sentences, each a unit having only one predicate, extracting a clue phrase from each simple sentence, and identifying the type of the clue phrase; a basic expansion processing rule for processing basic expansion; a semantic expansion processing rule for processing semantic expansion; and an integrated processing rule for integrating basic expansion and semantic expansion; the topic structure storage unit includes a basic expansion storage unit that stores information about basic expansion, a semantic expansion storage unit that stores information about semantic expansion, and an integrated topic storage unit that stores information after the basic expansion and the semantic expansion are integrated; and each of the semantic expansion storage unit and the basic expansion storage unit comprises a table that stores information including the topic establishment section in which a topic is presented and established, the topic word, the topic level, and the topic scope.

8. The topic structure recognition device according to claim 6, wherein the processing unit updates the topic level according to the topic level increase / decrease table and saves the result in the basic expansion storage unit, then determines whether a condition of the topic level setting table is satisfied and, if so, updates the topic level according to the topic level setting table and saves the result in the basic expansion storage unit.

9. The topic structure recognition device according to claim 8, wherein the conditions of the topic level setting table include a condition that uses information about the start simple sentence described in the utterance number information table and the utterance number described in the simple sentence information table.

10. The topic structure recognition device according to claim 8, wherein the conditions of the topic level setting table include a condition that uses the value of the topic level stored in the basic expansion storage unit.