JPH0785041A

JPH0785041A - Relational concept extraction device

Info

Publication number: JPH0785041A
Application number: JP5230122A
Authority: JP
Inventors: Yuichi Tanaka; 裕一田中
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1993-09-16
Filing date: 1993-09-16
Publication date: 1995-03-31

Abstract

(57) [Abstract] [Purpose] To extract, from input text, pairs of related words together with the relation between them, so that the semantic descriptions of a dictionary used for natural language processing can be collected and created automatically. [Constitution] The device comprises morphological analysis means 1, which performs morphological analysis to divide an input sentence into morphemes, and relational concept extraction means 2, which extracts, from the morphological analysis result obtained by the morphological analysis means 1, pairs of two words that can be related by a relation holding between words.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a word extraction device for creating dictionaries, and in particular to a relational concept extraction device that extracts two related words from input text, together with their relation, in order to automatically collect and create the semantic descriptions of a dictionary used for natural language processing.

[0002]

[Prior Art] When performing natural language processing that describes the meaning of natural language, highly accurate processing of a wide range of inputs requires semantic descriptions that cover that range, and therefore a large and accurate dictionary.

[0003] In technical documents in particular, new words appear almost daily. If humans select the words and write their semantic descriptions in the conventional way in order to register them in a dictionary, they cannot keep up with the size of the vocabulary and its growth. There is therefore a strong demand for a device that performs this work automatically.

[0004]

[Problems to be Solved by the Invention] Devices that select and automatically extract keywords, technical terms, proper nouns, and the like from input text have been proposed for this purpose (see Japanese Patent Laid-Open Nos. H1-102638, H1-121928, and H3-286372).

[0005] However, these devices focus on recognizing the target words and can only extract words one at a time, so they can create only the part of a dictionary that corresponds to headwords. For example, given the text "an auxiliary storage medium such as a floppy disk ...", the word "floppy disk" can be extracted, but "floppy disk" and "auxiliary storage medium" cannot be extracted in association with each other. The part corresponding to the headword "floppy disk" can thus be extracted, but the semantic information that a "floppy disk" is an "auxiliary storage medium" cannot. Such devices therefore cannot automatically create a dictionary that includes semantic information.

[0006] An object of the present invention is to solve this problem in the automatic creation of dictionaries that include semantic information by providing a relational concept extraction device that does not treat a word in isolation but automatically recognizes its relation to other words and extracts the related words as a pair, thereby obtaining the semantic information to be registered in a dictionary.

[0007]

[Means for Solving the Problems] To achieve this object, the present invention provides a morphological analysis unit 1, a syntactic analysis unit 2, a context analysis unit 3, and a relational concept extraction unit 4, as shown in FIG. 1(A).

[0008] For example, when the sentence "using an auxiliary storage medium such as a floppy disk ..." is input, the morphological analysis unit 1 performs morphological analysis, dividing it into words, that is, morphemes, classified as noun, case particle, adverb, verb, auxiliary verb, and so on.

[0009] The relational concept extraction unit 4 holds relational concept extraction rules such as those shown in FIG. 4(A) and detects whether the morphological analysis result passed from the morphological analysis unit 1 matches any of these patterns. In the example above, "an auxiliary storage medium such as a floppy disk" is recognized as matching the pattern "Y such as X" (X等のY). In the corresponding output "Y subc X", which means "X is a subordinate concept of Y", the words in the sentence are assigned to X and Y, and the relational concept "auxiliary storage medium subc floppy disk" is extracted.
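The matching step in [0009] can be sketched as follows. This is a minimal illustration, not the patented implementation: the `(surface, part_of_speech)` token format, the tag names, and the function `extract_subc` are assumptions introduced here, standing in for whatever the morphological analysis unit actually emits.

```python
# Hypothetical token format: (surface, part_of_speech) pairs, as a
# morphological analyzer might emit them. The tag names are assumptions.
Token = tuple[str, str]


def extract_subc(tokens: list[Token]) -> list[tuple[str, str, str]]:
    """Find "X 等の Y" / "X などの Y" and emit (Y, "subc", X)."""
    relations = []
    for i in range(len(tokens) - 3):
        (x, x_pos), (marker, _), (no, _), (y, y_pos) = tokens[i:i + 4]
        if (x_pos == "noun" and marker in ("等", "など")
                and no == "の" and y_pos == "noun"):
            relations.append((y, "subc", x))
    return relations


# "using an auxiliary storage medium such as a floppy disk ..."
tokens = [
    ("フロッピーディスク", "noun"), ("等", "suffix"), ("の", "particle"),
    ("補助記憶媒体", "noun"), ("を", "particle"), ("利用", "noun"),
    ("し", "verb"), ("て", "particle"),
]
print(extract_subc(tokens))
```

Matching on analyzed tokens rather than raw characters means the word boundaries of X and Y come from the analyzer, which is the role the text assigns to the morphological analysis unit 1.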

[0010] Such relational concepts come in several kinds, examples of which are shown in FIG. 1(B). As for their meanings, "X comp Y" means "device Y is one of the elements (modules) constituting device X"; "X t-elem Y" means "technology Y is one of the element technologies for realizing technology X"; and "X func Y" means "when device X is operated, it performs process Y as one of its functions".
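One possible way to hold these relation types in a program is a small glossary keyed by label. The glosses paraphrase the definitions in [0010]; the `Relation` class and the `RELATION_GLOSS` dictionary are illustrative assumptions, not structures named in the source.

```python
from dataclasses import dataclass

# Glosses paraphrase the definitions given in the text; the dictionary
# layout and the Relation class are illustrative assumptions.
RELATION_GLOSS = {
    "comp": "device {y} is one of the elements (modules) constituting device {x}",
    "t-elem": "technology {y} is one of the element technologies for realizing technology {x}",
    "func": "device {x}, when operated, performs process {y} as one of its functions",
}


@dataclass(frozen=True)
class Relation:
    x: str
    label: str
    y: str

    def gloss(self) -> str:
        """Spell out the relation in words using the glossary above."""
        return RELATION_GLOSS[self.label].format(x=self.x, y=self.y)


print(Relation("personal computer", "comp", "input unit").gloss())
```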

[0011] Morphological analysis alone, however, sometimes cannot extract relational concepts accurately. For example, the sentence "The floppy disk magnetically stores data transmitted from the computer main body." may not be read as a whole; instead it may be judged to say "the floppy disk is transmitted from the computer main body", and an erroneous relational concept is extracted.

[0012] To prevent such errors, the syntactic analysis unit 2 identifies the correct dependency relation, "the floppy disk magnetically stores data", by syntactic analysis and sends it to the relational concept extraction unit 4, which can then extract the pattern accurately.

[0013] It is also necessary to extract the relational concept "a floppy disk is an auxiliary storage medium" from two sentences such as "The present invention relates to an auxiliary storage medium. In particular, it relates to a floppy disk." For this purpose, the context analysis unit 3 determines whether analysis must be performed across a plurality of adjacent sentences.

[0014]

[Operation] The present invention can extract not merely a headword such as "floppy disk" but relations holding between two words, such as "an auxiliary storage medium such as a floppy disk", "the function of a floppy disk is to store data", "a floppy disk is driven by a floppy disk drive", and so on. By collecting these over a certain period, definitional information on, for example, what a floppy disk is can be gathered and used to make dictionary creation easier.

[0015]

[Embodiment] An embodiment of the present invention is described with reference to FIGS. 2 to 6. FIG. 2 is a configuration diagram of an embodiment of the present invention, FIG. 3 shows example sentences, FIG. 4 illustrates example pattern rules, FIG. 5 illustrates the relational concept extraction state, and FIG. 6 shows other example rules applied to morphological analysis / syntactic analysis.

[0016] In FIG. 2, the same symbols as in FIG. 1 denote the same parts: 1 is the morphological analysis unit, 2 the syntactic analysis unit, 3 the context analysis unit, 4 the relational concept extraction unit, 5 the word dictionary, and 6 the identification dictionary.

[0017] The morphological analysis unit 1 splits input text consisting of a plurality of input sentences into individual sentences and, as shown in FIG. 3(A), divides each input sentence into morphemes using morphological rules: it collates the sentence with the dictionary 5, in which words are stored, to assign parts of speech and split the sentence into word units.
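The two steps in [0017], sentence splitting and dictionary lookup, might look like the sketch below. The source does not say which segmentation strategy is used; greedy longest match is swapped in here purely as a simple, well-known stand-in, and `WORD_DICT` is a toy substitute for the word dictionary 5.

```python
# Toy word dictionary: surface form -> part of speech (illustrative entries).
WORD_DICT = {
    "フロッピーディスク": "noun", "データ": "noun", "は": "particle",
    "を": "particle", "格納": "noun", "する": "verb", "。": "symbol",
}


def split_sentences(text: str) -> list[str]:
    """Split input text into sentences at the Japanese full stop."""
    return [s + "。" for s in text.split("。") if s]


def analyze(sentence: str) -> list[tuple[str, str]]:
    """Greedy longest-match segmentation against WORD_DICT (an assumption)."""
    tokens, i = [], 0
    while i < len(sentence):
        for j in range(len(sentence), i, -1):   # try the longest candidate first
            if sentence[i:j] in WORD_DICT:
                tokens.append((sentence[i:j], WORD_DICT[sentence[i:j]]))
                i = j
                break
        else:                                   # no dictionary entry starts here
            tokens.append((sentence[i], "unknown"))
            i += 1
    return tokens


for sentence in split_sentences("フロッピーディスクはデータを格納する。"):
    print(analyze(sentence))
```

A production analyzer would also need to handle inflection and ambiguous segmentations, which this sketch ignores.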

[0018] The syntactic analysis unit 2 performs syntactic analysis: based on the morphological analysis result for each input sentence sent from the morphological analysis unit 1, that is, the sentence divided into word units together with the part of speech of each word, it analyzes the structure of the input text using syntactic analysis rules.

[0019] For example, when the sentence "The floppy disk stores data transmitted from the computer main body." shown in FIG. 3(A) is input, the morphological analysis unit 1 refers to the word dictionary 5 and performs morphological analysis, breaking the sentence into words and parts of speech as shown in FIG. 3(A). The result is passed to the syntactic analysis unit 2.

[0020] The syntactic analysis unit 2 applies the syntactic analysis rules to this result, recognizes the dependency relations "transmitted from the computer main body", "the transmitted data", "stores the data", and "the floppy disk stores", and outputs them to the relational concept extraction unit 4 in a format such as that shown in FIG. 3(B).
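The four dependency relations above could be represented as (dependent, head) pairs. The pair layout and the helper `head_of` are assumptions made for illustration; the actual output format is the one in FIG. 3(B), which is not reproduced in this text.

```python
# (dependent, head) pairs for the four dependencies listed in the text.
# The phrase chunking shown here is an illustrative assumption.
DEPENDENCIES = [
    ("計算機本体から", "伝達される"),      # from the computer main body -> is transmitted
    ("伝達される", "データを"),            # is transmitted -> data
    ("データを", "格納する"),              # data -> stores
    ("フロッピーディスクは", "格納する"),  # the floppy disk -> stores
]


def head_of(phrase, deps):
    """Return the phrase that `phrase` depends on, or None if it has no head."""
    for dependent, head in deps:
        if dependent == phrase:
            return head
    return None


# The floppy disk attaches to "stores", not to "is transmitted".
print(head_of("フロッピーディスクは", DEPENDENCIES))
```

With the dependencies made explicit, an extractor can reject the misreading in which the floppy disk is the thing being transmitted.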

[0021] The context analysis unit 3 receives the morphological analysis result from the morphological analysis unit 1, the syntactic analysis result from the syntactic analysis unit 2, and the input text, and performs context analysis of the input text. For example, as shown in FIG. 3(B), when "The present invention relates to a floppy disk." is input as the first (preceding) sentence and "In particular, it relates to one for word processors." as the second (following) sentence, the unit recognizes that a relational concept spanning the two sentences exists and notifies the relational concept extraction unit 4 that the two sentences should be processed together.

[0022] The patterns targeted by context analysis are of course not limited to this. For example, as shown in FIG. 4(B), when the first (preceding) sentence is "... is called X." and the second (following) sentence is "When it is P-ed, ...", the unit judges that the two sentences are connected and notifies the relational concept extraction unit 4 that this pair of sentences should be processed together.
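A rough sketch of this grouping decision follows. It is deliberately simplified: it looks only at how the second sentence begins, using the two cues that appear in the examples ("特に", in particular, and "それを", it as object), and does not check the form of the first sentence. The cue list and the function `link_sentences` are assumptions.

```python
# Cues taken from the two examples in the text; treating them as a
# sentence-initial prefix test is a simplifying assumption.
CONTINUATION_CUES = ("特に", "それを")


def link_sentences(sentences):
    """Return groups of adjacent sentences that should be processed together."""
    groups = []
    for sentence in sentences:
        if groups and sentence.startswith(CONTINUATION_CUES):
            groups[-1].append(sentence)   # continues the preceding sentence
        else:
            groups.append([sentence])     # starts a new group
    return groups


text = [
    "本発明はフロッピーディスクに関する。",   # The present invention relates to a floppy disk.
    "特にワープロ用に関するものである。",     # In particular, it relates to one for word processors.
    "図1は原理図である。",                     # unrelated following sentence (made up for the demo)
]
print(link_sentences(text))
```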

[0023] For input text in a fixed format, it is efficient to select and switch the pattern rules used for extracting relations according to the type of input sentence. In a patent publication, for example, the formal characteristics of the sentences differ between sections such as "Field of Industrial Application" and "Problems of the Prior Art", so switching the pattern rule group for each section improves both execution efficiency and extraction accuracy.
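Switching rule groups by section can be as simple as a lookup table. In the sketch below the section headings are the ones named in the text, but which rule groups belong to which section is not stated in the source; the group names are placeholders.

```python
# Section headings come from the text; the rule-group names assigned to each
# section are placeholders, not rules stated in the source.
RULES_BY_SECTION = {
    "産業上の利用分野": ["subc_rules"],                  # Field of Industrial Application
    "従来技術の問題点": ["comp_rules", "func_rules"],    # Problems of the Prior Art
}
DEFAULT_RULES = ["subc_rules", "comp_rules", "func_rules"]


def select_rules(section_heading: str) -> list[str]:
    """Pick the rule groups for a section, falling back to all groups."""
    return RULES_BY_SECTION.get(section_heading, DEFAULT_RULES)


print(select_rules("産業上の利用分野"))
```

Applying fewer rule groups per section reduces the number of patterns tried, which is one way to read the efficiency gain the text describes.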

[0024] The relational concept extraction unit 4 applies relational expression patterns prepared in advance to the information in the morphological analysis result from the morphological analysis unit 1, the syntactic analysis result from the syntactic analysis unit 2, and the context analysis result from the context analysis unit 3. Referring to the identification dictionary 6, it extracts a previously specified relation together with the pair of two words related by it.

[0025] The relational expression patterns and extraction actions are described with reference to FIG. 4(A).

(1) When the pattern "M₂ with an improved M₁ part" (M₁部を改良したM₂) is recognized in the input text by referring to the identification dictionary 6, the constraint clauses entered in the identification dictionary 6 are consulted to determine whether M₁ and M₂ are of the device type. If they are, the words M₁ and M₂ are cut out, and (M₂ comp M₁) is extracted and held in a file (not shown). Here, device type refers to things such as "input unit" and "personal computer" in "a personal computer with an improved input unit"; "upper part" and "thing" in "a thing with an improved upper part", for example, are not identified as device type.

[0026] (2) When the pattern "M₂ that {includes | has | is equipped with | is provided with | incorporates | holds} M₁" (M₁を{含む|有する|具備した|備えた|内蔵する|持つ}M₂) is recognized in the input text, it is determined whether M₁ and M₂ are of the device type; if they are, (M₂ comp M₁) is extracted and held. Here {A | B | ...} denotes that any one of the elements A, B, ... appears.

[0027] (3) When the pattern "M₂ built into M₁" (M₁に内蔵されるM₂) is recognized in the input text, it is determined whether M₁ and M₂ are of the device type; if they are, (M₁ comp M₂) is extracted and held.

[0028] (4) When the pattern "M₁ and its M₂" (M₁及びそのM₂) is recognized in the input text and M₁ and M₂ are determined to be of the device type, (M₁ comp M₂) is extracted.

(5) When a pattern of the form "M that {has | holds} the F {function | means}" (F{機能|手段}を{有する|持った}M), as in "a money changer having an automatic banknote identification function", is recognized in the input text and F is determined to be of the function type and M of the device type, (M func F) is extracted.

[0029] (6) When the pattern "M [{capable of | for}] {causing | performing} F" (F{させる|を行う}[{ことができる|ための}]M) is recognized in the input text and F is determined to be of the function type and M of the device type, (M func F) is extracted. Here [A] denotes that the element A is optional.

[0030] (7) When the pattern "M that {can | is able to} F" (F[が]{できる|可能な}M) is recognized in the input text and F is determined to be of the function type and M of the device type, (M func F) is extracted.

[0031] (8) When the pattern "P₂ {in | for} P₁" (P₁に{おける|対する}P₂) is recognized in the input text and P₁ and P₂ are determined to be of the process type, denoting things such as input/output or calculation, (P₁ t-elem P₂) is extracted.

[0032] (9) When the pattern "P {on | for} X" (Xに{対する|対して}P) is recognized in the input text and P is of the process type, (P obj X) is extracted regardless of the type of X.

[0033] (10) When the pattern "Y such as X" (X{等|など}のY) is recognized in the input text and X and Y belong to the same type, (Y subc X) is extracted. The relational concept extraction unit 4 holds the results extracted in this way in a file (not shown).
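Rules (1) and (5) above can be sketched as a small rule table in which each entry pairs a pattern with type constraints and an action. This is a simplified illustration under stated assumptions: regular expressions over raw strings stand in for matching on analyzed morphemes, `TERM_TYPE` is a toy substitute for the identification dictionary 6, and the capture for rule (1) follows the worked example, in which M₁ is "入力部" (input unit).

```python
import re

# Toy identification dictionary: term -> type. Entries are illustrative.
TERM_TYPE = {
    "入力部": "device", "パソコン": "device", "両替機": "device",
    "紙幣の自動識別": "function",
}

# Each rule: (pattern, required type per group, builder for the relation).
# Regexes over raw text are a simplification of morpheme-level matching.
RULES = [
    # Rule (1): "M2 with an improved M1" -> (M2 comp M1), both device type.
    (re.compile(r"(?P<a>.+?)を改良した(?P<b>.+)"),
     {"a": "device", "b": "device"},
     lambda m: (m["b"], "comp", m["a"])),
    # Rule (5): "M that has/holds the F function/means" -> (M func F).
    (re.compile(r"(?P<a>.+?)(?:機能|手段)を(?:有する|持った)(?P<b>.+)"),
     {"a": "function", "b": "device"},
     lambda m: (m["b"], "func", m["a"])),
]


def extract(phrase):
    """Return the first relation whose pattern and type constraints both hold."""
    for pattern, constraints, build in RULES:
        m = pattern.fullmatch(phrase)
        if m and all(TERM_TYPE.get(m[g]) == t for g, t in constraints.items()):
            return build(m)
    return None


print(extract("入力部を改良したパソコン"))          # personal computer with improved input unit
print(extract("紙幣の自動識別機能を有する両替機"))  # money changer with banknote identification
print(extract("上部を改良したもの"))                # rejected: not device type
```

Keeping the type check separate from the surface pattern mirrors the text, where the constraint clauses in the identification dictionary decide whether a matched pattern yields a relation.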

[0034] The word dictionary 5 contains the words, parts of speech, and other information that the morphological analysis unit 1 needs to perform morphological analysis on the input text. The identification dictionary 6 stores what the relational concept extraction unit 4 needs for its actions, for example the relational expression patterns and constraints shown in FIG. 4(A). Examples of the device type, function type, process type, and so on are also stored. Further relational expression patterns are shown in FIG. 6, where the constraints are omitted. The relational expression patterns are of course not limited to these.

[0035] The operation of the present invention shown in FIG. 2 is now described. When the input text "using an auxiliary storage medium such as a floppy disk ..." shown in FIG. 5 is input to the morphological analysis unit 1, the unit refers to the word dictionary 5, breaks the text into words, attaches a part of speech to each, and sends the result to the syntactic analysis unit 2 and the relational concept extraction unit 4. The syntactic analysis unit 2 analyzes the dependency relations as described above and sends the result to the context analysis unit 3 and the relational concept extraction unit 4.

[0036] The analysis in the syntactic analysis unit 2 shows that the relational expression pattern can be extracted on the basis of morphological analysis alone. The relational concept extraction unit 4 therefore refers to the identification dictionary 6 and identifies that the text matches the relational expression pattern "Y such as X" (X等のY), that X is "floppy disk" and Y is "auxiliary storage medium", and that both are of the device type. As a result, the relational concept "floppy disk subc auxiliary storage medium" is extracted and held in a file (not shown).

[0037] In this way, relational concepts are extracted from a large number of input texts over a period of, say, six months or a year. Grouping them by specific word, for example with "floppy disk" as the key, yields semantic definitions as attributes of "floppy disk": a kind of auxiliary storage medium, used in computers, used in word processors, and so on. A dictionary, for example a glossary of technical terms, can thus be compiled very easily.
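Grouping the collected relations by term might be sketched as below. The triple layout `(head, label, tail)`, the function `build_entries`, and the `-of` suffix used for the inverse view are assumptions introduced for illustration; the two sample relations are drawn from examples in the text.

```python
from collections import defaultdict


def build_entries(relations):
    """Group (head, label, tail) triples under every term they mention."""
    entries = defaultdict(list)
    for head, label, tail in relations:
        entries[head].append((label, tail))
        # Inverse view so the tail term also gets an entry; naming is an assumption.
        entries[tail].append((f"{label}-of", head))
    return dict(entries)


collected = [
    ("auxiliary storage medium", "subc", "floppy disk"),
    ("floppy disk", "func", "data storage"),
]
entries = build_entries(collected)
print(entries["floppy disk"])
```

Looking up "floppy disk" then returns everything gathered about it, which is the raw material for a glossary entry.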

[0038] When the input text consists of simple sentences, such as paper titles or newspaper headlines, the required information can be extracted by morphological analysis alone. In such cases the syntactic analysis unit 2 and the context analysis unit 3 can be stopped, and extraction can be performed efficiently on the basis of the output of the morphological analysis unit 1.

[0039] Depending on the kind of input sentences and input text, one of the syntactic analysis unit and the context analysis unit can be omitted. For example, if it is decided not to extract relations that span multiple sentences, the context analysis unit need not be used.
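The optional stages described in [0038] and [0039] suggest a pipeline in which syntactic and context analysis can be switched off. The sketch below uses stub stages that only record that they ran; `run_pipeline`, `make_stage`, and the dictionary passed between stages are all hypothetical.

```python
def run_pipeline(text, morph, extract, syntax=None, context=None):
    """Run morphological analysis, then any enabled optional stages, then extraction."""
    analysis = morph({"text": text, "stages": []})
    for stage in (syntax, context):
        if stage is not None:          # a stage is disabled by passing None
            analysis = stage(analysis)
    return extract(analysis)


# Stub stages that only record that they ran (placeholders for real analyzers).
def make_stage(name):
    def stage(analysis):
        return {**analysis, "stages": analysis["stages"] + [name]}
    return stage


# Headline-style input: morphological analysis only.
headline_only = run_pipeline("見出し", make_stage("morph"), lambda a: a["stages"])
# Full text: all three analysis stages.
full = run_pipeline("本文", make_stage("morph"), lambda a: a["stages"],
                    syntax=make_stage("syntax"), context=make_stage("context"))
print(headline_only, full)
```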

[0040]

[Effects of the Invention] The present invention allows relational concepts between pairs of words to be collected automatically, so editing efficiency is greatly improved compared with collecting such relations manually, as in the conventional compilation of terminology dictionaries.

[Brief description of drawings]

FIG. 1 shows the principle of the present invention.

FIG. 2 shows the configuration of an embodiment of the present invention.

FIG. 3 shows example sentences.

FIG. 4 shows example pattern rules.

FIG. 5 illustrates the extraction state.

FIG. 6 shows other example pattern rules applied to morphological analysis / syntactic analysis.

[Explanation of symbols]

1 Morphological analysis unit
2 Syntactic analysis unit
3 Context analysis unit
4 Relational concept extraction unit
5 Word dictionary
6 Identification dictionary

Claims

[Claims]

1. A relational concept extraction device comprising: morphological analysis means (1) for performing morphological analysis that divides an input sentence into morphemes; and relational concept extraction means (4) for extracting, from the morphological analysis result obtained by the morphological analysis means (1), pairs of two words that can be related by a relation holding between words.

2. The relational concept extraction device according to claim 1, further comprising syntactic analysis means (2) for analyzing the structure of the input sentence, wherein pairs of two words that can be related by a relation holding between words are extracted from the syntactic analysis result obtained by the syntactic analysis means (2).

3. The relational concept extraction device according to claim 1, further comprising context analysis means (3) for analyzing relations holding between a plurality of sentences of the input text, wherein pairs of two words that can be related by a relation holding between words are extracted from the context analysis result obtained by the context analysis means (3).

4. The relational concept extraction device according to claim 2, further comprising context analysis means (3) for analyzing relations holding between a plurality of sentences of the input text, wherein pairs of two words that can be related by a relation holding between words are extracted from the context analysis result obtained by the context analysis means (3).