JPH0412871B2


Info

Publication number: JPH0412871B2
Application number: JP59171754A
Authority: JP
Inventors: Juji Uchida; Akinari Masuyama
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1984-08-18
Filing date: 1984-08-18
Publication date: 1992-03-05
Also published as: JPS6151270A

Description

DETAILED DESCRIPTION OF THE INVENTION

[Field of Industrial Application]

The present invention relates to a two-word relationship extraction system that extracts, from a large volume of natural language sentences, two-word relationships in which a noun can enter a dependency relation with a verb.

Practical approaches to machine translation, for example Japanese-English machine translation, fall broadly into two types: the approach that goes through a universal meaning representation (pivot method) and the approach that converts between corresponding structures of the two languages (transfer method). Either approach needs to use the two-word relationships of natural language sentences during translation. A machine translation device must therefore have those two-word relationships registered in advance in a two-word relationship database.

[Conventional technology]

Unlike Japanese-language dictionaries or thesauri, no such two-word relationship database is readily available, yet the amount of data to be registered in one is large. Conventionally, this kind of database has been built either by hand or with a device that examines collocations in natural language sentences.

[Problem that the invention seeks to solve]

With the former technique, which relies on manual work, it has been practically impossible to obtain the large amount of data required for registering the two-word relationships of natural language sentences in a database.

The latter technique has its own drawbacks. Of the noun-verb relationships, which are the most important two-word relationships to extract, it can examine in Japanese sentences the relationship between a noun and a sa-hen (suru) verb, but it cannot examine the relationship between a noun and a main verb, nor can it examine relationships in which a particle intervenes.

[Means for solving problems]

The present invention provides a two-word relationship extraction system that can solve the above problems. It comprises: a natural language database that stores natural language sentences; word dividing means that divides a natural language sentence read from the database into words; two-word relationship extraction means that performs phrase (bunsetsu) synthesis on the sentence divided into words and extracts, from the resulting phrases, the relationships in which a noun can depend on a verb; and a two-word relationship database that stores two-word relationship data indicating the extracted relationships between nouns and verbs.

[Effect]

In the system of the present invention, a natural language sentence is divided into words, phrase synthesis is performed on the divided sentence, and the relationships in which a noun can depend on a verb are extracted from the resulting phrases. The most important two-word relationships, namely noun-verb dependency relationships, can thus be obtained from a large volume of natural language sentences, which provides objective material for judging which noun-verb dependencies are possible.

[Example]

An embodiment of the present invention is described below with reference to FIGS. 1 to 3.

FIG. 1 shows an embodiment of the present invention. In the figure, 1 is a natural language sentence database (the description below deals with Japanese sentences), and 2 is a word dividing device that divides a Japanese sentence into words while referring to a Japanese dictionary 3 (this device has already been proposed by the present applicant). 4 is a two-word relationship extraction device, which performs phrase synthesis on the sentence divided into words and extracts, from the resulting phrases, the relationships in which a noun can depend on a verb. 5 is a two-word relationship database that stores two-word relationship data indicating the extracted relationships between nouns and verbs. 6 is a search device that performs search processing on the two-word relationship database.

FIG. 2 shows the detailed configuration of the two-word relationship extraction device 4. It consists of a phrase synthesis device 10, an analysis device 11, and an extraction device 12. The phrase synthesis device 10 synthesizes phrases by attaching each dependent word (fuzokugo) in the sentence divided by the word dividing device 2 to the independent word (jiritsugo) on its left; a prefix is attached to the independent word on its right. The analysis device 11 applies a known context-free grammar to the synthesized phrases, consisting of a first rule, "verb phrase → verb", and a second rule, "verb phrase → adverbial (ren'yō) phrase followed by a verb phrase", and builds the parse tree obtained by the time neither rule applies any longer. The extraction device 12 extracts from that parse tree, and outputs, the two-word relationship data indicating relationships in which a noun can depend on a verb.
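The attachment rule described for the phrase synthesis device 10 can be sketched as follows. This is a minimal illustration, not the patented implementation: the `(surface, kind)` token format, the three kind labels, the function name `synthesize_phrases`, and the hand-made word segmentation of the sample fragment are all assumptions introduced here.

```python
from typing import List, Tuple

# Hypothetical token format: (surface, kind), where kind is one of
# "independent" (jiritsugo), "dependent" (fuzokugo), or "prefix".
Token = Tuple[str, str]


def synthesize_phrases(tokens: List[Token]) -> List[str]:
    """Attach dependent words to the independent word on their left,
    and prefixes to the independent word on their right."""
    phrases: List[str] = []
    pending_prefix = ""  # prefixes waiting for the next independent word
    for surface, kind in tokens:
        if kind == "prefix":
            pending_prefix += surface
        elif kind == "independent":
            phrases.append(pending_prefix + surface)
            pending_prefix = ""
        elif kind == "dependent":
            if phrases:
                phrases[-1] += surface
            else:
                # No independent word to the left: start a phrase anyway.
                phrases.append(surface)
        else:
            raise ValueError(f"unknown token kind: {kind}")
    return phrases


# Assumed segmentation of the tail of the example sentence.
sample_tokens: List[Token] = [
    ("装置", "independent"), ("の", "dependent"),
    ("小型化", "independent"), ("に", "dependent"),
    ("寄与", "independent"), ("し", "dependent"), ("て", "dependent"),
    ("いる", "dependent"), ("。", "dependent"),
]
sample_phrases = synthesize_phrases(sample_tokens)
```

Keeping prefixes in a pending buffer lets a single left-to-right pass handle both attachment directions.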

Next, the process of extracting two-word relationships with the above configuration is described.

For convenience, the following Japanese sentence is used as an example.

(Example sentence) このLSIは10倍以上の実装密度を実現でき、装置の小型化に寄与している。 (This LSI can achieve a packaging density of more than ten times and contributes to the miniaturization of devices.)

This example sentence is read from the Japanese sentence database 1 and divided into words by the word dividing device 2, with reference to the Japanese dictionary 3, as follows.

The sentence divided into words is given to the two-word relationship extraction device 4, where the phrase synthesis device 10 attaches each dependent word to the independent word on its left (a prefix to the independent word on its right) and thereby obtains the following phrases.

この｜LSI｜は、｜10倍以上の｜実装｜密度を｜実現でき、｜装置の｜小型化に｜寄与している。｜

This phrase-synthesized sentence is input to the analysis device 11. The analysis device 11 applies the two context-free grammar rules given above to the phrase-synthesized sentence from right to left. When neither rule can be applied any longer, the analysis by the analysis device 11 ends. In the sentence above, for example, as shown in FIG. 3, applying the first rule to the verb 寄与している。 ("contributes") makes it a verb phrase. Applying the second rule to this verb phrase and the adverbial (ren'yō) phrase 小型化に ("to miniaturization") treats the two together as a verb phrase, and a parse tree is generated. Neither rule can be applied to the verb phrase of this newest parse tree together with the next phrase in the sentence, which is an adnominal (rentai) phrase, so the analysis ends.
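The right-to-left application of the two grammar rules might look like the sketch below. It is illustrative only: the phrase categories ("verb", "adverbial", "adnominal") are assigned by hand, and the nested-tuple tree shape and the name `parse_right_to_left` are assumptions, not part of the source.

```python
from typing import List, Optional, Tuple

# Hypothetical phrase format: (surface, category).
Phrase = Tuple[str, str]


def parse_right_to_left(phrases: List[Phrase]) -> Optional[tuple]:
    """Build a verb-phrase tree starting from the rightmost phrase.

    Rule 1: verb phrase -> verb
    Rule 2: verb phrase -> adverbial phrase, verb phrase
    Parsing stops as soon as neither rule applies.
    """
    if not phrases or phrases[-1][1] != "verb":
        return None  # rule 1 cannot start the analysis
    tree: tuple = ("VP", phrases[-1])  # rule 1
    i = len(phrases) - 2
    while i >= 0 and phrases[i][1] == "adverbial":
        tree = ("VP", phrases[i], tree)  # rule 2
        i -= 1
    return tree


# Tail of the example sentence, with categories assigned by hand.
example_phrases: List[Phrase] = [
    ("装置の", "adnominal"),
    ("小型化に", "adverbial"),
    ("寄与している。", "verb"),
]
example_tree = parse_right_to_left(example_phrases)
```

For this input the loop folds 小型化に into the verb phrase and then stops at 装置の, since an adnominal phrase matches neither rule.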

From the phrases that make up the parse tree obtained by this analysis, the part that shows a relationship in which a noun can depend on a verb is kept as two-word relationship data. For the example sentence, 小型化に｜寄与している。 is kept. In this two-word relationship data, 小型化 ("miniaturization") is the noun and 寄与している ("contributes") is the verb.
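One way the extraction step could reduce the phrases of a parsed verb phrase to (noun, verb, particle) data is sketched below. The `Phrase` class, its fields, the part-of-speech tags, and `extract_relations` are hypothetical; the source does not say how the head word and particle of each phrase are identified, so they are supplied by hand here.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Phrase:
    head: str      # independent word heading the phrase (assumed known)
    head_pos: str  # "noun" or "verb" (assumed tag set)
    particle: str  # particle attached to the head, "" if none


def extract_relations(verb_phrase: List[Phrase]) -> List[Tuple[str, str, str]]:
    """Return (noun, verb, particle) triples for a verb phrase whose
    last element is the verb and whose other elements modify it."""
    if not verb_phrase or verb_phrase[-1].head_pos != "verb":
        return []
    verb = verb_phrase[-1].head
    return [
        (p.head, verb, p.particle)
        for p in verb_phrase[:-1]
        if p.head_pos == "noun"
    ]


# The phrases kept for the example sentence: 小型化に｜寄与している。
kept = [Phrase("小型化", "noun", "に"), Phrase("寄与", "verb", "")]
relations = extract_relations(kept)
```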

Each data element of the two-word relationship data obtained in this way is stored in the two-word relationship database 5, in a predetermined association, so that it can be read out. The above data elements are stored as follows.

(小型化, 寄与, に)

That is, the noun 小型化 ("miniaturization"), the verb 寄与 ("contribute"), and the particle に (ni) are stored together.

In the above embodiment the natural language sentences were Japanese, but they may be in another language, in which case two-word relationships between nouns and verbs are obtained in the same way.
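As a rough illustration of how the stored triple and the search device 6 could work together, the sketch below keeps triples in an in-memory SQLite table and looks them up by verb. The table name `relation`, its columns, and the helper functions `register` and `nouns_for_verb` are assumptions made for this example; the source only states that the data elements are stored in a predetermined association and can be searched.

```python
import sqlite3
from typing import List, Tuple

# In-memory stand-in for the two-word relationship database 5.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE relation ("
    " noun TEXT, verb TEXT, particle TEXT,"
    " UNIQUE (noun, verb, particle))"
)


def register(db: sqlite3.Connection, noun: str, verb: str, particle: str) -> None:
    """Store one (noun, verb, particle) triple, ignoring duplicates."""
    db.execute(
        "INSERT OR IGNORE INTO relation VALUES (?, ?, ?)",
        (noun, verb, particle),
    )


def nouns_for_verb(db: sqlite3.Connection, verb: str) -> List[Tuple[str, str]]:
    """Return every (noun, particle) pair recorded for the given verb."""
    cur = db.execute(
        "SELECT noun, particle FROM relation WHERE verb = ? ORDER BY noun",
        (verb,),
    )
    return cur.fetchall()


register(conn, "小型化", "寄与", "に")
register(conn, "小型化", "寄与", "に")  # same triple from another sentence
found = nouns_for_verb(conn, "寄与")
```

The uniqueness constraint reflects that the same relationship will recur across a large corpus; a frequency count could be kept instead if that information is wanted.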

[Effect of the invention]

As described above, according to the present invention, the most important two-word relationships, namely the relationships in which a noun can depend on a verb, can be obtained by processing a large volume of natural language sentences, and the setting of verb cases, which has conventionally been done by human intuition, can be done objectively.

[Brief explanation of drawings]

FIG. 1 is a diagram showing an embodiment of the present invention, FIG. 2 is a detailed diagram of the two-word relationship extraction device, and FIG. 3 is a diagram showing an example of a parse tree. In the figures, 1 is a natural language sentence database, 2 is a word dividing device, 3 is a Japanese dictionary, 4 is a two-word relationship extraction device, and 5 is a two-word relationship database.

Claims

[Claims]

1. A two-word relationship extraction system comprising: a natural language database that stores natural language sentences; word dividing means that divides a natural language sentence read from the database into words; two-word relationship extraction means that performs phrase synthesis on the sentence divided into words and extracts, from the resulting phrases, relationships in which a noun can depend on a verb; and a two-word relationship database that stores two-word relationship data indicating the extracted relationships between nouns and verbs.