JPH0336662A

JPH0336662A - Natural language processing method

Info

Publication number: JPH0336662A
Application number: JP1171472A
Authority: JP
Inventors: Yoshitoshi Yamauchi; 佐敏山内; Kazuhiro Inoue; 和博井上; Masaru Nakajima; 勝中島; Nobuyuki Oro; 大呂　延幸
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 1989-07-03
Filing date: 1989-07-03
Publication date: 1991-02-18

Abstract

(57) [Abstract] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

[Detailed Description of the Invention]

The present invention relates to natural language processing devices. These include kana-kanji conversion devices used for Japanese input in word processors and DPS, and natural language analysis devices used in speech recognition, machine translation, proofreading support, character recognition, and the like. The invention applies, for example, to kana-kanji conversion devices, including those used in speech synthesis.

The best-known conventional kana-kanji conversion method is the two-bunsetsu longest-match method. It analyzes the first bunsetsu (phrase unit) together with the bunsetsu that follows it, and it chooses the first bunsetsu from the combined reading length of the two. Morphological criteria such as two-bunsetsu reading length, however, cannot reliably distinguish between homophones.

Several published methods address this by recording semantic and co-occurrence information in the dictionary. All of them either output dictionary-matched candidates first or add a fixed score to candidates that match the dictionary. When an analysis draws on many kinds of semantic and co-occurrence information of different dimensions, these sources can disagree. For example, candidate 1 may score highest on case relationships while candidate 2 scores highest on co-occurrence information. Deciding how much weight each source should receive then becomes a major problem, and adding a new kind of information is difficult.

An earlier application by the present inventors, Japanese Patent Application No. 63-288338, determines the first bunsetsu by assigning plausibility to 1.5-bunsetsu units. Its drawback is that it cannot use information spanning multiple bunsetsu.

The present invention was made to overcome the drawbacks described above. Its objects are:

- to represent accurately the plausibility of an independent part;
- to represent accurately the plausibility of a 1.5-bunsetsu unit;
- to eliminate contradictions across multiple bunsetsu by using information from the preceding and following bunsetsu;
- to make it easy to weight multiple kinds of information and to add new kinds;
- to provide a way of combining multiple kinds of information.

Together, these yield a natural language processing method that performs high-performance natural language analysis.

To achieve the above objects, the present invention is characterized as follows.

(1) The method applies to kana-kanji conversion, which converts a kana string into a mixed kanji-kana string using a word dictionary that holds word information. It also applies to processing that determines bunsetsu in a character string. In both, plausibility is assigned to each candidate containing an independent part according to the semantic and co-occurrence relationships among the prefix, independent word, and suffix within that independent part.

(2) Plausibility is assigned to candidates according to the case relationship between the independent part of a bunsetsu and the independent and attached parts of the preceding bunsetsu.

(3) Plausibility is assigned to candidates according to the semantic and co-occurrence relationship between the independent part of a bunsetsu and the independent part of the preceding bunsetsu.

(4) The plausibility of the case relationships among the independent and attached parts of three consecutive candidates is used in the analysis.

(5) The plausibility of the semantic and co-occurrence relationships among the independent parts of three consecutive candidates is used in the analysis.

(6) A plausibility is assigned to each of the above kinds of information, including word frequency, and used in the analysis.

(7) Many kinds of information of different dimensions, spanning multiple bunsetsu, are combined to obtain a likelihood and a degree of certainty. The first bunsetsu is determined on that basis.

The invention is described below with reference to an embodiment.

FIG. 1 is a block diagram of an embodiment of the natural language processing method according to the present invention. In the figure:

- 1 is an input unit;
- 2 is an analysis processing unit;
- 3 is a candidate extraction unit;
- 4 is a dictionary search unit;
- 5 is a dictionary;
- 6 is a candidate evaluation unit;
- 7 is a synthesis operation unit;
- 8 is an output unit.

The bunsetsu immediately before the analysis point, which was already fixed before the analysis began, is called the determined bunsetsu. A bunsetsu together with the independent part that follows it is called a "1.5 bunsetsu".
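The sketch below shows one possible representation of a 1.5-bunsetsu unit. It assumes that each bunsetsu is stored as an independent part (surface form plus kana reading) and an attached part written in kana. The class and field names are hypothetical, because the patent does not specify a data structure. The continuation 雨 (あめ) after "運動会では" is also made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bunsetsu:
    """One bunsetsu: an independent part plus an optional attached part."""
    independent: str          # surface form, e.g. "運動会"
    independent_reading: str  # kana reading, e.g. "うんどうかい"
    attached: str = ""        # particles/auxiliaries, assumed to be kana, e.g. "では"

    @property
    def reading(self) -> str:
        return self.independent_reading + self.attached


@dataclass(frozen=True)
class OnePointFiveBunsetsu:
    """A full bunsetsu followed by the independent part of the next one."""
    first: Bunsetsu
    next_independent: str
    next_independent_reading: str

    @property
    def reading_length(self) -> int:
        # Number of kana covered by the whole 1.5-bunsetsu unit.
        return len(self.first.reading) + len(self.next_independent_reading)


cand = OnePointFiveBunsetsu(Bunsetsu("運動会", "うんどうかい", "では"), "雨", "あめ")
print(cand.first.reading, cand.reading_length)
```

The reading length computed here is the quantity that longest-match methods, and the early-exit test described later, compare against.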

The candidate extraction unit 3 extracts a group of candidates in units of 1.5 bunsetsu. The candidate evaluation unit 6 evaluates each extracted candidate and assigns it a plausibility as a 1.5-bunsetsu unit. The synthesis operation unit 7 combines three plausibilities: that of the determined bunsetsu, that of the first 1.5 bunsetsu, and that of the 1.5 bunsetsu following the first bunsetsu. From this combination it determines the first bunsetsu of the most likely candidate.

FIG. 2 shows a specific example of the candidate extraction unit.

Starting at the analysis start position, all candidates are accumulated in units of 1.5 bunsetsu (candidate 1, candidate 2, ..., candidate n).

Next, for each candidate, the candidates that follow its first bunsetsu are accumulated in the same way, in units of 1.5 bunsetsu. Candidate 1 gets candidates 1-1, 1-2, ..., 1-m; candidate 2 gets candidates 2-1, 2-2, and so on.
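The two-level accumulation in FIG. 2 can be sketched as follows. The extractor `extract` is a hypothetical stand-in for the candidate extraction unit 3 and dictionary search unit 4. Given a reading and a start position, it returns 1.5-bunsetsu candidates as pairs of (label, reading length of the first bunsetsu). The toy table is invented for illustration.

```python
from typing import Callable, Dict, List, Tuple

# (label, reading length of the candidate's first bunsetsu)
Candidate = Tuple[str, int]
Extractor = Callable[[str, int], List[Candidate]]


def build_candidate_tree(reading: str, start: int,
                         extract: Extractor) -> Dict[Candidate, List[Candidate]]:
    """Accumulate candidates 1..n at `start`, then, for each candidate i,
    the candidates i-1..i-m that begin right after its first bunsetsu."""
    tree: Dict[Candidate, List[Candidate]] = {}
    for cand in extract(reading, start):
        _, first_len = cand
        tree[cand] = extract(reading, start + first_len)
    return tree


# Toy extractor keyed only by position (hypothetical data).
TOY: Dict[int, List[Candidate]] = {
    0: [("candidate 1", 2), ("candidate 2", 3)],
    2: [("candidate 1-1", 1)],
    3: [("candidate 2-1", 2), ("candidate 2-2", 1)],
}


def toy_extract(reading: str, pos: int) -> List[Candidate]:
    return TOY.get(pos, [])


tree = build_candidate_tree("あいうえおか", 0, toy_extract)
for parent, children in tree.items():
    print(parent[0], "->", [c[0] for c in children])
```

In the full procedure each level would also be pruned to the best candidates before expansion. The sketch leaves pruning out so that the tree shape matches FIG. 2.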

FIG. 3 shows plausibilities for prefix classes and suffix classes, here for the independent word 運動 (undō, "exercise"). It illustrates the configuration of claim 1. Plausibility information for prefix and suffix classes is used to assign a plausibility to each independent part. This value is then added to the plausibility information of every 1.5-bunsetsu candidate that contains that independent part.

For example, suppose the string "うんどうかいでは..." is analyzed in 1.5-bunsetsu units. The candidates receive these plausibilities:

- 0.7 for 1.5-bunsetsu candidates containing 運動会 ("athletic meet");
- 0.5 for those containing 運動化;
- 0.6 for those containing 運動.

Each value is combined with the plausibility obtained from the candidate's reading length, frequency, connection probability, and so on. The result is used to judge the plausibility of the 1.5-bunsetsu candidate.

FIGS. 4(a) and 4(b) show plausibilities for the independent-part classes and attached-part classes of the preceding bunsetsu:

- (a) is for 言った (itta, "said", from 言う);
- (b) is for 行った (itta, "went", from 行く).

These figures illustrate the configuration of claim 2. Plausibility information for the independent-part and attached-part classes of the preceding bunsetsu is used to assign a plausibility to each 1.5-bunsetsu candidate. For example, analyzing the string "...いった。" in 1.5-bunsetsu units yields candidates such as

"言った。" ("said.")

"行った。" ("went.")

Suppose the preceding bunsetsu is 友達が (tomodachi ga, "friend" plus the subject particle), whose independent part belongs to the class "human". Then "言った。" receives a plausibility of 0.8 and "行った。" receives 0.7. Each value is combined with the plausibility obtained from the candidate's reading length, frequency, connection probability, and so on. The result is used to judge the plausibility of the 1.5-bunsetsu candidate.

FIG. 5 shows plausibilities based on semantic and co-occurrence information relative to the independent-part class of the preceding bunsetsu. It illustrates the configuration of claim 3. This information is used to add a plausibility to every 1.5-bunsetsu candidate that contains each independent part. For example, analyzing the string "...あつい。" in 1.5-bunsetsu units yields candidates such as

"厚い。" ("thick.")

"暑い。" ("hot.", of weather)

Suppose the preceding bunsetsu is 本が (hon ga, "book" plus the subject particle). Then "厚い。" receives a plausibility of 0.7 and "暑い。" receives 0.4. Each value is combined with the plausibility obtained from the candidate's reading length, frequency, connection probability, and so on. The result is used to judge the plausibility of the 1.5-bunsetsu candidate.

FIG. 6 shows plausibilities for combinations of bunsetsu candidates that agree with the information of the preceding and following bunsetsu. It illustrates the configurations of claims 4 and 5.

A plausibility is assigned to each combination of bunsetsu candidates that agrees with the information of the preceding and following bunsetsu. Analyzing three consecutive bunsetsu makes it possible to represent a specific case. The pairwise semantic and co-occurrence links are weak, both between 東京 ("Tokyo") and 海上 ("marine") and between 海上 and 保険 ("insurance"). The link among all three, 東京, 海上, and 保険, is nevertheless very strong.
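The point of FIG. 6 is that a window of three bunsetsu can carry evidence that no pair of bunsetsu carries. The minimal sketch below scores every run of three consecutive independent parts. The tables and their values are hypothetical, because the text gives no numbers.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]

# Hypothetical values: weak pairwise links, strong three-way link.
PAIR: Dict[Tuple[str, str], float] = {("東京", "海上"): 0.2, ("海上", "保険"): 0.3}
TRIPLE: Dict[Triple, float] = {("東京", "海上", "保険"): 0.9}


def window_evidence(parts: List[str]) -> List[Tuple[Triple, Optional[float]]]:
    """Plausibility (or None) for each run of three consecutive independent parts."""
    out: List[Tuple[Triple, Optional[float]]] = []
    for i in range(len(parts) - 2):
        key = (parts[i], parts[i + 1], parts[i + 2])
        out.append((key, TRIPLE.get(key)))
    return out


parts = ["東京", "海上", "保険"]
pairs = [PAIR.get((a, b)) for a, b in zip(parts, parts[1:])]
print("pairs:", pairs)
print("triples:", window_evidence(parts))
```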

FIG. 7 is a flowchart of the candidate evaluation unit and the synthesis operation unit in the above embodiment. In the figure, the plausibilities of each candidate obtained under claims 1 to 5 are shown as evidence 1 to 5.

FIG. 8 is a flowchart of an embodiment that processes the first 1.5 bunsetsu and the 1.5 bunsetsu that follow it. It is explained below step by step.

Step 1: From each piece of evidence, analyze the plausibility of each candidate in 1.5-bunsetsu units.

Step 2: Keep only the top 10 accumulated candidates, on the assumption that the correct candidate is among them. One exception applies. The segment length of the reading string is the length up to the first punctuation mark or symbol. If the reading length of the most likely 1.5-bunsetsu candidate equals this segment length, the remaining steps are skipped. The current most likely candidate is then determined as the first bunsetsu.
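The top-10 pruning and early exit described above amount to a beam of width 10 plus a length test. The minimal sketch below makes two assumptions: candidates arrive as (label, plausibility, reading length) tuples, and the punctuation/symbol set is the small hypothetical `DELIMITERS` shown. The function names are also hypothetical.

```python
import heapq
from typing import List, Optional, Tuple

BEAM_WIDTH = 10                      # the text keeps the top 10 candidates
DELIMITERS = set("、。，．！？「」")   # assumed punctuation/symbol set

Scored = Tuple[str, float, int]  # (label, plausibility, 1.5-bunsetsu reading length)


def segment_length(reading: str) -> int:
    """Length of the reading up to the first punctuation mark or symbol."""
    for i, ch in enumerate(reading):
        if ch in DELIMITERS:
            return i
    return len(reading)


def prune_and_check(cands: List[Scored], reading: str) -> Tuple[List[Scored], Optional[str]]:
    """Keep the top BEAM_WIDTH candidates. If the best one already spans the
    whole segment, also return its label as the decision (later steps skipped)."""
    kept = heapq.nlargest(BEAM_WIDTH, cands, key=lambda c: c[1])
    if kept and kept[0][2] == segment_length(reading):
        return kept, kept[0][0]
    return kept, None


cands = [(f"cand{i}", i / 20, 1 + i % 3) for i in range(12)]
kept, decided = prune_and_check(cands, "いった。")
print(len(kept), kept[0], decided)
```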

Step 3: For each accumulated candidate, analyze the next bunsetsu in 1.5-bunsetsu units in the same way as in Step 2, and keep the top 10 candidates.

Step 4: Treat the plausibilities of the candidates for the first 1.5 bunsetsu, and for the 1.5 bunsetsu that follow, as independent pieces of evidence. These were accumulated by the preceding steps. Combine evidence 1 to 5 using the Dempster-Shafer (D-S) operation.
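The D-S combination step can be sketched with Dempster's rule of combination. The patent does not say how plausibilities become mass functions. This sketch assumes that each (evidence, candidate, plausibility) triple is a simple support function: it commits `p` to that candidate and leaves `1 - p` on the whole frame of discernment. The function names are hypothetical. The example treats the FIG. 4 values (0.8 for 言った, 0.7 for 行った) as two separate simple support functions.

```python
from functools import reduce
from itertools import product
from typing import Dict, FrozenSet, Iterable

Mass = Dict[FrozenSet[str], float]


def simple_support(candidate: str, p: float, frame: Iterable[str]) -> Mass:
    """Evidence that supports `candidate` to degree p; the rest is uncommitted."""
    return {frozenset([candidate]): p, frozenset(frame): 1.0 - p}


def combine(m1: Mass, m2: Mass) -> Mass:
    """Dempster's rule: multiply masses, pool them on set intersections,
    and renormalize away the mass that fell on the empty set (conflict)."""
    pooled: Mass = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            pooled[inter] = pooled.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("totally conflicting evidence cannot be combined")
    return {s: w / (1.0 - conflict) for s, w in pooled.items()}


def best_candidate(m: Mass) -> str:
    """Candidate whose singleton set carries the most combined mass."""
    singletons = {next(iter(s)): w for s, w in m.items() if len(s) == 1}
    return max(singletons, key=singletons.get)


frame = {"言った", "行った"}
evidence = [simple_support("言った", 0.8, frame), simple_support("行った", 0.7, frame)]
combined = reduce(combine, evidence)
print({"/".join(sorted(s)): round(w, 3) for s, w in combined.items()})
print(best_candidate(combined))
```

Dempster's rule is commutative and associative, so `reduce` can fold in evidence 1 to 5 in any order. If a source has nothing to say about a candidate, that source simply contributes no simple support function for it. Picking the singleton with the highest mass corresponds to choosing the most likely candidate.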

Step 5: From the result of the likelihood calculation above, determine the first bunsetsu of the candidate with the highest likelihood as the first bunsetsu.

The pieces of evidence are not available evenly across candidates. For one candidate, evidence 1 may exist but none of the others; for another, only evidence 2 may exist. This unevenness makes it very difficult to judge candidate plausibility correctly by linear combination. To address this, the inventors have already proposed a method of combining multiple pieces of evidence that applies Dempster-Shafer theory (Japanese Patent Application No. 63-288338, mentioned above). That method becomes even more effective when the importance of each piece of evidence is judged and the evidence is weighted accordingly before combination.
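The patent says only that evidence is weighted by importance before combination, and it gives no formula. A standard way to do this in Dempster-Shafer theory is Shafer's discounting, shown below as an assumed technique rather than the inventors' method. A reliability factor `alpha` in [0, 1] scales every focal element except the whole frame, and the removed mass moves to the frame. The function name is hypothetical.

```python
from typing import Dict, FrozenSet, Iterable

Mass = Dict[FrozenSet[str], float]


def discount(m: Mass, alpha: float, frame: Iterable[str]) -> Mass:
    """Shafer discounting: keep a fraction alpha of each committed mass and
    move the remainder to the whole frame (total ignorance)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    theta = frozenset(frame)
    out: Mass = {a: alpha * w for a, w in m.items() if a != theta}
    out[theta] = alpha * m.get(theta, 0.0) + (1.0 - alpha)
    return out


frame = {"言った", "行った"}
case_evidence = {frozenset({"言った"}): 0.8, frozenset(frame): 0.2}
weakened = discount(case_evidence, 0.5, frame)
print({"/".join(sorted(s)): round(w, 3) for s, w in weakened.items()})
```

With `alpha = 1` the evidence is used unchanged, and with `alpha = 0` it is ignored entirely. `alpha` therefore plays the role of the importance weight applied before Dempster combination.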

As is clear from the above description, the present invention has the following effects.

(1) Under claim 1, plausibility is assigned from the morphological and semantic/co-occurrence relationships among the prefix, independent word, and suffix that make up the independent part. The plausibility of the independent part can therefore be represented accurately.

(2) Under claims 2 and 3, plausibility is assigned from the case relationship and the semantic/co-occurrence relationship with the preceding bunsetsu. The plausibility of a 1.5-bunsetsu unit can therefore be represented accurately.

(3) Under claims 4 and 5, information from the preceding and following bunsetsu is used, so contradictions across multiple bunsetsu can be eliminated.

(4) Under claim 6, a plausibility is assigned to each kind of information. The importance of each kind can therefore be adjusted easily, and new information can be added easily.

(5) Under claim 7, multiple kinds of information can be combined easily. Each kind can therefore be supplied independently, and high-performance natural language analysis can be performed.

[Brief Description of the Drawings]

- FIG. 1 is a block diagram of an embodiment of the natural language processing method according to the present invention.
- FIG. 2 shows a specific example of the candidate extraction unit.
- FIG. 3 shows plausibilities for prefix and suffix classes.
- FIGS. 4(a) and 4(b) show plausibilities for the independent-part and attached-part classes of the preceding bunsetsu.
- FIG. 5 shows plausibilities based on semantic and co-occurrence information relative to the independent-part class of the preceding bunsetsu.
- FIG. 6 shows plausibilities for combinations of bunsetsu candidates that agree with the information of the preceding and following bunsetsu.
- FIG. 7 is a flowchart of the candidate evaluation unit and the synthesis operation unit.
- FIG. 8 is a flowchart of the processing applied to 1.5-bunsetsu units.

Description of reference numerals: 1: input unit; 2: analysis processing unit; 3: candidate extraction unit; 4: dictionary search unit; 5: dictionary; 6: candidate evaluation unit; 7: synthesis operation unit; 8: output unit.

Claims

[Scope of Claims]

1. A natural language processing method for use in two kinds of processing: kana-kanji conversion, which converts a kana character string into a mixed kanji-kana character string using a word dictionary that holds word information; and processing that determines bunsetsu in a character string. The method is characterized in that plausibility is assigned to candidates containing the independent part of a bunsetsu, based on the semantic and co-occurrence relationships among the prefix, independent word, and suffix within that independent part.

2. The natural language processing method according to claim 1, wherein plausibility is assigned to candidates containing the independent part of a bunsetsu, based on the case relationship between that independent part and the independent and attached parts of the preceding bunsetsu.

3. The natural language processing method according to claim 1, wherein plausibility is assigned to candidates containing the independent part of a bunsetsu, based on the semantic and co-occurrence relationship between that independent part and the independent part of the preceding bunsetsu.

4. The natural language processing method according to claim 1, wherein the plausibility of the case relationships among the independent and attached parts of three consecutive candidates is used for analysis.

5. The natural language processing method according to claim 1, wherein the plausibility of the semantic and co-occurrence relationships among the independent parts of three consecutive candidates is used for analysis.

6. The natural language processing method according to at least one of claims 1 to 5, wherein a plausibility is assigned to each piece of information, including word frequency, and used for analysis.

7. The natural language processing method according to at least one of claims 1 to 6, wherein many kinds of information of different dimensions, spanning multiple bunsetsu, are combined to obtain a likelihood and a degree of certainty, and the first bunsetsu is determined on that basis.