JPH02122366A

JPH02122366A - natural language processing device

Info

Publication number: JPH02122366A
Application number: JP63275834A
Authority: JP
Inventors: Yoshitoshi Yamauchi; 佐敏山内; 勝中島; 和博井上
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 1988-10-31
Filing date: 1988-10-31
Publication date: 1990-05-10

Abstract

(57) [Abstract] This publication contains application data filed before electronic filing, so no abstract data is recorded.

Description

[Detailed description of the invention]

Technical field

The present invention relates to a natural language processing device, and more particularly to natural language processing devices such as a morphological analysis device for natural language used in machine translation, proofreading support and the like, a kana-kanji conversion device, and a kanji-kana conversion device.

Background art

Conventionally, there is a bunsetsu (phrase) longest-match method that outputs the bunsetsu candidate for which the reading length of two bunsetsu is longest.

In addition, the kana-kanji conversion method that uses reading length, connection probability, and frequency (Japanese Unexamined Patent Publication No. 59-221733) obtains an evaluation value for each word with an evaluation formula that uses these three parameters, and outputs the candidate word string for which the accumulated evaluation value is largest.

Thus, in the conventional methods, the extracted candidates are evaluated in order to select the correct candidate, and the output candidate is determined from the evaluation results.

However, because natural language is ambiguous, the optimal candidate cannot be selected accurately with a single evaluation. To perform natural language processing more appropriately, plausible candidates must be selected using multiple evaluations (pieces of evidence).

A natural language processing device has also been proposed that gives multiple pieces of evidence for the plausibility of a candidate, combines them, and calculates certainty and likelihood, thereby selecting a more accurate candidate for output.

This method gives the evidence for a candidate's plausibility in the form of basic probabilities and combines it. Because ambiguous evidence is expressed as an empirical numerical value called plausibility, evidence obtained from different kinds of evaluation can be combined easily, which is one of the advantages of this method. Conversely, however, the following problem arises when a natural language processing device based on this method is designed by reusing the assets of the conventional methods. In a conventional method, the candidate with the highest evaluation value is output. The evaluation value therefore did not have to represent the plausibility of the candidate accurately; it was sufficient for the ranking of the evaluation values of the extracted candidates to match the ranking of their plausibility.

Consequently, when the plausibility of a candidate is not given directly, in the form of a pseudo-possibility distribution or basic probabilities, but the basic probabilities are instead calculated from evaluation values, the basic probabilities were in some cases not given appropriately.

Purpose

The present invention was made in view of the circumstances described above. Its object is to provide a higher-performance natural language processing device by converting an evaluation result expressed as an evaluation value into appropriate basic probabilities and using them as evidence.

Configuration

To achieve the above object, the present invention is a natural language processing device that has a word dictionary that holds word information, a candidate extraction unit that extracts candidates consisting of words or combinations of words based on an input character string, an evaluation unit that gives a plurality of pieces of evidence for the plausibility of a candidate, and a combination operation unit that combines the plurality of pieces of evidence for the plausibility of the candidate, and that determines the candidate to output using the combined evidence. The device is characterized in that, for a single evaluation of plausibility, two pieces of evidence are given: evidence based on the evaluation value and evidence based on the evaluation rank. The invention is described concretely below on the basis of an embodiment.

FIG. 1 is a block diagram for explaining one embodiment of the natural language processing device of the present invention. A character string to be analyzed is input through an input device 1 and sent to a candidate extraction unit 2. For a kana-kanji conversion device, the character string to be analyzed is a kana string that represents the reading. For a kanji-kana conversion device, or for a morphological analysis device used in proofreading support, it is a written string of mixed kanji and kana. For machine translation, text in the language to be analyzed is the input string.

The candidate extraction unit 2 searches the word dictionary 3 using the input character string and extracts candidates composed of words that match the input character string.
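As a rough illustration of this lookup step, the sketch below extracts every dictionary word whose reading is a prefix of a kana input string. The dictionary contents, the name `extract_candidates`, and the prefix-matching strategy are assumptions made for illustration; the publication does not specify how the word dictionary is searched.

```python
from typing import Dict, List, Tuple

# Hypothetical toy word dictionary: reading (kana) -> written forms.
WORD_DICT: Dict[str, List[str]] = {
    "き": ["木", "気"],
    "きしゃ": ["汽車", "記者"],
    "の": ["の"],
}


def extract_candidates(
    reading: str, dictionary: Dict[str, List[str]]
) -> List[Tuple[str, str]]:
    """Return (matched reading, written form) pairs for every dictionary
    entry whose reading is a prefix of the input string."""
    candidates: List[Tuple[str, str]] = []
    for end in range(1, len(reading) + 1):
        prefix = reading[:end]
        for surface in dictionary.get(prefix, []):
            candidates.append((prefix, surface))
    return candidates


if __name__ == "__main__":
    for matched, surface in extract_candidates("きしゃの", WORD_DICT):
        print(matched, surface)
```

A real device would continue matching from the end of each candidate to build word combinations; the sketch stops at the first word for brevity.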

The evaluation unit 4 evaluates the extracted candidates using information such as the connections between words, and gives a plurality of pieces of evidence for the plausibility of each candidate.

The present invention is characterized by the method of converting an evaluation value into basic probabilities and giving them as evidence when the result of an evaluation is expressed as an evaluation value that does not directly represent the basic probabilities indicating plausibility. The details are described later.

The combination operation unit 5 combines the evidence given by the evaluation unit, which is expressed as basic probabilities, using the combination rule of Dempster-Shafer theory.
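The combination step can be sketched with the standard form of Dempster's rule of combination. The code below is a minimal sketch, assuming that each piece of evidence is a mapping from a set of candidates to its basic probability; the type alias `Mass`, the function name `combine`, and the example values are hypothetical and do not come from the publication.

```python
from itertools import product
from typing import Dict, FrozenSet

# A basic probability assignment: set of candidates -> mass (sums to 1).
Mass = Dict[FrozenSet[str], float]


def combine(m1: Mass, m2: Mass) -> Mass:
    """Combine two basic probability assignments with Dempster's rule."""
    combined: Mass = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        common = a & b
        if common:
            combined[common] = combined.get(common, 0.0) + wa * wb
        else:
            # Mass that would fall on the empty set counts as conflict.
            conflict += wa * wb
    if 1.0 - conflict <= 1e-12:
        raise ValueError("the two pieces of evidence are in total conflict")
    # Renormalise so that the remaining masses sum to 1.
    return {s: w / (1.0 - conflict) for s, w in combined.items()}


if __name__ == "__main__":
    frame = frozenset({"汽車", "記者"})
    m1 = {frozenset({"汽車"}): 0.6, frame: 0.4}
    m2 = {frozenset({"記者"}): 0.5, frame: 0.5}
    for subset, mass in combine(m1, m2).items():
        print(sorted(subset), round(mass, 4))
```

Mass assigned to the whole set of candidates expresses ignorance, which is what allows vague evidence from different kinds of evaluation to be combined.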

The output candidate determination unit 6 calculates the likelihood and certainty of each candidate from the combined evidence and determines the candidate to be output. Information on the candidate to be output is sent to the output device 7, where the candidate is displayed. When the present invention is applied to machine translation or the like, the information on the output candidate is passed to the next stage of processing.
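The publication does not define how likelihood and certainty are computed. The sketch below assumes that certainty corresponds to the Dempster-Shafer belief function and likelihood to the plausibility function, and that the output candidate is the one with the highest belief, with ties broken by plausibility. All three assumptions, and the names `belief`, `plausibility`, and `choose`, are illustrative only.

```python
from typing import Dict, FrozenSet

Mass = Dict[FrozenSet[str], float]


def belief(m: Mass, hypothesis: FrozenSet[str]) -> float:
    """Total mass of the subsets fully contained in the hypothesis."""
    return sum(w for s, w in m.items() if s <= hypothesis)


def plausibility(m: Mass, hypothesis: FrozenSet[str]) -> float:
    """Total mass of the subsets that overlap the hypothesis."""
    return sum(w for s, w in m.items() if s & hypothesis)


def choose(m: Mass) -> str:
    """Pick the candidate with the highest belief (assumed decision rule),
    breaking ties by plausibility."""
    frame = frozenset().union(*m.keys())
    return max(
        sorted(frame),
        key=lambda c: (belief(m, frozenset({c})), plausibility(m, frozenset({c}))),
    )


if __name__ == "__main__":
    combined = {
        frozenset({"汽車"}): 0.5,
        frozenset({"記者"}): 0.2,
        frozenset({"汽車", "記者"}): 0.3,
    }
    print(choose(combined))
```

The gap between belief and plausibility for a candidate shows how much of the evidence is still uncommitted, which a device could use to decide whether to display alternatives.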

Next, the method of giving evaluation values as evidence is described, taking a kana-kanji conversion device that uses the present invention as an example.

In this embodiment, candidates are extracted in units of bunsetsu (phrases) and, for the purpose of evaluation, independent parts that can connect to each extracted candidate are also extracted. An evaluation value calculated from the reading length, ease of connection, and frequency of the independent part of the extracted candidate, and an evaluation value calculated in the same way for the independent part that connects to the candidate, are each converted into basic probabilities and used as evidence of the candidate's plausibility.

This evaluation value is assigned so that a more plausible candidate receives a larger value, but plausibility grows exponentially with the evaluation value. Moreover, even for the same evaluation value, a candidate whose evaluation value ranks higher is more plausible.

To use such evaluation values as evidence, the evaluation unit 4 of the kana-kanji conversion device of this embodiment, as shown in FIG. 2, gives the combination operation unit 5 two separate pieces of evidence: basic probabilities based on the evaluation value, and basic probabilities based on the rank of the evaluation value.

FIG. 3 shows a flowchart for converting an evaluation value into evidence expressed as basic probabilities.

First, the evaluation value of each candidate is converted into a pseudo-possibility distribution by referring to a conversion table that lists the plausibility (possibility) corresponding to each evaluation value. Because plausibility grows exponentially with the evaluation value used in this embodiment, the conversion table is constructed accordingly so that appropriate evidence is obtained.
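One way to picture such a table is sketched below. The table values are not given in the publication, so the integer evaluation scale, the maximum value `MAX_SCORE`, and the growth factor `BASE` are all assumptions; the only property taken from the text is that the possibility grows exponentially with the evaluation value.

```python
from typing import Dict

MAX_SCORE = 4  # hypothetical largest evaluation value
BASE = 2.0     # hypothetical growth factor per evaluation point

# Precomputed conversion table: evaluation value -> possibility in (0, 1].
# The best possible evaluation value maps to a possibility of 1.
CONVERSION_TABLE: Dict[int, float] = {
    score: BASE ** (score - MAX_SCORE) for score in range(MAX_SCORE + 1)
}


def to_possibility(scores: Dict[str, int]) -> Dict[str, float]:
    """Look up the possibility of each candidate from its evaluation value."""
    return {name: CONVERSION_TABLE[score] for name, score in scores.items()}


if __name__ == "__main__":
    print(to_possibility({"汽車": 4, "記者": 3, "帰社": 1}))
```

Because the table is built once, converting a candidate costs a single lookup rather than an exponentiation.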

The pseudo-possibility distribution is then converted into basic probabilities using the differences between its values.
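The publication gives no formula for this step. A common construction with the same flavour sorts the candidates by possibility and assigns each difference between neighbouring values to the nested set of candidates ranked at or above that point. The sketch below follows that construction for a pseudo-possibility distribution (疑似可能性分布) with values in [0, 1]. Assigning any shortfall of the top value below 1 to the whole set of candidates is an additional assumption, as is the name `possibility_to_mass`.

```python
from typing import Dict, FrozenSet


def possibility_to_mass(poss: Dict[str, float]) -> Dict[FrozenSet[str], float]:
    """Convert possibilities in [0, 1] into basic probabilities on nested sets."""
    ranked = sorted(poss.items(), key=lambda item: item[1], reverse=True)
    names = [name for name, _ in ranked]
    values = [value for _, value in ranked] + [0.0]

    mass: Dict[FrozenSet[str], float] = {}
    for i in range(len(names)):
        diff = values[i] - values[i + 1]
        if diff > 0:
            # The i + 1 most possible candidates share this difference.
            mass[frozenset(names[: i + 1])] = diff

    # Assumption: if even the best candidate is below 1, the remainder
    # is treated as ignorance over all candidates.
    remainder = 1.0 - values[0]
    if remainder > 0:
        whole = frozenset(names)
        mass[whole] = mass.get(whole, 0.0) + remainder
    return mass


if __name__ == "__main__":
    for subset, m in possibility_to_mass({"汽車": 1.0, "記者": 0.5, "帰社": 0.125}).items():
        print(sorted(subset), m)
```

With this construction a large gap between the best and second-best possibility puts most of the mass on the best candidate alone, while near-equal values leave the mass on larger sets.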

The conversion from evaluation rank to basic probabilities is performed in the same way, after the evaluation rank has been converted into a pseudo-possibility distribution. In this embodiment, the conversion table for evaluation ranks reflects the plausibility of the evaluation method. Therefore, although the evaluation values are calculated in the same way, different evidence is given even for the same evaluation value and the same evaluation rank, depending on whether the evaluation is of the candidate's independent part or of the independent part that connects to the candidate.
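The dependence on the evaluation method can be illustrated with one rank table per kind of evaluation. The table values, the keys `"candidate"` and `"following"`, and the use of dense ranking (equal evaluation values share a rank) are assumptions for illustration; the publication states only that the rank table reflects how plausible the evaluation method is.

```python
from typing import Dict, List

# Hypothetical rank tables: index 0 is the best rank. The evaluation of the
# candidate's own independent part is assumed to discriminate more sharply
# than the evaluation of the independent part that follows it.
RANK_TABLES: Dict[str, List[float]] = {
    "candidate": [1.0, 0.6, 0.3],
    "following": [1.0, 0.8, 0.6],
}


def rank_possibility(scores: Dict[str, int], kind: str) -> Dict[str, float]:
    """Convert evaluation ranks into possibilities using the table for `kind`."""
    table = RANK_TABLES[kind]
    distinct = sorted(set(scores.values()), reverse=True)
    rank_of = {value: rank for rank, value in enumerate(distinct)}
    # Ranks beyond the end of the table reuse its last entry.
    return {
        name: table[min(rank_of[value], len(table) - 1)]
        for name, value in scores.items()
    }


if __name__ == "__main__":
    scores = {"汽車": 4, "記者": 3, "帰社": 1}
    print(rank_possibility(scores, "candidate"))
    print(rank_possibility(scores, "following"))
```

The same scores and ranks produce different possibilities under the two tables, which is the behaviour the paragraph above describes.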

In this embodiment, conversion speed is improved by converting evaluation values into a pseudo-possibility distribution with a conversion table, but the values can also be calculated with a conversion formula.

A method that converts the evaluation value and rank directly into basic probabilities, without first converting them into a pseudo-possibility distribution, is also within the scope of the present invention.

The present invention can also be implemented when the method of calculating evaluation values or the unit of candidate extraction differs from that of this embodiment.

In the above embodiment, the reading length of the word in the independent part, the frequency of appearance of the word, and the ease of connection between words were calculated with a separate evaluation formula, and the D-S (Dempster-Shafer) operation was performed using the resulting value as evidence. Alternatively, the D-S operation may be performed using the value of each item (reading length, frequency of appearance, and ease of connection) as evidence, or any combination of the items may be calculated with a separate evaluation formula and the D-S operation performed using that value as evidence.

In each case, of course, the operation proceeds using two pieces of evidence: evidence based on the evaluation value and evidence based on the evaluation rank.

Effects

As is clear from the above description, according to the present invention both the value and the rank of an evaluation value obtained from a single evaluation are given as evidence. When this evidence is combined with other evidence, the plausibility indicated by the evaluation value is therefore represented appropriately, which has the effect of enabling high-performance natural language processing.

Furthermore, because two pieces of evidence are given for each evaluation, the number of combinations of evidence increases and certainty tends to become higher. The distinction between appropriate and inappropriate candidates, as indicated by likelihood and certainty, is therefore shown more clearly than before.
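A small numerical sketch can make this tendency concrete. It assumes that the value-based and rank-based pieces of evidence agree on the best candidate, that they are combined with Dempster's rule, and that certainty is read as Dempster-Shafer belief; the mass values and function names are hypothetical.

```python
from itertools import product
from typing import Dict, FrozenSet

Mass = Dict[FrozenSet[str], float]


def combine(m1: Mass, m2: Mass) -> Mass:
    """Dempster's rule of combination with conflict renormalisation."""
    combined: Mass = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        common = a & b
        if common:
            combined[common] = combined.get(common, 0.0) + wa * wb
        else:
            conflict += wa * wb
    if 1.0 - conflict <= 1e-12:
        raise ValueError("total conflict")
    return {s: w / (1.0 - conflict) for s, w in combined.items()}


def belief(m: Mass, hypothesis: FrozenSet[str]) -> float:
    """Total mass of the subsets fully contained in the hypothesis."""
    return sum(w for s, w in m.items() if s <= hypothesis)


if __name__ == "__main__":
    best = frozenset({"汽車"})
    frame = frozenset({"汽車", "記者"})
    by_value = {best: 0.5, frame: 0.5}  # evidence from the evaluation value
    by_rank = {best: 0.4, frame: 0.6}   # evidence from the evaluation rank
    both = combine(by_value, by_rank)
    print(belief(by_value, best), belief(by_rank, best), belief(both, best))
```

Under these assumptions the belief in the best candidate after combination exceeds the belief given by either piece of evidence alone. When the two pieces of evidence disagree, combination does not necessarily raise it.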

In addition, according to claim 2, even when the same evaluation factors are used, plausibility and certainty can be calculated more rigorously and appropriate candidates can be presented.

[Brief explanation of the drawing]

FIG. 1 is a block diagram showing one embodiment of the natural language processing device of the present invention. FIG. 2 is a block diagram of the evaluation unit of the kana-kanji conversion device of the embodiment. FIG. 3 is a flowchart for converting an evaluation value into evidence expressed as basic probabilities.

Description of symbols: 1: input device, 2: candidate extraction unit, 3: word dictionary, 4: evaluation unit, 5: combination operation unit, 6: output candidate determination unit, 7: output device.

Claims

[Claims]

1. A natural language processing device that has a word dictionary that holds word information, a candidate extraction unit that extracts candidates consisting of words or combinations of words based on an input character string, an evaluation unit that gives a plurality of pieces of evidence for the plausibility of a candidate, and a combination operation unit that combines the plurality of pieces of evidence for the plausibility of the candidate, and that determines the candidate to output using the combined evidence, characterized in that, for a single evaluation of plausibility, two pieces of evidence are given: evidence based on the evaluation value and evidence based on the evaluation rank.

2. The natural language processing device according to claim 1, wherein the evaluation values are mutually independent evaluation values for at least two of the following factors: the length of the character string of a word's written form (or reading), the frequency of appearance of the word, and the ease of connection between words.