JPH07129589A - Morpheme analyzer

Info

Publication number: JPH07129589A
Application number: JP5187907A
Authority: JP
Inventors: Atsushi Kawai; 淳河井; Eiichiro Sumida; 英一郎隅田; Kura Furuse; 蔵古瀬; Jinichi Murakami; 仁一村上
Original assignee: ATR ONSEI HONYAKU TSUSHIN KENKYUSHO KK; ATR Interpreting Telecommunications Research Laboratories
Current assignee: ATR ONSEI HONYAKU TSUSHIN KENKYUSHO KK; ATR Interpreting Telecommunications Research Laboratories
Priority date: 1993-07-29
Filing date: 1993-07-29
Publication date: 1995-05-19
Anticipated expiration: 2015-06-05
Also published as: JP3048101B2

Abstract

PURPOSE: To obtain a partial morpheme connection likelihood from statistical morpheme connection frequencies and to perform morphological analysis efficiently on the basis of that value. CONSTITUTION: A morphological analysis processing module 1 repeatedly calls a morpheme dictionary lookup module 2 and a morpheme connection likelihood calculation module 4 on an input natural language sentence so as to generate an appropriate morpheme sequence step by step. The morpheme dictionary lookup module 2 reads candidate morphemes from a morpheme dictionary 3. The morpheme connection likelihood calculation module 4 calculates the morpheme connection likelihood using morpheme connection frequency data 5.

Description

Detailed Description of the Invention

[0001]

[Industrial Field of Application] The present invention relates to a morphological analyzer used in the field of information processing. In particular, it relates to a morphological analyzer that uses morpheme connection frequencies to process natural language in applications such as kana-kanji conversion, machine translation, and information retrieval.

[0002]

[Prior Art] A conventional morphological analyzer narrows down the morpheme candidates obtained from a morpheme dictionary by running a connection check against either a connection table, which records whether two morphemes can be connected, or a grammar. To control the flow of processing and to decide the output, it relies on heuristics such as giving priority to the occurrence frequency of a single morpheme, or the longest-match method, which gives priority to the morpheme with the longer appearance form.

[0003]

[Problems to Be Solved by the Invention] Because the conventional method analyzes a sentence with a connection table or a grammar, the only information it obtains is whether morphemes can be connected. Passing that check alone does not narrow down the morpheme candidates sufficiently, and the number of candidates explodes. Much ambiguity remains, which lowers the processing speed or the reliability of the result.

[0004] Because only the occurrence frequency of each morpheme is used, no information is obtained about how a morpheme connects with the morphemes before and after it, and erroneous results are likely.

[0005] To prioritize the branches that arise during processing, or to assign a likelihood to the final output candidates, the conventional method has to rely on general heuristics, such as the longest-match method, that ignore the individual character of each word connection. This makes it difficult to tune the system for better performance. The conventional method thus has many problems.

[0006] The main object of the present invention is therefore to provide a morphological analyzer that obtains a partial morpheme connection likelihood from morpheme connection frequencies and performs morphological analysis effectively on the basis of that value.

[0007]

[Means for Solving the Problems] The invention according to claim 1 is a morphological analyzer that divides an input natural language sentence into morphemes and outputs the result, and that selects appropriate morphemes using a morpheme dictionary and connection frequency data.

[0008] In the invention according to claim 2, the morpheme dictionary of claim 1 is used to look up the other morpheme information of a morpheme from its appearance form.

[0009] In the invention according to claim 3, the connection frequency data of claim 1 is obtained by calculating in advance the frequency with which morphemes connect to each other, with attention paid to part or all of the morpheme information.

[0010] In the invention according to claim 4, a partial morpheme connection likelihood is calculated from the connection frequencies for the plurality of morpheme candidates obtained from the morpheme dictionary of claim 2.

[0011] The invention according to claim 5 uses the partial morpheme connection likelihood obtained in claim 4 to make processing more efficient and more appropriate with respect to problems that are always present during processing, such as deciding the priority order when processing branches and preventing an explosion in the number of morpheme string candidates.

[0012] In the invention according to claim 6, in order to select the most likely of the morpheme string candidates finally obtained, the value obtained by accumulating the partial morpheme connection likelihoods of claim 4 over the entire sentence is used as the likelihood of the output result.

[0013]

[Operation] By using morpheme connection frequencies, the morphological analyzer according to the present invention obtains not only whether morphemes can be connected but also a connection likelihood. It can therefore narrow down morpheme candidates without resorting to heuristics, and it can attach to each output result a likelihood that serves as a measure of reliability.

[0014]

[Embodiment] FIG. 1 is a schematic block diagram of an embodiment in which the present invention is applied to Japanese sentences. In FIG. 1, an input sentence is supplied to a morphological analysis processing module 1. This module carries out the analysis while reading morphemes from a morpheme dictionary 3 through a morpheme dictionary lookup module 2 and reading morpheme connection frequency data 5 through a morpheme connection likelihood calculation module 4.

[0015] FIG. 2 shows an example of the morpheme dictionary of FIG. 1. The morpheme dictionary is used to look up, from the appearance form of a morpheme, its other morpheme information (standard form, part of speech, conjugated form, conjugation type, and so on).
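As one way to picture the dictionary described above, the following minimal Python sketch maps each appearance form to the morpheme information registered for it. The field names, the `lookup` helper, and the sample entries (including their part-of-speech labels) are illustrative assumptions; they are not taken from FIG. 2.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Morpheme:
    surface: str     # appearance form, as written in the sentence
    base: str        # standard form
    pos: str         # part of speech
    conj_form: str   # conjugated form ("" if the morpheme does not inflect)
    conj_type: str   # conjugation type ("" if the morpheme does not inflect)

# Hypothetical entries for illustration only.
DICTIONARY = {
    "こちら": [Morpheme("こちら", "こちら", "pronoun", "", "")],
    "は": [Morpheme("は", "は", "particle", "", "")],
    "事務局": [Morpheme("事務局", "事務局", "noun", "", "")],
}

def lookup(surface):
    """Return every morpheme registered under the given appearance form."""
    return DICTIONARY.get(surface, [])

print(lookup("は"))
print(lookup("事務"))  # not registered in this toy dictionary, so an empty list
```

Keeping a list per appearance form allows several morphemes to share one spelling, which is why lookup generally returns more than one candidate.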

[0016] FIG. 3 shows an example of the morpheme connection frequency data. Morpheme connection frequency data gives the occurrence frequency of each set of n consecutive morphemes (the n-gram frequency, where the connection number n is an integer of 1 or more). For n = 1 it is equivalent to the occurrence frequency of each morpheme.

[0017] This embodiment assumes four kinds of connection frequency in total. Two of them treat morphemes as connected with respect to all of the morpheme information: the case n = 1 (hereinafter, the word monogram frequency) and the case n = 2 (hereinafter, the word bigram frequency). The other two treat morphemes as connected with respect to three items only, namely part of speech, conjugated form, and conjugation type: the case n = 1 (hereinafter, the part-of-speech monogram frequency) and the case n = 2 (hereinafter, the part-of-speech bigram frequency).
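The four kinds of frequency could be collected from an already segmented corpus roughly as in the Python sketch below. The tuple layout of a morpheme, the `BOS` sentence-head marker, the helper names, and the two-sentence corpus are all assumptions made for illustration; the description does not say how the frequency data is actually built.

```python
from collections import Counter

# Assumed layout of one morpheme: (surface, base, pos, conj_form, conj_type).
BOS = ("<BOS>", "<BOS>", "BOS", "", "")  # assumed dummy sentence-head morpheme

def pos_key(morpheme):
    """Keep only part of speech, conjugated form, and conjugation type."""
    return morpheme[2:]

def count_frequencies(corpus):
    """Count word and part-of-speech monograms and bigrams in a segmented corpus."""
    word_mono, word_bi = Counter(), Counter()
    pos_mono, pos_bi = Counter(), Counter()
    for sentence in corpus:
        seq = [BOS] + list(sentence)
        for m in seq:
            word_mono[m] += 1
            pos_mono[pos_key(m)] += 1
        for c1, c2 in zip(seq, seq[1:]):
            word_bi[(c1, c2)] += 1
            pos_bi[(pos_key(c1), pos_key(c2))] += 1
    return word_mono, word_bi, pos_mono, pos_bi

# Hypothetical two-sentence corpus.
KOCHIRA = ("こちら", "こちら", "pronoun", "", "")
SORE = ("それ", "それ", "pronoun", "", "")
WA = ("は", "は", "particle", "", "")
word_mono, word_bi, pos_mono, pos_bi = count_frequencies([[KOCHIRA, WA], [SORE, WA]])

print(word_bi[(KOCHIRA, WA)])                   # 1
print(pos_bi[(pos_key(KOCHIRA), pos_key(WA))])  # 2
```

The part-of-speech bigram is seen twice even though each word bigram is seen once, which shows how the coarser key pools evidence across different words.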

[0018] The processing procedure of the morphological analyzer using connection frequencies is described in detail below.

[0019] (0) Initially, a dummy sentence-head morpheme is set as the morpheme string candidate used in step (1), the input sentence is set as the unprocessed partial natural language sentence, and the cumulative morpheme connection likelihood is initialized to Yp = 1. Processing then proceeds to step (1). The dummy sentence-head morpheme is a virtual morpheme that does not appear in the output and is used only to calculate the connection likelihood between the head of the sentence and the first morpheme.

[0020] (1) Morpheme candidates that match a leading substring of the unprocessed partial natural language sentence following the morpheme string candidate are looked up in the morpheme dictionary. If no morpheme can be found, the morpheme string candidate is regarded as failed and is removed.

[0021] (2) Let c1 be the last morpheme of the morpheme string candidate and c2 each of the morpheme candidates (in general, more than one) obtained in step (1). Using the monogram frequency of c1 and the bigram frequency between c1 and c2, the morpheme connection likelihood Yc is calculated by the following equations (1) to (3).

[0022]

Morpheme connection likelihood: Yc = (C × Wc + Pc) / (C + 1) ... (1)
Word connection likelihood: Wc = Wb(c1, c2) / Wm(c1) ... (2)
Part-of-speech connection likelihood: Pc = Pb(c1, c2) / Pm(c1) ... (3)

where
C: weighting coefficient of the word connection likelihood relative to the part-of-speech connection likelihood
Wm(c1): word monogram frequency of morpheme c1
Wb(c1, c2): word bigram frequency between morphemes c1 and c2
Pm(c1): part-of-speech monogram frequency of morpheme c1
Pb(c1, c2): part-of-speech bigram frequency between morphemes c1 and c2

(3) If Yc is 0, the morphemes do not connect, so the morpheme string candidate is regarded as failed and is removed. If Yc > 0, the morphemes can connect: the morpheme is appended to the morpheme string candidate and removed from the unprocessed partial natural language sentence, and the morpheme connection likelihood Yc is accumulated into the cumulative morpheme connection likelihood Yp.
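Equations (1) to (3) can be sketched as a single Python function. The function name, the argument names, the default weight `c=1.0`, and the treatment of a zero monogram count as "no connection" are assumptions added for illustration; the counts in the usage line are made up.

```python
def connection_likelihood(wb, wm, pb, pm, c=1.0):
    """Morpheme connection likelihood Yc following equations (1) to (3).

    wb, wm: word bigram frequency of (c1, c2) and word monogram frequency of c1
    pb, pm: part-of-speech bigram and monogram frequencies for the same pair
    c:      weight of the word term relative to the part-of-speech term
    """
    wc = wb / wm if wm else 0.0      # equation (2); the zero guard is an assumption
    pc = pb / pm if pm else 0.0      # equation (3); same assumed guard
    return (c * wc + pc) / (c + 1)   # equation (1)

# Hypothetical counts: Wc = 8/10, Pc = 30/50, C = 3.
yc = connection_likelihood(wb=8, wm=10, pb=30, pm=50, c=3.0)
print(round(yc, 3))  # 0.75
```

Because equation (1) is a weighted average, a pair never observed as words (Wc = 0) still receives a positive Yc whenever its part-of-speech bigram has been observed.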

[0023] (4) The morpheme string candidates are ranked by cumulative morpheme connection likelihood Yp. If there are too many candidates, those with the lowest rank are deleted first until a suitable number remains.

[0024] (5) For each of the morpheme string candidates (in general, more than one) remaining after step (4), steps (1) to (4) are repeated on its unprocessed partial natural language sentence until one of the following end conditions is satisfied.

[0025] <End condition 1> No unprocessed partial natural language sentence remains for any morpheme string candidate, so processing has finished completely.

[0026] <End condition 2> No unprocessed partial natural language sentence remains for a preset number of morpheme string candidates, so processing has finished partially.

[0027] <End condition 3> All morpheme string candidates have failed; in other words, either the next morpheme cannot be found in the dictionary or Yc has become 0, so that no morpheme string candidate remains with which to continue the analysis.

[0028] (6) When an end condition is satisfied, each morpheme string candidate for which no unprocessed partial natural language sentence remains is output as a morpheme string candidate with a likelihood, the likelihood being its final cumulative morpheme connection likelihood Yp.
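Steps (0) to (6) can be sketched as a small beam-style search in Python. This is a minimal sketch under stated assumptions, not the patented implementation: the function and parameter names (`analyze`, `beam_width`, `enough_finished`), the `"<BOS>"` marker, the set-of-strings dictionary, and the Latin-letter toy data are all hypothetical, and Yp is accumulated by multiplication, as in the numerical example of this description.

```python
def analyze(sentence, surfaces, likelihood, beam_width=10, enough_finished=None):
    """Return finished segmentations as (Yp, morphemes), most likely first."""
    # Step (0): dummy sentence-head morpheme, whole sentence unprocessed, Yp = 1.
    active = [(1.0, ("<BOS>",), sentence)]
    finished = []
    while active:  # an empty list covers end conditions 1 and 3
        extended = []
        for yp, morphemes, rest in active:
            # Step (1): every dictionary entry matching a leading substring.
            for end in range(1, len(rest) + 1):
                surface = rest[:end]
                if surface not in surfaces:
                    continue
                # Step (2): connection likelihood with the last morpheme.
                yc = likelihood(morphemes[-1], surface)
                # Step (3): drop non-connecting pairs, otherwise accumulate.
                if yc <= 0:
                    continue
                candidate = (yp * yc, morphemes + (surface,), rest[end:])
                if candidate[2]:
                    extended.append(candidate)
                else:
                    finished.append(candidate)
        # End condition 2: enough candidates have consumed the whole sentence.
        if enough_finished is not None and len(finished) >= enough_finished:
            break
        # Step (4): rank by Yp and keep only the best candidates.
        extended.sort(key=lambda cand: cand[0], reverse=True)
        active = extended[:beam_width]
    # Step (6): output the finished candidates together with their likelihood.
    finished.sort(key=lambda cand: cand[0], reverse=True)
    return [(yp, list(morphemes[1:])) for yp, morphemes, _ in finished]

# Hypothetical toy dictionary and likelihood table.
SURFACES = {"a", "ab", "b", "c"}
TABLE = {("<BOS>", "a"): 0.5, ("<BOS>", "ab"): 0.4,
         ("a", "b"): 0.5, ("b", "c"): 0.5, ("ab", "c"): 1.0}

def toy_likelihood(c1, c2):
    return TABLE.get((c1, c2), 0.0)

results = analyze("abc", SURFACES, toy_likelihood)
print(results)  # [(0.4, ['ab', 'c']), (0.125, ['a', 'b', 'c'])]
```

With `beam_width=1` the same call returns only `['a', 'b', 'c']`, because the eventually better candidate starting with `ab` is pruned after the first step; this is the speed-versus-reliability balance that the candidate limit controls.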

[0029] FIG. 4 is a flowchart explaining the specific operation of the embodiment. It shows the basic flow of morphological analysis using connection frequencies, taking the following sentence (input sentence 1) as an example.

[0030]

"こちらは事務局です" ("This is the secretariat") ... (input sentence 1)

First, suppose that exactly one morpheme is registered for each of "こちら" and "は". The morpheme string candidate, its unprocessed partial natural language sentence (hereinafter, the unprocessed part), and the cumulative morpheme connection likelihood are then, for example, as follows.

[0031]

(1) Yp = 0.60; morpheme string candidate: (こちらは); unprocessed part: "事務局です"

Suppose that dictionary lookup on the unprocessed part of (1) yields the following morpheme candidates.

[0032]

事 (0.00)
事務 (0.50)
事務局 (0.80)

The value in parentheses is the morpheme connection likelihood Yc with the immediately preceding morpheme "は"; 0.00 means that the morphemes do not connect.

[0033] The morpheme string candidate and its unprocessed part then split into the following combinations. Any candidate whose cumulative morpheme connection likelihood becomes 0.00 is regarded as failed and excluded from subsequent processing.

[0034]

(2a) Yp = 0.60 × 0.00 = 0.00; morpheme string candidate: (こちらは事); unprocessed part: "務局です" → failed
(2b) Yp = 0.60 × 0.50 = 0.30; morpheme string candidate: (こちらは事務); unprocessed part: "局です"
(2c) Yp = 0.60 × 0.80 = 0.48; morpheme string candidate: (こちらは事務局); unprocessed part: "です"

Here, (2a) fails because its cumulative morpheme connection likelihood is 0.00.

[0035] Suppose further that dictionary lookup on the unprocessed parts of (2b) and (2c) yields the following morpheme candidates.

For (2b): 局 (0.50)
For (2c): で (0.20), です (0.80)

The value in parentheses is the morpheme connection likelihood Yc with the respective preceding morpheme. The morpheme string candidates and unprocessed parts then become as follows.

[0036]

(3b) Yp = 0.30 × 0.50 = 0.15; morpheme string candidate: (こちらは事務局); unprocessed part: "です"
(3c) Yp = 0.48 × 0.20 = 0.09; morpheme string candidate: (こちらは事務局で); unprocessed part: "す"
(3d) Yp = 0.48 × 0.80 = 0.38; morpheme string candidate: (こちらは事務局です); unprocessed part: none → finished

Processing is repeated in the same way until an end condition is satisfied.
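The products in this worked example can be checked with a few lines of Python. The `expand` helper and the list representation of a morpheme string are hypothetical; the Yp and Yc values are the ones given for input sentence 1, and the split of "こちらは" into "こちら" and "は" follows the assumption stated for that sentence.

```python
def expand(yp, prefix, options):
    """Extend one morpheme string candidate by each connectable morpheme."""
    return [(yp * yc, prefix + [m]) for m, yc in options if yc > 0]

# Candidate (1): Yp = 0.60 after "こちら" and "は".
stage2 = expand(0.60, ["こちら", "は"],
                [("事", 0.00), ("事務", 0.50), ("事務局", 0.80)])
# (2a) is dropped because its Yc is 0.00; (2b) and (2c) remain.
(yp_2b, seq_2b), (yp_2c, seq_2c) = stage2

# Candidates (3b), (3c), and (3d).
stage3 = (expand(yp_2b, seq_2b, [("局", 0.50)])
          + expand(yp_2c, seq_2c, [("で", 0.20), ("です", 0.80)]))

for yp, seq in stage2 + stage3:
    print(f"{yp:.3f}", "|".join(seq))
```

The printout carries three decimal places, so the unrounded products can be compared directly with the two-decimal values shown in the text.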

[0037] As this example shows, in morphological analysis the number of result candidates generally tends to grow as processing proceeds. With the connection-frequency method, however, a cumulative morpheme connection likelihood is available at every point in the processing, so unlikely candidates can be deleted whenever necessary.

[0038]

[Effects of the Invention] As described above, according to the present invention, using morpheme connection frequency information in morphological analysis makes it possible to improve the accuracy of the result. In general there is a trade-off between higher processing speed and higher reliability of the result; by adjusting the number of morpheme string candidates kept during processing, a suitable balance can be struck as required.

[Brief description of drawings]

[FIG. 1] A schematic block diagram of an embodiment of the present invention.

[FIG. 2] A diagram showing a specific example of the morpheme dictionary.

[FIG. 3] A diagram showing a specific example of the morpheme connection frequency data.

[FIG. 4] A flowchart showing the basic flow of morphological analysis using connection frequencies.

[Explanation of symbols]

1 morphological analysis processing module
2 morpheme dictionary lookup module
3 morpheme dictionary
4 morpheme connection likelihood calculation module
5 morpheme connection frequency data

Continuation of front page

(72) Inventor: Eiichiro Sumida (隅田英一郎), c/o ATR Interpreting Telecommunications Research Laboratories, 2-2 Hikaridai, Seika-cho, Soraku-gun, Kyoto Prefecture
(72) Inventor: Kura Furuse (古瀬蔵), c/o ATR Interpreting Telecommunications Research Laboratories, 2-2 Hikaridai, Seika-cho, Soraku-gun, Kyoto Prefecture
(72) Inventor: Jinichi Murakami (村上仁一), c/o ATR Interpreting Telecommunications Research Laboratories, 2-2 Hikaridai, Seika-cho, Soraku-gun, Kyoto Prefecture

Claims

[Claims]

1. A morphological analyzer that divides an input natural language sentence into morphemes and outputs the result, wherein appropriate morphemes are selected using a morpheme dictionary and connection frequency data.

2. The morphological analyzer according to claim 1, wherein the morpheme dictionary is used to look up the other morpheme information of a morpheme from the appearance form of the morpheme.

3. The morphological analyzer according to claim 1, wherein the connection frequency data is obtained by calculating in advance the frequency of morphemes connected to each other, with attention paid to part or all of the morpheme information.

4. The morphological analyzer according to claim 1, wherein a partial morpheme connection likelihood is calculated from the connection frequency for a plurality of morpheme candidates obtained from the morpheme dictionary.

5. The morphological analyzer according to claim 4, wherein the partial morpheme connection likelihood is used to make processing more efficient and more appropriate with respect to problems that are always present during processing, such as deciding the priority order when processing branches and preventing an explosion in the number of morpheme string candidates.

6. The morphological analyzer according to claim 4, wherein, in order to select the most likely of the morpheme string candidates finally obtained, a value obtained by accumulating the partial morpheme connection likelihood over the entire sentence is used as the likelihood of the output result.