JPH07129589A - Morpheme analyzer - Google Patents
Morpheme analyzer
- Publication number
- JPH07129589A
- Authority
- JP
- Japan
- Prior art keywords
- morpheme
- likelihood
- frequency
- dictionary
- concatenation
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Machine Translation (AREA)
- Document Processing Apparatus (AREA)
Abstract
Description
[0001]
[Field of Industrial Application] The present invention relates to a morphological analyzer used in the field of information processing. In particular, it relates to a morphological analyzer that uses morpheme connection frequencies when processing natural language for applications such as kana-kanji conversion, machine translation, and information retrieval.
[0002]
[Prior Art] A conventional morphological analyzer narrows down the morpheme candidates obtained from a morpheme dictionary by running a connection check, using either a connection table that records whether two morphemes can be connected or a grammar. To control the processing flow and to decide the output, it either gives priority to the occurrence frequency of single morphemes or relies on heuristics such as the longest-match method, which prefers morphemes with longer surface forms.
[0003]
[Problems to Be Solved by the Invention] The conventional methods, however, have the following problems. First, because the analysis relies on a connection table or a grammar, the only information it yields is whether morphemes can be connected. Passing that check alone does not narrow down the morpheme candidates enough, so the number of candidates explodes. Much ambiguity remains, which lowers the processing speed or the reliability of the results.
[0004]
Second, because only the occurrence frequency of individual morphemes is used, no information is available about how a morpheme connects to its neighbors, and erroneous results are likely.
[0005]
Third, to rank the branches that arise during processing, or to assign a likelihood to the final output candidates, the analyzer has to rely on general heuristics such as the longest-match method, which ignore how individual words actually connect. With such heuristics it is difficult to tune the system for better performance. The conventional methods thus have many problems.
[0006]
The main object of the present invention is therefore to provide a morphological analyzer that uses morpheme connection frequencies to obtain partial morpheme connection likelihoods and performs morphological analysis effectively on the basis of those values.
[0007]
[Means for Solving the Problems] The invention according to claim 1 is a morphological analyzer that divides an input natural language sentence into morphemes and outputs the result, and that selects appropriate morphemes using a morpheme dictionary and connection frequency data.
[0008]
In the invention of claim 2, the morpheme dictionary of claim 1 is used to look up, from the surface form of a morpheme, the other morpheme information of that morpheme.
[0009]
In the invention according to claim 3, the connection frequency data of claim 1 is obtained by focusing on part or all of the morpheme information and calculating in advance the frequency with which morphemes connect to each other.
[0010]
In the invention according to claim 4, partial morpheme connection likelihoods are obtained from the connection frequencies for the plurality of morpheme candidates obtained from the morpheme dictionary of claim 2.
[0011]
The invention according to claim 5 uses the partial morpheme connection likelihoods obtained in claim 4 to make processing more efficient and more appropriate with respect to problems that are always present during processing, such as deciding priorities when processing branches and preventing an explosion in the number of morpheme sequence candidates.
[0012]
In the invention according to claim 6, in order to select the most likely of the morpheme sequence candidates finally obtained, the value accumulated from the partial morpheme connection likelihoods of claim 4 over the whole sentence is used as the likelihood of the output result.
[0013]
[Operation] By using morpheme connection frequencies, the morphological analyzer according to the present invention obtains not only whether morphemes can be connected but also a connection likelihood. It can therefore narrow down morpheme candidates without heuristics, and it can attach to each output result a likelihood that serves as a measure of reliability.
[0014]
[Embodiment] FIG. 1 is a schematic block diagram of an embodiment in which the present invention is applied to Japanese sentences. In FIG. 1, an input sentence is given to the morphological analysis processing module 1. Module 1 carries out its processing while reading morphemes from the morpheme dictionary 3 through the morpheme dictionary lookup module 2 and reading the morpheme connection frequency data 5 through the morpheme connection likelihood calculation module 4.
[0015]
FIG. 2 shows an example of the morpheme dictionary of FIG. 1. The morpheme dictionary is used to look up, from the surface form of a morpheme, the other morpheme information of that morpheme (base form, part of speech, conjugation form, conjugation type, and so on).
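As a rough illustration of the lookup described for FIG. 2, the dictionary can be pictured as a mapping from surface form to one or more entries. The sketch below is not taken from the patent: the names `Morpheme` and `MorphemeDictionary`, the field names, and any entries passed to it are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Morpheme:
    """One dictionary entry (field names are illustrative assumptions)."""
    surface: str         # surface form as it appears in text
    base: str            # base (standard) form
    pos: str             # part of speech
    conj_form: str = ""  # conjugation form, empty if not inflected
    conj_type: str = ""  # conjugation type, empty if not inflected


class MorphemeDictionary:
    """Maps a surface form to every morpheme registered under it."""

    def __init__(self, entries):
        self._by_surface = {}
        for morpheme in entries:
            self._by_surface.setdefault(morpheme.surface, []).append(morpheme)

    def lookup(self, surface):
        # Return a copy so callers cannot modify the stored entries.
        return list(self._by_surface.get(surface, []))
```

A surface form may map to several entries, which is why `lookup` returns a list rather than a single morpheme.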
[0016]
FIG. 3 shows an example of the morpheme connection frequency data. Morpheme connection frequency data is the occurrence frequency of a set of n consecutive morphemes (called the n-gram frequency, where the connection count n is an integer of 1 or more). For n = 1 it is equivalent to the occurrence frequency of each morpheme.
[0017]
This embodiment assumes four kinds of connection frequency in total. For connections that take all of the morpheme information into account, it uses the frequency for n = 1 (hereinafter, the word monogram frequency) and for n = 2 (hereinafter, the word bigram frequency). For connections that consider only three items (part of speech, conjugation form, and conjugation type), it uses the frequency for n = 1 (hereinafter, the part-of-speech monogram frequency) and for n = 2 (hereinafter, the part-of-speech bigram frequency).
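The four frequency tables could be gathered from an already segmented and tagged corpus along the following lines. This is a minimal sketch under stated assumptions: the patent does not say how the data is collected, the function name and the `<BOS>` marker are hypothetical, and each morpheme is reduced to a pair of a word key (all morpheme information) and a part-of-speech key (part of speech, conjugation form, conjugation type).

```python
from collections import Counter

BOS = "<BOS>"  # hypothetical marker for a dummy sentence-head morpheme


def count_connection_frequencies(corpus):
    """Count word and part-of-speech monograms and bigrams.

    corpus: iterable of sentences, each a list of (word_key, pos_key) pairs.
    Returns (word_mono, word_bi, pos_mono, pos_bi) as Counter objects.
    """
    word_mono, word_bi = Counter(), Counter()
    pos_mono, pos_bi = Counter(), Counter()
    for sentence in corpus:
        # Prepend the dummy head so the first real morpheme has a bigram too.
        seq = [(BOS, BOS)] + list(sentence)
        for i, (word, pos) in enumerate(seq):
            word_mono[word] += 1
            pos_mono[pos] += 1
            if i + 1 < len(seq):
                next_word, next_pos = seq[i + 1]
                word_bi[(word, next_word)] += 1
                pos_bi[(pos, next_pos)] += 1
    return word_mono, word_bi, pos_mono, pos_bi
```

Counting the dummy head as a monogram keeps the ratio bigram / monogram well defined for the first morpheme of a sentence.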
[0018]
The processing procedure of the morphological analyzer using connection frequencies is described in detail below.
[0019]
(0) Initially, the morpheme sequence candidate used in step (1) is set to a dummy sentence-head morpheme, the unprocessed partial natural language sentence is set to the input sentence, and the cumulative morpheme connection likelihood is initialized to Yp = 1. Processing then proceeds to step (1). The dummy sentence-head morpheme is a virtual morpheme that does not appear in the output and is used only to calculate the connection likelihood between the start of the sentence and the first morpheme.
[0020]
(1) Morpheme candidates that match the leading substring of the unprocessed partial natural language sentence following the morpheme sequence candidate are looked up in the morpheme dictionary. If no morpheme can be found, the morpheme sequence candidate is treated as a failure and removed.
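Step (1) amounts to a prefix lookup against the dictionary. The sketch below assumes the dictionary is a plain mapping from surface form to a list of entries; the function name is hypothetical, and a production system would more likely use a trie than try every prefix length.

```python
def lookup_prefixes(rest, dictionary):
    """Return (surface, entry) pairs for every entry whose surface form
    is a leading substring of `rest`, shortest surface first."""
    matches = []
    for end in range(1, len(rest) + 1):
        surface = rest[:end]
        for entry in dictionary.get(surface, []):
            matches.append((surface, entry))
    return matches
```

An empty result corresponds to the failure case of step (1): the morpheme sequence candidate cannot be extended and is dropped.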
[0021]
(2) Using the monogram frequency of the last morpheme c1 of the morpheme sequence candidate and the bigram frequency between c1 and each of the (generally several) morpheme candidates c2 obtained in step (1), the morpheme connection likelihood Yc given by equations (1) to (3) is obtained.
[0022]
$$Y_c = \frac{C \cdot W_c + P_c}{C + 1} \qquad (1)$$
$$W_c = \frac{W_b(c_1, c_2)}{W_m(c_1)} \qquad (2)$$
$$P_c = \frac{P_b(c_1, c_2)}{P_m(c_1)} \qquad (3)$$
where
- Yc: morpheme connection likelihood
- Wc: word connection likelihood
- Pc: part-of-speech connection likelihood
- C: weighting coefficient of the word connection likelihood relative to the part-of-speech connection likelihood
- Wm(c1): word monogram frequency of morpheme c1
- Wb(c1, c2): word bigram frequency between morphemes c1 and c2
- Pm(c1): part-of-speech monogram frequency of morpheme c1
- Pb(c1, c2): part-of-speech bigram frequency between morphemes c1 and c2

(3) If Yc is 0, the morphemes do not connect, so the extension is treated as a failure and that morpheme sequence candidate is removed. If Yc > 0, the morphemes are regarded as connectable: the morpheme is appended to the morpheme sequence candidate and removed from the unprocessed partial natural language sentence. The morpheme connection likelihood Yc is then accumulated into the cumulative morpheme connection likelihood Yp.
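Equations (1) to (3) translate directly into a small function. In this sketch the frequency tables are ordinary mappings, each morpheme is a pair of a word key and a part-of-speech key, and the function and parameter names are hypothetical. Returning 0 when a monogram frequency is missing is an assumption, since the patent does not discuss that case.

```python
def connection_likelihood(c1, c2, word_mono, word_bi, pos_mono, pos_bi, C=1.0):
    """Morpheme connection likelihood Yc between morphemes c1 and c2.

    c1, c2: (word_key, pos_key) pairs.
    C: weight of the word likelihood relative to the part-of-speech likelihood.
    """
    word1, pos1 = c1
    word2, pos2 = c2
    wm = word_mono.get(word1, 0)
    pm = pos_mono.get(pos1, 0)
    # Equation (2): word connection likelihood Wc (0 if c1 was never seen).
    wc = word_bi.get((word1, word2), 0) / wm if wm else 0.0
    # Equation (3): part-of-speech connection likelihood Pc.
    pc = pos_bi.get((pos1, pos2), 0) / pm if pm else 0.0
    # Equation (1): weighted average of the two likelihoods.
    return (C * wc + pc) / (C + 1)
```

Because equation (1) is a weighted average, Yc is 0 only when both the word bigram and the part-of-speech bigram are unseen, so the part-of-speech term acts as a fallback for word pairs that never occurred in the frequency data.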
[0023]
(4) The morpheme sequence candidates are ranked by their cumulative morpheme connection likelihood Yp. If there are too many candidates, those with the lowest priority are deleted until a suitable number remains.
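The pruning in step (4) resembles what is usually called beam pruning. A minimal sketch follows; it assumes each candidate is a tuple whose first element is Yp, and the function name and the `max_candidates` parameter are hypothetical (the patent only speaks of "a suitable number").

```python
import heapq


def prune(candidates, max_candidates):
    """Keep at most `max_candidates` candidates with the highest Yp.

    candidates: list of (yp, morphemes, unprocessed_part) tuples.
    The result is ordered from highest to lowest Yp.
    """
    return heapq.nlargest(max_candidates, candidates, key=lambda cand: cand[0])
```

A smaller `max_candidates` speeds up the analysis at the risk of discarding the segmentation that would have scored best over the whole sentence.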
[0024]
(5) For each of the (generally several) morpheme sequence candidates remaining after step (4), steps (1) to (4) are repeated on its unprocessed partial natural language sentence until one of the following end conditions is satisfied.
[0025]
<End condition 1> No unprocessed partial natural language sentence remains for any morpheme sequence candidate, so that processing is completely finished.
[0026]
<End condition 2> No unprocessed partial natural language sentence remains for a preset number of morpheme sequence candidates, so that processing is partially finished.
[0027]
<End condition 3> All morpheme sequence candidates fail. In other words, either the next morpheme cannot be found in the dictionary or Yc becomes 0, so that no morpheme sequence candidate is left with which to continue the analysis.
[0028]
(6) When an end condition is satisfied, each morpheme sequence candidate with no remaining unprocessed partial natural language sentence is output as a morpheme sequence candidate with likelihood, using its final cumulative morpheme connection likelihood Yp as the likelihood.
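Steps (0) to (6) can be put together roughly as follows. This is a simplified sketch, not the patented implementation: the dictionary is reduced to a set of surface forms, `likelihood` is any function returning Yc for a pair of adjacent morphemes, Yp is accumulated by multiplication starting from 1, and the names `analyze`, `beam_width`, `max_finished`, and `<BOS>` are hypothetical.

```python
def analyze(sentence, surfaces, likelihood, beam_width=10, max_finished=None):
    """Segment `sentence` and return [(Yp, [morphemes...]), ...], best first.

    surfaces: set of surface forms found in the dictionary.
    likelihood(prev, cur): connection likelihood Yc of two adjacent morphemes.
    max_finished: stop early once this many complete candidates exist
                  (end condition 2); None means run to completion.
    """
    BOS = "<BOS>"                      # dummy sentence-head morpheme
    active = [(1.0, [BOS], sentence)]  # step (0): Yp = 1, nothing consumed
    finished = []
    # The loop ends when no candidate is left to extend (end conditions
    # 1 and 3) or enough complete candidates exist (end condition 2).
    while active and (max_finished is None or len(finished) < max_finished):
        expanded = []
        for yp, morphemes, rest in active:
            # Step (1): every dictionary surface that is a prefix of `rest`.
            for end in range(1, len(rest) + 1):
                surface = rest[:end]
                if surface not in surfaces:
                    continue
                yc = likelihood(morphemes[-1], surface)  # step (2)
                if yc <= 0:                              # step (3): no connection
                    continue
                candidate = (yp * yc, morphemes + [surface], rest[end:])
                if candidate[2]:
                    expanded.append(candidate)
                else:
                    finished.append(candidate)
        # Step (4): rank by Yp and keep only the best `beam_width` candidates.
        expanded.sort(key=lambda cand: cand[0], reverse=True)
        active = expanded[:beam_width]
    # Step (6): output complete candidates with Yp as their likelihood.
    finished.sort(key=lambda cand: cand[0], reverse=True)
    return [(yp, morphemes[1:]) for yp, morphemes, _ in finished]
```

An empty result means every candidate failed, which corresponds to end condition 3.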
[0029]
FIG. 4 is a flowchart explaining the concrete operation of one embodiment of the present invention. It shows the basic flow of morphological analysis using connection frequencies, taking the following (input sentence 1) as an example.
[0030]
“こちらは事務局です” ("This is the secretariat") ... (input sentence 1)
First, suppose that exactly one morpheme is registered for each of “こちら” and “は”. The morpheme sequence candidate, its unprocessed partial natural language sentence (hereinafter, the unprocessed part), and the cumulative morpheme connection likelihood are then, for example, as follows.
[0031]
| | Cumulative morpheme connection likelihood Yp | Morpheme sequence candidate | Unprocessed part |
|---|---|---|---|
| (1) | 0.60 | (こちら は) | “事務局です” |

Suppose that dictionary lookup on the unprocessed part of (1) yields the following morpheme candidates.
[0032]
事 (0.00), 事務 (0.50), 事務局 (0.80)
The value in parentheses is the morpheme connection likelihood Yc with the preceding morpheme “は”; 0.00 means that the morphemes do not connect.
[0033]
The morpheme sequence candidate and its unprocessed part then split into the following combinations. Candidates whose cumulative morpheme connection likelihood becomes 0.00 are treated as failures and excluded from further processing.
[0034]
| | Cumulative morpheme connection likelihood Yp | Morpheme sequence candidate | Unprocessed part |
|---|---|---|---|
| (2a) | 0.60 × 0.00 = 0.00 | (こちら は 事) | “務局です” → failure |
| (2b) | 0.60 × 0.50 = 0.30 | (こちら は 事務) | “局です” |
| (2c) | 0.60 × 0.80 = 0.48 | (こちら は 事務局) | “です” |

Here (2a) fails because its cumulative morpheme connection likelihood is 0.00.
[0035]
Suppose further that dictionary lookup on the unprocessed parts of (2b) and (2c) yields the following morpheme candidates.
For (2b): 局 (0.50)
For (2c): で (0.20), です (0.80)
The value in parentheses is the morpheme connection likelihood Yc with the respective preceding morpheme. The morpheme sequence candidates and their unprocessed parts then become as follows.
[0036]
| | Cumulative morpheme connection likelihood Yp | Morpheme sequence candidate | Unprocessed part |
|---|---|---|---|
| (3b) | 0.30 × 0.50 = 0.15 | (こちら は 事務 局) | “です” |
| (3c) | 0.48 × 0.20 = 0.096 | (こちら は 事務局 で) | “す” |
| (3d) | 0.48 × 0.80 = 0.384 | (こちら は 事務局 です) | (none) → finished |

Processing is repeated in the same way until an end condition is satisfied.
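The arithmetic of this worked example can be replayed with a few lines of code. The sketch below assumes that Yp is accumulated by multiplication, as the tables indicate. The `expand` helper is hypothetical, and the Yc values are simply those quoted in the example rather than values computed from real frequency data.

```python
def expand(candidate, next_morphemes):
    """Extend one candidate by each (surface, Yc) pair, dropping failures.

    candidate: (yp, morphemes, unprocessed_part).
    """
    yp, morphemes, rest = candidate
    results = []
    for surface, yc in next_morphemes:
        if yc == 0 or not rest.startswith(surface):
            continue  # Yc = 0 means the morphemes do not connect
        results.append((yp * yc, morphemes + [surface], rest[len(surface):]))
    return results


start = (0.60, ["こちら", "は"], "事務局です")                                # row (1)
step2 = expand(start, [("事", 0.00), ("事務", 0.50), ("事務局", 0.80)])        # (2b), (2c)
step3 = (expand(step2[0], [("局", 0.50)])                                    # (3b)
         + expand(step2[1], [("で", 0.20), ("です", 0.80)]))                  # (3c), (3d)

for yp, morphemes, rest in step3:
    print(f"{yp:.3f}", " ".join(morphemes), repr(rest))
```

Multiplying out the quoted values gives 0.15 for (3b), 0.096 for (3c), and 0.384 for (3d), so (3d), the only candidate with nothing left to process, is also the most likely one.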
[0037]
As this example shows, the number of result candidates in morphological analysis generally tends to grow as processing proceeds. With the method that uses connection frequencies, however, a cumulative morpheme connection likelihood is available at every point in the processing, so unlikely candidates can be deleted whenever necessary.
[0038]
[Effects of the Invention] As described above, according to the present invention, using morpheme connection frequency information in morphological analysis makes it possible to improve the accuracy of the results. In general there is a trade-off between higher processing speed and higher reliability of the results. This trade-off can also be balanced as needed by adjusting the number of morpheme sequence candidates kept during processing.
[Brief Description of the Drawings]
[FIG. 1] A schematic block diagram of an embodiment of the present invention.
[FIG. 2] A diagram showing a specific example of the morpheme dictionary.
[FIG. 3] A diagram showing a specific example of the morpheme connection frequency data.
[FIG. 4] A flowchart showing the basic flow of morphological analysis using connection frequencies.
[Reference Numerals]
1 Morphological analysis processing module
2 Morpheme dictionary lookup module
3 Morpheme dictionary
4 Morpheme connection likelihood calculation module
5 Morpheme connection frequency data
Continuation of front page
(72) Inventor: Eiichiro Sumita (隅田 英一郎), c/o ATR Interpreting Telecommunications Research Laboratories (株式会社エイ・ティ・アール音声翻訳通信研究所), 2-2 Hikaridai, Seika-cho, Soraku-gun, Kyoto
(72) Inventor: Osamu Furuse (古瀬 蔵), c/o ATR Interpreting Telecommunications Research Laboratories, 2-2 Hikaridai, Seika-cho, Soraku-gun, Kyoto
(72) Inventor: Jin'ichi Murakami (村上 仁一), c/o ATR Interpreting Telecommunications Research Laboratories, 2-2 Hikaridai, Seika-cho, Soraku-gun, Kyoto
Claims (6)
1. A morphological analyzer that divides an input natural language sentence into morphemes and outputs the result, characterized in that appropriate morphemes are selected using a morpheme dictionary and connection frequency data.
2. The morphological analyzer of claim 1, characterized in that the morpheme dictionary is used to look up, from the surface form of a morpheme, the other morpheme information of that morpheme.
3. The morphological analyzer of claim 1, characterized in that the connection frequency data is obtained by focusing on part or all of the morpheme information and calculating in advance the frequency with which morphemes connect to each other.
4. The morphological analyzer of claim 1, characterized in that partial morpheme connection likelihoods are obtained, using the connection frequencies, for the plurality of morpheme candidates obtained from the morpheme dictionary.
5. The morphological analyzer of claim 4, characterized in that the partial morpheme connection likelihoods are used to make processing more efficient and more appropriate with respect to problems that are always present during processing, such as deciding priorities when processing branches and preventing an explosion in the number of morpheme sequence candidates.
6. The morphological analyzer of claim 4, characterized in that, in order to select the most likely of the morpheme sequence candidates finally obtained, the value accumulated from the partial morpheme connection likelihoods over the whole sentence is used as the likelihood of the output result.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP5187907A JP3048101B2 (en) | 1993-07-29 | 1993-07-29 | Morphological analyzer |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP5187907A JP3048101B2 (en) | 1993-07-29 | 1993-07-29 | Morphological analyzer |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| JPH07129589A true JPH07129589A (en) | 1995-05-19 |
| JP3048101B2 JP3048101B2 (en) | 2000-06-05 |
Family
ID=16214297
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| JP5187907A Expired - Lifetime JP3048101B2 (en) | 1993-07-29 | 1993-07-29 | Morphological analyzer |
Country Status (1)
| Country | Link |
|---|---|
| JP (1) | JP3048101B2 (en) |
Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JPS61187077A (en) * | 1985-02-14 | 1986-08-20 | Ricoh Co Ltd | Japanese language analysis device |
| JPH01156869A (en) * | 1987-12-14 | 1989-06-20 | Nippon Telegr & Teleph Corp <Ntt> | Japanese sentence analyzing processor |
| JPH04312168A (en) * | 1991-04-11 | 1992-11-04 | Mitsubishi Electric Corp | Processor for statistical language |
Also Published As
| Publication number | Publication date |
|---|---|
| JP3048101B2 (en) | 2000-06-05 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | A02 | Decision of refusal | Free format text: JAPANESE INTERMEDIATE CODE: A02; Effective date: 19971111 |
| | FPAY | Renewal fee payment (event date is renewal date of database) | Free format text: PAYMENT UNTIL: 20090324; Year of fee payment: 9 |
| | FPAY | Renewal fee payment (event date is renewal date of database) | Free format text: PAYMENT UNTIL: 20100324; Year of fee payment: 10 |
| | FPAY | Renewal fee payment (event date is renewal date of database) | Free format text: PAYMENT UNTIL: 20110324; Year of fee payment: 11 |
| | FPAY | Renewal fee payment (event date is renewal date of database) | Free format text: PAYMENT UNTIL: 20120324; Year of fee payment: 12 |
| | FPAY | Renewal fee payment (event date is renewal date of database) | Free format text: PAYMENT UNTIL: 20130324; Year of fee payment: 13 |