JPH0895976A

JPH0895976A - Natural language analyzer

Info

Publication number: JPH0895976A
Application number: JP6232745A
Authority: JP
Inventors: Kazuhiro Takahashi (高橋一裕)
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1994-09-28
Filing date: 1994-09-28
Publication date: 1996-04-12

Abstract

PURPOSE: To extract unregistered words by exploiting the repeated appearance of a character string and the conditions under which character strings follow the repeatedly appearing string. CONSTITUTION: An input part 1 sends an input sentence to an unregistered word candidate extracting part 2 and a language analyzing part 5. The unregistered word candidate extracting part 2 extracts unregistered word candidates from the document sent from the input part 1 and records them, together with the character strings that precede and follow them, in an unregistered word candidate appearance information recording part 3. An unregistered word candidate evaluating part 4 evaluates each candidate by analyzing the information recorded in the recording part 3. Referring to the evaluation results of the unregistered word candidate evaluating part 4, the language analyzing part 5 analyzes the sentence sent from the input part 1 and sends the result to an output part 6, which outputs it. Natural language analysis is performed in this way.

Description

Detailed Description of the Invention

[0001]

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to a natural language analyzer widely used in the field of language processing.

[0002]

2. Description of the Related Art
In recent years, the field of natural language processing has developed rapidly as machine translation and natural language interfaces have come into practical use. As a result, the natural language sentences to be analyzed have become increasingly diverse.

[0003] However, natural language changes constantly, and so do the words used in it. A natural language analyzer must therefore be able to handle words that were not incorporated when the system was built, so-called unregistered words.

[0004] Conventionally, unregistered words have been detected by treating the characters at which analysis failed, or the character-type block containing the failure point, as an unregistered word. The detected word was then assumed to be a noun, or assumed to be the most frequent part of speech among those that pass the connection check. These methods can be regarded as heuristics that exploit the fact that many unregistered words are nouns composed of kanji or katakana, and they have proved reasonably effective.
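The conventional character-type-block heuristic described above can be sketched as follows. This is a minimal illustration, not the prior systems' actual code: the hypothetical `char_type` function uses simplified Unicode ranges (hiragana, katakana, and CJK Unified Ideographs only), and the failure position is assumed to be already known from a failed dictionary lookup.

```python
from typing import List, Tuple

def char_type(ch: str) -> str:
    """Coarse character-type classification (simplified Unicode ranges)."""
    code = ord(ch)
    if 0x3040 <= code <= 0x309F:
        return "hiragana"
    if 0x30A0 <= code <= 0x30FF:
        return "katakana"
    if 0x4E00 <= code <= 0x9FFF:
        return "kanji"
    return "other"

def char_type_blocks(text: str) -> List[Tuple[str, str]]:
    """Split text into maximal runs of a single character type."""
    blocks: List[Tuple[str, str]] = []
    for ch in text:
        kind = char_type(ch)
        if blocks and blocks[-1][0] == kind:
            blocks[-1] = (kind, blocks[-1][1] + ch)
        else:
            blocks.append((kind, ch))
    return blocks

def conventional_guess(text: str, fail_pos: int) -> Tuple[str, str]:
    """Treat the block containing the failure point as an unregistered noun."""
    start = 0
    for _kind, block in char_type_blocks(text):
        end = start + len(block)
        if start <= fail_pos < end:
            return block, "noun"
        start = end
    raise IndexError("fail_pos is outside the text")
```

Applied to the passage used in paragraph [0005], with analysis assumed to fail at 鯔 (index 4), `conventional_guess("サケがね鯔鰯ってしまった", 4)` returns `("鯔鰯", "noun")`: the hiragana/kanji boundary cuts the word ね鯔鰯る in two, and the noun assumption misses that it is a verb.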

[0005]

Problems to be Solved by the Invention
However, the detection accuracy of the above methods is not always sufficient. Words whose character-type boundaries do not coincide with word boundaries, and words belonging to relatively infrequent parts of speech, were difficult to detect even when a human could easily recognize them. For example, given the passage 「……。サケがね鯔鰯ってしまった。……。ね鯔鰯るともう使えない。……」 (roughly, "... The salmon has ね鯔鰯-ed. ... Once it ね鯔鰯-s, it can no longer be used. ..."), a human can recognize 「ね鯔鰯る」 as a ra-row godan (five-grade) verb without relying on higher-level information such as meaning, because the same stem ね鯔鰯 appears followed by both って and る, two inflected endings of that verb class. With the conventional techniques, however, it was difficult to detect this word.

[0006] An object of the present invention is to provide a natural language analyzer that can appropriately handle unregistered words that mix character types, such as the one above, and unregistered words belonging to relatively infrequent parts of speech.

[0007]

Means for Solving the Problems
The natural language analyzer of the present invention comprises: an input unit that receives an input sentence; an unregistered word candidate extraction unit that receives the input sentence from the input unit and extracts character strings that may be unregistered words; an unregistered word appearance information recording unit that records each unregistered word candidate extracted by the extraction unit together with the character strings adjacent to it; an unregistered word candidate evaluation unit that evaluates the candidates based on the information recorded in the recording unit; a language analysis unit that analyzes the input sentence received from the input unit with reference to the evaluation results of the evaluation unit; and an output unit that outputs the analysis result of the language analysis unit.

[0008] The natural language analysis method of the present invention extracts unregistered word candidates from an input sentence, records each candidate together with the character strings adjacent to it, obtains unregistered word candidate appearance information from the recorded content, evaluates each candidate, analyzes the input sentence using the candidate evaluations, and outputs the analysis result.

[0009]

Embodiments
Next, embodiments of the present invention will be described with reference to the drawings.

[0010] FIG. 1 is a block diagram showing an embodiment of the invention of claim 1.

[0011] This natural language analyzer comprises an input unit 1 that receives an input sentence; a language analysis unit 5 that analyzes the input sentence received from the input unit 1; an output unit 6 that outputs the analysis result of the language analysis unit 5; an unregistered word candidate extraction unit 2 that receives the input sentence from the input unit 1 and extracts character strings that may be unregistered words; an unregistered word candidate appearance information recording unit 3 that records the candidates extracted by the extraction unit 2 together with the character strings adjacent to them; and an unregistered word candidate evaluation unit 4 that evaluates the candidates based on the information recorded in the recording unit 3.

[0012] A sentence input to the input unit 1 is sent to the unregistered word candidate extraction unit 2 through data line 12 and to the language analysis unit 5 through data line 15. The extraction unit 2 extracts unregistered word candidates from the input sentence and records each candidate, together with its adjacent character strings, in the unregistered word candidate appearance information recording unit 3 through data line 23. The unregistered word candidate evaluation unit 4 obtains the candidate appearance information from the recording unit 3 through data line 34 and evaluates each candidate. The language analysis unit 5 analyzes the input sentence sent from the input unit 1 using the candidate evaluations obtained from the evaluation unit 4 through data line 45, and sends the analysis result to the output unit 6 through data line 56. The output unit 6 outputs the analysis result.
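The data flow of paragraph [0012] can be summarized as a small pipeline. The sketch below is a hypothetical wiring, not the patent's implementation: each unit is an injected function, the recording unit 3 is assumed to be a plain list of (preceding, candidate, following) tuples, and the comments map each step to the data lines it stands for.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple

# (preceding string, candidate, following string)
Occurrence = Tuple[str, str, str]

@dataclass
class NaturalLanguageAnalyzer:
    extract: Callable[[str], List[Occurrence]]                 # unit 2
    evaluate: Callable[[List[Occurrence]], Dict[str, float]]   # unit 4
    analyze: Callable[[str, Dict[str, float]], Any]            # unit 5
    records: List[Occurrence] = field(default_factory=list)    # unit 3

    def run(self, text: str) -> Any:
        # Data lines 12 and 23: input -> extraction -> recording unit.
        self.records.extend(self.extract(text))
        # Data line 34: the evaluation unit reads all accumulated records.
        scores = self.evaluate(self.records)
        # Data lines 15 and 45: the analyzer sees the text and the scores;
        # the return value plays the role of data line 56 to output unit 6.
        return self.analyze(text, scores)
```

Keeping `records` across calls is a deliberate choice in this sketch: repeated appearances in later sentences of the same document then strengthen earlier candidates, which is what recording appearance information is for.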

[0013] Next, the operating principle will be described using a specific example.

[0014] First, assume that the sentence shown in FIG. 4(a) is input.

[0015] The unregistered word candidate extraction unit 2 extracts unregistered word candidates such as those in FIG. 4(b) from the sentence of FIG. 4(a) and records them, together with their adjacent character strings, in the unregistered word candidate appearance information recording unit 3. FIG. 5 shows part of the contents of the recording unit 3 at this point.
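FIG. 4(b) and FIG. 5 are not reproduced here, so the following is only a sketch of one plausible extraction rule, assumed for illustration: every substring of 2 to 4 characters that occurs at least twice in the document becomes a candidate, and each occurrence is recorded with the text up to the sentence boundary on both sides. The hypothetical `window` parameter clips the recorded context, which paragraph [0018] notes is permissible.

```python
from collections import Counter
from typing import Iterator, List, NamedTuple, Optional, Tuple

class Occurrence(NamedTuple):
    preceding: str   # text between the sentence start and the candidate
    candidate: str
    following: str   # text between the candidate and the sentence end

def split_sentences(text: str) -> List[str]:
    # Assumes the Japanese full stop 。 marks sentence boundaries.
    return [s for s in text.split("。") if s]

def substrings(s: str, min_len: int, max_len: int) -> Iterator[Tuple[int, str]]:
    for n in range(min_len, max_len + 1):
        for i in range(len(s) - n + 1):
            yield i, s[i:i + n]

def extract_candidates(text: str, min_len: int = 2, max_len: int = 4,
                       min_count: int = 2, window: Optional[int] = None) -> List[Occurrence]:
    sentences = split_sentences(text)
    counts = Counter(sub for s in sentences for _, sub in substrings(s, min_len, max_len))
    repeated = {sub for sub, c in counts.items() if c >= min_count}
    records: List[Occurrence] = []
    for s in sentences:
        for i, sub in substrings(s, min_len, max_len):
            if sub not in repeated:
                continue
            pre, fol = s[:i], s[i + len(sub):]
            if window is not None:
                # Keep only the `window` characters nearest the candidate.
                pre, fol = pre[max(0, len(pre) - window):], fol[:window]
            records.append(Occurrence(pre, sub, fol))
    return records
```

On 「サケがね鯔鰯ってしまった。ね鯔鰯るともう使えない。」 this rule yields the overlapping candidates ね鯔, 鯔鰯, and ね鯔鰯; choosing among them is left to the evaluation step.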

[0016] The unregistered word candidate evaluation unit 4 evaluates each unregistered word candidate based on the contents of the unregistered word candidate appearance information recording unit 3. FIG. 6 shows an example of the evaluation method, and FIG. 7 shows an example of the evaluation result.
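FIG. 6 is not reproduced here. As a hedged illustration of how following strings can be evaluated, the sketch below assumes a hypothetical table of endings per part of speech and measures how consistently a candidate's following strings match each class. This is the kind of evidence that lets a reader recognize ね鯔鰯る in paragraph [0005] as a ra-row godan verb: the same stem is followed by both って and る.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical ending tables; the actual rules of FIG. 6 may differ.
ENDINGS: Dict[str, Tuple[str, ...]] = {
    "ra-row godan verb": ("ら", "り", "る", "れ", "ろ", "って", "った"),
    "noun": ("が", "を", "に", "は", "の", "で", "も"),
}

def evaluate_following(followings: List[str]) -> Dict[str, Tuple[float, int]]:
    """Per class: (share of occurrences matching an ending, distinct endings seen)."""
    scores: Dict[str, Tuple[float, int]] = {}
    for cls, endings in ENDINGS.items():
        hits, forms = 0, set()
        for following in followings:
            match = next((e for e in endings if following.startswith(e)), None)
            if match is not None:
                hits += 1
                forms.add(match)
        share = hits / len(followings) if followings else 0.0
        scores[cls] = (share, len(forms))
    return scores

def best_class(followings: List[str]) -> Optional[str]:
    """Class with the highest (share, distinct forms); None if nothing matched."""
    scores = evaluate_following(followings)
    cls, (share, _forms) = max(scores.items(), key=lambda kv: kv[1])
    return cls if share > 0 else None
```

Overlapping candidates such as 鯔鰯 receive the same following strings, so a fuller evaluator would also use the recorded preceding strings: ね precedes 鯔鰯 in every occurrence, which suggests the word starts one character earlier.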

[0017] The language analysis unit 5 analyzes the sentence sent from the input unit 1 with reference to the evaluation of the unregistered word candidate evaluation unit 4. FIG. 8 shows an example of the analysis result.
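One way the language analysis unit 5 could refer to the evaluation is to add well-scoring candidates to the lexicon used for segmentation. The sketch below assumes a greedy longest-match segmenter and a hypothetical toy lexicon; the patent does not specify the analysis algorithm, and the output format of FIG. 8 is not reproduced.

```python
from typing import Dict, List, Tuple

def segment(text: str, lexicon: Dict[str, str], max_len: int = 8) -> List[Tuple[str, str]]:
    """Greedy longest-match segmentation; unmatched single characters are tagged 'unknown'."""
    tokens: List[Tuple[str, str]] = []
    i = 0
    while i < len(text):
        for n in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + n]
            if piece in lexicon or n == 1:
                tokens.append((piece, lexicon.get(piece, "unknown")))
                i += n
                break
    return tokens

# Hypothetical toy lexicon and a hypothetical entry accepted by the evaluation unit 4.
BASE = {"サケ": "noun", "が": "particle", "って": "te-form ending"}
EVALUATED = {"ね鯔鰯": "ra-row godan verb stem"}
```

With only `BASE`, the characters ね, 鯔, and 鰯 come out as three separate unknown tokens; merging in `EVALUATED` (for example `{**BASE, **EVALUATED}`) lets the segmenter recover ね鯔鰯 as a single verb stem followed by って.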

[0018] In this embodiment, the unregistered word candidate appearance information recording unit 3 records the preceding and following character strings up to the sentence boundary. Depending on the evaluation method of the unregistered word candidate evaluation unit 4, however, recording only the preceding string or only the following string may suffice. Recording up to the sentence boundary is also not essential to the invention; shorter or longer character strings may be recorded.

[0019] Next, the invention of claim 2 will be described.

[0020] FIG. 2 is a block diagram showing an embodiment of the invention of claim 2.

[0021] This natural language analyzer comprises an input unit 1 that receives an input sentence; a language analysis unit 5 that analyzes the input sentence received from the input unit 1; an output unit 6 that outputs the analysis result of the language analysis unit 5; an unregistered word candidate extraction unit 2 that receives the input sentence from the input unit 1 and extracts character strings that may be unregistered words; an unregistered word candidate appearance information recording unit 3 that records the candidates extracted by the extraction unit 2 together with the character strings adjacent to them; a character meaning dictionary 7 that stores characters and the semantic information corresponding to each character; and an unregistered word candidate evaluation unit 4 that evaluates the candidates based on the information recorded in the recording unit 3 and the information stored in the character meaning dictionary 7.

[0022] A sentence input to the input unit 1 is sent to the unregistered word candidate extraction unit 2 through data line 12 and to the language analysis unit 5 through data line 15. The extraction unit 2 extracts unregistered word candidates from the input sentence and records each candidate, together with its adjacent character strings, in the unregistered word candidate appearance information recording unit 3 through data line 23. The unregistered word candidate evaluation unit 4 obtains the candidate appearance information from the recording unit 3 through data line 34 and the semantic information of the characters making up each candidate from the character meaning dictionary 7 through data line 47, and evaluates each candidate. The language analysis unit 5 analyzes the input sentence sent from the input unit 1 using the candidate evaluations obtained from the evaluation unit 4 through data line 45, and sends the analysis result to the output unit 6 through data line 56. The output unit 6 outputs the analysis result.

[0023] A specific example will be described below.

[0024] First, assume that the sentence shown in FIG. 9(a) is input.

[0025] The unregistered word candidate extraction unit 2 extracts unregistered word candidates such as those in FIG. 9(b) from the sentence of FIG. 9(a) and records them, together with their adjacent character strings, in the unregistered word candidate appearance information recording unit 3. FIG. 10 shows part of the contents of the recording unit 3 at this point.

[0026] The unregistered word candidate evaluation unit 4 evaluates each unregistered word candidate based on the contents of the unregistered word candidate appearance information recording unit 3 and the semantic information obtained from the character meaning dictionary 7. FIG. 11(a) shows an example of the semantic information obtained from the character meaning dictionary 7, FIG. 11(b) shows an example of an evaluation method using that semantic information, and FIG. 12 shows an example of the evaluation result.
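FIG. 11 is not reproduced, so the character meaning dictionary below is a hypothetical toy mapping from single characters to semantic tags. The sketch scores a candidate by the most common tag among its characters and the share of the candidate's characters that carry that tag; such a profile could then be combined with the appearance-based evaluation.

```python
from collections import Counter
from typing import Dict, Optional, Tuple

# Hypothetical character meaning dictionary (FIG. 11(a) is not reproduced).
CHAR_MEANINGS: Dict[str, str] = {"鮭": "fish", "鯛": "fish", "川": "water"}

def semantic_profile(candidate: str, char_meanings: Dict[str, str]) -> Optional[Tuple[str, float]]:
    """Most common semantic tag among the candidate's characters and its share of all characters."""
    tags = [char_meanings[ch] for ch in candidate if ch in char_meanings]
    if not tags:
        return None
    tag, count = Counter(tags).most_common(1)[0]
    return tag, count / len(candidate)
```

Dividing by the full candidate length, rather than by the number of characters found in the dictionary, is an assumption of this sketch: it lowers the score of candidates whose characters are mostly unknown to the dictionary.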

[0027] The language analysis unit 5 analyzes the sentence sent from the input unit 1 with reference to the evaluation of the unregistered word candidate evaluation unit 4. FIG. 13 shows an example of the analysis result.

[0028] FIG. 3 is a block diagram showing an embodiment of the invention of claim 3.

[0029] This natural language analyzer comprises an input unit 1 that receives an input sentence; an updatable dictionary 8 that stores information for language analysis; a language analysis unit 5 that analyzes the input sentence received from the input unit 1 with reference to the dictionary 8 and updates the contents of the dictionary 8 based on the analysis result; an output unit 6 that outputs the analysis result of the language analysis unit 5; an unregistered word candidate extraction unit 2 that receives the input sentence from the input unit 1 and extracts character strings that may be unregistered words; an unregistered word candidate appearance information recording unit 3 that records the candidates extracted by the extraction unit 2 together with the character strings adjacent to them; and an unregistered word candidate evaluation unit 4 that evaluates the candidates based on the information recorded in the recording unit 3 and updates the contents of the dictionary 8 based on the evaluation results.

[0030] A sentence input to the input unit 1 is sent to the unregistered word candidate extraction unit 2 through data line 12 and to the language analysis unit 5 through data line 15. The extraction unit 2 extracts unregistered word candidates from the input sentence and records each candidate, together with its adjacent character strings, in the unregistered word candidate appearance information recording unit 3 through data line 23. The unregistered word candidate evaluation unit 4 obtains the candidate appearance information from the recording unit 3 through data line 34, evaluates each candidate, and updates the contents of the dictionary 8 through data line 48 based on the evaluation results. The language analysis unit 5 analyzes the input sent from the input unit 1 using the candidate evaluations obtained through data line 45 and the dictionary information obtained through data line 58, updates the contents of the dictionary 8 through data line 58 based on the analysis result, and sends the analysis result to the output unit 6 through data line 56. The output unit 6 outputs the analysis result.

[0031] A specific example will be described below.

[0032] First, assume that the sentence shown in FIG. 14(a) is input.

[0033] The unregistered word candidate extraction unit 2 extracts unregistered word candidates such as those in FIG. 14(b) from the sentence of FIG. 14(a) and records them, together with their adjacent character strings, in the unregistered word candidate appearance information recording unit 3. FIG. 15 shows part of the contents of the recording unit 3 at this point.

[0034] The unregistered word candidate evaluation unit 4 evaluates each unregistered word candidate based on the contents of the unregistered word candidate appearance information recording unit 3 and updates the contents of the dictionary 8 based on the evaluation results. FIG. 16 shows an example of the evaluation result, and FIG. 17 shows an example of updating the contents of the dictionary 8 based on the evaluation result.

[0035] The language analysis unit 5 analyzes the sentence sent from the input unit 1 with reference to the evaluation of the unregistered word candidate evaluation unit 4, and updates the contents of the dictionary 8 based on the analysis result.

[0036] FIG. 18 shows an example of updating the contents of the dictionary 8 based on the analysis result.
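FIGS. 17 and 18 are not reproduced. The sketch below is one hypothetical shape for the updatable dictionary 8: the evaluation unit registers candidates whose score clears an assumed threshold, and the language analysis unit records how often each registered word was actually used in an analysis result, which later evaluations could consult. The class and method names are illustrative only.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Tuple

@dataclass
class Entry:
    pos: str
    score: float
    confirmed: int = 0   # times the analyzer actually used this entry

@dataclass
class UpdatableDictionary:
    entries: Dict[str, Entry] = field(default_factory=dict)

    def update_from_evaluation(self, evaluations: Dict[str, Tuple[str, float]],
                               threshold: float = 0.5) -> None:
        """Register or refresh candidates whose evaluation score clears the threshold."""
        for word, (pos, score) in evaluations.items():
            if score < threshold:
                continue
            current = self.entries.get(word)
            if current is None or score > current.score:
                confirmed = current.confirmed if current else 0
                self.entries[word] = Entry(pos, score, confirmed)

    def update_from_analysis(self, tokens: Iterable[Tuple[str, str]]) -> None:
        """Count how often each registered word appears in an analysis result."""
        for surface, _pos in tokens:
            if surface in self.entries:
                self.entries[surface].confirmed += 1
```

Keeping the two update paths separate mirrors the two data lines into the dictionary (48 from the evaluation unit, 58 from the language analysis unit).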

[0037] As another specific example, a case in which the unregistered word candidate extraction unit 2 performs extraction using an extraction template will be described.

[0038] Assume that the sentence shown in FIG. 19 is input, and that the unregistered word candidate extraction unit 2 uses the extraction template shown in FIG. 20, where [] in the template denotes a character string of arbitrary length. The extraction unit 2 extracts unregistered word candidates by applying the extraction template to the input sentence. As a result, the unregistered word candidates shown in FIG. 21 are recorded in the unregistered word candidate appearance information recording unit 3.
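The templates of FIG. 20 are not reproduced. Assuming a template is literal text in which [] (or full-width ［］) marks an arbitrary-length slot, one straightforward realization compiles each template into a regular expression with a non-greedy capture group per slot. The template 「[]」という ("the thing called 「...」") in the check below is a hypothetical example, and requiring at least one character per slot is an assumption of this sketch.

```python
import re
from typing import List

def template_to_regex(template: str) -> "re.Pattern[str]":
    """Compile a template in which [] (or ［］) stands for a non-empty string of any length."""
    parts = template.replace("［］", "[]").split("[]")
    return re.compile("(.+?)".join(re.escape(part) for part in parts))

def apply_templates(text: str, templates: List[str]) -> List[str]:
    """Collect every slot filler matched by any template."""
    candidates: List[str] = []
    for template in templates:
        for match in template_to_regex(template).finditer(text):
            candidates.extend(match.groups())
    return candidates
```

Because the group is non-greedy, a slot at the very end of a template would capture a single character; templates work best when literal text anchors both sides of every slot.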

[0039] As another specific example, a case in which the unregistered word candidate evaluation unit 4 performs evaluation using appearance frequency will be described.

[0040] The unregistered word candidate evaluation unit 4 searches the information recorded in the unregistered word candidate appearance information recording unit 3 and counts each surface form and the number of times it appears. FIG. 22 shows an example of the counting result. The evaluation unit 4 evaluates each surface form based on the counting result. FIG. 23 shows an example of the evaluation result.
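FIGS. 22 and 23 are not reproduced. A minimal sketch of the counting step follows, with a hypothetical scoring rule assumed for illustration: surface forms seen fewer than `min_count` times are dropped, and the remaining counts are normalized by the count of the most frequent surface form.

```python
from collections import Counter
from typing import Dict, Iterable

def count_surfaces(surfaces: Iterable[str]) -> Counter:
    """Number of recorded appearances of each surface form."""
    return Counter(surfaces)

def frequency_scores(surfaces: Iterable[str], min_count: int = 2) -> Dict[str, float]:
    """Hypothetical rule: keep repeated surfaces, scaled to the most frequent one."""
    counts = count_surfaces(surfaces)
    top = max(counts.values(), default=0)
    return {s: c / top for s, c in counts.items() if c >= min_count}
```

The `min_count` cutoff reflects the premise, stated in the abstract, that repeated appearance is the signal; a single appearance gives no repetition evidence at all.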

[0041] As another specific example, a case in which the unregistered word candidate evaluation unit 4 performs evaluation using the sequence of sounds in each candidate will be described.

[0042] FIG. 24 shows an example of an evaluation method used by the unregistered word candidate evaluation unit 4.

[0043] The unregistered word candidate evaluation unit 4 applies this evaluation method to the unregistered word candidates recorded in the unregistered word candidate appearance information recording unit 3. FIG. 25 shows an example of the evaluation result.
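The rules of FIG. 24 are not reproduced. The sketch below assumes a few hypothetical phonotactic penalties for kana candidates, based on the general tendency of Japanese words not to begin with a small kana, ん/ン, or the long-vowel mark ー, not to end with a geminate marker っ/ッ, and not to contain doubled ッッ or ーー. It is illustrative only; the real criteria may be entirely different.

```python
# Hypothetical phonotactic rules; FIG. 24 may use entirely different criteria.
SMALL_KANA = set("ぁぃぅぇぉゃゅょゎっァィゥェォャュョヮッ")
BAD_INITIAL = SMALL_KANA | set("んンー")
BAD_PAIRS = ("っっ", "ッッ", "ーー")

def sound_score(candidate: str) -> float:
    """Return 1.0 when no rule is violated, shrinking as penalties accumulate."""
    if not candidate:
        return 0.0
    penalty = 0
    if candidate[0] in BAD_INITIAL:
        penalty += 1
    if candidate[-1] in "っッ":
        penalty += 1
    penalty += sum(candidate.count(pair) for pair in BAD_PAIRS)
    return 1.0 / (1 + penalty)
```

Candidates that start mid-word (for example, a substring beginning at a small kana) are exactly the ones this kind of rule pushes down, which complements the frequency-based evaluation.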

[0044] As another example, FIG. 26 shows a configuration having a plurality of evaluation units.

[0045] This configuration has an unregistered word candidate evaluation unit α and an unregistered word candidate evaluation unit β, which evaluate unregistered word candidates by two different methods, and an evaluation integration unit that combines the evaluations of both units. The entire portion enclosed by the dotted line can then be regarded as a single unregistered word candidate evaluation unit, so this configuration can be considered the same invention as claim 1.

[0046] Although an example with two evaluation units is shown here, a configuration with three or more evaluation units can be considered in the same way.
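A minimal sketch of the evaluation integration unit, assuming (hypothetically) that each evaluator returns a score in [0, 1] and that integration is a weighted average; the patent does not state how the evaluations of α and β are combined. Because the function accepts any number of evaluators, the same code covers the three-or-more case of paragraph [0046].

```python
from typing import Callable, Sequence, Tuple

Evaluator = Callable[[str], float]

def integrate(candidate: str, weighted: Sequence[Tuple[Evaluator, float]]) -> float:
    """Weighted average of several evaluators' scores for one candidate."""
    total = sum(weight for _, weight in weighted)
    if total == 0:
        return 0.0
    return sum(weight * evaluator(candidate) for evaluator, weight in weighted) / total
```

A weighted average keeps the combined score on the same scale as its inputs, so a single threshold can still be applied downstream; other combination rules (minimum, product, voting) would fit the same dotted-line unit equally well.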

[0047] In this embodiment, explanatory text, numerical values, symbols, and the like are used, but these forms of expression are not essential to the invention and may be replaced as appropriate with equivalent explanatory text, numerical values, symbols, and so on.

[0048]

Effects of the Invention
As described above, the present invention evaluates unregistered word candidates based on the number of times each candidate appears and on information about its adjacent character strings. This makes it possible to extract with high accuracy unregistered words that consist of multiple character types, or that belong to infrequent parts of speech, which were difficult to handle with conventional methods.

[Brief description of drawings]

FIG. 1 is a block diagram showing an embodiment of the invention of claim 1.

FIG. 2 is a block diagram showing an embodiment of the invention of claim 2.

FIG. 3 is a block diagram showing an embodiment of the invention of claim 3.

FIG. 4 is a block diagram showing an example of an input sentence and of unregistered word candidate extraction.

FIG. 5 is an explanatory diagram showing an example of the contents of the unregistered word candidate appearance information recording unit.

FIG. 6 is an explanatory diagram showing an example of the evaluation method of the unregistered word candidate evaluation unit.

FIG. 7 is an explanatory diagram showing an example of an unregistered word candidate evaluation result.

FIG. 8 is an explanatory diagram showing an example of an analysis result of an input sentence.

FIG. 9 is an explanatory diagram showing an example of an input sentence and of unregistered word candidate extraction.

FIG. 10 is an explanatory diagram showing an example of the contents of the unregistered word candidate appearance information recording unit.

FIG. 11 is an explanatory diagram showing the contents of the character meaning dictionary and an example of an evaluation method using those contents.

FIG. 12 is an explanatory diagram showing an example of an evaluation result obtained using the contents of the character meaning dictionary.

FIG. 13 is an explanatory diagram showing an example of an analysis result of an input sentence.

FIG. 14 is an explanatory diagram showing an example of an input sentence and of unregistered word candidate extraction.

FIG. 15 is an explanatory diagram showing an example of the contents of the unregistered word candidate appearance information recording unit.

FIG. 16 is an explanatory diagram showing an example of an unregistered word candidate evaluation result.

FIG. 17 is an explanatory diagram showing an example of updating the contents of the dictionary using an unregistered word candidate evaluation result.

FIG. 18 is an explanatory diagram showing an example of updating the dictionary contents based on an analysis result of the language analysis unit.

FIG. 19 is an explanatory diagram showing an example of an input sentence.

FIG. 20 is an explanatory diagram showing an example of an extraction template.

FIG. 21 is an explanatory diagram showing an example of unregistered word candidates extracted using the extraction template.

FIG. 22 is an explanatory diagram showing an example of frequency information.

FIG. 23 is an explanatory diagram showing an example of an evaluation result using frequency information.

FIG. 24 is an explanatory diagram showing an example of an evaluation method using sound sequences.

FIG. 25 is an explanatory diagram showing an example of an evaluation result using sound sequences.

FIG. 26 is a block diagram showing an example having a plurality of evaluation units.

[Description of Reference Numerals]

1 Input unit
2 Unregistered word candidate extraction unit
3 Unregistered word candidate appearance information recording unit
4 Unregistered word candidate evaluation unit
5 Language analysis unit
6 Output unit
7 Character meaning dictionary
8 Dictionary

Claims

[Claims]

1. A natural language analyzer comprising: an input unit that receives an input sentence; an output unit that outputs an analysis result of a language analysis unit; an unregistered word candidate extraction unit that receives the input sentence from the input unit and extracts, from the sentence, character strings that may be unregistered words; an unregistered word appearance information recording unit that records the unregistered word candidates extracted by the unregistered word candidate extraction unit together with the character strings adjacent to them; an unregistered word candidate evaluation unit that evaluates the unregistered word candidates based on the information recorded in the unregistered word appearance information recording unit; and the language analysis unit, which analyzes the input sentence received from the input unit with reference to the evaluation results of the unregistered word candidate evaluation unit.

2. The natural language analyzer according to claim 1, further comprising a character meaning dictionary that stores characters and semantic information corresponding to the characters, wherein the unregistered word candidate evaluation unit refers to the character meaning dictionary when evaluating the unregistered word candidates.

3. The natural language analyzer according to claim 1, further comprising an updatable dictionary that stores information for language analysis, wherein the unregistered word candidate evaluation unit updates the contents of the dictionary based on its evaluation results, and the language analysis unit refers to the information in the dictionary when analyzing the input sentence and updates the contents of the dictionary based on its analysis results.

4. A natural language analysis method comprising: extracting unregistered word candidates from an input sentence; recording each unregistered word candidate together with the character strings adjacent to it; acquiring unregistered word candidate appearance information from the recorded content and evaluating each unregistered word candidate; analyzing the input sentence using the unregistered word candidate evaluations; and outputting the analysis result.