JPH0345421B2 - Language analysis system - Google Patents

Info

Publication number
JPH0345421B2
Authority
JP
Japan
Prior art keywords
word
analysis
analysis results
kana
sentence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Lifetime
Application number
JP60279121A
Other languages
Japanese (ja)
Other versions
JPS62139076A (en)
Inventor
Akihiro Hirai
Hideaki Shinohara
Yoichi Hitano
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National Institute of Advanced Industrial Science and Technology AIST
Original Assignee
Agency of Industrial Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
1985-12-13
Filing date
1985-12-13
Publication date
1991-07-11
Application filed by Agency of Industrial Science and Technology
Priority to JP60279121A
Publication of JPS62139076A
Publication of JPH0345421B2
Granted

Landscapes

  • Machine Translation (AREA)
  • Document Processing Apparatus (AREA)

Description

[Detailed Description of the Invention]

[Field of Application of the Invention]

The present invention relates to an analysis method for analyzing language, and in particular to a language analysis method that efficiently analyzes text written in natural language.

[Background of the Invention]

In conventional language analysis methods, for example the method discussed in Chapter 3 of "Machine Processing of Language" (1984), edited by Makoto Nagao, processing is performed one sentence at a time. Even when multiple sentences are analyzed in sequence, the analysis results of the preceding sentences are not used, and even when a bunsetsu (phrase) identical to one that has already appeared occurs again, all of the analysis processing, such as division into words, has to be performed from scratch. Consequently, analyzing multiple sentences in sequence requires an amount of processing proportional to the number of sentences, and processing efficiency is poor.

[Object of the Invention]

An object of the present invention is to solve the problems of the conventional method described above and to provide a language processing method that improves analysis efficiency when multiple sentences are analyzed in sequence.

[Summary of the Invention]

The language processing method of the present invention achieves the above object, in a language processing device that analyzes or translates natural language, by storing analysis results in a storage medium and, when analyzing the next sentence, partially omitting analysis processing by using the analysis results obtained up to the previous sentence.

[Embodiment of the Invention]

An embodiment of the present invention is described in detail below with reference to the drawings. FIG. 1 is a block diagram of an automatic translation system from a first language to a second language, which is the language processing system of this embodiment. For convenience, the first language is Japanese and the second language is English. As shown in FIG. 1, the language processing system according to the present invention comprises a processing device 1; storage medium [1] 2, which stores the processing program and the analysis results; storage medium [2] 3, which stores dictionary data; storage medium [3] 4, which stores the input text to be processed; a display device 5; and a keyboard 6. The system takes the sentences in storage medium [3] 4 one at a time, translates them using the dictionary data in storage medium [2] 3, and outputs the results to the display device 5.

FIGS. 2, 3, and 4 show the flow of the analysis method according to the present invention.

FIG. 5 shows an example of input text, FIG. 6 shows an example of stored analysis results, FIG. 7 shows a worked example of word division of an input sentence (a shows the division into substrings and b shows the result of word division), and FIG. 8 shows another storage format for the analysis results. As shown in FIGS. 6 and 8, in this embodiment the analysis results are stored with a bunsetsu (phrase) or an auxiliary-verb string as one unit. The information stored for each unit is the notation of the unit as it appears in the sentence (hereinafter called the heading character string) together with dictionary data such as the in-sentence notation, part of speech, and conjugation information (conjugated form and conjugation type) of each word making up the unit. The two-letter alphabetic codes in FIGS. 6, 7, and 8 are part-of-speech codes.
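The stored record described above could be modelled roughly as follows. This is only a minimal sketch: the names `Word`, `AnalysisUnit`, `surface`, `pos`, and `conjugation`, and the part-of-speech codes `NN` and `PP`, are illustrative assumptions and are not taken from FIGS. 6 and 8.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Word:
    surface: str                       # notation of the word as it appears in the sentence
    pos: str                           # two-letter part-of-speech code (values here are made up)
    conjugation: Optional[str] = None  # conjugated form / conjugation type, if the word inflects


@dataclass(frozen=True)
class AnalysisUnit:
    words: Tuple[Word, ...]            # the words making up one bunsetsu or auxiliary-verb string

    @property
    def heading(self) -> str:
        # The heading character string is the unit exactly as written in the sentence.
        return "".join(w.surface for w in self.words)


# Hypothetical unit built from words mentioned in the description.
unit = AnalysisUnit(words=(Word("太郎", "NN"), Word("は", "PP")))

# Stored results keyed by heading character string.
store = {unit.heading: unit}
```

Keying the store by heading string is one plausible design choice, since the later matching step looks units up by exactly that string.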

To translate Japanese into English automatically, the first step is to divide a sentence of mixed kanji and kana, written without spaces between words, into words. This word division process is described below as an embodiment of the present invention, with reference to FIGS. 2, 3, and 4.

Suppose that the text shown in FIG. 5 is stored in storage medium [3] 4 and that analysis of the first sentence has just finished. At this point the analysis result of the first sentence is stored in storage medium [1] 2 in the format shown in FIG. 6. Analysis of the second sentence is then executed (101).

As the first step of the analysis, the sentence to be processed is separated into substrings for which stored analysis results can be used and substrings for which they cannot (102). Specifically, a substring that matches the heading character string of a stored analysis result is regarded as a substring for which the analysis result can be used. As a result, the second input sentence is separated as shown in FIG. 7a, where the hatched portions are the substrings for which analysis results can be used. Next, the first of the unprocessed substrings is taken up (call it substring a) (103). If substring a is one for which an analysis result can be used (104), the process of FIG. 4 (105) is executed; otherwise, the process of FIG. 3 (106) is executed. This is repeated until no unprocessed substrings remain (107), after which the analysis result (the analysis result for FIG. 7a is shown in FIG. 7b) is stored in storage medium [1] 2 (108). Analysis results are stored with a bunsetsu or an auxiliary-verb string as one unit, but a unit whose heading character string is identical to that of an already stored unit is not stored again. The above processing is repeated until no unprocessed sentences remain (109).
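The separation step (102) can be sketched as a scan that marks every stretch of the sentence matching a stored heading string as reusable. The function name `split_by_headings`, the rule of preferring the longest heading when several match at the same position, and the example sentence are all assumptions made for illustration; the patent text does not specify them.

```python
def split_by_headings(sentence, headings):
    """Return (substring, reusable) pairs that together cover the sentence."""
    parts = []
    pending = ""  # characters not covered by any stored heading
    i = 0
    while i < len(sentence):
        # Longest stored heading starting at position i (tie-break rule is an assumption).
        match = max(
            (h for h in headings if h and sentence.startswith(h, i)),
            key=len,
            default=None,
        )
        if match is None:
            pending += sentence[i]
            i += 1
        else:
            if pending:
                parts.append((pending, False))
                pending = ""
            parts.append((match, True))
            i += len(match)
    if pending:
        parts.append((pending, False))
    return parts


# Hypothetical stored headings and input sentence.
headings = {"太郎は", "食べた。"}
parts = split_by_headings("そして、太郎は栗を食べた。", headings)
```

With these inputs, `parts` alternates between non-reusable and reusable substrings, which is the shape the loop over steps 103 to 107 would then consume.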

Next, the process of FIG. 3 is described. This process applies to substrings for which stored analysis results cannot be used. First, a word is extracted from the substring on the longest-match principle (call this word W) (201). For example, "そして" ("and then") is extracted from the first substring in FIG. 7a. If the extracted word is one that conjugates, it is extracted together with its inflectional ending. Next, it is checked whether the extracted word can connect to the word preceding it (202). If it can, the word following the extracted word is extracted on the longest-match principle (203), and the possibility of backward connection is checked on the basis of its part of speech (204). If connection is possible, the word is accepted (205). If forward or backward connection is not possible, the extraction and acceptance of word W are rejected, another word is extracted from the same substring (207), and processing restarts from the forward-connection check. If the process of 207 is impossible, the extraction and acceptance of the word immediately before word W are rejected and the process is redone (208). However, if word W is the first word of the sentence, the processing cannot be redone, so word division is deemed to have failed. This processing is repeated until no unanalyzed character string remains (206), at which point the process ends. For the first substring in FIG. 7a, this process determines that "そして" is a conjunction and that "、" is a comma.
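The longest-match extraction with connection checks and backtracking can be illustrated with a small recursive search. This is a simplified sketch, not the flow of FIG. 3 itself: it folds the forward and backward checks into a single adjacency test between neighbouring parts of speech, and the lexicon, the part-of-speech codes (`CJ`, `PU`, `NN`, `PP`, and the sentence-start marker `BOS`), and the connection table are invented for the example.

```python
def segment(text, lexicon, can_connect, prev_pos="BOS"):
    """Return a list of (word, pos) covering text, or None if word division fails."""
    if not text:
        return []
    # Dictionary words that are a prefix of the remaining text, longest first.
    candidates = sorted(
        (w for w in lexicon if text.startswith(w)), key=len, reverse=True
    )
    for word in candidates:
        pos = lexicon[word]
        if not can_connect(prev_pos, pos):
            continue  # reject this extraction and try another word
        rest = segment(text[len(word):], lexicon, can_connect, pos)
        if rest is not None:
            return [(word, pos)] + rest
        # Otherwise the following words could not be connected: try another word here.
    return None  # no word fits; the caller backtracks to the preceding word


# Hypothetical dictionary and connection table.
lexicon = {"そして": "CJ", "、": "PU", "太郎": "NN", "は": "PP"}
allowed = {("BOS", "CJ"), ("CJ", "PU"), ("PU", "NN"), ("NN", "PP")}
result = segment("そして、太郎は", lexicon, lambda a, b: (a, b) in allowed)
```

Returning `None` from a nested call plays the role of rejecting the preceding word, and a `None` at the top level corresponds to the case where word division fails because there is no earlier word to fall back on.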

The process of FIG. 4 is described next. This process applies to substrings for which stored analysis results can be used. First, the possibility of forward connection is checked on the basis of the part of speech of the first word of the substring, obtained from the stored analysis result (301). For the second substring in FIG. 7a, checking whether "、" and "太郎" (Taro) can connect corresponds to process 301. If connection is possible, a word is extracted from the following substring on the longest-match principle, its part-of-speech information is obtained, and the possibility of backward connection with the last word of the substring is checked (303). For the second substring in FIG. 7a, this means checking whether "は" and "栗" can connect. If connection is possible, the process ends. If forward connection is not possible, the analysis result corresponding to the substring is rejected and the process of FIG. 3 is executed (304). If backward connection is not possible, the analysis result corresponding to the last word of the substring is rejected and the process of FIG. 3 is executed (305).
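The two boundary checks applied to a reusable substring can be sketched as a small decision function. The name `check_reusable`, the return labels, and the part-of-speech codes are hypothetical; the function only decides how much of the stored result survives and leaves the re-analysis of the rejected part to the caller.

```python
def check_reusable(unit_words, prev_pos, next_pos, can_connect):
    """Decide how much of a stored unit can be reused.

    unit_words: list of (surface, pos) taken from the stored analysis result.
    prev_pos:   part of speech of the word just before the substring.
    next_pos:   part of speech of the word extracted just after the substring.
    """
    first_pos = unit_words[0][1]
    last_pos = unit_words[-1][1]
    if not can_connect(prev_pos, first_pos):
        # Forward connection fails: the whole stored result is rejected.
        return ("reject_all", [])
    if not can_connect(last_pos, next_pos):
        # Backward connection fails: only the last word's result is rejected.
        return ("reject_last", unit_words[:-1])
    return ("reuse", unit_words)


# Hypothetical connection table and stored unit.
allowed = {("PU", "NN"), ("NN", "PP"), ("PP", "NN")}
can_connect = lambda a, b: (a, b) in allowed
cached = [("太郎", "NN"), ("は", "PP")]
decision = check_reusable(cached, "PU", "NN", can_connect)
```

Only the two ends of the stored unit are inspected here, which is where the saving comes from: the words inside the unit are not extracted or checked again.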

If not all analysis results can be stored because of the limited capacity of the storage medium, priority can be assigned on the basis of the time at which each analysis result was obtained so that newer analysis results are always retained. This gives better processing efficiency than when no priority is assigned.
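One way to keep the newest results when capacity is limited is a store that discards its oldest entry on overflow. The class `ResultStore` and its methods are hypothetical, and the sketch assumes priority is decided purely by insertion time, with a capacity of at least one.

```python
from collections import OrderedDict


class ResultStore:
    """Keeps at most `capacity` analysis results, discarding the oldest first."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._units = OrderedDict()  # heading string -> result, oldest entry first

    def add(self, heading, result):
        if heading in self._units:
            return  # a unit with the same heading is not stored again
        if len(self._units) >= self.capacity:
            self._units.popitem(last=False)  # drop the oldest result
        self._units[heading] = result

    def get(self, heading):
        return self._units.get(heading)


store = ResultStore(capacity=2)
store.add("太郎は", "result-1")
store.add("栗を", "result-2")
store.add("食べた。", "result-3")  # evicts the oldest entry, "太郎は"
```

Refreshing an entry's position each time it is reused (a least-recently-used policy) would be a natural variation, but the text only mentions the time at which a result was obtained.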

Furthermore, if the analysis results are stored in the format shown in FIG. 8, that is, if an analysis result is expressed by information indicating the storage addresses of the information that makes up the result, there is no need to allocate storage more than once for the same element. This improves the storage efficiency of the analysis results and, in turn, the overall processing efficiency.
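Storing addresses instead of copies can be sketched with a shared table of word entries, where each unit holds only indices into the table. The class `WordTable`, its methods, and the second example unit ("花子は") are assumptions for illustration; list indices stand in for the storage addresses of FIG. 8.

```python
class WordTable:
    """Shared table of word entries; units refer to entries by index."""

    def __init__(self):
        self._entries = []  # index -> (surface, pos)
        self._index = {}    # (surface, pos) -> index

    def intern(self, surface, pos):
        key = (surface, pos)
        if key not in self._index:
            self._index[key] = len(self._entries)
            self._entries.append(key)
        return self._index[key]

    def lookup(self, address):
        return self._entries[address]

    def __len__(self):
        return len(self._entries)


table = WordTable()
units = {
    "太郎は": [table.intern("太郎", "NN"), table.intern("は", "PP")],
    "花子は": [table.intern("花子", "NN"), table.intern("は", "PP")],
}
```

Both units refer to the same entry for "は", so the table holds three entries rather than four.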

[Effect of the Invention]

As described above with reference to the embodiment, the present invention makes it possible to omit analysis of character strings that have already been analyzed, and can therefore improve the efficiency of analyzing multiple sentences. The effect is especially large for text with many repeated expressions and for text whose sentences tend to end in the same way.

[Brief Description of the Drawings]

FIG. 1 is a block diagram of the entire language processing system according to the present invention; FIGS. 2, 3, and 4 show the flow of analysis processing according to the present invention; FIG. 5 shows an example of input text; FIG. 6 shows an example of stored analysis results; FIG. 7 shows a worked example of word division; and FIG. 8 shows another storage format for the analysis results.

1: central processing unit; 2: storage medium [1]; 3: storage medium [2]; 4: storage medium [3]; 5: display device; 6: keyboard.

Claims (1)

[Claims]

1. A language analysis method for analyzing sentences of mixed kanji and kana, comprising: means for inputting a sentence of mixed kanji and kana; means for dividing the sentence into its constituent words; and means for storing, under a headword that is a bunsetsu (phrase) or auxiliary-verb string formed by connecting a word divided by the dividing means with a connectable word that is at least one of the words immediately before and after it, information on the word and on the connectable word; and further comprising means for separating, from a subsequently input sentence of mixed kanji and kana, any partial character string that matches a headword stored by the storing means; wherein the character strings remaining after the separation are divided by the dividing means into their constituent words, and, on the basis of the division result, the storing means stores the above-specified information.
JP60279121A 1985-12-13 1985-12-13 Language analysis system Granted JPS62139076A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP60279121A JPS62139076A (en) 1985-12-13 1985-12-13 Language analysis system

Publications (2)

Publication Number Publication Date
JPS62139076A (en) 1987-06-22
JPH0345421B2 (en) 1991-07-11

Family

ID=17606719

Family Applications (1)

Application Number Title Priority Date Filing Date
JP60279121A Granted JPS62139076A (en) 1985-12-13 1985-12-13 Language analysis system

Country Status (1)

Country Link
JP (1) JPS62139076A (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS62203276A (en) * 1986-03-03 1987-09-07 Nec Corp Form element analysis device
JPH07113922B2 (en) * 1987-04-14 1995-12-06 富士通株式会社 Machine translation device
JP2960936B2 (en) * 1987-07-13 1999-10-12 日本電信電話株式会社 Dependency analyzer
JPH02140871A (en) * 1988-11-22 1990-05-30 Matsushita Electric Ind Co Ltd Japanese language analysis device
JP3139658B2 (en) * 1993-05-06 2001-03-05 シャープ株式会社 Document display method

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS58192173A (en) * 1982-05-07 1983-11-09 Hitachi Ltd System for selecting word used in translation in machine translation
JPS59183469A (en) * 1983-03-31 1984-10-18 Fujitsu Ltd Machine interpreter
JPS59197929A (en) * 1983-04-25 1984-11-09 Ricoh Co Ltd Kana-kanji conversion processing device
JPS61156466A (en) * 1984-12-28 1986-07-16 Ricoh Co Ltd Word extracting system
JPS61260366A (en) * 1985-05-14 1986-11-18 Sharp Corp Mechanical translating system having learning function

Also Published As

Publication number Publication date
JPS62139076A (en) 1987-06-22

Legal Events

Date Code Title Description
EXPY Cancellation because of completion of term