JPH03276252A

JPH03276252A - Document treatment system and document treatment equipment

Info

Publication number: JPH03276252A
Application number: JP2075795A
Authority: JP
Inventors: Kazuo Kondo; 一生近藤
Original assignee: International Business Machines Corp
Current assignee: International Business Machines Corp
Priority date: 1990-03-27
Filing date: 1990-03-27
Publication date: 1991-12-06
Anticipated expiration: 2012-10-27
Also published as: JP2669437B2

Abstract

PURPOSE: To correctly determine the character type even of a reading string that is not registered in a dictionary, by converting the reading string into a phoneme string and determining the character type of the reading string from the converted phonemes. CONSTITUTION: An input reading string is stored in a reading string buffer 110, decomposed into a phoneme string by a decomposition means 120, and stored in a phoneme string buffer 140. The phoneme string is then sent to a determining means 150, which refers to a frequency table 160 to determine the character type of the phoneme string. A conversion means 170 converts the reading string stored in the reading string buffer 110 into the specific character type indicated by this determination result and sends the conversion result to a character string buffer 180, whose contents are displayed on an output means 30 such as a CRT or LCD. In this way, the correct character type can be determined even for a reading string that is not registered in a dictionary.

Description

[Detailed Description of the Invention]

A. Industrial Field of Application

The present invention relates to a document processing device having a function of determining the correct character type of an input reading string. That is, the device determines whether the input reading string should properly be written in kanji, in katakana, in hiragana, and so on.

B. Prior Art

Conventional document processing devices handle unregistered words poorly. Suppose the kanji or katakana corresponding to an input reading string (usually a hiragana string) is not registered in the dictionary. In that case, the string typically stays exactly as entered, no matter how many times the conversion key is pressed. Alternatively, it is split into inappropriately small pieces and converted into an unsuitable combination of kanji chosen only for their sound (ateji). In short, it has been difficult to convert words that are not in the dictionary (unknown words) correctly.

Katakana is especially affected. It is used heavily for place names and personal names, and also for loanwords and scientific and technical terms, and these categories grow day by day. As a result, many katakana words cannot all be registered in a dictionary. When an input reading string should properly become katakana, conventional devices therefore do one of three things. They leave it as the input reading string, convert it only into ateji, or require a special katakana-conversion operation separate from the normal conversion-key operation.

Moreover, determining the correct character type of an input reading string is desirable for proper document processing in general, not only when converting to katakana.

C. Problems to Be Solved by the Invention

The object of the present invention is to provide a document processing method and device that can correctly determine the character type even of a reading string that is not registered in a dictionary.

D. Means for Solving the Problems

The inventor observed that hiragana words have a characteristic sound that feels hiragana-like, and katakana words a sound that feels katakana-like. In other words, each character type has its own typical sounds and ways of connecting sounds. Consequently, when a reading string is converted into a phoneme string, the features of its character type can be recognized in that phoneme string. The present invention is based on this finding.

The document processing method according to the present invention achieves the above object by converting a reading string into a phoneme string and determining the character type of the reading string based on the converted phoneme string.

A first document processing device according to the present invention includes the following means:
input means for inputting a reading string;
decomposition means for decomposing the input reading string into a phoneme string;
determining means for determining the character type of the reading string based on the phoneme string; and
conversion means for converting the reading string into a specific character type according to the determining means.
With these means, the device converts the input reading string into the notation of the correct character type.

A second document processing device according to the present invention includes the following means:
reading string conversion means for converting a character string into a reading string;
decomposition means for decomposing the reading string into a phoneme string;
determining means for determining the character type of the reading string based on the phoneme string;
comparison means for comparing the character type indicated by the determining means with the character type of the character string; and
output means for displaying the comparison result.
With these means, the device checks whether a word already written in some character type uses the correct character type.

E. Embodiments

Embodiments of the present invention are described below with reference to the drawings.

FIG. 1 shows an embodiment of the document processing device according to the present invention.

In the figure, input means 20 is a keyboard, handwriting input tablet, or similar device for inputting a reading string. The input reading string is stored in a reading string buffer 110. Decomposition means 120 decomposes the stored reading string into a phoneme string, which is stored in a phoneme string buffer 140. To do this, decomposition means 120 refers to a phoneme conversion table 130.

The phoneme string is sent to determining means 150. Determining means 150 refers to a frequency table 160 to determine the character type of the phoneme string and sends the result to conversion means 170. Conversion means 170 converts the reading string stored in the reading string buffer 110 into a specific character type according to that result. It then sends the conversion result to a character string buffer 180, whose contents are displayed by output means 30 such as a CRT or LCD.
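The Fig. 1 data flow can be sketched as a short chain of functions. This is a minimal illustration under stated assumptions, not the patent's implementation. The decomposition and judgment steps are passed in as callables, and the names `process` and `to_katakana` are hypothetical. The conversion step covers only the hiragana-to-katakana case and uses the fixed Unicode offset between the two kana blocks as an assumed stand-in for conversion means 170.

```python
from typing import Callable

# Hiragana U+3041..U+3096 and katakana U+30A1..U+30F6 differ by a fixed offset.
KANA_OFFSET = 0x60


def to_katakana(reading: str) -> str:
    """Hypothetical stand-in for conversion means 170 (katakana case only)."""
    return "".join(
        chr(ord(ch) + KANA_OFFSET) if 0x3041 <= ord(ch) <= 0x3096 else ch
        for ch in reading
    )


def process(
    reading: str,
    decompose: Callable[[str], str],
    judge: Callable[[str], str],
) -> str:
    """Buffer 110 -> decomposition 120 -> judgment 150 -> conversion 170."""
    phonemes = decompose(reading)  # the phoneme string held in buffer 140
    kind = judge(phonemes)         # e.g. "katakana" or "hiragana"
    if kind == "katakana":
        return to_katakana(reading)
    return reading                 # hiragana (or undecided): leave as typed


# Usage with trivial stand-ins for the decomposition and judgment steps.
print(process("あいでんてぃふぁい", lambda r: r, lambda p: "katakana"))  # アイデンティファイ
```

Kanji output is out of scope here. A real conversion means would also need a dictionary for that case.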

Next, this embodiment is described in more detail, using the reading string 「あいでんてぃふぁい」 (aidentifai) as an example.

The reading string 「あいでんてぃふぁい」 entered through input means 20 is stored in the reading string buffer 110 in a predetermined character code. Decomposition means 120 then decomposes the stored reading string into a phoneme string.

FIG. 2 shows the phoneme conversion table 130 referred to by decomposition means 120. Table 1 below shows the correspondence between the reading string 「あいでんてぃふぁい」 and its phoneme string when the phoneme conversion table 130 is used.

Table 1: Correspondence between the reading string 「あいでんてぃふぁい」 and its phoneme string
あ → A, い → I, で → DE, ん → NN, て → TE, ぃ → i, ふ → HU, ぁ → a, い → I

That is, decomposition means 120 decomposes (converts) the reading string 「あいでんてぃふぁい」 into the phoneme string 「AIDENNTEiHUaI」. This phoneme string, held in the phoneme string buffer 140, is the target of character type determination. Determining means 150 refers to the frequency table 160 to determine its character type.
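The decomposition step can be sketched as a table lookup. FIG. 2 is not reproduced here, so the table below is a hypothetical subset. It contains only the kana that appear in this section's two worked examples. Two details are assumptions inferred from the 「まどろっこしい」 → 「MADOROKKOSII」 example later in this section: small kana map to lowercase vowels, and the small っ copies the first letter of the following kana's phonemes.

```python
# Hypothetical subset of a kana-to-phoneme table; FIG. 2 itself is not reproduced.
# Small kana map to lowercase vowels, as in 「AIDENNTEiHUaI」.
PHONEME_TABLE = {
    "あ": "A", "い": "I", "で": "DE", "ん": "NN", "て": "TE",
    "ぃ": "i", "ふ": "HU", "ぁ": "a",
    "ま": "MA", "ど": "DO", "ろ": "RO", "こ": "KO", "し": "SI",
}


def decompose(reading: str, table: dict = PHONEME_TABLE) -> str:
    """Convert a kana reading string into a string of one-letter phonemes."""
    out = []
    for i, kana in enumerate(reading):
        if kana == "っ":
            # Assumption: the small tsu doubles the first phoneme of the next kana.
            if i + 1 == len(reading):
                raise ValueError("a reading cannot end with っ in this sketch")
            out.append(table[reading[i + 1]][0])
        else:
            out.append(table[kana])  # KeyError signals a kana missing from the table
    return "".join(out)


print(decompose("あいでんてぃふぁい"))  # AIDENNTEiHUaI
```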

FIG. 3 shows the frequency table 160. All phonemes are arranged along both the vertical and the horizontal axis of the table. Each cell corresponds to one connection form: the row's phoneme immediately followed by the column's phoneme. The cell holds a relative frequency comparing how often that connection appears in words that should be written in katakana with how often it appears in words that should be written in hiragana. In other words, the frequency table 160 expresses numerically how katakana-like or hiragana-like every combination (connection form) of two phonemes is.

The frequency table 160 of FIG. 3 is described further below.

The frequency table 160 is created as follows.

First, count how many times each hiragana word and each katakana word occurs in existing documents. Then convert each word's reading string into a phoneme string, referring to the phoneme conversion table of FIG. 2. Tables 2 and 3 below give example occurrence counts for the hiragana words and the katakana words, respectively, in the referenced documents.

Table 2: Example occurrence counts of hiragana words (table not reproduced)
Table 3: Example occurrence counts of katakana words (table not reproduced)

From Tables 2 and 3, count separately how often each phoneme pair occurs in hiragana words and in katakana words. Take the phoneme pair 「AI」 in hiragana words as an example. According to Table 2, it occurs once in 「AAIU」 in row 2 and twice in 「AI」 in row 3, so its count is the sum, 3. Tables 4 and 5 below show the results of applying the same operation to a suitable number of existing documents. Table 4 gives the occurrence counts of phoneme pairs in hiragana words, and Table 5 gives those in katakana words.
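Counting phoneme pairs from per-word occurrence counts can be sketched as follows. Each phoneme is a single letter, so the adjacent pairs of a phoneme string are simply consecutive letters. Each pair occurrence is weighted by how many times its word occurred. The input format (a mapping from a word's phoneme string to its count), the function name `count_pairs`, and the example counts are all hypothetical.

```python
from collections import Counter


def count_pairs(word_counts: dict) -> Counter:
    """Sum adjacent phoneme-pair occurrences over words, weighted by word count.

    word_counts maps a word's phoneme string (e.g. "AAIU") to the number of
    times that word occurred in the reference documents.
    """
    pairs = Counter()
    for phonemes, n in word_counts.items():
        for first, second in zip(phonemes, phonemes[1:]):
            pairs[(first, second)] += n
    return pairs


# Hypothetical counts: "AAIU" seen once and "AI" seen twice.
hiragana_pairs = count_pairs({"AAIU": 1, "AI": 2})
print(hiragana_pairs[("A", "I")])  # 3
```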

Table 4: Example occurrence counts of phoneme pairs in hiragana words (table not reproduced)
Table 5: Example occurrence counts of phoneme pairs in katakana words (table not reproduced)

Consider the phoneme pair 「Ua」, the connection form in which 「U」 is immediately followed by 「a」. These tables show that it appears only once in hiragana words but 2239 times in katakana words.

Tables 4 and 5 could be used directly to determine the character type (hiragana or katakana) of the phoneme string stored in the phoneme string buffer 140. In this embodiment, however, the frequency table 160 (FIG. 3) is built from Tables 4 and 5 and used instead.

The procedure for creating the frequency table 160 from Tables 4 and 5 is described next.

Let $S_h$ be the total number of phoneme-pair occurrences in Table 4, and $S_k$ the total number in Table 5. These two totals usually differ. Each value in Table 5 is therefore multiplied by $S_h/S_k$ so that both tables have the same total, while the values in Table 4 are used as they are.
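The rescaling by $S_h/S_k$ can be sketched as follows. The sketch assumes the pair counts are held in dictionaries keyed by (first phoneme, second phoneme) tuples. The function name `normalize_katakana` and the example numbers are hypothetical.

```python
def normalize_katakana(hiragana: dict, katakana: dict) -> dict:
    """Scale katakana pair counts by S_h / S_k so both tables have the same total."""
    s_h = sum(hiragana.values())  # S_h: total pair occurrences in Table 4
    s_k = sum(katakana.values())  # S_k: total pair occurrences in Table 5
    if s_k == 0:
        raise ValueError("Table 5 has no pair occurrences")
    scale = s_h / s_k
    return {pair: count * scale for pair, count in katakana.items()}


# Hypothetical totals: S_h = 10 and S_k = 40, so each katakana count is multiplied by 0.25.
hira = {("A", "I"): 6, ("I", "U"): 4}
kata = {("U", "a"): 30, ("A", "I"): 10}
print(normalize_katakana(hira, kata))  # {('U', 'a'): 7.5, ('A', 'I'): 2.5}
```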

Let $H_{ij}$ be the value (number of occurrences) at row $i$, column $j$ of Table 4. Let $K_{ij}$ be the value (number of occurrences) at row $i$, column $j$ of Table 5, multiplied by $S_h/S_k$. The value (relative frequency) $R_{ij}$ at row $i$, column $j$ of the frequency table 160 in FIG. 3 is then obtained from $H_{ij}$ and $K_{ij}$ as follows.

(i) When $H_{ij} \neq 0$ and $K_{ij} \neq 0$: $R_{ij} = R_c \times \log(K_{ij}/H_{ij})$.

Here $R_c$ is a weight constant used in the actual calculation. It is normally 1 and always takes the same value within one relative frequency table 160. If the calculation truncates digits after the decimal point, $R_c$ is set large to reduce the truncation error. The logarithm is the natural logarithm, with base $e = 2.718\ldots$.

(ii) When $H_{ij} = 0$ and $K_{ij} = 0$: $R_{ij} = 0$.

(iii) When $H_{ij} \neq 0$ and $K_{ij} = 0$: $R_{ij} = -\mathrm{Max}$.

Here Max is the largest absolute value that can be handled, so $-\mathrm{Max}$ is the smallest value that can be handled.

In this case, the phoneme pair appears in hiragana words but never in katakana words. In the frequency table 160 of FIG. 3, connections that appear frequently in katakana words have large positive values, and connections that appear frequently in hiragana words have large negative values. $R_{ij}$ here therefore takes the negative value of largest absolute value that can be handled.

(iv) When $H_{ij} = 0$ and $K_{ij} \neq 0$: $R_{ij} = +\mathrm{Max}$.

In this case, the phoneme pair appears in katakana words but never in hiragana words, so $R_{ij}$ takes the largest positive value that can be handled.

With the frequency table 160 of FIG. 3 built as above, the next step is to determine the character type of the phoneme string 「AIDENNTEiHUaI」 using that table. First, look up the value (relative frequency) in the frequency table 160 for every adjacent phoneme pair in 「AIDENNTEiHUaI」. The results are shown in Table 6 below.

Table 6: Relative frequency values of each phoneme pair in the phoneme string 「AIDENNTEiHUaI」 (table not reproduced)

The values in the right-hand column of Table 6, that is, the relative frequencies of the phoneme pairs, sum to +433. Because the total is positive, the character type of the phoneme string 「AIDENNTEiHUaI」 is determined to be katakana.

The judgment degree expresses how likely the determination is to be correct. It is defined as follows, where $R_p$ is the relative frequency of adjacent phoneme pair $p$ and $T = \sum_p R_p$ is the total:

$$\text{judgment degree} = \frac{\sum_{p:\ \operatorname{sgn} R_p = \operatorname{sgn} T} |R_p|}{\sum_{p} |R_p|}$$

Applied to Table 6, this gives a judgment degree of 95% (= 457/481). The phoneme string 「AIDENNTEiHUaI」 is therefore determined to be katakana with a judgment degree of 95%.

Determining means 150 sends this result to conversion means 170. Based on the result, conversion means 170 converts the reading string 「あいでんてぃふぁい」 stored in the reading string buffer 110 into katakana and sends it to the character string buffer 180. Output means 30 then displays it in katakana as 「アイデンティファイ」 ("identify").
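Cases (i) to (iv) for $R_{ij}$, the sign test, and the judgment degree (the share of the total absolute value carried by pair values with the same sign as the total) fit together in the sketch below. Several choices are assumptions:
- The hiragana counts $H_{ij}$ and the rescaled katakana counts $K_{ij}$ are dictionaries keyed by phoneme pairs.
- `MAX = 32767` (a 16-bit signed limit) is a hypothetical choice for Max.
- Pairs absent from the table count as 0, as in case (ii).
- A zero total returns "undecided", a case the text does not cover.

```python
import math

MAX = 32767  # hypothetical "largest absolute value that can be handled"


def relative_frequency(h: float, k: float, rc: float = 1.0) -> float:
    """R_ij from H_ij (hiragana) and K_ij (rescaled katakana)."""
    if h != 0 and k != 0:
        return rc * math.log(k / h)  # case (i), natural logarithm
    if h == 0 and k == 0:
        return 0.0                   # case (ii)
    if k == 0:
        return -MAX                  # case (iii): seen only in hiragana words
    return MAX                       # case (iv): seen only in katakana words


def build_table(hiragana: dict, katakana: dict, rc: float = 1.0) -> dict:
    """Relative frequency for every pair seen in either table."""
    pairs = set(hiragana) | set(katakana)
    return {p: relative_frequency(hiragana.get(p, 0), katakana.get(p, 0), rc) for p in pairs}


def judgment_degree(values: list) -> float:
    """Share of the total |R| carried by values whose sign matches the total's sign."""
    total = sum(values)
    denom = sum(abs(v) for v in values)
    if total == 0 or denom == 0:
        return 0.0
    return sum(abs(v) for v in values if v * total > 0) / denom


def judge(phonemes: str, table: dict) -> tuple:
    """Sum adjacent-pair values: positive -> katakana, negative -> hiragana."""
    values = [table.get(pair, 0.0) for pair in zip(phonemes, phonemes[1:])]
    total = sum(values)
    if total > 0:
        kind = "katakana"
    elif total < 0:
        kind = "hiragana"
    else:
        kind = "undecided"
    return kind, judgment_degree(values)


# Aggregated same-sign and opposite-sign sums implied by Table 6 (457 and 24).
print(round(judgment_degree([457, -24]), 2))  # 0.95
```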

The process above determined the reading string 「あいでんてぃふぁい」 to be katakana and wrote it in katakana. Next comes the process by which the reading string 「まどろっこしい」 (madorokkoshii) is determined to be hiragana.

First, the reading string 「まどろっこしい」 is converted into a phoneme string, as shown in Table 7.

Table 7: Correspondence between the reading string 「まどろっこしい」 and its phoneme string
ま → MA, ど → DO, ろ → RO, っ → K, こ → KO, し → SI, い → I

This yields the phoneme string 「MADOROKKOSII」. Next, the relative frequency values of all its phoneme pairs are looked up, as shown in Table 8.

Table 8: Relative frequency values of each phoneme pair in the phoneme string 「MADOROKKOSII」 (table not reproduced)

The values in the right-hand column of Table 8, that is, the relative frequencies of the phoneme pairs, sum to -38. The character type of the phoneme string 「MADOROKKOSII」 is therefore determined to be hiragana, with a judgment degree of 64% (= 85/132). This embodiment can thus convert even words (reading strings) that are not registered in the dictionary into the correct character type.
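The two reported results agree with the judgment-degree definition, in which the degree is the sum of absolute values sharing the total's sign divided by the sum of all absolute values. Write $P$ for the sum of absolute values sharing the total's sign and $N$ for the rest. The total is then $\pm(P - N)$ and the degree is $P/(P+N)$. Solving from the reported figures:

```latex
\begin{aligned}
\text{AIDENNTEiHUaI:}\quad & P + N = 481,\ P = 457 \;\Rightarrow\; N = 24,\quad T = +(457 - 24) = +433,\quad \tfrac{457}{481} \approx 0.950\\
\text{MADOROKKOSII:}\quad & P + N = 132,\ P = 85 \;\Rightarrow\; N = 47,\quad T = -(85 - 47) = -38,\quad \tfrac{85}{132} \approx 0.644
\end{aligned}
```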

FIG. 4 shows another embodiment. It can proofread (verify) the character types of words in text that has already undergone kana-to-kanji conversion, or in text produced by a machine translator.

In FIG. 4, text memory 210 stores such text, for example text after kana-to-kanji conversion or machine-translated text. Character string extraction means 220 extracts the character strings in that text by character type. Each extracted string is stored in a character string buffer 230, and its character type in a character type buffer 240. Reading string conversion means 250 then converts the string in buffer 230 into a reading string. For this it refers to a reading table 260, which maps character strings to reading strings.
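The text does not specify how character string extraction means 220 works. One plausible sketch splits text into runs of a single character type using Unicode block ranges: hiragana U+3041 to U+309F, katakana U+30A0 to U+30FF, and CJK unified ideographs U+4E00 to U+9FFF. The ranges, labels, and function names are assumptions, not taken from the patent.

```python
def char_type(ch: str) -> str:
    """Classify one character by Unicode block (assumed ranges)."""
    code = ord(ch)
    if 0x3041 <= code <= 0x309F:
        return "hiragana"
    if 0x30A0 <= code <= 0x30FF:
        return "katakana"
    if 0x4E00 <= code <= 0x9FFF:
        return "kanji"
    return "other"


def extract_runs(text: str) -> list:
    """Split text into (string, character type) runs, as held in buffers 230 and 240."""
    runs = []
    for ch in text:
        kind = char_type(ch)
        if runs and runs[-1][1] == kind:
            runs[-1] = (runs[-1][0] + ch, kind)
        else:
            runs.append((ch, kind))
    return runs


print(extract_runs("漢字とカナ"))  # [('漢字', 'kanji'), ('と', 'hiragana'), ('カナ', 'katakana')]
```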

As in the embodiment of FIG. 1, decomposition means 120 decomposes (converts) the reading string obtained by reading string conversion means 250 into a phoneme string. Determining means 150 refers to the relative frequency table 160 to determine the character type of the phoneme string. It then sends the result to comparison means 170.

Comparison means 170 compares the determination result with the character type information stored in the character type buffer 240. It sends the comparison result to an output buffer 280, and the output means displays it. For example, if the determination result differs from the stored character type information, the character string can be highlighted and the character type indicated by the determination result displayed alongside it.
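The comparison step of FIG. 4 can be sketched as a filter over extracted runs. The reading conversion and judgment steps are passed in as callables. The katakana-to-hiragana helper, which applies a fixed Unicode offset, stands in for reading table 260 for kana only. Skipping kanji runs is an assumption of this sketch, because the frequency table described here distinguishes only hiragana from katakana. The names `proofread`, `kata_to_hira`, and `fake_judge` are hypothetical.

```python
from typing import Callable


def kata_to_hira(text: str) -> str:
    """Katakana U+30A1..U+30F6 -> hiragana by subtracting 0x60 (kana-only reading)."""
    return "".join(chr(ord(c) - 0x60) if 0x30A1 <= ord(c) <= 0x30F6 else c for c in text)


def proofread(runs: list, to_reading: Callable[[str], str], judge: Callable[[str], str]) -> list:
    """Return (string, stored type, judged type) for every kana run whose types differ."""
    flagged = []
    for text, stored in runs:
        if stored not in ("hiragana", "katakana"):
            continue  # assumption: only kana runs are checked in this sketch
        judged = judge(to_reading(text))
        if judged != stored:
            flagged.append((text, stored, judged))  # candidate for highlighting
    return flagged


runs = [("マドロッコシイ", "katakana"), ("アイデンティファイ", "katakana")]
fake_judge = lambda reading: "hiragana" if reading == "まどろっこしい" else "katakana"
print(proofread(runs, kata_to_hira, fake_judge))  # [('マドロッコシイ', 'katakana', 'hiragana')]
```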

The correspondence between readings and phonemes is not limited to the phoneme conversion table of FIG. 2. For example, the correspondence shown in the phoneme conversion table of FIG. 5 may be used.

In the embodiment above, the frequency table 160 gives appearance frequencies for all phoneme connection forms. Instead, it may cover only the connection forms that are major factors in determining the character type. The determination itself may also stop early: rather than always examining every connection form in the phoneme string, it may finish as soon as a phoneme sequence highly characteristic of some character type is found. Furthermore, the invention is not limited to Japanese. It can also be applied to document processing for other languages that have multiple character types.
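The early-termination variant can be sketched by stopping as soon as a pair's value reaches ±Max, treating such a pair as "highly characteristic". Several points are assumptions: this reading of the text, the hypothetical `MAX` value, the function name `judge_early`, and counting pairs missing from a partial table as 0.

```python
MAX = 32767  # hypothetical "largest absolute value that can be handled"


def judge_early(phonemes: str, table: dict) -> str:
    """Sum pair values, but stop at the first pair whose value reaches +/-MAX."""
    total = 0.0
    for pair in zip(phonemes, phonemes[1:]):
        value = table.get(pair, 0.0)  # pairs absent from a partial table count as 0
        if abs(value) >= MAX:
            return "katakana" if value > 0 else "hiragana"
        total += value
    if total > 0:
        return "katakana"
    if total < 0:
        return "hiragana"
    return "undecided"


print(judge_early("AIUa", {("A", "I"): -1.0, ("U", "a"): MAX}))  # katakana
```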

F. Effects of the Invention

As described above, the present invention makes it possible to determine the correct character type of a reading string that is not registered in a dictionary.

[Brief explanation of drawings]

FIG. 1 is a block diagram showing the configuration of an embodiment of the document processing device according to the present invention. FIG. 2 is a chart showing the phoneme conversion table of that embodiment. FIG. 3 is a chart showing the relative frequency table of that embodiment. FIG. 4 is a block diagram showing the configuration of another embodiment. FIG. 5 is a chart showing another phoneme conversion table.

Claims

[Claims]

(1) A document processing method including a process of converting a string of readings into a string of phonemes and determining the character type of the string of readings based on the converted string of phonemes.

(2) A document processing device comprising: input means for inputting a reading string; decomposition means for decomposing the reading string into a phoneme string; determining means for determining the character type of the reading string based on the phoneme string; conversion means for converting the reading string into the character type indicated by the determining means; and output means for displaying the conversion result.

(3) The document processing device according to claim 2, wherein the determining means determines the character type based on the connection form of at least some of the phonemes forming the phoneme string.

(4) The document processing device according to claim 2 or 3, wherein the determining means refers to a frequency table representing the relationship between phoneme connection forms and the frequency with which each connection form appears in each character type.

(5) A document processing device comprising: reading string conversion means for converting a character string into a reading string; decomposition means for decomposing the reading string into a phoneme string; determining means for determining the character type of the reading string based on the phoneme string; comparison means for comparing the character type indicated by the determining means with the character type of the character string; and output means for displaying the comparison result.