JPH03276252A - Document treatment system and document treatment equipment - Google Patents
Info
- Publication number
- JPH03276252A
- Authority
- JP
- Japan
- Prior art keywords
- string
- phoneme
- character
- reading
- character type
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Document Processing Apparatus (AREA)
Abstract
Description
[Detailed Description of the Invention]
A. Industrial Field of Application
The present invention relates to a document processing apparatus having a function for determining the correct character type of an input reading string, that is, for determining whether the input reading string should properly be written in kanji, in katakana, in hiragana, or the like.
B. Prior Art
In a conventional document processing apparatus, when the kanji, katakana, or the like corresponding to an input reading string (usually a hiragana string) is not registered in the dictionary, the reading string typically either stays exactly as it was input no matter how many times the conversion key is pressed, or is split into inappropriately small pieces and converted into an inappropriate combination of kanji (ateji, characters chosen only for their sound). In other words, it has been difficult to correctly convert words that are not registered in the dictionary (unknown words). In particular, katakana is widely used not only for place names and personal names but also for vocabulary that grows day by day, such as loanwords and scientific and technical terms, so katakana words often cannot all be registered in a dictionary. Consequently, when an input reading string should properly be converted into katakana, a conventional apparatus either leaves the reading string as input, converts it only into ateji, or requires a dedicated katakana conversion operation separate from the normal conversion key operation.
Moreover, determining the correct character type of an input reading string is desirable for appropriate document processing in general, not only when converting to katakana.
C. Problems to Be Solved by the Invention
An object of the present invention is to provide a document processing method and apparatus having a function capable of correctly determining the character type even of a reading string that is not registered in a dictionary.
D. Means for Solving the Problems
The present inventor found that hiragana words have a sound that feels hiragana-like and katakana words have a sound that feels katakana-like. That is, each character type has characteristic sounds and ways of connecting sounds that mark it as that character type, and consequently, when a reading string is converted into a phoneme string, features of the character type can be recognized in the converted phoneme string. The present invention was accomplished on the basis of this finding.
The document processing method according to the present invention achieves the above object by converting a reading string into a phoneme string and determining the character type of the reading string on the basis of the converted phoneme string.
A first document processing apparatus according to the present invention comprises input means for inputting a reading string, decomposition means for decomposing the input reading string into a phoneme string, determination means for determining the character type of the reading string on the basis of the phoneme string, and conversion means for converting the reading string into a specific character type in accordance with the determination means. It thereby converts the input reading string into notation in the correct character type.
A second document processing apparatus according to the present invention comprises reading string conversion means for converting a character string into a reading string, decomposition means for decomposing the reading string into a phoneme string, determination means for determining the character type of the reading string on the basis of the phoneme string, comparison means for comparing the character type indicated by the determination means with the character type of the character string, and output means for displaying the comparison result. It thereby checks, for a word already written in some character type, whether that character type is correct.
E. Embodiments
Embodiments of the present invention are described below with reference to the drawings.
FIG. 1 shows one embodiment of the document processing apparatus according to the present invention.
In the figure, input means 20 is a keyboard, a handwriting input tablet, or the like for inputting a reading string, and the input reading string is stored in reading string buffer 110. The reading string stored in reading string buffer 110 is decomposed into a phoneme string by decomposition means 120, and the resulting phoneme string is stored in phoneme string buffer 140. Decomposition means 120 decomposes (converts) the reading string into a phoneme string by referring to phoneme conversion table 130.
The phoneme string is sent to determination means 150, which determines the character type of the phoneme string by referring to frequency table 160 and sends the determination result to conversion means 170. Conversion means 170 converts the reading string stored in reading string buffer 110 into a specific character type in accordance with the determination result and sends the conversion result to character string buffer 180. The contents of character string buffer 180 are displayed by output means 30, such as a CRT or LCD.
Next, this embodiment is described in more detail, taking as an example the case where the reading string is 「あいでんてぃふぁい」 (aidentifai).
The reading string 「あいでんてぃふぁい」 input through input means 20 is stored in reading string buffer 110 in a predetermined character code, and the stored reading string is decomposed into a phoneme string by decomposition means 120.
FIG. 2 shows phoneme conversion table 130, to which decomposition means 120 refers. When phoneme conversion table 130 is used, the correspondence between the reading string 「あいでんてぃふぁい」 and its phoneme string is as shown in Table 1 below.
Table 1: Correspondence between the reading string 「あいでんてぃふぁい」 and its phoneme string
That is, the reading string 「あいでんてぃふぁい」 is decomposed (converted) by decomposition means 120 into the phoneme string "AIDENNTEiHUaI". The phoneme string "AIDENNTEiHUaI" in phoneme string buffer 140 becomes the target of character type determination by determination means 150, which determines its character type by referring to frequency table 160.
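The decomposition step can be illustrated with a minimal Python sketch. The table below is a hypothetical excerpt, since phoneme conversion table 130 (FIG. 2) is not reproduced in the text; it lists only the kana needed for the two readings used as examples in this description. The rule that small っ repeats the first phoneme of the following kana is an assumption inferred from the phoneme string "MADOROKKOSII", and the names `KANA_TO_PHONEMES` and `to_phonemes` are illustrative only.

```python
# Hypothetical excerpt of a kana-to-phoneme table (FIG. 2 is not available),
# covering only the kana that occur in the two example readings.
KANA_TO_PHONEMES = {
    "あ": "A", "い": "I", "で": "DE", "ん": "NN", "て": "TE",
    "ぃ": "i", "ふ": "HU", "ぁ": "a",
    "ま": "MA", "ど": "DO", "ろ": "RO", "こ": "KO", "し": "SI",
}
SOKUON = "っ"  # assumed to double the first phoneme of the next kana


def to_phonemes(reading: str) -> str:
    """Convert a hiragana reading string into a phoneme string."""
    out = []
    for idx, kana in enumerate(reading):
        if kana == SOKUON:
            if idx + 1 >= len(reading):
                raise ValueError("small tsu cannot end a reading")
            # Repeat the first phoneme of the following kana.
            out.append(KANA_TO_PHONEMES[reading[idx + 1]][0])
        else:
            out.append(KANA_TO_PHONEMES[kana])
    return "".join(out)


print(to_phonemes("あいでんてぃふぁい"))  # AIDENNTEiHUaI
```

A real table would have to cover every kana; an unlisted kana raises `KeyError` in this sketch.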
FIG. 3 shows frequency table 160. All phonemes are arranged along both the vertical axis and the horizontal axis of frequency table 160. The cell at the intersection of the row for a given vertical-axis phoneme and the column for a given horizontal-axis phoneme holds a relative frequency with respect to character type: it indicates how often the connection in which the vertical-axis phoneme is followed by the horizontal-axis phoneme appears in words that should be written in katakana, or in words that should be written in hiragana. In other words, frequency table 160 expresses numerically the degree of katakana-likeness or hiragana-likeness of every combination (connection form) of two phonemes.
Frequency table 160 of FIG. 3 is now described further.
Frequency table 160 is created as follows.
First, for each hiragana word and each katakana word in existing documents, the number of occurrences is counted, and the reading string of each word is converted into a phoneme string by referring to the phoneme conversion table of FIG. 2. Tables 2 and 3 below show examples of the numbers of occurrences of hiragana words and of katakana words, respectively, in the existing documents referred to.
Table 2: Example of the numbers of occurrences of hiragana words
Table 3: Example of the numbers of occurrences of katakana words
From Tables 2 and 3 thus obtained, the number of occurrences of each phoneme pair appearing in hiragana words and the number of occurrences of each phoneme pair appearing in katakana words are counted. For example, to obtain from Table 2 the number of times the phoneme pair "AI" appears in hiragana words: the number of times it appears in "AAIU" in the second row of Table 2 is 1, and the number of times it appears in "AI" in the third row is 2, so the total of 3 is the required number of occurrences. Tables 4 and 5 below show the results of performing the same operation on a suitable number of existing documents. Table 4 gives the numbers of occurrences of phoneme pairs in hiragana words, and Table 5 gives those in katakana words.
Table 4: Example of the numbers of occurrences of phoneme pairs in hiragana words
Table 5: Example of the numbers of occurrences of phoneme pairs in katakana words
These tables show, for example, that the phoneme pair "Ua" (the connection form in which "U" comes first and "a" follows) appears only once in hiragana words but as many as 2239 times in katakana words.
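The pair counting described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the procedure as implemented in the apparatus: it assumes each letter of a phoneme string is one phoneme and that each pair is weighted by the number of occurrences of the word containing it, which is one reading of the "AI" example (1 + 2 = 3). The function name and the two-row word list are hypothetical stand-ins for Table 2.

```python
from collections import Counter


def count_phoneme_pairs(word_counts: dict[str, int]) -> Counter:
    """Count adjacent phoneme pairs, weighting each word by its occurrences."""
    pairs = Counter()
    for phonemes, occurrences in word_counts.items():
        for first, second in zip(phonemes, phonemes[1:]):
            pairs[first + second] += occurrences
    return pairs


# Hypothetical stand-in for two rows of Table 2: phoneme string -> word count.
hiragana_words = {"AAIU": 1, "AI": 2}
hiragana_pairs = count_phoneme_pairs(hiragana_words)
print(hiragana_pairs["AI"])  # 3
```

Running the same function over the katakana word list would give the counterpart of Table 5.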
To determine the character type (hiragana or katakana) of the phoneme string stored in phoneme string buffer 140, Tables 4 and 5 thus obtained could be used as they are. In this embodiment, however, frequency table 160 (FIG. 3) is built from Tables 4 and 5 and used instead.
Next, the procedure for building frequency table 160 from Tables 4 and 5 is described.
Let $S_h$ be the total number of occurrences of the phoneme pairs shown in Table 4, and let $S_k$ be the total number of occurrences of the phoneme pairs shown in Table 5. The values of $S_h$ and $S_k$ normally differ. Therefore, each value in Table 5 is multiplied by $S_h/S_k$ so that the total number of phoneme-pair occurrences is the same for Table 4 and Table 5. The values in Table 4 are used as they are.
Let $H_{ij}$ be the value (number of occurrences) at the intersection of row $i$ and column $j$ of Table 4, let $K_{ij}$ be the value (number of occurrences) at the intersection of row $i$ and column $j$ of Table 5 multiplied by $S_h/S_k$, and let $R_{ij}$ be the value (relative frequency) at the intersection of row $i$ and column $j$ of frequency table 160 of FIG. 3. Then $R_{ij}$ is obtained from $H_{ij}$ and $K_{ij}$ as follows.
(i) When $H_{ij} \neq 0$ and $K_{ij} \neq 0$: $R_{ij} = R_c \times \log(K_{ij}/H_{ij})$.
Here $R_c$ is a weighting constant for practical calculation and is normally 1. It always takes the same value within one relative frequency table 160. When the calculation truncates fractional parts, $R_c$ is made large to reduce the truncation error. The base of the logarithm is the base of the natural logarithm ($e = 2.718\ldots$).
(ii) When $H_{ij} = 0$ and $K_{ij} = 0$: $R_{ij} = 0$.
(iii) When $H_{ij} \neq 0$ and $K_{ij} = 0$: $R_{ij} = -\mathrm{Max}$.
Here $\mathrm{Max}$ is the largest absolute value that can be handled, so $-\mathrm{Max}$ is the smallest value that can be handled.
In this case the phoneme pair appeared in hiragana words but never in katakana words. Frequency table 160 of FIG. 3 represents pairs that appear frequently in katakana words by large positive values and pairs that appear frequently in hiragana words by negative values of large magnitude, so in this case the negative value of the largest magnitude that can be handled is taken as the value of $R_{ij}$.
(iv) When $H_{ij} = 0$ and $K_{ij} \neq 0$: $R_{ij} = +\mathrm{Max}$.
In this case the phoneme pair appeared in katakana words but never in hiragana words. Accordingly, the largest positive value that can be handled is taken as the value of $R_{ij}$.
Frequency table 160 of FIG. 3 is obtained as described above. Next, the process of determining the character type of the phoneme string "AIDENNTEiHUaI" by referring to relative frequency table 160 is described. First, looking up the values (relative frequencies) in frequency table 160 for all adjacent phoneme pairs in the phoneme string "AIDENNTEiHUaI" gives Table 6 below.
Table 6: Relative frequency value of each phoneme pair of the phoneme string "AIDENNTEiHUaI"
The sum of the values in the right-hand column of Table 6, that is, of the relative frequency values of the phoneme pairs, is +433. The character type of the phoneme string "AIDENNTEiHUaI" is therefore determined to be katakana. The degree to which the determination result is expected to be correct is defined as the determination degree, as follows:
determination degree = (sum of the relative frequency values having the same sign as the total) / (sum of the absolute values of all relative frequency values)
Applied to Table 6, this gives a determination degree of 95% (= 457/481). The phoneme string "AIDENNTEiHUaI" is thus determined to be katakana with a determination degree of 95%. The determination result is sent from determination means 150 to conversion means 170, which converts the reading string 「あいでんてぃふぁい」 stored in reading string buffer 110 into the katakana character type on the basis of the determination result and sends it to character string buffer 180. Output means 30 then displays it in katakana as 「アイデンティファイ」 ("identify").
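The construction of frequency table 160, that is, the scaling of the katakana counts by Sh/Sk followed by cases (i) to (iv), can be sketched in Python as below. This is a minimal illustration, not the patented implementation: the function name, the dictionary representation keyed by phoneme pair, and the default `max_value` of 1000.0 are assumptions, and the counts in the usage lines are invented toy numbers rather than values from Tables 4 and 5.

```python
import math


def relative_frequency_table(hira, kata, rc=1.0, max_value=1000.0):
    """Build pair -> relative frequency from raw pair counts.

    hira and kata map a phoneme pair to its raw count in hiragana and
    katakana words. max_value stands in for "Max" (hypothetical default).
    """
    s_h = sum(hira.values())
    s_k = sum(kata.values())
    scale = s_h / s_k if s_k else 0.0  # equalise the two totals
    table = {}
    for pair in set(hira) | set(kata):
        h = hira.get(pair, 0)
        k = kata.get(pair, 0) * scale
        if h and k:
            table[pair] = rc * math.log(k / h)  # case (i), natural log
        elif not h and not k:
            table[pair] = 0.0                   # case (ii)
        elif h:
            table[pair] = -max_value            # case (iii): hiragana only
        else:
            table[pair] = max_value             # case (iv): katakana only
    return table


# Toy counts (invented): totals are 20 and 40, so katakana counts are halved.
toy_hira = {"AI": 4, "SI": 16, "NN": 0}
toy_kata = {"AI": 16, "Ua": 24, "NN": 0}
toy_table = relative_frequency_table(toy_hira, toy_kata)
```

With these toy counts the scaled katakana count for "AI" is 8, so its relative frequency is ln(8/4) = ln 2, a mildly katakana-like value.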
The above described the process by which the character type of the reading string 「あいでんてぃふぁい」 is determined to be katakana and the string is written in katakana. Next, the process by which the character type of the reading string 「まどろっこしい」 (madorokkoshii) is determined to be hiragana is described.
First, converting the reading string 「まどろっこしい」 into a phoneme string gives Table 7.
Table 7: Correspondence between the reading string 「まどろっこしい」 and its phoneme string
The phoneme string "MADOROKKOSII" is thus obtained. Next, obtaining the relative frequency values of all phoneme pairs of the phoneme string "MADOROKKOSII" gives Table 8.
Table 8: Relative frequency value of each phoneme pair of the phoneme string "MADOROKKOSII"
The sum of the values in the right-hand column of Table 8, that is, of the relative frequency values of the phoneme pairs, is −38. The character type of the phoneme string "MADOROKKOSII" is therefore determined to be hiragana, with a determination degree of 64% (= 85/132). Thus, according to this embodiment, even words (reading strings) that are not registered in the dictionary can be converted into the correct character type.
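The determination step used in both worked examples can be sketched in Python as follows. The sketch assumes that the determination degree is the sum of the values sharing the sign of the total divided by the sum of the absolute values of all values, which is consistent with the quoted figures 457/481 and 85/132. Because the per-pair values of Tables 6 and 8 are not reproduced in the text, the usage lines feed in only the positive and negative subtotals implied by those figures (457 and −24, 47 and −85). The function names, the treatment of unlisted pairs as 0.0, and the "undetermined" result for a zero total are assumptions of this sketch.

```python
def pair_values(phonemes, table):
    """Look up the relative frequency of every adjacent phoneme pair."""
    # Pairs missing from the table contribute 0.0 (assumption of this sketch).
    return [table.get(a + b, 0.0) for a, b in zip(phonemes, phonemes[1:])]


def determine(values):
    """Return (character type, determination degree) for pair values."""
    total = sum(values)
    positive = sum(v for v in values if v > 0)
    negative = -sum(v for v in values if v < 0)
    if total > 0:
        return "katakana", positive / (positive + negative)
    if total < 0:
        return "hiragana", negative / (positive + negative)
    return "undetermined", 0.0  # zero total is not covered by the source


kind, degree = determine([457, -24])   # subtotals implied for Table 6
print(kind, round(degree * 100))       # katakana 95
kind, degree = determine([47, -85])    # subtotals implied for Table 8
print(kind, round(degree * 100))       # hiragana 64
```

In a complete pipeline the list passed to `determine` would come from `pair_values` applied to the phoneme string and the frequency table.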
FIG. 4 shows another embodiment. This embodiment can proofread (verify) the character types of words in text that has already undergone kana-kanji conversion or in text translated by an automatic translation machine.
In FIG. 4, text memory 210 stores text that has already undergone kana-kanji conversion, text translated by an automatic translation machine, or the like. Character strings in the text in text memory 210 are extracted by character type by character string extraction means 220. Each extracted character string is stored in character string buffer 230, and its character type is stored in character type buffer 240. The character string stored in character string buffer 230 is converted into a reading string by reading string conversion means 250, which refers to reading table 260 showing the correspondence between character strings and reading strings.
As in the embodiment of FIG. 1, the reading string obtained by reading string conversion means 250 is decomposed (converted) into a phoneme string by decomposition means 120, and determination means 150 determines the character type of the phoneme string by referring to relative frequency table 160 and sends the determination result to comparison means 170.
Comparison means 170 compares the determination result with the character type information stored in character type buffer 240 and sends the comparison result to output buffer 280, and the output means displays the comparison result. One possible display method is, when the determination result differs from the character type information stored in character type buffer 240, to highlight the character string and also display the character type indicated by the determination result.
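The comparison performed in this second embodiment can be sketched in Python as follows. The sketch assumes that the character type of an already-written string can be read off from Unicode code point ranges (hiragana letters U+3041 to U+3096, katakana letters U+30A1 to U+30FA plus the prolonged sound mark ー), which is a modern stand-in for character type buffer 240. The predicted type is passed in as a plain string, standing in for the output of determination means 150, and the names `script_of` and `proofread` are hypothetical.

```python
def script_of(text: str) -> str:
    """Return 'hiragana', 'katakana', or 'other' for a string."""
    def kind(ch: str) -> str:
        code = ord(ch)
        if 0x3041 <= code <= 0x3096:
            return "hiragana"
        if 0x30A1 <= code <= 0x30FA or ch == "ー":
            return "katakana"
        return "other"

    kinds = {kind(ch) for ch in text}
    # Mixed or empty strings are reported as 'other'.
    return kinds.pop() if len(kinds) == 1 else "other"


def proofread(word: str, predicted_type: str) -> dict:
    """Compare the written character type with the predicted one."""
    written = script_of(word)
    return {
        "word": word,
        "written_as": written,
        "predicted": predicted_type,
        "highlight": written != predicted_type,  # flag a mismatch
    }


print(proofread("あいでんてぃふぁい", "katakana")["highlight"])  # True
```

A word whose `highlight` entry is true corresponds to a string that would be highlighted together with the predicted character type.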
Note that the correspondence between readings and phonemes is not limited to that shown in the phoneme conversion table of FIG. 2 described above. It may be, for example, the correspondence shown in the phoneme conversion table of FIG. 5.
In the above embodiment, frequency table 160 gives the frequency of occurrence for all connection forms of phonemes, but the frequency table may instead give frequencies only for those connection forms that are major factors in determining the character type. Also, when determining the character type, it is not necessary always to examine the frequencies of all connection forms in the phoneme string under determination. When a phoneme sequence highly characteristic of a certain character type is found, the determination may be completed immediately. Furthermore, the present invention is not limited to Japanese and can also be applied to document processing for other languages that have multiple character types.
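The early-completion variant mentioned above can be sketched in Python as follows. The text does not say how a "highly characteristic" phoneme sequence is recognised, so this sketch assumes that any single pair whose relative frequency reaches a threshold in magnitude is decisive. The threshold `decisive`, its default of 1000.0, and the function name are hypothetical.

```python
def determine_early(values, decisive=1000.0):
    """Classify pair values, stopping at the first decisive pair."""
    total = 0.0
    for value in values:
        if abs(value) >= decisive:
            # One highly characteristic pair settles the character type.
            return "katakana" if value > 0 else "hiragana"
        total += value
    if total > 0:
        return "katakana"
    if total < 0:
        return "hiragana"
    return "undetermined"  # zero total is not covered by the source


print(determine_early([0.5, 1000.0, -3.0]))  # katakana
```

When no pair is decisive, the function falls back to the sign of the total, as in the first embodiment.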
F. Effects of the Invention
As described above, according to the present invention, the correct character type can be determined even for a reading string that is not registered in a dictionary.
FIG. 1 is a block diagram showing the configuration of one embodiment of the document processing apparatus according to the present invention; FIG. 2 is a chart showing the phoneme conversion table of that embodiment; FIG. 3 is a chart showing the relative frequency table of that embodiment; FIG. 4 is a block diagram showing the configuration of another embodiment; and FIG. 5 is a chart showing another phoneme conversion table.
Claims (5)
(1) A document processing method including a process of converting a reading string into a phoneme string and determining the character type of the reading string on the basis of the converted phoneme string.
(2) A document processing apparatus comprising: input means for inputting a reading string; decomposition means for decomposing the reading string into a phoneme string; determination means for determining the character type of the reading string on the basis of the phoneme string; conversion means for converting the reading string into the character type indicated by the determination means; and output means for displaying the conversion result.
(3) The document processing apparatus according to claim (2), wherein the determination means determines the character type on the basis of the connection form of at least some of the phonemes constituting the phoneme string.
(4) The document processing apparatus according to claim (2) or (3), wherein the determination means refers to a frequency table representing the relationship between a connection form of phonemes and the frequency with which that connection form appears in a character type.
(5) A document processing apparatus comprising: reading string conversion means for converting a character string into a reading string; decomposition means for decomposing the reading string into a phoneme string; determination means for determining the character type of the reading string on the basis of the phoneme string; comparison means for comparing the character type indicated by the determination means with the character type of the character string; and output means for displaying the comparison result.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP2075795A JP2669437B2 (en) | 1990-03-27 | 1990-03-27 | Character type conversion method and apparatus |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP2075795A JP2669437B2 (en) | 1990-03-27 | 1990-03-27 | Character type conversion method and apparatus |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| JPH03276252A (en) | 1991-12-06 |
| JP2669437B2 (en) | 1997-10-27 |
Family
ID=13586498
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP2075795A (Expired - Fee Related), JP2669437B2 (en) | 1990-03-27 | 1990-03-27 | Character type conversion method and apparatus |
Country Status (1)
| Country | Link |
|---|---|
| JP (1) | JP2669437B2 (en) |
Citations (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JPS6091442A (en) * | 1983-10-25 | 1985-05-22 | Toshiba Corp | Sentence producing device |
| JPS61120276A (en) * | 1984-11-16 | 1986-06-07 | Sanyo Electric Co Ltd | Katakana-to-hiragana converter |
| JPS62130458A (en) * | 1985-11-30 | 1987-06-12 | Fujitsu Ltd | Kana to kanji conversion processing system |
| JPS62145462A (en) * | 1985-12-20 | 1987-06-29 | Matsushita Electric Ind Co Ltd | Japanese sentence input device |
| JPS6453265A (en) * | 1987-08-24 | 1989-03-01 | Nec Corp | Roman character and chinese character conversion device |
| JPS6452070U (en) * | 1987-09-21 | 1989-03-30 | ||
| JPH01241671A (en) * | 1988-03-23 | 1989-09-26 | Nec Corp | Alphabet/kana converting system |
Also Published As
| Publication number | Publication date |
|---|---|
| JP2669437B2 (en) | 1997-10-27 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US5161105A (en) | Machine translation apparatus having a process function for proper nouns with acronyms | |
| EP0932871A1 (en) | Methods and apparatus for translating between languages | |
| JPH0261763A (en) | Mechanical translation equipment | |
| JPH03276252A (en) | Document treatment system and document treatment equipment | |
| JPS6190269A (en) | Translation system | |
| KR100984293B1 (en) | Hanagog system, Hanja translation system and conversion method for world common language | |
| JP3387421B2 (en) | Word input support device and word input support method | |
| JPH05151256A (en) | Machine translation method and its system | |
| JPH0544056B2 (en) | ||
| JPS61134866A (en) | Japanese word analyzer | |
| JPH0350668A (en) | character processing device | |
| JP2655922B2 (en) | Machine translation equipment | |
| JPH03129568A (en) | Document processor | |
| JPH05181854A (en) | Sentence proofreading device, sentence processing device, and kanji-kana conversion device | |
| JP4061001B2 (en) | Machine translation device | |
| JPS6190268A (en) | Translation system | |
| JPS59153232A (en) | Character converter | |
| JPH01112367A (en) | Machine translator | |
| JPH03222060A (en) | Japanese reading system | |
| JPH02288971A (en) | Information processing system provided with japanese language processing function | |
| JPS6365566A (en) | 'kana' to 'kanji' converter | |
| JPH032960A (en) | Kana/kanji converting device | |
| JPS6298456A (en) | Japanese language input device | |
| JPH061468B2 (en) | Japanese sentence proofreading device | |
| JPH08202720A (en) | Machine translation device |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| LAPS | Cancellation because of no payment of annual fees |