JPH0640352B2

JPH0640352B2 - Character recognition device

Info

Publication number: JPH0640352B2
Application number: JP59281244A
Authority: JP
Inventors: 由明黒沢
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1984-12-29
Filing date: 1984-12-29
Publication date: 1994-05-25
Anticipated expiration: 2009-05-25
Also published as: JPS61160182A

Description

[Detailed Description of the Invention]

[Technical Field of the Invention]

The present invention relates to a character recognition device, or a speech recognition device, that can simply and effectively recognize a word of n characters entered, for example, by handwriting or by voice.

[Technical Background of the Invention and Its Problems]

Character recognition is basically performed by detecting the features of each character. When a word of n characters is to be recognized, however, recognizing each character individually and combining the results is often not enough: if one character is recognized poorly, the word itself becomes hard to recognize. Conventionally, therefore, a word matching unit 3 is provided as shown in FIG. 4, and a word dictionary storing meaningful words is searched so that the word can be recognized even when individual characters are recognized poorly (see Japanese Patent Application No. Sho 56-138163).

In this case, the n characters of the word are first recognized by character recognition unit 1, and the results B_{j,i} (i = 1, ..., n; j = 1, ..., k) are stored in candidate character register 5. Here B_{j,i} is the character code of the j-th candidate character obtained when the character at position i in the word is recognized. Comparator 6 then compares the character codes of a word stored in word dictionary memory 4 with these candidate character codes. When a match is found, match-degree calculation unit 9 calculates the degree of match between that dictionary word and the input word on the basis of the candidate rank.

In this scheme, however, checking for a character code match requires searching candidate character register 5 up to k times (the number of candidates) for every character. This takes time and lengthens processing, and the control circuit is complex, which has been a drawback.
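The conventional comparator search and its cost can be sketched in Python as follows. This is an illustration only: the function name, the list layout standing in for the candidate character register, and the rank-based score `k - j` are assumptions, since the source says only that the match degree is computed from the candidate rank.

```python
def match_by_candidate_search(candidate_register, dictionary_word):
    """Score one dictionary word against the per-position candidate lists.

    candidate_register[i] holds the ranked candidate codes B_{j,i} for
    character position i (best candidate first).
    Returns (score, number_of_comparisons).
    """
    score = 0
    comparisons = 0
    for i, code in enumerate(dictionary_word):
        candidates = candidate_register[i]
        k = len(candidates)
        # The comparator checks the candidates one by one, so up to k
        # comparisons are needed for every character of the word.
        for j, candidate in enumerate(candidates):
            comparisons += 1
            if candidate == code:
                score += k - j  # assumed rank-based score: best rank scores k
                break
    return score, comparisons


register = [["A", "B", "C"], ["X", "Y", "Z"]]
result = match_by_candidate_search(register, "BZ")
miss = match_by_candidate_search(register, "QQ")
```

A word whose characters are not among the candidates costs the full k comparisons per position, which is the overhead the paragraph above describes.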

To remove this drawback, an earlier scheme (see Japanese Patent Application No. Sho 58-071372) calculates the match degree with a circuit like that of FIG. 5 in place of candidate character register 5 and comparator 6 of FIG. 4. In this scheme the character code of word data 13 directly addresses similarity memory 17, and the memory output is sent to match-degree calculation unit 18, which calculates the match degree. For each character code, similarity memory 17 holds, at the address given by that code, the similarity obtained from the recognition result for that character. With this scheme, however, handling character codes with many bits, such as 16-bit kanji codes, makes the similarity memory for a whole word enormous, and the scheme becomes impractical.
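A rough Python sketch of such a directly addressed similarity memory, and of how its size grows with the code width, might look like this. The function names are hypothetical, and the assumption that each character position of the word has its own table is an interpretation of "similarity memory for a whole word", not something the source spells out.

```python
def build_similarity_memory(recognition_result, code_bits):
    """Return a table with one slot per possible character code."""
    memory = [0] * (1 << code_bits)
    for code, similarity in recognition_result:
        memory[code] = similarity  # the character code itself is the address
    return memory


def slots_needed(code_bits, word_length):
    """Slots required if every character position has its own table (assumed)."""
    return (1 << code_bits) * word_length


memory = build_similarity_memory([(0x41, 90), (0x42, 60)], code_bits=8)
small = slots_needed(code_bits=8, word_length=4)
large = slots_needed(code_bits=16, word_length=4)
```

With 8-bit codes a four-character word needs 1,024 slots; with 16-bit codes the same word needs 262,144 slots, almost all of which hold zero.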

[Object of the Invention]

This invention was made in view of these circumstances. Its object is to provide a character recognition device that can compare an input word with a word dictionary simply and at high speed, and so recognize words effectively, even when the character codes have many bits, as kanji codes do.

[Outline of Invention]

In this invention, each character A_i (i = 1, 2, ..., n) of an input word of n characters is recognized. For each character, the similarity S_{k,i} between A_i and each of the L recognition target characters B_k (k = 1, 2, ..., L) registered in the character recognition dictionary is obtained, and S_{k,i} is stored in a second memory, which is a two-dimensional memory. A first memory, divided into a plurality of parts, is loaded with address pointers such that, when the character code of B_k is split and applied to its parts, the pointers indicate where S_{k,i} is stored.

At word recognition time, the character code output by the word dictionary, together with the character position i, is split and applied to the addresses of the first memory. The resulting address and the character position i are applied to the second memory, and its output is sent to the match-degree calculation unit, which calculates the match degree of the word. Recognition candidate words are then selected on the basis of this match degree. This provides a simple means of performing accurate recognition at high speed.

[Effect of the Invention]

As described above, the similarity lookup in the match-degree calculation is performed solely by referring to the first memory and the second memory in series, so the control circuit is simplified. In addition, because the address is applied in divided form, each memory can be small, and recognition processing time is greatly reduced. The invention therefore makes highly effective and practical character recognition possible.
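The size saving can be checked with a few lines of arithmetic. The sketch below counts table slots per character position, using figures taken from the embodiment described in this document (a 16-bit code split into two 8-bit halves, and 4-bit pointers covering up to 15 candidates); the variable names are illustrative, and the slot counts ignore differences in word width between the tables.

```python
CODE_BITS = 16      # kanji code width given in the source
SPLIT_BITS = 8      # embodiment: code split into two 8-bit halves
POINTER_BITS = 4    # embodiment: numbers 1..15, so 4 bits per pointer

direct_slots = 1 << CODE_BITS                   # one slot per possible code
first_memory_slots = 2 * (1 << SPLIT_BITS)      # two tables, one per half
second_memory_slots = 1 << (2 * POINTER_BITS)   # 16 x 16 similarity table
split_slots = first_memory_slots + second_memory_slots
```

Under these figures the divided arrangement needs 768 slots per character position, against 65,536 for a directly addressed table.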

[Embodiment of the Invention]

An embodiment of this invention is described below with reference to the drawings. In the following description the first memory is assumed to be divided into two parts.

FIG. 2 shows the sequence of character codes obtained as the result of character recognition. The first-ranked candidate is the character candidate M_1N_1, and in general the j-th ranked candidate is M_jN_j. Here M_1 is the upper 8 bits of the character code and N_1 is the lower 8 bits. In this example, then, the character code M_jN_j is 16 bits long and is divided into two 8-bit parts.

Next, the numbers 1 to 15 are assigned to M_1 to M_15 (the maximum number of candidates is assumed to be limited to 15). If the same code occurs more than once among M_1 to M_15, the same number is assigned to each occurrence: if M_j = M_k (j < k), both M_j and M_k are assigned the same number J. This number is written JM_j. The same is done for N_j, giving JN_j. The number JM_j is then written into first memory [M] (22 in FIG. 3) at the location addressed by M_j (8 bits). Likewise, the number JN_j is written into the other first memory [N] (23 in FIG. 3) at the location addressed by N_j (8 bits). All remaining locations of the first memories are 0.

The similarity S_j of the j-th candidate M_jN_j is then stored in the second memory at the address (JM_j, JN_j) formed by the pair of numbers JM_j, corresponding to M_j, and JN_j, corresponding to N_j. In other words, when the similarity S_j that the input character A_i is in fact B_{j,i} is laid out in the two-dimensional matrix S_{JM,JN} indexed by (JM_j, JN_j), it is written as shown for second memory 28 in FIG. 3. All remaining locations of second memory 28 are 0. In FIG. 3, 26 and 27 indicate the upper 4 bits and lower 4 bits of the address of second memory 28.

With this arrangement, when a character code M_jN_j is applied to the addresses of first memories [M] and [N] over signal lines 20 and 21, the first memories output the numbers JM_j and JN_j on signal lines 24 and 25 respectively. These are applied to the address of the second memory, which outputs the similarity S_j corresponding to M_jN_j. If a code other than M_jN_j (j = 1, 2, ..., 15) is applied, the second memory outputs 0.
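The table construction and two-stage lookup described above can be sketched in Python as follows. This is a minimal model for a single character position, not the circuit itself: the function names are hypothetical, plain lists stand in for the memories, and the rule that a repeated byte keeps the number of the first candidate that used it is an assumed reading of "the same number J".

```python
def build_memories(candidates):
    """candidates: list of (16-bit code, similarity), best first, at most 15."""
    first_m = [0] * 256    # first memory [M], addressed by the upper byte
    first_n = [0] * 256    # first memory [N], addressed by the lower byte
    second = [[0] * 16 for _ in range(16)]   # second memory, indexed (JM, JN)
    for j, (code, similarity) in enumerate(candidates, start=1):
        upper, lower = code >> 8, code & 0xFF
        # Identical bytes share one number: keep the number of the first
        # candidate that used the byte (assumed tie rule).
        if first_m[upper] == 0:
            first_m[upper] = j
        if first_n[lower] == 0:
            first_n[lower] = j
        second[first_m[upper]][first_n[lower]] = similarity
    return first_m, first_n, second


def lookup(memories, code):
    """Two serial memory references: code halves -> (JM, JN) -> similarity."""
    first_m, first_n, second = memories
    jm = first_m[code >> 8]      # pointer read from first memory [M]
    jn = first_n[code & 0xFF]    # pointer read from first memory [N]
    return second[jm][jn]        # 0 unless the code was a candidate


memories = build_memories([(0x3021, 95), (0x3022, 80), (0x3121, 40)])
hit = lookup(memories, 0x3022)
mixed = lookup(memories, 0x3122)   # halves of two different candidates
```

Because the number 0 is never assigned to a candidate byte, row 0 and column 0 of the second memory stay empty, so a code whose halves come from different candidates, or from no candidate at all, reads back 0.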

A further m bits are added to the addresses of both the first and second memories, and the character position i is applied to these m additional address bits. In this way the similarity S_j is obtained, by the processing described above, for each character position i of a word of n characters.
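One way to picture the extended addresses is the sketch below, in which the position bits sit above the original address bits. That bit placement, the value of m, the choice of JM as the upper 4 bits, the 0-based positions and all names are assumptions made for illustration; the source states only that m bits carrying the character position are added.

```python
M_BITS = 3   # assumed m: enough for words of up to 8 characters


def first_address(position, byte):
    """Address into a first memory: position bits above the 8-bit code half."""
    return (position << 8) | byte


def second_address(position, jm, jn):
    """Address into the second memory: position bits above (JM, JN)."""
    return (position << 8) | (jm << 4) | jn


first_m = [0] * (1 << (M_BITS + 8))
first_n = [0] * (1 << (M_BITS + 8))
second = [0] * (1 << (M_BITS + 8))


def store(position, jm, jn, code, similarity):
    """Write one candidate of one character position into the memories."""
    first_m[first_address(position, code >> 8)] = jm
    first_n[first_address(position, code & 0xFF)] = jn
    second[second_address(position, jm, jn)] = similarity


def lookup(position, code):
    """Similarity of `code` at `position`; 0 if it was not a candidate there."""
    jm = first_m[first_address(position, code >> 8)]
    jn = first_n[first_address(position, code & 0xFF)]
    return second[second_address(position, jm, jn)]


store(0, 1, 1, 0x3021, 95)   # position 0, first candidate
store(2, 1, 1, 0x3021, 70)   # same code at position 2, different similarity
```

The same character code can then carry a different similarity at each position of the word, and reads back 0 at positions where it was not a candidate.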

The character similarities obtained in this way are sent to match-degree calculation unit 29, which obtains the likelihood of the word, that is, its match degree, for example by adding the similarities together. The match degree is calculated between one word registered in the dictionary and the input characters. It is calculated for every word in the dictionary, and the candidate word with the highest match degree, for example, is output as the result.
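A short Python sketch of this scoring step follows. The names `match_degree`, `best_word` and `similarity_of` are hypothetical, and the small dictionary-backed `similarity_of` merely stands in for the two-stage memory lookup; summing and taking the maximum are the examples the text itself gives.

```python
def match_degree(word, similarity_of):
    """Sum the per-character similarities of one dictionary word."""
    return sum(similarity_of(i, code) for i, code in enumerate(word))


def best_word(dictionary, similarity_of):
    """Return the dictionary word with the highest match degree."""
    return max(dictionary, key=lambda word: match_degree(word, similarity_of))


# Stand-in for the memory lookup: (position, code) -> similarity.
table = {(0, "c"): 90, (0, "e"): 60, (1, "a"): 85, (1, "o"): 50,
         (2, "t"): 40, (2, "r"): 70}


def similarity_of(position, code):
    return table.get((position, code), 0)   # 0 for non-candidates


dictionary = ["cat", "car", "eat", "cot"]
winner = best_word(dictionary, similarity_of)
```

Here "car" wins with 90 + 85 + 70 = 245, even though "t" and "r" are both candidates for the last position.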

FIG. 1 is a schematic block diagram of one embodiment of this invention. It operates as follows. The result from character recognition unit 31 is written into first memory 33 and second memory 36 by the processing described above. Character codes output from word dictionary memory 32 pass in cascade through first memory 33, second memory 36 (the same as second memory 28 in FIG. 3) and match-degree calculation unit 38, where the match degree is calculated. The control unit controls address counter 34 and character position counter 35 as well as the device as a whole, and has means for sending the result based on the similarities to character recognition control unit 30.

This invention can also be practiced with various modifications of the embodiment described above. For example, the first memory may be divided into any number of parts, and its addresses may have any number of bits. The match degree may be calculated in any way, and any device or means may be used as long as it is applicable to this invention. The maximum number of candidates in character recognition is not specifically limited, nor is the implementation limited to particular hardware.

What is described above as a character includes both patterns entered by handwriting and patterns entered by voice.

The per-character similarity may be any quantity that expresses how likely the character is. For example, it may be the candidate rank obtained in character recognition, or a score based on that rank.

A path that sends the output of the first memory directly to the match-degree calculation unit may also be added, so that the direct scheme can be used when the character code has few bits.

In short, this invention can be practiced with various modifications without departing from its gist.

[Brief description of drawings]

FIG. 1 is a schematic block diagram of one embodiment of this invention. FIGS. 2 and 3 are diagrams explaining the means of this invention. FIGS. 4 and 5 are schematic block diagrams showing examples of conventional devices.

Reference numerals in the figures:

1, 31: character recognition unit
2: editing unit
3, 39: word matching unit
4, 12, 19, 32: word dictionary memory
5: candidate character register
6: comparator
7: word dictionary register
8: register counter
9, 18, 29, 38: match-degree calculation unit
10, 34: address counter
11: sort processing unit
13: word data
14, 15, 16: similarity storage location
17: similarity memory
20, 21, 24, 25: signal line
22, 23, 33: first memory
28, 36: second memory
26: lower address of second memory
27: upper address of second memory
30: character recognition control unit
35: character position counter
37: control unit

Claims

[Claims]

1. A character recognition device comprising: character recognition means for performing character recognition on each character of a voice-input word or a character-input word given as a character string of n characters; second memory means for storing the similarities obtained during processing by the character recognition means; a plurality of first memory means for storing, in divided form, addresses that reference the second memory means; match-degree calculation means for calculating the match degree of a word on the basis of the similarity of each character stored in the second memory means; and means for selecting a word as a recognition candidate from a word dictionary on the basis of the result obtained by the match-degree calculation means; wherein the plurality of first memory means take as their addresses the character codes of the characters of the words registered in the word dictionary and store, in divided form, the addresses that reference the second memory means; and wherein the selection of a word as a recognition candidate from the word dictionary is performed by referring to the plurality of first memory means for each character of a word in the word dictionary, combining their outputs, applying the combined output to the second memory means as an address to obtain the similarity of each character, and using the match degree of the word calculated from those similarities.