JPH0226268B2


Info

Publication number: JPH0226268B2
Application number: JP56155629A
Authority: JP
Inventors: Hideaki Sugawara; Eiichiro Yamamoto
Original assignee: Computer Basic Technology Research Association Corp
Current assignee: Computer Basic Technology Research Association Corp
Priority date: 1981-09-30
Filing date: 1981-09-30
Publication date: 1990-06-08
Also published as: JPS5856189A

Description

[Detailed Description of the Invention] The present invention relates to a character recognition device. Input characters are read by a character reading means and recognized against a character dictionary (for example, a kanji dictionary). The multiple-rank recognition results from this step are then matched against a word dictionary, so that input words are recognized accurately.

In a conventional character recognition method, as shown for example in FIG. 1, a recognition unit 1 extracts the features of each input character and compares them with a file. It then outputs the candidate with the highest recognition rank to an output register 2. It may be known in advance that the three characters in the output register 2 represent a prefecture name. In that case, a matching circuit 4 compares these characters in sequence with a prefecture dictionary 3 as post-recognition processing, so that the input characters are recognized accurately.

That is, in FIG. 1, a data input form (not shown) with three characters written in its prefecture-name field is read by, for example, an OCR (not shown). Based on the resulting data, the recognition unit 1 outputs the highest-ranked candidate for each character, 「宮」, 「埼」 and 「県」, to the output register 2. The matching circuit 4 compares these in sequence with the prefecture names set in the prefecture dictionary 3. It then outputs the prefecture name with the highest degree of match as the read result.

This post-processing has a weakness, shown in FIG. 1. When the output 「宮」「埼」「県」 from the recognition unit 1 is matched against the prefecture dictionary 3, two names tie at the same priority: 「宮崎県」 (Miyazaki Prefecture) and 「宮城県」 (Miyagi Prefecture). Neither can be selected automatically. An operator must therefore sometimes intervene in the recognition process, and the post-processing takes a considerable amount of time.
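
The following Python sketch illustrates the tie described above by modeling the conventional FIG. 1 post-processing. It rests on these assumptions:
- The degree of match is the number of positions at which the top-ranked character equals the dictionary character. The source does not define this measure.
- The word list is a hypothetical excerpt, not the contents of prefecture dictionary 3.
- The function names are illustrative only.

```python
def top1_match_score(top1: str, word: str) -> int:
    """Count positions where the rank-1 recognized character equals the dictionary character."""
    return sum(1 for a, b in zip(top1, word) if a == b)


def conventional_lookup(top1: str, dictionary: list[str]) -> list[str]:
    """Return every dictionary word that ties for the best rank-1 score."""
    scores = {w: top1_match_score(top1, w) for w in dictionary}
    best = max(scores.values())
    return [w for w in dictionary if scores[w] == best]


# Hypothetical excerpt of a prefecture word list (not the full dictionary 3).
prefectures = ["北海道", "青森県", "秋田県", "宮城県", "宮崎県", "埼玉県"]
print(conventional_lookup("宮埼県", prefectures))  # ['宮城県', '宮崎県']
```

Only the rank-1 reading 「宮埼県」 is available here. Both 「宮城県」 and 「宮崎県」 score 2, so the lookup returns two words and has no basis for choosing between them.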

The object of the present invention is to overcome this problem with a character recognition device of the following kind. The recognition unit does not limit its output to the single highest-ranked candidate. Instead, it outputs several candidates in descending order of priority, for example up to the third rank or up to the fifth rank. These multiple recognition outputs are then matched against a word dictionary.

To this end, the character recognition device of the present invention, which recognizes read characters, comprises:
- recognition means for recognizing the read characters;
- matching means for detecting that a recognized character matches a character entered in a word dictionary; and
- determination means for selecting the word with the greatest degree of matching.

The recognition means outputs recognition candidate characters of a plurality of ranks. The matching means compares each word held in the word dictionary with these candidate characters. For each character of each dictionary word, it creates a flag indicating whether that character is present in the group of candidate characters from the recognition means. The determination means then outputs, as the recognized word, the word with the most "present" flags.
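
The selection rule can be written compactly. The notation below is ours, not the patent's:
- C_i is the set of top-K candidate characters for input position i, where i = 1, …, m.
- w = w_1 ⋯ w_n is a word from the dictionary D.
- L is the number of flag slots. The embodiment uses four, and the fourth slot records that neither string has a fourth character.

Counting a position that is absent from both strings as a match follows the worked example in the source. Scoring a position that exists in only one of the strings as 0 is our assumption.

```latex
\begin{aligned}
f_i(w) &=
\begin{cases}
1 & \text{if } i \le \min(m, n) \text{ and } w_i \in C_i,\\
1 & \text{if } i > m \text{ and } i > n \quad (\text{no $i$-th character on either side}),\\
0 & \text{otherwise},
\end{cases}
\qquad i = 1, \dots, L,\\[4pt]
S(w) &= \sum_{i=1}^{L} f_i(w), \qquad
\hat{w} = \operatorname*{arg\,max}_{w \in D} S(w).
\end{aligned}
```

The FIG. 2 candidates give K = 5, m = 3 and L = 4. Under this rule, S = 1 for 「北海道」, S = 2 for 「青森県」 and S = 4 for 「秋田県」.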

Before describing the present invention in detail, its principle of operation is briefly explained with reference to FIG. 2.

Suppose the recognition unit reads a three-character prefecture name. As shown in FIG. 2, it obtains several candidate characters for each position, in order of recognition rank:
- First character: rank 1 is 「科」, rank 2 is 「秩」, rank 3 is 「秋」, rank 4 is 「材」 and rank 5 is 「林」.
- Second character: rank 1 is 「田」, rank 2 is 「内」, rank 3 is 「口」, rank 4 is 「円」 and rank 5 is 「由」.
- Third character: rank 1 is 「具」, rank 2 is 「県」, rank 3 is 「目」, rank 4 is 「且」 and rank 5 is 「旦」.

These candidate characters are then compared in turn with each prefecture name stored in the prefecture dictionary. First, 「北海道」 (Hokkaido) is read out of the prefecture dictionary 3. Its first character 「北」 is matched against 「科, 秩, 秋, 材, 林」, and in this case there is no match. Next, its second character 「海」 is compared with 「田, 内, 口, 円, 由」, and its third character 「道」 with 「具, 県, 目, 且, 旦」. The output of the recognition unit and the first dictionary word 「北海道」 agree on only one point: neither has a fourth character.

The second dictionary word, 「青森県」 (Aomori Prefecture), agrees on its third character, because 「県」 is one of 「具, 県, 目, 且, 旦」. It also agrees in that neither has a fourth character. The second word 「青森県」 therefore has a higher degree of matching than the first word 「北海道」.

The third dictionary word, 「秋田県」 (Akita Prefecture), matches the candidates at every character:
- 「秋」 is one of 「科, 秩, 秋, 材, 林」.
- 「田」 is one of 「田, 内, 口, 円, 由」.
- 「県」 is one of 「具, 県, 目, 且, 旦」.

It also agrees in that neither has a fourth character. The degree of matching with this third word is therefore the greatest, and 「秋田県」 is recognized and output as the read characters.
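
The worked example can be checked with a short Python sketch. It stores each position's candidates as a list in rank order. Following the text, slot 4 counts as a match when neither the word nor the input has a fourth character. The function name `presence_flags` and the fixed count of four slots are assumptions made for illustration.

```python
# Candidate characters from FIG. 2, one list per input position, rank 1 first.
CANDIDATES = [
    ["科", "秩", "秋", "材", "林"],  # first character
    ["田", "内", "口", "円", "由"],  # second character
    ["具", "県", "目", "且", "旦"],  # third character
]


def presence_flags(word: str, candidates: list[list[str]], slots: int = 4) -> list[int]:
    """One flag per slot: 1 if the word's character is among that position's
    candidates, or if neither the word nor the input has a character there."""
    flags = []
    for i in range(slots):
        in_word = i < len(word)
        in_input = i < len(candidates)
        if in_word and in_input:
            flags.append(1 if word[i] in candidates[i] else 0)
        else:
            flags.append(1 if not in_word and not in_input else 0)
    return flags


for word in ["北海道", "青森県", "秋田県"]:
    flags = presence_flags(word, CANDIDATES)
    print(word, flags, sum(flags))
# 北海道 [0, 0, 0, 1] 1
# 青森県 [0, 0, 1, 1] 2
# 秋田県 [1, 1, 1, 1] 4
```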

An embodiment of the present invention is described below with reference to FIG. 3.

In the figure, reference numerals shared with other figures denote the same parts. The remaining parts are:
- 5: a character matrix register;
- 6: a rank register;
- 7: a word dictionary;
- 8: a matching result output register;
- 9: a result judgment circuit;
- 10: an output register;
- 11: a matching circuit, corresponding to the matching circuit 4.

The character matrix register 5 holds the multiple candidate characters output by the recognition unit 1, set in order of recognition rank. For the first character, for example, it holds the rank 1 to rank 5 candidates 「科」, 「秩」, 「秋」, 「材」 and 「林」. For the second character it holds 「田」, 「内」, 「口」, 「円」 and 「由」, and for the third character 「具」, 「県」, 「目」, 「且」 and 「旦」, again from rank 1 to rank 5.

The rank register 6 receives the characters set in the character matrix register 5, one rank at a time (for example 「科田具」, then 「秩内県」). Its contents are changed in sequence by an order control signal C₁ from the matching circuit 11.
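
A minimal sketch of how the rank register contents are formed, under two assumptions:
- The character matrix register 5 is modeled as one string per input position, holding the candidates in rank order.
- Each pulse of C₁ selects the next rank.

The successive contents of the rank register 6 are then the columns of that matrix. The helper name `rank_rows` is hypothetical, and `zip` assumes every position has the same number of candidates.

```python
# Character matrix register 5: one string per input position, candidates in rank order (FIG. 2).
MATRIX = [
    "科秩秋材林",  # first character, ranks 1-5
    "田内口円由",  # second character, ranks 1-5
    "具県目且旦",  # third character, ranks 1-5
]


def rank_rows(matrix: list[str]) -> list[str]:
    """Successive contents of rank register 6: row k holds the rank-k candidate
    of every position, read left to right."""
    return ["".join(chars) for chars in zip(*matrix)]


print(rank_rows(MATRIX))  # ['科田具', '秩内県', '秋口目', '材円且', '林由旦']
```

These five strings are the ones the operation steps below set in the rank register 6, from 「科田具」 through 「林由旦」.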

The word dictionary 7 stores several classified word collections needed for post-processing. Examples are a collection of prefecture names and, for each prefecture, a collection of the names of its cities, towns and villages (for example, those in Akita Prefecture). A per-word control signal C₂ from the matching circuit 11 causes the words of the desired category to be output one at a time, in a fixed order.

The matching result output register 8 holds, for each word, the degree of agreement between that word as output from the word dictionary 7 and the characters set in the character matrix register 5.

The result judgment circuit 9 selects and outputs the word with the greatest degree of matching, based on the matching performed by the matching circuit 11.

Next, the operation of the configuration in FIG. 3 is described.

(1) The recognition unit 1 writes its recognition candidate characters into the character matrix register 5 in order of recognition rank. As shown in FIG. 2, these are 「科, 秩, 秋, 材, 林」 (ranks 1 to 5) for the first character, 「田, 内, 口, 円, 由」 for the second character, and 「具, 県, 目, 且, 旦」 for the third character. Because the output of the recognition unit 1 is known in advance to be a prefecture name, the prefecture-name word collection is read from the word dictionary 7.

The per-word control signal C₂ from the matching circuit 11 first causes 「北海道」 to be read out. The order control signal C₁ from the matching circuit 11 then sets the rank-1 string 「科田具」 in the rank register 6, and this string is compared with 「北海道」. They agree only in that neither has a fourth character; nothing else matches. The matching circuit 11 then outputs the order control signal C₁ again, 「秩内県」 is set in the rank register 6, and it is likewise compared with 「北海道」. Further pulses of C₁ set the rank-3 string 「秋口目」, the rank-4 string 「材円且」 and the rank-5 string 「林由旦」 in the rank register 6 in turn, and each is matched against 「北海道」.

None of these strings share a character with 「北海道」, and the only agreement is the absence of a fourth character. As a result, "1" is written in slot (4) of section 1 of the matching result output register 8, and "0" is written in slots (1) to (3) of section 1.

(2) When the comparison with the first word 「北海道」 is finished, the matching circuit 11 outputs the control signal C₂, which causes the second word 「青森県」 to be read out. The matching circuit 11 then outputs the order control signal C₁ to set the rank-1 string 「科田具」 through the rank-5 string 「林由旦」 in the rank register 6 in turn, and each is compared with 「青森県」. This time the two agree on two points: 「県」 in the rank-2 string 「秩内県」, and the absence of a fourth character. The matching circuit 11 therefore writes the flag "1", meaning that the character is "present", in slots (3) and (4) of section 2 of the matching result output register 8. It writes "0", meaning "absent", in slots (1) and (2) of section 2.

Note that the "present" flag is written not only when the characters match. As described above, it is also written when the two agree in that the character does not exist.

(3) Next, the matching circuit 11 uses the control signal C₂ to read out the third word 「秋田県」. As in (1) and (2), it then sets 「科田具」 through 「林由旦」 in the rank register 6 in turn and compares each with 「秋田県」. Here matches are obtained on four points:
- 「田」 in the rank-1 string 「科田具」;
- 「県」 in the rank-2 string 「秩内県」;
- 「秋」 in the rank-3 string 「秋口目」;
- the absence of a fourth character.

"1" is therefore written in each of slots (1) to (4) of section 3 of the matching result output register 8.

(4) Once every prefecture name has been compared in this way, the result judgment circuit 9 examines the contents of each section of the matching result output register 8. Section 3 has the greatest degree of matching, so the circuit outputs the third prefecture name, 「秋田県」, to the output register 10 as the final read result. The post-processing thus extracts 「秋田県」 correctly.
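
Steps (1) to (4) can be summarized as a software model. This is a sketch under stated assumptions, not the circuit itself:
- Iterating over the word list stands in for control signal C₂.
- Iterating over the rank strings stands in for C₁.
- A dictionary keyed by word stands in for the sections of matching result output register 8.
- `max` stands in for result judgment circuit 9.

Only the first three prefecture names are included, and the names `match_word` and `SLOTS` are illustrative.

```python
RANK_ROWS = ["科田具", "秩内県", "秋口目", "材円且", "林由旦"]  # rank register 6 contents, ranks 1-5
PREFECTURES = ["北海道", "青森県", "秋田県"]  # first entries of the prefecture word list (excerpt)
SLOTS = 4  # flag slots (1)-(4) in each section of register 8


def match_word(word: str, rank_rows: list[str]) -> list[int]:
    """Build one section of register 8. Step through the rank rows (signal C1) and set
    slot i to 1 once any rank supplies the word's i-th character, or when neither
    the word nor the input has an i-th character."""
    n_input = len(rank_rows[0])
    section = [0] * SLOTS
    for row in rank_rows:
        for i in range(SLOTS):
            if i < len(word) and i < n_input:
                if row[i] == word[i]:
                    section[i] = 1
            elif i >= len(word) and i >= n_input:
                section[i] = 1  # "no i-th character" on both sides also counts as present
    return section


# Signal C2 steps through the word list; each word gets its own section.
register8 = {word: match_word(word, RANK_ROWS) for word in PREFECTURES}
# Result judgment circuit 9: the section with the most "present" flags wins.
result = max(register8, key=lambda w: sum(register8[w]))
print(register8)
print(result)  # 秋田県
```

This procedure scans the rank strings one row at a time and sets a slot once any row supplies the character. It yields the same flags as checking each dictionary character against that position's candidate set, which is how the principle of FIG. 2 was described.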

As shown in FIG. 4, the result judgment circuit 9′ may instead be provided with a first input register 12, a second input register 13 and a comparison control section 14. It works as follows:
1. Each matching state from the matching circuit 11 is entered into the first input register 12.
2. It is compared with the matching state already held in the second input register 13 from earlier input.
3. If the newly transferred matching degree in the first input register 12 is larger, it is written into the second input register 13.
4. If it is smaller, the matching degree for the next word is simply entered into the first input register 12.

With this configuration, the matching result output register 8 of FIG. 3 becomes unnecessary. Even a large number of words read from the word dictionary for comparison can then be handled with a simple configuration.
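
The FIG. 4 variant amounts to a running maximum. The sketch below models it under these assumptions:
- Register 12 is the newest (word, degree) pair arriving in the loop.
- Register 13 holds the best pair kept so far.
- Following the text, a new result replaces the stored one only when it is strictly larger, so an earlier word is kept on a tie. The source does not say how ties are handled.

The function name `stream_best` is hypothetical. The degrees in the usage line are the FIG. 2 results for the first three prefecture names.

```python
from collections.abc import Iterable
from typing import Optional, Tuple


def stream_best(results: Iterable[Tuple[str, int]]) -> Tuple[Optional[str], int]:
    """Keep only the best (word, degree) seen so far instead of storing every result."""
    best_word: Optional[str] = None  # contents of second input register 13
    best_degree = -1
    for word, degree in results:     # each new result arrives in first input register 12
        if degree > best_degree:     # comparison control section 14: strictly larger wins
            best_word, best_degree = word, degree
    return best_word, best_degree


# Degrees from the FIG. 2 example, supplied lazily as a generator.
fig2_results = ((w, d) for w, d in [("北海道", 1), ("青森県", 2), ("秋田県", 4)])
print(stream_best(fig2_results))  # ('秋田県', 4)
```

Memory use no longer grows with the number of dictionary words, which matches the stated benefit of dropping register 8.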

In the present invention, as described above, the decision rests only on whether each character of a dictionary word appears among the recognition candidate characters, so the hardware is simple. Moreover, the method is very effective and of sufficient practical use when the recognition target is restricted, as it is with a classified word collection.

The above description selects candidate characters up to rank 5, but the invention is not limited to this. The number of ranks can be changed, as shown for example in FIG. 5. In that example, the rank-1 reading is 「宮埼県」. Comparing the rank-2 reading 「官崎具」 and the rank-3 reading 「富峠旦」 against the prefecture dictionary 3 as well allows the input to be correctly identified as 「宮崎県」 (Miyazaki Prefecture). Thus even with candidates up to rank 3, the problem of the conventional method shown in FIG. 1 is correctly solved.
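
The sketch below shows why three ranks suffice in FIG. 5. It scores 「宮城県」 and 「宮崎県」 first with the rank-1 reading alone and then with ranks 1 to 3. It uses the same four-slot presence rule as the worked example. The helper names `candidate_sets` and `degree` are illustrative assumptions.

```python
RANKS = ["宮埼県", "官崎具", "富峠旦"]  # FIG. 5 readings, rank 1 to rank 3


def candidate_sets(ranks: list[str]) -> list[set[str]]:
    """Position i collects the i-th character of every rank reading."""
    return [set(chars) for chars in zip(*ranks)]


def degree(word: str, cands: list[set[str]], slots: int = 4) -> int:
    """Number of slots flagged "present": a character match, or no character on either side."""
    score = 0
    for i in range(slots):
        if i < len(word) and i < len(cands):
            score += int(word[i] in cands[i])
        elif i >= len(word) and i >= len(cands):
            score += 1
    return score


top1 = candidate_sets(RANKS[:1])
top3 = candidate_sets(RANKS)
for word in ["宮城県", "宮崎県"]:
    print(word, degree(word, top1), degree(word, top3))
# 宮城県 3 3
# 宮崎県 3 4
```

With rank 1 alone, both names score 3 and tie, as in FIG. 1. Adding ranks 2 and 3 puts 「崎」 into the second position's candidate set, so 「宮崎県」 scores 4 and is selected.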

[Brief Description of the Drawings]

FIG. 1 is an explanatory diagram of a conventional character recognition device. FIG. 2 illustrates the operating principle of the present invention. FIG. 3 is a configuration diagram of an embodiment of the present invention. FIG. 4 shows another embodiment of its result judgment circuit. FIG. 5 is an explanatory diagram for the case where candidate characters are taken up to rank 3.

In the figures:
- 1: recognition unit;
- 2: output register;
- 3: prefecture dictionary;
- 4: matching circuit;
- 5, 5′: character matrix registers;
- 6: rank register;
- 7: word dictionary;
- 8: matching result output register;
- 9, 9′: result judgment circuits;
- 10: output register;
- 11: matching circuit;
- 12: first input register;
- 13: second input register;
- 14: comparison control section.

Claims

[Claims] 1. A character recognition device that recognizes read characters, comprising: recognition means for recognizing the read characters; matching means for detecting that a recognized character matches a character entered in a word dictionary; and determination means for selecting the word with the greatest degree of matching; wherein the recognition means outputs recognition candidate characters of a plurality of ranks; the matching means compares each word held in the word dictionary with the recognition candidate characters of the plurality of ranks and creates, for each character of each word from the word dictionary, a flag indicating whether that character is present in the group of candidate characters from the recognition means; and the determination means outputs, as the recognized word, the word with the most flags indicating that the character is present.