JPH02230488A

JPH02230488A - Character recognizing device

Info

Publication number: JPH02230488A
Application number: JP1052369A
Authority: JP
Inventors: Hideya Yamaki; 秀哉山木
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1989-03-03
Filing date: 1989-03-03
Publication date: 1990-09-12

Abstract

PURPOSE: To eliminate the need for advance manual registration work by automatically generating the limited character type table of recognition targets, by providing a character type extraction unit that picks up all character types from a word group and stores them in a character type storage unit. CONSTITUTION: When a character recognition unit 2 outputs candidate characters 3 to a word matching unit 4, it refers to the character type storage unit 5 so that no character type outside the target set is selected as a candidate. The word matching unit 4 collates the candidate characters 3 with the word group stored in a word storage unit 6 and outputs the result as a candidate word 7. The character type extraction unit 8 is disposed between the word storage unit 6 and the character type storage unit 5; it picks up the character types that make up each word and forms, in the character type storage unit 5, a limited character type table corresponding to the word group. Consequently, the word matching unit 4 can easily determine an appropriate candidate word. In this way, the table for limiting character types is generated automatically, and no manual registration work is required.

Description

[Detailed Description of the Invention]

[Industrial Field of Application]

The present invention relates to a character recognition device, and in particular to a character recognition device with a word matching function: groups of words to be recognized for various purposes, such as prefecture names, are stored in advance, and the device checks whether a word formed by combining character recognition results exists in the stored word group.

[Conventional technology]

Conventionally, this type of character recognition device has had a structure in which the character recognition unit and the word matching unit are logically separate. The character recognition unit selected plausible candidates as recognition results from among all the character types to be read and sent them to the word matching unit.

There were also devices in which the recognition unit held its own table for limiting character types and selected candidates from a narrowed set of target character types. However, this table was created through manual operations.

[Problem to be Solved by the Invention]

Among the conventional character recognition devices described above, a device that treats all character types as recognition targets has two drawbacks: the recognition unit takes a long time because of the large number of character types, and extra word matching time is consumed because inappropriate candidate characters are sent to the word matching unit.

A device whose recognition unit holds a table for limiting character types can shorten the time needed for recognition and word matching, but creating the table requires many manual operations. For example, when n words are registered as words for matching, a person had to pick up the character types used in all n words and register them in the table. As the number of words n increases, this manual check of character types therefore becomes more laborious, and it can also lead to checking mistakes and a lower recognition rate.

[Means to solve the problem]

The present invention is a character recognition device comprising: a word storage unit that stores a group of target words in advance; a character type storage unit that stores in advance the character types to be recognized; a character recognition unit that recognizes an input character image and outputs candidate characters limited to the character types stored in the character type storage unit; and a word matching unit that checks whether a word formed by combining the candidate characters output from the character recognition unit for a plurality of character images exists in the word group stored in the word storage unit. The device further includes a character type extraction unit that picks up all the character types used in the word group registered in the word storage unit and stores them in the character type storage unit.

[Embodiment]

Next, the present invention will be described with reference to the drawings.

FIG. 1 is a block diagram of the reading device of the present invention. The character image 1 is an image of a character input from a medium such as paper. The character recognition unit 2 recognizes which character the character image 1 represents and outputs the result to the word matching unit 4 as a plurality of candidate characters 3. In doing so, the character recognition unit 2 refers to the character type storage unit 5, which stores the character types targeted for recognition, and does not select non-target character types as candidates. The word matching unit 4 temporarily stores the candidate characters 3 output by the character recognition unit 2 for several character positions, collates the words that can be formed by combining the candidate characters 3 against the word group stored in the word storage unit 6, and outputs matching or similar words as candidate words 7.
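The source does not say how the word matching unit 4 decides that a word is "similar" to a registered word. As a rough sketch only, the following uses Python's standard `difflib` as a stand-in similarity measure; the function name `match_words`, the `cutoff` value, and the three-word `WORD_GROUP` are assumptions made for illustration, not part of the patent.

```python
from difflib import get_close_matches

# Hypothetical word group standing in for the contents of word storage unit 6.
WORD_GROUP = ["東京都", "京都府", "神奈川県"]


def match_words(assembled_word, word_group, cutoff=0.6):
    """Return registered words that equal or resemble the assembled word.

    An exact hit is returned on its own; otherwise difflib's ratio-based
    matching is used as an assumed substitute for the unspecified
    similarity test of the word matching unit.
    """
    if assembled_word in word_group:
        return [assembled_word]
    return get_close_matches(assembled_word, word_group, n=3, cutoff=cutoff)
```

With this stand-in, a word assembled with one misrecognized character can still be mapped to a registered word, while a word sharing no characters with the word group yields no candidate.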

The character type extraction unit 8, which is the core of the present invention, is placed between the word storage unit 6 and the character type storage unit 5. It picks up the character types that make up each word in the word group stored in the word storage unit 6 and forms, in the character type storage unit 5, a limited character type table corresponding to that word group.
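The extraction step described here amounts to collecting the set of distinct characters that occur in the registered words. A minimal sketch, assuming the word group is a plain list of strings and the limited character type table is a Python `set` (the function name `extract_character_types` is hypothetical):

```python
def extract_character_types(word_group):
    """Collect every distinct character used in the registered words.

    Plays the role of character type extraction unit 8: the returned set
    stands in for the limited character type table held in character
    type storage unit 5.
    """
    table = set()
    for word in word_group:
        table.update(word)  # adds each character of the word
    return table


# Two prefecture names taken from the description, used as a tiny word group.
prefecture_table = extract_character_types(["東京都", "神奈川県"])
```

Because the table is derived from the word group itself, registering a new word and rerunning the extraction keeps the two consistent without a manual check.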

To make the effect of the present invention clear, the operation of each unit is described below, taking the recognition of prefecture names as an example.

Prior to recognition, all prefecture names ("東京都" (Tokyo), "神奈川県" (Kanagawa Prefecture), and so on) are registered as a word group in the word storage unit 6. Next, the character type extraction unit 8 is activated to pick up the character types used in all the prefecture names and to form a limited table for prefecture names in the character type storage unit 5.

When the recognition operation starts, character images 1 are input one after another. As an example, the case where an image of "東京都" written on paper is input is described with reference to FIG. 2.

The character recognition unit 2 recognizes "東", "京", and "都" in that order. As shown in FIG. 2, characters similar to "東" include "束", "京", and "車"; characters similar to "京" include "東", "哀", and "享"; and characters similar to "都" include "部", "卸", and "郡" (FIG. 2(a)). By referring to the limited character type table in the character type storage unit 5, characters absent from the table, such as "車", "哀", "享", "部", "卸", and "郡", are excluded from the recognition targets. As a result, the candidate characters 3 passed to the word matching unit 4 as plausible candidates are "東" and "京" for "東", "京" and "東" for "京", and only "都" for "都" (FIG. 2(b)). The combinations that can be considered as words are therefore narrowed down to four: "東東都", "東京都", "京東都", and "京京都". The word matching unit 4 can then easily determine that "東京都" is the appropriate candidate word. This concludes the explanation of the effect of the character type limiting function.
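The narrowing in this example can be reproduced in a few lines. The sketch below makes several assumptions: only three prefecture names are registered (`WORD_GROUP` is a hypothetical subset), each position's raw candidate list consists of the correct character plus the three similar characters named for FIG. 2(a), and word matching is an exact lookup.

```python
from itertools import product

# Hypothetical subset of the prefecture word group (word storage unit 6).
WORD_GROUP = ["東京都", "京都府", "神奈川県"]

# Limited character type table derived from the word group (storage unit 5).
limited_table = set("".join(WORD_GROUP))

# Assumed raw candidates per character position, following FIG. 2(a).
raw_candidates = [
    ["東", "束", "京", "車"],
    ["京", "東", "哀", "享"],
    ["都", "部", "卸", "郡"],
]

# Character recognition with the limitation: drop characters not in the table.
filtered = [[c for c in cands if c in limited_table] for cands in raw_candidates]

# Word matching: form every combination and keep those that are registered.
combinations = ["".join(chars) for chars in product(*filtered)]
matches = [word for word in combinations if word in WORD_GROUP]

# Number of combinations without and with the limitation.
unrestricted_count = len(list(product(*raw_candidates)))
restricted_count = len(combinations)
```

Under these assumptions the unrestricted candidates would give 4 × 4 × 4 = 64 combinations to collate, whereas the limited table leaves the four combinations listed above, of which only "東京都" is a registered word.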

Next, a specific example of the character type extraction unit 8 is described. FIG. 3 is a schematic diagram of an example of the character type storage unit 5, the word storage unit 6, and the character type extraction unit 8. Word groups such as prefecture names, city names, personal names, and any others are registered in the word storage unit 6. The character type storage unit 5 reserves an area for storing a character type limitation table made up of one character type set for each word group. From the word group of prefecture names, the character type extraction unit 8 picks up the character types used in words such as "東京都" and "神奈川県", namely "東", "京", "都", "神", "奈", "川", "県", and so on, and registers them in the character type limitation table of the prefecture-name character type set in the character type storage unit 5. In the same way, character types are picked up from the city names, personal names, and other word groups and registered as the corresponding character type sets.

During character recognition, when prefecture names are to be recognized, recognition is limited to the character types registered in the character type limitation table of the prefecture-name character type set in the character type storage unit 5. Likewise, for city names, personal names, and so on, recognition is limited to the character types registered in the corresponding character type set.
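One way to picture the arrangement of FIG. 3 is a mapping from each word group's name to its own character type set, with recognition selecting the set that matches the kind of field being read. The sketch below assumes dictionaries for both storage units; the group keys, the city names, and the function names `build_tables` and `restrict_candidates` are hypothetical.

```python
# Hypothetical contents of word storage unit 6: one word list per word group.
word_storage = {
    "prefectures": ["東京都", "神奈川県"],
    "cities": ["横浜市", "川崎市"],
}


def build_tables(storage):
    """Build one character type set per word group (storage unit 5)."""
    return {group: set("".join(words)) for group, words in storage.items()}


def restrict_candidates(candidates, group, tables):
    """Keep only candidates registered in the selected group's set."""
    allowed = tables[group]
    return [c for c in candidates if c in allowed]


tables = build_tables(word_storage)
```

For instance, `restrict_candidates(["東", "束", "車"], "prefectures", tables)` keeps only "東", while the same call with `"cities"` keeps nothing, since none of those characters occurs in the assumed city names.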

[Effect of the Invention]

As described above, the present invention provides a character type extraction unit that extracts all character types from a word group and stores them in the character type storage unit. The limited character type table of recognition targets is thereby created automatically, which eliminates the need for advance manual registration work.

[Brief explanation of the drawing]

FIG. 1 is a block diagram of the configuration of a character reading device according to one embodiment of the present invention. FIG. 2 illustrates the character type limitation in the character recognition unit 2 shown in FIG. 1: FIG. 2(a) shows the candidate characters when all character types are recognition targets, and FIG. 2(b) shows the candidate characters when recognition is limited by character type. FIG. 3 illustrates the operation of the character type extraction unit 8 shown in FIG. 1.

1: character image, 2: character recognition unit, 3: candidate characters, 4: word matching unit, 5: character type storage unit, 6: word storage unit, 7: candidate word, 8: character type extraction unit.

Claims

[Claims]

A character recognition device comprising: a word storage unit that stores a group of target words in advance; a character type storage unit that stores in advance the character types to be recognized; a character recognition unit that recognizes an input character image and outputs candidate characters limited to the character types stored in the character type storage unit; and a word matching unit having a function of checking whether a word formed by combining the candidate characters output from the character recognition unit for a plurality of the character images exists in the word group stored in the word storage unit, the character recognition device being characterized by including a character type extraction unit that picks up all the character types used in the word group registered in the word storage unit and stores them in the character type storage unit.