JPH0340434B2

Info

Publication number: JPH0340434B2
Application number: JP59229113A
Authority: JP
Priority date: 1984-10-31
Filing date: 1984-10-31
Publication date: 1991-06-18
Also published as: JPS61107486A

Description

[Detailed Description of the Invention]

[Field of Industrial Application]

The present invention relates to a character recognition post-processing method, and in particular to one that finds the breaks in a character string by checking whether specific characters appear among the candidate characters produced by pattern recognition, and then post-processes each extracted character string by matching it against a dictionary.

[Conventional technology and problems]

FIG. 3 is a diagram for explaining the problems of the conventional method.

Conventionally, in character recognition devices that recognize kanji by, for example, optical means, post-processing that matches the recognition result against a specific knowledge dictionary, such as an address dictionary, has been performed in order to raise the recognition rate. With the conventional method, as shown in FIG. 3 for example, when an address is written on input sheet 1, post-processing cannot be performed unless the character strings are written separately for the prefecture, the city/county/ward, and the ward/town/village. That is, for post-processing to work, input sheet 1 had to be a formatted form with input boxes divided in advance into a prefecture box, a city/county/ward box, and a ward/town/village box.

As a result, the number of input boxes grows, which makes the form harder to lay out and harder to fill in, and makes the written characters harder to read.

[Means for solving problems]

The present invention addresses the above problems by providing a character recognition post-processing method that improves the recognition rate by performing post-recognition processing on character strings written solid, that is, without breaks. In the case of an address, this means a string written without separating the prefecture, the city/county/ward, and so on. To that end, the character recognition post-processing method of the present invention is a post-processing method in a character recognition device that outputs candidate characters for each character as the recognition result of an input character string and performs post-processing to select a final candidate from those candidate characters. It is characterized by comprising: a specific character registration unit that registers and stores in advance the specific characters that delimit the input character string; a dictionary that stores, for the character strings delimited by the specific characters, meaningful terms corresponding to the level determined by each specific character; a specific character search unit that searches the candidate characters and finds the positions of the specific characters among them; a candidate character string extraction unit that cuts character strings out of the candidate characters based on the position information of the specific characters obtained by the specific character search unit; and a dictionary matching unit that matches the character strings cut out by the candidate character string extraction unit against the dictionary.

[Operation]

The present invention exploits the fact that, when input characters are recognized, specific input characters are present among the candidate characters. In the case of an address, for example, it searches the candidate characters sequentially for pre-registered specific characters such as 県 (prefecture), 市 (city), 郡 (county), 区 (ward), and 町 (town), and splits the input character string at the positions of those characters, so that post-recognition processing can be applied to a character string written solid.

[Example]

An embodiment is described below with reference to the drawings.

FIG. 1 shows the configuration of an embodiment of the present invention, and FIG. 2 is a diagram for explaining how processing proceeds in that embodiment.

In the figures, 1 is an input sheet; 2 is a character input unit such as an OCR; 3 is a recognition unit that performs pattern analysis on the input characters and selects candidate characters; 4 is a candidate memory in which the selected candidate character strings are stored; 5 is a specific character registration unit in which the specific characters that delimit character strings are registered in advance; 6 is an address dictionary; 7 is a specific character search unit that searches candidate memory 4 for the specific characters; 8 is a candidate character string extraction unit that cuts out the candidate character strings delimited by the specific characters; 9 is a dictionary matching unit that matches the candidate character strings cut out by candidate character string extraction unit 8 against address dictionary 6; and 10 is a result output unit that outputs the dictionary matching result. In FIG. 2, reference numeral 20 denotes the written character string.

With the present invention, an address can be written solid on input sheet 1 without being divided into prefecture, city/county/ward, and so on, as with written character string 20 in FIG. 2. Character input unit 2 scans input sheet 1 by optical means such as an OCR and passes the light intensity information to recognition unit 3. Recognition unit 3 selects candidate characters for each input character by, for example, extracting topological features from the input information or performing stroke analysis. Various methods for this recognition processing are well known, so a detailed description is omitted.

Recognition unit 3 ranks the selected candidate characters for each input character in ascending order of the so-called dissimilarity (smallest dissimilarity first), as shown in FIG. 2 for example, and stores ranks 1 through 20 in candidate memory 4. Needless to say, the first-ranked candidate character is not necessarily the correct input character.
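The ranking step can be sketched as follows. This is only an illustration under stated assumptions: the function name `rank_candidates`, the `(character, dissimilarity)` pair layout, and the score values are all invented here; the source only says that candidates are ordered by increasing dissimilarity and that ranks 1 through 20 are kept.

```python
from typing import List, Tuple

MAX_CANDIDATES = 20  # the description stores ranks 1 through 20


def rank_candidates(scored: List[Tuple[str, float]],
                    limit: int = MAX_CANDIDATES) -> List[str]:
    """Return candidate characters ordered by ascending dissimilarity.

    Index 0 of the result is rank 1 (the smallest dissimilarity).
    """
    ordered = sorted(scored, key=lambda pair: pair[1])
    return [char for char, _ in ordered[:limit]]


# Hypothetical recognizer output for one input character (values invented).
scores = [("郡", 0.42), ("都", 0.17), ("部", 0.31)]
candidates = rank_candidates(scores)
```

Applying this to every input character gives one ranked list per character position, which is the shape assumed for candidate memory 4 in this sketch.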

When the input character string is an address, it normally contains specific characters such as 都, 道, 府, 県, 市, 郡, 区, 町, and 村. These specific characters are registered and stored in advance in specific character registration unit 5, each associated with a level that follows the order in which they appear in the input character string. For an address, for example, the kanji 都, 道, 府, and 県 are registered as the prefecture level; 市, 郡, and 区 as the city/county/ward level; and 区, 町, and 村 as the ward/town/village level. In addition, for each level or each specific character, specific character registration unit 5 holds index information pointing to the dictionary that stores the character strings that can appear at that level.
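One way to picture the registration table is a list of levels, each carrying its delimiter characters and a pointer to a dictionary section. The sketch below is an assumption-laden illustration: the `Level` class, its field names, and the helper `levels_for` are hypothetical, and the section label "C" is invented (the description only names parts A and B of the address dictionary).

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Level:
    name: str                    # human-readable level name
    delimiters: FrozenSet[str]   # specific characters that end this level
    dictionary_part: str         # index into the address dictionary


# Levels are listed in the order they appear in a written address.
LEVELS: List[Level] = [
    Level("prefecture", frozenset("都道府県"), "A"),
    Level("city/county/ward", frozenset("市郡区"), "B"),
    Level("ward/town/village", frozenset("区町村"), "C"),  # "C" is assumed
]


def levels_for(char: str) -> List[str]:
    """Return the names of every level at which `char` is a delimiter."""
    return [level.name for level in LEVELS if char in level.delimiters]


ward_levels = levels_for("区")
```

Note that 区 is registered at two levels, so a delimiter alone does not identify its level; the order in which levels are searched does.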

When a candidate character string is stored in candidate memory 4, specific character search unit 7 refers to specific character registration unit 5 and, using the specific characters of each level as keys, checks sequentially from the beginning whether a key character appears among the candidate characters. In the example of FIG. 2, the character 都 is found at the first rank of the third character. It follows that the first through third characters form the prefecture-level character string. The search then continues from the fourth character; 市 appears at the sixth character, so the three characters from the fourth through the sixth are recognized as the city/county/ward level. In the same way, the seventh through tenth characters are recognized as the ward/town/village-level word.
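The level-by-level search described above can be sketched as a loop that resumes after each delimiter it finds. This is a minimal sketch, not the patented implementation: `find_boundaries` and `LEVEL_DELIMITERS` are hypothetical names, positions are 0-based here (the text counts from 1), and the candidate lists are invented because the contents of FIG. 2 are not reproduced in the text.

```python
from typing import List, Sequence, Tuple

# Delimiter characters per level, in the order levels appear in an address.
LEVEL_DELIMITERS: List[Tuple[str, str]] = [
    ("prefecture", "都道府県"),
    ("city/county/ward", "市郡区"),
    ("ward/town/village", "区町村"),
]


def find_boundaries(
    candidates: Sequence[Sequence[str]],
) -> List[Tuple[str, int, int]]:
    """Return (level, start, end) spans; indices are 0-based, end inclusive."""
    spans = []
    start = 0
    for level, delimiters in LEVEL_DELIMITERS:
        end = None
        for pos in range(start, len(candidates)):
            # A hit at any candidate rank counts in this simple version.
            if any(ch in delimiters for ch in candidates[pos]):
                end = pos
                break
        if end is None:
            break  # no delimiter for this level, so stop segmenting
        spans.append((level, start, end))
        start = end + 1
    return spans


# Invented candidate lists (rank 1 first) for 東京都町田市真光寺町.
candidates = [
    ["東", "束"], ["京", "東"], ["都", "郡"],
    ["町", "田"], ["田", "由"], ["市", "布"],
    ["真", "直"], ["光", "先"], ["寺", "等"], ["町", "田"],
]
spans = find_boundaries(candidates)
```

In 1-based terms the spans are characters 1 to 3, 4 to 6, and 7 to 10, matching the walk-through. The 町 at the fourth character is not mistaken for a boundary because the search at that point is looking only for city/county/ward delimiters.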

Note that these specific characters do not have to appear at the first candidate rank. For positions where a specific character is likely to appear, such as the third or fourth character, search results are selected in order starting from the highest candidate rank.
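The text does not spell out how the rank-ordered selection works, so the following is one possible reading rather than the documented procedure: collect every delimiter hit within a window of likely positions and prefer the hit with the best candidate rank. The function name `best_delimiter_hit`, the `window` parameter, and the sample candidates are assumptions made for illustration.

```python
from typing import Optional, Sequence, Tuple


def best_delimiter_hit(
    candidates: Sequence[Sequence[str]],
    delimiters: str,
    start: int,
    window: int,
) -> Optional[Tuple[int, int]]:
    """Return (position, rank) of the best-ranked delimiter hit, both 0-based.

    Only positions start .. start + window - 1 are examined. On equal rank
    the earliest position wins.
    """
    best = None
    for pos in range(start, min(start + window, len(candidates))):
        for rank, ch in enumerate(candidates[pos]):
            if ch in delimiters:
                if best is None or rank < best[1]:
                    best = (pos, rank)
                break  # only the best rank at this position matters
    return best


# Invented candidates for 神奈川県: 県 shows up at rank 2 of the third
# character (a misreading) and at rank 1 of the fourth character.
candidates = [["神", "袖"], ["奈", "京"], ["川", "県", "州"], ["県", "具"]]
hit = best_delimiter_hit(candidates, "都道府県", start=0, window=4)
```

Here a first-hit search would cut after the third character, while the rank-aware selection picks the fourth character, where 県 is the top candidate.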

When specific character search unit 7 detects the position of a specific character, candidate character string extraction unit 8 cuts out of the candidate character string stored in candidate memory 4 the partial candidate character string up to the position of that specific character, and passes it to dictionary matching unit 9.
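Extraction amounts to slicing the candidate memory at each detected delimiter position. A minimal sketch, assuming the candidate memory is a list of ranked lists and that the delimiter itself belongs to the segment it ends (as in the 東京都 example); `extract_segments` and the sample data are hypothetical.

```python
from typing import List, Sequence


def extract_segments(
    candidates: Sequence[Sequence[str]],
    boundaries: Sequence[int],
) -> List[List[Sequence[str]]]:
    """Cut the candidate memory after each delimiter position.

    `boundaries` holds 0-based delimiter positions in increasing order.
    Each returned segment keeps the full ranked list for every character.
    """
    segments = []
    start = 0
    for end in boundaries:
        segments.append(list(candidates[start:end + 1]))
        start = end + 1
    return segments


# Invented candidates (rank 1 first) for 東京都町田市.
candidates = [
    ["東", "束"], ["京", "東"], ["都", "郡"],
    ["町", "田"], ["田", "由"], ["市", "布"],
]
segments = extract_segments(candidates, [2, 5])
top_ranked = ["".join(ranked[0] for ranked in seg) for seg in segments]
```

Keeping all ranks in each segment matters, because the dictionary matching step may need a lower-ranked candidate to form a registered word.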

When the character string it receives is, for example, at the prefecture level, dictionary matching unit 9 checks sequentially whether each word registered in part A of address dictionary 6 matches a combination of characters taken, in candidate rank order, from the partial candidate character string. In the example of FIG. 2, this shows that the prefecture-level word is 東京都 (Tokyo). If two or more words match in the comparison with address dictionary 6, a point calculation over the candidate ranks selects the character combination with the higher candidate ranks.
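The matching and tie-breaking can be sketched as below. The scoring rule is an assumption: the description mentions a point calculation over candidate ranks but gives no formula, so this sketch simply sums the 0-based ranks of the characters used and prefers the smallest total. `match_segment` and the dictionary entries are illustrative.

```python
from typing import Optional, Sequence


def match_segment(
    segment: Sequence[Sequence[str]],
    words: Sequence[str],
) -> Optional[str]:
    """Return the dictionary word best supported by the ranked candidates.

    A word matches when each of its characters appears somewhere in the
    candidate list at the same position. Lower rank totals are better.
    """
    best_word, best_score = None, None
    for word in words:
        if len(word) != len(segment):
            continue
        score = 0
        for ch, ranked in zip(word, segment):
            if ch not in ranked:
                break  # this word cannot be formed from the candidates
            score += ranked.index(ch)
        else:
            if best_score is None or score < best_score:
                best_word, best_score = word, score
    return best_word


part_a = ["北海道", "東京都", "京都府", "大阪府", "大分県"]

# Rank 1 reads 束京都, but 東 is available at rank 2 of the first character.
tokyo = match_segment([["束", "東"], ["京", "東"], ["都", "部"]], part_a)

# Both 大阪府 and 大分県 can be formed; 大分県 uses only rank-1 characters.
oita = match_segment([["大", "犬"], ["分", "阪"], ["県", "府"]], part_a)
```

The first call shows a rank-1 misreading being corrected by the dictionary, and the second shows the rank total deciding between two registered words.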

Similarly, at the city/county/ward level in the example of FIG. 2, matching the partial candidate character string from the fourth through sixth characters against part B of address dictionary 6 yields 町田市 (Machida City). At the ward/town/village level, the partial candidate character string from the seventh through tenth characters yields 真光寺町 (Shinkoji Town).
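Putting the three levels together, the whole walk-through for 東京都町田市真光寺町 can be sketched end to end. Everything concrete here is assumed for illustration: the function names, the tiny dictionary sections, the rank-sum scoring, and the candidate lists (which deliberately contain rank-1 misreadings at the first, fifth, and seventh characters).

```python
from typing import List, Optional, Sequence

# (delimiter characters, dictionary section) per level; entries are illustrative.
LEVELS = [
    ("都道府県", ["東京都", "京都府", "神奈川県"]),  # dictionary part A
    ("市郡区", ["町田市", "八王子市"]),              # dictionary part B
    ("区町村", ["真光寺町", "本町"]),                # further part, assumed
]


def first_hit(cands: Sequence[Sequence[str]], delimiters: str,
              start: int) -> Optional[int]:
    """Return the first position at or after `start` with a delimiter candidate."""
    for pos in range(start, len(cands)):
        if any(ch in delimiters for ch in cands[pos]):
            return pos
    return None


def best_word(segment: Sequence[Sequence[str]],
              words: Sequence[str]) -> Optional[str]:
    """Pick the formable word with the smallest sum of candidate ranks."""
    scored = [
        (sum(r.index(c) for c, r in zip(word, segment)), word)
        for word in words
        if len(word) == len(segment)
        and all(c in r for c, r in zip(word, segment))
    ]
    return min(scored)[1] if scored else None


def post_process(cands: Sequence[Sequence[str]]) -> List[Optional[str]]:
    """Segment the candidates level by level and match each segment."""
    results, start = [], 0
    for delimiters, words in LEVELS:
        end = first_hit(cands, delimiters, start)
        if end is None:
            break
        results.append(best_word(cands[start:end + 1], words))
        start = end + 1
    return results


candidates = [
    ["束", "東"], ["京", "東"], ["都", "郡"],
    ["町", "田"], ["由", "田"], ["市", "布"],
    ["直", "真"], ["光", "先"], ["寺", "等"], ["町", "田"],
]
rank1_text = "".join(ranked[0] for ranked in candidates)
address = post_process(candidates)
```

Reading only the first-ranked candidates gives 束京都町由市直光寺町, whereas the segmented dictionary match recovers all three words.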

The matching result is passed to result output unit 10, which confirms it with the person who entered the data if necessary, fixes the final recognition result, and performs output processing to a predetermined device.

The above description used address input as an example, but the present invention applies equally to any input in which certain characters commonly appear at the breaks in a character string. For example, when entering an affiliation within a company, 部 (department) and 課 (section) can serve as the specific characters. The invention is also applicable not only to handwritten characters but also to the recognition of printed type.
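Switching domains only changes the registered delimiters. The short sketch below illustrates this with a company affiliation; for brevity it works on a plain string (as if only first-ranked candidates were used), and the function name `split_on_delimiters` and the sample affiliation are invented.

```python
from typing import List, Sequence


def split_on_delimiters(text: str,
                        level_delimiters: Sequence[str]) -> List[str]:
    """Split `text` after the first delimiter found for each level, in order."""
    parts, start = [], 0
    for delimiters in level_delimiters:
        for pos in range(start, len(text)):
            if text[pos] in delimiters:
                parts.append(text[start:pos + 1])
                start = pos + 1
                break
    return parts


# Department level ends with 部, section level ends with 課.
parts = split_on_delimiters("営業部販売課", ["部", "課"])
```

Each resulting part would then be matched against a dictionary of department or section names in the same way as the address levels.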

[Effect of the invention]

As described above, according to the present invention an input character string written solid can be divided into word units that can be post-processed, so post-processing that selects the most plausible final candidate from multiple candidates becomes possible and the recognition rate can be improved. Because the input character string can be written solid, it is easy to write and the written string is easy to read. Wasted form space can also be reduced. The person entering the data does not need to be aware of the specific characters, so no extra burden is placed on that person.

[Brief explanation of drawings]

FIG. 1 shows the configuration of an embodiment of the present invention, FIG. 2 is a diagram for explaining how processing proceeds in that embodiment, and FIG. 3 is a diagram for explaining the problems of the conventional method. In the figures, 1 is an input sheet, 2 is a character input unit, 3 is a recognition unit, 4 is a candidate memory, 5 is a specific character registration unit, 6 is an address dictionary, 7 is a specific character search unit, 8 is a candidate character string extraction unit, 9 is a dictionary matching unit, 10 is a result output unit, and 20 is the written character string.

Claims

[Claims]

1. A character recognition post-processing method in a character recognition device that outputs candidate characters for each character as the recognition result of an input character string and performs post-processing to select a final candidate from those candidate characters, characterized by comprising: a specific character registration unit that registers and stores in advance specific characters that delimit the input character string; a dictionary that stores, for the character strings delimited by the specific characters, meaningful terms corresponding to the level determined by each specific character; a specific character search unit that searches the candidate characters and finds the positions of the specific characters among the candidate characters; a candidate character string extraction unit that cuts character strings out of the candidate characters based on the position information of the specific characters obtained by the specific character search unit; and a dictionary matching unit that matches the character strings cut out by the candidate character string extraction unit against the dictionary.