JPH0458073B2

Info

Publication number: JPH0458073B2
Application number: JP57132620A
Authority: JP
Inventors: Hiroyuki Kami
Original assignee: Nippon Electric Co Ltd
Current assignee: NEC Corp
Priority date: 1982-07-29
Filing date: 1982-07-29
Publication date: 1992-09-16
Also published as: JPS5922179A

Description

DETAILED DESCRIPTION OF THE INVENTION

The present invention relates to a character recognition method in which a dictionary is built from the characters on a character sample form and, when a form is read, characters are recognized by matching against that dictionary.

Conventionally, this type of character recognition method relies on the observation that even a person with messy handwriting produces similar letterforms when considered individually. Forms of the same format that the form filler has written many times are read, the features of each character are extracted, and the range of feature values obtained for each character category is determined and used as that filler's dictionary.

FIG. 1 shows an example of a character sample form used for dictionary creation. In this example, the category name of each character is determined by its position on the form.

However, because this method builds the dictionary without considering the feature values of other categories, the features extracted for different categories of similar shape must differ. Many macro features and micro features therefore have to be extracted together to build the dictionary, which makes dictionary creation difficult. Conversely, building the dictionary while considering the feature values of all other categories takes too much time.

An object of the present invention is to provide a character recognition method that solves the above problems by building the dictionary through divided processing.

To achieve this object, the character recognition method of the present invention first inputs a character sample form and stores, for each character, the given category name together with a string of code values obtained by encoding the feature values of a plurality of predetermined features. When encoding of the characters on the character sample form is complete, code value ranges are determined from the code value appearance frequency distribution of each category for one macro feature among those features (hereinafter, feature H), such that the number of categories falling within each code value range is small. For each code value range, the code value strings whose code value for feature H lies within that range are used: code value strings of the same category name are combined, feature by feature, so as not to include the code value strings of other categories, and a lower limit code and an upper limit code are obtained as the code value range of each feature. A dictionary element of a divided dictionary is formed from the category name and the code value range of each feature, and the set of divided dictionaries created for the code value ranges constitutes the dictionary.

Because this method always takes the feature values of other categories into account when building the dictionary, it automatically produces a dictionary suited to recognizing the form filler's characters. Moreover, since the amount of data that must be processed simultaneously during dictionary creation is reduced, the creation time is shortened.

FIG. 2 is a block diagram of a specific apparatus for explaining the conventional character recognition method. Before a form is read, the dictionary is loaded from the auxiliary storage unit 7 into the dictionary unit 5.

The character pattern of one character on the form is photoelectrically converted by the scanning unit 1 and stored as image data in the pattern memory unit 2. The feature extraction unit 3 extracts the feature values of the features needed for recognition from the two-dimensional pattern in the pattern memory unit 2. The matching unit 4 compares the feature values stored in the dictionary unit 5 with the extracted feature values and outputs the reading result 6.

FIG. 3 is a block diagram showing one embodiment of a specific apparatus for explaining the character recognition method according to the present invention. First, when a character sample form is input, the character pattern of each character on the form is photoelectrically converted by the scanning unit 1 and stored as image data in the pattern memory unit 2. The feature extraction unit 3 extracts and encodes the feature values of a plurality of predetermined features from the two-dimensional pattern in the pattern memory unit 2, and stores them as a code value string, together with the given category name, in the code storage unit 8. When storage is complete for the characters on the character sample form, the dictionary generation unit 9 uses the code value strings in the code storage unit 8 to build the code value appearance frequency distribution of each category for feature H and, from that distribution, determines a plurality of code value ranges, that is, divides the code values, so that the number of categories within each code value range is small. Then, for each group of code value strings in the code storage unit 8 whose code value for feature H lies within one of these code value ranges, the following divided dictionary creation is performed and the result is stored in the dictionary unit 5: code value strings of the same category name are combined, feature by feature, so as not to include the code value strings of other categories; a lower limit code and an upper limit code are obtained as the code value range; and each dictionary element of the divided dictionary is expressed by the category name and the code value range of each feature.

Accordingly, the dictionary consists of code value range strings in which the code value ranges are arranged feature by feature.

When a character sample form is not used, the category name for each character on the form is supplied through the category name input unit 10.

FIG. 4 shows an example of code value strings, each consisting of a category name followed by the code values obtained by encoding the feature value of each feature. For simplicity, the number of features is two.

FIG. 5 shows an example of the code value appearance frequency distribution obtained from the code values of one feature (the first code value) in the code value strings of FIG. 4.
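As a minimal sketch of how such a distribution could be tabulated, the following Python fragment groups category names by the code value of one feature. The sample data, the function name `category_sets_by_code`, and the tuple layout are illustrative assumptions; they are not the values shown in FIG. 4.

```python
from collections import defaultdict


def category_sets_by_code(samples, feature_index=0):
    """Map each code value of one feature to the set of categories seen with it.

    samples: iterable of (category_name, code_values) pairs, where code_values
    is a tuple holding one code value per feature (hypothetical layout).
    """
    table = defaultdict(set)
    for category, codes in samples:
        table[codes[feature_index]].add(category)
    return dict(table)


# Hypothetical code value strings with two features each.
SAMPLES = [
    ("C1", (5, 3)),
    ("C2", (2, 7)),
    ("C2", (2, 1)),
    ("C3", (2, 4)),
]

if __name__ == "__main__":
    print(category_sets_by_code(SAMPLES))
```

Only the presence of a category at a code value is recorded here, which corresponds to the marks in the distribution; a count per category could be kept instead if the actual frequencies were needed.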

In the figure, 1 to 8 denote code values, Ci (i = 1 to 6) denote category names, and the symbol ○ indicates that the code value occurs for that category. One way to determine the code value ranges is as follows: first select the single code value that contains the most categories, then find the neighboring code values whose categories are all contained in the categories of the selected code value, and take the lowest and highest of these code values as the code value range.

First, code value 2, which has the most categories, is selected, and code values containing only categories from those of code value 2 (C2, C3, C5, C6) are sought one after another to form the code value range. The categories of code value 1, adjacent to code value 2, are contained, and so are the categories of code value 3 on the other side. Category C4 of the next code value, 4, is not among the categories of code value 2, so the code value range 1 to 3 is obtained.

The process is then repeated for the remaining code values. Assuming that, among code values with equally many categories, the smaller code value is chosen, code value 5 is selected next. As before, code value 4 on one side is found to have its categories contained in those of code value 5, whereas code value 6 on the other side has category C2, which is not among them, so the next code value range is 4 to 5. Repeating the process for the remaining code values, code value 6 is selected first, and the containment relation with the categories of code value 6 yields the code value range 6 to 8.
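The range determination walked through above can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented implementation: the function name `split_code_ranges` is invented here, and `EXAMPLE_TABLE` is a hypothetical distribution chosen only so that it is consistent with the walk-through (code value 2 holding C2, C3, C5, C6, and so on); it is not the actual content of FIG. 5.

```python
def split_code_ranges(table):
    """Split the code values of feature H into code value ranges.

    table maps each code value to the set of categories observed with it.
    Returns a list of (lower, upper) code value ranges in the order found.
    """
    remaining = set(table)
    ranges = []
    while remaining:
        # Pick the remaining code value with the most categories. Scanning in
        # ascending order makes ties resolve to the smaller code value.
        mi = max(sorted(remaining), key=lambda code: len(table[code]))
        if not table[mi]:
            break  # no categories left, so range creation ends
        lo = hi = mi
        # Extend downward while the neighbor's categories are all contained.
        while lo - 1 in remaining and table[lo - 1] <= table[mi]:
            lo -= 1
        # Extend upward in the same way.
        while hi + 1 in remaining and table[hi + 1] <= table[mi]:
            hi += 1
        ranges.append((lo, hi))
        remaining -= set(range(lo, hi + 1))
    return ranges


# Hypothetical distribution, consistent with the walk-through in the text.
EXAMPLE_TABLE = {
    1: {"C2", "C3"},
    2: {"C2", "C3", "C5", "C6"},
    3: {"C5", "C6"},
    4: {"C4"},
    5: {"C1", "C4"},
    6: {"C1", "C2"},
    7: {"C1"},
    8: {"C2"},
}

if __name__ == "__main__":
    print(split_code_ranges(EXAMPLE_TABLE))
```

With this table the sketch selects code values 2, 5, and 6 in turn, matching the order described in the text. Whether the original procedure skips over already assigned code values in exactly this way is an assumption.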

Forms are read as follows.

The character pattern of one character on the form is photoelectrically converted by the scanning unit 1 and stored as image data in the pattern memory unit 2. The feature extraction unit 3 extracts and encodes the feature values of the predetermined features from the two-dimensional pattern in the pattern memory unit 2. The code value range strings of the divided dictionary read out according to the code value of feature H are then matched against the code value string obtained by the feature extraction unit 3, and the reading result 6 is output. The features extracted by the feature extraction unit 3 fall broadly into two types: those obtained by stroke tracing and those obtained by background analysis. The former are obtained by converting the character into a thin-line pattern and tracing the lines, and include the number of feature points detected (end points, branch points, intersections, and the like), their positional relationships and connections, and the curvature between feature points. The latter are obtained by tracing the character outline and dividing it into concave and convex parts, and include the degree of curvature of each part, the opening direction of each part, the ratio of each part's traced length to the total length, and the direction histogram of each part. For example, the opening direction of a concave part is used as feature H.

FIG. 6 is a block diagram showing one embodiment of a character recognition apparatus that implements the character recognition method of the present invention, corresponding to FIG. 3, using a processor and memory. Reference numeral 11 denotes a scanning circuit that scans a predetermined pattern area; 12, a pattern memory; 13, a dictionary memory that stores the dictionary used for matching; 14, a code memory that stores the category names and feature value code strings used for dictionary creation; 15, a program memory; 16, an output device that outputs and displays the reading result; 17, a key input circuit for correcting the output result; 18, an auxiliary storage device that stores the feature extraction program, matching program, dictionary creation program, and code value range creation program to be loaded into the program memory 15; and 20, a processor.

To perform the functions of FIG. 3 with the character recognition apparatus of FIG. 6, the following processing is required.

First, the processor 20 loads the feature extraction program from the auxiliary storage device 18 into the program memory 15. When a character sample form is then input, the characters on the form are scanned and quantized by the scanning circuit 11 and set in the pattern memory 12 as binary patterns. The processor 20 executes the feature extraction program in the program memory 15, extracts features from the binary pattern in the pattern memory 12, determines and encodes their feature values, and stores the resulting code value string in the code memory 14 together with the category name given by the position on the form. When the characters on the character sample form have been processed one after another and storage into the code memory 14 is complete, code value range determination begins. The processor 20 loads the code value range creation program from the auxiliary storage device 18 into the program memory 15, builds the code value appearance frequency distribution using the code values in the code memory 14 that correspond to the designated feature (feature H), and obtains the code value ranges by the method described above. Divided dictionary creation is then performed for each code value range. The processor 20 loads the dictionary creation program from the auxiliary storage device 18 into the program memory 15 and executes it; it fetches the code value strings from the code memory 14 via the interface bus 19 and, using only the code value strings whose code value for the feature designated in the code value range creation program lies within the code value range in question, generates a divided dictionary and sets it in the dictionary memory 13. After this processing has been completed for every code value range, actual form reading is performed.

When a form is input, the characters on the form are scanned and quantized by the scanning circuit 11 and set in the pattern memory 12 as binary patterns. The processor 20 executes the feature extraction program in the program memory 15, extracts features from the binary pattern in the pattern memory 12, encodes each feature value obtained, and converts them into a code value string. At the same time, it reads the divided dictionary selected by the code value of feature H from the dictionary memory 13.
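Selecting the divided dictionary by the code value of feature H amounts to a range lookup. The sketch below is one hedged way to write it in Python; the function name, the dictionary layout keyed by `(lower, upper)` tuples, and the behavior for a code value outside every range (returning `None`) are assumptions, since the source does not specify them.

```python
def select_divided_dictionary(divided_dictionaries, h_code):
    """Return the divided dictionary whose code value range contains h_code.

    divided_dictionaries maps (lower, upper) code value ranges of feature H
    to their lists of dictionary elements (hypothetical layout).
    """
    for (lower, upper), elements in divided_dictionaries.items():
        if lower <= h_code <= upper:
            return elements
    # Assumption: a code value outside every range selects nothing.
    return None


# Hypothetical divided dictionaries; element contents are placeholders.
DIVIDED = {
    (1, 3): ["elements for codes 1 to 3"],
    (4, 5): ["elements for codes 4 to 5"],
    (6, 8): ["elements for codes 6 to 8"],
}

if __name__ == "__main__":
    print(select_divided_dictionary(DIVIDED, 5))
```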

Next, the processor 20 executes the matching program loaded in the program memory 15, matches the code value string of the obtained feature values against the code value range strings of the divided dictionary that was read out, and outputs the result to the output device 16.

FIG. 7 is a general symbolic representation of FIG. 5 for determining the code value ranges: entries marked ○ in FIG. 5 are shown as "1" and all others as "0". Let Vi be the string of "0"s and "1"s for a code value i, and Ti its number of categories; the code value range creation described above then follows the flowchart of FIG. 8. In FIG. 8, the process labeled 10 detects the code value with the most categories among all code values or, on later passes, among the remaining code values; MI denotes the detected code value and MT the number of categories for code value MI. If the detected number of categories is 0, code value range creation ends. The process labeled 20 searches the code values smaller than MI for those in a containment relation with the category string V of code value MI, and the resulting code value is LI. The process labeled 30 likewise searches the code values larger than MI, and the result is UI. These processes yield the code value range from LI to UI, and repeating them yields a plurality of code value ranges.

FIG. 9 shows, in symbolic form, the code value strings used to create a divided dictionary: the category names obtained from the character samples and the code values of the feature values of several predetermined features. Each row is one code value string, such as (C1) 548...6 or (C2) 826...5.

Here, all code values for feature H are assumed to lie within one of the code value ranges described above.

In the figure, c is the category parameter that encodes the category name, k is the sample number, and F(c, k) denotes the feature value code values. The figure indicates that there are L character samples for every category, N categories, and M features.

FIG. 10 is a flowchart for creating a divided dictionary using the symbols of FIG. 9.
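Before the individual steps are described, the overall shape of the FIG. 10 procedure can be sketched in Python. This is a simplified reading of steps 110 to 200 under stated assumptions, not the patent's program: the function names, the data layout, and the choice to mark the seed sample as processed at step 120 are all assumptions made for illustration. Each dictionary element is grown as a per-feature interval that is only widened when every sample of the other categories stays at least a threshold away.

```python
def hinge(value):
    """[x] in the text: x when x is positive, otherwise 0."""
    return value if value > 0 else 0


def box_distance(lower, upper, codes, weights):
    """Weighted distance from a code value string to the box [lower, upper]."""
    return sum(w * (hinge(lo - f) + hinge(f - hi))
               for lo, hi, f, w in zip(lower, upper, codes, weights))


def build_divided_dictionary(samples, weights, threshold):
    """samples maps category -> list of code value tuples, already restricted
    to one code value range of feature H. Returns (category, lower, upper)."""
    elements = []
    for c, own in samples.items():                            # step 200
        others = [v for a, vs in samples.items() if a != c for v in vs]
        done = [False] * len(own)                             # step 110
        for k, seed in enumerate(own):                        # step 190
            if done[k]:
                continue
            lower, upper = list(seed), list(seed)             # step 120
            done[k] = True  # assumption: the seed counts as processed
            for k2, cand in enumerate(own):                   # steps 130, 170
                if done[k2]:
                    continue
                new_lo = [min(a, b) for a, b in zip(lower, cand)]  # step 140
                new_hi = [max(a, b) for a, b in zip(upper, cand)]
                d = min((box_distance(new_lo, new_hi, v, weights)
                         for v in others), default=float("inf"))  # step 150
                if d >= threshold:                            # step 160
                    lower, upper, done[k2] = new_lo, new_hi, True
            elements.append((c, tuple(lower), tuple(upper)))  # step 180
    return elements


# Hypothetical samples with two features each.
SAMPLES = {"C1": [(1, 1), (2, 2), (9, 9)], "C2": [(5, 5)]}

if __name__ == "__main__":
    for element in build_divided_dictionary(SAMPLES, (1, 1), 1):
        print(element)
```

In this example the C1 sample (9, 9) cannot be merged with the other two C1 samples, because the merged interval would enclose the C2 sample (5, 5), so it becomes a separate dictionary element. A category can therefore contribute more than one element to a divided dictionary.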

The process labeled 110 clears to the character A the memory location P(c, k) determined by the category parameter c and the sample number parameter k. P(c, k) serves as a flag indicating whether the sample has already been used for dictionary creation; P(c, k) = A means unprocessed.

The process labeled 120 applies when the sample is unprocessed, that is, when P(c, k) = A, and creates the lower limit F₁j and the upper limit F₃j of the feature value of each feature Fj based on P(c, k); P(c, k) = Y means already processed.

The process labeled 130 keeps the category parameter value c specified in 120, varies the sample number parameter k to find an unprocessed P(c, k), and takes the feature value of feature Fj for that sample number parameter k as F₂j.

The process labeled 140 sets Fjn to the smaller of the feature values F₁j and F₂j, and Fjm to the larger of the feature values F₃j and F₂j.

The process labeled 150 computes, by the formula below, the difference Dal between Fjn, Fjm and the feature value Fj(a, l) located by a category parameter a other than c and a sample number parameter l. The minimum difference obtained by varying the category parameter a and the sample number parameter l is taken as D.

$$D_{al} = \sum_{j=1}^{M} W_j \left[ F_{jn} - F_j(a, l) \right] + \sum_{j=1}^{M} W_j \left[ F_j(a, l) - F_{jm} \right]$$

where [θ] = 0 (θ ≤ 0) and [θ] = θ (θ > 0). Here Wj is the weight of feature Fj and is assumed to have been determined in advance by statistical processing.

In the process labeled 160, if the minimum difference D is greater than or equal to the threshold T, Fjn becomes the lower limit F₁j of feature Fj, Fjm becomes the upper limit F₃j of feature Fj, and Y is written to the flag P(c, k) to mark the sample as processed.

The process labeled 170 repeats processes 130, 140, 150, and 160 L times, once for each sample, while varying the sample number parameter k.

The process labeled 180 creates one dictionary element from the category parameter c and the lower limit F₁j and upper limit F₃j of each feature Fj.

The process labeled 190 repeats the above processing L times, once for each sample, while varying the sample number parameter k.

The process labeled 200 repeats the above dictionary creation for each c N times, once for each category, while varying the category parameter c.

Accordingly, as shown in FIG. 11, the divided dictionary that is created consists of the code value c of the category name and, for each feature, the lower limit code F₁j and the upper limit code F₃j of the feature value.

This processing is repeated for each code value range, and the set of divided dictionaries is the dictionary of this recognition method.

In this range creation method, when the amount of data doubles, the number of iterations increases roughly fourfold. Dividing the data as described above therefore reduces the amount of data that must be processed at once and shortens dictionary creation time. For example, dividing into four reduces the total dictionary creation time to 1/4.

Finally, an example of the matching process is given.

Let F_I1, F_I2, ..., F_IM be the code value string of feature values obtained by running the feature extraction program on the character pattern to be read. For each dictionary element b of the divided dictionary selected by the code value of feature H, the difference D(b) is computed from the lower limit codes F₁j(b) and the upper limit codes F₃j(b):

$$D(b) = \sum_{j=1}^{M} W_j \left[ F_{1j}(b) - F_{Ij} \right] + \sum_{j=1}^{M} W_j \left[ F_{Ij} - F_{3j}(b) \right]$$

where [θ] = 0 (θ ≤ 0), [θ] = θ (θ > 0), and Wj is the weight of feature Fj.

The category name code value c of the element b that gives the minimum difference over b = 1 to B is taken as the reading result for the character being read.
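The matching step can be sketched in Python as below. It assumes that a dictionary element is a `(category, lower, upper)` tuple and that the weight Wj applies to both sums of D(b); the function names and the example elements are hypothetical.

```python
def hinge(value):
    """[x] in the text: x when x is positive, otherwise 0."""
    return value if value > 0 else 0


def element_difference(lower, upper, codes, weights):
    """D(b): weighted amount by which codes fall outside [lower, upper]."""
    return sum(w * (hinge(lo - f) + hinge(f - hi))
               for lo, hi, f, w in zip(lower, upper, codes, weights))


def recognize(elements, codes, weights):
    """Return the category of the dictionary element with the smallest D(b).

    elements: non-empty list of (category, lower, upper) tuples taken from
    the divided dictionary selected by the code value of feature H.
    """
    best = min(elements,
               key=lambda e: element_difference(e[1], e[2], codes, weights))
    return best[0]


# Hypothetical dictionary elements with two features each.
ELEMENTS = [
    ("C1", (1, 1), (2, 2)),
    ("C2", (5, 5), (5, 5)),
]

if __name__ == "__main__":
    print(recognize(ELEMENTS, (2, 3), (1, 1)))
    print(recognize(ELEMENTS, (5, 4), (1, 1)))
```

A code value string that lies inside an element's intervals for every feature gives a difference of zero for that element. The source does not describe a rejection rule for large minimum differences, so none is included here.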

A feature of the present invention is that a plurality of code value ranges are obtained, based on the number of categories, from the code value appearance frequency distribution of the encoded feature values of a macro feature, and a divided dictionary is created for each code value range. This reduces the amount of data that must be considered simultaneously and shortens dictionary creation time. The description so far creates the divided dictionaries using a single feature, but they can equally be created using a set of several features.

As described above, according to the present invention, after the feature values are encoded and stored as code strings, the dictionary can be created within the character reading apparatus, and a dictionary tailored to the characters on the forms to be read can be generated. A character reading apparatus with good performance can thus be obtained, which is a significant benefit.

[Brief description of the drawings]

FIG. 1 shows an example of a character sample form for dictionary creation. FIG. 2 is a block diagram of a conventional character recognition method. FIG. 3 is a block diagram showing one embodiment that implements the character recognition method according to the present invention. FIG. 4 shows an example of code value strings. FIG. 5 shows an example of the code value appearance frequency distribution. FIG. 6 is a block diagram showing one embodiment of a character recognition apparatus that implements the character recognition method of the present invention using a processor and memory. FIG. 7 shows the code value appearance frequency distribution of FIG. 5 in symbolic form. FIG. 8 shows an example of a flowchart for creating code value ranges using the symbols of FIG. 7. FIG. 9 illustrates, in symbolic form, the category names obtained from character samples and the code values of the feature values of several predetermined features, as used for dictionary creation. FIG. 10 shows an example of a flowchart for creating a divided dictionary using the symbols of FIG. 9. FIG. 11 shows an example of the dictionary format.

In the figures, 1 denotes the scanning unit; 2, the pattern memory unit; 3, the feature extraction unit; 4, the matching unit; 5, the dictionary unit; 6, the output result; 7, the auxiliary storage unit; 8, the code storage unit; 9, the dictionary generation unit; 10, the category name input unit; 11, the scanning circuit; 12, the pattern memory; 13, the dictionary memory; 14, the code memory; 15, the program memory; 16, the output device; 17, the key input circuit; 18, the auxiliary storage device; 19, the bus line; and 20, the processor.

Claims

[Claims]

1. A character recognition method in which a dictionary created from the feature values of features extracted from characters on a form is stored in advance in a character reading apparatus and, when a form is read, the feature values of predetermined features are extracted from the characters on the form and matched against the dictionary to recognize the characters, characterized in that: before reading begins, a character sample form is input, and for each character the given category name and a string of code values obtained by encoding the feature values of a plurality of predetermined features are stored; when encoding of the characters on the character sample form is complete, code value ranges are first determined, based on the number of categories, from the code value appearance frequency distribution of each category for one of the features (hereinafter, feature H); then, using the stored category names and code value strings whose code value for feature H lies within one of the code value ranges thus obtained, code value strings of the same category name are combined, feature by feature, so as not to include the code value strings of other categories, and a lower limit code and an upper limit code are obtained as a code value range; dictionary elements of a divided dictionary are formed from the category name and the code value range of each feature; and the dictionary is represented by the set of divided dictionaries created for the respective code value ranges.