JPS6336037B2

Info

Publication number: JPS6336037B2
Application number: JP54020435A
Authority: JP
Inventors: Yukio Hoshino
Original assignee: Nippon Electric Co Ltd
Current assignee: NEC Corp
Priority date: 1979-02-22
Filing date: 1979-02-22
Publication date: 1988-07-18
Also published as: JPS55112687A

Description

DETAILED DESCRIPTION OF THE INVENTION

The present invention relates to a character recognition device that makes its determination using character information from the characters before and after a given character, or average character information for the line containing that character.

When recognizing printed characters, a common technique is to quantize the input character into a binary mesh pattern and superimpose it on standard patterns, which are built either by giving each mesh cell an appropriate weight or by selecting only the mesh cells needed for recognition. The category name is then determined from the resulting degree of similarity or dissimilarity.
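
The superimposition step described above can be sketched as follows. This is a minimal illustration, not the matching circuit of the cited prior art: it uses an unweighted count of differing mesh cells as the dissimilarity, and the function names and the 3x3 toy patterns are assumptions made for the example.

```python
from typing import Dict, List

Pattern = List[List[int]]  # binary mesh: 1 = black, 0 = white


def dissimilarity(a: Pattern, b: Pattern) -> int:
    """Count the mesh cells in which two equally sized binary patterns differ."""
    return sum(x != y for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


def classify(pattern: Pattern, standards: Dict[str, Pattern]) -> str:
    """Return the category name of the standard pattern with the smallest dissimilarity."""
    return min(standards, key=lambda name: dissimilarity(pattern, standards[name]))


# Toy 3x3 standard patterns (hypothetical, for illustration only).
STANDARDS: Dict[str, Pattern] = {
    "I": [[0, 1, 0], [0, 1, 0], [0, 1, 0]],
    "L": [[1, 0, 0], [1, 0, 0], [1, 1, 1]],
}

# An "L" with one missing cell still matches "L" best.
noisy_l: Pattern = [[1, 0, 0], [1, 0, 0], [1, 1, 0]]
print(classify(noisy_l, STANDARDS))
```

Because the comparison looks only at shape inside the mesh, a pair such as (C, c) that differs only in size or position yields nearly the same dissimilarity, which is the difficulty the following paragraphs address.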

However, difficult problems arise when the size and shape of the printed characters cannot be prescribed. One of them is distinguishing uppercase from lowercase letters, for example the pairs (C, c), (O, o), (P, p), (S, s), (U, u), (V, v), (W, w), (X, x), and (Z, z). In addition, between lowercase letters and digits, the lowercase letter q and the digit 9 are similar in shape and can be difficult to tell apart.

An object of the present invention is to provide a character recognition system that, when recognizing a specific character contained in a string of uppercase letters, lowercase letters, and digits, makes the determination possible by referring to information such as the height and position of neighboring characters that belong to predetermined categories.

The present invention is a character recognition system characterized by comprising: quantization separation means for quantizing the characters on a form into binary quantization patterns and separating them into a quantization pattern for each character; upper and lower end position detection means for detecting the upper end position T_i and the lower end position B_i of the quantization pattern P_i; category determination means for determining a category name C_i by matching the quantization pattern P_i against standard patterns prepared in advance; a buffer that stores the category name C_i, the upper end position T_i, and the lower end position B_i as a set, in character order; average height detection means for obtaining the average height α of the characters whose category name C_i belongs to a predetermined category set S_A, and the average height β of the characters whose category name C_i belongs to a predetermined category set S_a; and protrusion determination means that, when a category name C_i in the buffer belongs to a predetermined category set S_P, confirms or changes the category name C_i using the uppermost position T_o and lowermost position B_o of the category C_o that belongs to a category set S_B and is detected closest to the category C_i, together with the category name C_i, the uppermost position T_i, the lowermost position B_i, and the heights α and β.

The present invention is also a character recognition system characterized by comprising: quantization separation means for quantizing the characters on a form into binary quantization patterns and separating them into a quantization pattern for each character; upper and lower end position detection means for detecting the upper end position T_i and the lower end position B_i of the quantization pattern P_i; category determination means for determining a category name C_i by matching the quantization pattern P_i against standard patterns prepared in advance; a buffer that stores the category name C_i, the upper end position T_i, and the lower end position B_i as a set, in character order; average height detection means for obtaining the average height α of the characters whose category name C_i belongs to a predetermined category set S_A, and the average height β of the characters whose category name C_i belongs to a predetermined category set S_a; size determination means that, when a category name C_i in the buffer belongs to a predetermined category set S_c, confirms or changes the category name C_i by comparing the upper end position T_i and the lower end position B_i with the height averages α and β; and protrusion determination means that, when a category name C_i in the buffer belongs to a predetermined category set S_P, confirms or changes the category name C_i using the uppermost position T_o and lowermost position B_o of the category C_o that belongs to a category set S_B and is detected closest to the category C_i, together with the category name C_i, the uppermost position T_i, the lowermost position B_i, and the height averages α and β.

FIG. 1 is a block diagram showing one embodiment of the character recognition system of the present invention.

In FIG. 1, 91 is a quantization separation circuit that scans the characters printed on a form, converts them into a binary pattern in which black areas are "1" and the white background of the paper is "0", and separates the pattern into individual characters. 92 is a character top and bottom end detection circuit: for each separated quantization pattern, it detects the height position of the uppermost "1" as the top end T and the height position of the lowermost "1" as the bottom end B, and sets them in part of register 95. Coordinates are assigned as 0, 1, 2, 3, ... in order from the top. This can be implemented, for example, according to Japanese Patent Publication No. Sho 43-126. 93 is a category determination circuit that superimposes each standard pattern stored in the standard pattern storage device on the quantization pattern, obtains the degree of dissimilarity (the degree of similarity may be used instead), and outputs the name of the standard pattern with the smallest dissimilarity to register 95 as the determined category name. This can be implemented, for example, according to Japanese Unexamined Patent Publication No. Sho 50-156322.

While the category determination circuit is operating, multiplexer 5 passes the contents of register 95 under control of the signal on signal line 907. When the category determined by the category determination circuit has been set in register 95, the contents of register 95 are written, by the write signal on signal line 905, to the address of buffer 6 specified by address counter 8. Before one line of the form is scanned, buffer 6 and address counter 8 are reset by signal 901S from control unit 9 (901S denotes the signal sent on signal line 901; hereinafter the signal on signal line lmn is written lmnS), and multiplexer 7 is controlled so as to pass the contents of address counter 8. Each time a character is recognized, the contents of address counter 8 are incremented by one by signal 902S, so that the top end, bottom end, and determined category name of each input quantization pattern are stored sequentially at different addresses of buffer 6 for one line.
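
The top and bottom end detection performed by circuit 92 can be illustrated with a short sketch. It assumes the quantization pattern is available as a list of rows of 0/1 values numbered from the top, as in the text; the function name `top_bottom` and the sample glyph are hypothetical.

```python
from typing import List, Optional, Tuple


def top_bottom(pattern: List[List[int]]) -> Optional[Tuple[int, int]]:
    """Return (T, B): the indices of the first and last rows that contain a 1.

    Rows are numbered 0, 1, 2, ... from the top, so T <= B.
    Returns None when the pattern contains no black cell.
    """
    rows_with_ink = [r for r, row in enumerate(pattern) if any(row)]
    if not rows_with_ink:
        return None
    return rows_with_ink[0], rows_with_ink[-1]


# A small sample pattern: ink occupies rows 1 to 3.
glyph = [
    [0, 0, 0],
    [0, 1, 0],
    [1, 1, 1],
    [0, 1, 0],
    [0, 0, 0],
]
print(top_bottom(glyph))  # (1, 3)
```

Each recognized character would then be stored in the buffer as the set (category name, T, B), one entry per character in reading order.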

FIG. 2 illustrates the operation of the present invention concretely. FIG. 2a shows the upper and lower end positions of each character when the sentence "Susan opened the box quickly" is quantized by quantization separation circuit 91. The vertical axis is divided into 100 mesh cells numbered 0 to 99 from the top (0, 1, 2, ... 20 ... 40 ... 60 ... 80 ... 99). For example, the initial S has upper end position T = 20 and lower end position B = 60, so it spans 40 mesh cells, while the letter p in "opened" has upper end position T = 40 and lower end position B = 80.

For such a quantized input string, upper and lower end position detection circuit 92 obtains the upper end position T_i (i denotes the i-th character) and the lower end position B_i of each character, and category determination circuit 93 recognizes and outputs the category C_i of each character. This category determination circuit cannot distinguish uppercase from lowercase for categories that have the same shape and differ only in size, such as S and s or U and u; in such cases it is assumed to output the uppercase letter. FIG. 2b shows the contents of buffer 6 corresponding to the input string of FIG. 2a. For simplicity, each T_i and B_i is one of 15, 20, 40, 60, or 80. For the q in "quickly", the lowercase letter q and the digit 9 are recorded together because the two cannot be told apart.

Once one line has been stored in buffer 6, registers 12 and 13 are selected at multiplexer 10 by the signal on signal line 904, and address counter 8 is cleared by signal 901S. Search circuit 14 reads the contents (1 or 0) of memory 141 at the address specified by the category name part of register 12. When the contents are "1", the search signal "1" and the uppermost and lowermost positions held in register 12 are input to average value calculation circuit 16. In memory 141, the addresses corresponding to the uppercase letters ABCDEFGHIJKLMNQRTY, the lowercase letters bdghklpqy, and the digits 1, 2, 3, 4, 5, 6, 7, 8, 9 contain 1. These category names are hereinafter called category set S_A; that is, S_A contains the alphanumeric characters with a large vertical extent. When the determined category is in S_A, average value calculation circuit 16 accumulates the height of the quantization pattern (the difference between its upper and lower ends). After all of buffer 6 has been read, the sum of the heights is divided by the number of "1" outputs from search circuit 14, and the resulting average is taken as the character height α of category set S_A.

Among the categories C_i in buffer 6 of FIG. 2b, those that cause search circuit 14 to output "1" are, as shown in FIG. 2c, the seven characters d, h, b, q, k, l, and y, in order from the left. Their heights are all B_i − T_i = 40, so the average character height α of category set S_A is also 40.

Register 13, search circuit 15, memory 151, and average value calculation circuit 17 correspond to register 12, search circuit 14, memory 141, and average value calculation circuit 16, respectively. In memory 151, however, the addresses that contain "1" are those corresponding to the lowercase letters a, e, m, n, and r (hereinafter called category set S_a; that is, S_a contains the letters with a small vertical extent). Average value calculation circuit 17 therefore obtains the character height β of category set S_a.
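
The averaging performed by circuits 16 and 17 can be sketched as below. The buffer contents are an assumed reconstruction of FIG. 2b: the values for S, p, and the S_A and S_a characters follow the heights given in the text, while the coordinates chosen for t and i are guesses that do not affect the result. One further assumption is labeled in the code: the uppercase list printed for S_A includes C, but the worked example in the text counts only seven S_A characters, so this sketch leaves C out.

```python
from typing import List, Optional, Set, Tuple

Entry = Tuple[str, int, int]  # (category name C_i, upper end T_i, lower end B_i)

# Tall characters. ASSUMPTION: "C" is omitted even though the printed list contains it,
# because a small "c" reported as "C" would otherwise be averaged into alpha.
S_A: Set[str] = set("ABDEFGHIJKLMNQRTY") | set("bdghklpqy") | set("123456789")
# Short lowercase characters, as listed in the text.
S_a: Set[str] = set("aemnr")


def mean_height(buffer: List[Entry], categories: Set[str]) -> Optional[float]:
    """Average of B_i - T_i over the buffer entries whose category is in `categories`."""
    heights = [b - t for c, t, b in buffer if c in categories]
    return sum(heights) / len(heights) if heights else None


# "Susan opened the box quickly" as first recognized: size-only pairs come out
# as uppercase, and the q/9 ambiguity is stored here as "9" (an assumption).
BUFFER: List[Entry] = [
    ("S", 20, 60), ("U", 40, 60), ("S", 40, 60), ("a", 40, 60), ("n", 40, 60),
    ("O", 40, 60), ("P", 40, 80), ("e", 40, 60), ("n", 40, 60), ("e", 40, 60), ("d", 20, 60),
    ("t", 20, 60), ("h", 20, 60), ("e", 40, 60),
    ("b", 20, 60), ("O", 40, 60), ("X", 40, 60),
    ("9", 40, 80), ("U", 40, 60), ("i", 20, 60), ("C", 40, 60),
    ("k", 20, 60), ("l", 20, 60), ("y", 40, 80),
]

alpha = mean_height(BUFFER, S_A)  # averaged over d, h, b, 9 (q), k, l, y
beta = mean_height(BUFFER, S_a)   # averaged over a, n, e, n, e, e
print(alpha, beta)  # 40.0 20.0
```

With these inputs the sketch reproduces the values α = 40 and β = 20 used in the worked example of the text.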

Among the categories C_i in buffer 6 of FIG. 2b, those that cause search circuit 15 to output "1" are, as shown in FIG. 2d, the six characters a, n, e, n, e, and e, in order from the left. Their heights are all B_i − T_i = 20, so the average character height β of category set S_a is also 20.

Pairs such as (uppercase P, lowercase p) or (lowercase q, digit 9) are almost identical in both shape and size. They can nevertheless be told apart by comparing the bottom end B of the character with the bottom end of a nearby character from a specific category set, for example one of the uppercase letters A, B, C, D, E, F, G ... Y, Z or the lowercase letters a, b, c, d, e, h, i, k, l, m, n, o, r, s, t, u, v, w, x, z (this set of reference characters is called category set S_B). If the bottom end does not protrude below that of the reference character, the character is determined to be uppercase P; if it protrudes, it is determined to be lowercase p.

Concretely, address counter 8 is reset and reading from buffer 6 begins. Control signal 904 now controls multiplexer 10 so that the information read out is sent to register 30. If the contents of the category part of register 30 belong to category set S_P registered in memory 310 (uppercase P, the digit 9, and so on; S_P is the set of alphanumeric characters for which another alphanumeric character of the same shape and size exists and differs only in position), search circuit 31 outputs the signal "1" on signal line 32. Signal line 32 is connected to control unit 9, which sets the contents of address counter 8 into address counters 50 and 60.

Address counter 50 is incremented by one by signal 51S, and address counter 60 is decremented by one by signal 61S. Control unit 9 checks for overflow of address counters 50 and 60 through signals 52S and 62S. After the contents of address counter 50 have been incremented by one, multiplexer 7 uses signal 53S as the address of buffer 6; after the contents of address counter 60 have been decremented by one, multiplexer 7 uses signal 63S as the address of buffer 6. Signals 53S and 63S are generated alternately, and control signal 903 is generated so that address counters 50 and 60 are selected alternately.
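
The alternating use of address counters 50 and 60 amounts to searching outward from the current character, one position at a time in each direction. A minimal sketch follows. It assumes the forward (incrementing) direction is tried first at each distance, which the text suggests but does not state outright, and it builds the reference set only from the letters the text names explicitly, leaving out the uppercase letters hidden behind the ellipsis between G and Y. The function and variable names are hypothetical.

```python
from typing import List, Optional, Set

# Reference set S_B as far as the text spells it out. The uppercase letters
# elided in the source ("G ... Y") are deliberately NOT filled in here.
S_B_LISTED: Set[str] = set("ABCDEFGYZ") | set("abcdehiklmnorstuvwxz")


def nearest_reference(categories: List[str], i: int, reference: Set[str]) -> Optional[int]:
    """Index of the entry closest to position i whose category is in `reference`.

    Mimics two address counters started at i: one stepping forward, one stepping
    backward, consulted alternately. Returns None if no reference character exists.
    """
    n = len(categories)
    for step in range(1, n):
        for j in (i + step, i - step):  # forward counter first (assumption), then backward
            if 0 <= j < n and categories[j] in reference:
                return j
    return None


word = ["O", "P", "e", "n", "e", "d"]  # "opened" as first recognized
print(nearest_reference(word, 1, S_B_LISTED))  # 2, the "e" right after "P"
```

Indices that fall outside the buffer are simply skipped, which plays the role of the overflow check on signals 52S and 62S.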

At multiplexer 10, control signal 904 causes the data read from buffer 6 to be set in register 40. Search circuit 41 checks whether the contents of the category part of register 40 belong to category set S_B registered in memory 410 (A, B ... Y, Z, a, b, c, d, e, h, i, k, l, m, n, o, r, s, t, u, v, w, x, z, as described above) and, if so, outputs "1" on signal line 42. Because address counters 50 and 60 are selected alternately, a "1" on signal line 42 means that, among the characters of the categories registered in memory 410, the one nearest to the character whose category name is set in register 30 has been found.

When search circuit 41 outputs "1" on signal line 42, protrusion determination circuit 34 makes its determination using the lowermost position B_o from register 40, the category name C_i and lowermost position B_i from register 30, α (the size of the uppercase letters), β (the size of the specific lowercase letters), and constants such as 0.2, 0.3, and 0.4. Concretely, the determination is realized by the following calculations:

Category name P, and B_i − B_o > 0.3α or B_i − B_o > 0.4β: lowercase p
Category name 9, and B_i − B_o > 0.3α or B_i − B_o > 0.4β: lowercase q
Category name P, and B_i − B_o < 0.2α or B_i − B_o < 0.3β: uppercase P
Category name 9, and B_i − B_o < 0.2α or B_i − B_o < 0.3β: digit 9

The uppermost position is not used here, but it can be used if necessary.
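
The four rules of the protrusion determination can be written as a small function. This is a sketch under two assumptions that the text leaves open: the "protrudes" test is evaluated before the "does not protrude" test, and a value that satisfies neither test leaves the category unchanged and is reported as undecided. The function name and the return convention are hypothetical.

```python
from typing import Dict, Tuple

# Baseline form -> descending form, for the two S_P members the text gives rules for.
DESCENDER_OF: Dict[str, str] = {"P": "p", "9": "q"}


def resolve_protrusion(category: str, b_i: int, b_o: int,
                       alpha: float, beta: float) -> Tuple[str, bool]:
    """Apply the protrusion rules; return (category name, decided flag).

    b_i: lower end of the character in question; b_o: lower end of the nearest
    S_B reference character. Larger values lie lower on the page.
    """
    if category not in DESCENDER_OF:
        return category, False  # rules do not apply
    d = b_i - b_o
    if d > 0.3 * alpha or d > 0.4 * beta:  # bottom protrudes: lowercase p or q
        return DESCENDER_OF[category], True
    if d < 0.2 * alpha or d < 0.3 * beta:  # bottom does not protrude: P or 9 confirmed
        return category, True
    return category, False  # between the thresholds: left as is (assumption)


# Values from the worked example: B_i = 80, B_o = 60, alpha = 40, beta = 20.
print(resolve_protrusion("P", 80, 60, 40, 20))  # ('p', True)
print(resolve_protrusion("9", 80, 60, 40, 20))  # ('q', True)
```

The gap between the two pairs of constants (0.3 and 0.2 for α, 0.4 and 0.3 for β) gives the decision some tolerance against small baseline variations.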

In the case of FIG. 2, as shown in FIG. 2e, the P of "OPened" first causes search circuit 31 to output "1". This "1" acts as a trigger, and the next character, e, causes search circuit 41 to output "1", as shown in FIG. 2f. The lower end position B_i of category P is 80 and the lower end position B_o of category e is 60, so in protrusion determination circuit 34

B_i − B_o = 80 − 60 = 20 > 0.3 × 40 = 12

and category P is therefore changed to lowercase. Next, the q of "quickly" is still undetermined between the letter q and the digit 9. This q or 9 draws a "1" from search circuit 31, and the next character, u, draws a "1" from search circuit 41. The lower end position B_i of category q or 9 is 80 and the lower end position B_o of category u is 60, so

B_i − B_o = 80 − 60 = 20 > 0.3 × 40 = 12

and the category becomes lowercase q. In this way the lowercase letters p and q are output from protrusion determination circuit 34 as shown in FIG. 2f, together with the upper end position T_i and the lower end position B_i.

Through the operations described above, the character is determined to be uppercase P, lowercase p, the digit 9, or lowercase q, and the result is sent out on signal line 35 together with the uppermost end T_i and the lowermost end B_i. Control unit 9 outputs this information onto signal line 906 and writes it, via multiplexer 5, to the address of buffer 6 specified by address counter 8.

By the means described above, it becomes possible to distinguish uppercase P from lowercase p, and the lowercase q of the letter Q from the digit 9, in multi-font printed type, which has been difficult in the past.

If the input forms are merely to be sorted, as with addresses on mail, the distinction between uppercase and lowercase is unnecessary, whatever may be said of the distinction between letters and digits. When the recognized result is to be sent on to a person, however, uppercase and lowercase must be distinguished. For this purpose, address counter 8 is reset and, under control signal 904S, multiplexer 10 sets the data from buffer 6 in register 20. In memory 210, "1" is stored at the addresses corresponding to the uppercase letters C, O, S, U, V, W, X, and Z, which form category set S_c (that is, the set of characters whose uppercase and lowercase forms have the same shape). When the contents of register 20 belong to S_c, search circuit 21 sends the signal "1" to signal line 22.

When the signal "1" appears on signal line 22, size determination circuit 24 receives the category C_i, uppermost end T_i, and lowermost end B_i from register 20 and the heights α and β from average value calculation circuits 16 and 17. Using T_i, B_i, α, β, and the constants 0.8, 0.9, 1.2, and 1.3, it determines whether category C_i is uppercase or lowercase. Concretely:

|T_i − B_i| ≤ 0.8α or |T_i − B_i| ≤ 1.2β: change to lowercase
|T_i − B_i| ≥ 0.9α or |T_i − B_i| ≥ 1.3β: keep as uppercase

After this determination, the confirmed or changed category name C_i, the uppermost position T_i, and the lowermost position B_i are sent to control unit 9 on signal line 26. Control unit 9 sends the result onto signal line 906, passes it through multiplexer 5 under control signal 907S, and writes it to the address of buffer 6 indicated by address counter 8, in accordance with the write signal on signal line 905. This operation continues until the entire contents of buffer 6 have passed through register 20. As a result, categories such as C, O, S, U, V, W, X, and Z, whose uppercase and lowercase forms have the same shape, are written back to buffer 6 with their case distinguished.
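
The size determination can be sketched in the same style. The comparison directions (≤ for lowercase, ≥ for uppercase) are taken from the worked example in the text. Two points are assumptions of this sketch: the lowercase test is evaluated first, and a height that satisfies neither test leaves the category unchanged. The function name is hypothetical.

```python
from typing import Set, Tuple

# Characters whose uppercase and lowercase forms share one shape, as listed in the text.
S_C: Set[str] = set("COSUVWXZ")


def resolve_case(category: str, t_i: int, b_i: int,
                 alpha: float, beta: float) -> Tuple[str, bool]:
    """Apply the size rules; return (category name, decided flag)."""
    if category not in S_C:
        return category, False  # rules do not apply
    h = abs(t_i - b_i)  # character height in mesh cells
    if h <= 0.8 * alpha or h <= 1.2 * beta:  # clearly shorter than a tall character
        return category.lower(), True
    if h >= 0.9 * alpha or h >= 1.3 * beta:  # about as tall as a tall character
        return category, True
    return category, False  # between the thresholds: left as is (assumption)


# Values from the worked example, with alpha = 40 and beta = 20.
print(resolve_case("S", 20, 60, 40, 20))  # ('S', True): height 40 >= 36
print(resolve_case("U", 40, 60, 40, 20))  # ('u', True): height 20 <= 32
```

Note that with α = 40 and β = 20 a height such as 30 would satisfy both a lowercase condition (30 ≤ 32) and an uppercase condition (30 ≥ 26), so the order of the two tests matters; the source does not say which takes priority.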

In buffer 6 of FIG. 2b, it was still unknown whether S, U, S, O, O, X, U, and C were uppercase or lowercase. These characters draw a "1" from search circuit 21, as shown in FIG. 2k. For the first category S, T_i = 20 and B_i = 60, so in size determination circuit 24

|20 − 60| = 40 ≥ 0.9α = 0.9 × 40 = 36

and the character is output unchanged as uppercase S. All the others have T_i = 40 and B_i = 60, so in size determination circuit 24

|40 − 60| = 20 ≤ 0.8α = 0.8 × 40 = 32

and they are all changed to lowercase and output.

These results are shown in FIG. 2i. Finally, the contents of buffer 6 are changed as shown in FIG. 2j, and recognition is complete.

By adding such means, uppercase and lowercase letters that have the same shape and differ only in size can be distinguished.

The character recognition system described above is not limited to alphanumeric printed type. It can also be applied to other character types, for example to distinguish the katakana voiced sound mark ″ from ハ, and the semi-voiced sound mark ゜ from ロ.

In the recognition of handwritten characters as well, taking similar measures makes it possible to distinguish C from c, W from w, and so on, as well as uppercase P from lowercase p and lowercase q from the digit 9.

[Brief Description of the Drawings]

FIG. 1 is a block diagram showing one embodiment of the character recognition system of the present invention. FIG. 2 is a diagram for explaining the operation of the present invention. In the figures, 91 is a quantization separation device; 92 is an uppermost and lowermost end detection circuit; 93 is a category determination circuit; 94 is a standard pattern memory; 5, 7, and 10 are multiplexers; 6 is a buffer; 8, 50, and 60 are address counters; 9 is a control unit; 12, 13, 20, 30, and 40 are registers; 14, 15, 21, 31, and 41 are search circuits; 141, 151, 210, 310, and 410 are memories; 16 and 17 are average value calculation circuits; 24 is a size determination circuit; and 34 is a protrusion determination circuit.

Claims

1. A character recognition system for recognizing alphanumeric characters, characterized by comprising: quantization separation means for quantizing the characters on a form into binary quantization patterns and separating them into a quantization pattern P_i for each character; upper and lower end position detection means for detecting the upper end position T_i and the lower end position B_i of the quantization pattern P_i; category determination means for determining a category name C_i by matching the quantization pattern P_i against standard patterns prepared in advance; a buffer that stores the category name C_i, the upper end position T_i, and the lower end position B_i as a set, in character order; average height detection means for obtaining the average height α when the category name C_i belongs to a category set S_A, which is a set of alphanumeric characters with a large vertical width, and the average height β when the category name C_i belongs to a category set S_a, which is a set of lowercase letters with a small vertical width; and protrusion determination means that, when a category name C_i in the buffer belongs to a category set S_P, which is a set of alphanumeric characters that have the same shape and size as other characters and differ from them only in position, confirms or changes the category name C_i using the uppermost position T_o and lowermost position B_o of the category C_o that is detected closest to the category C_i and belongs to a category set S_B, which is a set of alphabetic characters with no protrusion at the bottom, together with the category name C_i, the uppermost position T_i, the lowermost position B_i, and the height averages α and β.

2. A character recognition system characterized by comprising: quantization separation means for quantizing the characters on a form into binary quantization patterns and separating them into a quantization pattern P_i for each character; upper and lower end position detection means for detecting the upper end position T_i and the lower end position B_i of the quantization pattern P_i; category determination means for determining a category name C_i by matching the quantization pattern P_i against standard patterns prepared in advance; a buffer that stores the category name C_i, the upper end position T_i, and the lower end position B_i as a set, in character order; average height detection means for obtaining the average height α when the category name C_i belongs to a category set S_A, which is a set of alphanumeric characters with a large vertical width, and the average height β when the category name C_i belongs to a category set S_a, which is a set of lowercase letters with a small vertical width; size determination means for confirming or changing the category name C_i by comparing the upper end position T_i and the lower end position B_i with the height averages α and β; and protrusion determination means for confirming or changing the category name C_i using the uppermost position T_o and lowermost position B_o of the category C_o that is detected closest to the category C_i and belongs to a category set S_B, which is a set of alphabetic characters, together with the category name C_i, the uppermost position T_i, the lowermost position B_i, and the height averages α and β.