JPH03225489A

JPH03225489A - Contact character segmenting system

Info

Publication number: JPH03225489A
Application number: JP2019062A
Authority: JP
Inventors: Tadaaki Kitamura (北村忠明); Masao Takato (高藤政雄); Norio Tanaka (田中紀夫)
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1990-01-31
Filing date: 1990-01-31
Publication date: 1991-10-04

Abstract

(57) [Abstract] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

[Detailed Description of the Invention] [Field of Industrial Application] The present invention relates to a character reading device, and in particular to a method of segmenting characters one at a time from a character-string pattern.

[Conventional technology]

An example of a character recognition device for recognizing printed or stamped characters is shown in FIG. 6. The pattern to be recognized is imaged with a television camera 1 or the like; the resulting analog signal is quantized to about 7 to 8 bits by an A/D converter 2, binarized by a binarization circuit 3, and stored in a frame memory 4. A character segmentation circuit 5 then cuts the characters one at a time out of the pattern stored in the frame memory 4, and a recognition circuit 7 compares each of them against dictionary patterns stored in advance in, for example, a dictionary pattern memory 6, and outputs the most similar pattern as the recognition result.

In such a recognition device, when the print quality of the characters is poor, adjacent characters are likely to touch each other as a result of the binarization process. To separate touching characters, the character segmentation circuit 5 therefore computed, as shown in FIG. 7, a projection pattern (b) taken in the direction perpendicular to the line of the character string (a), and cut the string at positions where the projection value was small before performing character recognition. However, this method frequently produced erroneous segmentations, which drastically reduced the recognition rate.
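The conventional projection method can be sketched as follows. This is a minimal illustration, not the circuit of FIG. 7: it assumes a binary image stored as a list of rows `img[y][x]` with 1 for character pixels, and the function names, the toy image, and the cut threshold are hypothetical.

```python
def vertical_projection(img):
    """f(x): number of character pixels (value 1) in each column x of img[y][x]."""
    width = len(img[0])
    return [sum(row[x] for row in img) for x in range(width)]


def cut_by_projection(img, threshold):
    """Return (start, end) column spans where the projection exceeds threshold.

    Columns whose projection is <= threshold are treated as gaps between
    characters; `end` is exclusive.
    """
    f = vertical_projection(img)
    blocks, start = [], None
    for x, value in enumerate(f):
        if value > threshold and start is None:
            start = x                      # entering a character block
        elif value <= threshold and start is not None:
            blocks.append((start, x))      # leaving a character block
            start = None
    if start is not None:                  # block touching the right edge
        blocks.append((start, len(f)))
    return blocks


# Two 3-pixel-wide "characters" joined by a 1-pixel-high bridge (columns 3-4).
img = [
    [1, 1, 1, 0, 0, 1, 1, 1],
    [1, 0, 1, 1, 1, 1, 0, 1],
    [1, 1, 1, 0, 0, 1, 1, 1],
]
print(vertical_projection(img))   # [3, 2, 3, 1, 1, 3, 2, 3]
print(cut_by_projection(img, 1))  # [(0, 3), (5, 8)]
```

The thin bridge is cut correctly here only because it is thinner than every character column; a hyphen or a horizontal stroke of the same thickness would look identical in f(x).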

Japanese Patent Application Laid-Open Nos. 61-72373 and 63-216188 improve on this simple processing to prevent erroneous segmentation, but because both still rely on a projection pattern, they sometimes choose the wrong cutting position for touching patterns such as those shown in FIG. 8.

That is, because the projection pattern simply counts pixels in the vertical direction, in the case of FIG. 8(a) the vertical pixel count is identical throughout portion A, making it hard to decide where within portion A to cut; and in the case of FIG. 8(b), the projection of the '-' (hyphen) (portion A) cannot be distinguished from the projection of the upper part of the '7' (portion B), so the cutting position could be chosen wrongly. Furthermore, no consideration was given to handling changes in character size or character spacing (the character spacing was assumed to be uniform).

[Problem to be solved by the invention]

Because the prior art described above determines the touching position from a projection pattern that considers only the number of pixels in the vertical direction, it cannot distinguish a '-' (hyphen) from the horizontal strokes of adjacent characters.

In addition, no consideration was given to cases where the character size changes each time an image is captured, so erroneous segmentation occurred.

An object of the present invention is to provide a touching-character segmentation method that can correctly read character strings containing touching characters.

[Means to solve the problem]

To perform character segmentation that takes the '-' (hyphen) into account, the present invention is characterized by using, instead of a projection pattern, a weight distribution taken around the center y-coordinate y_c of the character string (with the horizontal direction as the x-coordinate and the vertical direction as the y-coordinate), in which pixels farther from y_c receive larger values. It is further characterized in that, so that characters can be segmented even when the character size changes or the character spacing varies somewhat, the character size (width) is determined from the captured pattern to be recognized and segmentation is performed using this width as the reference.

[Effect]

A '-' (hyphen) lies close to the center y-coordinate y_c of the characters. When the weight distribution around y_c is computed as described above, pixel data near y_c therefore contribute small values, while the horizontal stroke at the top of a '7', being far from y_c, contributes large values. As a result, patterns that cannot be separated with a simple projection distribution are not cut incorrectly.

For character blocks that cannot be separated even with this weight distribution, a standard character width is determined and the block is judged to correspond to some multiple of that width. If, for example, the block is three characters wide, the positions dividing it into three equal parts are found, and the block is cut near each of these at the position where the weight distribution is smallest, which gives good segmentation.

[Embodiment]

An embodiment of the present invention is described below with reference to the drawings.

FIG. 2 shows the structure of the frame memory 4. Taking the horizontal direction as the x-coordinate and the vertical direction as the y-coordinate, the image data are typically stored as a dot pattern g(x, y) in which x is divided into 256 positions and y into 256 positions.

Because a binary image is stored, the value of one dot, that is, one pixel, is either '0' or '1'.
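A minimal sketch of how the binarized frame memory might be represented in software, assuming a Python list of rows so that g(x, y) is read as `memory[y][x]`. The fixed threshold of 128 and the dark-characters-on-light-background polarity are illustrative assumptions, not values from the source, and the function names are hypothetical.

```python
WIDTH, HEIGHT = 256, 256  # x and y are each divided into 256 positions


def binarize(gray, threshold=128):
    """Map a quantized grayscale image gray[y][x] (0-255) to a 0/1 dot pattern.

    Assumes dark characters on a light background: a pixel darker than
    `threshold` becomes a character pixel (1), everything else becomes 0.
    """
    return [[1 if v < threshold else 0 for v in row] for row in gray]


def g(memory, x, y):
    """Pixel value g(x, y): x is horizontal, y vertical, origin at the upper left."""
    return memory[y][x]


gray = [[255] * WIDTH for _ in range(HEIGHT)]
gray[10][20] = 30                             # one dark dot at x = 20, y = 10
memory = binarize(gray)
print(g(memory, 20, 10), g(memory, 10, 20))   # 1 0
```

Storing rows first (`memory[y][x]`) is a common convention for raster images; the accessor keeps the document's g(x, y) argument order so the later formulas can be read directly.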

The projection pattern referred to in the description of the prior art accumulates the number of character pixels in the vertical direction. As shown in FIG. 3, let g(x, y) be the value of the pixel at (x, y), with g(x, y) = 1 for a character pixel and g(x, y) = 0 otherwise. The projection pattern is then

$$f(x) = \sum_{y} g(x, y)$$

evaluated for every x-coordinate. FIG. 3(a) shows a character pattern and FIG. 3(b) its projection pattern.

FIG. 4(b) shows the projection pattern f(x) of a pattern (a) in which a '-' (hyphen) lies between the characters '8' and '7'. If characters are cut out where the value exceeds some threshold th1, part of the '7' is lost. Therefore, instead of this projection pattern, a weight distribution M(x) is computed around the center y_c of the height of the character string shown in FIG. 3(a). This weight distribution M(x) is obtained by the following equations.

$$M(x) = \sum_{y} |y - y_c| \cdot g(x, y) \qquad \text{(when character pixels are '1')}$$

$$M(x) = \sum_{y} |y - y_c| \cdot \bigl(1 - g(x, y)\bigr) \qquad \text{(when character pixels are '0')}$$

Here y is the y-coordinate of each pixel, measured with the upper-left corner of the frame memory 4 as the origin. The center height y_c of the character string is obtained, for example, by computing a projection pattern in the horizontal direction and taking the midpoint of the region whose frequency exceeds a threshold.
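The two formulas can be sketched in Python as follows. The sketch assumes a binary image `img[y][x]` with the origin at the upper left, and it simplifies the y_c rule to the midpoint between the first and last rows whose horizontal projection exceeds a threshold; the function names and that simplification are assumptions for illustration.

```python
def center_height(img, threshold=0):
    """Estimate y_c as the midpoint of the rows whose pixel count exceeds threshold.

    Simplifying assumption: the text line is taken to span from the first to
    the last such row (the source only says "the midpoint of the region").
    """
    rows = [y for y, row in enumerate(img) if sum(row) > threshold]
    return (rows[0] + rows[-1]) / 2


def weight_distribution(img, yc, character_value=1):
    """M(x): sum of |y - yc| over the character pixels in column x.

    character_value=1 gives sum |y - yc| * g(x, y);
    character_value=0 gives sum |y - yc| * (1 - g(x, y)).
    """
    width = len(img[0])
    return [
        sum(abs(y - yc) for y, row in enumerate(img) if row[x] == character_value)
        for x in range(width)
    ]


img = [
    [1, 1, 0],  # y = 0
    [0, 0, 0],  # y = 1
    [0, 0, 1],  # y = 2
    [0, 0, 0],  # y = 3
    [1, 0, 0],  # y = 4
]
yc = center_height(img)
print(yc)                            # 2.0
print(weight_distribution(img, yc))  # [4.0, 2.0, 0.0]
```

The pixel at y = 2 sits exactly on y_c and contributes nothing, while pixels two rows away contribute 2 each, which is the property the method relies on.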

Because M(x) grows larger the farther the pixels are from the center height y_c of the characters, the difference between the '-' (hyphen) and the horizontal stroke at the top of the '7' appears clearly, as shown in FIG. 3(c). Therefore, segmenting with the threshold th2 extracts character 1 ('8') and character 2 ('7') cleanly.
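A small worked example shows why the weight distribution separates what the projection cannot. It uses a hypothetical 7 × 7 pattern of a hyphen touching a simplified '7' (the '8' is omitted); the thresholds (th1 = 1 for the projection, th2 = 2 for M(x)) and all names are illustrative assumptions.

```python
def parse(rows):
    """Convert strings such as '..##' into a 0/1 image img[y][x] ('#' = character pixel)."""
    return [[1 if ch == "#" else 0 for ch in row] for row in rows]


def projection(img):
    """f(x) = sum over y of g(x, y)."""
    return [sum(row[x] for row in img) for x in range(len(img[0]))]


def weight_distribution(img, yc):
    """M(x) = sum over y of |y - yc| * g(x, y)."""
    return [sum(abs(y - yc) * row[x] for y, row in enumerate(img))
            for x in range(len(img[0]))]


def blocks_above(values, threshold):
    """(start, end) spans, end exclusive, where values[x] > threshold."""
    spans, start = [], None
    for x, v in enumerate(values + [threshold]):  # sentinel closes a trailing span
        if v > threshold and start is None:
            start = x
        elif v <= threshold and start is not None:
            spans.append((start, x))
            start = None
    return spans


img = parse([
    "..#####",  # y = 0: top bar of the '7'
    "......#",
    ".....#.",
    "##...#.",  # y = 3: the hyphen, lying on the centre line y_c = 3
    "....#..",
    "....#..",
    "....#..",
])
print(projection(img))                                # [1, 1, 1, 1, 4, 3, 2]
print(weight_distribution(img, 3))                    # [0, 0, 3, 3, 9, 4, 5]
print(blocks_above(projection(img), 1))               # [(4, 7)]: top bar of '7' lost
print(blocks_above(weight_distribution(img, 3), 2))   # [(2, 7)]: hyphen dropped, '7' kept
```

The hyphen columns (x = 0, 1) and the top-bar columns (x = 2, 3) all have a projection of 1, so no projection threshold can keep one and drop the other; their weights, 0 and 3, differ clearly.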

However, when the characters of the string in FIG. 5(a) are separated by the above method, portions that touch near the center height y_c of the string can be divided using the threshold th, but character block 1 ('5' and '2') and character block 5 ('5' and '7') cannot be separated. The overall processing is therefore carried out as shown in FIG. 1.

First, the weight distribution extraction circuit 8 computes the weight distribution M(x) around y_c described above, and the segmentation circuit 9 finds, in this distribution, the coordinates where the value rises above the threshold th (up coordinates) and the coordinates where it falls below it (dw coordinates). Next, the standard character width extraction circuit 10 computes the width of each segmented character block from its dw and up coordinates and averages the widths of only those blocks whose width satisfies a predetermined condition. In the case of FIG. 5, characters 2, 3 and 4 are such blocks, and the average of their widths is taken. If few character blocks satisfy the predetermined condition (for example, two blocks or fewer), the average would give an inaccurate standard character width, so the width is instead estimated from the average height y_m of the entire character string. Letting x_s be the standard character width,

$$x_s = d \times y_m$$

where d is set in advance.
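The up/dw extraction and the standard-width rule might look like the sketch below. The text specifies only "a predetermined width" and a preset factor d, so the acceptance range `[min_w, max_w]`, the values of d and y_m, and the function names are hypothetical. A dw coordinate is treated as exclusive so that a block's width is dw − up.

```python
def up_dw_coordinates(M, th):
    """Return (up, dw) pairs: up = first x where M rises above th, dw = first x where it falls back.

    dw is exclusive, so the block width is dw - up.
    """
    pairs, up = [], None
    for x, v in enumerate(list(M) + [th]):  # sentinel closes a block at the right edge
        if v > th and up is None:
            up = x
        elif v <= th and up is not None:
            pairs.append((up, x))
            up = None
    return pairs


def standard_width(blocks, min_w, max_w, mean_height, d, min_blocks=3):
    """Standard character width x_s.

    Averages the widths of blocks with min_w <= width <= max_w. If fewer than
    `min_blocks` blocks qualify (the text's example: two or fewer), falls back
    to x_s = d * y_m, with y_m the mean height of the character string.
    """
    widths = [dw - up for up, dw in blocks if min_w <= dw - up <= max_w]
    if len(widths) < min_blocks:
        return d * mean_height
    return sum(widths) / len(widths)


# Five blocks as in FIG. 5: a wide one, three normal ones, another wide one.
M = [0] * 40
for up, dw in [(1, 9), (11, 15), (17, 21), (23, 28), (30, 39)]:
    for x in range(up, dw):
        M[x] = 5
blocks = up_dw_coordinates(M, th=1)
print(blocks)  # [(1, 9), (11, 15), (17, 21), (23, 28), (30, 39)]
print(standard_width(blocks, min_w=3, max_w=6, mean_height=8, d=0.6))  # about 4.33
```

Only the widths 4, 4 and 5 fall inside the assumed range, so the two merged blocks (widths 8 and 9) do not inflate the estimate.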

Character blocks that cannot be divided by applying the threshold th to the weight distribution M(x) (characters 1 and 5 in FIG. 5) are then cut by the character division circuit 11. Specifically, the number of standard character widths to which the block width x_b corresponds is computed, using the standard character width x_s, as

$$\beta = \frac{x_b}{x_s}$$

If, for example, the block is two characters wide, the center x-coordinate of the block is found, the magnitude of M(x) is examined within a range of ±γ around this coordinate (γ sets the allowed error in character width), and the block is divided at the x-coordinate with the smallest M(x). For a block three characters wide, the block is divided into three equal parts and the same procedure is applied, which separates the characters well.
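A sketch of the division step follows. It assumes β is rounded to the nearest whole number of characters n (the text does not say how a non-integer β is handled) and that M(x) is available as a list indexed by x; `split_block` and the example values are hypothetical.

```python
def split_block(M, up, dw, x_s, gamma):
    """Cut positions for a character block [up, dw) that spans several characters.

    beta = width / x_s is rounded to the nearest whole number of characters n.
    For each ideal cut at up + k * width / n (k = 1 .. n-1), the x within
    +/- gamma (clipped to the block) with the smallest M(x) is chosen.
    """
    width = dw - up
    n = max(1, round(width / x_s))
    cuts = []
    for k in range(1, n):
        ideal = round(up + k * width / n)
        lo, hi = max(up, ideal - gamma), min(dw - 1, ideal + gamma)
        cuts.append(min(range(lo, hi + 1), key=lambda x: M[x]))
    return cuts


# Two-character block [1, 11) with x_s = 5: the ideal cut is x = 6,
# but the valley of M lies one pixel to the right.
M = [0, 9, 8, 7, 8, 9, 3, 2, 6, 9, 8, 0]
print(split_block(M, 1, 11, x_s=5, gamma=2))  # [7]
```

Searching the ±γ window rather than cutting exactly at the equal-division points is what lets the method tolerate the spacing variation described next.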

Because the cut position is searched over a range of ±γ, this approach segments well even when the character spacing varies somewhat (even characters printed at equal intervals show varying spacing owing to quantization error and the thickening or thinning of strokes).

Note, however, that because the weight distribution M(x) is computed around the center height y_c of the characters, cutting positions arise at the centers of letters such as 'H' and 'N'.

All of the digits 0 to 9 have a stroke along their top or bottom edge, so they can be handled with M(x) as described above, and digits can be separated reliably; alphabetic characters, however, have the drawback that this problem can occur. The same problem with alphabetic characters also arises in the prior-art methods that use projection patterns.

[Effect of the Invention]

As is clear from the above description, the cutting positions of touching characters are determined using the weight distribution around the center height of the character string, so the characters can be divided well even when a '-' (hyphen) touches a character.

Furthermore, because the division processing is adapted by determining the character width, changes in character size are handled well, as are moderate variations in character spacing.

[Brief explanation of drawings]

FIG. 1 is a block diagram of the touching-character segmentation circuit of the present invention; FIG. 2 illustrates the frame memory; FIG. 3 is a conceptual diagram of the projection pattern; FIG. 4 illustrates the difference between the projection pattern and the weight distribution of the present invention; FIG. 5 shows the weight distribution extracted for a character string and the resulting separated characters; FIG. 6 is a block diagram of a character recognition device; FIG. 7 illustrates the conventional character separation method based on projection patterns; and FIG. 8 illustrates problems with the conventional method.

4... frame memory, 8... weight distribution extraction circuit, 9... segmentation circuit, 10... standard character width extraction circuit, 11... character division circuit.

Claims

[Scope of Claims]

1. A touching-character segmentation method for a recognition device that recognizes characters from a character-string pattern, comprising: a weight distribution extraction circuit that determines the center height y_c of the character string and computes a weight distribution in the vertical direction with y_c as the axis; a segmentation circuit that finds the positions where the distribution value rises above a predetermined threshold (up coordinates) and where it falls below the threshold (dw coordinates), and cuts out each span between an up coordinate and a dw coordinate as a character block; a standard character width extraction circuit that extracts a standard character width from the character string; and a character division circuit that uses the standard character width to determine how many characters wide each character block is and, when a character block is two or more characters wide, computes the coordinates that divide the block equally by that number of characters and finds, within a small range to the left and right of each such coordinate, the position where the weight distribution takes its minimum value.

2. The touching-character segmentation method of claim 1, wherein the weight distribution extraction circuit computes the weight distribution M(x) around the center y-coordinate y_c of the character height as the sum, over the pixels g(x, y) of the character strokes, of the absolute values of the differences between their y-coordinates and y_c.

3. The touching-character segmentation method of claim 1, comprising a standard character width extraction circuit that takes as the standard character width the average width of those character blocks obtained by the segmentation circuit whose width satisfies a predetermined value, or that estimates the standard character width from the height of the characters in the character string.