JPS6024677A - Character separating device - Google Patents

Character separating device

Info

Publication number
JPS6024677A
Authority
JP
Japan
Prior art keywords
character
lump
width
pitch
separation position
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP58133324A
Other languages
Japanese (ja)
Inventor
Yoshitake Tsuji
辻 善丈
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
NEC Corp
Original Assignee
NEC Corp
Nippon Electric Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by NEC Corp, Nippon Electric Co Ltd
Priority to JP58133324A
Publication of JPS6024677A
Legal status: Pending

Landscapes

  • Character Input (AREA)

Abstract

PURPOSE: To separate touching characters into individual characters by predicting, from the character pitch, the separation positions of a character group image in which characters touch, and correcting each predicted position by referring to blank (space) information in an adjacent character-string image. CONSTITUTION: A scanner 1 optically scans character-string images, converts them into an electric signal, quantizes it, and writes it to a character-string image memory 2. A character lump extracting device 3 successively extracts from the memory 2, for each character string, the position, width, and height of each character image that can be separated by spaces (hereinafter called a character lump), and stores them in a character lump information register 4. A character pitch detecting device 5 measures the character pitch using the information in the register 4. A contact character lump detecting device 6 successively reads out the widths of the character lumps held in the register 4 and compares them with the character pitch; when it detects that a character lump contains a plurality of characters, it transfers the position, width, and height of that lump to a separation position calculating device 7. Starting from the beginning of the character lump, the device 7 calculates the next candidate position for division and, from that candidate and a prescribed allowable width, calculates the upper and lower limits of a separable section, within which the candidate position is corrected.

Description

DETAILED DESCRIPTION OF THE INVENTION
The present invention relates to a character separation device that separates a character string image written on paper into individual characters, and in particular to a character separation device that separates touching characters.

In a device that optically reads various kinds of printed characters (hereinafter referred to as an OCR), recognizing a series of characters requires separating the characters one by one and sending each to the character recognition unit. One piece of information needed for this separation is the character pitch, which can be supplied in advance if the size of the printed matter is restricted. Recently, however, OCR has been applied to a wide range of material with unspecified character pitch, such as mail and documents, so the character pitch must be calculated from the page itself.
One such character pitch detection method is the character pitch detection device disclosed in Japanese Patent Application No. 58-33068, "Character Pitch Detection Device", filed by the same applicant. When touching characters are separated using this character pitch, however, simply forcing the cuts at pitch intervals may place the separation positions incorrectly, depending on factors such as the accuracy of the pitch estimate, and the character recognition unit may then misread the characters or fail to read them.

An object of the present invention is therefore to solve the above problem by providing a character separation device that first predicts, from the character pitch, the separation positions of a character group image in which characters touch, and then corrects each predicted position by referring to blank (space) information of another, adjacent character string image in the vicinity of that position, so that character separation accuracy is readily improved even when characters touch.

According to the present invention, there is provided a character separation device that scans a plurality of character string images written on a paper surface and separates them into individual characters, the device comprising: means for sequentially extracting, for each character string, the position, width, and height of each character image that can be separated by spaces (hereinafter called a character block) from the plurality of character string images; means for detecting the character pitch using a plurality of the character blocks; means for detecting a character block that contains a plurality of characters; and means for, starting from the starting end of that character block and proceeding in order, predicting the next separation position from the previous separation position and the character pitch, setting a separable section with a fixed allowable width to the left and right of the predicted separation position, searching within the separable section on the character string image adjacent above or below, outward from the predicted separation position, and correcting the predicted separation position to the position at which a space is first detected.

The present invention is described below with reference to a specific embodiment.

FIG. 1 shows an example of a character pattern used to explain the present invention in detail. In the figure, the character block 'Mrs' enclosed by a dotted line is part of a character string image and has width Δv1. The character blocks 'U' and 'OS', also enclosed by dotted lines, are part of the character string image located below the character block 'Mrs'. P is the character pitch, which may be given in advance or estimated from the character blocks on the page. The separation positions for dividing the character block 'Mrs' into individual characters are obtained as follows. When the relationship between the width Δv1 of the character block 'Mrs' and the character pitch P indicates that the block contains a plurality of characters, the first separation position X'(1) is obtained as the sum x_s + P of the starting point x_s of the block and the character pitch P.
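The detection and first prediction described above can be sketched in Python as follows. This is a minimal sketch rather than the patent's circuit: the function names and the `ratio` threshold of 1.5 are assumptions made for illustration, since the text only says that the decision follows from the relationship between the block width and the pitch P.

```python
def contains_multiple_characters(block_width: float, pitch: float,
                                 ratio: float = 1.5) -> bool:
    """Guess whether a character block holds more than one character.

    `ratio` is a hypothetical threshold; the source does not specify
    how the block width and the character pitch are compared.
    """
    if pitch <= 0:
        raise ValueError("pitch must be positive")
    return block_width >= ratio * pitch


def first_predicted_cut(x_start: float, pitch: float) -> float:
    """First predicted separation position X'(1) = x_s + P."""
    return x_start + pitch
```

For a block such as 'Mrs' whose width is close to three pitches, the first function would report a multi-character block and the second would place the first candidate cut one pitch after the block's starting point.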

With the separation position X'(1) as reference, a separable section (X'(1) - Δτ, X'(1) + Δτ) with a fixed allowable width Δτ is calculated. Next, within this separable section on the character string image located above or below the character string image to which the character block 'Mrs' belongs, a search is made to the left and right starting from X'(1); if a space is detected, X'(1) is corrected to the position at which a space is first detected. In the figure, a space SP exists between the character block 'U' and the character block 'OS', so the separation position X'(1) is corrected to the separation position X(1).
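The left-and-right search within the separable section can be sketched as below. The sketch assumes the adjacent line has already been reduced to a per-column count of black pixels (`adjacent_projection`, a hypothetical input), treats the section as including its end points, and checks the left side before the right at equal distance; none of these details are fixed by the text.

```python
from typing import Sequence


def correct_cut(predicted: int, tolerance: int,
                adjacent_projection: Sequence[int]) -> int:
    """Return the blank column of the adjacent line nearest to `predicted`.

    Only columns within `tolerance` of the predicted position are examined.
    If none of them is blank (projection value 0), the predicted position
    is returned unchanged.
    """
    for offset in range(tolerance + 1):
        # Widen the search one column at a time, left candidate first.
        for x in (predicted - offset, predicted + offset):
            if 0 <= x < len(adjacent_projection) and adjacent_projection[x] == 0:
                return x
    return predicted
```

Searching outward from the predicted position means the first hit is also the closest space, which matches the "first detected" wording of the description.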

The next separation position X'(2) is calculated by adding the character pitch P to the separation position X(1), and X'(2) is then corrected in the same way.

In the figure, the characters in the character block 'OS' touch each other, so no space is detected and the separation position X'(2) is left where it is.
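The predict-then-correct iteration over one touching block can be sketched as a loop. This is an illustrative sketch under stated assumptions: `correct` is a hypothetical callback standing in for the space search on the adjacent line (it defaults to "no correction"), and the stopping rule, which ends the loop once a predicted cut falls within the tolerance of the block's end, is a choice made here because the text does not spell one out.

```python
from typing import Callable, List


def split_block(x_start: int, width: int, pitch: int, tolerance: int,
                correct: Callable[[int], int] = lambda predicted: predicted
                ) -> List[int]:
    """Return the corrected separation positions X(1), X(2), ... of a block."""
    if pitch <= 0 or not 0 <= tolerance < pitch:
        raise ValueError("need pitch > 0 and 0 <= tolerance < pitch")
    x_end = x_start + width
    cuts: List[int] = []
    previous = x_start                      # X(0) is the start of the block
    while True:
        predicted = previous + pitch        # X'(k+1) = X(k) + P
        if predicted >= x_end - tolerance:  # assumed end-of-block test
            break
        corrected = correct(predicted)      # X(k+1)
        if abs(corrected - predicted) > tolerance:
            raise ValueError("correction left the separable section")
        cuts.append(corrected)
        previous = corrected
    return cuts
```

Because each new prediction starts from the corrected position rather than the raw one, an error in the pitch estimate does not accumulate across the block whenever the adjacent line offers a usable space.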

In this way, the separation positions of a character block containing touching characters are determined for any character string.
FIG. 2 is a logic block diagram showing a specific embodiment of the present invention. A scanning device 1 optically scans a plurality of character string images written on a paper surface, converts them into an electrical signal, binarizes the signal, and writes the result into a character string image memory 2.

Reference numeral 3 denotes a character block extraction device. From the character string images stored in the character string image memory 2, it sequentially extracts, for each character string, the position, width, and height of each character block that can be separated by spaces (the width and height are hereinafter called the size), and stores them in a character block information register 4. The character block extraction device 3 can be realized using known techniques. Reference numeral 5 denotes a character pitch detection device, which measures the character pitch P using the positions and sizes of the plurality of character blocks stored in the character block information register 4. The character pitch detection device 5 can use the known technique mentioned above.
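The description leaves block extraction to known techniques. One common approach, shown here purely as an assumption and not as the method used in the patent, is to treat every run of columns that contain ink as one block and then measure its vertical extent. The image is taken to be a list of rows of 0/1 pixels covering a single text line.

```python
from typing import List, Sequence, Tuple


def extract_blocks(image: Sequence[Sequence[int]]) -> List[Tuple[int, int, int, int]]:
    """Return (x, y, width, height) for each run of inked columns."""
    if not image:
        return []
    n_cols = len(image[0])
    blocks: List[Tuple[int, int, int, int]] = []
    start = None
    # Iterate one column past the end so a block touching the edge is closed.
    for x in range(n_cols + 1):
        inked = x < n_cols and any(row[x] for row in image)
        if inked and start is None:
            start = x
        elif not inked and start is not None:
            # Rows that contain ink inside the column run [start, x).
            rows = [y for y, row in enumerate(image) if any(row[start:x])]
            blocks.append((start, rows[0], x - start, rows[-1] - rows[0] + 1))
            start = None
    return blocks
```

Touching characters share at least one inked column, so they come out of this step as a single wide block, which is exactly the case the rest of the device is designed to handle.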

Reference numeral 6 denotes a touching character block detection device. It sequentially reads out the widths of the character blocks held in the character block information register 4 and compares each with the character pitch P calculated by the character pitch detection device 5 to detect whether the block contains a plurality of characters; when such a block is detected, its position and size are transferred to a separation position calculation device 7. Starting from the starting end of the input character block and proceeding in order, the separation position calculation device 7 uses the k-th separation position X(k) (k = 0, 1, 2, ...) and the character pitch P to calculate the (k+1)-th separation position X'(k+1), which is the next candidate position for division, and from this position and the fixed allowable width Δτ calculates the lower and upper limits X'(k+1) - Δτ and X'(k+1) + Δτ that define the separable section. Here the separation position X(0) denotes the starting end of the input character block. The fixed allowable width Δτ may be set according to the value of the character pitch P.
Next, the separation position calculation device 7 transfers to a separation position correction device 8 the separable section together with the position and height of the character string image adjacent above or below the character string image that contains the input character block. The position and height of a character string image are assumed to have been calculated by the character block extraction device 3 from the positions and heights of the character blocks belonging to it. The separation position correction device 8 consists of a projection distribution extraction device 81, a projection information register 82, and a space detection device 83. The projection distribution extraction device 81 scans the character string image memory 2 to obtain the vertical projection distribution within the region defined by the separable section transferred from the separation position calculation device 7 and by the position and height of the adjacent character string image, and stores it in the projection information register 82.
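The projection step can be sketched as a per-column count of black pixels over the rows of the adjacent line, restricted to the separable section. The representation below (a list of rows of 0/1 pixels, and a dictionary keyed by column as the "register") is an assumption for illustration; the region is clipped to the image so that a section reaching past the page edge does not fail.

```python
from typing import Dict, Sequence


def column_projection(image: Sequence[Sequence[int]], x_lo: int, x_hi: int,
                      y_top: int, height: int) -> Dict[int, int]:
    """Map each column x in [x_lo, x_hi] to its black-pixel count.

    Rows y_top .. y_top + height - 1 are summed. Columns and rows that
    fall outside the image are skipped.
    """
    n_rows = len(image)
    n_cols = len(image[0]) if n_rows else 0
    rows = range(max(y_top, 0), min(y_top + height, n_rows))
    cols = range(max(x_lo, 0), min(x_hi, n_cols - 1) + 1)
    return {x: sum(image[y][x] for y in rows) for x in cols}
```

A column whose count is 0 contains no ink anywhere in the adjacent line, so it is a candidate space for the correction step.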

The space detection device 83 refers to the projection information register 82 to check whether a space exists, that is, a position where the projection value is 0. If one exists, it transfers to the separation position calculation device 7 the corrected position (which becomes the separation position X(k+1)) at which a space is first detected when searching from the separation position X'(k+1).

If, on the other hand, the space detection device 83 finds no space position, the separation position X'(k+1) is taken as the separation position X(k+1) and transferred to the separation position calculation device 7. When the separation position X(k+1) is transferred from the space detection device 83, the separation position calculation device 7 extracts from the character string image memory 2 the character image bounded by the separation positions X(k) and X(k+1) and transfers it to a character recognition device 9. The separation position calculation device 7 then uses the (k+1)-th separation position and the character pitch P to predict the next position at which to divide, and repeats the above operations until the end of the input character block is detected.
The character recognition device 9 classifies an input character image into a predetermined category and is a known technique. Although the separation position correction device 8 described above detects the presence of a space by scanning the character string image memory 2 again, the presence of a space may instead be detected from the positions and widths of the plurality of character blocks stored in the character block information register 4.
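The alternative mentioned last, detecting the space from stored block positions and widths instead of rescanning the image, can be sketched as follows. The sketch assumes the adjacent line's blocks are available as `(x, width)` pairs sorted by position (a hypothetical layout of the register contents) and only considers gaps between two blocks, ignoring the blank margins before the first block and after the last.

```python
from typing import List, Tuple


def correct_cut_from_blocks(predicted: int, tolerance: int,
                            adjacent_blocks: List[Tuple[int, int]]) -> int:
    """Move `predicted` to the nearest inter-block gap of the adjacent line.

    Returns `predicted` unchanged when no gap column lies within
    `tolerance` of it.
    """
    lo, hi = predicted - tolerance, predicted + tolerance
    best = None
    for (x0, w0), (x1, _) in zip(adjacent_blocks, adjacent_blocks[1:]):
        gap_lo, gap_hi = x0 + w0, x1 - 1    # blank columns between blocks
        if gap_lo > gap_hi:
            continue
        # Column of this gap that is closest to the predicted position.
        candidate = min(max(predicted, gap_lo), gap_hi)
        if lo <= candidate <= hi and (
                best is None or abs(candidate - predicted) < abs(best - predicted)):
            best = candidate
    return predicted if best is None else best
```

This variant avoids a second pass over the image memory, at the cost of relying on the block boundaries recorded during extraction.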

As described above, the present invention makes it possible to realize a character separation device that readily divides touching characters into individual characters and improves character separation accuracy.

[Brief explanation of drawings]

FIG. 1 is an example diagram for explaining the present invention in detail. FIG. 2 is a logic block diagram showing a specific embodiment of the present invention. In the figures, 1 is a scanning device, 2 is a character string image memory, 3 is a character block extraction device, 4 is a character block information register, 5 is a character pitch detection device, 6 is a touching character block detection device, 7 is a separation position calculation device, 8 is a separation position correction device, 81 is a projection distribution extraction device, 82 is a projection information register, 83 is a space detection device, and 9 is a character recognition device.
Agent: Patent attorney Susumu Shirahara
[Figures: FIG. 1, FIG. 2]

Claims (1)

[Claims]
A character separation device that scans a plurality of character string images written on a paper surface and separates them into individual characters, characterized by comprising: means for sequentially extracting, for each character string, the position, width, and height of each character block that can be separated by spaces from the plurality of character string images; means for detecting a character pitch using the positions, widths, and heights of a plurality of the character blocks; means for detecting a character block that contains a plurality of characters; and means for, starting from the starting end position of the character block and proceeding in order, predicting the next separation position from the previous separation position and the character pitch, setting a separable section having a fixed allowable width to the left and right of the predicted separation position, searching within the separable section on the character string image adjacent above or below, starting from the predicted separation position, and correcting the separation position to the position at which a space is first detected.
JP58133324A 1983-07-21 1983-07-21 Character separating device Pending JPS6024677A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP58133324A JPS6024677A (en) 1983-07-21 1983-07-21 Character separating device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP58133324A JPS6024677A (en) 1983-07-21 1983-07-21 Character separating device

Publications (1)

Publication Number Publication Date
JPS6024677A true JPS6024677A (en) 1985-02-07

Family

ID=15102040

Family Applications (1)

Application Number Title Priority Date Filing Date
JP58133324A Pending JPS6024677A (en) 1983-07-21 1983-07-21 Character separating device

Country Status (1)

Country Link
JP (1) JPS6024677A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS6425894U (en) * 1987-08-07 1989-02-14

Similar Documents

Publication Publication Date Title
US5761344A (en) Image pre-processor for character recognition system
US4757551A (en) Character recognition method and system capable of recognizing slant characters
US5325447A (en) Handwritten digit normalization method
EP0050234B1 (en) Method and apparatus for character preprocessing
US4813078A (en) Character recognition apparatus
CN100568263C (en) Layout analysis device and layout analysis method
JPH0863583A (en) Document storage retrieval device and method
JPS5991582A (en) Character reader
JPS6024677A (en) Character separating device
Aparna et al. A complete OCR system development of Tamil magazine documents
JPH0368431B2 (en)
JP3957471B2 (en) Separating string unit
JPH0750496B2 (en) Image signal processor
JPS63304387A (en) document reading device
JPH0259502B2 (en)
JP2643092B2 (en) Method and system for processing non-standard data located outside predefined fields on a document form
JPS59121589A (en) Character pitch discriminating device
JPS59158478A (en) Character pitch detector
JPH10162104A (en) Character recognition device
JPS5875278A (en) Character and symbol recognizing device
JP2004030340A (en) Document identifying apparatus and identifying method therefor
JPS58214969A (en) Character reading device
JPS63184181A (en) Optical character recognition device
JPS6154575A (en) Character reader
JPH0222427B2 (en)