JPH0365584B2 - Character pitch discriminating device - Google Patents

Info

Publication number
JPH0365584B2
Authority
JP
Japan
Prior art keywords
character
width
limit value
blocks
frequency
Legal status
Expired - Lifetime
Application number
JP57229025A
Other languages
Japanese (ja)
Other versions
JPS59121589A
Priority date
1982-12-28
Filing date
1982-12-28
Publication date
1991-10-14
Application JP57229025A filed
Priority to JP57229025A (publication JPS59121589A)
Priority to DE8383113182T (publication DE3380462D1)
Priority to EP83113182A (publication EP0113119B1)
Publication of JPS59121589A
Publication of JPH0365584B2
Granted

Landscapes

  • Character Input (AREA)

Description

DETAILED DESCRIPTION OF THE INVENTION The present invention relates to a character separation device, and more particularly to a device that identifies whether a character string image printed on paper, possibly containing touching characters, is a variable pitch character string or a constant pitch character string.

BACKGROUND ART Devices that optically read printed or handwritten characters (hereinafter OCR) have already been put into practical use for alphanumerics and katakana. However, once the reading targets of such OCR extend to mail and documents bearing a variety of printed typefaces and print qualities, where restrictions on the paper and on the printing machines are relaxed, adjacent characters touch more and more often because of print blur, misalignment, and similar defects produced when printing with typewriters and the like. To separate a character string image on paper that contains such touching characters, it is first necessary to identify whether the string to be read is a constant pitch character string. If variable pitch and constant pitch strings are distinguished statistically, for example from the frequency distribution of all character widths including those of touching characters, the identification becomes less accurate as the number of touching characters grows. An inaccurate identification either makes it difficult to detect blocks of touching characters in constant pitch strings, or causes inaccurate separation of variable pitch strings, in which character contact is comparatively rare.

An object of the present invention is to provide a character pitch identification device that sets, from the average height of a plurality of character blocks, the range of widths a character block can have if it is a single character; finds, within that range, the most frequent character block width interval using a predetermined allowable width; and updates the upper limit of that interval on the basis of frequency information to obtain a one-character width class. This excludes the widths of touching character blocks in constant pitch strings, as well as commas and the like, so that the character pitch can be identified easily.

Another object of the present invention is to provide a character pitch identification device that identifies the character pitch from how widely the character block widths in the one-character width class are spread.

Still another object of the present invention is to provide a character pitch identification device that stably identifies the character pitch of various printed characters whose pitch cannot be known in advance.

According to the present invention, there is provided a character pitch identification device, in a character separation device that scans a character string image written on paper and separates it into individual characters, comprising: means for obtaining the height and width of a plurality of character blocks from the binarized character string image; a frequency table memory holding frequency information on the widths of the character blocks; means for obtaining the average height of the character blocks from their heights; means for setting, from the average height, the range of widths a character block can have if it is a single character, and for obtaining within that range, from the frequency information in the frequency table memory, the upper and lower limits of the most frequent character block width interval of a predetermined allowable width; means for updating the upper limit on the basis of the frequency information in the frequency table memory to obtain the upper and lower limits of a one-character width class; and means for examining the spread between the upper and lower limits of the one-character width class to identify whether the string is a variable character pitch string or a constant character pitch string.

The present invention will now be described with reference to a specific embodiment. FIG. 1 shows an example of part of a constant pitch character string image that includes contact between characters. In the figure, the hatched rectangular regions are the character images separable by the white background, that is, the character blocks, and $V_i$ and $H_i$ ($i = 1, \ldots, 8$) denote the width and height of each character block. The character blocks of widths $V_1$ and $V_8$ each contain two characters because the characters touch.

FIG. 2 shows an example of the frequency distribution of the widths of a plurality of character blocks on paper such as that of FIG. 1; the principle of the invention is explained with it. In FIG. 2, the horizontal axis gives the character block width $V$, and the vertical axis NUM gives the number of character blocks having each width, that is, the frequency value. Referring to this distribution, the invention extracts the width interval $A_1$ shown in FIG. 2 that can be regarded as one character, namely the difference between its upper limit $V_U$ and lower limit $V_L$, and compares this difference with a threshold set on the basis of the average character block height $H_n$; the constant pitch / variable pitch identification is thereby performed automatically.

The extraction of the interval $A_1$ proceeds as follows. First, because the width and height of a character are closely related, a one-character possible range $A$ is set from the average block height $H_n$ and constants $C_1$ and $C_2$ ($C_1 < C_2$, for example $C_1 = 0.5$ and $C_2 = 1.2$). Next, referring to the frequency distribution, the most frequent block width within $A$ is extracted as the width most likely to be one character. Because the number of observable character blocks may be small, as in an address on a piece of mail, the mode is taken over an allowable width $\Delta T$ ($\Delta T = 3$ in the figure) rather than over a single width: the upper limit $V'_U$ and lower limit $V'_L$ are computed for the block width interval $A_S$ of width $\Delta T$ that contains the largest number of character blocks. Since the mode is computed over the allowable width $\Delta T$ within the range $A$, $V'_U - V'_L = \Delta T$ holds.
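In formula form, the range setting and windowed mode search described above amount to the following. This is a restatement under two assumptions the text leaves open: widths are integers, and the window $[V'_L, V'_U]$ counts both of its endpoints.

```latex
A = [\,C_1 H_n,\; C_2 H_n\,], \qquad C_1 < C_2 \quad (\text{e.g. } C_1 = 0.5,\ C_2 = 1.2)

V'_L = \operatorname*{arg\,max}_{v \,:\; C_1 H_n \le v,\; v + \Delta T \le C_2 H_n}
       \sum_{w = v}^{v + \Delta T} \mathrm{NUM}(w),
\qquad V'_U = V'_L + \Delta T,
\qquad A_S = [\,V'_L,\; V'_U\,]
```

For example, with $H_n = 20$ and the example constants, $A = [10, 24]$, and with $\Delta T = 3$ the candidate windows run from $[10, 13]$ to $[21, 24]$.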

Next, taking the interval $A_S$ most likely to be one character as the initial value, its upper limit $V'_U$ is updated in the increasing direction while referring to the frequency distribution; this yields the upper limit $V_U$ and lower limit $V_L$ of the interval $A_1$ that can be regarded as one character. The lower limit $V'_L$ of $A_S$ is not updated in the decreasing direction, so that narrow items such as periods and commas are excluded and the constant/variable pitch identification becomes more reliable; that is, $V_L = V'_L$. The updating of $V'_U$ that yields the upper limit $V_U$ of $A_1$ is terminated using a constant $C_3$ ($C_3 < C_2$, for example $C_3 = 0.4$) and the average block height $H_n$, as explained later with reference to FIG. 3.

Why the interval $A_1$ distinguishes the two kinds of string can be stated briefly. Apart from narrow characters such as periods and commas, in most printed constant pitch strings the block widths concentrate around the most probable one-character width (around $A_S$ in the figure). In variable pitch strings, by contrast, the widths scatter rather than concentrating around that width, even after narrow characters such as periods and commas are excluded. Detecting the interval $A_1$ that can be regarded as one character wide, as shown in FIG. 2, therefore makes it possible to distinguish constant pitch from variable pitch character strings.
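Combining this with the termination rule given for the FIG. 3 embodiment, the construction of $A_1 = [V_L, V_U]$ and the final decision can be summarized as below. The symbol $\theta$ is introduced here for the identification threshold (threshold register 92 in FIG. 3); the text gives no value for it.

```latex
V_L = V'_L, \qquad V_U \leftarrow V'_U \ \text{initially}

\text{for } w = V_U + 1,\ V_U + 2,\ \ldots:\quad
\begin{cases}
\text{skip } w, & \mathrm{NUM}(w) = 0 \\
V_U \leftarrow w, & \mathrm{NUM}(w) > 0 \ \text{and}\ w - V_U < C_3 H_n \\
\text{stop}, & \mathrm{NUM}(w) > 0 \ \text{and}\ w - V_U \ge C_3 H_n
\end{cases}

\text{constant pitch} \iff V_U - V_L < \theta, \qquad \text{variable pitch otherwise}
```

With $C_3 = 0.4$ and $H_n = 20$, $C_3 H_n = 8$: a non-empty width up to 7 units beyond the current $V_U$ extends the class, while a gap of 8 or more ends it.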

FIG. 3 is a logic block diagram showing a specific embodiment of the invention. In the figure, a signal is denoted by appending S to the reference number of its signal line. A scanning device 1 optically scans a printed or handwritten character string on paper, converts it into an electrical signal, and sequentially writes the binary-quantized character string image into a character string image memory 2. A character block extraction device 3 sequentially detects, in the character string image stored in memory 2, character images surrounded by white bits (hereinafter character blocks), and stores the start position and size of each character block in a character block register 4; the size of a character block means its width and height. Such a character block extraction device can be realized, for example, with the technique described in Japanese Patent Application No. Sho 56-27512 (1981) by the same applicant. The widths of the character blocks stored in register 4 are transferred one by one to a control device 6, which converts each width value into an address of a frequency table memory 5, reads the corresponding content of memory 5, increments it with a +1 addition circuit 11, and writes it back to the same location. In this way, the frequency of each width $V_i$ of the character blocks extracted from the image in memory 2 is stored at address $V_i$ of the frequency table memory 5.
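The extraction and counting steps above can be sketched in Python. This is a minimal illustration, not the patented circuit: the text refers the extraction technique to another application, so 8-connected component labeling is assumed for device 3, and the names `extract_blocks` and `build_frequency_table` are hypothetical. The binary image is a list of rows with 1 for black and 0 for white.

```python
from collections import deque


def extract_blocks(image):
    """Return (x, y, width, height) for each 8-connected group of 1-pixels.

    Hypothetical stand-in for character block extraction device 3; plain
    8-connected component labeling is an assumption, not the cited method.
    """
    rows = len(image)
    cols = len(image[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    blocks = []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill collects one block bounded by white (0) pixels.
            seen[r][c] = True
            queue = deque([(r, c)])
            top = bottom = r
            left = right = c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            blocks.append((left, top, right - left + 1, bottom - top + 1))
    return blocks


def build_frequency_table(widths, size):
    """Mimic frequency table memory 5: table[v] = number of blocks of width v."""
    table = [0] * size  # memory 5 is cleared to 0 first
    for v in widths:
        table[v] += 1  # read address v, add 1 (circuit 11), write back
    return table
```

For the image `[[1, 1, 0, 0, 1], [1, 1, 0, 0, 1], [0, 0, 0, 0, 1]]`, `extract_blocks` returns `[(0, 0, 2, 2), (4, 0, 1, 3)]`, and `build_frequency_table([2, 1], 4)` returns `[0, 1, 1, 0]`. Blocks come out in raster-scan order of their first pixel rather than strictly left to right, which does not matter for a width histogram; `size` must exceed the largest width.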

The frequency table memory 5 is assumed to be cleared to 0 initially. The heights of the character blocks stored in register 4 are then transferred one by one to an average height detection unit 7, which calculates the average height of the character blocks. Alternatively, unit 7 may use a circuit that finds the most frequent height.
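Both options for the average height detection unit 7 are easy to express. In this sketch `block_height_estimate` is a hypothetical name, and resolving ties in the mode toward the smaller height is an assumption the text does not make.

```python
from collections import Counter
from statistics import mean


def block_height_estimate(heights, use_mode=False):
    """Height statistic produced by average height detection unit 7.

    By default the arithmetic mean is returned. With use_mode=True the most
    frequent height is returned instead (the alternative the text allows);
    ties go to the smaller height, which is an assumption.
    """
    if not heights:
        raise ValueError("no character blocks")
    if not use_mode:
        return mean(heights)
    counts = Counter(heights)
    top = max(counts.values())
    return min(h for h, n in counts.items() if n == top)
```

For heights 20, 20, 21, and 8, the mean is 17.25 and the mode is 20. One plausible reason to prefer the mode, not stated in the text, is that it is less affected by short blocks such as commas.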

A constant register 15 stores the constants $C_1$, $C_2$, and $C_3$ ($C_1 < C_2$, $C_3 < C_2$) described with FIG. 2. $C_1$ and $C_2$ determine, from the average character block height $H_m$ detected by unit 7, the possible range of widths of a one-character block; $C_3$ generates the threshold used in the one-character width class detection unit 8, enclosed by a dotted line in the figure. A multiplication unit 10 multiplies $H_m$ by $C_1$, $C_2$, and $C_3$. Of the products, $C_1 H_m$ and $C_2 H_m$ ($C_1 H_m < C_2 H_m$) are stored in a one-character range setting register 12 as the lower and upper limits of the possible width of one character, and $C_3 H_m$ is stored in a threshold register 17. A constant register 16 holds a constant $\Delta T$ giving the preset allowable width, and 13 is a most frequent width detection unit. When the control device 6 feeds, in sequence from memory 5 to unit 13, the frequency values of the block widths lying between the lower limit $C_1 H_m$ and the upper limit $C_2 H_m$ held in register 12, unit 13 counts the frequency values over intervals of $\Delta T$ (the content of constant register 16) and detects the lower and upper limits $V_L$ and $V_U$ of the block width interval with the highest frequency, where

$$C_1 H_m \le V_L \le C_2 H_m - \Delta T, \qquad C_1 H_m + \Delta T \le V_U \le C_2 H_m, \qquad V_U - V_L = \Delta T.$$

The lower limit $V_L$ and upper limit $V_U$ of the most frequent block width interval are stored in a one-character initial class register 81.
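A Python sketch of the most frequent width detection (unit 13) follows. It assumes integer widths, inclusive window endpoints, range bounds rounded inward to integers, and ties resolved toward the smaller $V_L$; none of these details is specified in the text, and `most_frequent_window` is a hypothetical name. The defaults $C_1 = 0.5$, $C_2 = 1.2$, $\Delta T = 3$ are the example values given with FIG. 2.

```python
import math


def most_frequent_window(table, h_m, c1=0.5, c2=1.2, delta_t=3):
    """Return (V_L, V_U) with V_U - V_L == delta_t holding the most blocks.

    table[v] is the number of blocks of width v (frequency table memory 5).
    Assumptions: inclusive window endpoints, integer bounds rounded inward,
    and ties resolved toward the smallest V_L.
    """
    lo = math.ceil(c1 * h_m)    # lower limit C1*Hm from register 12
    hi = math.floor(c2 * h_m)   # upper limit C2*Hm from register 12
    best = None
    # C1*Hm <= V_L <= C2*Hm - delta_t, so V_U = V_L + delta_t stays inside.
    for v_l in range(lo, hi - delta_t + 1):
        v_u = v_l + delta_t
        count = sum(table[w] for w in range(v_l, v_u + 1) if w < len(table))
        if best is None or count > best[0]:  # strict '>' keeps the first maximum
            best = (count, v_l, v_u)
    if best is None:
        raise ValueError("one-character range is narrower than delta_t")
    return best[1], best[2]
```

For instance, with $H_m = 20$ and block widths 12, 12, 12, 13, 13, 14 plus a narrow block of width 5 and a two-character block of width 26, the search is confined to widths 10 through 24 and returns (11, 14), which holds six blocks; the tie with (12, 15) is resolved toward the smaller lower limit.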

The one-character width class detection unit 8 updates the upper limit $V_U$ of the most frequent block width interval on the basis of the frequency information in memory 5 and obtains the upper and lower limits of the interval of one-character widths, that is, of the interval $A_1$ shown in FIG. 2 that can be regarded as one character wide. It consists of the one-character initial class register 81, a counter 82, a frequency value register 83, a zero detection unit 84, a subtraction unit 85, a comparison unit 86, and an OR circuit 87. The upper limit of the block width stored in register 81 is set in counter 82 and counted up by 1; the control device 6 then reads from memory 5 the frequency value for the block width held in counter 82 and stores it in the frequency value register 83. The zero detection unit 84 checks whether the content of register 83 is 0; if it is, unit 84 sets its output signal 841S to "1", which opens OR circuit 87 and counts counter 82 up by 1.

If, on the other hand, the content of the frequency value register 83 is not 0, zero detection unit 84 outputs "0" on signal 841S, and control device 6 transfers the upper limit of the block width held in the one-character initial class register 81, together with the content of counter 82, to the subtraction unit 85. Unit 85 subtracts that upper limit from the content of counter 82 and passes the difference to the comparison unit 86, which compares it with the content of threshold register 17. If the content of register 17 is larger than the difference, unit 86 outputs "1" on signal 861S; control device 6 then uses the content of counter 82 to update the upper limit stored in register 81, and OR circuit 87 opens so that counter 82 is counted up by 1. If the content of register 17 is smaller than or equal to the difference, unit 86 outputs "0" on signal 861S, and control device 6 transfers the lower limit held in register 81 and the upper limit updated as described above to the one-character width class register 14.

In summary, unit 8 updates the upper limit of the block width, initially set in register 81, with the content of counter 82, which indicates the block width under examination. The updating ends when the difference between the content of counter 82 and the upper limit stored in register 81 is no longer smaller than the value $C_3 H_m$ held in threshold register 17, and the upper limit stored in register 81 is then transferred to the one-character width class register 14.

As for the conditions for updating the upper limit: when the frequency value for the block width under examination (the content of counter 82) is 0, as detected by zero detection unit 84, counter 82 is unconditionally counted up to the next block width. When the frequency value is not 0 and the termination condition above has not yet been met, the upper limit is updated by transferring the content of counter 82 to register 81 as the new upper limit.
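The loop run by the one-character width class detection unit 8 can be sketched as follows. The function name is hypothetical, `threshold` stands for the content $C_3 H_m$ of threshold register 17, and stopping at the end of the frequency table when no terminating width is found is an assumption, since the text does not say what happens there.

```python
def extend_upper_limit(table, v_l, v_u, threshold):
    """Sketch of unit 8: widen the one-character class upward only.

    table[v] is the frequency of width v; (v_l, v_u) is the initial class
    from register 81; threshold corresponds to C3*Hm in register 17.
    """
    w = v_u + 1                         # counter 82 loaded with V_U, then +1
    while w < len(table):               # assumed stop at the end of the table
        if table[w] == 0:               # zero detection unit 84: skip empty widths
            w += 1
            continue
        if threshold > w - v_u:         # subtraction unit 85, comparison unit 86
            v_u = w                     # new upper limit written to register 81
            w += 1
        else:
            break                       # gap too large: class is complete
    return v_l, v_u                     # V_L is never lowered; result to register 14
```

With a threshold of 8, a distribution whose only wider block is a two-character block 12 units beyond $V_U = 14$ keeps the class at (11, 14), whereas widths 17, 20, and 24, each within 8 of the previous upper limit, stretch it to (11, 24).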

A determination unit 9, enclosed by a dotted line in the figure, decides whether the character string image stored in memory 2 is a constant pitch or a variable pitch character string. A subtraction unit 91 subtracts the lower limit from the upper limit of the block width held in the one-character width class register 14 and passes the result to a comparison unit 93. A threshold register 92 stores a preset threshold; like the content of threshold register 17, this threshold may instead be set on the basis of the average character block height obtained by unit 7. Comparison unit 93 compares the content of register 92 with the output of unit 91. If the content of register 92 is larger, unit 93 outputs "1" on signal 931S, indicating a constant pitch character string; if it is smaller than or equal, unit 93 outputs "0" on signal 931S, indicating a variable pitch character string.
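The comparison in determination unit 9 reduces to a single test. The sketch below (hypothetical name `classify_pitch`) makes the boundary case explicit: equality counts as variable pitch. The text gives no threshold value, so the 6 used in the usage note is purely illustrative.

```python
def classify_pitch(v_l, v_u, threshold):
    """Sketch of determination unit 9.

    Returns "constant" when threshold > V_U - V_L (signal 931S = "1"),
    otherwise "variable" (signal 931S = "0").
    """
    spread = v_u - v_l                                       # subtraction unit 91
    return "constant" if threshold > spread else "variable"  # comparison unit 93
```

With that illustrative threshold, a one-character width class of (11, 14) has spread 3 and is classified as constant pitch, while (11, 24) has spread 13 and is classified as variable pitch.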

As described above, applying the present invention makes it possible to identify stably whether a character string is a variable pitch or a constant pitch character string, even when adjacent characters are likely to touch.

Brief Description of the Drawings

FIG. 1 shows part of a constant pitch character string image as an example. FIG. 2 shows an example of the frequency distribution of character block widths, used to illustrate the principle of the invention. FIG. 3 is a logic block diagram of a specific embodiment of the invention. In the figures: 1, scanning device; 2, character string image memory; 3, character block extraction device; 4, character block register; 5, frequency table memory; 6, control device; 7, average height detection unit; 8, one-character width class detection unit; 9, determination unit; 10, multiplication unit; 11, +1 addition circuit; 12, one-character range setting register; 13, most frequent width detection unit; 14, one-character width class register; 15 and 16, constant registers; 17, threshold register; 81, one-character initial class register; 82, counter; 83, frequency value register; 84, zero detection unit; 85, subtraction unit; 86, comparison unit; 87, OR circuit; 91, subtraction unit; 92, threshold register; 93, comparison unit.

Claims (1)

1. A character pitch identification device in a character separation device that scans a character string image written on paper and separates it into individual characters, comprising: means for extracting, from the binary-quantized character string image, a plurality of character blocks surrounded by strings of white bits and obtaining the height and width of each character block; a frequency table memory holding frequency information on the widths of the plurality of character blocks; means for obtaining the average height of the character blocks from the heights of the plurality of character blocks; means for setting, using the average height, the range of widths that a character block can have if it is a single character, and for obtaining, within that range and from the frequency information in the frequency table memory, the upper and lower limits of the character block width interval that is most frequent over a predetermined allowable width; means for updating the upper limit on the basis of the frequency information in the frequency table memory to obtain the upper and lower limits of a one-character width class; and means for examining the spread between the upper and lower limits of the one-character width class to identify whether the string is a variable character pitch string or a constant character pitch string.
JP57229025A 1982-12-28 1982-12-28 Character pitch discriminating device Granted JPS59121589A

Priority Applications (3)

Application Number Priority Date Filing Date Title
JP57229025A JPS59121589A 1982-12-28 1982-12-28 Character pitch discriminating device
DE8383113182T DE3380462D1 (en) 1982-12-28 1983-12-28 Character pitch detecting apparatus
EP83113182A EP0113119B1 (en) 1982-12-28 1983-12-28 Character pitch detecting apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP57229025A JPS59121589A 1982-12-28 1982-12-28 Character pitch discriminating device

Publications (2)

Publication Number Publication Date
JPS59121589A 1984-07-13
JPH0365584B2 1991-10-14

Family

ID=16885565

Family Applications (1)

Application Number Title Priority Date Filing Date
JP57229025A Granted JPS59121589A 1982-12-28 1982-12-28 Character pitch discriminating device

Country Status (1)

Country Link
JP (1) JPS59121589A

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2565150B2 (en) * 1988-04-28 1996-12-18 セイコーエプソン株式会社 Character cutting method
JP2570415B2 (en) * 1988-04-28 1997-01-08 セイコーエプソン株式会社 Character extraction method

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
JPS5339094B2 (en) * 1973-06-30 1978-10-19

Also Published As

Publication number Publication date
JPS59121589A 1984-07-13
