JPH0365584B2

Info

Publication number: JPH0365584B2
Application number: JP57229025A
Authority: JP
Priority date: 1982-12-28
Filing date: 1982-12-28
Publication date: 1991-10-14
Also published as: JPS59121589A

Description

[Detailed Description of the Invention]

The present invention relates to a character separation device, and more particularly to a device that identifies whether a character string image on a printed page, which may contain touching characters, is a variable-pitch character string or a constant-pitch character string.

Devices that optically read printed or handwritten characters (hereinafter, OCR) are already in practical use for alphanumeric characters and katakana. However, when the reading targets of such OCR devices are extended to items such as mail and documents, which carry a variety of printed fonts and print qualities and for which the constraints on paper and printing equipment are relaxed, contact between adjacent characters becomes more frequent because of print smearing, misregistration, and similar defects that arise when printing with a typewriter or the like. To separate a character string image that contains such touching characters, it is first necessary to identify whether the text being read is a constant-pitch character string. If the distinction between variable-pitch and constant-pitch strings is made statistically, for example from the frequency distribution of all character widths including those of touching characters, the identification becomes less accurate as the number of touching characters increases. When this identification is inaccurate, either it becomes difficult to detect touching character blocks in a constant-pitch string, or a variable-pitch string with relatively few touching characters is separated incorrectly.

An object of the present invention is to provide a character pitch identification device that makes the character pitch easy to identify. The device sets, from the average height of a plurality of character blocks, the range of widths that a block consisting of a single character can take; finds, within that range and with a predetermined tolerance width, the most frequent character block width; and updates the upper limit of that width on the basis of frequency information to obtain a one-character width class. This excludes the widths of touching character blocks in constant-pitch strings, as well as commas and similar narrow marks.

Another object of the present invention is to provide a character pitch identification device that identifies the character pitch from how widely the character block widths in the one-character width class are spread.

Still another object of the present invention is to provide a character pitch identification device that identifies the character pitch stably for a variety of printed text whose pitch value cannot be known in advance.

According to the present invention, there is provided a character pitch identification device for use in a character separation device that scans a character string image written on paper and separates it into individual characters, the character pitch identification device comprising: means for obtaining the heights and widths of a plurality of character blocks from a binarized character string image; a frequency table memory for obtaining frequency information on the widths of the character blocks; means for obtaining the average character block height from the heights of the character blocks; means for setting, from the average height, the range of widths that a single-character block can take, and for obtaining within that range, from the frequency information in the frequency table memory, the upper and lower limits of the most frequent character block width for a predetermined tolerance width; means for updating the upper limit on the basis of the frequency information in the frequency table memory to obtain the upper and lower limits of a one-character width class; and means for examining the spread between the upper and lower limits of the one-character width class to identify whether the string is a variable-pitch or a constant-pitch character string.

The invention is described below with reference to a specific embodiment. FIG. 1 shows an example of part of a constant-pitch character string image that contains touching characters. In the figure, the hatched character images that can be separated by white background, that is, the character blocks, are enclosed in rectangular regions, and V_i and H_i (i = 1, ..., 8) denote the width and height of each character block. The character blocks with widths V₁ and V₈ are each a character image containing two characters, because the characters touch.

FIG. 2 shows an example of the frequency distribution of the widths of a plurality of character blocks on a page such as that of FIG. 1, and is used here to explain the principle of the invention. The horizontal axis of the distribution is the character block width V, and the vertical axis NUM is the number of character blocks having a given width, that is, the frequency. With reference to this distribution, the invention extracts the interval A₁ of widths that can be regarded as a single character, shown as A₁ in FIG. 2, that is, the difference between its upper limit V_U and lower limit V_L, and compares that value with a threshold set from the average character block height H_m. Constant-pitch and variable-pitch strings are thereby identified automatically.

The extraction of the interval A₁ proceeds as follows. First, because character width and height are closely related, the one-character possible range A is set from the average character block height H_m and constants C₁ and C₂ (where C₁ < C₂, for example C₁ = 0.5 and C₂ = 1.2). Next, the frequency distribution is consulted and the character block width with the highest frequency within range A is extracted as the width most likely to be a single character. To allow for cases where only a few character blocks can be observed, as is usual for an address on a mail item, a tolerance width ΔT is introduced (ΔT = 3 in the figure), and the upper limit V′_U and lower limit V′_L of the width interval A_S are computed such that the number of character blocks falling within the tolerance width ΔT is a maximum. Because the mode is computed within range A using the tolerance width ΔT, V′_U − V′_L = ΔT holds.
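By way of illustration only, the search for the interval A_S might be sketched in Python as follows. The function name, the dictionary representation of the frequency distribution, the inclusive treatment of both window ends, and the tie-breaking rule (the narrowest window wins) are assumptions made for this sketch; the specification describes a hardware circuit and does not prescribe any of them.

```python
import math

def most_frequent_window(freq, h_mean, c1=0.5, c2=1.2, delta_t=3):
    """Return (v_lower, v_upper) of the width window with the highest count.

    freq maps an integer block width to the number of blocks of that width.
    Assumptions of this sketch: the window covers v_lower..v_upper inclusive
    with v_upper - v_lower == delta_t, and ties go to the narrowest window.
    """
    lo = math.ceil(c1 * h_mean)    # lower edge of the one-character range A
    hi = math.floor(c2 * h_mean)   # upper edge of the one-character range A
    if hi - lo < delta_t:
        raise ValueError("range A is narrower than the tolerance width")

    best, best_count = None, -1
    for v_lower in range(lo, hi - delta_t + 1):
        window = range(v_lower, v_lower + delta_t + 1)
        count = sum(freq.get(v, 0) for v in window)
        if count > best_count:     # strict '>' keeps the first (narrowest) maximum
            best, best_count = (v_lower, v_lower + delta_t), count
    return best

# Hypothetical widths: a comma (4), single characters (17-20), touching pairs (36).
example = {4: 1, 17: 2, 18: 6, 19: 5, 20: 1, 36: 2}
print(most_frequent_window(example, h_mean=24))
```

With the hypothetical distribution above and an average height of 24, range A runs from 12 to 28, so neither the comma nor the touching pairs can be selected as the initial interval.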

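Next, taking the interval A_S of the most likely single-character width as the initial value, its upper limit V′_U is updated toward larger values with reference to the frequency distribution. This yields the upper limit V_U and lower limit V_L of the interval A₁ that can be regarded as a single character. The lower limit V′_L of A_S is not updated toward smaller values, so that narrow marks such as periods and commas are excluded and the distinction between constant-pitch and variable-pitch strings is more reliable. In other words, V_L equals V′_L. The update of the upper limit V′_U, which yields the upper limit V_U of A₁, is terminated using a constant C₃ and the average character block height H_m (where C₃ < C₂, for example C₃ = 0.4), as described later with reference to FIG. 3.

The reason the width of A₁ distinguishes constant-pitch from variable-pitch strings is, briefly, as follows. Apart from narrow marks such as periods and commas, the block widths of most constant-pitch printed matter are concentrated near the most likely single-character width (that is, near A_S in the figure). In a variable-pitch string, by contrast, the widths are scattered rather than concentrated near that width, even when narrow marks such as periods and commas are excluded. Detecting the interval A₁ that can be regarded as a single character width, as shown in FIG. 2, therefore makes it possible to distinguish constant-pitch from variable-pitch strings.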
By setting A _S as the initial value and updating its upper limit value V′ _U in the direction of increasing it while referring to the frequency distribution, we can create an interval A ₁ of character width that can be considered as one character.
The lower limit value V _U and lower limit value V _L can be extracted.
Regarding the lower limit value V _L , for example, period,
Remove characters with small width such as commas,
In order to improve the reliability of discrimination between a constant pitch character string and a variable pitch character string, the lower limit value _V'L of the character width section A _S that is most likely to be one character is not updated in a direction in which its value decreases. That is, the lower limit value V _L is equal to the lower limit value V′ _L. Also, update the above lower limit value V′ _U and 1
The update operation for determining the upper limit value V _U of the character width section A ₁ that can be considered as a character is completed based on the constant C ₃ and the average height of the character block, as explained later using Figure 3.
This can be carried out using H _n (where C ₃ <C ₂ , for example C ₃ =0.4). Here, it will be briefly described that a constant pitch character string or a variable pitch character string can be identified based on the value of the character width section _A1 . In other words, excluding characters with small character widths such as periods and commas, in the case of evenly spaced character strings, the width of the character block most likely to be one character as described above (i.e., A _S
On the other hand, in variable pitch character strings, even if characters with small widths such as periods and commas are excluded, the character strings have a tendency to vary without being concentrated around the width of one character block. ing. Therefore, as shown in FIG. 2, by detecting the section _A1 that can be regarded as one character wide, it is possible to distinguish between a constant pitch character string and a variable pitch character string.

FIG. 3 is a logic block diagram of a specific embodiment of the invention. In the figure, a signal is denoted by appending S to the reference numeral of its signal line. The scanning device 1 optically scans a character string printed or handwritten on paper, converts it into an electrical signal, and writes the binarized character string image sequentially into the character string image memory 2. The character block extraction device 3 sequentially detects character images surrounded by white bits (hereinafter, character blocks) in the character string image stored in the character string image memory 2, and stores the start position and size of each character block in the character block register 4. The size of a character block means its width and height. Such a character block extraction device can be realized, for example, with the technique described in Japanese Patent Application No. Sho 56-27512 by the same applicant. The widths of the character blocks stored in the character block register 4 are transferred sequentially to the control device 6. The control device 6 converts each transferred width into an address in the frequency table memory 5, reads the contents at that address, increments them with the +1 adder circuit 11, and writes the result back to the same location. In this way, the frequency of each character block width V_i extracted from the character string image memory 2 is stored at address V_i of the frequency table memory 5.
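As a rough software analogue of the read, increment, and write-back cycle described above, the frequency table might be built as in the following Python sketch. The function name and the `max_width` parameter are hypothetical; the specification does not state the size of the frequency table memory 5.

```python
def build_width_table(widths, max_width=255):
    """Build a frequency table indexed by character block width.

    table[v] holds the number of blocks whose width is v, mirroring the way
    the width is used as the address of the frequency table memory.
    max_width is an assumed table size, not a value from the specification.
    """
    table = [0] * (max_width + 1)          # memory is cleared to 0 first
    for w in widths:
        if not 0 <= w <= max_width:
            raise ValueError(f"width {w} does not fit in the table")
        table[w] += 1                      # read, add 1, write back
    return table

table = build_width_table([18, 19, 18, 36, 4, 18], max_width=40)
print(table[18], table[19], table[36])
```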

The frequency table memory 5 is assumed to be cleared to 0 initially. Next, the heights of the character blocks stored in the character block register 4 are transferred sequentially to the average height detection unit 7, which computes the average of the heights. A circuit that computes the most frequent height may be used in the average height detection unit 7 instead.

Reference numeral 15 denotes a constant register, which stores the constants C₁, C₂, and C₃ described with reference to FIG. 2 (where C₁ < C₂ and C₃ < C₂). The constants C₁ and C₂ determine, from the average character block height H_m detected by the average height detection unit 7, the possible range of single-character block widths, and the constant C₃ is used to generate the threshold used in the one-character width class detection unit 8, enclosed by a dotted line in the figure. The multiplication unit 10 multiplies the average height H_m by the constants C₁, C₂, and C₃. Of the resulting values, C₁·H_m and C₂·H_m (where C₁·H_m < C₂·H_m) are stored in the one-character range setting register 12 as the lower and upper limits of the possible single-character width, and C₃·H_m is stored in the threshold register 17. Reference numeral 16 denotes a constant register in which the constant ΔT, the preset tolerance width, is set. Reference numeral 13 denotes the most frequent width detection unit. The control device 6 reads from the frequency table memory 5, in order, the frequencies of the character block widths lying between the lower limit C₁·H_m and the upper limit C₂·H_m stored in the one-character range setting register 12, and supplies them to the most frequent width detection unit 13. The unit counts the frequencies over intervals of ΔT, the contents of the constant register 16, and detects the lower limit V_L and upper limit V_U of the width interval with the highest count (satisfying C₁·H_m ≤ V_L ≤ C₂·H_m − ΔT, C₁·H_m + ΔT ≤ V_U ≤ C₂·H_m, and V_U − V_L = ΔT). The lower limit V_L and upper limit V_U of the most frequent width interval are stored in the one-character initial class register 81.

The one-character width class detection unit 8 is a circuit that updates the upper limit V_U of the most frequent width interval on the basis of the frequency information in the frequency table memory 5, and thereby obtains the upper and lower limits of the single-character width interval, that is, the interval A₁ of FIG. 2 that can be regarded as a single character width. It comprises the one-character initial class register 81, counter 82, frequency value register 83, zero detection unit 84, subtraction unit 85, comparison unit 86, and OR circuit 87. The upper limit of the character block width stored in the one-character initial class register 81 is set in the counter 82, which is then counted up by one. The control device 6 reads the frequency corresponding to the character block width held in the counter 82 from the frequency table memory 5 and stores it in the frequency value register 83. The zero detection unit 84 detects whether the contents of the frequency value register 83 are 0. If they are, it sets its output signal 841S to "1", which opens the OR circuit 87 and counts the counter 82 up by one.
6. It is composed of an OR circuit 87. The upper limit value of the width of a character block stored in the one-character initial class register 81 is set in the counter 82, and then incremented by 1, and the frequency value corresponding to the width of the character block, which is the content of the counter 82, is transferred to the control device 6. Accordingly, the data is read from the frequency table memory 5 and stored in the frequency value register 83. The zero detection unit 84 detects whether the content of the frequency value register 83 is 0 or not. If it is 0, the output signal 841S is set to "1" to open the OR circuit 87 and the counter 82 becomes 1. It will be counted up.

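If the contents of the frequency value register 83 are not 0, the zero detection unit 84 outputs "0" on its output signal 841S, and the control device 6 transfers the upper limit of the character block width held in the one-character initial class register 81 and the contents of the counter 82 to the subtraction unit 85. The subtraction unit 85 subtracts that upper limit from the contents of the counter 82 and transfers the difference to the comparison unit 86. The comparison unit 86 compares the contents of the threshold register 17 with the output of the subtraction unit 85, and outputs "1" on its output signal 861S if the contents of the threshold register 17 are larger. When "1" is output on 861S, the control device 6 updates the upper limit stored in the one-character initial class register 81 with the contents of the counter 82, the OR circuit 87 opens, and the counter 82 is counted up by one. If the contents of the threshold register 17 are smaller than or equal to the output of the subtraction unit 85, "0" is output on 861S, and the control device 6 transfers the lower limit held in the one-character initial class register 81, together with the upper limit updated as described above, to the one-character width class register 14.

The operation of the one-character width class detection unit 8 can be summarized as follows. The upper limit of the character block width set as the initial value in the one-character initial class register 81 is repeatedly updated with the contents of the counter 82, which indicates the width under examination. The updating ends when the difference between the contents of the counter 82 and the upper limit stored in the one-character initial class register 81 is no longer smaller than C₃·H_m, the contents of the threshold register 17, and the upper limit stored in the one-character initial class register 81 is then transferred to the one-character width class register 14.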
“0” is output from the controller 6, and the subtracter 8 subtracts the upper limit value of the width of a character block, which is the content of the one-character initial class register 81, and the content of the counter 82.
Transfer to 5. The subtraction circuit 85 includes a counter 82
The upper limit value of the width of a character block, which is the content of the initial character class register 81, is subtracted from the content of the initial character class register 81, and the subtracted value is transferred to the comparison unit 86. The comparison unit 86 compares the contents of the threshold register 17 and the output value from the subtraction unit 85, and if the contents of the threshold register 17 is larger than the output value of the subtraction unit 85, outputs “1” from the output signal 861S. . When “1” is output from the output signal 861S, the control device 6 uses the contents of the counter 82 to update the upper limit value of the width of a character block stored in the single character initial class register 82, and also performs OR. The circuit 87 is opened and the counter 82 is counted up by one. On the other hand, if the contents of the threshold register 17 are smaller than or equal to the output value of the subtraction section 85, the output signal 861 of the comparison section 86
"0" is output from S, and the control device 6 transfers the lower limit value of the character block width, which is the content of the one-character initial class register 81, and the upper limit value updated as described above to the one-character width class register 14. be done. Here, the single character width class detection unit 8 described above
Summarize the behavior of That is, the operation of updating the upper limit value of the character block width set as an initial value in the 1-character initial class register 81 by the contents of the counter 82 indicating the character block width under inspection is based on the contents of the counter 82 and the 1-character initial class. The process ends when the difference from the upper limit value stored in the register 81 becomes greater than the value C ₃ Hm that is the content of the threshold register 17, and the upper limit value stored in the one-character initial class register 81 is transferred to the one-character width register 14. will be done.

As for the conditions for updating the upper limit: when the frequency of the width under examination, held in the counter 82, is 0 (as detected by the zero detection unit 84), the counter 82 is counted up unconditionally, moving to the next width. When the frequency is not 0 and the termination condition above has not yet been met, the contents of the counter 82, the width under examination, are transferred to the one-character initial class register 81 as its new upper limit, which updates the upper limit of the character block width.
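The update loop of the one-character width class detection unit 8 can be mirrored in software roughly as follows. This is a sketch under stated assumptions, not the circuit itself: the function name is hypothetical, the frequency table is taken to be a Python list indexed by width, and the loop is additionally bounded by the end of the table, a stop condition the specification does not discuss.

```python
def extend_upper_limit(freq, v_lower, v_upper, threshold):
    """Extend the upper limit of the initial class toward larger widths.

    freq      -- list indexed by width (contents of the frequency table)
    v_lower   -- initial lower limit; it is never moved downward
    v_upper   -- initial upper limit (register 81)
    threshold -- contents of threshold register 17, i.e. C3 * H_m
    Returns (v_lower, v_upper) for the one-character width class.
    """
    counter = v_upper                       # counter 82 starts at the upper limit
    while counter + 1 < len(freq):          # assumed bound: end of the table
        counter += 1                        # count up to the next width
        if freq[counter] == 0:              # zero detection unit 84
            continue                        # empty width: just keep counting
        gap = counter - v_upper             # subtraction unit 85
        if threshold > gap:                 # comparison unit 86 outputs "1"
            v_upper = counter               # update register 81
        else:                               # comparison unit 86 outputs "0"
            break                           # gap too wide: stop updating
    return v_lower, v_upper

freq = [0] * 41
for width, count in {17: 2, 18: 6, 19: 5, 20: 1, 22: 1, 36: 2}.items():
    freq[width] = count
print(extend_upper_limit(freq, 17, 20, threshold=0.4 * 24))
```

In this hypothetical example the occupied width 22 lies within the threshold of the current upper limit and is absorbed into the class, while the touching-pair width 36 lies too far away and ends the update.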
However, when the frequency value is not 0 and the operation of updating the upper limit value of the character block width described above is not yet completed, the contents of the counter 82 indicating the character block width under inspection are set as the upper limit value of the single character initial class register 81. By transferring, the upper limit value of the character block width will be updated.

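The determination unit 9, enclosed by a dotted line in the figure, decides whether the character string image stored in the character string image memory 2 is a constant-pitch or a variable-pitch character string. The subtraction unit 91 subtracts the lower limit from the upper limit of the character block width held in the one-character width class register 14 and transfers the result to the comparison unit 93. The threshold register 92 holds a preset threshold. Like the contents of the threshold register 17, the contents of the threshold register 92 may instead be set from the average character block height obtained by the average height detection unit 7. The comparison unit 93 compares the contents of the threshold register 92 with the output of the subtraction unit 91. If the contents of the threshold register 92 are larger, it outputs "1" on its output signal 931S, indicating a constant-pitch character string. If they are smaller than or equal to the output of the subtraction unit 91, it outputs "0" on 931S, indicating a variable-pitch character string.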
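The comparison made by the determination unit 9 reduces to a single test, shown here as a Python sketch. The function name and the string return values are hypothetical, and the threshold is left as a parameter because the specification gives no value for the contents of the threshold register 92.

```python
def classify_pitch(v_lower, v_upper, threshold):
    """Classify a string from the spread of its one-character width class.

    Mirrors subtraction unit 91 and comparison unit 93: the string is taken
    as constant-pitch only when the threshold is strictly larger than the
    spread; an equal or larger spread means variable-pitch.
    """
    spread = v_upper - v_lower              # subtraction unit 91
    return "constant" if threshold > spread else "variable"

print(classify_pitch(17, 22, threshold=8))   # narrow class
print(classify_pitch(10, 22, threshold=8))   # wide class
```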
is a subtraction unit which subtracts the lower limit value from the upper limit value of the width of a character block, which is the content of the one-character width class register 14, and transfers the output value to the comparison unit 93. Reference numeral 92 is a threshold value register in which a preset threshold value is stored. Note that the contents of the threshold value register 92 may be set based on the average height of the character blocks obtained by the average height detection section 7, like the contents of the threshold value register 17. A comparison section 93 compares the contents of the threshold register 92 and the output value of the subtraction section 91 . If the content of the threshold register 92 is larger than the output value of the subtraction unit 91, "1" is output from the output signal 931S, thereby making it possible to detect an equally pitched character string. If the contents of the threshold register 92 are smaller than or equal to the output value of the subtraction unit 91, the variable pitch character string can be detected by outputting "0" from the output signal 931S.

As described above, applying the present invention makes it possible to identify stably whether a character string is variable-pitch or constant-pitch, even when adjacent characters are prone to touching.

[Brief Description of the Drawings]

FIG. 1 shows, as an example, part of a constant-pitch character string image. FIG. 2 shows an example of a frequency distribution of character block widths, used to illustrate the principle of the invention. FIG. 3 is a logic block diagram of a specific embodiment of the invention. In the figures, 1 is the scanning device, 2 the character string image memory, 3 the character block extraction device, 4 the character block register, 5 the frequency table memory, 6 the control device, 7 the average height detection unit, 10 the multiplication unit, 11 the +1 adder circuit, 12 the one-character range setting register, 13 the most frequent width detection unit, 14 the one-character width class register, 15 and 16 constant registers, 17 the threshold register, 8 the one-character width class detection unit, 81 the one-character initial class register, 82 the counter, 83 the frequency value register, 84 the zero detection unit, 85 the subtraction unit, 86 the comparison unit, 87 the OR circuit, 9 the determination unit, 92 the threshold register, 91 the subtraction unit, and 93 the comparison unit.

Claims

[Claims]

1. A character pitch identification device for a character separation device that scans a character string image written on paper and separates it into individual characters, comprising:
means for extracting a plurality of character blocks, each surrounded by white bit strings, from the binarized character string image and obtaining the height and width of each character block;
a frequency table memory for obtaining frequency information on the widths of the plurality of character blocks;
means for obtaining the average character block height from the heights of the plurality of character blocks;
means for setting, using the average height, the range of widths that a character block consisting of a single character can take, and for obtaining within that range, from the frequency information in the frequency table memory, the upper and lower limits of the most frequent character block width for a predetermined tolerance width;
means for updating the upper limit on the basis of the frequency information in the frequency table memory to obtain the upper and lower limits of a one-character width class; and
means for examining the spread between the upper and lower limits of the one-character width class to identify whether the character string is a variable-pitch or a constant-pitch character string.