JPH03250385A - Character string extracting system - Google Patents

Character string extracting system

Info

Publication number
JPH03250385A
Authority
JP
Japan
Prior art keywords
character string
processing
run length
point
document
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP2048362A
Other languages
Japanese (ja)
Inventor
Takashi Ishikawa
孝 石川
Akihiro Oka
昭宏 岡
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Pentel Co Ltd
Original Assignee
Pentel Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Pentel Co Ltd
Priority to JP2048362A
Publication of JPH03250385A

Landscapes

  • Character Input (AREA)

Abstract

(57) [Abstract] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

Detailed Description of the Invention (Industrial Application Field) The present invention relates to a document recognition system that reads a document with an image scanner and converts it into a coded document so that document information can be input into a computer. In particular, it relates to a document recognition system that takes document image data as input and outputs the character strings of document items.

(Prior Art and Its Problems) Optical document reading devices that combine an image scanner with an image processing device are known as a means of inputting document information into a computer. To extract character string regions from the document information, such devices merge the circumscribed rectangles of connected pixels according to a fixed algorithm. With this method, when ruled lines are present, the pixels forming the ruled lines must first be removed in a preprocessing step, which requires extra computation time.

(Means for Solving the Problems) The present invention was made in view of the above problems. It proposes a character string extraction method in which, during the character string extraction process that extracts character string regions from a document image, the horizontal and vertical run lengths are compared with a reference value so that character string regions are separated from ruled lines and extracted.

(Operation) The character string extraction method of the present invention performs basic rectangle extraction and ruled line removal simultaneously and at high speed.

(Example) The basic idea of the present invention is as follows. OR processing of two pixel rows is performed twice, once from top to bottom and once from bottom to top. During this processing the horizontal and vertical run lengths are checked, and when a run length exceeds a predetermined reference value, OR processing is not performed on the pixels that make up that run. The reference value for the run length corresponds to the maximum character size, so connected pixels exceeding this value are judged not to be characters; that is, they are recognized as ruled lines.
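As a rough illustration of the run-length check described above, the following Python sketch measures the runs of ON pixels along a single row or column and flags those long enough to be treated as ruled lines. The function names, the 0/1 list representation, and the reference value of 10 are assumptions made for illustration. The comparison uses "at or above the reference", following criterion (1) given later in the description.

```python
from itertools import groupby

def run_lengths(line):
    """Return (start, length) for every run of ON (1) pixels in a 1-D line."""
    runs, pos = [], 0
    for value, group in groupby(line):
        n = len(list(group))
        if value:
            runs.append((pos, n))
        pos += n
    return runs

def ruled_line_runs(line, ref):
    """Runs at or above ref (assumed rule: too long to belong to a character)."""
    return [(start, n) for start, n in run_lengths(line) if n >= ref]

# Hypothetical scan line: a 3-pixel stroke followed by a 12-pixel ruled-line segment.
line = [0, 1, 1, 1, 0, 0] + [1] * 12 + [0]
print(run_lengths(line))          # [(1, 3), (6, 12)]
print(ruled_line_runs(line, 10))  # [(6, 12)]
```

Only the 12-pixel run is flagged; the 3-pixel run is short enough to be part of a character.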

The present invention will be described with reference to the accompanying drawings. FIG. 1 shows the general flow, FIG. 2 shows the data at each step, and FIG. 3 shows the basic rectangle extraction processing flow.

In step 1 of the general flow in FIG. 1, the document image is read with an image scanner, and an image such as that shown in FIG. 2(a) is input as data.

In step 2, OR processing in four directions is performed as the basic rectangle extraction. The order of the directions (downward, upward, rightward, leftward) can be set arbitrarily.
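One way to run a single routine in four directions is to reorient the image so that every direction looks like "downward", apply the routine, and then undo the reorientation. The sketch below shows only that wrapper, under stated assumptions: images are lists of rows of 0/1, and `smear_down` is a hypothetical stand-in pass (it copies each ON pixel one step down) and is not the OR processing of the patent.

```python
def flip_v(img):
    """Reverse the row order (top becomes bottom)."""
    return img[::-1]

def transpose(img):
    """Swap rows and columns."""
    return [list(col) for col in zip(*img)]

# For each direction: (transform into "downward" orientation, inverse transform).
ORIENT = {
    "down":  (lambda i: i, lambda i: i),
    "up":    (flip_v, flip_v),
    "right": (transpose, transpose),
    "left":  (lambda i: flip_v(transpose(i)), lambda i: transpose(flip_v(i))),
}

def run_pass(pass_down, img, direction):
    """Apply a downward-only pass in the requested direction."""
    to_down, back = ORIENT[direction]
    return back(pass_down(to_down(img)))

def smear_down(img):
    """Hypothetical stand-in pass: each pixel is ORed with the pixel above it."""
    return [[img[r][c] | (img[r - 1][c] if r else 0) for c in range(len(img[0]))]
            for r in range(len(img))]

img = [[0, 0, 0],
       [0, 1, 0],
       [0, 0, 0]]
for direction in ("down", "up", "right", "left"):  # the order is configurable
    img = run_pass(smear_down, img, direction)
```

Because the order is just the sequence passed to the loop, any permutation of the four directions can be tried without touching the pass itself.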

In this embodiment, the order is downward, upward, rightward, leftward (see FIG. 3). Because each OR process uses the same algorithm and differs only in direction, the downward process is shown as an example in FIG. 4. FIG. 4 shows the original image before OR processing. O represents an ON pixel (a pixel that contains data), and × marks the point of interest being processed. Processing proceeds downward; at the point of interest ×, the pixel is not ON but the preceding pixel in the processing direction (shown as O) is ON, so × becomes a candidate point. Whether the pixel at this candidate point is turned ON is decided by the following two criteria.

(1) The run length (number of consecutive pixels) up to the preceding pixel in the processing direction is less than the reference value.

(2) At least one of the pixels adjacent to the candidate point in the direction orthogonal to the processing direction is ON.

Here, the run-length reference value corresponds to the maximum character size (in pixels) and is set in advance. In the example of FIG. 4, if the reference value is 10, the run length at the point of interest × is 3, which satisfies criterion (1). In addition, the pixel to its left is ON, so criterion (2) is also satisfied. The point of interest × is therefore turned ON (shown as O; see FIG. 5). In the state of FIG. 5, the next point of interest × does not satisfy criterion (2), so it is not turned ON.
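The downward pass with criteria (1) and (2) can be sketched in Python as follows. This is a minimal sketch and not the patent's implementation. It assumes the image is a list of rows of 0/1, that the orthogonal neighbors are read from the row as it was before the pass touched it, and that the run counter includes pixels turned ON earlier in the same pass. The small test image is hypothetical and only mimics the situation described for FIG. 4 and FIG. 5.

```python
def or_pass_down(image, ref):
    """One downward OR pass; returns a new image, leaving the input untouched."""
    rows, cols = len(image), len(image[0])
    out = [row[:] for row in image]
    run = [0] * cols  # per column: vertical run length ending at the previous row
    for r in range(rows):
        before = out[r][:]  # this row before any pixel in it is turned ON
        for c in range(cols):
            if before[c]:
                run[c] += 1
                continue
            is_candidate = run[c] > 0            # pixel above is ON, this one is OFF
            short_run = run[c] < ref             # criterion (1)
            left_on = c > 0 and before[c - 1] == 1
            right_on = c + 1 < cols and before[c + 1] == 1
            if is_candidate and short_run and (left_on or right_on):  # criterion (2)
                out[r][c] = 1
                run[c] += 1
            else:
                run[c] = 0
    return out

# Column 1 holds a 3-pixel run; the pixel below it has an ON neighbor on its left.
image = [[0, 1],
         [0, 1],
         [0, 1],
         [1, 0],
         [0, 0]]
result = or_pass_down(image, ref=10)
```

In `result`, the pixel at row 3, column 1 is turned ON (run length 3, left neighbor ON), while the pixel below it stays OFF because neither orthogonal neighbor is ON.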

Next, the effect of criterion (1) on ruled line removal is explained with reference to FIG. 6. The run length up to the pixel just before the point of interest × is 10, which is equal to or greater than the predetermined reference value (that is, not less than it), so the pixel is not turned ON even if criterion (2) is satisfied. This prevents the ruled line region from being enlarged. Criterion (1) is stated as "less than the reference value" because, when processing in the direction shown in FIG. 7 follows the processing of FIG. 6, the point of interest × would satisfy criterion (1) and be turned ON; the strict comparison provides a margin for this case.
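The boundary behavior of criterion (1) can be checked with a small predicate. The function name and signature are assumptions for illustration; the values 3 and 10 are the run lengths quoted for FIG. 4 and FIG. 6, with the reference value 10 used in the text.

```python
def turns_on(run_length, orthogonal_neighbor_on, ref):
    """True if a candidate point satisfies both criterion (1) and criterion (2)."""
    return bool(run_length < ref and orthogonal_neighbor_on)

REF = 10  # reference value used in the FIG. 4 example
print(turns_on(3, True, REF))    # FIG. 4 case: True
print(turns_on(10, True, REF))   # FIG. 6 case: False, the run is not extended
print(turns_on(3, False, REF))   # criterion (2) fails: False
```

With a strict "less than" comparison, a run that has already reached the reference length is never extended, which is why the ruled line region does not grow.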

The extracted basic rectangles are shown in FIG. 2(b).

In step 3, the basic rectangles are projected in the direction orthogonal to the longitudinal direction of the character string, and each grapheme is formed as the rectangle circumscribing a group of basic rectangles whose projections overlap (see FIG. 2(c)).

In step 4, the graphemes are projected in the longitudinal direction of the character string, and the rectangle circumscribing each group of graphemes whose projections overlap is extracted as a character string region (see FIG. 2(d)).
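Steps 3 and 4 both group rectangles whose projections overlap and replace each group with its circumscribing rectangle, so one helper can sketch both. The sketch assumes horizontal text, rectangles given as `(x0, y0, x1, y1)` with exclusive upper bounds, and a single line of text; the function name and the sample coordinates are hypothetical.

```python
def group_by_projection(rects, axis):
    """Merge rectangles whose projections onto an axis overlap.

    axis=0 projects onto the x axis, axis=1 onto the y axis.
    Returns the bounding rectangle of each group.
    """
    lo, hi = axis, axis + 2
    groups = []
    for rect in sorted(rects, key=lambda r: r[lo]):
        if groups and rect[lo] < groups[-1][hi]:
            g = groups[-1]  # projection overlaps the current group: grow its box
            groups[-1] = (min(g[0], rect[0]), min(g[1], rect[1]),
                          max(g[2], rect[2]), max(g[3], rect[3]))
        else:
            groups.append(rect)
    return groups

# Hypothetical basic rectangles: two strokes of one character, then a second character.
basic = [(0, 0, 4, 2), (1, 3, 5, 6), (8, 0, 12, 6)]
graphemes = group_by_projection(basic, axis=0)  # step 3: overlap along x
strings = group_by_projection(graphemes, axis=1)  # step 4: overlap along y
print(graphemes)  # [(0, 0, 5, 6), (8, 0, 12, 6)]
print(strings)    # [(0, 0, 12, 6)]
```

For vertical text the two axes would be swapped. A page with several text lines would need step 3 applied per region, which this sketch does not attempt.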

(Effects of the Invention) With the configuration described above, the basic rectangle extraction consists only of simple OR processing and comparison with the run-length reference value, so it can be processed at high speed. Ruled lines are removed at the same time, so the ruled line removal preprocessing required by the conventional method is unnecessary and the processing becomes more efficient.

[Brief Description of the Drawings]

The drawings show one embodiment of the present invention. FIG. 1 shows the general flow of the invention, FIG. 2 shows the data at each step, FIG. 3 shows the basic rectangle extraction processing flow, and FIGS. 4 to 7 are diagrams explaining the processing.

Claims (1)

[Claims] A character string extraction method characterized in that, in a character string extraction process for extracting character string regions from a document image, the character string regions are separated from ruled lines and extracted by comparing the horizontal and vertical run lengths with a reference value.
JP2048362A 1990-02-28 1990-02-28 Character string extracting system Pending JPH03250385A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2048362A JPH03250385A (en) 1990-02-28 1990-02-28 Character string extracting system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP2048362A JPH03250385A (en) 1990-02-28 1990-02-28 Character string extracting system

Publications (1)

Publication Number Publication Date
JPH03250385A (en) 1991-11-08

Family

ID=12801237

Family Applications (1)

Application Number Title Priority Date Filing Date
JP2048362A Pending JPH03250385A (en) 1990-02-28 1990-02-28 Character string extracting system

Country Status (1)

Country Link
JP (1) JPH03250385A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6005976A (en) * 1993-02-25 1999-12-21 Fujitsu Limited Image extraction system for extracting patterns such as characters, graphics and symbols from image having frame formed by straight line portions

Similar Documents

Publication Publication Date Title
JP2940936B2 (en) Tablespace identification method
KR0167616B1 (en) Image processing apparatus and method
JPH05342408A (en) Document image filing device
US20020085755A1 (en) Method for region analysis of document image
JPH03250385A (en) Character string extracting system
JP2890306B2 (en) Table space separation apparatus and table space separation method
JP2000090194A (en) Image processing method and image processor
JPS61193277A (en) Document reader
JPH10105647A (en) Device and method for reading container number
JPH05159062A (en) Document recognition device
JPH03142691A (en) Table format document recognizing system
JPH09134404A (en) Bar graph recognizing device
JP2794042B2 (en) Recognition device for tabular documents
JPS63304387A (en) document reading device
JPS615383A (en) Character pattern separating device
JP3197441B2 (en) Character recognition device
JP4040231B2 (en) Character extraction method and apparatus, and storage medium
JP3140079B2 (en) Ruled line recognition method and table processing method
JPS6254380A (en) character recognition device
JP3163698B2 (en) Character recognition method
JP2509992B2 (en) Separation character integration method
JPH0475186A (en) Character reader
JPH0281189A (en) Character recognition method
JPH0463435B2 (en)
JPH05284335A (en) Picture information reduction method