JPH03218590A

JPH03218590A - Optimal binarization method

Info

Publication number: JPH03218590A
Application number: JP1169034A
Authority: JP
Inventors: Goro Bessho; 吾朗別所; Michiyoshi Tachikawa; 道義立川; Hajime Sato; 元佐藤
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 1988-07-29
Filing date: 1989-06-30
Publication date: 1991-09-26

Abstract

PURPOSE: To handle badly printed originals by normalizing the density of the input image: the number of black pixels at each threshold is expressed as a ratio to the black pixel count obtained when every pixel other than those at the lightest level is treated as black, and the optimal threshold is determined from these ratios. CONSTITUTION: The pixels of the multi-level quantized image are counted for each density level, and the counts are accumulated successively while the threshold is moved from the dark level toward the light level, which gives the number of black pixels at each threshold. When the density of the input image is normalized by expressing the number of black pixels at each threshold as a ratio to the count obtained with all pixels other than those at the lightest level treated as black, the result expresses the degree of blotting (or fading) of the characters at that threshold relative to the most blotted characters. By determining the optimal threshold with reference to the normalized ratio of characters in the best state, badly printed originals can be handled.

Description

DETAILED DESCRIPTION OF THE INVENTION

[Field of Industrial Application]

The present invention relates to an optimal binarization method for pattern recognition devices such as character recognition devices.

[Conventional technology]

In general, the images processed by pattern recognition devices such as character recognition devices are obtained by binarizing values, such as the output of a scanner's CCD image sensor, into black and white using a threshold (threshold level). To allow optimal binarization even for poorly printed originals, an optimal binarization threshold must be generated for each original according to its density.

Various binarization methods have been proposed. For example, the mode method, the differential histogram method, and the p-tile method are described in Hideyuki Tamura, "Introduction to Computer Image Processing", Soken Shuppan, 1985, p. 67. The mode method computes the histogram of the density values of a given image and, when the distribution has two peaks, places the threshold at the valley between them. The differential histogram method assumes that the boundary between the object and the background lies where the density value changes abruptly, so it determines the threshold from the derivative (the rate of change of density) rather than from the density values themselves. The p-tile method uses the area of the entire image as its reference.

There is also the method of determining a threshold from the density distribution described in Nobuyuki Otsu, "A Threshold Determination Method from the Density Distribution", 1977 National Convention of the Information Division, Institute of Electronics and Communication Engineers of Japan, 145.

This method uses only the zeroth-order and first-order moments of the density distribution and determines the optimal threshold based on integration.

Furthermore, there is the "optimal binarization method" disclosed in Japanese Examined Patent Publication No. 60-37952. In this method, a multi-level video signal is stored in a video buffer, and the video signal read from the buffer is binarized by a slice circuit with a variable slice level. The multi-level video information is sliced at different slice levels to produce a plurality of binarized video signals, a line width amplification factor defined as (number of black points)/(number of perimeter points) is obtained for each of them, and the slice level of the slice circuit is set based on these line width amplification factors and a reference line width amplification factor.

[Problem to be solved by the invention]

However, the mode method cannot be applied to poorly printed originals, because their histograms do not show a clear valley. The differential histogram method does not work well when the density values near the boundary between the object and the background change in a complicated way. The p-tile method takes the area of the entire image as its reference, so it may fail to give an optimal threshold depending on the number of characters in the original and on the size, complexity, and stroke count of each character.

Moreover, the method of determining a threshold from the density distribution is not effective for handling the blotting and fading of "lines", which are the images dealt with in pattern recognition such as character recognition.

Furthermore, experiments have shown that, with the optimal binarization method of the above publication, the determination of a proper threshold is unstable depending on the density of the original.

In any case, the density of an actual original image often varies locally (for example, when the print quality is uneven within a single original, or when the density values of the image change because of shading in the input device). With the conventional methods it is difficult to generate an optimal binary image that accommodates such local density changes, for example density unevenness in the original.

[Means to solve the problem]

In a binarization method for converting a multi-level quantized image into a black-and-white binary image, the invention of claim (1) counts the number of black pixels obtained as the threshold is changed from the dark level to the light level, normalizes the density of the input image by expressing the number of black pixels at each threshold as a ratio to the count obtained when all pixels other than those at the lightest level are treated as black, and determines the optimal threshold from the ratio of black pixels at each threshold.

In the invention of claim (2), the optimal threshold is determined using the level at which the darkest pixels begin to appear, instead of the ratio of black pixels at each threshold.

In the invention of claim (3), the optimal threshold is determined by judging the density of the original from the rate of change of the ratio of black pixels at each threshold.

In the invention of claim (4), the multi-level image for a fixed small region is held in memory. Within that small region, the number of black pixels is counted as the threshold is changed from the dark level to the light level, the density of the input image is normalized by expressing the number of black pixels at each threshold as a ratio to the count obtained when all pixels other than those at the lightest level are treated as black, and the optimal threshold for the small region is determined from the ratio of black pixels at each threshold. The small region is then moved so that optimal thresholds are obtained over the entire image, and a binary image is output in which each small region is binarized with its own optimal threshold.

In the invention of claim (5), the optimal threshold is determined using the level at which the darkest pixels begin to appear, instead of the ratio of black pixels at each threshold used in the invention of claim (4).

In the invention of claim (6), instead of the ratio of black pixels at each threshold used in the invention of claim (4), the optimal threshold is determined by judging the density of the original from the rate of change of the ratio of black pixels at each threshold.

In the invention of claim (7), a portion of the image is divided into a plurality of small regions, and each small region is judged as to whether it is a character region. When the number of small regions judged to be character regions exceeds a fixed number, the optimal threshold for binarizing the entire image is determined by the method of claim (1), (2), or (3), applied to the merged region formed by the small regions judged to be character regions within that image portion.

In the invention of claim (8), in the method of claim (1), (2), (3), or (7), the determined optimal threshold is set in the scanner, and the binary image is input directly from the scanner.

In the invention of claim (9), in the method of any of claims (1) to (8), a reference level is determined based on the ratio between the number of black pixels counted with one level as the threshold and the number of black pixels counted with the next level as the threshold. The number of black pixels obtained with this reference level as the threshold is used as the basis for normalizing the density of the input image, in place of the count obtained when all pixels other than those at the lightest level are treated as black.

[Operation]

According to the invention of claim (1), the pixels of the multi-level quantized image are counted for each density level, and the counts are accumulated while the threshold is changed from the dark side to the light side, which gives the number of black pixels at each threshold. The number of black pixels obtained when all pixels other than those at the lightest level are treated as black varies with the number of characters in the original and similar factors. If the density of the input image is normalized by expressing the number of black pixels at each threshold as a ratio to that count, the result expresses the degree of blotting (or fading) of the characters at that threshold relative to the most blotted characters. After this normalization, the optimal threshold is determined with reference to the normalized ratio of characters in the best state, so poorly printed originals can also be handled.

Also, by focusing on the level at which the normalized ratio takes only a very small value, that is, the level at which the image is at its faintest and about to disappear, the optimal threshold can also be found, as in the invention of claim (2), by considering the level at which the darkest pixels of the original begin to appear.

Furthermore, examining the relationship between the normalized ratio and the density level shows linearity over a certain range of levels. Accordingly, as in the invention of claim (3), a still more suitable threshold can be found by judging the density of the original from the rate of change of the ratio of black pixels at each threshold, rather than from the ratio itself.

According to the invention of claim (4), (5), or (6), the processing described above is not performed on the entire original image at once. Instead, the image is divided into fixed small regions, and an optimal threshold is determined and binarization is performed for each small region, so an optimal binary image that accommodates local density changes is obtained.
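The region-by-region processing can be sketched as follows. This is only an illustration under stated assumptions: the function names, the 70% target, the block size, and the `fallback` threshold for regions that contain only level 0 pixels are hypothetical and do not come from the patent text. Level 0 is taken as the lightest level, and a pixel is treated as black when its level is at or above the threshold.

```python
def threshold_for_region(pixels, target_pct=70.0, levels=16):
    """Level whose normalized cumulative value is closest to target_pct."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    # Accumulate from the darkest level down to level 1 (level 0 excluded).
    cum = [0] * levels
    running = 0
    for level in range(levels - 1, 0, -1):
        running += hist[level]
        cum[level] = running
    if cum[1] == 0:
        return None  # region holds only level 0 (background) pixels
    return min(range(1, levels),
               key=lambda j: abs(100.0 * cum[j] / cum[1] - target_pct))


def binarize_by_blocks(image, block=2, target_pct=70.0, fallback=8):
    """Binarize each block x block region with its own threshold.

    `fallback` is a hypothetical threshold used for all-background regions.
    """
    height, width = len(image), len(image[0])
    out = [[0] * width for _ in range(height)]
    for top in range(0, height, block):
        for left in range(0, width, block):
            rows = range(top, min(top + block, height))
            cols = range(left, min(left + block, width))
            region = [image[y][x] for y in rows for x in cols]
            t = threshold_for_region(region, target_pct)
            if t is None:
                t = fallback
            for y in rows:
                for x in cols:
                    out[y][x] = 1 if image[y][x] >= t else 0
    return out


# Left block is printed dark, right block is printed light.
image = [[15, 12, 6, 4],
         [0, 10, 0, 5]]
result = binarize_by_blocks(image)
```

In this toy image, the light right-hand block still yields black pixels because it receives its own, lower threshold; a single threshold suited to the left block would turn the right block entirely white.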

According to the invention of claim (7), processing can be sped up by reducing the amount of processing needed for threshold determination. In addition, because the small regions judged to be character regions are merged and the threshold is determined for the merged region, region-to-region variation in the determined threshold is reduced and a good binary image is obtained.

According to the invention of claim (8), the image binarized with the optimal threshold is input directly from the scanner. The amount of data processing is therefore smaller than when a binary image is obtained by processing multi-level image data, and a binary image can be obtained quickly even with a relatively slow processing system.

According to the invention of claim (9), the density of the input image can be normalized using, as the reference level, the lightest density level of the document image excluding the background. Images of originals with background noise can therefore also be binarized with an optimal threshold, and a good recognition rate is obtained.

[Embodiments]

Embodiment 1

An embodiment of the invention described in claim (1) will be described with reference to FIGS. 1 and 2.

FIG. 2 is a block diagram of a configuration for carrying out this embodiment.

It covers the processing from the multi-level image reading unit 1 to the binary image output unit 2. In outline, the multi-level image reading unit 1 first reads a multi-level image (here, a 16-level quantized image) from the scanner 3 and stores it in the multi-level image memory 4. Next, the density histogram count unit 5 reads the 16-gradation multi-level image (density levels 0 to 15) from the multi-level image memory 4 and counts the number of pixels at each density level. Based on the density histogram obtained in this way, the threshold calculation unit 6 obtains a characteristic value indicating the density of the image and calculates the optimal threshold, and the binarization unit 7 binarizes the multi-level image using that optimal threshold. The binarized image is stored in the binary image memory 8 and is also sent through the binary image output unit 2 to the character recognition unit 9 or the like for processing such as character recognition.
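The step performed by the binarization unit 7 can be sketched as below. It is a minimal illustration, assuming that level 0 is the lightest level and that a pixel is black (1) when its level is at or above the threshold; the function name `binarize` is hypothetical.

```python
def binarize(image, threshold):
    """Map each multi-level pixel to 1 (black) or 0 (white)."""
    return [[1 if level >= threshold else 0 for level in row] for row in image]


# A 2 x 3 image with levels in the range 0 (lightest) to 15 (darkest).
binary = binarize([[0, 4, 5], [15, 3, 9]], threshold=5)
```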

The distinctive feature here lies in the processing by the threshold calculation unit 6. A cumulative histogram memory 10, a normalized histogram memory 11, an optimal percentage constant value 12, and a threshold table 13 are each connected to the threshold calculation unit 6.

With this configuration, the processing of this embodiment will be described with reference to the flowchart of FIG. 1, using a 16-level quantized image as an example. First, the number of pixels at each density level of the 16-level quantized image input from the scanner 3 is counted. In FIG. 1, "con" denotes the density level, and "Qv" denotes the resulting density histogram (pixel counts).
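The counting step can be sketched as follows. The name `density_histogram` and the list-of-rows image layout are assumptions made for illustration; the list `qv` plays the role of "Qv" in FIG. 1, indexed by the density level "con".

```python
def density_histogram(image, levels=16):
    """Count the pixels at each density level (0 = lightest)."""
    qv = [0] * levels
    for row in image:
        for level in row:
            if not 0 <= level < levels:
                raise ValueError(f"level {level} is outside 0..{levels - 1}")
            qv[level] += 1
    return qv


qv = density_histogram([[0, 0, 3], [15, 3, 0]])
```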

In the density distribution of an image such as characters written on a white background, the pixels where nothing is written, that is, the pixels sensed as lightest (called level 0 pixels here), are generally by far the most numerous. What is sensed as pixels other than level 0 is the characters and their surroundings (or noise).

With the conventional approach of specifying an image region to be read and judging the threshold while treating all pixels in that region alike, the number of level 0 pixels depends on the form of the document, that is, on how much white background it contains, and the threshold changes accordingly. Even originals written at the same density would then receive different thresholds depending on the amount of white background.

In this embodiment, by contrast, the level 0 pixels are excluded when the threshold is calculated, and processing proceeds as follows. The threshold calculation unit 6 takes the density histogram as input data and, for pixels of level 1 and above, accumulates the pixel counts starting from the darkest pixels. In FIG. 1, "sQv" denotes this cumulative value. The cumulative value is calculated for each level and stored in the cumulative histogram memory 10. The cumulative value at a level corresponds to the number of black pixels obtained when that level is used as the threshold. This accumulation is carried out from the darkest level down to the light level, level 1. The cumulative value at level 1 is the number of black pixels belonging to the characters and their surroundings (or noise), as opposed to level 0 described above. However, this level 1 cumulative value also varies with the number of characters in the original and with the size, complexity, and stroke count of each character.
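The accumulation from the dark side can be sketched as below. The function name is hypothetical; `qv` corresponds to the histogram "Qv" and the returned list to "sQv". Entry 0 is left at zero because level 0 is excluded from the calculation.

```python
def cumulative_from_dark(qv):
    """sqv[j] = number of pixels at level j or darker, for j >= 1."""
    sqv = [0] * len(qv)
    running = 0
    # Walk from the darkest level down to level 1; level 0 is skipped.
    for level in range(len(qv) - 1, 0, -1):
        running += qv[level]
        sqv[level] = running
    return sqv


# A small 5-level histogram keeps the example readable.
sqv = cumulative_from_dark([50, 4, 3, 2, 1])
```

Because `sqv[j]` counts every pixel at level j or darker, it equals the number of black pixels produced when level j is used as the threshold, and changing the number of level 0 pixels (50 here) leaves it unchanged.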

Next, therefore, the cumulative value at each level is normalized using the level 1 cumulative value as the reference. Specifically, the level 1 cumulative value is taken as 100, and the cumulative value at each level is expressed as a percentage of the level 1 value.

In FIG. 1, "rQv" denotes the percentage of the cumulative value, and the percentage rQv[j] at a level j is calculated from sQv[j]/sQv[1].
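The normalization can be sketched as follows, with `sqv` standing for "sQv" and the result for "rQv". The function name and the error raised for an image without any pixel above level 0 are assumptions for illustration.

```python
def normalize_cumulative(sqv):
    """Express each cumulative value as a percentage of the level 1 value."""
    base = sqv[1]
    if base == 0:
        raise ValueError("no pixels above level 0; nothing to normalize")
    return [100.0 * s / base for s in sqv]


rqv = normalize_cumulative([0, 10, 6, 3, 1])
```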

This percentage is called the "normalized cumulative value". The normalized cumulative value at each level then expresses the degree of blotting (or fading) of the characters at that level relative to the most blotted characters. The optimal threshold is therefore determined with reference to the normalized cumulative value of characters in the best state.

A first specific method for this determination will now be described.

The normalized cumulative value at which the degree of blotting (or fading) is best is obtained in advance: several originals are actually put through recognition to find their optimal thresholds, and the value is calculated back from those thresholds. The method then compares the normalized cumulative value at each level with this predetermined optimal normalized cumulative value (the optimal percentage constant value 12 in FIG. 1), finds the level at which the difference is smallest, and takes that level as the optimal threshold.

If this optimal normalized cumulative value is taken to be 70%, then in the example of Table 1 the value at density level 5 is closest to 70%, so the optimal threshold is 5.
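A minimal sketch of this selection is given below. Only the values for levels 5, 6, 11, and 12 (72.86, 63.30, 5.58, and 1.51) are quoted in the text; every other entry of `RQV` is an invented placeholder so that the example runs, and the function name is hypothetical.

```python
# Normalized cumulative values (%) by density level.
# Levels 5, 6, 11 and 12 are quoted in the text; the rest are placeholders.
RQV = {
    1: 100.0, 2: 95.0, 3: 88.0, 4: 80.5,
    5: 72.86, 6: 63.30,
    7: 50.0, 8: 38.0, 9: 25.0, 10: 13.0,
    11: 5.58, 12: 1.51,
    13: 0.4, 14: 0.1, 15: 0.0,
}


def closest_level(rqv, target_pct):
    """Level whose normalized cumulative value is nearest to target_pct."""
    return min(rqv, key=lambda level: abs(rqv[level] - target_pct))


threshold = closest_level(RQV, 70.0)
```

With the 70% target, level 5 is selected because 72.86 differs from 70 by 2.86, while the value at level 6 differs by 6.70.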

Table 1: Example of normalized cumulative values

A second specific method for the above determination will now be described.

Some scanners 3 can output both multi-level images and binary images. When an image output as a multi-level image is itself binarized, the first specific method is sufficient. However, when the binary image is read again after the binarization threshold has been determined, the first specific method may fail to give an optimal threshold.

In the second specific method, therefore, the level whose normalized cumulative value is closest to the optimal normalized cumulative value is found first. Then, between this level and the next closest level, the density level at which the normalized cumulative value comes closest to the optimal value is approximated by calculation to one decimal place. In the example of Table 1 above, the level closest to 70% is level 5 at 72.86%, and the next closest is level 6 at 63.30%. The difference divided by 10, (72.86 - 63.30)/10 = 0.956, is used in the following calculation.

63.30 + 0.956 = 64.256 ... level 5.9
63.30 + 0.956 × 2 = 65.212 ... level 5.8
63.30 + 0.956 × 7 = 69.992 ... level 5.3
63.30 + 0.956 × 8 = 70.948 ... level 5.2
63.30 + 0.956 × 9 = 71.904 ... level 5.1

From these levels approximated to one decimal place, the level closest to the optimal normalized cumulative value is selected. In this example, the level closest to the optimal normalized cumulative value of 70% is level 5.3.
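The tenth-step calculation can be sketched as follows. This assumes, as in the worked example, that the next closest level is the next darker integer level, so that the interpolation runs between `level` and `level + 1`; the function name and argument layout are hypothetical.

```python
def refine_to_tenths(level, pct, next_level_pct, target_pct):
    """Interpolate linearly in tenths between level and level + 1.

    pct is the normalized cumulative value at `level`, next_level_pct the
    value at `level + 1`. Returns (fractional level, interpolated value)
    for the candidate closest to target_pct.
    """
    step = (pct - next_level_pct) / 10
    candidates = [
        (round(level + 1 - k / 10, 1), next_level_pct + step * k)
        for k in range(0, 11)
    ]
    return min(candidates, key=lambda c: abs(c[1] - target_pct))


# Level 5 at 72.86% and level 6 at 63.30%, with a 70% target.
frac_level, value = refine_to_tenths(5, 72.86, 63.30, 70.0)
```

For these inputs, the candidate at 69.992% (level 5.3) is nearer to 70% than the one at 70.948% (level 5.2), which matches the listing above.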

The optimal threshold is then determined by referring to a table that gives the optimal threshold for this level (the threshold table 13 in FIG. 2).

The flowchart of FIG. 1 shows this second specific method.

According to this embodiment, the optimal binarization threshold can be set automatically even for a poorly printed original, such as one printed by a wire dot printer, and the best recognition rate is obtained.

Embodiment 2

An embodiment of the invention described in claim (2) will be described with reference to FIGS. 3 and 4. Parts identical to those described with reference to FIGS. 1 and 2 are given the same reference numerals, and their description is omitted. In this embodiment, the threshold calculation unit 6A is provided with a darkest level percentage 14 in place of the optimal percentage constant value 12.

With this configuration, as in Embodiment 1, the histogram is accumulated from the darkest level down to level 1 and the cumulative values are normalized, so that the level 1 cumulative value is taken as 100 and the cumulative value at each level is expressed as a percentage of the level 1 value. Here too, the normalized cumulative value at each level expresses the degree of blotting (or fading) of the characters at that level relative to the most blotted characters.

In this embodiment, the threshold is determined by focusing on the level at which the normalized cumulative value takes only a very small value, that is, the level at which the image is at its faintest and about to disappear. This is the level at which the darkest pixels of the original begin to appear.

The processing is explained with a first specific example for this embodiment. Suppose that this reference normalized cumulative value (the darkest level percentage 14 in FIG. 3) is set to 5%. The level at which the difference between the normalized cumulative value and this 5% is smallest is then found.

In the example of Table 1 above, this is density level 11, whose normalized cumulative value is 5.58. If the relationship between this density level and the optimal threshold of an original is examined in advance, the optimal threshold can be determined from the density level found in this way.

A second specific method of determination for this embodiment will now be described (FIG. 4 shows this method). Because the span of each level differs from one scanner 3 to another, greater accuracy is sought as follows: after the level whose normalized cumulative value is closest to 5% has been found, the difference from the normalized cumulative value of the next closest level is divided by 10, and the density level is approximated by calculation to one decimal place. In the example of Table 1 above, the closest level is level 11 at 5.58%, and the next closest is level 12 at 1.51%. Using the difference divided by 10, (5.58 - 1.51)/10 = 0.407, the levels between level 11 and level 12 are calculated as follows.

1.51 + 0.407 = 1.917 ... level 11.9
1.51 + 0.407 × 2 = 2.324 ... level 11.8
1.51 + 0.407 × 8 = 4.766 ... level 11.2
1.51 + 0.407 × 9 = 5.173 ... level 11.1

From these levels approximated to one decimal place, the level closest to the normalized cumulative value of 5% is found. In this example, the level closest to 5% is level 11.1.

The optimal threshold is then determined by referring to a table that gives the optimal threshold for this level (the threshold table 13).

With this embodiment as well, the optimal binarization threshold can be set automatically even for a poorly printed original, such as one printed by a wire dot printer, and the best recognition rate is obtained.

実施例３　特許請求の範囲の請求項（３）記載の発明の一実施例を
第５図ないし第８図により説明する。第１図及び第２図
で説明した部分と同様の部分は同一符号を用い、説明も
省略する。本実施例では、閾値計算部６Ｂにおいて最
適百分率一定値１１に代えて、線形レベル１５付きの回
帰直線メモリ１６が備えられている。Embodiment 3: An embodiment of the invention recited in claim (3) will be described with reference to FIGS. 5 to 8. Parts similar to those described with FIGS. 1 and 2 are denoted by the same reference numerals, and their description is omitted. In this embodiment, the threshold calculation unit 6B is provided with a regression line memory 16 with a linear level 15 in place of the optimum percentage constant value 11.

このような構成において、前述した実施例の場合と同様
に、濃いレベルからレベル１までのヒストグラムの累積
処理、ヒストグラム累積値の正規化処理を経て、レベ
ル１の累積値を１００として、各レベルでの累積値をレ
ベル１に対する百分率で表す。この場合、各レベルでの
百分率の値というのは、最もつぶれた文字に対するその
レベルでの文字のつぶれ（或いはかすれ）具合を表すこ
とになる。In this configuration, as in the embodiments described above, the histogram is accumulated from the dark level down to level 1 and the cumulative values are normalized: the cumulative value of level 1 is taken as 100, and the cumulative value at each level is expressed as a percentage of level 1. The percentage at each level then represents how smeared (or faint) the characters are at that level relative to the most smeared characters.

ここに、この百分率と濃度レベルとの関係を調べると，
あるレベルにおいて線形性が見られるところがある。こ
の部分のみを取り出して最小２乗法により回帰直線を求
めると、第７図に示すように原稿の濃度によって回帰直
線の傾き、即ち変化率が異なるものである。即ち、淡い
原稿ほど傾きが急になり、原稿濃度が濃くなるに従って
傾きが緩くなることが判る。このことを利用すれば、原
稿の濃度の特性を表すことができる。When the relationship between this percentage and the density level is examined, linearity is observed over a certain range of levels. If only this range is extracted and a regression line is fitted by the least squares method, the slope of the regression line, that is, the rate of change, differs with the density of the original, as shown in FIG. 7. Specifically, the lighter the original, the steeper the slope, and the slope becomes gentler as the original becomes darker. This fact can be used to characterize the density of the original.
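The least squares fit mentioned above can be illustrated with a short Python sketch. It is only an illustration: the function name `regression_line` and the sample numbers are hypothetical, and which levels form the linear range is scanner-specific and must be chosen by the caller.

```python
def regression_line(levels, percentages):
    """Least squares fit of percentage = slope * level + intercept.

    levels and percentages are equal-length sequences covering only the
    range of density levels over which the relationship looks linear
    (the caller decides that range; it is not derived here).
    """
    n = len(levels)
    if n < 2 or n != len(percentages):
        raise ValueError("need at least two (level, percentage) pairs")
    mean_x = sum(levels) / n
    mean_y = sum(percentages) / n
    # Sums of squares and cross products around the means.
    sxx = sum((x - mean_x) ** 2 for x in levels)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(levels, percentages))
    if sxx == 0:
        raise ValueError("all levels are identical; slope is undefined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept
```

For the made-up points (4, 80), (5, 60), (6, 40), (7, 20), the fitted slope is -20 percentage points per level. A larger magnitude would correspond to a lighter original in the sense described above.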

そこで，本実施例では、第８図に示すような原稿の最適
閾値と回帰直線の傾きとの関係を予め調べて（線形性が
見られるレベルは各スキャナの特性により異なる），そ
の関係を登録したテーブルを閾値テーブル１３Ｂとして
備える。そして、求めた正規化累積値と濃度レベルとの
間にみられる線形性を利用し，回帰直線を計算する（線
形レベル１５付き回帰直線メモリ１６はこの計算のため
に設けたものである）。計算した回帰直線の傾きを用い
て閾値テーブル１３Ｂを参照し、最適閾値を得る。Therefore, in this embodiment, the relationship between the optimal threshold of an original and the slope of the regression line, as shown in FIG. 8, is examined in advance (the levels over which linearity is observed depend on the characteristics of each scanner), and a table in which this relationship is registered is provided as the threshold table 13B. A regression line is then calculated using the linearity observed between the obtained normalized cumulative values and the density levels (the regression line memory 16 with linear level 15 is provided for this calculation). The threshold table 13B is consulted with the slope of the calculated regression line to obtain the optimal threshold.

実施例４　特許請求の範囲の請求項（４）記載の発明の一実施例を
第９図から第１１図により説明する。第９図に本実施例
を実施するブロック構成図を示す。Embodiment 4: An embodiment of the invention recited in claim (4) will be described with reference to FIGS. 9 to 11. FIG. 9 shows a block diagram of a configuration for carrying out this embodiment.

本実施例は、多値画像読込み部２１から２値画像出力部
２２までの処理に関するものである。概略的には、まず
，多値画像読込み部２１にてスキャナ２３から多値量子
化された画像を所定の固定ライン数分読込み、多値イメ
ージメモリ２４に保有する。このように読み込まれた固
定ライン数分の多値画像を、小領域分割部２５により固
定の小領域に分割する。分割された小領域の画像に対し
、濃度ヒストグラム計数部２６で各々の濃度レベルの
画素数を計数する。そして、累積ヒストグラム計算部２
７において濃い画素の方から画素数を累積していき、正
規化累積値計算部２８において入力画像の密度を正規化
した割合を求め、閾値計算部２９において、求められた
正規化累積値と基準正規化累積値３６から最適閾値を計
算し、２値化部３０によって多値画像を２値化し，２値
イメージメモリ３１に保有する。その後、処理を隣の小
領域に移して同様の２値化を行う。読み込んだ多値画像
に対して全ての領域の２値化が終わった後、次の固定ラ
イン数分を読み込み、同様の処理を行う。これを原稿
全体に対して行い、最適な２値画像を生成する。そして
、この２値画像を文字認識部３２などに送出し文字認識
等の処理に供する。This embodiment concerns the processing from the multi-value image reading unit 21 to the binary image output unit 22. In outline, the multi-value image reading unit 21 first reads a predetermined fixed number of lines of the multi-level quantized image from the scanner 23 and stores them in the multi-value image memory 24. The multi-value image for this fixed number of lines is divided into fixed small regions by the small region dividing unit 25. For each small region, the density histogram counting unit 26 counts the number of pixels at each density level. The cumulative histogram calculation unit 27 then accumulates the pixel counts starting from the darkest pixels, the normalized cumulative value calculation unit 28 computes ratios that normalize the density of the input image, the threshold calculation unit 29 calculates the optimal threshold from the obtained normalized cumulative values and the reference normalized cumulative value 36, and the binarization unit 30 binarizes the multi-value image and stores the result in the binary image memory 31. Processing then moves to the adjacent small region, which is binarized in the same way. After all regions of the loaded multi-value image have been binarized, the next fixed number of lines is read and processed in the same way. This is repeated over the entire original to generate an optimal binary image. The binary image is then sent to the character recognition unit 32 or the like for processing such as character recognition.

ここに、前記濃度ヒストグラム計数部２６と累積ヒスト
グラム計算部２７には濃度ヒストグラムメモリ３３が接
続され、累積ヒストグラム計算部２７と正規化累積値計
算部２８には累積ヒストグラムメモリ３４が接続され、
正規化累積値計算部２８と閾値計算部２９には正規化累
積値メモリ３５が接続され，さらに、閾値計算部２９に
は基準正規化累積値メモリ３６が接続されている。Here, a density histogram memory 33 is connected to the density histogram counting unit 26 and the cumulative histogram calculation unit 27, a cumulative histogram memory 34 is connected to the cumulative histogram calculation unit 27 and the normalized cumulative value calculation unit 28, and a normalized cumulative value memory 35 is connected to the normalized cumulative value calculation unit 28 and the threshold calculation unit 29. In addition, a reference normalized cumulative value memory 36 is connected to the threshold calculation unit 29.

このような構成において、本実施例による処理を，第１
０図のフローチャートを参照し、１６値の量子化画像を
扱う例で説明する。まず，スキャナ２３から多値画像を
固定ライン数分だけ読み込み、多値イメージメモリ２４
に保有する。多値イメージメモリ２４を小領域分割部２
５により、固定の小領域に分割する。分割された小領域
に対して濃度ヒストグラム計数部２６において、１６値
各々の濃度レベル（第１０図のｃｏｎ）の画素数（第１
０図のＱｖ）を計数し、濃度ヒストグラムメモリ３３に
保有する。次に、この濃度ヒストグラムメモリ３３から
各レベルの画素数を読み込み累積ヒストグラム計算部２
７において濃いレベルの画素から画素数を累積していき
、各レベル毎に累積値（第１０図のｓＱｖ）を求めて，
累積ヒストグラムメモリ３４に保有する。この各レベル
毎の累積値は、そのレベル（第１０図のｉ）を閾値とし
た時の黒画素数に相当することになる。このような画素
の累積処理は、濃いレベルからレベル１なる淡レベルま
で行う。レベル１の累積数は、レベル０に対する文字及
びその周辺部（或いはノイズ）の黒画素であるが、これ
は原稿中の文字数或いは１文字についての文字の大きさ
、複雑さ、画数などによって変化することになる。この
ため、次に各レベルでの累積値を、レベル１の累積値を
基準として正規化する。即ち、累積ヒストグラムメモリ
３４から累積値を読み込み、正規化累積値計算部２８にお
いて、レベル１までの累積値に対する各レベルでの累積
値の割合を求める。レベル１までというのは，文書画像
の濃度分布としてはレベル０の画素が白地部分に相当し
圧倒的に多いため，これを除いて考えるためである。具
体的には，レベル１の累積値を１００として、各レベル
での累積値をレベル１に対する百分率で表す。第１０図
の“ｒＱｖ”は正規化累積値の百
分率を示し、あるレベルｊにおける百分率“ｒＱｖ［ｊ］
”はｓＱｖ［ｊ］／ｓＱｖ［１］に基づき算
出される。In this configuration, the processing of this embodiment is explained with reference to the flowchart of FIG. 10, taking a 16-level quantized image as an example. First, a fixed number of lines of the multi-value image is read from the scanner 23 and stored in the multi-value image memory 24. The small region dividing unit 25 divides the contents of the multi-value image memory 24 into fixed small regions. For each small region, the density histogram counting unit 26 counts the number of pixels (Qv in FIG. 10) at each of the 16 density levels (con in FIG. 10) and stores the counts in the density histogram memory 33. Next, the cumulative histogram calculation unit 27 reads the pixel count of each level from the density histogram memory 33, accumulates the counts starting from the darkest level, obtains a cumulative value (sQv in FIG. 10) for each level, and stores it in the cumulative histogram memory 34. The cumulative value for a level corresponds to the number of black pixels obtained when that level (i in FIG. 10) is used as the threshold. The accumulation runs from the dark levels down to level 1, a light level. The cumulative count at level 1 is the number of black pixels, relative to level 0, belonging to characters and their surroundings (or noise), and it varies with the number of characters in the original and with the size, complexity, and stroke count of each character. For this reason, the cumulative value at each level is next normalized with respect to the cumulative value at level 1. That is, the normalized cumulative value calculation unit 28 reads the cumulative values from the cumulative histogram memory 34 and computes the ratio of the cumulative value at each level to the cumulative value down to level 1. Accumulation stops at level 1 because, in the density distribution of a document image, level 0 pixels correspond to the white background and are overwhelmingly numerous, so they are excluded. Specifically, the cumulative value at level 1 is taken as 100 and the cumulative value at each level is expressed as a percentage of level 1. In FIG. 10, "rQv" denotes this percentage, and the percentage "rQv[j]" at a level j is calculated from sQv[j]/sQv[1].

この百分率を「正規化累積値」と称することとする。す
ると、各レベルでの正規化累積値というのは、最もつぶ
れた文字に対するそのレベルでの文字のつぶれ（或いは
かすれ）具合を表すことになる。This percentage is referred to as the "normalized cumulative value." The normalized cumulative value at each level then represents how smeared (or faint) the characters are at that level relative to the most smeared characters.
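The chain from density histogram (Qv) to cumulative histogram (sQv) to normalized cumulative value (rQv) can be sketched in Python as below. The function names and the list representation are assumptions made for illustration; level 0 is taken to be the lightest level, as in the description.

```python
def cumulative_from_dark(hist):
    """sQv: for each level i, the number of pixels at level i or darker.

    hist[i] is the pixel count Qv at density level i, with level 0 the
    lightest (white background) and the last index the darkest.
    """
    cumulative = [0] * len(hist)
    running = 0
    for level in range(len(hist) - 1, -1, -1):  # darkest level first
        running += hist[level]
        cumulative[level] = running
    return cumulative


def normalized_cumulative(cumulative, base_level=1):
    """rQv: each cumulative value as a percentage of the value at base_level.

    The entry for level 0 is computed too, but the description excludes
    the background level, so callers should ignore index 0.
    """
    base = cumulative[base_level]
    if base == 0:
        raise ValueError("no pixels at or above the base level")
    return [100.0 * value / base for value in cumulative]
```

For a hypothetical five-level histogram `[1000, 10, 20, 30, 40]`, the cumulative values are `[1100, 100, 90, 70, 40]`, so levels 1 to 4 normalize to 100, 90, 70 and 40 percent.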

次に閾値計算部２９において，正規化累積値メモリ３５
から正規化累積値を読み込み、予め求められている最も
よい状態になっている文字での正規化累積値（基準正規
化累積値メモリ３６の内容で、前記実施例１における最
適百分率一定値１２に相当する）に最も近い値をとるレ
ベルを最適閾値と決定する。Next, the threshold calculation unit 29 reads the normalized cumulative values from the normalized cumulative value memory 35 and determines as the optimal threshold the level whose value is closest to the normalized cumulative value, obtained in advance, of characters in the best condition (the contents of the reference normalized cumulative value memory 36, corresponding to the optimum percentage constant value 12 in Embodiment 1).
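Selecting the level closest to the reference normalized cumulative value amounts to a nearest-value search. The following Python sketch shows one way to write it; the function name, the tie-breaking rule (the lighter level wins) and the sample numbers are assumptions, since the description does not specify them.

```python
def closest_level(normalized, reference, first_level=1):
    """Return the density level whose normalized cumulative value (percent)
    is closest to the reference value.

    normalized[i] is the normalized cumulative value at level i. Levels
    below first_level (the white background) are skipped. On a tie, min()
    keeps the first candidate, i.e. the lighter level.
    """
    candidates = range(first_level, len(normalized))
    return min(candidates, key=lambda level: abs(normalized[level] - reference))
```

With the hypothetical values `[1100.0, 100.0, 90.0, 70.0, 40.0]` and a reference of 65%, level 3 (70%) is selected.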

次に，２値化部３０において、上記の如く得られた最適
閾値によって当該小領域の多値画像を２値化し、２値イ
メージメモリ３１に保有する。隣の小領域についても、
同様の処理を行う。そして、読み込まれた多値画像が全
て２値化されたら、次の固定ライン数分の多値画
像を多値イメージメモリ２４に読み込む。この多値画
像に対しても同様の処理を行う。これを原稿全体に対し
て行う。Next, the binarization unit 30 binarizes the multi-value image of the small region with the optimal threshold obtained as described above and stores the result in the binary image memory 31. The adjacent small region is processed in the same way. Once the entire loaded multi-value image has been binarized, the multi-value image for the next fixed number of lines is read into the multi-value image memory 24 and processed in the same way. This is repeated over the entire original.

本実施例によれば、ワイヤドットプリンタによる印字原
稿のように印字状態の良くない原稿の場合であっても，
最適なる２値化の閾値を自動的に設定することができ、
最も良い認識率が得られることになる。特に，このよう
な処理が、原稿画像全体について一括して行うのではな
く、固定の小領域に分割して各小領域毎に最適閾値を求
めて２値化処理するので、局所的な濃度変化に対応した
最適な２値画像が得られることになる。According to this embodiment, the optimal binarization threshold can be set automatically even for a poorly printed original, such as one printed by a wire dot printer, and the best recognition rate is obtained. In particular, because this processing is not performed on the entire original image at once but instead divides it into fixed small regions and determines an optimal threshold for each region before binarizing it, an optimal binary image that follows local density variations is obtained.

以上説明した本実施例の局所的最適２値化処理の概念図
を第１１図に示す。入力画像は図示のように多数の固定
小領域に縦横に分割され，各小領域ごとに最適閾値によ
り２値化が行われることになる。FIG. 11 shows a conceptual diagram of the local optimal binarization processing of this embodiment described above. As shown in the figure, the input image is divided vertically and horizontally into a large number of fixed small areas, and binarization is performed using an optimal threshold value for each small area.
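The per-region binarization shown conceptually in FIG. 11 can be sketched in Python as follows. This is a simplified illustration: the function name, the nested-list image format and the `pick_threshold` callback are assumptions, and the band-by-band reading of a fixed number of lines is omitted. A pixel is treated as black when its level is at or above the threshold, consistent with the cumulative counts described earlier.

```python
def binarize_by_tiles(image, tile_height, tile_width, pick_threshold):
    """Binarize a multi-level image tile by tile.

    image is a list of rows of integer density levels (0 = lightest).
    pick_threshold is a caller-supplied function that receives one tile
    (a list of rows) and returns the threshold level for that tile.
    Returns a same-sized image with 1 for black and 0 for white.
    """
    height = len(image)
    width = len(image[0]) if image else 0
    output = [[0] * width for _ in range(height)]
    for top in range(0, height, tile_height):
        for left in range(0, width, tile_width):
            # Cut out one small region; edge tiles may be smaller.
            tile = [row[left:left + tile_width]
                    for row in image[top:top + tile_height]]
            threshold = pick_threshold(tile)
            for dy, row in enumerate(tile):
                for dx, pixel in enumerate(row):
                    output[top + dy][left + dx] = 1 if pixel >= threshold else 0
    return output
```

In practice `pick_threshold` would be whichever threshold determination the embodiment uses. Any stand-in rule shows the effect: a faint stroke in one tile can survive even when a neighbouring tile contains much darker strokes.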

実施例５　特許請求の範囲の請求項（５）記載の発明の一実施例を
第１２図及び第１３図により説明する。第９図及び第１
０図で説明した部分と同一部分で同一符号を用い、説明
も省略する。本実施例では、閾値計算部２９Ａに基準正
規化累積値メモリ３６に代えて、最濃レベル計算部３７
と閾値テーブル３８とを接続して設けたものである。Embodiment 5: An embodiment of the invention recited in claim (5) will be described with reference to FIGS. 12 and 13. Parts identical to those described with FIGS. 9 and 10 are denoted by the same reference numerals, and their description is omitted. In this embodiment, a darkest level calculation unit 37 and a threshold table 38 are connected to the threshold calculation unit 29A in place of the reference normalized cumulative value memory 36.

このような構成において、本実施例にあっても、入力画
像の小領域別に、各レベル毎の正規化累積値を求めるま
での処理は、前記実施例４の場合と同様であるが、基準
正規化累積値を参考にして最適閾値を決定する代わりに
、前記実施例２と同様に正規化累積値がほんの小さな値
をとるレベル、即ち、画像としては最もかすれていて消
える寸前のレベル（最濃画素出現濃度レベル）に注目し
て、最適閾値を決定しようとするものである。これは、
その原稿にとって最も濃い画素が現れ始めるレベルとい
うことになる。In this embodiment as well, the processing up to obtaining the normalized cumulative value of each level for each small region of the input image is the same as in Embodiment 4. However, instead of determining the optimal threshold by referring to a reference normalized cumulative value, the optimal threshold is determined, as in Embodiment 2, by focusing on the level at which the normalized cumulative value takes a very small value, that is, the level at which the image is faintest and about to disappear (the darkest-pixel appearance density level). This is the level at which the darkest pixels of the original begin to appear.

この閾値決定処理の内容を具体例にて説明する。The contents of this threshold value determination process will be explained using a specific example.

いま、仮に、このレベルでの正規化累積値を５％に設定
したとする。各レベルでの正規化累積値と、この正規化
累積値５％との差が最も小さくなるようなレベルを最濃
レベル計算部３７で求める。ここに，スキャナ２３によ
っては、各レベルにおけるスパンが異なるものであるの
で、より精度を追及するためには、正規化累積値が５％
に最も近いレベルを求めた後，その次に近いレベルの正
規化累積値との差を１０で割った値から濃度レベルを計
算によって小数第１位まで近似して求める。前述の第１
表の例によれば、最も近いレベルが５．５８％のレベル
１１であり、その次に近いレベルが１．５１％のレベル
１２であるので、その差を１０で割算した値（５．５８
−１．５１）／１０＝０．４０７を用いて、レベル１１
からレベル１２までを，下記のような計算によって求め
る。Suppose now that the normalized cumulative value at this level is set to 5%. The darkest level calculation unit 37 finds the level for which the difference between the normalized cumulative value at that level and this 5% value is smallest. Because the span of each level may differ depending on the scanner 23, higher accuracy is obtained by first finding the level whose normalized cumulative value is closest to 5%, then dividing the difference from the normalized cumulative value of the next closest level by 10 and using this step to approximate the density level to the first decimal place. In the example of Table 1 above, the closest level is level 11 with 5.58% and the next closest is level 12 with 1.51%, so the step (5.58 - 1.51)/10 = 0.407 is used to compute the values between level 11 and level 12 as follows.

１．５１＋０．４０７　　　＝１．９１７　　　レベル１１．９
１．５１＋０．４０７×２＝２．３２４　　　レベル１１．８
　　　…
１．５１＋０．４０７×８＝４．７６６　　　レベル１１．２
１．５１＋０．４０７×９＝５．１７３　　　レベル１１．１
そして、この小数第１位まで近似したレベルにおいて、正規化累積値５％に最も
近くなるレベルを求める。この例では、最適な正規化累
積値５％に最も近いレベルを求めると、レベル１１．１
となる。このような処理を最濃レベル計算部３７により
行う。そして、閾値計算部２９Ａにおいて、このレベル
から最適閾値を求めるテーブル（閾値テーブル３８）
を参照して、当該小領域の最適閾値を決定する。この閾
値テーブル３８は予め実験により第２表に示すように
求められたものである。
1.51 + 0.407     = 1.917   Level 11.9
1.51 + 0.407 x 2 = 2.324   Level 11.8
   ...
1.51 + 0.407 x 8 = 4.766   Level 11.2
1.51 + 0.407 x 9 = 5.173   Level 11.1
Then, among the levels approximated to the first decimal place, the level whose value is closest to the normalized cumulative value of 5% is selected. In this example, that level is 11.1. This processing is performed by the darkest level calculation unit 37. The threshold calculation unit 29A then refers to a table (threshold table 38) that gives the optimal threshold for this level and determines the optimal threshold for the small region. The threshold table 38 is obtained in advance by experiment, as shown in Table 2.

この例では最適な閾値（スキャナ読取りレベル）は７と
なる。In this example, the optimal threshold (scanner reading level) is 7.

第２表　閾値テーブル
本実施例によっても、ワイヤドットプリンタによる印
字原稿のように印字状態の良くない原稿の場合であって
も、最適なる２値化の閾値を自動的に設定することがで
き、最も良い認識率が得られる。Table 2: Threshold table. According to this embodiment as well, the optimal binarization threshold can be set automatically even for a poorly printed original, such as one printed by a wire dot printer, and the best recognition rate is obtained.

実施例６　特許請求の範囲の請求項（６）記載の発明の一実施例を
第１４図及び第１５図により説明する。本実施例におい
ては、閾値計算部２９Ｂに基準正規化累積値メモリ３６に
代えて、回帰直線計算部３９と閾値テーブル４０を設け
たことが前記実施例４との構成上の違いである。前記実
施例との処理内容の違いは、各小領域毎に求めた正規化
累積値より最適な閾値を決定する処理だけである。Embodiment 6: An embodiment of the invention recited in claim (6) will be described with reference to FIGS. 14 and 15. This embodiment differs in configuration from Embodiment 4 in that the threshold calculation unit 29B is provided with a regression line calculation unit 39 and a threshold table 40 in place of the reference normalized cumulative value memory 36. The only difference in processing is the step that determines the optimal threshold from the normalized cumulative values obtained for each small region.

この閾値決定処理の内容は基本的には前記実施例３
と同じであるが、回帰直線の傾きの計算を回帰直線計算
部３９で実行し、求められた回帰直線の傾きを用いて閾
値計算部２９Ｂが閾値テーブル４０を参照し、当該小領
域の最適な閾値を決定する。This threshold determination is basically the same as in Embodiment 3, except that the regression line calculation unit 39 calculates the slope of the regression line, and the threshold calculation unit 29B consults the threshold table 40 with the calculated slope to determine the optimal threshold for the small region.

なお、閾値テーブル４０は前記実施例３における閾値テ
ーブル１３Ｂに相当するもので、その一例を第３表に示
す。Note that the threshold table 40 corresponds to the threshold table 13B in the third embodiment, and an example thereof is shown in Table 3.

第３表　閾値テーブル
実施例７　特許請求の範囲の請求項（７）及び（８）記載の発明の
一実施例を第１６図ないし第１８図により説明する。Table 3: Threshold table. Embodiment 7: An embodiment of the invention recited in claims (7) and (8) will be described with reference to FIGS. 16 to 18.

まず第１８図により、処理の概略を説明する。First, the outline of the process will be explained with reference to FIG.

本実施例においては、多値画像を所定の固定ライン数分
読み込み、この部分画像（第１８図のａ，ｂ，ｃなど）
を小領域に分割する。そして、各小領域が文字領域であ
るか否かの判定を行い、文字領域と判定された小領域の
個数が一定値未満であるか、それを越えるか調べる。第
１８図に示すａのような部分画像では、文字領域の個数
が一定値未満であるので、多値画像の次の固定ライン数
分を読み込み、同様の処理を行う。第１８図のｂのよう
に、読み込んだ部分画像における文字領域の個数が一定
値を越えた場合、その文字領域としての小領域を統合し
た領域について前記実施例４と同様の方法によって最適
閾値を決定する。この決定された最適閾値は、画像全体
に対するものである。In this embodiment, a predetermined fixed number of lines of the multi-value image is read, and this partial image (a, b, c, and so on in FIG. 18) is divided into small regions. Each small region is then judged to be a character region or not, and the number of small regions judged to be character regions is checked against a fixed value. For a partial image such as a in FIG. 18, the number of character regions is below the fixed value, so the next fixed number of lines of the multi-value image is read and processed in the same way. When the number of character regions in the loaded partial image exceeds the fixed value, as for b in FIG. 18, the optimal threshold is determined by the same method as in Embodiment 4 for the region formed by merging those character regions. The optimal threshold determined in this way applies to the entire image.

このようにして、ある部分画像の読込み時点で最適閾値
が決定されると、閾値決定の処理は終了し、決定した閾
値を画像入力用のスキャナに読取りレベル（２値化スラ
イスレベル）として設定し、スキャナに再度画像の読取
りを行わせることにより、最適閾値により２値化画像を
スキャナより直接的に入力する。Once the optimal threshold has been determined while reading a given partial image, the threshold determination ends. The determined threshold is set in the image input scanner as its reading level (binarization slice level), and the scanner reads the image again, so that an image binarized with the optimal threshold is input directly from the scanner.

次に本実施例について詳細に説明する。第１６図は本実
施例の構成を示すブロック図であるが、前記実施例４の
ブロック図である第９図における符号と同一の符号は同
様部分であるので、その説明を省略する。前記実施例４
との構成の違いは、文字領域であるか否かを判定するた
めの文字領域判定部４１と、２値画像をスキャナ２３よ
り読み込むための２値画像読込み部４２とが追加されて
いることと、累積ヒストグラム計算部２７Ｃの作用が
部分的に変更されていることである。Next, this embodiment is described in detail. FIG. 16 is a block diagram showing the configuration of this embodiment; parts with the same reference numerals as in FIG. 9, the block diagram of Embodiment 4, are the same parts, and their description is omitted. The configuration differs from Embodiment 4 in that a character region determination unit 41, which judges whether a region is a character region, and a binary image reading unit 42, which reads a binary image from the scanner 23, are added, and in that the operation of the cumulative histogram calculation unit 27C is partially changed.

以下、処理内容を説明する。処理のフローチャートは第
１７図に示す。The details of the processing will be explained below. A flowchart of the process is shown in FIG.

前記実施例４におけると同様に，多値画像情報読込み部
２１にてスキャナ２３より多値画像を固定ライン数分だ
け読み込み、多値イメージメモリ２４に保有し、小領域
分割部２５により多値イメージメモリ２４を固定の小領
域に分割し、濃度ヒストグラム計数部２６にて一つの小
領域の濃度ヒストグラムＱｖ［ｃｏｎ］を作成して濃度
ヒストグラムメモリ３３に保有する。As in Embodiment 4, the multi-value image reading unit 21 reads a fixed number of lines of the multi-value image from the scanner 23 and stores them in the multi-value image memory 24, the small region dividing unit 25 divides the contents of the multi-value image memory 24 into fixed small regions, and the density histogram counting unit 26 creates the density histogram Qv[con] of one small region and stores it in the density histogram memory 33.

一つの小領域の濃度ヒストグラムが得られるたびに，文
字領域判定部４１にて、当該小領域のレベル０の画素数
Ｑｖ［０］を濃度ヒストグラムメモリ３３より読込み判
定閾値ＣＨＲと比較し、Ｑｖ［０］≧ＣＨＲのときは当
該小領域を文字領域と判定し，文字領域数ｃｈａｒ　
ｎｕｍをインクリメントする。レベル０の画素というの
は白地部分の画素に相当し、文字領域には白地の画素が
多数存在するが、写真領域には白地が殆ど存在しないこ
とから、このような文字領域判定が可能である。文字領
域と判定した小領域に対してのみ、累積ヒストグラ
ム計算部２７Ｃにおいて、前記実施例４と同様の累積ヒ
ストグラムｓＱｖ［ｉ］を作成し累積ヒストグラムメモ
リ３４に格納する。Each time the density histogram of one small region is obtained, the character region determination unit 41 reads the number of level 0 pixels Qv[0] of that region from the density histogram memory 33 and compares it with the determination threshold CHR. When Qv[0] ≥ CHR, the region is judged to be a character region and the character region count char num is incremented. Level 0 pixels correspond to the white background; character regions contain many white background pixels whereas photographic regions contain almost none, which makes this determination possible. Only for small regions judged to be character regions, the cumulative histogram calculation unit 27C creates the same cumulative histogram sQv[i] as in Embodiment 4 and stores it in the cumulative histogram memory 34.

同様の処理を、多値イメージメモリ２４内の固定ライン
数分の部分画像の全小領域について繰り返す。最後の小
領域まで処理が終わると，文字領域判定部４１において
、文字領域数ｃｈａｒ　ｎｕｍ＞判定閾値ＣＨＲ　ＴＨ
の判定を行う。判定結果がＮｏの場合、次の固
定ライン数分の多値画像の読み込みが行われ、この部分
画像に対する同様の処理が実行される。The same processing is repeated for all small regions of the partial image of the fixed number of lines held in the multi-value image memory 24. When the last small region has been processed, the character region determination unit 41 tests whether the character region count char num exceeds the determination threshold CHR TH. If the result is No, the multi-value image for the next fixed number of lines is read and the same processing is performed on that partial image.

この判定の結果がＹＥＳの場合、累積ヒストグラム計算
部２７Ｃにおいて、文字領域と判定された全ての小領域
の累積ヒストグラムを読み込み、同じレベルでの累積値
を累積することにより、文字領域としての全小領域の統
合領域に対する各レベルでの累積ヒストグラムを作成し
、累積ヒストグラムメモリ３４に改めて格納する。この
各レベルでの累積値は、文字領域の統合領域内における
、そのレベルを閾値としたときの黒画素数を意味する。If the result of this determination is YES, the cumulative histogram calculation unit 27C reads the cumulative histograms of all the small areas determined to be character areas and accumulates the cumulative values at the same level. A cumulative histogram at each level for the integrated area of the area is created and stored anew in the cumulative histogram memory 34. The cumulative value at each level means the number of black pixels in the integrated area of the character area when that level is taken as a threshold.
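The character region test and the merging of cumulative histograms described above can be sketched in Python as follows. The function and parameter names are hypothetical stand-ins for the quantities in the description (the white-pixel threshold CHR and the region-count threshold), and the strict "exceeds" comparison for the region count follows the wording of this embodiment.

```python
def merge_text_region_histograms(region_hists, min_white_pixels, min_region_count):
    """Merge the cumulative histograms of regions judged to be text.

    region_hists: one density histogram per small region, where index 0
    is the white background level. A region counts as text when its
    level 0 pixel count is at least min_white_pixels. Returns the
    level-wise sum of the text regions' cumulative histograms, or None
    when the number of text regions does not exceed min_region_count
    (the caller would then read the next band of lines).
    """
    text_cumulatives = []
    for hist in region_hists:
        if hist[0] >= min_white_pixels:
            # Cumulative histogram, accumulated from the darkest level.
            cumulative = [0] * len(hist)
            running = 0
            for level in range(len(hist) - 1, -1, -1):
                running += hist[level]
                cumulative[level] = running
            text_cumulatives.append(cumulative)
    if len(text_cumulatives) <= min_region_count:
        return None
    # Add the cumulative values of all text regions level by level.
    return [sum(column) for column in zip(*text_cumulatives)]
```

The merged list can then be normalized and searched for the optimal threshold exactly as for a single region.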

このように累積ヒストグラムの統合（第１７図のｓＱ
ｖ統合）の処理が終わると、得られた統合累積ヒストグ
ラムに対して前記実施例４におけると同様の正規化処理
が正規化累積値計算部２８により行われ、得られた正規
化累積値が正規化累積値メモリ３５に格納される。そし
て、閾値計算部２９により前記実施例４におけると同様
の方法で最適閾値が決定される。When the merging of the cumulative histograms (sQv merging in FIG. 17) is complete, the normalized cumulative value calculation unit 28 applies the same normalization as in Embodiment 4 to the merged cumulative histogram, and the resulting normalized cumulative values are stored in the normalized cumulative value memory 35. The threshold calculation unit 29 then determines the optimal threshold by the same method as in Embodiment 4.

最適閾値が決定されると、２値画像読込み部４２は、こ
の最適閾値を読取りレベルとして設定してスキャナ２３
に原稿の読み取りを再度行わせ、最適閾値による２
値画像をスキャナ２３より直接読み込み、２値イメージ
メモリ３１に格納する。When the optimal threshold has been determined, the binary image reading unit 42 sets it as the reading level, has the scanner 23 read the original again, reads the image binarized with the optimal threshold directly from the scanner 23, and stores it in the binary image memory 31.

なお、固定ライン数分毎のスキャナ２３の読み取り回数
が計数値を越えても、最適閾値を決定できない場合、固
定の読取りレベルにてスキャナ２３により２値画像を読
み込むことになる。Note that if the optimal threshold still cannot be determined after the number of reads by the scanner 23, each covering the fixed number of lines, exceeds the count value, the binary image is read by the scanner 23 at a fixed reading level.

本実施例によれば、前記実施例４等に比べて多値画像の
処理部分が少なくなるので、処理の高速化を図ることが
できる。また、文字領域としての小領域を統合した領域
に基づいて最適閾値を決定するため、領域ごとの閾値決
定のバラッキが少なくなり良質の２値画像を得られる。According to this embodiment, the number of processing parts for multivalued images is reduced compared to the fourth embodiment, etc., so that processing speed can be increased. Furthermore, since the optimal threshold value is determined based on an area that is a combination of small areas as character areas, there is less variation in threshold value determination for each area, and a high-quality binary image can be obtained.

実施例８　特許請求の範囲の請求項（７）及び（８）記載の発明の
他の実施例を第１９図及び第２０図により説明する。Embodiment 8: Another embodiment of the invention recited in claims (7) and (8) will be described with reference to FIGS. 19 and 20.

本実施例の構成は、第１９図と第１６図とを対比すれば
明らかなように前記実施例７と殆ど同じであり，違いは
閾値決定に関連した部分が前記実施例５と同じ構成に変
更されていることである。As is clear from a comparison of FIG. 19 with FIG. 16, the configuration of this embodiment is almost the same as that of Embodiment 7; the difference is that the part related to threshold determination has been changed to the same configuration as in Embodiment 5.

処理内容については，第２０図のフローチャートから明
らかなように、閾値決定の処理を除いては前記実施例７
と同一であり、閾値決定の処理は前記実施例５における
処理内容と同じである。As is clear from the flowchart of FIG. 20, the processing is the same as in Embodiment 7 except for the threshold determination, which is the same as in Embodiment 5.

実施例９　特許請求の範囲の請求項（７）及び（８）記載の発明の
他の実施例を第２１図及び第２２図により説明する。Embodiment 9: Another embodiment of the invention recited in claims (7) and (8) will be described with reference to FIGS. 21 and 22.

本実施例の構成は、第２１図と第１６図とを対比すれば
明らかなように前記実施例７と殆ど同じであり、違いは
閾値決定に関連した部分が前記実施例６と同じ構成に変
更されていることである。As is clear from a comparison of FIG. 21 with FIG. 16, the configuration of this embodiment is almost the same as that of Embodiment 7; the difference is that the part related to threshold determination has been changed to the same configuration as in Embodiment 6.

処理内容については、第２２図のフローチャートから明
らかなように、閾値決定の処理を除いては前記実施例７
と同一であり、閾値決定の処理は前記実施例６における
処理内容と同じである。As is clear from the flowchart of FIG. 22, the processing is the same as in Embodiment 7 except for the threshold determination, which is the same as in Embodiment 6.

実施例１０　特許請求の範囲の請求項（９）記載の一実施例を第２３
図及び第２４図により説明する。Embodiment 10: An embodiment recited in claim (9) will be described with reference to FIGS. 23 and 24.

本実施例の構成は、第２３図と第９図とを対比すれば明
らかなように前記実施例４と殆ど同じであり、違いは領
域分割を行わないので小領域分割部２５がなく、逆に、
累積値の正規化のための基準レベルを適応的に決定する
ための基準レベル計算部４３が追加されていることで
ある。As is clear from a comparison of FIG. 23 with FIG. 9, the configuration of this embodiment is almost the same as that of Embodiment 4. The differences are that there is no small region dividing unit 25, because no region division is performed, and that a reference level calculation unit 43 is added to adaptively determine the reference level used for normalizing the cumulative values.

処理内容については、第２４図のフローチャートから
明らかなように、領域分割を行わずにヒストグラム
生成等が行われること、累積値の正規化のための基準
レベルの決定の部分が追加されたこと、及びここで決
定した基準レベルを用いて累積値の正規化処理が行われ
ることである。As is clear from the flowchart of FIG. 24, the processing differs in that histogram generation and the subsequent steps are performed without region division, that a step determining the reference level for normalizing the cumulative values is added, and that the cumulative values are normalized using the reference level determined in that step.

まず、基準レベル決定の処理について説明する。First, the process of determining the reference level will be explained.

基準レベルとは、文書画像における地肌を除いた最も
淡い濃度レベルと定義される。これは、文書画像におい
ては、レベルごとの画素数の中で、地肌に相当する画素
数が圧倒的に多いことから、画素数が極端に変わるとこ
ろを地肌と判断することが可能である。The reference level is defined as the lightest density level in the document image excluding the background. In a document image, the pixels corresponding to the background overwhelmingly outnumber those at the other levels, so the point at which the pixel count changes sharply can be judged to be the background.

そこで、基準レベル計算部４３においては，各レベルｊ
における累積ヒストグラムの値（累積値）ｓＱｖ［ｊ］と
、その上のレベル（ｊ＋１）の累積値ｓＱｖ［ｊ＋１］
を累積ヒストグラムメモリ３４より、低いレベルから順
次読み込み、ｓＱｖ［ｊ＋１］／ｓＱｖ［ｊ］
≧ｐｔｈの判定処理を行う。そして，最初にこの判定条
件を満足したレベルｊを基準レベルとし、正規化累積値
計算部２８に設定する。The reference level calculation unit 43 therefore reads, in order from the lowest level, the cumulative histogram value (cumulative value) sQv[j] at each level j and the cumulative value sQv[j+1] at the level above it (j+1) from the cumulative histogram memory 34, and tests whether sQv[j+1]/sQv[j] ≥ pth. The first level j that satisfies this condition is taken as the reference level and set in the normalized cumulative value calculation unit 28.

この判定の閾値ｐｔｈは通常０．７５位に設定される．
Ｐｔｈ＝０．７５とし、第４表のような累積ヒストグラ
ムの場合，ｊ＝２において初めに判定条件を満たすので
、基準レベルはレベル２となる。The threshold pth for this test is normally set to about 0.75. With pth = 0.75 and a cumulative histogram such as that in Table 4, the condition is first satisfied at j = 2, so the reference level is level 2.
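The reference level test sQv[j+1]/sQv[j] ≥ pth can be sketched in Python as below. The function name and the sample cumulative histogram are hypothetical; the numbers are not the contents of Table 4, which is not reproduced in this text, and they were chosen only so that the condition is first met at j = 2.

```python
def reference_level(cumulative, pth=0.75):
    """Return the first level j, scanning from the lightest level, for which
    cumulative[j + 1] / cumulative[j] >= pth, or None if no level qualifies.

    cumulative[j] is the cumulative histogram value at level j (pixels at
    level j or darker). A small ratio means the pixel count drops sharply
    between the two levels, which is taken as background or background noise.
    """
    for j in range(len(cumulative) - 1):
        if cumulative[j] > 0 and cumulative[j + 1] / cumulative[j] >= pth:
            return j
    return None
```

For the made-up values `[10000, 2000, 1000, 900, 700]`, the ratios are 0.2, 0.5 and 0.9, so the function returns level 2. The cumulative value at that level would then serve as the 100% base for normalization.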

正規化累積値計算部２８は、第２４図のフローチャート
からも明らかなように、基準レベル計算部４３により設
定された基準レベルでの累積値を基準として、各レベル
での累積値の正規化を行う。As is also clear from the flowchart of FIG. 24, the normalized cumulative value calculation unit 28 normalizes the cumulative value at each level with respect to the cumulative value at the reference level set by the reference level calculation unit 43.

したがって、第４表に示した累積ヒストグラムの場合
、基準レベル＝２とすると、第４表に示すように正規化
累積値が得られる。Therefore, for the cumulative histogram shown in Table 4, taking the reference level as 2 gives the normalized cumulative values shown in Table 4.

このように本実施例においては、正規化の基準レベルを
固定的にレベル１とするのではなく、原稿画像の地肌濃
度レベルに応じ適応的に決定するので，コピー原稿のよ
うに地肌ノイズがある原稿に対しても最適な閾値による
２値画像を得ることができ、高い認識率を達成できる。As described above, in this embodiment the reference level for normalization is not fixed at level 1 but is determined adaptively according to the background density level of the original image. A binary image with an optimal threshold can therefore be obtained even for an original with background noise, such as a photocopied original, and a high recognition rate can be achieved.

第４表　正規化累積値の例
実施例１１　特許請求の範囲の請求項（９）記載の発明の他の実施例
を第２５図及び第２６図により説明する。Table 4: Example of normalized cumulative values. Embodiment 11: Another embodiment of the invention recited in claim (9) will be described with reference to FIGS. 25 and 26.

As a comparison of FIG. 25 with FIG. 23 makes clear, the configuration of this embodiment is almost the same as that of Embodiment 10. The difference is that the threshold determination section has been changed to the same configuration as in Embodiment 5.

As is clear from the flowchart of FIG. 26, the processing differs from that of Embodiment 10 only in that the threshold is determined by the same method as in Embodiment 5.

Embodiment 12

A further embodiment of the invention set forth in claim (9) will be described with reference to FIGS. 27 and 28.

As a comparison of FIG. 27 with FIG. 23 makes clear, the configuration of this embodiment is almost the same as that of Embodiment 10. The difference is that the threshold determination section has been changed to the same configuration as in Embodiment 6.

As is clear from the flowchart of FIG. 28, the processing differs from that of Embodiment 10 only in that the threshold is determined by the same method as in Embodiment 6.

Configurations that combine the above embodiments are also possible. For example, in Embodiment 1, 2, 3, 10, 11 or 12, the binary image may be input directly from the scanner as in Embodiment 7, 8 or 9. Likewise, in Embodiments 1 to 9, the reference level for normalization may be determined in the same manner as in Embodiment 10, 11 or 12.

[Effects of the Invention]

As explained above, according to the invention of claim (1) or (4), the number of black pixels is counted as the threshold is changed from the dark level to the light level; the density of the input image is normalized by expressing the number of black pixels at each threshold as a ratio, taking as the reference the count obtained when all pixels other than those at the lightest level are treated as black pixels; and the optimal threshold is determined from the ratio of the number of black pixels at each threshold. According to the invention of claim (2) or (5), the optimal threshold is determined using the level at which the darkest pixels begin to appear, instead of the ratio of the number of black pixels at each threshold. According to the invention of claim (3) or (6), the document density is judged from the rate of change of the ratio of the number of black pixels at each threshold, and the optimal threshold is determined accordingly.
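The counting and normalization summarized for claims (1) and (4) can be sketched as below. This is an illustrative reading under stated assumptions, not the patented implementation: the function name `black_pixel_ratios` is hypothetical, level 0 is assumed to be the lightest level with higher numbers darker, and a pixel is assumed to count as black at threshold t when its level is t or greater. How the optimal threshold is then chosen from the ratios is not shown here.

```python
def black_pixel_ratios(pixels, num_levels):
    """Return, for each threshold t, the black-pixel count at t divided by
    the count obtained when every pixel above the lightest level is black."""
    # Histogram: number of pixels at each density level.
    hist = [0] * num_levels
    for p in pixels:
        hist[p] += 1

    # Accumulate from the darkest level toward the lightest, so that
    # cum[t] is the number of pixels whose level is t or greater.
    cum = [0] * num_levels
    running = 0
    for t in range(num_levels - 1, -1, -1):
        running += hist[t]
        cum[t] = running

    # Reference: all pixels except those at the lightest level (level 0).
    base = cum[1]
    if base == 0:
        raise ValueError("image contains only lightest-level pixels")
    return [c / base for c in cum]


# Hypothetical 8-pixel image quantized to 4 levels.
print(black_pixel_ratios([0, 0, 0, 0, 1, 2, 3, 3], num_levels=4))
# -> [2.0, 1.0, 0.75, 0.5]
```

In the example, the ratio at threshold 1 is 1.0 by construction, and the ratios at darker thresholds show what fraction of the non-background pixels survives at each threshold.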

Therefore, the optimal binarization threshold can be set automatically even for a poorly printed original, such as one printed by a wire dot printer, and the best recognition rate can be obtained.

According to the invention of claim (4), (5) or (6), the original image is divided into small regions, an optimal threshold is determined for each small region, and each region is binarized separately. An optimal binary image that follows local density changes can therefore be obtained, for example when the print quality is uneven within a single original or when the image density varies because of shading in the input device, and the best recognition rate is obtained.
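The region-by-region processing can be sketched as follows. The function name `binarize_by_tiles`, the tile size, and the `choose_threshold` callback are assumptions for illustration; in particular, the midpoint rule used in the example is only a placeholder and is not the threshold selection method of the claims. Higher levels are assumed to be darker, and 1 denotes a black pixel.

```python
def binarize_by_tiles(image, tile, choose_threshold):
    """Binarize a 2-D list of density levels tile by tile, using a
    threshold chosen separately for each tile."""
    height = len(image)
    width = len(image[0]) if height else 0
    out = [[0] * width for _ in range(height)]
    for top in range(0, height, tile):
        for left in range(0, width, tile):
            rows = range(top, min(top + tile, height))
            cols = range(left, min(left + tile, width))
            block = [image[y][x] for y in rows for x in cols]
            t = choose_threshold(block)  # per-tile threshold
            for y in rows:
                for x in cols:
                    out[y][x] = 1 if image[y][x] >= t else 0
    return out


def midpoint(block):
    """Placeholder rule: halfway between the lightest and darkest level."""
    return (min(block) + max(block) + 1) // 2


# Hypothetical image whose right half is uniformly darker, as with shading.
image = [[0, 3, 4, 7],
         [0, 3, 4, 7]]
print(binarize_by_tiles(image, tile=2, choose_threshold=midpoint))
# -> [[0, 1, 0, 1], [0, 1, 0, 1]]
```

With a single threshold for the whole example image, the darker right half would come out entirely black; choosing the threshold per tile keeps the same light and dark pattern in both halves.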

According to the invention of claim (7), the optimal threshold for the entire image is determined by processing only the merged region formed from the small regions, within a part of the image, that have been judged to be character regions. This reduces the amount of processing needed for threshold determination and so speeds up processing. In addition, because the small regions judged to be character regions are handled together, the region-to-region variation in threshold determination is reduced and a good binary image is obtained.

According to the invention of claim (8), the determined optimal threshold is set in the scanner and the binary image is input directly from the scanner, so a good binary image can be obtained at high speed even with a relatively slow processing system.

According to the invention of claim (9), the density of the input image can be normalized using as the reference level the lightest density level in the document image excluding the background, this level being determined from the ratio between the number of black pixels counted with a given level as the threshold and the number counted with the next level as the threshold. An optimal threshold can therefore be determined and binarization performed even for the image of an original with background noise, and a good recognition rate is obtained.

[Brief explanation of drawings]

FIGS. 1 and 2 are, respectively, a flowchart and a block diagram showing an embodiment of the invention of claim (1).
FIGS. 3 and 4 are, respectively, a block diagram and a flowchart showing an embodiment of the invention of claim (2).
FIGS. 5 and 6 are, respectively, a block diagram and a flowchart showing an embodiment of the invention of claim (3).
FIG. 7 is a characteristic diagram showing the relationship between the normalized cumulative value and the threshold.
FIG. 8 is a characteristic diagram showing the correlation between the slope of the regression line and the optimal level.
FIGS. 9 and 10 are, respectively, a block diagram and a flowchart showing an embodiment of the invention of claim (4).
FIG. 11 is a conceptual diagram of the local optimal binarization method.
FIGS. 12 and 13 are, respectively, a block diagram and a flowchart showing an embodiment of the invention of claim (5).
FIGS. 16 and 17 are, respectively, a block diagram and a flowchart showing an embodiment of the invention of claims (7) and (8).
FIG. 18 is a conceptual diagram of the optimal binarization method.
FIGS. 19 and 20 are, respectively, a block diagram and a flowchart showing a second embodiment of the invention of claims (7) and (8).
FIGS. 21 and 22 are, respectively, a block diagram and a flowchart showing a third embodiment of the invention of claims (7) and (8).
FIGS. 23 and 24 are, respectively, a block diagram and a flowchart showing an embodiment of the invention of claim (9).
FIGS. 25 and 26 are, respectively, a block diagram and a flowchart showing a second embodiment of the invention of claim (9).
FIGS. 27 and 28 are, respectively, a block diagram and a flowchart showing a third embodiment of the invention of claim (9).

1 ... multi-level image reading unit
2 ... binary image output unit
3 ... scanner
4 ... multi-level image memory
5 ... density histogram counting unit
6 ... threshold calculation unit
7 ... binarization unit
8 ... binary image memory
9 ... character recognition unit
10 ... cumulative histogram memory
11 ... normalized histogram memory
12 ... optimal fraction constant value
13 ... threshold table

Procedural Amendment (Formality), December 18, 1989

Claims

[Claims]

(1) An optimal binarization method for converting a multi-level quantized image into a black-and-white binary image, characterized in that the number of black pixels is counted as the threshold is changed from a dark level to a light level; the density of the input image is normalized by expressing the number of black pixels at each threshold as a ratio, taking as the reference the count obtained when all pixels other than those at the lightest level are treated as black pixels; and the optimal threshold is determined from the ratio of the number of black pixels at each threshold.

(2) An optimal binarization method for converting a multi-level quantized image into a black-and-white binary image, characterized in that the number of black pixels is counted as the threshold is changed from a dark level to a light level; the density of the input image is normalized by expressing the number of black pixels at each threshold as a ratio, taking as the reference the count obtained when all pixels other than those at the lightest level are treated as black pixels; and the optimal threshold is determined using the level at which the darkest pixels begin to appear.

(3) An optimal binarization method for converting a multi-level quantized image into a black-and-white binary image, characterized in that the number of black pixels is counted as the threshold is changed from a dark level to a light level; the density of the input image is normalized by expressing the number of black pixels at each threshold as a ratio, taking as the reference the count obtained when all pixels other than those at the lightest level are treated as black pixels; and the document density is judged from the rate of change of the ratio of the number of black pixels at each threshold in order to determine the optimal threshold.

(4) An optimal binarization method for converting a multi-level quantized image into a black-and-white binary image, characterized in that the multi-level image of a fixed small region is held in memory; within the small region, the number of black pixels is counted as the threshold is changed from a dark level to a light level; the density of the input image is normalized by expressing the number of black pixels at each threshold as a ratio, taking as the reference the count obtained when all pixels other than those at the lightest level are treated as black pixels; the optimal threshold for the small region is determined from the ratio of the number of black pixels at each threshold; optimal thresholds are obtained over the entire image by changing the small region; and a binary image is output in which each small region is binarized with the optimal threshold for that region.

(5) An optimal binarization method for converting a multi-level quantized image into a black-and-white binary image, characterized in that the multi-level image of a fixed small region is held in memory; within the small region, the number of black pixels is counted as the threshold is changed from a dark level to a light level; the density of the input image is normalized by expressing the number of black pixels at each threshold as a ratio, taking as the reference the count obtained when all pixels other than those at the lightest level are treated as black pixels; the optimal threshold for the small region is determined using the level at which the darkest pixels begin to appear; optimal thresholds are obtained over the entire image by changing the small region; and a binary image is output in which each small region is binarized with the optimal threshold for that region.

(6) An optimal binarization method for converting a multi-level quantized image into a black-and-white binary image, characterized in that the multi-level image of a fixed small region is held in memory; within the small region, the number of black pixels is counted as the threshold is changed from a dark level to a light level; the density of the input image is normalized by expressing the number of black pixels at each threshold as a ratio, taking as the reference the count obtained when all pixels other than those at the lightest level are treated as black pixels; the document density is judged from the rate of change of the ratio of the number of black pixels at each threshold in order to determine the optimal threshold for the small region; optimal thresholds are obtained over the entire image by changing the small region; and a binary image is output in which each small region is binarized with the optimal threshold for that region.

(7) An optimal binarization method characterized in that the image is divided into a plurality of small regions; each small region is judged as to whether it is a character region; and, when the number of small regions judged to be character regions exceeds a certain number, the optimal threshold for binarizing the entire image is determined by the method according to claim (1), (2) or (3).

(8) The optimal binarization method according to claim (1), (2) or (7), characterized in that the determined optimal threshold is set in the scanner and the binary image is input directly from the scanner.

(9) The optimal binarization method according to any of claims (1) to (8), characterized in that a reference level is determined on the basis of the ratio between the number of black pixels counted with a given level as the threshold and the number of black pixels counted with the next level as the threshold, and the number of black pixels obtained with this reference level as the threshold is used as the reference for normalizing the density of the input image, instead of the count obtained when all pixels other than those at the lightest level are treated as black pixels.