JPH08305794A

JPH08305794A - Address line extracting device

Info

Publication number: JPH08305794A
Application number: JP7105576A
Authority: JP
Inventors: Noboru Nakajima; 昇中島
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1995-04-28
Filing date: 1995-04-28
Publication date: 1996-11-22
Anticipated expiration: 2013-11-25
Also published as: JP2827960B2

Abstract

PURPOSE: To provide an address line extracting device capable of extracting address lines with high accuracy regardless of variations in the size of the address characters. CONSTITUTION: A labeling part 22 labels the image of an area containing an address to obtain blocks. A block sorting part 23 groups these blocks locally, and a character size estimating part 24 calculates, for each group, a reference block size that serves as the reference size of the blocks in that group. A base block selecting part 26 selects a base block from which the integration of blocks starts. A line growing part 27 integrates blocks starting from the base block to generate an integrated block. When a line inspecting part 29 judges that the integrated block is likely to be an address line, it outputs the integrated block; when it judges otherwise, it has the respective parts reselect a base block and regenerate an integrated block. When no base block can be detected, the calculation of the reference block size is retried.

Description

Detailed Description of the Invention

[0001]

[Industrial Field of Application] The present invention relates to an address line extraction device, and more particularly to an address line extraction device that extracts the address lines of a mail item.

[0002]

[Prior Art] In a document image recognition device, processing proceeds as follows: layout analysis is performed on the digital image obtained by scanning a document, and character recognition is then performed on each extracted character image to obtain the character code corresponding to that image. Here, layout analysis refers to the processing from extracting character lines from the input image through to determining the format. Literature on layout analysis includes, for example, a paper by Babaguchi et al. entitled "Basic Study of Character Segmentation from Handwritten Character Strings", published in 1985 in the Transactions of the Institute of Electronics, Information and Communication Engineers (Vol. 69-D, No. 11), pages 2123 to 2241.

[0003] This paper discloses a technique for analyzing the layout of addresses on mail items, in which layout analysis is performed according to the following procedure.

[0004]
(1) Isolated points are removed from the digital image obtained by scanning the surface of the mail item.
(2) 8-connected labeling is performed to obtain the circumscribed rectangles of the connected components.
(3) Rectangles whose overlapping area is equal to or greater than a threshold are integrated.
(4) The center coordinates of the rectangles, weighted by rectangle area, are projected in the vertical and horizontal directions, and points where the projected points are highly concentrated are taken as the positions of address lines.
(5) For every pair of rectangles at minimum distance, the pair is integrated if the resulting rectangle is close to a square.
(6) The average of the rectangle areas is taken as the standard rectangle area.
(7) When the aspect ratio of a rectangle is equal to or greater than a predetermined value, the image inside the rectangle is projected onto its long side and the rectangle is split at the minimum of the projection.
(8) Rectangles are integrated along the character string direction to generate address line candidates.

[0005] Thus, this technique attempts to extract address lines by integrating the connected components obtained by labeling.

[0006]

[Problems to Be Solved by the Invention] Many techniques that detect address characters by integrating connected components, of which the above technique is representative, assume a fixed size for address characters regardless of the mail item, and are therefore susceptible to variations in character size. In addition, because a fixed value is also used as the character size when calculating the separation point for lines that touch each other, the accuracy of the segmentation position is degraded.

[0007] In general, when an address is handwritten, the variation in the size of the characters written on a single mail item is often very large. Consequently, techniques that, like many methods to date, extract address lines on the assumption of a single character size, without allowing for the size to vary from character to character, cannot cope with the actual variation in character size and sometimes extract incorrect address lines.

[0008] The technique proposed in the above-mentioned "Basic Study of Character Segmentation from Handwritten Character Strings" does obtain a standard character size and attempts to separate touching characters on that basis. However, because it does not consider the case where miscellaneous information other than the address is mixed in on the surface of the mail item, the character size cannot be estimated with high accuracy. As a result, the extracted address lines are also unstable and unreliable.

[0009] Moreover, on actual mail items lines may be intertwined or may touch each other, yet no provision is made for this. The above technique therefore has the problem that, in such cases, it cannot estimate the standard character size, nor extract intertwined lines, including separating lines that touch.

[0010] Accordingly, an object of the present invention is to provide an address line extraction device that can extract address lines accurately regardless of the character size used in the address.

[0011] Another object of the present invention is to provide an address line extraction device that can extract address lines accurately even when the address lines are intertwined.

[0012]

[Means for Solving the Problems] The invention according to claim 1 comprises:
(a) miscellaneous information detection means for detecting written information other than the address from image data corresponding to what is written on a mail item;
(b) address format assumption means for assuming, using the detection results of the miscellaneous information detection means, at least one format candidate that is likely to have been used for writing the address on the mail item and that includes information specifying the writing direction of the address lines;
(c) block specifying means for specifying, for each format candidate assumed by the address format assumption means, from the image data of one region corresponding to that format candidate, rectangular blocks each being the smallest rectangle that encloses a closed region of connected black pixels, and outputting information about the specified blocks;
(d) block classification means for classifying, for each format candidate, the blocks into several groups on the basis of the information on the position and size of each block specified by the block specifying means for that format candidate;
(e) standard block size calculation means for calculating, for each group classified by the block classification means, the average size of the blocks belonging to the group, and outputting the result as the standard block size;
(f) address character block specifying means for specifying as address character blocks, among the blocks belonging to each group classified by the block classification means, those whose size differs from the standard block size calculated for that group by the standard block size calculation means by no more than a predetermined value, and outputting the result for each group;
(g) base block selection means for selecting as a base block, from the address character blocks specified for each group by the address character block specifying means, the address character block whose size is within a predetermined range and is closest to the standard block size for that group;
(h) block integration means for integrating the base block selected for each group by the base block selection means with those address character blocks in the same group whose degree of overlap with it, with respect to the direction orthogonal to the writing direction of the format candidate corresponding to the group, is at least a predetermined value, and outputting the result of integration for each group as an integrated block; and
(i) format determination means for comparing, between format candidates, the positions and sizes within the image data of the integrated blocks output for each group by the block integration means, and taking as the format used on the mail item the format candidate that yielded the integrated blocks having the form most suitable as address lines.

[0013] That is, in the invention according to claim 1, several formats are assumed for the mail item to be processed. For each assumed format, blocks each corresponding to one character are obtained from the image data of one region, namely the part where the address is likely to be if that format was used, and the blocks are roughly divided into groups. After grouping, the average size of the characters contained in each group is estimated, and the address line extraction device is configured so that address lines (integrated blocks) are generated by integrating blocks using the estimated character size (standard block size).

[0014] As a result, when character blocks are integrated to extract an address line, the base block from which integration starts is selected according to the size of each character block, so that the address lines obtained by integrating the blocks are more accurate. When calculating the standard block size, the length of the long side of the block may be used, or the square root of the block area or the block width may be used instead.

[0015] Further, as in the invention according to claim 2, the standard block size calculation means of the invention according to claim 1 may exclude from the character blocks belonging to a group those whose size lies outside the range between predetermined upper and lower limits, calculate the mean and standard deviation of the sizes of the remaining blocks in the group, recalculate the mean using only the blocks whose size deviates from the calculated mean by no more than a predetermined multiple of the calculated standard deviation, and output the result as the standard block size. In that case the base block is selected even more appropriately, and as a result address lines can be extracted with high accuracy.

[0016] As in the invention according to claim 3, the address line extraction device may be configured so that the address character block specifying means specifies as address character blocks, among the blocks belonging to each group classified by the block classification means, those whose size differs from the standard block size calculated for that group by the standard block size calculation means by no more than a predetermined value and, when a block exists that is larger than the standard block size by a predetermined value or more, divides that block into blocks of about the standard block size and specifies the divided blocks as address character blocks. In that case accurate address lines can also be extracted from mail items whose address lines are heavily intertwined.

[0017] As in the invention according to claim 4, the address line extraction device may use block integration means that determines whether the integrated block obtained by integration satisfies the condition that it falls within the range between upper and lower limits on width and height defined as predetermined multiples of the standard block size and, when it determines that the condition is not satisfied, discards the integrated block and has the base block selection means select as the base block, from the character blocks belonging to the group, the block that has not yet been used as a base block and whose size difference from the standard block size is within a predetermined value and is the smallest. In that case the possibility of extracting an incorrect address line can be further reduced.
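The check-and-retry behaviour of claim 4 can be illustrated with a minimal Python sketch. The function names, the `(top, left, bottom, right)` block representation, and the multiplier ranges are hypothetical; the source only says that the limits are predetermined multiples of the standard block size.

```python
def is_plausible_line(line, standard, writing,
                      thickness_range=(0.5, 2.0), length_range=(2.0, 30.0)):
    """Check an integrated block against limits scaled by the standard block size.

    line: (top, left, bottom, right), inclusive pixel coordinates.
    writing: "vertical" or "horizontal" writing direction.
    The two multiplier ranges are illustrative assumptions, not values from the source.
    """
    top, left, bottom, right = line
    width, height = right - left + 1, bottom - top + 1
    # For vertical writing the line runs top to bottom, so its thickness is its width.
    thickness, length = (width, height) if writing == "vertical" else (height, width)
    return (thickness_range[0] * standard <= thickness <= thickness_range[1] * standard
            and length_range[0] * standard <= length <= length_range[1] * standard)


def next_base_block(sizes, used, standard, tolerance):
    """Pick the unused block closest to the standard size, or None if none qualifies."""
    candidates = [i for i, size in enumerate(sizes)
                  if i not in used and abs(size - standard) <= tolerance]
    if not candidates:
        return None
    return min(candidates, key=lambda i: abs(sizes[i] - standard))
```

When `is_plausible_line` rejects an integrated block, the caller would discard it, add the current base block index to `used`, and call `next_base_block` again.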

[0018] Furthermore, as in the invention according to claim 5, when no block that can serve as a base block exists in a group, the base block selection means may have the standard block size calculation means calculate the standard block size using the blocks in that group that have not been integrated into an address line. In that case, even when two intertwined lines with different character sizes could not be separated into different groups beforehand by the block classification means, the correct average block size is calculated, and address lines can be extracted accurately even where address lines with different character sizes are intertwined or touch each other.

[0019]

[Embodiments] The present invention will now be described in detail with reference to embodiments.

[0020] First embodiment

[0021] FIG. 1 is a functional block diagram of the address line extraction device according to the first embodiment of the present invention. As shown, the address line extraction device of the first embodiment comprises an image input unit 11, a miscellaneous information detection unit 12, an address format assumption unit 13, a plurality of address line extraction units 14₁ to 14₃, and a format determination unit 15. Each unit is built from well-known electronic components such as a processor and memory, and operates as described below.

[0022] The image input unit 11 scans a mail item and outputs a multi-valued digital image corresponding to the characters, postal code frame, and so on drawn on its surface. The miscellaneous information detection unit 12 removes noise from the digital image output by the image input unit 11. Then, based on the noise-removed digital image, it detects the position and size of objects other than the address on the surface of the mail item, such as the stamp and the postal code frame, and outputs the detection results to the address format assumption unit 13 as address peripheral information.

[0023] The address format assumption unit 13 determines (assumes) the format of the mail item being processed on the basis of the address peripheral information from the miscellaneous information detection unit 12, and outputs the result.

[0024] The operation of the address format assumption unit is described with reference to FIG. 2. As shown, an address may be written on a mail item with the item placed so that its longitudinal direction runs top to bottom (placed vertically) (a, b), or so that its longitudinal direction runs left to right (placed horizontally) (c). As for the writing direction, the characters may be written vertically (a) or horizontally (b, c).

[0025] The address format assumption unit 13 determines the orientation in which the mail item was scanned on the basis of the address peripheral information from the miscellaneous information detection unit (for example, the position of the stamp and the position of the postal code frame). When line segments are detected in the image, it narrows down the writing direction of the characters from the direction of the line segments. Using rules based on the address peripheral information, it then judges (assumes) which of the formats shown in FIG. 2 the mail item being processed uses, and outputs the result to the address line extraction units 14. In the following description, a format such as that in FIG. 2(a) is referred to as "vertical placement, vertical writing", a format such as (b) as "vertical placement, horizontal writing", and a format such as (c) as "horizontal placement, horizontal writing".

[0026] That is, when a line segment parallel to the longitudinal direction is extracted, the address format assumption unit 13 limits the format to "vertical placement, vertical writing" or "horizontal placement, horizontal writing". When the format cannot be narrowed down any further (when it cannot be narrowed down to a single format), it outputs the multiple format candidates that may have been used. When outputting the format candidates, it also outputs information on the region where the address is likely to be (hereinafter, the address existence region). For example, with "vertical placement, vertical writing", the address is likely to lie in the right half of the area from below the postal code frame to the end of the mail item when the stamp is at the top, so the address existence region is set to that area and information on it is output together with the format candidate.

[0027] Each address line extraction unit 14 receives a format candidate output by the address format assumption unit 13 together with the coordinate information of the address existence region for that format, generates address lines according to the format, and outputs them. The operation of the address line extraction unit is described in detail later. When the address format assumption unit 13 outputs multiple format candidates, multiple address line extraction units 14 are activated, and each performs address line extraction for a different format candidate.

[0028] The format determination unit 15 takes as input the address line candidates extracted by the address line extraction units 14, evaluates the likelihood of each, identifies the most likely format, and outputs it.

[0029] The operation of the format determination unit is described in detail below with reference to FIG. 3. FIG. 3 shows an example in which two address line extraction units are activated in the address line extraction device of the first embodiment: (a) shows the image from which address lines are extracted, and (b) and (c) show the extraction results of the two address line extraction units.

[0030] When the image shown in FIG. 3(a) is processed by the miscellaneous information detection unit and the address format assumption unit, the address format assumption unit outputs two format candidates, "horizontal placement, horizontal writing" and "vertical placement, vertical writing", and two address line extraction units are activated.

[0031] The address line extraction unit given the format candidate "horizontal placement, horizontal writing" extracts address lines as shown schematically in FIG. 3(b) and outputs information on the width, length, position, and so on of each address line. Likewise, the address line extraction unit given the format candidate "vertical placement, vertical writing" extracts address lines as shown schematically in FIG. 3(c) and outputs the same information for each address line.

[0032] The format determination unit 15 calculates the difference between a predetermined reference value for the address line width and the address line width 61 obtained for the format candidate "horizontal placement, horizontal writing", and the difference between the reference value and the address line width 63 obtained for "vertical placement, vertical writing". When there are multiple address lines, the same calculation is performed for the other address lines. A cost value is then obtained from the calculated differences for each format candidate, the format with the smaller cost value is adopted, and the output of the address line extraction unit that used the adopted format becomes the final output.
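The cost comparison in this paragraph might look like the following Python sketch. The source does not define how the cost value is derived from the differences; the mean absolute difference used here is an assumption, as are the function name `choose_format` and the example widths.

```python
def choose_format(candidates, reference_width):
    """Return the format candidate whose address line widths best match the reference.

    candidates: dict mapping a format name to the list of extracted line widths.
    The cost (mean absolute difference from the reference) is an assumed definition.
    """
    def cost(widths):
        if not widths:
            return float("inf")  # a candidate with no lines can never win
        return sum(abs(w - reference_width) for w in widths) / len(widths)

    return min(candidates, key=lambda name: cost(candidates[name]))


extracted = {
    "horizontal placement, horizontal writing": [28, 31, 30],
    "vertical placement, vertical writing": [60, 75],
}
best = choose_format(extracted, reference_width=30)
```

Here the first candidate has a cost of 1.0 against 37.5 for the second, so it is adopted.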

[0033] When determining the final format, the comparison with a reference value may also be made for the lengths 62 and 64, or for the position and so on. Alternatively, the frequency distributions of address line width, length, and position may be measured in advance from a large number of mail items and consulted to select one format from the format candidates.

[0034] The operation of the address line extraction unit is described in detail below.

[0035] FIG. 4 is a functional block diagram of each address line extraction unit in the address line extraction device of the first embodiment. As shown, the address line extraction unit 14 comprises a preprocessing unit 21, a labeling unit 22, a block classification unit 23, a character size estimation unit 24, a character block selection unit 25, a base block selection unit 26, and a line growth unit 27.

[0036] The preprocessing unit 21 binarizes the address region image and removes isolated points to generate a binary address region image. The labeling unit 22 performs 8-connected labeling on the binary address region image and outputs information on the blocks, each of which is the minimum enclosing rectangle of a connected component of black pixels.
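As an illustration of the labeling step, the following Python sketch finds 8-connected components of black pixels by breadth-first search and returns their minimum enclosing rectangles. The function name `label_blocks`, the list-of-rows image representation, and the `(top, left, bottom, right)` tuple layout are assumptions made for the example.

```python
from collections import deque


def label_blocks(image):
    """Return bounding boxes (top, left, bottom, right) of 8-connected black components.

    image: list of rows, where 1 is a black pixel and 0 is a white pixel.
    Coordinates are inclusive.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    blocks = []
    for y in range(height):
        for x in range(width):
            if image[y][x] != 1 or seen[y][x]:
                continue
            # Flood-fill one component, tracking the extent of its pixels.
            top, left, bottom, right = y, x, y, x
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for dy in (-1, 0, 1):      # visit all 8 neighbours
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            blocks.append((top, left, bottom, right))
    return blocks
```

Diagonally touching pixels end up in the same block, which is what distinguishes 8-connected from 4-connected labeling.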

[0037] The block classification unit 23 fills the interior of each block with black pixels and projects the result onto whichever of the vertical and horizontal axes is closer to the direction orthogonal to the character string direction of the format candidate. It smooths the resulting histogram, and then classifies the blocks into groups by picking out, for each continuous region bounded by positions where the smoothed histogram is 0, the blocks that contribute to that region.

[0038] In the block classification unit 23, the size of the smoothing mask is set to about a predetermined character size. This blurs the histogram so that even characters containing internal gaps, such as "林" and "ル", do not cause one line to be split into multiple groups. The blocks classified into groups by the block classification unit 23 are output, group by group, to the character size estimation unit 24 and the character block selection unit 25.
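The grouping performed by the block classification unit can be sketched in Python as follows. This is a simplified illustration: the moving-average smoothing, the function name `classify_blocks`, and the parameters `length` and `mask` are assumptions, since the source does not specify the smoothing filter.

```python
def classify_blocks(blocks, writing, length, mask):
    """Group blocks by non-zero runs of a smoothed projection histogram.

    blocks: (top, left, bottom, right) boxes with inclusive coordinates.
    writing: "horizontal" projects onto the vertical axis, "vertical" onto the horizontal.
    length: image size along the projection axis.
    mask: moving-average window width, assumed to be about one character.
    """
    # Extent of each block along the projection axis, and its thickness across it.
    spans = []
    for top, left, bottom, right in blocks:
        if writing == "horizontal":
            spans.append((top, bottom, right - left + 1))
        else:
            spans.append((left, right, bottom - top + 1))

    # Projection of the blocks filled with black pixels.
    hist = [0] * length
    for start, end, weight in spans:
        for i in range(start, end + 1):
            hist[i] += weight

    # Moving-average smoothing closes gaps narrower than the mask.
    half = mask // 2
    smooth = []
    for i in range(length):
        window = hist[max(0, i - half): i + half + 1]
        smooth.append(sum(window) / len(window))

    # Continuous regions are runs of non-zero values bounded by zeros.
    regions, run_start = [], None
    for i, value in enumerate(smooth):
        if value > 0 and run_start is None:
            run_start = i
        elif value == 0 and run_start is not None:
            regions.append((run_start, i - 1))
            run_start = None
    if run_start is not None:
        regions.append((run_start, length - 1))

    # Assign each block to the region that contains its extent.
    groups = [[] for _ in regions]
    for block, (b_start, b_end, _) in zip(blocks, spans):
        for index, (r_start, r_end) in enumerate(regions):
            if r_start <= b_start and b_end <= r_end:
                groups[index].append(block)
                break
    return groups
```

With vertical writing, two halves of one character separated by a two-pixel gap stay in the same group when `mask` is 5, but fall into separate groups when `mask` is 1 (no smoothing).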

[0039] The character size estimation unit 24 calculates the average size of the blocks contained in each group and takes it as the standard block size for that group. The block size may be, for example, the block width, the length of the long side, or the square root of the area; the address line extraction device of the first embodiment uses the length of the long side. The standard block size calculated by the character size estimation unit 24 is output to the character block selection unit 25 and the base block selection unit 26.
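The three size measures named in this paragraph and the group average can be written out as a short Python sketch. The function names and the `(top, left, bottom, right)` block layout are assumptions made for illustration.

```python
import math


def block_size(block, measure="long_side"):
    """Size of one block under one of the measures mentioned in the text."""
    top, left, bottom, right = block  # inclusive coordinates
    width = right - left + 1
    height = bottom - top + 1
    if measure == "long_side":
        return max(width, height)
    if measure == "width":
        return width
    if measure == "sqrt_area":
        return math.sqrt(width * height)
    raise ValueError(f"unknown measure: {measure}")


def standard_block_size(group, measure="long_side"):
    """Average block size of a non-empty group."""
    return sum(block_size(block, measure) for block in group) / len(group)
```

For a group containing a block of 8 by 10 pixels and a block of 10 by 12 pixels, the long-side measure gives a standard block size of 11.0.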

[0040] The character size estimation unit 24 may also be configured to exclude from processing any block whose long side does not fall within the range defined by predetermined upper and lower limits, calculate the average size of the remaining blocks, and take that as the standard block size.

[0041] The character block selection unit 25 determines, according to the standard block size obtained by the character size estimation unit 24, whether each block belonging to a group is a single address character. When the difference between the size of a block and the standard block size is within a predetermined value, the character block selection unit 25 determines that the block corresponds to a single character and outputs it to the base block selection unit 26 as a character block.

[0042] The base block selection unit 26 selects, for each group, the block whose size is closest to the standard block size from among the character blocks belonging to that group as the base block, and outputs it to the line growth unit 27.
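Character block selection and base block selection for one group reduce to a filter followed by a minimum search, as in this Python sketch. The function names and the `tolerance` parameter, which stands in for the "predetermined value", are hypothetical.

```python
def select_character_blocks(sizes, standard, tolerance):
    """Indices of blocks whose size is within `tolerance` of the standard block size."""
    return [i for i, size in enumerate(sizes) if abs(size - standard) <= tolerance]


def select_base_block(sizes, character_blocks, standard):
    """Index of the character block closest in size to the standard, or None."""
    if not character_blocks:
        return None
    return min(character_blocks, key=lambda i: abs(sizes[i] - standard))


sizes = [10, 30, 11.5, 3]          # e.g. long-side lengths of the blocks in one group
chars = select_character_blocks(sizes, standard=11, tolerance=4)
base = select_base_block(sizes, chars, standard=11)
```

In this example blocks 0 and 2 qualify as character blocks, and block 2 becomes the base block because its size is closest to the standard.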

【００４３】行成長部２７は、ベースブロック選出部２
６からのベースブロックを、ブロックを統合して宛名行
の抽出処理を行う際に統合を開始するブロックとして用
いて、各グループごとにそれぞれに属する文字ブロック
を統合して、宛名行の候補となる統合ブロックの生成を
行う。書式候補が縦書きの場合には、水平軸上でベース
ブロックとの重複が所定値以上あるブロックを統合し、
横書きの場合には垂直軸上でベースブロックとの重複が
所定値以上あるブロックを統合する。行成長部２７は、
統合したブロックに対して同様な統合を順次行ってい
き、グループ内でブロックの統合がこれ以上行えなくな
るまでブロックの統合を続ける。そして、生成した統合
ブロックを宛名行候補として出力する（書式判定部１５
に出力する）。The line growth unit 27 uses the base block from the base block selection unit 26 as the block from which integration starts when blocks are integrated to extract an address line. For each group, it integrates the character blocks belonging to that group and generates an integrated block that serves as an address line candidate. When the format candidate is vertical writing, it integrates blocks whose overlap with the base block on the horizontal axis is at least a predetermined value; when the format candidate is horizontal writing, it integrates blocks whose overlap with the base block on the vertical axis is at least a predetermined value. The line growth unit 27 then applies the same integration to the integrated blocks in turn and continues until no further blocks in the group can be integrated. It then outputs the generated integrated block as an address line candidate (to the format determination unit 15).
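One way to read the line growth step is sketched below. The `Block` type, the function names, and `min_overlap` are assumptions for illustration, and the overlap is measured against the bounding box of everything integrated so far, which is only one interpretation of "applying the same integration in turn".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """Hypothetical axis-aligned bounding rectangle."""
    x: int
    y: int
    w: int
    h: int


def span(block, vertical_writing):
    # Vertical writing compares projections on the horizontal axis, and vice versa.
    if vertical_writing:
        return block.x, block.x + block.w
    return block.y, block.y + block.h


def overlap(a, b, vertical_writing):
    a0, a1 = span(a, vertical_writing)
    b0, b1 = span(b, vertical_writing)
    return max(0, min(a1, b1) - max(a0, b0))


def union(a, b):
    x0, y0 = min(a.x, b.x), min(a.y, b.y)
    x1 = max(a.x + a.w, b.x + b.w)
    y1 = max(a.y + a.h, b.y + b.h)
    return Block(x0, y0, x1 - x0, y1 - y0)


def grow_line(base, blocks, vertical_writing=True, min_overlap=1):
    """Integrate blocks into a line starting from `base` until nothing more fits."""
    merged, members = base, [base]
    remaining = [b for b in blocks if b is not base]
    changed = True
    while changed:
        changed = False
        for b in list(remaining):
            if overlap(merged, b, vertical_writing) >= min_overlap:
                merged = union(merged, b)
                members.append(b)
                remaining.remove(b)
                changed = True
    return merged, members, remaining
```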

【００４４】第２の実施例 Second embodiment

【００４５】第２の実施例による宛名行抽出装置の基本
的な構成は、第１の実施例の宛名行抽出装置と同様のも
のであり、宛名行抽出部の機能だけを違えたものである
ため、宛名行抽出部に関する説明だけを行うことにす
る。The basic configuration of the address line extraction device according to the second embodiment is the same as that of the first embodiment; only the function of the address line extraction unit differs. Therefore, only the address line extraction unit is described.

【００４６】第１の実施例の宛名行抽出装置に設けられ
た宛名行抽出部内の文字サイズ推定部１３は、各グルー
プに含まれるブロックの大きさの平均値を算出し、それ
ぞれのグループにおける標準ブロックサイズとしてい
た。第２の実施例では、文字サイズ推定部１３は、グル
ープに属する文字ブロックをまず所定の上限値、下限値
の範囲外のブロックを除外し、残ったブロックの平均
値、標準偏差を求め、平均値とブロックの大きさの差が
標準偏差の所定数倍以内のもののみを用いて、再度、平
均値を算出し、その算出結果を標準ブロックサイズとす
る。In the first embodiment, the character size estimation unit 13 in the address line extraction unit calculated the average size of the blocks included in each group and used it as the standard block size for that group. In the second embodiment, the character size estimation unit 13 first excludes, from the character blocks belonging to a group, those blocks that fall outside predetermined upper and lower limits, and obtains the average and standard deviation of the remaining blocks. It then recalculates the average using only the blocks whose size differs from that average by no more than a predetermined multiple of the standard deviation, and uses the result as the standard block size.
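The two-pass estimate of the second embodiment can be sketched as follows. The function name and the parameters `lower`, `upper`, and `k` are hypothetical, and the population standard deviation is an assumption, since the text does not say which form is used.

```python
from statistics import mean, pstdev


def robust_standard_size(sizes, lower, upper, k=1.0):
    """Two-pass average of block sizes.

    Pass 1: drop sizes outside [lower, upper], then take mean and std. dev.
    Pass 2: average only the sizes within k standard deviations of that mean.
    """
    in_range = [s for s in sizes if lower <= s <= upper]
    if not in_range:
        raise ValueError("no block sizes inside the allowed range")
    mu = mean(in_range)
    sigma = pstdev(in_range)  # assumption: population standard deviation
    kept = [s for s in in_range if abs(s - mu) <= k * sigma]
    return mean(kept) if kept else mu
```

For sizes `[10, 10, 10, 10, 30, 200, 1]` with limits 5 and 50, the first pass gives a mean of 14 and a standard deviation of 8, so with `k=1.0` the size 30 is dropped and the second pass returns 10.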

【００４７】第３の実施例 Third embodiment

【００４８】図５に、第３の実施例の宛名行抽出装置内
に設けられる宛名行抽出部の機能ブロック図を示す。図
から明らかなように、第３の実施例の宛名行抽出部１４
の基本的な構成は、第１の実施例の宛名行抽出部（図４
参照）に、ブロック分割部２８を付加したものとなって
いる。FIG. 5 shows a functional block diagram of the address line extraction unit provided in the address line extraction device of the third embodiment. As is apparent from the figure, the basic configuration of the address line extraction unit 14 of the third embodiment is that of the address line extraction unit of the first embodiment (see FIG. 4) with a block division unit 28 added.

【００４９】前処理部２１とラベリング部２２とブロッ
ク分類部２３とベースブロック選出部２６と行成長部２
７の動作内容は、第１の実施例における動作と同一であ
るので、ここでは、その他の各部の動作の説明だけを行
うことにする。The operations of the preprocessing unit 21, the labeling unit 22, the block classification unit 23, the base block selection unit 26, and the line growth unit 27 are the same as in the first embodiment, so only the operations of the other units are described here.

【００５０】第１の実施例における文字サイズ推定部２
４は、標準ブロックサイズを文字ブロック選出部２５と
ベースブロック選出部２６に出力するものであったが、
本実施例の文字サイズ推定部２４は、標準ブロックサイ
ズを、文字ブロック選出部２５とベースブロック選出部
２６に出力するとともに、ブロック分割部２８に対して
も出力する。The character size estimation unit 24 in the first embodiment outputs the standard block size to the character block selection unit 25 and the base block selection unit 26. The character size estimation unit 24 of the present embodiment outputs the standard block size to these two units and also to the block division unit 28.

【００５１】そして、文字ブロック選出部２５は、標準
ブロックサイズとの差が既定値以内のサイズを持つブロ
ックを文字ブロックとして選出し、ベースブロック選出
部２６に出力するとともに、標準ブロックサイズとの差
が既定値より大きく、かつ標準ブロックサイズより大き
いサイズを持つブロックを文字接触ブロックとして選出
し、ブロック分割部２８に出力する。The character block selection unit 25 then selects, as character blocks, the blocks whose size differs from the standard block size by no more than a predetermined value and outputs them to the base block selection unit 26. It also selects, as character contact blocks, the blocks whose size differs from the standard block size by more than the predetermined value and is larger than the standard block size, and outputs them to the block division unit 28.

【００５２】ブロック分割部２８は、文字接触ブロック
の長辺の長さを標準ブロックサイズで割って四捨五入し
た数でブロックの長辺を等分割し、分割したブロック
を、それぞれ文字ブロックとして、ベースブロック選出
部２６に出力する。The block division unit 28 divides the long-side length of a character contact block by the standard block size, rounds the quotient to the nearest integer, and splits the long side of the block into that many equal parts. It outputs each of the resulting blocks to the base block selection unit 26 as a character block.
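The division rule can be sketched as below. The `Block` type and function name are hypothetical. Rounding is done as round-half-up, the usual meaning of the Japanese term in the source; Python's built-in `round` rounds halves to even, so it is avoided here. With integer pixel coordinates the parts are only as equal as the arithmetic allows.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """Hypothetical axis-aligned bounding rectangle."""
    x: int
    y: int
    w: int
    h: int


def split_contact_block(block, standard_size):
    """Split a touching-character block along its long side."""
    long_side = max(block.w, block.h)
    # Number of parts: quotient rounded half up, never fewer than one.
    n = max(1, math.floor(long_side / standard_size + 0.5))
    if block.w >= block.h:
        cuts = [block.x + i * block.w // n for i in range(n + 1)]
        return [Block(cuts[i], block.y, cuts[i + 1] - cuts[i], block.h)
                for i in range(n)]
    cuts = [block.y + i * block.h // n for i in range(n + 1)]
    return [Block(block.x, cuts[i], block.w, cuts[i + 1] - cuts[i])
            for i in range(n)]
```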

【００５３】すなわち、第１あるいは第２の実施例の宛
名行抽出装置では、行間に跨る文字接触が存在する場
合、そのブロックを無視して宛名行の抽出を行っていた
が、第３の実施例の宛名行抽出装置では、接触により大
きくなったブロックを予め分離し、分離したブロックが
行成長部に入力される。このため、第３の実施例の宛名
行抽出装置は、たとえば、行間の入り組みが存在してい
るときなとに、第１あるいは第２の実施例の宛名行抽出
装置に比して、より正確な宛名行の抽出が行えるものと
なっている。That is, when characters touch across lines, the address line extraction device of the first or second embodiment extracts address lines while ignoring the touching block. In the address line extraction device of the third embodiment, blocks that have become large through contact are separated in advance, and the separated blocks are input to the line growth unit. As a result, the device of the third embodiment can extract address lines more accurately than the devices of the first and second embodiments, for example when adjacent lines are interleaved.

【００５４】第４の実施例 Fourth embodiment

【００５５】図６に、第４の実施例の宛名行抽出装置内
に設けられる宛名行抽出部の機能ブロック図を示す。図
から明らかなように、第４の実施例の宛名行抽出部１４
の基本的な構成は、第３の実施例の宛名行抽出部（図５
参照）に、行検証部２９を付加したものとなっている。FIG. 6 shows a functional block diagram of the address line extraction unit provided in the address line extraction device of the fourth embodiment. As is clear from the figure, the basic configuration of the address line extraction unit 14 of the fourth embodiment is that of the address line extraction unit of the third embodiment (see FIG. 5) with a line verification unit 29 added.

【００５６】図示した機能ブロックのうち、前処理部２
１、ラベリング部２２、ブロック分類部２３、文字ブロ
ック選出部２５、行成長部２７、ブロック分割部２８の
動作内容は、第３の実施例と同一であるため、説明は省
略する。Of the functional blocks shown, the preprocessing unit 21, the labeling unit 22, the block classification unit 23, the character block selection unit 25, the line growth unit 27, and the block division unit 28 operate in the same way as in the third embodiment, so their description is omitted.

【００５７】文字サイズ推定部２４は、標準ブロックサ
イズを文字ブロック選出部２５、ベースブロック選出部
２６、ブロック分割部２８に加え、行検証部１９に対し
ても出力する。The character size estimation unit 24 outputs the standard block size to the character block selection unit 25, the base block selection unit 26, and the block division unit 28, and also to the line verification unit 19.

【００５８】行検証部２９は、行成長部２７から出力さ
れる統合ブロックが、標準ブロックサイズの所定数倍で
定義される幅、高さ、統合ブロックに寄与している文字
ブロック間の距離のそれぞれの条件を満たすものである
場合、その統合ブロックを宛名行候補として出力する。
そして、上記の条件を満たさない場合には、ベースブロ
ック選出部２６に対して、ベースブロック変更信号を出
力し、生成された統合ブロックを棄却する。When the integrated block output from the line growth unit 27 satisfies each of the conditions on its width, its height, and the distances between the character blocks contributing to it, where each condition is defined as a predetermined multiple of the standard block size, the line verification unit 29 outputs the integrated block as an address line candidate. When these conditions are not satisfied, it outputs a base block change signal to the base block selection unit 26 and rejects the generated integrated block.
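A minimal sketch of such a check follows. The text only says each condition is defined as a multiple of the standard block size, so treating all three as upper bounds is an assumption, and the function name and the three factor parameters are hypothetical.

```python
def verify_line(width, height, gaps, standard_size,
                max_width_factor, max_height_factor, max_gap_factor):
    """Return True when an integrated block passes all three checks.

    Assumption: each limit is an upper bound expressed as a multiple
    of the standard block size.
    """
    return (width <= max_width_factor * standard_size
            and height <= max_height_factor * standard_size
            and all(g <= max_gap_factor * standard_size for g in gaps))


# Usage sketch: a vertical line 12 px wide and 200 px tall with small gaps.
accepted = verify_line(12, 200, [3, 5], standard_size=10,
                       max_width_factor=1.5, max_height_factor=30,
                       max_gap_factor=1.0)
```

A caller that receives `False` would discard the integrated block and request a different base block.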

【００５９】ベースブロック選出部２６は、ベースブロ
ック変更信号を受信すると、これまでベースブロックに
なっていない文字ブロックの中から、標準ブロックサイ
ズとの差が既定値以内でかつ最も標準ブロックサイズに
近いブロックを選出し、これを新しいベースブロックと
して行成長部２７に出力する。Upon receiving the base block change signal, the base block selection unit 26 selects, from the character blocks that have not yet served as a base block, the block whose size is within a predetermined value of the standard block size and is closest to it, and outputs that block to the line growth unit 27 as the new base block.

【００６０】ベースブロック選出部２６、行成長部２
７、行検証部２８は、上記のような一連の処理を繰り返
し、宛名行が文字ブロックとして記憶されている全ブロ
ックが宛名行に分類されるか、ベースブロックが選択で
きなくなった場合に、処理を終了する（宛名行抽出部の
処理が終了する）。The base block selection unit 26, the line growth unit 27, and the line verification unit 28 repeat this series of processes, and processing ends (that is, the processing of the address line extraction unit ends) when all blocks stored as character blocks have been classified into address lines or when no base block can be selected.

【００６１】このように、第４の実施例の宛名行抽出装
置では、行成長部の出力する統合ブロックが宛名行とし
ての条件を満たすか否かの判定が行われ、もし判定条件
を満たさない場合には、ベースブロックが再度選出され
て、宛名行の統合がやり直される。Thus, in the address line extraction device of the fourth embodiment, it is determined whether the integrated block output by the line growth unit satisfies the conditions for an address line. If it does not, a base block is selected again and the integration of the address line is redone.
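The select, grow, verify, and retry cycle of the fourth embodiment can be sketched as a loop. Everything here is illustrative: `grow`, `verify`, and `size_of` are caller-supplied stand-ins for the line growth unit, the line verification unit, and the block size measure, and `tolerance` is a hypothetical name for the predetermined value.

```python
def extract_lines(blocks, standard_size, tolerance, grow, verify, size_of):
    """Repeat base selection, growth and verification within one group.

    Ends when no untried block can serve as a base block.
    """
    unassigned = list(blocks)
    tried = []   # blocks that have already served as a base block
    lines = []
    while True:
        candidates = [b for b in unassigned
                      if b not in tried
                      and abs(size_of(b) - standard_size) <= tolerance]
        if not candidates:
            break
        base = min(candidates, key=lambda b: abs(size_of(b) - standard_size))
        tried.append(base)
        members = grow(base, unassigned)
        if verify(members):
            lines.append(members)
            unassigned = [b for b in unassigned if b not in members]
        # A rejected line is simply dropped; the next pass picks another base.
    return lines, unassigned
```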

【００６２】なお、第４の実施例の宛名行抽出装置で
は、統合ブロックが、標準ブロックサイズの所定数倍で
定義される幅、高さ、統合ブロックに寄与している文字
ブロック間の距離のそれぞれの条件を満たすものである
か否かにより、ベースブロックの再選出を行わせている
が、行検証部２９に、生成された宛名行の位置関係か
ら、予め定められた条件を満たさない宛名行を破棄する
機能を持たせても良い。たとえば、（１）１行目はある
程度の長さを持つ、（２）１行目の書き出し位置は一定
領域内から書かれる、等の条件を定めておき、各条件を
満たさない宛名行を破棄させるように、行検証部２９を
構成することも出来る。In the address line extraction device of the fourth embodiment, the base block is reselected depending on whether the integrated block satisfies each of the conditions on width, height, and the distances between the character blocks contributing to it, each defined as a predetermined multiple of the standard block size. The line verification unit 29 may additionally be given a function that discards address lines that fail predetermined conditions on the positional relationship of the generated address lines. For example, conditions such as (1) the first line has a certain length and (2) the first line starts within a certain area can be defined, and the line verification unit 29 can be configured to discard address lines that do not satisfy them.

【００６３】また、宛名行の始端および終端位置、大き
さ、行同士の相対的な位置を記述した統計的なテーブル
を予め作成しておき、これにより抽出された宛名行の尤
度を算出し、規定値に満たない行を破棄させることも出
来る。It is also possible to prepare in advance a statistical table describing the start and end positions of address lines, their sizes, and the relative positions of the lines, use it to calculate the likelihood of each extracted address line, and discard lines whose likelihood falls below a specified value.

【００６４】第５の実施例 Fifth embodiment

【００６５】図７に、第５の実施例による宛名行抽出装
置で用いられている宛名行抽出部の機能ブロック図を示
す。図示してあるように、第５の実施例による宛名行抽
出装置１４は、第４の実施例の宛名行抽出部（図６参
照）と、ベースブロック選出部２６から文字サイズ推定
部２４へ、フィードバックのための信号が送出できるよ
うに変更したものとなっている。FIG. 7 shows a functional block diagram of the address line extraction unit used in the address line extraction device according to the fifth embodiment. As shown in the figure, the address line extraction device 14 according to the fifth embodiment is the address line extraction unit of the fourth embodiment (see FIG. 6) modified so that a feedback signal can be sent from the base block selection unit 26 to the character size estimation unit 24.

【００６６】前処理部２１とラベリング部２２とブロッ
ク分類部２３と文字ブロック選出部２５と行成長部２７
とブロック分割部２８と行検証部２９の動作は、第４の
実施例と同一であるので、その説明は省略する。The operations of the preprocessing unit 21, the labeling unit 22, the block classification unit 23, the character block selection unit 25, the line growth unit 27, the block division unit 28, and the line verification unit 29 are the same as in the fourth embodiment, so their description is omitted.

【００６７】ベースブロック選出部２６は、ベースブロ
ック変更信号を受信すると、これまでベースブロックに
なっていない文字ブロックの中から、標準ブロックサイ
ズとの差が既定値以内でかつ最も標準ブロックサイズに
近いブロックを選出し、これをベースブロックとして行
成長部２７に出力するが、このときにベースブロックに
成りうる文字ブロックが存在しない場合には、標準ブロ
ックサイズ修正信号、およびグループ内にまだ宛名行候
補に統合されていない文字ブロックである残存ブロック
を文字サイズ推定部２４に対して出力する。Upon receiving the base block change signal, the base block selection unit 26 selects, from the character blocks that have not yet served as a base block, the block whose size is within a predetermined value of the standard block size and is closest to it, and outputs that block to the line growth unit 27 as the base block. If no character block can serve as a base block at this point, it outputs to the character size estimation unit 24 a standard block size correction signal together with the remaining blocks, that is, the character blocks in the group that have not yet been integrated into an address line candidate.

【００６８】文字サイズ推定部２４は、標準ブロックサ
イズ修正信号と残存ブロックを受信すると残存ブロック
に関して、第１の実施例における文字サイズ推定部２４
における処理と同様の手順で標準ブロックサイズを算出
し、文字ブロック選出部２５とブロック分割部２８とベ
ースブロック選出部２６と行検証部２９に対して出力
し、残存ブロックに関して宛名行候補ブロックの生成を
行う。Upon receiving the standard block size correction signal and the remaining blocks, the character size estimation unit 24 calculates a standard block size for the remaining blocks by the same procedure as the character size estimation unit 24 of the first embodiment and outputs it to the character block selection unit 25, the block division unit 28, the base block selection unit 26, and the line verification unit 29. Address line candidate blocks are then generated for the remaining blocks.

【００６９】このように、この実施例の宛名行抽出装置
では、ベースブロック選出部２６がベースブロックを選
出するときに、グループ内にまだ宛名行候補に統合され
ていない文字ブロックである残存ブロックが存在してい
るにもかかわらず、ベースブロックに成りうる文字ブロ
ックが存在しないようなときに、再度標準ブロックサイ
ズを修正してからベースブロックを選出しなおす。Thus, in the address line extraction device of this embodiment, when the base block selection unit 26 tries to select a base block and finds that no character block can serve as one even though remaining blocks (character blocks in the group not yet integrated into an address line candidate) exist, the standard block size is corrected and the base block is then selected again.
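The feedback of the fifth embodiment amounts to an outer loop around the extraction step: re-estimate the standard block size from whatever is left and try again. The sketch below is illustrative only; `estimate` and `extract` are hypothetical caller-supplied stand-ins, and the stop rule (no blocks left, or a new estimate yields no new line) follows the termination condition given later in this description.

```python
def extract_with_reestimation(blocks, estimate, extract):
    """Re-estimate the standard size on leftover blocks until nothing changes.

    estimate(blocks) -> standard block size
    extract(blocks, standard_size) -> (new_lines, leftover_blocks)
    """
    remaining = list(blocks)
    lines = []
    while remaining:
        standard = estimate(remaining)
        new_lines, leftover = extract(remaining, standard)
        if not new_lines:
            break  # updating the standard size produced no new line
        lines.extend(new_lines)
        remaining = leftover
    return lines, remaining
```

In a group that mixes small and large characters, the first pass picks up the lines near the first estimate, and the second pass works from an estimate computed only from what the first pass left behind.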

【００７０】すなわち、第４の実施例では、ベースブロ
ックを再度選出する際にベースブロックになりうるブロ
ックが存在しない場合、ベースブロックの選択を行わな
かった。このため、スキュー、行間の入り組み等で１つ
のグループ内に大きさの異なる行同士が混在するとき
に、残留するブロックが存在しているにもかかわらず、
ベースブロックが選択できないような不都合が生じてい
たが、この実施例では、上述のように、そのような不都
合が生じないので、宛名行の抽出が正確に行えることに
なる。That is, in the fourth embodiment, no base block was selected if no block could serve as a base block at the time of reselection. Consequently, when lines of different sizes were mixed within one group because of skew, interleaved lines, or the like, a base block sometimes could not be selected even though blocks remained. In this embodiment, as described above, this problem does not arise, so address lines can be extracted accurately.

【００７１】以下、図８ないし図１０を用いて、図８
（ａ）に示したような入力画像に対する処理が、第５の
実施例の宛名行抽出装置で実行され、宛名書式仮定部１
３が、書式候補を「縦置き縦書き」とし、宛名存在領域
を、図８（ｂ）の宛名存在領域３０とした場合を例に、
第５の実施例による宛名行抽出装置の動作内容を、さら
に、具体的に説明する。The operation of the address line extraction device according to the fifth embodiment is described below in more detail with reference to FIGS. 8 to 10, taking as an example the case where the device processes an input image such as that shown in FIG. 8(a), the address format assumption unit 13 sets the format candidate to "vertical placement, vertical writing", and the address existence area is the address existence area 30 of FIG. 8(b).

【００７２】このような宛名存在領域３０に関する情報
を受けた前処理部２１は、宛名存在領域３０の位置に相
当する宛名の画像に対して２値化処理を行い、２値画像
を生成する。ラベリング部２２は、前処理部２１が生成
した２値画像内をラベリングし、同図（ｃ）に示したよ
うに、連結成分の最小包囲矩形であるブロックを出力す
る。On receiving this information about the address existence area 30, the preprocessing unit 21 binarizes the address image at the position of the address existence area 30 and generates a binary image. The labeling unit 22 labels the binary image generated by the preprocessing unit 21 and outputs blocks, each of which is the minimum bounding rectangle of a connected component, as shown in (c) of the same figure.

【００７３】ブロック分類部２３では、書式候補が「縦
置き縦書き」であるから、各ブロックを水平軸へそれぞ
れ投影して、投影値が“０”になる部分でブロックをグ
ループ分けする。その結果、（ｃ）に示した各ブロック
は、（ｄ）に示したように、２つのグループ３１_A、３
１_Bに分類されることになる。Because the format candidate is "vertical placement, vertical writing", the block classification unit 23 projects each block onto the horizontal axis and divides the blocks into groups at the positions where the projection value is "0". As a result, the blocks shown in (c) are classified into two groups, 31_A and 31_B, as shown in (d).

【００７４】（ｂ）、（ｄ）から明らかなように、この
段階では、各グループに、１つの宛名行だけが含まれる
とは限らず、グループ３１_Aのように、１グループに複
数の宛名行が含まれることもある。As is clear from (b) and (d), a group does not necessarily contain only one address line at this stage; a single group may contain several address lines, as group 31_A does.

【００７５】以下、図９を用いて、複数の宛名行が含ま
れている方のグループであるグループ３１_Aに対する処
理手順を説明する。The processing procedure for group 31_A, the group that contains several address lines, is described below with reference to FIG. 9.

【００７６】図９（ａ）に示したようなグループ３１_A
に対して、ブロック同士の重複する部分の面積がブロッ
ク面積の既定値以上の割合を占める場合、これらのブロ
ックを統合すると同図（ｂ）のようになる。For group 31_A as shown in FIG. 9(a), blocks are merged when the area of their overlap accounts for at least a predetermined proportion of the block area; merging these blocks gives the result shown in (b) of the same figure.

【００７７】既に説明したように、文字サイズ推定部２
４は、ブロック分類部２３からのブロックに関する情報
を基に、予め定められている上限値と下限値の範囲に含
まれる幅を持つブロックのみを取り出し、それらのブロ
ックの幅の平均値と標準偏差を算出する。そして、算出
した平均値とブロックの幅の差が、標準偏差の所定数倍
以下のブロックを選択し、選択したブロックの幅の平均
値を算出し、標準ブロックサイズとして出力する。As already described, the character size estimation unit 24 uses the block information from the block classification unit 23 to extract only the blocks whose width falls within the range between predetermined upper and lower limits, and calculates the average and standard deviation of their widths. It then selects the blocks whose width differs from the calculated average by no more than a predetermined multiple of the standard deviation, calculates the average width of the selected blocks, and outputs it as the standard block size.

【００７８】すなわち、図１０に示したように、（ａ）
のようなブロックを含むグループ（図９の（ｂ）と同じ
ものである）に対しては、図１０（ｂ）に示したような
ブロック群に対して、まず、ブロックの幅の平均値と標
準偏差とを算出し、その算出結果を用いて（ｃ）に示し
たようなブロック群を選択する。そして、図１０（ｃ）
に示したようなブロック群に対して、再度、平均値の算
出を行い、その算出結果を、標準ブロックサイズ３３と
して、文字ブロック選出手段２５とブロック分割手段１
８とベースブロック選出部２６と行検証部２９に出力す
る。That is, as shown in FIG. 10, for a group containing blocks such as those in (a) (the same group as in FIG. 9(b)), the average and standard deviation of the block widths are first calculated for the block set shown in FIG. 10(b), and the result is used to select the block set shown in (c). The average is then calculated again for the block set shown in FIG. 10(c), and the result is output as the standard block size 33 to the character block selection means 25, the block division means 18, the base block selection unit 26, and the line verification unit 29.

【００７９】図９に戻って、第５の実施例の宛名行抽出
装置の動作の説明を続ける。文字ブロック選出部２５
は、対象としているグループ３１_A内の各ブロックが、
その長辺と、文字サイズ推定部２４が決定した標準ブロ
ックサイズとの比が、予め定められた上限値と下限値の
範囲に含まれているという条件を満たすものであるか否
かを判定する。そして、条件を満たしているブロック
を、文字ブロックとして選択してベースブロック選出部
２６に出力し、条件を満たしていないブロックを、行間
接触ブロックとしてブロック分離部２７に出力する。Returning to FIG. 9, the description of the operation of the address line extraction device of the fifth embodiment continues. The character block selection unit 25 determines, for each block in the target group 31_A, whether the ratio of its long side to the standard block size determined by the character size estimation unit 24 falls within the range between predetermined upper and lower limits. It selects the blocks that satisfy this condition as character blocks and outputs them to the base block selection unit 26, and outputs the blocks that do not satisfy it to the block separation unit 27 as inter-line contact blocks.

【００８０】たとえば、図９（ｂ）に示してあるグルー
プ３１_Aでは、同図（ｃ）に模式的に示したように、標
準ブロックサイズ３３との比が大きいブロック３５が、
行間接触ブロックとされ、他のブロックは、文字ブロッ
クとされることになる。For example, in group 31_A shown in FIG. 9(b), block 35, whose ratio to the standard block size 33 is large, is treated as an inter-line contact block, as shown schematically in (c) of the same figure, and the other blocks are treated as character blocks.

【００８１】ブロック分離部２７では、標準ブロックサ
イズで行間接触ブロックの幅を割り、四捨五入すること
で、行間接触ブロックをいくつに分離するかを決定し、
分離を実行する。このため、ブロック分離部２７は、ブ
ロック３５の長辺の長さと、標準ブロックサイズ３３と
の比を算出することによって、ブロック３５を２個のブ
ロックに分離することを決定する。そして、図９（ｃ）
に模式的に示してあるように、分離位置３６で分離を実
行し、分離した２個のブロックを、ベースブロック選出
部２７に出力する。The block separation unit 27 divides the width of an inter-line contact block by the standard block size and rounds the quotient to the nearest integer to decide how many blocks the contact block should be separated into, and then performs the separation. Accordingly, the block separation unit 27 calculates the ratio of the long-side length of block 35 to the standard block size 33 and decides to separate block 35 into two blocks. As shown schematically in FIG. 9(c), it performs the separation at separation position 36 and outputs the two separated blocks to the base block selection unit 27.

【００８２】結局、ベースブロック選出部２７には、
（ｄ）に示したようなブロックが入力されることにな
り、ベースブロック選出部２７は、それらのブロックか
ら、標準ブロックサイズ３３に最も近い大きさを持つブ
ロック３４をベースブロックとして選出する。As a result, blocks such as those shown in (d) are input to the base block selection unit 27, which selects from them block 34, the block whose size is closest to the standard block size 33, as the base block.

【００８３】そして、この場合、書式候補が「縦置き縦
書き」であるため、行成長部２９は、（ｅ）に示したよ
うに、水平軸上に投影したとき（すなわち、矢印３８方
向に投影したとき）に、ベースブロック３４と重複があ
るブロック（図において、網掛けが付してあるブロッ
ク）を、順次統合していく。そして、行成長部２７は、
グループ内のブロックが全て文字行に統合されたとき、
あるいは、統合が行えなくなったときに、統合処理を停
止し、統合結果を行検証部２９に出力するので、この場
合は、（ｆ）に示したように、グループ３１_A内の右半
分に存在する各ブロックの統合が行われた段階で、統合
結果が、統合ブロック３８として出力されることにな
る。In this case, because the format candidate is "vertical placement, vertical writing", the line growth unit 29 successively integrates the blocks that overlap the base block 34 when projected onto the horizontal axis (that is, when projected in the direction of arrow 38), as shown in (e); these are the shaded blocks in the figure. The line growth unit 27 stops the integration when all blocks in the group have been integrated into a character line or when no further integration is possible, and outputs the integration result to the line verification unit 29. In this case, therefore, as shown in (f), the integration result is output as integrated block 38 once the blocks in the right half of group 31_A have been integrated.

【００８４】行検証部２９は、生成された統合ブロック
のサイズ、統合に参加したブロックの配置等をチェック
し、条件に合うものは宛名行候補として、書式判定部１
５に出力する。また、行検証部２９は、注目しているグ
ループにおいて宛名行に属さないブロックが存在するか
否かのチェックを行い、残存ブロックがある場合には、
グループ内に大きさの異なる宛名行が存在することを考
慮して、残存ブロックについて再度文字サイズの推定を
行い、文字サイズ推定部２４における標準ブロックサイ
ズの修正後、文字ブロック選出部２５とブロック分割部
２８とベースブロック選出部２６と行成長部２７と行検
証部２９に、残存ブロックに関する処理を再度実行させ
る。The line verification unit 29 checks the size of the generated integrated block, the arrangement of the blocks that took part in the integration, and so on, and outputs an integrated block that meets the conditions to the format determination unit 15 as an address line candidate. The line verification unit 29 also checks whether the group under consideration contains blocks that do not belong to an address line. If such remaining blocks exist, the character size is estimated again for them, to allow for address lines of different sizes within the group. After the character size estimation unit 24 has corrected the standard block size, the character block selection unit 25, the block division unit 28, the base block selection unit 26, the line growth unit 27, and the line verification unit 29 process the remaining blocks again.

【００８５】このため、（ｆ）に示したような、宛名行
に属さないブロックが存在している場合には、行検証部
２９は、残存ブロックに対して、統合ブロック２８内の
ブロックに対して行ったのと、同様の処理を実行する。
その結果、行成長部２７に、（ｇ）に模式的に示したよ
うに、新たな標準ブロックサイズ４１とベースブロック
３９が通知されることになり、行成長部２７は、各ブロ
ックを統合していき、新たな統合ブロック４２を生成す
ることになる。Therefore, when blocks that do not belong to an address line exist, as shown in (f), the line verification unit 29 applies to the remaining blocks the same processing that was applied to the blocks in the integrated block 28. As a result, as shown schematically in (g), the line growth unit 27 is notified of a new standard block size 41 and a new base block 39, and it integrates the blocks to generate a new integrated block 42.

【００８６】第５の実施例の宛名行抽出装置では、以上
の動作を各グループの残存ブロックがなくなるか、また
は標準ブロックサイズを更新しても新たな宛名行が生成
されなくなるまで続行し、その後、処理を終了する。The address line extraction device of the fifth embodiment continues the above operation until no blocks remain in any group or until updating the standard block size no longer produces a new address line, and then ends processing.

【００８７】このように、第１ないし第５の実施例によ
る宛名行抽出装置では、宛名行を生成する際の元となる
ベースブロックの選択が、１つのグループ内に存在する
ブロックのうち、小さなものを考慮に入れずに算出され
る平均的なブロックサイズを基に行われる構成となって
いるので、誤ったベースブロックから宛名行の成長が行
われ、その結果として誤った宛名行抽出が行われるとい
ったことが起こりにくい。Thus, in the address line extraction devices of the first to fifth embodiments, the base block from which an address line is generated is selected on the basis of an average block size calculated without taking small blocks in the group into account. This makes it unlikely that an address line is grown from a wrong base block and that an incorrect address line is extracted as a result.

【００８８】すなわち、図１１（ａ）、（ｂ）に示した
ように、宛名書きの文字の大きさは多様に変化するた
め、文字の大きさによらず、一定のパラメータによりベ
ースブロックの選択を行う構成である場合には、（ａ）
に示した宛名書きのように比較的小さい文字で構成され
る宛名行（たとえば、ワープロ出力された宛名行）の抽
出を可能とするためには、ベースブロックになり得るブ
ロック幅の下限を小さめにしておかなければならない。
しかし、そのように設定しておくと、（ｂ）のように大
きな文字が筆記された郵便物では、図中、枠で示したよ
うに、文字単位ではなく、たとえば、文字の一部分がブ
ロックあるいはベースブロックと見なされてしまい、結
果として誤った宛名行抽出が行われてしまう。That is, as shown in FIGS. 11(a) and 11(b), the size of address characters varies widely. If the base block is selected with fixed parameters regardless of character size, the lower limit of the block width that can qualify as a base block must be set fairly small so that address lines made up of relatively small characters, such as the address in (a) (for example, an address line printed by a word processor), can be extracted. With such a setting, however, on a mail piece with large handwritten characters as in (b), something other than a whole character, for example a part of a character, is treated as a block or as the base block, as indicated by the frames in the figure, and an incorrect address line is extracted as a result.

【００８９】これに対して、各実施例の宛名行抽出装置
では、図１１（ａ）、（ｂ）いずれのケースでも、正確
に１文字分のブロックがベースブロックに選択されるこ
とになるので、上述のような誤った宛名行抽出は行われ
ず、宛名行の成長が安定に行われることになる。In contrast, in the address line extraction device of each embodiment, a block corresponding to exactly one character is selected as the base block in both cases of FIGS. 11(a) and 11(b). The incorrect address line extraction described above therefore does not occur, and address lines are grown stably.

【００９０】また、図１２（ａ）に示したように、１通
の宛名書きにおいても宛名氏名、市区町名、番地等の行
毎に文字の大きさが変化するものが多く見られるのであ
るが、各実施例の宛名行抽出装置では、そのような郵便
物の画像データは、同図（ｂ）に示したように、グルー
プ３１₁、３１₂といったような形で、大まかにグルー
プ分けされ、グループ単位で宛名行の成長が行われる。
このため、一通の宛名書き内での文字の大きさが変化し
ていても、安定した宛名行の抽出が行えるようになって
いる。In addition, as shown in FIG. 12(a), the character size often changes from line to line within a single address, for example between the addressee's name, the city or ward name, and the street number. In the address line extraction device of each embodiment, the image data of such a mail piece is roughly divided into groups such as groups 31₁ and 31₂, as shown in (b) of the same figure, and address lines are grown group by group. Therefore, address lines can be extracted stably even when the character size changes within a single address.

【００９１】[0091]

【発明の効果】以上、詳細に説明したように、請求項１
記載の発明による宛名行抽出装置では、郵便物上に記載
されている文字サイズに応じて、宛名行を成長させる際
の元となるベースブロックの選択が行われるので、文字
サイズがどのようなものであっても、正確な宛名行の抽
出が行えることになる。As described in detail above, in the address line extraction device according to the invention of claim 1, the base block from which an address line is grown is selected according to the size of the characters written on the mail piece, so address lines can be extracted accurately whatever the character size.

【００９２】また、請求項２記載の発明による宛名行抽
出装置では、文字ブロックの大きさの分布を考慮に入れ
て、宛名として不適当な文字ブロックを用いずに標準ブ
ロックサイズが算出され、その標準ブロックサイズを基
にベースブロックが選択されるので、ノイズ、及び文字
接触に影響されずに、高精度な文字サイズの推定が行わ
れることになり、その結果として、正確な宛名行の抽出
が行えることになる。In the address line extraction device according to the invention of claim 2, the standard block size is calculated by taking the distribution of character block sizes into account and leaving out character blocks that are unsuitable as address characters, and the base block is selected on the basis of that standard block size. The character size is therefore estimated with high accuracy without being affected by noise or touching characters, and address lines can be extracted accurately as a result.

【００９３】そして、請求項３記載の発明のように、標
準ブロックサイズと比して大きなブロックがあった場合
には、そのブロックを分割して文字ブロックを生成する
ように宛名行抽出装置を構成した場合には、宛名行が入
り組んでいる場合にも、正確な宛名行の抽出が行える宛
名行抽出装置が得られることになる。When the address line extraction device is configured, as in the invention of claim 3, to divide a block that is large compared with the standard block size and generate character blocks from it, the device can extract address lines accurately even when address lines are interleaved.

【００９４】さらに、請求項４記載の発明のように、ブ
ロックの統合を行う（宛名行の生成を担当する）ブロッ
ク統合手段に、統合結果が宛名行として正しい形態のも
のか否かを判断する機能を付加すれば、誤った宛名行抽
出が行われる確率が極めて低い宛名行抽出装置を得るこ
とが出来ることになる。Furthermore, if, as in the invention of claim 4, the block integration means that integrates blocks (and is responsible for generating address lines) is given a function for judging whether the integration result has the correct form for an address line, an address line extraction device with a very low probability of incorrect address line extraction can be obtained.

【００９５】そして、請求項５記載の発明のように宛名
行抽出装置を構成した場合には、文字の大きさの異なる
宛名行同士に入り組み、接触があるような場合にも、宛
名行の抽出が的確に行えることになる。When the address line extraction device is configured as in the invention of claim 5, address lines can be extracted accurately even when address lines with different character sizes are interleaved or touch each other.

[Brief description of drawings]

【図１】本発明の第１の実施例による宛名行抽出装置の
概要を示す機能ブロック図である。FIG. 1 is a functional block diagram showing an outline of an address line extraction device according to a first embodiment of the present invention.

【図２】第１の実施例による宛名行抽出装置内の書式仮
定部が仮定する書式を説明するための説明図である。FIG. 2 is an explanatory diagram for explaining a format assumed by a format assumption unit in the address line extraction device according to the first embodiment.

【図３】第１の実施例による宛名行抽出装置内の書式判
定部の動作を説明するために用いた、処理対象とされる
画像と宛名行抽出部の出力との対応関係の概要を示した
模式図である。FIG. 3 is a schematic diagram outlining the correspondence between an image to be processed and the output of the address line extraction unit, used to explain the operation of the format determination unit in the address line extraction device according to the first embodiment.

FIG. 4 is a functional block diagram of the address line extraction unit provided in the address line extraction device according to the first embodiment of the present invention.

FIG. 5 is a functional block diagram of the address line extraction unit provided in the address line extraction device according to a third embodiment of the present invention.

FIG. 6 is a functional block diagram of the address line extraction unit provided in the address line extraction device according to a fourth embodiment of the present invention.

FIG. 7 is a functional block diagram of the address line extraction unit provided in the address line extraction device according to a fifth embodiment of the present invention.

FIG. 8 is a schematic diagram outlining the correspondence among an image to be processed, the information output by the format assumption unit, and the information output by the block classification unit in the address line extraction unit, used to explain the operation of the address line extraction device according to the fifth embodiment.

FIG. 9 is an explanatory diagram showing an example of input to the format determination unit of the present invention, used to explain the operation of the address line extraction unit according to the fifth embodiment.

FIG. 10 is an explanatory diagram showing the blocks whose sizes the character size estimation unit according to the fifth embodiment refers to when calculating the standard block size.

FIG. 11 is an explanatory diagram showing an example of address lines with identical content written in characters of different sizes, used to compare the conventional address line extraction technique with the address line extraction device of the embodiments.

FIG. 12 is an explanatory diagram showing an example of an address in which the character size changes from line to line, and how that address is divided into groups.

[Explanation of symbols]

11 Image input unit
12 Miscellaneous information detection unit
13 Address format assumption unit
14 Address line extraction unit
15 Format determination unit
21 Preprocessing unit
22 Labeling unit
23 Block classification unit
24 Character size estimation unit
25 Character block selection unit
26 Base block selection unit
27 Line growth unit
28 Block division unit
29 Line verification unit

Claims

[Claims]

1. An address line extraction device comprising:
miscellaneous information detecting means for detecting information other than the address from image data corresponding to the contents written on a mail piece;
address format assumption means for assuming, using the detection result of the miscellaneous information detecting means, at least one format candidate used for writing the address of the mail piece, each format candidate including information specifying the writing direction of the address lines;
block specifying means for specifying, for each format candidate assumed by the address format assumption means, minimum-size rectangular blocks each enclosing a closed region of connected black pixels in the image data of the area corresponding to that format candidate, and for outputting information on the specified blocks;
block classification means for classifying the blocks into several groups for each format candidate, based on the information on the position and size of each block specified by the block specifying means for that format candidate;
standard block size calculation means for calculating, for each group classified by the block classification means, the average of the sizes of the blocks belonging to the group, and for outputting the result as the standard block size;
address character block specifying means for specifying as address character blocks, among the blocks belonging to each group classified by the block classification means, those blocks whose size differs from the standard block size calculated for the group by the standard block size calculation means by no more than a predetermined value, and for outputting the result for each group;
base block selecting means for selecting as a base block, from the address character blocks specified for each group by the address character block specifying means, an address character block whose size is within a predetermined range and is closest to the standard block size for the group;
block integrating means for integrating the base block selected for each group by the base block selecting means with the address character blocks in the same group whose degree of overlap, in the direction orthogonal to the writing direction of the format candidate corresponding to the group, is at least a predetermined value, and for outputting the integration result for each group as an integrated block; and
format determining means for comparing, for each format candidate, the positions and sizes in the image data of the integrated blocks output for each group by the block integrating means, and for determining that the format candidate yielding the integrated blocks whose form is most suitable as address lines is the format used on the mail piece.
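The base block selection and block integration steps of claim 1 can be sketched as follows. This is only an illustrative sketch, not the implementation disclosed in the patent: the `Block` type, the choice of the longer side as the block "size", the overlap ratio definition, and all thresholds are assumptions introduced here, and the integration is done in a single pass rather than by repeated growth.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """Axis-aligned bounding rectangle (hypothetical type, not from the patent)."""
    x0: int
    y0: int
    x1: int
    y1: int

    @property
    def size(self) -> int:
        # Assumption: "size" is taken to be the longer side of the rectangle.
        return max(self.x1 - self.x0, self.y1 - self.y0)


def select_base_block(blocks, standard_size, tolerance):
    """Return the block closest in size to standard_size, or None if none is within tolerance."""
    candidates = [b for b in blocks if abs(b.size - standard_size) <= tolerance]
    return min(candidates, key=lambda b: abs(b.size - standard_size), default=None)


def overlap_ratio(a, b, horizontal=True):
    """Overlap of two blocks along the axis orthogonal to the writing direction.

    Assumption: the overlap is normalised by the shorter of the two extents.
    """
    if horizontal:  # horizontal writing: compare vertical extents
        a_lo, a_hi, b_lo, b_hi = a.y0, a.y1, b.y0, b.y1
    else:           # vertical writing: compare horizontal extents
        a_lo, a_hi, b_lo, b_hi = a.x0, a.x1, b.x0, b.x1
    shorter = min(a_hi - a_lo, b_hi - b_lo)
    if shorter <= 0:
        return 0.0
    return max(0, min(a_hi, b_hi) - max(a_lo, b_lo)) / shorter


def grow_line(base, blocks, min_overlap, horizontal=True):
    """Merge the base block with every block that overlaps it sufficiently."""
    members = [base] + [
        b for b in blocks
        if b is not base and overlap_ratio(base, b, horizontal) >= min_overlap
    ]
    return Block(
        min(m.x0 for m in members), min(m.y0 for m in members),
        max(m.x1 for m in members), max(m.y1 for m in members),
    )
```

In this sketch the integrated block is simply the bounding rectangle of the base block and its overlapping neighbours; a block on a different line has zero overlap and is left out.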

2. The address line extraction device according to claim 1, wherein the standard block size calculation means excludes, from the blocks belonging to a group, blocks whose size falls outside predetermined upper and lower limits, calculates the average and the standard deviation of the sizes of the remaining blocks in the group, recalculates the average over the blocks whose size lies within a predetermined multiple of the calculated standard deviation from the calculated average, and outputs the result as the standard block size.
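The two-stage averaging of claim 2 can be illustrated with a short sketch. The function name, the parameter names (`lower`, `upper`, `k`), and the use of the population standard deviation are assumptions made for this example; the claim only states that limits and a multiple of the standard deviation are predetermined.

```python
from statistics import mean, pstdev


def standard_block_size(sizes, lower, upper, k):
    """Estimate the standard block size from a list of block sizes.

    1. Drop sizes outside [lower, upper].
    2. Compute the mean and standard deviation of what remains.
    3. Recompute the mean over sizes within k standard deviations of that mean.
    """
    kept = [s for s in sizes if lower <= s <= upper]
    if not kept:
        return None  # no usable blocks in this group
    mu = mean(kept)
    sigma = pstdev(kept)  # assumption: population standard deviation
    inliers = [s for s in kept if abs(s - mu) <= k * sigma]
    return mean(inliers) if inliers else mu
```

For example, with nine blocks of size 10, one merged blob of size 30, a speck of size 1 and a frame of size 100, the limits `lower=5, upper=50` remove the speck and the frame, and `k=2` then removes the blob, so the estimate settles on the character size.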

3. The address line extraction device according to claim 1 or 2, wherein the address character block specifying means specifies as address character blocks those blocks, among the blocks belonging to each group classified by the block classification means, whose size differs from the standard block size calculated for the group by the standard block size calculation means by no more than a predetermined value, and, when there is a block whose size exceeds the standard block size by a predetermined value or more, divides that block into blocks of approximately the standard block size and specifies the divided blocks as address character blocks.
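One way to picture the block division of claim 3 is the sketch below. The claim does not say how the cut positions are chosen, so the equal-width split along the writing direction, the tuple layout `(x0, y0, x1, y1)`, and the function name are all assumptions.

```python
def split_block(block, standard_size, horizontal=True):
    """Split an oversized block into pieces of roughly standard_size.

    block is a tuple (x0, y0, x1, y1). The split runs along the writing
    direction: x for horizontal writing, y for vertical writing.
    """
    x0, y0, x1, y1 = block
    length = (x1 - x0) if horizontal else (y1 - y0)
    # Number of pieces: the length expressed in standard block sizes, at least 1.
    n = max(1, round(length / standard_size))
    pieces = []
    for i in range(n):
        start = (length * i) // n        # integer cut positions, no gaps
        end = (length * (i + 1)) // n
        if horizontal:
            pieces.append((x0 + start, y0, x0 + end, y1))
        else:
            pieces.append((x0, y0 + start, x1, y0 + end))
    return pieces
```

A block that is three characters wide because the characters touch is thus returned as three adjacent pieces, each of which can then be treated as an address character block.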

4. The address line extraction device according to claim 3, wherein the block integrating means judges whether the integrated block obtained by the integration satisfies the condition that its width or height lies within a range between an upper limit and a lower limit defined by predetermined multiples of the standard block size, and discards the integrated block when the condition is judged not to be satisfied, and wherein the base block selecting means then selects as a base block, from the character blocks belonging to the group that have not yet been used as a base block, the block whose difference from the standard block size is within a predetermined value and is the smallest.
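The verify-and-retry behaviour of claim 4 can be sketched as a loop. Everything here is illustrative: `grow`, `verify`, and `size_of` are hypothetical callables supplied by the caller, and checking the extent orthogonal to the writing direction is an assumption about which of "width or height" applies.

```python
def is_plausible_line(line, standard_size, min_factor, max_factor, horizontal=True):
    """Check that the line thickness lies within multiples of the standard size.

    line is a tuple (x0, y0, x1, y1). For horizontal writing the thickness is
    the height; for vertical writing it is the width (an assumption).
    """
    x0, y0, x1, y1 = line
    thickness = (y1 - y0) if horizontal else (x1 - x0)
    return min_factor * standard_size <= thickness <= max_factor * standard_size


def extract_line(blocks, standard_size, tolerance, grow, verify, size_of):
    """Try base blocks in order of closeness to standard_size until one yields a valid line."""
    tried = set()
    while True:
        candidates = [
            i for i, b in enumerate(blocks)
            if i not in tried and abs(size_of(b) - standard_size) <= tolerance
        ]
        if not candidates:
            return None  # no block left that could serve as a base block
        i = min(candidates, key=lambda j: abs(size_of(blocks[j]) - standard_size))
        tried.add(i)
        line = grow(blocks[i], blocks)
        if verify(line):
            return line
        # Otherwise the integrated block is discarded and another base block is tried.
```

Returning `None` corresponds to the situation in which no base block can be found for the group.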

5. The address line extraction device according to claim 4, wherein, when the base block selecting means finds no block that can serve as a base block in a given group, the standard block size calculation means recalculates the standard block size using the blocks in the group that have not been integrated into an address line.
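The recalculation in claim 5 can be illustrated in a few lines. The plain mean and the index-set bookkeeping are assumptions made for this sketch; the claim only requires that the standard block size be calculated from blocks not yet integrated into an address line.

```python
from statistics import mean


def recompute_standard_size(sizes, integrated):
    """Recompute the standard block size from blocks not yet placed in a line.

    sizes is a list of block sizes; integrated is a set of indices of blocks
    that already belong to an extracted address line.
    """
    remaining = [s for i, s in enumerate(sizes) if i not in integrated]
    return mean(remaining) if remaining else None
```

When a group mixes a line of small characters with a line of large ones, extracting the first line and recomputing over the leftover blocks lets the estimate shift to the other character size.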