JPH0762860B2 - Character separator - Google Patents

Character separator

Info

Publication number
JPH0762860B2
Authority
JP
Japan
Prior art keywords
character
line
partial
characters
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Lifetime
Application number
JP61235941A
Other languages
Japanese (ja)
Other versions
JPS6389989A (en)
Inventor
善丈 辻
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
NEC Corp
Original Assignee
NEC Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by NEC Corp
Priority to JP61235941A
Publication of JPS6389989A
Publication of JPH0762860B2
Anticipated expiration
Expired - Lifetime

Landscapes

  • Character Input (AREA)
  • Character Discrimination (AREA)

Description

[Detailed description of the invention] (Field of industrial application) The present invention relates to a character separation device for use in optical character readers and similar equipment that read character lines written in an unrestricted format. In particular, it concerns a character separation device that cuts out individual characters from free-format character lines in a specific domain such as addresses written on mail items.

(Prior art and its problems) Among devices that optically read printed or handwritten characters (hereinafter, OCR), those handling alphanumerics and katakana are already in practical use, and the reading of handwritten kanji has recently begun to reach practical use as well. Reading characters with such an OCR requires a character separation technique that cuts individual characters out of the character lines on the page. Various conventional techniques have been developed, as described for example in the Transactions of the Institute of Electronics and Communication Engineers of Japan (D), J68-D, No. 8, pp. 1497-1504 (August 1985), including methods that segment characters on the basis of character pitch and methods that separate alphanumerics and the like at simple blank gaps. These conventional techniques assume that a single character line does not mix, for example, vertical and horizontal writing.

Addresses on mail items are an example where this assumption fails. Such addresses contain handwritten kanji and handwritten numerals, and within a vertically written address the chome (district block) and lot numbers, for example, are sometimes written horizontally in abbreviated form. Unlike the kanji used in prefecture names or personal names, such an abbreviated horizontal portion usually consists of Arabic numerals and similar symbols. When a character line contains partial character lines of differing orientation, as in such postal addresses, a suitable character separation method has to be selected according to factors such as the orientation of each partial line. With conventional character separation techniques, however, it has been difficult to cut out individual characters accurately and efficiently from such lines.

An object of the present invention is to solve the above problems by providing a character separation device that detects partial character lines of differing orientation contained within a character line and changes the method used to separate each detected partial line into individual characters according to its shape, its relative position within the character line, and the like, so that characters can be cut out efficiently and accurately.

(Means for solving the problems) To solve the above problems, the present invention provides a character separation device that scans a plurality of character lines written on a page and extracts individual character images, characterized by comprising: means for detecting whether partial character lines of differing orientation are mixed within a blocked character line and for dividing the character line into partial character lines each having a single orientation; and a plurality of character separation means, together with means for selecting a predetermined one of them according to the characteristics of the partial character line and its relative position within the character line.

(Operation) In the present invention, the presence of partial character lines of differing orientation within a character line is detected, and the character separation method is changed according to the characteristics and relative positions of those partial lines. This makes efficient and accurate character segmentation possible.

(Embodiment) An embodiment of the present invention is described below with reference to the drawings.

FIG. 1 shows an example of an address on a mail item in which vertical and horizontal writing are mixed, and serves to explain the principle of the present invention.

In the figure, the hatched characters represent the address, and the circles indicate that part of the address has been omitted. For the character line of FIG. 1(a), projecting the image horizontally as shown yields a distribution whose runs correspond to clusters of characters or to parts of a character (hereinafter, character blocks). Examining this distribution allows character images to be extracted, but the character images "市" and "4-1-1" in the figure cannot be cut out correctly in this way. Therefore, for each character-block image obtained by dividing the line with the projection of FIG. 1(a), a projection distribution along the horizontal axis is computed as shown in FIG. 1(b). As FIG. 1(b) shows, for the character-block images "川" and "4-1-1", projection distributions representing three and five partial characters, respectively, are extracted, and the left and right end positions of both blocks become known. The position and size of each character-block image, and the number of elements it contains (3 for "川" and 5 for "4-1-1", for example), can thus be extracted.

Next, the size of each character-block image and the number of elements it contains are examined to assess whether the block may be horizontally written. In FIG. 1(c), for example, the character-block images "川" and "4-1-1" are judged to be horizontal-writing candidates on the basis of their element counts and aspect ratios. As a result, the two regions marked L1 and L3 in FIG. 1(c) are obtained as horizontal-writing candidate character-block images, and the region marked L2 is obtained as a vertically written character-block image.
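
The two projection passes described above can be sketched as follows. This is a minimal Python illustration rather than the patented implementation: it assumes a binarised image of one vertically written line, given as a list of rows with 1 for ink, and the names `runs`, `row_profile`, `col_profile` and `character_blocks` are hypothetical.

```python
from typing import Dict, List, Tuple

Image = List[List[int]]  # assumed input: rows of 0/1 pixels, 1 = ink


def runs(profile: List[int]) -> List[Tuple[int, int]]:
    """Return half-open (start, end) ranges where the profile is non-zero."""
    out, start = [], None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i
        elif not value and start is not None:
            out.append((start, i))
            start = None
    if start is not None:
        out.append((start, len(profile)))
    return out


def row_profile(img: Image) -> List[int]:
    """Ink count per row: the projection used to find character blocks."""
    return [sum(row) for row in img]


def col_profile(img: Image, top: int, bottom: int) -> List[int]:
    """Ink count per column, restricted to rows top..bottom-1 of one block."""
    width = len(img[0]) if img else 0
    return [sum(img[y][x] for y in range(top, bottom)) for x in range(width)]


def character_blocks(img: Image) -> List[Dict[str, int]]:
    """Position, size and element count for each character block."""
    blocks = []
    for top, bottom in runs(row_profile(img)):
        elements = runs(col_profile(img, top, bottom))
        blocks.append({
            "top": top, "bottom": bottom,
            "left": elements[0][0], "right": elements[-1][1],
            "elements": len(elements),
        })
    return blocks
```

Each returned dictionary plays the role of the character-block information in the text: the bounding box gives position and size, and `elements` counts the runs found in the second projection.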

In the present invention, the relative position of each horizontal-writing candidate within the character line is also used, because the horizontally written portions of postal addresses mostly express the chome, lot number and the like with numerals and special symbols. Consequently, as shown in FIG. 1(d), the character-block image "川" is judged to be a single vertically written character. As for character separation, the region "4-1-1" judged to be horizontally written is separated into individual characters at blank gaps, as indicated by the dotted lines in FIG. 1(d), while the regions judged to be vertically written are separated into individual characters by, for example, a character separation means based on character pitch.
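
One way to picture the orientation decision is the sketch below. The patent gives no thresholds and does not spell out how the relative position is used, so everything numeric here is an assumption: `min_elements`, `min_aspect` and `tail_fraction` are hypothetical parameters, and the position rule (a horizontal portion is accepted only in the lower part of a vertical line) is a guess made purely for illustration.

```python
from typing import Dict, List

Block = Dict[str, int]  # keys: top, bottom, left, right, elements


def is_horizontal_candidate(block: Block, min_elements: int = 3,
                            min_aspect: float = 1.5) -> bool:
    """Candidate test from element count and width/height ratio.

    Both thresholds are assumed values, not taken from the patent.
    """
    width = block["right"] - block["left"]
    height = block["bottom"] - block["top"]
    return block["elements"] >= min_elements and width >= min_aspect * height


def classify_blocks(blocks: List[Block], line_height: int,
                    tail_fraction: float = 0.5) -> List[str]:
    """Label each block 'horizontal' or 'vertical'.

    Assumed position rule: a candidate is accepted as horizontal only if
    its centre lies in the last `tail_fraction` of the vertical line.
    """
    labels = []
    for block in blocks:
        centre = (block["top"] + block["bottom"]) / 2
        in_tail = centre >= (1.0 - tail_fraction) * line_height
        horizontal = is_horizontal_candidate(block) and in_tail
        labels.append("horizontal" if horizontal else "vertical")
    return labels
```

With such a rule, a three-stroke character near the top of the line passes the candidate test but is still labelled vertical, which mirrors how "川" is treated in FIG. 1(d).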

Although the above processing has been described in terms of projection distributions, it can of course also be realized with other methods, such as contour tracing of the character images. Nor is the present invention limited to the vertically written address described with reference to FIG. 1.

FIG. 2 is a logical block diagram of an embodiment of the present invention. In the figure, 1 is a line image storage unit, which stores a character line image such as that shown in FIG. 1(a). Detecting such a character line image on the page can be done with known techniques.

2 is a line division unit. As shown in FIG. 1(b), the line division unit 2 divides the character line into partial character lines, detects the position, size and number of elements of each character-block image (hereinafter, character-block information), and stores them in the character-block information storage unit 3. Known techniques can be used for the line division unit 2. The partial line determination unit 4 uses the position, size and number of elements of each character-block image read from the character-block information storage unit 3 to determine, as illustrated in FIG. 1, whether each partial line image is vertically or horizontally written, and stores the character-block information together with the region information of each vertical or horizontal partial line in the partial line information storage unit 5.

The character separation unit 6 contains a plurality of character separation means. According to the partial-line region information and character-block information stored in the partial line information storage unit 5 and transferred to it in sequence, the character separation unit 6 activates the appropriate character separation means, cuts the character line image stored in the line image storage unit 1 into individual characters one after another, and stores them in the character image storage unit 7.
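
The selection performed by the character separation unit 6 can be sketched as a lookup from orientation to separation routine. This is an illustrative Python sketch under stated assumptions, not the device itself: `PartialLine`, `separate_by_blank`, `separate_by_pitch` and `character_separation_unit` are hypothetical names, the image is a list of 0/1 rows, and the pitch-based routine simply assumes roughly square characters when no pitch is supplied.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Image = List[List[int]]
Box = Tuple[int, int, int, int]  # top, bottom, left, right (half-open)


@dataclass
class PartialLine:
    box: Box
    orientation: str  # "vertical" or "horizontal"


def separate_by_blank(img: Image, box: Box) -> List[Box]:
    """Split a horizontal partial line at blank (ink-free) columns."""
    top, bottom, left, right = box
    boxes, start = [], None
    for x in range(left, right + 1):
        ink = x < right and any(img[y][x] for y in range(top, bottom))
        if ink and start is None:
            start = x
        elif not ink and start is not None:
            boxes.append((top, bottom, start, x))
            start = None
    return boxes


def separate_by_pitch(img: Image, box: Box,
                      pitch: Optional[int] = None) -> List[Box]:
    """Split a vertical partial line into equal cells of about `pitch` rows."""
    top, bottom, left, right = box
    pitch = pitch or (right - left)  # assumption: characters roughly square
    n = max(1, round((bottom - top) / pitch))
    edges = [top + (bottom - top) * i // n for i in range(n + 1)]
    return [(edges[i], edges[i + 1], left, right) for i in range(n)]


# One separation routine per orientation; more entries could be added.
SEPARATORS = {"horizontal": separate_by_blank, "vertical": separate_by_pitch}


def character_separation_unit(img: Image,
                              partial_lines: List[PartialLine]) -> List[Image]:
    """Cut every partial line into character images with the selected routine."""
    chars = []
    for line in partial_lines:
        for t, b, l, r in SEPARATORS[line.orientation](img, line.box):
            chars.append([row[l:r] for row in img[t:b]])
    return chars
```

Keeping the routines in a table makes the selection step explicit, and a further separation means could be registered without touching the loop.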

(Effect of the invention) As described above, the present invention makes it possible to readily provide a character separation device that cuts out characters accurately, without reducing processing speed, even from character lines that mix vertical and horizontal writing and include handwritten characters.

[Brief description of drawings]

FIG. 1 illustrates the principle of the present invention using an example of an address on a mail item. FIG. 2 is a logical block diagram of an embodiment of the present invention. In the figures, 1 is a line image storage unit, 2 is a line division unit, 3 is a character-block information storage unit, 4 is a partial line determination unit, 5 is a partial line information storage unit, 6 is a character separation unit, and 7 is a character image storage unit.

Claims (1)

[Claims] 1. A character separation device that scans a plurality of character lines written on a page and extracts individual character images, characterized by comprising: means for detecting whether partial character lines of differing orientation are mixed within a blocked character line and for dividing the character line into the partial character lines, each of a single orientation; and a plurality of character separation means, together with means for selecting a predetermined character separation means from among them according to the characteristics of the partial character line and its relative position within the character line.
JP61235941A 1986-10-03 1986-10-03 Character separator Expired - Lifetime JPH0762860B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP61235941A JPH0762860B2 (en) 1986-10-03 1986-10-03 Character separator

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP61235941A JPH0762860B2 (en) 1986-10-03 1986-10-03 Character separator

Publications (2)

Publication Number Publication Date
JPS6389989A JPS6389989A (en) 1988-04-20
JPH0762860B2 JPH0762860B2 (en) 1995-07-05

Family

ID=16993493

Family Applications (1)

Application Number Title Priority Date Filing Date
JP61235941A Expired - Lifetime JPH0762860B2 (en) 1986-10-03 1986-10-03 Character separator

Country Status (1)

Country Link
JP (1) JPH0762860B2 (en)

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS55121584A (en) * 1979-03-12 1980-09-18 Daihen Corp Automatic pattern checking method

Also Published As

Publication number Publication date
JPS6389989A (en) 1988-04-20

Similar Documents

Publication Publication Date Title
US5201011A (en) Method and apparatus for image hand markup detection using morphological techniques
Aradhye A generic method for determining up/down orientation of text in roman and non-roman scripts
US6778703B1 (en) Form recognition using reference areas
JP3086702B2 (en) Method for identifying text or line figure and digital processing system
JPH0420226B2 (en)
JPH0762860B2 (en) Character separator
JP3268552B2 (en) Area extraction method, destination area extraction method, destination area extraction apparatus, and image processing apparatus
JPH04502526A (en) image recognition
JP3440501B2 (en) Driver's license recognition device
JP4244692B2 (en) Character recognition device and character recognition program
Jeong et al. A document image preprocessing system for keyword spotting
JP2570703B2 (en) Character reader
JP3091278B2 (en) Document recognition method
Wolf et al. Form-based localization of the destination address block on complex envelopes
JP3162552B2 (en) Mail address recognition device and address recognition method
JPS6394384A (en) System for deciding direction of character row
JP2000210624A (en) Postal address recognition device
JPH0737034A (en) Optical character reader
JPH11238095A (en) Postal address reader
JP2616995B2 (en) Character recognition device
JP2000339408A (en) Character segmentation device
JPH02230484A (en) Character recognizing device
JPH04309B2 (en)
JPH09212579A (en) Address letter recognition method for mail
JPH0433082A (en) Document recognizing device

Legal Events

Date Code Title Description
EXPY Cancellation because of completion of term