JPH022192B2

Info

Publication number: JPH022192B2
Application number: JP57157237A
Authority: JP
Inventors: Shigemi Osada; Junji Hatsuzaki; Akira Inoe
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1982-09-09
Filing date: 1982-09-09
Publication date: 1990-01-17
Also published as: JPS5945584A

Description

[Detailed Description of the Invention]

(1) Technical Field of the Invention

The present invention relates to a character separation and extraction method capable of separating and extracting individual characters from a character string in which a plurality of characters are written close together.

(2) Prior Art and Problems

A conventional method for separating and extracting the characters of a character string scans a window of fixed shape, for example a rectangle, over the character string and extracts the individual characters. When a plurality of characters are written close to one another, however, such a method cannot separate and extract them accurately.

(3) Purpose of the Invention

The purpose of the present invention is to provide a character separation and extraction method that can accurately separate each character even when a plurality of characters are written close together.

(4) Structure of the Invention

To achieve the above object, the character separation and extraction method of the present invention operates on an image in which a character string consisting of one or more characters has been cut out by a rectangle one pixel larger in each direction than its circumscribed rectangle. A one-pixel-wide window, perpendicular to the character string direction and equal in length to the width of the rectangle, is scanned from one end of the rectangular area, and the position at which a black pixel first appears in the window is detected. The shortest path between the pixels at the two ends of the window at that position is then found, with the search area restricted to the scanning-direction side of the window position. The area enclosed by the window and the shortest path is regarded as the area containing one character, and such areas are separated and extracted one after another, so that the area of each individual character can be extracted from a character string written in close proximity.

(5) Embodiments of the Invention

FIGS. 1 to 7 are explanatory diagrams of the procedure, illustrating the present invention with a specific example.

Here, the processing procedure is shown for a character string such as "V₂B" in FIG. 1, which is difficult to separate as it stands.

FIG. 1 shows an image in which the characters 11₁ to 11₃ of the character string have been cut out by a rectangular frame 12 that is one pixel larger in each direction than their circumscribed rectangle. The window 13 shown in the same figure, that is, a one-pixel-wide window perpendicular to the character string direction whose length equals the vertical extent of the rectangular frame 12, is scanned across this image from one end of the area inside the frame, and the position at which a black pixel 14 first appears within the window 13 is detected.
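The cut-out and window scan described for FIG. 1 can be sketched as follows. This is a minimal Python illustration rather than the patent's circuitry. It assumes a horizontally written string scanned left to right and a binary image stored as nested lists with 1 for black and 0 for white; the function names `crop_with_margin` and `first_black_column` are hypothetical.

```python
from typing import List, Optional

Image = List[List[int]]  # assumed encoding: 1 = black pixel, 0 = white pixel


def crop_with_margin(img: Image) -> Image:
    """Cut out the bounding box of the black pixels, enlarged by one
    white pixel on every side (the rectangular frame)."""
    rows = [r for r, row in enumerate(img) if any(row)]
    cols = [c for c in range(len(img[0])) if any(row[c] for row in img)]
    if not rows:
        return [[0]]  # no black pixel at all
    top, bottom, left, right = rows[0], rows[-1], cols[0], cols[-1]
    width = right - left + 3  # bounding-box width plus two margin columns
    frame = [[0] * width]
    for r in range(top, bottom + 1):
        frame.append([0] + img[r][left:right + 1] + [0])
    frame.append([0] * width)
    return frame


def first_black_column(frame: Image, start: int = 0) -> Optional[int]:
    """Scan a one-pixel-wide, full-height window from left to right and
    return the first column that contains a black pixel, or None."""
    for x in range(start, len(frame[0])):
        if any(row[x] == 1 for row in frame):
            return x
    return None
```

Because of the one-pixel margin, the top and bottom pixels of the window column are always white, so they are available as the two window end points used in the next step.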

FIG. 2 shows the position at which the window 13 detected the black pixel 14 for the characters 11₁ to 11₃ of FIG. 1. Next, the shortest path between the white pixels at the two end positions "S" and "E" of the window 13 at this position is found, with the search area restricted to the scanning-direction side of the position of the window 13. The character end position 15, marked "+" in FIG. 2, is a label that represents this restriction of the search area; that is, the area to the right of the "+" marks is the search area. Various algorithms have been proposed for extracting a shortest path; Lee's algorithm is used here for the explanation.

Of the two points "S" and "E" determined by the above processing, one, for example "S", is first chosen as the starting point. First, the label "1" is given to the white pixels directly adjacent (4-connected) to "S". Next, the label "2" is given to the still unlabeled white pixels adjacent to the pixels labeled "1", and the label "3" to those adjacent to the pixels labeled "2". The white pixels adjacent to the pixels labeled "3" are then given the label "1" again. This procedure is repeated until the end point "E" is reached. FIG. 3 shows the result of this labeling procedure.
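The cyclic labeling can be sketched as a breadth-first wavefront. The Python below is an illustration under stated assumptions, not the patent's implementation: the frame is a nested list with 0 for white and 1 for black, the string runs left to right, `wx` is the window column, S and E are the top and bottom pixels of that column, and every other pixel at or to the left of column `wx` is treated as the "+" barrier. The name `lee_label` is hypothetical.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]  # assumed encoding: 0 = white, 1 = black
NEIGHBOURS = ((-1, 0), (0, 1), (1, 0), (0, -1))  # 4-connected neighbours


def lee_label(frame: Image, wx: int) -> Optional[Tuple[List[List[int]], int]]:
    """Propagate labels 1, 2, 3, 1, ... from S = (0, wx) through white
    pixels to the right of column wx until E = (h - 1, wx) is labeled.
    Returns (labels, number_of_steps), or None if E is unreachable."""
    h, w = len(frame), len(frame[0])
    start, end = (0, wx), (h - 1, wx)
    labels = [[0] * w for _ in range(h)]  # 0 means "no label"
    visited = {start}
    frontier = [start]
    step = 0
    while frontier:
        step += 1
        label = (step - 1) % 3 + 1  # cycles 1 -> 2 -> 3 -> 1
        next_frontier = []
        for r, c in frontier:
            for dr, dc in NEIGHBOURS:
                nr, nc = r + dr, c + dc
                if not (0 <= nr < h and 0 <= nc < w):
                    continue
                if (nr, nc) in visited or frame[nr][nc] != 0:
                    continue
                # Search area is limited to the scanning-direction side.
                if nc <= wx and (nr, nc) != end:
                    continue
                visited.add((nr, nc))
                labels[nr][nc] = label
                next_frontier.append((nr, nc))
        if end in visited:
            return labels, step
        frontier = next_frontier
    return None
```

Three labels are enough because 4-connected neighbours on a grid always differ by exactly one wavefront step, so the label that precedes a pixel's own label in the cycle identifies a pixel one step closer to S.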

Next, the labels are traced back from the end point "E" in the reverse of the labeling order, that is, in the cyclic order "3" → "2" → "1" → "3", to find a path that reaches the start point "S". This path is a shortest path, but it is not necessarily uniquely determined. To determine it uniquely, priorities are assigned to the directions of the back-trace as shown in FIG. 4, and the trace follows these priorities. FIG. 5 shows the shortest path detected by the back-trace; the label "+" was given to each pixel 16 on the path during the back-trace.
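A back-trace over cyclic labels might look like the sketch below. It assumes a label grid in which 0 means unlabeled and 1 to 3 are the cyclic wavefront labels. The direction priority of FIG. 4 is not reproduced in the text, so the order used here (up, left, down, right) is purely an assumption, as is the function name `back_trace`.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int]
# Hypothetical priority order (up, left, down, right); the patent's
# actual order is given in FIG. 4, which is not available here.
DEFAULT_PRIORITY = ((-1, 0), (0, -1), (1, 0), (0, 1))


def back_trace(
    labels: List[List[int]],
    start: Pixel,
    end: Pixel,
    priority: Sequence[Pixel] = DEFAULT_PRIORITY,
) -> List[Pixel]:
    """Follow the labels backwards from `end` to `start`, choosing among
    equally short alternatives by the given direction priority."""
    h, w = len(labels), len(labels[0])
    path = [end]
    r, c = end
    while (r, c) != start:
        want = (labels[r][c] - 2) % 3 + 1  # previous label in the cycle
        for dr, dc in priority:
            nr, nc = r + dr, c + dc
            in_bounds = 0 <= nr < h and 0 <= nc < w
            if (nr, nc) == start or (in_bounds and labels[nr][nc] == want):
                r, c = nr, nc
                path.append((r, c))
                break
        else:
            raise ValueError("no predecessor label found")
    return path[::-1]  # ordered from start to end
```

Changing `priority` selects a different one of the equally short paths, which is why a fixed order makes the result unique.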

Through the above processing, the area containing one character of the character string is identified as the area enclosed by "+" labels. Cutting out the black pixels within this area separates and extracts the image of one character.
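One way to collect the black pixels of the enclosed area is a flood fill bounded by the path, sketched below in Python. The assumptions are: the frame is a nested list with 0 for white and 1 for black, `wx` is the window column, `path` is a 4-connected list of pixels running from the top to the bottom of that column (supplied here by hand), and the window column itself counts as part of the enclosed area. The name `enclosed_black_pixels` is hypothetical.

```python
from typing import List, Tuple

Image = List[List[int]]  # assumed encoding: 0 = white, 1 = black
Pixel = Tuple[int, int]


def enclosed_black_pixels(frame: Image, wx: int, path: List[Pixel]) -> List[Pixel]:
    """Return the black pixels lying between window column wx and the path."""
    h, w = len(frame), len(frame[0])
    barrier = set(path)  # the "+" labeled path pixels
    # Seed the fill with every window pixel that is not on the path.
    stack = [(r, wx) for r in range(h) if (r, wx) not in barrier]
    seen = set(stack)
    while stack:
        r, c = stack.pop()
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            p = (r + dr, c + dc)
            # Stay inside the frame and never move left of the window.
            if not (0 <= p[0] < h and wx <= p[1] < w):
                continue
            if p in barrier or p in seen:
                continue
            seen.add(p)
            stack.append(p)
    return sorted(p for p in seen if frame[p[0]][p[1]] == 1)
```

The fill passes through black and white pixels alike and is stopped only by the path, so strokes of a neighbouring character on the far side of the path are left untouched.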

FIG. 6 shows the image of the characters 11₂ and 11₃ remaining in the area of the rectangular frame 12 after one character has been separated and extracted.

Each time one character is separated and extracted, the pixels carrying any label other than white or black are converted back to white pixels. Repeating the above processing until no black pixel is detected within the rectangular area separates and extracts the characters of the string one at a time.

That is, as shown in FIG. 7, the window is set for the next character 11₂, the label 17 representing the restriction of the search area for that character is determined, and the procedure of FIGS. 2 to 6 is repeated.
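The repeated procedure as a whole can be sketched in a few dozen lines of Python. This is a simplified illustration under several assumptions, not the patented circuit: the frame already includes the one-pixel white margin, 0 is white and 1 is black, the string runs left to right, and a plain breadth-first search with parent pointers stands in for the cyclic labels and prioritized back-trace. The names `shortest_path` and `split_characters` are hypothetical.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

Image = List[List[int]]  # assumed encoding: 0 = white, 1 = black
Pixel = Tuple[int, int]
STEPS = ((-1, 0), (0, 1), (1, 0), (0, -1))


def shortest_path(frame: Image, wx: int) -> Optional[List[Pixel]]:
    """Shortest 4-connected white path from the top to the bottom of
    column wx that otherwise stays strictly to the right of wx."""
    h, w = len(frame), len(frame[0])
    start, end = (0, wx), (h - 1, wx)
    parent: Dict[Pixel, Optional[Pixel]] = {start: None}
    queue = deque([start])
    while queue:
        cur: Optional[Pixel] = queue.popleft()
        if cur == end:
            path = []
            while cur is not None:
                path.append(cur)
                cur = parent[cur]
            return path[::-1]
        for dr, dc in STEPS:
            nxt = (cur[0] + dr, cur[1] + dc)
            inside = 0 <= nxt[0] < h and wx < nxt[1] < w
            if (inside or nxt == end) and nxt not in parent and frame[nxt[0]][nxt[1]] == 0:
                parent[nxt] = cur
                queue.append(nxt)
    return None


def split_characters(frame: Image) -> List[List[Pixel]]:
    """Extract characters one at a time until no black pixel remains."""
    frame = [row[:] for row in frame]  # work on a copy
    h, w = len(frame), len(frame[0])
    chars: List[List[Pixel]] = []
    while True:
        wx = next((x for x in range(w) if any(row[x] for row in frame)), None)
        if wx is None:
            return chars
        path = shortest_path(frame, wx)
        if path is None:
            raise ValueError("no path found; is the white margin missing?")
        barrier = set(path)
        stack = [(r, wx) for r in range(h) if (r, wx) not in barrier]
        seen = set(stack)
        while stack:  # flood fill between the window and the path
            r, c = stack.pop()
            for dr, dc in STEPS:
                p = (r + dr, c + dc)
                if 0 <= p[0] < h and wx <= p[1] < w and p not in barrier and p not in seen:
                    seen.add(p)
                    stack.append(p)
        pixels = sorted(p for p in seen if frame[p[0]][p[1]] == 1)
        for r, c in pixels:
            frame[r][c] = 0  # erase the extracted character
        chars.append(pixels)
```

Since the labels live in separate structures rather than in the image, the step of converting labeled pixels back to white reduces here to erasing the extracted black pixels.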

FIG. 8 is an explanatory diagram of the configuration of an embodiment of the present invention. In the figure, 21 is a circumscribed rectangle detection circuit that detects the circumscribed rectangle of the input character string image; 22 is an image memory that stores the character string image; 23 is a character cut-out circuit that cuts out the separated characters; 24 is a window scanning circuit that scans the window used to detect the character end position; 25 is a labeling circuit that labels the character end position and assigns the labels used for the shortest path search; 26 is a back-trace circuit that traces back on the basis of the assigned labels; 27 is a clear circuit that erases the separated and extracted characters and the labels; and 28 is an address table that holds the addresses of the circumscribed rectangle of the character string image and of the start and end points of the labeling.

The operation of this configuration is explained with reference to the example of FIGS. 1 to 7.

First, the input character string image 11₁ to 11₃ is stored in the image memory 22 via the circumscribed rectangle detection circuit 21. At this time, the address of the rectangular frame 12, which is one pixel larger in each direction than the circumscribed rectangle of the input character string 11₁ to 11₃ detected by the circumscribed rectangle detection circuit 21, is written into the address table 28.

On the basis of the rectangle address held in the address table 28, the window scanning circuit 24 scans the window 13 from one end of the rectangular area and detects the character end position 15, that is, the position at which a black pixel first appears in the window. It then takes one end of the window 13 at that position 15 as the start point (S) and the other end as the end point (E), and writes their addresses into the address table 28.

On the basis of the start point address, the labeling circuit 25 first assigns to the character end position 15 a label that restricts the shortest path search area and that is used in later processing to cut out the character.

This is the label shown by the "+" marks in FIG. 2; in practice it is a special code that is distinguished from the other labels.

The labeling circuit 25 then assigns labels that cycle "1" → "2" → "3" → "1" to the 4-connected neighboring pixels, starting from the start point (S), until the end point is reached.

When the labeling reaches the end point, the back-trace circuit 26 traces the labels back from the end point (E) in the reverse of the labeling order and finds the path 16 that reaches the start point (S). At the same time, it gives the path 16 thus found the same label "+" as was given to the character end position 15.

The character cut-out circuit 23 cuts out the black pixels 14 in the area enclosed by the "+" labels of the character end position 15 and the "+" labels of the shortest path 16 between the start point (S) and the end point (E), and sends this image 11₁ to a character recognition device or the like connected to its output.

The clear circuit 27 erases the character area enclosed by the "+" labels and the labels used for cutting it out, leaving only the next characters 11₂ and 11₃ as shown in FIG. 6. The above procedure is then repeated for the area delimited by the character end position 17 in FIG. 7.

(6) Effects of the Invention

As explained above, according to the present invention, even when a plurality of characters are written close together, each character can be separated and extracted individually. Free-format handwritten characters can therefore also be separated and extracted individually, which is highly effective when performing character recognition and similar processing on handwritten characters.

[Brief Description of the Drawings]

FIGS. 1 to 7 are explanatory diagrams of the procedure of the present invention, and FIG. 8 is an explanatory diagram of the configuration of an embodiment of the present invention. In the figures, 11₁ to 11₃ are the characters of the character string, 12 is a rectangular frame, 13 is a window, 14 is a black pixel, 15 and 17 are character end positions, 16 is the shortest path, 21 is a circumscribed rectangle detection circuit, 22 is an image memory, 23 is a character cut-out circuit, 24 is a window scanning circuit, 25 is a labeling circuit, 26 is a back-trace circuit, 27 is a clear circuit, and 28 is an address table.

Claims

[Claims]

1. A character separation and extraction method characterized in that, for an image in which a character string consisting of one or more characters has been cut out by a rectangle one pixel larger in each direction than its circumscribed rectangle, a one-pixel-wide window perpendicular to the character string direction and equal in length to the width of the rectangle is scanned from one end of the rectangular area; the position at which a black pixel first appears in the window is detected; the shortest path between the pixels at the two ends of the window at that position is found with the search area restricted to the scanning-direction side of the window position; the area enclosed by the window and the shortest path is regarded as the area containing one character; and such areas are successively separated and extracted, thereby making it possible to extract the areas containing the individual characters from a character string written in close proximity.