JPH04167193A - Character recognizing method - Google Patents

Character recognizing method

Info

Publication number
JPH04167193A
Authority
JP
Japan
Prior art keywords
character
punctuation mark
document
punctuation
orientation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
JP2295733A
Other languages
Japanese (ja)
Other versions
JP3064391B2 (en)
Inventor
Noboru Nakamura
昇 中村
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Panasonic Holdings Corp
Original Assignee
Matsushita Electric Industrial Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Matsushita Electric Industrial Co Ltd
Priority to JP2295733A
Publication of JPH04167193A
Application granted
Publication of JP3064391B2
Anticipated expiration
Expired - Fee Related

Landscapes

  • Character Input (AREA)

Abstract

(57) [Abstract] This publication consists of application data that predates electronic filing, so no abstract data is recorded.

Description

DETAILED DESCRIPTION OF THE INVENTION [Field of Industrial Application] The present invention relates to a character recognition method that reads characters printed on paper or the like as an image and recognizes each character.

[Prior Art] Conventionally, to recognize characters from character patterns, the document to be recognized is placed on an image reading device, the image of the document is read as a bit image, and each character is cut out of that image. The feature values of each character pattern are then compared with the feature values of characters stored in advance, and when a match is found, that character is output as the recognition result.

[Problems to be Solved by the Invention] In the conventional approach, however, characters are not recognized unless the document is placed with the character direction (up/down, left/right) exactly as specified, or else the user has to tell the translation device which way the document has been placed. Alternatively, the dictionary is matched against all four directions, which makes processing slow.

[Means for Solving the Problem] To solve this problem, the present invention identifies the direction of the characters by focusing on the position of punctuation marks.

That is, it is first determined whether the document to be recognized is written vertically or horizontally. Then, based on the size and position of the circumscribed rectangles of the characters that make up the document, a punctuation mark area is obtained: the circumscribed rectangle that a character would have if a character existed at the position corresponding to the punctuation mark. The direction of the characters is identified from the position of the punctuation mark within that area, and the characters are then recognized.

[Operation] A punctuation mark sits at the lower left of its area in horizontal writing and at the upper right in vertical writing, so its position, combined with the information on whether the document is written vertically or horizontally, identifies the direction of the characters.
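The rule above can be sketched as a small lookup. This is an illustrative sketch, not the patent's implementation: the names `CORNERS`, `UPRIGHT_CORNER`, and `rotation_from_punctuation` are hypothetical, and the clockwise-rotation convention is an assumption chosen for the example.

```python
# Corners listed clockwise, starting at the top-left (image coordinates).
CORNERS = ["top-left", "top-right", "bottom-right", "bottom-left"]

# From the text: where the punctuation mark sits when the document is upright.
UPRIGHT_CORNER = {"horizontal": "bottom-left", "vertical": "top-right"}


def rotation_from_punctuation(writing: str, observed_corner: str) -> int:
    """Return the clockwise rotation of the scanned page, in degrees.

    Assumption: rotating the page clockwise by 90 degrees moves every
    corner one step forward in CORNERS.
    """
    expected = CORNERS.index(UPRIGHT_CORNER[writing])
    observed = CORNERS.index(observed_corner)
    return ((observed - expected) % 4) * 90
```

For instance, a mark observed at the lower left gives 0 degrees for horizontal writing and 180 degrees for vertical writing, which agrees with the upright and upside-down portrait cases described in the embodiment.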

[Embodiment] A character recognition method according to one embodiment of the present invention is described below with reference to the drawings.

FIG. 1 is a configuration diagram of a character recognition apparatus that uses the character recognition method of this embodiment. 1 is a photoelectric conversion unit that optically scans a document on which characters are written and converts it into an electrical signal; 2 is a binarization unit that binarizes the photoelectric conversion signal sent from the photoelectric conversion unit 1 into black and white according to a given criterion; 3 is a region division unit that divides the binarized pattern sent from the binarization unit 2 into regions; 4 is a character area extraction unit that extracts character areas from the divided regions; 5 is a vertical/horizontal writing determination unit that creates horizontal and vertical projections of the character area extracted by the character area extraction unit 4 and determines vertical or horizontal writing from the gaps in them; 6 is an orientation determination unit that determines the orientation of the character area extracted by the character area extraction unit 4; and 7 is a character recognition unit that performs character recognition according to the results of the vertical/horizontal writing determination unit 5 and the orientation determination unit 6.
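The criterion used by the binarization unit 2 is not specified in the text. A minimal sketch, assuming a fixed global threshold on 8-bit grayscale values (the function name `binarize` and the default threshold of 128 are hypothetical):

```python
def binarize(gray, threshold=128):
    """Map a grayscale image (rows of 0-255 ints) to 1 = black, 0 = white.

    Assumption: pixels darker than `threshold` count as black.
    """
    return [[1 if px < threshold else 0 for px in row] for row in gray]
```

The later sketches in this description use the same convention: a binary image is a list of rows, with 1 for a black pixel.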

FIG. 2 is a block diagram of the character recognition apparatus of FIG. 1.

8 is a RAM, which stores image data and the like. 9 is a ROM, which stores the character pattern dictionary, programs, and the like. 10 is an RS-232C interface, which receives recognition commands and outputs the recognized characters. 11 is an arithmetic processing unit, which controls the operation of the program. 12 is a scanner, which captures an image into the image memory.

The overall processing flow is described with reference to the flowchart in FIG. 3.

First, the binary data is scanned in both the vertical and horizontal directions, and the image is divided into regions using continuous runs of white or black pixels as region dividing lines (step 1, FIGS. 9 and 10). Each divided region is further divided into small regions of 8-connected black pixels, and regions whose spacing and size distributions show little variation are extracted as character areas (step 2, FIG. 11). The horizontal and vertical projections of each extracted character area are taken, and vertical or horizontal writing is determined from the size of the valleys in them (step 3, FIG. 4).
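Steps 2 and 3 can be illustrated with a short sketch. It assumes a binary image given as a list of rows with 1 for black, and it assumes that the gaps between text lines are wider than the gaps between characters within a line; the text only says the decision is made from the size of the valleys. All function names are hypothetical.

```python
from collections import deque


def connected_boxes(img):
    """Bounding boxes (top, left, bottom, right) of 8-connected black regions."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if img[y][x] != 1 or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(y, x)])
            top, left, bottom, right = y, x, y, x
            while queue:  # breadth-first flood fill over the 8 neighbours
                cy, cx = queue.popleft()
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and img[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def widest_gap(profile):
    """Longest run of zeros lying between two non-zero entries."""
    best = run = 0
    started = False
    for value in profile:
        if value:
            best = max(best, run)
            started, run = True, 0
        elif started:
            run += 1
    return best


def writing_direction(img):
    """Guess 'horizontal' or 'vertical' writing from projection valleys."""
    row_profile = [sum(row) for row in img]        # black pixels per row
    col_profile = [sum(col) for col in zip(*img)]  # black pixels per column
    # Assumption: line spacing is wider than character spacing.
    if widest_gap(row_profile) >= widest_gap(col_profile):
        return "horizontal"
    return "vertical"
```

A tie between the two gap widths is resolved as horizontal here, which is an arbitrary choice made for the sketch.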

Punctuation mark candidates are extracted from the positions and sizes of the 8-connected black-pixel small regions and are then recognized (step 4, FIG. 5). If a candidate is judged to be a punctuation mark from evaluation values such as the similarity of the recognition result, the punctuation mark area is obtained, that is, the circumscribed rectangle a character would have if a character existed at the position of that punctuation mark, and the orientation is determined from the position of the punctuation mark within that area (step 8). If all punctuation mark candidates are rejected, several ordinary character patterns are recognized in four versions: as they are, and rotated by 90, 180, and 270 degrees (step 6). Of these four, the direction with the largest sum of recognition confidence values, such as similarity, is taken as the orientation (step 7). Finally, all characters are recognized on the assumption that the target character area has the determined vertical or horizontal writing and orientation (step 9).
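The two decision paths described above can be sketched as follows. This is an illustration under assumptions, not the patent's implementation: boxes are `(top, left, bottom, right)` tuples in image coordinates, the punctuation mark area is taken as already computed (the text does not give a formula for deriving it from neighbouring rectangles), and the names `corner_in_area` and `best_rotation` are hypothetical.

```python
def corner_in_area(mark, area):
    """Name the quadrant of `area` that contains the centre of `mark`.

    Both boxes are (top, left, bottom, right); y grows downward.
    """
    mark_cy = (mark[0] + mark[2]) / 2
    mark_cx = (mark[1] + mark[3]) / 2
    area_cy = (area[0] + area[2]) / 2
    area_cx = (area[1] + area[3]) / 2
    vertical = "top" if mark_cy < area_cy else "bottom"
    horizontal = "left" if mark_cx < area_cx else "right"
    return f"{vertical}-{horizontal}"


def best_rotation(scores_by_rotation):
    """Fallback when every punctuation candidate is rejected.

    `scores_by_rotation` maps a rotation in degrees to the similarity scores
    of the sample characters recognized at that rotation; the rotation with
    the largest sum is returned.
    """
    return max(scores_by_rotation, key=lambda rot: sum(scores_by_rotation[rot]))
```

The punctuation path needs only one small recognition per candidate, while the fallback runs the recognizer four times on each sample character, which is why the text tries punctuation first.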

The explanation of orientation determination is supplemented with a concrete example. When the punctuation mark is at the lower left as in FIG. 8(a), the document can be judged to be portrait, upright (FIG. 6(a)) if it is written horizontally, and portrait, upside down (rotated 180 degrees, FIG. 6(d)) if it is written vertically. Likewise, when the characters are rotated by 90 degrees as in FIG. 7(b), the document can be judged to be landscape, upside down (FIG. 6(f)) and landscape, upright (FIG. 6(h)), respectively.

[Effects of the Invention] As described above, the present invention determines the orientation of the document from the vertical or horizontal writing information and the position of punctuation marks. This makes fully automatic character recognition possible, which saves labor and eliminates operating errors.

[Brief Description of the Drawings]

FIG. 1 is a configuration diagram of a character recognition apparatus using the character recognition method according to one embodiment of the present invention.
FIG. 2 is a block diagram of the character recognition apparatus according to one embodiment of the present invention.
FIG. 3 is a flowchart of this embodiment.
FIG. 4 is a diagram showing an example of horizontal and vertical projections.
FIG. 5 is an example of the circumscribed rectangles of 8-connected small regions.
FIG. 6 is a diagram showing examples of vertical writing, horizontal writing, and orientation.
FIG. 7 is a diagram showing examples of character patterns coming in from the scanner.
FIG. 8 is a diagram showing examples of punctuation marks coming in from the scanner.
FIG. 9 is a diagram showing an example of an input document.
FIG. 10 is a diagram showing an example of region division applied to the input document of FIG. 9.
FIG. 11 is a diagram showing an example of division into 8-connected black-pixel small regions and extraction of the character areas.

Claims (1)

[Claims]
1) A character recognition method characterized by: reading an image of a document to be recognized; obtaining the circumscribed rectangle of each character in the read image; obtaining, based on the sizes and arrangement of the obtained circumscribed rectangles, a punctuation mark area, which is the circumscribed rectangle a character would have if a character existed at the position corresponding to a punctuation mark in the read image; identifying the orientation of the document to be recognized according to the position of the punctuation mark within the punctuation mark area and the information on whether the document is written vertically or horizontally; and recognizing each character in the read image according to the identified orientation.
2) The character recognition method according to claim 1, characterized in that the punctuation mark area is obtained for a plurality of punctuation marks, and the characters are recognized according to the direction that occurs most often among the document directions obtained from each of them.
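The majority rule of claim 2 can be sketched in a few lines. The function name `majority_orientation` and the representation of each per-mark result as a rotation in degrees are assumptions made for the illustration.

```python
from collections import Counter


def majority_orientation(votes):
    """Return the orientation that occurs most often among per-mark results."""
    return Counter(votes).most_common(1)[0][0]
```

For example, if five punctuation marks yield 0, 0, 180, 0, and 90 degrees, the document is treated as upright (0 degrees), so a single misjudged mark does not flip the result.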
JP2295733A 1990-10-31 1990-10-31 Character recognition method Expired - Fee Related JP3064391B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2295733A JP3064391B2 (en) 1990-10-31 1990-10-31 Character recognition method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP2295733A JP3064391B2 (en) 1990-10-31 1990-10-31 Character recognition method

Publications (2)

Publication Number Publication Date
JPH04167193A (en) 1992-06-15
JP3064391B2 (en) 2000-07-12

Family

ID=17824470

Family Applications (1)

Application Number Title Priority Date Filing Date
JP2295733A Expired - Fee Related JP3064391B2 (en) 1990-10-31 1990-10-31 Character recognition method

Country Status (1)

Country Link
JP (1) JP3064391B2 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH1166213A (en) * 1997-08-26 1999-03-09 Aputasu:Kk Electronic clinical chart device, instruction transmitting method using the same, and computer-readable record medium recorded with medical data
JP2000076380A (en) * 1998-08-31 2000-03-14 Casio Comput Co Ltd Handwritten character input device and storage medium
CN116563879A (en) * 2023-05-12 2023-08-08 东南大学 Method and system for recognizing multi-line text and/or multi-angle text in electrical drawings

Also Published As

Publication number Publication date
JP3064391B2 (en) 2000-07-12

Legal Events

Date Code Title Description
FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20090512

Year of fee payment: 9

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20100512

Year of fee payment: 10

LAPS Cancellation because of no payment of annual fees