JPH04167193A - Character recognizing method - Google Patents
Character recognizing method
Info
- Publication number
- JPH04167193A
- Authority
- JP
- Japan
- Prior art keywords
- character
- punctuation mark
- document
- punctuation
- orientation
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Character Input (AREA)
Abstract
(57) [Summary] This publication contains application data from before electronic filing, so no abstract data is recorded.
Description
DETAILED DESCRIPTION OF THE INVENTION
[Field of Industrial Application]
The present invention relates to a character recognition method that reads characters printed on paper or a similar medium as an image and recognizes each character.
[Prior Art]
Conventionally, to recognize characters from character patterns, the document to be recognized is placed on an image reading device, the image of the document is read as a bit image, and each character is segmented from that image. The feature values of each character pattern are then compared with stored feature values of known characters, and when a match is found, that character is output as the recognition result.
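The conventional matching step described above can be sketched roughly as follows. This is only an illustration: the feature (black-pixel density per quadrant), the Euclidean distance, and the names `extract_features` and `recognize` are assumptions, since the source does not say which features or which matching rule were used.

```python
from math import sqrt


def extract_features(bitmap):
    """Hypothetical feature: black-pixel density in each quadrant of the bitmap."""
    h, w = len(bitmap), len(bitmap[0])
    feats = []
    for r0, r1 in ((0, h // 2), (h // 2, h)):
        for c0, c1 in ((0, w // 2), (w // 2, w)):
            cells = [bitmap[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            feats.append(sum(cells) / len(cells))
    return feats


def recognize(bitmap, dictionary):
    """Return the label whose stored features are nearest to those of the bitmap."""
    feats = extract_features(bitmap)

    def dist(label):
        return sqrt(sum((a - b) ** 2 for a, b in zip(feats, dictionary[label])))

    return min(dictionary, key=dist)


# Two toy 4x4 character patterns (1 = black pixel), used as the stored dictionary.
L_SHAPE = [[1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0], [1, 1, 1, 1]]
T_SHAPE = [[1, 1, 1, 1], [0, 1, 1, 0], [0, 1, 1, 0], [0, 1, 1, 0]]
DICTIONARY = {"L": extract_features(L_SHAPE), "T": extract_features(T_SHAPE)}
```

Note that such a matcher only works when the input has the same orientation as the stored patterns, which is the limitation the invention addresses.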
[Problems to be Solved by the Invention]
In the conventional approach, however, characters were not recognized unless the document was placed with its character direction (top/bottom and left/right) exactly as specified, or else the user had to tell the device in advance which way the document was placed. Alternatively, matching against the dictionary was performed in all four directions, which made processing slow.
[Means for Solving the Problems]
To solve these problems, the present invention identifies the direction of the characters by focusing on the position of punctuation marks.
That is, the method determines whether the document to be recognized is written vertically or horizontally. Based on the size and position of the circumscribed rectangles of the characters that make up the document, it obtains for each punctuation mark a punctuation region: the circumscribed rectangle that a character would have if a character occupied the position corresponding to the punctuation mark. It then identifies the direction of the characters from the position of the punctuation mark within that region and performs character recognition.
[Operation]
A punctuation mark lies at the lower left of its region in horizontal writing and at the upper right in vertical writing, so combining its position with the information on whether the document is written vertically or horizontally identifies the direction of the characters.
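To make the idea of "position within the region" concrete, here is a minimal sketch that reports which corner of a punctuation region a mark occupies. The box layout `(left, top, right, bottom)`, the centre-point comparison, and the function name `punctuation_corner` are assumptions for illustration; the source does not specify how the position is measured.

```python
def punctuation_corner(mark_box, region_box):
    """Return 'upper-left', 'upper-right', 'lower-left' or 'lower-right'.

    Boxes are (left, top, right, bottom) in image coordinates, with y growing
    downward. The corner is chosen by comparing the centre of the mark with
    the centre of the punctuation region.
    """
    ml, mt, mr, mb = mark_box
    rl, rt, rr, rb = region_box
    cx, cy = (ml + mr) / 2, (mt + mb) / 2          # centre of the mark
    mid_x, mid_y = (rl + rr) / 2, (rt + rb) / 2    # centre of the region
    vertical = "upper" if cy < mid_y else "lower"
    horizontal = "left" if cx < mid_x else "right"
    return f"{vertical}-{horizontal}"
```

For example, a small mark near the bottom-left of a 30x30 region, such as `punctuation_corner((2, 22, 8, 28), (0, 0, 30, 30))`, falls in the lower-left corner.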
[Embodiment]
A character recognition method according to an embodiment of the present invention is described below with reference to the drawings.
FIG. 1 is a configuration diagram of a character recognition device that uses the character recognition method of this embodiment. Reference numeral 1 denotes a photoelectric conversion unit that optically scans a document on which characters are written and converts it into an electrical signal. 2 denotes a binarization unit that binarizes the photoelectric conversion signal sent from the photoelectric conversion unit 1 into black and white according to a given criterion. 3 denotes a region division unit that divides the binarized pattern sent from the binarization unit 2 into regions. 4 denotes a character region extraction unit that extracts character regions from the divided regions. 5 denotes a vertical/horizontal writing determination unit that creates horizontal and vertical projections of the character region extracted by the character region extraction unit 4 and determines vertical or horizontal writing from the spacing in those projections. 6 denotes an orientation determination unit that determines the orientation of the character region extracted by the character region extraction unit 4. 7 denotes a character recognition unit that performs character recognition according to the results from the vertical/horizontal writing determination unit 5 and the orientation determination unit 6.
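As a small illustration of the binarization unit, the sketch below thresholds grayscale samples into black and white. The fixed threshold of 128 and the convention 1 = black are assumptions; the source only says that binarization follows "a certain standard".

```python
def binarize(gray, threshold=128):
    """Map rows of grayscale values (0-255) to 1 = black (ink), 0 = white (paper).

    The fixed threshold is a hypothetical criterion chosen for illustration.
    """
    return [[1 if px < threshold else 0 for px in row] for row in gray]
```

The later stages (region division, projections, connected regions) would then operate on this 0/1 grid.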
FIG. 2 is a block diagram of the character recognition device of FIG. 1. 8 is a RAM that stores image data and the like. 9 is a ROM that stores the character pattern dictionary, programs, and the like. 10 is an RS232C interface used for recognition commands and character output. 11 is a processing unit that controls program operation. 12 is a scanner that captures an image into image memory.
The overall flow of processing is described with reference to the flowchart in FIG. 3.
First, the binary data is scanned both vertically and horizontally, and continuous runs of white or black pixels are used as region dividing lines to divide the image into regions (step 1, FIGS. 9 and 10). Each region is further divided into small regions of 8-connected black pixels, and regions whose spacing and size distributions show little variation are extracted as character regions (step 2, FIG. 11). Horizontal and vertical projections of each extracted character region are taken, and vertical or horizontal writing is determined from the size of the valleys in those projections (step 3, FIG. 4).
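Step 3 can be sketched as follows. The decision rule used here (the axis with the wider blank valley is taken as the line gap) is an assumed reading of "the size of the valley", and the names `longest_inner_gap` and `writing_direction` are hypothetical.

```python
def longest_inner_gap(profile):
    """Length of the longest run of zeros lying between two non-zero entries."""
    nonzero = [i for i, v in enumerate(profile) if v]
    if not nonzero:
        return 0
    best = run = 0
    for v in profile[nonzero[0]:nonzero[-1] + 1]:
        run = run + 1 if v == 0 else 0
        best = max(best, run)
    return best


def writing_direction(bitmap):
    """Guess 'horizontal' or 'vertical' writing from projection valleys.

    Assumption: the gap between text lines is wider than the gap between
    characters, so the projection with the wider valley marks the line gaps.
    """
    row_profile = [sum(row) for row in bitmap]         # black pixels per row
    col_profile = [sum(col) for col in zip(*bitmap)]   # black pixels per column
    if longest_inner_gap(row_profile) >= longest_inner_gap(col_profile):
        return "horizontal"
    return "vertical"
```

In a page whose text lines are separated by two blank rows while characters are separated by one blank column, the row profile has the wider valley and the function returns "horizontal".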
Punctuation mark candidates are extracted from the position and size of the 8-connected black-pixel small regions and recognized (step 4, FIG. 5). If a candidate is judged to be a punctuation mark from evaluation values such as the similarity of the recognition result, the punctuation region is obtained, that is, the circumscribed rectangle a character would have if it occupied the position of the punctuation mark, and the orientation is determined from the position of the punctuation mark within that region (step 8). If all punctuation candidates are rejected, several ordinary character patterns are recognized in four forms: as they are and rotated by 90, 180, and 270 degrees (step 6). The orientation is then determined as the direction, among those four, with the largest sum of recognition confidence values such as similarity (step 7). Finally, all characters are recognized on the assumption that the target character region has the determined vertical or horizontal writing and orientation (step 9).
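The fallback in steps 6 and 7 can be sketched as below: a few sample patterns are scored at four rotations and the rotation with the largest summed score wins. The `similarity` callback stands in for the recognizer's confidence value and is hypothetical, as are the function names and the choice to express the result as a clockwise angle.

```python
def rotate90_cw(bitmap):
    """Rotate a 2-D list of pixels 90 degrees clockwise."""
    return [list(row) for row in zip(*bitmap[::-1])]


def estimate_orientation(samples, similarity):
    """Return the clockwise rotation (0, 90, 180 or 270) that scores best.

    samples    -- list of character bitmaps taken from the scanned page
    similarity -- hypothetical callback: bitmap -> recognition score
    The returned angle is the rotation to apply to the scanned patterns so
    that the summed similarity is largest.
    """
    totals = {}
    current = list(samples)
    for angle in (0, 90, 180, 270):
        totals[angle] = sum(similarity(b) for b in current)
        current = [rotate90_cw(b) for b in current]
    return max(totals, key=totals.get)
```

This path costs four recognition passes over the samples, which is why the method prefers the punctuation-based decision and uses this only when every punctuation candidate is rejected.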
The orientation determination is illustrated further with a concrete example. If the punctuation mark is at the lower left as in FIG. 8(a), the document is determined to be FIG. 6(a), portrait and upright, for horizontal writing, and FIG. 6(d), portrait and upside down (rotated 180 degrees), for vertical writing. If the characters are rotated 90 degrees as in FIG. 7(b), the document is determined to be FIG. 6(f), landscape and reversed, and FIG. 6(h), landscape and upright, respectively.
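The decision above can be written as a small lookup. The source states explicitly only that a mark sits at the lower left in upright horizontal writing and at the upper right in upright vertical writing, and that a lower-left mark in vertical writing means a 180-degree rotation. The remaining cases in this sketch are derived geometrically by rotating the corner, under the assumptions that `writing` is the document's own writing style and that angles are measured clockwise; the names are hypothetical.

```python
# Corners listed in the order a point visits them as the page turns clockwise.
CORNERS_CW = ["upper-left", "upper-right", "lower-right", "lower-left"]

# Corner occupied by a punctuation mark when the document is upright.
UPRIGHT_CORNER = {"horizontal": "lower-left", "vertical": "upper-right"}


def page_rotation(writing, corner):
    """Clockwise rotation in degrees of the scanned page relative to upright.

    writing -- 'horizontal' or 'vertical' (the document's writing style)
    corner  -- corner of the punctuation region where the mark was found
    """
    start = CORNERS_CW.index(UPRIGHT_CORNER[writing])
    seen = CORNERS_CW.index(corner)
    # Each 90-degree clockwise turn moves the mark one step along CORNERS_CW.
    return ((seen - start) % 4) * 90
```

With this table, `page_rotation("horizontal", "lower-left")` gives 0 and `page_rotation("vertical", "lower-left")` gives 180, matching the two cases stated in the text. A corner alone is ambiguous, which is why the writing direction is needed as a second input.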
[Effects of the Invention]
As described above, the present invention determines the orientation of the document from the vertical/horizontal writing information and the position of punctuation marks, which makes fully automatic character recognition possible and yields benefits such as labor savings and the elimination of operating errors.
FIG. 1 is a configuration diagram of a character recognition device using the character recognition method according to an embodiment of the present invention. FIG. 2 is a block diagram of the character recognition device according to an embodiment of the present invention. FIG. 3 is a flowchart of this embodiment. FIG. 4 shows an example of horizontal and vertical projections. FIG. 5 shows an example of circumscribed rectangles of 8-connected small regions. FIG. 6 shows examples of vertical writing, horizontal writing, and orientation. FIG. 7 shows examples of character patterns coming in from the scanner. FIG. 8 likewise shows examples of punctuation marks. FIG. 9 shows an example of an input document. FIG. 10 shows an example of region division applied to the input document of FIG. 9. FIG. 11 shows an example of dividing the image into 8-connected black-pixel small regions and extracting the character regions.
Claims (1)
[Claims]
1) A character recognition method characterized by: reading an image of a document to be recognized; obtaining the circumscribed rectangle of each character in the read image; obtaining, based on the size and arrangement of the obtained circumscribed rectangles, a punctuation region that is the circumscribed rectangle a character would have if a character existed at the position corresponding to a punctuation mark in the read image; identifying the orientation of the document to be recognized according to the position of the punctuation mark within the punctuation region and the information on whether the document is written vertically or horizontally; and recognizing each character in the read image according to the identified orientation.
2) The character recognition method according to claim 1, characterized in that the punctuation region is obtained for each of a plurality of punctuation marks, and characters are recognized according to the most frequent of the document directions obtained from them.
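The majority rule in claim 2 amounts to a simple vote over the per-mark results. A minimal sketch, assuming each punctuation mark has already produced one orientation estimate (the function name and the use of angles as labels are hypothetical):

```python
from collections import Counter


def vote_orientation(estimates):
    """Return the most frequent orientation among per-mark estimates.

    Returns None when no punctuation mark produced an estimate.
    """
    if not estimates:
        return None
    return Counter(estimates).most_common(1)[0][0]
```

Voting makes the decision tolerant of a few misrecognized marks, for example a speck of noise mistaken for a punctuation mark.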
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP2295733A JP3064391B2 (en) | 1990-10-31 | 1990-10-31 | Character recognition method |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP2295733A JP3064391B2 (en) | 1990-10-31 | 1990-10-31 | Character recognition method |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| JPH04167193A (en) | 1992-06-15 |
| JP3064391B2 (en) | 2000-07-12 |
Family
ID=17824470
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP2295733A Expired - Fee Related JP3064391B2 (en) | 1990-10-31 | 1990-10-31 | Character recognition method |
Country Status (1)
| Country | Link |
|---|---|
| JP (1) | JP3064391B2 (en) |
Cited By (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JPH1166213A (en) * | 1997-08-26 | 1999-03-09 | Aputasu:Kk | Electronic clinical chart device, instruction transmitting method using the same, and computer-readable record medium recorded with medical data |
| JP2000076380A (en) * | 1998-08-31 | 2000-03-14 | Casio Comput Co Ltd | Handwritten character input device and storage medium |
| CN116563879A (en) * | 2023-05-12 | 2023-08-08 | 东南大学 | Method and system for recognizing multi-line text and/or multi-angle text in electrical drawings |
- 1990-10-31: JP application JP2295733A, patent JP3064391B2 (en), not active, Expired - Fee Related
Also Published As
| Publication number | Publication date |
|---|---|
| JP3064391B2 (en) | 2000-07-12 |
Similar Documents
| Publication | Title | Publication Date |
|---|---|---|
| JP2986383B2 (en) | Method and apparatus for correcting skew for line scan images | |
| JPH01253077A (en) | Detection of string | |
| JP3006466B2 (en) | Character input device | |
| JPH04167193A (en) | Character recognizing method | |
| JP3268552B2 (en) | Area extraction method, destination area extraction method, destination area extraction apparatus, and image processing apparatus | |
| US7103220B2 (en) | Image processing apparatus, method and program, and storage medium | |
| JPS63304387A (en) | document reading device | |
| JP7532124B2 (en) | Information processing device, information processing method, and program | |
| JPH08171609A (en) | High-speed character string extraction device | |
| JP3239965B2 (en) | Character recognition device | |
| JP2800205B2 (en) | Image processing device | |
| JP2571826B2 (en) | String pattern extraction device | |
| JPH07109612B2 (en) | Image processing method | |
| JPH11250179A (en) | Character reocognition device and its method | |
| JP3199033B2 (en) | Optical character reading method and optical character reading device | |
| JP2721415B2 (en) | Character image extraction method | |
| JPH04252665A (en) | Image processing device | |
| JP2954218B2 (en) | Image processing method and apparatus | |
| JPH04262660A (en) | Image recognition output device | |
| JPH083830B2 (en) | Character string pattern cutting method | |
| JPH01245376A (en) | Character segmenting device for character reader | |
| JPS6327751B2 (en) | ||
| JPS61165184A (en) | Automatic detecting system of reference point | |
| JPH01144181A (en) | Optical character reader | |
| JPH0578068B2 (en) |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | FPAY | Renewal fee payment (event date is renewal date of database) | Free format text: PAYMENT UNTIL: 20090512; Year of fee payment: 9 |
| | FPAY | Renewal fee payment (event date is renewal date of database) | Free format text: PAYMENT UNTIL: 20100512; Year of fee payment: 10 |
| | LAPS | Cancellation because of no payment of annual fees | |