JPS60201486A

JPS60201486A - Handwritten document reading method

Info

Publication number: JPS60201486A
Application number: JP59057420A
Authority: JP
Inventors: Shigeru Goto; 茂後藤; Yoshiyuki Yamashita; 山下　義征
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 1984-03-27
Filing date: 1984-03-27
Publication date: 1985-10-11
Also published as: JPH0330191B2

Abstract

PURPOSE: To read a handwritten Japanese document at high speed and with high precision by detecting any character type designation in advance, detecting the character line amount of each character that has no designation in order to select its character type, and identifying the character with a dictionary specific to that character type. CONSTITUTION: A character type designation detection section 5 divides a pattern register 2 into four areas corresponding to a character type designation entry frame 22, counts the number of black points in each area, compares each count with a threshold value to detect whether each character type is designated, and transmits the result to an identification section 6. At the same time, a character line amount detection section 4 detects the character line amount from the character pattern in the pattern register 2, normalizes it by the size of the character to obtain the complexity D of the character, and feeds D to the character type designation detection section 5. When no character type designation is detected in the character type designation area, the character type designation detection section 5 decides the character type from the complexity D and feeds the result to the identification section 6. The identification section 6 collates the features fed from a feature extraction section 3 with a dictionary, using the dictionary memory of the character type designated in advance for each character.

Description

DETAILED DESCRIPTION OF THE INVENTION (Technical Field) The present invention relates to a method for reading handwritten documents at high speed and with high accuracy.

(Background Art) A method has previously been proposed for reading handwritten documents in which the presence or absence of a designation is detected in an entry frame that designates the character type of each written character, and the character in any column where a designation is detected is read by referring only to the dictionary of the designated character type. However, this method has the problem that when no character type is designated, all dictionaries must be referenced, which slows processing.

(Object and Summary of the Invention) An object of the present invention is to remedy the above drawback of the prior art and provide a method for reading handwritten documents at high speed and with high accuracy. Its feature is that the character line amount of a character is detected and taken as the complexity of that character, and, when no character type is designated, the dictionary of the character type that contains the character is selected according to the complexity and identification is performed with it.

(Embodiment of the Invention) FIG. 1 is a block diagram showing an embodiment of the handwritten Japanese document reading method of the present invention. In the figure, 1 is a photoelectric conversion unit, 2 is a pattern register, 3 is a feature extraction unit, 4 is a character line amount detection unit, 5 is a character type detection unit, 6 is an identification unit, 7 is a character name output, 8 is a hiragana dictionary memory, 9 is a katakana dictionary memory, 10 is an alphanumeric and symbol dictionary memory, and 11 is a kanji dictionary memory.

FIG. 2 shows an example of the form used in this embodiment. 21 is the form and 22 is a character type designation entry frame, within which 23 is a kanji designation column, 24 is a hiragana designation column, 25 is a katakana designation column, and 26 is an alphanumeric and symbol designation column. 27 is a character entry frame, 28 is a character type designation entry line, and 29 is a character entry line. The form is filled in, for example, as in the entry example of FIG. 3.

The operation of the present invention is described below using this example form.

First, for the character type designation entry line 28 of the form 21 in FIG. 2, the presence or absence of a character type designation corresponding to each character of the character entry line 29 is detected and sent to the identification unit 6. In this operation, the photoelectric conversion unit 1 performs photoelectric conversion on the character type designation entry line 28 to produce a binary quantized electric signal, and the area for one character is cut out and stored in the pattern register 2.

The character type designation detection unit 5 divides the pattern register 2 into four areas corresponding to the character type designation entry frame 22, counts the number of black dots in each area (dots on character lines are black dots), compares each count with a threshold value to detect whether each character type is designated, and sends the presence or absence of a designation for each character type (kanji, hiragana, katakana, symbols, and so on) to the identification unit 6. This operation detects the character type designation corresponding to each character of the character entry line 29. Next, the character entry line 29 of FIG. 2 is read. The photoelectric conversion unit 1 performs photoelectric conversion on the character entry line 29 to produce a binary quantized electric signal, and the area for one character is cut out and stored in the pattern register 2. The feature extraction unit 3 extracts various features from the character pattern in the pattern register 2 and sends them to the identification unit 6.
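The designation detection step described above can be illustrated with a minimal Python sketch. It is not the patent's implementation: the equal-width vertical split, the left-to-right order of the character types, the use of "greater than or equal" for the threshold comparison, and the names `CHARACTER_TYPES` and `detect_designation` are all assumptions made for illustration.

```python
from typing import Dict, List

# Assumed left-to-right order of the four designation areas (hypothetical).
CHARACTER_TYPES = ["kanji", "hiragana", "katakana", "alphanumeric"]


def detect_designation(pattern: List[List[int]], threshold: int) -> Dict[str, bool]:
    """Split a binary pattern (1 = black dot) into four equal-width vertical
    areas and flag each area whose black-dot count reaches the threshold."""
    width = len(pattern[0])
    flags: Dict[str, bool] = {}
    for i, name in enumerate(CHARACTER_TYPES):
        left = i * width // 4
        right = (i + 1) * width // 4
        # Count the black dots that fall inside this area.
        black = sum(sum(row[left:right]) for row in pattern)
        flags[name] = black >= threshold  # assumed comparison direction
    return flags
```

With an 8-column pattern whose only black dots lie in columns 2 and 3, only the second area is flagged, so a mark written in one designation column does not trigger the others.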

At the same time, the character line amount detection unit 4 detects the character line amount from the character pattern in the pattern register 2 and normalizes it by the size of the character to obtain the complexity D of the character. The complexity is expressed by the following equation.

Here, K is a constant for converting D into an integer, A is the total number of black dots within the character frame, PB is the height of the circumscribed frame of the character, and PR is likewise its width. WL is the line width of the character and is obtained from the following equation.

Here, Q is the number of positions at which all four points are black when every point within the character frame is observed through a 2×2 window.
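The quantities defined above (A, the height and width of the circumscribed frame, and Q) can be measured from a binary pattern as in the following Python sketch. The function name `stroke_measurements` and the list-of-lists pattern representation are assumptions for illustration. The sketch stops at the raw measurements and does not combine them into D or WL.

```python
from typing import List, Tuple


def stroke_measurements(pattern: List[List[int]]) -> Tuple[int, int, int, int]:
    """Return (A, height, width, Q) for a binary pattern (1 = black dot).

    A      : total number of black dots
    height : height of the circumscribed (bounding) frame of the black dots
    width  : width of the circumscribed frame
    Q      : number of 2x2 window positions where all four dots are black
    """
    n_rows, n_cols = len(pattern), len(pattern[0])
    rows = [r for r in range(n_rows) if any(pattern[r])]
    cols = [c for c in range(n_cols) if any(row[c] for row in pattern)]
    if not rows:
        return 0, 0, 0, 0  # blank pattern: nothing to measure
    a = sum(sum(row) for row in pattern)
    height = rows[-1] - rows[0] + 1
    width = cols[-1] - cols[0] + 1
    # Slide a 2x2 window over every position and count fully black windows.
    q = sum(
        1
        for r in range(n_rows - 1)
        for c in range(n_cols - 1)
        if pattern[r][c] and pattern[r][c + 1]
        and pattern[r + 1][c] and pattern[r + 1][c + 1]
    )
    return a, height, width, q
```

For a 2-by-3 block of black dots, the sketch gives A = 6, a frame of height 2 and width 3, and Q = 2, because two 2×2 windows fit entirely inside the block.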

Once the complexity D of the character has been detected, it is sent to the character type designation detection unit 5. When the character type designation detection unit 5 cannot detect a character type designation in the character type designation area, it evaluates the following conditions using the complexity D, determines the character type, and sends it to the identification unit.

D < a: all dictionaries are referenced.

D ≥ a: the character type is taken to be kanji, and the kanji dictionary is referenced.

In this embodiment, a = 10 and K = 5.
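The two conditions above amount to a single threshold test on D. A minimal Python sketch follows, using a = 10 from the embodiment as the default. The function name `dictionaries_for_complexity` and the dictionary labels are hypothetical.

```python
from typing import List

# Labels for the four dictionary memories (names chosen for illustration).
ALL_DICTIONARIES = ["hiragana", "katakana", "alphanumeric", "kanji"]


def dictionaries_for_complexity(d: int, a: int = 10) -> List[str]:
    """Choose which dictionaries to reference for a character that has no
    character type designation, based on its complexity D."""
    if d >= a:
        # Complex character: treated as kanji, so only the kanji dictionary is used.
        return ["kanji"]
    # Otherwise the character type stays undecided and every dictionary is used.
    return list(ALL_DICTIONARIES)
```

Under this rule, only characters at or above the threshold benefit from the narrowed search; characters below it are still matched against all four dictionaries.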

The identification unit 6 collates the features sent from the feature extraction unit 3 with the dictionary and finally outputs the category name of one character to the character name output 7.

Four dictionary memories are provided for use in the identification unit 6: the hiragana dictionary memory 8, the katakana dictionary memory 9, the alphanumeric and symbol dictionary memory 10, and the kanji dictionary memory 11. The features sent from the feature extraction unit 3 are collated using the dictionary memory of the character type that was designated in advance for each character.
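Why restricting collation to the selected dictionary saves work can be seen in the following Python sketch. The text does not state how features are compared, so the squared Euclidean distance, the feature vectors, and the names `Dictionary` and `identify` are assumptions for illustration only.

```python
from typing import Dict, List, Sequence, Tuple

# One dictionary memory: category name -> reference feature vector (assumed form).
Dictionary = Dict[str, Sequence[float]]


def identify(
    feature: Sequence[float],
    dictionaries: Dict[str, Dictionary],
    selected: List[str],
) -> Tuple[str, int]:
    """Collate a feature vector against the selected dictionaries only.

    Returns the best-matching category name and the number of comparisons made.
    """
    best_name, best_dist, comparisons = "", float("inf"), 0
    for dict_name in selected:
        for category, reference in dictionaries[dict_name].items():
            # Squared Euclidean distance, used here as a stand-in measure.
            dist = sum((x - y) ** 2 for x, y in zip(feature, reference))
            comparisons += 1
            if dist < best_dist:
                best_name, best_dist = category, dist
    return best_name, comparisons
```

The comparison count returned alongside the category makes the saving visible: collating against one selected dictionary touches only that dictionary's entries, while collating against all of them touches every entry.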

(Effects of the Invention) As described in detail above, the present invention detects character type designations in advance and, for characters without a designation, detects the character line amount of the character to select the character type, then identifies the character with a dictionary suited to that character type. Reading is therefore fast and accurate, which makes it possible to read handwritten Japanese documents at high speed and with high accuracy.

[Brief explanation of drawings]

FIG. 1 is a block diagram showing an embodiment of the handwritten document reading method according to the present invention, FIG. 2 is a diagram showing an example of a form used in the embodiment of the present invention, and FIG. 3 is a diagram showing an example of how the form is filled in.

1: photoelectric conversion unit, 2: pattern register, 3: feature extraction unit, 4: character line amount detection unit, 5: character type designation detection unit, 6: identification unit, 7: character name output, 8: hiragana dictionary memory, 9: katakana dictionary memory, 10: alphanumeric and symbol dictionary memory, 11: kanji dictionary memory, 21: form, 22: character type designation entry frame, 23: hiragana designation column, 24: katakana designation column, 25: alphanumeric and symbol designation column, 26: kanji designation column, 27: character entry frame, 28: character type designation entry frame, 29: character entry line.

Patent applicant: Oki Electric Industry Co., Ltd. Patent application agent: patent attorney Keiichi Yamamoto (山本 恵一)

Procedural Amendment (Voluntary), August 14, 1984 (Showa 59)

To: Manabu Shiga, Commissioner of the Patent Office

1. Indication of the case: 1984 (Showa 59) Patent Application No. 57420
2. Title of the invention: Handwritten document reading method
3. Person making the amendment: Relationship to the case: patent applicant. Name: (029) Oki Electric Industry Co., Ltd.
5. Subject of the amendment: the Claims, Detailed Description of the Invention, and Brief Description of the Drawings sections of the specification, and the drawings
6. Contents of the amendment

(1) The claims are amended as shown in the attached sheet.
(2) "One embodiment of the present invention ..." at page 2, line 20 through page 3, line 1 of the specification is amended to "an embodiment of the handwritten Japanese document reading method according to the present invention".
(3) "字種指定" (character type designation) at page 3, lines 8, 11, 16, and 19 to 20; page 4, line 3 and lines 3 to 4; page 5, lines 12 and 13; and page 7, lines 5, 9, and 12 is amended to "文字種指定".
(4) "パターン" (pattern) at page 4, lines 2, 3, 14, 15, 16, 18, and 19 and page 7, line 4 is amended to "バタン".
(5) "文種" at page 4, line 6 is amended to "文字種" (character type).
(6) "the said character type to the identification unit 6" at page 4, line 7 is amended to "the said character type".
(7) "sends." at page 4, lines 8 to 9 is amended to "stores in the character type designation memory in the character type detection unit 5."
(8) "the character type designation" at page 4, line 10 is amended to "the presence or absence of the character type designation".
(9) "に文字" at page 4, line 20 is amended to "により文字" (so that the passage reads "by normalizing ... the character").
(10) "a character type designation in the character type designation area" at page 5, line 14 is amended to "sequentially refers to the character type designation memory and sends the character type designation to the identification unit 6, and a character type designation, as shown at 22 in FIG. 3, in the character type designation area".
(11) "to the identification unit" at page 5, line 16 is amended to "to the identification unit 6".
(12) FIG. 3 of the drawings is amended as shown in the attached sheet.

End

Claims (attached sheet)

(1) A handwritten document reading method for a handwritten Japanese document having a character frame in which a character is written and a character type designation area provided near the character frame for designating a character type, in which a handwritten character written in the character frame is recognized using the dictionary of the character type designated in the character type designation area, characterized in that, when no character type is designated, the character line amount of the handwritten character is obtained as the complexity of the character, a dictionary of the character type corresponding to the complexity is selected, and the handwritten character is recognized using the selected dictionary.
(2) The handwritten document reading method according to claim 1, characterized in that the dictionaries are provided respectively for hiragana, katakana, alphanumeric characters, and kanji.

Claims

[Claims]

(1) A handwritten document reading method for a handwritten Japanese document having a character frame in which a character is written and a character type designation area provided near the character frame for designating a character type, in which a handwritten character written in the character frame is recognized using the dictionary of the character type designated in the character type designation area, characterized in that, when no character type is designated, the character line amount of the handwritten character is obtained as the complexity of the character, a dictionary of the character type corresponding to the complexity is selected, and the handwritten character is recognized using the selected dictionary.

(2) The handwritten document reading method according to claim 1, characterized in that the dictionaries are provided respectively for hiragana, katakana, alphanumeric characters, and kanji.