JPH03214286A

JPH03214286A - Character recognizing device

Info

Publication number: JPH03214286A
Application number: JP2009623A
Authority: JP
Inventors: Keiko Abe; 阿部　惠子
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 1990-01-19
Filing date: 1990-01-19
Publication date: 1991-09-19
Anticipated expiration: 2014-05-24
Also published as: JP2893781B2

Abstract

PURPOSE: To improve the identification rate of separated characters by registering the image data of a misidentified separated character in a recognition dictionary part. CONSTITUTION: The segmented character signals S7 that form the whole of a misidentified separated character are supplied to a dictionary production part 29, so that the entire image of the separated character is registered in a recognition dictionary part 30. When this character is next to be identified, a character recognizing part 28 integrates the signals S7 forming each component part of the separated character and can identify it accurately by referring to the part 30. In this way, the identification rate for separated characters is improved.

Description

DETAILED DESCRIPTION OF THE INVENTION

[Field of industrial application]

The present invention relates to a character recognition device suitable for use, for example, in recognizing characters in a printed document and converting them into character codes.

[Summary of the invention]

The present invention relates to a character recognition device suitable for use, for example, in recognizing characters in a printed document and converting them into character codes. The device has a character extraction unit that sequentially cuts out, from an original character signal corresponding to the image of an original document, cutout character signals each forming one character or one component of a separated character; a recognition dictionary unit that stores image data corresponding to character codes; a character identification unit that uses the recognition dictionary unit to identify the character code corresponding to one or more of the cutout character signals; and a dictionary creation unit that registers the image data of the cutout character signals supplied from the character extraction unit in the recognition dictionary unit. When the character identification unit misidentifies a separated character, the cutout character signals forming the whole of that separated character are supplied from the character extraction unit to the dictionary creation unit, so that the image data of the misidentified separated character is registered in the recognition dictionary unit. The identification rate of separated characters can thereby be raised easily.

[Conventional technology]

For example, to automate the step in letterpress printing in which a worker picks type, a character recognition device is needed that recognizes each character of a manuscript produced by typewriting or the like and converts it into a character code.

Fig. 6 shows a conventional character recognition device disclosed in Japanese Unexamined Patent Publication No. 62-74181. In Fig. 6, (1) is a document reading unit, which supplies an original character signal S1 corresponding to the light and dark of one page of the document to a character string extraction unit (2). The original character signal S1 is obtained by decomposing the document into dots at a predetermined density and representing black dots by the high level "1" and white dots by the low level "0", although the density of each dot may instead be represented by a multi-bit binary number.

The character string extraction unit (2) consists of a first-stage preprocessing unit (3), a second-stage preprocessing unit (4), and a third-stage preprocessing unit (5). In the first-stage preprocessing unit (3), noise is removed from the original character signal S1 and the rotation of the document is corrected. In the second-stage preprocessing unit (4), the character area AR (see Fig. 7) is separated from other areas (photographs, drawings, and so on) and only the image data contained in the character area AR is extracted. In the third-stage preprocessing unit (5), a character string signal S4 corresponding to the character strings AR1, AR2, ... contained in the extracted character area AR is extracted.

To extract the character string signal S4, as shown in Fig. 7, the position of each dot in the character area AR is expressed in (X, Y) coordinates, with the X axis taken horizontally and the Y axis vertically, and the "1" or "0" value of each dot is projected onto the Y axis and summed to generate a Y projection signal Sy. When this Y projection signal Sy is binarized at a predetermined threshold level, the sections in which the binarized signal is at the high level "1" correspond to the character strings AR1, AR2, ..., respectively. The character string signal S4 is supplied to the subsequent character extraction unit (6).
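The Y projection step can be illustrated with a short sketch. This is not the circuit of the cited publication; it assumes the character area is held as a list of rows of 0/1 dots, and the names `y_projection`, `binarize`, `runs_of_ones`, and `find_text_lines` are hypothetical.

```python
def y_projection(image):
    """Sum the dot values of each row (projection onto the Y axis)."""
    return [sum(row) for row in image]


def binarize(profile, threshold):
    """Return 1 where the profile reaches the threshold, otherwise 0."""
    return [1 if value >= threshold else 0 for value in profile]


def runs_of_ones(bits):
    """Return (start, end) pairs, end exclusive, for each run of 1s."""
    runs, start = [], None
    for i, bit in enumerate(bits):
        if bit and start is None:
            start = i
        elif not bit and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(bits)))
    return runs


def find_text_lines(image, threshold=1):
    """Row ranges where the binarized Y projection is at level 1."""
    return runs_of_ones(binarize(y_projection(image), threshold))


# A tiny 6 x 5 "page" with two character strings (assumed sample data).
page = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1],
    [0, 1, 0, 0, 1],
    [0, 0, 0, 0, 0],
    [1, 1, 0, 1, 0],
    [0, 0, 0, 0, 0],
]
lines = find_text_lines(page)  # [(1, 3), (4, 5)]
```

Each returned pair marks the rows of one character string, playing the role of AR1, AR2, and so on in the text.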

In the character extraction unit (6), the character string signal S4 of, for example, the first character string AR1 shown in Fig. 8A is projected onto the X axis to generate an X projection signal Sx. Binarizing this X projection signal Sx with a threshold TH1 at the minimum level (value 1) yields a coarse cutout signal DT1 (Fig. 8C), and binarizing it with a threshold TH2 at a medium level (Fig. 8D) yields a fine cutout signal DT2 (Fig. 8E). Similarly, a cutout signal in the Y direction can be generated by producing a Y projection signal Sy individually for each section in which the coarse cutout signal DT1 is at the high level "1".
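The two-threshold cut can be sketched as follows. The sample line image and the choice TH2 = 2 are assumptions made only for illustration, and the function names are hypothetical.

```python
def x_projection(line_image):
    """Sum the dot values of each column (projection onto the X axis)."""
    return [sum(column) for column in zip(*line_image)]


def cut_sections(profile, threshold):
    """Return (start, end) column ranges, end exclusive, where
    the profile is at or above the threshold."""
    sections, start = [], None
    for i, value in enumerate(profile):
        high = value >= threshold
        if high and start is None:
            start = i
        elif not high and start is not None:
            sections.append((start, i))
            start = None
    if start is not None:
        sections.append((start, len(profile)))
    return sections


# One character string, 3 dots high and 6 dots wide (assumed sample data).
line = [
    [1, 1, 0, 1, 0, 1],
    [1, 1, 1, 1, 0, 1],
    [1, 0, 0, 1, 0, 0],
]
sx = x_projection(line)        # [3, 2, 1, 3, 0, 2]
coarse = cut_sections(sx, 1)   # TH1 = 1 -> [(0, 4), (5, 6)]
fine = cut_sections(sx, 2)     # TH2 = 2 -> [(0, 2), (3, 4), (5, 6)]
```

The minimum threshold only separates marks divided by a completely blank column, while the higher threshold also splits marks joined by a thin bridge of dots, which is why the fine signal is suited to examining finer structure.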

Finally, as shown in Fig. 8A, a cutout signal is obtained that is at the high level "1" inside the circumscribing frame (9) of, for example, the character "て" (te), and, for the separated character "い" (i), inside the circumscribing frames (11) and (12) that circumscribe its separate parts. The signal obtained by sequentially cutting out from the input character string signal S4 only the portions where this cutout signal is at the high level "1" is the basic rectangular cutout character signal S7.

The fine cutout signal DT2 of Fig. 8E is used when examining the finer structure of each character. Because the separated character "い" (i) in Fig. 8A has two circumscribing frames (11) and (12), they must be integrated later, at the character identification stage.

(7) denotes a character identification unit, which takes in the basic rectangular cutout character signal S7 frame by frame and performs character recognition. Specifically, it first classifies by position: for the character string AR1 of Fig. 8A, characters lying in the upper half (an apostrophe, a double quotation mark, "゜", and the like) and characters lying in the lower half (the ideographic full stop "。", the period, the comma, and the like) are treated as first characteristic characters, pattern matching is performed on them, and the corresponding character code (such as a JIS code) is assigned. If a character cannot be identified in this way, it is classified by aspect ratio h/w and relative size, where w is the width and h the height of its circumscribing frame. That is, it is classified according to whether the aspect ratio h/w falls in the range 0 < h/w < 0.5 or in the range 1.5 < h/w. Further, with w_R the width and h_R the height of an average-sized circumscribing frame, it is classified according to whether the relative height h/h_R and the relative width w/w_R fall in the ranges 0 < h/h_R < 0.5 and 0 < w/w_R < 0.5, respectively. Characters falling in these ranges are treated as second characteristic characters and pattern matching is performed on them.
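The shape tests above can be written out as a small sketch. The text does not state how the individual tests are combined, so the code simply reports which ranges a frame falls in; the labels "flat", "tall", and "small" and the function name `shape_labels` are assumptions.

```python
def shape_labels(w, h, w_ref, h_ref):
    """Report which of the ranges in the text a circumscribing frame falls in.

    w, h         : width and height of the character's circumscribing frame
    w_ref, h_ref : width and height of an average-sized frame (w_R, h_R)
    All four values are assumed to be positive.
    """
    labels = []
    aspect = h / w
    if 0 < aspect < 0.5:
        labels.append("flat")    # 0 < h/w < 0.5
    if aspect > 1.5:
        labels.append("tall")    # 1.5 < h/w
    if 0 < h / h_ref < 0.5 and 0 < w / w_ref < 0.5:
        labels.append("small")   # both relative ratios below 0.5
    return labels


# Frames measured against an assumed 20 x 20 average frame.
dash = shape_labels(w=20, h=4, w_ref=20, h_ref=20)    # ["flat"]
bar = shape_labels(w=4, h=20, w_ref=20, h_ref=20)     # ["tall"]
dot = shape_labels(w=6, h=6, w_ref=20, h_ref=20)      # ["small"]
kanji = shape_labels(w=20, h=20, w_ref=20, h_ref=20)  # []
```

A frame with at least one label would be a candidate second characteristic character under this reading; a frame with none goes on to the general pattern matching.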

Characters not classified as first or second characteristic characters are pattern-matched against individually stored dot patterns, and when a predetermined degree of match is obtained, the corresponding character code is assigned.

If characters that cannot be identified still remain, re-extraction, which further divides the circumscribing frame into several smaller circumscribing frames, and integration, which merges it with the following circumscribing frame, are performed. A character that ultimately cannot be recognized is given a reject code indicating that it is unrecognizable.

The character codes for one page of the document generated by the character identification unit (7) are stored in a predetermined storage device together with information indicating the position and size of each character. In addition, so that the operator can judge whether the recognition result is correct, video signals of the characters corresponding to the character codes are supplied to a display unit (8) such as a cathode ray tube, and the display screen of the display unit (8) shows the group of recognized characters in a layout corresponding to the document. A high-brightness rectangular blank is displayed in place of each character that could not be recognized. Where a character needs correction or could not be recognized, the operator can therefore type in the desired character at that position in the same way as on a word processor.

As described above, the character recognition algorithm itself, which generates an original character signal S1 corresponding to the light and dark of the document, cuts this signal S1 out with the circumscribing frame of each character to generate a cutout character signal S7, and identifies the character corresponding to that cutout character signal S7, can be said to be basically established.

[Problem to be solved by the invention]

However, when a character recognition device using this algorithm was actually installed in an office and used by operators, various inconveniences in operability became apparent.

One of these inconveniences is that it is difficult to raise the identification rate of separated characters. A character recognition device may misread only certain specific characters in a document. If such a character is an ordinary single-piece character, the basic rectangular cutout character signal S7 output from the character extraction unit (6) can be used as it is, and a learning function can register image data such as feature quantities and dot patterns in the character identification unit (7), so that the character is recognized correctly from then on.

For a separated character such as one in an irregular typeface, however, the basic rectangular cutout character signal S7 output from the character extraction unit (6) is the cutout character signal of each part making up the separated character. Even if the learning function is used to register the image data of that separated character in the character identification unit (7), what is registered is the image data of each individual part, and image data treating the whole separated character as a single character could not be registered.

Consequently, even if the character identification unit (7) integrates the cutout character signals of such a separated character, the feature quantities of the image data of the integrated character signal do not match those of the image data stored in advance in the character identification unit (7), so the character is always misrecognized, and it was difficult to raise the identification rate of separated characters.

In view of this, an object of the present invention is to raise the identification rate of separated characters in a character recognition device without using large-scale dictionary creation equipment or the like.

[Means to solve the problem]

The character recognition device according to the present invention has a character extraction unit (24, 25) that sequentially cuts out, from an original character signal S1 corresponding to the image of an original document (14), cutout character signals S7 each forming one character or one component of a separated character; a recognition dictionary unit (30) that stores image data corresponding to character codes; a character identification unit (28) that uses the recognition dictionary unit (30) to identify the character code corresponding to one or more of the cutout character signals S7; and a dictionary creation unit (29) that registers the image data of the cutout character signals S7 supplied from the character extraction unit (24, 25) in the recognition dictionary unit (30). When the character identification unit (28) misidentifies a separated character, the cutout character signals S7 forming the whole of the misidentified separated character are supplied from the character extraction unit (24, 25) to the dictionary creation unit (29), so that the image data of the misidentified separated character is registered in the recognition dictionary unit (30).

[Effect]

According to the present invention, the cutout character signals S7 forming the whole of the misidentified separated character are supplied to the dictionary creation unit (29), so the image of the whole misidentified separated character is registered in the recognition dictionary unit (30). The next time that separated character is to be identified, the character identification unit (28) can therefore identify it accurately by integrating the cutout character signals S7 forming its components and referring to the recognition dictionary unit (30).

The identification rate of separated characters can therefore be raised.
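The idea of registering the whole of a separated character, rather than its parts, can be sketched as below. This is an illustration under assumptions, not the patented circuit: frames are taken to be (x0, y0, x1, y1) tuples with exclusive end coordinates, the dictionary is a plain Python dict, and the string "い" stands in for a character code.

```python
def union_frame(frames):
    """Smallest frame (x0, y0, x1, y1) enclosing all component frames."""
    return (
        min(f[0] for f in frames),
        min(f[1] for f in frames),
        max(f[2] for f in frames),
        max(f[3] for f in frames),
    )


def crop(image, frame):
    """Cut the dots inside a frame out of a row-major 0/1 image."""
    x0, y0, x1, y1 = frame
    return [row[x0:x1] for row in image[y0:y1]]


def register_separated_character(dictionary, code, image, component_frames):
    """Store the image of the whole separated character under its code."""
    whole = crop(image, union_frame(component_frames))
    dictionary.setdefault(code, []).append(whole)
    return whole


# A two-part mark: left stroke in frame (1,1)-(3,4), right stroke in (5,1)-(6,3).
line = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 1, 0],
    [0, 1, 1, 0, 0, 0, 0],
]
frames = [(1, 1, 3, 4), (5, 1, 6, 3)]
dictionary = {}
whole = register_separated_character(dictionary, "い", line, frames)
```

After this call the dictionary holds one 3 x 5 pattern covering both strokes, so a later integrated pattern of the same character has something of matching shape to be compared with.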

[Example]

An embodiment of the character recognition device according to the present invention will now be described with reference to Figs. 1 to 5.

Fig. 1 shows the system configuration of the character recognition device of this example. In Fig. 1, (13) is a scanner consisting of a document feeder and an image reader, and (14) is a document set in the scanner. The scanner (13) decomposes an entire page of the document (14) into dots at a reading density of, for example, 400 x 400 dpi (dots per inch) and generates an original character signal S1 corresponding to the light and dark of each dot.

(15) denotes an image data input/output board, (16) a host computer, and (21) a printer. The image data input/output board (15) supplies a predetermined portion of the original character signal S1 to the host computer (16) and supplies the print signal output from the host computer (16) to the printer (21). (17) is a keyboard for operating the host computer (16), (18) is a coordinate input unit for entering various coordinates into the host computer (16), and (19) is a character identification board. The host computer (16) cuts basic rectangular cutout character signals S7 out of the original character signal S1, using a cutout signal that is at the high level "1" inside the circumscribing frame of each character, and supplies them one after another to the character identification board (19). The character identification board (19) then supplies the host computer (16) with the character code C of the character corresponding to each cutout character signal S7 (or a reject code if the character cannot be recognized).

(20) denotes a display device consisting of a cathode ray tube. A predetermined area of the display screen of the display device (20) shows the result of recognizing one page of characters of the document (14), in a layout corresponding to the document (14). The display screen can also show, as required, the dot pattern itself of one page or of a predetermined portion of the document (14).

Fig. 2 shows the configuration of Fig. 1 in more detail, excluding the processing related to the printer (21). In the image data input/output board (15) of Fig. 2, (22) is an image data input unit having a memory that can store the dot patterns of at least one page of the document (14), and (23) is a backup memory that can likewise store the dot patterns of at least one page of the document (14). The original character signal S1 for one page of the document (14) output from the scanner (13) is stored in the image data input unit (22).

Any desired portion of the original character signal S1 stored in the image data input unit (22) can be transferred to the backup memory (23) at any time, and that portion is then replaced with an original character signal corresponding to blank paper with no characters (for example, at the zero level "0"). The original character signal S1 with the desired portion replaced in this way is called the original character signal S2, and a predetermined portion of the original character signal S2 is called the original character signal S3.

In the host computer (16), (24) is a central processing unit (hereinafter "CPU"), (25) is a main memory, (26) is a video signal RAM for the display device (20) (hereinafter "VRAM"), and (27) is a font table consisting of a character ROM that takes a character code as input and outputs the dot pattern, that is, the font, of a predetermined typeface corresponding to that character code. When the operator supplies commands, data, and coordinate data to the CPU (24) through the keyboard (17) and the coordinate input unit (18), the CPU (24) controls the overall operation of the character recognition device of this example accordingly.

The original character signals S2 and S3 are supplied as needed to the main memory (25) and the VRAM (26), respectively.

In this case, the CPU (24) and the main memory (25) correspond to the character string extraction unit (2) and the character extraction unit (6) of the example in Fig. 6. The basic rectangular cutout character signals S7, each corresponding to the inside of the circumscribing frame of one character read from the main memory (25), are supplied one after another to the character identification board (19). The character code C returned by the character identification board (19) is supplied through the main memory (25) to the address bus of the font table (27), and the font data appearing on the data bus of the font table (27) is written into a predetermined area of the VRAM (26). The system consisting of the host computer (16), keyboard (17), coordinate input unit (18), and display device (20) of this example also functions as a word processor.

In the character identification board (19), (28) is a character recognition unit and (30) is a recognition dictionary unit that stores font data of various typefaces in association with character codes (JIS codes in this example). The recognition unit (28) and the recognition dictionary unit (30) basically correspond to the character identification unit (7) of Fig. 6. The recognition dictionary unit (30) of this example is divided into a major classification dictionary for major classification characters and a minor classification dictionary for minor classification characters. The major classification dictionary stores the font data, normalized to, for example, 24 dots high by 24 dots wide, of the first characteristic characters classified by position as described above and of the second characteristic characters classified by the relative size of the circumscribing frame (the values of the aspect ratio h/w, the relative height h/h_R, and the relative width w/w_R). Since the rough features of a character are in general also expressed by the dot patterns near each side of its circumscribing frame, the dot patterns near the four sides of each character's circumscribing frame may instead be quantified as four-side data (or peripheral data), and the font data of characters whose four-side data falls within a predetermined range (major classification characters) may be stored in the major classification dictionary.

The minor classification dictionary, in turn, stores the normalized font data of all other characters not included in the major classification dictionary (minor classification characters), in association with their character codes.

(29) denotes a dictionary creation unit. When the operator sets the dictionary creation mode, the dictionary creation unit (29) determines whether the font data represented by the supplied basic rectangular cutout character signal for one character corresponds to a major classification character or a minor classification character. In the former case it normalizes the font data and writes it into the area for the given character code in the major classification dictionary of the recognition dictionary unit (30); in the latter case it normalizes the font data and writes it into the area for the given character code in the minor classification dictionary. This lets the user easily build a recognition dictionary unit (30) that can handle various typefaces.
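A minimal sketch of the dictionary creation step follows. The text does not say how normalization is carried out, so nearest-neighbour resampling is assumed here; the dict layout, the `is_major` flag (standing in for the major/minor decision), and the function names are likewise hypothetical.

```python
SIZE = 24  # normalized pattern is SIZE x SIZE dots, as in the example above


def normalize(glyph, size=SIZE):
    """Resample a row-major 0/1 glyph to size x size dots
    by nearest-neighbour sampling (an assumed method)."""
    src_h, src_w = len(glyph), len(glyph[0])
    return [
        [glyph[y * src_h // size][x * src_w // size] for x in range(size)]
        for y in range(size)
    ]


def register_font(dictionary, code, glyph, is_major):
    """Write the normalized glyph into the major or minor section."""
    section = "major" if is_major else "minor"
    dictionary[section][code] = normalize(glyph)


# Recognition dictionary with its two sections, initially empty.
dictionary = {"major": {}, "minor": {}}

# A 2 x 2 glyph is enough to show the resampling; "い" stands in for a JIS code.
glyph = [
    [1, 0],
    [0, 1],
]
register_font(dictionary, "い", glyph, is_major=False)
stored = dictionary["minor"]["い"]
```

Because every entry ends up the same size, a later comparison can proceed dot by dot regardless of the size at which the character was printed.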

When the operator sets the character recognition mode, the recognition unit (28) of the character identification board (19) writes the font data of the major classification section of the recognition dictionary unit (30) sequentially into a first first-in first-out (FIFO) register if the supplied basic rectangular cutout character signal S7 corresponds to a major classification character, and writes the font data of the minor classification section sequentially into a second FIFO register if it corresponds to a minor classification character. In parallel with this, the recognition unit (28) normalizes the dot pattern corresponding to the basic rectangular cutout character signal S7 and writes it sequentially into a third FIFO register. The recognition unit (28) then compares the dot pattern of the character to be recognized in the third FIFO register in turn with the series of font data in the first FIFO register and the series of font data in the second FIFO register, generates the ten character codes whose font data is closest to that dot pattern, in descending order of priority, and writes these character codes into a predetermined area of the main memory of the host computer (16).

To determine the priority, the dot pattern of the character to be recognized is compared with the font data read from the recognition dictionary unit (30) for each of, for example, the 24 x 24 dots; the total number of dots whose values differ is taken as the evaluation value, and higher priority is given to smaller evaluation values. If the evaluation value of the highest-priority character code is at or below a predetermined value, the character is regarded as recognized, and that character code is written, as the character code C of the character to be recognized, into the area of the main memory (25) assigned to the document (14). At the same time, the recognition unit (28) writes into the main memory (25), along with the character code C, data indicating the size of the character and data indicating its position within the average circumscribing frame. If, on the other hand, the evaluation value of the highest-priority character code exceeds the predetermined value, the recognition unit (28) regards the character as unrecognizable and writes a reject code into the area of the main memory (25) assigned to the document (14).
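The evaluation value described here is a count of differing dots, which can be sketched in a few lines. The sketch uses 3 x 3 patterns instead of 24 x 24 to keep the example readable; the dict-based dictionary, the use of `None` as the reject code, and the names `evaluation_value` and `recognize` are assumptions.

```python
REJECT = None  # stand-in for the reject code


def evaluation_value(pattern, font):
    """Number of dots whose values differ between two same-sized patterns."""
    return sum(
        p != f
        for pattern_row, font_row in zip(pattern, font)
        for p, f in zip(pattern_row, font_row)
    )


def recognize(pattern, dictionary, max_value, n_candidates=10):
    """Return (code or REJECT, candidate codes in descending priority)."""
    ranked = sorted(
        (evaluation_value(pattern, font), code)
        for code, font in dictionary.items()
    )
    candidates = [code for _, code in ranked[:n_candidates]]
    if ranked and ranked[0][0] <= max_value:
        return candidates[0], candidates
    return REJECT, candidates


# Toy dictionary of 3 x 3 fonts (assumed sample data).
dictionary = {
    "I": [[0, 1, 0], [0, 1, 0], [0, 1, 0]],
    "L": [[1, 0, 0], [1, 0, 0], [1, 1, 1]],
    "T": [[1, 1, 1], [0, 1, 0], [0, 1, 0]],
}
unknown = [[0, 1, 0], [0, 1, 0], [0, 1, 1]]

accepted = recognize(unknown, dictionary, max_value=2)  # ("I", ["I", "T", "L"])
rejected = recognize(unknown, dictionary, max_value=0)  # (None, ["I", "T", "L"])
```

The candidate list is returned even on rejection, so it can still be offered to the operator for correction.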

The character recognition operation described above is executed at high speed in a pipelined manner.

FIG. 3 shows the display screen (20A) of the display device (20) of this example. The display screen (20A) has a recognition result display area (31) that displays the result of recognizing one page of characters of the original (14). In this example the original (14) is written vertically, but the recognition result is displayed horizontally. (32) denotes a cursor for indicating the character to be corrected; the cursor (32) can be moved onto any character of the recognition result using the keyboard (17) or the coordinate input unit (18).

In the example of FIG. 3, when the cursor (32) is moved onto the blank that represents a character given a reject code, the words "target character" (対象文字) and a blank are displayed in areas (34) and (35), respectively, which adjoin the recognition result display area (31) of the display screen (20A). (36) denotes a function button area, in which the labels "Candidate", "Code", and "Kana-Kanji" are displayed; these function buttons can be selected with the coordinate input unit (18). For example, when "Candidate" is selected, the patterns of the 10 characters recognized as closest to the character to be corrected are displayed in part of the display screen (20A) in descending order of priority. When "Code" is selected, the character to be corrected can be specified directly by its character code, and when "Kana-Kanji" is selected, it can be entered by kana/kanji conversion.

(37) denotes a re-recognition button and (38) a character integration button. Selecting these buttons (37) and (38) with the coordinate input unit (18) executes re-recognition of the character to be corrected and integration of a separated character, respectively. Specifically, the buttons in the function button area (36) and the buttons (37) and (38) are selected by moving a cross-shaped cursor (43) onto the desired button with the coordinate input unit (18) and then operating the coordinate input switch.

In this example, a peripheral image display area (33) of area W × W is provided near the recognition result display area (31). This area displays, as is, the dot pattern of a region 170 dots wide × 170 dots high centered on the character 「え」 on the original (14) that corresponds to the character to be corrected. Specifically, in FIG. 2, the CPU (24) reads out, from the one page of original character signal S2 stored in the image data input section (22) (in this example, the original character signal S1 itself), the original character signal S3 of the 170 × 170 dot region centered on the character to be corrected, enlarges (interpolates) or reduces (thins out) this signal S3, and writes it into a predetermined area of the VRAM (26). Consequently, the peripheral image display area (33) in the display screen (20A) of FIG. 3 displays, as a dot pattern, the image of the character 「え」 on the original (14) corresponding to the rejected character to be corrected, together with, for example, the eight characters surrounding it.
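The window extraction and rescaling described above might look like the following Python sketch. It is illustrative only: the function names `crop_window` and `scale_nearest` are hypothetical, dots outside the page are assumed to be 0, and nearest-neighbor sampling is swapped in for the unspecified interpolation and thinning methods of the device.

```python
from typing import List

Image = List[List[int]]  # rows of dot values


def crop_window(page: Image, cx: int, cy: int, size: int = 170) -> Image:
    """Return a size x size window centered on (cx, cy).

    Dots that fall outside the page are filled with 0 (an assumption).
    """
    half = size // 2
    height = len(page)
    width = len(page[0]) if page else 0
    window = []
    for y in range(cy - half, cy - half + size):
        row = []
        for x in range(cx - half, cx - half + size):
            inside = 0 <= y < height and 0 <= x < width
            row.append(page[y][x] if inside else 0)
        window.append(row)
    return window


def scale_nearest(image: Image, out_size: int) -> Image:
    """Resize a square image to out_size x out_size by nearest-neighbor sampling.

    Enlarging repeats dots; reducing drops (thins out) dots.
    """
    in_size = len(image)
    return [
        [image[y * in_size // out_size][x * in_size // out_size] for x in range(out_size)]
        for y in range(out_size)
    ]
```

A caller would crop around the position of the character being corrected and then scale the window to the W × W display area before writing it to video memory.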

Because the images of the character to be corrected and of its surrounding characters are displayed in this way, the operator can correct the character in keeping with the characters before and after it without referring to the original (14), which has the benefit of improving the efficiency of correction.

Further, in FIG. 3, (39) denotes a cutout character pattern display area, which displays the dot pattern, normalized to 24 × 24 dots, corresponding to the basic-rectangle cutout character signal S7 of the character being corrected. In this example, when the character integration button (38) is selected, an integrated character area (40) is reserved at the left edge of the display screen (20A), and a registration button (41) and a delete button (42) are displayed at the bottom of this area. The characters displayed in the integrated character area (40) correspond to the image data stored in a character integration buffer provided in part of the main memory (25) of FIG. 2.

The operation of integrating and registering a separated character while correcting the recognition result with the correction editor function of the character recognition device of this example will now be described. Because the original (14) in this example is written vertically, the character 「え」, which is judged to be a separated character in vertical writing, is assumed as the target separated character. It is further assumed that the typeface of the character 「え」 used in the original (14) is an irregular typeface that is not registered in the recognition dictionary section (30) of FIG. 2.

In this case, the character 「え」 cannot be recognized as a single character as a whole. It is therefore likely to be split into two components, with the upper component recognized as 「・」 and the lower component recognized as an unrecognizable character (a character given a reject code). In the recognition result display area (31) of the display screen (20A), as shown in FIG. 3, 「・」 and a blank are then displayed at positions (44A) and (44B), respectively, where 「え」 should have been displayed. When the operator next moves the cursor (32) onto the blank at position (44B) and operates the coordinate input switch, the character corresponding to that blank is designated as the character to be corrected. The peripheral image display area (33) then displays the image of the surroundings of the corresponding character on the original (14), and the cutout character pattern display area (39) displays the image of that character itself.

From these images, the operator can see that the separated character 「え」 on the original (14) was mistakenly recognized while still split into a first block (45A) and a second block (45B).

The operator therefore integrates the image data of the irregular-typeface separated character 「え」 and registers it in the recognition dictionary section (30) as follows. First, when the operator designates the 「・」 at position (44A) of the recognition result display area (31) as the correction target and selects the character integration button (38), the original pattern (46A) of that 「・」 is displayed in the integrated character area (40) of the display screen (20A). Next, when the operator designates the blank at position (44B) as the correction target and selects the character integration button (38), the original pattern (46B) of that blank is combined below the pattern (46A) in the integrated character area (40) and displayed. The operator then moves the cursor (43), selects the registration button (41) in the integrated character area (40), and enters the corresponding JIS code or the like. As a result, the cutout character signal S7 forming the whole of the separated character 「え」 is supplied from the main memory (25) to the dictionary creation section (29), and the image data of the whole separated character 「え」 (normalized data, data for coarse classification, data for fine classification, and so on) is registered in the recognition dictionary section (30). By selecting the delete button (42) in the integrated character area (40), the components of the separated character displayed in the integrated character area (40) can be deleted one after another.
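The operator workflow above (add a component, add another, optionally delete, then register under a character code) can be modeled with a small buffer class. This is a deliberately simplified sketch under stated assumptions: the class name `IntegrationBuffer` and its methods are hypothetical, components are simply stacked top to bottom and padded with 0 rather than placed by their coordinates on the original, deletion is assumed to remove the most recently added component, and the dictionary is a plain mapping from character code to pattern.

```python
from typing import Dict, List

Pattern = List[List[int]]  # rows of 0/1 dots


class IntegrationBuffer:
    """Hypothetical model of the character integration buffer."""

    def __init__(self) -> None:
        self.components: List[Pattern] = []

    def add(self, pattern: Pattern) -> None:
        """Append one component, as when the integration button is selected."""
        self.components.append(pattern)

    def delete_last(self) -> None:
        """Remove the most recently added component, if any (assumed order)."""
        if self.components:
            self.components.pop()

    def combined(self) -> Pattern:
        """Stack the components top to bottom, padding rows on the right with 0."""
        width = max((len(row) for p in self.components for row in p), default=0)
        return [row + [0] * (width - len(row)) for p in self.components for row in p]

    def register(self, code: str, dictionary: Dict[str, Pattern]) -> None:
        """Store the combined pattern under the given code and clear the buffer."""
        dictionary[code] = self.combined()
        self.components.clear()
```

In the device itself the registered data also includes normalized and classification data produced by the dictionary creation section; the sketch stores only the raw combined pattern.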

Next, the operation of integrating the separated character 「え」 with the correction editor of the character recognition device of this example is described in detail with reference to steps (101) to (106) of FIG. 4. The coordinate system of the original (14), the memory map of the image of the separated character 「え」 as input to the main memory (25), and the integrated block (48) stored in the character integration buffer area of the main memory (25) in order to integrate that character are shown in FIGS. 5A, 5B, and 5C, respectively. The circumscribing frames of the two components obtained when the separated character 「え」 was recognized separately are called the first block (47A) and the second block (47B), as shown in FIG. 5B. In the coordinate system (X, Y), the coordinates of the start points $Q_1$, $Q_2$, and $R$ of blocks (47A), (47B), and (48) are denoted $(x_1, y_1)$, $(x_2, y_2)$, and $(x_3, y_3)$, respectively, and the (X-direction length, Y-direction length) data of blocks (47A), (47B), and (48) are denoted $(w_1, h_1)$, $(w_2, h_2)$, and $(w_3, h_3)$, respectively.

Step (101): The operator selects the first block (47A) and instructs the CPU (24) to register it in the character integration buffer area of the main memory (25).

Step (102): The CPU (24) regards the first block (47A) as the integrated block (48) as is. That is,

$$(x_3, y_3, w_3, h_3) = (x_1, y_1, w_1, h_1) \tag{1}$$

holds, and the dot pattern image of this integrated block (48) is displayed in the integrated character area (40) of the display screen (20A).

Step (103): The operator selects the second block (47B) and instructs the CPU (24) to register it additionally in the character integration buffer area of the main memory (25).

Step (104): From the rectangular first block (47A) and the rectangular second block (47B), the CPU (24) composes the rectangular integrated block (48) that circumscribes both blocks (47A) and (47B). First, the coordinates $(x_3, y_3)$ of the start point $R$ of the integrated block (48) are obtained from the following equations.

$$x_3 = \min(x_1, x_2) \tag{2A}$$

$$y_3 = \min(y_1, y_2) \tag{2B}$$

The minimum unit of the coordinates (X, Y) in this example is assumed to be 1 × 1 dot.

Next, let $x_{3E}$ and $y_{3E}$ be the coordinates of the parts of blocks (47A) and (47B) that lie farthest from the origin (0, 0) in the X direction and the Y direction, respectively. These coordinates $(x_{3E}, y_{3E})$ can be expressed as follows.

$$x_{3E} = \max(x_1 + w_1 - 1,\; x_2 + w_2 - 1) \tag{3A}$$

$$y_{3E} = \max(y_1 + h_1 - 1,\; y_2 + h_2 - 1) \tag{3B}$$

Therefore, using the coordinates $(x_3, y_3)$ of the start point $R$ from equations (2A) and (2B), the X-direction length $w_3$ and the Y-direction length $h_3$ of the integrated block (48) can be expressed by the following equations.

$$w_3 = x_{3E} - x_3 + 1 \tag{4A}$$

$$h_3 = y_{3E} - y_3 + 1 \tag{4B}$$

Step (105): The CPU (24) supplies the cutout character signal S7 stored in the main memory (25) for the integrated block (48) obtained in step (104), together with the JIS code of the character, to the dictionary creation section (29). The separated character 「え」 is thereby registered in the recognition dictionary section (30), after which processing moves to the end step (106).
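The computation of the integrated block in step (104) follows directly from equations (2A) to (4B) and can be sketched as below. The `Block` type and the function name `merge_blocks` are hypothetical names chosen for the illustration; coordinates are in dots, with the start point at the top-left corner of each circumscribing frame.

```python
from typing import NamedTuple


class Block(NamedTuple):
    x: int  # X coordinate of the start point
    y: int  # Y coordinate of the start point
    w: int  # length in the X direction, in dots
    h: int  # length in the Y direction, in dots


def merge_blocks(a: Block, b: Block) -> Block:
    """Return the rectangle circumscribing both blocks."""
    x3 = min(a.x, b.x)                       # equation (2A)
    y3 = min(a.y, b.y)                       # equation (2B)
    x3e = max(a.x + a.w - 1, b.x + b.w - 1)  # equation (3A)
    y3e = max(a.y + a.h - 1, b.y + b.h - 1)  # equation (3B)
    return Block(x3, y3, x3e - x3 + 1, y3e - y3 + 1)  # equations (4A), (4B)
```

For example, with an upper block at (12, 5) of size 6 × 4 and a lower block at (10, 11) of size 20 × 18 (made-up values), `merge_blocks` gives a start point of (10, 5) and a size of 20 × 24. The `- 1` and `+ 1` terms arise because the far-edge coordinates are inclusive dot positions.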

If the separated character has three or more components (for example, the character 「三」), further steps are needed to integrate a third block and so on.
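One way to picture the extension to three or more components is to apply the two-block merge repeatedly, one block at a time. The sketch below assumes each block is an `(x, y, w, h)` tuple; `merge_two` and `merge_all` are hypothetical names, and the coordinates in the usage note are made up for illustration.

```python
from functools import reduce
from typing import Iterable, Tuple

Box = Tuple[int, int, int, int]  # (x, y, w, h) in dots


def merge_two(a: Box, b: Box) -> Box:
    """Circumscribing rectangle of two boxes, with inclusive far edges."""
    x = min(a[0], b[0])
    y = min(a[1], b[1])
    x_end = max(a[0] + a[2] - 1, b[0] + b[2] - 1)
    y_end = max(a[1] + a[3] - 1, b[1] + b[3] - 1)
    return (x, y, x_end - x + 1, y_end - y + 1)


def merge_all(boxes: Iterable[Box]) -> Box:
    """Fold merge_two over any number of boxes (at least one is required)."""
    return reduce(merge_two, boxes)
```

Calling `merge_all([(4, 0, 16, 3), (6, 9, 12, 3), (2, 18, 20, 3)])`, three stacked horizontal strokes, yields `(2, 0, 20, 21)`. Because the merge is associative and commutative, the order in which the operator adds the components does not change the result.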

As described above, according to this example, the blocks circumscribing the components of a desired character are integrated in the character integration buffer area of the main memory (25) to form the integrated block (48), and the entire character signal corresponding to this integrated block (48) is treated as the cutout character signal S7 and supplied to the dictionary creation section (29). Image data can thus be registered in the recognition dictionary section (30) by a learning operation, so the image data of a separated character can be added to the recognition dictionary section (30) however complex that character is. From the next recognition onward the separated character can be recognized correctly, which has the benefit of raising the identification rate of separated characters.

Furthermore, in this example the character integration described above is performed by the correction editor used for correcting the recognition result, so there is no need for large-scale dictionary creation equipment.

The present invention is not limited to the embodiment described above, and it goes without saying that various configurations may be adopted without departing from the gist of the present invention.

〔Effect of the invention〕

According to the present invention, there is the benefit that the identification rate of separated characters can easily be raised.

[Brief explanation of drawings]

FIG. 1 is a front view, partly in perspective, showing the system configuration of a character recognition device according to an embodiment of the present invention. FIG. 2 is a configuration diagram, partly in perspective, showing the main part of the example of FIG. 1 in more detail. FIG. 3 is a front view showing a configuration example of the display screen (20A) of the embodiment. FIG. 4 is a flowchart showing the separated-character registration operation of the embodiment. FIG. 5 is a diagram used to explain the relationship between the coordinate systems of the embodiment. FIG. 6 is a block diagram showing the overall configuration of a conventional character recognition device. FIGS. 7 and 8 are diagrams used to explain the conventional operations of cutting out character strings and original rectangles, respectively.

(13) is a scanner, (14) an original, (15) an image data input/output board, (16) a host computer, (19) a character identification board, (20) a display device, (24) a central processing unit, (25) a main memory, (28) a recognition section, (29) a dictionary creation section, (30) a recognition dictionary section, (38) a character integration button, and (48) an integrated block.

Agent: Hidemori Matsukuma

Claims

[Claims]

A character recognition device comprising: a character cutout section that sequentially cuts out, from an original character signal corresponding to the image of an original document, cutout character signals each forming one character or one component of a separated character; a recognition dictionary section that stores image data corresponding to character codes; a character identification section that uses the recognition dictionary section to identify the character code corresponding to one or more of the cutout character signals; and a dictionary creation section that registers, in the recognition dictionary section, the image data of the cutout character signal supplied from the character cutout section, wherein, when the character identification section misidentifies a separated character, the cutout character signals forming the whole of the misidentified separated character are supplied from the character cutout section to the dictionary creation section, whereby the image data of the misidentified separated character is registered in the recognition dictionary section.