JPH03214380A

JPH03214380A - Character recognizing device

Info

Publication number: JPH03214380A
Application number: JP2009987A
Authority: JP
Inventors: Keiko Abe; 阿部　惠子; Takayuki Fujikawa; 藤川　孝之; Susumu Takasaki; 高崎　進; Katsumasa Sakai; 酒井　勝正; Hiromichi Aoki; 青木　宏導
Original assignee: Sony Corp; Toppan Printing Co Ltd
Current assignee: Sony Corp; Toppan Inc
Priority date: 1990-01-19
Filing date: 1990-01-19
Publication date: 1991-09-19

Abstract

PURPOSE: To make it possible to effectively erase the parts of an original document that are not subject to character recognition, without damaging the original document, by moving a prescribed part of the original character signal stored in a first memory to a second memory at any time. CONSTITUTION: The device is provided with a first memory 22, which stores the original character signal S1 for a prescribed range of an original document 14 (for instance, one page) and supplies it to the character identification parts 16, 19, and a second memory 23, which backs up the first memory 22. A prescribed part of the original character signal S1 stored in the first memory 22 is moved to the second memory 23. This is equivalent to erasing the corresponding part of the original document 14 without damaging the document itself. Returning the original character signal S1 that was moved to the second memory 23 to its original area in the first memory 22 is likewise equivalent to restoring the erased part.

Description

DETAILED DESCRIPTION OF THE INVENTION [Field of Industrial Application] The present invention relates to a character recognition device suitable for use, for example, in recognizing printed characters and converting them into character codes.

[Summary of the invention]

The present invention concerns a character recognition device having a document reading section that generates an original character signal corresponding to the light and dark areas of an original document, and a character identification section that identifies the characters corresponding to that original character signal. The device is provided with a first memory, which stores the original character signal for a predetermined range of the original document and supplies it to the character identification section, and a second memory, which backs up the first memory. By moving a predetermined portion of the original character signal stored in the first memory to the second memory at any time, the portions of the original document that are not subject to character recognition can in effect be erased without damaging the original document.

[Prior Art] To automate the process in which a worker picks type in letterpress printing, for example, a character recognition device is required that recognizes each character of a manuscript produced by typewriting or similar means and converts it into a character code.

FIG. 6 shows a conventional character recognition device disclosed in Japanese Unexamined Patent Publication No. 62-74181. In FIG. 6, (1) is a document reading section, which supplies an original character signal S1 corresponding to the light and dark areas of one page of the manuscript to a character string extraction section (2). The original character signal S1 is obtained by decomposing the manuscript into dots at a predetermined density and representing black dots as high level "1" and white dots as low level "0", although the density of each dot may instead be represented as a multi-bit binary number.

The character string extraction section (2) consists of a first-stage preprocessing section (3), a second-stage preprocessing section (4), and a third-stage preprocessing section (5). The first-stage preprocessing section (3) removes noise from the original character signal S1 and corrects the rotation of the manuscript. The second-stage preprocessing section (4) separates the character area AR (see FIG. 7) from other areas (photographs, drawings, and the like) and extracts only the image data contained in the character area AR. The third-stage preprocessing section (5) extracts a character string signal S4 corresponding to the character strings AR1, AR2, ... contained in the extracted character area AR.

To extract the character string signal S4, as shown in FIG. 7, the position of each dot in the character area AR is expressed in (X, Y) coordinates, with the X axis taken in the horizontal direction and the Y axis in the vertical direction. The "1" or "0" values of the dots are projected onto the Y axis and summed to generate a Y projection signal Sy. When this Y projection signal Sy is binarized at a predetermined threshold level, the high-level "1" intervals of the binarized signal correspond to the character strings AR1, AR2, ... respectively. The character string signal S4 is supplied to the subsequent character extraction section (6).
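The Y projection step can be illustrated with a short sketch. This is not code from the publication; the function name `find_text_lines`, the list-of-rows page representation, and the default threshold of 1 are assumptions made for illustration only.

```python
def find_text_lines(page, threshold=1):
    """Return (start_row, end_row) pairs, end exclusive, for each run of rows
    whose black-dot count reaches `threshold` (hypothetical helper)."""
    # Y projection signal Sy: number of "1" dots in each row.
    sy = [sum(row) for row in page]
    lines = []
    start = None
    for y, count in enumerate(sy):
        is_text = count >= threshold  # binarize the projection
        if is_text and start is None:
            start = y                 # a high-level interval begins
        elif not is_text and start is not None:
            lines.append((start, y))  # the interval ends
            start = None
    if start is not None:
        lines.append((start, len(sy)))
    return lines


# Tiny binary page: 1 = black dot, 0 = white dot.
page = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1],
    [0, 1, 0, 0, 1],
    [0, 0, 0, 0, 0],
    [1, 1, 0, 1, 0],
    [0, 0, 0, 0, 0],
]
print(find_text_lines(page))  # [(1, 3), (4, 5)]
```

Each returned interval plays the role of one character string AR1, AR2, ... in the description above.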

In the character extraction section (6), the character string signal S4 of, for example, the i-th character string ARi shown in FIG. 8A is projected onto the X axis to generate an X projection signal Sx. Binarizing this X projection signal Sx with a threshold TH1 at the minimum level (a value of 1) yields a coarse cutout signal DT1 (FIG. 8C), and binarizing the same X projection signal Sx with a medium-level threshold TH2 (FIG. 8D) yields a fine cutout signal DT2 (FIG. 8E). Similarly, a cutout signal in the Y direction can be generated by producing a Y projection signal Sy individually for each interval in which the coarse cutout signal DT1 is at high level "1".
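As a rough illustration of the two-threshold scheme, the sketch below projects one text line onto the X axis and binarizes the result twice. The helper names and the choice of 2 for the medium-level threshold TH2 are assumptions; the publication only gives the value 1 for TH1.

```python
def x_projection(line_rows):
    """X projection signal Sx: number of "1" dots in each column."""
    return [sum(column) for column in zip(*line_rows)]


def binarize(projection, threshold):
    """High level (1) wherever the projection reaches the threshold."""
    return [1 if value >= threshold else 0 for value in projection]


# One text line, three dots high; 1 = black dot.
line = [
    [1, 1, 1, 0, 1, 0, 1],
    [1, 0, 1, 0, 1, 0, 1],
    [1, 1, 1, 0, 1, 1, 1],
]
sx = x_projection(line)
dt1 = binarize(sx, 1)  # coarse cutout signal, TH1 = 1
dt2 = binarize(sx, 2)  # fine cutout signal, TH2 = 2 (assumed value)
print(sx)   # [3, 2, 3, 0, 3, 1, 3]
print(dt1)  # [1, 1, 1, 0, 1, 1, 1]
print(dt2)  # [1, 1, 1, 0, 1, 0, 1]
```

The coarse signal treats the thin bridge in column 5 as part of one block, while the fine signal splits the block there, which is the kind of finer structure DT2 is meant to expose.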

Finally, as shown in FIG. 8A, a cutout signal is obtained that is at high level "1" inside the circumscribing frame (9) for a character such as 「て」, and, for the separated character 「い」, inside the circumscribing frames (11) and (12) that circumscribe each of its separate parts. The signal obtained by sequentially cutting out from the input character string signal S4 only those portions where the cutout signal is at high level "1" is the basic rectangular cutout character signal S7.

The fine cutout signal DT2 of FIG. 8E is used when examining the finer structure of each character. Because the separated character 「い」 in FIG. 8A has two circumscribing frames (11) and (12), they must be merged later, at the character identification stage.

(7) denotes a character identification section, which takes in the basic rectangular cutout character signal S7 one circumscribing frame at a time and performs character recognition. Specifically, classification by position is performed first: characters that lie in the upper half of the character string ARi of FIG. 8A (such as 「”」 and 「゜」) and characters that lie in the lower half (such as 「。」, 「.」, and 「,」) are treated as first characteristic characters, matched against patterns, and assigned the corresponding character code (a JIS code or the like). If a character cannot be identified this way, classification is performed by aspect ratio h/w and by relative size, where w is the width and h the height of the circumscribing frame. That is, characters are classified according to whether the aspect ratio h/w falls within the range 0 < h/w < 0.5 or within the range 1.5 < h/w. In addition, taking wR and hR as the width and height of an average-sized circumscribing frame, characters are classified according to whether the vertical relative ratio h/hR and the horizontal relative ratio w/wR fall within the ranges 0 < h/hR < 0.5 and 0 < w/wR < 0.5, respectively. Characters that fall within these ranges are pattern-matched as second characteristic characters.
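The frame-shape tests for second characteristic characters might be sketched as follows. The function name, the returned labels, and the order in which the tests are applied are assumptions; only the numeric ranges come from the text.

```python
def classify_frame(w, h, w_ref, h_ref):
    """Classify a circumscribing frame of width w and height h against an
    average-sized frame (w_ref, h_ref). Returns "flat", "tall", "small",
    or None when no range applies (labels are hypothetical)."""
    if min(w, h, w_ref, h_ref) <= 0:
        raise ValueError("frame dimensions must be positive")
    aspect = h / w
    if aspect < 0.5:        # 0 < h/w < 0.5
        return "flat"
    if aspect > 1.5:        # 1.5 < h/w
        return "tall"
    if h / h_ref < 0.5 and w / w_ref < 0.5:  # both relative ratios below 0.5
        return "small"
    return None


print(classify_frame(20, 4, 20, 20))   # flat
print(classify_frame(4, 20, 20, 20))   # tall
print(classify_frame(8, 8, 20, 20))    # small
print(classify_frame(20, 20, 20, 20))  # None
```

A frame that returns a label would be matched against the smaller set of second characteristic characters; a frame that returns None falls through to the general pattern matching described next.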

Characters that are not classified as first or second characteristic characters are pattern-matched against individually stored dot patterns, and when a predetermined degree of match is obtained, the corresponding character code is assigned.

If unrecognized characters still remain, re-extraction, which splits the circumscribing frame into several smaller circumscribing frames, and merging, which combines the frame with the circumscribing frame that follows, are carried out. If a character ultimately remains unrecognized, it is assigned a reject code indicating that it could not be recognized.

The character codes for one page of the manuscript generated by the character identification section (7) are stored in a predetermined storage device together with information indicating the position and size of each character. So that the operator can judge whether the recognition results are correct, video signals of the characters corresponding to the character codes are supplied to a display section (8) such as a cathode ray tube, and the display screen shows the recognized characters in a layout corresponding to the manuscript. Characters that could not be recognized are displayed as high-brightness rectangular blanks. When there are characters to be corrected or characters that could not be recognized, the operator can therefore type the desired characters into those positions in the same way as on a word processor.

As described above, the character recognition algorithm itself can be regarded as essentially established: generate an original character signal S1 corresponding to the light and dark areas of the manuscript, cut this signal S1 out with a circumscribing frame around each character to generate a cutout character signal S7, and identify the character corresponding to the cutout character signal S7.

[Problem to be solved by the invention]

However, when a character recognition device using this algorithm was actually installed in an office and used by operators, various inconveniences in operability came to light.

One of these inconveniences is that, when the manuscript contains noise, ruled lines, figures, photographs, or other material unnecessary for character recognition, the unnecessary parts have to be erased beforehand with correction fluid or the like, so a copy has to be made in order to preserve the original manuscript. Moreover, restoring a part erased with correction fluid requires setting the original manuscript in the device again, which makes the work inefficient.

In view of these points, an object of the present invention is to make it possible to in effect erase unnecessary parts of a manuscript without damaging the manuscript, and to restore the erased parts at any time.

[Means for Solving the Problems] A character recognition device according to the present invention has a document reading section (13) that generates an original character signal S1 corresponding to the light and dark areas of an original document (14), and a character identification section (16, 19) that identifies the characters corresponding to the original character signal S1. The device is provided with a first memory (22), which stores the original character signal S1 for a predetermined range (for example, one page) of the original document (14) and supplies it to the character identification section (16, 19), and a second memory (23), which backs up the first memory (22). A predetermined portion of the original character signal S1 stored in the first memory (22) is moved to the second memory (23) at any time.

[Operation] According to the present invention, moving a predetermined portion of the original character signal S1 stored in the first memory (22) to the second memory (23) is in effect equivalent to erasing the corresponding portion of the original document (14), without damaging the original document (14). Returning the original character signal S1 that was moved to the second memory (23) to its original area in the first memory (22) is in effect equivalent to restoring the erased portion.
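A minimal sketch of the two-memory idea follows, assuming a binary dot pattern held as a list of rows and half-open rectangles (x0, y0, x1, y1). The class and method names are hypothetical, and the use of a bitwise OR when moving dots is a design choice of this sketch rather than something the publication specifies.

```python
class PageMemory:
    """First memory (22) plus backup memory (23) for one binary page."""

    def __init__(self, page):
        # Copy the scanned dots so the caller's data is never modified.
        self.first = [list(row) for row in page]
        self.backup = [[0] * len(row) for row in page]

    def erase(self, x0, y0, x1, y1):
        """Move the dots in the rectangle to the backup and blank them."""
        for y in range(y0, y1):
            for x in range(x0, x1):
                # OR keeps dots saved by an earlier, overlapping erase.
                self.backup[y][x] |= self.first[y][x]
                self.first[y][x] = 0  # "blank paper" level

    def restore(self, x0, y0, x1, y1):
        """Move the dots in the rectangle back from the backup."""
        for y in range(y0, y1):
            for x in range(x0, x1):
                # OR leaves a never-erased area unchanged.
                self.first[y][x] |= self.backup[y][x]
                self.backup[y][x] = 0


page = [
    [1, 1, 0],
    [0, 1, 1],
]
mem = PageMemory(page)
mem.erase(0, 0, 2, 1)
print(mem.first)   # [[0, 0, 0], [0, 1, 1]]
print(mem.backup)  # [[1, 1, 0], [0, 0, 0]]
mem.restore(0, 0, 2, 1)
print(mem.first == page)  # True
```

Recognition would read only `first`, so an erased area simply looks like blank paper, while the scanned data stays recoverable until it is restored.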

[Example] An embodiment of the character recognition device according to the present invention is described below with reference to FIGS. 1 to 5.

FIG. 1 shows the system configuration of the character recognition device of this example. In FIG. 1, (13) is a scanner consisting of a document feeder and an image reader, and (14) is a manuscript set in the scanner. The scanner (13) decomposes an entire page of the manuscript (14) into dots at a reading density of, for example, 400 x 400 dpi (dots per inch) and generates an original character signal S1 corresponding to the light and dark value of each dot.

(15) is an image data input/output board, (16) is a host computer, and (21) is a printer. The image data input/output board (15) supplies a predetermined portion of the original character signal S1 to the host computer (16) and supplies the print signal output by the host computer (16) to the printer (21). (17) is a keyboard for operating the host computer (16), (18) is a coordinate input unit for entering coordinates into the host computer (16), and (19) is a character identification board. The host computer (16) cuts basic rectangular cutout character signals S7 out of the original character signal S1, using a cutout signal that is at high level "1" inside the circumscribing frame of a single character, and supplies them one after another to the character identification board (19). The character identification board (19) then supplies the host computer (16) with the character code C of the character corresponding to each cutout character signal S7, or with a reject code if the character cannot be recognized.

(20) is a display device consisting of a cathode ray tube. A predetermined area of its display screen shows the result of recognizing the characters of one page of the manuscript (14), in a layout corresponding to the manuscript (14). When necessary, the display screen can also show the dot pattern itself of one page, or of a predetermined portion, of the manuscript (14).

FIG. 2 shows the configuration of the example of FIG. 1 in more detail, omitting the processing related to the printer (21). In the image data input/output board (15) of FIG. 2, (22) is an image data input section with a memory that can store the dot pattern of one or more pages of the manuscript (14), and (23) is a backup memory that can likewise store the dot pattern of one or more pages of the manuscript (14). The original character signal S1 for one page of the manuscript (14) output by the scanner (13) is stored in the image data input section (22). A desired portion of the original character signal S1 stored in the image data input section (22) can be transferred to the backup memory (23) at any time, and that portion is then replaced with an original character signal corresponding to blank paper with no characters (for example, zero level "0"). The original character signal S1 with the desired portion replaced in this way is called original character signal S2, and a predetermined portion of the original character signal S2 is called original character signal S3.

In the host computer (16), (24) is a central processing unit (hereinafter "CPU"), (25) is a main memory, (26) is a video signal RAM for the display device (20) (hereinafter "VRAM"), and (27) is a font table consisting of a character ROM that takes a character code as input and outputs the dot pattern of a predetermined typeface corresponding to that code, that is, a font. When the operator supplies commands, data, and coordinate data to the CPU (24) through the keyboard (17) and the coordinate input unit (18), the CPU (24) controls the overall operation of the character recognition device of this example accordingly.

The original character signals S2 and S3 are supplied as needed to the main memory (25) and the VRAM (26), respectively. Here the CPU (24) and the main memory (25) correspond to the character string extraction section (2) and the character extraction section (6) of the example in FIG. 6. The basic rectangular cutout character signals S7 read from the main memory (25), each corresponding to the interior of the circumscribing frame of one character, are supplied one after another to the character identification board (19). The character code C returned by the character identification board (19) is supplied through the main memory (25) to the address bus of the font table (27), and the font data that appears on the data bus of the font table (27) is written to a predetermined area of the VRAM (26). The system consisting of the host computer (16), keyboard (17), coordinate input unit (18), and display device (20) of this example also functions as a word processor.

In the character identification board (19), (28) is a character recognition section and (30) is a recognition dictionary section that stores font data for various typefaces in association with character codes (JIS codes in this example). The recognition section (28) and the recognition dictionary section (30) basically correspond to the character identification section (7) of FIG. 6. The recognition dictionary section (30) of this example is divided into a major classification dictionary section for major classification characters and a minor classification dictionary section for minor classification characters. The major classification dictionary section stores font data, normalized to, for example, 24 dots high by 24 dots wide, for the first characteristic characters, classified by position as described above, and for the second characteristic characters, classified by the relative dimensions of the circumscribing frame (aspect ratio h/w) and by the values of the vertical relative ratio h/hR and horizontal relative ratio w/wR. Because the rough features of a character are generally also expressed by the dot patterns near each side of its circumscribing frame, the dot patterns near the four sides of each character's circumscribing frame may instead be quantified as four-side data (or peripheral data), and the font data of characters whose four-side data falls within a predetermined range (major classification characters) may be stored in the major classification dictionary section.

The minor classification dictionary section, in turn, stores the normalized font data of all other characters not included in the major classification dictionary section (minor classification characters), in association with their character codes.

(29) is a dictionary creation section. When the operator sets dictionary creation mode, the dictionary creation section (29) determines whether the font data represented by the supplied basic rectangular cutout character signal for one character corresponds to a major classification character or a minor classification character. For a major classification character, it normalizes the font data and writes it to the area for the specified character code in the major classification dictionary section of the recognition dictionary section (30); for a minor classification character, it normalizes the font data and writes it to the area for the specified character code in the minor classification dictionary section. This lets the user easily create a recognition dictionary section (30) that handles a variety of typefaces.
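The normalize-and-store step of dictionary creation could look roughly like the sketch below. The publication does not say how normalization to 24 x 24 dots is done, so the nearest-neighbour resampling used here is an assumption, as are the function names, the dictionary layout, and the arbitrary character code 0x1234.

```python
def normalize(glyph, size=24):
    """Resample a binary glyph (list of rows) to size x size dots using
    nearest-neighbour sampling (an assumed method)."""
    h = len(glyph)
    w = len(glyph[0])
    return [
        [glyph[y * h // size][x * w // size] for x in range(size)]
        for y in range(size)
    ]


def register(dictionary, section, code, glyph):
    """Store the normalized glyph under `code` in the "major" or "minor"
    section of the dictionary (hypothetical layout)."""
    dictionary[section][code] = normalize(glyph)


print(normalize([[1, 0], [0, 1]], size=4))
# [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]

dictionary = {"major": {}, "minor": {}}
register(dictionary, "minor", 0x1234, [[1, 0], [0, 1]])  # arbitrary code
print(len(dictionary["minor"][0x1234]))  # 24
```

Storing every glyph at one fixed size is what allows the dot-by-dot comparison used during recognition.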

When the operator sets character recognition mode, the recognition section (28) of the character identification board (19) proceeds as follows. If the supplied basic rectangular cutout character signal S7 corresponds to a major classification character, the font data of the major classification section of the recognition dictionary section (30) is written sequentially into a first first-in, first-out (FIFO) register; if it corresponds to a minor classification character, the font data of the minor classification section is written sequentially into a second FIFO register. In parallel with this, the recognition section (28) normalizes the dot pattern corresponding to the basic rectangular cutout character signal S7 and writes it sequentially into a third FIFO register. The recognition section (28) then compares the dot pattern of the character to be recognized in the third FIFO register, in turn, with the series of font data in the first FIFO register and the series of font data in the second FIFO register. It thereby generates ten character codes corresponding to the font data closest to the dot pattern of the character to be recognized, in descending order of priority, and writes these character codes to a predetermined area of the main memory of the host computer (16).

To determine the priority, the dot pattern of the character to be recognized is compared with the font data read from the recognition dictionary section (30) dot by dot over, for example, 24 x 24 dots. The total number of dots whose values differ is taken as the evaluation value, and a smaller evaluation value is given a higher priority. If the evaluation value of the highest-priority character code is at or below a predetermined value, recognition is considered to have succeeded, and that character code is written as the character code C of the character to be recognized to the area of the main memory (25) assigned to the manuscript (14). Along with the character code C, the recognition section (28) writes to the main memory (25) data indicating the size of the character and data indicating its position within the average circumscribing frame. If the evaluation value of the highest-priority character code exceeds the predetermined value, the recognition section (28) considers recognition to have failed and writes a reject code to the area of the main memory (25) assigned to the manuscript (14). The character recognition operation described above is executed in a pipelined manner.
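The evaluation value described above is a count of differing dots, that is, a Hamming distance between two binary patterns. The sketch below ranks dictionary entries by that distance and applies a reject threshold. The names `rank_candidates`, `recognize`, and `REJECT`, the 3 x 3 patterns, and the threshold values are illustrative assumptions; the device itself works on 24 x 24 patterns with FIFO registers, which this sketch does not model.

```python
REJECT = None  # stand-in for the reject code


def rank_candidates(pattern, dictionary, top_n=10):
    """Return up to top_n (evaluation_value, code) pairs, best first."""
    scored = []
    for code, font in dictionary.items():
        # Evaluation value: number of dots whose values differ.
        score = sum(
            p != f
            for pattern_row, font_row in zip(pattern, font)
            for p, f in zip(pattern_row, font_row)
        )
        scored.append((score, code))
    scored.sort()  # smaller evaluation value = higher priority
    return scored[:top_n]


def recognize(pattern, dictionary, max_score):
    """Return the best code if its evaluation value is within max_score,
    otherwise REJECT."""
    ranked = rank_candidates(pattern, dictionary)
    if ranked and ranked[0][0] <= max_score:
        return ranked[0][1]
    return REJECT


fonts = {
    "A": [[1, 1, 1], [1, 0, 1], [1, 1, 1]],
    "B": [[0, 1, 0], [0, 1, 0], [0, 1, 0]],
}
sample = [[1, 1, 1], [1, 0, 1], [1, 1, 0]]
print(rank_candidates(sample, fonts))         # [(1, 'A'), (6, 'B')]
print(recognize(sample, fonts, max_score=2))  # A
print(recognize(sample, fonts, max_score=0))  # None
```

Keeping the ranked list, rather than only the winner, mirrors the ten candidate codes that the recognition section writes to main memory.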

To explain the operation of this example, suppose that the manuscript (14) consists, as shown in FIG. 3, of a large heading (14A) reading 「社説」 ("Editorial") and a body of small vertically written text, and that only the heading (14A) is to be erased.

First, the operator gives the CPU (24) a command from the keyboard (17) to display the original manuscript. In response, the CPU (24) writes the original character signal S1 for one page of the manuscript (14) generated by the scanner (13) into the memory section of the image data input section (22) and supplies the written original character signal S1 unchanged to the VRAM (26) as original character signal S3. As a result, the image of the manuscript (14) is displayed as is, as a dot pattern, in area (31) of the display screen (20A) of the display device (20), as shown in FIG. 4A.

Next, the operator has a cross-shaped cursor displayed on the display screen (20A) at the position corresponding to the coordinate values specified by the coordinate input unit (18). The bottom of the coordinate input unit (18) holds, for example, a ball with two axes of rotation. Pressing the coordinate input unit (18) against a table surface or the like and moving it in two dimensions rotates the ball about the two axes, so that the specified coordinate values can be varied continuously in two dimensions. The operator moves the coordinate input unit (18) to bring the cross-shaped cursor to point P1 at the upper left of the characters 「社説」 and operates the coordinate input switch of the coordinate input unit (18), which reports the coordinates of point P1 to the CPU (24). The coordinates of point P2 at the lower right of the characters 「社説」 are reported to the CPU (24) in the same way.

As shown in FIG. 4A, the CPU (24) determines the rectangular area (32) whose diagonal corners are the points P1 and P2. Then, as shown in FIG. 5A, it takes the portion (33) of the original character signal S1 written in the memory section (22A) of the image data input section (22) that corresponds to the rectangular area (32), moves it unchanged to the corresponding portion (34) of the storage section (23A) of the backup memory (23), and writes data corresponding to blank paper, for example low-level "0" data, into the portion (33) from which the data was moved. The original character signal S3 now held in the memory section of the image data input section (22) (the original character signal S1 with part of it replaced by low-level "0") is then written to the VRAM (26), so that the display screen (20A) of the display device (20) shows, as in FIG. 4B, an image of the original document (14) from which the heading "Editorial" (14A) has been erased.
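The erase step described above can be illustrated with a minimal Python sketch. It is not taken from the patent: the function names, the list-of-lists bitmap, and the tiny 4x3 page are assumptions made only for illustration, with `page` standing in for the memory section (22A) and `backup` for the storage section (23A).

```python
# Illustrative sketch only. All names and the list-of-lists bitmap layout are
# assumptions; the patent does not specify a data layout or an implementation.

def rect_from_points(p1, p2):
    """Return (left, top, right, bottom) for the rectangle with diagonal p1-p2."""
    (x1, y1), (x2, y2) = p1, p2
    return min(x1, x2), min(y1, y2), max(x1, x2), max(y1, y2)


def erase_region(first_memory, backup_memory, p1, p2, blank=0):
    """Move the rectangle to the same position in backup_memory, then write
    the blank-paper level into the vacated pixels of first_memory."""
    left, top, right, bottom = rect_from_points(p1, p2)
    for y in range(top, bottom + 1):
        for x in range(left, right + 1):
            backup_memory[y][x] = first_memory[y][x]
            first_memory[y][x] = blank
    return left, top, right, bottom


# A 4x3 page whose upper-left 3x2 block plays the role of the heading.
page = [
    [1, 1, 1, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
]
backup = [[0] * 4 for _ in range(3)]

# The two taught points may be given in either order.
region = erase_region(page, backup, (2, 1), (0, 0))
```

After the call, `page` holds the blanked image that would be sent on for display and recognition, while `backup` keeps the removed pixels at the same coordinates.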

In addition, writing this original character signal S3 unchanged to the main memory (25) as the original character signal S2 makes the original document (14), with the heading "Editorial" (14A) erased, the object of character recognition.

To restore the erased heading "Editorial" (14A), the CPU (24) moves the data written in the portion (34) of the storage section (23A) of the backup memory (23) back to the portion (33) of the memory section (22A) of the image data input section (22), as shown in FIG. 5B. The original character signal S3 held in the memory section of the image data input section (22), which is now the original character signal S1 itself, is then written to the VRAM (26), so that the whole image of the original document (14) is displayed again on the display screen (20A), as shown in FIG. 4A.
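The restore step can be sketched in the same spirit. Again this is only an illustration under assumed names and an assumed list-of-lists bitmap. The example starts from a state in which a 3x2 region has already been moved to `backup` and blanked in `page`. Whether the backup area is cleared after restoring is not stated in the text, so this sketch leaves it unchanged.

```python
# Illustrative sketch only. Names and data layout are assumptions.

def restore_region(first_memory, backup_memory, region):
    """Copy the saved rectangle from backup_memory back to the same
    position in first_memory."""
    left, top, right, bottom = region
    for y in range(top, bottom + 1):
        for x in range(left, right + 1):
            first_memory[y][x] = backup_memory[y][x]


# State after an earlier erase of the region (left=0, top=0, right=2, bottom=1).
page = [
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 1, 1],
]
backup = [
    [1, 1, 1, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 0],
]

restore_region(page, backup, (0, 0, 2, 1))
```

Because the backup keeps the pixels at their original coordinates, restoring needs only the rectangle bounds and no extra bookkeeping.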

In addition, writing this original character signal S3 unchanged to the main memory (25) as the original character signal S2 makes the whole of the original document (14) the object of character recognition.

As described above, in this example an image data input section (22) having a memory section and a backup memory (23) are provided. Simply by moving part of the original character signal S1 stored in the image data input section (22) to the backup memory (23) and writing data corresponding to blank paper into the vacated portion, a desired portion of the original document (14) can in effect be erased without damaging the original document (14).

Furthermore, simply by returning a desired portion of the data moved to the backup memory (23) to the memory section of the image data input section (22), the desired portion of the erased original document can in effect be restored with ease.

In addition, it was conventionally necessary to divide a one-page original document into several character areas in order to avoid irregular headings and the like embedded in the character area, which could make the work inefficient. In this example, erasing such irregular headings means that, as a rule, only one character area needs to be specified for a one-page original document (or even for a two-page spread) before character recognition can be performed, which further improves work efficiency.

The present invention is not limited to the embodiment described above, and it goes without saying that various configurations may be adopted without departing from the gist of the present invention.

[Effect of the invention]

According to the present invention, unnecessary portions of an original document can in effect be erased without damaging the original document, and the erased portions can be restored at any time.

[Brief explanation of drawings]

FIG. 1 is a front view, partly in perspective, showing the system configuration of a character recognition device according to one embodiment of the present invention. FIG. 2 is a configuration diagram, partly in perspective, showing the main part of the FIG. 1 example in more detail. FIG. 3 is a diagram showing an example of the original document used in the embodiment. FIG. 4 is a front view showing changes in the display screen (20A). FIG. 5 is a diagram used to explain the operation of the embodiment. FIG. 6 is a block diagram showing the overall configuration of a conventional character recognition device. FIGS. 7 and 8 are diagrams used to explain the conventional extraction of character strings and of original rectangles, respectively.

(13) is the scanner, (14) the original document, (15) the image data input/output board, (16) the host computer, (19) the character identification board, (20) the display device, (24) the central processing unit, (25) the main memory, (28) the recognition section, and (29) the recognition dictionary section.

Claims

[Scope of Claims] A character recognition device having a document reading unit that generates an original character signal corresponding to the light and dark of an original document, and a character identification unit that identifies the characters corresponding to the original character signal, the device comprising: a first memory that stores the original character signal for a predetermined range of the original document and supplies it to the character identification unit; and a second memory that backs up the first memory, characterized in that a predetermined portion of the original character signal stored in the first memory is moved to the second memory at any time.