JPH0934987A

JPH0934987A - Character recognized result correcting method and character recognizing device

Info

Publication number: JPH0934987A
Application number: JP7183759A
Authority: JP
Inventors: Masaharu Nagata; 政晴永田
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 1995-07-20
Filing date: 1995-07-20
Publication date: 1997-02-07

Abstract

PROBLEM TO BE SOLVED: To provide a character recognition device with good operability. SOLUTION: The device comprises a feature extraction unit 16 that extracts the features of an input pattern, a dictionary unit 18 in which the standard features of characters are stored, a character recognition unit 20 that recognizes the input pattern by comparing its features with the standard features, a display unit 24 that displays the recognition result, and a keyboard 26 for designating one of the misread characters whose erroneous recognition results are displayed on the display unit. A control unit 28 compares the features of the designated misread character with the features of the other input patterns, retrieves the input patterns whose features are similar to those of the designated misread character, and displays the recognition results of the retrieved input patterns on the display unit with a marker attached.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a character recognition result correction method for correcting the result of reading and recognizing character patterns on a medium, and to a character recognition device that uses the method.

[0002]

[Prior Art] Conventionally, recognition results for character patterns have been corrected by having an operator find each erroneously recognized character (misread character) one by one and replace it with the correct character. Because similar-looking characters must be examined carefully, this method takes a long time and places a burden on the operator. To reduce this burden and shorten the time required for correction, a method of searching for and replacing misread characters in a batch, based on the character names in the recognition result, has been proposed, for example in the following reference: catalog of Media Drive Research Laboratory Co., Ltd., "WinReader Plus v.2.0".

[0003]

[Problems to Be Solved by the Invention] However, because the technique disclosed in the above reference searches and replaces in a batch on the basis of the recognition result, it may also find and replace characters that do not need to be replaced. The operator therefore had to check the search results after every search in order to exclude the characters that did not need to be replaced.

[0004] For this reason, there has been a demand for a character recognition result correction method and a character recognition device that can find misread characters more accurately and are easy to operate.

[0005]

[Means for Solving the Problems]

(First Invention) In the character recognition result correction method according to the first invention of this application, an input pattern is recognized by comparing its features with standard features stored in a dictionary unit, and the recognition result is displayed on a display unit. To correct erroneous recognition results, one of the misread characters displayed with an erroneous recognition result is designated; the features of the input pattern of the designated misread character are compared with the features of the input patterns other than the designated misread character; input patterns having features similar to those of the designated misread character are retrieved; and the recognition results corresponding to the retrieved input patterns are displayed with a marker attached.

[0006] (Second Invention) The character recognition device according to the second invention of this application comprises: a feature extraction unit that extracts the features of an input pattern; a dictionary unit in which standard features are stored; a character recognition unit that recognizes the input pattern by comparing its features with the standard features; a memory that stores the recognition results and the features; a display unit that displays the recognition results; an input unit for designating one of the misread characters displayed on the display unit with an erroneous recognition result; and a control unit that compares the features of the input pattern of the designated misread character with the features of the input patterns other than the designated misread character, retrieves input patterns having features similar to those of the input pattern of the designated misread character, and causes the recognition results corresponding to the retrieved input patterns to be displayed with a marker attached. The characters recognized in these inventions also include symbols in general.

[0007]

[Operation] According to the character recognition result correction method of the first invention and the character recognition device of the second invention of this application, to correct misread characters, one of the misread characters in the displayed recognition results is designated, the features of the designated misread character are compared with the features of the input patterns other than the designated misread character, and input patterns having features similar to those of the designated misread character are retrieved.

[0008] As a result, once the operator looks at the displayed recognition results and designates one of the misread characters, the characters whose input patterns have features similar to those of the designated misread character can be retrieved. Unlike the initial character recognition, this search compares the features of the input pattern of the designated misread character with the features of the other input patterns. Consequently, among the input patterns that were read as the same pattern during character recognition, the ones that are not actually the same can be excluded, and only the input patterns whose features are similar to those of the input pattern of the designated misread character are retrieved.

[0009] In these inventions, the recognition results corresponding to the retrieved input patterns are displayed with a marker attached. The operator therefore only needs to pay attention to the marked characters. This suppresses missed corrections caused by overlooking misread characters and also reduces the time required for correction. Furthermore, because markers are attached only to characters with similar features, erroneous corrections are reduced and corrections can be made accurately.

[0010] Therefore, these inventions make it possible to realize a character recognition result correction method and a character recognition device that reduce the burden on the operator during correction and are accurate and easy to operate.

[0011]

[Embodiments] Embodiments of the character recognition result correction method of the first invention and the character recognition device of the second invention of this application are described together below with reference to the drawings. The drawings referred to show the shape, size, and positional relationship of the constituent components only schematically, to the extent needed to understand these inventions. These inventions are therefore not limited to the illustrated examples.

[0012] (Configuration of the Character Recognition Device) First, FIG. 1 shows a block diagram of the character recognition device of this embodiment. To obtain input patterns, the device comprises a scanning unit 10, an image storage unit 12, and a character segmentation unit 14. The scanning unit 10 captures an image from a medium to be read on which the characters to be recognized are written. The image storage unit 12 stores the image captured by the scanning unit 10. The character segmentation unit 14 cuts character patterns out of the image stored in the image storage unit 12, one character at a time, to obtain the input patterns.

[0013] The character recognition device of this embodiment also comprises a feature extraction unit 16 that extracts the features of an input pattern, and a dictionary unit 18 in which the standard features of characters are stored. It further comprises a character recognition unit 20 that recognizes the input pattern by comparing its features with the standard features, a memory 22 that stores the recognition results and the features, and a display unit 24 that displays the recognition results. In addition, it comprises a keyboard 26 serving as an input unit for designating one of the misread characters displayed on the display unit with an erroneous recognition result.

[0014] The device further comprises a control unit 28 that compares the features of the designated misread character with the features of the input patterns other than the designated misread character, retrieves input patterns having features similar to those of the designated misread character, and causes the display unit to display the recognition results corresponding to the retrieved input patterns with a marker attached.

[0015] (Character Recognition Method) First, the characters, symbols, and the like written on the medium to be read 30 shown in FIG. 2 are recognized using a conventionally known character recognition method. The medium to be read shown in FIG. 2 contains both the kanji "工", indicated by 100 in the figure, and the katakana "エ", indicated by 102 in the figure.

[0016] In this embodiment, character recognition uses as its feature a feature matrix obtained by creating sub-patterns. To this end, the scanning unit 10 first scans the medium to be read 30 optically, and the resulting image is stored in the image storage unit 12. Next, the character segmentation unit 14 cuts input patterns out of the stored image, one character at a time. Each input pattern is input to the feature extraction unit 16, which extracts its features.

[0017] Next, FIG. 3 is a block diagram showing the configuration of the feature extraction unit 16 of the character recognition device of this embodiment. In this embodiment, character recognition is performed using a feature matrix obtained by creating sub-patterns. An input pattern supplied to the feature extraction unit 16 is therefore first input to a pattern register 32 and a line width calculation unit 34.

[0018] The line width calculation unit 34 calculates the line width (W) of the input pattern. To calculate the line width (W), each part of the input pattern is first scanned with a 2 × 2 window of four pixels, and two quantities are counted: the number M of windows in which all four pixels are black, and the total number A of black pixels making up the input pattern. The line width (W) is then calculated from the window count M and the total black pixel count A using equation (1) below.

[0019]

$$W = \frac{A}{A - M} \tag{1}$$

Next, the sub-pattern extraction unit 36 performs a vertical scan over the entire pattern register 32 and extracts the vertical sub-pattern (VSP) from the relationship between the lengths of the runs of consecutive black bits and the line width (W) obtained by the line width calculation unit 34. In the same way as for the VSP, a horizontal scan is performed to extract the horizontal sub-pattern (HSP), a scan along the 45° right diagonal is performed to extract the right diagonal sub-pattern (RSP), and a scan along the 45° left diagonal is performed to extract the left diagonal sub-pattern (LSP).
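The line-width estimate of equation (1) and the run-length-based extraction of the vertical and horizontal sub-patterns can be sketched roughly as follows. This is only an illustrative sketch: the text does not state the exact relationship between run length and line width, so the rule used here (keep a run whose length is at least `ratio` times W, with `ratio = 2.0`) is an assumption, as are the function names and the use of NumPy arrays with 1 for black pixels. The diagonal sub-patterns (RSP, LSP) are omitted.

```python
import numpy as np


def line_width(pattern: np.ndarray) -> float:
    """Estimate the stroke width W = A / (A - M) of a binary pattern (1 = black)."""
    p = pattern.astype(bool)
    a = int(p.sum())  # A: total number of black pixels
    # M: number of 2x2 windows in which all four pixels are black
    m = int((p[:-1, :-1] & p[1:, :-1] & p[:-1, 1:] & p[1:, 1:]).sum())
    if a == 0:
        raise ValueError("pattern contains no black pixels")
    return a / (a - m)


def vertical_subpattern(pattern: np.ndarray, w: float, ratio: float = 2.0) -> np.ndarray:
    """Keep vertical black runs whose length is at least ratio * W (assumed rule)."""
    p = pattern.astype(bool)
    vsp = np.zeros_like(p)
    rows, cols = p.shape
    for x in range(cols):
        y = 0
        while y < rows:
            if p[y, x]:
                start = y
                while y < rows and p[y, x]:
                    y += 1
                if (y - start) >= ratio * w:
                    vsp[start:y, x] = True
            else:
                y += 1
    return vsp


def horizontal_subpattern(pattern: np.ndarray, w: float, ratio: float = 2.0) -> np.ndarray:
    """Horizontal runs are vertical runs of the transposed pattern."""
    return vertical_subpattern(pattern.T, w, ratio).T


# A crude 20x20 glyph shaped like the kanji "工": two bars and a vertical stroke.
glyph = np.zeros((20, 20), dtype=np.uint8)
glyph[0:2, :] = 1
glyph[18:20, :] = 1
glyph[:, 9:11] = 1
w = line_width(glyph)
vsp = vertical_subpattern(glyph, w)
hsp = horizontal_subpattern(glyph, w)
```

For this synthetic glyph, whose strokes are two pixels thick, the estimate comes out close to 2, the VSP retains only the central stroke, and the HSP retains only the two bars.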

[0020] The upper row of FIG. 4 shows the sub-patterns of the kanji "工". From the left end of the upper row of FIG. 4, it shows the original input pattern, the VSP, the HSP, the RSP, and the LSP in that order. Because the kanji "工" has no LSP sub-pattern, the LSP is blank. Likewise, the upper row of FIG. 5 shows the sub-patterns of the katakana "エ". From the left end of the upper row of FIG. 5, it shows the original input pattern, the VSP, the HSP, the RSP, and the LSP in that order. Because the katakana "エ" has no RSP or LSP sub-patterns, the RSP and the LSP are blank. As described later, a four-dimensional feature matrix (shown in the lower rows of FIG. 4 and FIG. 5, respectively) is extracted from these four sub-patterns for each input pattern.

[0021] Next, the input pattern in the pattern register 32 is input to a character frame detection unit 38. The character frame detection unit 38 detects the character frame that circumscribes the input pattern.

[0022] Next, the character frame detected by the character frame detection unit 38 is input to a character frame division determination unit 40. The character frame division determination unit 40 determines the coordinates of the division points on the X axis and the Y axis that are used to divide the inside of the character frame into N × M regions. Here N and M are natural numbers, and in this embodiment N = M = 5. In this embodiment, the horizontal direction of the character frame is the X axis and the vertical direction is the Y axis.

[0023] Next, a feature matrix extraction unit 42 extracts a feature matrix for each input pattern using the division point coordinates determined by the character frame division determination unit 40, the four sub-patterns extracted by the sub-pattern extraction unit 36, and the line width (W) calculated by the line width calculation unit 34.

[0024] To extract the feature matrix, the character frame of each of the four sub-patterns is first divided into N × M regions at the division point coordinates. Next, the number of black pixels ($B_{ij}$) in each region is counted. The character line length ($L_{ij}$) of each region is then calculated from this black pixel count ($B_{ij}$) and the line width (W) using equation (2) below.

[0025]

$$L_{ij} = \frac{B_{ij}}{W} \tag{2}$$

where $1 \le i \le N$ and $1 \le j \le M$.

[0026] Next, the character line length ($L_{ij}$) is normalized by the size of the input pattern. For example, it is normalized by the length ΔY of the character frame in the Y direction for the VSP feature matrix, by the length ΔX of the character frame in the X direction for the HSP feature matrix, and by $(\Delta X^2 + \Delta Y^2)^{1/2}$ for the RSP and LSP feature matrices. A feature matrix of (N × M) × 4 dimensions is then created, in which the value of each region is the normalized character line length.
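The region counting, equation (2), and the normalization described above can be sketched for one sub-pattern as follows. This is a sketch under stated assumptions: the text does not say how the division point coordinates are chosen, so the frame is simply split into equal intervals here, and the function name `feature_matrix`, the `box` tuple layout, and the row/column orientation of the N × M grid are all hypothetical.

```python
import numpy as np


def feature_matrix(sub: np.ndarray, box: tuple, w: float, norm: float,
                   n: int = 5, m: int = 5) -> np.ndarray:
    """Return the n x m matrix of normalized character line lengths for one sub-pattern.

    sub  : binary sub-pattern (1 = black), e.g. the VSP
    box  : character frame as (top, bottom, left, right), bottom/right exclusive
    w    : line width W from equation (1)
    norm : normalizing length (dY for VSP, dX for HSP, sqrt(dX^2 + dY^2) for RSP/LSP)
    """
    top, bottom, left, right = box
    # Assumed division rule: equally spaced division points along each axis.
    ys = np.linspace(top, bottom, n + 1).round().astype(int)
    xs = np.linspace(left, right, m + 1).round().astype(int)
    feat = np.zeros((n, m))
    for i in range(n):
        for j in range(m):
            b_ij = sub[ys[i]:ys[i + 1], xs[j]:xs[j + 1]].sum()  # black pixels B_ij
            l_ij = b_ij / w                                     # equation (2)
            feat[i, j] = l_ij / norm                            # size normalization
    return feat


# Vertical sub-pattern consisting of one full-height stroke, two pixels wide.
vsp = np.zeros((20, 20), dtype=np.uint8)
vsp[:, 9:11] = 1
f_vsp = feature_matrix(vsp, box=(0, 20, 0, 20), w=2.0, norm=20.0)
```

With a full-height stroke, the entries of the middle column sum to 1, which reflects the intent of the normalization: line lengths are expressed as fractions of the character frame size, so the feature does not depend on the scale of the character. Stacking the four matrices for VSP, HSP, RSP, and LSP gives the (N × M) × 4 feature.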

[0027] The lower row of FIG. 4 shows the feature matrices of the kanji "工". From the left end of the lower row of FIG. 4, it shows the feature matrices corresponding to the VSP, HSP, RSP, and LSP of the input pattern in that order. Because the kanji "工" has no LSP sub-pattern, all values in the feature matrix corresponding to the LSP are 0. Likewise, the lower row of FIG. 5 shows the feature matrices of the katakana "エ". From the left end of the lower row of FIG. 5, it shows the feature matrices corresponding to the VSP, HSP, RSP, and LSP of the input pattern in that order. Because the katakana "エ" has no RSP or LSP sub-patterns, all values in the feature matrices corresponding to the RSP and the LSP are 0.

[0028] Next, the feature matrix extracted by the feature extraction unit is input to the character recognition unit 20 and the control unit 28.

[0029] The character recognition unit 20 obtains the distance D between the extracted feature matrix ($f_i$) and each feature matrix ($f_m$) of the standard character patterns stored in the dictionary unit as standard features, using the well-known equation (3) below.

[0030]

$$D = \left\{ \sum (f_i - f_m)^2 \right\}^{1/2} \tag{3}$$

Usually, several standard character patterns are prepared for the same character.

[0031] The standard character patterns are then ranked in ascending order of the distance between their feature matrices and the feature matrices corresponding to the four sub-patterns of the input pattern. The character name of the standard character pattern with the smallest distance D is input to the control unit 28 as the recognition result for that input pattern.
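The distance of equation (3) and the ranking step above amount to a nearest-template search, which can be sketched as follows. The dictionary layout (a list of `(character name, feature)` pairs, possibly with several entries per character) and the function names are assumptions made for illustration; the feature values in the example are made up and are far smaller than a real (N × M) × 4 feature.

```python
import numpy as np


def distance(f_i, f_m) -> float:
    """Euclidean distance of equation (3) between two features of equal shape."""
    diff = np.asarray(f_i, dtype=float) - np.asarray(f_m, dtype=float)
    return float(np.sqrt(np.sum(diff ** 2)))


def recognize(feature, dictionary):
    """Rank dictionary entries by distance and return (best name, ranked list)."""
    ranked = sorted(
        ((distance(feature, std), name) for name, std in dictionary),
        key=lambda entry: entry[0],
    )
    return ranked[0][1], ranked


# Toy dictionary: one template for the kanji, two for the katakana.
toy_dictionary = [
    ("工", [1.0, 0.0]),
    ("エ", [0.0, 1.0]),
    ("エ", [0.2, 0.9]),
]
best, ranking = recognize([0.1, 0.95], toy_dictionary)
```

Because only the template with the smallest distance decides the result, an input pattern can be assigned to the wrong character whenever a template of a similar-looking character happens to lie closer than any template of the correct one.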

[0032] The feature matrices and the recognition result of each input pattern input to the control unit 28 are stored in the memory 22.

[0033] The processing from character segmentation through storage of the recognition result in the memory 22 is repeated for each character on the medium to be read 30. After recognition results have been obtained for all characters on the medium to be read 30, the recognition results corresponding to the input patterns stored in the memory 22 are displayed on the display unit 24.

[0034] The dictionary used in character recognition usually holds several kinds of feature matrices as standard features for a single character. Consequently, for an input pattern that is actually the katakana "エ", the distance D to a standard feature of the kanji "工" may, depending on the font, be smaller than the distance to the standard features of the katakana "エ". FIG. 6(A) shows a display example 50 of a recognition result containing such misreadings. In display example 50, the input patterns that are actually the katakana "エ" have all been recognized as the kanji "工", as indicated by 104 in the figure.

[0035] (Method of Correcting the Character Recognition Result) Next, the method of correcting the character recognition result is described with reference to the flowchart of FIG. 7. After the recognition results have been displayed on the display unit as described above (a), the operator first looks at the recognition results displayed on the display unit 24 and, if necessary, designates one misread character with the keyboard 26 (b).

[0036] Next, it is determined whether a correction has been made (c). If there is no correction, the recognition results stored in the memory 22 are output. If there is a correction, it is determined whether the recognition results carry markers (d). If markers are present, processing returns to the display step (a) and the marked characters are corrected. If no markers are present, similar input patterns are searched for (e).

[0037] Here, the operator looks at the display and designates one character that is actually the katakana "エ" but has been recognized as the kanji "工", correcting it to the katakana "エ". In doing so, the operator enters the correct recognition result (that is, the katakana "エ"), and the position of the corrected character in the display example is specified as well. The position information is used to read the feature matrix of the input pattern of the designated misread character from the memory 22.

[0038] When a misread character is designated, the control unit 28 reads the feature matrix of the designated misread character from the memory 22. It then compares the features of the designated misread character with the features of the other input patterns stored in the memory 22, and retrieves the input patterns having features similar to those of the designated misread character.

[0039] Features can be judged to be similar by, for example, calculating the distance between the feature matrices of the input patterns using equation (3) above and treating feature matrices whose distance is at or below a certain threshold as similar.
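The threshold-based search for similar input patterns can be sketched as follows. The stored features are assumed to be held in a list indexed by character position, standing in for the memory 22; the function name `find_similar` and the threshold value used in the example are hypothetical, since the text does not give a concrete threshold.

```python
import numpy as np


def find_similar(features, designated: int, threshold: float) -> list:
    """Return positions whose feature lies within `threshold` of the designated one.

    The comparison is made against the input pattern of the designated misread
    character itself, not against the dictionary's standard features.
    """
    ref = np.asarray(features[designated], dtype=float)
    hits = []
    for k, f in enumerate(features):
        if k == designated:
            continue  # the designated character is excluded from the search
        d = float(np.sqrt(np.sum((np.asarray(f, dtype=float) - ref) ** 2)))
        if d <= threshold:
            hits.append(k)
    return hits


# Toy features: positions 0, 1 and 3 come from the same glyph shape, 2 does not.
stored = [[0.0, 1.0], [0.05, 0.95], [1.0, 0.0], [0.1, 1.0]]
similar_positions = find_similar(stored, designated=0, threshold=0.2)
```

In this toy data, positions 1 and 3 fall within the threshold of position 0 while position 2 does not, so only the former would receive markers. The threshold trades missed misreadings against spurious markers and would need tuning on real data.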

[0040] Because this search for similar patterns uses the feature matrix extracted from the character pattern of the misread character itself as the reference, confusion such as that between the kanji "工" and the katakana "エ", which can occur in the initial character recognition, is suppressed.

[0041] Next, the control unit 28 causes the display unit 24 to display the recognition results corresponding to the retrieved input patterns with a marker attached. Here, an underline is used as the marker. FIG. 6(B) shows a display example 60 of the recognition results with markers. In this display example, only the characters indicated by 106 in the figure, that is, the kanji "工" that are misread characters and should have been the katakana "エ", are underlined.

[0042] The operator therefore only needs to look at the recognition results displayed with markers and correct the underlined ones. Each time an underlined recognition result is corrected with the keyboard, the display of the recognition results is updated and the underline on the corrected part disappears. The misread characters can thus be corrected one by one until all underlines have disappeared.
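The marker bookkeeping described above (attach markers to the retrieved positions, clear a marker when its character is corrected, finish when none remain) can be sketched as a small state object. The class name `CorrectionSession`, its methods, and the plain-text rendering that wraps marked characters in underscores in place of a real underline are all illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class CorrectionSession:
    """Recognition results for one page plus the set of marked positions."""
    results: list
    marked: set = field(default_factory=set)

    def mark(self, positions) -> None:
        """Attach markers to the positions returned by the similarity search."""
        self.marked.update(positions)

    def correct(self, position: int, char: str) -> None:
        """Replace one recognition result; its marker disappears on correction."""
        self.results[position] = char
        self.marked.discard(position)

    def render(self) -> str:
        """Text rendering: marked characters are wrapped in underscores."""
        return "".join(
            f"_{c}_" if k in self.marked else c
            for k, c in enumerate(self.results)
        )

    def done(self) -> bool:
        return not self.marked


session = CorrectionSession(list("工ア工"))
session.correct(0, "エ")   # the operator designates and corrects one misread character
session.mark([2])          # positions found by the similarity search are marked
view_with_markers = session.render()
session.correct(2, "エ")   # correcting a marked character clears its marker
```

Keeping the markers as a set of positions, separate from the recognition results, makes the "underline disappears on correction" behavior a single `discard` call.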

[0043] If there are misread characters of a kind other than the katakana "エ" described above, the correction processing is repeated along the flowchart shown in FIG. 7, starting again from the misread character designation processing (a) described above. Examples of misreadings other than the katakana "エ" include the kanji "八" and the katakana "ハ", the kanji "十" and the katakana "ナ", the kanji "千" and the katakana "チ", and the symbol "／" and the katakana "ノ".

[0044] The embodiment above describes an example in which these inventions are configured under specific conditions, but many changes and modifications can be made to these inventions. For example, the embodiment above creates sub-patterns and uses a feature matrix as the feature of an input character, but in these inventions the features of, for example, the similarity method may be used instead. With the similarity method, feature vectors are compared in place of feature matrices to search for input patterns having similar features.
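As a rough illustration of comparing feature vectors, the sketch below assumes that the "similarity method" refers to the simple similarity commonly used in character recognition, that is, the normalized inner product (cosine) of two feature vectors. The text does not define the method, so this reading, the function name, and the handling of zero vectors are all assumptions.

```python
import numpy as np


def similarity(u, v) -> float:
    """Normalized inner product of two feature vectors (1.0 = same direction)."""
    u = np.asarray(u, dtype=float).ravel()
    v = np.asarray(v, dtype=float).ravel()
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    if denom == 0.0:
        return 0.0  # assumed convention: a zero vector is similar to nothing
    return float(np.dot(u, v) / denom)


s_same = similarity([1.0, 0.0], [2.0, 0.0])
s_orthogonal = similarity([1.0, 0.0], [0.0, 1.0])
```

Under this reading, larger values mean greater similarity, so a search for similar input patterns would keep those whose similarity to the designated misread character is at or above a threshold, the reverse of the distance test.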

[0045] In the embodiment above, an underline is used as the marker, but in these inventions the marker may instead be, for example, a ruled box around the character in question, reverse-video display of the character, or blinking display of the character.

[0046] In the embodiment above, a keyboard is used as the input unit, but in these inventions the input unit is not limited to a keyboard; for example, a mouse may be used.

[0047]

[Effects of the Invention] According to the character recognition result correction method of the first invention and the character recognition device of the second invention of this application, to correct misread characters, one of the misread characters displayed with an erroneous recognition result is designated, the features of the designated misread character are compared with the features of the input patterns other than the designated misread character, and input patterns having features similar to those of the designated misread character are retrieved.

[0048] As a result, once the operator looks at the displayed recognition results and designates one of the misread characters, the characters whose input patterns have features similar to those of the designated misread character can be retrieved. Unlike the initial character recognition, this search compares the features of the input pattern of the designated misread character with the features of the other input patterns. Consequently, among the input patterns that were read as the same pattern during character recognition, the ones that are not actually the same can be excluded, and only the input patterns whose features are similar to those of the input pattern of the designated misread character are retrieved.

[0049] In these inventions, the recognition results corresponding to the retrieved input patterns are displayed with a marker attached. As a result, the operator need only pay attention to the marked characters. This suppresses omitted corrections caused by overlooking misread characters and also reduces the time required for correction. Furthermore, since markers are attached only to characters with similar features, erroneous corrections are reduced and corrections can be made accurately.
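As a rough illustration of marker display, the sketch below underlines selected recognition results in a text terminal. It is an assumption-laden example rather than the display unit of the embodiment: it uses the standard ANSI escape sequences for underline and reset, and the function name `render_with_markers` and the sample data are hypothetical. Which indices to mark (for instance, whether the designated character itself is included) is left to the caller.

```python
UNDERLINE = "\x1b[4m"  # ANSI escape sequence: start underline
RESET = "\x1b[0m"      # ANSI escape sequence: reset all attributes


def render_with_markers(results, marked):
    """Join recognition results, underlining those at the marked indices."""
    marked = set(marked)
    return "".join(
        f"{UNDERLINE}{ch}{RESET}" if i in marked else ch
        for i, ch in enumerate(results)
    )


# Made-up recognition results; indices 0 and 1 are the ones to be marked.
line = render_with_markers(["工", "工", "工", "場"], marked=[0, 1])
print(line)
```

A different marker, such as reverse video or blinking, would only change the escape sequence used in place of `UNDERLINE`.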

[0050] Therefore, according to these inventions, it is possible to realize a character recognition result correction method and a character recognition device that are easy to operate and that reduce the burden on the operator during correction.

[Brief description of drawings]

FIG. 1 is a block diagram for explaining the configuration of the character recognition device according to the embodiment.

FIG. 2 shows the medium to be read in the embodiment.

FIG. 3 is a block diagram for explaining the configuration of the feature extraction unit that constitutes the character recognition device of the embodiment.

FIG. 4 is a diagram for explaining the sub-patterns and the feature matrix of the kanji character "工".

FIG. 5 is a diagram for explaining the sub-patterns and the feature matrix of the katakana character "エ".

FIG. 6(A) is a display example of the recognition result at the time of the initial character recognition, and FIG. 6(B) is a display example of the recognition result with markers attached.

FIG. 7 is a flowchart for explaining the recognition result correction process.

[Explanation of symbols]

10: scanning unit
12: image storage unit
14: character cutting unit
16: feature extraction unit
18: dictionary unit
20: recognition unit
22: memory
24: display unit
26: keyboard (input unit)
28: control unit
30: medium to be read
32: pattern register
34: line width calculation unit
36: sub-pattern extraction unit
38: character frame detection unit
40: character frame division determination unit
42: feature matrix extraction unit
50, 60: display examples of the recognition result

Claims

[Claims]

1. A character recognition result correction method in which character recognition of an input pattern is performed by comparing a feature of the input pattern with standard features stored in a dictionary unit, the recognition result of the character recognition is displayed on a display unit, and the recognition result is corrected, the method being characterized in that one of the misread characters for which an incorrect recognition result is displayed is designated; the feature of the input pattern of the designated misread character is compared with the features of the input patterns other than that of the designated misread character, whereby input patterns having features similar to the feature of the designated misread character are retrieved; and the recognition results corresponding to the retrieved input patterns are displayed with a marker attached.

2. A character recognition device comprising: a feature extraction unit that extracts a feature of an input pattern; a dictionary unit that stores standard features; a recognition unit that performs character recognition of the input pattern by comparing the feature with the standard features; a memory that stores the recognition result of the character recognition and the feature; a display unit that displays the recognition result; an input unit for designating one of the misread characters for which an incorrect recognition result is displayed on the display unit; and a control unit that compares the feature of the input pattern of the designated misread character with the features of the input patterns other than that of the designated misread character, retrieves input patterns having features similar to that of the input pattern of the designated misread character, and displays the recognition results corresponding to the retrieved input patterns with a marker attached.