JPH0527157B2 - Post-processing system of character recognition - Google Patents

Info

Publication number
JPH0527157B2
Authority
JP
Japan
Prior art keywords
kanji
characters
character
word
recognition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Lifetime
Application number
JP59045044A
Other languages
Japanese (ja)
Other versions
JPS60189582A (en)
Inventor
Tozen Hai
Eiichiro Yamamoto
Yukikazu Kaburayama
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fujitsu Ltd
Original Assignee
Fujitsu Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fujitsu Ltd
Priority to JP59045044A
Publication of JPS60189582A
Publication of JPH0527157B2
Granted

Landscapes

  • Character Discrimination (AREA)

Description

[Detailed Description of the Invention]

[Technical Field of the Invention]

The present invention relates to a character recognition post-processing method, and in particular to a post-processing method used with a character recognition device that reads text in which kanji are mixed with non-kanji such as hiragana and katakana, and that corrects the recognition errors remaining after character recognition has been performed.

[Prior Art and Problems]

Japanese text containing both kanji and non-kanji is already being recognized with, for example, optical character recognition devices. However, methods for correcting the errors that remain after recognition, that is, character recognition post-processing, have not yet reached a fully satisfactory level.

Post-processing approaches considered so far for complex Japanese text containing such characters include two methods. In the first, the writer separates the text into words in advance (for example by leaving spaces between words), so that the positions of the words used in the text are known at the time of writing. In the second, when a recognition error is corrected, the operator indicates the specific position of the word that contains the error.

With either method, however, a person has to intervene and take an extra step when the text is written or corrected. In the examples above, the writer must separate the words when writing, or must keep the positions of particular words in mind beforehand. This amounts to a dummy step that contributes nothing to the final goal.

[Object of the Invention]

The present invention solves the problems described above. Its object is to provide a character recognition post-processing method that raises the accuracy of kanji/non-kanji discrimination by using a separate discrimination scheme to tell kanji from non-kanji in the text, that prepares word dictionaries organized by the length of the character strings forming words for use with the identified kanji strings, and that thereby handles and corrects post-recognition errors efficiently.

[Structure of the Invention]

To achieve this object, the character recognition post-processing method of the present invention is used in a character recognition device capable of classifying characters into kanji and non-kanji such as hiragana, and comprises: kanji/non-kanji determination means for determining whether a character is a kanji or a non-kanji; non-kanji identification means for identifying non-kanji; kanji identification means for identifying kanji; kanji character string extraction means for extracting kanji character strings; and word separation means for separating a kanji character string into units of a prescribed number of characters when it is longer than that number. The kanji character string recognized by the kanji identification means is checked against a word dictionary matching its length, so that recognition is performed word by word.

[Embodiments of the Invention]

Before an embodiment of the present invention is described in detail, an outline is given with reference to FIG. 2.

Suppose the sentence shown in FIG. 2(a) is read by, for example, an OCR device, and its kanji and non-kanji are recognized one character at a time, giving the recognition result shown in FIG. 2(b). The operator first corrects the errors in the non-kanji portion, for example by key input, to obtain the corrected result shown in FIG. 2(c). The kanji portion is then checked against a word dictionary for post-processing. Where many kanji appear in succession, the string is split, for example, every two characters, because two-character combinations are the most common word length. In this way 「指足」, 「人力」, 「装直」, 「目一」 and so on can be corrected to the proper words 「指定」 (designation), 「入力」 (input), 「装置」 (device), 「同一」 (same) and so on.

An embodiment of the present invention will be described with reference to FIGS. 1 to 5.

FIG. 1 is a block diagram of an embodiment of the present invention. FIG. 2 illustrates the recognition error correction process of the invention. FIG. 3 shows the method of discriminating kanji from non-kanji in a text. FIG. 4 shows an example of analyzing and discriminating a kanji by the number of loops, the number of connected components and the average number of black runs shown in FIG. 3. FIG. 5 shows an example of the contour line segment series method of FIG. 3 (see Japanese Unexamined Patent Publication No. 58-225849).

In FIG. 1, 1 is a document input section holding the text to be read, which contains kanji and non-kanji (hiragana, katakana, etc.); 2 is a kanji/non-kanji determination circuit; 3 is a non-kanji identification circuit; 4 is a kanji identification circuit; 5 is a kanji character string extraction circuit; 6 is a display device and K is its key section; 7 is a word separation circuit; 8 is a word dictionary section storing groups of words, each a kanji string of a predetermined length with its own meaning; and 9 is a word post-processing circuit.

In the character recognition post-processing method of the present invention, each character of a text in which kanji are mixed is first accurately judged to be a kanji or a non-kanji. Since a word is a run of several consecutive kanji, word-level post-processing is then applied automatically, and only to the kanji strings.

Next, the operation of the embodiment shown in FIG. 1 will be explained.

The characters in the document input section 1, obtained for example by reading a document with an OCR device, that is, the kanji and non-kanji in the text, are judged one after another by the kanji/non-kanji determination circuit 2. Characters judged to be non-kanji go to the identification circuit 3, which identifies which character (hiragana, katakana, etc.) each one is. Characters judged to be kanji go to the kanji identification circuit 4, which identifies each kanji one character at a time.

The non-kanji output of the identification circuit 3 and the kanji output of the identification circuit 4 are shown on the display device 6, so the operator can correct misrecognized non-kanji, for example erroneous hiragana, with the keys K of the display device 6. In this way 「わ」, 「ほ」 and 「加」 in FIG. 2(b) are corrected to 「れ」, 「は」 and 「が」 as shown in FIG. 2(c) (「加」 results from a kanji/non-kanji misjudgment).

Meanwhile, the kanji output of the identification circuit 4 is sent to the kanji character string extraction circuit 5, which extracts the kanji strings sandwiched between non-kanji. Each extracted string is sent to the word separation circuit 7. If the string is long, say four, five or six characters, it is cut there into units of a predetermined length. For example, the string 「文字認識装置」 (character recognition device) is cut into 「文字」, 「認識」 and 「装置」. The kanji units of that length, for example two characters, are sent to the word post-processing circuit 9.

The word post-processing circuit 9 is connected to the word dictionary section 8 and receives kanji output from it. The word dictionary section 8 stores in advance a large number of words of the given length, each with its own meaning. The word post-processing circuit 9 therefore compares the word from the word separation circuit 7 with words of the same length from the word dictionary section 8 and judges whether the recognized word is correct. The identification circuit 4 also extracts and sends several candidates for each character, and these candidates are used in the comparison as well.
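The extraction and separation steps described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the patent decides kanji versus non-kanji from image features, whereas the hypothetical `is_kanji` helper here tests a Unicode code-point range, and the function names `extract_kanji_runs` and `split_run` are not from the source. The sketch also keeps kanji runs at the very start or end of the text, which the source does not discuss.

```python
def is_kanji(ch: str) -> bool:
    # Assumption: a code-point test stands in for the patent's
    # image-based kanji/non-kanji determination circuit 2.
    return "\u4e00" <= ch <= "\u9fff"


def extract_kanji_runs(text: str) -> list[str]:
    """Collect maximal runs of consecutive kanji (role of circuit 5)."""
    runs, current = [], []
    for ch in text:
        if is_kanji(ch):
            current.append(ch)
        elif current:
            runs.append("".join(current))
            current = []
    if current:  # a run that reaches the end of the text
        runs.append("".join(current))
    return runs


def split_run(run: str, unit: int = 2) -> list[str]:
    """Cut a kanji run into fixed-length units (role of circuit 7)."""
    return [run[i:i + unit] for i in range(0, len(run), unit)]


if __name__ == "__main__":
    for run in extract_kanji_runs("この文字認識装置は漢字を読む"):
        print(run, split_run(run))
```

A fixed two-character unit follows the example in the text; a run of odd length leaves a final one-character unit, which a real system would need to handle separately.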

In other words, errors in non-kanji, for example the hiragana of inflectional endings (okurigana), have already been corrected on the display device 6 after recognition. If a recognized kanji is wrong, then in view of the characters before and after it, the erroneous kanji string from the word separation circuit 7 is compared in the word post-processing circuit 9 with the kanji output of the word dictionary section 8, is judged to be in error, and is automatically replaced with the correct word from the dictionary section 8 before output.

Suppose, for example, that the sentence shown in FIG. 2(a) is written on document 1 and that, after reading, the text recognized by the kanji and non-kanji identification circuits 4 and 3 contains the errors shown in (b). The misrecognized text is displayed as is on the display device 6, so the operator looks at it and corrects the erroneous hiragana with the keys, as in (c). In the illustrated example, 「わ」 is corrected to 『れ』, 「ほ」 to 『は』, and 「加」, which was mistaken for a kanji, to 『が』. During this hiragana correction the operator does not need to judge or correct the kanji words at all.

Recognition errors in kanji words are corrected automatically, as shown in FIG. 2(d). When kanji words contain errors as in FIG. 2(b), the word post-processing circuit 9 takes the erroneous words 「指足」, 「装直」, 「目一」, 「人力」 and 「便用」 received from the word separation circuit 7, together with the candidates obtained for each character during recognition, and compares them with words read out one after another from the word dictionary section 8. The correct words found in this way, 『指定』, 『入力』, 『装置』 and 『同一』, are substituted automatically as shown in (d).
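The dictionary comparison with per-character candidates can be illustrated with the short Python sketch below. It is a minimal reading of the step, not the circuit described in the patent: the function name `correct_word`, the candidate lists and the four-word dictionary are all assumptions made for illustration, and the source does not say how ties between several matching dictionary words are resolved.

```python
from itertools import product


def correct_word(candidates: list[list[str]], dictionary: set[str]) -> str:
    """Return a dictionary word built from per-character candidates.

    candidates[i] lists the recognizer's guesses for position i,
    best guess first (hypothetical data layout).
    """
    recognized = "".join(c[0] for c in candidates)
    if recognized in dictionary:
        return recognized  # the first-choice reading is already a word
    # Try the other candidate combinations in order of appearance.
    for combo in product(*candidates):
        word = "".join(combo)
        if word in dictionary:
            return word
    return recognized  # nothing matched: leave the word unchanged


if __name__ == "__main__":
    # Illustrative two-character dictionary and candidate lists.
    words = {"指定", "入力", "装置", "同一"}
    print(correct_word([["指"], ["足", "定"]], words))
    print(correct_word([["人", "入"], ["力"]], words))
    print(correct_word([["装"], ["直", "置"]], words))
    print(correct_word([["目", "同"], ["一"]], words))
```

Returning the recognized word unchanged when no combination matches is a design choice of this sketch; a real system might instead flag the word for the operator.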

Next, a concrete method of discriminating kanji from non-kanji is outlined. This method has already been filed by the same applicant as Japanese Patent Application No. 57-169510.

As shown in FIG. 3, complex kanji with many strokes are distinguished from non-kanji with few strokes (for example hiragana and katakana) in three stages. In the first stage, the number of loops and the number of connected components, described below, are analyzed. If that does not settle the matter, the second stage examines the average number of black runs, also described below. Kanji with few strokes, whose stroke count differs little from that of non-kanji (for example 「山」 and 「川」, which are hard to tell from hiragana), are finally separated from non-kanji in the third stage by examining the contour line segment series.
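The three-stage decision can be pictured as a simple cascade. The Python sketch below is an assumption-laden illustration: the source gives no threshold values or decision rules, so `thresholds` is a caller-supplied, hypothetical parameter, the feature names are invented for the example, and the third stage is delegated to a caller-supplied function `contour_is_kanji` because the source does not specify how contour series are matched.

```python
from typing import Callable, Mapping, Sequence


def classify_character(
    features: Mapping[str, object],
    thresholds: Mapping[str, float],
    contour_is_kanji: Callable[[Sequence[str]], bool],
) -> str:
    """Return 'kanji' or 'non-kanji' using a three-stage cascade."""
    # Stage 1: many loops or many connected components suggest a
    # many-stroke character, treated here as kanji (assumed rule).
    if (features["loops"] >= thresholds["loops"]
            or features["components"] >= thresholds["components"]):
        return "kanji"
    # Stage 2: a high average number of black runs also suggests
    # a many-stroke character (assumed rule).
    if features["avg_black_runs"] >= thresholds["avg_black_runs"]:
        return "kanji"
    # Stage 3: few-stroke characters are separated by their
    # contour line segment series.
    if contour_is_kanji(features["contour_series"]):
        return "kanji"
    return "non-kanji"
```

Passing the thresholds and the contour test in as arguments keeps the sketch honest about what the source leaves unspecified.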

FIG. 4 shows the discrimination of the first and second stages of FIG. 3 applied to the kanji 「漢」. In the figure, loops are the parts that form closed rings, found on the right side of 「漢」; here the number of loops is two. The number of connected components is the number of separate, independent groups of strokes. In the illustrated example there are three in the sanzui (water radical) part, one in the part on the right resembling a kusakanmuri (grass radical), and one in the block below it that contains the loops, for a total of five.
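Both counts can be computed from a binary character image by flood fill. The Python sketch below shows one common way to do this and is not taken from the patent: it assumes a bitmap given as rows of 0 (white) and 1 (black), treats black pixels as 8-connected, and counts a loop as a white region that is cut off from the image border. The function names are hypothetical.

```python
from collections import deque

N8 = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]
N4 = [(-1, 0), (1, 0), (0, -1), (0, 1)]


def count_regions(grid, value, neighbours):
    """Count connected regions of `value` with breadth-first flood fill."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != value or seen[r][c]:
                continue
            count += 1
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for dy, dx in neighbours:
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and grid[ny][nx] == value):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return count


def connected_components(bitmap):
    """Number of separate black stroke groups (8-connected)."""
    return count_regions(bitmap, 1, N8)


def loop_count(bitmap):
    """Number of enclosed white regions (holes)."""
    cols = len(bitmap[0])
    # Pad with a white border so the outside forms one white region.
    padded = ([[0] * (cols + 2)]
              + [[0] + list(row) + [0] for row in bitmap]
              + [[0] * (cols + 2)])
    return count_regions(padded, 0, N4) - 1
```

Using 8-connectivity for black pixels and 4-connectivity for white pixels is the usual pairing that keeps a diagonal stroke from leaking a hole.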

The average number of black runs is divided into the number of black runs per column and the number of black runs per row. The number of black runs in a column is the number of black stretches (positions carrying information) met when the character is scanned vertically; in the illustrated example it is 3 in the sanzui part on the left and 6 in the right-hand part. The original expresses the average number of black runs per column, n_y, and per row, n_x, as general formulas at this point [formulas not preserved in this text].
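Because the general formulas are not preserved here, the Python sketch below uses one plausible definition and should be read as an assumption: n_y is taken as the total number of vertical black runs divided by the number of columns, and n_x as the total number of horizontal black runs divided by the number of rows. The bitmap layout (rows of 0 and 1) and the function names are likewise hypothetical.

```python
def runs_in_line(line):
    """Count maximal stretches of consecutive black (1) pixels."""
    runs, previous = 0, 0
    for pixel in line:
        if pixel and not previous:
            runs += 1  # a new black run starts here
        previous = pixel
    return runs


def average_black_runs(bitmap):
    """Return (n_y, n_x): mean black runs per column and per row.

    Assumed definition: plain averages over all scan lines.
    """
    rows, cols = len(bitmap), len(bitmap[0])
    column_runs = [
        runs_in_line([bitmap[r][c] for r in range(rows)])
        for c in range(cols)
    ]
    row_runs = [runs_in_line(row) for row in bitmap]
    return sum(column_runs) / cols, sum(row_runs) / rows
```

If the original normalizes differently, for example by averaging only over scan lines that contain black pixels, only the two divisors would change.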

Using the number of loops, the number of connected components and the average number of black runs, characters can thus be separated into many-stroke characters and few-stroke characters. Next, contour line segment features are used to divide the few-stroke characters into non-kanji and few-stroke kanji. FIG. 5 shows an example of contour line segment feature extraction. At the edge of the character, each end of a contour line segment is either open (○) or closed (●), which yields four types of segment (○―○, ○―●, ●―○, ●―●). The sequence in which these four types appear is very stable when the structure of the original pattern is simple. Therefore, by examining the order in which contour line segments appear for a few-stroke character, the category to which the character belongs can be determined.

In this way, the present invention goes through the three discrimination stages shown in FIG. 3 and finally distinguishes kanji from non-kanji with considerable accuracy. As described above, recognition errors in non-kanji are corrected by the operator on the display device 6, and recognition errors in kanji can be corrected automatically to the proper words.

[Effect of the Invention]

As described above, in the present invention identification errors in kanji are corrected automatically without the writer having to separate the words of a document containing kanji and non-kanji, keep particular character positions in mind, or designate particular words as recognition points. Efficient character recognition post-processing is therefore possible without extra manpower, and an efficient optical character recognition device, for example, can be obtained.

[Brief Description of the Drawings]

FIG. 1 shows the configuration of an embodiment of the character recognition post-processing method of the present invention. FIG. 2 shows the process by which recognition errors are corrected after character recognition according to the invention. FIG. 3 shows the method of discriminating kanji from non-kanji in a text. FIG. 4 shows an example of analyzing and discriminating a kanji by the first and second stages of FIG. 3. FIG. 5 shows an example of kanji discrimination by the third stage of FIG. 3.

In the figures, 1 is a document input section, 2 is a kanji/non-kanji determination circuit, 3 is a non-kanji identification circuit, 4 is a kanji identification circuit, 5 is a kanji character string extraction circuit, 6 is a display device, K is a key, 7 is a word separation circuit, 8 is a word dictionary section, and 9 is a word post-processing circuit.

Claims (1)

[Claims]

1. A character recognition post-processing method for a character recognition device capable of classifying characters into kanji and non-kanji such as hiragana, comprising: kanji/non-kanji determination means for determining whether a character is a kanji or a non-kanji; non-kanji identification means for identifying non-kanji; kanji identification means for identifying kanji; kanji character string extraction means for extracting kanji character strings; and word separation means for separating a kanji character string into units of a prescribed number of characters when it is longer than that number; characterized in that the kanji character string recognized by the kanji identification means is checked against a word dictionary matching its kanji length, so that recognition is performed by word.
JP59045044A 1984-03-09 1984-03-09 Post-processing system of character recognition Granted JPS60189582A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP59045044A JPS60189582A (en) 1984-03-09 1984-03-09 Post-processing system of character recognition

Publications (2)

Publication Number Publication Date
JPS60189582A JPS60189582A (en) 1985-09-27
JPH0527157B2 true JPH0527157B2 (en) 1993-04-20

Family

ID=12708362

Family Applications (1)

Application Number Title Priority Date Filing Date
JP59045044A Granted JPS60189582A (en) 1984-03-09 1984-03-09 Post-processing system of character recognition

Country Status (1)

Country Link
JP (1) JPS60189582A (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS636687A (en) * 1986-06-27 1988-01-12 Canon Inc character recognition device
JPH04242491A (en) * 1991-01-17 1992-08-31 Nec Corp Optical character reader

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS5710195A (en) * 1980-06-19 1982-01-19 Nippon Electric Co Word recognizing device
JPS5839377A (en) * 1981-09-02 1983-03-08 Toshiba Corp Character recognizing device

Also Published As

Publication number Publication date
JPS60189582A (en) 1985-09-27
