JPH0475556B2 - - Google Patents
Info
- Publication number
- JPH0475556B2
- Authority
- JP
- Japan
- Prior art keywords
- font
- feature
- character
- features
- recognition dictionary
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired
Landscapes
- Character Discrimination (AREA)
Description
[Detailed Description of the Invention]
[Technical Field of the Invention]
This invention relates mainly to a character reading device that reads printed characters in a plurality of fonts, and more specifically to the structure of its recognition dictionary and to the use of that dictionary to determine the category and the font to which an input character belongs.
Conventional recognition methods for character reading devices that read printed characters include the pattern matching method, in which the character patterns themselves are superimposed and the pattern that overlaps best is taken as the candidate, and the feature matching method, in which features such as the line segments making up a character are extracted from the character pattern and the entry with the most similar features is taken as the candidate. In character reading devices using these methods, there are two dictionary configurations for reading multiple fonts: one prepared by averaging the character shapes of all fonts, and one that provides a separate dictionary for each font in every category. In the former, the recognition dictionary consists of averaged features, so high-accuracy recognition is impossible and the font cannot be recognized. The latter has the drawbacks that the memory required for the recognition dictionary increases and identification becomes slow. Moreover, no conventional character reading device could output the font of a read character together with its character code.
To overcome these drawbacks, this invention relies on the observation that, within the same category (that is, the same character), many of the features appearing in the character pattern are identical even when the font (Mincho, Gothic, and so on) differs. Each character is registered with its features split into common features, which factor out the features that appear in every font, and individual features, which appear only in a particular font. This reduces the size of the recognition dictionary and allows the category and the font to be recognized at the same time. The invention is described below with reference to the drawings.
FIG. 1 shows an example configuration of a character reading device according to this invention. In this figure, 1 is a form on which the input characters to be read are written, 2 is a preprocessing unit that segments the characters on form 1 one at a time, 3 is a feature extraction unit, 4 is an identification unit, 5 is a recognition dictionary unit in which features are stored in advance, 6 is a control unit, and 7 is an output terminal.
Next, the operation is described. The characters written on form 1, once it is set in the device, are photoelectrically converted line by line. The preprocessing unit 2 then segments them one character at a time according to signals from the control unit 6, the feature extraction unit 3 extracts the features of each character, and the resulting feature data are sent to the identification unit 4. In the identification unit 4, using for example a similarity calculation based on a linear discriminant function, the features of the input character are compared with the features prepared in advance in the recognition dictionary unit 5, similarities are calculated, and the category with the highest similarity, or every category whose similarity is at or above a given threshold, is output.
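The identification step just described can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes a feature vector is a mapping from hypothetical feature names to values, and that each dictionary entry stores the weights of a linear discriminant function. `linear_similarity` and `identify` are hypothetical names.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical representation (not from the patent): feature name -> value.
Features = Dict[str, float]


def linear_similarity(x: Features, weights: Features) -> float:
    """Linear discriminant score: sum of weight * input feature value."""
    return sum(w * x.get(name, 0.0) for name, w in weights.items())


def identify(
    x: Features,
    dictionary: Dict[str, Features],
    threshold: Optional[float] = None,
) -> List[Tuple[str, float]]:
    """Score every category; return the best one, or all at or above threshold."""
    scores = [(cat, linear_similarity(x, w)) for cat, w in dictionary.items()]
    scores.sort(key=lambda cs: cs[1], reverse=True)
    if threshold is None:
        return scores[:1]  # only the category with the highest similarity
    return [cs for cs in scores if cs[1] >= threshold]


if __name__ == "__main__":
    dictionary = {"A": {"f1": 1.0, "f2": 0.5}, "B": {"f3": 1.0}}
    x = {"f1": 1.0, "f3": 0.2}
    print(identify(x, dictionary))                 # [('A', 1.0)]
    print(identify(x, dictionary, threshold=0.1))  # [('A', 1.0), ('B', 0.2)]
```

Both output modes from the text are covered: with no threshold only the top category is returned, otherwise every category whose score reaches the threshold.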
Next, the structure of the recognition dictionary unit 5 and the method by which the identification unit 4 recognizes categories and fonts are described in detail with reference to FIG. 2. In FIG. 2, (a) is a conceptual diagram of a conventional recognition dictionary structure and (b) is a conceptual diagram of the recognition dictionary structure according to this invention. For simplicity, the case of reading two fonts is described.
In (a), suppose that for the same category $C_1$ the components of feature $F_1$ of font #1 are $(f_1, f_2, f_4, f_6, \ldots, f_{n-2}, f_n)$ and the components of feature $F_2$ of font #2 are $(f_2, f_3, f_5, f_6, \ldots, f_{n-2}, f_{n-1})$. If the two are simply combined, the recognition dictionary unit 5 becomes roughly twice as large.
Therefore, as shown in (b), the features shared by font #1 and font #2 (common features) are classified as $F_0 = (f_2, f_6, \ldots, f_{n-2})$, the features of font #1 only (individual features) as $F'_1 = (f_1, f_4, \ldots, f_n)$, and the features of font #2 only (individual features) as $F'_2 = (f_3, f_5, \ldots, f_{n-1})$, and each group is registered separately in the recognition dictionary unit 5. This is done for every category.
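The registration step above can be sketched with Python frozensets. Assumptions not stated in the patent: each font's features for a category are treated as an unordered set of hypothetical feature labels, and with more than two fonts the common part $F_0$ is taken as the intersection over all fonts. The helper names are hypothetical; the example mirrors the shape of FIG. 2 with $n = 8$.

```python
from typing import Dict, FrozenSet, Tuple

FeatureSet = FrozenSet[str]
# Per-category entry: (common features F0, {font: individual features F'k})
Entry = Tuple[FeatureSet, Dict[str, FeatureSet]]


def split_category(fonts: Dict[str, FeatureSet]) -> Entry:
    """Factor out F0 (features shared by every font) and keep F'k per font.

    Requires at least one font; F0 is assumed to be the intersection of all fonts.
    """
    common = frozenset.intersection(*fonts.values())
    individual = {font: feats - common for font, feats in fonts.items()}
    return common, individual


def build_dictionary(training: Dict[str, Dict[str, FeatureSet]]) -> Dict[str, Entry]:
    """Apply the common/individual split to every category."""
    return {cat: split_category(fonts) for cat, fonts in training.items()}


def stored_features(entry: Entry) -> int:
    """Number of feature entries kept for one category after the split."""
    common, individual = entry
    return len(common) + sum(len(f) for f in individual.values())


if __name__ == "__main__":
    f1 = frozenset({"f1", "f2", "f4", "f6", "f8"})  # font #1
    f2 = frozenset({"f2", "f3", "f5", "f6", "f7"})  # font #2
    entry = build_dictionary({"C1": {"#1": f1, "#2": f2}})["C1"]
    print(sorted(entry[0]))                           # ['f2', 'f6']
    print(len(f1) + len(f2), stored_features(entry))  # 10 8
```

Because $F_0 \subseteq F_k$ for every font, the split stores $\sum_k |F_k| - (K-1)|F_0|$ entries for $K$ fonts instead of $\sum_k |F_k|$, so the saving grows with the size of the common part and with the number of fonts.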
When identifying with the recognition dictionary unit 5, the features of the input character are compared with the dictionary features $(F_0 + F'_1)$ and $(F_0 + F'_2)$, and the font with the higher similarity is taken as the font candidate for that character. The font is thus recognized, and category and font information are output together. In this comparison, the input features need to be matched against the common features $F_0$ only once, so identification is faster than matching against all the features of both $F_1$ and $F_2$ as in FIG. 2(a).
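The two-stage comparison above can be sketched as follows, reusing a set representation of hypothetical feature labels. As an assumption not taken from the patent, similarity is the fraction of an entry's features found in the input, $(|x \cap F_0| + |x \cap F'_k|) / (|F_0| + |F'_k|)$. Since $F_0$ and $F'_k$ are disjoint, this equals matching against the full $F_k$, yet $|x \cap F_0|$ is computed only once per category and reused for every font. Function and variable names are hypothetical.

```python
from typing import Dict, FrozenSet, Tuple

FeatureSet = FrozenSet[str]
# Per-category entry: (common features F0, {font: individual features F'k})
Entry = Tuple[FeatureSet, Dict[str, FeatureSet]]


def identify(x: FeatureSet, dictionary: Dict[str, Entry]) -> Tuple[str, str, float]:
    """Return (category, font, similarity) of the best-matching F0 + F'k."""
    best = None
    for category, (common, individual) in dictionary.items():
        shared_hits = len(x & common)  # compared with F0 only once per category
        for font, own in individual.items():
            size = max(1, len(common) + len(own))  # guard against empty entries
            score = (shared_hits + len(x & own)) / size
            if best is None or score > best[2]:
                best = (category, font, score)
    if best is None:
        raise ValueError("dictionary has no font entries")
    return best


if __name__ == "__main__":
    dictionary = {
        "C1": (frozenset({"f2", "f6"}),
               {"#1": frozenset({"f1", "f4", "f8"}),
                "#2": frozenset({"f3", "f5", "f7"})}),
        "C2": (frozenset({"g1"}),
               {"#1": frozenset({"g2"}), "#2": frozenset({"g3"})}),
    }
    print(identify(frozenset({"f1", "f2", "f4", "f6"}), dictionary))  # ('C1', '#1', 0.8)
```

The single result carries both the category and the font, matching the text's point that the two are output together.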
Fonts can, of course, be recognized with even higher accuracy if identification also uses a font-recognition dictionary based on features such as the line-width ratio and the presence or absence of decorations, in combination with the recognition dictionary unit according to this invention.
As described above, in this invention the recognition dictionary unit is built by registering common features and individual features separately. Both kinds of features are compared with the features extracted from an input character, and the device determines the category and the font whose individual features match, thereby obtaining category and font information. This limits growth in the size of the recognition dictionary unit, makes it possible to distinguish the type of font, and remains applicable as the number of fonts increases.
FIG. 1 is a block diagram showing an embodiment of this invention, and FIG. 2 is a conceptual diagram illustrating the structure of the recognition dictionary unit according to this invention.
In the figures, 1 is a form, 2 is a preprocessing unit, 3 is a feature extraction unit, 4 is an identification unit, 5 is a recognition dictionary unit, 6 is a control unit, 7 is an output terminal, (a) is the conventional recognition dictionary structure, and (b) is the recognition dictionary structure according to this invention. $F_1$, $F_2$, $F_0$, $F'_1$, and $F'_2$ all refer to the same category $C_1$: $F_1$ is the features of font #1, $F_2$ the features of font #2, $F_0$ the features common to fonts #1 and #2, $F'_1$ the features of font #1 only, and $F'_2$ the features of font #2 only.
Claims (1)
1. A character reading device that reads input characters in which multiple types of fonts are mixed, comprising: a recognition dictionary unit in which common features, which appear in common within the same category even when the fonts differ, and individual features, which appear uniquely for each font within that category, are registered separately; a feature extraction unit that extracts each of said features from the input characters; and an identification unit that, when comparing each feature in the recognition dictionary unit with each feature extracted from an input character by the feature extraction unit, first determines whether the common features match, then determines which font's individual features in that category are matched, and outputs category and font information.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP58202807A JPS6095690A (en) | 1983-10-31 | 1983-10-31 | Character reader |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP58202807A JPS6095690A (en) | 1983-10-31 | 1983-10-31 | Character reader |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| JPS6095690A (en) | 1985-05-29 |
| JPH0475556B2 (en) | 1992-12-01 |
Family
ID=16463520
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP58202807A Granted JPS6095690A (en) | 1983-10-31 | 1983-10-31 | Character reader |
Country Status (1)
| Country | Link |
|---|---|
| JP (1) | JPS6095690A (en) |
Families Citing this family (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP2582611B2 (en) * | 1988-04-05 | 1997-02-19 | 富士通株式会社 | How to create a multi-font dictionary |
| JPH02268388A (en) * | 1989-04-10 | 1990-11-02 | Hitachi Ltd | Character recognizing method |
| JP3599180B2 (en) | 1998-12-15 | 2004-12-08 | 松下電器産業株式会社 | SEARCH METHOD, SEARCH DEVICE, AND RECORDING MEDIUM |
- 1983-10-31: application JP58202807A (JP), published as JPS6095690A; status: active, granted
Also Published As
| Publication number | Publication date |
|---|---|
| JPS6095690A (en) | 1985-05-29 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CA1208784A (en) | Method and apparatus for character recognition accommodating diacritical marks | |
| JPH0475556B2 (en) | | |
| JP3151866B2 (en) | English character recognition method | |
| JPS62281082A (en) | character recognition device | |
| JPH0210473B2 (en) | | |
| JPS6095689A (en) | Optical character reader | |
| JPS6081688A (en) | Information recognition method | |
| JPS60138689A (en) | Character recognizing method | |
| JP2665488B2 (en) | Personal dictionary registration method | |
| JPH056464A (en) | Method and device for character string recognition | |
| JP3100786B2 (en) | Character recognition post-processing method | |
| JP2784004B2 (en) | Character recognition device | |
| JPH0632088B2 (en) | Character recognition method | |
| JPS6174087A (en) | word reader | |
| JPH0291785A (en) | Image recognition method and device | |
| JPH11134439A (en) | Method for recognizing word | |
| JPS59148984A (en) | Pattern recognition system | |
| JPS62288989A (en) | Character recognizing system | |
| JPH04242491A (en) | Optical character reader | |
| JPS5851390A (en) | Font character recognizing device | |
| JPH04318687A (en) | Character recognition unit | |
| JPH0340186A (en) | Character recognizer | |
| JPS62280985A (en) | optical character reader | |
| JPS5914078A (en) | Reader of business form | |
| JPS6145378A (en) | Word reader |