JPH0475556B2

Info

Publication number: JPH0475556B2
Application number: JP58202807A
Authority: JP
Inventors: Mitsuru Toyoda; Kenichiro Ishii; Sueji Myahara
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: NTT Inc
Priority date: 1983-10-31
Filing date: 1983-10-31
Publication date: 1992-12-01
Also published as: JPS6095690A

Description

[Detailed Description of the Invention]

[Technical Field of the Invention]

The present invention relates to the configuration of a recognition dictionary intended mainly for reading printed characters in multiple fonts, and to a character reading device that uses this recognition dictionary to determine the category and the font to which an input character belongs.

[Prior art]

Conventional character reading devices for printed characters use one of two recognition methods. The pattern matching method superimposes the character patterns themselves and selects the pattern that overlaps best as the candidate. The feature matching method extracts features, such as the line segments that make up a character, from the character pattern and selects the candidate whose features are most similar. In character reading devices that use these methods, a dictionary for reading multiple fonts is configured in one of two ways: either a single dictionary is prepared by averaging the character shapes of all fonts, or a separate dictionary is prepared for each font in each category. In the former case, the recognition dictionary consists of averaged features, so highly accurate recognition is not possible and the font cannot be recognized. The latter has the drawbacks that the memory required for the recognition dictionary grows and that identification becomes slower. Furthermore, no conventional character reading device has had a function for outputting the font of a read character together with its character code.

[Summary of the invention]

To resolve these drawbacks, this invention focuses on the fact that characters of the same category (that is, the same character) share many features in their character patterns even when the fonts (Mincho, Gothic, and so on) differ. Characters are registered by classifying their features into common features, which are factored out because they appear in every font, and individual features, which appear only in a particular font. This reduces the size of the recognition dictionary and allows the category and the font to be recognized at the same time. The invention is described below with reference to the drawings.

[Embodiments of the invention]

FIG. 1 shows an example of the configuration of a character reading device according to this invention. In the figure, 1 is a form on which the input characters to be read are written. 2 is a preprocessing section that segments the characters written on form 1 one character at a time. 3 is a feature extraction section, 4 is an identification section, 5 is a recognition dictionary section in which features are stored in advance, 6 is a control section, and 7 is an output terminal.

Next, the operation is described. The characters written on form 1, which is set in the device, are photoelectrically converted one line at a time. Under signals from control section 6, preprocessing section 2 then segments them one character at a time, feature extraction section 3 extracts the features of each character, and the resulting feature data are sent to identification section 4. Identification section 4, for example when using a similarity calculation based on linear discriminant functions, matches the features of the input character against the features prepared in advance in recognition dictionary section 5, calculates the similarity, and outputs either the category with the highest similarity or the categories whose similarity is at or above a certain threshold.
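The identification step described above can be sketched as follows. This is a minimal illustration and not the patent's implementation: it assumes each category is represented by one weight vector, that the similarity is a plain weighted sum of the input features, and that the names `similarity`, `classify`, and `toy_dictionary` are hypothetical.

```python
from typing import Dict, List, Optional, Sequence


def similarity(features: Sequence[float], weights: Sequence[float]) -> float:
    """Linear discriminant score: weighted sum of the input features."""
    return sum(x * w for x, w in zip(features, weights))


def classify(
    features: Sequence[float],
    dictionary: Dict[str, Sequence[float]],
    threshold: Optional[float] = None,
) -> List[str]:
    """Return the best category, or every category at or above `threshold`."""
    scores = {cat: similarity(features, w) for cat, w in dictionary.items()}
    if threshold is None:
        # Output only the category with the highest similarity.
        return [max(scores, key=scores.get)]
    # Output every category whose similarity reaches the threshold.
    return [cat for cat, score in scores.items() if score >= threshold]


# Hypothetical two-category dictionary with three-element feature vectors.
toy_dictionary = {"A": [1.0, 0.0, 1.0], "B": [0.0, 1.0, 0.0]}
print(classify([1.0, 0.0, 1.0], toy_dictionary))
print(classify([1.0, 1.0, 1.0], toy_dictionary, threshold=1.0))
```

The two output modes correspond to the two alternatives in the text: the single best category, or all categories that clear a threshold.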

Next, the method of configuring recognition dictionary section 5 and the method by which identification section 4 recognizes the category and the font are described in detail with reference to FIG. 2. In FIG. 2, a is a conceptual diagram of the conventional recognition dictionary configuration and b is a conceptual diagram of the recognition dictionary configuration according to this invention. For simplicity, the case of two fonts is described here.

In a, suppose that for the same category C₁ the feature F₁ of font #1 consists of the elements (f₁, f₂, f₄, f₆, ..., f_{o-2}, f_o) and the feature F₂ of font #2 consists of the elements (f₂, f₃, f₅, f₆, ..., f_{o-2}, f_{o-1}). If the two are simply combined as they are, recognition dictionary section 5 becomes roughly twice as large.

Therefore, as shown in b of the same figure, the features are classified into the features common to font #1 and font #2 (common features) F₀ = (f₂, f₆, ..., f_{o-2}), the features found only in font #1 (individual features) F′₁ = (f₁, f₄, ..., f_o), and the features found only in font #2 (individual features) F′₂ = (f₃, f₅, ..., f_{o-1}), and each group is registered separately in recognition dictionary section 5. This is done for every category.
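The classification into common and individual features can be illustrated with set operations. The sketch below is an assumption-laden toy: it treats each feature as a hashable label, uses only the first four elements listed in the text for each font (the elided elements are left out, not guessed), and the function name `split_features` is hypothetical.

```python
from typing import Set, Tuple


def split_features(f1: Set[str], f2: Set[str]) -> Tuple[Set[str], Set[str], Set[str]]:
    """Split two fonts' feature sets into (common, only in font 1, only in font 2)."""
    common = f1 & f2  # features that appear in both fonts
    return common, f1 - common, f2 - common


# Leading elements of F1 and F2 as listed in the text; the rest are omitted.
F1 = {"f1", "f2", "f4", "f6"}
F2 = {"f2", "f3", "f5", "f6"}

F0, F1_only, F2_only = split_features(F1, F2)

# Entries stored when both fonts are registered whole (FIG. 2 a) versus
# when common and individual features are registered separately (FIG. 2 b).
naive_size = len(F1) + len(F2)
split_size = len(F0) + len(F1_only) + len(F2_only)
print(sorted(F0), sorted(F1_only), sorted(F2_only))
print(naive_size, split_size)
```

The saving grows with the share of features the fonts have in common, since each common feature is stored once instead of once per font.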

When identification is performed using recognition dictionary section 5, the features of the input character are matched against the features (F₀ + F′₁) and (F₀ + F′₂) in recognition dictionary section 5, and the font with the higher similarity is taken as the candidate for the character's font. In this way the font is recognized, and the category and font information are output together. The features of the input character need to be matched against the common features F₀ only once, so identification is faster than when all the features of F₁ and F₂ are matched as in FIG. 2 a.
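A minimal sketch of this matching scheme follows. It assumes, purely for illustration, that features are labels in sets and that similarity is the number of dictionary features found in the input; the patent does not fix the similarity measure, and the names `recognize` and `toy_dictionary` and the dictionary layout are hypothetical.

```python
from typing import Dict, Set, Tuple

# Hypothetical layout: category -> common features plus per-font individual features.
Dictionary = Dict[str, Dict[str, object]]


def recognize(input_features: Set[str], dictionary: Dictionary) -> Tuple[str, str]:
    """Return the (category, font) pair whose features best match the input."""
    if not dictionary:
        raise ValueError("recognition dictionary is empty")
    best = None
    for category, entry in dictionary.items():
        # The common features are matched once per category ...
        common_score = len(input_features & entry["common"])
        for font, individual in entry["individual"].items():
            # ... and only the individual features are matched per font.
            score = common_score + len(input_features & individual)
            if best is None or score > best[0]:
                best = (score, category, font)
    return best[1], best[2]


toy_dictionary: Dictionary = {
    "C1": {
        "common": {"f2", "f6"},
        "individual": {"font1": {"f1", "f4"}, "font2": {"f3", "f5"}},
    }
}
print(recognize({"f2", "f3", "f5", "f6"}, toy_dictionary))
```

Because `common_score` is reused for every font in a category, adding a font adds only the cost of matching its individual features.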

Needless to say, fonts can be recognized with even higher accuracy if a font recognition dictionary that uses features such as line width ratio and the presence or absence of decorative elements is used together with the recognition dictionary section according to this invention.

[Effect of the invention]

As explained above, this invention configures the recognition dictionary section by registering common features and individual features separately, matches both kinds of features against the features extracted from the input character, and determines which font's individual features in which category matched, thereby obtaining the category and font information. This suppresses growth in the size of the recognition dictionary section, allows the font type to be determined, and has the advantage of remaining applicable even as the number of fonts increases.

[Brief explanation of the drawing]

FIG. 1 is a block diagram showing one embodiment of this invention, and FIG. 2 is a conceptual diagram explaining the configuration of the recognition dictionary section according to this invention.

In the figures, 1 is a form, 2 is a preprocessing section, 3 is a feature extraction section, 4 is an identification section, 5 is a recognition dictionary section, 6 is a control section, 7 is an output terminal, a shows the conventional recognition dictionary configuration, and b shows the recognition dictionary configuration according to this invention. For the same category C₁, F₁ is the feature of font #1, F₂ is the feature of font #2, F₀ is the feature common to fonts #1 and #2, F′₁ is the feature found only in font #1, and F′₂ is the feature found only in font #2.

Claims

[Claims]

1. A character reading device that reads input characters in which multiple fonts are mixed, characterized by comprising: a recognition dictionary section in which common features, which appear in common within the same category even when the fonts differ, and individual features, which appear uniquely for each font of that category, are registered separately; a feature extraction section that extracts features from an input character; and an identification section that, when matching the features in the recognition dictionary section against the features extracted from the input character by the feature extraction section, determines which category's common features match, then determines which font's individual features within that category match, and outputs the category and font information.