JPH02173886A

JPH02173886A - Word recognizing system

Info

Publication number: JPH02173886A
Application number: JP63327602A
Authority: JP
Inventors: Kaoru Katagiri; 片桐　薫; Hiroyuki Sugai; 菅井　弘幸
Original assignee: Toshiba Corp; Toshiba System Development Co Ltd
Current assignee: Toshiba Corp; Toshiba System Development Co Ltd
Priority date: 1988-12-27
Filing date: 1988-12-27
Publication date: 1990-07-05

Abstract

PURPOSE: To perform recognition accurately even when an abbreviation has more than one meaning, by constituting a dictionary in which a standard form and synonyms are registered for each word, and evaluating the word using the synonyms in addition to the standard form. CONSTITUTION: Image information obtained at an input part 11 is supplied to a character detection and recognition part 13, which extracts the information constituting characters from the image information and performs recognition processing on each character. A word recognition part 15 then performs word recognition on this result using a first database 17. After word recognition, an address recognition part 21 performs address recognition using a second database 19, and the recognition result is supplied to an output device 23 and displayed. The dictionary is constituted by registering a standard form and synonyms for each word, and each word is evaluated using the synonyms in addition to the standard form. In this way, word recognition can be executed accurately.

Description

DETAILED DESCRIPTION OF THE INVENTION [Object of the Invention] (Industrial Application Field) The present invention relates to a word recognition method, for example a method of recognizing the words that form the basis of an address search.

(Prior Art) Many different recognition methods exist. For address recognition, the following method has been used.

To perform address recognition, characters are recognized first and words are recognized next. Based on this recognition result, the address is searched for by combining the words. In this word recognition, when a word had synonyms, its abbreviation was given as the candidate. For example, when "HIGHWAY", "HIWAY", or "HWY" was read and recognized, the abbreviation "HWY" was obtained from the dictionary as the recognition candidate. This was very effective as long as "HIGHWAY", "HIWAY", and "HWY" all had one and the same meaning, namely an expressway.

However, repeated tests and studies by the present inventors showed that the meanings of an abbreviation can overlap, in which case a correct address search could not be performed.

(Problems to be Solved by the Invention) As described above, the conventional technique expressed a word candidate by a single abbreviation, so a correct address search could not be performed when the meanings of the abbreviation overlapped.

Accordingly, an object of the present invention is to provide a word recognition method capable of more accurate recognition even when the meanings of an abbreviation overlap.

[Structure of the Invention] (Means for Solving the Problems) To solve the above problems, the present invention provides a word recognition method for recognizing words using a dictionary, characterized in that the dictionary is configured by registering a standard form and synonyms for each word, and the word is evaluated by adding the synonyms to the standard form.

(Operation) According to the present invention, an abbreviation is given all of its possible meanings and multiple candidates are used as recognition candidates, so word recognition is performed more accurately.

(Embodiment) An embodiment of the present invention will now be described in detail with reference to the drawing.

This embodiment relates to an address recognition device for mail in the United States.

As shown in FIG. 1, this address recognition device includes an input unit 11. The input unit 11 consists of a photoelectric conversion device such as a CCD and converts the image on a mail item into binary image information of "0" and "1". The image information obtained by the input unit 11 is supplied to a character detection and recognition unit 13. The character detection and recognition unit 13 extracts the information that constitutes characters from the image information and performs recognition processing on each character.

The word recognition unit 15 then performs word recognition on this result using the first database 17.

After word recognition, the address recognition unit 21 performs address recognition using the second database 19. The recognition result is supplied to the output device 23, where it is displayed or otherwise output.

Next, the operation of the above configuration will be described in detail.

Character detection in the character detection and recognition unit 13 is achieved with well-known techniques. First, distribution statistics of the image information are taken along two axes (horizontal and vertical). That is, a projection is taken along each of the two axes and the number of "1" values is counted. Because the number of "1" values, that is, the number of pixels, is larger where characters are present than in other regions, characters are easily detected. At this stage, however, the distribution is taken with pixels of a coarser density than that used for normal recognition processing. For example, if the reading resolution of the CCD is 8 lines/mm and normal recognition processing is also performed at 8 lines/mm, the above detection processing is performed at a resolution of 1 line/mm. Processing with coarse pixels is preferable because it avoids errors in character detection.
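The coarse projection step described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the image is taken to be a list of rows of 0/1 values, and the names `downsample` and `projections` as well as the choice of OR-ing each block of pixels are hypothetical, not taken from the patent.

```python
def downsample(image, factor):
    """Reduce a binary image by `factor` in both directions.

    Assumption: a coarse pixel is 1 if any fine pixel in its block is 1.
    For the example in the text (8 lines/mm down to 1 line/mm) factor would be 8.
    """
    rows, cols = len(image), len(image[0])
    coarse = []
    for r in range(0, rows, factor):
        coarse_row = []
        for c in range(0, cols, factor):
            block = [image[rr][cc]
                     for rr in range(r, min(r + factor, rows))
                     for cc in range(c, min(c + factor, cols))]
            coarse_row.append(1 if any(block) else 0)
        coarse.append(coarse_row)
    return coarse


def projections(image):
    """Count the "1" pixels along each of the two axes."""
    per_row = [sum(row) for row in image]             # one count per image row
    per_column = [sum(col) for col in zip(*image)]    # one count per image column
    return per_row, per_column
```

Regions where the counts are non-zero are the places where characters are likely to be present.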

In this character detection, projections are taken along two axes. The projection along one of the directions has gaps (regions with no pixels), and the size of these gaps is at least a certain value. This direction is called the first direction. The projection along the other direction has no gaps or, if it has any, they are narrow.

The "rows" of text are considered to be formed along the direction that has gaps. The rows are therefore cut out at these gaps.
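A minimal sketch of the row extraction described above, assuming horizontal text so that the gaps appear in the per-row pixel counts. The function names and the `min_gap` threshold value are hypothetical; the patent only states that the gaps exceed a certain value.

```python
def find_runs(profile):
    """Return (start, end) spans of consecutive non-zero counts; end is exclusive."""
    runs, start = [], None
    for i, count in enumerate(profile):
        if count > 0 and start is None:
            start = i
        elif count == 0 and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs


def merge_runs(runs, min_gap):
    """Merge spans separated by a gap narrower than `min_gap`."""
    merged = []
    for start, end in runs:
        if merged and start - merged[-1][1] < min_gap:
            merged[-1] = (merged[-1][0], end)
        else:
            merged.append((start, end))
    return merged


def extract_rows(image, min_gap=2):
    """Cut text rows out of a binary image (list of rows of 0/1 values)."""
    profile = [sum(row) for row in image]   # projection: pixel count per image row
    return merge_runs(find_runs(profile), min_gap)
```

Gaps narrower than `min_gap` are treated as part of the same row, which reflects the requirement that a row boundary be a gap of at least a certain size.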

Once a row has been cut out, a projection is taken over the image region that makes up the row, along a second direction perpendicular to the first direction. As a result, pixels are counted in regions where characters are present and are not counted in regions without characters. In this processing, the pixel density may be finer than that used when the rows were cut out. As before, the presence or absence of characters appears in the projection along the second direction. Characters can therefore be cut out of the row, and character detection is achieved.
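The character cut-out within one row can be sketched in the same way. This is an illustrative assumption of how the second projection might be used; `extract_characters` and its arguments are hypothetical names, and touching or broken characters are not handled.

```python
def extract_characters(image, row_span):
    """Return (left, right) column spans of characters inside one text row.

    image    -- binary image as a list of rows of 0/1 values
    row_span -- (top, bottom) row indices of the text row, bottom exclusive
    """
    top, bottom = row_span
    strip = image[top:bottom]
    # Projection along the second direction: count "1" pixels per column.
    profile = [sum(column) for column in zip(*strip)]
    spans, start = [], None
    for x, count in enumerate(profile):
        if count > 0 and start is None:
            start = x                      # a character begins here
        elif count == 0 and start is not None:
            spans.append((start, x))       # a character ended just before x
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    return spans
```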

Character detection is followed by character recognition. In this embodiment, character recognition is performed using, for example, the multiple similarity method.

This completes the processing in the character detection and recognition unit 13, which outputs information on characters such as "H", "I", "W", and "Y".

After this character recognition, the word recognition unit 15 recognizes words, which are combinations of characters. Recognition does not simply yield a single result. Multiple recognition candidates are obtained in descending order of likelihood, and in the normal case, reading a mail item outputs multiple words as recognition candidates.

For the sake of explanation, suppose that the address on the mail item contains "xxxx HIWAY" and that, as a result of character recognition, "HIWAY" is extracted as a word. This word "HIWAY" is compared against the contents of the first database 17.

The relevant portion of the first database 17 is configured as shown in Table 1.

Here, the "letter type" entries in the left column of Table 1 include abbreviated forms.

Since the word read from the mail is "HIWAY", this word is compared against the "letter type" entries of the first database 17 shown in Table 1. In this example, the entry in the second row of Table 1 matches, so its standard form is first extracted as a recognition candidate. In addition, the word "HIWAY" as written on the mail is treated as a synonym and made another recognition candidate. This is because "HIWAY" may be a place name as well as meaning an expressway (HIGHWAY).
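The lookup just described can be sketched as follows. The dictionary contents are a hypothetical stand-in, since the body of Table 1 is not available here, and `word_candidates` and its fallback for unknown words are assumptions made for illustration.

```python
# Hypothetical stand-in for the first database: written form -> standard form.
WORD_DICTIONARY = {
    "HIGHWAY": "HIGHWAY",
    "HIWAY": "HIGHWAY",
    "HWY": "HIGHWAY",
}


def word_candidates(read_word, dictionary=WORD_DICTIONARY):
    """Return the standard form plus the word as written, in that order."""
    standard = dictionary.get(read_word)
    if standard is None:
        # Assumption: an unregistered word is passed through unchanged.
        return [read_word]
    candidates = [standard]
    if read_word != standard:
        # Keep the written form as a second candidate, since it may have
        # another meaning (for example, a place name).
        candidates.append(read_word)
    return candidates
```

With this sketch, `word_candidates("HIWAY")` yields both "HIGHWAY" and "HIWAY", so the later address recognition step can still choose the place-name reading.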

In this way, the two recognition candidates "HIWAY" and "HIGHWAY" are output and sent to the address recognition unit 21. The address recognition unit 21 uses the second database 19 to narrow the candidates down to one. Specifically, it decides on a single candidate by also taking into account the words before and after the word that was read. Here, the word is taken to be part of a place name, and "xxxx HIGHWAY" is selected.
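One way to picture this narrowing step is to try each combination of word candidates against a set of known address phrases. This is a sketch under assumptions: the patent does not describe the structure of the second database, so `ADDRESS_DATABASE`, its entries, and `resolve_address` are all hypothetical.

```python
from itertools import product

# Hypothetical stand-in for the second database: known word sequences.
ADDRESS_DATABASE = {
    ("MAIN", "HIGHWAY"),
    ("HIWAY", "PARK"),
}


def resolve_address(candidates_per_word, database=ADDRESS_DATABASE):
    """Pick the first combination of candidates that the database knows.

    candidates_per_word -- one candidate list per word position, so the
    neighbouring words take part in the decision.
    """
    for combination in product(*candidates_per_word):
        if combination in database:
            return combination
    return None   # no known address matches any combination
```

For instance, the candidates `["HIGHWAY", "HIWAY"]` followed by `["PARK"]` resolve to the place-name reading `("HIWAY", "PARK")` in this toy database, while `["MAIN"]` followed by the same candidates resolves to `("MAIN", "HIGHWAY")`.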

This result is displayed or otherwise output by the output unit 23.

As another example, suppose that "□□□□ HWY XXXX" is written on the mail item and is read exactly as written.

In this case, in recognizing the word "HWY", both "HWY" and its standard form "HIGHWAY" are extracted as recognition candidates.

These two candidates are then finally narrowed down to one by the address recognition unit 21.

[Effects of the Invention] As described above, according to the present invention, both the standard form of a word and its synonyms are used as recognition candidates, so word recognition is performed more accurately.

[Brief Description of the Drawing]

FIG. 1 is a schematic diagram showing the configuration of an address recognition device according to an embodiment of the present invention.

11: input unit; 13: character detection and recognition unit; 15: word recognition unit; 17: first database; 19: second database; 21: address recognition unit; 23: output unit

Agents: Patent Attorney Keisuke Norichika and one other (name illegible in the source)

Claims

[Claims]

(1) A word recognition method for recognizing words using a dictionary, characterized in that the dictionary is configured by registering a standard form and synonyms for each word, and the word is evaluated by adding the synonyms to the standard form.