JPH02173886A - Word recognizing system

Word recognizing system

Info

Publication number
JPH02173886A
Authority
JP
Japan
Prior art keywords
recognition
word
character
homonym
address
Prior art date
Legal status
Pending
Application number
JP63327602A
Other languages
Japanese (ja)
Inventor
Kaoru Katagiri
片桐 薫
Hiroyuki Sugai
菅井 弘幸
Current Assignee
Toshiba Corp
Toshiba System Development Co Ltd
Original Assignee
Toshiba Corp
Toshiba System Development Co Ltd
Priority date
1988-12-27
Filing date
1988-12-27
Publication date
1990-07-05
Application filed by Toshiba Corp and Toshiba System Development Co Ltd
Priority to JP63327602A
Publication of JPH02173886A
Legal status: Pending


Landscapes

  • Character Discrimination (AREA)

Abstract

PURPOSE: To recognize words accurately even when the meanings of abbreviations overlap. A dictionary registers a standard form and synonyms for each word, and a word is evaluated with its synonyms added to the standard form. CONSTITUTION: Image information obtained by an input unit 11 is supplied to a character detection and recognition unit 13. That unit extracts the information that constitutes characters from the image information and performs recognition processing on each character. A word recognition unit 15 then performs word recognition on this result using a first database 17. After word recognition, an address recognition unit 21 performs address recognition using a second database 19, and the recognition result is supplied to an output unit 23 and displayed. Because the dictionary registers a standard form and synonyms for each word, and a word is evaluated with its synonyms added to the standard form, word recognition can be performed accurately.

Description

DETAILED DESCRIPTION OF THE INVENTION [Object of the Invention] (Industrial Application Field) The present invention relates to a word recognition method. One example is a method of recognizing the words that form the basis of an address search.

(Prior Art) Many different recognition methods exist. For address recognition, the following approach has been used.

First, to perform address recognition, characters are recognized, followed by words. Based on the word recognition result, words are combined to search for an address. During word recognition, if a word had synonyms, its abbreviation was given as the candidate. For example, when the text "HIGHWAY", "HIWAY", or "HWY" was read and recognized, its abbreviation "HWY" was obtained from the dictionary as the recognition candidate. This worked very well as long as "HIGHWAY", "HIWAY", and "HWY" all had only one meaning, namely expressway.
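The conventional lookup can be pictured as a dictionary that maps every written variant to one abbreviation. The following is a hypothetical illustration: the table `PRIOR_ART_DICT` and the function `prior_art_candidates` are not from the patent. It shows that only a single candidate ever comes out, so any second meaning of "HIWAY" is lost.

```python
# Hypothetical prior-art dictionary: every written variant collapses to ONE abbreviation.
PRIOR_ART_DICT = {
    "HIGHWAY": "HWY",
    "HIWAY": "HWY",
    "HWY": "HWY",
}


def prior_art_candidates(word):
    """Return the single registered abbreviation, or the word itself if unknown."""
    abbreviation = PRIOR_ART_DICT.get(word)
    return [abbreviation] if abbreviation is not None else [word]


print(prior_art_candidates("HIWAY"))  # ['HWY']
```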

However, repeated tests and studies by the present inventors showed that the meanings of abbreviations sometimes overlap. In those cases, correct address searches could not be performed.

(Problems to be Solved by the Invention) As described above, the conventional technology represented each word candidate by a single abbreviation. When the meanings of abbreviations overlapped, correct address retrieval was therefore impossible.

Accordingly, an object of the present invention is to provide a word recognition method that recognizes words more accurately even when abbreviations have overlapping meanings.

[Structure of the Invention] (Means for Solving the Problems) To solve the above problems, the present invention provides a word recognition method that recognizes words using a dictionary. The method is characterized in that the dictionary registers a standard form and synonyms for each word, and the word is evaluated with the synonyms added to the standard form.

(Operation) According to the present invention, an abbreviation is assigned all of its possible meanings, and multiple candidates are treated as recognition candidates. Word recognition is therefore performed more accurately.

(Embodiment) An embodiment of the present invention will now be described in detail with reference to the drawing.

This embodiment relates to an address recognition device for mail in the United States.

As shown in Fig. 1, this address recognition device includes an input unit 11. The input unit 11 consists of a photoelectric conversion device such as a CCD and converts the image on a mail piece into binary ("0"/"1") image information. The image information obtained by the input unit 11 is supplied to a character detection and recognition unit 13. The character detection and recognition unit 13 extracts the information that constitutes characters from the image information and performs recognition processing on each character.

The word recognition unit 15 then performs word recognition on this result using the first database 17.

After word recognition, the address recognition unit 21 recognizes the address using the second database 19. The recognition result is supplied to the output unit 23 and displayed.

Next, the processing performed by the above configuration is described in detail.

Character detection in the character detection and recognition unit 13 uses a well-known technique. First, the distribution of the image information is measured along two axes (horizontal and vertical). That is, the image is projected onto each of the two axes and the number of "1"s is counted. Regions containing characters have more "1"s, that is, more pixels, than other regions, so characters are easily detected. At this stage, however, the distribution is computed with pixels of coarser density than those used for normal recognition. For example, suppose the CCD reads at 8 lines/mm and normal recognition is also performed at 8 lines/mm. The detection process above is then performed at 1 line/mm. Processing with coarse pixels is preferable because it avoids errors in character detection.
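The projection step can be sketched as follows, assuming the binary image is held as a list of rows of 0/1 values. The patent does not say how the coarse pixels are formed. OR-pooling each `factor` x `factor` block is an assumption made here; for example, `factor=8` takes 8 lines/mm down to 1 line/mm. The function names are also hypothetical.

```python
def downsample(img, factor):
    """Coarsen a binary image by OR-pooling factor x factor blocks (assumed scheme):
    a coarse pixel is 1 if any fine pixel in its block is 1."""
    h, w = len(img), len(img[0])
    coarse = []
    for by in range(0, h, factor):
        row = []
        for bx in range(0, w, factor):
            ink = any(img[y][x]
                      for y in range(by, min(by + factor, h))
                      for x in range(bx, min(bx + factor, w)))
            row.append(1 if ink else 0)
        coarse.append(row)
    return coarse


def projections(img):
    """Count the 1s in each row and in each column (projections onto the two axes)."""
    row_counts = [sum(row) for row in img]
    col_counts = [sum(col) for col in zip(*img)]
    return row_counts, col_counts


fine = [
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
print(projections(fine))                 # ([1, 0, 1, 0], [1, 0, 0, 1])
print(projections(downsample(fine, 2)))  # ([1, 1], [1, 1])
```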

In this character detection, projections are taken along both axes. The projection in one direction has gaps (regions with no pixels), and these gaps are at least a certain width. This direction is called the first direction. The projection in the other direction has no gaps, or only narrow ones.

Here, the "lines" of text are considered to be formed along the direction that has the gaps. The lines are therefore cut out along these gaps.
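Choosing the first direction and cutting lines along its gaps can be sketched on projection profiles, which are the per-row and per-column counts from the previous step. The patent only states that the gaps in the first direction exceed "a certain value". The threshold `min_gap` and the function names are therefore illustrative assumptions.

```python
def max_internal_gap(profile):
    """Length of the longest run of empty bins lying between two non-empty bins."""
    filled = [i for i, count in enumerate(profile) if count > 0]
    return max((b - a - 1 for a, b in zip(filled, filled[1:])), default=0)


def first_direction(profiles, min_gap):
    """Index of the profile with the widest internal gap, provided it reaches min_gap."""
    best = max(range(len(profiles)), key=lambda k: max_internal_gap(profiles[k]))
    return best if max_internal_gap(profiles[best]) >= min_gap else None


def cut_runs(profile, min_gap):
    """Split a profile into [start, end) runs; empty stretches shorter than
    min_gap do not split a run."""
    runs = []
    start = end = None
    for i, count in enumerate(profile):
        if count > 0:
            if start is None:
                start = i
            elif i - end >= min_gap:
                runs.append((start, end))
                start = i
            end = i + 1
    if start is not None:
        runs.append((start, end))
    return runs


row_profile = [0, 3, 4, 0, 0, 2, 1, 0]  # toy data: two text lines
col_profile = [2, 1, 2, 2, 0, 3, 2]     # only a narrow gap
print(first_direction([row_profile, col_profile], min_gap=2))  # 0
print(cut_runs(row_profile, min_gap=2))                        # [(1, 3), (5, 7)]
```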

Once a line has been cut out, the image of the region forming that line is projected along a second direction perpendicular to the first. Pixels are thus counted in regions containing characters and not in regions without them. This step may use a finer pixel density than line extraction did. As before, the presence or absence of characters shows up in the projection along the second direction. Individual characters can therefore be cut out of the line, which completes character segmentation.
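Within one cut-out line, characters can be separated with the same projection idea along the second direction. In this sketch the line is given as the row range `top`..`bottom` of a 0/1 image at whatever density is chosen, possibly finer than before. Every empty column counts as a separator. Both of these are simplifying assumptions, and the function name is hypothetical.

```python
def cut_characters(img, top, bottom):
    """Return [left, right) column spans of the characters in the text line that
    occupies rows top..bottom-1 of a 0/1 image. Any empty column separates two
    characters (a simplifying assumption)."""
    band = img[top:bottom]
    profile = [sum(col) for col in zip(*band)]  # projection along the second direction
    spans, left = [], None
    for x, count in enumerate(profile + [0]):   # trailing 0 closes a final run
        if count > 0 and left is None:
            left = x
        elif count == 0 and left is not None:
            spans.append((left, x))
            left = None
    return spans


page = [
    [0, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 0, 1],
    [1, 0, 0, 1, 1, 1],
    [0, 0, 0, 0, 0, 0],
]
print(cut_characters(page, 1, 3))  # [(0, 2), (3, 6)]
```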

Character recognition follows character segmentation. In this embodiment, character recognition uses, for example, the compound (multiple) similarity method.

The processing up to this point takes place in the character detection and recognition unit 13. That unit outputs information about characters such as "H", "I", "W", and "Y".

After character recognition, the word recognition unit 15 recognizes words, which are combinations of characters. Recognition does not produce just one result. Several recognition candidates are obtained in descending order of likelihood, so reading a mail piece normally outputs several words as recognition candidates.
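The patent does not specify how word likelihoods are computed. One common way to obtain a ranked candidate list is assumed here purely for illustration: score each letter combination by the product of its per-character recognition scores and keep the best `k`. The function name and data are hypothetical. Exhaustive enumeration as written grows exponentially with word length, so it suits only short words or short per-character candidate lists.

```python
import heapq
import itertools
import math


def word_candidates(char_candidates, k=3):
    """char_candidates holds, for each character position, a list of
    (character, score) pairs. Returns the k best (word, score) pairs, scoring a
    word as the product of its character scores (an assumed rule)."""
    scored = (
        ("".join(ch for ch, _ in combo), math.prod(s for _, s in combo))
        for combo in itertools.product(*char_candidates)
    )
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Toy per-character candidates for a word read from a mail piece.
reading = [
    [("H", 0.9)],
    [("I", 0.6), ("L", 0.4)],
    [("W", 1.0)],
    [("A", 1.0)],
    [("Y", 0.8), ("V", 0.2)],
]
print([word for word, _ in word_candidates(reading, k=2)])  # ['HIWAY', 'HLWAY']
```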

For convenience of explanation, suppose the address on the mail piece contains "XXXX HIWAY" and that character recognition has extracted "HIWAY" as a word. This word "HIWAY" is compared and matched against the contents of the first database 17.

The relevant portion of the first database 17 is organized as shown in Table 1 below.

Table 1

In Table 1, the left-hand column headed "letter type" (the form as written on the mail) includes abbreviations.

Since the word read from the mail piece is "HIWAY", it is compared and matched against the "letter type" entries of the first database 17 shown in Table 1. In this example the second row of Table 1 matches, so its standard form is first extracted as a recognition candidate. In addition, the "HIWAY" written on the mail piece is treated as a synonym and becomes another recognition candidate. This is because "HIWAY" can mean expressway (HIGHWAY) and can also be a place name.
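The lookup just described can be sketched as follows. The actual contents of Table 1 are not reproduced in this text, so the rows in `WORD_DICT` are hypothetical examples in its spirit, and the function name is also an assumption. The function returns the standard form first and then keeps the word as read as a synonym candidate. This reproduces both examples in this description: "HIWAY" yields "HIGHWAY" and "HIWAY", and "HWY" yields "HIGHWAY" and "HWY".

```python
# Hypothetical rows in the spirit of Table 1:
# written ("letter type") form, abbreviations included -> standard form
WORD_DICT = {
    "HIGHWAY": "HIGHWAY",
    "HIWAY": "HIGHWAY",
    "HWY": "HIGHWAY",
}


def recognition_candidates(word):
    """Return the standard form first, then the word as read, which is kept as a
    synonym candidate because it may carry another meaning (e.g. a place name)."""
    standard = WORD_DICT.get(word)
    if standard is None:
        return [word]            # not in the dictionary: keep the word as read
    candidates = [standard]
    if word != standard:
        candidates.append(word)  # the written form itself stays a candidate
    return candidates


print(recognition_candidates("HIWAY"))  # ['HIGHWAY', 'HIWAY']
print(recognition_candidates("HWY"))    # ['HIGHWAY', 'HWY']
```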

In this way, two recognition candidates, "HIWAY" and "HIGHWAY", are output and sent to the address recognition unit 21. The address recognition unit 21 uses the second database 19 to narrow the recognition candidates down to one. Specifically, it also takes the words before and after the read word into account and decides on a single candidate. Here, "XXXX HIGHWAY" is selected as the place name.
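The narrowing step can be sketched under stated assumptions. The second database 19 is modeled as a hypothetical set of known street names, `STREET_DB`, and "taking neighbouring words into account" is modeled as checking whether the candidate forms a known name with the word before it, the word after it, or both. "OAK HIWAY" is an invented place name used only to show the other outcome. The function name is also an assumption.

```python
# Hypothetical stand-in for the second database 19: known street names.
STREET_DB = {"XXXX HIGHWAY", "OAK HIWAY"}


def resolve(candidates, before=None, after=None):
    """Return the first candidate that forms a known street name with its
    neighbouring words; otherwise keep the top-ranked candidate."""
    for cand in candidates:
        windows = (
            " ".join(w for w in (before, cand) if w),
            " ".join(w for w in (cand, after) if w),
            " ".join(w for w in (before, cand, after) if w),
        )
        if any(window in STREET_DB for window in windows):
            return cand
    return candidates[0]


print(resolve(["HIGHWAY", "HIWAY"], before="XXXX"))  # HIGHWAY
print(resolve(["HIGHWAY", "HIWAY"], before="OAK"))   # HIWAY
```

Falling back to the first candidate keeps the dictionary's ranking when the neighbouring words give no evidence either way.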

This result is displayed or otherwise output by the output unit 23.

Suppose also that a mail piece bears the text "□□□□ HWY XXXX" and that it is read exactly as written.

In this case, recognizing the word "HWY" yields "HWY" and its standard form "HIGHWAY" as recognition candidates.

The address recognition unit 21 then narrows these two candidates down to one.

[Effects of the Invention] As described above, the present invention uses both the standard form of a word and its synonyms as recognition candidates, so word recognition is performed more accurately.

[Brief Description of the Drawing]

Fig. 1 is a schematic diagram showing the configuration of an address recognition device according to an embodiment of the present invention.

11: input unit
13: character detection and recognition unit
15: word recognition unit
17: first database
19: second database
21: address recognition unit
23: output unit

Agent: Patent Attorney Norichika Keisuke

Claims (1)

[Claims]
(1) A word recognition method that recognizes words using a dictionary, characterized in that the dictionary registers a standard form and synonyms for each word, and the word is evaluated with the synonyms added to the standard form.

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP63327602A JPH02173886A (en) 1988-12-27 1988-12-27 Word recognizing system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP63327602A JPH02173886A (en) 1988-12-27 1988-12-27 Word recognizing system

Publications (1)

Publication Number Publication Date
JPH02173886A true JPH02173886A (en) 1990-07-05

Family

ID=18200890

Family Applications (1)

Application Number Title Priority Date Filing Date
JP63327602A Pending JPH02173886A (en) 1988-12-27 1988-12-27 Word recognizing system

Country Status (1)

Country Link
JP (1) JPH02173886A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5995664A (en) * 1996-06-21 1999-11-30 Nec Corporation Information recognition apparatus for recognizing recognition object information

Similar Documents

Publication Publication Date Title
US20230015054A1 (en) Text classification method, electronic device and computer-readable storage medium
JPH11232291A (en) Search method for protein 3D structure database
JP2847715B2 (en) Character recognition device and character recognition method
JPH02130673A (en) Data retrieving system
JPH02173886A (en) Word recognizing system
JPH02181269A (en) Address recognizing system
JPH09282418A (en) Recognition method compounding apparatus and method
JP2685257B2 (en) Recognition method
JPH0477857A (en) Improper expression detecting device
JPH0795337B2 (en) Word recognition method
JPH02166586A (en) Address retrieving system
KR880000505B1 (en) Attachment Verification Method for Automatic Conversion of Korean and Chinese Characters
JP2839515B2 (en) Character reading system
JPS62285189A (en) Character recognition post processing system
JP2874199B2 (en) Word dictionary matching device
JPS61114388A (en) Character input device
JP2865443B2 (en) Kanji conversion device for Kana name or Kana corporation name
JPH09171539A (en) Character recognition device
JPS63138479A (en) Character recognizing device
JPS5953986A (en) Character recognizing device
JPH11238068A (en) Text retrieval device
JPH0226268B2 (en)
JPS6190641A (en) Indivisual identification apparatus
JPH0437971A (en) Character reading device
JPH04278664A (en) Address analysis processor