JPH08243505A

JPH08243505A - Address reading device and method

Info

Publication number: JPH08243505A
Application number: JP7053946A
Authority: JP
Inventors: Hisao Ogata (緒方 日佐男); Katsumi Marukawa (丸川 勝美); Yoshihiro Shima (嶋 好博); Masashi Koga (古賀 昌史); Tatsuhiko Kagehiro (影広 達彦); Masato Teramoto (寺本 正人); Shigeru Watanabe (渡辺 成); Hiromichi Fujisawa (藤澤 浩道)
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1995-03-14
Filing date: 1995-03-14
Publication date: 1996-09-24
Anticipated expiration: 2016-08-20
Also published as: JP3201207B2; JPH11207266A

Abstract

(57) [Abstract]
[Purpose] To recognize the address display number contained in the address information on a mail piece quickly and accurately, and to simplify maintenance such as registering new notation patterns for address display numbers.
[Structure] Words expressed with wildcards that stand for arbitrary digits are registered in a dictionary in advance. The address display number is recognized by word matching between that dictionary and a candidate character group obtained by converting the character recognition results into wildcards. When the words to be matched are retrieved from the dictionary, the wildcard-converted candidate character group is used as the search index, and only words whose new/old address notation attribute and vertical/horizontal writing attribute agree are retrieved and matched.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to an address reading device, and a method therefor, that reads the address information written on mail pieces in order to sort the mail automatically.

[0002]

[Prior Art] When address information such as "東京都国分寺市西恋ヶ窪3丁目8-1" (3-chome 8-1, Nishi-Koigakubo, Kokubunji-shi, Tokyo) is written, the part "東京都国分寺市西恋ヶ窪" is defined as the town area information, and the part written with numerals, "3丁目8-1", is defined as the address display number information. Conventional automatic mail reading and sorting methods are known to sort on the town area information. This works because the town area information has a hierarchical structure ("東京都", "国分寺市", "西恋ヶ窪"), so even when one or two characters cannot be recognized, knowledge processing can use the hierarchy to fill in the gap and recognize the town area information as a whole. Address display number information, by contrast, has no such hierarchical structure, and the same address display number can be written in many forms. For example, "三丁目八番地一号", "3丁目8-1", "3-8-1", and "3の8の1" all denote the same address display number.

[0003] Known conventional methods of handling such address display numbers include, for example, Japanese Laid-Open Patent Publication No. 6-124366 and the IEICE (Institute of Electronics, Information and Communication Engineers) Technical Report NCL92-26, PRU92-40 (October 1992), "Chome and block recognition method for address reading". In this method, each segmented character pattern is recognized and then labeled according to the character type of the recognition result, and the address display number part is divided into numeric parts and delimiter parts.

[0004] For example, "3丁目8-1" becomes "NDDNDN" (N: a label denoting an Arabic numeral; D: a label denoting delimiter information). Processing functions are prepared in advance for every combination of labels, and the address display number is recognized by calling the individual processing function that corresponds to the labeling obtained.

[0005]

[Problems to Be Solved by the Invention] The above method, however, has two major problems. The first is that it cannot cope adequately with misrecognition. For example, when the address display number "3丁目8-1" is input and the recognition result for "丁" contains only candidates that are neither numerals nor delimiter characters, no label can be assigned, so the method cannot handle the input. Likewise, when the region of the address display number cannot be detected, labeling is impossible and the address display number cannot be recognized.

[0006] In addition, the above method represents all delimiter characters with a single label and does not distinguish among them. Consequently, if, for example, the "目" of "3丁目8-1" is misrecognized and yields the recognition candidate "8", the labeling becomes "NDNNDN". Because the numbers are still separated from one another by delimiter information (the string parses as the numbers 3, 88, and 1), the labeling appears consistent at first glance.
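The weakness described in [0005] and [0006] can be illustrated with a minimal sketch of the prior-art labeling. This is not taken from the cited publications: the function name `label` and the contents of `DIGITS` and `DELIMITERS` are assumptions made for illustration only.

```python
# Hypothetical character classes for the prior-art labeling scheme.
DIGITS = set("0123456789")
DELIMITERS = set("丁目番地号-の")  # every delimiter shares the single label D


def label(chars):
    """Return the N/D label string, or None when a character fits neither class."""
    labels = []
    for ch in chars:
        if ch in DIGITS:
            labels.append("N")
        elif ch in DELIMITERS:
            labels.append("D")
        else:
            return None  # labeling fails, the case described in [0005]
    return "".join(labels)


correct = label("3丁目8-1")      # correctly recognized string
misread = label("3丁88-1")       # "目" misrecognized as "8"
unlabelable = label("3了目8-1")  # "丁" misrecognized as a non-delimiter
print(correct, misread, unlabelable)
```

Both "NDDNDN" and "NDNNDN" are well-formed label strings, so a scheme that dispatches on the label string alone has no way to notice that the second one came from a misread delimiter.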

[0007] This problem could be solved by distinguishing the delimiter information in finer detail. However, to achieve high-speed processing, this device provides an individual processing function for each labeling and processes each case separately. Classifying and processing the many notation patterns of address display numbers in detail would therefore require a very large number of processing functions, which is extremely difficult.

[0008] The second problem is that, because the method relies on individual processing functions, a new function must be written whenever a new notation pattern is to be registered, so maintenance is not easy.

[0009] The objects of the present invention are to solve the above problems without sacrificing processing speed and, when the address display number cannot be recognized automatically, to assist verification so that an operator can input the correct address display number. Other objects will become apparent from the description in the specification.

[0010]

[Means for Solving the Problems] To solve the above problems, the present invention provides five basic features. The first feature is word matching using wildcards that stand for arbitrary digits. Character segmentation and recognition of the address information on a mail piece yield a candidate character lattice that stores the candidate character groups and their corresponding penalties. Knowledge processing then detects the start of the address display number region, and word matching is performed to extract the correct address display number from the candidate character lattice of that region. However, the numbers denoting the chome, ban, and so on can take arbitrary values, so holding address display number words for every combination of numbers is practically impossible in terms of storage capacity and processing speed. Therefore, a candidate character lattice is created in which the numeral candidates are converted into wildcards standing for arbitrary digits. Matching the address display number words expressed with wildcards against this wildcard lattice allows the same kind of word matching as town area matching, so the correct address display number word can be extracted. Finally, the numeric parts of the wildcard words ranked highest by the word matching are restored to the original numerals by referring to the original candidate character lattice, which yields the correct candidates.
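The final restoration step can be sketched as follows. This passage does not specify how a numeral is chosen when several numeral candidates exist, so the rule used here (take the best-ranked Arabic numeral at each wildcard position) is an assumption, as are the function name `restore_digits` and the sample lattice values.

```python
DIGITS = set("0123456789")


def restore_digits(word, lattice):
    """Replace each wildcard "n" in `word` with a numeral from the original lattice.

    `lattice` holds, per written character, a best-first list of
    (candidate character, penalty) pairs. Returns None when a wildcard
    position has no numeral candidate at all.
    """
    restored = []
    for ch, candidates in zip(word, lattice):
        if ch == "n":
            # Assumed rule: pick the best-ranked Arabic numeral candidate.
            digit = next((c for c, _ in candidates if c in DIGITS), None)
            if digit is None:
                return None
            restored.append(digit)
        else:
            restored.append(ch)  # delimiters come from the dictionary word
    return "".join(restored)


# Hypothetical original lattice for "3丁目8-1"; "丁" is only the second candidate.
lattice = [
    [("3", 0), ("8", 1)],
    [("了", 0), ("丁", 1)],
    [("目", 0), ("日", 1)],
    [("8", 0), ("B", 1)],
    [("-", 0), ("ー", 1)],
    [("1", 0), ("l", 1)],
]
print(restore_digits("n丁目n-n", lattice))
```

Note that the delimiters are taken from the matched dictionary word rather than from the recognition result, which is how a misread "丁" is corrected.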

[0011] The first feature is now described in detail. FIG. 6 shows the candidate character lattice for the address display number region. FIG. 6(a) is the candidate character table, which stores the character codes of the candidate characters, and FIG. 6(b) gives the penalty for each candidate character. FIG. 7 shows an example of the conversion table to wildcards. In FIG. 7, for example, Arabic numerals are converted to "n", kanji numerals to "k", and delimiter symbols to "丁", "目", "番", "号", "-", "|", and so on. All characters not used in address display numbers are converted to "e" as other characters.
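A conversion table of this kind might be sketched as below. FIG. 7 is not reproduced in this text, so the exact membership of `KANJI_NUMERALS` and `DELIMITERS`, and the function name `to_wildcard`, are assumptions for illustration.

```python
ARABIC_NUMERALS = set("0123456789")
KANJI_NUMERALS = set("一二三四五六七八九十")  # assumed set of kanji numerals
DELIMITERS = set("丁目番号-|")                # delimiters map to themselves


def to_wildcard(ch):
    """Map one candidate character to its wildcard symbol."""
    if ch in ARABIC_NUMERALS:
        return "n"
    if ch in KANJI_NUMERALS:
        return "k"
    if ch in DELIMITERS:
        return ch
    return "e"  # any character not used in address display numbers


print([to_wildcard(c) for c in "3丁目八-1寮"])
```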

[0012] FIG. 8 shows an example of the wildcard lattice generated with such a table. FIG. 8(a) is the wildcard table and FIG. 8(b) is the corresponding cost table. Here, when the same wildcard appears more than once for a single written character, only the highest-ranked occurrence is written to the wildcard table and the others are omitted. For example, the candidates for written character number 12 in FIG. 6(a) include Arabic numerals at ranks 1, 2, and 5, but only the wildcard "n" for the rank-1 numeral is written to the corresponding place in the wildcard table of FIG. 8(a). The rank-1 cost "0" is then written to the corresponding place in the cost table of FIG. 8(b).
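The deduplication rule can be sketched as follows, assuming each written character is represented as a best-first list of (character, cost) pairs. The helper `to_wildcard`, its character sets, and the sample candidate values are hypothetical stand-ins for FIG. 6 and FIG. 7, which are not reproduced here.

```python
ARABIC_NUMERALS = set("0123456789")
KANJI_NUMERALS = set("一二三四五六七八九十")  # assumed set
DELIMITERS = set("丁目番号-|")                # assumed set


def to_wildcard(ch):
    if ch in ARABIC_NUMERALS:
        return "n"
    if ch in KANJI_NUMERALS:
        return "k"
    return ch if ch in DELIMITERS else "e"


def build_wildcard_lattice(lattice):
    """Convert a candidate character lattice into a wildcard lattice.

    For each written character, a wildcard is kept only at its first
    (highest-ranked) occurrence, together with the cost at that rank.
    """
    result = []
    for candidates in lattice:       # one list per written character
        seen = set()
        row = []
        for ch, cost in candidates:  # candidates are ordered best-first
            wc = to_wildcard(ch)
            if wc not in seen:
                seen.add(wc)
                row.append((wc, cost))
        result.append(row)
    return result


# Numerals at ranks 1, 2, and 5, as in the example for written character 12.
position_12 = [("1", 0), ("7", 1), ("l", 2), ("|", 3), ("2", 4)]
print(build_wildcard_lattice([position_12]))
```

Only one "n" survives, carrying the rank-1 cost 0, which mirrors the example in the text.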

[0013] The words to be matched, on the other hand, are stored in the dictionary using wildcards, in forms such as "n丁目n-ne". A correct word can therefore be obtained by generating an automaton for word matching from the wildcard lattice and matching it against the words expressed with wildcards. In the automaton, each transition path between states is assigned a candidate character and its corresponding cost, and the applicable costs are accumulated as a word moves from state to state. This gives each word an average cost, namely the accumulated cost divided by the number of characters, and words with a small average cost are ranked as top candidates.

[0014] FIG. 9 shows an example of the automaton. In automaton 191, circles denote states and the numbers inside them are state numbers. The interval between two states corresponds to one written character position of the word, and the lines between states are transition paths. The character on the left of a transition path indicates that, when that character is input to the automaton in the given state, the automaton follows that path to the next state. "other" on a transition path stands for every character other than those explicitly assigned to the transition paths. The number in [ ] on a transition path is the cost incurred when the transition follows that path.

[0015] As an example, consider the cost calculation when the word "n丁目n-ne" 190 is input to automaton 191. First, on the transition from state 1 to state 2, the cost [0] of "n" is added; on the transition from state 2 to state 3, the cost [1] of "丁" is added; and the transitions proceed in the same way. After all transitions for the characters of the word are complete, the accumulated cost is divided by the number of characters in the word to obtain the average cost 192 of that word.
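The cost calculation of [0013] to [0015] can be sketched with a linear automaton in which the transitions out of state i are the wildcard candidates of written character i. FIG. 9 is not reproduced here, so the sample lattice and the cost charged on the "other" path (`OTHER_COST`) are assumptions; only the first two costs, [0] for "n" and [1] for "丁", follow the text.

```python
OTHER_COST = 5  # hypothetical cost of the "other" transition


def average_cost(word, wildcard_lattice, other_cost=OTHER_COST):
    """Run `word` through the automaton and return its average cost.

    `wildcard_lattice[i]` lists the (wildcard, cost) transitions leaving
    state i+1; a character with no explicit transition takes "other".
    Returns None when the word is longer than the lattice.
    """
    if len(word) > len(wildcard_lattice):
        return None
    total = 0
    for ch, transitions in zip(word, wildcard_lattice):
        total += dict(transitions).get(ch, other_cost)
    return total / len(word)


# Hypothetical wildcard lattice; "丁" is only the second candidate (cost 1).
lattice = [
    [("n", 0), ("e", 2)],
    [("e", 0), ("丁", 1)],
    [("目", 0)],
    [("n", 0)],
    [("-", 0), ("k", 1)],
    [("n", 0)],
    [("e", 0)],
]
words = ["n丁目n-ne", "n-n-n"]
ranked = sorted(words, key=lambda w: average_cost(w, lattice))
print(ranked[0], average_cost(ranked[0], lattice))
```

Under these assumed costs "n丁目n-ne" accumulates a total of 1 over 7 characters, while "n-n-n" falls onto the "other" path four times and ranks lower.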

[0016] The second feature is the provision of indexes for fast word matching. Handling the many notation patterns of address display numbers requires registering many words, but matching all of them would take an enormous amount of processing time. Indexes are therefore provided so that unnecessary words are not matched. Three indexes are described below.

[0017] The first index uses the candidate characters of the first and second written characters. A word whose characters appear among the candidates for the first and second characters is likely to be the correct word, and a word that fails this test is unlikely to be. By matching only the words that contain corresponding candidate characters in the first and second positions, the correct solution can be found quickly without matching every word. The second index is the new/old address notation flag. A flag indicating whether each town area uses the new or the old address notation is registered in advance in the word dictionary for town area matching, and each wildcard word carries a flag indicating which of the two notations its pattern corresponds to. Comparing these flags when words are retrieved prevents unnecessary word matching. The third index is the vertical/horizontal writing flag. Character recognition outputs a flag for vertical or horizontal writing, each wildcard word also carries a vertical/horizontal writing flag, and comparing these flags avoids unnecessary word matching.
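The three indexes can be pictured as successive filters on the dictionary, as in the sketch below. The record layout `DictWord`, the encoding of the writing direction as "h"/"v", and the sample words and their flags are all assumptions; the notation values 1 (new) and 2 (old) follow the example values given for the town area dictionary later in the description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DictWord:
    pattern: str    # wildcard word, e.g. "n丁目n-n"
    notation: int   # 1 = new address notation, 2 = old
    direction: str  # "h" = horizontal, "v" = vertical (assumed encoding)


def select_words(words, first, second, notation, direction):
    """Keep only the words worth matching.

    `first` and `second` are the sets of wildcard candidates for the first
    and second written characters of the address display number region.
    """
    return [
        w for w in words
        if w.notation == notation
        and w.direction == direction
        and len(w.pattern) >= 2
        and w.pattern[0] in first
        and w.pattern[1] in second
    ]


words = [
    DictWord("n丁目n-n", 1, "h"),
    DictWord("n-n-n", 1, "h"),
    DictWord("k丁目k番k号", 1, "v"),
    DictWord("n番地", 2, "h"),
]
selected = select_words(words, first={"n", "e"}, second={"丁", "e"},
                        notation=1, direction="h")
print([w.pattern for w in selected])
```

Only the selected words would then be passed to the cost-based word matching, which is where the saving in processing time comes from.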

[0018] The third feature is word matching at multiple written character positions when town area matching fails to detect the start of the address display number region. First, the candidate character lattice, starting from the beginning of the address, is searched for all numeral candidates, and the written character positions that contain them are stored. This is because an address display number always begins with a numeral. Each stored position is then assumed to be the start of the address display number, and word matching is performed from every such position. In this way an address display number written at any position can be matched.
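The search for possible start positions can be sketched as follows. Treating both Arabic and kanji numerals as "numerals" is an assumption, as are the function name and the sample lattice (a hypothetical recognition result for "西恋ヶ窪3-8-1" that includes one spurious numeral candidate).

```python
NUMERALS = set("0123456789一二三四五六七八九十")  # assumed numeral set


def candidate_start_positions(lattice):
    """Return the 0-based positions whose candidates include any numeral."""
    return [
        i for i, candidates in enumerate(lattice)
        if any(ch in NUMERALS for ch, _ in candidates)
    ]


lattice = [
    [("西", 0), ("酉", 1)],
    [("恋", 0), ("変", 2)],
    [("ヶ", 0), ("ケ", 1)],
    [("窪", 0), ("2", 3)],   # spurious numeral candidate
    [("3", 0), ("8", 1)],
    [("-", 0), ("ー", 1)],
    [("8", 0), ("B", 1)],
    [("-", 0), ("ー", 1)],
    [("1", 0), ("l", 1)],
]
print(candidate_start_positions(lattice))
```

Word matching would then be attempted from each returned position, and the hypothesis with the best matching cost would be kept.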

[0019] The fourth feature is an address display number range dictionary that hierarchically holds the ranges that the numeric parts of an address display number can take. The possible range of each numeric part (chome, ban, go, and so on) is registered hierarchically in the dictionary in advance. For example, the dictionary holds hierarchical information such as: Kokubunji-shi has only up to 4-chome; 4-chome has only up to ban 8; and 4-chome ban 8 has only up to go 9. Checking the address display number candidates that have been restored from wildcards to numerals against this dictionary removes impossible address display numbers from the candidates.
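One possible shape for such a hierarchical range dictionary is a nested mapping, sketched below with the example figures from the text (up to 4-chome, 4-chome up to ban 8, 4-chome ban 8 up to go 9). The layout, the key names, and the choice to accept a value when no finer-grained entry is recorded are assumptions.

```python
# Hypothetical range dictionary for one town area.
RANGE_DICT = {
    "max_chome": 4,
    "chome": {
        4: {"max_ban": 8, "ban": {8: {"max_go": 9}}},
    },
}


def is_possible(chome, ban, go, ranges=RANGE_DICT):
    """Check a restored (chome, ban, go) triple level by level."""
    if not 1 <= chome <= ranges["max_chome"]:
        return False
    chome_entry = ranges["chome"].get(chome)
    if chome_entry is None:
        return True  # assumption: no finer information recorded, so accept
    if not 1 <= ban <= chome_entry["max_ban"]:
        return False
    ban_entry = chome_entry["ban"].get(ban)
    if ban_entry is None:
        return True
    return 1 <= go <= ban_entry["max_go"]


candidates = [(4, 8, 1), (4, 8, 11), (4, 9, 1), (7, 8, 1)]
print([c for c in candidates if is_possible(*c)])
```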

[0020] The fifth feature uses the fourth feature to judge whether an address display number entered by an operator is correct. When matching does not produce a correct address display number, the operator enters the address display number while looking at the address on the mail piece. To prevent input errors, the address display number range dictionary is used to judge whether the entered address display number is valid.

[0021] Each of the five basic features above can be regarded as an invention both as a device and as a method.

[0022]

[Operation] The present invention has the following five basic effects. First, the address display number can be recognized even when character recognition fails to produce the correct candidate character. In the present invention, the notation patterns of address display numbers are held as dictionary words expressed with wildcards standing for arbitrary digits, and the address display number is recognized by computing the cost between the candidate character groups of the recognition result and each word and matching on that basis. Therefore, even when the correct candidate does not appear in the candidate character group for part of the address display number, that part can be filled in and the address display number recognized. For example, even if the character "丁" does not appear as a recognition candidate for the character pattern corresponding to the "丁" of "3丁目8-1", the "丁" can be filled in and the address display number recognized as long as the cost of the corresponding word is small overall.

[0023] In addition, because the notation patterns of address display numbers are held in dictionary form, the patterns can be examined in finer detail than when individual functions are prepared for processing, which prevents misrecognition.

[0024] Second, processing is fast even though the address display number can be examined in detail. When words are retrieved from the dictionary, the candidate character group corresponding to each character pattern is used as the search index, which reduces the number of words to be matched. Furthermore, each word carries attributes for new/old address notation and for vertical/horizontal writing, so if it is known in advance which address notation the address display number to be recognized belongs to, or whether it is written vertically or horizontally, unnecessary word matching can be avoided and high-speed processing becomes possible.

[0025] Third, because a dictionary approach is used, a new notation pattern can simply be registered in the dictionary when it arises, so maintenance is easy.

[0026] Fourth, because there is an address display number range dictionary that hierarchically holds the ranges the numeric parts of an address display number can take, address display number candidates that cannot actually exist can be eliminated.

[0027] Fifth, when address display number matching yields no candidate, the operator can enter the address display number correctly while looking at the image of the address region of the mail piece. Because the device holds the address display number range dictionary, which hierarchically stores the ranges of values the numeric parts can take, it can refer to that dictionary to judge whether the numeric parts of the address display number entered by the operator fall within the valid ranges. If they are judged to be out of range, the operator is warned, which prevents operator input errors.

[0028]

[Embodiment] A first embodiment of the present invention is described below with reference to FIGS. 1 to 14.

[0029] FIG. 1 shows the overall configuration of the device of this embodiment. Mail pieces 100 are fed one after another to the supply unit 101. In the supply unit 101, each mail piece passes a predetermined position, and during that passage the image input unit 102 captures an image of its surface. The address recognition unit 106 reads the address written on the surface of the mail piece and generates sorting information. Meanwhile, the mail piece whose surface image has been captured is sent to the delay conveyance path 103, where it travels for the predetermined time needed to generate the sorting information from the surface image. The sorting unit 104 sorts the mail pieces according to the sorting information from the address recognition unit 106 and stores them in the sorting shelves 105. The image input unit 102 digitizes the image signal from a photoelectric conversion element such as a line sensor and extracts the character lines of the address from the image of the mail piece surface.

[0030] The address recognition unit 106 consists of a control unit 107, an image processing unit 108, a character recognition unit 109, and a knowledge processing unit 110; the control unit 107 controls the image processing unit 108, the character recognition unit 109, and the knowledge processing unit 110. The knowledge processing unit 110 consists of a town area matching unit 111 and an address display number matching unit 112. The town area matching unit 111 accesses the town area word dictionary 113 and the address display number matching unit 112 accesses the address display number dictionary group 114, and they automatically correct errors and the like in the recognition results of the character recognition unit 109.

[0031] The address display number dictionary group 114 consists of an address display number word index dictionary 115, an address display number word dictionary 116, and an address display number range dictionary 117. The address display number word dictionary 116 stores the notation patterns of blocks. The address display number word index dictionary 115 stores the indexes used to retrieve the required words selectively from the address display number word dictionary 116. The address display number range dictionary 117 hierarchically records, for each address display number, the respective ranges of the chome, the block, and the residence number.

[0032] FIG. 2 shows the overall processing flow of the address recognition unit 106 in FIG. 1. The operation of the address recognition unit 106 is described with reference to FIG. 2.

[0033] In step 120, the image input unit 102 captures an image of the side of the mail piece on which the address is written and inputs it to the address recognition unit 106. In step 121, the image processing unit 108 extracts the address region from the full-surface image sent by the image input unit 102. In step 122, the image processing unit 108 extracts an image of each character line from the address region image obtained in step 121. In step 123, the character recognition unit 109 segments the character line images obtained in step 122 into individual character images, then recognizes each segmented character and converts it to a character code. The recognition result for each character consists of multiple candidate character codes and their corresponding similarities. Information on the writing direction, that is, vertical or horizontal, is also obtained.

[0034] In step 124, the town area matching unit 111 in the knowledge processing unit 110 generates a candidate character lattice from the character recognition results obtained in step 123. FIG. 4 shows an example of a candidate character lattice. It is the lattice for the part "東京都国分寺市西恋ヶ窪", which represents the town area information, when the address "東京都国分寺市西恋ヶ窪3-8-1日立寮" (Hitachi Dormitory, 3-8-1 Nishi-Koigakubo, Kokubunji-shi, Tokyo) is written on a mail piece. FIG. 4(a) is an example of the candidate character table in the candidate character lattice, and FIG. 4(b) is an example of the corresponding cost table. In the candidate character table of FIG. 4(a), the character codes of the candidate characters are arranged from rank 1 to rank m (m is arbitrary) for each written character number. The cost table of FIG. 4(b) stores the cost of each candidate character at the position corresponding to where its character code is stored in the candidate character table of FIG. 4(a). For example, the cost of the rank-1 candidate character "東" for written character number 1 is "0".
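The two parallel tables of the candidate character lattice can be pictured with the small sketch below. FIG. 4 is not reproduced here, so apart from the rank-1 candidate "東" with cost 0, every candidate and cost value is hypothetical, as are the class and method names.

```python
from dataclasses import dataclass


@dataclass
class CandidateLattice:
    chars: list  # chars[i][r]: candidate of rank r+1 for written character i+1
    costs: list  # costs[i][r]: cost stored at the same position

    def cost_of(self, position, rank):
        """Look up a cost using 1-based position and rank, as in FIG. 4."""
        return self.costs[position - 1][rank - 1]

    def candidates(self, position):
        """Return the (character, cost) pairs for one written character."""
        return list(zip(self.chars[position - 1], self.costs[position - 1]))


lattice = CandidateLattice(
    chars=[["東", "車", "束"], ["京", "亰", "東"], ["都", "部", "郡"]],
    costs=[[0, 2, 3], [0, 1, 4], [0, 1, 2]],
)
print(lattice.cost_of(1, 1), lattice.candidates(3))
```

Keeping the characters and costs in tables of identical shape means a single (position, rank) pair addresses both, which is the correspondence the paragraph describes.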

[0035] In step 125, the town area matching unit 111 in the knowledge processing unit 110 uses the town area word dictionary 113 to extract, from the candidate character lattice obtained in step 124, the town area information of the address and the new/old address notation information corresponding to that town area information. A known method of extracting town area information is to generate an automaton from the candidate character lattice and perform word matching; it is described in detail in Japanese Laid-Open Patent Publication No. 3-125288.

【００３６】次に，町域単語辞書１１３の構成，およ
び，町域情報に対応する新旧住所表記情報の抽出法を図
５を用いて説明する。町域単語辞書１１３は単語が階層
的に格納されており，例えば，都道府県レベルの単語と
しては，埼玉県１４２，東京都１４３，神奈川県１４４
があり，東京都１４３の下の市区郡レベルの単語として
は，小金井市１４５，国分寺市１４６，国立市１４７が
ある。そして，国分寺市１４６の下には町域レベルの単
語，日吉町１４８，西恋ヶ窪１４９，東恋ヶ窪１５０が
ある。さらに，町域レベルの単語には全国の町域情報を
識別するための７桁の町域区分番号１５１〜１５３，個
々の町域が新旧住所表記のどちらに該当するかを示すフ
ラグ１５４〜１５６がそれぞれ一緒に格納されている。
新旧住所表記のフラグは，例えばその町域が新住所表記
に該当すれば１，旧住所表記に該当すれば２というよう
な値を取る。よって階層的に単語照合を行って町域情報
が得られると，辞書の該当する部分を参照することで，
町域情報を識別するための７桁の町域区分番号，辞書の
新旧住所表記のフラグも同時に得られる。７桁の町域区
分番号は，後の処理で住所表示番号の数字部分の取りう
る範囲を判定するために，住所表示番号範囲辞書１１７
を検索する時のインデックス，および区分部１０５で郵
便物を区分するための制御情報の一部として利用され
る。また，新旧住所表記フラグは住所表示番号照合を行
うために，住所表示番号単語辞書から照合する単語数を
制限するためのインデックスとして利用される。以上の
ステップ１２５の町域照合処理により，町域情報，町域
区分番号，および新旧住所表記フラグが得られる。Next, the structure of the town area word dictionary 113 and the method of extracting the new/old address notation information corresponding to the town area information are described with reference to FIG. 5. The town area word dictionary 113 stores words hierarchically. For example, the prefecture-level words include Saitama-ken 142, Tokyo-to 143, and Kanagawa-ken 144, and the city/ward/county-level words under Tokyo-to 143 include Koganei-shi 145, Kokubunji-shi 146, and Kunitachi-shi 147. Under Kokubunji-shi 146 are the town-area-level words Hiyoshi-cho 148, Nishikoigakubo 149, and Higashikoigakubo 150. Each town-area-level word is stored together with a 7-digit town area division number 151 to 153, which identifies the town area nationwide, and a flag 154 to 156 indicating whether that town area uses the new or the old address notation. The new/old address notation flag takes, for example, the value 1 if the town area uses the new address notation and 2 if it uses the old address notation. Therefore, once town area information has been obtained by hierarchical word matching, the 7-digit town area division number and the new/old address notation flag are obtained at the same time by referring to the corresponding part of the dictionary. The 7-digit town area division number is used in later processing as an index for searching the address display number range dictionary 117, in order to determine the range that the numerical part of the address display number can take, and as part of the control information with which the sorting unit 105 sorts the mail. The new/old address notation flag is used as an index for limiting the number of words taken from the address display number word dictionary for address display number matching. The town area collation processing of step 125 described above yields the town area information, the town area division number, and the new/old address notation flag.
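
The hierarchical lookup described above can be sketched as a nested mapping. This is only an illustration: the layout, the names `TOWN_DICT` and `lookup_town`, and the flag value given to Nishikoigakubo are assumptions. Only the division number 1850002 is taken from the description.

```python
# Hypothetical model of the town area word dictionary 113:
# prefecture -> city -> town area -> (7-digit division number, notation flag).
# Flag convention from the description: 1 = new notation, 2 = old notation.
TOWN_DICT = {
    "東京都": {
        "国分寺市": {
            # The flag value 1 is assumed for illustration.
            "西恋ヶ窪": {"town_code": "1850002", "notation_flag": 1},
        },
    },
}


def lookup_town(prefecture, city, town):
    """Walk the hierarchy and return (town_code, notation_flag), or None."""
    try:
        entry = TOWN_DICT[prefecture][city][town]
    except KeyError:
        return None
    return entry["town_code"], entry["notation_flag"]
```

A single successful walk returns both the division number and the flag, which mirrors how the description obtains them together with the town area information.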

【００３７】ステップ１２６では，知識処理部１１０の
中の住所表示番号照合部１１２が住所表示番号辞書群１
１４を用いて，ステップ１２４で得られた候補文字ラテ
ィスから住所表示番号を抽出する。ステップ１２６の詳
細を図３，図５〜図１１を用いて詳細に説明する。図３
においてステップ１３２，１３５が本発明の特徴となっ
ている。In step 126, the address display number collation unit 112 in the knowledge processing unit 110 uses the address display number dictionary group 114 to extract the address display number from the candidate character lattice obtained in step 124. Step 126 is described in detail with reference to FIG. 3 and FIGS. 5 to 11. In FIG. 3, steps 132 and 135 are the features of the present invention.

【００３８】ステップ１３０では，図２のステップ１２
４で生成した候補文字ラティスを住所表示番号照合部１
１２に入力する。ステップ１３１では，候補文字ラティ
スの住所表示番号が書かれた領域の先頭の記入文字番号
を検出する。これは図２のステップ１２５で行った町域
照合により，町域の書いてある領域の終わりが検出でき
るので，それを利用する。ステップ１３２では，ステッ
プ１３０で得られた候補文字ラティスから，ワイルドカ
ードラティスを生成する。ここで，ワイルドカードラテ
ィスとは住所表示番号の照合を行うために，候補文字ラ
ティス中の数字を任意の数字を表すワイルドカードで置
き換えたラティスである。In step 130, the candidate character lattice generated in step 124 of FIG. 2 is input to the address display number collation unit 112. In step 131, the entry character number at the head of the area of the candidate character lattice in which the address display number is written is detected. This uses the fact that the town area collation performed in step 125 of FIG. 2 detects the end of the area in which the town area is written. In step 132, a wildcard lattice is generated from the candidate character lattice obtained in step 130. Here, a wildcard lattice is a lattice in which the numerals in the candidate character lattice are replaced with wildcards representing arbitrary numerals, so that address display numbers can be matched.

【００３９】ステップ１３２の詳細を図６，図７，図８
を用いて説明する。図７はワイルドカードラティスを生
成するために用いる変換テーブルであり，以下に詳細を
述べる。Step 132 is described in detail with reference to FIGS. 6, 7 and 8. FIG. 7 is the conversion table used to generate the wildcard lattice; its details are given below.

【００４０】分類における「数字」のテーブルは，丁
目，街区，住居表示番号を表す数字に関わるテーブルで
あり，候補文字ラティス中に任意の算用数字と漢数字の
候補文字があれば，それぞれ「ｎ」，「ｋ」というワイ
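ルドカードに変換される。The "numeral" table in the classification covers the numerals that represent the chome, block, and house display number. Any Arabic numeral or kanji numeral among the candidate characters in the candidate character lattice is converted to the wildcard "n" or "k", respectively.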

【００４１】「区切り文字」のテーブルは，例えば住所
中に「３丁目８−１」とある場合に，丁目や街区の数字
同士を区切るために使用される文字「丁目」や「−」に
関するテーブルである。ここでは「丁」，「目」，
「番」，「地」，「の」，「ノ」については変換せず
に，そのままの文字を使用する。一方「−」，「〜」，
「／」は「−」というワイルドカードに変換される。The "delimiter" table covers the characters, such as "丁目" (chome) and "−", that separate the chome and block numerals from one another, as in "３丁目８−１" (3-chome 8-1). Here, "丁", "目", "番", "地", "の", and "ノ" are not converted and are used as they are. On the other hand, "−", "〜", and "／" are converted to the wildcard "−".

【００４２】「その他」のテーブルは，「数字」テーブ
ル，「区切り文字」テーブル以外の文字に関するテーブ
ルで，上記で述べた以外の文字は全て「ｅ」というワイ
ルドカードに変換される。すなわち「その他」というの
は丁目，街区，住居表示番号に関わる文字以外の全てを
指す。The "other" table is a table relating to characters other than the "numeric" table and the "delimiter" table, and all the characters other than those mentioned above are converted into a wildcard "e". That is, "other" refers to all but the characters related to the chome, block, and house display number.
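
The three tables described above can be sketched as one classification function. This is a minimal sketch under assumptions: the exact members of each character set are not listed in the description, and the ASCII letters `n`, `k`, `e` and the ASCII hyphen are used here as the wildcard symbols.

```python
# Character classes loosely following the conversion table of FIG. 7.
# The exact membership of each set is an assumption made for this sketch.
ARABIC_NUMERALS = set("0123456789０１２３４５６７８９")
KANJI_NUMERALS = set("〇一二三四五六七八九十")
KEPT_DELIMITERS = set("丁目番地のノ")
DASH_LIKE = set("-−〜／")


def to_wildcard(ch):
    """Map one candidate character to its wildcard symbol."""
    if ch in ARABIC_NUMERALS:
        return "n"      # any Arabic numeral
    if ch in KANJI_NUMERALS:
        return "k"      # any kanji numeral
    if ch in KEPT_DELIMITERS:
        return ch       # delimiters such as 丁 and 目 are kept as they are
    if ch in DASH_LIKE:
        return "-"      # dash-like separators share one wildcard
    return "e"          # everything else
```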

【００４３】図８は，図７の変換テーブルを用いて，図
６で示す候補文字ラティスから，ワイルドカードラティ
スを生成した例である。ワイルドカードラティスは図８
(a)で示すワイルドカードテーブルと，図８(b)で示すコ
ストテーブルの２つからなる。FIG. 8 shows an example in which a wildcard lattice is generated from the candidate character lattice of FIG. 6 using the conversion table of FIG. 7. The wildcard lattice consists of two tables: the wildcard table shown in FIG. 8(a) and the cost table shown in FIG. 8(b).

【００４４】図８のワイルドカードラティスの生成は，
図６(a)の候補文字テーブルにおいて候補文字の順位の
高いほうから行う。まず第１位の候補文字をワイルドカ
ードに変換して図８(a)で示すワイルドカードテーブル
の第１位の場所に書き込む。それと同時に変換された候
補文字に対応するコストを，図８(b)のコストテーブル
の該当する場所に書き込む。The wildcard lattice of FIG. 8 is generated by processing the candidate characters in the candidate character table of FIG. 6(a) in order from the highest rank. First, the first-ranked candidate character is converted into a wildcard and written in the first-rank position of the wildcard table shown in FIG. 8(a). At the same time, the cost corresponding to the converted candidate character is written in the corresponding position of the cost table of FIG. 8(b).

【００４５】次に，第２位の候補文字を調べ，それが第
１位の候補文字と同じワイルドカードに属するなら，重
複するのでそれを省略する。もし違うワイルドカードに
属するのであれば，そのワイルドカードと対応するコス
トを，それぞれ図８(a)のワイルドカードテーブル，図
８(b)のコストテーブルに書き込む。以下，全ての候補
文字について同じことを繰り返す。Next, the second-ranked candidate character is examined, and if it belongs to the same wildcard as the first-ranked candidate character, it is omitted because it is duplicated. If they belong to different wild cards, the costs corresponding to the wild cards are written in the wild card table of FIG. 8A and the cost table of FIG. 8B, respectively. Hereinafter, the same thing is repeated for all the candidate characters.

【００４６】例えば，図６(a)の候補文字テーブルにお
ける記入文字番号１２の列の変換を考える。まず，第１
位の候補文字「８」をワイルドカード「ｎ」に変換し
て，図８(a)のワイルドカードテーブルにおける記入文
字番号１２の第１位の場所に書き込む。それと同時に，
候補文字「８」に対応するコスト「０」を，図８(b)で
示すコストテーブルの該当する場所に書き込む。For example, consider the conversion of the column for entry character number 12 in the candidate character table of FIG. 6(a). First, the first-ranked candidate character "8" is converted into the wildcard "n" and written in the first-rank position of entry character number 12 in the wildcard table of FIG. 8(a). At the same time, the cost "0" corresponding to the candidate character "8" is written in the corresponding position of the cost table shown in FIG. 8(b).

【００４７】次に，第２位を見ると候補文字「３」は同
じワイルドカード「ｎ」に属するので，省略して図８
(b)のワイルドカードテーブルには何も書かない。更
に，第３位を見ると候補文字「日」は「ｎ」と違うワイ
ルドカード「ｅ」に属するので，図８(a)のワイルドカ
ードテーブルの中で空いている第２位の場所にそれを書
き込む。それと共に，図８(b)コストテーブルの対応す
る場所に候補文字「日」のコスト「２」を書き込む。以
下，同様の処理を全ての順位の候補文字に対して行う。Next, the second-ranked candidate character "3" belongs to the same wildcard "n", so it is omitted and nothing is written in the wildcard table of FIG. 8(a). The third-ranked candidate character "日" belongs to the wildcard "e", which differs from "n", so "e" is written in the vacant second-rank position of the wildcard table of FIG. 8(a). At the same time, the cost "2" of the candidate character "日" is written in the corresponding position of the cost table of FIG. 8(b). The same processing is then performed on the candidate characters of all ranks.
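
The column conversion walked through above can be sketched as follows. The helper `to_wildcard` is a simplified stand-in for the conversion table of FIG. 7, and the cost 1 given to the candidate "3" in the usage note is assumed, since the description states only the costs of "8" and "日".

```python
def to_wildcard(ch):
    # Simplified stand-in for the conversion table of FIG. 7 (assumption):
    # Arabic numerals become "n", a few delimiters are kept, the rest is "e".
    if ch.isdigit():
        return "n"
    return ch if ch in "丁目番地のノ" else "e"


def convert_column(candidates, costs):
    """Convert one lattice column, ranked best first, into wildcards.

    A candidate whose wildcard is already present in the column is skipped,
    so each wildcard keeps the cost of its best-ranked source character.
    """
    out_chars, out_costs = [], []
    for ch, cost in zip(candidates, costs):
        wildcard = to_wildcard(ch)
        if wildcard not in out_chars:
            out_chars.append(wildcard)
            out_costs.append(cost)
    return out_chars, out_costs
```

Calling `convert_column(["8", "3", "日"], [0, 1, 2])` reproduces the example for entry character number 12: "n" with cost 0 in the first position and "e" with cost 2 in the second.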

【００４８】以上のステップ１３２の処理により，ステ
ップ１３０で得られた候補文字ラティスから，ワイルド
カードラティスが生成される。Through the above processing of step 132, a wildcard lattice is generated from the candidate character lattice obtained in step 130.

【００４９】ステップ１３３では，ステップ１３２で生
成したワイルドカードラティスから，単語照合を行うた
めのオートマトンを生成する。ステップ１３３の詳細を
図９を用いて説明する。In step 133, an automaton for word matching is generated from the wild card lattice generated in step 132. Details of step 133 will be described with reference to FIG.

【００５０】図９は辞書から単語を取り出し，オートマ
トンを用いて住所表示番号単語の照合を行う過程を示し
たものである。まず，オートマトン１９１について説明
する。住所表示番号パターンを表す辞書単語とワイルド
カードラティスの照合を行うために，図８で示すワイル
ドカードラティスから有限オートマトン１９１を生成す
る。そして，オートマトン１９１は文字列として表した
辞書単語を順次入力し，その単語の平均コストはいくら
になるかを計算する。FIG. 9 shows a process of extracting a word from a dictionary and matching address display number words by using an automaton. First, the automaton 191 will be described. A finite state automaton 191 is generated from the wild card lattice shown in FIG. 8 in order to match the dictionary word representing the address display number pattern with the wild card lattice. Then, the automaton 191 sequentially inputs the dictionary words represented as a character string, and calculates what the average cost of the words is.

【００５１】オートマトン１９１において丸印は状態を
示し，その中に書かれた数字は状態番号を示す。加え
て，状態と状態の間が単語の各記入文字位置に対応し，
状態間の線は遷移経路を示す。遷移経路上の左側の文字
は，ある状態でオートマトンにその文字が入力された時
に，その遷移経路を辿って次の状態に遷移することを示
す。また，遷移経路上で「other」は遷移経路に対応す
る文字として明示されたもの以外の全ての文字を表す。
遷移経路上の［］内の数字は，その経路を辿って遷移し
た時に有するコストである。In the automaton 191, a circle indicates a state, and the number written in it is the state number. Each gap between two states corresponds to one entry character position of the word, and the lines between states are the transition paths. The character written at the left of a transition path indicates that, when that character is input to the automaton in a given state, the automaton follows that path to the next state. On a transition path, "other" stands for every character other than those explicitly given for the transition paths. The number in brackets [ ] on a transition path is the cost incurred when the transition follows that path.

【００５２】例として，オートマトン１９１を用いて，
単語「ｎ丁目ｎ−ｎｅ」１９０が入力された時のコスト
計算を考える。まず，状態１から状態２に遷移するとき
に「ｎ」のコスト［０］が加算され，状態２から状態３
に遷移するときは「丁」のコスト［１］が加算され，以
下同様に遷移が進んでいく。そして，単語の文字数分の
遷移が全て終わった後，積算されたコストを単語の文字
数で割ることでその単語の平均コスト１９２が得られ
る。As an example, consider the cost calculation when the word "ｎ丁目ｎ−ｎｅ" 190 is input to the automaton 191. First, the cost [0] of "n" is added on the transition from state 1 to state 2, then the cost [1] of "丁" is added on the transition from state 2 to state 3, and the transitions continue in the same way. After all the transitions for the characters of the word are finished, the accumulated cost is divided by the number of characters in the word to obtain the average cost 192 of the word.
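
The average-cost calculation can be sketched as a walk over one transition table per character position. The list-of-dicts representation and the name `average_cost` are assumptions, and the rule that a word longer than the automaton is rejected is a choice made for this sketch, not something the description states.

```python
def average_cost(transitions, word):
    """Feed `word` through the automaton and return its average cost.

    transitions[i] maps a character to the cost of the path leaving
    state i+1; the key "other" covers every character not listed.
    """
    if not word or len(word) > len(transitions):
        raise ValueError("word must be non-empty and fit the automaton")
    total = 0
    for table, ch in zip(transitions, word):
        total += table.get(ch, table["other"])
    return total / len(word)
```

With tables holding cost 0 for "n", cost 1 for "丁" and cost 0 for "目", the three-character input "n丁目" accumulates a total of 1 and therefore has an average cost of 1/3.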

【００５３】次に，図８のワイルドカードラティスから
の有限オートマトン１９１の生成を説明する。まず，図
９のオートマトン１９１の状態１と状態２の間の遷移経
路を生成し，図８(a)のワイルドカードテーブルの記入
文字番号１２の候補文字をそれぞれ割り当てる。それと
共に，候補文字に対応する図８(b)で示すコストテーブ
ルのコストを，同様に状態１と状態２の間の遷移経路に
それぞれ割り当てる。次に，明示されたもの以外の全て
の文字を表す遷移経路として「other」を生成し，さら
にそのコストを１５とする。以下，同様に状態と遷移経
路をワイルドカードラティスから次々に生成していく。
ここで，遷移経路のコストはコストは必ずしも上記の値
にする必要はなく，任意の数字でよい。Next, the generation of the finite automaton 191 from the wildcard lattice of FIG. 8 is described. First, the transition paths between state 1 and state 2 of the automaton 191 of FIG. 9 are generated, and the candidate characters of entry character number 12 in the wildcard table of FIG. 8(a) are assigned to them one by one. At the same time, the costs in the cost table of FIG. 8(b) that correspond to those candidate characters are assigned to the same transition paths between state 1 and state 2. Next, a transition path "other" is generated to represent every character other than those explicitly given, and its cost is set to 15. States and transition paths are then generated one after another from the wildcard lattice in the same way. The costs of the transition paths need not be the values given above; any numbers may be used.
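
Generating the automaton from the wildcard lattice can be sketched as building one transition table per lattice column. The function name and the list-of-dicts layout are assumptions. The default cost 15 for `other` is the example value from the description, which also notes that other values may be used.

```python
def build_automaton(wildcard_table, cost_table, other_cost=15):
    """Build one transition table per entry character position.

    wildcard_table[i] lists the wildcards of column i and cost_table[i]
    lists their costs in the same order. Every position also gets an
    "other" path for characters that are not listed.
    """
    transitions = []
    for chars, costs in zip(wildcard_table, cost_table):
        table = dict(zip(chars, costs))
        table["other"] = other_cost
        transitions.append(table)
    return transitions
```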

【００５４】以上のステップ１３３の処理により，ワイ
ルドカードラティスからオートマトン１９１が生成され
る。By the processing of the above step 133, the automaton 191 is generated from the wild card lattice.

【００５５】ステップ１３４では，ステップ１３３で生
成したオートマトン１９１と図１の住所表示番号単語イ
ンデックス辞書１１５，住所表示番号単語辞書１１６を
用いて，住所表示番号単語のオートマトン照合を行う。
ステップ１３４の詳細を図８，図９，図１０，図１１を
用いて説明する。図１０はオートマトン照合処理の流れ
を示したＰＡＤである。図１１は住所表示番号単語辞
書，住所表示番号単語インデックス辞書の構成を示した
図である。In step 134, the automaton matching of the address display number word is performed using the automaton 191 generated in step 133 and the address display number word index dictionary 115 and the address display number word dictionary 116 of FIG.
Details of step 134 will be described with reference to FIGS. 8, 9, 10, and 11. FIG. 10 is a PAD showing the flow of the automaton matching process. FIG. 11 is a diagram showing the configurations of the address display number word dictionary and the address display number word index dictionary.

【００５６】まず，図１１の辞書の構成を説明する。住
所表示番号単語辞書１１６はワイルドカードを用いて丁
目，街区，住居表示番号の表記パターンを表した単語，
およびその検索情報や属性を格納した辞書である。丁
目，街区，住居表示番号の表記パターンを表した単語と
しては，例えば「ｎ丁目ｎ−ｎｅ」，「ｎ｜ｎ｜ｎｅ」
等が格納されている。住所表示番号単語インデックス辞
書１１５は，照合を行うために必要な単語を住所表示番
号単語辞書から選択的に検索するためのインデックスを
格納した辞書である。インデックスは，辞書単語の第１
文字目，第２文字目の文字を使用する。First, the structure of the dictionaries in FIG. 11 is described. The address display number word dictionary 116 stores words that use wildcards to represent the notation patterns of the chome, block, and house display number, together with their search information and attributes. Examples of stored words are "ｎ丁目ｎ−ｎｅ" and "ｎ｜ｎ｜ｎｅ". The address display number word index dictionary 115 stores the indexes used to selectively retrieve, from the address display number word dictionary, the words needed for matching. The first and second characters of the dictionary words are used as the indexes.

【００５７】住所表示番号単語辞書１１６は，街区の表
記パターンを表す単語２２８，単語の第２文字目が同一
の文字を持つ単語間の相対アドレス２２９，新旧住所表
記を示すフラグ２３０，縦横書きを示すフラグ２３１か
らなる。The address display number word dictionary 116 consists of the words 228 representing block notation patterns, the relative addresses 229 between words that have the same second character, the flags 230 indicating new/old address notation, and the flags 231 indicating vertical/horizontal writing.

【００５８】新旧住所表記を示すフラグ２３０は，街区
表記を表す単語が新住所表記の表記パターンであれば
１，旧住所表記の表記パターンであれば２，どちらの住
所表記にも対応するのであれば３という数字が格納され
ている。縦横書きを示すフラグ２３１は，街区表記を表
す単語が横書きに属するのであれば１，縦書きに属する
のであれば２，縦横両方に属するのであれば３という数
字が格納されている。The flag 230 indicating new/old address notation holds 1 if the word representing the block notation is a new-address notation pattern, 2 if it is an old-address notation pattern, and 3 if it applies to both. The flag 231 indicating vertical/horizontal writing holds 1 if the word representing the block notation belongs to horizontal writing, 2 if it belongs to vertical writing, and 3 if it belongs to both.

【００５９】住所表示番号単語インデックス辞書１１５
は，１文字目インデックステーブル２２０，２文字目イ
ンデックステーブル２２４の二つのテーブルからなる。
１文字目インデックステーブル２２０は，辞書単語の第
１文字目の文字番号を格納したテーブル２２１，同一の
１文字目を持つ単語の数２２２，住所表示番号単語辞書
１１６へのポインタテーブル２２３からなる。２文字目
インデックステーブル２２４は，同様に辞書単語の２文
字目の文字番号を格納したテーブル２２５，同一の２文
字目を持つ単語の数２２６，住所表示番号単語辞書１１
６へのポインタテーブル２２７からなる。The address display number word index dictionary 115 consists of two tables, the first-character index table 220 and the second-character index table 224. The first-character index table 220 consists of a table 221 storing the character numbers of the first characters of the dictionary words, the number 222 of words having the same first character, and a table 223 of pointers into the address display number word dictionary 116. Likewise, the second-character index table 224 consists of a table 225 storing the character numbers of the second characters of the dictionary words, the number 226 of words having the same second character, and a table 227 of pointers into the address display number word dictionary 116.

【００６０】次に，住所表示番号単語インデックス辞書
１１５を用いて，住所表示番号単語辞書１１６の単語を
検索する時の処理の流れを説明する。実線で表された矢
印は１文字目インデックステーブル２２０を用いて，第
１文字目が同一の文字である単語を検索するときの検索
の流れを示す。点線で表された矢印は２文字目インデッ
クステーブル２２４を用いて，２文字目が同一の文字で
ある単語を検索するときの検索の流れを示す。Next, the flow of processing when searching for words in the address display number word dictionary 116 using the address display number word index dictionary 115 will be described. The arrow shown by a solid line indicates a search flow when searching for a word in which the first character is the same character using the first character index table 220. The arrow indicated by the dotted line shows the flow of search when searching for a word in which the second character is the same character using the second character index table 224.

【００６１】例えば，単語の１文字目が「ｎ」である単
語を辞書から検索する場合は，１文字目インデックステ
ーブル２２０の第１文字目が同一の文字である単語の数
m1，および「ｎ」のポインタP1(1)を参照する。ポイン
タP1(1)には，住所表示番号単語辞書の中で１文字目が
「ｎ」で始まる単語の最初のポインタが格納されている
ので，その単語を参照する。単語辞書の中では１文字目
が同じ文字の単語は連続して並べてあるので，「ｎ」で
始まる最初の単語を見つけると，以下はポインタをイン
クリメントするだけで，次々に単語を検索することがで
きる。そして，m1回検索を行うと「ｎ」で始まる単語の
終わりになるので，そこで単語の検索を終了する。For example, to retrieve from the dictionary the words whose first character is "n", the first-character index table 220 is consulted for the number m1 of words having that first character and for the pointer P1(1) of "n". The pointer P1(1) holds the pointer to the first word in the address display number word dictionary that starts with "n", so that word is referenced. In the word dictionary, words with the same first character are stored consecutively, so once the first word starting with "n" is found, the following words are retrieved one after another simply by incrementing the pointer. After m1 retrievals the end of the words starting with "n" is reached, and the word search ends there.
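
The first-character lookup can be sketched with a word list in which words sharing a first character are contiguous, plus an index holding a start pointer and a count. Apart from "n丁目n-ne", the words and all names here are hypothetical.

```python
# Hypothetical word dictionary: words with the same first character
# are stored consecutively, as the description requires.
WORDS = ["k丁目k-k", "n丁目n-ne", "n-n-n", "n-n"]

# First character -> (pointer to the first such word, number of such words).
FIRST_INDEX = {"k": (0, 1), "n": (1, 3)}


def words_by_first_char(ch):
    """Return every dictionary word whose first character is `ch`."""
    if ch not in FIRST_INDEX:
        return []
    start, count = FIRST_INDEX[ch]
    # Incrementing the pointer `count` times visits the whole run.
    return [WORDS[start + i] for i in range(count)]
```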

【００６２】単語の２文字目をインデックスとして検索
する場合も同様である。例えば，単語の２文字目が
「丁」である単語を辞書から検索する場合は，２文字目
インデックステーブル２２４の第１文字目が同一の文字
である単語の数q1，および「丁」のポインタP2(1)を参
照する。ポインタP2(1)には，住所表示番号単語辞書の
中で２文字目が「丁」の単語のポインタが格納されてい
るのでその単語を参照する。その後，単語の第２文字目
が同一の文字を持つ単語間の相対アドレス２２９を参照
してポインタをシフトすることで，第２文字目が同じ
「丁」の単語を検索することができる。そして，q1回検
索を行うと２文字目が同じ「丁」の単語の終わりになる
ので，そこで単語の検索を終了する。The same applies when the second character of a word is used as the index. For example, to retrieve from the dictionary the words whose second character is "丁", the second-character index table 224 is consulted for the number q1 of words having that character and for the pointer P2(1) of "丁". The pointer P2(1) holds the pointer to a word in the address display number word dictionary whose second character is "丁", so that word is referenced. After that, the pointer is shifted by the relative address 229 between words having the same second character, which retrieves the next word whose second character is "丁". After q1 retrievals the end of the words whose second character is "丁" is reached, and the word search ends there.
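
Because words sharing a second character are not contiguous, the second-character lookup follows relative addresses instead of incrementing. The sketch below assumes each entry stores the offset to the next word with the same second character. The data and names are hypothetical.

```python
# Each entry: (word, relative offset to the next word that has the same
# second character). The offset of the last word in a chain is unused.
WORDS = [("k丁目k-k", 2), ("k-k", 2), ("n丁目n-ne", 0), ("n-n", 0)]

# Second character -> (pointer to the first such word, number of such words).
SECOND_INDEX = {"丁": (0, 2), "-": (1, 2)}


def words_by_second_char(ch):
    """Return every dictionary word whose second character is `ch`."""
    if ch not in SECOND_INDEX:
        return []
    pos, count = SECOND_INDEX[ch]
    found = []
    for _ in range(count):
        word, offset = WORDS[pos]
        found.append(word)
        pos += offset  # shift the pointer by the relative address
    return found
```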

【００６３】ここで，図２のステップ１２３の文字切り
出し・文字認識で，住所が縦書きか横書きかが分かって
いるので，縦横書きを示すフラグ２３１を参照すること
で，検索した単語の中から該当する単語を絞り込んで取
り出すことができる。Here, whether the address is written vertically or horizontally is known from the character segmentation and character recognition of step 123 in FIG. 2, so the retrieved words can be narrowed down to the applicable ones by referring to the flag 231 indicating vertical/horizontal writing.

【００６４】また，図２のステップ１２５の町域照合で
住所表示番号が新旧どちらの住所表記に属するのかが分
かっているので，新旧住所表記を示すフラグ２３０を参
照して，検索した単語の中から該当する単語を絞り込ん
で取り出すことができる。Likewise, whether the address display number belongs to the new or the old address notation is known from the town area collation of step 125 in FIG. 2, so the retrieved words can be narrowed down to the applicable ones by referring to the flag 230 indicating new/old address notation.
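
One way to read the 1/2/3 flag encoding is as a two-bit mask, in which a dictionary word is compatible with an observed attribute when the two values share a bit. The description does not state that the check is implemented this way, so the sketch below is an interpretation, and the entries other than "n丁目n-ne" are hypothetical.

```python
def compatible(word_flag, observed_flag):
    """True when the word's flag (1, 2 or 3) covers the observed value."""
    return (word_flag & observed_flag) != 0


def filter_words(entries, notation, direction):
    """Keep the words consistent with both observed attributes.

    entries:   iterable of (word, notation_flag, direction_flag)
    notation:  1 = new address notation, 2 = old address notation
    direction: 1 = horizontal writing, 2 = vertical writing
    """
    return [
        word
        for word, notation_flag, direction_flag in entries
        if compatible(notation_flag, notation)
        and compatible(direction_flag, direction)
    ]
```

Under this reading a word flagged 3 passes for either observed value, which matches the description's statement that 3 covers both cases.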

【００６５】次に，ステップ１３４の具体的処理内容を
図１０の処理フロー，および図８，図９，図１１を用い
て説明する。Next, the specific processing contents of step 134 will be described with reference to the processing flow of FIG. 10 and FIGS. 8, 9 and 11.

【００６６】ステップ２００では，照合を行おうとする
住所に対して，図２のステップ１２３で得られた住所の
縦横書きを表すフラグ，ステップ１２５で得られた新旧
住所表記を表すフラグをセットする。ステップ２０２で
は，図８(a)のワイルドカードテーブルにおける記入文
字番号１２の第１位の候補文字を取り出し，図１１の１
文字目インデックステーブル２２０を参照して，辞書単
語数２２２，および単語辞書のポインタ２２３を取得す
る。ステップ２０４では，ポインタが指している先の単
語を検索し，この単語がステップ２００でセットした新
旧住所表記フラグ，および縦横書きフラグと矛盾がない
かを辞書の該当するテーブル２３０，２３１を参照して
チェックする。もし矛盾がなければステップ２０５に進
む。ステップ２０５では，検索した単語を図９のオート
マトン１９１に入力して，状態を遷移させながら平均コ
ストを計算する。ステップ２０３では，ステップ２０４
からステップ２０５の処理をステップ２０２で求めた辞
書単語の数だけ繰り返す。In step 200, the vertical/horizontal writing flag of the address obtained in step 123 of FIG. 2 and the new/old address notation flag obtained in step 125 are set for the address to be matched. In step 202, the first-ranked candidate character of entry character number 12 in the wildcard table of FIG. 8(a) is taken out, and the first-character index table 220 of FIG. 11 is consulted to obtain the number of dictionary words 222 and the word dictionary pointer 223. In step 204, the word that the pointer points to is retrieved, and the corresponding tables 230 and 231 of the dictionary are consulted to check that the word is consistent with the new/old address notation flag and the vertical/horizontal writing flag set in step 200. If it is consistent, the process proceeds to step 205. In step 205, the retrieved word is input to the automaton 191 of FIG. 9, and the average cost is calculated while the states transition. In step 203, the processing of steps 204 and 205 is repeated for the number of dictionary words obtained in step 202.

【００６７】ステップ２０１では，図８(a)における記
入文字番地１２の次順位の候補文字を取り出し，ステッ
プ２０２からステップ２０５の処理を記入文字番地１２
の候補文字数回繰り返す。ステップ２０７では，図８
(a)のワイルドカードテーブルの記入文字番号１３の第
１位の候補文字を取り出し，図１１の２文字目インデッ
クステーブル２２４を参照して，辞書単語数２２６，お
よび単語辞書のポインタ２２７を取得する。ステップ２
０９では，ポインタが指している先の単語を検索し，こ
の単語がステップ２００でセットした新旧住所表記フラ
グ，および縦横書きフラグと矛盾がないかを辞書の該当
するテーブルを参照してチェックする。もし矛盾がなけ
ればステップ２１０に進む。In step 201, the next-ranked candidate character of entry character position 12 in FIG. 8(a) is taken out, and the processing of steps 202 to 205 is repeated as many times as there are candidate characters at entry character position 12. In step 207, the first-ranked candidate character of entry character number 13 in the wildcard table of FIG. 8(a) is taken out, and the second-character index table 224 of FIG. 11 is consulted to obtain the number of dictionary words 226 and the word dictionary pointer 227. In step 209, the word that the pointer points to is retrieved, and the corresponding tables of the dictionary are consulted to check that the word is consistent with the new/old address notation flag and the vertical/horizontal writing flag set in step 200. If it is consistent, the process proceeds to step 210.

【００６８】ステップ２１０では，検索した単語を図９
のオートマトン１９１に入力して，状態を遷移させなが
ら平均コストを計算する。ステップ２０８では，ステッ
プ２０９からステップ２１０の処理をステップ２０７で
求めた辞書単語の数だけ繰り返す。ステップ２０６で
は，図８(a)の記入文字番地１３の次順位の候補文字を
取り出し，ステップ２０７からステップ２１０の処理を
記入文字番地１３の候補文字数回繰り返す。In step 210, the retrieved word is input to the automaton 191 of FIG. 9, and the average cost is calculated while the states transition. In step 208, the processing of steps 209 and 210 is repeated for the number of dictionary words obtained in step 207. In step 206, the next-ranked candidate character of entry character position 13 in FIG. 8(a) is taken out, and the processing of steps 207 to 210 is repeated as many times as there are candidate characters at entry character position 13.

【００６９】ステップ２１１では，ステップ２００から
ステップ２１０の処理で求めた単語，および，そのコス
トを昇冪の順に並べ変える。ステップ２１２では，ステ
ップ２１１で並べ変えた単語の上位Ｌ（Ｌ＞１）個を選
択する。In step 211, the words obtained by the processing of steps 200 to 210 are sorted, together with their costs, in ascending order of cost. In step 212, the top L (L > 1) words of the list sorted in step 211 are selected.

【００７０】以上のステップ２００からステップ２１２
の処理により，図３におけるステップ１３４のオートマ
トン照合が行われ，平均コストの小さい上位Ｌ個の単語
およびそのコストが得られる。Through the processing of steps 200 to 212 described above, the automaton matching of step 134 in FIG. 3 is performed, yielding the top L words with the smallest average costs and their costs.
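
Steps 211 and 212 amount to keeping the L lowest-cost words. A minimal sketch, assuming the matched words are held as `(word, average_cost)` pairs; the function name is hypothetical.

```python
import heapq


def top_l(scored_words, l):
    """Return the l pairs with the smallest average cost, best first.

    scored_words: iterable of (word, average_cost) pairs.
    """
    return heapq.nsmallest(l, scored_words, key=lambda pair: pair[1])
```

Sorting the whole list and slicing gives the same result; `heapq.nsmallest` simply avoids a full sort when L is small compared with the number of matched words.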

【００７１】ステップ１３５では，ステップ１３４で得
られた上位Ｌ個のワイルドカードで表された単語につい
て，「ｎ」，「ｋ」の数字を表すワイルドカードを元の
数字に復元して，候補を生成する。ここではＬ＝１とし
て，図９の単語「ｎ丁目ｎ−ｎｅ」１９０を数字に復元
した結果を図１２に示す。数字復元の処理は，まず図６
に示す候補文字テーブル１６０と単語「ｎ丁目ｎ−ｎ」
１９０の位置合わせを行う。その後，数字「ｎ」に対応
する場所の候補数字，およびそのコストをそれぞれ図６
(a)候補文字テーブルから取り出して，実際の丁目や街
区を生成する。また，図６(b)コストテーブルから数字
に対応するコスト取り出して，復元した住所表示番号単
語に対するコストを積算していく。In step 135, for the top L words expressed with wildcards that were obtained in step 134, the wildcards "n" and "k" representing numerals are restored to the original numerals to generate candidates. FIG. 12 shows the result of restoring the word "ｎ丁目ｎ−ｎｅ" 190 of FIG. 9 to numerals, with L = 1. In the numeral restoration, the word "ｎ丁目ｎ−ｎ" 190 is first aligned with the candidate character table 160 shown in FIG. 6. Then the candidate numerals at the positions corresponding to "n" are taken from the candidate character table of FIG. 6(a) to generate actual chome and block numbers. The costs corresponding to those numerals are taken from the cost table of FIG. 6(b) and accumulated as the cost of the restored address display number word.
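
The numeral restoration can be sketched as expanding every "n" position into the digit candidates of the aligned lattice column and summing their costs. Several points are assumptions of this sketch: only the "n" wildcard is handled, non-wildcard positions contribute cost 0, and the candidate values in the usage note are invented for illustration.

```python
from itertools import product


def restore_digits(word, columns):
    """Expand the "n" wildcards of `word` into concrete digits.

    columns[i] lists the (candidate_char, cost) pairs of the lattice
    column aligned with the i-th character of `word`.
    Returns (restored_text, total_cost) pairs, lowest cost first.
    """
    options = []
    for symbol, column in zip(word, columns):
        if symbol == "n":
            # Only digit candidates can replace the numeral wildcard.
            options.append([(c, cost) for c, cost in column if c.isdigit()])
        else:
            options.append([(symbol, 0)])
    results = []
    for combo in product(*options):
        text = "".join(ch for ch, _ in combo)
        results.append((text, sum(cost for _, cost in combo)))
    return sorted(results, key=lambda item: item[1])
```

For the word "n丁目n-n" with digit candidates 3 and 8 at the first position, 8 and 3 at the fourth, and 1 at the last, the function yields four restored strings, with "3丁目8-1" ranked first when its digits carry the lowest costs.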

【００７２】ステップ１３６では，ステップ１３５で数
字に復元した住所表示番号単語の候補について，丁目，
街区，住居表示番号の数字部分を，図１の住所表示番号
範囲辞書１１７と矛盾がないかを判別する。ここで，住
所表示番号範囲辞書１１７は各町域について，丁目，街
区，住居表示番号の数字がそれぞれどの範囲を取りえる
かの範囲情報を階層的に格納した辞書である。住所表示
番号範囲辞書１１７の詳細を図１３を用いて説明する。In step 136, the numerical parts of the chome, block, and house display number of each address display number word candidate restored to numerals in step 135 are checked for consistency with the address display number range dictionary 117 of FIG. 1. The address display number range dictionary 117 hierarchically stores, for each town area, the ranges that the chome, block, and house display number numerals can take. The address display number range dictionary 117 is described in detail with reference to FIG. 13.

【００７３】住所表示番号範囲辞書１１７は，インデッ
クステーブル２６０と住所表示番号範囲テーブル２６３
からなる。インデックステーブル２６０は，町域を識別
する町域区分番号テーブル２６１と，住所表示番号範囲
テーブル２６３へのポインタテーブル２６２からなる。
住所表示番号範囲テーブル２６３は，丁目の番号をイン
デックスとして格納した丁目テーブル２６４，街区の番
号をインデックスとして格納した街区テーブル２６５，
住居表示番号の最大値を格納した住居表示番号テーブル
２６６からなる。The address display number range dictionary 117 consists of an index table 260 and an address display number range table 263. The index table 260 consists of a town area division number table 261 that identifies town areas and a table 262 of pointers into the address display number range table 263. The address display number range table 263 consists of a chome table 264 storing chome numbers as an index, a block table 265 storing block numbers as an index, and a house display number table 266 storing the maximum house display numbers.

【００７４】次に，住所表示番号範囲を参照する時の処
理の流れを説明する。例えば，「東京都国分寺市西恋ヶ
窪」住所表示番号範囲を参照するには，まず図２のステ
ップ１２５で図５に示す町域単語辞書を用いて求めた
「東京都国分寺市西恋ヶ窪」に対応する町域区分番号
「１８５０００２」について，インデックステーブル２
６０の町域区分番号テーブル２６１を参照する。「１８
５０００２」に対応するポインタPaは，住所表示番号範
囲テーブル２６３の中で，「東京都国分寺市西恋ヶ窪」
の範囲データが格納されている領域の先頭を参照してい
る。その領域には，丁目テーブル２６４，街区テーブル
２６５をインデックスとして，住居表示番号の最大値が
住居表示番号テーブル２６６に格納されてある。そこ
で，該当する丁目，街区インデックスを検索すること
で，例えば「３丁目８番」の住居表示番号の最大値は９
まで，「東京都国分寺市西恋ヶ窪」の全ての領域を検索
することで，丁目の最大値は４までしかないことなどが
分かる。例えば，「東京都国分寺市西恋ヶ窪」の丁目が
４丁目までしかない，３丁目３番地が住居表示番号５ま
でしかないとすると，図１２の候補群は図１４で示す候
補に絞られる。以上のステップ１３６の処理により，住
所表示番号の各丁目，街区，住居表示番号の範囲の判定
が行われ，範囲外と判定された候補は図１２の候補群か
ら削除される。Next, the flow of processing when the address display number range is referenced is described. For example, to reference the address display number range of "Nishikoigakubo, Kokubunji-shi, Tokyo", the town area division number "1850002" corresponding to it, obtained in step 125 of FIG. 2 with the town area word dictionary shown in FIG. 5, is first looked up in the town area division number table 261 of the index table 260. The pointer Pa corresponding to "1850002" points to the head of the area of the address display number range table 263 in which the range data of "Nishikoigakubo, Kokubunji-shi, Tokyo" is stored. In that area, the maximum house display numbers are stored in the house display number table 266, indexed by the chome table 264 and the block table 265. Searching for the applicable chome and block indexes shows, for example, that the maximum house display number of "3-chome, block 8" is 9, and searching the whole area for "Nishikoigakubo, Kokubunji-shi, Tokyo" shows, for example, that the maximum chome is 4. For example, if "Nishikoigakubo, Kokubunji-shi, Tokyo" has chome only up to 4 and block 3 of 3-chome has house display numbers only up to 5, the candidate group of FIG. 12 is narrowed down to the candidates shown in FIG. 14. Through the processing of step 136 described above, the ranges of the chome, block, and house display number of each address display number are checked, and candidates judged to be out of range are deleted from the candidate group of FIG. 12.
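
The range check can be sketched as a nested lookup keyed by town area division number, chome, and block. The two maxima for 3-chome come from the example in the description. The dictionary layout, the names, and the lower bound of 1 are assumptions, and the table is deliberately incomplete.

```python
# Hypothetical model of the address display number range dictionary 117:
# town code -> chome -> block -> maximum house display number.
RANGE_DICT = {
    "1850002": {
        3: {3: 5, 8: 9},  # maxima taken from the example in the text
    },
}


def in_range(town_code, chome, block, house):
    """True when the chome, block and house number exist for the town."""
    blocks = RANGE_DICT.get(town_code, {}).get(chome)
    if blocks is None or block not in blocks:
        return False
    return 1 <= house <= blocks[block]  # lower bound 1 is an assumption
```

With this data, "3-chome 8-1" and "3-chome 3-1" pass the check, while a candidate such as "8-chome 8-1" is rejected because no 8-chome is registered.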

【００７５】ここで，住所表示番号範囲辞書は本実施例
に限るものではなく，例えば，住居表示番号部分は上限
値のみではなく，下限値も同時に持たせてもよい。ま
た，駐車場の住居表示番号などを除いた，実際に郵便配
達の対象となる住居表示番号のみを全て登録してもよ
い。Here, the address display number range dictionary is not limited to this embodiment. For example, the house display number portion may have not only the upper limit value but also the lower limit value at the same time. Alternatively, all the house display numbers actually targeted for mail delivery may be registered, excluding the house display numbers of parking lots.

【００７６】ステップ１３７では，ステップ１３６で絞
り込んだ候補からコストの小さいＰ（Ｐ＞１）個の候補
を住所表示番号照合結果として選択する。ここでは，Ｐ
＝２として図１４の「３丁目８−１」，「３丁目３−
１」が照合結果として選択される。In step 137, the P (P > 1) candidates with the smallest costs are selected from the candidates narrowed down in step 136 as the address display number matching result. Here, with P = 2, "3-chome 8-1" and "3-chome 3-1" of FIG. 14 are selected as the matching result.

【００７７】以上のステップ１３０からステップ１３７
までの処理により，図２のステップ１２７住所表示番号
照合が行われ，照合結果として住所表示番号の候補「３
丁目８−１」，「３丁目３−１」が得られる。Through the processing of steps 130 to 137 described above, the address display number matching of step 126 in FIG. 2 is performed, and the address display number candidates "3-chome 8-1" and "3-chome 3-1" are obtained as the matching result.

【００７８】ステップ１２７では，ステップ１２５の町
域照合で得られた町域候補「東京都国分寺市西恋ヶ窪」
と，ステップ１２６の住所表示番号照合で得られた住所
表示番号の候補「３丁目８−１」，「３丁目３−１」を
つないで住所候補を生成する。この例では，「東京都国
分寺市西恋ヶ窪３丁目８−１」，「東京都国分寺市西恋
ヶ窪３丁目３−１」が得られる。さらに，この住所情報
を用いて図１における区分部１０４を制御する制御情報
を生成する。In step 127, the town area candidate "Nishikoigakubo, Kokubunji-shi, Tokyo" obtained by the town area collation of step 125 is joined with the address display number candidates "3-chome 8-1" and "3-chome 3-1" obtained by the address display number matching of step 126 to generate address candidates. In this example, "3-chome 8-1, Nishikoigakubo, Kokubunji-shi, Tokyo" and "3-chome 3-1, Nishikoigakubo, Kokubunji-shi, Tokyo" are obtained. This address information is then used to generate the control information that controls the sorting unit 104 in FIG. 1.

【００７９】本発明の第二の実施例を図１，図２，図
９，図１５を用いて説明する。ここでは，７桁の町域区
分番号が宛名に印刷されている時に，町域照合により町
域情報が得られなかった場合を考える。A second embodiment of the present invention will be described with reference to FIGS. 1, 2, 9 and 15. Here, consider the case where the town area information cannot be obtained by the town area collation when the 7-digit town area classification number is printed on the address.

【００８０】図２において，ステップ１２０からステッ
プ１２２までは，第１の実施例と同様な処理を行う。In FIG. 2, steps 120 to 122 are the same as those in the first embodiment.

【００８１】ステップ１２３では，第一の実施例と同様
に文字認識部１０９がステップ１２２で得られた文字行
の画像から，１文字毎に文字を認識して文字コードに変
換する。ただし，ここでは住所情報だけでなく宛名に印
刷されている町域区分番号も認識して文字コードに変換
する。ステップ１２４では，第一の実施例と同様に町域
照合部１１１がステップ１２３で得られた文字認識結果
を基に候補文字ラティスを生成する。ステップ１２５で
は，第一の実施例と同様に町域照合部１１１が町域単語
辞書１１３を用いて町域照合を行なう。ただし，本実施
例では町域照合の結果，町域情報および町域情報に対応
する新旧住所表記情報を抽出できなかった場合を想定す
る。ステップ１２６では，図１の知識処理部１１０の中
の住所表示番号照合部１１２が住所表示番号辞書群１１
４を用いて，ステップ１２４で得られた候補文字ラティ
スから住所表示番号を抽出する。ステップ１２６の詳細
を図９，図１５を用いて詳細に説明する。図１５におい
てステップ３０２，３０４が本発明の特徴となってい
る。In step 123, as in the first embodiment, the character recognition unit 109 recognizes each character in the character line image obtained in step 122 and converts it into a character code. Here, however, it recognizes not only the address information but also the town area classification number printed on the address, and converts both into character codes. In step 124, as in the first embodiment, the town area collation unit 111 generates a candidate character lattice from the character recognition result obtained in step 123. In step 125, as in the first embodiment, the town area collation unit 111 performs town area collation using the town area word dictionary 113. In this embodiment, however, it is assumed that the town area collation fails to extract the town area information and the corresponding new/old address notation information. In step 126, the address display number collation unit 112 in the knowledge processing unit 110 of FIG. 1 uses the address display number dictionary group 114 to extract the address display number from the candidate character lattice obtained in step 124. Step 126 is described in detail with reference to FIGS. 9 and 15. In FIG. 15, steps 302 and 304 are the characteristic features of the present invention.

【００８２】ステップ３００では，図２のステップ１２
４で生成した候補文字ラティスを入力する。In step 300, the candidate character lattice generated in step 124 of FIG. 2 is input.

【００８３】ステップ３０１では，ステップ１３０で得
られた候補文字ラティスから，第１の実施例と同様な方
法でワイルドカードラティスを生成する。ただし，町域
情報が得られないために住所表示番号の先頭が検出でき
ないので，住所の先頭からワイルドカードラティスを生
成する。ステップ３０２では，ワイルドカードラティス
から数字の候補が含まれる記入文字番号を全て検出す
る。すなわちワイルドカードテーブルで「ｎ」や「ｋ」
が含まれる記入文字番号を抽出する。ステップ３０３で
は，ステップ３０１で生成したワイルドカードラティス
から第１の実施例と同様な方法により，記入文字番号１
から単語照合を行うためのオートマトンを生成する。そ
して，例えば図９のオートマトン１９１が得られたとす
る。ここでは住所表示番号部分のオートマトンのみを表
示している。In step 301, a wildcard lattice is generated from the candidate character lattice obtained in step 130 by the same method as in the first embodiment. However, because no town area information is available, the head of the address display number cannot be detected, so the wildcard lattice is generated from the head of the address. In step 302, all written-character numbers (character positions) that contain a digit candidate are detected from the wildcard lattice; that is, the written-character numbers whose entries in the wildcard table contain "n" or "k" are extracted. In step 303, an automaton for word collation starting from written-character number 1 is generated from the wildcard lattice produced in step 301, by the same method as in the first embodiment. Suppose, for example, that the automaton 191 of FIG. 9 is obtained. Only the address display number portion of the automaton is shown here.
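Steps 301 and 302 can be illustrated with a small sketch. It assumes that "n" stands for an Arabic numeral and "k" for a kanji numeral; the actual conversion table is the one in FIG. 7, which is not reproduced here. All names and the sample lattice are hypothetical.

```python
ARABIC = set("0123456789")
KANJI = set("一二三四五六七八九十")

def to_wildcard(ch):
    """Map a digit to its wildcard symbol; leave other characters unchanged."""
    if ch in ARABIC:
        return "n"  # assumed: any Arabic numeral
    if ch in KANJI:
        return "k"  # assumed: any kanji numeral
    return ch

def wildcard_lattice(lattice):
    """Convert each position's candidates, dropping duplicates in order."""
    result = []
    for candidates in lattice:
        converted = []
        for ch in candidates:
            w = to_wildcard(ch)
            if w not in converted:
                converted.append(w)
        result.append(converted)
    return result

def numeric_positions(wlattice):
    """Return the 1-based positions whose candidates include 'n' or 'k'."""
    return [i for i, c in enumerate(wlattice, start=1) if "n" in c or "k" in c]

# Hypothetical candidate character lattice: one candidate list per character.
lattice = [["3", "8"], ["丁", "了"], ["目", "日"], ["8", "3"], ["-", "一"], ["1", "l"]]
wl = wildcard_lattice(lattice)
positions = numeric_positions(wl)
print(wl, positions)
```

Collapsing every digit to one symbol is what lets a single dictionary word such as "n丁目n-n" cover all concrete numbers.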

【００８４】ステップ３０４では，ステップ３０３で生
成したオートマトン１９１と図１の住所表示番号単語イ
ンデックス辞書１１５，住所表示番号単語辞書１１６を
用いて，住所表示番号単語のオートマトン照合を行う。
ただし，第１の実施例の方式と違う点は，生成したオー
トマトンの状態数をＫ（Ｋ＞１）とすると，Ｋ個の切断
点で切断し，各切断点から始まる後部の部分オートマト
ンに対して第１の実施例の単語照合を行うことである。
例えば，図９では状態番号１から始まるオートマトンに
対して単語照合を行っていたが，それを状態番号２，
３，・・・から始まるオートマトンに対しても，同様な
単語照合を行う。これにより，任意の位置に存在する単
語を抽出することができる。また，辞書から単語を選択
する場合は，図２のステップ１２５で新旧住所表記の属
性が得られないので，辞書にある新旧住所表記のフラグ
を見ないで単語を選択する。In step 304, automaton collation of address display number words is performed using the automaton 191 generated in step 303 together with the address display number word index dictionary 115 and the address display number word dictionary 116 of FIG. 1. The difference from the first embodiment is that, letting K (K > 1) be the number of states of the generated automaton, the automaton is cut at K cutting points, and the word collation of the first embodiment is performed on the trailing partial automaton that starts at each cutting point. For example, whereas in FIG. 9 word collation was performed on the automaton starting from state number 1, the same word collation is also performed on the automata starting from state numbers 2, 3, and so on. This makes it possible to extract a word located at an arbitrary position. In addition, because the new/old address notation attribute is not obtained in step 125 of FIG. 2, words are selected from the dictionary without consulting the new/old address notation flag.
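The idea of collating from every cutting point can be sketched as below. This is a deliberate simplification, not the automaton of FIG. 9: it aligns each word position by position and counts mismatches as cost, with no insertions or deletions. The names, the cost model, and the sample data are assumptions.

```python
def match_cost(word, wlattice, start):
    """Mismatch count when word is aligned at 0-based start; None if too long."""
    if start + len(word) > len(wlattice):
        return None
    return sum(0 if ch in wlattice[start + i] else 1 for i, ch in enumerate(word))

def match_all_offsets(words, wlattice, max_cost=1):
    """Try every start position (cutting point) and keep low-cost matches."""
    hits = []
    for start in range(len(wlattice)):
        for word in words:
            cost = match_cost(word, wlattice, start)
            if cost is not None and cost <= max_cost:
                hits.append((cost, start + 1, word))  # 1-based position
    return sorted(hits)

# Hypothetical wildcard lattice: two town-name characters, then a number.
wlattice = [["西"], ["恋"], ["n"], ["丁", "了"], ["目"], ["n"], ["-"], ["n"]]
# Words are given without a new/old notation flag, as in this embodiment.
words = ["n丁目n-n", "n-n"]
hits = match_all_offsets(words, wlattice)
print(hits)
```

Here the full pattern is found starting at position 3 even though the head of the address display number was not known in advance.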

【００８５】ステップ３０５では，第１の実施例と同様
な方法により，ステップ３０４で得られた上位Ｌ（Ｌ＞
１）個のワイルドカードで表された単語について，
「ｎ」，「ｋ」の数字を表すワイルドカードを元の数字
に復元して，住所表示番号の候補を生成する。In step 305, by the same method as in the first embodiment, for each of the top L (L > 1) wildcard-expressed words obtained in step 304, the wildcards "n" and "k" representing digits are restored to the original digits to generate address display number candidates.
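Digit restoration in step 305 can be sketched as follows, again assuming "n" means an Arabic numeral and "k" a kanji numeral. Each wildcard is replaced by the digit candidates found at the aligned position of the original candidate character lattice; non-wildcard characters are taken from the dictionary word. The function name and sample lattice are hypothetical.

```python
from itertools import product

ARABIC = set("0123456789")
KANJI = set("一二三四五六七八九十")

def restore_digits(word, lattice, start=0):
    """Expand wildcards in word using the candidates at the aligned positions."""
    options = []
    for i, ch in enumerate(word):
        candidates = lattice[start + i]
        if ch == "n":
            options.append([c for c in candidates if c in ARABIC])
        elif ch == "k":
            options.append([c for c in candidates if c in KANJI])
        else:
            options.append([ch])  # literal character comes from the word
    return ["".join(combo) for combo in product(*options)]

# Hypothetical lattice in which "丁" was misrecognized as "了".
lattice = [["3", "8"], ["了"], ["目"], ["8", "3"], ["-"], ["1"]]
restored = restore_digits("n丁目n-n", lattice)
print(restored)
```

Because literal characters come from the dictionary word, the misrecognized "了" is replaced by "丁" in every restored candidate.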

【００８６】ステップ３０６では，第１の実施例と同様
な方法によりステップ３０５で生成した候補から，コス
トの小さいＰ個の候補を住所表示番号照合結果として選
択する。以上のステップ３００からステップ３０６まで
の処理により，図２におけるステップ１２６の住所表示
番号照合が行われる。In step 306, P candidates having a small cost are selected as the address display number collation result from the candidates generated in step 305 by the same method as in the first embodiment. The address display number collation of step 126 in FIG. 2 is performed by the above processing from step 300 to step 306.

【００８７】ステップ１２７では，ステップ１２３で得
られた町域区分番号とステップ１２６で得られた住所表
示番号を併せて図１の区分部１０４を制御する制御情報
を生成する。In step 127, the town area division number obtained in step 123 and the address display number obtained in step 126 are combined to generate control information for controlling the division section 104 in FIG.

【００８８】以上ステップ１２０からステップ１２７の
処理により，町域照合により住所表示番号領域の先頭が
見つからなかった場合でも，区分部１０４を制御する制
御情報を得ることが可能になる。By the processing from step 120 to step 127, it is possible to obtain the control information for controlling the sorting unit 104 even when the head of the address display number area is not found by the town area collation.

【００８９】本発明の第三の実施例を図２，図１６，図
１７，図１８を用いて説明する。A third embodiment of the present invention will be described with reference to FIGS. 2, 16, 17, and 18.

【００９０】図１６において，本発明の特徴は不読修正
部３１０と住所表示番号範囲辞書１１７である。図１６
の装置の動作を図１７を用いて説明する。In FIG. 16, the features of the present invention are the unreadable character correction unit 310 and the address display number range dictionary 117. The operation of the apparatus of FIG. 16 will be described with reference to FIG. 17.

【００９１】ステップ３４０からステップ３４５まで
は，第１の実施例における図２のステップ１２０から１
２５までとそれぞれ同様な処理を行う。ステップ３４６
では，第１の実施例と同様な方法により住所表示番号照
合を行う。ただし，照合を行った結果，候補が得られな
かった場合を考える。ステップ３４７では，図１６の制
御部１０７が不読文字修正部３１０に知識処理部１１０
の結果を送り，不読文字修正部３１０においてオペレー
タの入力作業により住所表示番号を入力する。図１８は
入力作業のための表示画面の例である。３６０は図１６
の画像入力部１０２で取り込んだ郵便物の宛名画像であ
る。３６１は知識処理部１１０の結果であり，住所表示
番号の候補がなかったので数値に対応する部分は「？」
で示されている。３６２，３６３，３６４は，住所表示
番号の中で，それぞれ丁目，街区，住居表示番号の数値
をオペレータが入力するための枠である。ここで，オペ
レータが入力した数値を知識処理部１１０が住所表示番
号範囲辞書１１７を用いてその値が正しい範囲内に入っ
ているかを判定する。判定の結果，もし範囲外となった
場合は，オペレータにその旨を表示する。以上のステッ
プ３４７の処理により，住所表示番号が入力される。Steps 340 to 345 perform the same processing as steps 120 to 125 of FIG. 2 in the first embodiment, respectively. In step 346, address display number collation is performed by the same method as in the first embodiment. Here, however, consider the case where the collation yields no candidate. In step 347, the control unit 107 of FIG. 16 sends the result of the knowledge processing unit 110 to the unreadable character correction unit 310, where an operator enters the address display number. FIG. 18 shows an example of the display screen for this input work. Reference numeral 360 denotes the address image of the mail item captured by the image input unit 102 of FIG. 16. Reference numeral 361 denotes the result of the knowledge processing unit 110; because there was no address display number candidate, the portions corresponding to numerical values are shown as "?". Reference numerals 362, 363, and 364 denote fields in which the operator enters the numerical values of the chome, the block, and the house display number, respectively, of the address display number. The knowledge processing unit 110 then uses the address display number range dictionary 117 to determine whether each value entered by the operator falls within the valid range. If a value is out of range, the operator is notified. The address display number is input through the above processing of step 347.
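The range check applied to operator input in step 347 might look like the sketch below. The layout of the range dictionary is an assumption (the real one is shown in FIG. 13): here it is a nested mapping from chome to block to the upper limit of the house display number, and the values are invented.

```python
# Hypothetical hierarchical range dictionary: chome -> block -> max house number.
RANGE_DICT = {3: {8: 12, 3: 5}}

def check_input(chome, block, house, ranges=RANGE_DICT):
    """Return None if the input is in range, else a message for the operator."""
    blocks = ranges.get(chome)
    if blocks is None:
        return f"chome {chome} is out of range"
    max_house = blocks.get(block)
    if max_house is None:
        return f"block {block} is out of range for chome {chome}"
    if not 1 <= house <= max_house:
        return f"house number {house} is out of range (1-{max_house})"
    return None

print(check_input(3, 8, 1))
print(check_input(3, 8, 99))
```

Checking level by level mirrors the hierarchical structure: a block is only meaningful within its chome, and a house number only within its block.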

【００９２】ステップ３４８では，ステップ３４５で得
られた町域情報と，ステップ３４６で入力された住所表
示番号を結合して住所候補を生成する。以上，ステップ
３４０からステップ３４８の処理により，住所表示番号
照合で候補が得られなかった場合でも，装置の補助によ
りオペレータが正しい住所表示番号を入力することがで
き，正しい住所が得られる。In step 348, the town area information obtained in step 345 and the address display number input in step 346 are combined to generate an address candidate. As described above, by the processing from step 340 to step 348, even if no candidate is obtained by the address display number collation, the operator can input the correct address display number with the assistance of the apparatus, and the correct address can be obtained.

【００９３】上記実施例では，住所表示番号の例は新住
所表記を用いて説明したが，これは旧住所表記の住所表
示番号に対しても同様な処理が適用される。In the above embodiment, the example of the address display number has been described by using the new address notation, but the same processing is applied to the address display number of the old address notation.

【００９４】[0094]

【発明の効果】本発明は，次の５つの効果がある。第一
は，文字認識で正しい候補文字が全く挙がらなかった場
合でも，住所表示番号を認識できることである。本発明
では，住所表示番号の表記パターンを任意の数字を表す
ワイルドカードを用いて表現した辞書単語として保持し
ており，認識結果の候補文字群と単語のコストを計算し
て照合を行うことで，住所表示番号を認識することがで
きる。そのため，住所表示番号の一部の認識結果の候補
文字群に正しい候補が上がらなかった場合でも，それを
補間して住所表示番号を認識することができる。例え
ば，「３丁目８−１」の「丁」に対応する文字パターン
に対して，「丁」という文字が認識候補として上がらな
かった場合でも，それに対応する単語が全体としてコス
トが小さければ，「丁」を補間して住所番号を認識する
ことができる。The present invention has the following five effects. First, the address display number can be recognized even when character recognition fails to produce the correct candidate character at all. In the present invention, the notation patterns of address display numbers are held as dictionary words expressed with a wildcard that represents an arbitrary digit, and the address display number is recognized by computing the cost between the candidate character groups of the recognition result and each word. Therefore, even when the correct candidate is missing from the candidate character group for part of the address display number, it can be interpolated and the address display number can still be recognized. For example, even if the character "丁" (cho) is not among the recognition candidates for the character pattern corresponding to "丁" in "3-chome 8-1", the "丁" can be interpolated and the address display number recognized, provided that the overall cost of the corresponding word is small.

【００９５】加えて，住所表示番号の表記パターンを辞
書の形式で保持しているため，個別の関数を準備して処
理するより詳細に表記パターンを見ることができるの
で，誤認識を防ぐことができる。In addition, because the notation patterns of address display numbers are held in the form of a dictionary, the notation pattern can be examined in more detail than when individual functions are prepared for each pattern, which prevents erroneous recognition.

【００９６】第二は，住所表示番号を詳細に調べること
ができるにも関わらず，高速に処理されることである。
まず，辞書から単語を検索するときに，各文字パターン
に対応する候補文字群をインデックスとして検索するた
めに，照合を行う単語数を減らすことができる。さら
に，各単語に新旧住所表記や縦横書きに対応する属性を
持たせいているので，予め認識しようとする住所表示番
号が新旧住所表記のどちらに属するか，あるいは縦横書
きのどちらであるかが分かっていれば，不必要な単語の
照合を防ぐことができ，高速な処理が可能となる。Second, processing is fast even though the address display number is examined in detail. First, when words are retrieved from the dictionary, the candidate character group corresponding to each character pattern is used as an index, which reduces the number of words to be collated. Furthermore, each word carries attributes for new/old address notation and for vertical/horizontal writing, so if it is known in advance which notation the address display number uses and whether it is written vertically or horizontally, unnecessary word collation is avoided and high-speed processing becomes possible.
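The two speed-ups described above, index lookup and attribute filtering, can be sketched as follows. The index key (the first character of each word), the attribute names, and the sample words are all assumptions made for illustration; the actual dictionary layout is the one in FIG. 11.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class DictWord:
    text: str
    notation: str   # "new" or "old" address notation
    direction: str  # "horizontal" or "vertical" writing

def build_index(words):
    """Index words by their first character (an assumed indexing scheme)."""
    index = defaultdict(list)
    for w in words:
        index[w.text[0]].append(w)
    return index

def lookup(index, first_candidates, notation=None, direction=None):
    """Fetch words via the index; None means the attribute is not checked."""
    found = []
    for ch in first_candidates:
        for w in index.get(ch, []):
            if notation is not None and w.notation != notation:
                continue
            if direction is not None and w.direction != direction:
                continue
            found.append(w)
    return found

WORDS = [
    DictWord("n丁目n-n", "new", "horizontal"),
    DictWord("n丁目n番n号", "new", "horizontal"),
    DictWord("n番地", "old", "horizontal"),
    DictWord("k丁目k番k号", "new", "vertical"),
]
index = build_index(WORDS)
filtered = lookup(index, ["n"], notation="new", direction="horizontal")
unfiltered = lookup(index, ["n"], direction="horizontal")
print([w.text for w in filtered], [w.text for w in unfiltered])
```

Passing `notation=None` corresponds to the second embodiment, where the new/old notation flag is not consulted because town area collation failed.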

【００９７】第三は，辞書方式であるために，新しい表
記パターンが発生した場合は簡単に辞書に登録すること
ができ，メンテナンスが容易なことである。Thirdly, since it is a dictionary system, when a new writing pattern occurs, it can be easily registered in the dictionary and maintenance is easy.

【００９８】第四に，住所表示番号の数字部分について
階層的に数字の取り得る範囲を保持しておく住所表示番
号範囲辞書があるため，実際にありえない住所表示番号
の候補を除くことができる。Fourth, because there is an address display number range dictionary that hierarchically holds the range of values each numerical part of the address display number can take, address display number candidates that cannot actually exist can be eliminated.

【００９９】第五に，住所表示番号照合の結果，住所表
示番号候補が得られなかった場合に，オペレータが郵便
物の宛名領域の画像を見ながら，住所表示番号を正しく
入力できることである。住所表示番号の数値部の取り得
る範囲の値を階層的に保持する住所表示番号範囲辞書を
保持しているため，オペレータが入力した住所表示番号
の数値部が正しい範囲内に入っているかを，住所表示番
号範囲辞書を参照して判定することができる。判定の結
果，範囲外と判定された場合はオペレータに警告を行う
ため，オペレータによる入力ミスを防ぐことができる。Fifth, when address display number collation yields no candidate, the operator can enter the address display number correctly while viewing the image of the address area of the mail item. Because the device holds an address display number range dictionary that hierarchically stores the range of values each numerical part of the address display number can take, it can determine, by referring to that dictionary, whether the numerical part entered by the operator is within the valid range. If the value is judged to be out of range, the operator is warned, which prevents operator input errors.

【０１００】[0100]

[Brief description of drawings]

【図１】本発明における第１の実施例の装置の全体構成
図である。FIG. 1 is an overall configuration diagram of a device according to a first embodiment of the present invention.

【図２】本発明における第１の実施例の装置の処理全体
の流れを示す図である。FIG. 2 is a diagram showing the flow of the entire processing of the apparatus of the first embodiment of the present invention.

【図３】住所表示番号照合を行う処理の流れを示す図で
ある。FIG. 3 is a diagram showing a flow of processing for performing address display number matching.

【図４】町域照合を行うための候補文字ラティスの例を
示す図である。FIG. 4 is a diagram showing an example of a candidate character lattice for performing town area matching.

【図５】町域情報を格納した町域辞書の構成の例を示す
図である。FIG. 5 is a diagram showing an example of the configuration of a town area dictionary storing town area information.

【図６】住所表示番号領域の候補文字ラティスの例を示
す図である。FIG. 6 is a diagram showing an example of a candidate character lattice of an address display number area.

【図７】候補文字ラティスから住所表示番号照合を行う
ためのワイルドカードラティスを生成する変換テーブル
を示す図である。FIG. 7 is a diagram showing a conversion table for generating a wildcard lattice for performing address display number matching from candidate character lattices.

【図８】図７の変換テーブルを用いて，図６の候補文字
ラティスから生成したワイルドカードラティスの例を示
す図である。FIG. 8 is a diagram showing an example of a wildcard lattice generated from the candidate character lattice of FIG. 6 using the conversion table of FIG. 7.

【図９】オートマトン単語照合の処理概要を示す図であ
る。FIG. 9 is a diagram showing an outline of processing of automaton word matching.

【図１０】オートマトン単語照合の処理の流れを示すＰ
ＡＤである。FIG. 10 is a PAD showing the flow of automaton word matching processing.

【図１１】住所表示番号単語インデックス辞書，住所表
示番号単語辞書の構成を示す図である。FIG. 11 is a diagram showing configurations of an address display number word index dictionary and an address display number word dictionary.

【図１２】照合した結果の住所表示番号単語から数字部
分を復元した住所表示番号の候補の例を示す図である。FIG. 12 is a diagram showing an example of address display number candidates in which a numerical part is restored from the address display number word as a result of matching.

【図１３】住所表示番号範囲辞書の構成の例を示す図で
ある。FIG. 13 is a diagram showing an example of the configuration of an address display number range dictionary.

【図１４】図１２の住所表示番号の候補から，各数字の
部分について取り得る範囲を判定した後に残った候補を
示す図である。FIG. 14 is a diagram showing candidates remaining after determining a possible range for each numeral part from the candidates of the address display number of FIG. 12;

【図１５】本発明の第二の実施例における住所表示番号
照合処理の流れを示す図である。FIG. 15 is a diagram showing the flow of address display number matching processing in the second embodiment of the present invention.

【図１６】本発明における第三の実施例の装置の全体構
成図である。FIG. 16 is an overall configuration diagram of an apparatus according to a third embodiment of the present invention.

【図１７】本発明における第三の実施例の処理全体の流
れを示す図である。FIG. 17 is a diagram showing the overall flow of processing in a third embodiment of the present invention.

【図１８】不読となった住所表示番号を入力するための
画面の表示の例を示す図である。FIG. 18 is a diagram showing a display example of a screen for inputting an unread address display number.

[Explanation of symbols]

１００…郵便物，１０１…郵便物供給部，１０２…画像
入力部，１０３…遅延搬送部，１０４…区分部，１０５
…区分棚，１０６…住所認識部，１０７…制御部，１０
８…画像処理部，１０９…文字認識部，１１０…知識処
理部，１１１…町域照合部，１１２…住所表示番号照合
部，１１３…町域単語辞書，１１４…住所表示番号辞書
群，１１５…住所表示番号単語インデックス辞書，１１
６…住所表示番号単語辞書，１１７…住所表示番号範囲
辞書。100 ... mail item, 101 ... mail supply unit, 102 ... image input unit, 103 ... delay transport unit, 104 ... sorting unit, 105 ... sorting shelf, 106 ... address recognition unit, 107 ... control unit, 108 ... image processing unit, 109 ... character recognition unit, 110 ... knowledge processing unit, 111 ... town area collation unit, 112 ... address display number collation unit, 113 ... town area word dictionary, 114 ... address display number dictionary group, 115 ... address display number word index dictionary, 116 ... address display number word dictionary, 117 ... address display number range dictionary.

フロントページの続き (72)発明者古賀昌史東京都国分寺市東恋ケ窪１丁目280番地株式会社日立製作所中央研究所内 (72)発明者影広達彦東京都国分寺市東恋ケ窪１丁目280番地株式会社日立製作所中央研究所内 (72)発明者寺本正人愛知県尾張旭市晴丘町池上１番地株式会社日立製作所オフィスシステム事業部内 (72)発明者渡辺成愛知県尾張旭市晴丘町池上１番地株式会社日立製作所オフィスシステム事業部内 (72)発明者藤澤浩道東京都国分寺市東恋ケ窪１丁目280番地株式会社日立製作所中央研究所内

Continuation of the front page
(72) Inventor: Masashi Koga, c/o Central Research Laboratory, Hitachi, Ltd., 1-280 Higashi-Koigakubo, Kokubunji-shi, Tokyo
(72) Inventor: Tatsuhiko Kagehiro, c/o Central Research Laboratory, Hitachi, Ltd., 1-280 Higashi-Koigakubo, Kokubunji-shi, Tokyo
(72) Inventor: Masato Teramoto, c/o Office Systems Division, Hitachi, Ltd., 1 Ikegami, Haruoka-cho, Owariasahi-shi, Aichi
(72) Inventor: Shigeru Watanabe, c/o Office Systems Division, Hitachi, Ltd., 1 Ikegami, Haruoka-cho, Owariasahi-shi, Aichi
(72) Inventor: Hiromichi Fujisawa, c/o Central Research Laboratory, Hitachi, Ltd., 1-280 Higashi-Koigakubo, Kokubunji-shi, Tokyo

Claims

[Claims]

1. An address reading device that detects character information on a mail item and reads address information including town area information and address display number information, comprising: image input means for converting an image on the mail item into an electric signal and inputting it; character recognition means for cutting out and recognizing character information from the input image and outputting a recognition candidate character group for each cut-out character pattern; town area collation means for recognizing a town area by collating the recognition candidate character groups output from the character recognition means with a town area dictionary storing town area information; address display number area detection means for detecting the head of the address display number area based on the recognition result of the town area collation means; wildcard conversion means for converting, based on the output of the address display number area detection means, the candidate character group corresponding to each character pattern from the head of the address display number area onward into a candidate character group in which digits are replaced with a wildcard representing an arbitrary digit; an address display number word dictionary that holds, as words, various notation patterns of address display numbers expressed using the wildcard; address display number collation means for recognizing the notation pattern of the address display number by collating the output of the wildcard conversion means with the address display number word dictionary; and digit restoration means for collating the output of the address display number collation means with the candidate character groups output from the character recognition means, restoring the digits in the candidate character groups that were replaced with the wildcard, and outputting address display number candidates.

2. The address reading device according to claim 1, further comprising: character direction detection means for detecting whether the character group representing the address information on the mail item is written vertically or horizontally; a town area dictionary that stores, in addition to the town area information, a town area classification number for identifying the town area and information on whether the town area uses the new address notation or the old address notation; address notation identification means for identifying, based on the recognition result of the town area collation means and the address notation dictionary, whether the town area uses the new address notation or the old address notation; an address display number word dictionary in which each word has an attribute concerning new/old address notation and an attribute concerning vertical/horizontal writing; word search means for searching the address display number word dictionary for words, using as an index the character codes of the candidate character group of the character pattern at an arbitrary position after wildcard replacement; and means for reading out from the address display number word dictionary, based on the outputs of the character direction detection means and the address notation identification means, only those words among the words found by the word search means whose new/old address notation and vertical/horizontal writing attributes match.

3. The address reading device according to claim 1, further comprising: head assumption means for assuming, when the head of the address display number area cannot be detected based on the recognition result of the town area collation means, that any character pattern whose recognition result includes a digit candidate is the head of the address display number area; address display number collation means for recognizing the notation pattern of the address display number by collating, using an automaton and starting from each position assumed by the head assumption means to be the head of the address display number area, the output of the wildcard conversion means with the words of the address display number word dictionary; and digit restoration means for collating the output of the address display number collation means with the candidate character groups output from the character recognition means, restoring the digits in the candidate character groups that were replaced with the wildcard, and outputting address display number candidates.

4. An address reading device that detects character information on a mail item, reads address information consisting of town area information and address display number information, and determines the range that the numerical part of the address display number can take, comprising: image input means for converting an image on the mail item into an electric signal and inputting it; character recognition means for cutting out and recognizing character information from the input image and outputting a recognition candidate character group for each cut-out character pattern; town area collation means for recognizing a town area by collating the recognition candidate character groups output from the character recognition means with a town area dictionary storing town area information; address display number area detection means for detecting the head of the address display number area based on the recognition result of the town area collation means; wildcard conversion means for converting, based on the output of the address display number area detection means, the candidate character group corresponding to each character pattern from the head of the address display number area onward into a candidate character group in which digits are replaced with a wildcard representing an arbitrary digit; an address display number word dictionary that holds, as words, various notation patterns of address display numbers expressed using the wildcard; address display number collation means for recognizing the notation pattern of the address display number by collating, using an automaton, the output of the wildcard conversion means with the words of the address display number word dictionary; digit restoration means for collating the output of the address display number collation means with the candidate character groups output from the character recognition means, restoring the digits in the candidate character groups that were replaced with the wildcard, and outputting address display number candidates; an address display number range dictionary that hierarchically holds the range of values the numerical part of the address display number can take; and address display number range determination means for determining, for each address display number candidate output from the digit restoration means and by referring to the address display number range dictionary, whether the value of the numerical part is within the range of the address display number, and narrowing down the candidates based on the determination.

5. An address reading device that detects character information on a mail item, reads address information consisting of town area information and address display number information, and determines the range that the numerical part of the address display number can take, comprising: image input means for converting an image on the mail item into an electric signal and inputting it; character recognition means for cutting out and recognizing character information from the input image and outputting a recognition candidate character group for each cut-out character pattern; town area collation means for recognizing a town area by collating the recognition candidate character groups output from the character recognition means with a town area dictionary storing town area information; address display number area detection means for detecting the head of the address display number area based on the recognition result of the town area collation means; wildcard conversion means for converting, based on the output of the address display number area detection means, the candidate character group corresponding to each character pattern from the head of the address display number area onward into a candidate character group in which digits are replaced with a wildcard representing an arbitrary digit; an address display number word dictionary that holds, as words, various notation patterns of address display numbers expressed using the wildcard; address display number collation means for recognizing the notation pattern of the address display number by collating, using an automaton, the output of the wildcard conversion means with the words of the address display number word dictionary; a display device with which, when no address display number candidate is obtained from the output of the address display number collation means, an operator enters the address display number while viewing the image of the address area of the mail item; an address display number range dictionary that hierarchically holds the range of values the numerical part of the address display number can take; range determination means for determining, by referring to the address display number range dictionary, whether the numerical part of the address display number entered by the operator is within the correct range; and means for issuing a warning to the operator when the value is determined to be out of range.

6. An address reading method for detecting character information on a mail item and reading address information including town area information and address display number information, comprising: converting an image on the mail item into an electric signal and inputting it; cutting out and recognizing character information from the input image and outputting a recognition candidate character group for each cut-out character pattern; recognizing a town area by collating the output recognition candidate character groups with a town area dictionary storing town area information; detecting the head of the address display number area based on the recognition result; converting, based on the detection result, the candidate character group corresponding to each character pattern from the head of the address display number area onward into a candidate character group in which digits are replaced with a wildcard representing an arbitrary digit; providing an address display number word dictionary that holds, as words, various notation patterns of address display numbers expressed using the wildcard; recognizing the notation pattern of the address display number by collating, using an automaton, the result of the wildcard conversion with the words of the address display number word dictionary; and collating the recognition result with the candidate character groups output from the character recognition, restoring the digits in the candidate character groups that were replaced with the wildcard, and outputting address display number candidates.