JPH0981679A

JPH0981679A - Optical character reader

Info

Publication number: JPH0981679A
Application number: JP7238827A
Authority: JP
Inventors: Kunio Miyata; 國男宮田
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 1995-09-18
Filing date: 1995-09-18
Publication date: 1997-03-28

Abstract

PROBLEM TO BE SOLVED: To prevent read rejects and misreads even when a slip meanders during transport. SOLUTION: A mechanism part 1 feeds and conveys the slip 2-2 as instructed by a mechanism control part 5. An optical part 2 illuminates the conveyed slip 2-2 with a lamp 2-1 disposed at the reading position, and the image on the slip is focused through a lens 2-3 onto the CCD sensor of the photoelectric conversion part 3-1. The photoelectric conversion part 3-1 amplifies and A/D-converts the slip image formed on the CCD and stores the image of the written characters in an image memory 4-1. The photoelectric conversion part 3-2 amplifies and A/D-converts the same slip image and stores the image of both the character frames and the written characters in an image memory 4-2. A pre-processing part 26 calculates the inclination of the upper edge, corrects the read field position using that inclination, and segments the characters. When a character is rejected, the address of the character frame of the rejected character is calculated and the character is segmented again according to that address.

Description

Detailed Description of the Invention

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to an optical character reader (hereinafter abbreviated as OCR). In particular, it relates to correcting character positions for character segmentation in the image acquired when a form meanders while being conveyed.

[0002]

2. Description of the Related Art

FIG. 2 is a conceptual diagram of a conventional OCR. FIGS. 3(A) and 3(B) are conceptual diagrams of the OCR sheet-feeding and reading mechanism, showing an example configuration of the mechanism unit 1 and the optical system 2 of FIG. 2. FIG. 3(A) is a plan view and FIG. 3(B) is a side view. The operation of the conventional OCR is described below with reference to FIGS. 2 and 3.

As shown in FIG. 3, a form bundle 1-3 is set on the hopper 1-1 of the mechanism unit 1, aligned against the hopper reference surface 1-2. When reading starts, the suction roller 1-4 first descends and feeds one form 2-2. If several forms are about to be fed at once, the reversing roller 1-6 returns the second and subsequent forms to the hopper 1-1, so forms are always fed one at a time. A form 2-2 that has passed the feed roller 1-5 and the reversing roller 1-6 is conveyed toward the reading position R by the drive rollers 1-7, 1-8 and 1-9 and other rollers along the transport path.

At the reading position R, the form 2-2 is illuminated by the lamp 2-1. The image of the form is reflected by the mirror 2-4, focused by the lens 2-3 of FIG. 2, and formed on the CCD sensor of the photoelectric conversion unit 3. The CCD sensor photoelectrically converts the image of the form 2-2, which is then stored in the image memory 4 as multi-level data covering the entire form, usually with about 8 to 32 levels.

[0003]

The preprocessing unit 6 uses the form format information to cut the contents of the image memory 4 into read fields, and finally into images of individual characters, which it sends to the feature extraction unit 7. The feature extraction unit 7 calculates the feature amounts of each cut-out image. The identification unit 8 then identifies the character from those feature amounts by referring to the identification dictionary 9 and outputs the result to the host WS (workstation) through the I/F control unit 10. When recognition of one form 2-2 is complete, the common control unit 11 instructs the mechanism control unit 5, via the system bus 12, to feed the next form. The mechanism unit 1 then feeds and conveys the next form to the optical system 2 in the same way.

In the conventional device, a reference surface for the form transport path lies on the extension of the hopper reference surface 1-2 in FIG. 3. An aligner mechanism (a mechanism that presses the form against the reference surface) makes the form 2-2 travel along that reference surface, so the acquired form image has no inclination.

In recent years, devices have been made smaller and lighter, and they must handle forms of lower ream weight (for example, thin forms of 40 kg or less). As a result, the inclination of the form has come to be corrected electrically instead of by an aligner mechanism. The transport-path reference surface has therefore disappeared. Instead, as shown in FIG. 3, what might be called a right end wall 1-10 of the transport path (sometimes called a side frame) is provided at a position well away from the virtual reference surface of the transport path.

Having neither a reference surface nor an aligner mechanism prevents thin forms from being folded or wrinkled by being pressed against a reference surface. On the other hand, a form 2-2 that is fed at an angle at the suction roller 1-4 and the feed roller 1-5, for whatever reason, is conveyed through the transport path while still inclined.

[0004]

FIGS. 4(A) and 4(B) are conceptual diagrams of the form image in the image memory 4. As shown in FIG. 4, the image memory 4 of FIG. 2 has a virtual planar structure corresponding to the form. It is addressed horizontally by X (X0 to Xn) and vertically by Y (Y0 to Yn). In reality it is a multi-level memory in which several such planes are stacked according to the number of gray levels, but for clarity it is described here as a single X-Y plane.

In FIG. 4(A), A1, B1, C1 and D1 are the four corners of the form 2-2, and (X0, Y0) is the origin referenced to the upper and left sides of the form. (XA1, YA1) and (XB1, YB1) are the addresses of A1 and B1. In FIG. 4(B), A2, B2, C2 and D2 are the four corners of the form 2-2, and (X0, Y0) is again the origin referenced to the upper and left sides of the form. (XA2, YA2) and (XB2, YB2) are the addresses of A2 and B2.

Let α1 be the inclination of the upper side U and β1 the inclination of the lower side. FIG. 4(A) shows the case α1 = β1. The form 2-2 arrived at the reading position R rising to the left with inclination α1, and its lower side D passed the reading position R with the same inclination α1 as the upper side U. In this case it is known that every read field can be cut out accurately. This includes the read field F1 near the upper side U (containing the entry "110"), the read field F2 near the lower side D (containing "220"), and read fields at any other position.
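When α1 = β1, the whole form image is simply the form rotated by a single angle, so one mapping from form coordinates to image-memory addresses serves every read field. The following is a minimal sketch under stated assumptions, not the patent's method:

- The form is treated as a rigid rotation about its upper-left corner.
- Y increases downward.
- The upper side has slope tan α.
- The function name `form_to_image` is hypothetical.

```python
import math

def form_to_image(u, v, corner_a, tan_alpha):
    """Map a point (u, v) measured on the form (u to the right of the left side,
    v below the upper side) to an image-memory address (X, Y).

    Assumes the whole form is rigidly rotated by one angle alpha (the
    alpha1 = beta1 case), with Y growing downward and tan_alpha the slope
    of the upper side in image coordinates.
    """
    xa, ya = corner_a
    alpha = math.atan(tan_alpha)
    c, s = math.cos(alpha), math.sin(alpha)
    # Rotate the form-relative offset by alpha, then translate to corner A.
    x = xa + u * c - v * s
    y = ya + u * s + v * c
    return x, y
```

With tan_alpha = 0 the mapping reduces to a plain translation: `form_to_image(10, 20, (100, 50), 0.0)` gives `(110.0, 70.0)`. Because the same single rotation applies everywhere, a field near the lower side is located as accurately as one near the upper side.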

[0005]

Problems to be Solved by the Invention

However, the conventional OCR described above has a problem in the case α2 ≠ β2 shown in FIG. 4(B). Here the form 2-2 arrives at the reading position rising to the left (upper-side inclination α2) and then meanders while it is being read, that is, while passing the reading position. As a result, its lower side D passes the reading position with inclination β2, rising to the right, which is the opposite tendency to the upper side U.

If the inside of the form is then cut out using only the upper-side inclination α2 as the correction coefficient, the read field F1 near the upper side U (containing "110") is cut out accurately. The read field F2 near the lower side D (containing "220"), however, sometimes cannot be. In FIG. 4(B), when F2 is cut out using only α2 as the correction coefficient, the image position is displaced and the field cannot be cut out accurately.

This meandering tendency is an unavoidable problem. It arises from:
- variations in the dimensional and assembly accuracy of mechanical parts when the device is mass-produced;
- the size and ream weight of the forms;
- environmental conditions (temperature, humidity).
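The size of this displacement can be estimated with a simplified linear model, which is an assumption made here for illustration. Suppose rows near the lower side D actually run at slope tan β while segmentation assumes tan α. Then a field at horizontal distance x from the left side is misplaced vertically by roughly x(tan β − tan α). In image coordinates with Y increasing downward, "rising to the left" corresponds to tan α > 0 and "rising to the right" to tan β < 0. The function name and the sample angles and distance below are hypothetical.

```python
import math

def vertical_misplacement(x, alpha_deg, beta_deg):
    """Vertical error at horizontal offset x (same unit as x) when a field whose
    rows follow slope beta is cut out using the upper-side slope alpha.

    Simplified linear model with Y pointing down: predicted row y = x*tan(alpha),
    actual row y = x*tan(beta). A negative result means the actual characters
    lie above the predicted cut-out range.
    """
    return x * (math.tan(math.radians(beta_deg)) - math.tan(math.radians(alpha_deg)))

# Upper side rises to the left by 0.5 degrees; lower side rises to the right by 0.5 degrees.
err = vertical_misplacement(150.0, 0.5, -0.5)
print(f"{err:.2f} mm")
```

Under these assumed numbers the error is about -2.62 mm at 150 mm from the left side, so the characters sit above the range computed from the upper side. When α equals β, the model gives zero error, which matches the FIG. 4(A) case.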

[0006]

Means for Solving the Problems

To solve the above problem, the OCR of the first invention comprises the following units:

- A mechanism unit that feeds and conveys a form on which characters are written or printed inside character frames printed in a dropout color.
- An optical system, with a lamp disposed at a predetermined reading position, that illuminates the conveyed form with the lamp to obtain an optical signal.
- A first photoelectric conversion unit that converts the optical signal from the optical system into an electrical signal, producing an image of the written or printed characters inside the character frames.
- A first image memory that stores the image obtained by the first photoelectric conversion unit.
- A second photoelectric conversion unit that amplifies and extracts, from the same optical signal, the small signal near the white level produced by the dropout color. It converts this signal into an electrical signal, producing an image of the character frames.
- A second image memory that stores the image obtained by the second photoelectric conversion unit.
- A first preprocessing unit that measures the inclination of the upper side of the form relative to a reference line. It then cuts out each character pattern in a read field based on that inclination and on the form format information. This information describes each field (an area containing one or more character frames arranged horizontally on the form) and each character frame within each field.
- A second preprocessing unit that handles characters the identification unit judges unreadable (rejects). It examines how the rejected character pattern was cut out. If the top or bottom of the pattern touches the border of the cut-out range, it judges the cut-out position inappropriate. It then calculates the addresses of the character frames in the rejected read field from the image stored in the second image memory and cuts the rejected pattern out again according to those frame addresses.
- An identification unit that recognizes the characters cut out by the first and second preprocessing units.

With this configuration, suppose a character cut out of a read field is rejected by the identification unit. The OCR of the first invention then obtains, from the second image memory, the addresses of the character frames that the rejected read field should contain, and cuts the character pattern out again according to those addresses. The problem described above is thereby solved.
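The control flow of the first and second preprocessing units can be summarized in a short sketch. The patent does not give an implementation, so the following are assumptions:

- The names `cut_out`, `recognize` and `frame_boxes_from_memory2` are hypothetical stand-ins, passed in as callables.
- A box is a tuple (top, bottom, left, right) of inclusive image-memory rows and columns.
- A character image is a list of rows of 0/1 pixels, with 1 meaning ink.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (top, bottom, left, right), inclusive
Bitmap = List[List[int]]         # rows of 0/1 pixels, 1 = ink

def touches_top_or_bottom(pattern: Bitmap) -> bool:
    """True if any ink lies on the first or last row of the cut-out box,
    i.e. the character was probably clipped by a badly placed cut-out range."""
    return bool(pattern) and (any(pattern[0]) or any(pattern[-1]))

def read_field(
    first_pass_boxes: Sequence[Box],
    cut_out: Callable[[Box], Bitmap],                # hypothetical: crop from image memory 4-1
    recognize: Callable[[Bitmap], Optional[str]],    # hypothetical: None means "reject"
    frame_boxes_from_memory2: Callable[[int], Box],  # hypothetical: frame address for character i
) -> List[Optional[str]]:
    results: List[Optional[str]] = []
    for i, box in enumerate(first_pass_boxes):
        pattern = cut_out(box)      # first preprocessing unit (slope-corrected box)
        code = recognize(pattern)
        if code is None and touches_top_or_bottom(pattern):
            # Second preprocessing unit: re-cut using the character-frame
            # address found in the frame image (image memory 4-2).
            pattern = cut_out(frame_boxes_from_memory2(i))
            code = recognize(pattern)
        results.append(code)
    return results
```

Only rejects whose pattern touches the top or bottom of the cut-out box trigger the second pass, matching the condition stated above. Other rejects are returned unchanged as `None`.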

[0007]

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

First Embodiment

FIG. 1 is a block diagram of the OCR of the first embodiment of the present invention. Elements common to the conventional OCR of FIG. 2 bear the same reference numerals.

The OCR of the first embodiment differs from the conventional OCR in two respects. First, it adds a photoelectric conversion unit 3-2 for reading the character frames and an image memory 4-2 for storing the character-frame image. Second, the preprocessing unit 26 consists of two parts:
- a first preprocessing unit, which measures the inclination of the upper side of the form relative to a reference line, corrects the read field for that inclination, and cuts out the character patterns in the read field;
- a second preprocessing unit, which, for a read field rejected by the identification unit 8, calculates the addresses of the character frames that the read field should contain and cuts out the rejected character patterns again based on those addresses.

As shown in FIG. 1, this OCR comprises a mechanism unit 1, an optical system 2, a first photoelectric conversion unit 3-1, a second photoelectric conversion unit 3-2, a first image memory 4-1, a second image memory 4-2, a mechanism control unit 5, a preprocessing unit 26, a feature extraction unit 7, an identification unit 8, an identification dictionary 9, an I/F control unit 10, a common control unit 11, and a system bus 12.

[0008]

The optical system 2 is arranged along the transport direction of the form 2-2 relative to the mechanism unit 1. The connections are as follows:
- The output side of the optical system 2 is connected to the photoelectric conversion unit 3-1.
- The output side of the photoelectric conversion unit 3-1 is connected to the image memory 4-1, and the output side of the photoelectric conversion unit 3-2 to the image memory 4-2.
- The output sides of the image memories 4-1 and 4-2 are connected to the preprocessing unit 26, whose output side is connected to the feature extraction unit 7.
- The output side of the feature extraction unit 7 is connected to the identification unit 8, whose input/output side is connected to the identification dictionary 9 storing character feature amounts.
- The output side of the identification unit 8 is connected to the I/F control unit 10, whose output side is connected to the host WS.
- The image memories 4-1 and 4-2, the mechanism control unit 5, the preprocessing unit 26, the feature extraction unit 7, the identification unit 8, the I/F control unit 10 and the common control unit 11 are connected to the system bus 12.

The mechanism unit 1 has neither a form transport-path reference surface nor an aligner mechanism. It feeds and conveys one form 2-2 at a time from the form bundle set in the hopper. The optical system 2 illuminates the image on the form 2-2 at the reading position with the lamp 2-1 and forms it, through the lens 2-3, on the CCD sensor of the photoelectric conversion unit 3-1.

The photoelectric conversion unit 3-1 photoelectrically converts the image on the form 2-2 with the CCD sensor, then amplifies and A/D-converts it to read ordinary written and printed characters. The photoelectric conversion unit 3-2 amplifies and A/D-converts the electrical signal from the CCD sensor located in the photoelectric conversion unit 3-1, in order to read the character frames printed in the dropout color.

[0009]

The units have the following roles:
- The image memory 4-1 stores the ordinary image of the form 2-2 obtained by the photoelectric conversion unit 3-1.
- The image memory 4-2 stores the image of the character frames printed on the form 2-2 in the dropout color, obtained by the photoelectric conversion unit 3-2.
- In the preprocessing unit 26, the first preprocessing unit cuts out the characters, and the second preprocessing unit cuts out rejected read fields again.
- The feature extraction unit 7 extracts the feature amounts of each cut-out character, and the identification unit 8 identifies the character.
- The identification dictionary 9 stores the character feature data used for identification.
- The mechanism control unit 5 controls the feeding operation of the mechanism unit 1.
- The I/F control unit 10 interfaces with the host WS, which performs knowledge processing and similar tasks on the characters recognized by the identification unit 8.
- The common control unit 11 controls the operation of the mechanism control unit 5, the preprocessing unit 26, the feature extraction unit 7, the identification unit 8 and the I/F control unit 10.

[0010]

FIG. 5 shows an example configuration of the photoelectric conversion units in FIG. 1. As shown in FIG. 5, the photoelectric conversion unit 3-1 consists of the CCD sensor and photoelectric conversion unit 13, an amplifier circuit 14-1 and an A/D conversion circuit 15-1. The photoelectric conversion unit 3-2 consists of an amplifier circuit 14-2 and an A/D conversion circuit 15-2.

The output side of the CCD sensor and photoelectric conversion unit 13 is connected to both amplifier circuits 14-1 and 14-2. The output of amplifier circuit 14-1 is connected to A/D conversion circuit 15-1, and the output of amplifier circuit 14-2 to A/D conversion circuit 15-2.

The CCD sensor and photoelectric conversion unit 13 receives the light focused by the lens 2-3 and converts it into an electrical signal, which the amplifier circuits 14-1 and 14-2 amplify. Each A/D conversion circuit then digitizes only part of its amplified signal:
- A/D conversion circuit 15-1 converts the part of the signal from amplifier circuit 14-1 that lies between a reference voltage slightly below the dropout-color signal level and a reference voltage near the black level.
- A/D conversion circuit 15-2 converts the part of the signal from amplifier circuit 14-2 that lies between a reference voltage slightly below the dropout-color signal level and the white-level reference voltage.

[0011]

FIG. 6 shows examples of character frames on a form. As shown in FIG. 6, entry character frames are printed on the form in advance, and characters are written inside them. The frame types include:
- FR1, an independent entry frame for each character;
- FR2, a tabular entry frame in which the frames for individual characters adjoin one another vertically and horizontally like a table;
- FR3, a printing frame in which several characters are printed inside one frame.

These frames are printed in a dropout color close to the white level and serve as guides for writing or printing. The operations (a) to (j) of the OCR of FIG. 1 are described below with reference to these figures.

[0012]

(a) Mechanism unit 1

Under the control of the mechanism control unit 5, the mechanism unit 1 feeds a form 2-2 from the form bundle set in the hopper with the suction roller and the feed roller. The transport rollers then convey it toward the optical system 2.

(b) Optical system 2

The form 2-2 conveyed by the transport rollers is illuminated by the lamp 2-1 at the reading position R. The image on the form 2-2 is focused by the lens 2-3 onto the CCD sensor 13 of the photoelectric conversion unit 3-1 in FIG. 5.

(c) Photoelectric conversion unit 3-1

FIGS. 7(A) and 7(B) illustrate the operation of the photoelectric conversion units. FIG. 7(A) shows part of the form 2-2. FIG. 7(B) shows the output signals of the amplifier circuits 14-1 and 14-2 of FIG. 5 for that part of the form at the reading position R. In FIG. 7(A), F1 to F6 are the vertical lines of the character frames FR, and C1 to C4 are the written character portions CH. In this example, the characters "110" are written in three character frames FR.

The amplifier circuit 14-1 in the photoelectric conversion unit 3-1 amplifies the electrical signal from the CCD sensor and photoelectric conversion unit 13 of FIG. 5 (for example, to 0 to 5 V), as shown in FIG. 7(B). The A/D conversion circuit 15-1 compares the amplified signal with two reference voltages and level-converts it to the range between them:
- a (+) reference voltage C (for example, 4.5 V), set slightly below the level of the dropout color;
- a (−) reference voltage B2 near the black level (for example, 0.5 V).

It then divides the range between C and B2 into N steps (for example, N = 8 to 32), converts the signal into a digital value, and writes it into the image memory 4-1. As a result, the character frames are treated as white and only the written characters C1 to C4 remain distinguishable.

[0013]

(d) Photoelectric conversion unit 3-2

The amplifier circuit 14-2 in the photoelectric conversion unit 3-2 amplifies the electrical signal from the CCD sensor and photoelectric conversion unit 13 of FIG. 5 (for example, to 0 to 5 V), as shown in FIG. 7(B). This is because thin lines printed in the dropout color fall only slightly below the white level W1 in the photoelectric conversion waveform.

The A/D conversion circuit 15-2 compares the amplified signal with two reference voltages and level-converts it to the range between them:
- the white-level (+) reference voltage W1;
- a (−) reference voltage C, set slightly below the level of the dropout color.

It then divides the range between W1 and C into N steps (for example, N = 8 to 32), converts the signal into a digital value, and writes it into the image memory 4-2. As a result, the image memory 4-2 contains the written characters mixed with faint marks such as dropout-color characters and the character-frame signals.
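The two A/D conversion circuits differ only in the voltage window they digitize. The sketch below quantizes an amplified sample into N levels within a window and clamps values outside it. It uses the example values from the text: a 0 to 5 V signal, C = 4.5 V, B2 = 0.5 V, and N = 16, which lies within the stated range of 8 to 32. The following are assumptions for illustration only:
- the white level W1 = 5.0 V;
- the sample voltages chosen for paper, a frame line and ink;
- the function name `quantize`.

```python
def quantize(v: float, low: float, high: float, n: int) -> int:
    """Map voltage v to a level 0..n-1 inside the window [low, high].
    Voltages outside the window are clamped (0 = darkest, n-1 = brightest)."""
    v = min(max(v, low), high)
    level = int((v - low) / (high - low) * n)
    return min(level, n - 1)

N = 16
W1, C, B2 = 5.0, 4.5, 0.5  # W1 assumed; C and B2 are the example values from the text

# Assumed amplified voltages for blank paper, a dropout-color frame line, and ink.
samples = {"paper": 5.0, "frame line": 4.7, "ink": 1.0}
for name, v in samples.items():
    ch1 = quantize(v, B2, C, N)  # A/D 15-1 -> image memory 4-1 (characters only)
    ch2 = quantize(v, C, W1, N)  # A/D 15-2 -> image memory 4-2 (frames + characters)
    print(f"{name:10s} ch1={ch1:2d} ch2={ch2:2d}")
```

In the first channel, the frame line quantizes to level 15, the same as blank paper, so it drops out. In the second channel, it lands at level 6 while ink saturates to level 0. This is why image memory 4-2 holds both frames and characters.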

[0014]

(e) Image memories 4-1 and 4-2

FIGS. 8(A) and 8(B) show examples of part of the form image stored in the image memories 4-1 and 4-2. As shown in FIG. 8(A), the image memory 4-1 treats the character frames as white level, so it stores no frame information. By contrast, as shown in FIG. 8(B), the image memory 4-2 contains both the character frames and the written characters.

(f) Preprocessing unit 26

FIG. 9 illustrates the operation of the preprocessing unit 26 of the first embodiment and shows an example image stored in the image memory 4-1. In this example, "110" is written in the upper-left character frames and "220" in the lower-right character frames. A3, B3, C3 and D3 are the four corners of the form 2-2, and the addresses of A3 and B3 are (XA3, YA3) and (XB3, YB3). (X0, Y0) is the origin, with the upper and left sides of the form as reference sides. X0 to Xn is the horizontal direction X, and Y0 to Yn is the vertical direction Y.

First, suppose the reference sides of the form 2-2 are the upper side U and the left side L. The first preprocessing unit then detects the upper side U of the form by scanning across the image memory 4-1 column by column. In each column it finds the address where the level changes from black at the top edge to white; areas outside the form 2-2 are at black level.
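The upper-side detection can be sketched as a column-by-column scan of a binarized copy of image memory 4-1. The following assumptions are made for illustration:
- The image is a list of rows.
- Each pixel is 0 (black, outside the form) or 1 (white, form paper).
- `detect_top_edge` is a hypothetical name.

A real device would work on the multi-level data and would need some tolerance for noise and dust.

```python
from typing import List, Optional

def detect_top_edge(image: List[List[int]]) -> List[Optional[int]]:
    """For each column X, return the first row Y where the scan from the top
    changes from black (0, outside the form) to white (1, form paper),
    or None if no such transition occurs in that column."""
    height = len(image)
    width = len(image[0]) if height else 0
    edge: List[Optional[int]] = []
    for x in range(width):
        y_found = None
        for y in range(1, height):
            if image[y - 1][x] == 0 and image[y][x] == 1:
                y_found = y
                break
        edge.append(y_found)
    return edge
```

For a small skewed test image whose paper starts one row lower in each successive column, the function returns `[1, 2, 3, 4]`. The first and last detected points then serve as the corner addresses A3 and B3 of the upper side.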

[0015] Then the inclination α3 of the upper side of the form image is calculated, for example by the following equation (1). It uses the address (XA3, YA3) of point A3 at the upper-left corner of the form and the address (XB3, YB3) of point B3 at the upper-right corner:

$$\tan\alpha_3 = \frac{Y_{B3} - Y_{A3}}{X_{B3} - X_{A3}} \qquad (1)$$

To cut out the reading field RF1 on the first line in FIG. 9, the form format information is used. This information gives the size and position of the character frames of form 2-2. It can be fixed in the system in advance and retrieved, or obtained by reading the form ID of the form. From this information, the cut-out range S1 of the first line is set with inclination α3 at the point located the dimension Y1 below the upper side U. Y1 is the distance from U to the center line of reading field RF1. Next, the horizontal extent of reading field RF1 is set within cut-out range S1. Each character in RF1 is then cut out and output to the feature extraction unit 7.

The reading field RF2 on the second line is handled in the same way. Following the format information, the cut-out range S2 is set with the upper-side inclination α3 at the point located the dimension Y2 below the upper side U. The horizontal extent of RF2 is then set within S2, and each character in RF2 is cut out and output to the feature extraction unit 7.
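Equation (1) and the placement of a cut-out range can be sketched as below. This rests on one assumption not stated in the text: each cut-out range runs parallel to the upper side U and sits a vertical distance Y1, Y2, ... below it. The names and the `half_height` parameter are hypothetical.

```python
import math
from typing import Tuple

Point = Tuple[float, float]

def upper_side_tan(a3: Point, b3: Point) -> float:
    """Equation (1): tan(alpha3) = (YB3 - YA3) / (XB3 - XA3)."""
    (xa, ya), (xb, yb) = a3, b3
    return (yb - ya) / (xb - xa)

def cutout_center_y(x: float, a3: Point, tan_alpha: float, y_offset: float) -> float:
    """Y address, at column x, of the center line of a cut-out range.

    Assumption: the range is parallel to the upper side U and lies y_offset
    (Y1, Y2, ... from the form format information) below it, measured vertically.
    """
    xa, ya = a3
    return ya + y_offset + (x - xa) * tan_alpha

def cutout_rows(x: float, a3: Point, tan_alpha: float, y_offset: float,
                half_height: float) -> Tuple[int, int]:
    """Top and bottom rows of the cut-out range at column x (hypothetical half_height)."""
    yc = cutout_center_y(x, a3, tan_alpha, y_offset)
    return math.floor(yc - half_height), math.ceil(yc + half_height)
```

Measuring the offset vertically rather than perpendicular to U is a small-angle approximation. It is adequate for the slight skews a meandering form produces.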

[0016] Each cut-out character undergoes feature extraction in the feature extraction unit 7 and is then recognized by the identification unit 8. For reading field RF1, the cut-out characters have been corrected based on the inclination α3 of the upper side U. As shown in FIG. 9, each of the written characters 110 therefore lies within its cut-out region, and the identification unit 8 identifies the characters correctly.

FIGS. 10(A) and 10(B) show an example of the cut-out images of reading field RF2. FIG. 10(A) is the image corresponding to reading field RF2 on the form. FIG. 10(B) shows the images of the characters actually cut out from RF2. As shown in FIGS. 9 and 10(B), the written characters 220 of the second reading field RF2 can only be cut out with the tops of their images missing. The cause is that form 2-2 meandered during transport. Although the upper side U of the form has the inclination α3 (rising to the left), the lower side D has an inclination β3 in the opposite direction (descending to the right).

[0017] The second preprocessing unit described below handles such a case. It is invoked when two conditions hold:
(a) the identification unit 8 returns unreadable (rejected) for a character in a reading field, and
(b) the cut-out image of that character touches the top or bottom of the cut-out range frame, meaning the range was inappropriate and the image was cut out incorrectly.

The second preprocessing unit then accurately locates the character frames in image memory 4-2, detects their addresses, and cuts out the character again correctly.

This processing is performed only when the reading result is unreadable (rejected), to keep the overall processing speed of the apparatus as high as possible. With the configuration of FIG. 1, detecting all character frames during processing would slow it down considerably.

The second preprocessing unit scans image memory 4-2 horizontally, row by row from the top, over the character frames that should belong to the second reading field RF2. It detects the address (X10', Y10') of the character frame in which a black dot first appears. This shows which way the frames slope:
- If the leftmost frame is detected first, the character frames descend to the right.
- If the rightmost frame is detected first, they descend to the left.

In this example, the leftmost character frame is detected first, so the frames are judged to descend to the right.
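The trigger condition and the first-dot scan can be sketched as follows. This is a hedged illustration: black = 0 is assumed, and `needs_recut`, `first_black_dot`, and `descends_to_right` are hypothetical names. The left-half/right-half rule in `descends_to_right` is one simple way to decide which end frame was found first. It is an assumption, not a rule stated in the source.

```python
from typing import Optional, Sequence, Tuple

BLACK = 0  # assumed value of a black pixel in image memory 4-2

def needs_recut(rejected: bool, pattern_rows: Sequence[int],
                range_top: int, range_bottom: int) -> bool:
    """Run the second preprocessing only when the character was rejected AND
    its black pixels touch the top or bottom row of the cut-out range."""
    if not rejected or not pattern_rows:
        return False
    return min(pattern_rows) <= range_top or max(pattern_rows) >= range_bottom

def first_black_dot(image, x_range: range, y_range: range) -> Optional[Tuple[int, int]]:
    """Scan row by row from the top, left to right, over the area where the
    field's character frames should be; return (X10', Y10') of the first black dot."""
    for y in y_range:
        for x in x_range:
            if image[y][x] == BLACK:
                return x, y
    return None

def descends_to_right(first_x: int, x_range: range) -> bool:
    """Hypothetical rule: a first dot in the left half of the field means the
    leftmost frame was found first, so the frames descend to the right."""
    return first_x - x_range.start <= (x_range.stop - 1) - first_x
```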

[0018] The next step searches the character frames that should belong to the second reading field RF2 for a second black dot:
- If the frames descend to the right, the address (X20', Y20') of the black dot with the largest X address is detected.
- If they rise to the right, the address (X20', Y20') of the black dot with the smallest X address is detected.

The approximate inclination β3' of reading field RF2 is then calculated by the following equation (2):

$$\tan\beta_3' = \frac{Y_{20}' - Y_{10}'}{X_{20}' - X_{10}'} \qquad (2)$$

Inclination correction by β3' is then applied, and the addresses of the character frames that should belong to reading field RF2 are calculated accurately as follows. FIGS. 11(A) to 11(C) show an example of the character frame search.
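The search for (X20', Y20') and equation (2) can be sketched as below, assuming black = 0 in a list-of-rows image. The function names are hypothetical.

```python
from typing import Optional, Tuple

BLACK = 0  # assumed black pixel value

def extreme_black_dot(image, x_range: range, y_range: range,
                      want_max_x: bool) -> Optional[Tuple[int, int]]:
    """(X20', Y20'): the black dot with the largest X address when the frames
    descend to the right, or the smallest X address when they rise to the right."""
    best = None
    for y in y_range:
        for x in x_range:
            if image[y][x] != BLACK:
                continue
            if best is None or (x > best[0] if want_max_x else x < best[0]):
                best = (x, y)
    return best

def approx_tan_beta(p10: Tuple[int, int], p20: Tuple[int, int]) -> float:
    """Equation (2): tan(beta3') = (Y20' - Y10') / (X20' - X10')."""
    (x10, y10), (x20, y20) = p10, p20
    return (y20 - y10) / (x20 - x10)
```

Because a single dot can be a stray mark or part of a written character, the result is only an approximate slope. The text treats it that way and refines it in the following paragraphs.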

[0019] As shown in FIG. 11(A), the character frame search area AR is cut out of image memory 4-2 with inclination correction by β3' applied. The area is set slightly wider above and below than the normal cut-out range. In FIG. 11(A), (X', Y') denotes the coordinate axes obtained by rotating (X, Y) by β3' about the origin (X0, Y0) of FIG. 9. The character frame width W, pitch P, and height H are taken from the form format information.

The X' addresses are found first. For each X' address in the range X1 to X2 of search area AR (a range that covers every character frame), the black dots along the Y' direction are counted. Projecting these counts gives the projection data of FIG. 11(B). As FIG. 11(B) shows, black dots are concentrated at the character frames. The exact X'-direction addresses of the frames can therefore be identified by matching the projection against the frame width W and pitch P.

The Y' addresses are found the same way. Counting the black dots along the X' direction for each Y' address in the range Y1 to Y2 of search area AR gives the projection data of FIG. 11(C). Black dots are again concentrated at the character frames, so matching against the frame height H identifies the exact Y'-direction addresses of the frames.
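The projections of FIGS. 11(B) and 11(C) and the matching against W, P, and H can be sketched as follows. This sketch makes three assumptions not found in the source:
- the search area AR has already been rotated into (X', Y') coordinates;
- black = 0;
- W and H are the pixel distances between opposite frame lines.

"Matching" is modeled here as choosing the offset whose comb of expected frame lines collects the most black dots. That is one plausible reading, not the patent's stated method.

```python
from typing import List, Sequence, Tuple

BLACK = 0  # assumed black pixel value

def projections(area: Sequence[Sequence[int]]) -> Tuple[List[int], List[int]]:
    """Black-dot counts per X' address (as in FIG. 11(B)) and per Y' address
    (as in FIG. 11(C)) for a search area already rotated into (X', Y')."""
    col_counts = [sum(1 for row in area if row[x] == BLACK) for x in range(len(area[0]))]
    row_counts = [sum(1 for v in row if v == BLACK) for row in area]
    return col_counts, row_counts

def match_frames_x(col_counts: Sequence[int], width: int, pitch: int, n_frames: int) -> int:
    """X' offset of the first frame whose comb of vertical frame lines
    (x0 + k*P and x0 + k*P + W) collects the most black dots."""
    span = (n_frames - 1) * pitch + width
    scores = [sum(col_counts[x0 + k * pitch] + col_counts[x0 + k * pitch + width]
                  for k in range(n_frames))
              for x0 in range(len(col_counts) - span)]
    return max(range(len(scores)), key=scores.__getitem__)

def match_frames_y(row_counts: Sequence[int], height: int) -> int:
    """Y' offset of the top frame line, with the bottom frame line H below it."""
    scores = [row_counts[y0] + row_counts[y0 + height]
              for y0 in range(len(row_counts) - height)]
    return max(range(len(scores)), key=scores.__getitem__)
```

For example, with two frames of width 3 at pitch 5 starting at X' = 2, the column comb scores highest at offset 2.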

[0020] The exact X and Y addresses of the character frames are obtained from the inclination β3' and the X'- and Y'-direction addresses. The unreadable character is then cut out again at the address of its character frame and output to the feature extraction unit 7. The accurate inclination β3 of reading field RF2 is calculated, for example, by the following equation (3). Any reading fields that follow are inclination-corrected using β3.

$$\tan\beta_3 = \frac{Y_{20} - Y_{10}}{X_{20} - X_{10}} \qquad (3)$$

Here:
- (X10, Y10) is the address of the upper-left point of the leftmost character frame of reading field RF2. In this example, it is the frame in which the first written character "2" is entered.
- (X20, Y20) is the address of the upper-left point of the rightmost character frame of RF2. Here, it is the frame in which the last written character "1" is entered.

Equation (3) is computed from character frame addresses derived from the distribution of black dots. It is therefore more accurate than equation (2), which is computed from the positions of individual black dots.

The same handling applies to later reading fields. If a cut-out character in a subsequent field becomes unreadable and its character pattern touches the cut-out range, that pattern is cut out again as described above. The fields after it are then inclination-corrected according to the inclination of that field before their character patterns are cut out.
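Converting a frame address found on the rotated (X', Y') axes back to image addresses, and then applying equation (3), might look like the sketch below. It assumes the origin (X0, Y0) is at (0, 0) and that Y grows downward, so a positive tan β means the field descends to the right. The function names are hypothetical.

```python
import math
from typing import Tuple

def rotated_to_image(xp: float, yp: float, tan_beta: float) -> Tuple[float, float]:
    """Map an (X', Y') address back to (X, Y), where the (X', Y') axes are the
    (X, Y) axes rotated by beta about the origin (X0, Y0) = (0, 0)."""
    beta = math.atan(tan_beta)
    c, s = math.cos(beta), math.sin(beta)
    return xp * c - yp * s, xp * s + yp * c

def accurate_tan_beta(p10: Tuple[float, float], p20: Tuple[float, float]) -> float:
    """Equation (3): tan(beta3) from the upper-left points of the leftmost
    (X10, Y10) and rightmost (X20, Y20) character frames of the field."""
    (x10, y10), (x20, y20) = p10, p20
    return (y20 - y10) / (x20 - x10)
```

With tan β = 0.75, a point 5 units along the X' axis maps to (4, 3), as expected from a 3-4-5 triangle.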

[0021] (g) Feature extraction unit 7: extracts various features from each one-character image transferred from the preprocessing unit 26.
(h) Identification unit 8: selects a candidate character by consulting the identification dictionary 9 with the one-character feature data transferred from the feature extraction unit 7.
- In reading field RF2 as cut out by the first preprocessing unit, the top or bottom of each cut-out character is missing (the top, in this example), so the characters are rejected as unreadable.
- In RF2 as cut out again by the second preprocessing unit, each cut-out region contains its character completely and in the correct orientation. The identification unit 8 therefore selects candidate characters without rejecting them.

The candidate characters are transferred through the I/F control unit 10 to the host WS (workstation). After some knowledge processing there (word matching, context processing, validity checks, etc.), each is settled as a specific character code.

[0022] (i) Common control unit 11: when the identification result of the identification unit 8 is unreadable (rejected), notifies the preprocessing unit 26 through the system bus 12. When identification of one form is complete, it instructs the mechanism control unit 5 to feed the next form.
(j) Mechanism control unit 5: on instruction from the common control unit 11, has the mechanism unit 1 feed the next form from the bundle of forms set on the hopper.

As described above, the first embodiment handles the case where a reading result is unreadable and the character pattern touches the cut-out range. The address of that pattern's character frame is obtained, and the pattern is cut out again. This is expected to yield an OCR free of the cut-out errors, unreadable characters, and misreadings caused by the form meandering at the reading position.

[0023] Second embodiment
FIG. 12 is a block diagram of an OCR according to the second embodiment of the present invention. Elements common to FIG. 1 are given the same reference numerals. The second embodiment differs from the first in that the form is divided vertically into a plurality of block areas. The inclination of the form is obtained for each block area, and the reading fields are inclination-corrected block by block using that inclination.

As shown in FIG. 12, this OCR consists of the following elements:
- the mechanism unit 1 and the optical system 2;
- the photoelectric conversion units 3-1 and 3-2 and the image memories 4-1 and 4-2;
- the mechanism control unit 5 and a preprocessing unit 36;
- the feature extraction unit 7, the identification unit 8, and the identification dictionary 9;
- the I/F control unit 10, the common control unit 11, and the system bus 12.

The optical system 2 is arranged along the transport direction of form 2-2 relative to the mechanism unit 1. The photoelectric conversion units 3-1 and 3-2 are connected to the output side of the optical system 2. Image memory 4-1 is connected to the output side of photoelectric conversion unit 3-1, and image memory 4-2 to the output side of photoelectric conversion unit 3-2.

[0024] The preprocessing unit 36 is connected to the output sides of the image memories 4-1 and 4-2. The feature extraction unit 7 is connected to the output side of the preprocessing unit 36, and the identification unit 8 to the output side of the feature extraction unit 7. The identification dictionary 9, which stores character feature quantities, is connected to the input/output side of the identification unit 8. The I/F control unit 10 is connected to the output side of the identification unit 8, and the host WS to the output side of the I/F control unit 10.

The following units are connected by the system bus 12:
- the image memories 4-1 and 4-2;
- the mechanism control unit 5 and the preprocessing unit 36;
- the feature extraction unit 7 and the identification unit 8;
- the I/F control unit 10 and the common control unit 11.

Operations (a) to (l) of FIG. 12 are described below.

[0025] (a) Mechanism unit 1: operates in the same manner as the mechanism unit 1 of the first embodiment.
(b) Optical system 2: operates in the same manner as the optical system 2 of the first embodiment.
(c) Photoelectric conversion unit 3-1: operates in the same manner as the photoelectric conversion unit 3-1 of the first embodiment.
(d) Photoelectric conversion unit 3-2: operates in the same manner as the photoelectric conversion unit 3-2 of the first embodiment.
(e) Image memory 4-1: as in the first embodiment, stores the image of the written characters.
(f) Image memory 4-2: as in the first embodiment, stores the image of the character frames and the written characters.

[0026] (h) Preprocessing unit 36: FIG. 13 illustrates the operation of the preprocessing unit 36 of the second embodiment and shows an example of an image stored in image memory 4-2. A4, B4, C4, and D4 in FIG. 13 are the four corners of form 2-2. The addresses of A4 and B4 are (XA4, YA4) and (XB4, YB4). (X0, Y0) is the origin with the left and right sides of the form as reference sides. (X0 to Xn) is the horizontal direction X, and (Y0 to Yn) is the vertical direction Y.

As shown in FIG. 13, the upper side of the form has inclination α4 and the lower side has inclination β4, with α4 ≠ β4. This indicates that the form meandered while being read. To read this form, its vertical dimension must first be determined. The dimension can be obtained from the form format information by reading the form ID of form 2-2. Alternatively, it can be measured from the image of form 2-2 actually stored in image memory 4-1.

[0027] In this embodiment, the vertical dimension of the form is 400 mm. First, the form is divided into four block areas of 100 mm each, starting from the upper side U. The 100 mm value is not fixed and can be set arbitrarily.
First block area A1: upper side to 100 mm
Second block area A2: 100 mm to 200 mm
Third block area A3: 200 mm to 300 mm
Fourth block area A4: 300 mm to 400 mm (lower side)

In the first block area A1, characters are cut out using the upper-side inclination α4 as the form inclination correction data.

The second block area A2 uses a reference reading field instead. This is the field closest to 100 mm from the upper side that is as horizontally continuous as possible (for example, line L200 in FIG. 13). Its character frames are detected in the same way as by the second preprocessing unit of the first embodiment. The degree of inclination of form 2-2 at that position is then calculated from their address information in image memory 4-2. Characters in the reading fields of the second block area A2 are cut out using the calculated inclination correction data.
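The block-by-block bookkeeping can be sketched as below. Fields are described here by hypothetical (y_mm, width_mm) pairs. Field width is used as a stand-in for "as horizontally continuous as possible", which is an assumption of this sketch. The function names are also hypothetical.

```python
from typing import Sequence, Tuple

def block_index(y_mm: float, block_mm: float = 100.0, n_blocks: int = 4) -> int:
    """Block area (0 = A1, 1 = A2, ...) containing a field y_mm below the upper side U."""
    return min(max(int(y_mm // block_mm), 0), n_blocks - 1)

def reference_field(fields: Sequence[Tuple[float, float]],
                    boundary_mm: float) -> Tuple[float, float]:
    """Pick the (y_mm, width_mm) field nearest the block boundary; among equally
    near fields prefer the widest, as a stand-in for 'as continuous as possible'."""
    return min(fields, key=lambda f: (abs(f[0] - boundary_mm), -f[1]))

def slope_for_field(y_mm: float, block_slopes: Sequence[float],
                    block_mm: float = 100.0) -> float:
    """Inclination correction data for a field: the slope measured for its block area."""
    return block_slopes[block_index(y_mm, block_mm, len(block_slopes))]
```

A field at exactly 400 mm is clamped into the last block, so the lower side still uses the slope of block A4.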

[0028] The third block area A3 is handled the same way. Its reference is the reading field closest to 200 mm from the upper side that is as horizontally continuous as possible (for example, near line L200 in FIG. 13). The character frames of that field are detected in the same way as by the second preprocessing unit of the first embodiment. The degree of inclination of the form at that position is calculated from their address information in image memory 4-2. Characters in the reading fields of the third block area A3 are then cut out using the calculated inclination correction data.

For the fourth block area A4, the reference is the reading field closest to 300 mm from the upper side that is as horizontally continuous as possible (for example, near line L300 in FIG. 13). Its character frames are detected in the same way, and the degree of inclination of form 2-2 at that position is calculated from their address information in image memory 4-2.

[0029] Characters in the reading fields of the fourth block area A4 are then cut out using the calculated inclination correction data.

Character frame addresses are detected only at intervals of about 100 mm, to keep the overall processing speed as high as possible. Detecting the addresses of all character frames during reading would slow processing down considerably. Moreover, with a typical transport mechanism the inclination of a form does not change greatly within 100 mm. Detecting and following the form's inclination at intervals of this order (about 100 mm) therefore gives sufficient performance.

[0030] (i) Feature extraction unit 7: extracts the features of each character cut out by the preprocessing unit 36.
(j) Identification unit 8: selects a candidate character by consulting the identification dictionary 9 with the one-character feature data transferred from the feature extraction unit 7. The candidate character is transferred through the I/F control unit 10 to the host WS (workstation). After some knowledge processing there (word matching, context processing, validity checks, etc.), it is settled as a specific character code.
(k) Common control unit 11: when the identification result of the identification unit 8 is unreadable (rejected), notifies the preprocessing unit 36 through the system bus 12. When identification of one form is complete, it instructs the mechanism control unit 5 to feed the next form.
(l) Mechanism control unit 5: on instruction from the common control unit 11, has the mechanism unit 1 feed the next form from the bundle of forms set on the hopper.

As described above, the second embodiment obtains the inclination of the reading fields for each of a plurality of block areas and inclination-corrects the fields accordingly. It therefore has the same advantages as the first embodiment.

[0031] Third embodiment
FIG. 14 is a block diagram of an OCR according to the third embodiment of the present invention. Elements common to FIG. 1 are given the same reference numerals. The third embodiment differs from the first in two respects:
- A character frame processing unit 41 is provided that calculates, for each reading field, the addresses of all of its character frames.
- A preprocessing unit 46 cuts out the character pattern within each character frame according to that frame's address.

As shown in FIG. 14, this OCR consists of the following elements:
- the mechanism unit 1 and the optical system 2;
- the first and second photoelectric conversion units 3-1 and 3-2;
- the first and second image memories 4-1 and 4-2;
- the mechanism control unit 5, the character frame processing unit 41, and the preprocessing unit 46;
- the feature extraction unit 7, the identification unit 8, and the identification dictionary 9;
- the I/F control unit 10, the common control unit 11, and the system bus 12.

[0032] The optical system 2 is arranged along the transport direction of form 2-2 relative to the mechanism unit 1. The photoelectric conversion units 3-1 and 3-2 are connected to the output side of the optical system 2.

The two image paths are connected as follows:
- Image memory 4-1 is connected to the output side of photoelectric conversion unit 3-1. The preprocessing unit 46 is connected to the output side of image memory 4-1, and the feature extraction unit 7 to the output side of the preprocessing unit 46.
- Image memory 4-2 is connected to the output side of photoelectric conversion unit 3-2. The character frame processing unit 41 is connected to the output side of image memory 4-2, and the preprocessing unit 46 to the output side of the character frame processing unit 41.

The identification unit 8 is connected to the output side of the feature extraction unit 7. The identification dictionary 9, which stores character feature quantities, is connected to the input/output side of the identification unit 8. The I/F control unit 10 is connected to the output side of the identification unit 8, and the host WS to the output side of the I/F control unit 10.

The following units are connected to the system bus 12:
- the image memories 4-1 and 4-2;
- the mechanism control unit 5, the character frame processing unit 41, and the preprocessing unit 46;
- the feature extraction unit 7 and the identification unit 8;
- the I/F control unit 10 and the common control unit 11.

Operations (a) to (l) of FIG. 14 are described below.

[0033] (a) Mechanism unit 1: operates in the same manner as the mechanism unit 1 of the first embodiment.
(b) Optical system 2: operates in the same manner as the optical system 2 of the first embodiment.
(c) Photoelectric conversion unit 3-1: operates in the same manner as the photoelectric conversion unit 3-1 of the first embodiment.
(d) Photoelectric conversion unit 3-2: operates in the same manner as the photoelectric conversion unit 3-2 of the first embodiment.
(e) Image memory 4-1: as in the first embodiment, stores the image of the written characters.
(f) Image memory 4-2: as in the first embodiment, stores the image of the character frames and the written characters.

[0034] (g) Character frame processing unit 41: FIG. 15 illustrates the operation of the character frame processing unit 41 of the third embodiment and shows an example of an image stored in image memory 4-2. A5, B5, C5, and D5 are the four corners of the form. (X0, Y0) is the origin referenced to the upper and left sides of the form. The addresses of A5 and B5 are (XA5, YA5) and (XB5, YB5). As shown in FIG. 15, the upper side of the form has inclination α5 and the lower side has inclination β5, with α5 ≠ β5. This indicates that the form meandered while being read.

The character frame processing unit 41 first calculates the upper-side inclination α5. Starting from α5, it calculates the address of each character frame on the first line in the same way as the second preprocessing unit of the first embodiment. It then passes these addresses to the preprocessing unit 46. The frame addresses on the second line are calculated and passed on in the same way, and so on for the third line, the fourth line, ..., the seventh line, and the last line.

Because the character frame addresses are calculated for every line, the processing keeps following the form even if its image is bent by meandering partway through. For example, the form in FIG. 15 meanders strongly in the direction opposite to the upper-side inclination α5 around the fourth, fifth, and sixth lines. The processing still reliably follows and compensates for this.
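Per-line tracking can be sketched as follows. This is a deliberate simplification. The text computes each line's frame addresses with the projection search of the first embodiment. Here, the leftmost and rightmost frame corners of each line are assumed to be known already, and the frames between them are assumed to be equally pitched, so their addresses are interpolated. The names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]

def line_frame_addresses(left: Point, right: Point, n_frames: int) -> List[Point]:
    """Upper-left addresses of n_frames equally pitched frames on one line,
    interpolated between the leftmost and rightmost frames found on that line."""
    (xl, yl), (xr, yr) = left, right
    if n_frames == 1:
        return [left]
    dx = (xr - xl) / (n_frames - 1)
    dy = (yr - yl) / (n_frames - 1)
    return [(xl + k * dx, yl + k * dy) for k in range(n_frames)]

def all_lines(lines: Sequence[Tuple[Point, Point, int]]) -> Dict[int, List[Point]]:
    """Frame addresses per line, as handed to the preprocessing unit 46. Each line
    uses its own end frames, so a slope reversal partway down the form is followed."""
    return {i: line_frame_addresses(l, r, n) for i, (l, r, n) in enumerate(lines)}
```

In the usage below, line 0 descends to the right while line 1 rises to the right. Each line receives its own addresses, which mirrors the meander of FIG. 15.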

[0035] (h) Preprocessing unit 46: cuts out the character pattern within each character frame at the frame addresses output by the character frame processing unit 41, and passes the character patterns to the feature extraction unit 7.
(i) Feature extraction unit 7: extracts the features of each character cut out by the preprocessing unit 46.
(j) Identification unit 8: selects a candidate character by consulting the identification dictionary 9 with the one-character feature data transferred from the feature extraction unit 7. The candidate character is transferred through the I/F control unit 10 to the host WS (workstation). After some knowledge processing there (word matching, context processing, validity checks, etc.), it is settled as a specific character code.

[0036] (k) Common control unit 11
When the identification unit 8 returns an unreadable result (a reject), the common control unit 11 notifies the preprocessing unit 26 through the system bus 12. When identification of one form is complete, it instructs the mechanism control unit 5 to feed the next form.
(l) Mechanism control unit 5
Following the instruction from the common control unit 11, the mechanism control unit 5 instructs the mechanism unit 1 to feed the next form from the bundle of forms set on the hopper.
As described above, the third embodiment has the same advantages as the first embodiment. It also provides the character frame processing unit 41, so the calculation of character frame addresses and the cutting out of characters are performed separately, which gives sufficient performance in terms of speed.
The present invention is not limited to the above embodiments, and various modifications are possible. Examples include the following.
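The division of labor in (k) and (l) amounts to a simple control loop. In this sketch, `cut`, `recut`, `identify`, and `feed_next` are hypothetical callables that stand in for the preprocessing, identification, and mechanism control units. A `None` result models a reject, and each rejected character is re-cut once.

```python
from typing import Callable, Iterable, List, Optional

Pattern = object  # stand-in for a cut-out character image


def read_forms(forms: Iterable[List[str]],
               cut: Callable[[str], Pattern],
               recut: Callable[[str], Pattern],
               identify: Callable[[Pattern], Optional[str]],
               feed_next: Callable[[], None]) -> List[List[Optional[str]]]:
    """Recognize every character of every form, re-cutting once on a reject."""
    results = []
    for form in forms:
        codes = []
        for char_id in form:
            code = identify(cut(char_id))
            if code is None:               # reject: notify preprocessing to re-cut
                code = identify(recut(char_id))
            codes.append(code)             # still None means a final reject
        results.append(codes)
        feed_next()                        # form finished: feed the next sheet
    return results


# Usage: character "b" is rejected on the first cut and recovered by the re-cut.
fed = []
out = read_forms([["a", "b"], ["c"]],
                 cut=lambda c: ("cut", c),
                 recut=lambda c: ("recut", c),
                 identify=lambda p: None if p == ("cut", "b") else p[1],
                 feed_next=lambda: fed.append("feed"))
print(out, len(fed))  # [['a', 'b'], ['c']] 2
```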

[0037] (1) In the first embodiment, the address of the character frame of an unreadable character may be obtained and that character cut out again. Alternatively, the character frame addresses of the reading field that contains the unreadable character may be calculated only for the leftmost and rightmost character frames. The inclination given by Expression (3) is then obtained from those two frames, the field is tilt-corrected with that inclination, and the unreadable character is cut out.
(2) The block areas into which the form is divided in the second embodiment may be chosen as appropriate, for example according to the number of fields on the form. The block areas may also differ in size within one form.
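Modification (1) needs only two frame addresses per field. Expression (3) is not reproduced in this section, so the sketch assumes it is the two-point slope between corresponding corners of the leftmost and rightmost frames. It also models tilt correction as a vertical column shift (a shear) rather than a true rotation. Both choices and all names are assumptions.

```python
from typing import List, Tuple


def two_frame_slope(left: Tuple[int, int], right: Tuple[int, int]) -> float:
    """Assumed form of Expression (3): slope between leftmost and rightmost frame corners."""
    (xl, yl), (xr, yr) = left, right
    return (yr - yl) / (xr - xl)


def deskew_field(field: List[List[int]], slope: float) -> List[List[int]]:
    """Tilt-correct a binary field image by shifting each column up by round(slope * x).

    A line y = y0 + slope * x in the input ends up (to within rounding) on row y0.
    Pixels shifted in from outside the field are filled with white (0).
    """
    h, w = len(field), len(field[0])
    out = [[0] * w for _ in range(h)]
    for x in range(w):
        dy = round(slope * x)
        for y in range(h):
            src = y + dy
            if 0 <= src < h:
                out[y][x] = field[src][x]
    return out


# Usage: a 45-degree diagonal is flattened onto the top row.
diag = [[1 if x == y else 0 for x in range(4)] for y in range(4)]
print(deskew_field(diag, 1.0)[0])                 # [1, 1, 1, 1]
print(two_frame_slope((10, 100), (210, 104)))     # 0.02
```

A meandering form produces only small angles. For those, a shear like this is close to a rotation and avoids resampling in x.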

[0038]

[Effects of the Invention] As described in detail above, the first to third inventions cut out characters in one of three ways. The first calculates the address of the character frame of an unreadable character pattern and re-cuts out that pattern. The second divides the form into a plurality of block areas, obtains the inclination of each block area, and tilt-corrects the reading fields. The third obtains the address of every character frame. As a result, characters are no longer left unreadable or misread because of cut-out failures caused by a meandering form.

[Brief description of drawings]

FIG. 1 is a configuration diagram of an OCR according to a first embodiment of the present invention.

FIG. 2 is a configuration diagram of a conventional OCR.

FIG. 3 is a conceptual diagram of the OCR sheet feeding and reading mechanism.

FIG. 4 is a diagram showing an image of a form in the image memory 4.

FIG. 5 is a diagram showing an example configuration of the photoelectric conversion units in FIG. 1.

FIG. 6 is a diagram showing an example of a character frame.

FIG. 7 is a conceptual diagram of the operation of the photoelectric conversion units.

FIG. 8 is a diagram showing an example of the images stored in the image memories 4-1 and 4-2.

FIG. 9 is a diagram explaining the operation of the preprocessing unit 26 of the first embodiment.

FIG. 10 is a diagram showing a cut-out image of the reading field RF2.

FIG. 11 is a diagram showing an example of a character frame search.

FIG. 12 is a configuration diagram of an OCR according to a second embodiment of the present invention.

FIG. 13 is a diagram explaining the operation of the preprocessing unit 36 of the second embodiment.

FIG. 14 is a configuration diagram of an OCR according to a third embodiment of the present invention.

FIG. 15 is a diagram explaining the operation of the character frame processing unit 41 of the third embodiment.

[Explanation of symbols]

1 Mechanism unit
2 Optical system
3-1, 3-2 Photoelectric conversion units
4-1, 4-2 Image memories
5 Mechanism control unit
26, 36, 46 Preprocessing units
7 Feature extraction unit
8 Identification unit
9 Identification dictionary
10 I/F control unit
11 Common control unit
41 Character frame processing unit

Claims


1. An optical character reader comprising the following elements. A mechanism unit feeds and conveys a form on which characters are handwritten or printed in character frames printed in a dropout color. An optical system is provided with a lamp and images the form conveyed by the mechanism unit while illuminating it with the lamp. A first photoelectric conversion unit converts the optical signal output from the optical system into an electric signal to obtain an image of the handwritten or printed characters in the character frames. A first image memory stores the image obtained by the first photoelectric conversion unit. A second photoelectric conversion unit amplifies and extracts, from the optical signal output from the optical system, the weak signal close to the white level that is contained in the dropout color, and converts it into an electric signal to obtain an image of the character frames. A second image memory stores the image obtained by the second photoelectric conversion unit. A first preprocessing unit cuts out each character pattern in a reading field based on form format information. This information represents the reference line of the upper side of the form, each field (an area of the form containing one or more character frames arranged horizontally), and each character frame in each field. A second preprocessing unit operates when the identification unit determines that a character cut out by the first preprocessing unit is unreadable. It checks how the unreadable character pattern was cut out, and judges the cut-out position inappropriate when the upper or lower part of the pattern touches the border of the cut-out range. It then calculates the address of the character frame in the reading field of the unreadable character from the image stored in the second image memory, and re-cuts out the unreadable character pattern according to that address. An identification unit recognizes the characters cut out by the first and second preprocessing units.
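The re-cut trigger in claim 1 is a simple border test on the cut-out pattern. The sketch assumes binary images stored as lists of rows and frame boxes with exclusive right and bottom bounds. `recut_inside_frame` and its `line_width` margin are hypothetical illustrations of re-cutting from a frame address, not the claimed implementation.

```python
from typing import List, Tuple

Image = List[List[int]]          # binary image, 1 = black pixel
Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), x1 and y1 exclusive (assumed)


def cut_position_inappropriate(pattern: Image) -> bool:
    """Border test: black pixels on the top or bottom row of the cut-out range."""
    return bool(pattern) and (any(pattern[0]) or any(pattern[-1]))


def recut_inside_frame(char_img: Image, frame: Box, line_width: int = 1) -> Image:
    """Re-cut the pattern from the character image using a frame address found
    in the frame image, skipping a (hypothetical) frame line width on each side."""
    x0, y0, x1, y1 = frame
    return [row[x0 + line_width:x1 - line_width]
            for row in char_img[y0 + line_width:y1 - line_width]]


# Usage: a stroke running into the top edge marks the cut-out as inappropriate.
print(cut_position_inappropriate([[0, 1, 0], [0, 1, 0], [0, 0, 0]]))  # True
print(cut_position_inappropriate([[0, 0, 0], [0, 1, 0], [0, 0, 0]]))  # False
```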

2. An optical character reader comprising the following elements. A mechanism unit feeds and conveys a form on which characters are handwritten or printed in character frames printed in a dropout color. An optical system is provided with a lamp and images the form conveyed by the mechanism unit while illuminating it with the lamp. A first photoelectric conversion unit converts the optical signal output from the optical system into an electric signal to obtain an image of the handwritten or printed characters in the character frames. A first image memory stores the image obtained by the first photoelectric conversion unit. A second photoelectric conversion unit amplifies and extracts, from the optical signal output from the optical system, the weak signal close to the white level that is contained in the dropout color, and converts it into an electric signal to obtain an image of the character frames. A second image memory stores the image obtained by the second photoelectric conversion unit. A preprocessing unit divides the form into a plurality of block areas. For the top block area, it measures the inclination from the reference line of the upper side of the form. For each of the other block areas, it uses form format information to calculate, from the image in the second image memory, the addresses of the character frames contained in one field in that block area. This information represents each field (an area of the form containing one or more character frames arranged horizontally) and each character frame in each field. The unit then measures the inclination of the block area from those addresses. Based on the measured inclination of each block area, it tilt-corrects the reading fields in that block area and cuts out the character patterns in them. An identification unit recognizes the characters cut out by the preprocessing unit.
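Claim 2 assigns one inclination per block area. This sketch assumes that the slope of a block is the two-point slope between the leftmost and rightmost frames of one representative field. It also assumes that each field belongs to the block whose top boundary lies just above the field's nominal y. The function names and data layout are hypothetical.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple

Point = Tuple[int, int]


def block_slopes(upper_edge_slope: float,
                 field_frames: Sequence[Tuple[Point, Point]]) -> List[float]:
    """Slope per block area.

    Block 0 (top) uses the slope of the form's upper side. Block i >= 1 uses
    field_frames[i - 1]: the (leftmost, rightmost) frame corners of one field
    in that block, as found in the frame image.
    """
    slopes = [upper_edge_slope]
    for (xl, yl), (xr, yr) in field_frames:
        slopes.append((yr - yl) / (xr - xl))
    return slopes


def slope_for_field(field_y: int, block_tops: Sequence[int],
                    slopes: Sequence[float]) -> float:
    """Pick the slope of the block area that contains a field's nominal y.

    block_tops must be sorted ascending, with block_tops[0] == 0.
    """
    return slopes[max(bisect_right(block_tops, field_y) - 1, 0)]


# Usage: three blocks whose tilt changes as the form meanders.
slopes = block_slopes(0.01, [((0, 500), (1000, 490)), ((0, 1000), (1000, 1000))])
print(slopes)                                       # [0.01, -0.01, 0.0]
print(slope_for_field(700, [0, 400, 900], slopes))  # -0.01
```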

3. An optical character reader comprising the following elements. A mechanism unit feeds and conveys a form on which characters are handwritten or printed in character frames printed in a dropout color. An optical system is provided with a lamp and images the form conveyed by the mechanism unit while illuminating it with the lamp. A first photoelectric conversion unit converts the optical signal output from the optical system into an electric signal to obtain an image of the handwritten or printed characters in the character frames. A first image memory stores the image obtained by the first photoelectric conversion unit. A second photoelectric conversion unit amplifies and extracts, from the optical signal output from the optical system, the weak signal close to the white level that is contained in the dropout color, and converts it into an electric signal to obtain an image of the character frames. A second image memory stores the image obtained by the second photoelectric conversion unit. A character frame processing unit calculates the addresses of all the character frames from the image in the second image memory, based on form format information that represents each character frame. A preprocessing unit cuts out the character pattern in each character frame according to that frame's address. An identification unit recognizes the characters cut out by the preprocessing unit.
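Claim 3 requires only that every frame address be calculated from the frame image and the form format information. The character frame search of FIG. 11 is not reproduced here, so this sketch substitutes a simple horizontal-projection search. Each nominal box from the format information is shifted vertically onto the row with the most frame pixels within a small window. The search method, the window size `search`, and the function names are all assumptions.

```python
from typing import List, Sequence, Tuple

Image = List[List[int]]          # binary frame image (second image memory), 1 = black
Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1)


def find_top_edge(frame_img: Image, x0: int, x1: int,
                  y_nominal: int, search: int) -> int:
    """Return the row near y_nominal with the most black pixels in columns x0..x1-1.

    Keeps y_nominal if the search window holds no black pixels.
    """
    best_y, best_count = y_nominal, 0
    lo, hi = max(0, y_nominal - search), min(len(frame_img), y_nominal + search + 1)
    for y in range(lo, hi):
        count = sum(frame_img[y][x0:x1])
        if count > best_count:
            best_y, best_count = y, count
    return best_y


def frame_addresses(frame_img: Image, nominal_boxes: Sequence[Box],
                    search: int = 3) -> List[Box]:
    """Shift every nominal frame box (from form format information) vertically
    so that its top edge sits on the frame line actually found in the image."""
    found = []
    for x0, y0, x1, y1 in nominal_boxes:
        dy = find_top_edge(frame_img, x0, x1, y0, search) - y0
        found.append((x0, y0 + dy, x1, y1 + dy))
    return found


# Usage: the frame's top line was printed 2 rows lower than its nominal position.
img = [[0] * 10 for _ in range(10)]
img[4][2:8] = [1] * 6
print(frame_addresses(img, [(2, 2, 8, 8)]))  # [(2, 4, 8, 10)]
```

This sketch corrects only vertical displacement, which is the main effect of meandering between lines. A fuller search would also locate the side lines of each frame.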