JP7601982B2

JP7601982B2 - Character recognition device and image preprocessing method

Info

Publication number: JP7601982B2
Application number: JP2023169752A
Authority: JP
Inventors: 雄太松本; 弘暉久野
Original assignee: Nomura Research Institute Ltd
Current assignee: Nomura Research Institute Ltd
Priority date: 2023-03-02
Filing date: 2023-09-29
Publication date: 2024-12-17
Anticipated expiration: 2043-03-02
Also published as: JP2024124307A

Description

The present invention relates to a character recognition device and an image preprocessing method.

Optical character recognition (OCR) technology is used to convert optically read images of handwritten or printed characters into digital data that a computer can use (e.g., character codes). Optical reading is performed by optical devices such as image scanners and digital cameras, and the read image is then converted into digital data through image processing such as pattern recognition.

So-called artificial intelligence (AI) technology has also advanced remarkably. Important recent milestones include deep learning, which uses deep neural networks with many intermediate layers between the input layer and the output layer, and the Transformer, an encoder/decoder model built around an attention mechanism.

Image processing is one of the main application areas of AI technology. Since OCR relies on image processing, as noted above, AI-OCR, which applies AI technology to OCR, is currently at an early stage of development (see, for example, Patent Document 1).

Patent Document 1: JP 2023-003648 A

Distortion correction using AI technology in OCR processing is still a developing field, and various further technical developments are anticipated. The smaller the distortion in the image input to OCR processing (hereinafter referred to as the actual input image), the more accurate the character recognition. No standout technology, or combination of technologies, has yet emerged as a de facto standard for AI-OCR.

In view of the above circumstances, an object of the present invention is to appropriately process the characters to be recognized that are written in an actual input image.

To achieve the above object, a character recognition device according to the present invention includes: a feature point detection unit that detects a plurality of mutually corresponding feature points between a template image including a plurality of recognition areas and an actual input image in which characters to be recognized are written in a format corresponding to the template image; an outlier removal unit that removes outliers from the plurality of feature points detected by the feature point detection unit; a distortion correction unit that corrects distortion of the actual input image using the filtered feature points remaining after the outlier removal unit has removed the outliers; and a character recognition unit that performs character recognition on corrected characters contained in the areas, corresponding to the recognition areas, of the corrected actual input image produced by the distortion correction unit.

An image preprocessing method according to the present invention includes, by a processor of a computer: detecting a plurality of mutually corresponding feature points between a template image including a plurality of recognition areas and an actual input image in which characters to be recognized are written in a format corresponding to the template image; removing outliers from the detected plurality of feature points; and correcting distortion of the actual input image using the filtered feature points remaining after the outliers have been removed.

The above configuration makes it possible to appropriately process the characters to be recognized that are written in an actual input image. The above configuration may also provide other effects instead of, or in addition to, this effect.

Fig. 1 is a diagram schematically illustrating a character recognition system S including a character recognition device 20 according to a first embodiment.
Fig. 2 is a hardware configuration diagram of a user terminal 10 according to the first embodiment.
Fig. 3 is a hardware configuration diagram of the character recognition device 20 according to the first embodiment.
Fig. 4 is a software configuration diagram of the character recognition device 20 according to the first embodiment.
Fig. 5 is a flowchart showing the detailed character recognition process according to the first embodiment.
Fig. 6 is a diagram showing an example of a template image I_T according to the first embodiment.
Fig. 7 is a diagram showing an example of an actual input image I_R according to the first embodiment.
Fig. 8 is an explanatory diagram of feature point detection according to the first embodiment.
Fig. 9 is a diagram showing an example of a corrected actual input image I_RA according to the first embodiment.
Fig. 10 is a diagram showing a comparative example for Fig. 9.
Fig. 11 is a software configuration diagram of the character recognition device 20 according to a second embodiment.

Embodiments of the present invention are described in detail below with reference to the accompanying drawings. In this specification and the drawings, elements that can be described in the same way are given the same reference signs, and redundant description may be omitted.

Each embodiment described below is merely one example of a configuration capable of realizing the present invention, and may be modified or changed as appropriate according to the configuration of the device to which the invention is applied and to various conditions. Not all combinations of elements included in each embodiment are essential to realizing the invention, and some elements may be omitted as appropriate. Accordingly, the scope of the present invention is not limited to the configurations described in the following embodiments. Configurations that combine multiple configurations described in the embodiments may also be adopted as long as they do not contradict one another.

1. First Embodiment
Fig. 1 is a diagram schematically illustrating a character recognition system S including the character recognition device 20 according to the first embodiment. The character recognition system S includes a user terminal 10 and the character recognition device 20, and may include other components, for example a front-end server that is placed between the user terminal 10 and the character recognition device 20 and performs various data processing.

The user terminal 10 is a terminal device, such as a smartphone or a PC, used by a user. The user uses the user terminal 10 to access applications and services provided by the character recognition device 20. The user terminal 10 connects to the character recognition device 20 via, for example, a wireless communication network and the Internet.

The character recognition device 20 is a server device that performs OCR processing on an image (the actual input image) transmitted from the user terminal 10. It performs OCR processing on actual input images of various standard forms, such as images of medical certificates submitted for insurance claims and images of application forms for opening bank accounts.

The character recognition device 20 may transmit the character recognition result of the OCR processing to another server device. The character recognition device 20 may also be a server device that executes various application processes including OCR processing; that is, it may provide only the functions described in this embodiment or may provide other functions as well. The character recognition device 20 may be placed in an on-premises environment or in a cloud environment provided by another company. Alternatively, the character recognition device 20 may execute only ordinary OCR processing by the character recognition unit 28, while a separate pre-processing device, namely a computer including the feature point detection unit 22, the outlier removal unit 24, and the distortion correction unit 26, processes the image received from the user terminal; the character recognition unit 28 of the character recognition device 20 then performs OCR processing on the pre-processed image to obtain the character recognition result. Furthermore, all or some of the units of the character recognition device 20 may be implemented as a single application that is installed and executed on the user terminal.

As functional units for implementing OCR processing, the character recognition device 20 includes a feature point detection unit 22, an outlier removal unit 24, a distortion correction unit 26, and a character recognition unit 28. Each unit is described in detail later.

Fig. 2 is a hardware configuration diagram of the user terminal 10 according to the first embodiment. As shown in Fig. 2, the user terminal 10 has a processor 101, a memory 102, an input/output interface 103, and a communication interface 104, which are interconnected by an internal bus. The user terminal 10 may also have hardware elements other than those shown in Fig. 2.

The processor 101 is a computing element that implements the various functions of the user terminal 10. The processor 101 may be a system-on-a-chip (SoC) including elements such as a central processing unit (CPU), a graphics processing unit (GPU), and a memory controller.

The memory 102 consists of storage media such as RAM (Random Access Memory) and eMMC (embedded MultiMediaCard), and temporarily or permanently stores the programs and data used to execute the various processes of the user terminal 10. The programs include one or more instructions for operating the user terminal 10. The processor 101 implements the functions of the user terminal 10 by loading the programs stored in the memory 102 into the memory 102 and/or a system memory (not shown) and executing them.

The input/output interface 103 accepts operations on the user terminal 10, supplies them to the processor 101, and presents various information to the user; it is, for example, a touch panel, or a keyboard and a display.

The communication interface 104 is a circuit that performs the signal processing required for Internet communication, for example a network interface card (NIC).

Fig. 3 is a hardware configuration diagram of the character recognition device 20 according to the first embodiment. As shown in Fig. 3, the character recognition device 20 has a processor 201, a memory 202, an input/output interface 203, and a communication interface 204, which are interconnected by an internal bus. The character recognition device 20 may also have hardware elements other than those shown in Fig. 3.

The processor 201 is a computing element that implements the various functions of the character recognition device 20. The processor 201 may be a CPU and may further include other processors such as a GPU.

The memory 202 consists of storage media such as RAM, ROM (Read Only Memory), an HDD (Hard Disk Drive), and an SSD (Solid State Drive), and temporarily or permanently stores the programs and data used to execute the various processes of the character recognition device 20. The programs include one or more instructions for operating the character recognition device 20. The processor 201 implements the functions of the character recognition device 20 by loading the programs stored in the memory 202 into the memory 202 and/or a system memory (not shown) and executing them.

The input/output interface 203 accepts operations on the character recognition device 20, supplies them to the processor 201, and presents various information to the user; it is, for example, a keyboard and a display. The character recognition device 20 may instead lack the input/output interface 203 and be operated remotely.

The communication interface 204 is a circuit that performs the signal processing required for Internet communication, for example a network interface card (NIC).

Fig. 4 is a software configuration diagram of the character recognition device 20 according to the first embodiment. As shown in Fig. 4, the character recognition device 20 has a control unit 210, a storage unit 220, and a communication unit 230.

The control unit 210 is a software element, implemented by the processor 201 described above, that provides various functions including the feature point detection unit 22, the outlier removal unit 24, the distortion correction unit 26, and the character recognition unit 28. The operation of the control unit 210 is outlined below.

The feature point detection unit 22 detects a plurality of mutually corresponding feature points P_F between a template image I_T, which includes a plurality of recognition areas RA, and an actual input image I_R, in which characters to be recognized are written in a format F corresponding to the template image I_T.

The outlier removal unit 24 removes outliers P_O from the plurality of feature points P_F detected by the feature point detection unit 22.

The distortion correction unit 26 corrects the distortion of the actual input image I_R using the filtered feature points P_FE that remain after the outlier removal unit 24 has removed the outliers.

The character recognition unit 28 performs character recognition on the corrected characters C' contained in the areas CA, which correspond to the recognition areas RA, of the corrected actual input image I_RA produced by the distortion correction unit 26.
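The four units above form a linear pipeline. The following Python sketch shows one way to wire them together; the function name run_ocr_pipeline and the callable-based interface are hypothetical and not taken from the source. Each stage is passed in as a plain function so that any detector, outlier filter, warping method or OCR engine can be plugged in.

```python
from typing import Any, Callable, List, Tuple

Point = Tuple[float, float]
Match = Tuple[Point, Point]  # (point in template I_T, point in actual input I_R)


def run_ocr_pipeline(
    actual_input: Any,
    template: Any,
    detect: Callable[[Any, Any], List[Match]],
    remove_outliers: Callable[[List[Match]], List[Match]],
    correct: Callable[[Any, List[Match]], Any],
    recognize: Callable[[Any], Any],
) -> Any:
    """Chain the four units in the order described above (hypothetical interface)."""
    matches = detect(template, actual_input)     # feature point detection unit 22 -> P_F
    filtered = remove_outliers(matches)          # outlier removal unit 24 -> P_FE
    corrected = correct(actual_input, filtered)  # distortion correction unit 26 -> I_RA
    return recognize(corrected)                  # character recognition unit 28
```

Because the stages are independent callables, the variant in which a separate pre-processing device performs detection, outlier removal and correction while the character recognition device 20 performs only OCR amounts to running the first three stages on a different machine.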

The storage unit 220 stores the various data and programs used by the control unit 210, and is implemented by the memory 202 operating in cooperation with the processor 201.

The communication unit 230 communicates with other devices under the control of the control unit 210, and is implemented by the communication interface 204 operating in cooperation with the processor 201.

The character recognition process according to the first embodiment is described in detail with reference to Figs. 5 to 10. Fig. 5 is a flowchart showing the detailed character recognition process according to the first embodiment.

In step S510, the feature point detection unit 22 first receives the actual input image I_R transmitted from the user terminal 10 and reads the template image I_T stored in the storage unit 220.

Fig. 6 is a diagram showing an example of the template image I_T according to the first embodiment. The template image I_T is image data used to print the paper medium (format F) on which the user writes characters. The format F may be provided to the user already printed, or may be printed by the user.

As shown in Fig. 6, the template image I_T includes a plurality of recognition areas RA, for example areas in which kanji or numerals are written and areas to be marked with a check mark. For simplicity of illustration, only some of the recognition areas RA are labeled in Fig. 6. Each recognition area RA occupies a predetermined region (e.g., a rectangular region) specified by in-image coordinates in the image data of the template image I_T, and occupies a corresponding predetermined physical region on the printed format F.

Fig. 7 is a diagram showing an example of the actual input image I_R according to the first embodiment. As shown in Fig. 7, the actual input image I_R is an image obtained by photographing a format F corresponding to the template image I_T, and contains the characters the user wrote on the format F. Because the actual input image I_R is image data acquired by photographing the format F, which is a real paper medium, it often contains distortion arising from multiple factors, such as folds in the paper and the shooting angle, as seen in Fig. 7.

In step S510, as preprocessing for the distortion correction of the actual input image I_R by the distortion correction unit 26 (step S530), the feature point detection unit 22 performs feature point detection as follows.

Fig. 8 is an explanatory diagram of feature point detection according to the first embodiment. The feature point detection unit 22 detects a plurality of mutually corresponding feature points P_F between the template image I_T and the actual input image I_R based on, for example, the LoFTR (Local Feature Matching with Transformers) algorithm.

The feature points P_F are characteristic points in an image, such as area boundaries or the corners of rectangular frames; in this embodiment, they correspond to one another between the template image I_T and the actual input image I_R. A feature point P_F on the template image I_T and its corresponding feature point P_F on the actual input image I_R have similar feature values.

The feature point detection unit 22 preferably extracts, as the plurality of feature points P_F, the points whose confidence exceeds a predetermined threshold (e.g., 98% or 99%), where the confidence indicates the degree to which a point in the template image I_T corresponds to a point in the actual input image I_R. This extraction is performed, for example, according to the LoFTR algorithm, which is implemented by a trained model obtained through supervised learning on training data. The LoFTR algorithm searches for mutually corresponding feature points P_F between the template image I_T and the actual input image I_R, and then prunes the found feature points P_F according to their confidence.
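The confidence-based pruning can be illustrated with a minimal Python sketch. It assumes the matcher (LoFTR or another) has already produced parallel lists of matched coordinates and per-match confidence values in [0, 1]; the function name and the default threshold of 0.98 (mirroring the 98% example above) are illustrative assumptions.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def filter_by_confidence(
    points_t: List[Point],
    points_r: List[Point],
    confidence: List[float],
    threshold: float = 0.98,
) -> List[Tuple[Point, Point]]:
    """Keep only correspondences (I_T point, I_R point) whose confidence exceeds the threshold."""
    if not (len(points_t) == len(points_r) == len(confidence)):
        raise ValueError("inputs must have equal length")
    return [
        (pt, pr)
        for pt, pr, c in zip(points_t, points_r, confidence)
        if c > threshold  # strictly "exceeds", as in the text
    ]
```

Raising the threshold from 0.98 to 0.99 trades the number of retained points for a lower chance of keeping wrong matches; the later outlier removal step catches some of the wrong matches that still pass.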

In the LoFTR algorithm, features F_T and F_R are first computed for the images I_T and I_R using a convolutional neural network and a Transformer. A coarse correspondence between the images I_T and I_R is then computed with an optimal transport algorithm, and fine correspondences are computed within small patches. As a result, pixel-level matching is performed between the images I_T and I_R, and a plurality of corresponding feature points P_F are extracted. As this shows, the LoFTR algorithm is applied to each of the images I_T and I_R as a whole.
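The coarse matching step can be sketched as follows, assuming a precomputed similarity matrix between coarse cells of F_T (rows) and F_R (columns). This is a simplified, hypothetical illustration of optimal-transport matching via Sinkhorn iterations, followed by a mutual-nearest-neighbour check and a confidence threshold. The temperature, iteration count and threshold are arbitrary, the fine patch-level refinement is not shown, and actual LoFTR implementations differ in detail (for example, a dual-softmax operator can be used instead of optimal transport).

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def sinkhorn(scores: Matrix, temperature: float = 0.1, iters: int = 50) -> Matrix:
    """Turn a similarity matrix into a soft assignment by alternating row/column normalisation."""
    p = [[math.exp(s / temperature) for s in row] for row in scores]
    n, m = len(p), len(p[0])
    for _ in range(iters):
        for i in range(n):  # each template cell distributes unit mass
            total = sum(p[i])
            p[i] = [v / total for v in p[i]]
        for j in range(m):  # each input cell receives unit mass
            total = sum(p[i][j] for i in range(n))
            for i in range(n):
                p[i][j] /= total
    return p


def mutual_matches(p: Matrix, threshold: float = 0.2) -> List[Tuple[int, int, float]]:
    """Keep (i, j, confidence) where i and j are each other's best match above threshold."""
    matches = []
    for i, row in enumerate(p):
        j = max(range(len(row)), key=row.__getitem__)
        best_i = max(range(len(p)), key=lambda k: p[k][j])
        if best_i == i and row[j] > threshold:
            matches.append((i, j, row[j]))
    return matches
```

The assignment value p[i][j] plays the role of the per-match confidence that the later threshold prunes.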

Fig. 8 illustrates the plurality of feature points P_F detected by the LoFTR algorithm. In Fig. 8, mutually corresponding feature points P_F in the template image I_T and the actual input image I_R are connected by line segments.

The feature point detection unit 22 may also detect the plurality of feature points P_F using an algorithm other than LoFTR, such as SuperPoint or Patch2Pix. In other words, the feature point detection unit 22 may perform the processing of step S510 on the template image I_T and the actual input image I_R using any detection algorithm.

In step S520, as preprocessing for the distortion correction of the actual input image I_R by the distortion correction unit 26 (step S530), the outlier removal unit 24 removes outliers P_O from the plurality of feature points P_F detected by the feature point detection unit 22.

More specifically, the outlier removal unit 24 identifies outliers among the coordinate values of the plurality of feature points P_F based on the CONSAC (Conditional Sample Consensus) algorithm, and removes the outliers P_O, i.e., the feature points P_F corresponding to the identified outlier values, from the feature points P_F to be used in the distortion correction of step S530 by the distortion correction unit 26. For example, the outlier removal unit 24 may treat a coordinate pair (a pair of corresponding feature points P_F) as an outlier when the distance between the coordinate values of the corresponding feature points P_F exceeds a predetermined threshold. This threshold may be set based on a statistic (mean, median, variance, standard deviation, etc.) of the distances between all pairs of corresponding feature points P_F, or based on the same kind of statistic computed over a randomly selected subset of the feature points P_F.
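The distance-threshold variant described above can be sketched in Python. This is a simple statistical filter, not CONSAC itself: it assumes the threshold is the median displacement plus k times the median absolute deviation (one possible choice among the statistics listed), and the function name and k = 3 are hypothetical. A filter like this only works when I_T and I_R are already roughly aligned at a similar scale, because a global rotation or scale change spreads the displacement lengths widely even for correct matches.

```python
import math
import statistics
from typing import List, Tuple

Point = Tuple[float, float]
Match = Tuple[Point, Point]  # (point in I_T, point in I_R)


def remove_outliers_by_distance(matches: List[Match], k: float = 3.0) -> List[Match]:
    """Drop matches whose displacement length is far above the typical displacement.

    Threshold = median + k * MAD of all displacement lengths (an assumed choice).
    """
    if not matches:
        return []
    dists = [math.dist(pt, pr) for pt, pr in matches]
    med = statistics.median(dists)
    mad = statistics.median(abs(d - med) for d in dists)
    threshold = med + k * mad
    return [m for m, d in zip(matches, dists) if d <= threshold]
```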

The CONSAC algorithm is implemented by a trained model obtained through supervised learning and self-supervised learning. In the CONSAC algorithm, the samples are updated based on the information used when selecting samples from the dataset. Like the LoFTR algorithm, the CONSAC algorithm is applied to each of the images I_T and I_R as a whole.
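CONSAC is a learned extension of the classic RANSAC (random sample consensus) scheme. For orientation only, the sketch below implements plain RANSAC in Python with an affine model: it repeatedly fits a model to three randomly chosen correspondences and keeps the largest set of matches consistent with it. The learned, conditional sampling that distinguishes CONSAC is not reproduced, and the affine model, tolerance, iteration count and function names are assumptions chosen for illustration.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]
Match = Tuple[Point, Point]  # (point in I_T, point in I_R)
Affine = Tuple[float, float, float, float, float, float]


def _det3(m: Sequence[Sequence[float]]) -> float:
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_affine(sample: Sequence[Match]) -> Optional[Affine]:
    """Exact affine map I_T -> I_R through three correspondences (Cramer's rule)."""
    a = [[x, y, 1.0] for (x, y), _ in sample]
    det = _det3(a)
    if abs(det) < 1e-12:
        return None  # the three template points are collinear
    params = []
    for axis in (0, 1):  # coefficients for x' first, then for y'
        rhs = [r[axis] for _, r in sample]
        for col in range(3):
            m = [row[:] for row in a]
            for i in range(3):
                m[i][col] = rhs[i]
            params.append(_det3(m) / det)
    return tuple(params)


def apply_affine(p: Affine, pt: Point) -> Point:
    x, y = pt
    return (p[0] * x + p[1] * y + p[2], p[3] * x + p[4] * y + p[5])


def ransac_inliers(matches: List[Match], iters: int = 200, tol: float = 3.0,
                   seed: int = 0) -> List[Match]:
    """Plain RANSAC: keep the largest set of matches consistent with one affine model."""
    if len(matches) < 3:
        return list(matches)
    rng = random.Random(seed)
    best: List[Match] = []
    for _ in range(iters):
        model = fit_affine(rng.sample(matches, 3))
        if model is None:
            continue
        inliers = [(t, r) for t, r in matches
                   if math.dist(apply_affine(model, t), r) <= tol]
        if len(inliers) > len(best):
            best = inliers
    return best
```

A single global affine model cannot describe paper folds, so in practice a tolerance that is too tight would discard valid points in bent regions; this is one reason a more flexible consensus method, followed by a non-rigid warp, is attractive here.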

In step S530, the distortion correction unit 26 corrects the distortion of the actual input image I_R using the filtered feature points P_FE that remain after the outlier removal unit 24 has removed the outliers P_O.

More specifically, based on the thin plate spline algorithm, the distortion correction unit 26 corrects the actual input image I_R so that the coordinates of the filtered feature points P_FE in the actual input image I_R approach the coordinates of the corresponding feature points P_F in the template image I_T, and outputs a corrected actual input image I_RA.
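A minimal pure-Python sketch of thin plate spline correction follows. It assumes the common backward-warping approach: a spline is fitted from template coordinates (the feature points P_F in I_T) to input coordinates (the corresponding filtered points P_FE in I_R), and each pixel of the corrected image I_RA is then sampled from I_R at the mapped position, so that the content at P_FE ends up at the template positions. All function names are hypothetical, nearest-neighbour sampling is used for brevity, and the spline interpolates the control points exactly; practical implementations often add a smoothing (regularisation) term, which would only bring the points close to their targets.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]


def _u(r2: float) -> float:
    """TPS radial basis written in terms of r^2: U = r^2 * log(r^2), with U(0) = 0."""
    return r2 * math.log(r2) if r2 > 0 else 0.0


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting (adequate for small systems)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_tps(src: Sequence[Point], dst: Sequence[Point]) -> Callable[[Point], Point]:
    """Fit a 2-D thin plate spline that maps every src point exactly onto its dst point."""
    n = len(src)
    a = [[0.0] * (n + 3) for _ in range(n + 3)]
    for i, (xi, yi) in enumerate(src):
        for j, (xj, yj) in enumerate(src):
            a[i][j] = _u((xi - xj) ** 2 + (yi - yj) ** 2)
        for k, v in enumerate((1.0, xi, yi)):  # affine part and its side conditions
            a[i][n + k] = v
            a[n + k][i] = v
    coef = [_solve(a, [p[axis] for p in dst] + [0.0, 0.0, 0.0]) for axis in (0, 1)]

    def mapping(pt: Point) -> Point:
        x, y = pt
        out = []
        for c in coef:  # one spline per output coordinate
            v = c[n] + c[n + 1] * x + c[n + 2] * y
            v += sum(c[i] * _u((x - sx) ** 2 + (y - sy) ** 2)
                     for i, (sx, sy) in enumerate(src))
            out.append(v)
        return (out[0], out[1])

    return mapping


def warp_to_template(image: List[List[int]], template_to_input: Callable[[Point], Point],
                     height: int, width: int, fill: int = 255) -> List[List[int]]:
    """Backward warping with nearest-neighbour sampling into the template's pixel grid."""
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            sx, sy = template_to_input((float(x), float(y)))
            ix, iy = round(sx), round(sy)
            inside = 0 <= iy < len(image) and 0 <= ix < len(image[0])
            row.append(image[iy][ix] if inside else fill)
        out.append(row)
    return out
```

The pure-Python solver and loops are meant only to show the structure; for a real form image, a vectorised library routine would replace them.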

The thin plate spline algorithm uses a set of points in a two-dimensional plane to find a curved surface that passes through those points. In this embodiment, the thin plate spline algorithm is applied to the entire actual input image I_R. Alternatively, as described later, it may be applied to partial images obtained by dividing the actual input image I_R.
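For reference, the standard two-dimensional thin plate spline can be written as follows (textbook form, not taken from the source). For image correction, one such surface is fitted per output coordinate: (x_i, y_i) are the control points, for example feature points in one image, and v_i is the x or y coordinate of the matching point in the other image. The spline passes through every control point while minimising the bending energy E[f].

```latex
\begin{aligned}
f(x, y) &= a_0 + a_1 x + a_2 y + \sum_{i=1}^{n} w_i \, U\!\left(\lVert (x, y) - (x_i, y_i) \rVert\right),
  \qquad U(r) = r^2 \log r^2, \\
f(x_i, y_i) &= v_i \quad (i = 1, \dots, n), \qquad
  \sum_{i=1}^{n} w_i = \sum_{i=1}^{n} w_i x_i = \sum_{i=1}^{n} w_i y_i = 0, \\
E[f] &= \iint_{\mathbb{R}^2} \left( f_{xx}^2 + 2 f_{xy}^2 + f_{yy}^2 \right) dx \, dy \;\to\; \min .
\end{aligned}
```

Some references write U(r) = r^2 log r instead; since r^2 log r^2 = 2 r^2 log r, the difference is absorbed into the weights w_i.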

Fig. 9 is a diagram showing an example of the corrected actual input image I_RA according to the first embodiment. Fig. 10, by contrast, shows an example of a corrected actual input image I'_RA output by the distortion correction unit 26 when the outlier removal of step S520 in the flowchart of Fig. 5 is not performed (i.e., a comparative example for Fig. 9).

As shown in Fig. 9, when steps S510 to S530 described above are executed on the actual input image I_R, a corrected actual input image I_RA in which the distortion of the format F has been appropriately corrected is obtained.

In contrast, as shown in Fig. 10, when the outlier removal of step S520 is not performed, the resulting corrected actual input image I'_RA is only incompletely corrected, which lowers the accuracy of the subsequent character recognition.

In step S540, the character recognition unit 28 performs character recognition on the corrected characters C' contained in the areas CA, which correspond to the recognition areas RA, of the corrected actual input image I_RA produced by the distortion correction unit 26.
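Once I_R has been warped into the template's coordinate frame, the areas CA can be located by reusing the rectangles of the recognition areas RA. The sketch below assumes that I_RA has the same pixel size as I_T and that each RA is an axis-aligned rectangle; the RecognitionArea type and the crop_areas function are hypothetical, and the crops would then be passed to whatever OCR engine the character recognition unit 28 uses.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class RecognitionArea:
    """Hypothetical description of a recognition area RA in template pixel coordinates."""
    name: str
    x: int
    y: int
    width: int
    height: int


def crop_areas(corrected: List[List[int]],
               areas: List[RecognitionArea]) -> Dict[str, List[List[int]]]:
    """Cut out the areas CA of the corrected image I_RA that correspond to each RA.

    Because I_RA is assumed to be in the template's frame, each RA rectangle is
    reused directly, clipped to the image bounds.
    """
    h, w = len(corrected), len(corrected[0])
    crops = {}
    for ra in areas:
        y0, y1 = max(ra.y, 0), min(ra.y + ra.height, h)
        x0, x1 = max(ra.x, 0), min(ra.x + ra.width, w)
        crops[ra.name] = [row[x0:x1] for row in corrected[y0:y1]]
    return crops
```

This reuse of template coordinates is what makes the preceding distortion correction matter: if I_RA is still warped, the fixed rectangles cut through the handwritten characters.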

With the above configuration, character recognition is performed on an actual input image I_R whose distortion has been corrected based on feature point detection and outlier removal, so the characters written in the actual input image I_R can be recognized more appropriately than with a configuration that lacks these steps.

2. Second Embodiment
Fig. 11 is a software configuration diagram of the character recognition device 20 according to the second embodiment. As shown in Fig. 11, the character recognition device 20 has a control unit 210, a storage unit 220, and a communication unit 230, as in the first embodiment. Unlike the first embodiment, the control unit 210 of the second embodiment also includes a template selection unit 30 and a correction necessity determination unit 32 as software elements.

The template selection unit 30 uses the LoFTR algorithm and/or the CONSAC algorithm to select, from a plurality of template images I_T, the template image I_T to be used by the feature point detection unit 22.

More specifically, for example, the template selection unit 30 has the feature point detection unit 22 detect feature points between the actual input image I_R and each of the template images I_T. It then selects the template image I_T for which the largest number of feature points P_F is detected. Steps S510 to S540 of the first embodiment are then executed using the selected template image I_T.

Alternatively, the template selection unit 30 may first have the outlier removal unit 24 remove outliers from the feature points P_F detected as described above. It then selects the template image I_T with the largest number of remaining feature points P_F.
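The two selection rules above (most detected feature points, or most feature points left after outlier removal) can be sketched as follows. `match_fn` is a hypothetical callable standing in for LoFTR-based matching. The optional `outlier_filter` stands in for CONSAC-based outlier removal. Neither is a real library API.

```python
from typing import Callable, Dict, List, Optional, Tuple

Point = Tuple[float, float]
Match = Tuple[Point, Point]  # (template point, input point)


def select_template(
    input_image: object,
    templates: Dict[str, object],
    match_fn: Callable[[object, object], List[Match]],
    outlier_filter: Optional[Callable[[List[Match]], List[Match]]] = None,
) -> str:
    """Return the key of the template yielding the most (optionally filtered) matches."""
    if not templates:
        raise ValueError("no templates supplied")
    counts = {}
    for key, template in templates.items():
        matches = match_fn(template, input_image)  # hypothetical LoFTR-style matcher
        if outlier_filter is not None:
            matches = outlier_filter(matches)  # hypothetical CONSAC-style filter
        counts[key] = len(matches)
    # max() keeps the first key on ties, i.e. dict insertion order.
    return max(counts, key=counts.get)
```

With the filter enabled, a template that attracts many spurious matches can lose to one with fewer but more consistent matches. This is presumably the motivation for the second variant.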

With the above configuration, the character recognition device 20 can automatically select the template image I_T to be used, without the user of the user terminal 10 having to select it.

The correction necessity determination unit 32 uses the LoFTR algorithm and/or the CONSAC algorithm to determine whether the distortion correction unit 26 should perform distortion correction.

More specifically, for example, the correction necessity determination unit 32 has the feature point detection unit 22 detect feature points between the actual input image I_R and each of the template images I_T. The differences in coordinate values between corresponding feature points P_F may be large, for example when the sum of the coordinate differences exceeds a predetermined threshold. In that case the distortion of the actual input image I_R is considered relatively large, so the correction necessity determination unit 32 determines that the distortion correction unit 26 should perform distortion correction.

Alternatively, the variance of the coordinate differences between corresponding feature points P_F may be large, for example when it exceeds a predetermined threshold. The distortion of the actual input image I_R is then likewise considered relatively large, and the correction necessity determination unit 32 determines that the distortion correction unit 26 should perform distortion correction.
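The sketch below implements both criteria. It assumes that the "difference in coordinate values" of a matched pair is the Euclidean distance between the template point and the input point. The source does not fix this definition, and it does not give threshold values.

```python
import math
from statistics import pvariance
from typing import List, Optional, Tuple

Point = Tuple[float, float]
Match = Tuple[Point, Point]  # (template point, input point)


def displacement_stats(matches: List[Match]) -> Tuple[float, float]:
    """Return (sum, population variance) of per-pair Euclidean displacements."""
    d = [math.dist(t, r) for t, r in matches]
    if not d:
        return 0.0, 0.0
    return sum(d), pvariance(d)


def needs_correction(matches: List[Match],
                     sum_threshold: Optional[float] = None,
                     var_threshold: Optional[float] = None) -> bool:
    """True if either enabled criterion exceeds its threshold."""
    total, var = displacement_stats(matches)
    if sum_threshold is not None and total > sum_threshold:
        return True
    return var_threshold is not None and var > var_threshold
```

The two criteria behave differently in practice:

- The sum grows with the number of matches, so its threshold has to be chosen with the expected match count in mind.
- The variance is zero for a pure translation, so it reacts only to non-uniform displacement.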

The correction necessity determination unit 32 may also perform the above determination after the outlier removal unit 24 has removed outliers from the feature points P_F detected as described above.

If it is determined that the distortion correction unit 26 should not perform distortion correction, the control unit 210 may instead apply ordinary keystone correction to the actual input image I_R.
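The source does not say how the "ordinary keystone correction" is carried out. One common approach, shown here only as an assumption, is a perspective warp (homography) that maps the four detected corners of the form onto a rectangle. The sketch solves for that homography directly. How the corners are found is not covered, and the function names are hypothetical. In OpenCV the same step is usually done with `cv2.getPerspectiveTransform` followed by `cv2.warpPerspective`.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("degenerate corners (three collinear?)")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def homography_from_corners(src: Sequence[Point], dst: Sequence[Point]) -> List[List[float]]:
    """3x3 homography H (H[2][2] = 1) mapping four src corners onto four dst corners."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h0 x + h1 y + h2) / (h6 x + h7 y + 1); v likewise with h3..h5
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = _solve(a, b)
    return [h[0:3], h[3:6], h[6:8] + [1.0]]


def apply_homography(h: List[List[float]], x: float, y: float) -> Point:
    """Map (x, y) through h, dividing by the projective coordinate."""
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)
```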

With the above configuration, distortion correction is performed selectively on images with relatively large distortion, that is, on images that have a relatively high need for it. This reduces the overall processing load of the character recognition device 20.

Instead of the determination described above, the correction necessity determination unit 32 may decide whether the distortion correction unit 26 should perform distortion correction based on whether the actual input image I_R contains a straight line portion of at least a predetermined relative length (for example, 70% or 80% of the full length of a vertical or horizontal side of the format F).
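The straight-line criterion can be sketched as a predicate over line segments. Two things here are assumptions rather than details from the source:

- The segments come from a separate line detector, for example a probabilistic Hough transform such as OpenCV's `HoughLinesP`.
- `width` and `height` are the side lengths of the format F in input-image pixels.

The source also does not state which outcome triggers correction. One plausible reading, which is likewise an assumption, is that an intact long ruled line suggests the form is not bent.

```python
import math
from typing import Iterable, Tuple

Segment = Tuple[float, float, float, float]  # x1, y1, x2, y2 (hypothetical format)


def has_long_line(segments: Iterable[Segment], width: float, height: float,
                  ratio: float = 0.8, angle_tol_deg: float = 5.0) -> bool:
    """True if a near-horizontal segment spans >= ratio * width, or a
    near-vertical segment spans >= ratio * height."""
    for x1, y1, x2, y2 in segments:
        length = math.hypot(x2 - x1, y2 - y1)
        angle = math.degrees(math.atan2(abs(y2 - y1), abs(x2 - x1)))  # 0..90 degrees
        if angle <= angle_tol_deg and length >= ratio * width:
            return True
        if angle >= 90.0 - angle_tol_deg and length >= ratio * height:
            return True
    return False
```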

In the first embodiment, the distortion correction unit 26 performs distortion correction on the entire actual input image I_R in step S530. In the second embodiment, by contrast, the distortion correction unit 26 first divides the actual input image I_R into a plurality of partial images I_P. It then corrects the distortion of each partial image I_P using the outlier-removed feature points P_FE contained in that partial image. The distortion correction unit 26 may divide the actual input image I_R into six partial images I_P, eight partial images I_P, or more.
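A minimal sketch of the division step follows. It splits the image into a rows × cols grid (for example 2 × 3 for six partial images) and assigns each outlier-removed feature point, given by its input-image coordinates, to the tile that contains it. The grid layout and function names are assumptions. The sketch uses non-overlapping tiles and does not address seams at tile borders.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Tile = Tuple[int, int, int, int]  # x0, y0, x1, y1 (half-open)


def split_into_tiles(width: int, height: int, rows: int, cols: int) -> List[Tile]:
    """Grid tiles covering the image; integer-division edges differ by at most 1 px."""
    tiles = []
    for r in range(rows):
        for c in range(cols):
            x0, x1 = c * width // cols, (c + 1) * width // cols
            y0, y1 = r * height // rows, (r + 1) * height // rows
            tiles.append((x0, y0, x1, y1))
    return tiles


def points_per_tile(points: List[Point], tiles: List[Tile]) -> Dict[int, List[Point]]:
    """Assign each feature point to the index of the tile containing it."""
    out: Dict[int, List[Point]] = {i: [] for i in range(len(tiles))}
    for p in points:
        for i, (x0, y0, x1, y1) in enumerate(tiles):
            if x0 <= p[0] < x1 and y0 <= p[1] < y1:
                out[i].append(p)
                break
    return out
```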

The processing load of the thin plate spline algorithm rises sharply as the size of the image being processed grows, and so does the likelihood of problems such as memory leaks. The above configuration reduces the processing load of distortion correction by the distortion correction unit 26.
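The following is a rough, hedged illustration of why splitting helps. It assumes three things, none of which comes from the source:

- the standard interpolating TPS with n control points, whose fit requires solving a dense (n+3) × (n+3) linear system;
- dense per-pixel evaluation over an H × W image, at a cost of O(n) per pixel;
- k tiles that each receive about n/k points and HW/k pixels.

```latex
\begin{aligned}
C_{\text{whole}} &= O\big((n+3)^3\big) + O(HWn) \\
C_{\text{tiles}} &= \sum_{i=1}^{k} \Big[ O\big((n_i+3)^3\big) + O(H_i W_i n_i) \Big]
  \approx O\!\left(\frac{n^3}{k^2}\right) + O\!\left(\frac{HWn}{k}\right)
\end{aligned}
```

Under these assumptions tiling cuts the fitting cost by roughly a factor of k² and the warping cost by roughly k. The actual saving depends on how the feature points are distributed across the tiles.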

In the second embodiment as well, the feature point detection of step S510 and the outlier removal of step S520 are performed on the entire actual input image I_R.

The distortion correction unit 26 may also correct the distortion of each partial image I_P using no more than a predetermined number of the outlier-removed feature points P_FE contained in that partial image. When choosing these points, it may keep the distances between the selected feature points P_FE at or above a predetermined value (for example, by requiring the total of the distances between the selected points to exceed a predetermined threshold).
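One common way to pick a limited number of well-spread points is greedy farthest-point sampling. It is named here as a swapped-in technique; the source does not specify a selection method. The starting point (the point farthest from the centroid) is an arbitrary deterministic choice. `total_pairwise_distance` computes the example criterion from the text: the total distance between the selected points, which can then be compared with a threshold.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def farthest_point_subset(points: Sequence[Point], max_count: int) -> List[Point]:
    """Greedily pick up to max_count points, each time taking the point whose
    distance to its nearest already-chosen point is largest."""
    if max_count <= 0 or not points:
        return []
    pts = list(points)
    cx = sum(p[0] for p in pts) / len(pts)
    cy = sum(p[1] for p in pts) / len(pts)
    chosen = [max(pts, key=lambda p: math.dist(p, (cx, cy)))]  # arbitrary start
    nearest = [math.dist(p, chosen[0]) for p in pts]
    while len(chosen) < min(max_count, len(pts)):
        i = max(range(len(pts)), key=lambda k: nearest[k])
        if nearest[i] == 0.0:
            break  # only duplicates of already-chosen points remain
        chosen.append(pts[i])
        nearest = [min(nearest[k], math.dist(pts[k], pts[i])) for k in range(len(pts))]
    return chosen


def total_pairwise_distance(points: List[Point]) -> float:
    """Sum of distances over all unordered pairs of points."""
    return sum(math.dist(p, q) for i, p in enumerate(points) for q in points[i + 1:])
```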

The above configuration further reduces the processing load of distortion correction by the distortion correction unit 26.

The following three operations described in this embodiment can be performed independently of one another:

- template selection by the template selection unit 30;
- the correction necessity determination by the correction necessity determination unit 32;
- distortion correction of partial images by the distortion correction unit 26.

Those skilled in the art will readily understand that only one or two of these three operations may be performed.

3. Other Embodiments
3.1. Modifications
Embodiments for carrying out the present invention have been described above, but the present invention is not limited to them. The above embodiments are merely examples, and various modifications are possible. The words, phrases, and other expressions used in the above embodiments are likewise merely examples and may be replaced with substantially identical or similar expressions.

After performing the distortion correction described in the above embodiments, the distortion correction unit 26 may use the LoFTR algorithm and/or the CONSAC algorithm to determine whether the distortion correction of the corrected actual input image I_RA is appropriate.

The means and/or functions provided by the devices described in the above embodiments can be provided in any of the following ways:

- software recorded in a tangible memory device, together with a computer that executes it;
- software alone;
- hardware alone;
- a combination of these.

For example, if any of the above devices is provided by a hardware electronic circuit, that circuit can be a digital circuit including many logic circuits, or an analog circuit.

The devices described in the above embodiments execute a program stored in a non-transitory tangible storage medium. Executing this program carries out the method corresponding to the program.

3.2. Supplementary Notes
Part or all of the above embodiments and modifications may also be described as in the following supplementary notes, but they are not limited to the content of these notes. Below, a supplementary note that depends on multiple supplementary notes may itself be depended upon by another note that also depends on multiple notes. All of the dependency relationships among the supplementary notes expressed below are included in the above embodiments.

(Appendix 1)
A character recognition device comprising:
a feature point detection unit that detects a plurality of mutually corresponding feature points between a template image including a plurality of recognition areas and an actual input image in which characters to be recognized are written in a format corresponding to the template image;
an outlier removal unit that removes outlier points from the plurality of feature points detected by the feature point detection unit;
a distortion correction unit that corrects distortion of the actual input image using the outlier-removed feature points remaining after the outlier removal unit has removed the outlier points; and
a character recognition unit that performs character recognition on corrected characters contained in an area corresponding to the recognition area in the corrected actual input image after correction by the distortion correction unit.

(Appendix 2)
The character recognition device according to Appendix 1, wherein the outlier removal unit identifies outliers among the coordinate values corresponding to the plurality of feature points based on a CONSAC (Conditional Sample Consensus) algorithm, and removes the outlier points, i.e., the feature points corresponding to the identified outliers, from the feature points to be used for distortion correction by the distortion correction unit.

(Appendix 3)
The character recognition device according to Appendix 2, wherein the distortion correction unit corrects the actual input image based on a thin plate spline algorithm so as to bring the coordinates of the outlier-removed feature points in the actual input image closer to the coordinates of the corresponding feature points in the template image.

(Appendix 4)
The character recognition device according to Appendix 2 or 3, wherein the feature point detection unit extracts, as the plurality of feature points, points whose confidence exceeds a predetermined threshold, the confidence indicating the degree to which a point in the template image corresponds to a point in the actual input image.
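The confidence-threshold extraction of Appendix 4 amounts to a simple filter over the matcher's output. The sketch assumes, as LoFTR-style matchers typically provide, parallel lists of template points, input points and per-match confidences. The default threshold of 0.5 is an illustrative assumption, since the source only says "a predetermined threshold".

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def filter_by_confidence(
    template_pts: Sequence[Point],
    input_pts: Sequence[Point],
    confidences: Sequence[float],
    threshold: float = 0.5,  # illustrative value; the source gives none
) -> List[Tuple[Point, Point]]:
    """Keep only correspondences whose confidence strictly exceeds threshold."""
    if not (len(template_pts) == len(input_pts) == len(confidences)):
        raise ValueError("inputs must have equal length")
    return [(t, r) for t, r, c in zip(template_pts, input_pts, confidences) if c > threshold]
```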

(Appendix 5)
The character recognition device according to any one of Appendices 2 to 4, wherein the feature point detection unit detects the plurality of feature points based on a LoFTR (Local Feature Matching with Transformers) algorithm.

(Appendix 6)
The character recognition device according to any one of Appendices 2 to 5, further comprising a template selection unit that uses the LoFTR algorithm and/or the CONSAC algorithm to select, from a plurality of template images, the template image to be used by the feature point detection unit.

(Appendix 7)
The character recognition device according to any one of Appendices 2 to 5, further comprising a correction necessity determination unit that uses the LoFTR algorithm and/or the CONSAC algorithm to determine whether the distortion correction unit should perform the distortion correction.

(Appendix 8)
The character recognition device according to any one of Appendices 1 to 5, further comprising a correction necessity determination unit that determines whether the distortion correction unit should perform the distortion correction, based on whether the actual input image contains a straight line portion of at least a predetermined relative length.

(Appendix 9)
The character recognition device according to any one of Appendices 1 to 8, wherein the distortion correction unit divides the actual input image into a plurality of partial images and then corrects the distortion of each partial image using the outlier-removed feature points contained in that partial image.

(Appendix 10)
The character recognition device according to Appendix 9, wherein the distortion correction unit corrects the distortion of each partial image using no more than a predetermined number of the outlier-removed feature points contained in that partial image.

(Appendix 11)
The character recognition device according to Appendix 10, wherein, when selecting the no more than the predetermined number of outlier-removed feature points, the distortion correction unit selects them such that the distances between the selected feature points are kept at or above a predetermined value.

(Appendix 12)
An image preprocessing device comprising:
a feature point detection unit that detects a plurality of mutually corresponding feature points between a template image including a plurality of recognition areas and an actual input image in which characters to be recognized are written in a format corresponding to the template image;
an outlier removal unit that removes outlier points from the plurality of feature points detected by the feature point detection unit; and
a distortion correction unit that corrects distortion of the actual input image using the outlier-removed feature points remaining after the outlier removal unit has removed the outlier points.

(Appendix 13)
An image preprocessing method comprising, by a processor of a computer:
detecting a plurality of mutually corresponding feature points between a template image including a plurality of recognition areas and an actual input image in which characters to be recognized are written in a format corresponding to the template image;
removing outlier points from the detected plurality of feature points; and
correcting distortion of the actual input image using the outlier-removed feature points remaining after the outlier points have been removed.

(Appendix 14)
A method comprising, by a processor of a character recognition device:
detecting a plurality of mutually corresponding feature points between a template image including a plurality of recognition areas and an actual input image in which characters to be recognized are written in a format corresponding to the template image;
removing outlier points from the plurality of feature points;
correcting distortion of the actual input image using the outlier-removed feature points remaining after the outlier points have been removed; and
performing character recognition on corrected characters contained in an area corresponding to the recognition area in the corrected actual input image after the correction.

(Appendix 15)
A program causing a processor of a character recognition device to execute:
detecting a plurality of mutually corresponding feature points between a template image including a plurality of recognition areas and an actual input image in which characters to be recognized are written in a format corresponding to the template image;
removing outlier points from the plurality of feature points;
correcting distortion of the actual input image using the outlier-removed feature points remaining after the outlier points have been removed; and
performing character recognition on corrected characters contained in an area corresponding to the recognition area in the corrected actual input image after the correction.

(Appendix 16)
A non-transitory tangible storage medium storing a program that causes a processor of a character recognition device to execute:
detecting a plurality of mutually corresponding feature points between a template image including a plurality of recognition areas and an actual input image in which characters to be recognized are written in a format corresponding to the template image;
removing outlier points from the plurality of feature points;
correcting distortion of the actual input image using the outlier-removed feature points remaining after the outlier points have been removed; and
performing character recognition on corrected characters contained in an area corresponding to the recognition area in the corrected actual input image after the correction.

Reference Signs List
10 User terminal
20 Character recognition device
22 Feature point detection unit
24 Outlier removal unit
26 Distortion correction unit
28 Character recognition unit
30 Template selection unit
32 Correction necessity determination unit

Claims

1. A character recognition device comprising:
a feature point detection unit that detects a plurality of mutually corresponding feature points between a template image including a plurality of recognition areas and an actual input image in which characters to be recognized are written in a format corresponding to the template image;
an outlier removal unit that identifies outliers among the coordinate values corresponding to the plurality of feature points detected by the feature point detection unit, and removes the outlier points, i.e., the feature points corresponding to the identified outliers, from the feature points to be used for distortion correction;
a distortion correction unit that corrects distortion of the actual input image using removed feature points obtained after the outlier removal unit has removed the outlier points; and
a character recognition unit that performs character recognition on corrected characters contained in an area corresponding to the recognition area in the corrected actual input image after the distortion correction unit has corrected the image,
wherein the feature point detection unit detects the plurality of feature points without first performing any processing related to distortion correction of the actual input image, and
the outlier removal unit removes the outlier points based on a CONSAC (Conditional Sample Consensus) algorithm.

2. The character recognition device according to claim 1, wherein the outlier removal unit identifies an outlier from a coordinate value of each feature point of the template image detected by the feature point detection unit and a coordinate value of each feature point of the actual input image corresponding to each feature point of the template image, and removes an outlier that is a feature point corresponding to the identified outlier from feature points to be used for distortion correction by the distortion correction unit.

3. The character recognition device according to claim 2, wherein the outlier removal unit recognizes a coordinate value as the outlier when a distance between coordinate values of a pair of corresponding feature points exceeds a threshold value that is set based on a statistical value of distances between all or a part of the corresponding multiple feature points.
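Claim 3's threshold rule can be sketched with mean + k × standard deviation as the "statistical value", applied to all pair distances. Both the statistic and k = 2 are assumptions, since the claim leaves them open. This is only the simple threshold test of claim 3. CONSAC itself is a neural-network-guided sample-consensus method and is not reproduced here.

```python
import math
from statistics import mean, pstdev
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Match = Tuple[Point, Point]  # (template point, input point)


def remove_outliers_by_distance(matches: Sequence[Match], k: float = 2.0) -> List[Match]:
    """Drop pairs whose displacement exceeds mean + k * population std of all displacements."""
    if len(matches) < 2:
        return list(matches)
    d = [math.dist(t, r) for t, r in matches]
    threshold = mean(d) + k * pstdev(d)
    return [m for m, di in zip(matches, d) if di <= threshold]
```

The mean and standard deviation are themselves inflated by the outliers they are meant to catch. With a single outlier among n pairs, its z-score can never exceed √(n − 1). A median and median absolute deviation is a more robust alternative for the same rule.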

4. The character recognition device according to claim 3, wherein the distortion correction unit corrects the actual input image based on a Thin Plate Spline algorithm so as to bring coordinates of removed feature points in the actual input image closer to coordinates of corresponding feature points in the template image.

5. The character recognition device according to claim 4, wherein the feature point detection unit extracts, as the plurality of feature points, points whose degree of certainty, which indicates the degree to which points included in the template image correspond to points included in the actual input image, exceeds a predetermined threshold value.

6. The character recognition device according to claim 5, wherein the feature point detection unit detects the plurality of feature points based on a LoFTR (Local Feature Matching with Transformers) algorithm.

7. The character recognition device according to claim 6, further comprising a template selection unit that selects the template image to be used by the feature point detection unit from a plurality of template images by using the LoFTR algorithm and/or the CONSAC algorithm.

8. The character recognition device according to claim 6, further comprising a correction necessity determination unit that uses the LoFTR algorithm and/or the CONSAC algorithm to determine whether the distortion correction unit should perform the distortion correction.

9. The character recognition device according to claim 1, further comprising a correction necessity determination unit that determines whether the distortion correction unit should perform the distortion correction, based on whether the actual input image contains a straight line portion of at least a predetermined relative length.

10. The character recognition device according to claim 1, wherein the distortion correction unit divides the actual input image into a plurality of partial images and then corrects the distortion of each partial image using the removed feature points contained in that partial image.

11. The character recognition device according to claim 10, wherein the distortion correction unit corrects the distortion of each partial image using no more than a predetermined number of the removed feature points contained in that partial image, and, when selecting those removed feature points, performs the selection such that the distances between the removed feature points are kept at or above a predetermined value.

12. An image preprocessing method comprising, by a processor of a computer:
detecting a plurality of mutually corresponding feature points between a template image including a plurality of recognition areas and an actual input image in which characters to be recognized are written in a format corresponding to the template image;
identifying outliers among the coordinate values corresponding to the detected plurality of feature points, and removing the outlier points, i.e., the feature points corresponding to the identified outliers, from the feature points to be used for distortion correction; and
correcting distortion of the actual input image using the removed feature points obtained after the outlier points have been removed,
wherein detecting the plurality of feature points includes detecting them without first performing any processing related to distortion correction of the actual input image, and
the outlier points are removed based on a CONSAC (Conditional Sample Consensus) algorithm.