TW200416583A - Definition data generation method of account book voucher and processing device of account book voucher - Google Patents

Definition data generation method of account book voucher and processing device of account book voucher

Info

Publication number
TW200416583A
Authority
TW
Taiwan
Prior art keywords
definition data
aforementioned
book voucher
information
defined area
Prior art date
Application number
TW092132932A
Other languages
Chinese (zh)
Inventor
Eisuke Asano
Hiroshi Shinjo
Original Assignee
Hitachi Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hitachi Ltd
Publication of TW200416583A

Classifications

    • F: MECHANICAL ENGINEERING; LIGHTING; HEATING; WEAPONS; BLASTING
    • F41: WEAPONS
    • F41B: WEAPONS FOR PROJECTING MISSILES WITHOUT USE OF EXPLOSIVE OR COMBUSTIBLE PROPELLANT CHARGE; WEAPONS NOT OTHERWISE PROVIDED FOR
    • F41B11/00: Compressed-gas guns, e.g. air guns; Steam guns
    • F41B11/80: Compressed-gas guns, e.g. air guns; Steam guns specially adapted for particular purposes
    • F41B11/89: Compressed-gas guns, e.g. air guns; Steam guns specially adapted for particular purposes for toys
    • F: MECHANICAL ENGINEERING; LIGHTING; HEATING; WEAPONS; BLASTING
    • F41: WEAPONS
    • F41B: WEAPONS FOR PROJECTING MISSILES WITHOUT USE OF EXPLOSIVE OR COMBUSTIBLE PROPELLANT CHARGE; WEAPONS NOT OTHERWISE PROVIDED FOR
    • F41B11/00: Compressed-gas guns, e.g. air guns; Steam guns
    • F41B11/50: Magazines for compressed-gas guns; Arrangements for feeding or loading projectiles from magazines
    • F: MECHANICAL ENGINEERING; LIGHTING; HEATING; WEAPONS; BLASTING
    • F41: WEAPONS
    • F41B: WEAPONS FOR PROJECTING MISSILES WITHOUT USE OF EXPLOSIVE OR COMBUSTIBLE PROPELLANT CHARGE; WEAPONS NOT OTHERWISE PROVIDED FOR
    • F41B11/00: Compressed-gas guns, e.g. air guns; Steam guns
    • F41B11/70: Details not provided for in F41B11/50 or F41B11/60

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Character Input (AREA)

Abstract

To reduce the user's workload when generating definition data for account book vouchers. For an area designated by the user, the account book voucher processing device extracts layout data (procedure 200) and retrieves the item name that pertains to the designated area. It then converts these data into definition data (procedure 500, procedure 600), so that the definition data for the voucher is generated automatically. In this way, definition data can be generated simply by designating the area to be defined. As a result, the user's burden of setting up definition data for account book vouchers is reduced.

Description

[Technical field to which the invention belongs]

The present invention relates to techniques for processing book vouchers, and in particular to a technique for creating the book voucher definition data used when character recognition is performed.

[Prior art]

When character strings such as the amount field of a book voucher are recognized, character recognition relies on book voucher definition data in which information such as the position of each area to be recognized and its number of characters has been registered in advance. Various methods of creating this definition data have been proposed so far, and they mainly aim to make the burdensome work of creating definition data easier.

For example, Japanese Patent Laid-Open No. 2001-126010 (pages 8-9, FIG. 7) describes a definition creation method with the following features. For a blank book voucher, definition data is extracted automatically by ruled-line extraction and frame extraction. Preprinted text is recognized and matched against a keyword dictionary registered in advance, which makes it possible to set character entry fields that are not enclosed by ruled lines and to set the character type of the frame located at the position corresponding to a keyword. By reading all of the preprinted text, the entire definition creation work is carried out automatically without anyone designating frames by hand.

[Summary of the invention]

However, in the definition creation method described above, when a frame is designated and nothing is preprinted inside it, definition data such as the character type cannot be created.
In addition, recognizing all of the preprinted text over the whole voucher currently takes a great deal of time and is not practical. Furthermore, because the position of a keyword differs from voucher to voucher, a keyword dictionary cannot be applied broadly to a wide variety of book vouchers.

The main purpose of this disclosure is therefore to provide a technique that solves these problems and can automatically create book voucher definition data for a wide variety of vouchers, regardless of the specific voucher and of whether it is filled in or blank.

Specifically, definition data is created automatically from, for example, the preprinted or handwritten text around or inside the designated reading area, together with layout information such as frames and ruled lines that is extracted automatically when the image to be defined is input. With this method, even when nothing is preprinted inside the reading area, a character string near the reading area can be recognized and the recognition result converted into definition data. Even when many character strings exist around the reading area, the plausibility of each string as the keyword for the reading area (hereinafter called the item name) is quantified from the position and size of the string, the presence or absence of a frame, the ratio of the string size to the frame size, and so on. The recognition result of the most plausible string is then converted into definition data.

With this processing, book voucher definition data can be created automatically regardless of whether the voucher is filled in or blank, where the item name lies relative to the reading area, and whether anything is preprinted inside the designated area.

Various other forms are possible. For example, the technique may be configured as the automatic definition data creation method described above, or as a computer program that causes a computer to implement these functions. As a storage medium, any computer-readable optical, magnetic, or electrical medium may be used, such as a floppy disk, CD-ROM, DVD, magneto-optical disk, IC card, IC chip, ROM cartridge, punched card, printed matter bearing symbols such as bar codes, a computer's internal memory (RAM, ROM, or the like), or an external storage device. The features described above may be combined as appropriate.

[Embodiment]

A suitable embodiment is described below with reference to the drawings, divided into the following items.

A. System configuration
B. Structure of the book voucher definition data
C. Creation of the book voucher definition data
C1. Item name-to-definition-data conversion processing

A. System configuration:

FIG. 1 is a block diagram showing the structure of a book voucher processing device that supports the creation of book voucher definition data. The following description takes as its example the case in which definition data is newly created from the image data of a book voucher 106, but the device can also add the definition data of another reading area to definition data that has already been created.

As shown in the figure, the hardware of the book voucher processing device consists of a general-purpose personal computer 101 connected to a display 102, a keyboard 103, a mouse 104, and a scanner 105.
Application software that implements the functions of the book voucher processing device is installed on the personal computer 101. The figure shows functional blocks 107 to 113 of the device. These functional blocks are implemented by the application software, although they may of course be implemented in hardware.

The image input unit 107 controls the scanner 105 to input the image data of a book voucher 106 that serves as the sample for creating definition data. The book voucher definition data creation unit 108 lets the user designate the definition area with an input device such as the keyboard 103 or the mouse 104, and automatically extracts definition data from the image data. In doing so it refers to several databases: the character recognition dictionary 110, the item name matching knowledge dictionary 111, and the item name-to-definition-information conversion dictionary 112. The character recognition dictionary 110 is used to match shapes in the image data against characters, one character at a time. The item name matching knowledge dictionary 111 improves the recognition rate by matching character strings against words that serve as item names. The item name-to-definition-information conversion dictionary 112 converts an item name obtained by this matching into definition data such as the attribute and number of characters of the object to be read.

The book voucher definition data output unit 109 outputs the definition data extracted by the creation unit 108. The automatically created definition data is registered in the book voucher definition data database 113.

B. Structure of the book voucher definition data:

FIG. 2 shows the structure of a book voucher image and of the book voucher definition data. The upper part of the figure shows an example of a book voucher image 201 to be defined, and the lower part shows an example of the structure of the definition data 202. In the book voucher image 201, the origin is the upper left corner and the x and y axes run in the directions shown.

One example of the definition data 202 consists of the coordinates of the area to be recognized, the frame shape, the knowledge dictionary type, the number of characters, and whether the text is handwritten or printed. For example, the definition for the request date at the upper right of the book voucher image 201 corresponds to the entry at the upper left of the definition data 202. In the definition data, the rectangle to be recognized is defined by the (x, y) coordinates of its upper-left vertex (start position) and lower-right vertex (end position). In the example in the figure, the upper-left vertex is (1200, 100) and the lower-right vertex is (1400, 150). Because a frame is present, the frame shape is set to "framed". Because the attribute of the object to be read is a date, the knowledge dictionary type is set to "date"; the number of characters is "12 characters" and the character type is "printed".

The definition data shown here is only one example, and various other information can also be set as definition data. For example, when it is known in advance that the area to be recognized is printed with a constant character pitch, the pitch can be stored in the definition data and used during recognition to improve the recognition rate.

C. Creation of the book voucher definition data:

FIG. 3 is a flowchart of the automatic definition data creation process. The CPU of the personal computer 101 executes it in response to the user's instructions. When the process starts, the CPU first inputs the image data of the book voucher through the image input unit 107 (step S100) and performs layout analysis on the entire voucher (step S200). That is, information such as tables, frames, and ruled lines is extracted from the input image data, and at the same time the portions recognized as text are extracted as text information.

The layout information obtained in this way is presented to the user on a display device such as the display 102 of the personal computer 101. For example, in FIG. 4(a), the frame extraction result obtained by layout analysis is displayed in the window 405. For simplicity only the frame extraction result is shown here, but in practice the display can also be switched to ruled-line or text-line information by means of buttons or commands.

If a ruled line or frame in the area to be defined has been extracted incorrectly, the user corrects the layout information (step S300). The correction is made with a pointing device such as the mouse 104 on the frames or ruled lines shown on the display 102. For example, in FIG. 4(b), the frame 406 obtained by layout analysis was extracted incorrectly, so the user activates the correction button 401, selects the frame with the mouse 104, and drags it to correct it (407). When the CPU detects that the layout information has been corrected, it runs layout analysis again on the basis of the corrected information and sets the correct layout information for the area to be defined.

This correction is performed only when layout information such as a frame or ruled line in the area to be defined has been extracted incorrectly. When no extraction error is found, or when the error lies outside the area to be defined, the step can be skipped, which shortens the time needed to create the definition data.

Layout information is corrected by adding, deleting, modifying, merging, or splitting frames and ruled lines. The layout information can also be corrected all at once by changing the internally held thresholds used for layout extraction. For example, after the thresholds for the minimum and maximum size of an extractable frame are changed and layout analysis is run again, frames that could not be extracted before the change can be extracted in a single pass.

Once the necessary layout information has been obtained, the CPU performs definition area setting (step S400). The layout information is presented to the user on the display 102, and the user designates which area to define with a pointing device such as the mouse 104. The user can select one of the extracted frames or, to define an area without a frame, enclose the part containing the text to be read by dragging with the mouse. For example, in FIG. 4(d), to make the frame 408 a definition area, the user activates the selection button 402 and selects the frame 408 with the mouse 104.

When the user has designated the definition area, the CPU performs layout information-to-definition-data conversion (step S500).
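
The definition data entry described in section B, which the conversion of step S500 fills in, might be modeled as follows. This is a minimal sketch only: the class name `DefinitionEntry`, its field names, and the helper methods are assumptions made for illustration and do not appear in the patent. The values are those of the request date example of FIG. 2.

```python
from dataclasses import dataclass, fields
from typing import List, Optional, Tuple


@dataclass
class DefinitionEntry:
    """One definition data entry (hypothetical model of FIG. 2)."""
    start: Tuple[int, int]                  # upper-left vertex (x, y); origin is the image's upper-left corner
    end: Tuple[int, int]                    # lower-right vertex (x, y)
    framed: bool                            # True when the area is enclosed by a frame
    dictionary_type: Optional[str] = None   # knowledge dictionary type, e.g. "date"; None means not set
    char_count: Optional[int] = None        # number of characters; None means not set
    char_style: Optional[str] = None        # "printed" or "handwritten"; None means not set

    def size(self) -> Tuple[int, int]:
        """Return (width, height) of the recognition rectangle."""
        return (self.end[0] - self.start[0], self.end[1] - self.start[1])

    def unset_items(self) -> List[str]:
        """Names of items still unset, e.g. to highlight them for the user."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]


# The request date entry from the example in FIG. 2.
request_date = DefinitionEntry(
    start=(1200, 100),
    end=(1400, 150),
    framed=True,
    dictionary_type="date",
    char_count=12,
    char_style="printed",
)
```

Keeping unset items as `None` rather than as default values makes it straightforward to tell automatically set items from unset ones when the data is later shown to the user.
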
If a frame extracted as layout information was selected during definition area setting, the information for the selected frame is taken from the layout information table and converted into definition data. If an area without a frame was defined, the enclosing rectangle is treated as a virtual frame and definition data is created from it. The definition data meant here are the definition items that can be derived from layout information, such as the rectangle coordinates of the area to be recognized and the presence or absence of a frame.

In addition, when the definition area contains several frames and, judging from the height and width of each frame, all of them are single-character frames, the number of characters can be set from the number of frames. For example, an amount field often contains many single-character boxes separated by digit lines. When such an area is defined, definition data such as the rectangle coordinates, the presence or absence of a frame, and the number of characters can be extracted by this method.

Before or after this processing, the CPU also performs item name-to-definition-data conversion (step S600). The details are described later; in this step, definition data such as the reading attribute and the number of characters are extracted by recognizing the text around the designated definition area.
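
The rule for single-character frames described above could be sketched as follows. The patent only says that the judgment is made from the height and width of each frame, so the aspect-ratio test and its bounds (`min_ratio`, `max_ratio`) are assumptions chosen for illustration, as are the function names.

```python
from typing import List, Optional, Tuple

# A frame as (left, top, right, bottom) in image coordinates.
Frame = Tuple[int, int, int, int]


def is_single_char_frame(frame: Frame,
                         min_ratio: float = 0.5,
                         max_ratio: float = 1.5) -> bool:
    """Assumed test: a frame holds one character if it is roughly square."""
    left, top, right, bottom = frame
    width, height = right - left, bottom - top
    if width <= 0 or height <= 0:
        return False
    return min_ratio <= width / height <= max_ratio


def char_count_from_frames(frames: List[Frame]) -> Optional[int]:
    """Return the number of frames if every frame is a single-character
    frame; otherwise return None so the item stays unset."""
    if frames and all(is_single_char_frame(f) for f in frames):
        return len(frames)
    return None


# Eight 40x50 digit boxes side by side, as in an amount field.
amount_boxes = [(100 + 40 * i, 300, 140 + 40 * i, 350) for i in range(8)]
digits = char_count_from_frames(amount_boxes)
```

Returning `None` when any frame fails the test leaves the character count unset, so that it can be filled in from the item name or by the user instead.
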

When the definition data has been obtained through these processes, the CPU organizes it and presents it to the user on a display device such as the display 102. If the presented definition data contains errors or unset items, the user corrects or adds to it (step S700). When the definition data is presented, automatically set items and unset items are distinguished by color so that the user can tell them apart easily. Automatically set items with high ambiguity are colored in the same way to draw the user's attention. This is only one way of presenting the definition data to the user; various other methods are conceivable.

For example, in FIG. 4(e), the definition data extracted from layout analysis and from the item name is collected and displayed in the window 409. The user checks the displayed definition data. If all of it is correct, the user leaves it unchanged; if any of it is wrong, the user corrects the relevant items. Pressing the OK button 410 completes the definition area setting for the frame 408. If no definition area is to be set, pressing the clear button 411 cancels the definition area setting of the selected frame.

The example in FIG. 4(d) is a table, so the attributes of the definition data of the frames have the same values within each column. For example, every frame under "Bank name" has the attribute "bank name", and the same holds for "Branch name". When areas whose definition data attributes are equal within a column are to be set as definition areas, the copy function for definition areas lets the user carry out the definition work efficiently (step S800).

For example, in FIG. 5(f), to define all of the areas for "Bank name", "Branch name", and "Account number", the user first sets the areas 412 directly below each item by the steps described above. The user then presses the copy button 403 and, as shown in FIG. 5(g), drags with the mouse 104 to enclose the area 413 to be copied. Within the area 413, the CPU detects the definition areas that have already been set, and detects the frames whose height and width equal those of the already set definition areas 412. To do so, it searches vertically from each set definition area 412 within the area 413 for frames of equal height and width. Then, as shown in FIG. 5(h), the CPU copies the set definition attribute values to the detected frames (414). The definition attribute values meant here are the definition data other than coordinate information, such as the number of characters and the knowledge dictionary type. Coordinate information such as the start and end positions differs from frame to frame, so it is taken from the frame information obtained by layout analysis.

This example describes copying along a column, but copying along a row can be implemented in the same way. Alternatively, when frames of equal height and width are detected, they can be presented to the user on the display 102 so that the user selects with the mouse 104 only the frames to which the definition attributes should be copied.

After the above processing, the definition data that has been set is output (step S900), and the automatic creation of the book voucher definition data ends. As described earlier, the created definition data is stored in the book voucher processing device and can be used for character recognition of book vouchers. For example, in FIG. 4, after confirming that all of the definition data has been set correctly, the user can save it by pressing the save button 404.

C1. Item name-to-definition-data conversion processing:

FIG. 6 is a flowchart of the item name-to-definition-data conversion process 600.
In this process, the frames adjacent to the user-designated definition area on its upper side and left side are detected (step S601). The CPU refers to the frame information table of the layout information extracted in advance over the entire voucher and detects the matching frame information. For example, in FIG. 7, when the area 705 containing "December 1, Heisei 14" (2002) is designated as the definition area, the frame adjacent to the area 705 is the area 706.

Next, the CPU performs character recognition on the text line inside the matching adjacent frame (step S602) and checks whether a recognition result was obtained (step S603). The CPU matches the raster image against characters by referring to the character recognition dictionary 110 described earlier. It also matches the resulting character string against the item name matching knowledge dictionary 111, performing knowledge matching to determine the word.

For example, in FIG. 7, the text line 707 inside the frame 706 adjacent to the designated definition area 705 is processed with the character recognition dictionary 110 and the item name matching knowledge dictionary 111, giving the item name recognition result "Designated deposit date". No recognition result is obtained when there is no matching adjacent frame, when the adjacent frame contains no text line, or when a text line exists but knowledge matching fails. For example, in FIG. 7, the area 701 has no adjacent frame, only the adjacent text line 702. The area 703 likewise has no adjacent frame; the text line 704 lies inside the area 703.
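
The adjacent-frame search of step S601 could be sketched as follows. The patent does not state the geometric test used, so the criterion here is an assumption: a frame counts as adjacent when it touches the upper or left edge of the definition area (within a tolerance `max_gap`) and overlaps it along that edge. The names `Rect` and `find_adjacent_frames` are likewise hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Rect:
    left: int
    top: int
    right: int
    bottom: int


def _ranges_overlap(a_lo: int, a_hi: int, b_lo: int, b_hi: int) -> bool:
    """True when two 1-D ranges share more than a single point."""
    return min(a_hi, b_hi) > max(a_lo, b_lo)


def find_adjacent_frames(area: Rect, frames: List[Rect], max_gap: int = 10) -> List[Rect]:
    """Return frames directly above or directly to the left of `area`."""
    adjacent = []
    for frame in frames:
        if frame == area:
            continue
        # Left neighbour: its right edge meets the area's left edge.
        is_left = (abs(area.left - frame.right) <= max_gap
                   and _ranges_overlap(frame.top, frame.bottom, area.top, area.bottom))
        # Upper neighbour: its bottom edge meets the area's top edge.
        is_above = (abs(area.top - frame.bottom) <= max_gap
                    and _ranges_overlap(frame.left, frame.right, area.left, area.right))
        if is_left or is_above:
            adjacent.append(frame)
    return adjacent


area = Rect(1200, 100, 1400, 150)
label_frame = Rect(1000, 100, 1200, 150)   # to the left of the area
upper_frame = Rect(1200, 50, 1400, 100)    # above the area
right_frame = Rect(1400, 100, 1600, 150)   # to the right: not searched
far_frame = Rect(0, 500, 100, 550)
neighbours = find_adjacent_frames(area, [label_frame, upper_frame, right_frame, far_frame])
```

Only the upper and left sides are searched, matching the description of step S601; frames to the right of or below the area are ignored.
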
If there are two or more adjacent frames and two or more item name recognition results, the result with the higher reliability obtained in the character recognition process is preferred. In this case, the candidates may also be presented to the user so that the user selects the correct item name. When an item name recognition result is obtained from the adjacent frame, the CPU converts the recognized item name into definition data (step S609). In this process, the item name is converted into definition data by referring to the item name to definition information conversion dictionary 112 described earlier. FIG. 8 shows an example of the item name to definition information conversion dictionary 112. For example, in the case of the "designated deposit date" in the area 706 in FIG. 7, the item name exists in the conversion dictionary 112, the type of knowledge dictionary matched with this item name is "date", and the number of characters is "12 characters". In this way, definition data is extracted from the item name. The definition data associated with an item name is not limited to the type of knowledge dictionary or the number of characters, and various information can be set. For example, the character type may also be included. When no item name recognition result is obtained from the adjacent frame in step S603, the character lines inside the designated defined area are extracted. Here, the CPU refers to the text line information table of the layout information extracted in advance from the book voucher and detects the text line information present in the designated area.
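The dictionary lookup of step S609 described above amounts to a keyed table. The sketch below is a hypothetical illustration: the `Definition` class, `CONVERSION_DICTIONARY`, and `to_definition` are invented names, the English item names are illustrative renderings, and only the "date" and 12-character values for the deposit date entry come from the text. The second entry is made up to show that the table holds several rows.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Definition:
    """Attribute part of the definition data for one defined area."""
    dictionary_type: str  # type of knowledge dictionary used for recognition
    max_chars: int        # number of characters expected in the area


# Stand-in for the conversion dictionary of FIG. 8 (contents are assumptions,
# except the "date" / 12 characters pair quoted in the description).
CONVERSION_DICTIONARY: Dict[str, Definition] = {
    "designated deposit date": Definition("date", 12),
    "client": Definition("name", 20),  # hypothetical entry
}


def to_definition(item_name: Optional[str]) -> Optional[Definition]:
    """Convert a recognized item name into definition data, if it is known."""
    if item_name is None:
        return None
    return CONVERSION_DICTIONARY.get(item_name.strip().lower())
```

Further attributes, such as the character type mentioned in the text, would be added as extra fields of `Definition`.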
Character recognition processing is performed on the extracted character lines (step S604), and it is checked whether a recognition result was obtained (step S605). Here, as described above, the CPU performs character recognition using the character recognition dictionary 110 and the item name matching knowledge dictionary 111. For example, in FIG. 7, when the area 703 containing "trust date year and month" is designated as the defined area, the item name recognition result "trust date" is obtained for the text line 704 inside the designated defined area 703 by referring to the character recognition dictionary 110 and the item name matching knowledge dictionary 111. When an item name recognition result is obtained from the internal text line, the CPU converts the recognized item name into definition data (step S609). When no item name recognition result is obtained from the internal text line, text lines adjacent to the designated defined area in the upward or leftward direction are detected (step S606). Here, the CPU refers to the frame information table of the layout information extracted in advance from the book voucher and detects the matching text line information. For example, in FIG. 7, when the area 701 containing "one-temple" is designated as the defined area, the adjacent text line for the area 701 is the text line 702. Next, the CPU performs character recognition processing on the matching adjacent text lines (step S607) and checks whether a recognition result was obtained (step S608). Here, as above, the CPU performs character recognition using the character recognition dictionary 110 and the item name matching knowledge dictionary 111. For example, in FIG.
7, for the adjacent text line 702 of the designated defined area 701, the item name recognition result "client" is obtained by referring to the character recognition dictionary 110 and the item name matching knowledge dictionary 111. When an item name recognition result is obtained from the adjacent text line, the CPU converts the recognized item name into definition data (step S609). When no item name recognition result is obtained from the adjacent text line, the designated defined area is treated as an area without an item name, and processing ends with definition data such as the type of knowledge dictionary or the number of characters left unset. The CPU performs the above processing for all designated defined areas. In this item name extraction process, priority is given in the order of text lines in the adjacent frame, text lines inside the designated defined area, and adjacent text lines, but the priority may be changed according to the type of book voucher. In addition, instead of using all three kinds of text lines, only some of them may be used, for example only the text lines in adjacent frames. In this way, more accurate item name extraction and definition data creation are possible for book vouchers in which the positions where item names appear are restricted. As explained above, according to the disclosed technology, the creation of book voucher definition data can be automated as far as possible, and for processing that cannot be automated, partial manual intervention supports smoother creation of the book voucher definition data. When the user corrects or adds definition data, automatically set items and unset items may be distinguished by color on the display device so that the screen is easy for the user to understand.
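The three-stage priority of the item name extraction (steps S601 to S608) can be summarized in a short sketch. It is a simplification under stated assumptions: character recognition is replaced by ready-made `(text, reliability)` pairs, `KNOWN_ITEM_NAMES` stands in for the item name matching knowledge dictionary, and the function names and source labels are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Candidate = Tuple[str, float]  # (recognized text, reliability score)

# Stand-in for the item name matching knowledge dictionary (assumed contents).
KNOWN_ITEM_NAMES = {"designated deposit date", "trust date", "client"}


def match_item_name(candidates: Sequence[Candidate]) -> Optional[str]:
    """Knowledge matching: keep known names, prefer the most reliable one."""
    matched = [(text.lower(), score) for text, score in candidates
               if text.lower() in KNOWN_ITEM_NAMES]
    if not matched:
        return None
    return max(matched, key=lambda c: c[1])[0]


def extract_item_name(
        sources: Sequence[Tuple[str, List[Candidate]]]
) -> Optional[Tuple[str, str]]:
    """Try each source in priority order; return (item name, source label).

    Returns None when no source yields a known item name, in which case
    the attribute definition data is left unset.
    """
    for label, candidates in sources:
        name = match_item_name(candidates)
        if name is not None:
            return name, label
    return None
```

Because the priority is just the order of the `sources` sequence, changing it for a particular type of book voucher, or using only one kind of source, needs no change to the function itself.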
In addition, among the automatically set items, items with high ambiguity may also be color-coded to draw the user's attention. Furthermore, book voucher definition data can be created automatically for any type of book voucher, whether or not it is a particular kind of book voucher and whether it is filled in or blank. The disclosed technology is not limited to the embodiment described above and may take various configurations. For example, the control processing described above may be implemented in hardware as well as in software. Alternatively, the character recognition means of the book voucher processing device may be configured to create the book voucher definition data. According to the disclosed technology, book voucher definition data can be created automatically for any kind of book voucher, regardless of the particular book voucher and of whether it has been filled in.
[Brief description of the drawings]
FIG. 1 is a schematic structural diagram of a book voucher processing device.
FIG. 2 shows the structure of a book voucher image and book voucher definition data.
FIG. 3 is a flowchart of the automatic creation process for book voucher definition data.
FIG. 4 shows a display example for creating book voucher definition data.
FIG. 5 shows a display example for creating book voucher definition data.
FIG. 6 is a flowchart of the item name to definition data conversion process in creating book voucher definition data.
FIG. 7 shows the positions of item names relative to designated defined areas in creating book voucher definition data.
FIG. 8 shows an example of the item name to definition data conversion dictionary used in creating book voucher definition data.
Description of main reference numerals
101 Personal computer
102 Display
103 Keyboard
104 Mouse
105 Scanner
106 Book voucher
107 Image input unit
108 Book voucher definition data creation unit
109 Book voucher definition data output unit
110 Dictionary for character recognition
111 Knowledge dictionary for item name matching
112 Item name to definition information conversion dictionary
113 Book voucher definition data
201 Book voucher image
202 Book voucher definition data


Claims (1)

1. A method for creating book voucher definition data, characterized by:
acquiring image data of a book voucher;
extracting layout information of text information from the image data;
extracting, from the layout information corresponding to a designated defined area, first definition data concerning the position of the defined area;
recognizing text information present around or inside the defined area; and
converting the recognition result into second definition data concerning the attributes of the defined area.

2. The method for creating book voucher definition data according to claim 1, wherein: the presence of the text information is checked in the vicinity of the defined area; when the check finds no text information in the vicinity of the defined area, the presence of text information is checked inside the defined area; and when that check also finds no text information inside the defined area, the presence of text information located in the upward and leftward directions of the defined area is checked.

3. The method for creating book voucher definition data according to claim 1, wherein, when the defined areas are continuous in the column direction, the first definition data concerning the position of each defined area is extracted from the layout information corresponding to each defined area, and the second definition data is obtained by copying the attributes concerning the defined areas as second definition data.

4. The method for creating book voucher definition data according to claim 1, wherein, when the layout information is erroneous, the layout information is extracted again on the basis of corrected information.

5. The method for creating book voucher definition data according to claim 1, wherein the presence or absence of character boxes is determined by obtaining, from the layout information corresponding to the defined area, the aspect ratio of each frame within the defined area, and when character boxes are determined to be present, the number of character boxes is counted to extract definition data for the number of characters.

6. A book voucher processing device that creates definition data used when character recognition processing of recorded content is performed on the basis of image data of a book voucher, characterized by comprising:
means for acquiring image data of a book voucher;
means for extracting layout analysis information such as frames, ruled lines, and text lines from the image data;
means for extracting definition data concerning the position of a designated defined area from the layout analysis information corresponding to that defined area;
means for extracting the item name of the defined area from frames and text lines present around or inside the defined area;
means for performing character recognition of the item name;
means for comparing the recognition result obtained by the character recognition processing with an item name dictionary;
means for converting the item name obtained from the comparison result into definition data indicating the attributes of the defined area; and
means for compiling the definition data and outputting it to a book voucher definition data file.

7. The book voucher processing device according to claim 6, further comprising means for correcting layout analysis information such as ruled lines or frames, when the layout analysis information is erroneous, by performing the layout analysis processing again on the basis of corrected layout analysis correction information.

8. A book voucher processing device having image input means that reads a book voucher to acquire image data, and character recognition means that performs character recognition on the image data from the image input means, characterized in that:
the character recognition means extracts layout information of text information from the image data from the image input means, extracts first definition data concerning the position of a designated defined area from the layout information corresponding to that defined area, recognizes text information present around or inside the defined area, converts the recognition result into second definition data concerning the attributes of the defined area, and stores the second definition data together with the first definition data.

9. The book voucher processing device according to claim 8, wherein, when the defined areas are continuous in the column direction, the character recognition means extracts the first definition data concerning the position of each defined area from the layout information corresponding to each defined area, and the second definition data is obtained by copying the attributes concerning the defined areas as second definition data.

10. The book voucher processing device according to claim 8, wherein the character recognition means determines the presence or absence of character boxes by obtaining, from the layout information corresponding to the defined area, the aspect ratio of each frame within the defined area, and when character boxes are determined to be present, counts the number of character boxes to extract definition data for the number of characters.
TW092132932A 2003-02-24 2003-11-24 Definition data generation method of account book voucher and processing device of account book voucher TW200416583A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP2003045406A JP4183527B2 (en) 2003-02-24 2003-02-24 Form definition data creation method and form processing apparatus

Publications (1)

Publication Number Publication Date
TW200416583A true TW200416583A (en) 2004-09-01

Family

ID=33112215

Family Applications (1)

Application Number Title Priority Date Filing Date
TW092132932A TW200416583A (en) 2003-02-24 2003-11-24 Definition data generation method of account book voucher and processing device of account book voucher

Country Status (4)

Country Link
JP (1) JP4183527B2 (en)
KR (1) KR100570224B1 (en)
CN (1) CN1525378A (en)
TW (1) TW200416583A (en)

Families Citing this family (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4973063B2 (en) * 2006-08-14 2012-07-11 富士通株式会社 Table data processing method and apparatus
CN101464951B (en) * 2007-12-21 2012-05-30 北大方正集团有限公司 Image recognition method and system
JP5561856B2 (en) * 2010-05-24 2014-07-30 株式会社Pfu Form creation device, form creation program, and form creation method
JP2012009005A (en) * 2010-05-24 2012-01-12 Pfu Ltd Business form processing system, ocr device, ocr processing program, business form creation device, business form creation program, and business form processing method
JP2012009000A (en) * 2010-05-24 2012-01-12 Pfu Ltd Business form processing system, ocr device, ocr processing program, business form creation device, business form creation program, and business form processing method
JP5583542B2 (en) * 2010-05-24 2014-09-03 株式会社Pfu Form processing system, OCR device, OCR processing program, form creation device, form creation program, and form processing method
JP5556524B2 (en) 2010-09-13 2014-07-23 株式会社リコー Form processing apparatus, form processing method, form processing program, and recording medium recording the program
JP2012083951A (en) * 2010-10-12 2012-04-26 Pfu Ltd Information processing equipment, information processing method and program
JP2013109690A (en) * 2011-11-24 2013-06-06 Oki Electric Ind Co Ltd Business form data input device, and business form data input method
WO2014061081A1 (en) * 2012-10-15 2014-04-24 富士通株式会社 Form creation assistance device, form creation assistance method, and form creation assistance program
CN102930174B (en) * 2012-11-20 2015-07-01 江苏省疾病预防控制中心 System and method for acquiring residential health information
CN103092625B (en) * 2013-01-28 2016-01-20 中国航空结算有限责任公司 A kind of method and apparatus of the process civil aviation passenger transport passenger ticket ticket data based on .NET Framework platform
JP6109688B2 (en) * 2013-09-06 2017-04-05 株式会社東芝 Form reader and program
CN104391830A (en) * 2014-10-24 2015-03-04 华迪计算机集团有限公司 Method and device for dynamic layout of bill page
WO2016181458A1 (en) * 2015-05-11 2016-11-17 株式会社東芝 Recognition device, recognition method and program
JP7235269B2 (en) * 2017-03-13 2023-03-08 日本電気株式会社 Data item name estimation device, data item name estimation program, and data item name estimation method
JP6445645B1 (en) * 2017-09-21 2018-12-26 株式会社東芝 Form information recognition apparatus and form information recognition method
CN109634606A (en) * 2018-12-10 2019-04-16 山东浪潮通软信息科技有限公司 A kind of method and device of defined function menu
JP7259468B2 (en) 2019-03-25 2023-04-18 富士フイルムビジネスイノベーション株式会社 Information processing device and program
JP7558644B2 (en) * 2019-03-29 2024-10-01 キヤノン株式会社 Image processing device, control method thereof, and program
CN111931473A (en) * 2019-05-13 2020-11-13 阿里巴巴集团控股有限公司 Bill processing method and device
JP2021039429A (en) * 2019-08-30 2021-03-11 富士ゼロックス株式会社 Information processing device and information processing program
JP7468004B2 (en) 2020-03-11 2024-04-16 富士フイルムビジネスイノベーション株式会社 Document processing device and program
US20230144394A1 (en) * 2020-05-01 2023-05-11 3M Innovative Properties Company Systems and methods for managing digital notes

Also Published As

Publication number Publication date
KR100570224B1 (en) 2006-04-11
JP4183527B2 (en) 2008-11-19
JP2004258706A (en) 2004-09-16
CN1525378A (en) 2004-09-01
KR20040078046A (en) 2004-09-08

Similar Documents

Publication Publication Date Title
TW200416583A (en) Definition data generation method of account book voucher and processing device of account book voucher
US8107727B2 (en) Document processing apparatus, document processing method, and computer program product
JP4998219B2 (en) Form recognition program, form recognition apparatus, and form recognition method
US7853869B2 (en) Creation of semantic objects for providing logical structure to markup language representations of documents
JP2004139484A (en) Form processing apparatus, program for executing the apparatus, and form format creation program
JP2010510563A (en) Automatic generation of form definitions from hardcopy forms
JP2013089197A (en) Electronic comic editing device, method and program
JP4854491B2 (en) Image processing apparatus and control method thereof
US10803233B2 (en) Method and system of extracting structured data from a document
JP2005216203A (en) Table format data processing method and table format data processing apparatus
JP4785655B2 (en) Document processing apparatus and document processing method
JP2010267083A (en) Form search device, form search program, and form search method
JP3898645B2 (en) Form format editing device and form format editing program
CN116682118B (en) A method, system, terminal and medium for ancient character recognition
JP2009151676A (en) Data processing apparatus, data processing method, and program
JP2024003769A (en) Character recognition systems, computer recognition methods, and character search systems
JP2006221569A (en) Document processing system, document processing method, program, and storage medium
JP6994727B1 (en) Reading system, reading program and reading method
JP2007241355A (en) Image processor and image processing program
JP5724286B2 (en) Form creation device, form creation method, program
JP3484446B2 (en) Optical character recognition device
JP6190549B1 (en) Document processing system
JP6960646B1 (en) Reading system, reading program and reading method
JP4521377B2 (en) Form processing apparatus, program for executing the apparatus, and form format creation program
JP2682873B2 (en) Recognition device for tabular documents