WO2020175662A1 - Dictionary creation device, dictionary creation method, and dictionary creation program - Google Patents
Dictionary creation device, dictionary creation method, and dictionary creation program
- Publication number
- WO2020175662A1 (PCT/JP2020/008190)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- dictionary
- words
- common word
- item
- synonym
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/237—Lexical tools
- G06F40/242—Dictionaries
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/237—Lexical tools
- G06F40/247—Thesauruses; Synonyms
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
- G06F40/35—Discourse or dialogue representation
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/10—Office automation; Time management
Definitions
- The present invention relates to a dictionary creation device, a dictionary creation method, and a dictionary creation program, and in particular to a device, method, and program that create a synonym dictionary and/or a different-meaning-word dictionary for the words in item names used in forms.
- Forms have conventionally been paper media, but it is desirable to reduce the cost of managing forms by using input forms, that is, electronic versions of the paper forms.
- Patent Document 1 discloses a system that determines the type of a form and uses the input form according to the type of the form to perform the acceptance processing of the form.
- Patent Document 1: Japanese Patent Laid-Open No. 2004-126910
- However, the item names for corresponding items may differ depending on the local government or company. When trying to standardize item names across many types of forms, the list of item names therefore becomes huge, and organizing it manually is extremely labor-intensive. It is thus desirable to set a standard item name for item names that are used with the same meaning in multiple forms. To further improve the accuracy of this standardization, it is desirable to determine whether the words included in the item names are synonymous with each other.
- The present invention has been made in view of the above problem. Its object is to provide a dictionary creation device, a dictionary creation method, and a dictionary creation program that create a synonym dictionary and a different-meaning-word dictionary for determining whether the words in a plurality of item names used in a plurality of forms are synonyms or different-meaning words.
Means for solving the problem
- The above problem is solved by a dictionary creation device that creates at least one of a synonym dictionary and a different-meaning-word dictionary for item names of forms, the device comprising: an item name acquisition unit that acquires a plurality of item names described in a plurality of forms; a first processing unit that classifies the one or more words contained in each of the acquired item names based on a predetermined condition to create one or more common word groups; and a second processing unit that determines, for each common word group and based on information identifying the forms, whether the words in the group are synonyms or different-meaning words.
- With this configuration, a synonym dictionary and/or a different-meaning-word dictionary can be created.
- The first processing unit may classify, into the same common word group, the words other than the common word in item names that contain a word common to a plurality of the item names.
- The second processing unit may determine that the words in one common word group are synonyms when they are not used in the same form.
- The item name acquisition unit may acquire, for each item name, form identification information identifying the form in which the item name is described; a common word group storage unit may hold, for each word belonging to a common word group, the form identification information of the forms in which the word is described; and the second processing unit may determine that the words to be processed are synonyms when they have no form identification information in common.
- The second processing unit may determine that the words to be processed are different-meaning words when they have form identification information in common.
- The above problem is also solved by a dictionary creation method performed by a dictionary creation device that creates at least one of a synonym dictionary and a different-meaning-word dictionary, the method comprising: an item name acquisition step of acquiring a plurality of item names described in a plurality of forms; a first processing step of classifying the one or more words contained in each of the acquired item names based on a predetermined condition to create one or more common word groups; and a second processing step of determining, for each common word group and based on information identifying the forms, whether the words in the group are synonyms or different-meaning words.
- The above problem is also solved by a dictionary creation program for creating at least one of a synonym dictionary and a different-meaning-word dictionary for item names of forms, the program causing a computer to function as: an item name acquisition unit that acquires a plurality of item names described in a plurality of forms; a first processing unit that classifies the one or more words contained in each of the acquired item names based on a predetermined condition to create one or more common word groups; and a second processing unit that determines, for each common word group and based on information identifying the forms, whether the words in the group are synonyms or different-meaning words.
- According to the present invention, a synonym dictionary and a different-meaning-word dictionary can be created for determining whether the words in a plurality of item names used in a plurality of forms are synonyms or different-meaning words.
- FIG. 1 is a diagram showing the overall configuration of an information processing system.
- FIG. 2 is a diagram explaining the outline of the synonym/different-meaning-word dictionary creation process.
- FIG. 3 is a functional block diagram of the dictionary creation device.
- FIG. 4 is a flowchart of the dictionary creation processing.
- FIG. 5 is a flowchart of the dictionary creation processing.
- Hereinafter, a dictionary creation device 10 according to an embodiment of the present invention (hereinafter, the present embodiment) will be described with reference to FIGS. 1 to 5.
- A “form” is a paper or electronic medium in which information can be entered and which is subjected to a prescribed process (procedure).
- A form is used, for example, for an application to a local government such as a municipality, to the national government, or to a private company. Specifically, a birth notification, a pregnancy notification, and the like are examples of a form.
- An “item name” is a component of a form and is information that defines the content and format of the information to be entered in the form. For example, “child's name” and “child's date of birth” are examples of an item name.
- “Synonyms” are two or more different words that have the same meaning as each other, in particular words that are used to indicate the same attribute in form items.
- “Different-meaning words” are two or more different words that have different meanings from each other, in particular words that are used to indicate different attributes in form items.
- A “synonym dictionary” is a collection of data holding information from which two or more words can be determined to be synonyms of each other. For example, if two different words for “child” are synonyms and two different words for “name” are synonyms, referring to the synonym dictionary makes it possible to determine that the words in each pair are synonymous.
- A “different-meaning-word dictionary” is a collection of data holding information from which two or more words can be determined to be different-meaning words. For example, if “child” and “mother” are different-meaning words, and “name” and “date of birth” are different-meaning words, referring to the different-meaning-word dictionary makes it possible to determine that the words in each pair differ in meaning.
- Hereinafter, “synonyms” and “different-meaning words” are collectively referred to as “synonyms/different-meaning words”, and the “synonym dictionary” and the “different-meaning-word dictionary” are collectively referred to as the “synonym/different-meaning-word dictionary”, which is the collection of the data of both dictionaries.
- The information processing system 1 includes a synonym/different-meaning-word dictionary creation device 10 (hereinafter, “dictionary creation device 10”) and a form processing device 30.
- The dictionary creation device 10 and the form processing device 30 are communicably connected via a network such as the Internet or an intranet (not shown).
- the form processing device 30 is connected to the scanner 40.
- the scanner 40 is a device that captures image information by optically scanning a paper medium.
- the scanner 40 outputs a scan image (image information) obtained by scanning the form to the form processing device 30.
- The form processing device 30 is a computer that processes the form captured by the scanner 40. Specifically, the form processing device 30 executes OCR (optical character recognition) on the form to obtain the character strings described in it. In addition, the form processing device 30 analyzes the table structure of the form P. More specifically, the form processing device 30 divides the form into item columns, input columns, and fill-in input columns, and analyzes the information on the item names described in the item columns (and fill-in input columns).
- An item column is an area in which a character string serving as an item name is written.
- An input column is an area in which no character string is written and in which the information corresponding to the item column is entered.
- A fill-in input column is an area in which character strings are written and in which information is entered between those character strings.
- An input device 31 is connected to the form processing device 30, and information can be input via the input device 31.
- A display device 32 is connected to the form processing device 30, and a UI screen or the like can be displayed on the display device 32.
- Information on the plurality of types of forms P analyzed by the form processing device 30 is output to the dictionary creation device 10. The dictionary creation device 10 then creates a synonym dictionary and a different-meaning-word dictionary for determining whether the words in the item names used in the plurality of types of forms P are synonyms or different-meaning words.
- the dictionary creating device 10 is a computer that includes a processor 11, a storage device 12 and a communication interface 13 as hardware.
- The processor 11 includes, for example, a central processing unit (CPU), executes various arithmetic processes based on the programs and data stored in the storage device 12, and controls each part of the dictionary creation device 10.
- the storage device 12 is configured to include, for example, a memory and a magnetic disk device, stores various programs and data, and also functions as a work memory for the processor 11.
- The communication interface 13 includes, for example, a network interface card (NIC) and is connected to the network through it. The communication interface 13 communicates with devices such as the form processing device 30 via the network.
- The dictionary creation device 10 acquires a form group consisting of a plurality of forms related to various procedures. The plurality of forms includes forms for the same procedure that are used in more than one municipality. Even for the same procedure, the form layout and the item names used differ from one municipality to another, so each of these forms is included in the form group.
- Each form includes one or more item names.
- An item name is a phrase that contains one or more words.
- Each item name is associated with a form ID that identifies the form.
- The dictionary creation device 10 extracts the item names from each form, together with identification information such as the procedure ID (procedure identification information) and the form ID (form identification information).
- The set of all item names extracted from the forms in the form group is referred to as an item name group.
- The dictionary creation device 10 classifies the words of the item names in the item name group into common word groups (first process: common word group creation process).
- The dictionary creation device 10 acquires one procedure to be processed and, from the item names in the item name group, selects the item names belonging to that procedure.
- The dictionary creation device 10 decomposes two of these item names into words (morphemes) by morphological analysis and extracts the nouns.
- Hereinafter, the nouns extracted by morphological analysis are referred to as “words”.
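- The noun-extraction step can be sketched as follows. This is a minimal illustration that assumes the morphological analyzer has already produced (surface, part-of-speech) pairs; the `Morpheme` type, the `"noun"` tag, and the sample tokens are hypothetical, since the source names neither an analyzer nor a tag set.

```python
from typing import NamedTuple


class Morpheme(NamedTuple):
    surface: str  # token text
    pos: str      # part-of-speech tag (hypothetical tag set)


def extract_words(morphemes: list[Morpheme]) -> list[str]:
    """Keep only the nouns; these are the "words" used in later processing."""
    return [m.surface for m in morphemes if m.pos == "noun"]


# Hypothetical analysis result for the item name "child's name".
analysed = [
    Morpheme("child", "noun"),
    Morpheme("'s", "particle"),
    Morpheme("name", "noun"),
]
words = extract_words(analysed)
```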
- The common word group holds, for each word, the form ID of the form in which the word appears.
- The dictionary creation device 10 carries out the first process for all item names belonging to the procedure to be processed and creates the common word groups for the item names of that procedure. This process is then repeated for each procedure to create common word groups for all procedures.
- The procedure to be processed can be specified by user input.
- Alternatively, the dictionary creation device 10 may use the procedure IDs or the like in the item name group to extract and process only the item names of the procedure to be processed.
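- A minimal sketch of the first process for one procedure is shown below. It assumes each item name has already been reduced to its set of nouns and is paired with the form identification information of its form; the function name, the data layout, and the sample values (including "infant") are hypothetical and not taken from the source.

```python
from collections import defaultdict
from itertools import combinations

# (form ID, nouns extracted from one item name); all values are hypothetical.
ItemName = tuple[str, frozenset[str]]


def build_common_word_groups(items: list[ItemName]) -> dict[str, dict[str, set[str]]]:
    """Return {common word: {member word: {form IDs in which the word appears}}}."""
    groups: dict[str, dict[str, set[str]]] = defaultdict(lambda: defaultdict(set))
    for (form_a, words_a), (form_b, words_b) in combinations(items, 2):
        for common in words_a & words_b:
            # The words paired with the common word in either item name join its group.
            for form_id, words in ((form_a, words_a), (form_b, words_b)):
                for word in words - {common}:
                    groups[common][word].add(form_id)
    return groups


items = [
    ("form-A", frozenset({"child", "name"})),
    ("form-A", frozenset({"mother", "name"})),
    ("form-B", frozenset({"infant", "name"})),
]
groups = build_common_word_groups(items)
```

- Here the three item names share only the word "name", so a single group keyed by "name" is created, and each member word keeps the set of forms it came from for use in the later determination.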
- For each of the common word groups created in the first process, the dictionary creation device 10 determines whether the words in the group are likely to be synonyms or likely to be different-meaning words, and creates synonym candidates and different-meaning-word candidates (second process: candidate creation process).
- The dictionary creation device 10 uses the form IDs to determine whether the words to be processed are used in the same form. When the words are used in the same form, the dictionary creation device 10 determines that they are likely to be different-meaning words and updates the candidate storage unit with them as different-meaning-word candidates. When the words are not used in the same form, it determines that they are likely to be synonyms and updates the candidate storage unit with them as synonym candidates.
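- The determination rule of the second process can be sketched as below, under the reading that words sharing a form are likely to have different meanings and words that never share a form are likely to be synonyms. The function name, the label strings, and the sample data are hypothetical.

```python
from itertools import combinations


def classify_group(members: dict[str, set[str]]) -> dict[tuple[str, str], str]:
    """Label each word pair of one common word group from the words' form IDs."""
    candidates: dict[tuple[str, str], str] = {}
    ordered = sorted(members.items(), key=lambda kv: kv[0])
    for (w1, forms1), (w2, forms2) in combinations(ordered, 2):
        # Appearing in the same form suggests the words name different attributes.
        shares_form = bool(forms1 & forms2)
        candidates[(w1, w2)] = "different-meaning" if shares_form else "synonym"
    return candidates


# Hypothetical members of the "name" group with the forms each word appears in.
members = {
    "child": {"form-A"},
    "infant": {"form-B"},
    "mother": {"form-A", "form-B"},
}
result = classify_group(members)
```

- The output is only a set of candidates: two unrelated words that happen never to share a form would also be labelled as synonym candidates, which is one reason a later approval step is useful.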
- The dictionary creation device 10 presents the dictionary candidates created in the second process to the user and accepts approval input. Specifically, the dictionary creation device 10 displays the candidate information on a display unit provided in the dictionary creation device 10 or on a display device connected via a communication line, and receives input from an input device connected directly or via a communication line.
- The dictionary creation device 10 accepts the approval input from the user, reflects the approval or rejection of each candidate, and creates or updates the final synonym/different-meaning-word dictionary (dictionary update process).
- In the present embodiment, candidates are created first, and the final dictionary is determined after the approval or rejection of each candidate has been accepted.
- Alternatively, the created candidates may be adopted as the dictionary as they are.
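- The update step can be sketched as a filter over the candidates. This is an illustration only: the `require_approval` flag, the dictionary layout, and the choice to drop unreviewed pairs are assumptions, not details given in the source.

```python
Pair = tuple[str, str]


def apply_approvals(
    candidates: dict[Pair, str],
    approvals: dict[Pair, bool],
    require_approval: bool = True,
) -> dict[Pair, str]:
    """Build the final dictionary from the candidates and the user's decisions."""
    if not require_approval:
        # Variant in which the candidates are adopted as they are.
        return dict(candidates)
    # Keep approved pairs only; rejected or unreviewed pairs are left out (assumption).
    return {pair: label for pair, label in candidates.items() if approvals.get(pair, False)}


candidates = {
    ("child", "infant"): "synonym",
    ("child", "mother"): "different-meaning",
}
approvals = {("child", "infant"): True, ("child", "mother"): False}
final = apply_approvals(candidates, approvals)
```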
- As described above, the dictionary creation device 10 determines whether the words in item names acquired from a plurality of forms belonging to a procedure are synonyms or different-meaning words, and creates a synonym/different-meaning-word dictionary. The created dictionary can be used when standardizing the differing item names of the different forms that a plurality of local governments use for the same procedure.
- The series of processes can also be learned by a machine learning model. Doing so makes it possible to build a more automated and efficient dictionary generation function.
- FIG. 3 shows a functional block diagram of the dictionary creating device 10.
- The dictionary creation device 10 has, as functional units, an item name storage unit, a common word group storage unit, a candidate storage unit, a dictionary storage unit, an item name acquisition unit, a first processing unit, a second processing unit, a presentation unit, a reception unit, and an update unit.
- The functions of the above units are realized by the processor 11 controlling each part of the dictionary creation device 10 according to a program (the dictionary creation program) stored in the storage device 12.
- The program may be acquired by the dictionary creation device 10 over a communication network through the communication interface, or may be read by the dictionary creation device 10 from a storage medium storing the program.
- The processor 11 of the dictionary creation device 10 operates according to the dictionary creation program to implement the dictionary creation method according to the present invention. The functions of the above units are described in detail below.
- The item name storage unit 20 stores information on the item names extracted from the forms included in the form group acquired by the dictionary creation device 10. The item name storage unit 20 is mainly realized by the storage device 12 of the dictionary creation device 10.
- the item name storage unit 20 is realized by an item name table (not shown) stored in the storage device 12.
- the item name table stores, for each item name, the item name, the form identification information of the form in which the item name is extracted, and the procedure identification information to which the form belongs.
- The form identification information and the procedure identification information are, for example, a form ID and a procedure ID. Even for forms used in the same procedure, different form identification information is assigned when the users of the form, such as local governments, the national government, or companies, are different.
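- One row of the item name table can be pictured as the record below. The class name, the field names, and the identifier values are hypothetical; the source states only that the item name, the form identification information, and the procedure identification information are stored for each item name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ItemNameRecord:
    item_name: str
    form_id: str       # differs per user (e.g. municipality) even within one procedure
    procedure_id: str


def item_names_for_procedure(table: list[ItemNameRecord], procedure_id: str) -> list[ItemNameRecord]:
    """Select the rows that belong to the procedure to be processed."""
    return [row for row in table if row.procedure_id == procedure_id]


# Two municipalities use different forms for the same hypothetical procedure.
table = [
    ItemNameRecord("child's name", "form-A", "birth-notification"),
    ItemNameRecord("infant's name", "form-B", "birth-notification"),
    ItemNameRecord("mother's name", "form-C", "pregnancy-notification"),
]
selected = item_names_for_procedure(table, "birth-notification")
```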
- the common word group storage unit 20 stores information of one or more common word groups created by the dictionary creating device 10.
- the common word group storage unit 20 is mainly realized by the storage device 12 of the dictionary creating device 10.
- the common word group storage unit 20 is realized by a common word group table (not shown) stored in the storage device 12.
- The common word group table stores, for example, common word names, words, and the form identification information of forms.
- The common word name is the word shared by the item names in a common word group. For example, in the “name” group, the common word is “name”.
- The word is a member of the common word group. For example, when the item name “child's name” is classified into the “name” group in the first process, the word is the one paired with the common word, that is, “child”, which formed the item name together with the common word.
- The form identification information is stored for each word and is the same as the form identification information in the item name storage unit 20. If one word is used in multiple forms, multiple pieces of form identification information are stored for that word.
- The candidate storage unit stores data (not shown), created by the dictionary creation device 10, that contains information identifying the words that are synonym candidates and information identifying the words that are different-meaning-word candidates.
- The candidate storage unit is mainly realized by the storage device 12 of the dictionary creation device 10.
- As an example, the candidate storage unit stores the same contents as the dictionary storage unit 200 described below.
- The dictionary storage unit 200 is realized by a dictionary table (not shown) stored in the storage device 12.
- The dictionary storage unit 200 stores dictionary data (not shown) containing information identifying the words that the dictionary creation device 10 has determined to be synonyms and information identifying the words it has determined to be different-meaning words.
- The dictionary storage unit 200 is mainly realized by the storage device 12 of the dictionary creation device 10.
- The dictionary storage unit 200 stores, for example, word 1, word 2, the procedure, and the relation between word 1 and word 2.
- As the relation between word 1 and word 2, a value such as “synonym”, “different-meaning word”, “intra-procedure synonym”, or “intra-procedure different-meaning word” is stored according to the determination or approval result.
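- A lookup over such records might look like the sketch below. The `DictionaryEntry` type, the `lookup` helper, and the sample rows are hypothetical, and treating the word pair as unordered is an assumption the source does not state.

```python
from typing import NamedTuple, Optional


class DictionaryEntry(NamedTuple):
    word1: str
    word2: str
    procedure_id: str
    relation: str  # e.g. "synonym" or "different-meaning word"


def lookup(entries: list[DictionaryEntry], a: str, b: str, procedure_id: str) -> Optional[str]:
    """Return the stored relation of two words within a procedure, if any."""
    for entry in entries:
        # The pair is compared as a set, so the order of the two words does not matter.
        if entry.procedure_id == procedure_id and {entry.word1, entry.word2} == {a, b}:
            return entry.relation
    return None


entries = [
    DictionaryEntry("child", "infant", "birth-notification", "synonym"),
    DictionaryEntry("child", "mother", "birth-notification", "different-meaning word"),
]
relation = lookup(entries, "infant", "child", "birth-notification")
```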
- the item name acquisition unit 21 executes the above-mentioned item name acquisition process to acquire a plurality of item names described in a plurality of forms.
- the item name acquisition unit 21 is mainly realized by the processor 11 of the dictionary creation device 10, the storage device 12 and the communication interface 13.
- the process executed by the item name acquisition unit 21 corresponds to the item name acquisition process.
- the processor 11 acquires the analysis results of a plurality of forms to be processed from the form processing device 30 via the communication interface 13.
- the analysis results of a plurality of forms include character string data of one or more item names obtained by optical character recognition from the forms, procedure identification information, and form identification information.
- The item name acquisition unit acquires a plurality of item names described in a plurality of forms that different local governments use for the same procedure.
- Identification information such as the procedure ID and the form ID, which identifies the procedure and the form from which each item name was extracted, is acquired together with the item name.
- The procedure ID and the form ID can be obtained from information input by the user when the form is imported.
- the item name acquisition unit 21 may acquire image data of a plurality of forms from the form processing device 30 and may obtain character string data of item names from the acquired images based on predetermined image processing.
- The first processing unit executes the above-described first process: it classifies the one or more words contained in each of the plurality of item names acquired by the item name acquisition unit into one or more common word groups, thereby creating the common word groups.
- The first processing unit is mainly realized by the processor 11 and the storage device 12 of the dictionary creation device 10.
- The processing executed by the first processing unit corresponds to the first processing step.
- For item names that contain a word common to a plurality of item names (the common word), the first processing unit groups, for each common word, the words other than the common word, that is, the words that are used together with the common word to make up one item name.
- The second processing unit executes the above-described second process: for each common word group created in the first process, it determines whether the words in the group are likely to be synonyms or likely to be different-meaning words, and creates synonym candidates and different-meaning-word candidates.
- The second processing unit is mainly realized by the processor 11 and the storage device 12 of the dictionary creation device 10.
- The processing executed by the second processing unit corresponds to the second processing step.
- The second processing unit determines whether words are synonyms or different-meaning words based on the form identification information, which is the information that identifies a form. When the words to be processed have no form identification information in common, they are determined to be synonyms; when they have form identification information in common, they are determined to be different-meaning words.
- The presentation unit displays the candidates created in the second process on the display device 32 to present them to the user.
- The presentation unit is mainly realized by the processor 11, the storage device 12, and the communication interface 13 of the dictionary creation device 10.
- Specifically, the processor 11 transmits the synonym and/or different-meaning-word candidates stored in the candidate storage unit to the form processing device 30 via the communication interface 13 and causes them to be displayed on the display device 32 of the form processing device 30.
- Alternatively, instead of transmitting the candidates to the form processing device 30, the processor 11 may display them on a display device attached to the dictionary creation device 10.
- the accepting unit 21 accepts information such as approval or rejection of the synonym candidates input by the user from the form processing apparatus 30.
- the processor 11 receives input of information from the form processing device 30 via the communication interface 13.
- the reception unit 21 is mainly realized by the processor 11 of the dictionary creation device 10, the storage device 12 and the communication interface 13.
- The updating unit reflects the approval/rejection information received by the accepting unit in the candidate data created by the second processing unit, and creates or updates the final synonym/different-meaning-word dictionary.
- the updating unit 21 is realized mainly by the processor 11 and the storage device 12 of the dictionary creating device 10.
- the process executed by the updating unit 21 corresponds to the dictionary creating/updating process.
- The dictionary creation device 10 initializes a counter that indexes the procedures to 1 (S1) and selects one procedure (S2). The procedure may be selected based on input from the user.
- The dictionary creation device 10 initializes an item-name counter to 1 (S3), acquires an item name belonging to the selected procedure (S4), and extracts the nouns (words) contained in that item name by morphological analysis (S5). Next, the dictionary creation device 10 selects another item name belonging to the same procedure (S6) and similarly extracts its words.
- The dictionary creation device 10 compares the words extracted from the two item names to determine whether they have a common word (S8). When there is no common word (S8: No), the process ends. When there is a common word (S8: Yes), it searches the common word group storage unit 20 to check whether a common word group for that common word has already been created (S9).
- If the common word group already exists, the words of the two item names and the form ID of each word are stored in that common word group (S10).
- Otherwise, the dictionary creation device 10 creates a new common word group and stores the words of the two item names and the form ID of each word in it (S11).
- The dictionary creation device 10 then determines whether the item names belonging to the procedure have all been processed.
- At S16, the dictionary creation device 10 determines whether all of the plurality of procedures have been processed. If not, it proceeds to S17 and adds 1 to the procedure counter. If all procedures have been processed, the process ends.
- the dictionary creating device 10 executes the process shown in FIG. 5 for each of the common word groups created as described above.
- the dictionary creating device 10 initializes the variable 3 and the variable ! ⁇ (3 2 1) and acquires the procedure 3 (3 2 2).
- the dictionary creation device 10 selects the common word group ⁇ ! ⁇ (3 2 3). Then I ⁇ 02020/175662 17 ⁇ (: 171?2020/008190
- dictionary creating apparatus 1 the calculated number of counts is determined whether 0 (zero) or greater than (3 2 7), when greater than 0 ⁇ 2 1; ⁇ 6 3), their single It is determined that the word is a synonym, and it is written as a synonym in the same-synonym candidate storage unit (3 2 8), and the process proceeds to 3 30. On the other hand, when the count number is 0 (3 2 7 ;1 ⁇ 100)
- The dictionary creation device 10 determines whether the current word is the last word (S30). When processing has not been completed for all the words (S30; NO), it adds 1 to the word index (S31) and proceeds to S25. Otherwise, the process proceeds to S32.
- At S34, it is determined whether the process has been executed for all procedures s among the plurality of procedures. If processing for all procedures is not completed (S34; NO), the process proceeds to S35 and adds 1 to s. When processing for all procedures is completed, the process ends.
- In this way, whether the words in a common word group are synonyms of each other or have different meanings is determined based on whether the words being processed are used in the same form.
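The "used in the same form" test at the heart of the FIG. 5 process could be sketched as below. This is an assumption-laden sketch rather than the patent's procedure: the function name `split_by_cooccurrence` and the `(word, form_id)` data layout are hypothetical, and the count is assumed to be the number of forms on which both words of a pair appear. The sketch deliberately does not label either bucket as "synonym" or "different meaning"; that mapping follows the device's rule at S27 and S28.

```python
from itertools import combinations


def split_by_cooccurrence(group):
    """Split word pairs by whether both words are used on the same form.

    group: iterable of (word, form_id) pairs from one common word group.
    Returns (together, never_together); each entry is (word_a, word_b, count),
    where count is the number of forms on which both words appear.
    """
    forms_by_word = {}
    for word, form_id in group:
        forms_by_word.setdefault(word, set()).add(form_id)

    together, never_together = [], []
    # Sort the words so that the output order is deterministic.
    for a, b in combinations(sorted(forms_by_word), 2):
        count = len(forms_by_word[a] & forms_by_word[b])
        if count > 0:
            together.append((a, b, count))
        else:
            never_together.append((a, b, count))
    return together, never_together


# Illustrative data only; words and form IDs are made up.
group = {
    ("applicant", "form_A"),
    ("guardian", "form_A"),
    ("notifier", "form_B"),
}
together, never_together = split_by_cooccurrence(group)
```

Keeping the count in each result tuple leaves room for a threshold other than zero, should a different criterion be preferred.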
- The process shown in FIG. 5 is an example of a process for determining whether or not words are used in the same form; the process is not limited to this and may be any process that can make that determination.
- The series of processes can also be learned as a machine learning model. Learning in this way makes it possible to build a more automated and efficient dictionary generation function.
- The present invention is not limited to the above embodiment.
- The dictionary creation device 10 and the form processing device 30 may be configured as one device.
- The dictionary creation device 10 is not limited to one computer, and may be composed of multiple computers.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Business, Economics & Management (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Artificial Intelligence (AREA)
- Data Mining & Analysis (AREA)
- General Health & Medical Sciences (AREA)
- Strategic Management (AREA)
- Entrepreneurship & Innovation (AREA)
- Human Resources & Organizations (AREA)
- Databases & Information Systems (AREA)
- Economics (AREA)
- Marketing (AREA)
- Operations Research (AREA)
- Quality & Reliability (AREA)
- Tourism & Hospitality (AREA)
- General Business, Economics & Management (AREA)
- Machine Translation (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The objective of the present invention is to create a dictionary for determining whether words in a plurality of item names used in a plurality of forms are synonyms or words with different meanings. The invention relates to a dictionary creation device (10) for creating a synonym dictionary and/or a dictionary of words with mutually different meanings for item names in forms, comprising: an item name acquisition unit (21A) for acquiring a plurality of item names appearing in a plurality of forms; a first processing unit (21B) for classifying, according to prescribed criteria, one or more words included in each of the plurality of item names acquired by the item name acquisition unit (21A), and creating one or more common word groups; and a second processing unit (21C) for determining, for each common word group, based on information identifying a form, whether the words in that common word group have the same meaning or different meanings.
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP2019-037050 | 2019-02-28 | | |
| JP2019037050A JP7029813B2 (ja) | 2019-02-28 | 2019-02-28 | Dictionary creation device, dictionary creation method, and dictionary creation program |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2020175662A1 (fr) | 2020-09-03 |
Family
ID=72240013
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/JP2020/008190 Ceased WO2020175662A1 (fr) | 2019-02-28 | 2020-02-27 | Dictionary creation device, dictionary creation method, and dictionary creation program |
Country Status (2)
| Country | Link |
|---|---|
| JP (1) | JP7029813B2 (fr) |
| WO (1) | WO2020175662A1 (fr) |
Families Citing this family (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
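| CN112269858B (zh) * | 2020-10-22 | 2024-04-19 | 中国平安人寿保险股份有限公司 | Synonym dictionary optimization method, apparatus, device, and storage medium |
| JP2023129001A (ja) * | 2022-03-04 | 2023-09-14 | 富士フイルムビジネスイノベーション株式会社 | Information processing device and information processing program |
| JP7410501B1 (ja) | 2023-08-07 | 2024-01-10 | 株式会社ミラボ | Program, electronic application form creation method, and electronic application form creation system |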
Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JPS6338758B2 (fr) * | 1980-03-05 | 1988-08-02 | Tokyo Shibaura Electric Co | |
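| JP2012048291A (ja) * | 2010-08-24 | 2012-03-08 | Dainippon Printing Co Ltd | Synonym dictionary generation device, data analysis device, data detection device, synonym dictionary generation method, and synonym dictionary generation program |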
Family Cites Families (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
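| JP5671676B2 (ja) | 2010-08-31 | 2015-02-18 | パナソニックヘルスケアホールディングス株式会社 | Document data conversion device and document conversion program |
| JP5524138B2 (ja) | 2011-07-04 | 2014-06-18 | 日本電信電話株式会社 | Synonym dictionary generation device, method therefor, and program |
| JP2013109597A (ja) | 2011-11-21 | 2013-06-06 | Panasonic Corp | Medical synonym dictionary creation device and medical synonym dictionary creation method |
| JP6338758B1 (ja) | 2017-11-10 | 2018-06-06 | 株式会社ナビット | Distribution system, distribution method, and program |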
- 2019-02-28: JP application JP2019037050A, patent JP7029813B2 (ja), status Active
- 2020-02-27: WO application PCT/JP2020/008190, publication WO2020175662A1 (fr), status Ceased
Also Published As
| Publication number | Publication date |
|---|---|
| JP7029813B2 (ja) | 2022-03-04 |
| JP2020140583A (ja) | 2020-09-03 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
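| US9639751B2 (en) | | Property record document data verification systems and methods |
| US11386263B2 (en) | | Automatic generation of form application |
| US10503830B2 (en) | | Natural language processing with adaptable rules based on user inputs |
| JP2001515623A (ja) | | Method for automatic generation of text summaries by computer |
| US20220375246A1 (en) | | Document display assistance system, document display assistance method, and program for executing said method |
| WO2020175662A1 (fr) | | Dictionary creation device, dictionary creation method, and dictionary creation program |
| CN113971207B (zh) | | Document association method and apparatus, electronic device, and storage medium |
| US8064703B2 (en) | | Property record document data validation systems and methods |
| CN109960707A (zh) | | Artificial-intelligence-based method and system for collecting university admissions data |
| JP7041963B2 (ja) | | Standard item name setting device, standard item name setting method, and standard item name setting program |
| US20240211518A1 (en) | | Automated document intake system |
| JP6529254B2 (ja) | | Information processing device, information processing method, program, and storage medium |
| JP7190768B2 (ja) | | Counter service management device, counter service management method, and counter service management program |
| JP7255585B2 (ja) | | Information processing device, information processing method, and program |
| JP2019003472A (ja) | | Information processing device and information processing method |
| CN111931480A (zh) | | Method and apparatus for determining the main content of a text, storage medium, and computer device |
| US20190272334A1 (en) | | Information processing device, information processing method, and non-transitory computer readable medium storing information processing program |
| JP6964891B2 (ja) | | Counter service management device, counter service management method, and counter service management program |
| JPH08166959A (ja) | | Image processing method |
| JP2004240488A (ja) | | Document management device |
| JP5877775B2 (ja) | | Content management device, content management system, content management method, program, and storage medium |
| CN118051659A (zh) | | Information card generation method and apparatus |
| JP4169618B2 (ja) | | Text information management device |
| CN120011362B (zh) | | Multimodal scientific research data recording method, system, terminal, and storage medium |
| CN112789624A (zh) | | Character candidate proposal device, handwritten character recognition system, method, and program |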
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 20763903; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 32PN | Ep: public notification in the ep bulletin as address of the addressee cannot be established | Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 15/11/2021) |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 20763903; Country of ref document: EP; Kind code of ref document: A1 |