WO2015010509A1 - Procédé basé sur un espace linéaire unidimensionnel et permettant d'implémenter une recherche dans un dictionnaire par arbre préfixe - Google Patents
Procédé basé sur un espace linéaire unidimensionnel et permettant d'implémenter une recherche dans un dictionnaire par arbre préfixe Download PDFInfo
- Publication number
- WO2015010509A1 WO2015010509A1 PCT/CN2014/080179 CN2014080179W WO2015010509A1 WO 2015010509 A1 WO2015010509 A1 WO 2015010509A1 CN 2014080179 W CN2014080179 W CN 2014080179W WO 2015010509 A1 WO2015010509 A1 WO 2015010509A1
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- node
- trie tree
- dictionary
- key
- value
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Links
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/901—Indexing; Data structures therefor; Storage structures
- G06F16/9027—Trees
Definitions
- the invention relates to a dictionary retrieval method, in particular to a dictionary retrieval method based on a one-dimensional linear space to implement a Trie tree. Background technique
- index structures include linear index tables, inverted tables, hash tables, and search trees.
- a linear index structure or an inverted table generally uses a sequential structure to store records in a dictionary.
- the search for records in a dictionary generally traverses each record sequentially, so the time complexity of each search is 0 ( ⁇ ) ⁇ 1( ⁇ ) (where l(n) is the time it takes for the two recorded keywords to be compared).
- the improvement of this method is that each record of the dictionary is ordered by key (key), and it is searched by half when searching, and its time complexity is 0 (logN) X l(n).
- dictionary-based natural language processing applications such as dictionary-based Chinese word segmentation, dictionary-based word-to-speech conversion (Chinese characters converted to pinyin), dictionary-based named entity recognition, dictionary-based annotation (including part-of-speech tagging and semantic tagging, etc.)
- dictionary-based natural language processing applications such as dictionary-based Chinese word segmentation, dictionary-based word-to-speech conversion (Chinese characters converted to pinyin), dictionary-based named entity recognition, dictionary-based annotation (including part-of-speech tagging and semantic tagging, etc.)
- dictionary-based annotation including part-of-speech tagging and semantic tagging, etc.
- the present invention is directed to solving one of the above drawbacks.
- the present invention provides a dictionary search method for implementing a Trie tree based on a one-dimensional linear space, by generating a Trio tree dictionary data of a one-dimensional linear space; determining a query key to be queried according to user input; and implementing according to the current state of the entry key Inquire.
- the dictionary data of the Trie tree in the one-dimensional linear space the dictionary loading and retrieval speed is improved, and all the prefix words of the entry can be quickly retrieved.
- the dictionary search based on the one-dimensional linear space to implement the Trie tree can be solved.
- the traditional Trie tree dictionary data retrieval conflicts in the construction process of the Tire tree due to the insertion of a new state, and can avoid the movement of a large amount of dictionary data caused by the conflict.
- the present invention discloses a dictionary retrieval method for implementing a Trie tree based on a one-dimensional linear space, the method comprising the steps of: generating dictionary data of a one-dimensional linear space Trie tree; determining a query key to be queried according to user input; The current state of the entry key implements the query.
- the key of the dictionary data is converted into a node and stored in a one-dimensional array, and the value of the one-dimensional array is used to identify whether the base value is unique.
- all terminal nodes are changed into non-terminal nodes, a leaf node is added after the terminal node, and a check value of the leaf node is assigned to its storage location.
- the leaf node further includes: a base value of the leaf node to identify whether it is a terminal node.
- the query comprises the steps of: pointing the current node to the root node; making a state transition according to the currently input character, obtaining a position of the direct successor state; verifying the precursor of the current state, determining which state the current state is. Transfer from; get the value of the value corresponding to the entry key.
- the query comprises: a query of the entry key can obtain the result of all of its prefix words.
- the invention provides a dictionary retrieval method based on a one-dimensional linear space to implement a Trie tree, by generating a Trio tree dictionary data of a one-dimensional linear space; determining a to-be-queried key according to a user input; and implementing a query according to the current state of the entry key .
- the dictionary loading and retrieval speed is improved, and all the prefix words of the entry can be quickly retrieved.
- the base value is adjusted during the construction of the Trie tree so that all its direct successors do not conflict, thus avoiding the backtracking problem of data movement or storage space allocation.
- FIG. 1 is a flow chart of a method for implementing a Trie tree dictionary search based on a one-dimensional linear space according to an embodiment of the present invention.
- FIG. 2 is a flow chart of implementing a query according to the current state of the entry key according to an embodiment of the present invention. detailed description
- a dictionary search method for implementing a Trie tree based on a one-dimensional linear space is provided in an embodiment of the present invention.
- FIG. 1 it is a flowchart of a method for implementing a Trie tree dictionary retrieval method based on a one-dimensional linear space according to an embodiment of the present invention.
- Step S110 Generate dictionary data of the one-dimensional linear space Trie tree.
- Obtaining dictionary data, to generate one-dimensional linear space Trib tree dictionary data includes the following specific steps:
- Step S111 Sort all the terms and attribute information in the lexicographic order with the key as the center, and merge the values having the same key value, so as to ensure that the key does not have a duplicate.
- Root-node, left 0;
- Root-node, depth 0;
- Step S114 The initial state is taken as the current state.
- Step S115 Obtain information of all direct successor states of the current state. If the direct successor node list is empty, that is, the current node is the terminal node "$", indicating that the key formed from the starting node to the current node is exactly A complete entry in the dictionary, the base value of the current node (terminal node) is assigned to the opposite of the current key dictionary sequence number, and the execution is completed on the path; otherwise, step S116 is performed.
- the pseudo code for this step is as follows:
- Step S116 Find a suitable base value for the current node, so that the base value is unique, and does not cause all direct successor nodes to collide with the nodes stored in the existing Trie tree.
- the direct successor node of the current node is inserted into the Trie tree, and the check value is assigned to the base value of the current node, and then the direct successor node of the current node is sequentially used as the current node, and the process proceeds to step S. 115.
- base value starts from 1;
- Used[base] 1 ; II identifies that this value has been used
- the current node is the terminal node (the direct successor is empty)
- Step S120 Enter a query key to be queried according to the user.
- the next step is to query whether the entry entered by the user has a Trie tree, that is, whether it is a complete path from the root node to the leaf node.
- Step S130 Implement a query according to the current state of the entry key.
- Step S131 Point the current node to the root node.
- Step S132 Perform a state transition according to the currently input character to obtain a position of the direct successor state.
- Step S133 Verify the precursor of the current state, and determine which state the current state is transferred from.
- Step S134 Obtain the value of the value corresponding to the entry key.
- Base[t] ⁇ 0 and the value of base[t] is the initial node of DFA.
- the query speed of the dictionary search method based on the one-dimensional linear space to implement the Trie tree is 18.3 MB/s. Therefore, the present invention provides a dictionary search method for implementing a Trie tree based on a one-dimensional linear space, by generating a Trio tree dictionary data of a one-dimensional linear space; determining a query key to be queried according to user input; and implementing according to the current state of the entry key Inquire.
- the dictionary loading and retrieval speed is improved, and all the prefix words of the entry can be quickly retrieved.
- the base value is adjusted during the construction of the Trie tree, so that Conflicts do not occur with all direct successors, thus avoiding backtracking issues with data movement or storage space allocation.
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Un procédé basé sur un espace linéaire unidimensionnel et permettant d'implémenter une recherche dans un dictionnaire par arbre préfixe comprend les étapes consistant à : générer des données de dictionnaire par arbre préfixe dans un espace linéaire unidimensionnel; déterminer une clé d'entrée devant être interrogée sur la base d'une entrée d'utilisateur; et implémenter une interrogation sur la base d'un état actuel de la clé d'entrée. Dans les données de dictionnaire par arbre préfixe, qui sont construites dans un espace linéaire unidimensionnel, les vitesses de chargement et de recherche dans le dictionnaire sont accrues et une récupération rapide de tous les termes associés au préfixe d'une entrée est possible. De plus, la recherche dans le dictionnaire par arbre préfixe implémentée sur la base d'un espace linéaire unidimensionnel permet de résoudre le problème d'un conflit dû à une introduction d'un nouvel état, conflit existant dans un processus de construction d'arbre préfixe d'une recherche de données dans un dictionnaire par arbre préfixe conventionnelle, ce qui permet de prévenir le problème d'un déplacement d'une grande quantité de données de dictionnaire en raison du conflit.
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201310288821.5A CN103365992B (zh) | 2013-07-03 | 2013-07-03 | 一种基于一维线性空间实现Trie树的词典检索方法 |
| CN201310288821.5 | 2013-07-03 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2015010509A1 true WO2015010509A1 (fr) | 2015-01-29 |
Family
ID=49367333
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/CN2014/080179 Ceased WO2015010509A1 (fr) | 2013-07-03 | 2014-06-18 | Procédé basé sur un espace linéaire unidimensionnel et permettant d'implémenter une recherche dans un dictionnaire par arbre préfixe |
Country Status (2)
| Country | Link |
|---|---|
| CN (1) | CN103365992B (fr) |
| WO (1) | WO2015010509A1 (fr) |
Families Citing this family (12)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN103365991B (zh) * | 2013-07-03 | 2017-03-08 | 深圳市华傲数据技术有限公司 | 一种基于一维线性空间实现Trie树的词典存储管理方法 |
| CN103365992B (zh) * | 2013-07-03 | 2017-02-15 | 深圳市华傲数据技术有限公司 | 一种基于一维线性空间实现Trie树的词典检索方法 |
| CN107680588B (zh) * | 2017-05-10 | 2020-10-20 | 平安科技(深圳)有限公司 | 智能语音导航方法、装置及存储介质 |
| CN107239549A (zh) * | 2017-06-07 | 2017-10-10 | 传神语联网网络科技股份有限公司 | 数据库术语检索的方法、装置及终端 |
| CN107273360A (zh) * | 2017-06-21 | 2017-10-20 | 成都布林特信息技术有限公司 | 基于语义理解的中文实词提取算法 |
| CN108153907B (zh) * | 2018-01-18 | 2021-01-22 | 中国计量大学 | 通过16位Trie树实现空间优化的词典存储管理方法 |
| CN108197313B (zh) * | 2018-02-01 | 2021-06-25 | 中国计量大学 | 通过16位Trie树实现空间优化的词典索引方法 |
| CN108509419B (zh) * | 2018-03-21 | 2022-02-22 | 山东中医药大学 | 中医药古籍文献分词和词性标引方法及系统 |
| CN109684439B (zh) * | 2018-12-28 | 2020-10-30 | 语联网(武汉)信息技术有限公司 | 分词过程中进行前缀索引的方法及装置 |
| CN109739948B (zh) * | 2018-12-28 | 2021-08-03 | 北京金山安全软件有限公司 | 词表的存储管理方法、装置、电子设备及存储介质 |
| CN110688483B (zh) * | 2019-09-16 | 2022-10-18 | 重庆邮电大学 | 文景转换中基于词典的名词可视性标注方法、介质及系统 |
| CN112100132B (zh) * | 2020-09-24 | 2024-07-30 | 深圳软牛科技有限公司 | 一种已删除文件类型识别方法、装置、电子设备及存储介质 |
Citations (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN1786962A (zh) * | 2005-12-21 | 2006-06-14 | 中国科学院计算技术研究所 | 完美双数组trie树词典管理与检索方法 |
| US20080208854A1 (en) * | 2005-06-06 | 2008-08-28 | 3618633 Canada Inc. | Method of Syntactic Pattern Recognition of Sequences |
| CN101398830A (zh) * | 2007-09-27 | 2009-04-01 | 阿里巴巴集团控股有限公司 | 词库模糊查询方法及词库模糊查询系统 |
| CN101499094A (zh) * | 2009-03-10 | 2009-08-05 | 焦点科技股份有限公司 | 一种数据压缩存储并检索的方法及系统 |
| CN102651026A (zh) * | 2012-04-01 | 2012-08-29 | 百度在线网络技术(北京)有限公司 | 通过预计算优化搜索引擎分词的方法及搜索引擎分词装置 |
| CN103365992A (zh) * | 2013-07-03 | 2013-10-23 | 深圳市华傲数据技术有限公司 | 一种基于一维线性空间实现Trie树的词典检索方法 |
Family Cites Families (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US7096235B2 (en) * | 2003-03-27 | 2006-08-22 | Sand Technology Systems International, Inc. | Computer implemented compact 0-complete tree dynamic storage structure and method of processing stored data |
| AU2003274246A1 (en) * | 2003-08-11 | 2005-03-29 | France Telecom | Trie memory device with a circular pipeline mechanism |
| CN101788990A (zh) * | 2009-01-23 | 2010-07-28 | 北京金远见电脑技术有限公司 | Trie树双数组的全局优化构造方法及系统 |
| JP5262864B2 (ja) * | 2009-03-10 | 2013-08-14 | 富士通株式会社 | 記憶媒体、検索方法および検索装置 |
| US10679218B2 (en) * | 2010-05-28 | 2020-06-09 | Securitymetrics, Inc. | Systems and methods employing searches for known identifiers of sensitive information to identify sensitive information in data |
-
2013
- 2013-07-03 CN CN201310288821.5A patent/CN103365992B/zh active Active
-
2014
- 2014-06-18 WO PCT/CN2014/080179 patent/WO2015010509A1/fr not_active Ceased
Patent Citations (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20080208854A1 (en) * | 2005-06-06 | 2008-08-28 | 3618633 Canada Inc. | Method of Syntactic Pattern Recognition of Sequences |
| CN1786962A (zh) * | 2005-12-21 | 2006-06-14 | 中国科学院计算技术研究所 | 完美双数组trie树词典管理与检索方法 |
| CN101398830A (zh) * | 2007-09-27 | 2009-04-01 | 阿里巴巴集团控股有限公司 | 词库模糊查询方法及词库模糊查询系统 |
| CN101499094A (zh) * | 2009-03-10 | 2009-08-05 | 焦点科技股份有限公司 | 一种数据压缩存储并检索的方法及系统 |
| CN102651026A (zh) * | 2012-04-01 | 2012-08-29 | 百度在线网络技术(北京)有限公司 | 通过预计算优化搜索引擎分词的方法及搜索引擎分词装置 |
| CN103365992A (zh) * | 2013-07-03 | 2013-10-23 | 深圳市华傲数据技术有限公司 | 一种基于一维线性空间实现Trie树的词典检索方法 |
Also Published As
| Publication number | Publication date |
|---|---|
| CN103365992A (zh) | 2013-10-23 |
| CN103365992B (zh) | 2017-02-15 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| WO2015010509A1 (fr) | Procédé basé sur un espace linéaire unidimensionnel et permettant d'implémenter une recherche dans un dictionnaire par arbre préfixe | |
| CN104866593B (zh) | 一种基于知识图谱的数据库搜索方法 | |
| KR101646754B1 (ko) | 모바일 시멘틱 검색 장치 및 그 방법 | |
| CN105706078A (zh) | 实体集合的自动定义 | |
| CN105701253A (zh) | 中文自然语言问句语义化的知识库自动问答方法 | |
| CN110795526B (zh) | 一种用于检索系统的数学公式索引创建方法与系统 | |
| CN103646032A (zh) | 一种基于本体和受限自然语言处理的数据库查询方法 | |
| CN106874422A (zh) | 一种面向关系型数据库的图查询方法 | |
| WO2015010508A1 (fr) | Procédé à base d'espace linéaire unidimensionnel pour mettre en œuvre le stockage et la gestion d'un dictionnaire à l'aide d'un arbre de trie | |
| CN106528647B (zh) | 一种基于cedar双数组字典树算法进行术语匹配的方法 | |
| JPWO2009095981A1 (ja) | 表からツリー構造データを構築する方法及び装置 | |
| CN106021523B (zh) | 基于json的数据仓库存储及查询方法 | |
| CN105808729B (zh) | 基于论文间引用关系的学术大数据分析方法 | |
| CN104346331A (zh) | Xml数据库的检索方法及系统 | |
| CN108846016A (zh) | 一种面向中文分词的搜索算法 | |
| JP2019040598A5 (fr) | ||
| CN107239549A (zh) | 数据库术语检索的方法、装置及终端 | |
| US20090234852A1 (en) | Sub-linear approximate string match | |
| CN113590650B (zh) | 基于特征表达式的结构化查询语句甄别方法及装置 | |
| JP4712718B2 (ja) | 配列の生成方法、及び、配列生成プログラム | |
| CN105824956A (zh) | 一种基于链表结构的倒排索引模型及其构建方法 | |
| CN117435611A (zh) | 一种查询语句生成方法、装置、设备及存储介质 | |
| CN103500222A (zh) | 通信软件聊天对象的搜索方法及装置 | |
| CN100565508C (zh) | 结构化文档管理设备、搜索设备、存储和搜索方法 | |
| CN119474540B (zh) | 一种知识引导的基于大语言模型的可信api推荐方法 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 14828829 Country of ref document: EP Kind code of ref document: A1 |
|
| NENP | Non-entry into the national phase |
Ref country code: DE |
|
| 122 | Ep: pct application non-entry in european phase |
Ref document number: 14828829 Country of ref document: EP Kind code of ref document: A1 |