WO2015010509A1 - Procédé basé sur un espace linéaire unidimensionnel et permettant d'implémenter une recherche dans un dictionnaire par arbre préfixe - Google Patents

Procédé basé sur un espace linéaire unidimensionnel et permettant d'implémenter une recherche dans un dictionnaire par arbre préfixe Download PDF

Info

Publication number
WO2015010509A1
WO2015010509A1 PCT/CN2014/080179 CN2014080179W WO2015010509A1 WO 2015010509 A1 WO2015010509 A1 WO 2015010509A1 CN 2014080179 W CN2014080179 W CN 2014080179W WO 2015010509 A1 WO2015010509 A1 WO 2015010509A1
Authority
WO
WIPO (PCT)
Prior art keywords
node
trie tree
dictionary
key
value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/CN2014/080179
Other languages
English (en)
Chinese (zh)
Inventor
贾西贝
王国印
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Huaao Data Technology Co Ltd
Original Assignee
Shenzhen Huaao Data Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Huaao Data Technology Co Ltd filed Critical Shenzhen Huaao Data Technology Co Ltd
Publication of WO2015010509A1 publication Critical patent/WO2015010509A1/fr
Anticipated expiration legal-status Critical
Ceased legal-status Critical Current

Links

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/901Indexing; Data structures therefor; Storage structures
    • G06F16/9027Trees

Definitions

  • the invention relates to a dictionary retrieval method, in particular to a dictionary retrieval method based on a one-dimensional linear space to implement a Trie tree. Background technique
  • index structures include linear index tables, inverted tables, hash tables, and search trees.
  • a linear index structure or an inverted table generally uses a sequential structure to store records in a dictionary.
  • the search for records in a dictionary generally traverses each record sequentially, so the time complexity of each search is 0 ( ⁇ ) ⁇ 1( ⁇ ) (where l(n) is the time it takes for the two recorded keywords to be compared).
  • the improvement of this method is that each record of the dictionary is ordered by key (key), and it is searched by half when searching, and its time complexity is 0 (logN) X l(n).
  • dictionary-based natural language processing applications such as dictionary-based Chinese word segmentation, dictionary-based word-to-speech conversion (Chinese characters converted to pinyin), dictionary-based named entity recognition, dictionary-based annotation (including part-of-speech tagging and semantic tagging, etc.)
  • dictionary-based natural language processing applications such as dictionary-based Chinese word segmentation, dictionary-based word-to-speech conversion (Chinese characters converted to pinyin), dictionary-based named entity recognition, dictionary-based annotation (including part-of-speech tagging and semantic tagging, etc.)
  • dictionary-based annotation including part-of-speech tagging and semantic tagging, etc.
  • the present invention is directed to solving one of the above drawbacks.
  • the present invention provides a dictionary search method for implementing a Trie tree based on a one-dimensional linear space, by generating a Trio tree dictionary data of a one-dimensional linear space; determining a query key to be queried according to user input; and implementing according to the current state of the entry key Inquire.
  • the dictionary data of the Trie tree in the one-dimensional linear space the dictionary loading and retrieval speed is improved, and all the prefix words of the entry can be quickly retrieved.
  • the dictionary search based on the one-dimensional linear space to implement the Trie tree can be solved.
  • the traditional Trie tree dictionary data retrieval conflicts in the construction process of the Tire tree due to the insertion of a new state, and can avoid the movement of a large amount of dictionary data caused by the conflict.
  • the present invention discloses a dictionary retrieval method for implementing a Trie tree based on a one-dimensional linear space, the method comprising the steps of: generating dictionary data of a one-dimensional linear space Trie tree; determining a query key to be queried according to user input; The current state of the entry key implements the query.
  • the key of the dictionary data is converted into a node and stored in a one-dimensional array, and the value of the one-dimensional array is used to identify whether the base value is unique.
  • all terminal nodes are changed into non-terminal nodes, a leaf node is added after the terminal node, and a check value of the leaf node is assigned to its storage location.
  • the leaf node further includes: a base value of the leaf node to identify whether it is a terminal node.
  • the query comprises the steps of: pointing the current node to the root node; making a state transition according to the currently input character, obtaining a position of the direct successor state; verifying the precursor of the current state, determining which state the current state is. Transfer from; get the value of the value corresponding to the entry key.
  • the query comprises: a query of the entry key can obtain the result of all of its prefix words.
  • the invention provides a dictionary retrieval method based on a one-dimensional linear space to implement a Trie tree, by generating a Trio tree dictionary data of a one-dimensional linear space; determining a to-be-queried key according to a user input; and implementing a query according to the current state of the entry key .
  • the dictionary loading and retrieval speed is improved, and all the prefix words of the entry can be quickly retrieved.
  • the base value is adjusted during the construction of the Trie tree so that all its direct successors do not conflict, thus avoiding the backtracking problem of data movement or storage space allocation.
  • FIG. 1 is a flow chart of a method for implementing a Trie tree dictionary search based on a one-dimensional linear space according to an embodiment of the present invention.
  • FIG. 2 is a flow chart of implementing a query according to the current state of the entry key according to an embodiment of the present invention. detailed description
  • a dictionary search method for implementing a Trie tree based on a one-dimensional linear space is provided in an embodiment of the present invention.
  • FIG. 1 it is a flowchart of a method for implementing a Trie tree dictionary retrieval method based on a one-dimensional linear space according to an embodiment of the present invention.
  • Step S110 Generate dictionary data of the one-dimensional linear space Trie tree.
  • Obtaining dictionary data, to generate one-dimensional linear space Trib tree dictionary data includes the following specific steps:
  • Step S111 Sort all the terms and attribute information in the lexicographic order with the key as the center, and merge the values having the same key value, so as to ensure that the key does not have a duplicate.
  • Root-node, left 0;
  • Root-node, depth 0;
  • Step S114 The initial state is taken as the current state.
  • Step S115 Obtain information of all direct successor states of the current state. If the direct successor node list is empty, that is, the current node is the terminal node "$", indicating that the key formed from the starting node to the current node is exactly A complete entry in the dictionary, the base value of the current node (terminal node) is assigned to the opposite of the current key dictionary sequence number, and the execution is completed on the path; otherwise, step S116 is performed.
  • the pseudo code for this step is as follows:
  • Step S116 Find a suitable base value for the current node, so that the base value is unique, and does not cause all direct successor nodes to collide with the nodes stored in the existing Trie tree.
  • the direct successor node of the current node is inserted into the Trie tree, and the check value is assigned to the base value of the current node, and then the direct successor node of the current node is sequentially used as the current node, and the process proceeds to step S. 115.
  • base value starts from 1;
  • Used[base] 1 ; II identifies that this value has been used
  • the current node is the terminal node (the direct successor is empty)
  • Step S120 Enter a query key to be queried according to the user.
  • the next step is to query whether the entry entered by the user has a Trie tree, that is, whether it is a complete path from the root node to the leaf node.
  • Step S130 Implement a query according to the current state of the entry key.
  • Step S131 Point the current node to the root node.
  • Step S132 Perform a state transition according to the currently input character to obtain a position of the direct successor state.
  • Step S133 Verify the precursor of the current state, and determine which state the current state is transferred from.
  • Step S134 Obtain the value of the value corresponding to the entry key.
  • Base[t] ⁇ 0 and the value of base[t] is the initial node of DFA.
  • the query speed of the dictionary search method based on the one-dimensional linear space to implement the Trie tree is 18.3 MB/s. Therefore, the present invention provides a dictionary search method for implementing a Trie tree based on a one-dimensional linear space, by generating a Trio tree dictionary data of a one-dimensional linear space; determining a query key to be queried according to user input; and implementing according to the current state of the entry key Inquire.
  • the dictionary loading and retrieval speed is improved, and all the prefix words of the entry can be quickly retrieved.
  • the base value is adjusted during the construction of the Trie tree, so that Conflicts do not occur with all direct successors, thus avoiding backtracking issues with data movement or storage space allocation.

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Un procédé basé sur un espace linéaire unidimensionnel et permettant d'implémenter une recherche dans un dictionnaire par arbre préfixe comprend les étapes consistant à : générer des données de dictionnaire par arbre préfixe dans un espace linéaire unidimensionnel; déterminer une clé d'entrée devant être interrogée sur la base d'une entrée d'utilisateur; et implémenter une interrogation sur la base d'un état actuel de la clé d'entrée. Dans les données de dictionnaire par arbre préfixe, qui sont construites dans un espace linéaire unidimensionnel, les vitesses de chargement et de recherche dans le dictionnaire sont accrues et une récupération rapide de tous les termes associés au préfixe d'une entrée est possible. De plus, la recherche dans le dictionnaire par arbre préfixe implémentée sur la base d'un espace linéaire unidimensionnel permet de résoudre le problème d'un conflit dû à une introduction d'un nouvel état, conflit existant dans un processus de construction d'arbre préfixe d'une recherche de données dans un dictionnaire par arbre préfixe conventionnelle, ce qui permet de prévenir le problème d'un déplacement d'une grande quantité de données de dictionnaire en raison du conflit.
PCT/CN2014/080179 2013-07-03 2014-06-18 Procédé basé sur un espace linéaire unidimensionnel et permettant d'implémenter une recherche dans un dictionnaire par arbre préfixe Ceased WO2015010509A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201310288821.5A CN103365992B (zh) 2013-07-03 2013-07-03 一种基于一维线性空间实现Trie树的词典检索方法
CN201310288821.5 2013-07-03

Publications (1)

Publication Number Publication Date
WO2015010509A1 true WO2015010509A1 (fr) 2015-01-29

Family

ID=49367333

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2014/080179 Ceased WO2015010509A1 (fr) 2013-07-03 2014-06-18 Procédé basé sur un espace linéaire unidimensionnel et permettant d'implémenter une recherche dans un dictionnaire par arbre préfixe

Country Status (2)

Country Link
CN (1) CN103365992B (fr)
WO (1) WO2015010509A1 (fr)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103365991B (zh) * 2013-07-03 2017-03-08 深圳市华傲数据技术有限公司 一种基于一维线性空间实现Trie树的词典存储管理方法
CN103365992B (zh) * 2013-07-03 2017-02-15 深圳市华傲数据技术有限公司 一种基于一维线性空间实现Trie树的词典检索方法
CN107680588B (zh) * 2017-05-10 2020-10-20 平安科技(深圳)有限公司 智能语音导航方法、装置及存储介质
CN107239549A (zh) * 2017-06-07 2017-10-10 传神语联网网络科技股份有限公司 数据库术语检索的方法、装置及终端
CN107273360A (zh) * 2017-06-21 2017-10-20 成都布林特信息技术有限公司 基于语义理解的中文实词提取算法
CN108153907B (zh) * 2018-01-18 2021-01-22 中国计量大学 通过16位Trie树实现空间优化的词典存储管理方法
CN108197313B (zh) * 2018-02-01 2021-06-25 中国计量大学 通过16位Trie树实现空间优化的词典索引方法
CN108509419B (zh) * 2018-03-21 2022-02-22 山东中医药大学 中医药古籍文献分词和词性标引方法及系统
CN109684439B (zh) * 2018-12-28 2020-10-30 语联网(武汉)信息技术有限公司 分词过程中进行前缀索引的方法及装置
CN109739948B (zh) * 2018-12-28 2021-08-03 北京金山安全软件有限公司 词表的存储管理方法、装置、电子设备及存储介质
CN110688483B (zh) * 2019-09-16 2022-10-18 重庆邮电大学 文景转换中基于词典的名词可视性标注方法、介质及系统
CN112100132B (zh) * 2020-09-24 2024-07-30 深圳软牛科技有限公司 一种已删除文件类型识别方法、装置、电子设备及存储介质

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1786962A (zh) * 2005-12-21 2006-06-14 中国科学院计算技术研究所 完美双数组trie树词典管理与检索方法
US20080208854A1 (en) * 2005-06-06 2008-08-28 3618633 Canada Inc. Method of Syntactic Pattern Recognition of Sequences
CN101398830A (zh) * 2007-09-27 2009-04-01 阿里巴巴集团控股有限公司 词库模糊查询方法及词库模糊查询系统
CN101499094A (zh) * 2009-03-10 2009-08-05 焦点科技股份有限公司 一种数据压缩存储并检索的方法及系统
CN102651026A (zh) * 2012-04-01 2012-08-29 百度在线网络技术(北京)有限公司 通过预计算优化搜索引擎分词的方法及搜索引擎分词装置
CN103365992A (zh) * 2013-07-03 2013-10-23 深圳市华傲数据技术有限公司 一种基于一维线性空间实现Trie树的词典检索方法

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7096235B2 (en) * 2003-03-27 2006-08-22 Sand Technology Systems International, Inc. Computer implemented compact 0-complete tree dynamic storage structure and method of processing stored data
AU2003274246A1 (en) * 2003-08-11 2005-03-29 France Telecom Trie memory device with a circular pipeline mechanism
CN101788990A (zh) * 2009-01-23 2010-07-28 北京金远见电脑技术有限公司 Trie树双数组的全局优化构造方法及系统
JP5262864B2 (ja) * 2009-03-10 2013-08-14 富士通株式会社 記憶媒体、検索方法および検索装置
US10679218B2 (en) * 2010-05-28 2020-06-09 Securitymetrics, Inc. Systems and methods employing searches for known identifiers of sensitive information to identify sensitive information in data

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080208854A1 (en) * 2005-06-06 2008-08-28 3618633 Canada Inc. Method of Syntactic Pattern Recognition of Sequences
CN1786962A (zh) * 2005-12-21 2006-06-14 中国科学院计算技术研究所 完美双数组trie树词典管理与检索方法
CN101398830A (zh) * 2007-09-27 2009-04-01 阿里巴巴集团控股有限公司 词库模糊查询方法及词库模糊查询系统
CN101499094A (zh) * 2009-03-10 2009-08-05 焦点科技股份有限公司 一种数据压缩存储并检索的方法及系统
CN102651026A (zh) * 2012-04-01 2012-08-29 百度在线网络技术(北京)有限公司 通过预计算优化搜索引擎分词的方法及搜索引擎分词装置
CN103365992A (zh) * 2013-07-03 2013-10-23 深圳市华傲数据技术有限公司 一种基于一维线性空间实现Trie树的词典检索方法

Also Published As

Publication number Publication date
CN103365992A (zh) 2013-10-23
CN103365992B (zh) 2017-02-15

Similar Documents

Publication Publication Date Title
WO2015010509A1 (fr) Procédé basé sur un espace linéaire unidimensionnel et permettant d'implémenter une recherche dans un dictionnaire par arbre préfixe
CN104866593B (zh) 一种基于知识图谱的数据库搜索方法
KR101646754B1 (ko) 모바일 시멘틱 검색 장치 및 그 방법
CN105706078A (zh) 实体集合的自动定义
CN105701253A (zh) 中文自然语言问句语义化的知识库自动问答方法
CN110795526B (zh) 一种用于检索系统的数学公式索引创建方法与系统
CN103646032A (zh) 一种基于本体和受限自然语言处理的数据库查询方法
CN106874422A (zh) 一种面向关系型数据库的图查询方法
WO2015010508A1 (fr) Procédé à base d'espace linéaire unidimensionnel pour mettre en œuvre le stockage et la gestion d'un dictionnaire à l'aide d'un arbre de trie
CN106528647B (zh) 一种基于cedar双数组字典树算法进行术语匹配的方法
JPWO2009095981A1 (ja) 表からツリー構造データを構築する方法及び装置
CN106021523B (zh) 基于json的数据仓库存储及查询方法
CN105808729B (zh) 基于论文间引用关系的学术大数据分析方法
CN104346331A (zh) Xml数据库的检索方法及系统
CN108846016A (zh) 一种面向中文分词的搜索算法
JP2019040598A5 (fr)
CN107239549A (zh) 数据库术语检索的方法、装置及终端
US20090234852A1 (en) Sub-linear approximate string match
CN113590650B (zh) 基于特征表达式的结构化查询语句甄别方法及装置
JP4712718B2 (ja) 配列の生成方法、及び、配列生成プログラム
CN105824956A (zh) 一种基于链表结构的倒排索引模型及其构建方法
CN117435611A (zh) 一种查询语句生成方法、装置、设备及存储介质
CN103500222A (zh) 通信软件聊天对象的搜索方法及装置
CN100565508C (zh) 结构化文档管理设备、搜索设备、存储和搜索方法
CN119474540B (zh) 一种知识引导的基于大语言模型的可信api推荐方法

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 14828829

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 14828829

Country of ref document: EP

Kind code of ref document: A1