WO2012025439A1 - Verfahren zum Suchen in einer Vielzahl von Datensätzen und Suchmaschine (Method for searching in a plurality of data sets and search engine) - Google Patents
Method for searching in a plurality of data sets and search engine
- Publication number
- WO2012025439A1 (PCT/EP2011/064163)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- term
- terms
- probability
- subset
- search
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/332—Query formulation
- G06F16/3322—Query formulation using system suggestions
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
Definitions
- the present invention relates to a computer implemented method for searching in a plurality of data sets. Furthermore, the invention relates to a corresponding computer-executable search engine.
- Searching a large number of data records is of great importance, for example, in so-called online stores.
- A supplier of a large number of products records the products offered as data records in a database.
- A user can then use his computer to connect to the online store over a network, such as the Internet, and retrieve the records of the database.
- Because the database contains a very large amount of data and the individual data records are relatively complex, the user must be able to search the data records using a search engine.
- The user submits a search request to the online shop.
- The online shop or an associated system processes the search query and returns records to the user as search hits, ordered in a certain way. This raises the problem of determining the records that are particularly relevant to the user's search.
- Such search engines are referred to as Internet search engines.
- Searching is often vague and fraught with uncertainty.
- The search terms of the search query frequently do not exactly match the terms that appear in the records to be searched.
- It may also happen that a search term contains typos or is meant to cover other grammatical forms of the search term as well.
- EP 1 095 326 B1 describes a search system for the retrieval of information which is stored in the form of text.
- The search system uses a tree structure as the data structure for the information retrieval process. Furthermore, the degree of correspondence between a request and the retrieved information is determined by a measure that combines a distance measure for an approximate degree of correspondence between words or symbols in the text and the request with another distance measure for an approximate degree of correspondence between sequences of words or symbols in the text and a query sequence.
- EP 1 208 465 B1 describes a search engine for searching a collection of documents.
- data processing units form groups of nodes connected in a network.
- the search engine is customized to scale in terms of data volume and query request rate.
- EP 1 341 009 B1 describes a method for operating an Internet search engine.
- the process traverses links between websites on the Internet using an intelligent agent.
- the contents of the visited websites are filtered to determine the relevance of the content.
- the relevant websites identified are indexed and the indexed, item-specific information is stored in a database.
- During filtering, the contents of a web site are passed through a subject-specific, dictionary-based filter that compares site content with terminology found in the dictionary.
- EP 1 459 206 B1 describes a computer-implemented method for searching a collection of items, each item in the collection having a set of properties.
- a query is received, which is formed from a first set of two or more properties.
- a distance function is then applied to one or more of the items in the collection and one or more result items are identified based on the distance function. In doing so, the distance function determines a distance between the query and an item in the collection, based on the number of items in the collection that have all the properties in the intersection of the first set of properties and the set of properties for the item.
- EP 1 622 054 A1, WO 2008/085637 A2 and WO 2008/137395 A1 describe further search methods and search engines for a search in data records.
- the method of this document does not solve, among other things, the problem that misspelled words in a search result in a very high relevance of the misspelled word because misspelled words do not or rarely occur in documents.
- the present invention is based on the technical problem of providing a computer-implemented method for searching in a plurality of data records and a corresponding computer-executable search engine which outputs from the plurality of data records specific records that are as relevant as possible for a received query.
- a search request is received with at least one search term. Then, from a reference set, a subset is determined with terms that are similar or identical to the search term. If the search query contains several search terms, a reference quantity is determined separately for each search term. For each term of the subset, a measure of similarity to the search term is then determined and the probability of occurrence of the term is determined.
- a term-dependent weighting distribution is applied to the terms of the subset, with the terms having a higher degree of similarity being weighted more heavily than terms having a lower degree of similarity. Thereafter, a modified probability for the term is determined from the weighted probabilities of the terms of the subset.
- the data records are evaluated in terms of their relevance to the search query.
- it is checked whether the terms of the subset occur in the data set, and if a term of the subset occurs in the data set, a lower modified probability of the term leads to a higher relevance score of the data set.
- At least one subset of the data records is output depending on its relevance score. For the relevance of a search term of a query, it is important how often this search term occurs in sets of such terms. If a term is usually very common, it is less relevant to query processing than a search term that is usually very rare in sets of terms.
- the different frequencies with which search terms occur are taken into account by determining an occurrence probability of each term of the subset.
- these term probabilities can be determined in advance on the basis of specific sets of terms, for example on the basis of preselected texts in which the word frequencies have been determined.
- the set to which the probability of occurrence of the terms relates can also be formed by the totality of the terms that occur in the records to be searched. These records can be searched in advance and indexed.
- the frequency with which this term has occurred in the data records can be determined for each term in a data record.
- the problem may arise that, on the one hand, the records to be searched may contain errors and, on the other, that the search terms of the search query may contain errors.
- the errors may be typing or typing errors, for example.
- A word may be contained in a record in an incorrect spelling. If the frequency of occurrence of the terms is now determined, an incorrectly written word in a search term would be given a particularly high relevance for the search query, since it is very rare.
- the same situation arises when the probability of the occurrence of a term is determined on the basis of the totality of the terms that occur in the data records. If the records contain a word that is not spelled correctly, this word is very rare, so that the probability of occurrence of this word is very low and thus the relevance of the word for a query is very high.
- This problem is solved in that not only the probability of occurrence or the frequency of a query term is considered; instead, for each search term in the search query, a subset of terms is determined from a reference set, and this subset is taken into account in the subsequent relevance evaluation of the records with regard to this search term.
- the subset can be determined, for example, by means of a lexicon. In this case, it follows that a search term with an incorrectly written word would not be included in the subset, but only similar words that are spelled correctly.
- the reference set includes all grammatical forms of words. The subset will therefore not only contain a word of a search term, but also other grammatical forms of that word, since these forms are similar to the search term.
- the weighting distribution is designed such that when determining the modified probability of a term, only the probability of the term itself and the probabilities of other terms having a higher degree of similarity than the one term are considered.
- The weighting distribution in this case can thus be, for example, a step function which outputs the weighting 1 for the term of the subset itself as well as for other terms that have a higher similarity measure than this term, and the weighting 0 for terms of the subset with a lower similarity measure, so that these less similar terms are ignored in the determination of the modified probability.
- The weighting with which the probability of a second term t k enters the modified probability of a first term t j is determined by evaluating a sigmoid function, the evaluation point being the similarity measure of the second term t k minus the similarity measure of the first term t j . Since the sigmoid function, in contrast to a discontinuous step function, has a continuous transition from the value 0 to the value 1, terms of the subset that have a slightly lower similarity measure than the term whose modified probability is being determined can also be taken into account in this embodiment of the method according to the invention. In this way, the relevance evaluation of the records on the basis of the modified probability can be further improved.
- The weighting distribution is such that, in determining the modified probability of one term, the probabilities of other terms that have a lower degree of similarity are also taken into account with a lower weighting, the weighting of another term having a lower degree of similarity depending on the difference between the similarity measure of the term for which the modified probability is determined and the similarity measure of the other term.
- The smaller the absolute value of the difference from the similarity measure of the term for which the modified probability is determined, the higher the weighting of another term with a lower similarity measure.
- terms with a lower similarity measure can be taken into account for the determination of the modified probability of each term of the subset and thus for the evaluation of the relevance of the data records.
- The modified term probability of a given term represents the probability of the union of all terms whose similarity (in a generalized sense) to the search term is greater than the similarity of the given term. For the evaluation of a data record, however, it makes sense to determine the probability that a data record contains such a term. Since a record contains many terms, this probability is greater.
- an intermediate step is introduced which takes into account the distribution of the number of terms per data record of the data sets to be searched. It is particularly taken into account that a record could contain several similar terms at the same time.
- the evaluation of a data record can result, for example, from the absolute value of the logarithm of the modified probability of the associated term. In this way, the different probabilities of the terms to be considered for the determination of the modified probability of a term can be more easily combined.
- The probability of occurrence of the term of the subset is determined, in particular, by determining the probability associated with the term beforehand from the frequency of the term in the reference set or in the data records, storing it in a memory, and later reading the stored probability for the term out of the memory.
- This prior determination of the probabilities can speed up and simplify the execution of the method.
- When the reference set is used, general analyses of the frequency of occurrence of terms in sets, i.e. for example of words in texts, can be drawn upon.
- When the frequency of occurrence of the terms in the data records is used, it is possible to determine probabilities that are adapted to the specific data records. For example, if the records are a product database, then the frequency of occurrence of particular words may differ from the frequencies determined from general texts of other types.
- the search request contains several search terms.
- a partial evaluation is determined separately for each search term.
- A further partial evaluation is determined for the search query as a whole, composed of the search terms. The evaluation of the search query is then determined from the partial evaluations.
- the method can rate a data record higher if a term of the subset occurs more frequently in this data record. For example, the more frequently a term of the subset occurs in this dataset, the higher the rating of a dataset. In this way, not only the probability of occurrence of a term as well as further terms of the subset can be included in the relevance rating of the data records, but also the actual frequency of occurrence of a term in the data record to be evaluated. This measure also leads to a further improvement of the relevance evaluation of the data records.
- a record can contain multiple fields. This is the case, for example, when the data records relate to a product database.
- the method preferably also evaluates the relevance of a data record as a function of in which field a term of the subset occurs in the data record. If a term occurs in a particularly important field, this leads to a higher rating of the data set than if the term occurs in a less important field.
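The two refinements above (counting every occurrence of a term and weighting the field in which it occurs) can be illustrated with a minimal sketch. The field names, the weights in `FIELD_WEIGHTS`, the function `score_record` and the way the contributions are summed are assumptions made for illustration only; the description does not prescribe a particular formula.

```python
import math

# Hypothetical field weights: a hit in the title counts twice as much
# as a hit in the description (assumed values, not from the source).
FIELD_WEIGHTS = {"title": 2.0, "description": 1.0}


def score_record(record, modified_probabilities):
    """Sum a relevance contribution for every occurrence of a subset term.

    record: dict mapping a field name to the list of terms in that field.
    modified_probabilities: dict mapping a subset term to its modified
    probability; rarer terms (lower probability) contribute more.
    """
    score = 0.0
    for field, terms in record.items():
        weight = FIELD_WEIGHTS.get(field, 1.0)
        for term in terms:
            if term in modified_probabilities:
                # Each occurrence adds |log p|, so repeated terms raise the score.
                score += weight * abs(math.log(modified_probabilities[term]))
    return score


p_mod = {"blue": 0.04}
in_title = {"title": ["blue", "shirt"], "description": ["cotton"]}
in_description = {"title": ["shirt"], "description": ["blue"]}
no_hit = {"title": ["shirt"], "description": ["cotton"]}

score_title = score_record(in_title, p_mod)
score_description = score_record(in_description, p_mod)
score_none = score_record(no_hit, p_mod)
```

With these assumed weights, a record that mentions the term in its title is rated higher than one that mentions it only in the description, and a record without any subset term receives no contribution.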
- the invention also relates to a computer program product with program codes for carrying out the method described above when the program code is executed by a computer.
- the computer program product can be any storage medium for computer software.
- the computer-executable search engine comprises a receiving unit for receiving a search request with at least one search term.
- the search request may be received over a network, such as the Internet.
- the search engine comprises means for determining a subset with terms that are similar or identical to the search term. This subset is determined in particular from a reference quantity.
- the search engine comprises means for determining a modified probability for each term of the subset.
- These means for determining the modified probability are designed so that a similarity measure of the respective term to the search term can be determined, the probability of occurrence of the term can be determined, a term-dependent weighting distribution can be applied to the terms of the subset, the terms that have a higher similarity measure to the search term being weighted more heavily than terms with a lower similarity measure, and the modified probability for the term can be determined from the weighted probabilities of the terms of the subset.
- the search engine comprises a rating unit for evaluating data records with regard to their relevance to the search query.
- it can be checked by means of this evaluation unit whether the terms of the subset occur in the data record, and if a term of the subset occurs in the data record, a lower modified probability of the term leads to a higher rating of the data record.
- the search engine according to the invention comprises an output unit for outputting a data set subset as a function of its relevance score.
- the search engine according to the invention is in particular designed so that it can carry out the method described above. It therefore also has the same advantages as the method mentioned above.
- The search engine comprises, in particular, a memory in which the reference set with terms, or a set of terms that occur in the data records, and the probabilities associated with the terms are stored. The probabilities result in particular from the frequency of occurrence of the terms in the reference set or in the data records that are to be searched.
- Figure 1 shows schematically the basic structure of the search engine according to a
- Figure 2 shows the steps in carrying out an embodiment of the method according to the invention.
- the exemplary embodiment described below relates to the search in a product database D.
- For each product, a data record d
- may in turn be subdivided into several fields, which may relate, for example, to the price of the product, the color of the product, the material of the product or other relevant features of the product.
- the product database D is provided to a user in connection with an online store.
- the user can use his computer 3 to access a website via the Internet 2, which is provided by a central unit 1 of the online shop.
- The user can use his computer 3 to submit a search query Q to the online store via the Internet 2; the query is received by a receiving unit 4 of the central unit 1 of the online shop.
- The receiving unit 4 transmits the search request Q to a device 5 for determining a subset V with terms that are similar or identical to a search term q i of the search query Q.
- The central unit 1 is coupled to a memory 11.
- This memory 11 can contain the product database D on the one hand.
- Furthermore, the memory 11 contains a reference set T with terms t.
- The reference set T is, for example, a word database that contains essentially all the words of one or more languages, or all the words that can occur in a product database.
- the terms t in this case are therefore words in particular.
- A probability p j is stored in the memory 11 for each term t j .
- This probability p j of a term t j indicates how likely it is that this term t j occurs in a set of terms.
- these probabilities p j can be derived from the frequencies with which a particular word occurs in texts of a particular language. These occurrence frequencies are known per se and can be stored in advance in the memory 11. Alternatively, it could be determined how often a particular term t j occurs in the database D. From this frequency of occurrence, the probability p j could then be determined for the term t j to occur in the database D.
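Deriving the probabilities from the database itself can be sketched as a relative-frequency count. The toy records and the function name `term_probabilities` are hypothetical; the sketch assumes that each record has already been split into terms.

```python
from collections import Counter


def term_probabilities(records):
    """Estimate p_j as the relative frequency of each term over all records."""
    counts = Counter(term for record in records for term in record)
    total = sum(counts.values())
    return {term: n / total for term, n in counts.items()}


# Hypothetical toy database D: each record is a list of terms.
records = [
    ["blue", "shirt", "cotton"],
    ["red", "shirt"],
    ["blue", "trousers", "cotton", "shirt", "blue"],
]

# Computed once in advance and kept in memory for later lookups.
p = term_probabilities(records)
```

The resulting dictionary plays the role of the stored probabilities: it is built once and only read when a query is processed.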
- a similarity measure for the respective term t j is determined for each term t j of the subset V by the device 6.
- The similarity measure indicates how similar the term t j is to a search term q i of the search query Q.
- the device 6 is coupled to a device 7 which can determine the probability p j for the occurrence of the term t j .
- This probability p j can simply be read out of the memory 11 by the device 7; the previously determined probabilities are stored there, as explained above.
- the device 7 is coupled to a device 8 in which a weighting distribution X j dependent on a term t j of the subset V is applicable to all terms t k of the subset V.
- the device 8 can determine modified term probabilities p'' j
- In addition, the device can take into account the number of terms per data record (for example by accessing the memory 11). This results in the modified probability p ' j that a term t j , or an even more similar term, occurs in a data record.
- the modified probabilities p ' j are transmitted by the device 8 to a rating unit 9.
- the evaluation unit 9 evaluates the records d
- The rating unit 9 transfers the relevance score to an output unit 10.
- the output unit may output a certain number of records d
- This output can be made available via the Internet 2 to the user's computer 3, for example by an advertisement on a website displayed on the user's computer 3. Details of the above-described devices of the central unit 1 are described below in connection with the explanation of an embodiment of the method according to the invention:
- the product database D records d
- the product database D may contain, for example, 300 data records.
- The product database D thus comprises the data records d 1 , d 2 , ..., d 300 .
- for a query Q is high even if the product of the probabilities of the words in this record d
- the absolute value of the logarithm of the probability is therefore preferably formed. This absolute amount increases with the relevance and behaves additively to the individual probabilities.
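The additive behavior follows from the logarithm rule for products. As a brief illustration, with p_1 and p_2 as placeholder names for the probabilities of two words (both at most 1, so both logarithms are non-positive):

```latex
\left|\log(p_1 \cdot p_2)\right|
  = \left|\log p_1 + \log p_2\right|
  = \left|\log p_1\right| + \left|\log p_2\right|,
  \qquad 0 < p_1, p_2 \le 1
```

A smaller product of probabilities therefore corresponds to a larger sum of the individual absolute logarithms.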
- FIG. 2 represents a modification of a known method, which considers the inverse occurrence frequency of terms:
- a user generates in step 20 a query Q containing the search terms q i, where i is a natural number.
- For example, the search term q 1 may be "shirt" and the search term q 2 may be "blue".
- A subset V is then determined from the reference set T with terms t that are similar or identical to the first search term q 1 .
- The reference set T may, as explained above, be a word database containing all the words of a language. Methods for automatically evaluating the similarity of two strings are known per se. In this case, the similarity of a search term q i to all terms t of the reference set T is determined in order to determine the subset V. The terms t that lie in a certain similarity range are included in the subset V.
- a method for the automatic evaluation of the similarity is described, for example, in WO 2007/144199 A1, the disclosure content of which is hereby incorporated by reference.
- the subset V can thus contain, for example, three terms t x , t y and t z .
- The following subset V can be determined for the search term "shirt": {shirt, shirts, T-shirt}
- The following subset V can be determined for the second search term "blue": {blue, blue, blue}
- the subset V can contain only elements of the reference set T.
- If the search query Q contains a misspelled word, that word will not be included in the subset V, since it is not included in the reference set T.
- In this way, misspelled words of a search query Q can be sorted out so that they are not given a very high relevance merely because they are very rare.
- Nevertheless, misspelled words are also taken into account in the relevance evaluation of the product database D since, instead of the misspelled word, a subset V is taken into account which contains terms similar to the misspelled word.
- the method according to the invention is fault-tolerant.
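The fault tolerance described above can be sketched as follows. The sketch uses `difflib.SequenceMatcher` from the Python standard library as a stand-in similarity measure; this is a substitution for illustration and not the method of WO 2007/144199 A1. The reference set, the threshold of 0.7 and the function names are assumptions.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Stand-in similarity in [0, 1]; not the measure used by the patent."""
    return SequenceMatcher(None, a, b).ratio()


def similar_subset(search_term, reference_set, threshold=0.7):
    """Return (term, similarity) pairs from the reference set, most similar first.

    Only terms of the reference set can be returned, so a misspelled
    search term never ends up in the subset itself.
    """
    scored = [(t, similarity(search_term, t)) for t in reference_set]
    hits = [(t, s) for t, s in scored if s >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Hypothetical reference set T and a misspelled search term.
reference_set = ["shirt", "shirts", "T-shirt", "skirt", "blue"]
subset = similar_subset("shrit", reference_set)
terms = [t for t, _ in subset]
```

The misspelled query "shrit" is not part of the result, but the correctly spelled "shirt" is, so the subsequent relevance evaluation works with terms that actually occur in the reference set.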
- t x , t y and t z of the subset V are used in the method for relevance evaluation of the records d
- For each term t j of the subset V, a similarity measure A ij is determined which this term t j has relative to the underlying search term q i .
- This similarity measure A ij can also be determined, for example, by means of a method as described in WO 2007/144199 A1.
- The subset V can thus be sorted as a function of the similarity measure A ij of the contained terms t j with respect to a search term q i . If the search term q i is itself contained in the subset V, this term t j of the subset V will have the highest similarity measure A ij , since it is identical to the search term q i . The further terms t j of the subset V follow with decreasing similarity measure A ij .
- the following sorted subset V can result for the search term "blue”: 1. blue, 2. blue, 3. blue
- The probabilities p j of the terms t j of the subset V are then determined. The probability p j is the probability that the term t j is drawn if a term is selected at random from the database D. This is in contrast to the relevance measure described above (inverse document frequency), in which the frequency of the documents, i.e. data records, is considered.
- For this purpose, the previously stored probability p j for the occurrence of the term t j of the subset V, i.e. for example in certain texts or in the data records, is read out from the memory 11.
- the word “blue” occurs with a probability of 0.02
- the word “blue” occurs with a probability of 0.01
- the word “blue” also occurs with a probability of 0.01
- a weighting distribution X j is now applied for each term t j of the subset V.
- the type of weighting distribution X j is dependent on the term t j of the subset V considered.
- The weighting distribution X j is a step function which outputs the weight 1 for the term t j under consideration, as well as for those other terms t k of the subset V that have a higher similarity measure A ik than the term t j under consideration, and the weight 0 for all other terms.
- The weighting distribution X j thus acts in this case as a filter that filters out all the terms t k of the subset V which have a lower similarity measure A ik than the considered term t j .
- A modified term probability p'' j is then determined in step 25, i.e. the modified probability p ' j is determined for the term t j .
- The modified term probability p'' x for the word "blue" is 0.02 (probability for the word "blue")
- The modified term probability p'' y for the word "blue" is 0.03 (probability for the word "Blue" or "blue")
- The modified term probability p'' z for the word "blue" is 0.04 (probability for the word "blue", "blue" or "blue").
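The step-function weighting and the three values above can be reproduced with a short sketch. The term labels t_x, t_y, t_z stand for the three terms of the subset; the similarity values are hypothetical and only fix the order of the terms, while the probabilities 0.02, 0.01 and 0.01 are those of the example. Treating equally similar terms like more similar ones is an assumption.

```python
def step_weight(sim_k, sim_j):
    """Weight 1 if term k is at least as similar as term j, otherwise 0."""
    return 1.0 if sim_k >= sim_j else 0.0


def modified_term_probabilities(subset):
    """subset: list of (term, similarity, probability) tuples.

    For each term j, sum the probabilities of the term itself and of all
    terms of the subset that are at least as similar to the search term.
    """
    result = {}
    for term_j, sim_j, _ in subset:
        result[term_j] = sum(
            step_weight(sim_k, sim_j) * p_k for _, sim_k, p_k in subset
        )
    return result


# Similarities are assumed values; probabilities come from the example.
subset = [("t_x", 1.00, 0.02), ("t_y", 0.90, 0.01), ("t_z", 0.80, 0.01)]
p_mod = modified_term_probabilities(subset)
```

The least similar term t_z accumulates the probabilities of all three terms, which is why it ends up with the largest modified probability and hence the lowest relevance contribution.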
- Step 26: For the evaluation of data records, it makes sense to calculate the probability that a data record contains a term. It is therefore useful to consider the number of terms per data record. This number has a distribution that can be determined and stored in advance. For example, the average number of terms per record can be determined, but a precise calculation is also possible. For the example with 300 data records, consider the case that 150 of them have 5 terms and the other 150 have 10 terms. The probability that a combination of 5 terms contains a given term can be calculated as $1 - (1 - p''_j)^5$. The expression in parentheses indicates the probability that a single term is not, for example, 'blue'. The 5th power then gives the probability that a combination of 5 terms does not contain the term 'blue'.
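The calculation in step 26 can be sketched for the example of 300 records. The function names are hypothetical, and averaging over the record-length distribution is one assumed way of carrying out the "precise calculation"; the description does not spell out how the two record lengths are combined.

```python
def record_probability(p_term, n_terms):
    """Probability that a record with n_terms terms contains the term at least once."""
    return 1.0 - (1.0 - p_term) ** n_terms


def expected_record_probability(p_term, length_distribution):
    """Average record_probability over a distribution {record length: record count}."""
    total = sum(length_distribution.values())
    return sum(
        count / total * record_probability(p_term, n)
        for n, count in length_distribution.items()
    )


# Example from the text: 150 records with 5 terms, 150 records with 10 terms.
lengths = {5: 150, 10: 150}
p_term = 0.02  # modified term probability of the first term in the example

p5 = record_probability(p_term, 5)
p10 = record_probability(p_term, 10)
p_avg = expected_record_probability(p_term, lengths)
```

As expected, the per-record probability is larger than the per-term probability and grows with the number of terms in the record.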
- A list is generated in step 28 with the data records d u , d v , d w , ..., which have the highest relevance ratings. This list is then output in the order of the relevance ratings.
- the second embodiment differs from the first embodiment described above in the weighting distribution X.
- The weighting distribution X j is a sigmoid function.
- The sigmoid function results in a continuous transition between the two values 0 and 1. This ensures that terms t k of the subset V which have a smaller similarity measure A ik , but whose similarity measures are very close to that of the term t j for which the modified probability p ' j is determined, are not disregarded as in the first embodiment but are still considered with a lower weighting.
- The weighting with which the probability p k of a second term t k is included in the modified term probability p'' j of a first term t j is determined by evaluating a sigmoid function, the evaluation point being the similarity measure A ik of the second term t k minus the similarity measure A ij of the first term t j .
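The sigmoid weighting can be sketched with the cumulative normal distribution as one possible sigmoid. The width `sigma`, the similarity values and the function names are assumptions. Note that with an unshifted sigmoid the term itself (difference 0) receives the weight 0.5; whether the function is shifted or rescaled in the actual method is not stated here.

```python
import math


def normal_cdf(x, sigma=0.05):
    """Cumulative normal distribution with mean 0, used here as the sigmoid."""
    return 0.5 * (1.0 + math.erf(x / (sigma * math.sqrt(2.0))))


def sigmoid_modified_probabilities(subset, sigma=0.05):
    """subset: list of (term, similarity, probability) tuples.

    The weight of term k in the modified probability of term j is the
    sigmoid evaluated at (similarity of k) - (similarity of j).
    """
    result = {}
    for term_j, sim_j, _ in subset:
        result[term_j] = sum(
            normal_cdf(sim_k - sim_j, sigma) * p_k for _, sim_k, p_k in subset
        )
    return result


# Hypothetical similarities and probabilities for three terms.
subset = [("a", 1.0, 0.02), ("b", 0.9, 0.01), ("c", 0.5, 0.01)]
p_mod = sigmoid_modified_probabilities(subset)
```

Compared with the step function, a term that is only slightly less similar than the considered term still contributes a small share of its probability, while a clearly less similar term contributes almost nothing.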
- this is explained using the example of the search term (q i) "sympathy”.
- the device 5 has determined a number of similar terms (V) for this purpose, and the devices 6 and 7 (steps 22, 23) have determined the associated similarities and term probabilities.
- the device 8 (step 24) now determines the weighting distribution using a sigmoid function.
- One possible such function is the cumulative Gaussian normal distribution.
- the similarity differences (with the associated weighting), as calculated by the device 8, are shown:
- the method steps described above can be implemented as hardware components or as software.
- the software may be stored on a data carrier, ie on a computer program product.
- the program code contained in the software is suitable for carrying out the method described above when the program code is executed by a computer.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Computational Linguistics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Priority Applications (5)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP2013525253A JP5890413B2 (ja) | 2010-08-25 | 2011-08-17 | 多数のデータレコードをサーチする方法及びサーチエンジン |
| RU2013112783/08A RU2013112783A (ru) | 2010-08-25 | 2011-08-17 | Способ поиска в большом количестве наборов данных и поисковая машина |
| BR112013004243A BR112013004243A2 (pt) | 2010-08-25 | 2011-08-17 | processo implementado por computador para a pesquisa em uma multiplicidade de conjuntos de dados, produto de programa de computador com código de programa, e, motor de pesquisa. |
| CN201180040712.0A CN103098052B (zh) | 2010-08-25 | 2011-08-17 | 用于搜索多个数据记录的方法和搜索引擎 |
| US13/818,180 US9087119B2 (en) | 2010-08-25 | 2011-08-17 | Method for searching in a plurality of data sets and search engine |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| EP10174043.9 | 2010-08-25 | ||
| EP20100174043 EP2423830A1 (de) | 2010-08-25 | 2010-08-25 | Verfahren zum Suchen in einer Vielzahl von Datensätzen und Suchmaschine |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2012025439A1 true WO2012025439A1 (de) | 2012-03-01 |
Family
ID=42791041
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/EP2011/064163 Ceased WO2012025439A1 (de) | 2010-08-25 | 2011-08-17 | Verfahren zum suchen in einer vielzahl von datensätzen und suchmaschine |
Country Status (7)
| Country | Link |
|---|---|
| US (1) | US9087119B2 (de) |
| EP (1) | EP2423830A1 (de) |
| JP (1) | JP5890413B2 (de) |
| CN (1) | CN103098052B (de) |
| BR (1) | BR112013004243A2 (de) |
| RU (1) | RU2013112783A (de) |
| WO (1) | WO2012025439A1 (de) |
Cited By (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| EP3089097A1 (de) | 2015-04-28 | 2016-11-02 | Omikron Data Quality GmbH | Verfahren zum erzeugen von prioritätsdaten für produkte |
Families Citing this family (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US9335885B1 (en) | 2011-10-01 | 2016-05-10 | BioFortis, Inc. | Generating user interface for viewing data records |
Citations (14)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| EP0752676A2 (de) * | 1995-07-07 | 1997-01-08 | Sun Microsystems, Inc. | Method and apparatus for generating query responses in a computer-based document retrieval system |
| EP1072982A2 (de) * | 1999-07-30 | 2001-01-31 | Matsushita Electric Industrial Co., Ltd. | Method and system for extracting similar words and retrieving documents |
| EP1095326B1 (de) | 1998-07-10 | 2002-01-30 | Fast Search & Transfer ASA | A search system and method for retrieving data, and its use in a search engine |
| EP1622054A1 (de) | 2004-07-26 | 2006-02-01 | Google, Inc. | Phrase-based searching in an information retrieval system |
| EP1341009B1 (de) | 1996-09-24 | 2006-04-19 | Seiko Epson Corporation | Illumination device and display device using it |
| US20060253427A1 (en) * | 2005-05-04 | 2006-11-09 | Jun Wu | Suggesting and refining user input based on original user input |
| EP1459206B1 (de) | 2001-12-20 | 2007-07-11 | Endeca Technologies, Inc. | Method and device for similarity search and clustering |
| WO2007144199A1 (de) | 2006-06-16 | 2007-12-21 | Omikron Data Quality Gmbh | Method for automatically evaluating the similarity of two character strings stored in a computer |
| US20080077570A1 (en) * | 2004-10-25 | 2008-03-27 | Infovell, Inc. | Full Text Query and Search Systems and Method of Use |
| US20080082511A1 (en) * | 2006-08-31 | 2008-04-03 | Williams Frank J | Methods for providing, displaying and suggesting results involving synonyms, similarities and others |
| US20080114721A1 (en) * | 2006-11-15 | 2008-05-15 | Rosie Jones | System and method for generating substitutable queries on the basis of one or more features |
| WO2008085637A2 (en) | 2007-01-05 | 2008-07-17 | Yahoo! Inc. | Clustered search processing |
| WO2008137395A1 (en) | 2007-05-02 | 2008-11-13 | Yahoo! Inc. | Enabling clustered search processing via text messaging |
| EP1208465B1 (de) | 1999-05-10 | 2009-08-12 | Fast Search & Transfer ASA | Search engine with two-dimensional, linearly scalable parallel architecture |
Family Cites Families (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP2001134588A (ja) * | 1999-11-04 | 2001-05-18 | Ricoh Co Ltd | Document retrieval device |
| JP2005309760A (ja) * | 2004-04-21 | 2005-11-04 | Nippon Telegr & Teleph Corp <Ntt> | Search term ranking calculation method, device, and program |
| CN101535945A (zh) * | 2006-04-25 | 2009-09-16 | 英孚威尔公司 | Full text query and search systems and method of use |
| KR100931025B1 (ko) * | 2008-03-18 | 2009-12-10 | 한국과학기술원 | Query expansion method using additional terms to improve precision without degrading recall |
| Date | Country | Application | Publication | Status |
|---|---|---|---|---|
| 2010-08-25 | EP | EP20100174043 | EP2423830A1 (de) | Not active (Ceased) |
| 2011-08-17 | BR | BR112013004243A | BR112013004243A2 (pt) | Not active (IP Right Cessation) |
| 2011-08-17 | JP | JP2013525253A | JP5890413B2 (ja) | Not active (Expired - Fee Related) |
| 2011-08-17 | US | US13/818,180 | US9087119B2 (en) | Active |
| 2011-08-17 | RU | RU2013112783/08A | RU2013112783A (ru) | Unknown |
| 2011-08-17 | CN | CN201180040712.0A | CN103098052B (zh) | Not active (Expired - Fee Related) |
| 2011-08-17 | WO | PCT/EP2011/064163 | WO2012025439A1 (de) | Not active (Ceased) |
Non-Patent Citations (3)
| Title |
|---|
| Oh-Woog Kwon et al.: "Query expansion using domain-adapted, weighted thesaurus in an extended Boolean model", CIKM 94: Proceedings of the Third International Conference on Information and Knowledge Management, ACM, New York, NY, USA, 1994, pages 140-146, XP002603538, ISBN: 0-89791-674-3 * |
| Tuan-Quang Nguyen et al.: "Query expansion using augmented terms in an extended Boolean model", Journal of Computing Science and Engineering, Korean Institute of Information Scientists and Engineers, South Korea, vol. 2, no. 1, March 2008 (2008-03-01), pages 26-43, XP002603537 |
| Tuan-Quang Nguyen et al.: "Query expansion using augmented terms in an extended Boolean model", Journal of Computing Science and Engineering, Korean Institute of Information Scientists and Engineers, South Korea, vol. 2, no. 1, March 2008 (2008-03-01), pages 26-43, XP002603537, ISSN: 1976-4677 * |
Cited By (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| EP3089097A1 (de) | 2015-04-28 | 2016-11-02 | Omikron Data Quality GmbH | Method for generating priority data for products |
| WO2016174142A1 (de) | 2015-04-28 | 2016-11-03 | Omikron Data Quality Gmbh | Method for generating priority data for products |
| US11170428B2 (en) | 2015-04-28 | 2021-11-09 | Omikron Data Quality Gmbh | Method for generating priority data for products |
Also Published As
| Publication number | Publication date |
|---|---|
| US20130151499A1 (en) | 2013-06-13 |
| CN103098052A (zh) | 2013-05-08 |
| JP5890413B2 (ja) | 2016-03-22 |
| RU2013112783A (ru) | 2014-09-27 |
| US9087119B2 (en) | 2015-07-21 |
| BR112013004243A2 (pt) | 2016-07-26 |
| EP2423830A1 (de) | 2012-02-29 |
| CN103098052B (zh) | 2017-05-24 |
| JP2013536519A (ja) | 2013-09-19 |
Similar Documents
| Publication | Title | Publication Date |
|---|---|---|
| DE69904588T2 (de) | Database access tool | |
| DE69230814T2 (de) | Database retrieval system for answering natural-language questions with associated tables | |
| DE69811066T2 (de) | Data summarization device | |
| DE69426541T2 (de) | Document detection system that presents the detection result to aid the user's understanding | |
| WO2021032824A1 (de) | Method and device for preselecting and determining similar documents | |
| DE102024206129A1 (de) | Method and device for root cause analysis of product defects based on a knowledge graph | |
| DE102013209868A1 (de) | Querying and integrating structured and unstructured data | |
| DE102012221251A1 (de) | Semantic and contextual searching of knowledge stores | |
| EP3948577B1 (de) | Automated machine learning based on stored data | |
| DE112010004914B4 (de) | Indexing documents | |
| DE112021001743T5 (de) | Vector embedding models for relational tables with null or equivalent values | |
| DE112018002626T5 (de) | Methods and systems for optimized visual summarization of sequences with time-related event data | |
| CH712988A1 (de) | Method for searching data to prevent data loss | |
| DE112010002620T5 (de) | Using ontology to order data records by relevance | |
| DE102012025350A1 (de) | Processing of an electronic document | |
| WO2012025439A1 (de) | Method for searching in a plurality of data sets and search engine | |
| DE10063514A1 (de) | Use of a stored procedure to access index configuration data in a remote database management system | |
| DE69129681T2 (de) | Method and apparatus for interpreting and organizing timing specification data | |
| DE112019006523T5 (de) | Sentence structure vectorization device, sentence structure vectorization method, and sentence structure vectorization program | |
| DE102022128157A1 (de) | Computer-implemented method for standardizing part names | |
| DE112022006882T5 (de) | Learning device, management sheet creation support device, program, learning method, and management sheet creation support method | |
| DE10221178A1 (de) | Method for generating pages in a markup language for selecting products, and software tool | |
| DE10160920B4 (de) | Method and device for generating an extract of documents | |
| DE102016217191A1 (de) | Method for selecting and evaluating a plurality of data records from at least one data source | |
| DE112014002696T5 (de) | Method and system for efficient sorting in a relational database | |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | WWE | WIPO information: entry into national phase | Ref document number: 201180040712.0; Country of ref document: CN |
| | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 11748921; Country of ref document: EP; Kind code of ref document: A1 |
| | WWE | WIPO information: entry into national phase | Ref document number: 13818180; Country of ref document: US |
| | ENP | Entry into the national phase | Ref document number: 2013525253; Country of ref document: JP; Kind code of ref document: A |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | ENP | Entry into the national phase | Ref document number: 2013112783; Country of ref document: RU; Kind code of ref document: A |
| | 122 | EP: PCT application non-entry in European phase | Ref document number: 11748921; Country of ref document: EP; Kind code of ref document: A1 |
| | REG | Reference to national code | Ref country code: BR; Ref legal event code: B01A; Ref document number: 112013004243; Country of ref document: BR |
| | ENP | Entry into the national phase | Ref document number: 112013004243; Country of ref document: BR; Kind code of ref document: A2; Effective date: 20130222 |


