JP4141460B2 - Automated taxonomy generation - Google Patents
Automated taxonomy generation
- Publication number
- JP4141460B2 (application JP2005184985A)
- Authority
- JP
- Japan
- Prior art keywords
- document
- node
- term
- terms
- training
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/367—Ontology
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10—TECHNICAL SUBJECTS COVERED BY FORMER USPC
- Y10S—TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10S707/00—Data processing: database and file management or data structures
- Y10S707/99931—Database or file accessing
- Y10S707/99933—Query processing, i.e. searching
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10—TECHNICAL SUBJECTS COVERED BY FORMER USPC
- Y10S—TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10S707/00—Data processing: database and file management or data structures
- Y10S707/99931—Database or file accessing
- Y10S707/99933—Query processing, i.e. searching
- Y10S707/99935—Query augmenting and refining, e.g. inexact access
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10—TECHNICAL SUBJECTS COVERED BY FORMER USPC
- Y10S—TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10S707/00—Data processing: database and file management or data structures
- Y10S707/99941—Database schema or data structure
- Y10S707/99942—Manipulating data structure, e.g. compression, compaction, compilation
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10—TECHNICAL SUBJECTS COVERED BY FORMER USPC
- Y10S—TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10S707/00—Data processing: database and file management or data structures
- Y10S707/99941—Database schema or data structure
- Y10S707/99943—Generating database or data structure, e.g. via user interface
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Life Sciences & Earth Sciences (AREA)
- Animal Behavior & Ethology (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Description
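$$L = \sum_{i} \sum_{k} \sum_{j} n(d_i, w_{jk}) \, \log P(d_i, w_{jk})$$
In the above formula, $n(d_i, w_{jk})$ is the number of occurrences of term $w_j$ in document $d_i$ at node $k$, and $P(d_i, w_{jk})$ is the probability that term $w_j$ of node $k$ appears in document $d_i$, based on the probability of the term appearing in any document. The term probabilities associated with each node can then be adjusted iteratively to maximize the log likelihood. The maximum may be an absolute maximum or a relative maximum. The resulting term probabilities are stored in vectors 340, 350 of FIG. 3 and associated with the respective child nodes in the data store. In this way, each of the two child nodes (or parent nodes 152, 153 of FIG. 1) is associated with a list of training terms (term vector 320) and with the respective probability of each term appearing in a document (term probability vectors 340, 350), optimized to maximize the log likelihood that the set of training documents is formed from the terms of each child node.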
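As a rough illustration of the log likelihood described above, the following Python sketch evaluates the triple sum for a small corpus. The function name `log_likelihood` and the nested-list layout of the counts and probabilities are assumptions made for this example; the text does not spell out how the probabilities are computed or updated, so they are simply passed in as inputs here.

```python
import math

def log_likelihood(counts, probs):
    """Sum n(d_i, w_jk) * log P(d_i, w_jk) over documents i, nodes k, terms j.

    counts[i][k][j] -- occurrences of term j in document i at node k
    probs[i][k][j]  -- probability of term j of node k appearing in document i
    Both layouts are hypothetical. Entries with a zero count contribute nothing.
    """
    total = 0.0
    for doc_counts, doc_probs in zip(counts, probs):
        for node_counts, node_probs in zip(doc_counts, doc_probs):
            for n, p in zip(node_counts, node_probs):
                if n > 0:
                    total += n * math.log(p)
    return total

# Toy input: one document, two nodes, two terms (made-up numbers).
counts = [[[2, 0], [0, 1]]]
probs = [[[0.5, 0.5], [0.25, 0.75]]]
print(log_likelihood(counts, probs))
```

An iterative procedure such as expectation-maximization would call a function like this after each update of the probabilities and stop once the value no longer increases, which corresponds to reaching an absolute or relative maximum.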
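$$S_j = \sum_{i} P(w_i) \, \log \frac{P(w_i)}{Z_j(w_i)}$$
where $S_j$ is the KL divergence, $P(w_i)$ is the probability that term $w_i$ is found in a given document, and $Z_j(w_i)$ is the probability that term $w_i$ is found at node $j$. It should be understood that other suitable, systematically defined distance or similarity measures, including symmetric versions of the above formula, may also be used.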
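The KL divergence between a document and a node can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the helper names (`term_distribution`, `kl_divergence`, `symmetric_kl`) are hypothetical, the document probabilities are estimated as relative term frequencies, and a small floor `eps` guards against a zero node probability. None of these choices is specified in the text.

```python
import math
from collections import Counter

def term_distribution(tokens, vocabulary):
    """Estimate P(w_i) for one document as relative frequency over the vocabulary."""
    counts = Counter(t for t in tokens if t in vocabulary)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("document shares no terms with the vocabulary")
    return {w: counts[w] / total for w in vocabulary}

def kl_divergence(p, z, eps=1e-12):
    """S_j = sum_i P(w_i) * log(P(w_i) / Z_j(w_i)); terms with P(w_i) = 0 add nothing."""
    return sum(pw * math.log(pw / max(z.get(w, 0.0), eps))
               for w, pw in p.items() if pw > 0)

def symmetric_kl(p, z, eps=1e-12):
    """One possible symmetric version: KL(p || z) + KL(z || p)."""
    return kl_divergence(p, z, eps) + kl_divergence(z, p, eps)

# Toy usage with made-up terms and node probabilities.
vocabulary = ["tree", "node"]
doc = term_distribution(["tree", "tree", "node", "other"], vocabulary)
node_probs = {"tree": 0.5, "node": 0.5}
print(kl_divergence(doc, node_probs), symmetric_kl(doc, node_probs))
```

A smaller value indicates that the document's term distribution is closer to the node's, so the value can be used as the distance when comparing a document against several nodes.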
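210 Training documents
220 Tree generator
240 Document sorter
242 Document
260 Information retrieval system
310 Term generator
320 Term vector
330 Node generator
360 Document assigner
362, 364, 366 Document sets
370 Tree manager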
Claims (16)
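- A computer recording medium having computer-executable components, the components including a node generator, a document assigner, a tree manager, and a document sorter, wherein
(a) the node generator is configured to receive a list of training terms based on a set of training documents, to generate a first sibling node including a first set of probabilities, and to generate a second sibling node including a second set of probabilities, the first set of probabilities including, for each term in the list of training terms, a probability that the term appears in a document, the second set of probabilities including, for each term in the list of training terms, a probability that the term appears in a document, the first and second sibling nodes being generated by splitting a parent node, and the node generator determining the first and second sets of probabilities based on an expectation-maximization algorithm that maximizes the likelihood over all of the training documents associated with the first and second nodes;
(b) the document assigner is configured to associate, based on the first and second sets of probabilities, each document of the set of training documents with at least one of the group consisting of the first sibling node, the second sibling node, and a null set, the documents associated with the first sibling node forming a first document set and the documents associated with the second sibling node forming a second document set;
(c) the tree manager is configured to communicate at least one of the first document set and the second document set to the node generator, so as to create, based on recursive execution of the node generator and the document assigner, a binary tree data structure including a hierarchy of a plurality of sibling nodes; and
(d) the document sorter is configured to associate a new document with at least one of the plurality of sibling nodes based on each set of probabilities associated with the nodes, the tree manager storing the binary tree data structure for access by the document sorter.
- The computer recording medium of claim 1, wherein the document sorter compares the statistical distances between the new document and each of the first and second sibling nodes.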
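- The computer recording medium of claim 1, further including a term generator configured to receive the set of training documents and to generate the list of training terms based on terms appearing in at least a portion of the documents in the set of training documents.
- The computer recording medium of claim 3, wherein the term generator generates the list of training terms based on the frequency of occurrence of the terms appearing in at least a portion of the documents.
- The computer recording medium of claim 3, wherein the term generator takes into account a predetermined list of exclusion terms.
- The computer recording medium of claim 1, wherein the document assigner determines a statistical distance value between each document of the set of training documents and each of the first node and the second node.
- The computer recording medium of claim 6, wherein the document assigner associates a document of the set of training documents with the first node if the determined distance value between the document and the first node is below a predetermined threshold.
- The computer recording medium of claim 6, wherein the distance value is a KL divergence value.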
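- A computer-implemented method, comprising:
(a) creating a binary taxonomy tree based on a set of training documents, wherein each node of the binary taxonomy tree is associated with a list of terms, each term in each list of terms is associated with a probability that the term appears in a document given the node, a root node is created first and is then used to create child nodes of the binary taxonomy tree, the child nodes are created by splitting each parent node, and the binary taxonomy tree is stored for associating new documents with nodes of the binary taxonomy tree, the creating of the binary taxonomy tree including determining each probability of the terms appearing in a document based on an expectation-maximization algorithm that maximizes the likelihood that each document in the set of training documents is generated by the lists of terms associated with each of two sibling nodes of the binary taxonomy tree; and
(b) associating a new document with at least one node of the binary tree based on a distance value between the document and the node.
- The method of claim 9, wherein the distance value is determined based on KL divergence.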
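- The method of claim 10, wherein the new document is associated with a node having a KL divergence below a distance threshold.
- The method of claim 10, wherein associating the new document includes associating the new document with a node that lies on a path and has the smallest KL divergence along the path.
- The method of claim 9, wherein creating the binary taxonomy tree includes determining each list of terms associated with a node based on the list of terms associated with the parent node of that node.
- The method of claim 9, wherein creating the binary taxonomy tree includes associating at least a portion of the set of training documents with at least one of a first child node, a second child node, and a null set.
- The method of claim 14, wherein associating at least a portion of the training documents is based on each probability with which each term is associated with the first child node and each probability with which each term is associated with the second child node.
- A computer program having program code for causing a computer to execute all of the steps of the method of any one of claims 9 to 15.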
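The assignment and sorting steps recited in the claims can be sketched in Python as follows. This is an illustrative reading under assumptions, not the claimed implementation: the `Node` class, the function names, the greedy descent toward the closer child, and the toy probabilities are all hypothetical. The term probabilities are taken as given here, whereas the claims determine them with an expectation-maximization algorithm.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

@dataclass
class Node:
    name: str
    term_probs: Dict[str, float]          # probability of each term at this node
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def kl(p: Dict[str, float], z: Dict[str, float], eps: float = 1e-12) -> float:
    """KL divergence of document distribution p from node distribution z."""
    return sum(pw * math.log(pw / max(z.get(w, 0.0), eps))
               for w, pw in p.items() if pw > 0)

def assign(doc: Dict[str, float], first: Node, second: Node,
           threshold: float) -> List[str]:
    """Return the sibling(s) whose distance is below the threshold; [] is the null set."""
    return [n.name for n in (first, second) if kl(doc, n.term_probs) < threshold]

def sort_document(root: Node, doc: Dict[str, float]) -> Tuple[str, float]:
    """Descend toward the closer child; return the node with the smallest KL on that path."""
    node = root
    best = (root.name, kl(doc, root.term_probs))
    while node.left is not None and node.right is not None:
        dl, dr = kl(doc, node.left.term_probs), kl(doc, node.right.term_probs)
        node, d = (node.left, dl) if dl <= dr else (node.right, dr)
        if d < best[1]:
            best = (node.name, d)
    return best

# Toy tree with made-up term probabilities.
sports = Node("sports", {"ball": 0.9, "vote": 0.1})
politics = Node("politics", {"ball": 0.1, "vote": 0.9})
root = Node("root", {"ball": 0.5, "vote": 0.5}, sports, politics)
doc = {"ball": 0.8, "vote": 0.2}
print(assign(doc, sports, politics, threshold=0.5))  # ['sports']
print(sort_document(root, doc)[0])                   # sports
```

In this sketch a training document that is not close enough to either sibling falls into the null set (an empty list), and a new document is attached to whichever node along its descent path has the smallest divergence.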
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US10/881,893 US7266548B2 (en) | 2004-06-30 | 2004-06-30 | Automated taxonomy generation |
Publications (3)
| Publication Number | Publication Date |
|---|---|
| JP2006018829A | 2006-01-19 |
| JP2006018829A5 | 2008-02-14 |
| JP4141460B2 | 2008-08-27 |
Family
ID=35063178
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP2005184985A (JP4141460B2, Expired - Fee Related) | 2004-06-30 | 2005-06-24 | Automated taxonomy generation |
Country Status (8)
| Country | Link |
|---|---|
| US (1) | US7266548B2 (ja) |
| EP (1) | EP1612701A3 (ja) |
| JP (1) | JP4141460B2 (ja) |
| KR (1) | KR20060048583A (ja) |
| CN (1) | CN1716256A (ja) |
| BR (1) | BRPI0502591A (ja) |
| CA (1) | CA2510761A1 (ja) |
| MX (1) | MXPA05007136A (ja) |
Families Citing this family (90)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US7836083B2 (en) * | 2004-02-20 | 2010-11-16 | Factiva, Inc. | Intelligent search and retrieval system and method |
| US7698333B2 (en) * | 2004-07-22 | 2010-04-13 | Factiva, Inc. | Intelligent query system and method using phrase-code frequency-inverse phrase-code document frequency module |
| US20080077570A1 (en) * | 2004-10-25 | 2008-03-27 | Infovell, Inc. | Full Text Query and Search Systems and Method of Use |
| WO2006047654A2 (en) * | 2004-10-25 | 2006-05-04 | Yuanhua Tang | Full text query and search systems and methods of use |
| US7814105B2 (en) * | 2004-10-27 | 2010-10-12 | Harris Corporation | Method for domain identification of documents in a document database |
| US7457808B2 (en) * | 2004-12-17 | 2008-11-25 | Xerox Corporation | Method and apparatus for explaining categorization decisions |
| JP2008537225A (ja) * | 2005-04-11 | 2008-09-11 | TextDigger, Inc. | Search system and method for queries |
| CA2545232A1 (en) * | 2005-07-29 | 2007-01-29 | Cognos Incorporated | Method and system for creating a taxonomy from business-oriented metadata content |
| CA2545237A1 (en) * | 2005-07-29 | 2007-01-29 | Cognos Incorporated | Method and system for managing exemplar terms database for business-oriented metadata content |
| US8023739B2 (en) * | 2005-09-27 | 2011-09-20 | Battelle Memorial Institute | Processes, data structures, and apparatuses for representing knowledge |
| US8600997B2 (en) * | 2005-09-30 | 2013-12-03 | International Business Machines Corporation | Method and framework to support indexing and searching taxonomies in large scale full text indexes |
| WO2007081681A2 (en) | 2006-01-03 | 2007-07-19 | Textdigger, Inc. | Search system with query refinement and search method |
| US7720848B2 (en) * | 2006-03-29 | 2010-05-18 | Xerox Corporation | Hierarchical clustering with real-time updating |
| WO2007114932A2 (en) | 2006-04-04 | 2007-10-11 | Textdigger, Inc. | Search system and method with text function tagging |
| US8429526B2 (en) * | 2006-04-10 | 2013-04-23 | Oracle International Corporation | Efficient evaluation for diff of XML documents |
| US8055597B2 (en) * | 2006-05-16 | 2011-11-08 | Sony Corporation | Method and system for subspace bounded recursive clustering of categorical data |
| US7844557B2 (en) * | 2006-05-16 | 2010-11-30 | Sony Corporation | Method and system for order invariant clustering of categorical data |
| US7761394B2 (en) * | 2006-05-16 | 2010-07-20 | Sony Corporation | Augmented dataset representation using a taxonomy which accounts for similarity and dissimilarity between each record in the dataset and a user's similarity-biased intuition |
| US7961189B2 (en) * | 2006-05-16 | 2011-06-14 | Sony Corporation | Displaying artists related to an artist of interest |
| US7630946B2 (en) * | 2006-05-16 | 2009-12-08 | Sony Corporation | System for folder classification based on folder content similarity and dissimilarity |
| US7640220B2 (en) * | 2006-05-16 | 2009-12-29 | Sony Corporation | Optimal taxonomy layer selection method |
| US7664718B2 (en) * | 2006-05-16 | 2010-02-16 | Sony Corporation | Method and system for seed based clustering of categorical data using hierarchies |
| US20070271286A1 (en) * | 2006-05-16 | 2007-11-22 | Khemdut Purang | Dimensionality reduction for content category data |
| US20080027971A1 (en) * | 2006-07-28 | 2008-01-31 | Craig Statchuk | Method and system for populating an index corpus to a search engine |
| WO2008039929A1 (en) * | 2006-09-27 | 2008-04-03 | Educational Testing Service | Method and system for xml multi-transform |
| US7496568B2 (en) * | 2006-11-30 | 2009-02-24 | International Business Machines Corporation | Efficient multifaceted search in information retrieval systems |
| KR100837751B1 (ko) * | 2006-12-12 | 2008-06-13 | NHN Corp. | Method for measuring the degree of association between words based on a document set, and system for performing the method |
| US20080140591A1 (en) * | 2006-12-12 | 2008-06-12 | Yahoo! Inc. | System and method for matching objects belonging to hierarchies |
| US7689625B2 (en) * | 2007-01-10 | 2010-03-30 | Microsoft Corporation | Taxonomy object modeling |
| US8104080B2 (en) * | 2007-01-26 | 2012-01-24 | Microsoft Corporation | Universal schema for representing management policy |
| US20080184277A1 (en) * | 2007-01-26 | 2008-07-31 | Microsoft Corporation | Systems management policy validation, distribution and enactment |
| WO2008097891A2 (en) * | 2007-02-02 | 2008-08-14 | Musgrove Technology Enterprises Llc | Method and apparatus for aligning multiple taxonomies |
| US9405819B2 (en) * | 2007-02-07 | 2016-08-02 | Fujitsu Limited | Efficient indexing using compact decision diagrams |
| US20080222515A1 (en) * | 2007-02-26 | 2008-09-11 | Microsoft Corporation | Parameterized types and elements in xml schema |
| US7765241B2 (en) * | 2007-04-20 | 2010-07-27 | Microsoft Corporation | Describing expected entity relationships in a model |
| US7792826B2 (en) * | 2007-05-29 | 2010-09-07 | International Business Machines Corporation | Method and system for providing ranked search results |
| JP5045240B2 (ja) * | 2007-05-29 | 2012-10-10 | Fujitsu Limited | Data division program, recording medium recording the program, data division device, and data division method |
| US20090012984A1 (en) * | 2007-07-02 | 2009-01-08 | Equivio Ltd. | Method for Organizing Large Numbers of Documents |
| US8171029B2 (en) * | 2007-10-05 | 2012-05-01 | Fujitsu Limited | Automatic generation of ontologies using word affinities |
| US9081852B2 (en) * | 2007-10-05 | 2015-07-14 | Fujitsu Limited | Recommending terms to specify ontology space |
| US20090112865A1 (en) * | 2007-10-26 | 2009-04-30 | Vee Erik N | Hierarchical structure entropy measurement methods and systems |
| WO2009059297A1 (en) * | 2007-11-01 | 2009-05-07 | Textdigger, Inc. | Method and apparatus for automated tag generation for digital content |
| US8099430B2 (en) * | 2008-12-18 | 2012-01-17 | International Business Machines Corporation | Computer method and apparatus of information management and navigation |
| US8745028B1 (en) * | 2007-12-27 | 2014-06-03 | Google Inc. | Interpreting adjacent search terms based on a hierarchical relationship |
| US8331699B2 (en) * | 2009-03-16 | 2012-12-11 | Siemens Medical Solutions Usa, Inc. | Hierarchical classifier for data classification |
| WO2011013865A1 (ko) * | 2009-07-30 | 2011-02-03 | Park Soo Min | Apparatus and method for providing contact information, and portable terminal using the same |
| US8600967B2 (en) * | 2010-02-03 | 2013-12-03 | Apple Inc. | Automatic organization of browsing histories |
| US8954440B1 (en) * | 2010-04-09 | 2015-02-10 | Wal-Mart Stores, Inc. | Selectively delivering an article |
| US20110307240A1 (en) * | 2010-06-10 | 2011-12-15 | Microsoft Corporation | Data modeling of multilingual taxonomical hierarchies |
| US20110307243A1 (en) * | 2010-06-10 | 2011-12-15 | Microsoft Corporation | Multilingual runtime rendering of metadata |
| US20120076416A1 (en) * | 2010-09-24 | 2012-03-29 | Castellanos Maria G | Determining correlations between slow stream and fast stream information |
| US8775444B2 (en) * | 2010-10-29 | 2014-07-08 | Xerox Corporation | Generating a subset aggregate document from an existing aggregate document |
| US9852311B1 (en) | 2011-03-08 | 2017-12-26 | Ciphercloud, Inc. | System and method to anonymize data transmitted to a destination computing device |
| US9667741B1 (en) | 2011-03-08 | 2017-05-30 | Ciphercloud, Inc. | System and method to anonymize data transmitted to a destination computing device |
| US9300637B1 (en) * | 2011-03-08 | 2016-03-29 | Ciphercloud, Inc. | System and method to anonymize data transmitted to a destination computing device |
| US8726398B1 (en) | 2011-12-13 | 2014-05-13 | Ciphercloud, Inc. | System and method to anonymize data transmitted to a destination computing device |
| US9356993B1 (en) | 2011-03-08 | 2016-05-31 | Ciphercloud, Inc. | System and method to anonymize data transmitted to a destination computing device |
| US9432342B1 (en) | 2011-03-08 | 2016-08-30 | Ciphercloud, Inc. | System and method to anonymize data transmitted to a destination computing device |
| US9413526B1 (en) | 2011-03-08 | 2016-08-09 | Ciphercloud, Inc. | System and method to anonymize data transmitted to a destination computing device |
| US11228566B1 (en) | 2011-03-08 | 2022-01-18 | Ciphercloud, Inc. | System and method to anonymize data transmitted to a destination computing device |
| US9338220B1 (en) | 2011-03-08 | 2016-05-10 | Ciphercloud, Inc. | System and method to anonymize data transmitted to a destination computing device |
| US8645381B2 (en) * | 2011-06-27 | 2014-02-04 | International Business Machines Corporation | Document taxonomy generation from tag data using user groupings of tags |
| US10372741B2 (en) | 2012-03-02 | 2019-08-06 | Clarabridge, Inc. | Apparatus for automatic theme detection from unstructured data |
| US20130246392A1 (en) * | 2012-03-14 | 2013-09-19 | Inago Inc. | Conversational System and Method of Searching for Information |
| US8874435B2 (en) | 2012-04-17 | 2014-10-28 | International Business Machines Corporation | Automated glossary creation |
| EP2945071B1 (en) * | 2013-01-11 | 2020-08-26 | NEC Corporation | Index generating device and method, and search device and search method |
| GB2517122A (en) * | 2013-04-30 | 2015-02-18 | Giovanni Tummarello | Method and system for navigating complex data sets |
| US9548049B2 (en) * | 2014-02-19 | 2017-01-17 | Honeywell International Inc. | Methods and systems for integration of speech into systems |
| EP3174232A4 (en) * | 2014-07-25 | 2017-08-16 | Sanechips Technology Co., Ltd. | Path detection method and device, and sphere decoding detection device |
| US10002185B2 (en) | 2015-03-25 | 2018-06-19 | International Business Machines Corporation | Context-aware cognitive processing |
| US20160371345A1 (en) * | 2015-06-22 | 2016-12-22 | International Business Machines Corporation | Preprocessing Heterogeneously-Structured Electronic Documents for Data Warehousing |
| CN108604902A (zh) * | 2016-02-08 | 2018-09-28 | Koninklijke Philips N.V. | Device and method for determining clusters |
| US11657077B2 (en) | 2016-03-03 | 2023-05-23 | Rakuten Group, Inc. | Document classification device, document classification method and document classification program |
| US9471836B1 (en) * | 2016-04-01 | 2016-10-18 | Stradvision Korea, Inc. | Method for learning rejector by forming classification tree in use of training images and detecting object in test images, and rejector using the same |
| US11157540B2 (en) * | 2016-09-12 | 2021-10-26 | International Business Machines Corporation | Search space reduction for knowledge graph querying and interactions |
| CN109767546B (zh) * | 2017-01-10 | 2022-02-15 | 中钞印制技术研究院有限公司 | Quality inspection scheduling device and quality inspection scheduling method for valuable bills |
| US11397558B2 (en) | 2017-05-18 | 2022-07-26 | Peloton Interactive, Inc. | Optimizing display engagement in action automation |
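| CN107330021A (zh) * | 2017-06-20 | 2017-11-07 | 北京神州泰岳软件股份有限公司 | Data classification method, apparatus, and device based on a multi-way tree |
| CN111742322A (zh) * | 2017-12-29 | 2020-10-02 | Robert Bosch GmbH | System and method for domain- and language-independent definition extraction using deep neural networks |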
| US10963499B2 (en) | 2017-12-29 | 2021-03-30 | Aiqudo, Inc. | Generating command-specific language model discourses for digital assistant interpretation |
| US10963495B2 (en) * | 2017-12-29 | 2021-03-30 | Aiqudo, Inc. | Automated discourse phrase discovery for generating an improved language model of a digital assistant |
| US10929613B2 (en) | 2017-12-29 | 2021-02-23 | Aiqudo, Inc. | Automated document cluster merging for topic-based digital assistant interpretation |
| US11080341B2 (en) * | 2018-06-29 | 2021-08-03 | International Business Machines Corporation | Systems and methods for generating document variants |
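| JP7360655B2 (ja) * | 2018-08-07 | 2023-10-13 | Nagoya Institute of Technology | Discussion support device and program for discussion support device |
| CN112445900A (zh) * | 2019-08-29 | 2021-03-05 | 上海卓繁信息技术股份有限公司 | Fast retrieval method and system |
| CN111177301B (zh) * | 2019-11-26 | 2023-05-26 | 云南电网有限责任公司昆明供电局 | Method and system for identifying and extracting key information |
| CN111460747B (zh) * | 2020-04-10 | 2023-03-31 | 重庆百瑞互联电子技术有限公司 | Standard cell tracking method for integrated circuit design |
| CN111563097A (zh) * | 2020-04-30 | 2020-08-21 | 广东小天才科技有限公司 | Unsupervised question aggregation method and apparatus, electronic device, and storage medium |
| CN113569012B (zh) * | 2021-07-28 | 2023-12-26 | 卫宁健康科技集团股份有限公司 | Medical data query method, apparatus, device, and storage medium |
| CN116166798A (zh) * | 2022-12-06 | 2023-05-26 | 杭州安恒信息技术股份有限公司 | Text classification model training method, apparatus, and device based on hierarchical softmax |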
Family Cites Families (23)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US5325298A (en) * | 1990-11-07 | 1994-06-28 | Hnc, Inc. | Methods for generating or revising context vectors for a plurality of word stems |
| US5619709A (en) * | 1993-09-20 | 1997-04-08 | Hnc, Inc. | System and method of context vector generation and retrieval |
| US6484149B1 (en) * | 1997-10-10 | 2002-11-19 | Microsoft Corporation | Systems and methods for viewing product information, and methods for generating web pages |
| US6446061B1 (en) * | 1998-07-31 | 2002-09-03 | International Business Machines Corporation | Taxonomy generation for document collections |
| US6360227B1 (en) * | 1999-01-29 | 2002-03-19 | International Business Machines Corporation | System and method for generating taxonomies with applications to content-based recommendations |
| US6678681B1 (en) * | 1999-03-10 | 2004-01-13 | Google Inc. | Information extraction from a database |
| US6438590B1 (en) * | 1999-04-13 | 2002-08-20 | Hewlett-Packard Company | Computer system with preferential naming service |
| US6298340B1 (en) * | 1999-05-14 | 2001-10-02 | International Business Machines Corporation | System and method and computer program for filtering using tree structure |
| US6404752B1 (en) * | 1999-08-27 | 2002-06-11 | International Business Machines Corporation | Network switch using network processor and methods |
| US6615209B1 (en) * | 2000-02-22 | 2003-09-02 | Google, Inc. | Detecting query-specific duplicate documents |
| US6675163B1 (en) * | 2000-04-06 | 2004-01-06 | International Business Machines Corporation | Full match (FM) search algorithm implementation for a network processor |
| US6704729B1 (en) * | 2000-05-19 | 2004-03-09 | Microsoft Corporation | Retrieval of relevant information categories |
| US7136854B2 (en) * | 2000-07-06 | 2006-11-14 | Google, Inc. | Methods and apparatus for providing search results in response to an ambiguous search query |
| US6529903B2 (en) * | 2000-07-06 | 2003-03-04 | Google, Inc. | Methods and apparatus for using a modified index to provide search results in response to an ambiguous search query |
| US20020078091A1 (en) * | 2000-07-25 | 2002-06-20 | Sonny Vu | Automatic summarization of a document |
| US6658423B1 (en) * | 2001-01-24 | 2003-12-02 | Google, Inc. | Detecting duplicate and near-duplicate files |
| US7039803B2 (en) * | 2001-01-26 | 2006-05-02 | International Business Machines Corporation | Method for broadcast encryption and key revocation of stateless receivers |
| US6526440B1 (en) * | 2001-01-30 | 2003-02-25 | Google, Inc. | Ranking search results by reranking the results based on local inter-connectivity |
| US8001118B2 (en) * | 2001-03-02 | 2011-08-16 | Google Inc. | Methods and apparatus for employing usage statistics in document retrieval |
| US7085771B2 (en) * | 2002-05-17 | 2006-08-01 | Verity, Inc | System and method for automatically discovering a hierarchy of concepts from a corpus of documents |
| US7103609B2 (en) * | 2002-10-31 | 2006-09-05 | International Business Machines Corporation | System and method for analyzing usage patterns in information aggregates |
| US7320000B2 (en) * | 2002-12-04 | 2008-01-15 | International Business Machines Corporation | Method and apparatus for populating a predefined concept hierarchy or other hierarchical set of classified data items by minimizing system entrophy |
| US7130777B2 (en) * | 2003-11-26 | 2006-10-31 | International Business Machines Corporation | Method to hierarchical pooling of opinions from multiple sources |
2004
- 2004-06-30 US US10/881,893 (US7266548B2): Expired - Fee Related

2005
- 2005-06-21 EP EP05105453A (EP1612701A3): Withdrawn
- 2005-06-24 JP JP2005184985A (JP4141460B2): Expired - Fee Related
- 2005-06-27 CA CA002510761A (CA2510761A1): Abandoned
- 2005-06-28 KR KR1020050056062A (KR20060048583A): Withdrawn
- 2005-06-28 BR BR0502591-5A (BRPI0502591A): IP Right Cessation
- 2005-06-29 MX MXPA05007136A (MXPA05007136A): Application Discontinuation
- 2005-06-30 CN CNA2005100822558A (CN1716256A): Pending
Also Published As
| Publication number | Publication date |
|---|---|
| MXPA05007136A (es) | 2006-01-11 |
| EP1612701A2 (en) | 2006-01-04 |
| CA2510761A1 (en) | 2005-12-30 |
| BRPI0502591A (pt) | 2006-02-07 |
| EP1612701A3 (en) | 2008-05-21 |
| US20060004747A1 (en) | 2006-01-05 |
| KR20060048583A (ko) | 2006-05-18 |
| CN1716256A (zh) | 2006-01-04 |
| JP2006018829A (ja) | 2006-01-19 |
| US7266548B2 (en) | 2007-09-04 |
Similar Documents
| Publication | Title | Publication Date |
|---|---|---|
| JP4141460B2 (ja) | Automated taxonomy generation | |
| US8407164B2 (en) | Data classification and hierarchical clustering | |
| Dai et al. | Co-clustering based classification for out-of-domain documents | |
| US7711747B2 (en) | Interactive cleaning for automatic document clustering and categorization | |
| US9355171B2 (en) | Clustering of near-duplicate documents | |
| Ruggieri | Efficient C4.5 | |
| Elmagarmid et al. | Duplicate record detection: A survey | |
| KR101114023B1 (ko) | Content propagation for enhanced document retrieval | |
| JP3665480B2 (ja) | Document organizing apparatus and method | |
| US20050021545A1 (en) | Very-large-scale automatic categorizer for Web content | |
| CN119336900B (zh) | Retrieval optimization method based on a hierarchical expert routing model and CoT reasoning | |
| Liu et al. | A novel DBSCAN with entropy and probability for mixed data | |
| Obaid et al. | Semantic Web and Web Page Clustering Algorithms: A Landscape View. | |
| Ming et al. | Filter feature selection methods for text classification: a review | |
| Liao et al. | Mining concept sequences from large-scale search logs for context-aware query suggestion | |
| Moskalenko et al. | Scalable recommendation of wikipedia articles to editors using representation learning | |
| Adami et al. | Bootstrapping for hierarchical document classification | |
| Kumar et al. | Hierarchical topic segmentation of websites | |
| CN100378713C (zh) | Method and apparatus for automatically determining salient features for object classification | |
| Natarajan et al. | Finding structure in noisy text: topic classification and unsupervised clustering | |
| Ghonge et al. | A review on improving the clustering performance in text mining | |
| Vargas Quiros | Information-theoretic anomaly detection and authorship attribution in literature | |
| Malik et al. | Instance driven hierarchical clustering of document collections | |
| O'Driscoll | CITYNET Europe | |
| Divya et al. | Effective document summarization: a hybrid clustering approach using transformer model |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| 2007-12-13 | RD03 | Notification of appointment of power of attorney | JAPANESE INTERMEDIATE CODE: A7423 |
| 2007-12-18 | A521 | Request for written amendment filed | JAPANESE INTERMEDIATE CODE: A523 |
| 2007-12-18 | A621 | Written request for application examination | JAPANESE INTERMEDIATE CODE: A621 |
| 2007-12-18 | A871 | Explanation of circumstances concerning accelerated examination | JAPANESE INTERMEDIATE CODE: A871 |
| 2007-12-26 | RD04 | Notification of resignation of power of attorney | JAPANESE INTERMEDIATE CODE: A7424 |
| 2008-01-23 | A975 | Report on accelerated examination | JAPANESE INTERMEDIATE CODE: A971005 |
| 2008-02-07 | A131 | Notification of reasons for refusal | JAPANESE INTERMEDIATE CODE: A131 |
| 2008-04-21 | A521 | Request for written amendment filed | JAPANESE INTERMEDIATE CODE: A523 |
| | TRDD | Decision of grant or rejection written | |
| 2008-05-16 | A01 | Written decision to grant a patent or to grant a registration (utility model) | JAPANESE INTERMEDIATE CODE: A01 |
| | A01 | Written decision to grant a patent or to grant a registration (utility model) | JAPANESE INTERMEDIATE CODE: A01 |
| 2008-06-10 | A61 | First payment of annual fees (during grant procedure) | JAPANESE INTERMEDIATE CODE: A61 |
| | FPAY | Renewal fee payment (event date is renewal date of database) | PAYMENT UNTIL: 20110620; Year of fee payment: 3 |
| | R150 | Certificate of patent or registration of utility model | Ref document number: 4141460; Country of ref document: JP; JAPANESE INTERMEDIATE CODE: R150 |
| | FPAY | Renewal fee payment (event date is renewal date of database) | PAYMENT UNTIL: 20110620; Year of fee payment: 3 |
| | FPAY | Renewal fee payment (event date is renewal date of database) | PAYMENT UNTIL: 20120620; Year of fee payment: 4 |
| | R250 | Receipt of annual fees | JAPANESE INTERMEDIATE CODE: R250 |
| | FPAY | Renewal fee payment (event date is renewal date of database) | PAYMENT UNTIL: 20120620; Year of fee payment: 4 |
| | FPAY | Renewal fee payment (event date is renewal date of database) | PAYMENT UNTIL: 20130620; Year of fee payment: 5 |
| | R250 | Receipt of annual fees | JAPANESE INTERMEDIATE CODE: R250 |
| | R250 | Receipt of annual fees | JAPANESE INTERMEDIATE CODE: R250 |
| | R250 | Receipt of annual fees | JAPANESE INTERMEDIATE CODE: R250 |
| | S111 | Request for change of ownership or part of ownership | JAPANESE INTERMEDIATE CODE: R313113 |
| | R350 | Written notification of registration of transfer | JAPANESE INTERMEDIATE CODE: R350 |
| | R250 | Receipt of annual fees | JAPANESE INTERMEDIATE CODE: R250 |
| | R250 | Receipt of annual fees | JAPANESE INTERMEDIATE CODE: R250 |
| | R250 | Receipt of annual fees | JAPANESE INTERMEDIATE CODE: R250 |
| | R250 | Receipt of annual fees | JAPANESE INTERMEDIATE CODE: R250 |
| | S111 | Request for change of ownership or part of ownership | JAPANESE INTERMEDIATE CODE: R313113 |
| | R350 | Written notification of registration of transfer | JAPANESE INTERMEDIATE CODE: R350 |
| | LAPS | Cancellation because of no payment of annual fees | |