WO2012134396A1 - A method, an apparatus and a computer-readable medium for indexing a document for document retrieval - Google Patents
- Publication number
- WO2012134396A1 (application PCT/SG2012/000106)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- document
- semantic
- term
- vector
- terms
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3347—Query execution using vector based model
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/60—Information retrieval; Database structures therefor; File system structures therefor of audio data
- G06F16/68—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/683—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
- G06F16/685—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using automatically derived transcript of audio data, e.g. lyrics
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
Definitions
- the method further comprises: receiving a search query; and retrieving the document based on a comparison using the document semantic context inference vector and the search query.
- the search-query semantic context inference vector is calculated by summing together the search-query semantic inference vectors.
- a re-weighted indexing vector for a spoken document or a query is generated using (e.g. by summing together) all the semantic inference vectors related to the terms in the spoken document or the search query. Accordingly, the semantic concepts in the spoken document or search query are reinforced by promoting the terms which are likely to be valid and demoting the terms which are likely to be invalid.
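The summation and comparison steps described above can be sketched as follows. This is a minimal illustration, not the claimed implementation: the function names `context_vector` and `cosine`, the example vectors, and the choice of cosine similarity as the comparison are all assumptions, since the text only says that the vectors are summed and that retrieval is "based on a comparison".

```python
import math

def context_vector(inference_vectors):
    """Sum per-term semantic inference vectors element-wise."""
    if not inference_vectors:
        return []
    return [sum(column) for column in zip(*inference_vectors)]

def cosine(u, v):
    """Cosine similarity; an assumed comparison, not one named in the source."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Hypothetical inference vectors for a two-term query over a 3-term vocabulary.
query_inference = [[1.0, 0.5, 0.0], [0.5, 1.0, 0.0]]
query_ctx = context_vector(query_inference)

# Hypothetical index: document id -> document semantic context inference vector.
index = {"doc_a": [2.0, 2.0, 0.0], "doc_b": [0.0, 0.0, 1.0]}

# Rank documents by similarity to the query context vector, best first.
ranked = sorted(index, key=lambda d: cosine(query_ctx, index[d]), reverse=True)
```

Because the query and the documents are represented in the same term space, any vector similarity could be substituted for `cosine` without changing the rest of the sketch.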
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Computational Linguistics (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- General Health & Medical Sciences (AREA)
- Multimedia (AREA)
- Library & Information Science (AREA)
- Machine Translation (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Various embodiments relate to a method for indexing a document for document retrieval. The method may comprise: generating a document vector indicating whether each of a plurality of terms is present in the document; calculating a document semantic inference vector for each of the plurality of terms present in the document using the document vector and a semantic relation matrix, the semantic relation matrix identifying semantic relationships between different terms of the plurality of terms; and indexing the document using a document semantic context inference vector calculated based on the document semantic inference vectors. Various embodiments relate to a corresponding apparatus and computer-readable medium.
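The three indexing steps in the abstract can be sketched as below. This is a hedged illustration only: the vocabulary, the matrix values, and all function names are hypothetical, and the formula for a term's semantic inference vector (the term's row of the relation matrix masked by the document vector) is an assumption, because the abstract states only that it is calculated "using the document vector and a semantic relation matrix". Summation is likewise one possible way of combining the inference vectors.

```python
# Hypothetical vocabulary and symmetric semantic relation matrix.
VOCAB = ["speech", "recognition", "audio", "banana"]
R = [
    [1.0, 0.8, 0.6, 0.0],
    [0.8, 1.0, 0.5, 0.0],
    [0.6, 0.5, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]

def document_vector(tokens, vocab):
    """Binary vector: 1 if the vocabulary term occurs in the document, else 0."""
    present = set(tokens)
    return [1 if term in present else 0 for term in vocab]

def inference_vectors(doc_vec, relation):
    """One vector per present term.

    Assumption: the vector for present term i is row i of the relation
    matrix, masked element-wise by the document vector.
    """
    return [
        [relation[i][j] * doc_vec[j] for j in range(len(doc_vec))]
        for i, present in enumerate(doc_vec) if present
    ]

def context_inference_vector(doc_vec, relation):
    """Combine the per-term inference vectors by element-wise summation."""
    vectors = inference_vectors(doc_vec, relation)
    if not vectors:
        return [0.0] * len(doc_vec)
    return [sum(column) for column in zip(*vectors)]

# "banana" stands in for a term unrelated to the rest of the document.
tokens = ["speech", "recognition", "banana"]
d = document_vector(tokens, VOCAB)
index_vec = context_inference_vector(d, R)
```

Under these assumptions, "speech" and "recognition" reinforce each other and receive a larger weight in `index_vec` than the isolated term "banana", which keeps only its self-relation weight.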
Priority Applications (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201280024604.9A CN103548015B (zh) | 2011-03-28 | 2012-03-28 | Method and apparatus for indexing a document for document retrieval |
| SG2013072921A SG193995A1 (en) | 2011-03-28 | 2012-03-28 | A method, an apparatus and a computer-readable medium for indexing a document for document retrieval |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| SG2011021763 | 2011-03-28 | | |
| SG201102176-3 | 2011-03-28 | | |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2012134396A1 (fr) | 2012-10-04 |
Family
ID=59011936
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/SG2012/000106 Ceased WO2012134396A1 (fr) | 2011-03-28 | 2012-03-28 | A method, an apparatus and a computer-readable medium for indexing a document for document retrieval |
Country Status (3)
| Country | Link |
|---|---|
| CN (1) | CN103548015B (fr) |
| SG (1) | SG193995A1 (fr) |
| WO (1) | WO2012134396A1 (fr) |
Cited By (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN111524502A (zh) * | 2020-05-27 | 2020-08-11 | 科大讯飞股份有限公司 | Language detection method, apparatus, device and storage medium |
| US11397776B2 (en) | 2019-01-31 | 2022-07-26 | At&T Intellectual Property I, L.P. | Systems and methods for automated information retrieval |
| WO2025144815A1 (fr) * | 2023-12-26 | 2025-07-03 | Knostic, Inc. | AI assistant that gives moderated responses to queries |
Families Citing this family (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
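| KR102192678B1 (ko) * | 2015-10-16 | 2020-12-17 | 삼성전자주식회사 | Apparatus and method for normalizing acoustic model input data, and speech recognition apparatus |
| CN108334611B (zh) * | 2018-02-07 | 2020-04-24 | 清华大学 | Method for enhancing the semantic indexing accuracy of time-series visual media based on non-negative tensor decomposition |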
Family Cites Families (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
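| CN101329867A (zh) * | 2007-06-21 | 2008-12-24 | 西门子(中国)有限公司 | Voice on-demand method and apparatus |
| CN101593519B (zh) * | 2008-05-29 | 2012-09-19 | 夏普株式会社 | Method and device for detecting speech keywords, and retrieval method and system |
| CN101364222B (zh) * | 2008-09-02 | 2010-07-28 | 浙江大学 | A two-stage audio retrieval method |
| CN102023995B (zh) * | 2009-09-22 | 2013-01-30 | 株式会社理光 | Speech retrieval device and speech retrieval method |
| CN101833986B (zh) * | 2010-05-20 | 2011-10-05 | 哈尔滨工业大学 | Method for creating a three-level audio index, and audio retrieval method |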
2012
- 2012-03-28: WO, application PCT/SG2012/000106, patent WO2012134396A1, not active (Ceased)
- 2012-03-28: CN, application CN201280024604.9A, patent CN103548015B, not active (Expired - Fee Related)
- 2012-03-28: SG, application SG2013072921A, patent SG193995A1, status unknown
Non-Patent Citations (3)
| Title |
|---|
| KONTOSTATHIS, A. ET AL.: "Detecting Patterns in the LSI Term-Term Matrix", PROCEEDINGS ICDM'02 WORKSHOP ON FOUNDATIONS OF DATA MINING AND DISCOVERY, 2002 * |
| MILL, W. ET AL.: "Analysis of the values in the LSI Term-Term Matrix", LSI TERM-TERM MATRIX, 2004 * |
| MOUSSA, M.: "A Comparative Study on Semantic Techniques and Their Application in Information Retrieval", CAPITA SELECTA AND RESEARCH TOPICS ASSIGNMENT, 2008, UNIVERSITEIT TWENTE * |
Cited By (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US11397776B2 (en) | 2019-01-31 | 2022-07-26 | At&T Intellectual Property I, L.P. | Systems and methods for automated information retrieval |
| US12067061B2 (en) | 2019-01-31 | 2024-08-20 | At&T Intellectual Property I, L.P. | Systems and methods for automated information retrieval |
| CN111524502A (zh) * | 2020-05-27 | 2020-08-11 | 科大讯飞股份有限公司 | Language detection method, apparatus, device and storage medium |
| CN111524502B (zh) * | 2020-05-27 | 2024-04-30 | 科大讯飞股份有限公司 | Language detection method, apparatus, device and storage medium |
| WO2025144815A1 (fr) * | 2023-12-26 | 2025-07-03 | Knostic, Inc. | AI assistant that gives moderated responses to queries |
Also Published As
| Publication number | Publication date |
|---|---|
| SG193995A1 (en) | 2013-11-29 |
| CN103548015B (zh) | 2017-05-17 |
| CN103548015A (zh) | 2014-01-29 |
Similar Documents
| Publication | Title |
|---|---|
| JP6066354B2 (ja) | Method and apparatus for confidence calculation |
| Kim et al. | Two-stage multi-intent detection for spoken language understanding |
| US8650031B1 (en) | Accuracy improvement of spoken queries transcription using co-occurrence information |
| US20110224982A1 (en) | Automatic speech recognition based upon information retrieval methods |
| CN111916070A (zh) | Speech recognition using natural language understanding related knowledge via deep feedforward neural networks |
| CN114154487B (zh) | Automatic text error correction method and apparatus, electronic device, and storage medium |
| WO2003010754A1 (fr) | Voice input search system |
| WO2016151700A1 (fr) | Intention understanding device, method, and program |
| JP5524138B2 (ja) | Synonym dictionary generation device, method thereof, and program |
| Staš et al. | Classification of heterogeneous text data for robust domain-specific language modeling |
| Yamamoto et al. | Topic segmentation and retrieval system for lecture videos based on spontaneous speech recognition. |
| Chen et al. | Integrating natural language processing with image document analysis: what we learned from two real-world applications |
| WO2012134396A1 (fr) | A method, an apparatus and a computer-readable medium for indexing a document for document retrieval |
| US20070005345A1 (en) | Generating Chinese language couplets |
| Bigot et al. | Person name recognition in ASR outputs using continuous context models |
| Juhár et al. | Recent progress in development of language model for Slovak large vocabulary continuous speech recognition |
| Chien | Association pattern language modeling |
| Siivola et al. | Large vocabulary statistical language modeling for continuous speech recognition in finnish. |
| Zhang et al. | Active learning with semi-automatic annotation for extractive speech summarization |
| Lee et al. | Improved spoken term detection using support vector machines based on lattice context consistency |
| JP4653598B2 (ja) | Syntactic and semantic analysis device, speech recognition device, and syntactic and semantic analysis program |
| Huang et al. | Speech Indexing Using Semantic Context Inference. |
| CN114219012A (zh) | Sample data processing method and apparatus, computer program product, and storage medium |
| Masumura et al. | Training a Language Model Using Webdata for Large Vocabulary Japanese Spontaneous Speech Recognition. |
| Staš et al. | Semantic indexing and document retrieval for personalized language modeling |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 12763481; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | Ep: PCT application non-entry in European phase | Ref document number: 12763481; Country of ref document: EP; Kind code of ref document: A1 |