WO2012134396A1 - A method, an apparatus and a computer-readable medium for indexing a document for document retrieval - Google Patents

A method, an apparatus and a computer-readable medium for indexing a document for document retrieval

Info

Publication number
WO2012134396A1
WO2012134396A1 PCT/SG2012/000106 SG2012000106W WO2012134396A1 WO 2012134396 A1 WO2012134396 A1 WO 2012134396A1 SG 2012000106 W SG2012000106 W SG 2012000106W WO 2012134396 A1 WO2012134396 A1 WO 2012134396A1
Authority
WO
WIPO (PCT)
Prior art keywords
document
semantic
term
vector
terms
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/SG2012/000106
Other languages
English (en)
Inventor
Chien-Lin Huang
Bin Ma
Haizhou Li
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Agency for Science Technology and Research Singapore
Original Assignee
Agency for Science Technology and Research Singapore
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2011-03-28
Filing date
2012-03-28
Publication date
2012-10-04
Application filed by Agency for Science Technology and Research Singapore
Priority to CN201280024604.9A (patent CN103548015B)
Priority to SG2013072921A (patent SG193995A1)
Publication of WO2012134396A1
Current legal status: Ceased

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3347 Query execution using vector based model
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60 Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/68 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/683 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/685 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using automatically derived transcript of audio data, e.g. lyrics
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems

Definitions

  • the method further comprises: receiving a search query; and retrieving the document based on a comparison using the document semantic context inference vector and the search query.
  • the search-query semantic context inference vector is calculated by summing together the search-query semantic inference vectors.
  • a re-weighted indexing vector for a spoken document or a query is generated using (e.g. by summing together) all the semantic inference vectors related to the terms in the spoken document or the search query. Accordingly, the semantic concepts in the spoken document or search query are reinforced by promoting the terms which are likely to be valid and demoting the terms which are likely to be invalid.
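The retrieval step described in the definitions above can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes the context inference vectors for the query and for each indexed document have already been computed, and it assumes cosine similarity as the comparison measure, which the text here does not specify. The names `cosine`, `retrieve`, and `top_k` are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(
    query_vec: List[float],
    index: Dict[str, List[float]],
    top_k: int = 10,
) -> List[Tuple[str, float]]:
    """Rank indexed documents by similarity to the query vector.

    ASSUMPTION: `index` maps a document id to its semantic context
    inference vector, and cosine similarity is the comparison used.
    """
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Illustrative data only: two indexed documents over a three-term vocabulary.
index = {"doc_a": [1.5, 1.5, 1.0], "doc_b": [0.0, 0.0, 1.0]}
ranking = retrieve([1.0, 1.0, 0.0], index)
print(ranking[0][0])  # doc_a
```

Any other vector comparison (for example a dot product) could be substituted for `cosine` without changing the surrounding logic.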

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Library & Information Science (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Various embodiments relate to a method for indexing a document for document retrieval. The method may comprise: generating a document vector indicating whether each of a plurality of terms is present in the document; calculating a document semantic inference vector for each of the plurality of terms present in the document using the document vector and a semantic relation matrix, the semantic relation matrix identifying semantic relationships between different terms of the plurality of terms; and indexing the document using a document semantic context inference vector calculated based on the document semantic inference vectors. Various embodiments relate to a corresponding apparatus and computer-readable medium.
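The three indexing steps in the abstract can be sketched in a few lines. This is a hedged illustration under stated assumptions, not the method as claimed: the abstract does not give the formula for a semantic inference vector, so the sketch assumes it is the row of the semantic relation matrix for a present term, masked element-wise by the document vector. The context inference vector is then taken as the sum of those vectors. All function names, the vocabulary, and the matrix values are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]
Matrix = Sequence[Sequence[float]]


def document_vector(doc_terms: Sequence[str], vocabulary: Sequence[str]) -> Vector:
    """Binary vector: 1.0 where the vocabulary term occurs in the document."""
    present = set(doc_terms)
    return [1.0 if term in present else 0.0 for term in vocabulary]


def semantic_inference_vectors(doc_vec: Vector, relation: Matrix) -> List[Vector]:
    """One inference vector per term present in the document.

    ASSUMPTION: the vector for term i is row i of the relation matrix,
    masked element-wise by the document vector.
    """
    return [
        [relation[i][j] * doc_vec[j] for j in range(len(doc_vec))]
        for i, flag in enumerate(doc_vec)
        if flag
    ]


def context_inference_vector(vectors: List[Vector], size: int) -> Vector:
    """Sum the per-term inference vectors into one indexing vector."""
    total = [0.0] * size
    for vec in vectors:
        for j, value in enumerate(vec):
            total[j] += value
    return total


def index_document(
    doc_terms: Sequence[str], vocabulary: Sequence[str], relation: Matrix
) -> Vector:
    doc_vec = document_vector(doc_terms, vocabulary)
    vectors = semantic_inference_vectors(doc_vec, relation)
    return context_inference_vector(vectors, len(vocabulary))


# Illustrative data only: "speech" and "audio" are related, "bank" is not.
VOCAB = ["speech", "audio", "bank"]
RELATION = [
    [1.0, 0.5, 0.0],
    [0.5, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
print(index_document(["speech", "audio", "bank"], VOCAB, RELATION))  # [1.5, 1.5, 1.0]
```

In this toy example the mutually related terms end up with a higher weight (1.5) than the unrelated term (1.0), which matches the reinforcement effect the record describes for terms that are likely to be valid.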
PCT/SG2012/000106 2011-03-28 2012-03-28 A method, an apparatus and a computer-readable medium for indexing a document for document retrieval Ceased WO2012134396A1 (fr)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201280024604.9A CN103548015B (zh) 2011-03-28 2012-03-28 Method and apparatus for indexing a document for document retrieval
SG2013072921A SG193995A1 (en) 2011-03-28 2012-03-28 A method, an apparatus and a computer-readable medium for indexing a document for document retrieval

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
SG2011021763 2011-03-28
SG201102176-3 2011-03-28

Publications (1)

Publication Number Publication Date
WO2012134396A1 (fr) 2012-10-04

Family

ID=59011936

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/SG2012/000106 Ceased WO2012134396A1 (fr) 2011-03-28 2012-03-28 A method, an apparatus and a computer-readable medium for indexing a document for document retrieval

Country Status (3)

Country Link
CN (1) CN103548015B (fr)
SG (1) SG193995A1 (fr)
WO (1) WO2012134396A1 (fr)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111524502A (zh) * 2020-05-27 2020-08-11 科大讯飞股份有限公司 Language detection method, apparatus, device and storage medium
US11397776B2 (en) 2019-01-31 2022-07-26 At&T Intellectual Property I, L.P. Systems and methods for automated information retrieval
WO2025144815A1 (fr) * 2023-12-26 2025-07-03 Knostic, Inc. AI assistant that gives moderated answers to queries

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102192678B1 (ko) * 2015-10-16 2020-12-17 삼성전자주식회사 음향 모델 입력 데이터의 정규화 장치 및 방법과, 음성 인식 장치
CN108334611B (zh) * 2018-02-07 2020-04-24 清华大学 基于非负张量分解的时序可视媒体语义索引精度增强方法

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101329867A (zh) * 2007-06-21 2008-12-24 西门子(中国)有限公司 一种语音点播方法及装置
CN101593519B (zh) * 2008-05-29 2012-09-19 夏普株式会社 检测语音关键词的方法和设备及检索方法和系统
CN101364222B (zh) * 2008-09-02 2010-07-28 浙江大学 一种两阶段的音频检索方法
CN102023995B (zh) * 2009-09-22 2013-01-30 株式会社理光 语音检索设备和语音检索方法
CN101833986B (zh) * 2010-05-20 2011-10-05 哈尔滨工业大学 一种三级音频索引的创建方法及音频检索方法

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
KONTOSTATHIS, A. ET AL.: "Detecting Patterns in the LSI Term-Term Matrix", PROCEEDINGS ICDM'02 WORKSHOP ON FOUNDATIONS OF DATA MINING AND DISCOVERY, 2002 *
MILL, W. ET AL.: "Analysis of the values in the LSI Term-Term Matrix", LSI TERM-TERM MATRIX, 2004 *
MOUSSA, M.: "A Comparative Study on Semantic Techniques and Their Application in Information Retrieval", CAPITA SELECTA AND RESEARCH TOPICS ASSIGNMENT, 2008, UNIVERSITEIT TWENTE *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11397776B2 (en) 2019-01-31 2022-07-26 At&T Intellectual Property I, L.P. Systems and methods for automated information retrieval
US12067061B2 (en) 2019-01-31 2024-08-20 At&T Intellectual Property I, L.P. Systems and methods for automated information retrieval
CN111524502A (zh) * 2020-05-27 2020-08-11 科大讯飞股份有限公司 Language detection method, apparatus, device and storage medium
CN111524502B (zh) * 2020-05-27 2024-04-30 科大讯飞股份有限公司 Language detection method, apparatus, device and storage medium
WO2025144815A1 (fr) * 2023-12-26 2025-07-03 Knostic, Inc. AI assistant that gives moderated answers to queries

Also Published As

Publication number Publication date
SG193995A1 (en) 2013-11-29
CN103548015B (zh) 2017-05-17
CN103548015A (zh) 2014-01-29

Similar Documents

Publication Publication Date Title
JP6066354B2 (ja) Method and apparatus for confidence calculation
Kim et al. Two-stage multi-intent detection for spoken language understanding
US8650031B1 (en) Accuracy improvement of spoken queries transcription using co-occurrence information
US20110224982A1 (en) Automatic speech recognition based upon information retrieval methods
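CN111916070A (zh) Speech recognition using natural language understanding related knowledge via deep feedforward neural networks
CN114154487B (zh) Automatic text error correction method and apparatus, electronic device and storage medium
WO2003010754A1 (fr) Speech input search system
WO2016151700A1 (fr) Intention understanding device, method and program
JP5524138B2 (ja) Synonym dictionary generation device, method therefor, and program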
Staš et al. Classification of heterogeneous text data for robust domain-specific language modeling
Yamamoto et al. Topic segmentation and retrieval system for lecture videos based on spontaneous speech recognition.
Chen et al. Integrating natural language processing with image document analysis: what we learned from two real-world applications
WO2012134396A1 (fr) A method, an apparatus and a computer-readable medium for indexing a document for document retrieval
US20070005345A1 (en) Generating Chinese language couplets
Bigot et al. Person name recognition in ASR outputs using continuous context models
Juhár et al. Recent progress in development of language model for Slovak large vocabulary continuous speech recognition
Chien Association pattern language modeling
Siivola et al. Large vocabulary statistical language modeling for continuous speech recognition in finnish.
Zhang et al. Active learning with semi-automatic annotation for extractive speech summarization
Lee et al. Improved spoken term detection using support vector machines based on lattice context consistency
JP4653598B2 (ja) Syntactic/semantic analysis device, speech recognition device, and syntactic/semantic analysis program
Huang et al. Speech Indexing Using Semantic Context Inference.
CN114219012A (zh) Sample data processing method, apparatus, computer program product and storage medium
Masumura et al. Training a Language Model Using Webdata for Large Vocabulary Japanese Spontaneous Speech Recognition.
Staš et al. Semantic indexing and document retrieval for personalized language modeling

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application. Ref document number: 12763481; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase. Ref country code: DE
122 EP: PCT application non-entry in European phase. Ref document number: 12763481; Country of ref document: EP; Kind code of ref document: A1