WO2021212817A1 - Correction method and apparatus for voice dialogue - Google Patents
Correction method and apparatus for voice dialogue
- Publication number
- WO2021212817A1 (PCT/CN2020/129337, CN2020129337W)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- user
- text information
- semantic keyword
- semantic
- skill
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/232—Orthographic correction, e.g. spell checking or vowelisation
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/1815—Semantic context, e.g. disambiguation of the recognition hypotheses based on word meaning
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/221—Announcement of recognition results
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/223—Execution procedure of a spoken command
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/225—Feedback of the input speech
Definitions
- The present invention relates to the field of intelligent speech, and in particular to a correction method and device for voice dialogue.
- Smart devices perform semantic understanding and recognition of the voice input by the user to obtain the user's intention, and then feed back the corresponding operation to the user. If a recognition error occurs, the user can usually correct it, for example:
- the chat bot replied: Found Lin Yongkai's phone number. Shall I dial it for you?
- the recognition result of the user's voice it is a model of a model.
- the chat bot replied: Found Lin Yongkai's phone number. Shall I dial it for you?
- the recognition result of the user's voice it is the station of the station.
- an embodiment of the present invention provides a correction method for a voice conversation, including:
- the first text information includes: a first semantic keyword determined by a plurality of candidate words;
- an embodiment of the present invention provides a correction device for voice dialogue, including:
- a candidate word feedback program module which is used to feed back the multiple candidate words to the user in response to the user's selection of the first semantic keyword in the first result
- an embodiment of the present invention provides a storage medium on which a computer program is stored, characterized in that, when the program is executed by a processor, the steps of the method for correcting a voice conversation in any embodiment of the present invention are implemented.
- Figure 2 is a software implementation flow chart of a method for correcting voice conversations according to an embodiment of the present invention
- Fig. 3 is a schematic structural diagram of a correction device for voice dialogue provided by an embodiment of the present invention.
- FIG. 4 is a schematic structural diagram of an embodiment of the electronic device of the present invention.
- the embodiment of the present invention provides a correction method for voice dialogue, which is applied to an electronic device.
- the electronic device may be a smart TV, a smart phone, a smart speaker, a smart car device, a smart screen, etc., which is not limited in the present invention.
- Fig. 1 is a flowchart of a method for correcting voice conversations according to an embodiment of the present invention. The method is applied to an electronic device, and the method includes the following steps:
- S11: The electronic device recognizes the first text information of the dialogue voice input by the user, where the first text information includes: a first semantic keyword determined from a plurality of candidate words;
- S12: The electronic device feeds back the first result with the first semantic keyword to the user based on the first text information;
- S14: The electronic device receives the second semantic keyword input by the user, corrects the first text information based on the second semantic keyword, determines the corrected second text information, and feeds back the second result with the second semantic keyword to the user based on the second text information.
- To distinguish the user's dialogue voice from that of other users, and to support multi-modality, the first round of the multi-modal dialogue system usually takes as input either text converted from audio picked up by the microphone array or text typed directly on a keyboard.
- The microphone array is composed of a certain number of microphones, which are used to record voice and audio signals.
- The microphone array can identify the direction of the sound source and can also remove background sound to a certain extent, thereby improving the accuracy of automatic speech recognition.
- The microphone array captures the audio stream, which is transmitted over the network to an automatic speech recognition service in the cloud to obtain the text information corresponding to the voice. An automatic speech recognition service based on an acoustic model and a language model for the home environment can further improve the accuracy of text recognition.
- the recognized text will be directly sent to the semantic analysis module in the cloud, and the semantic analysis module can parse a sentence of text into a semantic entity.
- For example, the user inputs the voice "I want to watch Pounding Heartbeat".
- In speech recognition, each word has a number of candidate words that can be used to adjust the sentence.
- For the voice "Pounding Heartbeat", the user's pronunciation or the language model can easily cause two candidates to be recognized, "Pounding Heartbeat" and "Pounding Star Motion", and both happen to fall in the movie semantic slot.
- The system selects the candidate word with the highest confidence as the keyword of the sentence. As a result, the user says "I want to watch Pounding Heartbeat", but the recognition result is "I want to watch Pounding Star Motion".
- In step S12, feedback is given to the user based on "I want to watch Pounding Star Motion" as recognized in step S11, for example: "Found the following content matching Pounding Star Motion for you. Which one do you want to watch?"
- the first semantic keyword determined from a plurality of candidate words includes:
- the feeding back the multiple candidate words to the user includes:
- the design of the candidate word window includes, but is not limited to, lists and grids.
- In step S14, the user clicks to select "Pounding Heartbeat" from the candidate list, and the smart device corrects the text and responds again: "Found the following content matching Pounding Heartbeat for you. Which one do you want to watch?" The normal voice dialogue process then continues. User: "The first one." Smart device: "Playing Pounding Heartbeat, directed by Rob Reiner, for you." The specific process is shown in Figure 2.
- the second semantic keyword input by the user includes:
- When the user inputs a corrected dialogue voice, the corrected dialogue voice is recognized, and the second semantic keyword is determined according to the recognition result.
- When the user inputs corrected text, the second semantic keyword is determined according to the corrected text.
- Method 1 Correct the results of this round of speech recognition directly through keyboard or virtual keyboard input.
- the second semantic keyword input by the user further includes:
- the method further includes: recording the results with semantic keywords that are fed back to the user in each of multiple rounds;
- the user is constantly communicating with the smart device, and the smart device records the conversation record with the user through the screen, for example:
- the first result with the text of the first skill is fed back to the user through the first skill.
- the second skill is re-determined according to the first text information, and the second result with the text of the second skill is fed back to the user through the second skill; or
- the user selects the word video in the prompt language returned by the dialogue system.
- the user selects an audio book by clicking.
- The error correction function provided can deal not only with true ambiguity in the dialogue but also with semantic parsing errors, which improves the dialogue system's fault tolerance and its ability to handle errors.
- Figure 3 is a schematic structural diagram of a voice dialogue correction device provided by an embodiment of the present invention.
- the device can execute the voice dialogue correction method described in any of the above embodiments and is configured in a terminal.
- the correction device for speech dialogue includes: a speech recognition program module 11, a result feedback program module 12, a candidate word feedback program module 13, and a correction program module 14.
- The speech recognition program module 11 is used to recognize the first text information of the dialogue voice input by the user, where the first text information includes: a first semantic keyword determined from a plurality of candidate words. The result feedback program module 12 is used to feed back the first result with the first semantic keyword to the user based on the first text information. The candidate word feedback program module 13 is used to feed back the multiple candidate words to the user in response to the user's selection of the first semantic keyword in the first result. The correction program module 14 is used to receive the second semantic keyword input by the user, correct the first text information based on the second semantic keyword, determine the corrected second text information, and feed back the second result with the second semantic keyword to the user based on the second text information.
- speech recognition program module is used for:
- the candidate word feedback program module is used for:
- the multiple candidate words are sorted according to the recognition confidence, and the list window of the multiple candidate words is fed back to the user.
- the embodiment of the present invention also provides a non-volatile computer storage medium, the computer storage medium stores computer-executable instructions, and the computer-executable instructions can execute the correction method for voice dialogue in any of the foregoing method embodiments;
- the non-volatile computer storage medium of the present invention stores computer executable instructions, and the computer executable instructions are set as:
- the first text information includes: a first semantic keyword determined by a plurality of candidate words;
- Receive the second semantic keyword input by the user, correct the first text information based on the second semantic keyword, determine the corrected second text information, and feed back the second result with the second semantic keyword to the user based on the second text information.
- As a non-volatile computer-readable storage medium, it can be used to store non-volatile software programs, non-volatile computer-executable programs and modules, such as the program instructions/modules corresponding to the methods in the embodiments of the present invention.
- One or more program instructions are stored in a non-volatile computer-readable storage medium, and when executed by a processor, execute the correction method for voice dialogue in any of the foregoing method embodiments.
- the non-volatile computer-readable storage medium may include a storage program area and a storage data area.
- the storage program area may store an operating system and an application program required by at least one function; the storage data area may store data created according to the use of the device, etc.
- the non-volatile computer-readable storage medium may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or other non-volatile solid-state storage devices.
- the non-volatile computer-readable storage medium may optionally include memories remotely provided with respect to the processor, and these remote memories may be connected to the device through a network. Examples of the aforementioned networks include, but are not limited to, the Internet, corporate intranets, local area networks, mobile communication networks, and combinations thereof.
- An embodiment of the present invention further provides an electronic device, which includes: at least one processor, and a memory communicatively connected to the at least one processor, wherein the memory stores instructions that can be executed by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can execute:
- the first text information includes: a first semantic keyword determined by a plurality of candidate words;
- Receive the second semantic keyword input by the user, correct the first text information based on the second semantic keyword, determine the corrected second text information, and feed back the second result with the second semantic keyword to the user based on the second text information.
- the first semantic keyword determined from a plurality of candidate words includes: selecting a candidate word with the highest recognition confidence from the plurality of candidate words and determining it as the first semantic keyword;
- the feeding back the plurality of candidate words to the user includes: sorting the plurality of candidate words according to recognition confidence, and feeding back a list window of the plurality of candidate words to the user.
- the second semantic keyword input by the user includes:
- the second semantic keyword is determined according to the corrected text.
- the receiving the second semantic keyword input by the user further includes:
- the corrected text in the image information is recognized, and the second semantic keyword is determined according to the corrected text.
- the processor is further configured to: record the results with semantic keywords that are fed back to the user in each of multiple rounds; and, in response to the user's selection of a semantic keyword in the result of any round, feed back to the user the multiple candidate words corresponding to that semantic keyword.
- the feeding back the first result with the first semantic keyword to the user based on the first text information includes:
- the first result with the text of the first skill is fed back to the user through the first skill.
- the processor is further configured to: in response to the user's selection of the text of the first skill in the first result, feed back the multiple candidate skills to the user;
- the second skill is re-determined according to the first text information, and the second result with the text of the second skill is fed back to the user through the second skill; or
- the corresponding third skill is re-determined according to the first text information, and the third result with the text of the third skill is fed back to the user through the third skill.
- Fig. 4 is a schematic diagram of the hardware structure of an electronic device for performing a correction method for voice dialogue according to another embodiment of the present invention. As shown in Fig. 4, the device includes:
- One or more processors 410 and a memory 420; one processor 410 is taken as an example in FIG. 4.
- the device for performing the correction method for the voice dialogue may further include: an input device 430 and an output device 440.
- the processor 410, the memory 420, the input device 430, and the output device 440 may be connected by a bus or in other ways. In FIG. 4, the connection by a bus is taken as an example.
- the memory 420 can be used to store non-volatile software programs, non-volatile computer-executable programs and modules, such as the program instructions/modules corresponding to the method for correcting voice conversations in the embodiments of the present invention.
- the processor 410 executes various functional applications and data processing of the server by running the non-volatile software programs, instructions, and modules stored in the memory 420, that is, implements the correction method for the voice dialogue in the foregoing method embodiment.
- the memory 420 may include a storage program area and a storage data area.
- the storage program area may store an operating system and an application program required by at least one function; the storage data area may store data created according to the use of the correction device for voice dialogue, etc.
- the memory 420 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or other non-volatile solid-state storage devices.
- the memory 420 may optionally include a memory remotely provided with respect to the processor 410, and these remote memories may be connected to a correction device for voice dialogue via a network. Examples of the aforementioned networks include, but are not limited to, the Internet, corporate intranets, local area networks, mobile communication networks, and combinations thereof.
- the input device 430 may receive inputted numeric or character information, and generate signals related to user settings and function control of the correction device for voice dialogue.
- the output device 440 may include a display device such as a display screen.
- the one or more modules are stored in the memory 420, and when executed by the one or more processors 410, perform the correction method for voice dialogue in any of the foregoing method embodiments.
- the electronic devices in the embodiments of the present invention exist in various forms, including but not limited to:
- Mobile communication equipment: this type of equipment is characterized by mobile communication functions, and its main goal is to provide voice and data communications.
- Such terminals include smart phones, multimedia phones, feature phones, and low-end phones.
- Ultra-mobile personal computer equipment: this type of equipment belongs to the category of personal computers, has calculation and processing functions, and generally also has mobile Internet features.
- Such terminals include PDA, MID, and UMPC devices, such as tablet computers.
- Portable entertainment equipment: this type of equipment can display and play multimedia content. Such devices include audio and video players, handheld game consoles, e-books, smart toys, and portable car navigation devices.
- the device embodiments described above are merely illustrative.
- the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units; that is, they may be located in one place, or they may be distributed across multiple network units.
- Some or all of the modules can be selected according to actual needs to achieve the objectives of the solutions of the embodiments. Those of ordinary skill in the art can understand and implement it without creative work.
- each implementation manner can be implemented by software plus a necessary general hardware platform, and of course, it can also be implemented by hardware.
- the above technical solution, in essence or in the part that contributes to the existing technology, can be embodied in the form of a software product. The computer software product can be stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk, or an optical disc, and includes several instructions to make a computer device (which may be a personal computer, a server, or a network device, etc.) execute the methods described in each embodiment or in some parts of the embodiments.
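The keyword correction flow described in the definitions above (recognize, feed back a first result, show the candidate words, correct the text) can be illustrated with a minimal Python sketch. It is not the patent's implementation: the `Candidate` type, the function names, the confidence values, and the response wording are all assumptions made for illustration, and correction is modeled as a simple substring replacement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One recognition hypothesis for a semantic slot (hypothetical type)."""
    word: str
    confidence: float


def pick_keyword(candidates: list[Candidate]) -> str:
    """Choose the candidate with the highest recognition confidence."""
    return max(candidates, key=lambda c: c.confidence).word


def candidate_window(candidates: list[Candidate]) -> list[str]:
    """Candidate words sorted by recognition confidence, highest first."""
    ordered = sorted(candidates, key=lambda c: c.confidence, reverse=True)
    return [c.word for c in ordered]


def correct_text(first_text: str, first_keyword: str, second_keyword: str) -> str:
    """Replace the first semantic keyword to obtain the corrected text."""
    if first_keyword not in first_text:
        raise ValueError("first keyword not found in first text")
    return first_text.replace(first_keyword, second_keyword, 1)


def result_for(keyword: str) -> str:
    """Feedback text that carries the semantic keyword (wording is assumed)."""
    return f"Found the following content matching {keyword} for you."


# Illustrative confidences only; the source gives no numeric values.
candidates = [
    Candidate("Pounding Star Motion", 0.62),
    Candidate("Pounding Heartbeat", 0.58),
]

first_keyword = pick_keyword(candidates)
first_text = f"I want to watch {first_keyword}"
first_result = result_for(first_keyword)

# The user selects the keyword in the first result and sees the list window.
window = candidate_window(candidates)

# The user clicks the second entry, which becomes the second semantic keyword.
second_keyword = window[1]
second_text = correct_text(first_text, first_keyword, second_keyword)
second_result = result_for(second_keyword)
```

Keeping the full candidate list alongside the chosen keyword is what makes the later correction cheap: the device only re-ranks and substitutes, rather than asking the user to repeat the whole utterance.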
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Human Computer Interaction (AREA)
- Multimedia (AREA)
- Acoustics & Sound (AREA)
- Theoretical Computer Science (AREA)
- Artificial Intelligence (AREA)
- General Health & Medical Sciences (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Machine Translation (AREA)
- User Interface Of Digital Computer (AREA)
Claims (10)
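- A correction method for voice dialogue, applied to an electronic device, the method comprising: the electronic device recognizing first text information of a dialogue voice input by a user, wherein the first text information includes a first semantic keyword determined from a plurality of candidate words; the electronic device feeding back to the user, based on the first text information, a first result with the first semantic keyword; in response to the user's selection of the first semantic keyword in the first result, the electronic device feeding back the plurality of candidate words to the user; the electronic device receiving a second semantic keyword input by the user, correcting the first text information based on the second semantic keyword, determining corrected second text information, and feeding back to the user, based on the second text information, a second result with the second semantic keyword.
- The method according to claim 1, wherein the first semantic keyword determined from the plurality of candidate words includes: selecting, from the plurality of candidate words, the candidate word with the highest recognition confidence as the first semantic keyword; and feeding back the plurality of candidate words to the user includes: sorting the plurality of candidate words by recognition confidence, and feeding back a list window of the plurality of candidate words to the user.
- The method according to claim 1, wherein receiving the second semantic keyword input by the user includes: when the user selects a word from the plurality of candidate words, determining the selected word as the second semantic keyword; when the user inputs a correction dialogue voice, recognizing the correction dialogue voice and determining the second semantic keyword according to the recognition result; when the user inputs correction text, determining the second semantic keyword according to the correction text.
- The method according to claim 3, wherein receiving the second semantic keyword input by the user further includes: when the user inputs image information, recognizing correction text in the image information and determining the second semantic keyword according to the correction text.
- The method according to claim 1, wherein the method further comprises: recording the results with semantic keywords fed back to the user in each of multiple rounds; and, in response to the user's selection of a semantic keyword in the result of any round, feeding back to the user the plurality of candidate words corresponding to that semantic keyword.
- The method according to claim 1, wherein feeding back to the user, based on the first text information, the first result with the first semantic keyword includes: determining a corresponding first skill based on the first text information, wherein, when the first text information hits a plurality of candidate skills, the skill with the highest preset priority is selected as the first skill; and feeding back to the user, through the first skill, a first result with the text of the first skill.
- The method according to claim 6, wherein the method further comprises: in response to the user's selection of the text of the first skill in the first result, feeding back the plurality of candidate skills to the user; when the user's input contains a voice dialogue of a second skill, re-determining the second skill according to the first text information, and feeding back to the user, through the second skill, a second result with the text of the second skill; or, when the user's input contains a negative-tone dialogue about the first skill, re-determining a corresponding third skill according to the first text information, and feeding back to the user, through the third skill, a third result with the text of the third skill.
- A correction device for voice dialogue, comprising: a speech recognition program module, used to recognize first text information of a dialogue voice input by a user, wherein the first text information includes a first semantic keyword determined from a plurality of candidate words; a result feedback program module, used to feed back to the user, based on the first text information, a first result with the first semantic keyword; a candidate word feedback program module, used to feed back the plurality of candidate words to the user in response to the user's selection of the first semantic keyword in the first result; and a correction program module, used to receive a second semantic keyword input by the user, correct the first text information based on the second semantic keyword, determine corrected second text information, and feed back to the user, based on the second text information, a second result with the second semantic keyword.
- An electronic device, comprising: at least one processor, and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can perform the steps of the method according to any one of claims 1-7.
- A storage medium on which a computer program is stored, characterized in that, when the program is executed by a processor, the steps of the method according to any one of claims 1-7 are implemented.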
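The skill handling in claims 6 and 7 (pick the hit skill with the highest preset priority, then re-determine the skill when the user names another one or negates the current one) can be sketched in Python as follows. This is only an illustration under stated assumptions: the `Skill` type, the skill names and priority numbers, the English negation word list, and the rule of falling back to the next-highest-priority skill after a negation are all hypothetical and are not specified by the source.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Skill:
    """A dialogue skill hit by the first text information (hypothetical type)."""
    name: str
    priority: int  # assumption: a larger number means a higher preset priority


# Assumption: a tiny English word list stands in for negative-tone detection.
NEGATIONS = {"not", "no", "don't"}


def pick_first_skill(hit_skills: list[Skill]) -> Skill:
    """Select the hit skill with the highest preset priority."""
    return max(hit_skills, key=lambda s: s.priority)


def redetermine_skill(hit_skills: list[Skill], current: Skill, user_input: str) -> Skill:
    """Re-determine the skill from the user's follow-up input."""
    text = user_input.lower()
    # Case 1: the input names another candidate skill, so switch to it.
    for skill in hit_skills:
        if skill != current and skill.name in text:
            return skill
    # Case 2: the input negates the current skill, so fall back to the
    # highest-priority remaining skill (an assumed fallback rule).
    words = set(re.findall(r"[a-z']+", text))
    if current.name in text and words & NEGATIONS:
        others = [s for s in hit_skills if s != current]
        if others:
            return pick_first_skill(others)
    # Otherwise keep the current skill.
    return current


hit_skills = [Skill("video", 3), Skill("audio book", 2), Skill("music", 1)]
first = pick_first_skill(hit_skills)
second = redetermine_skill(hit_skills, first, "Play the audio book")
third = redetermine_skill(hit_skills, first, "I don't want video")
unchanged = redetermine_skill(hit_skills, first, "The first one")
```

In both re-determination cases the original first text information is reused, so the user corrects only the routing decision and does not have to repeat the request.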
Priority Applications (3)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP2022563122A JP7413568B2 (ja) | 2020-04-20 | 2020-11-17 | Method and apparatus for correcting voice dialogue |
| EP20932568.7A EP4141865B1 (en) | 2020-04-20 | 2020-11-17 | Method and apparatus for correcting voice dialogue |
| US17/996,643 US11804217B2 (en) | 2020-04-20 | 2020-11-17 | Method and apparatus for correcting voice dialogue |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202010311357.7 | 2020-04-20 | | |
| CN202010311357.7A CN111540356B (zh) | 2020-04-20 | 2020-04-20 | Correction method and system for voice dialogue |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2021212817A1 (zh) | 2021-10-28 |
Family
ID=71978839
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/CN2020/129337 Ceased WO2021212817A1 (zh) | Correction method and apparatus for voice dialogue | 2020-04-20 | 2020-11-17 |
Country Status (5)
| Country | Link |
|---|---|
| US (1) | US11804217B2 (zh) |
| EP (1) | EP4141865B1 (zh) |
| JP (1) | JP7413568B2 (zh) |
| CN (1) | CN111540356B (zh) |
| WO (1) | WO2021212817A1 (zh) |
Cited By (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN114023302A (zh) * | 2022-01-10 | 2022-02-08 | 北京中电慧声科技有限公司 | Text speech processing device and text pronunciation processing method |
Families Citing this family (9)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN111540356B (zh) | 2020-04-20 | 2022-05-17 | 思必驰科技股份有限公司 | Correction method and system for voice dialogue |
| CA3125124A1 (en) * | 2020-07-24 | 2022-01-24 | Comcast Cable Communications, Llc | Systems and methods for training voice query models |
| CN114155846B (zh) * | 2020-08-18 | 2025-08-05 | 海信视像科技股份有限公司 | Semantic slot extraction method and display device |
| CN112417867B (zh) * | 2020-12-07 | 2022-10-18 | 四川长虹电器股份有限公司 | Method and system for correcting video titles after speech recognition |
| CN112700768B (zh) * | 2020-12-16 | 2024-04-26 | 科大讯飞股份有限公司 | Speech recognition method, electronic device, and storage device |
| CN112684913B (zh) * | 2020-12-30 | 2023-07-14 | 维沃移动通信有限公司 | Information correction method, apparatus, and electronic device |
| CN115408413A (zh) * | 2021-05-28 | 2022-11-29 | 博泰车联网科技(上海)股份有限公司 | Method for interacting with a user and computer storage medium |
| CN115457961B (zh) * | 2022-11-10 | 2023-04-07 | 广州小鹏汽车科技有限公司 | Voice interaction method, vehicle, server, system, and storage medium |
| CN118280369A (zh) * | 2022-12-30 | 2024-07-02 | 腾讯科技(深圳)有限公司 | Voice conversion method, apparatus, device, and readable storage medium |
Citations (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20160379513A1 (en) * | 2015-06-26 | 2016-12-29 | John Nicholas DuQuette | Dynamic Feedback and Scoring of Transcription of a Dictation |
| CN107093423A (zh) * | 2017-05-27 | 2017-08-25 | 努比亚技术有限公司 | Voice input correction method, apparatus, and computer-readable storage medium |
| CN108091328A (zh) * | 2017-11-20 | 2018-05-29 | 北京百度网讯科技有限公司 | Artificial-intelligence-based speech recognition error correction method, apparatus, and readable medium |
| CN108121455A (zh) * | 2016-11-29 | 2018-06-05 | 渡鸦科技(北京)有限责任公司 | Recognition correction method and apparatus |
| CN109215661A (zh) * | 2018-08-30 | 2019-01-15 | 上海与德通讯技术有限公司 | Speech-to-text method, apparatus, device, and storage medium |
| CN111540356A (zh) * | 2020-04-20 | 2020-08-14 | 苏州思必驰信息科技有限公司 | Correction method and system for voice dialogue |
Family Cites Families (10)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US6789231B1 (en) * | 1999-10-05 | 2004-09-07 | Microsoft Corporation | Method and system for providing alternatives for text derived from stochastic input sources |
| US7149970B1 (en) * | 2000-06-23 | 2006-12-12 | Microsoft Corporation | Method and system for filtering and selecting from a candidate list generated by a stochastic input method |
| JP2005043461A (ja) * | 2003-07-23 | 2005-02-17 | Canon Inc | Speech recognition method and speech recognition apparatus |
| JP2005275228A (ja) * | 2004-03-26 | 2005-10-06 | Equos Research Co Ltd | Navigation device |
| US8972268B2 (en) * | 2008-04-15 | 2015-03-03 | Facebook, Inc. | Enhanced speech-to-speech translation system and methods for adding a new word |
| US10937415B2 (en) * | 2016-06-15 | 2021-03-02 | Sony Corporation | Information processing device and information processing method for presenting character information obtained by converting a voice |
| JP2018097029A (ja) * | 2016-12-08 | 2018-06-21 | 三菱電機株式会社 | Speech recognition apparatus and speech recognition method |
| JP6416309B1 (ja) * | 2017-04-12 | 2018-10-31 | 株式会社アドバンスト・メディア | Terminal device and program |
| US10861446B2 (en) * | 2018-12-10 | 2020-12-08 | Amazon Technologies, Inc. | Generating input alternatives |
| JP2020187163A (ja) * | 2019-05-10 | 2020-11-19 | 本田技研工業株式会社 | Voice operation system, voice operation control method, and voice operation control program |
-
2020
- 2020-04-20 CN CN202010311357.7A patent/CN111540356B/zh active Active
- 2020-11-17 EP EP20932568.7A patent/EP4141865B1/en active Active
- 2020-11-17 US US17/996,643 patent/US11804217B2/en active Active
- 2020-11-17 JP JP2022563122A patent/JP7413568B2/ja active Active
- 2020-11-17 WO PCT/CN2020/129337 patent/WO2021212817A1/zh not_active Ceased
Non-Patent Citations (1)
| Title |
|---|
| See also references of EP4141865A4 * |
Also Published As
| Publication number | Publication date |
|---|---|
| EP4141865A1 (en) | 2023-03-01 |
| CN111540356A (zh) | 2020-08-14 |
| JP7413568B2 (ja) | 2024-01-15 |
| EP4141865B1 (en) | 2025-12-24 |
| US11804217B2 (en) | 2023-10-31 |
| JP2023515897A (ja) | 2023-04-14 |
| EP4141865A4 (en) | 2023-11-01 |
| US20230223015A1 (en) | 2023-07-13 |
| CN111540356B (zh) | 2022-05-17 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| WO2021212817A1 (zh) | | Correction method and apparatus for voice dialogue |
| US12475170B2 (en) | | Voice-based auto-completions and auto-responses for assistant systems |
| US12354596B1 (en) | | Centralized feedback service for performance of virtual assistant |
| US10217463B2 (en) | | Hybridized client-server speech recognition |
| JP4854259B2 (ja) | | Centralized method and system for clarifying voice commands |
| CN107430616B (zh) | | Interactive reformulation of voice queries |
| US11093110B1 (en) | | Messaging feedback mechanism |
| US10860289B2 (en) | | Flexible voice-based information retrieval system for virtual assistant |
| US10453477B2 (en) | | Method and computer system for performing audio search on a social networking platform |
| US20230169272A1 (en) | | Communication framework for automated content generation and adaptive delivery |
| CN108962233A (zh) | | Voice dialogue processing method and system for a voice dialogue platform |
| US20230186941A1 (en) | | Voice identification for optimizing voice search results |
| US12216963B1 (en) | | Computer system-based pausing and resuming of natural language conversations |
| WO2022143349A1 (zh) | | Method and apparatus for determining user intent |
| CN117198289B (zh) | | Voice interaction method, apparatus, device, medium, and product |
| WO2021077528A1 (zh) | | Method for interrupting human-machine dialogue |
| CN108305618A (zh) | | Voice acquisition and search method, smart pen, search terminal, and storage medium |
| US12020683B2 (en) | | Real-time name mispronunciation detection |
| CN109190116B (zh) | | Semantic parsing method, system, electronic device, and storage medium |
| CN115906808A (zh) | | Utterance response degree confirmation method, apparatus, medium, and computing device |
| KR20190094080A (ko) | | Interactive AI agent system, method, and computer-readable recording medium for proactively providing order or reservation services based on monitoring of conversation sessions between users |
| WO2021098175A1 (zh) | | Method, apparatus, device, and computer storage medium for guiding a voice packet recording function |
| CN116153310A (zh) | | Voice dialogue interaction method, system, electronic device, and storage medium |
| CN111968630A (zh) | | Information processing method, apparatus, and electronic device |
| CN116070621A (zh) | | Error correction method for speech recognition results, apparatus, electronic device, and storage medium |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 20932568; Country of ref document: EP; Kind code of ref document: A1 |
| | ENP | Entry into the national phase | Ref document number: 2022563122; Country of ref document: JP; Kind code of ref document: A |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | ENP | Entry into the national phase | Ref document number: 2020932568; Country of ref document: EP; Effective date: 20221121 |
| | WWG | Wipo information: grant in national office | Ref document number: 2020932568; Country of ref document: EP |