WO2020029673A1 - Voice processing method and apparatus, storage medium, and electronic device - Google Patents

Publication number
WO2020029673A1
WO2020029673A1 PCT/CN2019/090417 CN2019090417W WO2020029673A1 WO 2020029673 A1 WO2020029673 A1 WO 2020029673A1 CN 2019090417 W CN2019090417 W CN 2019090417W WO 2020029673 A1 WO2020029673 A1 WO 2020029673A1
Authority
WO
WIPO (PCT)
Legal status
Ceased
Application number
PCT/CN2019/090417
Inventor
陈岩
Current Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority to EP19846665.8A (EP3826008A4)
Publication of WO2020029673A1
Priority to US17/144,667 (US20210125616A1)
Anticipated expiration
Ceased (current legal status)

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223 Execution procedure of a spoken command
    • G10L15/26 Speech to text systems
    • G10L17/00 Speaker identification or verification techniques
    • G10L17/04 Training, enrolment or model building
    • G10L17/22 Interactive procedures; Man-machine interfaces
    • G10L17/24 Interactive procedures; Man-machine interfaces the user being prompted to utter a password or a predefined phrase
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M1/00 Substation equipment, e.g. for use by subscribers
    • H04M1/66 Substation equipment, e.g. for use by subscribers with means for preventing unauthorised or fraudulent calling
    • H04M1/667 Preventing unauthorised calls from a telephone set
    • H04M1/67 Preventing unauthorised calls from a telephone set by electronic means
    • H04M1/72 Mobile telephones; Cordless telephones, i.e. devices for establishing wireless links to base stations without route selection
    • H04M1/724 User interfaces specially adapted for cordless or mobile telephones
    • H04M1/72448 User interfaces specially adapted for cordless or mobile telephones with means for adapting the functionality of the device according to specific conditions
    • H04M2250/00 Details of telephonic subscriber devices
    • H04M2250/74 Details of telephonic subscriber devices with voice recognition means

Definitions

  • The present application relates to the technical field of speech recognition, and in particular to a voice processing method and apparatus, a storage medium, and an electronic device.
  • A voice processing function lets a user operate the electronic device by voice, giving the user a better voice-interaction experience.
  • The embodiments of the present application provide a voice processing method and apparatus, a storage medium, and an electronic device that can improve the wake-up rate of the electronic device.
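  • An embodiment of the present application provides a voice processing method, including:
  • acquiring voice information of a user, where the voice information includes a first keyword;
  • acquiring a preset keyword set according to a display state of a display screen of the electronic device, where the display state includes a locked state and an unlocked state, and the preset keyword set includes at least one second keyword;
  • determining whether the preset keyword set includes a second keyword that is the same as the first keyword; and
  • if the preset keyword set includes a second keyword that is the same as the first keyword, executing an operation instruction corresponding to the first keyword.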
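The four claimed steps can be sketched as a small dispatcher. This is a minimal illustration under stated assumptions, not the applicant's implementation: the `DisplayState` enum, the contents of `PRESET_KEYWORD_SETS`, and the `ACTIONS` table are hypothetical, and "the same as" is modeled as exact string equality.

```python
from enum import Enum
from typing import Callable, Dict, Optional, Set


class DisplayState(Enum):
    LOCKED = "locked"
    UNLOCKED = "unlocked"


# Hypothetical preset keyword sets, one per display state.
PRESET_KEYWORD_SETS: Dict[DisplayState, Set[str]] = {
    DisplayState.LOCKED: {"light up the screen", "open lock screen"},
    DisplayState.UNLOCKED: {"open wechat", "exit taobao", "lock the screen"},
}

# Hypothetical operation instructions keyed by keyword.
ACTIONS: Dict[str, Callable[[], str]] = {
    "light up the screen": lambda: "screen on",
    "open lock screen": lambda: "unlock prompt shown",
    "open wechat": lambda: "WeChat opened",
    "exit taobao": lambda: "Taobao closed",
    "lock the screen": lambda: "screen locked",
}


def process_voice(first_keyword: str, state: DisplayState) -> Optional[str]:
    """Return the action result, or None if the keyword is not allowed in this state."""
    # Obtain the preset keyword set for the current display state.
    keyword_set = PRESET_KEYWORD_SETS[state]
    # Judge whether an identical second keyword exists in that set.
    if first_keyword not in keyword_set:
        return None
    # Execute the operation instruction corresponding to the first keyword.
    return ACTIONS[first_keyword]()
```

With these placeholder sets, "open wechat" is executed only in the unlocked state; in the locked state the same keyword is simply ignored, which is the behavior the display-state-dependent set is meant to produce.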
  • An embodiment of the present application further provides a voice processing apparatus, including:
  • a first acquisition module, configured to acquire voice information of a user, where the voice information includes a first keyword;
  • a second obtaining module, configured to obtain a preset keyword set according to a display state of a display screen of the electronic device, where the display state includes a locked state and an unlocked state, and the preset keyword set includes at least one second keyword;
  • a judging module, configured to judge whether the preset keyword set includes a second keyword that is the same as the first keyword; and
  • an execution module, configured to execute an operation instruction corresponding to the first keyword if the preset keyword set includes a second keyword that is the same as the first keyword.
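  • An embodiment of the present application further provides a storage medium.
  • The storage medium stores a computer program that, when run on a computer, causes the computer to perform the following steps:
  • acquiring voice information of a user, where the voice information includes a first keyword;
  • acquiring a preset keyword set according to a display state of a display screen of the electronic device, where the display state includes a locked state and an unlocked state, and the preset keyword set includes at least one second keyword;
  • determining whether the preset keyword set includes a second keyword that is the same as the first keyword; and
  • if the preset keyword set includes a second keyword that is the same as the first keyword, executing an operation instruction corresponding to the first keyword.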
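  • An embodiment of the present application further provides an electronic device.
  • The electronic device includes a processor and a memory.
  • The memory stores a computer program.
  • The processor calls the computer program stored in the memory to perform the following steps:
  • acquiring voice information of a user, where the voice information includes a first keyword;
  • acquiring a preset keyword set according to a display state of a display screen of the electronic device, where the display state includes a locked state and an unlocked state, and the preset keyword set includes at least one second keyword;
  • determining whether the preset keyword set includes a second keyword that is the same as the first keyword; and
  • if the preset keyword set includes a second keyword that is the same as the first keyword, executing an operation instruction corresponding to the first keyword.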
  • FIG. 1 is a schematic diagram of a user performing voice control on an electronic device.
  • FIG. 2 is a schematic flowchart of a voice processing method according to an embodiment of the present application.
  • FIG. 3 is another schematic flowchart of a voice processing method according to an embodiment of the present application.
  • FIG. 4 is another schematic flowchart of a voice processing method according to an embodiment of the present application.
  • FIG. 5 is another schematic flowchart of a voice processing method according to an embodiment of the present application.
  • FIG. 6 is a schematic structural diagram of a voice processing apparatus according to an embodiment of the present application.
  • FIG. 7 is another schematic structural diagram of a voice processing apparatus according to an embodiment of the present application.
  • FIG. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
  • FIG. 9 is another schematic structural diagram of an electronic device according to an embodiment of the present application.
  • As shown in FIG. 1, a user performs voice control on an electronic device.
  • The user speaks.
  • The electronic device collects the user's voice information.
  • The electronic device compares the collected voice information with a speech recognition model stored in the electronic device.
  • The electronic device recognizes the control instruction in the voice information.
  • The electronic device performs the operation corresponding to the control instruction, such as lighting up the screen, opening an application, exiting an application, or locking the screen, thereby implementing the user's voice control of the electronic device.
  • An embodiment of the present application provides a voice processing method that may be applied to an electronic device.
  • The electronic device may be a smartphone, a tablet computer, a game device, an augmented reality (AR) device, a car, a data storage device, an audio playback device, a video playback device, a notebook computer, a desktop computer, or another device.
  • In an embodiment of the present application, the voice processing method includes: acquiring voice information of a user, the voice information including a first keyword; acquiring a preset keyword set, which includes at least one second keyword, according to whether the display screen of the electronic device is in the locked state or the unlocked state; determining whether the preset keyword set includes a second keyword that is the same as the first keyword; and, if it does, executing an operation instruction corresponding to the first keyword.
  • The step of acquiring a preset keyword set according to the display state of the display screen of the electronic device may include:
  • if the display state of the display screen is the unlocked state, determining the foreground application that is currently running; and
  • obtaining a second preset keyword set according to the foreground application and a preset correspondence, where the preset correspondence maps applications to preset keyword sets.
  • The step of obtaining a second preset keyword set according to the foreground application and a preset correspondence may include:
  • obtaining the second preset keyword set according to the foreground application, its application interface, and a preset correspondence among applications, application interfaces, and preset keyword sets.
  • Alternatively, the step of obtaining a second preset keyword set according to the foreground application and a preset correspondence may include:
  • obtaining the second preset keyword set according to the foreground application, the geographic location information of the electronic device, and a preset correspondence among applications, geographic location information, and preset keyword sets.
  • The first keyword may include a first sub-keyword and a second sub-keyword. In that case, the step of determining whether the preset keyword set includes a second keyword that is the same as the first keyword includes:
  • determining whether the preset keyword set includes a third sub-keyword identical to the first sub-keyword and a fourth sub-keyword corresponding to the second sub-keyword;
  • and the step of executing an operation instruction corresponding to the first keyword if the preset keyword set includes a second keyword that is the same as the first keyword includes:
  • if the preset keyword set includes a third sub-keyword identical to the first sub-keyword and a fourth sub-keyword corresponding to the second sub-keyword, executing the operation instruction corresponding to the first keyword.
  • Before the step of acquiring the user's voice information, the method may further include: acquiring training voice information of the user, and training on the training voice information to obtain a preset speech recognition model.
  • Before the step of acquiring the preset keyword set according to the display state of the display screen of the electronic device, the method may further include: extracting the user's voiceprint features and, if they match the voiceprint features stored in the preset speech recognition model, acquiring the preset keyword set according to the display state of the display screen of the electronic device.
  • The voice processing method may include the following steps:
  • In step 110, after the voice processing function is enabled, the electronic device acquires the user's voice information.
  • The electronic device may be provided with a microphone, through which it collects the user's voice information.
  • The voice information includes a first keyword.
  • The server executes an operation instruction on the electronic device based on the first keyword in the user's voice information.
  • For example, the voice information may be "I want to light up the screen", "Please turn on WeChat", or "I want to exit Taobao".
  • The corresponding first keywords are "light up the screen", "open WeChat", and "exit Taobao". The voice information may therefore either contain the first keyword or consist of the first keyword alone.
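The text does not say how the first keyword is separated from the rest of an utterance. One simple approach, assumed here for illustration only, is to normalize the speech-to-text transcript and pick the longest known keyword it contains. The `SYNONYMS` table (mapping "turn on" to "open", as in the WeChat example) and the function names are hypothetical.

```python
import re
from typing import Iterable, Optional

# Hypothetical synonym normalization: "turn on WeChat" is treated as "open WeChat".
SYNONYMS = {"turn on": "open"}


def normalise(text: str) -> str:
    """Lower-case, strip punctuation, apply synonyms, and collapse whitespace."""
    text = text.lower()
    text = re.sub(r"[^\w\s]", " ", text)
    for phrase, canonical in SYNONYMS.items():
        text = re.sub(rf"\b{re.escape(phrase)}\b", canonical, text)
    return " ".join(text.split())


def extract_first_keyword(transcript: str, known_keywords: Iterable[str]) -> Optional[str]:
    """Return the longest known keyword contained in the transcript, or None."""
    # Pad with spaces so matches only occur on whole-word boundaries.
    text = f" {normalise(transcript)} "
    matches = [kw for kw in known_keywords if f" {normalise(kw)} " in text]
    return max(matches, key=len, default=None)
```

Because the transcript may also be the bare keyword, the same function covers both cases described above: `extract_first_keyword("light up the screen", ["light up the screen"])` returns the keyword itself.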
  • The display state of the display screen of the electronic device is determined; the display state includes a locked state and an unlocked state.
  • The locked state includes a screen-off state and a lock-screen state. Leaving the locked state requires identity verification information, which includes a password entered by the user, the user's fingerprint features, facial features, voiceprint features, and the like.
  • In the screen-off state, the display screen shows no interface of the electronic device; that is, the backlight and the screen are turned off to save power.
  • In the screen-off state, the server obtains a first preset keyword set corresponding to that state. When the user says "Open the main interface of the electronic device", it is determined whether the first preset keyword set includes an identical second keyword, namely "Open the main interface of the electronic device".
  • In the lock-screen state, the screen is lit, but the electronic device cannot be operated until identity verification information (password, fingerprint, facial features, voiceprint features, and the like) is provided.
  • When the server determines that the electronic device is in the lock-screen state, the electronic device obtains the first preset keyword set stored internally. If the user then says "open lock screen", it is determined whether the first preset keyword set includes an identical second keyword, namely "open lock screen".
  • In the unlocked state, the screen is not locked and the electronic device can be used normally; for example, it can make calls, send text messages, and open applications. If the electronic device is unlocked but no operation is in progress, it obtains a third preset keyword set stored internally. For example, if the user says "open phone book" in this state, the electronic device obtains the third preset keyword set and determines whether it includes an identical second keyword, namely "open phone book".
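The three display sub-states above can be derived from two device flags and then mapped to keyword sets. This is a sketch only: the flags `screen_on` and `verified` are hypothetical stand-ins for whatever the device actually reports, and the set contents are illustrative. Following the text, both locked sub-states use the first preset keyword set, while the unlocked state with no operation in progress uses the third.

```python
from enum import Enum, auto
from typing import Dict, FrozenSet


class DisplaySubState(Enum):
    SCREEN_OFF = auto()   # locked: backlight off, no interface shown
    LOCK_SCREEN = auto()  # locked: screen lit, identity verification pending
    UNLOCKED = auto()     # unlocked: device usable normally


def classify(screen_on: bool, verified: bool) -> DisplaySubState:
    """Map two hypothetical device flags onto the display sub-states."""
    if not screen_on:
        return DisplaySubState.SCREEN_OFF
    if not verified:
        return DisplaySubState.LOCK_SCREEN
    return DisplaySubState.UNLOCKED


# Illustrative contents only; the text names the sets but not their full entries.
FIRST_SET: FrozenSet[str] = frozenset({
    "open the main interface of the electronic device",
    "open lock screen",
})
THIRD_SET: FrozenSet[str] = frozenset({"open phone book"})

KEYWORD_SETS: Dict[DisplaySubState, FrozenSet[str]] = {
    DisplaySubState.SCREEN_OFF: FIRST_SET,   # both locked sub-states use
    DisplaySubState.LOCK_SCREEN: FIRST_SET,  # the first preset keyword set
    DisplaySubState.UNLOCKED: THIRD_SET,     # unlocked, no operation in progress
}


def keyword_set_for(screen_on: bool, verified: bool) -> FrozenSet[str]:
    """Return the preset keyword set for the current display sub-state."""
    return KEYWORD_SETS[classify(screen_on, verified)]
```

Using `frozenset` keeps the preset sets immutable, so a caller cannot accidentally add keywords to a set shared by several states.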
  • In step 130, with the first keyword contained in the user's voice information, it is determined whether the preset keyword set includes a second keyword that is the same as the first keyword. For example, if the user says "I want to take a picture", the first keyword is "take a picture".
  • The server recognizes that the electronic device has opened the XX camera application and loads the preset keyword set stored in the electronic device for that application. It then determines whether that set includes a second keyword "take a picture" identical to the first keyword "take a picture".
  • In step 140, if the preset keyword set includes a second keyword that is the same as the first keyword, the operation instruction corresponding to the first keyword is executed. Continuing the example above, if "take a picture" is included in the preset keyword set loaded for the XX camera application, it is the second keyword.
  • The electronic device then executes the "take a picture" operation instruction and takes a picture in the XX camera.
  • The voice information may be the first keyword itself or may contain it; in either case, the operation instruction is carried out according to the first keyword.
  • Before the user's voice information is acquired in step 110, the method may further include the following steps:
  • Acquiring training voice information of the user, where the training voice information includes multiple keywords.
  • Training on the training voice information to obtain a preset speech recognition model.
  • The training voice information may consist of keywords only.
  • The user's voice messages are recognized and the first keyword in each is obtained. For example, if the user says "I want to take a picture" and "Open XX video", these utterances can be trained on to obtain the preset speech recognition model.
  • The preset speech recognition model can recognize not only keywords in voice information but also voiceprint features such as the user's tone, speech rate, and breathing. For example, if the user has a bright voice and says "I want to take a picture", both the bright voice and the utterance are trained on to obtain the preset speech recognition model.
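The text does not specify the training procedure. As a deliberately simplified stand-in, assumed here for illustration, the "preset speech recognition model" can be pictured as a profile holding the enrolled keyword vocabulary plus a voiceprint template, taken as the element-wise mean of per-utterance feature vectors. The `VoiceProfile` class and `enroll` function are hypothetical, and feature extraction is assumed to happen upstream; a real system would train an acoustic model rather than average vectors.

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Set, Tuple


@dataclass
class VoiceProfile:
    """Hypothetical stand-in for the preset speech recognition model."""
    keywords: Set[str] = field(default_factory=set)
    voiceprint: List[float] = field(default_factory=list)


def enroll(samples: Sequence[Tuple[str, Sequence[float]]]) -> VoiceProfile:
    """Build a profile from (keyword, voiceprint feature vector) training samples.

    The voiceprint template is the element-wise mean of the training vectors.
    """
    if not samples:
        raise ValueError("at least one training sample is required")
    dim = len(samples[0][1])
    total = [0.0] * dim
    keywords: Set[str] = set()
    for keyword, features in samples:
        if len(features) != dim:
            raise ValueError("all feature vectors must have the same length")
        keywords.add(keyword)
        total = [t + f for t, f in zip(total, features)]
    return VoiceProfile(keywords, [t / len(samples) for t in total])
```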
  • For example, suppose the first keyword is "enter the panorama mode to take a picture".
  • This first keyword generates two operation instructions: "enter the panorama mode" and "take a picture". The first keyword therefore includes the first sub-keyword "enter the panorama mode" and the second sub-keyword "take a picture".
  • Likewise, suppose the user says "open the lock screen to take a picture".
  • The first keyword is then "open the lock screen to take a picture", which again carries two operation instructions: "open the lock screen" and "take a picture". The first keyword thus includes the first sub-keyword "open the lock screen" and the second sub-keyword "take a picture".
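How a compound first keyword is split into sub-keywords is not stated. One plausible sketch, assuming a hypothetical list of known operation phrases, locates each phrase in the first keyword and returns the matches in spoken order. Plain substring search is used here, so overlapping or embedded phrases would need extra handling in practice.

```python
from typing import Iterable, List, Tuple


def split_sub_keywords(first_keyword: str, known_ops: Iterable[str]) -> List[str]:
    """Return the known operation phrases found in first_keyword, in spoken order."""
    text = first_keyword.lower()
    found: List[Tuple[int, str]] = []
    for op in known_ops:
        pos = text.find(op.lower())
        if pos != -1:
            found.append((pos, op))
    # Sorting by position restores the order in which the user spoke the phrases.
    return [op for _, op in sorted(found)]
```

For the panorama example, `split_sub_keywords("Enter the panorama mode to take a picture", ["take a picture", "enter the panorama mode"])` yields the first sub-keyword followed by the second.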
  • Before the preset keyword set is acquired in step 120, the method may further include the following steps:
  • Extracting the user's voiceprint features, which include the user's tone, breathing, speech rate, and so on.
  • If the voiceprint features match those stored in the preset speech recognition model, the preset keyword set can be obtained. For example, the user says "take a picture" and the server detects a bright tone. Because the user's bright tone is stored in the preset speech recognition model, the tones match and the preset keyword set is obtained directly.
  • If the voiceprint features do not match, the preset keyword set cannot be obtained. For example, a friend of the user with a deep voice says "take a picture". The server does not find that deep tone in the preset speech recognition model, so even though "take a picture" is a keyword in the model, the electronic device cannot be operated.
  • In short, the preset keyword set is obtained only when the voiceprint features match those stored in the preset speech recognition model; matching voice content alone is not enough. This strengthens the security of the electronic device and protects the user's private information.
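The voiceprint gate above can be sketched as a similarity check between the speaker's feature vector and the enrolled template. The text does not name a comparison method; cosine similarity and the `0.8` threshold are assumptions chosen for illustration, and the function names are hypothetical.

```python
import math
from typing import Optional, Sequence, Set


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors (0.0 if either is zero)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def keyword_set_if_verified(voiceprint: Sequence[float], enrolled: Sequence[float],
                            keyword_set: Set[str],
                            threshold: float = 0.8) -> Optional[Set[str]]:
    """Release the preset keyword set only when the speaker's voiceprint matches."""
    if cosine_similarity(voiceprint, enrolled) >= threshold:
        return keyword_set
    # A matching keyword from a non-enrolled speaker still yields no keyword set.
    return None
```

The gate runs before any keyword comparison, mirroring the ordering in the text: a friend saying "take a picture" never reaches the membership check because no keyword set is returned.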
  • In step 120, a preset keyword set is acquired according to the display state of the display screen of the electronic device.
  • Acquiring the preset keyword set, which includes at least one second keyword, may include the following steps:
  • In the unlocked state, the user opens an application in the electronic device.
  • The server first determines the foreground application that is currently running, and then obtains a second preset keyword set according to the foreground application and the preset correspondence.
  • Foreground applications of the electronic device include, for example, XX camera, XX map, and XX video; each application corresponds to a fixed second preset keyword set.
  • When XX camera is in the foreground, the corresponding second preset keyword set is loaded from the electronic device to execute operation instructions in the XX camera application.
  • When XX map is in the foreground, the corresponding second preset keyword set is loaded from the electronic device to execute operation instructions in the XX map application, and so on.
  • The preset correspondence may be as shown in Table 1:
  • Alternatively, acquiring the second preset keyword set according to the foreground application and the preset correspondence may take the application interface into account:
  • For example, social software includes a text input interface, an address book interface, a video call interface, and so on; the text input interface corresponds to one preset keyword set, the address book interface to another, and so on.
  • Similarly, XX shopping software includes a payment interface, a browsing interface, a shopping cart interface, and so on; the payment interface corresponds to one preset keyword set, the browsing interface to another, and so on.
  • The preset correspondence may be as shown in Table 2:
  • Alternatively, acquiring the second preset keyword set according to the foreground application and the preset correspondence may include the following steps:
  • The current geographic location information of the electronic device is obtained.
  • The geographic location may be identified by GPS (Global Positioning System) positioning.
  • The current geographic location identified by the server may be, for example, a library, an office, or a supermarket.
  • The library corresponds to one preset keyword set,
  • the office corresponds to another preset keyword set, and so on.
  • The preset correspondence may be as shown in Table 3:
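Tables 1 to 3 are not reproduced here, so their entries are unknown. The sketch below shows the three correspondence shapes as dictionary lookups keyed by application, (application, interface), and (application, location). Every entry is a hypothetical placeholder. The text presents these as alternative embodiments; trying the interface table first, then the location table, then falling back to the per-application set is a design choice made only for this illustration.

```python
from typing import Dict, Optional, Set, Tuple

# Table 1 style: application -> keyword set (illustrative entries only).
BY_APP: Dict[str, Set[str]] = {
    "XX camera": {"take a picture", "enter the panorama mode"},
    "XX map": {"navigate home", "zoom in"},
}

# Table 2 style: (application, interface) -> keyword set.
BY_APP_INTERFACE: Dict[Tuple[str, str], Set[str]] = {
    ("social software", "address book"): {"call contact"},
    ("XX shopping", "payment"): {"confirm payment"},
}

# Table 3 style: (application, location) -> keyword set.
BY_APP_LOCATION: Dict[Tuple[str, str], Set[str]] = {
    ("XX map", "office"): {"navigate home"},
    ("XX video", "library"): {"mute"},
}


def second_keyword_set(app: str, interface: Optional[str] = None,
                       location: Optional[str] = None) -> Set[str]:
    """Pick the most specific matching table; fall back to the per-app set."""
    if interface is not None and (app, interface) in BY_APP_INTERFACE:
        return BY_APP_INTERFACE[(app, interface)]
    if location is not None and (app, location) in BY_APP_LOCATION:
        return BY_APP_LOCATION[(app, location)]
    # Unknown applications get an empty set, so no keyword can match.
    return BY_APP.get(app, set())
```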
  • In step 130, determining whether the preset keyword set includes a second keyword that is the same as the first keyword includes the following steps:
  • After obtaining the preset keyword set, the server compares the first sub-keyword and the second sub-keyword in the voice information with the preset keyword set and performs the next step according to the comparison result.
  • For example, if the first sub-keyword is "enter the panorama mode" and the second sub-keyword is "take a picture", it is determined whether the preset keyword set contains a third sub-keyword "enter the panorama mode" and a fourth sub-keyword "take a picture".
  • The order may also be reversed: the first sub-keyword may be "take a picture" and the second sub-keyword "enter the panorama mode",
  • in which case the third sub-keyword is "take a picture"
  • and the fourth sub-keyword is "enter the panorama mode".
  • In step 140, executing the operation instruction corresponding to the first keyword if the preset keyword set includes a second keyword that is the same as the first keyword includes the following step:
  • Step 131: if the preset keyword set includes a third sub-keyword identical to the first sub-keyword and a fourth sub-keyword corresponding to the second sub-keyword, the operation instruction corresponding to the first keyword is executed.
  • In the panorama example, the server then executes the operation instruction "enter the panorama mode to take a picture".
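Because the sub-keywords may appear in either order, the check in step 130 amounts to set membership for each sub-keyword. The sketch below assumes a hypothetical `actions` table and runs the matched instructions in spoken order; whether a real device would reorder them (for example, switching modes before capturing) is not addressed in the text.

```python
from typing import Callable, Dict, List, Optional, Sequence, Set


def match_sub_keywords(sub_keywords: Sequence[str], keyword_set: Set[str]) -> bool:
    """True when every spoken sub-keyword has an identical entry in the set.

    Set membership makes the check independent of spoken order, so
    "take a picture" + "enter the panorama mode" matches either way round.
    """
    return bool(sub_keywords) and all(kw in keyword_set for kw in sub_keywords)


def execute_compound(sub_keywords: Sequence[str], keyword_set: Set[str],
                     actions: Dict[str, Callable[[], str]]) -> Optional[List[str]]:
    """Run each sub-keyword's action in spoken order, or return None if any is missing."""
    if not match_sub_keywords(sub_keywords, keyword_set):
        return None
    return [actions[kw]() for kw in sub_keywords]
```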
  • The present application is not limited by the order in which the steps are described; where no conflict arises, some steps may be performed in other orders or simultaneously.
  • In summary, the electronic device acquires the user's voice information, which includes a first keyword; acquires a preset keyword set, including at least one second keyword, according to the display state of its display screen; determines whether the preset keyword set includes a second keyword that is the same as the first keyword; and, if it does, executes the operation instruction corresponding to the first keyword.
  • Because the electronic device acquires the preset keyword set according to the display state of the display screen, it can use a different preset keyword set in each display state, and it then determines internally whether that set includes a second keyword that is the same as the first keyword.
  • The voice processing method thereby improves the wake-up rate of the electronic device.
  • An embodiment of the present application further provides a voice processing apparatus, and the voice processing apparatus may be integrated in an electronic device.
  • An embodiment of the present application further provides a voice processing apparatus, including:
  • a first acquisition module configured to acquire voice information of a user, where the voice information includes a first keyword;
  • a second obtaining module configured to obtain a preset keyword set according to a display state of a display screen of the electronic device, where the display state includes a locked state and an unlocked state, and the preset keyword set includes at least one second keyword;
  • a judging module configured to judge whether the preset keyword set includes a second keyword that is the same as the first keyword;
  • an execution module configured to execute an operation instruction corresponding to the first keyword if the preset keyword set includes a second keyword that is the same as the first keyword.
  • the second obtaining module is configured to:
  • if the display state of the display screen is the unlocked state, determine the foreground application that is currently running;
  • obtain a second preset keyword set according to the foreground application and a preset correspondence, where the preset correspondence includes a correspondence between applications and preset keyword sets.
  • when obtaining the second preset keyword set according to the foreground application and the preset correspondence, the second obtaining module is configured to:
  • obtain the second preset keyword set according to the foreground application, the application interface, and a preset correspondence, where the preset correspondence includes a correspondence among applications, application interfaces, and preset keyword sets.
  • when obtaining the second preset keyword set according to the foreground application and the preset correspondence, the second obtaining module is configured to:
  • obtain the second preset keyword set according to the foreground application, the geographic location information, and a preset correspondence, where the preset correspondence includes a correspondence among applications, geographic location information, and preset keyword sets.
  • the first keyword includes a first sub-keyword and a second sub-keyword;
  • the determining module is configured to determine whether the preset keyword set includes a third sub-keyword identical to the first sub-keyword and a fourth sub-keyword corresponding to the second sub-keyword;
  • the execution module is configured to execute the operation instruction corresponding to the first keyword if the keyword set includes a third sub-keyword identical to the first sub-keyword and a fourth sub-keyword corresponding to the second sub-keyword.
  • the voice processing apparatus further includes a training module configured to:
  • train on the training voice information to obtain a preset speech recognition model.
  • the voice processing apparatus further includes a matching module configured to:
  • the second obtaining module is configured to:
  • a preset keyword set is acquired according to a display state of a display screen of the electronic device.
  • the speech processing apparatus 200 may include a first acquisition module 201, a second acquisition module 202, a determination module 203, and an execution module 204.
  • the first obtaining module 201 is configured to obtain voice information of a user, where the voice information includes a first keyword.
  • after the electronic device enables the voice processing function, it obtains the user's voice information.
  • a microphone may be provided in the electronic device, and the electronic device collects voice information of the user through the microphone.
  • the voice information includes a first keyword.
  • the server executes an operation instruction on the electronic device based on the first keyword in the user's voice information.
  • for example, the voice information may contain operation instructions such as "I want to light up the screen", "Please open WeChat", and "I want to exit Taobao".
  • the corresponding first keywords are "light up the screen", "open WeChat", and "exit Taobao".
  • the second obtaining module 202 is configured to obtain a preset keyword set according to a display state of a display screen of the electronic device, the display state includes a locked state and an unlocked state, and the preset keyword set includes at least one second keyword;
  • a display state of a display screen of the electronic device is determined, and the display state includes a locked state and an unlocked state.
  • the locked state includes a screen-off state and a lock-screen state.
  • the identity verification information includes: password information input by the user, fingerprint characteristics of the user, facial features of the user, voiceprint characteristics of the user, and the like.
  • in the screen-off state, the display screen of the electronic device shows no interface; that is, the backlight and the screen are turned off to save power.
  • the server obtains a first preset keyword set corresponding to the screen-off state. After the user says "open the main interface of the electronic device", it is determined whether the first preset keyword set includes a second keyword that is the same as "open the main interface of the electronic device", where the second keyword is "open the main interface of the electronic device".
  • the identity verification information includes password information input by the user, the user's fingerprint features, facial features, voiceprint features, and the like. For example, the user lights up the screen, but the electronic device cannot be operated while the screen is locked.
  • if the server determines that the electronic device is in the lock-screen state, the electronic device obtains the first preset keyword set stored internally. When the user then says "open lock screen", it is determined whether the first preset keyword set includes a second keyword that is the same as "open lock screen", where the second keyword is "open lock screen".
  • in the unlocked state, the screen of the electronic device is not locked and the device can be used normally. For example, after the electronic device is unlocked, it can make calls, send text messages, open applications, and so on. If the electronic device is unlocked but no operation is in progress, it obtains a third preset keyword set stored internally and is then operated by voice. For example, in the unlocked state with no operation in progress, the user says "open phone book". The electronic device obtains the third preset keyword set stored internally and determines whether it includes a second keyword that is the same as "open phone book", where the second keyword is "open phone book".
  • the determining module 203 is configured to determine whether the preset keyword set includes a second keyword that is the same as the first keyword.
  • the first keyword is contained in the user's voice information, and it is determined whether the preset keyword set includes a second keyword that is the same as the first keyword. For example, if the user says "I want to take a picture", the first keyword is "take a picture".
  • the server recognizes that the electronic device has opened the XX camera application and then loads the preset keyword set stored in the electronic device for that application. It is determined whether the preset keyword set includes a second keyword "take a picture" that is the same as the first keyword "take a picture".
  • the execution module 204 is configured to execute an operation instruction corresponding to the first keyword if the preset keyword set includes a second keyword that is the same as the first keyword.
  • the operation instruction corresponding to the first keyword is executed. For example, if the user says "I want to take a picture", the first keyword is "take a picture".
  • the server recognizes that the electronic device has opened the XX camera application and then loads the preset keyword set stored in the electronic device for that application. It is determined whether the preset keyword set includes a second keyword that is the same as the first keyword "take a picture". If the preset keyword set includes "take a picture", that entry is the second keyword.
  • the electronic device then executes the "take a picture" operation instruction and takes a picture in the XX camera.
  • the training module 205 is further configured to perform the following steps:
  • train on the training voice information to obtain a preset speech recognition model.
  • acquire training voice information of the user, where the voice information includes multiple keywords.
  • train on the voice information to obtain a preset speech recognition model.
  • the voice information may consist of keywords only.
  • when the user speaks, the user's voice information is recognized and the first keyword in it is obtained. For example, if the user says "I want to take a picture" and "Open XX video", these utterances can be used for training to obtain a preset speech recognition model.
  • the preset speech recognition model can recognize not only the keywords in the voice information but also voiceprint features such as the user's tone, speaking speed, and breathing pattern. For example, if the user has a bright voice and says "I want to take a picture", both the bright voice and the phrase "I want to take a picture" are used for training to obtain the preset speech recognition model.
  • the first obtaining module 201 is configured to obtain voice information of a user, where the voice information includes a first keyword, where the first keyword includes a first sub-keyword and a second sub-keyword.
  • for example, the user says "enter panorama mode to take a picture",
  • so the first keyword is "enter panorama mode to take a picture".
  • this first keyword produces two operation instructions: "enter panorama mode" and "take a picture". Therefore, the first keyword includes the first sub-keyword "enter panorama mode" and the second sub-keyword "take a picture".
  • as another example, the user says "open the lock screen to take a picture",
  • so the first keyword is "open the lock screen to take a picture". This first keyword produces two operation instructions: "open the lock screen" and "take a picture". Therefore, the first keyword includes the first sub-keyword "open the lock screen" and the second sub-keyword "take a picture".
  • the matching module 206 is configured to perform the following steps:
  • a preset keyword set is acquired according to a display state of a display screen of the electronic device.
  • the user's voiceprint features are extracted; they include the user's tone, breathing pattern, speaking speed, and so on.
  • if the voiceprint features match, the preset keyword set can be obtained. For example, the user says "take a picture", and the server detects that the user's voice has a bright tone. This bright tone is stored in the preset speech recognition model, so the tone of the utterance matches the stored tone and the preset keyword set can be obtained directly.
  • if the voiceprint features do not match, the preset keyword set cannot be obtained. For example, a friend of the user says "take a picture", but the friend's voice has a deep tone. The server does not find this deep tone in the preset speech recognition model. So even though "take a picture" is spoken and the keyword "take a picture" is included in the preset speech recognition model, the electronic device cannot be operated.
  • the preset keyword set can be obtained only when the voiceprint features match those stored in the preset speech recognition model. If the spoken content matches but the voiceprint features do not, the preset keyword set cannot be obtained. This strengthens the security of the electronic device and helps protect the user's private information.
  • obtaining the preset keyword set includes the following steps:
  • if the display state of the display screen is the unlocked state, determine the foreground application that is currently running;
  • obtain a second preset keyword set according to the foreground application and a preset correspondence, where the preset correspondence includes a correspondence between applications and preset keyword sets.
  • in the unlocked state, the user opens an application on the electronic device.
  • the server first determines the foreground application that is currently running, and then obtains a second preset keyword set according to the foreground application and a preset correspondence.
  • the foreground applications of the electronic device may include XX camera, XX map, XX video, and so on, and each application corresponds to a fixed second preset keyword set.
  • for example, when XX camera is in the foreground, the corresponding second preset keyword set is loaded from the electronic device so that operation instructions can be executed in the XX camera application.
  • similarly, when XX map is in the foreground, the corresponding second preset keyword set is loaded from the electronic device so that operation instructions can be executed in the XX map application, and so on.
  • a second preset keyword set is acquired according to the foreground application and a preset correspondence relationship.
  • the second acquisition module 202 is further configured to perform the following steps:
  • obtain the second preset keyword set according to the foreground application, the application interface, and a preset correspondence, where the preset correspondence includes a correspondence among applications, application interfaces, and preset keyword sets.
  • for example, social software includes a text input interface, an address book interface, a video call interface, and so on. The text input interface corresponds to one preset keyword set, the address book interface corresponds to another, and so on.
  • similarly, XX shopping software includes a payment interface, a browsing interface, a shopping cart interface, and so on. The payment interface corresponds to one preset keyword set, the browsing interface corresponds to another, and so on.
  • a second preset keyword set is acquired according to the foreground application and a preset correspondence relationship.
  • the second acquisition module 202 is further configured to perform the following steps:
  • obtain the second preset keyword set according to the foreground application, the geographic location information, and a preset correspondence, where the preset correspondence includes a correspondence among applications, geographic location information, and preset keyword sets.
  • the current geographic location information of the electronic device is obtained.
  • the geographic location may be identified by GPS (Global Positioning System) positioning.
  • the server identifies the current geographic location of the electronic device, for example a library, an office, or a supermarket.
  • the library corresponds to one preset keyword set,
  • the office corresponds to another preset keyword set, and so on.
  • when determining whether the preset keyword set includes a second keyword that is the same as the first keyword, the determining module 203 is configured to perform the following steps:
  • after the server obtains the preset keyword set, it compares the first sub-keyword and the second sub-keyword in the voice information with the preset keyword set and performs the next step according to the comparison result.
  • when executing the operation instruction corresponding to the first keyword, the execution module 204 is configured to perform the following steps:
  • if the preset keyword set includes a third sub-keyword identical to the first sub-keyword and a fourth sub-keyword corresponding to the second sub-keyword, execute the operation instruction corresponding to the first keyword.
  • for example, the first sub-keyword is "enter panorama mode" and the second sub-keyword is "take a picture". It is determined whether the preset keyword set contains a third sub-keyword "enter panorama mode" and a fourth sub-keyword "take a picture".
  • the first sub-keyword may also be "take a picture",
  • in which case the second sub-keyword may be "enter panorama mode",
  • the third sub-keyword is "take a picture",
  • and the fourth sub-keyword is "enter panorama mode".
  • the server then executes the operation instruction "enter panorama mode and take a picture".
  • each of the above modules may be implemented as an independent entity, or the modules may be combined in any way and implemented as one or more entities.
  • the voice processing apparatus 200 obtains the user's voice information through the first acquisition module 201.
  • the second obtaining module 202 obtains a preset keyword set according to the display state of the display screen of the electronic device, where the preset keyword set includes at least one second keyword; the determining module 203 determines whether the preset keyword set includes a second keyword that is the same as the first keyword; and the execution module 204 executes the operation instruction corresponding to the first keyword if the preset keyword set includes such a second keyword.
  • the electronic device acquires a preset keyword set according to the display state of the display screen, so that the second acquisition module 202 can acquire the corresponding preset keyword set in each display state. The determining module 203 then determines whether the preset keyword set includes a second keyword that is the same as the first keyword. Because the preset keyword sets correspond to the different display states of the display screen, when the first keyword is the same as a second keyword in the preset keyword set, the electronic device performs voice processing in the corresponding display state. Therefore, the voice processing method improves the wake-up rate of the electronic device.
  • An embodiment of the present application further provides an electronic device.
  • the electronic device may be a smartphone, a tablet computer, a gaming device, an AR (Augmented Reality) device, a car, a data storage device, an audio playback device, a video playback device, a notebook computer, a desktop computing device, or a wearable device such as an electronic watch, electronic glasses, an electronic helmet, an electronic bracelet, an electronic necklace, or electronic clothing.
  • the electronic device 300 includes a processor 301 and a memory 302.
  • the processor 301 and the memory 302 are electrically connected.
  • the processor 301 is the control center of the electronic device 300. It connects the various parts of the entire electronic device through various interfaces and lines, performs the various functions of the electronic device, and processes data, thereby monitoring the electronic device as a whole.
  • the processor 301 in the electronic device 300 loads the instructions corresponding to the processes of one or more computer programs into the memory 302 according to the following steps, and runs the computer programs stored in the memory 302 to implement various functions:
  • the display state includes a locked state and an unlocked state
  • the preset keyword set includes at least one second keyword
  • if the preset keyword set includes a second keyword that is the same as the first keyword, an operation instruction corresponding to the first keyword is executed.
  • before acquiring the user's voice information, where the voice information includes the first keyword, the processor 301 performs the following steps:
  • train on the training voice information to obtain a preset speech recognition model.
  • before acquiring the preset keyword set, where the preset keyword set includes at least one second keyword, the processor 301 performs the following steps:
  • a preset keyword set is acquired according to a display state of a display screen of the electronic device.
  • when acquiring the preset keyword set according to the display state of the display screen of the electronic device, the processor 301 performs the following steps:
  • if the display state of the display screen is the unlocked state, determine the foreground application that is currently running;
  • obtain a second preset keyword set according to the foreground application and a preset correspondence, where the preset correspondence includes a correspondence between applications and preset keyword sets.
  • when acquiring the second preset keyword set according to the foreground application and the preset correspondence, the processor 301 performs the following steps:
  • obtain the second preset keyword set according to the foreground application, the application interface, and a preset correspondence, where the preset correspondence includes a correspondence among applications, application interfaces, and preset keyword sets.
  • when acquiring the second preset keyword set according to the foreground application and the preset correspondence, the processor 301 performs the following steps:
  • obtain the second preset keyword set according to the foreground application, the geographic location information, and a preset correspondence, where the preset correspondence includes a correspondence among applications, geographic location information, and preset keyword sets.
  • the first keyword includes a first sub-keyword and a second sub-keyword; when determining whether the preset keyword set includes a second keyword that is the same as the first keyword,
  • the processor 301 performs the following steps:
  • determine whether the preset keyword set includes a third sub-keyword identical to the first sub-keyword and a fourth sub-keyword corresponding to the second sub-keyword;
  • when executing the operation instruction corresponding to the first keyword, the processor 301 performs the following steps:
  • if the preset keyword set includes a third sub-keyword identical to the first sub-keyword and a fourth sub-keyword corresponding to the second sub-keyword, execute the operation instruction corresponding to the first keyword.
  • the memory 302 may be used to store computer programs and data.
  • the computer program stored in the memory 302 contains instructions executable by a processor.
  • Computer programs can be composed of various functional modules.
  • the processor 301 executes various functional applications and data processing by calling a computer program stored in the memory 302.
  • the electronic device 300 further includes a microphone 303, an audio circuit 304, and a power source 305.
  • the processor 301 is electrically connected to the microphone 303, the audio circuit 304, and the power source 305, respectively.
  • the microphone 303 is used to collect voice information of the user.
  • the microphone 303 is configured to collect voice information of a user multiple times.
  • the audio circuit 304 may provide an audio interface between the user and the electronic device through a speaker, a microphone, or the like.
  • the power source 305 is used to supply power to various components of the electronic device 300.
  • the power supply 305 may be logically connected to the processor 301 through a power management system, so as to implement functions such as managing charging, discharging, and power consumption management through the power management system.
  • the electronic device 300 may further include a display screen, a camera, a radio frequency circuit, a Bluetooth module, and the like, and details are not described herein again.
  • an embodiment of the present application provides an electronic device that performs the following steps: obtaining a user's voice information; obtaining a preset keyword set according to a display state of a display screen of the electronic device, where the preset keyword set includes at least one second keyword; determining whether the preset keyword set includes a second keyword that is the same as the first keyword; and, if it does, executing an operation instruction corresponding to the first keyword.
  • the electronic device acquires a preset keyword set according to a display state of the display screen, so that the electronic device supports acquiring corresponding preset keyword sets in different display states of the display screen.
  • the electronic device internally determines whether the preset keyword set includes a second keyword that is the same as the first keyword. Because the preset keyword sets correspond to the different display states of the display screen, when the first keyword is the same as a second keyword in the preset keyword set, the electronic device performs voice processing in the corresponding display state. Therefore, the voice processing method improves the wake-up rate of the electronic device.
  • An embodiment of the present application further provides a storage medium.
  • a computer program is stored in the storage medium.
  • when the computer program runs on a computer, the computer executes the voice processing method according to any one of the foregoing embodiments.
  • when the computer program runs on a computer, the computer performs the following steps:
  • the display state includes a locked state and an unlocked state
  • the preset keyword set includes at least one second keyword
  • if the preset keyword set includes a second keyword that is the same as the first keyword, an operation instruction corresponding to the first keyword is executed.
  • the storage medium may include, but is not limited to, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.

Landscapes

  • Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Computer Security & Cryptography (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

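A voice processing method, apparatus, storage medium, and electronic device. The voice processing method includes: acquiring voice information of a user; acquiring a preset keyword set according to a display state of a display screen of an electronic device; determining whether the preset keyword set includes a second keyword that is the same as a first keyword; and, if the preset keyword set includes a second keyword that is the same as the first keyword, executing an operation instruction corresponding to the first keyword.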

Description

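Voice Processing Method, Apparatus, Storage Medium, and Electronic Device
This application claims priority to Chinese Patent Application No. 201810898885.X, filed with the Chinese Patent Office on August 8, 2018 and entitled "Voice Processing Method, Apparatus, Storage Medium, and Electronic Device", the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of speech recognition technology, and in particular to a voice processing method, apparatus, storage medium, and electronic device.
Background
With the rapid development of electronic technology, electronic devices such as smartphones offer increasingly rich functions. For example, a voice processing function allows a user to operate an electronic device by voice, giving the user a better voice interaction experience.
Summary
Embodiments of this application provide a voice processing method, apparatus, storage medium, and electronic device that can improve the wake-up rate of the electronic device.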
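In a first aspect, an embodiment of this application provides a voice processing method, including:
acquiring voice information of a user, where the voice information includes a first keyword;
acquiring a preset keyword set according to a display state of a display screen of an electronic device, where the display state includes a locked state and an unlocked state, and the preset keyword set includes at least one second keyword;
determining whether the preset keyword set includes a second keyword that is the same as the first keyword;
if the preset keyword set includes a second keyword that is the same as the first keyword, executing an operation instruction corresponding to the first keyword.
In a second aspect, an embodiment of this application further provides a voice processing apparatus, including:
a first acquisition module, configured to acquire voice information of a user, where the voice information includes a first keyword;
a second acquisition module, configured to acquire a preset keyword set according to a display state of a display screen of an electronic device, where the display state includes a locked state and an unlocked state, and the preset keyword set includes at least one second keyword;
a determining module, configured to determine whether the preset keyword set includes a second keyword that is the same as the first keyword;
an execution module, configured to execute an operation instruction corresponding to the first keyword if the preset keyword set includes a second keyword that is the same as the first keyword.
In a third aspect, an embodiment of this application further provides a storage medium storing a computer program, where, when the computer program runs on a computer, the computer is caused to perform the following steps:
acquiring voice information of a user, where the voice information includes a first keyword;
acquiring a preset keyword set according to a display state of a display screen of an electronic device, where the display state includes a locked state and an unlocked state, and the preset keyword set includes at least one second keyword;
determining whether the preset keyword set includes a second keyword that is the same as the first keyword;
if the preset keyword set includes a second keyword that is the same as the first keyword, executing an operation instruction corresponding to the first keyword.
In a fourth aspect, an embodiment of this application further provides an electronic device including a processor and a memory, where the memory stores a computer program, and the processor is configured to perform the following steps by calling the computer program stored in the memory:
acquiring voice information of a user, where the voice information includes a first keyword;
acquiring a preset keyword set according to a display state of a display screen of an electronic device, where the display state includes a locked state and an unlocked state, and the preset keyword set includes at least one second keyword;
determining whether the preset keyword set includes a second keyword that is the same as the first keyword;
if the preset keyword set includes a second keyword that is the same as the first keyword, executing an operation instruction corresponding to the first keyword.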
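Brief Description of the Drawings
To describe the technical solutions in the embodiments of this application more clearly, the following briefly introduces the accompanying drawings used in describing the embodiments. Evidently, the accompanying drawings described below show only some embodiments of this application, and a person skilled in the art may derive other drawings from them without creative effort.
FIG. 1 is a schematic diagram of a user controlling an electronic device by voice.
FIG. 2 is a schematic flowchart of a voice processing method according to an embodiment of this application.
FIG. 3 is another schematic flowchart of the voice processing method according to an embodiment of this application.
FIG. 4 is yet another schematic flowchart of the voice processing method according to an embodiment of this application.
FIG. 5 is still another schematic flowchart of the voice processing method according to an embodiment of this application.
FIG. 6 is a schematic structural diagram of a voice processing apparatus according to an embodiment of this application.
FIG. 7 is another schematic structural diagram of the voice processing apparatus according to an embodiment of this application.
FIG. 8 is a schematic structural diagram of an electronic device according to an embodiment of this application.
FIG. 9 is another schematic structural diagram of the electronic device according to an embodiment of this application.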
具体实施方式
下面将结合本申请实施例中的附图,对本申请实施例中的技术方案进行清楚、完整地描述。显然,所描述的实施例仅仅是本申请一部分实施例,而不是全部的实施例。基于本申请中的实施例,本领域技术人员在没有付出创造性劳动前提下所获得的所有其他实施例,都属于本申请的保护范围。
本申请的说明书和权利要求书以及上述附图中的术语“第一”、“第二”、“第三”等(如果存在)是用于区别类似的对象,而不必用于描述特定的顺序或先后次序。应当理解,这样描述的对象在适当情况下可以互换。此外,术语“包括”和“具有”以及他们的任何变形,意图在于覆盖不排他的包含。例如,包含了一系列步骤的过程、方法或包含了一系列模块的装置、电子设备、系统不必限于清楚地列出的那些步骤或模块,还可以包括没有清楚地列出的步骤或模块,也可以包括对于这些过程、方法、装置、电子设备或系统固有的其它步骤或模块。
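Refer to FIG. 1, which is a schematic diagram of a user controlling an electronic device by voice.
The user speaks, and the electronic device collects the user's voice information. The electronic device then compares the collected voice information with a speech recognition model stored in the electronic device. When the voice information matches the speech recognition model, the electronic device recognizes a control instruction from the voice information. It then performs the operation corresponding to that control instruction, such as lighting up the screen, opening an application, exiting an application, or locking the screen. In this way the user controls the electronic device by voice.
An embodiment of this application provides a voice processing method that can be applied to an electronic device. The electronic device may be a smartphone, a tablet computer, a gaming device, an AR (Augmented Reality) device, a vehicle, a data storage apparatus, an audio playback apparatus, a video playback apparatus, a notebook computer, a desktop computer, or the like.
An embodiment of this application provides a voice processing method, including:
acquiring voice information of a user, the voice information including a first keyword;
acquiring a preset keyword set according to a display state of a display screen of an electronic device, the display state including a locked state and an unlocked state, and the preset keyword set including at least one second keyword;
determining whether the preset keyword set includes a second keyword identical to the first keyword; and
if the preset keyword set includes a second keyword identical to the first keyword, executing an operation instruction corresponding to the first keyword.
In some embodiments, the step of acquiring a preset keyword set according to the display state of the display screen of the electronic device includes:
if the display state of the display screen is the locked state, acquiring a first preset keyword set;
if the display state of the display screen is the unlocked state, determining the foreground application currently running; and
acquiring a second preset keyword set according to the foreground application and a preset correspondence, the preset correspondence including a correspondence between applications and preset keyword sets.
In some embodiments, the step of acquiring the second preset keyword set according to the foreground application and the preset correspondence includes:
determining the application interface currently displayed by the foreground application; and
acquiring the second preset keyword set according to the foreground application, the application interface, and a preset correspondence, the preset correspondence including a correspondence among applications, application interfaces, and preset keyword sets.
In some embodiments, the step of acquiring the second preset keyword set according to the foreground application and the preset correspondence includes:
acquiring geographic location information of the current location of the electronic device; and
acquiring the second preset keyword set according to the foreground application, the geographic location information, and a preset correspondence, the preset correspondence including a correspondence among applications, geographic location information, and preset keyword sets.
In some embodiments, the first keyword includes a first sub-keyword and a second sub-keyword;
the step of determining whether the preset keyword set includes a second keyword identical to the first keyword includes:
determining whether the preset keyword set includes a third sub-keyword identical to the first sub-keyword and a fourth sub-keyword corresponding to the second sub-keyword; and
the step of executing an operation instruction corresponding to the first keyword if the preset keyword set includes a second keyword identical to the first keyword includes:
if the preset keyword set includes a third sub-keyword identical to the first sub-keyword and a fourth sub-keyword corresponding to the second sub-keyword, executing the operation instruction corresponding to the first keyword.
In some embodiments, before the step of acquiring voice information of a user, the method further includes:
acquiring training voice information of the user; and
training on the training voice information to obtain a preset speech recognition model.
In some embodiments, before the step of acquiring a preset keyword set according to the display state of the display screen of the electronic device, the method further includes:
extracting a voiceprint feature of the user from the voice information;
matching the voiceprint feature against the preset speech recognition model; and
when the voiceprint feature successfully matches the preset speech recognition model, acquiring the preset keyword set according to the display state of the display screen of the electronic device.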
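As shown in FIG. 2, the voice processing method may include the following steps:
110: Acquire voice information of a user, the voice information including a first keyword.
After the voice processing function of the electronic device is enabled, the electronic device acquires the user's voice information. For example, the electronic device may be provided with a microphone through which it collects the user's voice information.
The voice information includes a first keyword, and the server executes an operation instruction on the electronic device according to the first keyword in the user's voice information. For example, the voice information may contain operation instructions such as "I want to light up the screen", "Please open WeChat", or "I want to exit Taobao". The corresponding first keywords are "light up the screen", "open WeChat", and "exit Taobao". The voice information may therefore either contain the first keyword or consist solely of it.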
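The text does not specify how the first keyword is isolated from a longer utterance such as "I want to light up the screen". One minimal approach makes two assumptions: the speech has already been transcribed to text, and a hypothetical vocabulary of known command phrases is available. Under those assumptions, the function returns the longest known phrase contained in the transcript:

```python
from typing import Iterable, Optional


def extract_first_keyword(transcript: str, vocabulary: Iterable[str]) -> Optional[str]:
    """Return the longest known command phrase contained in the transcript.

    Assumes the utterance is already text; `vocabulary` is a hypothetical
    list of command phrases, not something the source defines.
    """
    matches = [phrase for phrase in vocabulary if phrase in transcript]
    if not matches:
        return None  # the utterance contains no known command phrase
    # When several phrases occur, prefer the longest (most specific) one.
    return max(matches, key=len)


vocab = ["light up the screen", "open WeChat", "exit Taobao", "take a photo"]
print(extract_first_keyword("I want to light up the screen", vocab))  # light up the screen
print(extract_first_keyword("Please open WeChat", vocab))             # open WeChat
print(extract_first_keyword("What time is it", vocab))                # None
```

Substring matching keeps the sketch short. A production recognizer would more likely spot keywords directly in the acoustic stream.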
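120: Acquire a preset keyword set according to the display state of the display screen of the electronic device, the display state including a locked state and an unlocked state, and the preset keyword set including at least one second keyword.
First, the display state of the display screen of the electronic device is determined. The display state includes a locked state and an unlocked state, and the locked state includes a screen-off state and a lock-screen state. In the locked state, the user's identity verification information must be verified before the electronic device can be opened and operated. The identity verification information includes a password entered by the user, the user's fingerprint features, facial features, voiceprint features, and the like.
In the screen-off state, the display screen of the electronic device shows no interface at all; that is, the backlight is turned off and the screen is dark to save power. For example, once the electronic device has determined that its display state is the screen-off state, the server acquires the first preset keyword set corresponding to the screen-off state. After the user utters the voice information "open the home screen of the electronic device", it is determined whether the first preset keyword set includes a second keyword identical to "open the home screen of the electronic device", the second keyword here being "open the home screen of the electronic device".
In the lock-screen state, the screen of the electronic device is lit and displays the lock-screen interface, but the electronic device cannot be operated. The lock screen can be opened only after the user's identity verification information has been verified successfully. The identity verification information includes a password entered by the user, the user's fingerprint features, facial features, voiceprint features, and the like. For example, the user lights up the screen, but the electronic device cannot be operated in the lock-screen state. When the server determines that the electronic device is in the lock-screen state, the electronic device acquires the internally stored first preset keyword set. The user then sends the voice information "open the lock screen", and it is determined whether the first preset keyword set includes a second keyword identical to "open the lock screen", the second keyword here being "open the lock screen".
In the unlocked state, the screen of the electronic device is not locked and can be used normally. For example, after the electronic device is unlocked, the user can make calls, send text messages, open applications, and so on. If the electronic device has been unlocked but no operation is in progress, the electronic device acquires an internally stored third preset keyword set and then operates accordingly. For example, in the unlocked state with no operation in progress, the user sends the voice information "open the phone book". The electronic device acquires its internally stored third preset keyword set and determines whether that set includes a second keyword identical to "open the phone book", the second keyword here being "open the phone book".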
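The state-to-set selection described above can be sketched as follows. Only the selection rule comes from the text: the screen-off and lock-screen states both use the first preset keyword set, and the idle unlocked state uses the third. The enum names, the function names, and the set contents are hypothetical.

```python
from enum import Enum, auto
from typing import FrozenSet


class DisplayState(Enum):
    SCREEN_OFF = auto()   # locked: backlight off, no interface shown
    LOCK_SCREEN = auto()  # locked: lock-screen interface shown
    UNLOCKED = auto()     # unlocked and idle (no operation in progress)


# Hypothetical set contents built from the example phrases in the text.
FIRST_PRESET_SET: FrozenSet[str] = frozenset(
    {"open the home screen of the electronic device", "open the lock screen"}
)
THIRD_PRESET_SET: FrozenSet[str] = frozenset({"open the phone book"})


def is_locked(state: DisplayState) -> bool:
    # Both the screen-off and lock-screen states count as the locked state.
    return state in (DisplayState.SCREEN_OFF, DisplayState.LOCK_SCREEN)


def keyword_set_for(state: DisplayState) -> FrozenSet[str]:
    # Locked states share the first set; the idle unlocked state uses the third.
    return FIRST_PRESET_SET if is_locked(state) else THIRD_PRESET_SET


print("open the lock screen" in keyword_set_for(DisplayState.LOCK_SCREEN))  # True
print("open the phone book" in keyword_set_for(DisplayState.SCREEN_OFF))    # False
```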
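130: Determine whether the preset keyword set includes a second keyword identical to the first keyword.
The first keyword is contained in the user's voice information, and it is determined whether the preset keyword set includes a second keyword identical to the first keyword. For example, if the user utters "I want to take a photo", the first keyword is "take a photo". The server recognizes that the electronic device has opened the XX Camera application and then loads the preset keyword set stored in the electronic device for that application. It is then determined whether the preset keyword set includes a second keyword "take a photo" identical to the first keyword "take a photo".
140: If the preset keyword set includes a second keyword identical to the first keyword, execute the operation instruction corresponding to the first keyword.
If the first keyword is identical to a second keyword in the preset keyword set, the operation instruction corresponding to the first keyword is executed. For example, the user utters "I want to take a photo", so the first keyword is "take a photo". The server recognizes that the electronic device has opened the XX Camera application and then loads the preset keyword set stored in the electronic device for that application. It is then determined whether the preset keyword set includes a second keyword identical to the first keyword "take a photo". If the preset keyword set contains the keyword "take a photo" (the second keyword), the electronic device executes the "take a photo" operation instruction and takes a photo in XX Camera.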
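Steps 130 and 140 amount to an exact-match lookup followed by a dispatch. Purely for illustration, the sketch below assumes that each preset keyword set is stored as a mapping from every second keyword to a callable standing in for its operation instruction. `handle_keyword` and `camera_set` are hypothetical names.

```python
from typing import Callable, Dict, Optional

# A preset keyword set modeled as a mapping from each second keyword to the
# operation it triggers; the callables are hypothetical placeholders.
OperationTable = Dict[str, Callable[[], str]]


def handle_keyword(first_keyword: str, preset_set: OperationTable) -> Optional[str]:
    """Execute the operation only if an identical second keyword exists."""
    operation = preset_set.get(first_keyword)  # exact (identical) match only
    if operation is None:
        return None  # no identical second keyword: nothing is executed
    return operation()


camera_set: OperationTable = {
    "take a photo": lambda: "photo taken in XX Camera",
}

print(handle_keyword("take a photo", camera_set))         # photo taken in XX Camera
print(handle_keyword("open the phone book", camera_set))  # None
```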
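It should be noted that the voice information may consist solely of the first keyword uttered by the user or may merely contain it; in either case, the operation instruction is completed according to the first keyword.
In some embodiments, as shown in FIG. 3, before step 110 of acquiring the user's voice information, the method further includes the following steps:
151: Acquire training voice information of the user;
152: Train on the training voice information to obtain a preset speech recognition model.
The user's training voice information is acquired. It contains multiple keywords, although it may also consist of keywords only. The training voice information is trained on to obtain the preset speech recognition model. When the user later utters voice information, that voice information is recognized to obtain the first keyword it contains. For example, the user sends the voice information "I want to take a photo" and "open XX Video", and these two utterances are trained on to obtain the preset speech recognition model.
The preset speech recognition model can recognize not only the keywords in the voice information but also voiceprint features such as the user's tone, speaking rate, and breath. For example, if the user has a bright voice and utters "I want to take a photo", both the user's bright voice and the utterance "I want to take a photo" are trained on to obtain the preset speech recognition model.
110: Acquire voice information of a user, the voice information including a first keyword, where the first keyword includes a first sub-keyword and a second sub-keyword.
For example, if the user utters "enter panorama mode and take a photo", the first keyword is "enter panorama mode and take a photo". This first keyword gives rise to two operation instructions: "enter panorama mode" and "take a photo". The first keyword therefore includes the first sub-keyword "enter panorama mode" and the second sub-keyword "take a photo".
As another example, if the user utters "open the lock screen and take a photo", the first keyword is "open the lock screen and take a photo". It contains two operation instructions, "open the lock screen" and "take a photo", so it includes the first sub-keyword "open the lock screen" and the second sub-keyword "take a photo".
In some embodiments, as shown in FIG. 3, before the preset keyword set is acquired in step 120, the method further includes the following steps:
161: Extract the user's voiceprint feature from the voice information, and match the voiceprint feature against the preset speech recognition model;
162: When the voiceprint feature successfully matches the preset speech recognition model, acquire the preset keyword set according to the display state of the display screen of the electronic device.
The user's voiceprint feature is extracted. It includes the user's intonation, breath, speaking rate, and the like. When the voiceprint feature matches the preset speech recognition model, the preset keyword set can be acquired. For example, the user utters "take a photo", and the server detects that the user's voice has a bright tone. Because the preset speech recognition model stores the user's bright tone, the tone of the utterance matches the tone stored in the model, and the preset keyword set can be acquired directly.
If the voiceprint feature does not match the preset speech recognition model, the preset keyword set cannot be acquired. For example, a friend of the user, whose voice has a deep tone, utters "take a photo". The server does not find that deep tone in the preset speech recognition model. So even though "take a photo" was spoken and the preset speech recognition model includes the keyword "take a photo", the electronic device cannot be made to perform the operation. In short, the preset keyword set can be acquired only when the voiceprint feature matches the voiceprint feature stored in the preset speech recognition model. If the spoken content matches but the voiceprint does not, the preset keyword set cannot be acquired. This greatly enhances the security of the electronic device and protects the user's private information.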
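The text leaves the voiceprint model and the matching criterion unspecified. The sketch below uses one common stand-in, which is not necessarily the method intended here. During training (steps 151 and 152) it averages hypothetical per-utterance feature vectors into an enrolled template. It then accepts a new utterance only when its cosine similarity to the template reaches an arbitrary threshold of 0.8 (steps 161 and 162). A real system would produce the vectors with a speaker-embedding model, which is outside this sketch.

```python
import math
from typing import Callable, FrozenSet, List, Optional, Sequence

Vector = Sequence[float]


def enroll(training_vectors: List[Vector]) -> List[float]:
    """Average per-utterance voiceprint vectors into a single template."""
    if not training_vectors:
        raise ValueError("at least one training utterance is required")
    n = len(training_vectors)
    dim = len(training_vectors[0])
    return [sum(v[i] for v in training_vectors) / n for i in range(dim)]


def cosine_similarity(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def gated_keyword_set(
    template: Vector,
    utterance_vector: Vector,
    get_keyword_set: Callable[[], FrozenSet[str]],
    threshold: float = 0.8,  # arbitrary illustrative threshold
) -> Optional[FrozenSet[str]]:
    """Return the preset keyword set only if the speaker's voiceprint matches."""
    if cosine_similarity(template, utterance_vector) < threshold:
        return None  # e.g. the friend's "take a photo" is rejected
    return get_keyword_set()


def camera_keywords() -> FrozenSet[str]:
    return frozenset({"take a photo"})


template = enroll([[1.0, 0.0, 0.2], [0.9, 0.1, 0.2]])  # hypothetical features
owner = [1.0, 0.05, 0.2]
friend = [0.0, 1.0, 0.0]
print(gated_keyword_set(template, owner, camera_keywords))   # frozenset({'take a photo'})
print(gated_keyword_set(template, friend, camera_keywords))  # None
```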
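When the voiceprint feature successfully matches the preset speech recognition model, the preset keyword set is acquired according to the display state of the display screen of the electronic device. First, the display state of the display screen of the electronic device is determined. The display state includes a locked state and an unlocked state, and the locked state includes a screen-off state and a lock-screen state. In the locked state, the user's identity verification information must be verified before the electronic device can be opened and operated. The identity verification information includes a password entered by the user, the user's fingerprint features, facial features, voiceprint features, and the like.
In the screen-off state, the display screen of the electronic device shows no interface at all; that is, the backlight is turned off and the screen is dark to save power. For example, once the electronic device has determined that its display state is the screen-off state, the server acquires the first preset keyword set corresponding to the screen-off state. After the user utters the voice information "open the home screen of the electronic device", it is determined whether the first preset keyword set includes a second keyword identical to "open the home screen of the electronic device", the second keyword here being "open the home screen of the electronic device".
In the lock-screen state, the screen of the electronic device is lit and displays the lock-screen interface, but the electronic device cannot be operated. The lock screen can be opened only after the user's identity verification information has been verified successfully. The identity verification information includes a password entered by the user, the user's fingerprint features, facial features, voiceprint features, and the like. For example, the user lights up the screen, but the electronic device cannot be operated in the lock-screen state. When the server determines that the electronic device is in the lock-screen state, the electronic device acquires the internally stored first preset keyword set. The user then sends the voice information "open the lock screen", and it is determined whether the first preset keyword set includes a second keyword identical to "open the lock screen", the second keyword here being "open the lock screen".
In the unlocked state, the screen of the electronic device is not locked and can be used normally. For example, after the electronic device is unlocked, the user can make calls, send text messages, open applications, and so on. If the electronic device has been unlocked but no operation is in progress, the electronic device acquires an internally stored third preset keyword set and then operates accordingly. For example, in the unlocked state with no operation in progress, the user sends the voice information "open the phone book". The electronic device acquires its internally stored third preset keyword set and determines whether that set includes a second keyword identical to "open the phone book", the second keyword here being "open the phone book".
In some embodiments, as shown in FIG. 3, step 120 of acquiring a preset keyword set, the preset keyword set including at least one second keyword, includes the following steps:
121: If the display state of the display screen is the locked state, acquire a first preset keyword set;
122: If the display state of the display screen is the unlocked state, determine the foreground application currently running;
123: Acquire a second preset keyword set according to the foreground application and a preset correspondence, the preset correspondence including a correspondence between applications and preset keyword sets.
First, the display state of the display screen of the electronic device is determined. The display state includes a locked state and an unlocked state, and the locked state includes a screen-off state and a lock-screen state. In the locked state, the user's identity verification information must be verified before the electronic device can be opened and operated. The identity verification information includes a password entered by the user, the user's fingerprint features, facial features, voiceprint features, and the like.
In the screen-off state, the display screen of the electronic device shows no interface at all; that is, the backlight is turned off and the screen is dark to save power. For example, once the electronic device has determined that its display state is the screen-off state, the server acquires the first preset keyword set corresponding to the screen-off state. After the user utters the voice information "open the home screen of the electronic device", it is determined whether the first preset keyword set includes a second keyword identical to "open the home screen of the electronic device", the second keyword here being "open the home screen of the electronic device".
In the lock-screen state, the screen of the electronic device is lit and displays the lock-screen interface, but the electronic device cannot be operated. The lock screen can be opened only after the user's identity verification information has been verified successfully. The identity verification information includes a password entered by the user, the user's fingerprint features, facial features, voiceprint features, and the like. For example, the user lights up the screen, but the electronic device cannot be operated in the lock-screen state. When the server determines that the electronic device is in the lock-screen state, the electronic device acquires the internally stored first preset keyword set. The user then sends the voice information "open the lock screen", and it is determined whether the first preset keyword set includes a second keyword identical to "open the lock screen", the second keyword here being "open the lock screen".
In the unlocked state, the user has opened an application on the electronic device. The server first determines the foreground application currently running and then acquires the second preset keyword set according to the foreground application and the preset correspondence. For example, the foreground applications of the electronic device include XX Camera, XX Map, XX Video, and so on, and each application corresponds to a fixed second preset keyword set. When it is detected that the electronic device has opened XX Camera, the corresponding second preset keyword set is loaded from within the electronic device so that operation instructions can be executed in the XX Camera application. Likewise, when it is detected that the electronic device has opened XX Map, the corresponding second preset keyword set is loaded so that operation instructions can be executed in the XX Map application, and so on.
For example, the preset correspondence may be as shown in Table 1:
Table 1
Application    Preset keyword set
Application 1    Preset keyword set 1
Application 2    Preset keyword set 2
……    ……
Table 1 sets out the correspondence between applications and preset keyword sets.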
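Steps 121 to 123, together with the Table 1 correspondence, reduce to a branch on the lock state followed by a dictionary lookup keyed by the foreground application. The sketch below is minimal. The set contents are hypothetical, and the text does not say what happens for an unknown or absent application, so returning an empty set in that case is an assumption.

```python
from typing import Dict, FrozenSet, Optional

# Hypothetical contents; the source only says each application maps to one set.
FIRST_PRESET_SET: FrozenSet[str] = frozenset({"open the lock screen"})
APP_KEYWORD_SETS: Dict[str, FrozenSet[str]] = {  # Table 1: application -> set
    "XX Camera": frozenset({"take a photo", "enter panorama mode"}),
    "XX Map": frozenset({"search nearby"}),
}


def get_preset_keyword_set(locked: bool, foreground_app: Optional[str]) -> FrozenSet[str]:
    if locked:
        return FIRST_PRESET_SET  # step 121
    # Steps 122-123: look up the foreground application in the correspondence.
    if foreground_app is None:
        return frozenset()  # assumption: no foreground app yields an empty set
    return APP_KEYWORD_SETS.get(foreground_app, frozenset())


print(sorted(get_preset_keyword_set(False, "XX Camera")))  # ['enter panorama mode', 'take a photo']
print(get_preset_keyword_set(True, "XX Camera") == FIRST_PRESET_SET)  # True
```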
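In some embodiments, as shown in FIG. 4, step 123 of acquiring a second preset keyword set according to the foreground application and the preset correspondence includes the following steps:
1231: Determine the application interface currently displayed by the foreground application;
1232: Acquire the second preset keyword set according to the foreground application, the application interface, and a preset correspondence, the preset correspondence including a correspondence among applications, application interfaces, and preset keyword sets.
On an electronic device, an opened application has not only a main interface but also other interfaces, such as a personal information interface. For example, a social application includes a text input interface, a contacts interface, a video call interface, and so on. The text input interface corresponds to one preset keyword set, the contacts interface to another, and so on. As another example, XX Shopping includes a payment interface, a browsing interface, a shopping cart interface, and so on. The payment interface corresponds to one preset keyword set, the browsing interface to another, and so on. The preset correspondence may be as shown in Table 2:
Table 2
[Table 2 image: PCTCN2019090417-appb-000001]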
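Steps 1231 and 1232 refine the lookup key from the application alone to the pair (application, interface). The Table 2 image is not reproduced, so the rows below are hypothetical examples modeled on the interfaces named in the text.

```python
from typing import Dict, FrozenSet, Tuple

# Hypothetical Table 2 rows: (application, interface) -> preset keyword set.
APP_UI_KEYWORD_SETS: Dict[Tuple[str, str], FrozenSet[str]] = {
    ("social app", "text input"): frozenset({"send the message"}),
    ("social app", "contacts"): frozenset({"call this contact"}),
    ("XX Shopping", "payment"): frozenset({"confirm payment"}),
    ("XX Shopping", "browsing"): frozenset({"add to cart"}),
}


def get_second_preset_set(app: str, interface: str) -> FrozenSet[str]:
    # The same application yields different sets on different interfaces.
    return APP_UI_KEYWORD_SETS.get((app, interface), frozenset())


print(get_second_preset_set("XX Shopping", "payment"))  # frozenset({'confirm payment'})
print(get_second_preset_set("XX Shopping", "cart"))     # frozenset()
```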
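In some embodiments, as shown in FIG. 5, step 123 of acquiring a second preset keyword set according to the foreground application and the preset correspondence includes the following steps:
1233: Acquire geographic location information of the current location of the electronic device;
1234: Acquire the second preset keyword set according to the foreground application, the geographic location information, and a preset correspondence, the preset correspondence including a correspondence among applications, geographic location information, and preset keyword sets.
When an application on the electronic device is opened, the geographic location information of the electronic device's current location can be acquired. The geographic location can be identified by GPS (Global Positioning System) positioning. For example, the server recognizes that the electronic device is currently in a library, an office, a supermarket, and so on. The library corresponds to one preset keyword set, the office to another, and so on. The preset correspondence may be as shown in Table 3:
Table 3
[Table 3 image: PCTCN2019090417-appb-000002]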
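Steps 1233 and 1234 also require turning a GPS fix into a place such as "library" or "office", which the text does not describe. The sketch below assumes hypothetical circular geofences around named places. It tests membership with the haversine great-circle distance and then looks up the pair (application, place). All coordinates, radii, and set contents are invented for illustration.

```python
import math
from typing import Dict, FrozenSet, Optional, Tuple

# Hypothetical named places: name -> (latitude, longitude, radius in metres).
PLACES: Dict[str, Tuple[float, float, float]] = {
    "library": (22.5431, 114.0579, 150.0),
    "office": (22.5330, 114.0550, 100.0),
}

# Hypothetical Table 3 rows: (application, place) -> preset keyword set.
APP_PLACE_KEYWORD_SETS: Dict[Tuple[str, str], FrozenSet[str]] = {
    ("XX Video", "library"): frozenset({"mute"}),
    ("XX Video", "office"): frozenset({"pause"}),
}


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres on a spherical Earth."""
    r = 6_371_000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def place_label(lat: float, lon: float) -> Optional[str]:
    # Return the first named place whose geofence contains the fix.
    for name, (plat, plon, radius) in PLACES.items():
        if haversine_m(lat, lon, plat, plon) <= radius:
            return name
    return None


def get_second_preset_set(app: str, lat: float, lon: float) -> FrozenSet[str]:
    place = place_label(lat, lon)
    if place is None:
        return frozenset()
    return APP_PLACE_KEYWORD_SETS.get((app, place), frozenset())


print(place_label(22.5431, 114.0579))                        # library
print(get_second_preset_set("XX Video", 22.5330, 114.0550))  # frozenset({'pause'})
print(get_second_preset_set("XX Video", 0.0, 0.0))           # frozenset()
```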
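In some embodiments, as shown in FIG. 3, step 130 of determining whether the preset keyword set includes a second keyword identical to the first keyword includes the following step:
131: Determine whether the preset keyword set includes a third sub-keyword identical to the first sub-keyword and a fourth sub-keyword corresponding to the second sub-keyword.
After acquiring the preset keyword set, the server compares the first sub-keyword and the second sub-keyword in the voice information with the preset keyword set and performs the next step according to the comparison result.
For example, the user sends the voice information "enter panorama mode and take a photo", so the first sub-keyword is "enter panorama mode" and the second sub-keyword is "take a photo". It is determined whether the preset keyword set contains a third sub-keyword "enter panorama mode" and a fourth sub-keyword "take a photo". Alternatively, the first sub-keyword may be "take a photo" and the second sub-keyword "enter panorama mode", in which case the third sub-keyword is "take a photo" and the fourth sub-keyword is "enter panorama mode".
In some embodiments, as shown in FIG. 3, step 140 of executing the operation instruction corresponding to the first keyword if the preset keyword set includes a second keyword identical to the first keyword includes the following step:
141: If the preset keyword set includes a third sub-keyword identical to the first sub-keyword and a fourth sub-keyword corresponding to the second sub-keyword, execute the operation instruction corresponding to the first keyword.
Following the determination in step 131, if the preset keyword set includes a third sub-keyword identical to the first sub-keyword and a fourth sub-keyword corresponding to the second sub-keyword, the operation instruction corresponding to the first keyword is executed.
For example, the user sends the voice information "enter panorama mode and take a photo", so the first sub-keyword is "enter panorama mode" and the second sub-keyword is "take a photo". It is determined whether the preset keyword set contains a third sub-keyword "enter panorama mode" and a fourth sub-keyword "take a photo". Alternatively, the first sub-keyword may be "take a photo" and the second sub-keyword "enter panorama mode", in which case the third sub-keyword is "take a photo" and the fourth sub-keyword is "enter panorama mode". In the first ordering, the first sub-keyword "enter panorama mode" is identical to the third sub-keyword "enter panorama mode", and the second sub-keyword "take a photo" is identical to the fourth sub-keyword "take a photo". In the second ordering, the first sub-keyword "take a photo" is identical to the third sub-keyword "take a photo", and the second sub-keyword "enter panorama mode" is identical to the fourth sub-keyword "enter panorama mode". The server then executes the operation instruction "enter panorama mode and take a photo".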
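Steps 131 and 141 can be sketched in two parts: split the utterance into known sub-keywords, then require each one to appear in the preset keyword set. This sketch treats "corresponding" as "identical", which is an assumption. Because set membership ignores order, both orderings of the sub-keywords discussed above are accepted. The vocabulary and function names are hypothetical.

```python
from typing import FrozenSet, Iterable, List


def split_compound(utterance: str, vocabulary: Iterable[str]) -> List[str]:
    """Return the known sub-keywords found in the utterance, in spoken order."""
    found = [(utterance.find(p), p) for p in vocabulary if p in utterance]
    return [phrase for _, phrase in sorted(found)]


def compound_matches(sub_keywords: List[str], preset_set: FrozenSet[str]) -> bool:
    # Every sub-keyword needs an identical entry in the set; order is irrelevant.
    return len(sub_keywords) >= 2 and all(k in preset_set for k in sub_keywords)


camera_set = frozenset({"take a photo", "enter panorama mode"})
vocab = ["enter panorama mode", "take a photo", "open the lock screen"]
subs = split_compound("enter panorama mode and take a photo", vocab)
print(subs)                                # ['enter panorama mode', 'take a photo']
print(compound_matches(subs, camera_set))  # True
other = split_compound("open the lock screen and take a photo", vocab)
print(compound_matches(other, camera_set))  # False
```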
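In specific implementations, this application is not limited by the execution order of the steps described. Where no conflict arises, some steps may be performed in other orders or simultaneously.
As can be seen from the above, the voice processing method provided by the embodiments of this application includes:
acquiring voice information of a user; acquiring a preset keyword set according to the display state of the display screen of the electronic device, the preset keyword set including at least one second keyword; determining whether the preset keyword set includes a second keyword identical to the first keyword; and if so, executing the operation instruction corresponding to the first keyword. In this method, the electronic device acquires the preset keyword set according to the display state of the display screen, so it can acquire the preset keyword set corresponding to whichever display state it is in. The electronic device then determines internally whether that preset keyword set includes a second keyword identical to the first keyword. Each preset keyword set corresponds to a particular display state. Whenever the first keyword is identical to a second keyword in that set, the electronic device processes the voice in the corresponding display state. The voice processing method therefore improves the wake-up rate of the electronic device.
An embodiment of this application further provides a voice processing apparatus, which may be integrated in an electronic device.
An embodiment of this application further provides a voice processing apparatus, including:
a first acquisition module, configured to acquire voice information of a user, the voice information including a first keyword;
a second acquisition module, configured to acquire a preset keyword set according to a display state of a display screen of an electronic device, the display state including a locked state and an unlocked state, and the preset keyword set including at least one second keyword;
a judgment module, configured to determine whether the preset keyword set includes a second keyword identical to the first keyword; and
an execution module, configured to execute an operation instruction corresponding to the first keyword if the preset keyword set includes a second keyword identical to the first keyword.
In some embodiments, the second acquisition module is configured to:
if the display state of the display screen is the locked state, acquire a first preset keyword set;
if the display state of the display screen is the unlocked state, determine the foreground application currently running; and
acquire a second preset keyword set according to the foreground application and a preset correspondence, the preset correspondence including a correspondence between applications and preset keyword sets.
In some embodiments, when acquiring the second preset keyword set according to the foreground application and the preset correspondence, the second acquisition module is configured to:
determine the application interface currently displayed by the foreground application; and
acquire the second preset keyword set according to the foreground application, the application interface, and a preset correspondence, the preset correspondence including a correspondence among applications, application interfaces, and preset keyword sets.
In some embodiments, when acquiring the second preset keyword set according to the foreground application and the preset correspondence, the second acquisition module is configured to:
acquire geographic location information of the current location of the electronic device; and
acquire the second preset keyword set according to the foreground application, the geographic location information, and a preset correspondence, the preset correspondence including a correspondence among applications, geographic location information, and preset keyword sets.
In some embodiments, the first keyword includes a first sub-keyword and a second sub-keyword;
the judgment module is configured to determine whether the preset keyword set includes a third sub-keyword identical to the first sub-keyword and a fourth sub-keyword corresponding to the second sub-keyword; and
the execution module is configured to execute the operation instruction corresponding to the first keyword if the preset keyword set includes a third sub-keyword identical to the first sub-keyword and a fourth sub-keyword corresponding to the second sub-keyword.
In some embodiments, the voice processing apparatus further includes a training module configured to:
acquire training voice information of the user; and
train on the training voice information to obtain a preset speech recognition model.
In some embodiments, the voice processing apparatus further includes a matching module configured to:
extract a voiceprint feature of the user from the voice information; and
match the voiceprint feature against the preset speech recognition model;
and the second acquisition module is configured to:
acquire the preset keyword set according to the display state of the display screen of the electronic device when the voiceprint feature successfully matches the preset speech recognition model.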
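As shown in FIG. 6, the voice processing apparatus 200 may include a first acquisition module 201, a second acquisition module 202, a judgment module 203, and an execution module 204.
The first acquisition module 201 is configured to acquire voice information of a user, the voice information including a first keyword.
After the voice processing function of the electronic device is enabled, the electronic device acquires the user's voice information. For example, the electronic device may be provided with a microphone through which it collects the user's voice information.
The voice information includes a first keyword, and the server executes an operation instruction on the electronic device according to the first keyword in the user's voice information. For example, the voice information may contain operation instructions such as "I want to light up the screen", "Please open WeChat", or "I want to exit Taobao". The corresponding first keywords are "light up the screen", "open WeChat", and "exit Taobao".
The second acquisition module 202 is configured to acquire a preset keyword set according to the display state of the display screen of the electronic device, the display state including a locked state and an unlocked state, and the preset keyword set including at least one second keyword.
First, the display state of the display screen of the electronic device is determined. The display state includes a locked state and an unlocked state, and the locked state includes a screen-off state and a lock-screen state. In the locked state, the user's identity verification information must be verified before the electronic device can be opened and operated. The identity verification information includes a password entered by the user, the user's fingerprint features, facial features, voiceprint features, and the like.
In the screen-off state, the display screen of the electronic device shows no interface at all; that is, the backlight is turned off and the screen is dark to save power. For example, once the electronic device has determined that its display state is the screen-off state, the server acquires the first preset keyword set corresponding to the screen-off state. After the user utters the voice information "open the home screen of the electronic device", it is determined whether the first preset keyword set includes a second keyword identical to "open the home screen of the electronic device", the second keyword here being "open the home screen of the electronic device".
In the lock-screen state, the screen of the electronic device is lit and displays the lock-screen interface, but the electronic device cannot be operated. The lock screen can be opened only after the user's identity verification information has been verified successfully. The identity verification information includes a password entered by the user, the user's fingerprint features, facial features, voiceprint features, and the like. For example, the user lights up the screen, but the electronic device cannot be operated in the lock-screen state. When the server determines that the electronic device is in the lock-screen state, the electronic device acquires the internally stored first preset keyword set. The user then sends the voice information "open the lock screen", and it is determined whether the first preset keyword set includes a second keyword identical to "open the lock screen", the second keyword here being "open the lock screen".
In the unlocked state, the screen of the electronic device is not locked and can be used normally. For example, after the electronic device is unlocked, the user can make calls, send text messages, open applications, and so on. If the electronic device has been unlocked but no operation is in progress, the electronic device acquires an internally stored third preset keyword set and then operates accordingly. For example, in the unlocked state with no operation in progress, the user sends the voice information "open the phone book". The electronic device acquires its internally stored third preset keyword set and determines whether that set includes a second keyword identical to "open the phone book", the second keyword here being "open the phone book".
The judgment module 203 is configured to determine whether the preset keyword set includes a second keyword identical to the first keyword.
The first keyword is contained in the user's voice information, and it is determined whether the preset keyword set includes a second keyword identical to the first keyword. For example, if the user utters "I want to take a photo", the first keyword is "take a photo". The server recognizes that the electronic device has opened the XX Camera application and then loads the preset keyword set stored in the electronic device. It is then determined whether the preset keyword set includes a second keyword "take a photo" identical to the first keyword "take a photo".
The execution module 204 is configured to execute the operation instruction corresponding to the first keyword if the preset keyword set includes a second keyword identical to the first keyword.
If the first keyword is identical to a second keyword in the preset keyword set, the operation instruction corresponding to the first keyword is executed. For example, the user utters "I want to take a photo", so the first keyword is "take a photo". The server recognizes that the electronic device has opened the XX Camera application and then loads the preset keyword set stored in the electronic device. It is then determined whether the preset keyword set includes a second keyword identical to the first keyword "take a photo". If the preset keyword set contains the keyword "take a photo" (the second keyword), the electronic device executes the "take a photo" operation instruction and takes a photo in XX Camera.
In some embodiments, as shown in FIG. 7, the apparatus further includes a training module 205, which performs the following steps before the user's voice information is acquired:
acquiring training voice information of the user; and
training on the training voice information to obtain a preset speech recognition model.
The user's training voice information is acquired. It contains multiple keywords, although it may also consist of keywords only. The training voice information is trained on to obtain the preset speech recognition model. When the user later utters voice information, that voice information is recognized to obtain the first keyword it contains. For example, the user sends the voice information "I want to take a photo" and "open XX Video", and these two utterances are trained on to obtain the preset speech recognition model.
The preset speech recognition model can recognize not only the keywords in the voice information but also voiceprint features such as the user's tone, speaking rate, and breath. For example, if the user has a bright voice and says "I want to take a photo", the user's bright voice and "I want to take a photo" are both trained on to obtain the preset speech recognition model.
The first acquisition module 201 is configured to acquire voice information of a user, the voice information including a first keyword, where the first keyword includes a first sub-keyword and a second sub-keyword.
For example, if the user utters "enter panorama mode and take a photo", the first keyword is "enter panorama mode and take a photo". This first keyword gives rise to two operation instructions: "enter panorama mode" and "take a photo". The first keyword therefore includes the first sub-keyword "enter panorama mode" and the second sub-keyword "take a photo".
As another example, if the user utters "open the lock screen and take a photo", the first keyword is "open the lock screen and take a photo". It contains two operation instructions, "open the lock screen" and "take a photo", so it includes the first sub-keyword "open the lock screen" and the second sub-keyword "take a photo".
In some embodiments, as shown in FIG. 7, before the preset keyword set is acquired, a matching module 206 is configured to perform the following steps:
extracting the user's voiceprint feature from the voice information, and matching the voiceprint feature against the preset speech recognition model; and
when the voiceprint feature successfully matches the preset speech recognition model, acquiring the preset keyword set according to the display state of the display screen of the electronic device.
The user's voiceprint feature is extracted. It includes the user's intonation, breath, speaking rate, and the like. When the voiceprint feature matches the preset speech recognition model, the preset keyword set can be acquired. For example, the user utters "take a photo", and the server detects that the user's voice has a bright tone. Because the preset speech recognition model stores the user's bright tone, the tone of the utterance matches the tone stored in the model, and the preset keyword set can be acquired directly.
If the voiceprint feature does not match the preset speech recognition model, the preset keyword set cannot be acquired. For example, a friend of the user, whose voice has a deep tone, utters "take a photo". The server does not find that deep tone in the preset speech recognition model. So even though "take a photo" was spoken and the preset speech recognition model includes the keyword "take a photo", the electronic device cannot be made to perform the operation. In short, the preset keyword set can be acquired only when the voiceprint feature matches the voiceprint feature stored in the preset speech recognition model. If the spoken content matches but the voiceprint does not, the preset keyword set cannot be acquired. This greatly enhances the security of the electronic device and protects the user's private information.
In some embodiments, acquiring the preset keyword set includes the following steps:
if the display state of the display screen is the locked state, acquiring a first preset keyword set;
if the display state of the display screen is the unlocked state, determining the foreground application currently running; and
acquiring a second preset keyword set according to the foreground application and a preset correspondence, the preset correspondence including a correspondence between applications and preset keyword sets.
First, the display state of the display screen of the electronic device is determined. The display state includes a locked state and an unlocked state, and the locked state includes a screen-off state and a lock-screen state. In the locked state, the user's identity verification information must be verified before the electronic device can be opened and operated. The identity verification information includes a password entered by the user, the user's fingerprint features, facial features, voiceprint features, and the like.
In the screen-off state, the display screen of the electronic device shows no interface at all; that is, the backlight is turned off and the screen is dark to save power. For example, once the electronic device has determined that its display state is the screen-off state, the server acquires the first preset keyword set corresponding to the screen-off state. After the user utters the voice information "open the home screen of the electronic device", it is determined whether the first preset keyword set includes a second keyword identical to "open the home screen of the electronic device", the second keyword here being "open the home screen of the electronic device".
In the lock-screen state, the screen of the electronic device is lit and displays the lock-screen interface, but the electronic device cannot be operated. The lock screen can be opened only after the user's identity verification information has been verified successfully. The identity verification information includes a password entered by the user, the user's fingerprint features, facial features, voiceprint features, and the like. For example, the user lights up the screen, but the electronic device cannot be operated in the lock-screen state. When the server determines that the electronic device is in the lock-screen state, the electronic device acquires the internally stored first preset keyword set. The user then sends the voice information "open the lock screen", and it is determined whether the first preset keyword set includes a second keyword identical to "open the lock screen", the second keyword here being "open the lock screen".
In the unlocked state, the user has opened an application on the electronic device. The server first determines the foreground application currently running and then acquires the second preset keyword set according to the foreground application and the preset correspondence. For example, the foreground applications of the electronic device include XX Camera, XX Map, XX Video, and so on, and each application corresponds to a fixed second preset keyword set. When it is detected that the electronic device has opened XX Camera, the corresponding second preset keyword set is loaded from within the electronic device so that operation instructions can be executed in the XX Camera application. Likewise, when it is detected that the electronic device has opened XX Map, the corresponding second preset keyword set is loaded so that operation instructions can be executed in the XX Map application, and so on.
In some embodiments, as shown in FIG. 6, when acquiring the second preset keyword set according to the foreground application and the preset correspondence, the second acquisition module 202 performs the following steps:
determining the application interface currently displayed by the foreground application; and
acquiring the second preset keyword set according to the foreground application, the application interface, and a preset correspondence, the preset correspondence including a correspondence among applications, application interfaces, and preset keyword sets.
On an electronic device, an opened application has not only a main interface but also other interfaces, such as a personal information interface. For example, a social application includes a text input interface, a contacts interface, a video call interface, and so on. The text input interface corresponds to one preset keyword set, the contacts interface to another, and so on. As another example, XX Shopping includes a payment interface, a browsing interface, a shopping cart interface, and so on. The payment interface corresponds to one preset keyword set, the browsing interface to another, and so on.
In some embodiments, as shown in FIG. 7, when acquiring the second preset keyword set according to the foreground application and the preset correspondence, the second acquisition module 202 performs the following steps:
acquiring geographic location information of the current location of the electronic device; and
acquiring the second preset keyword set according to the foreground application, the geographic location information, and a preset correspondence, the preset correspondence including a correspondence among applications, geographic location information, and preset keyword sets.
When an application on the electronic device is opened, the geographic location information of the electronic device's current location can be acquired. The geographic location can be identified by GPS (Global Positioning System) positioning. For example, the server recognizes that the electronic device is currently in a library, an office, a supermarket, and so on. The library corresponds to one preset keyword set, the office to another, and so on.
In some embodiments, when determining whether the preset keyword set includes a second keyword identical to the first keyword, the judgment module 203 is configured to perform the following steps:
After acquiring the preset keyword set, the server compares the first sub-keyword and the second sub-keyword in the voice information with the preset keyword set and performs the next step according to the comparison result.
For example, the user sends the voice information "enter panorama mode and take a photo", so the first sub-keyword is "enter panorama mode" and the second sub-keyword is "take a photo". It is determined whether the preset keyword set contains a third sub-keyword "enter panorama mode" and a fourth sub-keyword "take a photo". Alternatively, the first sub-keyword may be "take a photo" and the second sub-keyword "enter panorama mode", in which case the third sub-keyword is "take a photo" and the fourth sub-keyword is "enter panorama mode".
In some embodiments, when executing the operation instruction corresponding to the first keyword if the preset keyword set includes a second keyword identical to the first keyword, the execution module 204 is configured to perform the following step:
if the preset keyword set includes a third sub-keyword identical to the first sub-keyword and a fourth sub-keyword corresponding to the second sub-keyword, executing the operation instruction corresponding to the first keyword.
For example, the user sends the voice information "enter panorama mode and take a photo", so the first sub-keyword is "enter panorama mode" and the second sub-keyword is "take a photo". It is determined whether the preset keyword set contains a third sub-keyword "enter panorama mode" and a fourth sub-keyword "take a photo". Alternatively, the first sub-keyword may be "take a photo" and the second sub-keyword "enter panorama mode", in which case the third sub-keyword is "take a photo" and the fourth sub-keyword is "enter panorama mode". In the first ordering, the first sub-keyword "enter panorama mode" is identical to the third sub-keyword "enter panorama mode", and the second sub-keyword "take a photo" is identical to the fourth sub-keyword "take a photo". In the second ordering, the first sub-keyword "take a photo" is identical to the third sub-keyword "take a photo", and the second sub-keyword "enter panorama mode" is identical to the fourth sub-keyword "enter panorama mode". The server then executes the operation instruction "enter panorama mode and take a photo".
In specific implementations, the above modules may be implemented as independent entities, or combined arbitrarily and implemented as one or several entities.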
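The sketch below is a minimal illustration of combining the modules into one entity. Modules 201, 202, and 204 are wired in as plain callables, and the judgment of module 203 is implemented as a method. The class and field names are hypothetical and do not come from the source.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Optional


@dataclass
class VoiceProcessingApparatus:
    """Hypothetical composition of modules 201-204 as plain callables."""

    acquire_voice_keyword: Callable[[], str]           # first acquisition module 201
    acquire_keyword_set: Callable[[], FrozenSet[str]]  # second acquisition module 202
    execute: Callable[[str], str]                      # execution module 204

    @staticmethod
    def judge(keyword: str, preset_set: FrozenSet[str]) -> bool:
        # Judgment module 203: is there an identical second keyword?
        return keyword in preset_set

    def run(self) -> Optional[str]:
        keyword = self.acquire_voice_keyword()
        preset_set = self.acquire_keyword_set()
        if self.judge(keyword, preset_set):
            return self.execute(keyword)
        return None  # no identical second keyword, so nothing is executed


apparatus = VoiceProcessingApparatus(
    acquire_voice_keyword=lambda: "take a photo",
    acquire_keyword_set=lambda: frozenset({"take a photo"}),
    execute=lambda kw: f"executed: {kw}",
)
print(apparatus.run())  # executed: take a photo
```

Passing the modules in as callables keeps each one replaceable, which mirrors the statement that the modules may be separate entities or merged into one.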
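As can be seen from the above, in the voice processing apparatus 200 provided by the embodiments of this application, the first acquisition module 201 acquires the user's voice information. The second acquisition module 202 acquires a preset keyword set according to the display state of the display screen of the electronic device, the preset keyword set including at least one second keyword. The judgment module 203 determines whether the preset keyword set includes a second keyword identical to the first keyword. The execution module 204 executes the operation instruction corresponding to the first keyword if the preset keyword set includes a second keyword identical to the first keyword. Because the preset keyword set is acquired according to the display state of the display screen, the second acquisition module 202 can acquire the preset keyword set corresponding to each display state. The judgment module 203 then determines whether that preset keyword set includes a second keyword identical to the first keyword. Each preset keyword set corresponds to a particular display state. Whenever the first keyword is identical to a second keyword in that set, the electronic device processes the voice in the corresponding display state. The voice processing apparatus therefore improves the wake-up rate of the electronic device.
An embodiment of this application further provides an electronic device. The electronic device may be a smartphone, a tablet computer, a gaming device, an AR (Augmented Reality) device, a vehicle, a data storage apparatus, an audio playback apparatus, a video playback apparatus, a notebook computer, or a desktop computing device. It may also be a wearable device such as an electronic watch, electronic glasses, an electronic helmet, an electronic bracelet, an electronic necklace, or electronic clothing.
As shown in FIG. 8, the electronic device 300 includes a processor 301 and a memory 302, and the processor 301 is electrically connected to the memory 302.
The processor 301 is the control center of the electronic device 300 and connects all parts of the electronic device through various interfaces and lines. It runs or calls the computer program stored in the memory 302 and calls the data stored in the memory 302 to perform the functions of the electronic device and process data, thereby monitoring the electronic device as a whole.
In this embodiment, the processor 301 in the electronic device 300 loads instructions corresponding to the processes of one or more computer programs into the memory 302 and runs the computer programs stored in the memory 302 to implement various functions, according to the following steps:
acquiring voice information of a user, the voice information including a first keyword;
acquiring a preset keyword set according to a display state of a display screen of the electronic device, the display state including a locked state and an unlocked state, and the preset keyword set including at least one second keyword;
determining whether the preset keyword set includes a second keyword identical to the first keyword; and
if the preset keyword set includes a second keyword identical to the first keyword, executing an operation instruction corresponding to the first keyword.
In some embodiments, before acquiring the user's voice information, the voice information including a first keyword, the processor 301 performs the following steps:
acquiring training voice information of the user; and
training on the training voice information to obtain a preset speech recognition model.
In some embodiments, before acquiring the preset keyword set, the preset keyword set including at least one second keyword, the processor 301 performs the following steps:
extracting a voiceprint feature of the user from the voice information;
matching the voiceprint feature against the preset speech recognition model; and
when the voiceprint feature successfully matches the preset speech recognition model, acquiring the preset keyword set according to the display state of the display screen of the electronic device.
In some embodiments, when acquiring the preset keyword set according to the display state of the display screen of the electronic device, the processor 301 performs the following steps:
if the display state of the display screen is the locked state, acquiring a first preset keyword set;
if the display state of the display screen is the unlocked state, determining the foreground application currently running; and
acquiring a second preset keyword set according to the foreground application and a preset correspondence, the preset correspondence including a correspondence between applications and preset keyword sets.
In some embodiments, when acquiring the second preset keyword set according to the foreground application and the preset correspondence, the processor 301 performs the following steps:
determining the application interface currently displayed by the foreground application; and
acquiring the second preset keyword set according to the foreground application, the application interface, and a preset correspondence, the preset correspondence including a correspondence among applications, application interfaces, and preset keyword sets.
In some embodiments, when acquiring the second preset keyword set according to the foreground application and the preset correspondence, the processor 301 performs the following steps:
acquiring geographic location information of the current location of the electronic device; and
acquiring the second preset keyword set according to the foreground application, the geographic location information, and a preset correspondence, the preset correspondence including a correspondence among applications, geographic location information, and preset keyword sets.
In some embodiments, the first keyword includes a first sub-keyword and a second sub-keyword, and when determining whether the preset keyword set includes a second keyword identical to the first keyword, the processor 301 performs the following step:
determining whether the preset keyword set includes a third sub-keyword identical to the first sub-keyword and a fourth sub-keyword corresponding to the second sub-keyword;
and when executing the operation instruction corresponding to the first keyword if the preset keyword set includes a second keyword identical to the first keyword, the processor 301 performs the following step:
if the preset keyword set includes a third sub-keyword identical to the first sub-keyword and a fourth sub-keyword corresponding to the second sub-keyword, executing the operation instruction corresponding to the first keyword.
The memory 302 may be used to store computer programs and data. The computer programs stored in the memory 302 contain instructions executable by the processor and may form various functional modules. The processor 301 performs various functional applications and data processing by calling the computer programs stored in the memory 302.
In some embodiments, as shown in FIG. 8, the electronic device 300 further includes a microphone 303, an audio circuit 304, and a power supply 305, each of which is electrically connected to the processor 301.
The microphone 303 is used to collect the user's voice information. In the embodiments of this application, the microphone 303 is used to collect the user's voice information multiple times.
The audio circuit 304 may provide an audio interface between the user and the electronic device through a microphone, a speaker, and the like.
The power supply 305 supplies power to the components of the electronic device 300. In some embodiments, the power supply 305 may be logically connected to the processor 301 through a power management system, which then manages charging, discharging, and power consumption.
Although not shown in FIG. 9, the electronic device 300 may further include a display screen, a camera, a radio frequency circuit, a Bluetooth module, and the like, which are not described in detail here.
As can be seen from the above, an embodiment of this application provides an electronic device that performs the following steps: acquiring voice information of a user; acquiring a preset keyword set according to the display state of the display screen of the electronic device, the preset keyword set including at least one second keyword; determining whether the preset keyword set includes a second keyword identical to the first keyword; and if so, executing the operation instruction corresponding to the first keyword. The electronic device acquires the preset keyword set according to the display state of the display screen, so it can acquire the preset keyword set corresponding to whichever display state it is in. The electronic device then determines internally whether that preset keyword set includes a second keyword identical to the first keyword. Each preset keyword set corresponds to a particular display state. Whenever the first keyword is identical to a second keyword in that set, the electronic device processes the voice in the corresponding display state. The voice processing method therefore improves the wake-up rate of the electronic device.
An embodiment of this application further provides a storage medium storing a computer program which, when run on a computer, causes the computer to perform the voice processing method described in any of the above embodiments.
For example, in some embodiments, when the computer program runs on a computer, the computer performs the following steps:
acquiring voice information of a user, the voice information including a first keyword;
acquiring a preset keyword set according to a display state of a display screen of an electronic device, the display state including a locked state and an unlocked state, and the preset keyword set including at least one second keyword;
determining whether the preset keyword set includes a second keyword identical to the first keyword; and
if the preset keyword set includes a second keyword identical to the first keyword, executing an operation instruction corresponding to the first keyword.
It should be noted that a person of ordinary skill in the art can understand that all or some of the steps of the methods in the above embodiments can be completed by a computer program instructing related hardware. The computer program may be stored in a computer-readable storage medium, which may include but is not limited to a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, and the like.
The voice processing method and apparatus, storage medium, and electronic device provided by the embodiments of this application are described in detail above. Specific examples are used herein to explain the principles and implementations of this application. The descriptions of the above embodiments are only intended to help readers understand the method of this application and its core idea. A person skilled in the art may also change the specific implementations and the application scope based on the idea of this application. In conclusion, the content of this specification should not be construed as a limitation on this application.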

Claims (20)

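  1. A voice processing method, comprising:
    acquiring voice information of a user, the voice information comprising a first keyword;
    acquiring a preset keyword set according to a display state of a display screen of an electronic device, the display state comprising a locked state and an unlocked state, and the preset keyword set comprising at least one second keyword;
    determining whether the preset keyword set comprises a second keyword identical to the first keyword; and
    if the preset keyword set comprises a second keyword identical to the first keyword, executing an operation instruction corresponding to the first keyword.
  2. The voice processing method according to claim 1, wherein the step of acquiring a preset keyword set according to the display state of the display screen of the electronic device comprises:
    if the display state of the display screen is the locked state, acquiring a first preset keyword set;
    if the display state of the display screen is the unlocked state, determining a foreground application currently running; and
    acquiring a second preset keyword set according to the foreground application and a preset correspondence, the preset correspondence comprising a correspondence between applications and preset keyword sets.
  3. The voice processing method according to claim 2, wherein the step of acquiring the second preset keyword set according to the foreground application and the preset correspondence comprises:
    determining an application interface currently displayed by the foreground application; and
    acquiring the second preset keyword set according to the foreground application, the application interface, and a preset correspondence, the preset correspondence comprising a correspondence among applications, application interfaces, and preset keyword sets.
  4. The voice processing method according to claim 2, wherein the step of acquiring the second preset keyword set according to the foreground application and the preset correspondence comprises:
    acquiring geographic location information of the current location of the electronic device; and
    acquiring the second preset keyword set according to the foreground application, the geographic location information, and a preset correspondence, the preset correspondence comprising a correspondence among applications, geographic location information, and preset keyword sets.
  5. The voice processing method according to claim 1, wherein the first keyword comprises a first sub-keyword and a second sub-keyword;
    the step of determining whether the preset keyword set comprises a second keyword identical to the first keyword comprises:
    determining whether the preset keyword set comprises a third sub-keyword identical to the first sub-keyword and a fourth sub-keyword corresponding to the second sub-keyword; and
    the step of executing an operation instruction corresponding to the first keyword if the preset keyword set comprises a second keyword identical to the first keyword comprises:
    if the preset keyword set comprises a third sub-keyword identical to the first sub-keyword and a fourth sub-keyword corresponding to the second sub-keyword, executing the operation instruction corresponding to the first keyword.
  6. The voice processing method according to claim 1, further comprising, before the step of acquiring voice information of a user:
    acquiring training voice information of the user; and
    training on the training voice information to obtain a preset speech recognition model.
  7. The voice processing method according to claim 6, further comprising, before the step of acquiring a preset keyword set according to the display state of the display screen of the electronic device:
    extracting a voiceprint feature of the user from the voice information;
    matching the voiceprint feature against the preset speech recognition model; and
    when the voiceprint feature successfully matches the preset speech recognition model, acquiring the preset keyword set according to the display state of the display screen of the electronic device.
  8. A voice processing apparatus, comprising:
    a first acquisition module, configured to acquire voice information of a user, the voice information comprising a first keyword;
    a second acquisition module, configured to acquire a preset keyword set according to a display state of a display screen of an electronic device, the display state comprising a locked state and an unlocked state, and the preset keyword set comprising at least one second keyword;
    a judgment module, configured to determine whether the preset keyword set comprises a second keyword identical to the first keyword; and
    an execution module, configured to execute an operation instruction corresponding to the first keyword if the preset keyword set comprises a second keyword identical to the first keyword.
  9. The voice processing apparatus according to claim 8, wherein the second acquisition module is configured to:
    if the display state of the display screen is the locked state, acquire a first preset keyword set;
    if the display state of the display screen is the unlocked state, determine a foreground application currently running; and
    acquire a second preset keyword set according to the foreground application and a preset correspondence, the preset correspondence comprising a correspondence between applications and preset keyword sets.
  10. The voice processing apparatus according to claim 9, wherein, when acquiring the second preset keyword set according to the foreground application and the preset correspondence, the second acquisition module is configured to:
    determine an application interface currently displayed by the foreground application; and
    acquire the second preset keyword set according to the foreground application, the application interface, and a preset correspondence, the preset correspondence comprising a correspondence among applications, application interfaces, and preset keyword sets.
  11. The voice processing apparatus according to claim 9, wherein, when acquiring the second preset keyword set according to the foreground application and the preset correspondence, the second acquisition module is configured to:
    acquire geographic location information of the current location of the electronic device; and
    acquire the second preset keyword set according to the foreground application, the geographic location information, and a preset correspondence, the preset correspondence comprising a correspondence among applications, geographic location information, and preset keyword sets.
  12. The voice processing apparatus according to claim 8, wherein the first keyword comprises a first sub-keyword and a second sub-keyword;
    the judgment module is configured to determine whether the preset keyword set comprises a third sub-keyword identical to the first sub-keyword and a fourth sub-keyword corresponding to the second sub-keyword; and
    the execution module is configured to execute the operation instruction corresponding to the first keyword if the preset keyword set comprises a third sub-keyword identical to the first sub-keyword and a fourth sub-keyword corresponding to the second sub-keyword.
  13. A storage medium storing a computer program which, when run on a computer, causes the computer to perform the following steps:
    acquiring voice information of a user, the voice information comprising a first keyword;
    acquiring a preset keyword set according to a display state of a display screen of an electronic device, the display state comprising a locked state and an unlocked state, and the preset keyword set comprising at least one second keyword;
    determining whether the preset keyword set comprises a second keyword identical to the first keyword; and
    if the preset keyword set comprises a second keyword identical to the first keyword, executing an operation instruction corresponding to the first keyword.
  14. An electronic device, comprising a processor and a memory, the memory storing a computer program, and the processor being configured to perform the following steps by calling the computer program stored in the memory:
    acquiring voice information of a user, the voice information comprising a first keyword;
    acquiring a preset keyword set according to a display state of a display screen of the electronic device, the display state comprising a locked state and an unlocked state, and the preset keyword set comprising at least one second keyword;
    determining whether the preset keyword set comprises a second keyword identical to the first keyword; and
    if the preset keyword set comprises a second keyword identical to the first keyword, executing an operation instruction corresponding to the first keyword.
  15. The electronic device according to claim 14, wherein, when acquiring the preset keyword set according to the display state of the display screen of the electronic device, the processor is configured to perform the following steps:
    if the display state of the display screen is the locked state, acquiring a first preset keyword set;
    if the display state of the display screen is the unlocked state, determining a foreground application currently running; and
    acquiring a second preset keyword set according to the foreground application and a preset correspondence, the preset correspondence comprising a correspondence between applications and preset keyword sets.
  16. The electronic device according to claim 15, wherein, when acquiring the second preset keyword set according to the foreground application and the preset correspondence, the processor is configured to perform the following steps:
    determining an application interface currently displayed by the foreground application; and
    acquiring the second preset keyword set according to the foreground application, the application interface, and a preset correspondence, the preset correspondence comprising a correspondence among applications, application interfaces, and preset keyword sets.
  17. The electronic device according to claim 15, wherein, when acquiring the second preset keyword set according to the foreground application and the preset correspondence, the processor is configured to perform the following steps:
    acquiring geographic location information of the current location of the electronic device; and
    acquiring the second preset keyword set according to the foreground application, the geographic location information, and a preset correspondence, the preset correspondence comprising a correspondence among applications, geographic location information, and preset keyword sets.
  18. The electronic device according to claim 14, wherein the first keyword comprises a first sub-keyword and a second sub-keyword;
    when determining whether the preset keyword set comprises a second keyword identical to the first keyword, the processor is configured to perform the following step:
    determining whether the preset keyword set comprises a third sub-keyword identical to the first sub-keyword and a fourth sub-keyword corresponding to the second sub-keyword; and
    when executing the operation instruction corresponding to the first keyword if the preset keyword set comprises a second keyword identical to the first keyword, the processor is configured to perform the following step:
    if the preset keyword set comprises a third sub-keyword identical to the first sub-keyword and a fourth sub-keyword corresponding to the second sub-keyword, executing the operation instruction corresponding to the first keyword.
  19. The electronic device according to claim 14, wherein, before acquiring the voice information of the user, the processor is further configured to perform the following steps:
    acquiring training voice information of the user; and
    training on the training voice information to obtain a preset speech recognition model.
  20. The electronic device according to claim 19, wherein, before acquiring the preset keyword set according to the display state of the display screen of the electronic device, the processor is further configured to perform the following steps:
    extracting a voiceprint feature of the user from the voice information;
    matching the voiceprint feature against the preset speech recognition model; and
    when the voiceprint feature successfully matches the preset speech recognition model, acquiring the preset keyword set according to the display state of the display screen of the electronic device.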
PCT/CN2019/090417 2018-08-08 2019-06-06 Voice processing method and apparatus, storage medium, and electronic device Ceased WO2020029673A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP19846665.8A EP3826008A4 (en) 2018-08-08 2019-06-06 VOICE PROCESSING PROCESS AND APPARATUS, STORAGE MEDIA AND ELECTRONIC DEVICE
US17/144,667 US20210125616A1 (en) 2018-08-08 2021-01-08 Voice Processing Method, Non-Transitory Computer Readable Medium, and Electronic Device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810898885.X 2018-08-08
CN201810898885.XA CN110827824B (zh) 2018-08-08 2018-08-08 Voice processing method and apparatus, storage medium, and electronic device

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/144,667 Continuation US20210125616A1 (en) 2018-08-08 2021-01-08 Voice Processing Method, Non-Transitory Computer Readable Medium, and Electronic Device

Publications (1)

Publication Number Publication Date
WO2020029673A1 true WO2020029673A1 (zh) 2020-02-13

Family

ID=69413895

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/090417 Ceased WO2020029673A1 (zh) 2018-08-08 2019-06-06 Voice processing method and apparatus, storage medium, and electronic device

Country Status (4)

Country Link
US (1) US20210125616A1 (zh)
EP (1) EP3826008A4 (zh)
CN (1) CN110827824B (zh)
WO (1) WO2020029673A1 (zh)

Also Published As

Publication number Publication date
EP3826008A1 (en) 2021-05-26
CN110827824B (zh) 2022-05-17
EP3826008A4 (en) 2021-09-08
CN110827824A (zh) 2020-02-21
US20210125616A1 (en) 2021-04-29

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19846665

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2019846665

Country of ref document: EP

Effective date: 20210216