WO2021063343A1 - Voice interaction method and device - Google Patents
Voice interaction method and device
- Publication number
- WO2021063343A1 (PCT/CN2020/118748; CN2020118748W)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- voice
- skill
- voice skill
- user
- assistant
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/16—Sound input; Sound output
- G06F3/167—Audio in a user interface, e.g. using voice commands for navigating, audio feedback
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/16—Sound input; Sound output
- G06F3/165—Management of the audio stream, e.g. setting of volume, audio stream path
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/28—Constructional details of speech recognition systems
- G10L15/30—Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M1/00—Substation equipment, e.g. for use by subscribers
- H04M1/72—Mobile telephones; Cordless telephones, i.e. devices for establishing wireless links to base stations without route selection
- H04M1/724—User interfaces specially adapted for cordless or mobile telephones
- H04M1/72448—User interfaces specially adapted for cordless or mobile telephones with means for adapting the functionality of the device according to specific conditions
- H04M1/72454—User interfaces specially adapted for cordless or mobile telephones with means for adapting the functionality of the device according to specific conditions according to context-related or environment-related conditions
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M1/00—Substation equipment, e.g. for use by subscribers
- H04M1/72—Mobile telephones; Cordless telephones, i.e. devices for establishing wireless links to base stations without route selection
- H04M1/724—User interfaces specially adapted for cordless or mobile telephones
- H04M1/72469—User interfaces specially adapted for cordless or mobile telephones for operating the device by selecting functions from two or more displayed items, e.g. menus or icons
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/221—Announcement of recognition results
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/223—Execution procedure of a spoken command
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/225—Feedback of the input speech
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/226—Procedures used during a speech recognition process, e.g. man-machine dialogue using non-speech characteristics
- G10L2015/228—Procedures used during a speech recognition process, e.g. man-machine dialogue using non-speech characteristics of application context
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M2250/00—Details of telephonic subscriber devices
- H04M2250/74—Details of telephonic subscriber devices with voice recognition means
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D30/00—Reducing energy consumption in communication networks
- Y02D30/70—Reducing energy consumption in communication networks in wireless communication networks
Definitions
- This application relates to the field of terminal technology, and in particular to a voice interaction method and device.
- A voice assistant application (APP) realizes the voice interaction between the electronic device and the user.
- a developer connects the voice assistant and various services in the electronic device through a dialogue development platform, so that the voice assistant can call various services in the electronic device through the dialogue development platform and voice skill commands.
- The dialogue development platform is a voice skill development tool provided for developers, such as the Google dialogue development platform (Dialogflow).
- Voice skills are services that the voice assistant implements by calling various services in the electronic device. Each voice skill requires a specific voice skill instruction to call it.
- This application provides a voice interaction method and device.
- The voice skill instructions commonly used in the first application scenario are displayed on the display interface. That is, the voice skill instructions that the user may input in the first application scenario are recommended to the user, which realizes scene-based recommendation of voice skill instructions and covers as many usage scenarios as possible.
- the present application provides a voice interaction method applied to an electronic device, and the electronic device is equipped with a voice assistant.
- The method includes: after the voice assistant is awakened, determining a first application scenario based on one or more information items.
- the information items include the current display interface of the electronic device, the current time, the current location of the electronic device, the current motion state of the electronic device, the current event, or the application running on the current electronic device.
- the voice assistant determines the frequently used voice skill instructions in the first application scenario according to the first application scenario and the historical voice skill usage record.
- the voice skill instruction is used to call the voice skill
- the voice skill is a service provided by the voice assistant
- The historical voice skill usage record includes one or more records, each of which indicates the time at which a voice skill was called, the voice skill instruction used to call it, and the application scenario in which it was called.
- Finally, the voice assistant displays the frequently used voice skill instructions in the first application scenario on the display interface.
- The voice assistant determines one or more information items of the first application scenario according to the application recognition mechanism of the electronic device. It then determines the commonly used voice skill instructions in the first application scenario based on those information items and on the application scenarios in which voice skills were invoked, as stored in the historical voice skill usage record. The more information items the first application scenario contains, the more factors the voice assistant considers, and the more practical and accurate the commonly used voice skill instructions it determines. In addition, determining the displayed instructions from the first application scenario and the voice skill usage record allows the recommended voice skill instructions to be adjusted dynamically, covering as many usage scenarios as possible.
- After determining the frequently used voice skill instructions, the voice assistant displays them on the display interface. The voice assistant thereby realizes scene-based recommendation of voice skill instructions, so that the user can call voice skills according to the instructions shown on the display interface. This reduces the cases in which the voice assistant cannot recognize the voice skill instruction input by the user, or cannot successfully call a voice skill according to that instruction, and improves the interactive experience between the user and the voice assistant.
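The selection step described above can be sketched as follows. This is a minimal illustration, not the method claimed in the application: the tuple layout of a history record, the string labels for information items, the `recommend` function, and the choice to score each record by the number of information items it shares with the current scenario are all assumptions made for the example.

```python
from collections import Counter
from typing import FrozenSet, Iterable, List, Tuple

# Hypothetical record layout: (call time, instruction text, information items of the scenario).
Record = Tuple[float, str, FrozenSet[str]]


def recommend(current_items: FrozenSet[str], history: Iterable[Record], limit: int = 3) -> List[str]:
    """Return up to `limit` instructions used most often in scenarios similar to the current one."""
    scores: Counter = Counter()
    for _time, instruction, items in history:
        # Assumed scoring rule: one point per information item shared with the current scenario.
        shared = len(current_items & items)
        if shared:
            scores[instruction] += shared
    return [instruction for instruction, _score in scores.most_common(limit)]


history = [
    (1.0, "play music", frozenset({"app:music", "time:evening"})),
    (2.0, "play music", frozenset({"app:music"})),
    (3.0, "navigate home", frozenset({"time:evening"})),
    (4.0, "set alarm", frozenset({"app:clock"})),
]
print(recommend(frozenset({"app:music", "time:evening"}), history))
```

With more information items in the current scenario, more records can contribute to the score, which mirrors the statement that additional items make the recommendation more accurate.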
- The method further includes: if the current user is a new user, the network is normally connected, and the voice assistant can normally obtain the high-frequency voice skill instructions in the current network, the voice assistant displays the high-frequency voice skill instructions on the display interface. If the current user is an old user, the voice assistant determines the common voice skill instructions of the current user based on the historical voice skill usage records and displays them on the display interface. If the network is not normally connected, the voice assistant informs the user that the network is abnormal and displays a voice skill instruction for opening the network system settings on the display interface. If the voice assistant cannot normally obtain the high-frequency voice skill instructions in the current network, the voice assistant displays preset voice skill instructions on the display interface.
- After the voice assistant is awakened for the first time, it recommends voice skill instructions that the user may use, based on the current user type, the current network connection status, and whether the voice assistant can normally obtain the high-frequency voice skill instructions in the current network. This realizes scene-based recommendation of voice skill instructions, reduces the cases in which the user inputs a wrong voice skill instruction or the voice assistant cannot call a voice skill according to the instruction input by the user, and improves the interactive experience between the user and the voice assistant.
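The first-wake branches above can be sketched as a single decision function. The text does not state the order in which the conditions are checked, so the order used here (old user first, then network status, then availability of high-frequency instructions) is an assumption, as are the function name, its parameters, and the message strings.

```python
from typing import Callable, List, Optional, Tuple


def first_wake_instructions(
    is_new_user: bool,
    network_connected: bool,
    fetch_high_frequency: Callable[[], Optional[List[str]]],
    user_history_instructions: List[str],
    preset_instructions: List[str],
) -> Tuple[Optional[str], List[str]]:
    """Return (notice for the user or None, instructions to display)."""
    if not is_new_user:
        # Old user: recommend from the locally stored usage history.
        return None, user_history_instructions
    if not network_connected:
        # Tell the user about the network problem and offer a way to fix it.
        return "Network is abnormal", ["Open network settings"]
    high_frequency = fetch_high_frequency()
    if not high_frequency:
        # High-frequency instructions could not be obtained: fall back to presets.
        return None, preset_instructions
    return None, high_frequency


print(first_wake_instructions(True, True, lambda: ["What's the weather"], [], ["Set an alarm"]))
```

Passing the fetch step as a callable keeps the sketch free of any real network API; a caller would supply whatever retrieval mechanism the device actually uses.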
- The method further includes: the voice assistant determines, according to the historical voice skill usage record and the first application scenario, the frequency of occurrence of the commonly used voice skills in the first application scenario, where the commonly used voice skills correspond to the commonly used voice skill instructions.
- the voice assistant determines the priority of the commonly used voice skills according to the frequency of the commonly used voice skill instructions in the first application scenario.
- the voice assistant determines the position of the commonly used voice skill instruction on the display interface according to the priority of the commonly used voice skill.
- The voice assistant can determine the priority of the frequently used voice skill instructions in the first application scenario according to the application scenarios in which voice skills were called in the historical voice skill usage record, the number of times each voice skill was called, and the first application scenario. In this way, the voice assistant can determine the display position and display order of the commonly used voice skill instructions on the display interface according to their priority, and instructions with higher priority are displayed first.
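The frequency-to-priority-to-position chain can be sketched as below. This is an illustrative assumption of how the ranking might work: the `display_order` name, the zero-based position index, and the alphabetical tie-break are not taken from the application.

```python
from collections import Counter
from typing import Dict, Iterable


def display_order(instructions_used_in_scenario: Iterable[str]) -> Dict[str, int]:
    """Map each instruction to a display position (0 = shown first) by call frequency."""
    counts = Counter(instructions_used_in_scenario)
    # Higher frequency means higher priority; ties are broken alphabetically (assumption).
    ranked = sorted(counts, key=lambda instruction: (-counts[instruction], instruction))
    return {instruction: position for position, instruction in enumerate(ranked)}


calls = ["call mom", "play music", "play music", "set alarm", "call mom", "play music"]
print(display_order(calls))
```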
- the voice interaction method further includes: in response to the voice skill instruction input by the user, the voice assistant calls a voice skill corresponding to the voice skill instruction input by the user.
- The voice interaction method further includes: in the case that the voice assistant fails to call the voice skill, re-determining the commonly used voice skill instructions in the first application scenario according to the time at which voice skills were called in the historical voice skill usage record, and displaying them on the display interface.
- the voice interaction method further includes: if the voice assistant does not receive the voice skill instruction input by the user within the first preset time period, the voice assistant determines the second application scenario. The voice assistant determines the frequently used voice skill instructions in the second application scenario according to the second application scenario and the historical voice skill usage record. Subsequently, the voice assistant displays the frequently used voice skill instructions in the second application scenario on the display interface.
- the voice interaction method further includes: if the voice assistant does not receive a voice skill instruction input by the user within the second preset time period, the voice assistant is turned off.
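The two timeouts above can be read as a small idle-state policy. The sketch below assumes that both preset time periods are measured from the same moment (the last user input or the wake-up) and that the second period is longer than the first; the text does not state either point, and the function and action names are hypothetical.

```python
def idle_action(seconds_idle: float, first_timeout: float, second_timeout: float) -> str:
    """Decide what the assistant does after `seconds_idle` seconds without user input."""
    if first_timeout >= second_timeout:
        # Assumption of this sketch: the second preset period is the longer one.
        raise ValueError("second_timeout must be greater than first_timeout")
    if seconds_idle >= second_timeout:
        return "close_assistant"
    if seconds_idle >= first_timeout:
        # Re-evaluate the scenario and show instructions for the second application scenario.
        return "recommend_for_second_scenario"
    return "keep_waiting"


print(idle_action(6.0, first_timeout=5.0, second_timeout=15.0))
```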
- the voice interaction method further includes: if the network is not normally connected, the voice assistant notifies the user that the network is abnormal, and displays a voice skill instruction for opening the network system settings on the display interface.
- The commonly used voice skill instructions are used to call one or more of the following: voice skills corresponding to controls that can be clicked in the display interface; voice skills corresponding to recognizable text or pictures in the display interface; voice skills corresponding to recognizable scene intent in the display interface; voice skills corresponding to regular behaviors based on the current time; voice skills corresponding to regular behaviors based on the current location of the electronic device; voice skills corresponding to regular behaviors based on the current motion state of the electronic device; voice skills corresponding to native applications; voice skills corresponding to third-party applications; voice skills corresponding to preset events; voice skills corresponding to operations involving multiple applications; voice skills corresponding to operations with relatively long paths; and/or voice skills corresponding to functions in the application currently running on the electronic device.
- the voice assistant is displayed in a half-screen mode, where the ratio of the voice assistant's application interface to the overall display interface of the electronic device is greater than 0 and less than 1.
- The method includes: the voice assistant displays the feedback result of the voice skill instruction on the user interface of the voice assistant.
- the voice assistant responds to the user's operation instructions and shares the feedback results of the voice skill instructions to other applications.
- the voice assistant displays the feedback result of the voice skill instruction on the user interface of the voice assistant, including: the voice assistant displays the feedback result on the user interface of the voice assistant in the form of a card.
- the voice assistant responds to the user's operation instruction and shares the feedback result of the voice skill instruction to other applications, including: the voice assistant selects the card in response to the user's pressing operation on the card.
- the voice assistant drags the card from the user interface of the voice assistant to the user interface of other applications in response to the user's drag operation on the card.
- an embodiment of the present application provides an electronic device with a voice assistant installed on the electronic device.
- the electronic device includes a processor, a memory, and a display, and the memory, the display, and the processor are coupled.
- the display is used to display the image generated by the processor, and the memory is used to store computer program code.
- the computer program code includes computer instructions.
- When the processor executes the above computer instructions, the processor is configured to determine the first application scenario according to one or more information items after the voice assistant is awakened.
- the information items include the current display interface of the electronic device, the current time, the current location of the electronic device, the current motion state of the electronic device, the current event, or the application running on the current electronic device.
- the processor is used to determine the commonly used voice skill instructions in the first application scenario according to the first application scenario and the historical voice skill usage record.
- the voice skill instruction is used to call the voice skill, and the voice skill is a service provided by the voice assistant.
- The historical voice skill usage record includes one or more records, each of which indicates the time at which a voice skill was called, the voice skill instruction used to call it, and the application scenario in which it was called.
- the processor is also used to display frequently used voice skill instructions in the first application scenario on the display interface.
- When the voice assistant is awakened for the first time, the processor is also used to: if the current user is a new user, the network is normally connected, and the voice assistant can normally obtain the high-frequency voice skill instructions in the current network, have the voice assistant display the high-frequency voice skill instructions on the display interface. If the current user is an old user, the voice assistant determines the current user's common voice skill instructions based on the historical voice skill usage records and displays them on the display interface. If the network is not normally connected, the voice assistant informs the user that the network is abnormal and displays a voice skill instruction for opening the network system settings on the display interface. If the voice assistant cannot normally obtain the high-frequency voice skill instructions in the current network, the voice assistant displays preset voice skill instructions on the display interface.
- the processor is configured to determine the commonly used voice skill instructions in the first application scenario according to the first application scenario and the historical voice skill usage record.
- the processor is also used to determine the frequency of occurrence of commonly used voice skills in the first application scenario according to the historical voice skill usage records and the first application scenario, and the commonly used voice skills correspond to the commonly used voice skill instructions.
- the processor is also used to determine the priority of the commonly used voice skills according to the frequency of occurrence of the commonly used voice skill instructions in the first application scenario.
- the processor is also used to determine the position of the commonly used voice skill instruction on the display interface according to the priority of the commonly used voice skill.
- the processor is further configured to respond to the voice skill instruction input by the user, invoking a voice skill corresponding to the voice skill instruction input by the user.
- The processor is also used to, in the case that the voice skill fails to be called, re-determine the commonly used voice skill instructions in the first application scenario according to the time at which voice skills were called in the historical voice skill usage record, and display them on the display interface.
- the processor is further configured to, if the voice assistant does not receive the voice skill instruction input by the user within the first preset time period, the voice assistant determines the second application scenario.
- the voice assistant determines the frequently used voice skill instructions in the second application scenario according to the second application scenario and the historical voice skill usage record.
- the voice assistant displays the frequently used voice skill instructions in the second application scenario on the display interface.
- the processor is further configured to shut down the voice assistant if the voice assistant does not receive the voice skill instruction input by the user within the second preset time period.
- the processor is further configured to, if the network is not normally connected, the voice assistant informs the user that the network is abnormal, and displays a voice skill instruction for opening the network system settings on the display interface.
- The commonly used voice skill instructions are used to call one or more of the following: voice skills corresponding to controls that can be clicked in the display interface; voice skills corresponding to recognizable text or pictures in the display interface; voice skills corresponding to recognizable scene intent in the display interface; voice skills corresponding to regular behaviors based on the current time; voice skills corresponding to regular behaviors based on the current location of the electronic device; voice skills corresponding to regular behaviors based on the current motion state of the electronic device; voice skills corresponding to native applications; voice skills corresponding to third-party applications; and/or voice skills corresponding to preset events.
- the voice assistant is displayed in a half-screen state, and the half-screen state is that the ratio of the voice assistant's application interface to the overall display interface of the electronic device is greater than 0 and less than 1.
- The processor is also used to display the feedback result of the voice skill instruction on the user interface of the voice assistant.
- the voice assistant responds to the user's operation instruction and shares the feedback result of the voice skill instruction to other applications.
- To display the feedback result of the voice skill instruction on the user interface of the voice assistant, the processor is used for the voice assistant to display the feedback result on the user interface of the voice assistant in the form of a card.
- To share the feedback result of the voice skill instruction to other applications in response to the user's operation instruction, the processor is used for the voice assistant to select the card in response to the user's pressing operation on the card, and to drag the card from the user interface of the voice assistant to the user interface of other applications in response to the user's drag operation on the card.
- an embodiment of the present application provides a computer storage medium, which includes computer instructions.
- When the computer instructions run on an electronic device, the electronic device executes the voice interaction method of the first aspect and any of its possible implementations.
- An embodiment of the present application provides a computer program product. When the computer program product runs on a computer, the computer is caused to execute the voice interaction method of the first aspect and any one of its possible implementations.
- a chip system including a processor, and when the processor executes an instruction, the processor executes the voice interaction method described in the first aspect and any one of its possible implementation manners.
- FIG. 1 is a first structural diagram of an electronic device provided by an embodiment of the application
- FIG. 2 is a block diagram of the software structure of an electronic device provided by an embodiment of the application.
- FIG. 3 is a flowchart of a voice interaction method provided by an embodiment of this application.
- FIG. 4 is a first schematic diagram of displaying a common voice skill instruction provided by an embodiment of the application.
- FIG. 5 is a second schematic diagram of displaying a common voice skill instruction provided by an embodiment of the application.
- FIG. 6 is a third schematic diagram of displaying a commonly used voice skill instruction provided by an embodiment of this application.
- FIG. 7 is a fourth schematic diagram of displaying a commonly used voice skill instruction provided by an embodiment of the application.
- FIG. 8 is a fifth schematic diagram of displaying a commonly used voice skill instruction provided by an embodiment of the application.
- FIG. 9 is a sixth schematic diagram of displaying a common voice skill instruction provided by an embodiment of the application.
- FIG. 10 is a seventh schematic diagram of displaying a commonly used voice skill instruction provided by an embodiment of the application.
- FIG. 11 is an eighth schematic diagram of displaying a commonly used voice skill instruction provided by an embodiment of the application.
- FIG. 12 is a second structural diagram of an electronic device provided by an embodiment of this application.
- the embodiments of the present application provide a voice interaction method and device, which can be applied to an electronic device, and the voice interaction between the electronic device and the user is realized through a voice assistant on the electronic device.
- The voice assistant determines first application scenario information that includes one or more information items (such as the current time, the current location of the electronic device, and the current display interface of the electronic device) according to the application recognition mechanism of the electronic device on which it runs.
- The voice assistant determines the priority order of the frequently used voice skill instructions according to the first application scenario information and the historical voice skill usage records, and displays the frequently used voice skill instructions for the first application scenario on the display interface in that order. The voice skill instructions are thereby recommended to the user, realizing scene-based recommendation of voice skill instructions.
- The voice assistant in the electronic device records each voice skill instruction issued by the user together with the scene information at that time, as a data source for judging the user's possible intention in using the voice assistant. This covers as many usage scenarios of voice skill instructions as possible and improves the user experience.
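The per-instruction recording described above can be sketched as an append-only log. The class names, field names, and the string encoding of information items are assumptions chosen for illustration; the application only states that each record carries the call time, the instruction, and the scene information.

```python
import time
from dataclasses import dataclass, field
from typing import FrozenSet, Iterable, List, Optional


@dataclass(frozen=True)
class UsageRecord:
    timestamp: float          # when the voice skill was called
    instruction: str          # the voice skill instruction the user issued
    scenario: FrozenSet[str]  # information items describing the scene at that time


@dataclass
class UsageLog:
    records: List[UsageRecord] = field(default_factory=list)

    def record(self, instruction: str, scenario_items: Iterable[str],
               timestamp: Optional[float] = None) -> UsageRecord:
        """Append one record; the current time is used when no timestamp is given."""
        entry = UsageRecord(
            timestamp=time.time() if timestamp is None else timestamp,
            instruction=instruction,
            scenario=frozenset(scenario_items),
        )
        self.records.append(entry)
        return entry

    def in_scenario(self, item: str) -> List[UsageRecord]:
        """Return the records whose scene information contains the given item."""
        return [r for r in self.records if item in r.scenario]


log = UsageLog()
log.record("play music", ["app:music", "time:evening"], timestamp=1.0)
log.record("set alarm", ["app:clock"], timestamp=2.0)
print([r.instruction for r in log.in_scenario("app:music")])
```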
- the voice assistant may be an application program installed in an electronic device, and the application program may be an embedded application program in the electronic device (ie, a system application of the electronic device), or a downloadable application program.
- the embedded application is an application provided as a part of the implementation of an electronic device (such as a mobile phone).
- The embedded application program may be a "settings" application, a "short message" application, a "camera" application, and so on.
- a downloadable application is an application that can provide its own Internet protocol multimedia subsystem (IMS) connection.
- The downloadable application can be pre-installed in the electronic device, or can be a third-party application that the user downloads and installs in the electronic device. For example, the downloadable application program may be a "WeChat" application, an "Alipay" application, or a "mail" application.
- The electronic devices in the embodiments of this application may be portable computers (such as mobile phones), notebook computers, personal computers (PCs), wearable electronic devices (such as smart watches), tablets, smart home devices, augmented reality (AR)/virtual reality (VR) devices, artificial intelligence (AI) terminals (such as smart robots), on-board computers, etc.
- the following embodiments do not specifically limit the specific form of the device.
- FIG. 1 shows a schematic structural diagram of an electronic device 100 provided in this embodiment.
- the electronic device 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone jack 170D, a sensor module 180, a button 190, a motor 191, an indicator 192, a camera 193, a display screen 194, a subscriber identification module (SIM) card interface 195, etc.
- the sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, etc.
- the structure illustrated in this embodiment does not constitute a specific limitation on the electronic device 100.
- the electronic device 100 may include more or fewer components than shown, or combine certain components, or split certain components, or arrange different components.
- the illustrated components can be implemented in hardware, software, or a combination of software and hardware.
- the processor 110 may include one or more processing units.
- the processor 110 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a memory, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), etc.
- the different processing units may be independent devices or integrated in one or more processors.
- the DSP can monitor the voice data in real time.
- the voice data can be handed over to the AP.
- the AP performs text verification and voiceprint verification on the voice data.
- the electronic device can call the voice skill instruction to execute the corresponding voice skill.
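- the hand-off described above (DSP monitoring, then text verification and voiceprint verification on the AP, then invocation of the voice skill) can be sketched roughly as follows. This is only an illustrative sketch: the names `VoiceFrame`, `dsp_detects_voice`, `ap_verify`, and `handle_frame` are hypothetical, the voiceprint is reduced to a plain speaker label, and the behavior on failed verification (the frame is ignored) is an assumption that the text does not state.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class VoiceFrame:
    """Hypothetical container for one chunk of captured voice data."""
    text: str        # recognized text, assumed to be produced upstream
    speaker_id: str  # speaker label standing in for a real voiceprint


def dsp_detects_voice(frame: VoiceFrame) -> bool:
    """DSP stage: decide whether there is voice data worth handing to the AP."""
    return bool(frame.text.strip())


def ap_verify(frame: VoiceFrame, enrolled_speaker: str, known_instructions: set) -> bool:
    """AP stage: text verification plus voiceprint verification."""
    text_ok = frame.text in known_instructions
    voiceprint_ok = frame.speaker_id == enrolled_speaker
    return text_ok and voiceprint_ok


def handle_frame(
    frame: VoiceFrame,
    enrolled_speaker: str,
    skills: Dict[str, Callable[[], str]],
) -> Optional[str]:
    """Run the DSP -> AP -> voice skill chain; return None if any stage rejects."""
    if not dsp_detects_voice(frame):
        return None  # nothing is handed over to the AP
    if not ap_verify(frame, enrolled_speaker, set(skills)):
        return None  # assumption: a frame that fails verification is ignored
    return skills[frame.text]()  # call the voice skill for this instruction
```

- in this sketch the inexpensive check runs first and the two verifications run only on frames that pass it, which mirrors the division of work between the DSP and the AP.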
- the controller may be the nerve center and command center of the electronic device 100.
- the controller can generate operation control signals according to the instruction operation code and timing signals to complete the control of fetching and executing instructions.
- a memory may also be provided in the processor 110 to store instructions and data.
- the memory in the processor 110 is a cache memory.
- the memory can store instructions or data that the processor 110 has just used or used cyclically. If the processor 110 needs to use the instruction or data again, it can be directly called from the memory. Repeated accesses are avoided, the waiting time of the processor 110 is reduced, and the efficiency of the system is improved.
- the processor 110 may include one or more interfaces.
- the interface may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (SIM) interface, and/or a universal serial bus (USB) interface, etc.
- the I2C interface is a bidirectional synchronous serial bus, which includes a serial data line (SDA) and a serial clock line (SCL).
- the processor 110 may include multiple sets of I2C buses.
- the processor 110 may couple the touch sensor 180K, the charger, the flash, the camera 193, etc., respectively through different I2C bus interfaces.
- the processor 110 may couple the touch sensor 180K through an I2C interface, so that the processor 110 and the touch sensor 180K communicate through an I2C bus interface to implement the touch function of the electronic device 100.
- the I2S interface can be used for audio communication.
- the processor 110 may include multiple sets of I2S buses.
- the processor 110 may be coupled with the audio module 170 through an I2S bus to implement communication between the processor 110 and the audio module 170.
- the audio module 170 may transmit audio signals to the wireless communication module 160 through an I2S interface, so as to realize the function of answering calls through a Bluetooth headset.
- the PCM interface can also be used for audio communication to sample, quantize and encode analog signals.
- the audio module 170 and the wireless communication module 160 may be coupled through a PCM bus interface.
- the audio module 170 may also transmit audio signals to the wireless communication module 160 through the PCM interface, so as to realize the function of answering calls through the Bluetooth headset. Both the I2S interface and the PCM interface can be used for audio communication.
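- the "sample, quantize and encode" steps mentioned for PCM can be illustrated with a short sketch. It assumes 16-bit signed little-endian samples and input values normalized to the range [-1.0, 1.0]; neither the bit depth nor the function names `sample_sine` and `pcm_encode` come from this embodiment.

```python
import math
import struct

MAX_CODE = 2 ** 15 - 1  # largest 16-bit signed code (32767); bit depth is an assumption


def sample_sine(freq_hz: float, sample_rate_hz: float, n: int) -> list:
    """Sampling: evaluate a sine tone at n evenly spaced instants."""
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate_hz) for i in range(n)]


def pcm_encode(samples: list) -> bytes:
    """Quantize floats in [-1.0, 1.0] to 16-bit codes and pack them as bytes."""
    codes = []
    for s in samples:
        s = max(-1.0, min(1.0, s))              # clip out-of-range input
        codes.append(int(round(s * MAX_CODE)))  # quantization
    return struct.pack("<%dh" % len(codes), *codes)  # encoding, little-endian
```

- for example, `pcm_encode(sample_sine(1000, 8000, 8))` yields 16 bytes, two per sample.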
- the UART interface is a universal serial data bus used for asynchronous communication.
- the bus can be a two-way communication bus. It converts the data to be transmitted between serial communication and parallel communication.
- the UART interface is generally used to connect the processor 110 and the wireless communication module 160.
- the processor 110 communicates with the Bluetooth module in the wireless communication module 160 through the UART interface to realize the Bluetooth function.
- the audio module 170 may transmit audio signals to the wireless communication module 160 through a UART interface, so as to realize the function of playing music through a Bluetooth headset.
- the MIPI interface can be used to connect the processor 110 with the display screen 194, the camera 193 and other peripheral devices.
- the MIPI interface includes a camera serial interface (camera serial interface, CSI), a display serial interface (display serial interface, DSI), and so on.
- the processor 110 and the camera 193 communicate through a CSI interface to implement the shooting function of the electronic device 100.
- the processor 110 and the display screen 194 communicate through a DSI interface to realize the display function of the electronic device 100.
- the GPIO interface can be configured through software.
- the GPIO interface can be configured as a control signal or as a data signal.
- the GPIO interface can be used to connect the processor 110 with the camera 193, the display screen 194, the wireless communication module 160, the audio module 170, the sensor module 180, and so on.
- the GPIO interface can also be configured as an I2C interface, I2S interface, UART interface, MIPI interface, etc.
- the USB interface 130 is an interface that complies with the USB standard specification, and specifically may be a Mini USB interface, a Micro USB interface, a USB Type C interface, and so on.
- the USB interface 130 can be used to connect a charger to charge the electronic device 100, and can also be used to transfer data between the electronic device 100 and peripheral devices. It can also be used to connect earphones and play audio through earphones. This interface can also be used to connect to other electronic devices, such as AR devices.
- the interface connection relationship between the modules illustrated in this embodiment is merely a schematic description, and does not constitute a structural limitation of the electronic device 100.
- the electronic device 100 may also adopt different interface connection modes in the above-mentioned embodiments, or a combination of multiple interface connection modes.
- the charging management module 140 is used to receive charging input from the charger.
- the charger can be a wireless charger or a wired charger.
- the charging management module 140 may receive the charging input of the wired charger through the USB interface 130.
- the charging management module 140 may receive the wireless charging input through the wireless charging coil of the electronic device 100. While the charging management module 140 charges the battery 142, it can also supply power to the electronic device through the power management module 141.
- the power management module 141 is used to connect the battery 142, the charging management module 140 and the processor 110.
- the power management module 141 receives input from the battery 142 and/or the charge management module 140, and supplies power to the processor 110, the internal memory 121, the external memory, the display screen 194, the camera 193, and the wireless communication module 160.
- the power management module 141 can also be used to monitor parameters such as battery capacity, battery cycle times, and battery health status (leakage, impedance).
- the power management module 141 may also be provided in the processor 110.
- the power management module 141 and the charging management module 140 may also be provided in the same device.
- the mobile communication module 150 can provide a wireless communication solution including 2G/3G/4G/5G and the like applied to the electronic device 100.
- the mobile communication module 150 may include at least one filter, switch, power amplifier, low noise amplifier (LNA), etc.
- the mobile communication module 150 can receive electromagnetic waves through the antenna 1, perform processing such as filtering and amplification on the received electromagnetic waves, and transmit them to the modem processor for demodulation.
- the mobile communication module 150 can also amplify the signal modulated by the modem processor and convert it into electromagnetic waves that are radiated through the antenna 1.
- at least part of the functional modules of the mobile communication module 150 may be provided in the processor 110.
- at least part of the functional modules of the mobile communication module 150 and at least part of the modules of the processor 110 may be provided in the same device.
- the modem processor may include a modulator and a demodulator.
- the modulator is used to modulate the low frequency baseband signal to be sent into a medium and high frequency signal.
- the demodulator is used to demodulate the received electromagnetic wave signal into a low-frequency baseband signal. Then the demodulator transmits the demodulated low-frequency baseband signal to the baseband processor for processing. After the low-frequency baseband signal is processed by the baseband processor, it is passed to the application processor.
- the application processor outputs a sound signal through an audio device (not limited to the speaker 170A, the receiver 170B, etc.), or displays an image or video through the display screen 194.
- the modem processor may be an independent device. In other embodiments, the modem processor may be independent of the processor 110 and be provided in the same device as the mobile communication module 150 or other functional modules.
- the wireless communication module 160 can provide wireless communication solutions applied to the electronic device 100, including wireless local area networks (WLAN) (such as wireless fidelity (Wi-Fi) networks), Bluetooth (BT), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC) technology, and infrared (IR) technology.
- the wireless communication module 160 may be one or more devices integrating at least one communication processing module.
- the wireless communication module 160 receives electromagnetic waves via the antenna 2, frequency modulates and filters the electromagnetic wave signals, and sends the processed signals to the processor 110.
- the wireless communication module 160 may also receive the signal to be sent from the processor 110, perform frequency modulation, amplify it, and convert it into electromagnetic waves to radiate through the antenna 2.
- the antenna 1 of the electronic device 100 is coupled with the mobile communication module 150, and the antenna 2 is coupled with the wireless communication module 160, so that the electronic device 100 can communicate with the network and other devices through wireless communication technology.
- the wireless communication technology may include global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), time-division synchronous code division multiple access (TD-SCDMA), long term evolution (LTE), BT, GNSS, WLAN, NFC, FM, and/or IR technology, etc.
- the GNSS may include the global positioning system (GPS), the global navigation satellite system (GLONASS), the Beidou navigation satellite system (BDS), the quasi-zenith satellite system (QZSS), and/or satellite-based augmentation systems (SBAS).
- the electronic device 100 implements a display function through a GPU, a display screen 194, an application processor, and the like.
- the GPU is an image processing microprocessor, which is connected to the display screen 194 and the application processor.
- the GPU is used to perform mathematical and geometric calculations for graphics rendering.
- the processor 110 may include one or more GPUs, which execute program instructions to generate or change display information.
- the display screen 194 is used to display images, videos, and the like.
- the display screen 194 includes a display panel.
- the display panel can use a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), Mini-LED, Micro-LED, Micro-OLED, quantum dot light-emitting diodes (QLED), etc.
- the electronic device 100 may include one or N display screens 194, and N is a positive integer greater than one.
- the electronic device 100 can realize a shooting function through an ISP, a camera 193, a video codec, a GPU, a display screen 194, and an application processor.
- the ISP is used to process the data fed back by the camera 193. For example, when a picture is taken, the shutter opens and light is transmitted through the lens to the photosensitive element of the camera. The photosensitive element converts the light signal into an electrical signal and passes it to the ISP, where it is processed and converted into an image visible to the naked eye.
- the ISP can also optimize the noise, brightness, and skin tone of the image, as well as the exposure, color temperature, and other parameters of the shooting scene.
- the ISP may be provided in the camera 193.
- the camera 193 is used to capture still images or videos.
- an optical image of the object is generated through the lens and projected onto the photosensitive element.
- the photosensitive element may be a charge coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor.
- the photosensitive element converts the optical signal into an electrical signal, and then transfers the electrical signal to the ISP to convert it into a digital image signal.
- the ISP outputs the digital image signal to the DSP for processing.
- the DSP converts the digital image signal into an image signal in a standard format such as RGB or YUV.
- the electronic device 100 may include one or N cameras 193, and N is a positive integer greater than one.
- Digital signal processors are used to process digital signals. In addition to digital image signals, they can also process other digital signals. For example, when the electronic device 100 selects a frequency point, the digital signal processor is used to perform Fourier transform on the energy of the frequency point.
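- one way to read the frequency-point example is that the energy at a single frequency bin is obtained from a discrete Fourier transform of a block of samples. The sketch below evaluates one DFT bin directly. It is an illustration under that reading, not the method of this embodiment, and the function name `bin_energy` is hypothetical.

```python
import cmath
import math


def bin_energy(samples: list, k: int) -> float:
    """Return |X[k]|^2, the energy of DFT bin k for a block of real samples."""
    n = len(samples)
    acc = 0j
    for i, x in enumerate(samples):
        # Correlate the block with the complex exponential of bin k.
        acc += x * cmath.exp(-2j * math.pi * k * i / n)
    return abs(acc) ** 2
```

- evaluating a single bin this way costs a number of operations proportional to the block length, which is sufficient when only a few frequency points are of interest.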
- Video codecs are used to compress or decompress digital video.
- the electronic device 100 may support one or more video codecs. In this way, the electronic device 100 can play or record videos in multiple encoding formats, such as moving picture experts group (MPEG) 1, MPEG2, MPEG3, and MPEG4.
- NPU is a neural-network (NN) computing processor.
- through the NPU, applications such as intelligent cognition of the electronic device 100 can be realized, for example image recognition, face recognition, voice recognition, and text understanding.
- the external memory interface 120 may be used to connect an external memory card, such as a Micro SD card, to expand the storage capacity of the electronic device 100.
- the external memory card communicates with the processor 110 through the external memory interface 120 to realize the data storage function. For example, save music, video and other files in an external memory card.
- the internal memory 121 may be used to store computer executable program code, where the executable program code includes instructions.
- the processor 110 executes various functional applications and data processing of the electronic device 100 by running instructions stored in the internal memory 121.
- the internal memory 121 may include a storage program area and a storage data area.
- the storage program area can store an operating system and an application program required by at least one function (such as a sound playback function or an image playback function).
- the data storage area can store data (such as audio data, phone book, etc.) created during the use of the electronic device 100.
- the internal memory 121 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, a universal flash storage (UFS), and the like.
- the electronic device 100 can implement audio functions through the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the earphone interface 170D, and the application processor. For example, music playback, recording, etc.
- the audio module 170 is used to convert digital audio information into an analog audio signal for output, and is also used to convert an analog audio input into a digital audio signal.
- the audio module 170 can also be used to encode and decode audio signals.
- the audio module 170 may be provided in the processor 110, or part of the functional modules of the audio module 170 may be provided in the processor 110.
- the speaker 170A, also called a "loudspeaker", is used to convert audio electrical signals into sound signals.
- the user can listen to music or to a hands-free call through the speaker 170A of the electronic device 100.
- the receiver 170B, also called an "earpiece", is used to convert audio electrical signals into sound signals.
- when the electronic device 100 answers a call or plays a voice message, the voice can be heard by bringing the receiver 170B close to the human ear.
- the microphone 170C, also called a "mic" or "mouthpiece", is used to convert sound signals into electrical signals.
- the user can speak with the mouth close to the microphone 170C to input the sound signal into the microphone 170C.
- the electronic device 100 may be provided with at least one microphone 170C.
- the electronic device 100 may be provided with two microphones 170C, which can implement noise reduction functions in addition to collecting sound signals.
- the electronic device 100 may also be provided with three, four or more microphones 170C to collect sound signals, reduce noise, identify sound sources, and realize directional recording functions.
- the earphone interface 170D is used to connect wired earphones.
- the earphone interface 170D may be the USB interface 130, a 3.5 mm open mobile terminal platform (OMTP) standard interface, or a cellular telecommunications industry association of the USA (CTIA) standard interface.
- the pressure sensor 180A is used to sense the pressure signal and can convert the pressure signal into an electrical signal.
- the pressure sensor 180A may be provided on the display screen 194.
- the capacitive pressure sensor may include at least two parallel plates with conductive material. When a force is applied to the pressure sensor 180A, the capacitance between the electrodes changes.
- the electronic device 100 determines the intensity of the pressure according to the change in capacitance.
- the electronic device 100 detects the intensity of the touch operation according to the pressure sensor 180A.
- the electronic device 100 may also calculate the touched position according to the detection signal of the pressure sensor 180A.
- touch operations that act on the same touch position but have different intensities may correspond to different operation instructions. For example, when a touch operation whose intensity is less than the first pressure threshold acts on the short message application icon, an instruction to view the short message is executed. When a touch operation whose intensity is greater than or equal to the first pressure threshold acts on the short message application icon, an instruction to create a new short message is executed. When a touch operation whose intensity is greater than the second pressure threshold acts on the short message application icon and the touch position moves, the user can drag the short message application icon to another position.
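- the threshold logic in this example can be sketched as a small dispatch function. The numeric threshold values, the assumption that the second threshold is higher than the first, and the action names are all hypothetical and chosen only for illustration.

```python
# Hypothetical normalized pressure thresholds; the text gives no numeric values.
FIRST_PRESSURE_THRESHOLD = 0.3
SECOND_PRESSURE_THRESHOLD = 0.7  # assumed to be higher than the first threshold


def sms_icon_action(pressure: float, moved: bool = False) -> str:
    """Map a touch on the short message icon to an operation instruction."""
    # Check the strongest condition first so it is not shadowed by the
    # "greater than or equal to the first threshold" branch.
    if pressure > SECOND_PRESSURE_THRESHOLD and moved:
        return "drag_icon"
    if pressure >= FIRST_PRESSURE_THRESHOLD:
        return "create_new_message"
    return "view_messages"
```

- a press at exactly the first threshold creates a new message in this sketch, matching the "greater than or equal to" wording of the example.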
- the gyro sensor 180B may be used to determine the movement posture of the electronic device 100.
- the angular velocity of the electronic device 100 around three axes (i.e., the x, y, and z axes) can be determined by the gyro sensor 180B.
- the gyro sensor 180B can be used for image stabilization.
- the gyro sensor 180B detects the shake angle of the electronic device 100, calculates the distance that the lens module needs to compensate according to the angle, and allows the lens to counteract the shake of the electronic device 100 through reverse movement to achieve anti-shake.
- the gyro sensor 180B can also be used for navigation and somatosensory game scenes.
- the air pressure sensor 180C is used to measure air pressure.
- the electronic device 100 calculates the altitude based on the air pressure value measured by the air pressure sensor 180C to assist positioning and navigation.
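- a common way to turn a pressure reading into an altitude estimate is the international barometric formula. The sketch below uses that formula with the standard sea-level pressure of 1013.25 hPa as the default reference; the embodiment does not say which formula or reference pressure is used, so both are assumptions, as is the function name.

```python
STANDARD_SEA_LEVEL_HPA = 1013.25  # standard-atmosphere reference pressure


def altitude_from_pressure(pressure_hpa: float,
                           sea_level_hpa: float = STANDARD_SEA_LEVEL_HPA) -> float:
    """Estimate altitude in meters from air pressure using the barometric formula."""
    if pressure_hpa <= 0 or sea_level_hpa <= 0:
        raise ValueError("pressures must be positive")
    # h = 44330 * (1 - (p / p0) ** (1 / 5.255))
    return 44330.0 * (1.0 - (pressure_hpa / sea_level_hpa) ** (1.0 / 5.255))
```

- because local weather shifts the sea-level pressure, an estimate of this kind is better suited to assisting positioning, as the text says, than to serving as an absolute height measurement.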
- the magnetic sensor 180D includes a Hall sensor.
- the electronic device 100 may use the magnetic sensor 180D to detect the opening and closing of the flip holster.
- the electronic device 100 can detect the opening and closing of the flip cover by means of the magnetic sensor 180D.
- based on the detected open or closed state of the holster or the flip cover, features such as automatic unlocking when the flip cover is opened can be set.
- the acceleration sensor 180E can detect the magnitude of the acceleration of the electronic device 100 in various directions (generally three axes). When the electronic device 100 is stationary, the magnitude and direction of gravity can be detected. The sensor can also be used to identify the posture of the electronic device, and is applied in functions such as landscape/portrait screen switching and pedometers.
- the distance sensor 180F is used to measure distance. The electronic device 100 can measure the distance by infrared or laser. In some embodiments, when shooting a scene, the electronic device 100 may use the distance sensor 180F to measure the distance to achieve fast focusing.
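- landscape/portrait switching from a gravity reading can be sketched as follows. The axis convention (x along the short edge of the screen, y along the long edge, z out of the screen) and the rule of keeping the current orientation when the device lies flat are assumptions made for this illustration, and the function name is hypothetical.

```python
def screen_orientation(ax: float, ay: float, az: float,
                       current: str = "portrait") -> str:
    """Pick an orientation from the gravity components measured while stationary."""
    # Device lying flat: gravity is mostly along z, so there is no clear
    # preference and the current orientation is kept.
    if abs(az) > max(abs(ax), abs(ay)):
        return current
    # Gravity mostly along the long edge means the device is held upright.
    return "portrait" if abs(ay) >= abs(ax) else "landscape"
```

- a practical implementation would typically add hysteresis or filtering so that small hand movements near the 45-degree boundary do not cause repeated switching.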
- the proximity light sensor 180G may include, for example, a light emitting diode (LED) and a light detector such as a photodiode.
- the light emitting diode may be an infrared light emitting diode.
- the electronic device 100 emits infrared light to the outside through the light emitting diode.
- the electronic device 100 uses a photodiode to detect infrared reflected light from nearby objects. When sufficient reflected light is detected, it can be determined that there is an object near the electronic device 100. When insufficient reflected light is detected, the electronic device 100 can determine that there is no object near the electronic device 100.
- the electronic device 100 can use the proximity light sensor 180G to detect that the user holds the electronic device 100 close to the ear to talk, so as to automatically turn off the screen to save power.
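- the proximity decision and the screen-off behavior during a call can be sketched as two small functions. The normalized reflection scale and the threshold value are hypothetical, since the text only speaks of "sufficient" and "insufficient" reflected light.

```python
# Hypothetical threshold on a normalized 0.0-1.0 reflected-light reading.
REFLECTION_THRESHOLD = 0.5


def object_nearby(reflected_light: float) -> bool:
    """Sufficient reflected infrared light means an object is near the device."""
    return reflected_light >= REFLECTION_THRESHOLD


def should_turn_off_screen(reflected_light: float, in_call: bool) -> bool:
    """Turn the screen off only when the device is held to the ear during a call."""
    return in_call and object_nearby(reflected_light)
```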
- the proximity light sensor 180G can also be used in holster mode and pocket mode to automatically unlock and lock the screen.
- the ambient light sensor 180L is used to sense the brightness of the ambient light.
- the electronic device 100 can adaptively adjust the brightness of the display screen 194 according to the perceived brightness of the ambient light.
- the ambient light sensor 180L can also be used to automatically adjust the white balance when taking pictures.
- the ambient light sensor 180L can also cooperate with the proximity light sensor 180G to detect whether the electronic device 100 is in the pocket to prevent accidental touch.
- the fingerprint sensor 180H is used to collect fingerprints.
- the electronic device 100 can use the collected fingerprint characteristics to implement fingerprint unlocking, access to application locks, fingerprint-triggered photographing, fingerprint-based call answering, and so on.
- the temperature sensor 180J is used to detect temperature.
- the electronic device 100 uses the temperature detected by the temperature sensor 180J to execute a temperature processing strategy. For example, when the temperature reported by the temperature sensor 180J exceeds a threshold value, the electronic device 100 reduces the performance of the processor located near the temperature sensor 180J, so as to reduce power consumption and implement thermal protection.
- when the temperature is lower than another threshold, the electronic device 100 heats the battery 142 to avoid abnormal shutdown of the electronic device 100 due to low temperature.
- the electronic device 100 boosts the output voltage of the battery 142 to avoid abnormal shutdown caused by low temperature.
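- the temperature processing strategy can be sketched as a function that maps a temperature reading to a list of actions. All threshold values and action names are hypothetical. The text presents battery heating and output-voltage boosting as separate options, so combining them in one policy, with a lower threshold for boosting, is an assumption made only for illustration.

```python
# Hypothetical thresholds in degrees Celsius; the text gives no values.
HIGH_TEMP_C = 45.0        # above this, reduce processor performance
LOW_TEMP_HEAT_C = 0.0     # below this, heat the battery
LOW_TEMP_BOOST_C = -10.0  # below this, also boost the battery output voltage


def thermal_actions(temp_c: float) -> list:
    """Return the protective actions for one temperature reading."""
    actions = []
    if temp_c > HIGH_TEMP_C:
        actions.append("reduce_processor_performance")
    if temp_c < LOW_TEMP_HEAT_C:
        actions.append("heat_battery")
    if temp_c < LOW_TEMP_BOOST_C:
        actions.append("boost_battery_output_voltage")
    return actions
```

- returning a list rather than a single action lets the caller apply several low-temperature measures at once without changing the function signature.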
- the touch sensor 180K is also called a "touch panel".
- the touch sensor 180K may be disposed on the display screen 194, and the touch sensor 180K and the display screen 194 together form a touchscreen, which is also called a "touch screen".
- the touch sensor 180K is used to detect touch operations acting on or near it.
- the touch sensor can pass the detected touch operation to the application processor to determine the type of touch event.
- the visual output related to the touch operation can be provided through the display screen 194.
- the touch sensor 180K may also be disposed on the surface of the electronic device 100, which is different from the position of the display screen 194.
- the bone conduction sensor 180M can acquire vibration signals.
- the bone conduction sensor 180M can obtain the vibration signal of the bone mass that vibrates when a person speaks.
- the bone conduction sensor 180M can also contact the human pulse and receive the blood pressure pulse signal.
- the bone conduction sensor 180M may also be provided in the earphone, combined with the bone conduction earphone.
- the audio module 170 can parse the voice signal based on the vibration signal of the vibrating bone mass obtained by the bone conduction sensor 180M, and realize the voice function.
- the application processor can analyze the heart rate information based on the blood pressure pulse signal obtained by the bone conduction sensor 180M, and realize the heart rate detection function.
- the button 190 includes a power-on button, a volume button, and so on.
- the button 190 may be a mechanical button. It can also be a touch button.
- the electronic device 100 may receive key input, and generate key signal input related to user settings and function control of the electronic device 100.
- the motor 191 can generate vibration prompts.
- the motor 191 can be used for incoming call vibration notification, and can also be used for touch vibration feedback.
- touch operations that act on different applications can correspond to different vibration feedback effects.
- for touch operations that act on different areas of the display screen 194, the motor 191 can also produce different vibration feedback effects.
- different application scenarios (for example, time reminders, receiving messages, alarm clocks, and games) can also correspond to different vibration feedback effects.
- the touch vibration feedback effect can also support customization.
- the indicator 192 may be an indicator light, which may be used to indicate the charging status, power change, or to indicate messages, missed calls, notifications, and so on.
- the SIM card interface 195 is used to connect to the SIM card.
- the SIM card can be inserted into the SIM card interface 195 or pulled out from the SIM card interface 195 to achieve contact and separation with the electronic device 100.
- the electronic device 100 may support 1 or N SIM card interfaces, and N is a positive integer greater than 1.
- the SIM card interface 195 can support Nano SIM cards, Micro SIM cards, SIM cards, etc.
- multiple cards can be inserted into the same SIM card interface 195 at the same time. The types of the multiple cards can be the same or different.
- the SIM card interface 195 can also be compatible with different types of SIM cards.
- the SIM card interface 195 may also be compatible with external memory cards.
- the electronic device 100 interacts with the network through the SIM card to implement functions such as call and data communication.
- the electronic device 100 adopts an eSIM, that is, an embedded SIM card.
- the eSIM card can be embedded in the electronic device 100 and cannot be separated from the electronic device 100.
- the software system of the electronic device 100 may adopt a layered architecture, an event-driven architecture, a microkernel architecture, a microservice architecture, or a cloud architecture.
- an Android system with a layered architecture is taken as an example to illustrate the software structure of the electronic device 100.
- FIG. 2 is a software structure block diagram of an electronic device 100 provided in this embodiment.
- the layered architecture divides the software into several layers, and each layer has a clear role and division of labor. The layers communicate with each other through software interfaces.
- the Android system is divided into four layers, which are, from top to bottom, the application layer, the application framework layer, the Android runtime and system library, and the kernel layer.
- the application layer can include a series of application packages.
- the application package can include applications such as voice assistant, gallery, calendar, call, map, navigation, WLAN, Bluetooth, music, video, short message, etc.
- the application framework layer provides an application programming interface (application programming interface, API) and a programming framework for applications in the application layer.
- the application framework layer includes some predefined functions.
- the application framework layer can include a window manager, a content provider, a view system, a phone manager, a resource manager, and a notification manager.
- the window manager is used to manage window programs.
- the window manager can obtain the size of the display, determine whether there is a status bar, lock the screen, take a screenshot, etc.
- the content provider is used to store and retrieve data and make these data accessible to applications.
- the data may include videos, images, audios, phone calls made and received, browsing history and bookmarks, phone book, etc.
- the view system includes visual controls, such as controls that display text, controls that display pictures, and so on.
- the view system can be used to build applications.
- the display interface can be composed of one or more views.
- a display interface that includes a short message notification icon may include a view that displays text and a view that displays pictures.
- The phone manager is used to provide the communication function of the electronic device 100, for example, management of the call status (including connecting, hanging up, etc.).
- the resource manager provides various resources for the application, such as localized strings, icons, pictures, layout files, video files, and so on.
- the notification manager enables the application to display notification information in the status bar, which can be used to convey notification-type messages, and it can disappear automatically after a short stay without user interaction.
- the notification manager is used to notify download completion, message reminders, and so on.
- A notification may also appear in the status bar at the top of the system in the form of a chart or scroll-bar text, such as a notification of an application running in the background, or appear on the screen in the form of a dialog window. For example, text is prompted in the status bar, a prompt sound is played, the electronic device vibrates, or the indicator light flashes.
- Android Runtime includes core libraries and virtual machines. Android runtime is responsible for the scheduling and management of the Android system.
- The core library consists of two parts: one part is the functions that the Java language needs to call, and the other part is the core library of Android.
- the application layer and the application framework layer run in a virtual machine.
- The virtual machine executes the Java files of the application layer and the application framework layer as binary files.
- the virtual machine is used to perform functions such as object life cycle management, stack management, thread management, security and exception management, and garbage collection.
- the system library can include multiple functional modules. For example: surface manager (surface manager), media library (Media Libraries), 3D graphics processing library (for example: OpenGL ES), 2D graphics engine (for example: SGL), etc.
- the surface manager is used to manage the display subsystem and provides a combination of 2D and 3D layers for multiple applications.
- the media library supports playback and recording of a variety of commonly used audio and video formats, as well as still image files.
- the media library can support a variety of audio and video encoding formats, such as: MPEG4, H.264, MP3, AAC, AMR, JPG, PNG, etc.
- the 3D graphics processing library is used to realize 3D graphics drawing, image rendering, synthesis, and layer processing.
- the 2D graphics engine is a drawing engine for 2D drawing.
- the kernel layer is the layer between hardware and software.
- The kernel layer contains at least a display driver, a camera driver, an audio driver, and a sensor driver.
- the technical solutions involved in the following embodiments may all be implemented in the electronic device 100 having the foregoing hardware architecture and software architecture.
- the following describes the voice interaction method provided in the present application in detail with reference to the drawings and specific application scenarios, taking the electronic device 100 as a mobile phone as an example.
- an embodiment of the present application provides a voice interaction method, as shown in Figure 3(a).
- the method includes step S301-step S303:
- The voice assistant needs to be awakened first. Specifically, the user can wake up the voice assistant by speaking a voice keyword (for example, "Xiaoyi Xiaoyi"), by clicking the voice assistant icon on the display interface, or by long pressing a hard button of the electronic device (for example, long pressing the power button for 1 s). After the voice assistant wakes up, it is in the receiving (listening) state, in which it can receive the voice skill instructions input by the user.
- S301 After the voice assistant is awakened, determine the first application scenario according to one or more information items.
- the voice assistant obtains one or more information items by using the application recognition mechanism of the electronic device, and then determines the first application scenario that includes the one or more information items.
- the information items include the current display interface of the electronic device, the current time, the current location of the electronic device, the current motion state of the electronic device, the current event, or the application running on the current electronic device.
- Depending on whether it contains recognizable text, pictures, or scene intent, the current display interface of the electronic device can be further divided into a display interface with no recognizable text, picture, or scene intent; a display interface with recognizable text or pictures; and a display interface with a recognizable scene intent (for example, of the instant messaging (IM) type).
- the current motion state of the electronic device is determined according to the current speed and/or acceleration of the electronic device.
- Recognizable text is valid (that is, meaningful) text, and recognizable pictures are valid (that is, meaningful) pictures.
- Recognizable text or pictures in the display interface of the electronic device can be recognized as specific things, such as hotels, scenic spots, TV shows, and movies, through the natural language understanding (NLU) entity recognition interface or the HiTouch capability.
- The NLU scene recognition interface recognizes the text and/or pictures in the current display interface to obtain structured data and can determine the category of that data, for example, whether the structured data is an address, a phone number, or a URL.
- The specific scene intent, such as navigating to a certain place, calling someone, or copying the URL, can then be determined from this information.
- The current motion state of the electronic device may be stationary, walking, traveling in a motor vehicle, and so on. If the current speed and/or acceleration of the electronic device is 0, the electronic device is in a stationary state, that is, the user's motion state is stationary. If the current speed and/or acceleration of the electronic device is greater than the first preset threshold but less than the second preset threshold, the electronic device is in a walking state, that is, the user holding the electronic device is walking. If the current speed and/or acceleration of the electronic device is greater than the second preset threshold, the electronic device is in a motor vehicle driving state, that is, the user is currently traveling in a motor vehicle.
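The threshold rule above can be sketched as a small classifier. This is only an illustrative sketch: the function name, the enum, and the two threshold values are hypothetical, since the text names only a "first preset threshold" and a "second preset threshold". It classifies on speed alone and returns an explicit `UNKNOWN` for values the text does not cover.

```python
from enum import Enum


class MotionState(Enum):
    STATIONARY = "stationary"
    WALKING = "walking"
    DRIVING = "driving"
    UNKNOWN = "unknown"


# Hypothetical values in m/s; the source only calls them the
# first and second preset thresholds.
FIRST_THRESHOLD = 0.5
SECOND_THRESHOLD = 3.0


def classify_motion(speed: float,
                    first: float = FIRST_THRESHOLD,
                    second: float = SECOND_THRESHOLD) -> MotionState:
    """Map a speed reading to a motion state using two thresholds."""
    if speed == 0:
        return MotionState.STATIONARY
    if first < speed < second:
        return MotionState.WALKING
    if speed > second:
        return MotionState.DRIVING
    # The source does not say how to treat readings in (0, first]
    # or exactly equal to the second threshold.
    return MotionState.UNKNOWN
```

A real device would likely smooth sensor readings over a time window before classifying; that step is outside what the text describes.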
- The current location of the electronic device can be home, the company, a business center, etc.
- the current time can be a specific time in the 24-hour system, or a specific time period of the day, such as morning, noon, evening, late night, etc. If the current time is 6:50 and the location of the electronic device is home, it can be determined that the current event is getting up.
- After being awakened, the voice assistant can display a prompt message as text or play it by voice, such as "What help do you need?", to prompt the user to input a voice skill instruction, so that the voice skill instruction input by the user can be received as soon as possible after the voice assistant is awakened.
- The prompt can also remind the user that the voice assistant has been turned on when the user has woken it by an accidental touch, so that the user can turn off the voice assistant as soon as possible.
- S302 The voice assistant determines the commonly used voice skill instructions in the first application scenario according to the first application scenario and the historical voice skill usage record.
- the voice skill instruction is used to call the service provided by the voice assistant, that is, voice skill.
- The historical voice skill usage record includes one or more records. Each record indicates the time at which the voice assistant called a voice skill during a past period of time, the voice skill instruction used to call the voice skill, and the application scenario in which the voice skill was called.
- There are two specific implementations of step S302, namely implementation 1 and implementation 2, which are described separately below:
- Implementation 1: after the voice assistant determines the first application scenario, if, in the historical voice skill usage record, the application scenario in which a voice skill was invoked includes all the information items in the first application scenario, that voice skill is a commonly used voice skill in the first application scenario.
- The preset voice skill instruction corresponding to the commonly used voice skill is the commonly used voice skill instruction in the first application scenario; alternatively, the voice skill instruction that most frequently invoked the commonly used voice skill in the historical voice skill usage record is the commonly used voice skill instruction in the first application scenario.
- The preset voice skill instruction is a voice skill instruction manually configured by the developer in the dialogue development platform, and it has a corresponding relationship with the voice skill.
- For example, the first application scenario includes two information items.
- The two information items are information item 1 and information item 2.
- Information item 1 is that the current time is 10 am.
- Information item 2 is that the current location of the electronic device is home.
- The historical voice skill usage record contains 4 records: the usage record of voice skill A, the usage record of voice skill B, the usage record of voice skill C, and the usage record of voice skill D.
- The application scenario for invoking voice skill A includes information item 1.
- The application scenario for invoking voice skill B includes information item 1 and information item 2.
- The application scenario for invoking voice skill C includes information item 1, information item 2, and information item 3, where, for example, information item 3 is that the current motion state of the electronic device is stationary.
- The application scenario for invoking voice skill D includes information item 3.
- The voice assistant can therefore determine that the commonly used voice skills in the first application scenario are voice skill B and voice skill C, and the preset voice skill instructions corresponding to voice skills B and C are the commonly used voice skill instructions in the first application scenario.
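The matching rule of implementation 1 amounts to a subset test: a skill qualifies when the scenario recorded for it contains every information item of the first application scenario. The following is a minimal sketch of that reading, reproducing the four-record example; the `UsageRecord` type, the function name, and the string encoding of the information items are hypothetical and not taken from the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageRecord:
    skill: str            # voice skill that was called
    scenario: frozenset   # information items present when it was called


def common_skills_all_items(first_scenario, records):
    """Return skills whose recorded scenario contains ALL items of first_scenario."""
    wanted = frozenset(first_scenario)
    result = []
    for record in records:
        if wanted <= record.scenario and record.skill not in result:
            result.append(record.skill)
    return result


# Hypothetical encodings of information items 1 to 3 from the example.
ITEM1 = "time=10:00"
ITEM2 = "location=home"
ITEM3 = "motion=stationary"

records = [
    UsageRecord("A", frozenset({ITEM1})),
    UsageRecord("B", frozenset({ITEM1, ITEM2})),
    UsageRecord("C", frozenset({ITEM1, ITEM2, ITEM3})),
    UsageRecord("D", frozenset({ITEM3})),
]

print(common_skills_all_items({ITEM1, ITEM2}, records))  # ['B', 'C']
```

If no record satisfies the subset test, the function returns an empty list, which is the case that motivates a looser matching rule.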
- Implementation 2: when implementation 1 is used to determine the commonly used voice skills, the number of commonly used voice skills may be zero. Therefore, the embodiment of the present application also provides another possible implementation, namely implementation 2:
- After the voice assistant determines the first application scenario, if, in the historical voice skill usage record, the application scenario in which a voice skill was invoked includes at least one information item in the first application scenario, that voice skill is a commonly used voice skill in the first application scenario, and the voice skill instruction corresponding to the commonly used voice skill in the historical voice skill usage record is the commonly used voice skill instruction in the first application scenario.
- In the example above, the voice assistant determines that voice skill A, voice skill B, and voice skill C are commonly used voice skills in the first application scenario, and the voice skill instructions corresponding to voice skills A, B, and C in the historical voice skill usage record are the commonly used voice skill instructions in the first application scenario.
- the voice skill instructions used to call the voice skill A include A1, A2, and A3. Compared with A2 and A3, the voice skill instruction A1 appears more frequently.
- The voice skill instructions used to call voice skill B include B1, B2, and B3, which appear the same number of times.
- The voice skill instructions used to call voice skill C include C1, C2, C3, and C4; compared with C1, C3, and C4, voice skill instruction C2 appears more frequently. The voice skill instructions corresponding to voice skill D include D1, D2, D3, D4, and D5; compared with D1, D2, D3, and D4, voice skill instruction D5 appears more frequently. Therefore, the voice skill instructions corresponding to voice skills A, B, C, and D are A1, B1/B2/B3, C2, and D5, respectively.
- The commonly used voice skill instructions differ for each information item.
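One way to read implementation 2 together with the "most frequent instruction" rule is sketched below. It keeps a skill when any of its recorded scenarios shares at least one information item with the first application scenario, then picks the instruction that called that skill most often, returning all candidates on a tie (as with B1/B2/B3). The tuple layout, function name, and sample data are hypothetical.

```python
from collections import Counter, defaultdict


def common_instructions_any_item(first_scenario, records):
    """records: iterable of (skill, instruction, scenario_items) tuples.

    Returns {skill: [most frequent instruction(s)]} for every skill whose
    recorded scenario shares at least one item with first_scenario.
    """
    wanted = set(first_scenario)
    matched = {skill for skill, _, scenario in records if wanted & set(scenario)}

    counts = defaultdict(Counter)
    for skill, instruction, _ in records:
        if skill in matched:
            counts[skill][instruction] += 1

    result = {}
    for skill, counter in counts.items():
        top = max(counter.values())
        # Keep every instruction tied for the highest count.
        result[skill] = sorted(i for i, n in counter.items() if n == top)
    return result


# Hypothetical history: A1 is used twice, B1/B2/B3 once each, D never matches.
history = [
    ("A", "A1", {"time=10:00"}),
    ("A", "A1", {"time=10:00"}),
    ("A", "A2", {"time=10:00"}),
    ("B", "B1", {"time=10:00", "location=home"}),
    ("B", "B2", {"time=10:00", "location=home"}),
    ("B", "B3", {"time=10:00", "location=home"}),
    ("D", "D5", {"motion=stationary"}),
]

print(common_instructions_any_item({"time=10:00", "location=home"}, history))
```

How ties are broken for display is not stated in the text; returning all tied instructions leaves that decision to the caller.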
- The voice assistant takes the voice skill corresponding to a clickable control (such as an application icon) in the current display interface as a commonly used voice skill under the current display interface; the voice skill instruction used to call that commonly used voice skill is the commonly used voice skill instruction under the current display interface.
- If the historical voice skill usage record contains voice skill call records based on the current display interface, the voice assistant takes the voice skills that correspond to clickable controls in the current display interface and whose number of calls in the historical voice skill usage record exceeds a preset threshold as the commonly used voice skills under the current display interface; the voice skill commands used to call those commonly used voice skills are the commonly used voice skill commands under the current display interface.
- The commonly used voice skills are the voice skills corresponding to the recognizable text and/or pictures in the current display interface; the voice skill commands used to call those commonly used voice skills are the commonly used voice skill commands under the current display interface.
- The commonly used voice skill commands are the voice skill commands related to "Fast and Furious", such as "Search Fast and Furious".
- The commonly used voice skills are the voice skills related to the recognizable scene intents in the current display interface, and the voice skill commands used to call those commonly used voice skills are the commonly used voice skill commands under the current display interface.
- For example, "Xiaolongkan Hotpot Restaurant on Huadong Road" is identified from the recognizable text and/or pictures in the display interface through the NLU entity recognition interface or the HiTouch capability, and is recognized as an address through the NLU intention recognition interface. Under the current display interface, the commonly used voice skills are those related to the address "Xiaolongkan Hotpot Restaurant on Huadong Road", and the voice skill command can be "Navigate to Xiaolongkan Hotpot Restaurant on Huadong Road".
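The step from a recognized data category to a suggested command can be pictured as a lookup from category to command template. The sketch below is purely illustrative: the category keys, the templates, and the function name are hypothetical, and it does not model the NLU or HiTouch recognition itself, only what might happen once a category and value are already known.

```python
from typing import Optional

# Hypothetical category-to-template table; the source only gives
# "navigate to a place", "call someone" and "copy the URL" as example intents.
INTENT_TEMPLATES = {
    "address": "Navigate to {value}",
    "phone_number": "Call {value}",
    "url": "Copy {value}",
}


def suggest_instruction(category: str, value: str) -> Optional[str]:
    """Build a suggested voice skill command for a recognized entity."""
    template = INTENT_TEMPLATES.get(category)
    if template is None:
        return None  # no scene intent known for this category
    return template.format(value=value)


print(suggest_instruction("address", "Xiaolongkan Hotpot Restaurant on Huadong Road"))
```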
- The commonly used voice skills are the voice skills corresponding to regular behaviors at the current time, third-party applications, or native applications; the voice skill commands used to call those commonly used voice skills are the commonly used voice skill commands under the current display interface.
- A regular behavior based on the current time is a behavior that the user frequently performed at this time in the past.
- Third-party applications are applications that users download from application stores, such as "WeChat".
- Native applications are system applications that come with the electronic device itself, also called embedded applications, such as "Camera".
- For example, the current time is 10:28 in the morning, and at this time the user often uses "voice translation" and the "Youdao Dictionary" application; the commonly used voice skill commands can then be "Open Voice Translation" and "Open Youdao Dictionary".
- The commonly used voice skills are the voice skills corresponding to regular behaviors at the current location of the electronic device, third-party applications, or native applications; the voice skill commands used to call those commonly used voice skills are the commonly used voice skill commands under the current display interface.
- A regular behavior based on the current location of the electronic device is a behavior that the user frequently performed at that location in the past.
- For example, the current location of the electronic device is home, where the user often uses "Tencent Video"; the commonly used voice skill command can then be "Open Tencent Video".
- The commonly used voice skills are the voice skills corresponding to regular behaviors in the current motion state of the electronic device, third-party applications, or native applications; the voice skill commands used to call those commonly used voice skills are the commonly used voice skill commands under the current display interface.
- A regular behavior based on the current motion state of the electronic device is a behavior that the user frequently performed in that motion state in the past.
- For example, the current motion state of the electronic device is running, and the user often uses "music" in this motion state; the commonly used voice skill command can then be "Open Music".
- The commonly used voice skills are the voice skills corresponding to preset events (such as preset schedules and alarm clocks) or to operations involving multiple applications (also called associated operations); the voice skill commands used to call those commonly used voice skills are the commonly used voice skill commands under the current display interface.
- The current event may be an event that is currently occurring, determined according to information items such as the current time, the current location of the electronic device, and the current display interface of the electronic device. For example, if the current time is 6:50 and the location of the electronic device is home, it can be determined that the current event is getting up.
- Associated operations are operations involving multiple applications, for example, playing music, then opening the "Financial News" application, and then opening the "Alipay" application.
- For example, the current event is getting up.
- In this event, the user often uses the voice skill corresponding to an associated operation.
- The associated operation is to broadcast the weather first and then open the "Music" application and play music. The voice skill command corresponding to this associated operation is "I got up".
- In addition, the user often presets the schedule after getting up as "learning English at 8 o'clock in the morning". Therefore, the commonly used voice skill commands may be "I got up", "Open 'Speak English fluently'", and so on.
- The commonly used voice skills are the voice skills corresponding to operations with a long path (that is, operations for which the user needs to touch and click manually multiple times), to functions in the application currently running on the electronic device, and/or to related services that require crossing applications; the voice skill instructions used to call those commonly used voice skills are the commonly used voice skill instructions under the current display interface.
- For example, the application currently running on the electronic device is "WeChat", and the current display interface is the "Discover" interface of the "WeChat" application.
- To open the payment code, the user needs to perform multiple click operations: first click "My", then click "Pay", then click "Receive Payment", and finally click "QR Code Collection" to open the WeChat payment code.
- Because this requires multiple operations, when the application running on the electronic device is "WeChat" and the display interface is the "Discover" interface of the "WeChat" application, "Open Payment Code" is an operation with a long path.
- The cross-application related service can be "open music".
- The function in the application currently running on the electronic device can be "view Moments" and so on.
- The commonly used voice skill commands can therefore be "open payment code", "view Moments", and/or "open music".
- The process in which the voice assistant recommends voice skill instructions (that is, the process in which the voice assistant displays the voice skill instructions on the display interface) can be divided into system-level recommendation and module-level recommendation.
- When the voice assistant recommends voice skill instructions to the user based on a first application scenario that includes at least one of the current time, the current location of the electronic device, the current event, and the current display interface of the electronic device, the recommendation is a system-level recommendation.
- When the voice assistant recommends voice skill instructions to the user based on a first application scenario that includes the application currently running on the electronic device, the recommendation is a module-level recommendation, also called an application-level recommendation.
- The voice assistant can use a voice skill instruction recommendation algorithm to analyze the historical voice skill usage records and the first application scenario, determine the commonly used voice skills in the first application scenario, and recommend the voice skill instructions corresponding to those commonly used voice skills to the user.
- the voice skill instruction recommendation algorithm may be a machine learning algorithm or the like.
- S303 The voice assistant displays the commonly used voice skill instructions in the first application scenario on the display interface.
- After determining the commonly used voice skill instructions in the first application scenario, the voice assistant displays them on the display interface, so that the user can input correctly formatted voice skill instructions by following the voice skill instructions displayed on the display interface.
- The voice assistant may determine that there are many commonly used voice skill instructions in the first application scenario, so that it cannot display all of them on the display interface. In that case, after step S302, the voice assistant also needs to determine the display position of each commonly used voice skill instruction and whether it is displayed on the display interface at all.
- the voice assistant may determine the frequency of the voice skills invoked by the commonly used voice skill instructions in the first application scenario based on the historical voice skill usage records and the first application scenario. Subsequently, the voice assistant determines the priority of the commonly used voice skills according to the frequency of appearance of the commonly used voice skills in the first application scenario. The higher the frequency of appearance, the higher the priority. Finally, the voice assistant determines the display position of the commonly used voice skill instruction on the display interface according to the priority of the commonly used voice skill, and whether the commonly used voice skill instruction is displayed on the display interface. The frequently used voice skills with higher priority are displayed on the display interface first, and the voice skill commands corresponding to the frequently used voice skills with higher priority are displayed on the upper/left side of the display interface.
- For example, the commonly used voice skills are voice skill A, voice skill B, and voice skill C.
- The number of appearances of voice skill A in the first application scenario is 5, the number of appearances of voice skill B in the first application scenario is 3, and the number of appearances of voice skill C in the first application scenario is 4.
- Sorted from high to low priority, the commonly used voice skills are therefore voice skill A, voice skill C, and voice skill B. For example, the voice skill commands corresponding to voice skills A, B, and C are "Turn on voice translation", "What can you do", and "Mobile phone features", respectively. If the number of voice skill instructions that can be displayed on the display interface is 2, the voice skill instruction corresponding to voice skill B, which has the lowest priority, is not displayed.
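The frequency-based ordering and truncation in this example can be sketched in a few lines. The function and variable names are hypothetical; the counts and commands are the ones given in the example (A appears 5 times, C 4 times, B 3 times, and only 2 instructions fit on the display interface).

```python
def instructions_to_display(appearances, instruction_for, max_shown):
    """Order skills by appearance count (highest first) and keep the top max_shown."""
    ranked = sorted(appearances, key=lambda skill: appearances[skill], reverse=True)
    return [instruction_for[skill] for skill in ranked[:max_shown]]


appearances = {"A": 5, "B": 3, "C": 4}
instruction_for = {
    "A": "Turn on voice translation",
    "B": "What can you do",
    "C": "Mobile phone features",
}

print(instructions_to_display(appearances, instruction_for, 2))
# ['Turn on voice translation', 'Mobile phone features']
```

The first element of the returned list would be placed at the top (or left) of the display interface, and later elements below (or to the right of) it.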
- the display interface of the electronic device may be as shown in FIG. 4, and in any interface of the electronic device, the voice assistant is displayed in a floating state.
- the contents shown in 401, 402, 403, and 404 are all displayed in suspension.
- the content shown in 402 and 403 can be displayed in the same card.
- The prompt graphic shown in 401 is a sound wave, indicating that the voice assistant is in the receiving state.
- the voice skill instructions are displayed in the priority order (from high priority to low priority) of their corresponding commonly used voice skills A, B, and C, for example, “open” is displayed in sequence from top to bottom.
- the prompt text shown in 403 is used to prompt the user to enter a voice skill instruction, for example, "You can try to say to me”.
- the prompt graphic shown in 404 is used to switch the voice assistant from the floating state to the full-screen state.
- When the voice assistant is displayed in the full-screen state, the ratio of its application interface to the overall display interface of the electronic device is 1. If the user clicks the prompt graphic shown in 404, the voice assistant is displayed in full screen, as shown in Figure 5.
- In the full-screen state, the commonly used voice skill instructions and their priorities may have changed, or may be the same as those shown in 402 in FIG. 4.
- Taking the case in which the commonly used voice skill commands and their priorities have not changed as an example, the voice skill commands are displayed in the priority order (from high priority to low priority) of their corresponding commonly used voice skills A, B, and C. For example, "Turn on voice translation", "Mobile phone features", and "What can you do" are displayed from top to bottom (of course, the priority of the voice skill commands can also decrease gradually from bottom to top, which is not shown in the figure).
- the user can input voice skill instructions by clicking on each item shown in 501.
- the "Help" button of the voice assistant is shown in 502, and the user can click the “Help” button to get familiar with the use of the voice assistant.
- the voice assistant "Settings” button is shown in 503, and the user can open the setting item interface of the voice assistant by clicking the "Settings” button, so as to modify the setting items of the voice assistant.
- the setting item interface of the voice assistant includes the opening mode of the voice assistant, and common voice skill commands set by the user.
- the prompt graphic shown in 504 is a sound wave, which is used to indicate that the voice assistant is in a receiving state.
- The voice assistant can also be displayed in a half-screen state.
- In this state, the ratio of the voice assistant's application interface to the overall display interface of the electronic device is greater than 0 and less than 1. The voice assistant can be displayed in split screen with another application, or together with part of the home screen interface, such as some of the application icons on the home screen interface.
- the split-screen display of the application interface of the voice assistant and the "mail" application is taken as an example for description, as shown in Figure 6.
- the prompt graphic and prompt text shown in 601 are used to indicate that the voice assistant is in a receiving state, where the prompt graphic can be a sound wave, and the prompt text can be "Hi, I'm listening.".
- The voice skill instructions are displayed in the priority order, from high to low, of their corresponding commonly used voice skills A, C, and B. For example, "Turn on voice translation", "Mobile phone features", and "What can you do" are displayed in turn from left to right (of course, the priority of the voice skill commands can also decrease gradually from right to left, which is not shown in the figure).
- As described for step S302, there are two implementations, implementation 1 and implementation 2, by which the voice assistant determines the commonly used voice skill instructions in the first application scenario. For implementation 2 of step S302, this application also provides another specific implementation of step S303, which is introduced below through a specific example:
- the first application scenario includes two information items.
- the two information items are information item 1 and information item 2.
- Information item 1 is that the current time is 10 am.
- Information item 2 is that the current location of the electronic device is the company.
- The historical voice skill usage record contains 6 voice skill usage records.
- These are 2 usage records of voice skill A, 2 usage records of voice skill B, 1 usage record of voice skill C, and 1 usage record of voice skill D.
- These 6 usage records are denoted a1, a2, b1, b2, c, and d, respectively.
- In one usage record of voice skill A, the application scenario for invoking voice skill A includes information item 1.
- In the other usage record of voice skill A, the application scenario for invoking voice skill A includes information item 2.
- the application scenarios for invoking voice skill B both include information item 1 and information item 2.
- the application scenario for invoking voice skill C includes information item 1 and information item 2.
- the application scenario for invoking voice skill D includes information item 1.
- the application scenario for invoking voice skill B includes both information item 1 and information item 2 on 2 occasions, the application scenario for invoking voice skill C includes both information item 1 and information item 2 on 1 occasion, and the application scenarios for invoking voice skills A and D each include only one of information item 1 and information item 2.
- the priority of voice skills B and C is higher than the priority of voice skills A and D, and the priority of voice skill B is higher than the priority of voice skill C.
- the application scenarios for invoking voice skills A and D each include only one of information item 1 and information item 2; voice skill A is invoked twice and voice skill D is invoked once, so the priority of voice skill A is higher than the priority of voice skill D.
- According to the priority order of voice skills A, B, C, and D, the voice assistant displays the voice skill instructions corresponding to voice skills B, C, A, and D on the display interface in that order.
- the manner in which the voice assistant determines the display position of the voice skill instruction on the display interface according to the priority order of commonly used voice skills can refer to the above example, and will not be repeated here.
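
The ranking in the example above can be illustrated with a minimal Python sketch. It is not the application's implementation: the function name `rank_skills`, the record layout, and the item labels are hypothetical, and the sketch assumes that record a1 matched only information item 1 and record a2 only information item 2. Skills are ordered first by how many current information items their recorded scenarios match, then by how many records reach that match level.

```python
from collections import defaultdict

ITEM_1 = "time=10am"          # hypothetical label for information item 1
ITEM_2 = "location=company"   # hypothetical label for information item 2

# (skill, information items present when the skill was invoked)
records = [
    ("A", {ITEM_1}),            # a1 (assumed to match item 1 only)
    ("A", {ITEM_2}),            # a2 (assumed to match item 2 only)
    ("B", {ITEM_1, ITEM_2}),    # b1
    ("B", {ITEM_1, ITEM_2}),    # b2
    ("C", {ITEM_1, ITEM_2}),    # c
    ("D", {ITEM_1}),            # d
]


def rank_skills(records, current_items):
    """Order skills by best match size, then by number of records at that size."""
    best = defaultdict(int)   # largest number of matched items per skill
    hits = defaultdict(int)   # number of records reaching that best match
    for skill, items in records:
        matched = len(set(items) & set(current_items))
        if matched > best[skill]:
            best[skill], hits[skill] = matched, 1
        elif matched == best[skill]:
            hits[skill] += 1
    # Higher match size first, then more records; skill name breaks ties.
    return sorted(hits, key=lambda s: (-best[s], -hits[s], s))


print(rank_skills(records, {ITEM_1, ITEM_2}))
```

With the six records above, the sketch yields the order B, C, A, D, matching the display order described in the example.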
- if the voice assistant is awakened for the first time on the current electronic device, the voice assistant does not need to determine the first application scenario. Instead, it evaluates the current user's user type, the network connection status, and/or whether the high-frequency voice skill instructions in the current network can be obtained normally, and obtains a judgment result.
- the voice assistant displays the corresponding voice skill instruction on the display interface according to the judgment result.
- the user types of current users include new users and old users
- the network connection status includes normal network connection and abnormal network connection.
- if the account registration duration of the current user exceeds the preset registration duration, or the account registration duration does not exceed the preset registration duration but the current user has performed a cloud backup and restore operation on the electronic device, the user type of the current user is an old user. Otherwise, the user type of the current user is a new user.
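
The user-type rule can be written as a short Python sketch. The function and parameter names are hypothetical, and the sketch assumes that a user who meets neither old-user condition is treated as a new user.

```python
def classify_user(registration_days, preset_days, did_cloud_restore):
    """Return "old" or "new" following the rule described above."""
    if registration_days > preset_days:
        return "old"   # account registered longer than the preset duration
    if did_cloud_restore:
        return "old"   # recent account, but restored from a cloud backup
    return "new"       # assumption: everyone else is a new user
```

For instance, with a preset duration of 30 days, a 10-day-old account that performed a cloud restore is classified as an old user, while a 10-day-old account without a restore is classified as a new user.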
- the abnormal network connection may include slow network connection, poor network signal, network disconnection, or network failure.
- the voice assistant displays the high-frequency voice skill commands in the current network on the display interface.
- the high-frequency voice skill instruction may be a voice skill instruction whose number of occurrences exceeds a preset threshold value in the current network.
- the voice assistant informs the user that the network is abnormal through text or voice, and displays the voice skill command for opening the network system settings on the display interface.
- the network system settings include setting items such as turning on the data connection and turning on the Wi-Fi connection. If the voice assistant cannot normally obtain the high-frequency voice skill instructions in the current network, the voice assistant displays the preset voice skill instructions on the display interface. The preset voice skill instructions are voice skill instructions manually set by the developer on the dialogue development platform.
- the voice assistant displays the voice skill instructions corresponding to the high-frequency voice skills in the current network on the display interface.
- the voice skill of the) position is the common voice skill of the current user, and then the voice assistant displays the voice skill instruction corresponding to the common voice skill of the current user on the display interface.
- the voice assistant informs the user that the network is abnormal through text or voice, and displays the voice skill instruction corresponding to the voice skill set for opening the network system on the display interface.
- the voice assistant displays the voice skill instructions corresponding to the preset voice skills on the display interface.
- the preset voice skills are the voice skills manually set by the developer on the dialogue development platform.
- the voice assistant displays text information on the display interface, such as "current network abnormal", to inform the user that the network cannot be connected normally.
- while the voice assistant displays the text information "current network abnormal" on the display interface, the text information can also be played by voice.
- the voice assistant may also display a voice skill instruction for opening network system settings on the display interface.
- the network system settings include setting items such as opening data connection and opening wifi connection.
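
The first-wake-up fallback described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names, the tuple return value, and the literal instruction strings are hypothetical, and the user-type branch is left out because the passage does not spell out how it changes the choice.

```python
def high_frequency(counts, threshold):
    """Instructions whose occurrence count exceeds the preset threshold."""
    return [instr for instr, n in counts.items() if n > threshold]


def first_wake_display(network_ok, network_counts, threshold, preset):
    """Return (notice, instructions_to_display) for a first wake-up.

    network_counts is None when the high-frequency instructions
    cannot be obtained normally.
    """
    if not network_ok:
        # Tell the user about the problem and offer the settings shortcut.
        return "current network abnormal", ["open network settings"]
    if network_counts is None:
        # Fall back to the instructions preset by the developer.
        return None, list(preset)
    return None, high_frequency(network_counts, threshold)
```

Calling `first_wake_display(True, {"play music": 500, "tell a joke": 20}, 100, [])` keeps only the instruction whose count exceeds the threshold of 100.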
- This application provides a voice interaction method.
- the first application scenario can be determined based on one or more information items, and then the commonly used voice skill instructions in the first application scenario can be determined based on the first application scenario and the historical voice skill usage records. In other words, the voice assistant determines the voice skill instructions that the user may input in the current application scenario according to the user's usage habits and the current application scenario.
- the voice assistant displays the frequently used voice skill instructions in the first application scenario on the display interface. Through this step, the voice assistant can recommend the frequently used voice skill instructions in the current application scenario to the user, so as to realize the scenario-based recommendation of the voice skill instructions.
- the voice assistant can recommend the voice skill instructions commonly used in the first application scenario to the user, thereby realizing scenario-based recommendation of voice skill instructions, so that the user can invoke the voice skills they want to use according to the recommended instructions. This reduces the cases in which the voice assistant cannot recognize the voice skill instruction input by the user, or cannot successfully invoke the voice skill according to that instruction, and improves the interaction experience between the user and the voice assistant.
- this application also provides a voice interaction method. As shown in Figure 3(b), after the above step S303, the method further includes step S304:
- In response to the voice skill instruction input by the user, the voice assistant calls the voice skill corresponding to that instruction.
- the voice assistant receives voice skill instructions input by the user through voice interaction, keyboard interaction, video interaction, and other forms.
- the voice skill instruction may be input by the user according to the voice skill instructions displayed on the display interface (that is, the voice skill instructions recommended to the user by the voice assistant), or it may be input by the user on their own.
- the voice assistant calls the voice skill corresponding to the voice skill instruction in response to the voice skill instruction input by the user. If the voice assistant fails to call the voice skill, multiple rounds of dialogue are conducted between the voice assistant and the user, prompting the user to input other voice skill instructions related to completing the call of the voice skill.
- the voice assistant determines other voice skill instructions input by the user after the voice skill instruction is input according to the time when the voice skill is invoked in the historical voice skill usage record. Subsequently, the voice assistant re-determines that the other voice skill commands are commonly used voice skill commands in the first application scenario, and displays them on the display interface.
- after the voice assistant receives a voice skill instruction input by the user, such as "set the alarm clock", the voice assistant cannot determine the ringing time of the alarm clock. Therefore, the voice assistant cannot successfully call the corresponding voice skill according to the voice skill instruction and realize the service corresponding to the voice skill.
- the voice assistant needs to conduct multiple rounds of voice interaction with the user, that is, multiple rounds of dialogue, to determine the alarm time. As shown in FIG. 7, the voice assistant can prompt the user to input the alarm time through the prompt text shown in 701, for example, "what time do you want to set the alarm clock".
- according to the times at which the voice skill was invoked in the historical voice skill usage records, the voice assistant can determine that the user often inputs one of the two voice skill instructions "8 AM" or "8 PM" after inputting the voice skill instruction "set the alarm clock". Therefore, the other voice skill instructions related to "set the alarm clock" shown in 702 are "8 AM", "8 PM", and so on.
- the sound wave shown in 703 is used to indicate that the voice assistant is listening for input.
- the user can input a new voice skill instruction according to the voice skill instructions shown in 702, for example, "8 AM". Then, the voice assistant can use the two voice skill instructions "set the alarm clock" and "8 AM" to successfully call the voice skill and complete the operation of "setting the alarm clock at 8 AM".
- the voice assistant can then stop listening and enter a sleep state. It should be noted that users can also input other related voice skill instructions according to their needs, such as "7 AM". In that case, the voice assistant can use the two voice skill instructions "set the alarm clock" and "7 AM" to successfully call the voice skill and complete the operation of "setting the alarm clock at 7 AM".
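
One way to derive such follow-up suggestions from timestamps in the historical usage records is sketched below in Python. The function name `follow_ups`, the 60-second window, and the sample history are assumptions made for illustration; the application does not state how close in time two instructions must be to count as related.

```python
from collections import Counter


def follow_ups(history, trigger, window_s=60, top_n=2):
    """Most frequent instructions entered shortly after `trigger`.

    history: list of (timestamp_seconds, instruction), sorted by time.
    """
    counts = Counter()
    for (t0, first), (t1, nxt) in zip(history, history[1:]):
        if first == trigger and 0 <= t1 - t0 <= window_s:
            counts[nxt] += 1
    return [instr for instr, _ in counts.most_common(top_n)]


# Hypothetical history: timestamps are seconds since an arbitrary start.
history = [
    (0, "set the alarm clock"), (5, "8 AM"),
    (1000, "set the alarm clock"), (1004, "8 PM"),
    (2000, "set the alarm clock"), (2006, "8 AM"),
    (3000, "play music"),
]

print(follow_ups(history, "set the alarm clock"))
```

In this sample history, "8 AM" follows the trigger twice and "8 PM" once, so both are suggested, with "8 AM" first.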
- if the voice assistant fails to call the voice skill, multiple rounds of dialogue are conducted between the voice assistant and the user, prompting the user to input other voice skill instructions needed to complete the call of the voice skill.
- the voice assistant re-determines these other voice skill commands as common voice skill commands in the first application scenario according to other voice skill commands manually set by the developer on the corresponding dialogue node, and displays them on the display interface.
- after the voice assistant receives a voice skill instruction input by the user, such as "call", it cannot determine the call partner. Therefore, the voice assistant cannot successfully call the corresponding voice skill according to the voice skill instruction and realize the service corresponding to the voice skill. The voice assistant needs to conduct multiple rounds of voice interaction with the user to determine the call partner, as shown in Figure 8.
- the voice assistant can prompt the user to enter the call object through the prompt text shown in 801, such as "Who do you want to call".
- the voice assistant can determine the corresponding dialogue node according to the voice skill command "call" input by the user, and other voice skill commands related to "call” manually set by the developer on the dialogue node, such as “mom” Or “Wang Xiaohei” and so on. Therefore, the other voice skill commands shown in 802 are “Mom”, “Wang Xiaohei” and so on.
- the sound wave shown in 803 is used to indicate that the voice assistant is listening for input. At this time, the user can input a new voice skill instruction according to the voice skill instructions shown in 802, such as "Wang Xiaohei".
- the voice assistant can successfully call the voice skill according to the two voice skill instructions "call" and "Wang Xiaohei", and complete the operation of "calling Wang Xiaohei". Subsequently, the voice assistant can stop listening and enter a sleep state. It should be noted that users can also input other related voice skill instructions according to their needs, such as "Zhang Xiaobai". In that case, the voice assistant can successfully call the voice skill according to the two voice skill instructions "call" and "Zhang Xiaobai", and complete the operation of "calling Zhang Xiaobai".
- the voice assistant stops listening and enters the dormant state after completing the operation corresponding to the voice skill.
- the voice assistant stops receiving the sound and enters the dormant state.
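
The dialogue-node variant, in which the follow-up suggestions are set manually by the developer, can be sketched in Python as a small slot-filling table. The table layout, the key names, and the `handle` function are hypothetical; only the prompt text and the suggested names come from the example above.

```python
# Hypothetical developer-configured dialogue nodes.
DIALOGUE_NODES = {
    "call": {
        "slot": "contact",
        "prompt": "Who do you want to call",
        "suggestions": ["Mom", "Wang Xiaohei"],
    },
}


def handle(instruction, slots):
    """Invoke the skill if its slot is filled, otherwise ask a follow-up."""
    node = DIALOGUE_NODES.get(instruction)
    if node is None:
        return {"status": "unknown"}
    if node["slot"] not in slots:
        # Skill cannot be called yet: prompt and show the preset suggestions.
        return {
            "status": "need_more",
            "prompt": node["prompt"],
            "suggestions": node["suggestions"],
        }
    return {"status": "done", "action": f"{instruction} {slots[node['slot']]}"}
```

Here `handle("call", {})` returns the prompt together with the suggestions, and `handle("call", {"contact": "Zhang Xiaobai"})` completes the call even though that name was not among the suggestions.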
- the first application scenario may be re-determined, and according to the technical solution of step S302 to step S304, the voice skill instruction recommended to the user is displayed on the display interface.
- the voice assistant may also receive a specific voice skill instruction input by the user.
- the display interface of the electronic device is as shown in Figure 9(a).
- the prompt text shown in 901 is used to indicate that the current scene is a driving scene, for example, "has entered the driving scene". If the user clicks the prompt graphic shown in 901, the driving scene will be exited.
- the prompt text shown in 902 is a voice skill instruction input by the user, such as "navigate to Haidilao”.
- the prompt graphic shown in 903 is a floating ball, which is used to indicate that the voice assistant has stopped receiving audio and is in a dormant state.
- the voice assistant finds multiple destinations according to the voice skill instruction "navigate to Haidilao" input by the user. Therefore, the voice assistant cannot determine the destination, and cannot successfully call the corresponding voice skill according to the voice skill instruction and realize the service corresponding to the voice skill. At this time, the voice assistant needs to conduct multiple rounds of voice interaction with the user to determine the destination and realize the operation of navigating to a certain place.
- the display interface of the electronic device is shown in Figure 9(b). Referring to FIG. 9(b), the content shown in 901 and 902 is the same as the content shown in 901 and 902 in FIG. 9(a), and will not be repeated here.
- the prompt graphic shown in 903 is switched to a sound wave, which is used to indicate that the voice assistant is listening and the user can input voice skill instructions.
- the prompt message shown in 904 such as "Find multiple destinations, which one to navigate to?” can inform the user that there are multiple destinations nearby and prompt the user to make a choice.
- the voice assistant can also display the prompt message shown in 904 on the display interface while playing the prompt message "multiple destinations found, which one should be navigated to?”.
- the content shown in 905 is the nearby destinations searched by the voice assistant according to the voice skill instructions input by the user. In fact, the number of possible destinations searched by the voice assistant may not be the five shown in the figure.
- the user can click any row in the destination search results shown in 905, for example, the row where "Haidilao Hot Pot (Zhongshan South Road)" is located, and the destination is then determined as "Haidilao Hot Pot (Zhongshan South Road)".
- Shown in 906 are the voice skill instructions recommended to the user by the voice assistant, such as "first”, “fifth”, “next page", and "cancel".
- the user can also directly click or input "first” shown in 906 into the voice assistant to determine the destination as "Haidilao Hotpot (Zhongshan South Road)”. If “Haidilao Hot Pot (Shangyuan Avenue Store)" is the user's destination, the user can also directly input the voice skill command "No.
- the voice assistant enters the dormant state, and the display interface of the electronic device is shown in Figure 9(c).
- the contents shown in 901, 902, and 904 are the same as the contents shown in 901, 902, and 904 in FIG. 9(b).
- the prompt graphic shown in 903 switches to a floating ball, indicating that the voice assistant enters the dormant state.
- 907 shows prompt information, such as "You can directly tell me to quit the navigation, play music, call..., or wake me up through Xiaoyi Xiaoyi", together with the voice skill instructions that the voice assistant recommends to the user according to the re-determined first application scenario and the historical voice skill usage records, such as "Exit Navigation", "Play Music", or "Call", to remind the user of the voice skill instructions that can be input in the current scene.
- according to the prompt information and voice skill instructions shown in 907, the electronic device can exit the driving scene or perform other operations through voice interaction between the user and the voice assistant.
- the text information shown in 908 is the voice skill instruction "first" input by the user according to the voice skill shown in 906.
- the voice assistant determines the second application scenario. Subsequently, the voice assistant determines the frequently used voice skill instructions in the second application scenario according to the second application scenario and the historical voice skill usage records, and displays them on the display interface.
- the first preset time period may be determined by the user according to actual application requirements, or may be preset by the voice assistant.
- if the voice assistant does not receive a voice skill instruction input by the user within the second preset time period after being awakened, the voice assistant is turned off.
- the second preset time period is longer than the first preset time period, and the second preset time period and the first preset time period start counting from the same time point.
- the second preset time period may be determined by the user according to actual application requirements, or may be preset by the voice assistant. Through this process, the waste of resources caused by the user's accidental touch to wake up the voice assistant can be reduced.
- the voice assistant determines the second application scenario after the first preset time period, and displays the common voice skill instructions in the second application scenario on the display interface. If the voice assistant does not receive the voice skill instruction input by the user within the second preset time period, the voice assistant is turned off. At this time, the second preset time period is after the first preset time period, and the second preset time period may be longer than the first preset time period, or may be shorter than the first preset time period. Both the first preset time period and the second preset time period can be determined by the user according to actual application requirements, or can be preset by the voice assistant.
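
The two timeouts can be illustrated with a small Python sketch. It models the variant in which both preset time periods start counting from the same point and the second is longer than the first; the function name and the returned action labels are hypothetical.

```python
def idle_action(elapsed_s, first_period_s, second_period_s):
    """Decide what to do when no instruction has arrived for elapsed_s seconds."""
    if second_period_s <= first_period_s:
        raise ValueError("second period must be longer than the first")
    if elapsed_s >= second_period_s:
        return "close"                       # turn the voice assistant off
    if elapsed_s >= first_period_s:
        return "recommend_second_scenario"   # re-determine scenario and re-recommend
    return "keep_first_scenario"             # keep the current recommendations
```

For the variant in which the second period starts only after the first one ends, the same function can be reused by passing the sum of the two periods as `second_period_s`.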
- the voice assistant determines the second application scenario, then determines the commonly used voice skill instructions in the second application scenario according to the second application scenario and the historical voice skill usage records, and displays those instructions on the display interface, so as to re-recommend voice skill instructions that the user may use.
- the first preset time period may be determined by the user according to actual application requirements, or may be preset by the voice assistant.
- for the second application scenario and the specific implementation of determining and displaying the commonly used voice skill instructions in the second application scenario, refer to the above description of the first application scenario and of determining and displaying the commonly used voice skill instructions in the first application scenario; details are not repeated here.
- the voice assistant automatically records the voice skill instructions input by the user, the voice skills invoked by those instructions, the time at which each instruction invoked its voice skill, and the current application environment, and stores this information in the historical voice skill usage records, to further improve the practicability of the voice skill instructions that the voice assistant recommends to the user.
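
A minimal Python sketch of such a usage record and its store is given below. The class names, field names, and sample entries are hypothetical; the four fields mirror the four pieces of information listed above.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class SkillUsageRecord:
    instruction: str          # what the user entered
    skill: str                # skill that the instruction invoked
    invoked_at: datetime      # when the skill was invoked
    environment: frozenset    # information items describing the environment


class UsageHistory:
    def __init__(self):
        self._records = []

    def log(self, instruction, skill, invoked_at, environment):
        """Append one record and return it."""
        record = SkillUsageRecord(instruction, skill, invoked_at,
                                  frozenset(environment))
        self._records.append(record)
        return record

    def in_environment(self, items):
        """Records whose environment contains all the given items."""
        wanted = frozenset(items)
        return [r for r in self._records if wanted <= r.environment]


# Illustrative entries only; dates and labels are made up.
history = UsageHistory()
history.log("navigate home", "navigation",
            datetime(2020, 9, 29, 18, 0), {"driving", "evening"})
history.log("play music", "music",
            datetime(2020, 9, 29, 18, 5), {"driving", "evening"})
history.log("set the alarm clock", "alarm",
            datetime(2020, 9, 29, 22, 0), {"home"})
```

Querying `history.in_environment({"driving"})` returns the navigation and music records, which a recommender could then rank for a driving scenario.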
- the voice assistant may call the voice skill corresponding to the voice skill instruction in response to the voice skill instruction input by the user.
- if the voice assistant fails to call the voice skill according to the voice skill instruction input by the user, it re-recommends voice skill instructions to the user and displays them on the display interface, so that the user can input a recommended voice skill instruction and invoke the corresponding voice skill. This improves the interaction experience between the user and the voice assistant, and reduces the cases in which the voice assistant fails to recognize the voice skill instruction input by the user, or fails to invoke the voice skill according to that instruction.
- this application also provides a voice interaction method.
- after the voice assistant completes the corresponding operation according to the received voice skill instruction and obtains a feedback result, the user can share the feedback result to other applications.
- the following describes the voice interaction method provided by the embodiment, taking as an example the voice assistant displayed in half-screen mode and split-screen with another application.
- in the half-screen mode, the ratio of the voice assistant's display interface to the overall display interface of the electronic device is greater than 0 and less than 1.
- As shown in (c) of FIG. 3, the method includes steps S305-S306:
- the voice assistant displays the feedback result of the voice skill instruction on the display interface of the voice assistant.
- the voice assistant calls the voice skill corresponding to the voice skill instruction according to the voice skill instruction input by the user, and obtains the feedback result of the voice skill instruction after completing the corresponding operation.
- the voice assistant displays the feedback result obtained in the form of a card.
- the voice assistant shares the feedback result of the voice skill instruction to other applications.
- the user's operation instructions include pressing operations and dragging operations.
- the voice assistant selects the card in response to the user's pressing operation on the card bearing the feedback result. Subsequently, the voice assistant drags the card to the user interface of another application in response to the user's drag operation on the card. After the user selects the card, the selected card is displayed floating, and at the same time the card is displayed at its original position in a lighter color. The user needs to keep pressing until the card has been dragged to the interface of the other application to complete the sharing. If the user releases the card before it has been dragged to the user interface of the other application, the card bounces back: the card is deselected, the display interface of the electronic device reverts to its state before the card was selected, and the sharing fails. If the user releases the card after it has been dragged onto the user interface of the other application, the sharing succeeds.
- the data format when the card is shared to other applications is determined by the type of the card content, such as pictures, text, or links.
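
The release rule and the choice of data format can be sketched in Python as follows. The function names, the dictionary layout of a card, and the state labels are hypothetical; only the three content types and the succeed-or-bounce-back behaviour come from the description above.

```python
SUPPORTED_TYPES = {"picture", "text", "link"}


def share_payload(card):
    """The shared data format follows the type of the card content."""
    kind = card["type"]
    if kind not in SUPPORTED_TYPES:
        raise ValueError(f"unsupported card type: {kind}")
    return {"format": kind, "data": card["content"]}


def on_release(card, over_target_app):
    """Outcome when the user lets go of a dragged card."""
    if not over_target_app:
        # Released too early: the card bounces back and sharing fails.
        return {"shared": False, "card_state": "restored"}
    return {"shared": True, "payload": share_payload(card)}
```

For example, releasing a weather card of type "picture" over the target application yields a payload whose format is "picture", while releasing it elsewhere restores the card.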
- the voice assistant is displayed in a half-screen mode, and the voice assistant and "Mail" are displayed on the electronic device in separate screens.
- the ratio of the display interface of the voice assistant to the display interface of the "Mail” application is 5:3.
- the display interface of the voice assistant is on the top, and the display interface of the "Mail" application is on the bottom.
- the prompt graphic shown in 1001 is a floating ball, which is used to indicate that the voice assistant is in a dormant state and stops receiving audio.
- the prompt text shown in 1001 is the feedback text "Sunny weather in Shanghai on weekends" obtained after the voice assistant completes the operation of querying the weather.
- the card shown in 1002 is a feedback card obtained after the voice assistant completes the operation of querying the weather, and the card contains more detailed weather information on weekends in Shanghai.
- the keywords shown in 1003 are common voice skills instructions recommended to the user by the voice assistant, such as "will it rain today", “the weather tomorrow", and so on.
- the display interface of the electronic device is as shown in Figure 10(b).
- the content shown in 1001, 1002, and 1003 is the same as the content shown in 1001, 1002, and 1003 in FIG. 10(a).
- the original feedback card shown in 1002 is scaled down by a certain ratio and displayed floating, and the interface of the "Mail" application is highlighted to prompt the user to drag the floating feedback card to the display interface of the "Mail" application.
- the floating card shown in 1004 has not been dragged onto the display interface of the "Mail” application.
- the display interface of the electronic device is as shown in Figure 10(a). If the user drags the floating card to the display interface of the "Mail" application, as shown in Figure 10(d), then when the user lets go, the Shanghai weather information in the card can be shared to the "Mail" application as a picture. After the card content is successfully shared, the display interface of the electronic device is shown in Figure 10(a). At this time, the display interface of the "Mail" application differs from its display interface before sharing.
- the voice assistant is displayed in full-screen mode, that is, the ratio of the voice assistant's display interface to the overall display interface of the electronic device is 1, or the voice assistant is displayed on the display interface of the electronic device in a floating state, that is, the voice assistant is displayed on the display interface in the form of a floating ball (not listening) or a sound wave (listening).
- the user can complete the sharing of the card content by long-pressing the card and clicking one of the selection items that appear after the long press.
- the data form of the card content sharing depends on the type of the card content, such as pictures, text, links, and so on.
- the voice assistant is displayed in a floating state.
- after receiving the voice skill instruction input by the user, such as "search for intellectual property rights", the voice assistant calls the corresponding voice skill according to that instruction, completes the operation of searching for an introduction to intellectual property rights, and obtains a feedback result.
- the display interface of the electronic device is as shown in (a) of FIG. 11. Referring to Figure 11(a), the voice assistant is displayed in a floating state, and the prompt graphic shown in 1101 is a floating ball, which is used to indicate that the voice assistant is in a dormant state and stops receiving audio.
- the prompt text shown in 1102 is used to indicate that the voice skill instruction input by the user received by the voice assistant is “search for intellectual property rights”, and then the voice assistant completes the corresponding response according to the voice skill instruction "search for intellectual property rights"
- the display interface of the electronic device is shown in Figure 11(b).
- the prompt graphic shown in 1101 is still a floating ball, and the card including the feedback text obtained after the voice assistant completes the search operation is shown in 1103.
- the voice assistant plays by voice the content of the card shown in 1103, namely "intellectual property rights, also known as intellectual property rights".
- the option card shown in 1104 includes operations that the user may want to perform on the content in the card shown in 1103, such as "copy", "select", "share", and so on. If the user clicks the "Share" item in the option card shown in 1104, the voice assistant recommends to the user applications that the content can be shared to, such as "WeChat", "QQ", "Email", etc. Then, referring to the prior art, the user can choose to share the content of the card to other applications in the form of a link by touching and clicking.
- the user can also directly issue the voice skill instruction of "Share search results through WeChat" to the voice assistant, and share content through the voice interaction between the voice assistant and the user.
- after the voice assistant calls the voice skill and completes the corresponding operation, the user can operate on the feedback content obtained by the voice assistant and share the feedback content to other applications, thereby realizing collaboration between voice interaction and touch interaction and improving user experience.
- the embodiment of the present application may divide the above-mentioned terminal and the like into functional modules according to the above-mentioned method examples.
- each functional module may be divided corresponding to each function, or two or more functions may be integrated into one processing module.
- the above-mentioned integrated modules can be implemented in the form of hardware or software function modules. It should be noted that the division of modules in the embodiments of the present application is illustrative, and is only a logical function division, and there may be other division methods in actual implementation.
- FIG. 12 shows a schematic diagram of a possible structure of the electronic device involved in the foregoing embodiment.
- the electronic device 1200 includes: a processing module 1201, a storage module 1202, and a display module 1203.
- the processing module 1201 is used to control and manage the actions of the electronic device 1200.
- the display module 1203 is used for displaying the image generated by the processing module 1201.
- the storage module 1202 is used to store the program code and data of the terminal.
- the storage module 1202 stores a preset wake-up word registered in the terminal and a first voiceprint model.
- the first voiceprint model is used to perform voiceprint verification when the voice assistant is awakened.
- the voiceprint model represents the voiceprint characteristics of the preset wake-up words.
- the electronic device 1200 may further include a communication module for supporting communication between the terminal and other network entities.
- the processing module 1201 may be a processor or a controller, for example, a central processing unit (CPU), a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It can implement or execute the various exemplary logical blocks, modules, and circuits described in conjunction with the disclosure of this application.
- the processor may also be a combination for realizing computing functions, for example, including a combination of one or more microprocessors, a combination of a DSP and a microprocessor, and so on.
- the communication module can be a transceiver, a transceiver circuit, or a communication interface.
- the storage module 1202 may be a memory.
- the processing module 1201 is a processor (the processor 110 shown in FIG. 1).
- the communication module includes a Wi-Fi module and a Bluetooth module (the mobile communication module 150 and the wireless communication module 160 shown in FIG. 1).
- Communication modules such as Wi-Fi modules and Bluetooth modules can be collectively referred to as communication interfaces.
- the storage module 1202 is a memory (the internal memory 121 shown in FIG. 1 and an external SD card connected to the electronic device 1200 through the external memory interface 120).
- the display module 1203 is a touch screen (including the display screen 194 shown in FIG. 1).
- the terminal provided in the embodiment of the present application may be the electronic device 100 shown in FIG. 1.
- the above-mentioned processor, communication interface, touch screen and memory can be coupled together through a bus.
- the embodiment of the present application also provides a chip system, which includes at least one processor 1301 and at least one interface circuit 1302.
- the processor 1301 and the interface circuit 1302 may be interconnected by wires.
- the interface circuit 1302 may be used to receive signals from other devices (such as the memory of the electronic device 100).
- the interface circuit 1302 may be used to send signals to other devices (such as the processor 1301).
- the interface circuit 1302 can read an instruction stored in the memory, and send the instruction to the processor 1301.
- when the instruction is executed by the processor 1301, the electronic device can be made to execute the steps executed by the electronic device 100 (for example, a mobile phone) in the above-mentioned embodiment.
- the chip system may also include other discrete devices, which are not specifically limited in the embodiment of the present application.
- the embodiment of the present application also provides a computer storage medium that includes computer instructions. When the computer instructions are executed on an electronic device, the electronic device is caused to execute the relevant method steps shown in any one of the drawings in FIG. 3 or FIG. 5, such as S301, S302, S303, S304, S305, and S306, to implement the voice interaction method in the foregoing embodiment.
- the embodiments of the present application also provide a computer program product containing instructions.
- when the computer program product runs on a computer, the computer executes the relevant method steps in (a), (b), and (c) of FIG. 3, such as S301, S302, S303, S304, S305, and S306, to implement the voice interaction method in the foregoing embodiment.
- the embodiment of the present application also provides a voice interaction device, which has the function of realizing the behavior of the voice assistant in the electronic device in the above method.
- the function can be realized by hardware, or by hardware executing corresponding software.
- the hardware or software includes one or more modules corresponding to the above-mentioned functions.
- the electronic devices, computer storage media, and computer program products provided in the embodiments of the present application are all used to execute the corresponding methods provided above. Therefore, for the beneficial effects that they can achieve, refer to the beneficial effects of the corresponding methods provided above; details are not repeated here.
- the foregoing embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof.
- the above-mentioned embodiments may appear in the form of a computer program product in whole or in part, and the computer program product includes one or more computer instructions.
- when the computer program instructions are loaded and executed on the computer, the processes or functions according to the embodiments of the present application are generated in whole or in part.
- the computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable devices.
- Computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium.
- Computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or a wireless manner (such as infrared, radio, or microwave).
- the computer-readable storage medium may be any available medium that can be accessed by a computer or a data storage device such as a server or data center integrated with one or more available media.
- the usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, and a magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid state disk (SSD)).
- the disclosed device and method can be implemented in other ways.
- the device embodiments described above are merely illustrative.
- the division of the modules or units is only a logical function division. In actual implementation, there may be other division methods; for example, multiple units or components may be combined or integrated into another device, or some features may be omitted or not implemented.
- the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.
- the units described as separate parts may be physically separated or not physically separated.
- the parts displayed as a unit may be one physical unit or multiple physical units, that is, they may be located in one place or distributed across many different places. In practice, some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
- the functional units in the various embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
- the above-mentioned integrated unit can be implemented in the form of hardware or software functional unit.
- if the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium.
- the technical solutions of the embodiments of the present application, in essence, or the part that contributes to the prior art, or all or part of the technical solutions, can be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions to make a device (which may be a personal computer, a server, a network device, a single-chip microcomputer, or a chip, etc.) or a processor execute all or part of the steps of the methods described in the various embodiments of the present application.
- the aforementioned storage media include media that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disc.
Landscapes
- Engineering & Computer Science (AREA)
- Human Computer Interaction (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Physics & Mathematics (AREA)
- Multimedia (AREA)
- Theoretical Computer Science (AREA)
- Computational Linguistics (AREA)
- Acoustics & Sound (AREA)
- General Health & Medical Sciences (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Environmental & Geological Engineering (AREA)
- User Interface Of Digital Computer (AREA)
- Telephone Function (AREA)
Abstract
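This application provides a voice interaction method and apparatus, relating to the field of terminal technologies. The commonly used voice skill instructions in a first application scenario can be determined according to the first application scenario and historical voice skill usage records, and displayed on the display interface. In this way, scenario-based recommendation of voice skill instructions can be achieved, covering as many usage scenarios as possible. The method includes: after being woken up, the voice assistant determines the first application scenario according to one or more information items. The voice assistant determines the commonly used voice skill instructions in the first application scenario according to the first application scenario and the historical voice skill usage records. The voice assistant displays the commonly used voice skill instructions in the first application scenario on the display interface.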
Description
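This application claims priority to Chinese patent application No. 201910941167.0, entitled "Voice Interaction Method and Apparatus" and filed with the China National Intellectual Property Administration on September 30, 2019, the entire contents of which are incorporated herein by reference.
This application relates to the field of terminal technologies, and in particular to a voice interaction method and apparatus.
As voice interaction technology matures, more and more electronic devices have a voice assistant application (APP) installed to enable voice interaction between the electronic device and the user. Typically, developers connect the voice assistant to the services in the electronic device through a dialogue development platform, so that the voice assistant can invoke those services through the dialogue development platform and voice skill instructions. The dialogue development platform is a voice skill development tool provided to developers, such as Google's dialogue development platform (Dialogflow). A voice skill is a service that the voice assistant implements by invoking the services in the electronic device, and each voice skill requires a specific voice skill instruction to invoke it. When developing a voice skill through the dialogue development platform, the developer manually configures voice skill instructions on each dialogue node; when the user reaches a given dialogue node during voice interaction, the corresponding voice skill instruction appears in the interface. Therefore, in the prior art, the voice skill instructions that can be invoked at each dialogue node are relatively fixed, and the scenarios covered by voice skill instructions are limited.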
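Summary of the Invention
This application provides a voice interaction method and apparatus that display, on the display interface, the voice skill instructions commonly used in a first application scenario according to the user's historical voice skill usage records and the first application scenario. In other words, the voice skill instructions that the user is likely to input in the first application scenario are recommended to the user, achieving scenario-based recommendation of voice skill instructions and covering as many usage scenarios as possible.
To achieve the above objective, this application adopts the following technical solutions:
In a first aspect, this application provides a voice interaction method applied to an electronic device on which a voice assistant is installed. The method includes: after being woken up, the voice assistant determines a first application scenario according to one or more information items. The information items include the current display interface of the electronic device, the current time, the current location of the electronic device, the current motion state of the electronic device, the current event, or the application currently running on the electronic device. The voice assistant then determines the commonly used voice skill instructions in the first application scenario according to the first application scenario and historical voice skill usage records. A voice skill instruction is used to invoke a voice skill, and a voice skill is a service provided by the voice assistant. The historical voice skill usage records include one or more records, each indicating the time at which a voice skill was invoked, the voice skill instruction used to invoke it, and the application scenario in which it was invoked. Finally, the voice assistant displays the commonly used voice skill instructions in the first application scenario on the display interface.
In the embodiments of this application, the voice assistant determines the one or more information items of the first application scenario by using the application recognition mechanism of the electronic device. The voice assistant then determines the commonly used voice skill instructions in the first application scenario according to the information items of the first application scenario and the invocation scenarios recorded in the historical voice skill usage records. The more information items the first application scenario contains, the more factors the voice assistant takes into account, and the more practical and accurate the determined instructions are. In addition, determining the displayed instructions according to the first application scenario and the usage records allows the recommended instructions to be adjusted dynamically and covers as many usage scenarios as possible. After determining the commonly used voice skill instructions, the voice assistant also displays them on the display interface. The voice assistant can thus provide scenario-based recommendation of voice skill instructions, so that the user can invoke voice skills according to the instructions shown on the display interface. This reduces cases in which incorrect user input prevents the voice assistant from recognizing the voice skill instruction or from successfully invoking the voice skill, and improves the interaction experience between the user and the voice assistant.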
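In a possible implementation, when the voice assistant is woken up for the first time, the method further includes the following. If the current user is a new user, the network is connected normally, and the voice assistant can obtain the high-frequency voice skill instructions in the current network, the voice assistant displays the high-frequency voice skill instructions on the display interface. If the current user is an existing user, the voice assistant determines the current user's commonly used voice skill instructions according to the historical voice skill usage records and displays them on the display interface. If the network is not connected normally, the voice assistant informs the user of the network anomaly and displays, on the display interface, a voice skill instruction for opening the network system settings. If the voice assistant cannot obtain the high-frequency voice skill instructions in the current network, the voice assistant displays preset voice skill instructions on the display interface.
Through the above process, after being woken up for the first time, the voice assistant can recommend voice skill instructions that the user is likely to use according to the current user type, the current network connection status, and whether the voice assistant can obtain the high-frequency voice skill instructions in the current network. This achieves scenario-based recommendation of voice skill instructions, reduces cases in which the user inputs a wrong voice skill instruction or the voice assistant cannot invoke a voice skill from the user's input, and improves the interaction experience between the user and the voice assistant.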
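The first-wake branches described above can be sketched as a small decision function. This is only an illustration: the function name, its parameters, the notice text, and the instruction strings are hypothetical, and the order in which the conditions are checked (network first, then user type, then availability of high-frequency instructions) is an assumption, since the text does not state a precedence.

```python
from typing import List, Optional, Tuple


def first_wake_recommendation(
    is_new_user: bool,
    network_ok: bool,
    high_frequency: Optional[List[str]],  # None models "could not be obtained"
    user_common: List[str],
    preset: List[str],
) -> Tuple[Optional[str], List[str]]:
    """Return (notice shown to the user, instructions to display)."""
    # Assumed precedence: a missing network connection is handled first.
    if not network_ok:
        return "Network error", ["Open network settings"]
    # Existing users get instructions derived from their own usage records.
    if not is_new_user:
        return None, user_common
    # New user, network available, but high-frequency instructions unavailable.
    if high_frequency is None:
        return None, preset
    # New user, network available, high-frequency instructions obtained.
    return None, high_frequency
```

Modelling "cannot obtain" as `None` rather than an empty list keeps a successful but empty fetch distinct from a failed one.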
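In a possible implementation, after the voice assistant determines the commonly used voice skill instructions in the first application scenario according to the first application scenario and the historical voice skill usage records, the method further includes: the voice assistant determines, according to the historical voice skill usage records and the first application scenario, how frequently each commonly used voice skill occurs in the first application scenario, where the commonly used voice skills correspond to the commonly used voice skill instructions. The voice assistant determines the priority of the commonly used voice skills according to how frequently the commonly used voice skill instructions occur in the first application scenario. The voice assistant determines the position of the commonly used voice skill instructions on the display interface according to the priority of the commonly used voice skills.
Through the above process, the voice assistant can determine the priority of the commonly used voice skill instructions in the first application scenario according to the invocation scenarios and invocation counts in the historical voice skill usage records, together with the first application scenario. The voice assistant can then determine the display position and display order of the commonly used voice skills according to their priority, with higher-priority voice skill instructions displayed first.
In a possible implementation, the voice interaction method further includes: in response to a voice skill instruction input by the user, the voice assistant invokes the voice skill corresponding to that instruction.
In a possible implementation, the voice interaction method further includes: if invoking the voice skill fails, the voice assistant re-determines the commonly used voice skill instructions in the first application scenario according to the invocation times in the historical voice skill usage records, and displays them on the display interface.
Through the above process, scenario-based recommendation and dynamic adjustment of voice skill instructions can be achieved, covering as many application scenarios as possible.
In a possible implementation, the voice interaction method further includes: if the voice assistant does not receive a voice skill instruction input by the user within a first preset time period, the voice assistant determines a second application scenario. The voice assistant determines the commonly used voice skill instructions in the second application scenario according to the second application scenario and the historical voice skill usage records. The voice assistant then displays the commonly used voice skill instructions in the second application scenario on the display interface.
In a possible implementation, the voice interaction method further includes: if the voice assistant does not receive a voice skill instruction input by the user within a second preset time period, the voice assistant closes.
Through the above process, the waste of resources caused by the user accidentally opening the voice assistant can be reduced.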
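The two idle timeouts above can be illustrated with a small helper. This is a sketch under stated assumptions: the text does not say how the first and second preset time periods relate, so the code assumes both are measured from the moment the assistant starts listening and that the second is longer than the first. The function name and the returned action labels are hypothetical.

```python
def idle_action(idle_seconds: float, first_timeout: float, second_timeout: float) -> str:
    """Decide what the assistant does after `idle_seconds` without user input."""
    # Assumption of this sketch: the second period is longer than the first.
    if second_timeout <= first_timeout:
        raise ValueError("this sketch assumes second_timeout > first_timeout")
    if idle_seconds >= second_timeout:
        # No instruction within the second preset period: close the assistant.
        return "close"
    if idle_seconds >= first_timeout:
        # No instruction within the first preset period: determine a second
        # application scenario and refresh the displayed instructions.
        return "refresh_with_second_scenario"
    return "keep_listening"
```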
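In a possible implementation, the voice interaction method further includes: if the network is not connected normally, the voice assistant informs the user of the network anomaly and displays, on the display interface, a voice skill instruction for opening the network system settings.
In a possible implementation, a commonly used voice skill instruction is used to invoke one or more of the following: a voice skill corresponding to a clickable control in the display interface; a voice skill corresponding to recognizable text or a recognizable picture in the display interface; a voice skill corresponding to a recognizable scenario intent in the display interface; a voice skill corresponding to a regular behavior based on the current time; a voice skill corresponding to a regular behavior based on the current location of the electronic device; a voice skill corresponding to a regular behavior based on the current motion state of the electronic device; a voice skill corresponding to a native application; a voice skill corresponding to a third-party application; a voice skill corresponding to a preset event; a voice skill corresponding to an operation involving multiple applications; a voice skill corresponding to an operation with a long path; and a voice skill corresponding to a function of the application currently running on the electronic device.
In a possible implementation, the voice assistant is displayed in a half-screen state, in which the ratio of the voice assistant's application interface to the overall display interface of the electronic device is greater than 0 and less than 1. The method includes: the voice assistant displays the feedback result of a voice skill instruction on the voice assistant's user interface. In response to an operation instruction from the user, the voice assistant shares the feedback result of the voice skill instruction with another application.
In a possible implementation, displaying the feedback result of the voice skill instruction on the voice assistant's user interface includes: the voice assistant displays the feedback result on its user interface in the form of a card. Sharing the feedback result with another application in response to the user's operation instruction includes: in response to the user pressing the card, the voice assistant selects the card; in response to the user dragging the card, the voice assistant drags the card from the voice assistant's user interface to the user interface of the other application.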
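In a second aspect, an embodiment of this application provides an electronic device on which a voice assistant is installed. The electronic device includes a processor, a memory, and a display, with the memory and the display coupled to the processor. The display is used to display images generated by the processor, and the memory is used to store computer program code that includes computer instructions. When the processor executes the computer instructions, the processor is configured so that the voice assistant, after being woken up, determines a first application scenario according to one or more information items. The information items include the current display interface of the electronic device, the current time, the current location of the electronic device, the current motion state of the electronic device, the current event, or the application currently running on the electronic device. The processor is configured to determine the commonly used voice skill instructions in the first application scenario according to the first application scenario and historical voice skill usage records. A voice skill instruction is used to invoke a voice skill, and a voice skill is a service provided by the voice assistant. The historical voice skill usage records include one or more records, each indicating the time at which a voice skill was invoked, the voice skill instruction used to invoke it, and the application scenario in which it was invoked. The processor is further configured to display the commonly used voice skill instructions in the first application scenario on the display interface.
In a possible implementation, when the voice assistant is woken up for the first time, the processor is further configured as follows. If the current user is a new user, the network is connected normally, and the voice assistant can obtain the high-frequency voice skill instructions in the current network, the voice assistant displays the high-frequency voice skill instructions on the display interface. If the current user is an existing user, the voice assistant determines the current user's commonly used voice skill instructions according to the historical voice skill usage records and displays them on the display interface. If the network is not connected normally, the voice assistant informs the user of the network anomaly and displays, on the display interface, a voice skill instruction for opening the network system settings. If the voice assistant cannot obtain the high-frequency voice skill instructions in the current network, the voice assistant displays preset voice skill instructions on the display interface.
In a possible implementation, after the processor determines the commonly used voice skill instructions in the first application scenario according to the first application scenario and the historical voice skill usage records, the processor is further configured to determine, according to the historical voice skill usage records and the first application scenario, how frequently each commonly used voice skill occurs in the first application scenario, where the commonly used voice skills correspond to the commonly used voice skill instructions. The processor is further configured to determine the priority of the commonly used voice skills according to how frequently the commonly used voice skill instructions occur in the first application scenario, and to determine the position of the commonly used voice skill instructions on the display interface according to that priority.
In a possible implementation, the processor is further configured to invoke, in response to a voice skill instruction input by the user, the voice skill corresponding to that instruction.
In a possible implementation, the processor is further configured to, if invoking the voice skill fails, re-determine the commonly used voice skill instructions in the first application scenario according to the invocation times in the historical voice skill usage records, and display them on the display interface.
In a possible implementation, the processor is further configured so that, if the voice assistant does not receive a voice skill instruction input by the user within a first preset time period, the voice assistant determines a second application scenario. The voice assistant determines the commonly used voice skill instructions in the second application scenario according to the second application scenario and the historical voice skill usage records, and displays them on the display interface.
In a possible implementation, the processor is further configured so that, if the voice assistant does not receive a voice skill instruction input by the user within a second preset time period, the voice assistant closes.
In a possible implementation, the processor is further configured so that, if the network is not connected normally, the voice assistant informs the user of the network anomaly and displays, on the display interface, a voice skill instruction for opening the network system settings.
In a possible implementation, a commonly used voice skill instruction is used to invoke one or more of the following: a voice skill corresponding to a clickable control in the display interface; a voice skill corresponding to recognizable text or a recognizable picture in the display interface; a voice skill corresponding to a recognizable scenario intent in the display interface; a voice skill corresponding to a regular behavior based on the current time; a voice skill corresponding to a regular behavior based on the current location of the electronic device; a voice skill corresponding to a regular behavior based on the current motion state of the electronic device; a voice skill corresponding to a native application; a voice skill corresponding to a third-party application; a voice skill corresponding to a preset event; a voice skill corresponding to an operation involving multiple applications; a voice skill corresponding to an operation with a long path; and a voice skill corresponding to a function of the application currently running on the electronic device.
In a possible implementation, the voice assistant is displayed in a half-screen state, in which the ratio of the voice assistant's application interface to the overall display interface of the electronic device is greater than 0 and less than 1. The processor is further configured to display the feedback result of a voice skill instruction on the voice assistant's user interface; in response to an operation instruction from the user, the voice assistant shares the feedback result of the voice skill instruction with another application.
In a possible implementation, the processor being configured to display the feedback result of the voice skill instruction on the voice assistant's user interface specifically means that the processor is configured so that the voice assistant displays the feedback result on its user interface in the form of a card. The voice assistant sharing the feedback result with another application in response to the user's operation instruction specifically means that the processor is configured so that the voice assistant selects the card in response to the user pressing the card, and drags the card from the voice assistant's user interface to the user interface of the other application in response to the user dragging the card.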
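In a third aspect, an embodiment of this application provides a computer storage medium that includes computer instructions. When the computer instructions run on an electronic device, the electronic device is caused to perform the voice interaction method according to the first aspect or any of its possible implementations.
In a fourth aspect, an embodiment of this application provides a computer program product. When the computer program product runs on a computer, the computer is caused to perform the voice interaction method according to the first aspect or any of its possible implementations.
In a fifth aspect, a chip system is provided that includes a processor. When the processor executes instructions, the processor performs the voice interaction method according to the first aspect or any of its possible implementations.
In addition, for the technical effects of the electronic device of the second aspect and any of its designs, the computer storage medium of the third aspect, and the computer program product of the fourth aspect, refer to the technical effects of the first aspect and its different designs; details are not repeated here.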
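FIG. 1 is a first schematic structural diagram of an electronic device according to an embodiment of this application;
FIG. 2 is a block diagram of the software structure of an electronic device according to an embodiment of this application;
FIG. 3 is a flowchart of a voice interaction method according to an embodiment of this application;
FIG. 4 is a first schematic diagram of displaying commonly used voice skill instructions according to an embodiment of this application;
FIG. 5 is a second schematic diagram of displaying commonly used voice skill instructions according to an embodiment of this application;
FIG. 6 is a third schematic diagram of displaying commonly used voice skill instructions according to an embodiment of this application;
FIG. 7 is a fourth schematic diagram of displaying commonly used voice skill instructions according to an embodiment of this application;
FIG. 8 is a fifth schematic diagram of displaying commonly used voice skill instructions according to an embodiment of this application;
FIG. 9 is a sixth schematic diagram of displaying commonly used voice skill instructions according to an embodiment of this application;
FIG. 10 is a seventh schematic diagram of displaying commonly used voice skill instructions according to an embodiment of this application;
FIG. 11 is an eighth schematic diagram of displaying commonly used voice skill instructions according to an embodiment of this application;
FIG. 12 is a second schematic structural diagram of an electronic device according to an embodiment of this application.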
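The embodiments of this application are described in detail below with reference to the accompanying drawings.
The embodiments of this application provide a voice interaction method and apparatus that can be applied to an electronic device, where voice interaction between the electronic device and the user is implemented through a voice assistant on the electronic device. Using the application recognition mechanism of the electronic device on which it runs, the voice assistant determines first application scenario information that includes one or more information items (such as the current time, the current location of the electronic device, and the current display interface of the electronic device). According to the first scenario information and the historical voice skill usage records, the voice assistant determines the priority order of the commonly used voice skill instructions and displays the commonly used voice skill instructions in the first application scenario on the display interface in that order. Voice skill instructions are thereby recommended to the user in a scenario-based manner. At the same time, the voice assistant records each voice skill instruction issued by the user together with the scenario information at that moment, as the data source for inferring what the user may intend when using the voice assistant. This covers as many usage scenarios of voice skill instructions as possible and improves the user experience.
The voice assistant may be an application installed in the electronic device. The application may be an embedded application of the electronic device (that is, a system application) or a downloadable application. An embedded application is an application provided as part of the implementation of the electronic device (such as a mobile phone), for example the "Settings", "Messages", and "Camera" applications. A downloadable application is an application that can provide its own internet protocol multimedia subsystem (IMS) connection; it may be an application pre-installed in the electronic device or a third-party application downloaded and installed by the user, for example the "WeChat", "Alipay", and "Mail" applications.
The electronic device in the embodiments of this application may be a portable computer (such as a mobile phone), a notebook computer, a personal computer (PC), a wearable electronic device (such as a smart watch), a tablet computer, a smart home device, an augmented reality (AR) or virtual reality (VR) device, an artificial intelligence (AI) terminal (such as an intelligent robot), an in-vehicle computer, and so on. The following embodiments place no particular limitation on the specific form of the device.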
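Refer to FIG. 1, which shows a schematic structural diagram of an electronic device 100 provided in this embodiment. The electronic device 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, a headset jack 170D, a sensor module 180, a button 190, a motor 191, an indicator 192, a camera 193, a display screen 194, a subscriber identification module (SIM) card interface 195, and so on. The sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, a barometric pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, and so on.
It can be understood that the structure illustrated in this embodiment does not constitute a specific limitation on the electronic device 100. In other embodiments, the electronic device 100 may include more or fewer components than shown, combine some components, split some components, or arrange the components differently. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
The processor 110 may include one or more processing units. For example, the processor 110 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a memory, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU). Different processing units may be independent devices or may be integrated in one or more processors.
In the embodiments of this application, the DSP can monitor voice data in real time. When the similarity between the voice data monitored by the DSP and a voice skill instruction recommended by the voice assistant in the electronic device satisfies a preset condition, the DSP can hand the voice data over to the AP. The AP then performs text verification and voiceprint verification on the voice data. When the AP determines that the voice data matches a voice skill instruction recommended by the voice assistant, the electronic device can invoke that voice skill instruction and execute the corresponding voice skill.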
控制器可以是电子设备100的神经中枢和指挥中心。控制器可以根据指令操作码和时序信号,产生操作控制信号,完成取指令和执行指令的控制。
处理器110中还可以设置存储器,用于存储指令和数据。在一些实施例中,处理器110中的存储器为高速缓冲存储器。该存储器可以保存处理器110刚用过或循环使用的指令或数据。如果处理器110需要再次使用该指令或数据,可从所述存储器中直接调用。避免了重复存取,减少了处理器110的等待时间,因而提高了系统的效率。
在一些实施例中,处理器110可以包括一个或多个接口。接口可以包括集成电路(inter-integrated circuit,I2C)接口,集成电路内置音频(inter-integrated circuit sound,I2S)接口,脉冲编码调制(pulse code modulation,PCM)接口,通用异步收发传输器(universal asynchronous receiver/transmitter,UART)接口,移动产业处理器接口(mobile industry processor interface,MIPI),通用输入输出(general-purpose input/output,GPIO)接口,用户标识模块(subscriber identity module,SIM)接口,和/或通用串行总线(universal serial bus,USB)接口等。
I2C接口是一种双向同步串行总线,包括一根串行数据线(serial data line,SDA)和一根串行时钟线(derail clock line,SCL)。在一些实施例中,处理器110可以包含多组I2C总线。处理器110可以通过不同的I2C总线接口分别耦合触摸传感器180K,充电器,闪光灯,摄像头193等。例如:处理器110可以通过I2C接口耦合触摸传感器180K,使处理器110与触摸传感器180K通过I2C总线接口通信,实现电子设备100的触摸功能。
I2S接口可以用于音频通信。在一些实施例中,处理器110可以包含多组I2S总线。处理器110可以通过I2S总线与音频模块170耦合,实现处理器110与音频模块170之间的通信。在一些实施例中,音频模块170可以通过I2S接口向无线通信模块160传递音频信号,实现通过蓝牙耳机接听电话的功能。
PCM接口也可以用于音频通信,将模拟信号抽样,量化和编码。在一些实施例中,音频模块170与无线通信模块160可以通过PCM总线接口耦合。在一些实施例中,音频模块170也可以通过PCM接口向无线通信模块160传递音频信号,实现通过蓝牙耳机接听电话的功能。所述I2S接口和所述PCM接口都可以用于音频通信。
UART接口是一种通用串行数据总线,用于异步通信。该总线可以为双向通信总线。它将要传输的数据在串行通信与并行通信之间转换。在一些实施例中,UART接 口通常被用于连接处理器110与无线通信模块160。例如:处理器110通过UART接口与无线通信模块160中的蓝牙模块通信,实现蓝牙功能。在一些实施例中,音频模块170可以通过UART接口向无线通信模块160传递音频信号,实现通过蓝牙耳机播放音乐的功能。
MIPI接口可以被用于连接处理器110与显示屏194,摄像头193等外围器件。MIPI接口包括摄像头串行接口(camera serial interface,CSI),显示屏串行接口(display serial interface,DSI)等。在一些实施例中,处理器110和摄像头193通过CSI接口通信,实现电子设备100的拍摄功能。处理器110和显示屏194通过DSI接口通信,实现电子设备100的显示功能。
GPIO接口可以通过软件配置。GPIO接口可以被配置为控制信号,也可被配置为数据信号。在一些实施例中,GPIO接口可以用于连接处理器110与摄像头193,显示屏194,无线通信模块160,音频模块170,传感器模块180等。GPIO接口还可以被配置为I2C接口,I2S接口,UART接口,MIPI接口等。
USB接口130是符合USB标准规范的接口,具体可以是Mini USB接口,Micro USB接口,USB Type C接口等。USB接口130可以用于连接充电器为电子设备100充电,也可以用于电子设备100与外围设备之间传输数据。也可以用于连接耳机,通过耳机播放音频。该接口还可以用于连接其他电子设备,例如AR设备等。
可以理解的是,本实施例示意的各模块间的接口连接关系,只是示意性说明,并不构成对电子设备100的结构限定。在另一些实施例中,电子设备100也可以采用上述实施例中不同的接口连接方式,或多种接口连接方式的组合。
充电管理模块140用于从充电器接收充电输入。其中,充电器可以是无线充电器,也可以是有线充电器。在一些有线充电的实施例中,充电管理模块140可以通过USB接口130接收有线充电器的充电输入。在一些无线充电的实施例中,充电管理模块140可以通过电子设备100的无线充电线圈接收无线充电输入。充电管理模块140为电池142充电的同时,还可以通过电源管理模块141为电子设备供电。
电源管理模块141用于连接电池142,充电管理模块140与处理器110。电源管理模块141接收电池142和/或充电管理模块140的输入,为处理器110,内部存储器121,外部存储器,显示屏194,摄像头193,和无线通信模块160等供电。电源管理模块141还可以用于监测电池容量,电池循环次数,电池健康状态(漏电,阻抗)等参数。在其他一些实施例中,电源管理模块141也可以设置于处理器110中。在另一些实施例中,电源管理模块141和充电管理模块140也可以设置于同一个器件中。
移动通信模块150可以提供应用在电子设备100上的包括2G/3G/4G/5G等无线通信的解决方案。移动通信模块150可以包括至少一个滤波器,开关,功率放大器,低噪声放大器(low noise amplifier,LNA)等。移动通信模块150可以由天线1接收电磁波,并对接收的电磁波进行滤波,放大等处理,传送至调制解调处理器进行解调。移动通信模块150还可以对经调制解调处理器调制后的信号放大,经天线1转为电磁波辐射出去。在一些实施例中,移动通信模块150的至少部分功能模块可以被设置于处理器110中。在一些实施例中,移动通信模块150的至少部分功能模块可以与处理器110的至少部分模块被设置在同一个器件中。
调制解调处理器可以包括调制器和解调器。其中,调制器用于将待发送的低频基带信号调制成中高频信号。解调器用于将接收的电磁波信号解调为低频基带信号。随后解调器将解调得到的低频基带信号传送至基带处理器处理。低频基带信号经基带处理器处理后,被传递给应用处理器。应用处理器通过音频设备(不限于扬声器170A,受话器170B等)输出声音信号,或通过显示屏194显示图像或视频。在一些实施例中,调制解调处理器可以是独立的器件。在另一些实施例中,调制解调处理器可以独立于处理器110,与移动通信模块150或其他功能模块设置在同一个器件中。
无线通信模块160可以提供应用在电子设备100上的包括无线局域网(wireless local area networks,WLAN)(如无线保真(wireless fidelity,Wi-Fi)网络),蓝牙(bluetooth,BT),全球导航卫星系统(global navigation satellite system,GNSS),调频(frequency modulation,FM),近距离无线通信技术(near field communication,NFC),红外技术(infrared,IR)等无线通信的解决方案。无线通信模块160可以是集成至少一个通信处理模块的一个或多个器件。无线通信模块160经由天线2接收电磁波,将电磁波信号调频以及滤波处理,将处理后的信号发送到处理器110。无线通信模块160还可以从处理器110接收待发送的信号,对其进行调频,放大,经天线2转为电磁波辐射出去。
在一些实施例中,电子设备100的天线1和移动通信模块150耦合,天线2和无线通信模块160耦合,使得电子设备100可以通过无线通信技术与网络以及其他设备通信。所述无线通信技术可以包括全球移动通讯系统(global system for mobile communications,GSM),通用分组无线服务(general packet radio service,GPRS),码分多址接入(code division multiple access,CDMA),宽带码分多址(wideband code division multiple access,WCDMA),时分码分多址(time-division code division multiple access,TD-SCDMA),长期演进(long term evolution,LTE),BT,GNSS,WLAN,NFC,FM,和/或IR技术等。所述GNSS可以包括全球卫星定位系统(global positioning system,GPS),全球导航卫星系统(global navigation satellite system,GLONASS),北斗卫星导航系统(beidou navigation satellite system,BDS),准天顶卫星系统(quasi-zenith satellite system,QZSS)和/或星基增强系统(satellite based augmentation systems,SBAS)。
电子设备100通过GPU,显示屏194,以及应用处理器等实现显示功能。GPU为图像处理的微处理器,连接显示屏194和应用处理器。GPU用于执行数学和几何计算,用于图形渲染。处理器110可包括一个或多个GPU,其执行程序指令以生成或改变显示信息。
显示屏194用于显示图像,视频等。显示屏194包括显示面板。显示面板可以采用液晶显示屏(liquid crystal display,LCD),有机发光二极管(organic light-emitting diode,OLED),有源矩阵有机发光二极体或主动矩阵有机发光二极体(active-matrix organic light emitting diode,AMOLED),柔性发光二极管(flex light-emitting diode,FLED),Miniled,MicroLed,Micro-oLed,量子点发光二极管(quantum dot light emitting diodes,QLED)等。在一些实施例中,电子设备100可以包括1个或N个显示屏194,N为大于1的正整数。
电子设备100可以通过ISP,摄像头193,视频编解码器,GPU,显示屏194以及应用处理器等实现拍摄功能。
ISP用于处理摄像头193反馈的数据。例如,拍照时,打开快门,光线通过镜头被传递到摄像头感光元件上,光信号转换为电信号,摄像头感光元件将所述电信号传递给ISP处理,转化为肉眼可见的图像。ISP还可以对图像的噪点,亮度,肤色进行算法优化。ISP还可以对拍摄场景的曝光,色温等参数优化。在一些实施例中,ISP可以设置在摄像头193中。
摄像头193用于捕获静态图像或视频。物体通过镜头生成光学图像投射到感光元件。感光元件可以是电荷耦合器件(charge coupled device,CCD)或互补金属氧化物半导体(complementary metal-oxide-semiconductor,CMOS)光电晶体管。感光元件把光信号转换成电信号,之后将电信号传递给ISP转换成数字图像信号。ISP将数字图像信号输出到DSP加工处理。DSP将数字图像信号转换成标准的RGB,YUV等格式的图像信号。在一些实施例中,电子设备100可以包括1个或N个摄像头193,N为大于1的正整数。
数字信号处理器用于处理数字信号,除了可以处理数字图像信号,还可以处理其他数字信号。例如,当电子设备100在频点选择时,数字信号处理器用于对频点能量进行傅里叶变换等。
视频编解码器用于对数字视频压缩或解压缩。电子设备100可以支持一种或多种视频编解码器。这样,电子设备100可以播放或录制多种编码格式的视频,例如:动态图像专家组(moving picture experts group,MPEG)1,MPEG2,MPEG3,MPEG4等。
NPU为神经网络(neural-network,NN)计算处理器,通过借鉴生物神经网络结构,例如借鉴人脑神经元之间传递模式,对输入信息快速处理,还可以不断的自学习。通过NPU可以实现电子设备100的智能认知等应用,例如:图像识别,人脸识别,语音识别,文本理解等。
外部存储器接口120可以用于连接外部存储卡,例如Micro SD卡,实现扩展电子设备100的存储能力。外部存储卡通过外部存储器接口120与处理器110通信,实现数据存储功能。例如将音乐,视频等文件保存在外部存储卡中。
内部存储器121可以用于存储计算机可执行程序代码,所述可执行程序代码包括指令。处理器110通过运行存储在内部存储器121的指令,从而执行电子设备100的各种功能应用以及数据处理。内部存储器121可以包括存储程序区和存储数据区。其中,存储程序区可存储操作系统,至少一个功能所需的应用程序(比如声音播放功能,图像播放功能等)等。存储数据区可存储电子设备100使用过程中所创建的数据(比如音频数据,电话本等)等。此外,内部存储器121可以包括高速随机存取存储器,还可以包括非易失性存储器,例如至少一个磁盘存储器件,闪存器件,通用闪存存储器(universal flash storage,UFS)等。
电子设备100可以通过音频模块170,扬声器170A,受话器170B,麦克风170C,耳机接口170D,以及应用处理器等实现音频功能。例如音乐播放,录音等。
音频模块170用于将数字音频信息转换成模拟音频信号输出,也用于将模拟音频 输入转换为数字音频信号。音频模块170还可以用于对音频信号编码和解码。在一些实施例中,音频模块170可以设置于处理器110中,或将音频模块170的部分功能模块设置于处理器110中。
扬声器170A,也称“喇叭”,用于将音频电信号转换为声音信号。电子设备100可以通过扬声器170A收听音乐,或收听免提通话。
受话器170B,也称“听筒”,用于将音频电信号转换成声音信号。当电子设备100接听电话或语音信息时,可以通过将受话器170B靠近人耳接听语音。
麦克风170C,也称“话筒”,“传声器”,用于将声音信号转换为电信号。当拨打电话或发送语音信息或需要通过语音助手触发电子设备100执行某些功能时,用户可以通过人嘴靠近麦克风170C发声,将声音信号输入到麦克风170C。电子设备100可以设置至少一个麦克风170C。在另一些实施例中,电子设备100可以设置两个麦克风170C,除了采集声音信号,还可以实现降噪功能。在另一些实施例中,电子设备100还可以设置三个,四个或更多麦克风170C,实现采集声音信号,降噪,还可以识别声音来源,实现定向录音功能等。
耳机接口170D用于连接有线耳机。耳机接口170D可以是USB接口130,也可以是3.5mm的开放移动电子设备平台(open mobile terminal platform,OMTP)标准接口,美国蜂窝电信工业协会(cellular telecommunications industry association of the USA,CTIA)标准接口。
压力传感器180A用于感受压力信号,可以将压力信号转换成电信号。在一些实施例中,压力传感器180A可以设置于显示屏194。压力传感器180A的种类很多,如电阻式压力传感器,电感式压力传感器,电容式压力传感器等。电容式压力传感器可以是包括至少两个具有导电材料的平行板。当有力作用于压力传感器180A,电极之间的电容改变。电子设备100根据电容的变化确定压力的强度。当有触摸操作作用于显示屏194,电子设备100根据压力传感器180A检测所述触摸操作强度。电子设备100也可以根据压力传感器180A的检测信号计算触摸的位置。在一些实施例中,作用于相同触摸位置,但不同触摸操作强度的触摸操作,可以对应不同的操作指令。例如:当有触摸操作强度小于第一压力阈值的触摸操作作用于短消息应用图标时,执行查看短消息的指令。当有触摸操作强度大于或等于第一压力阈值的触摸操作作用于短消息应用图标时,执行新建短消息的指令。当有触摸操作强度大于第二压力阈值的触摸操作作用于短消息应用图标并移动触摸位置时,用户可以拖动该短消息应用图标到其他位置。
陀螺仪传感器180B可以用于确定电子设备100的运动姿态。在一些实施例中,可以通过陀螺仪传感器180B确定电子设备100围绕三个轴(即x,y和z轴)的角速度。陀螺仪传感器180B可以用于拍摄防抖。示例性的,当按下快门,陀螺仪传感器180B检测电子设备100抖动的角度,根据角度计算出镜头模组需要补偿的距离,让镜头通过反向运动抵消电子设备100的抖动,实现防抖。陀螺仪传感器180B还可以用于导航,体感游戏场景。
气压传感器180C用于测量气压。在一些实施例中,电子设备100通过气压传感器180C测得的气压值计算海拔高度,辅助定位和导航。
磁传感器180D包括霍尔传感器。电子设备100可以利用磁传感器180D检测翻盖皮套的开合。在一些实施例中,当电子设备100是翻盖机时,电子设备100可以根据磁传感器180D检测翻盖的开合。进而根据检测到的皮套的开合状态或翻盖的开合状态,设置翻盖自动解锁等特性。
加速度传感器180E可检测电子设备100在各个方向上(一般为三轴)加速度的大小。当电子设备100静止时可检测出重力的大小及方向。还可以用于识别电子设备姿态,应用于横竖屏切换,计步器等应用。
距离传感器180F,用于测量距离。电子设备100可以通过红外或激光测量距离。在一些实施例中,拍摄场景,电子设备100可以利用距离传感器180F测距以实现快速对焦。
接近光传感器180G可以包括例如发光二极管(LED)和光检测器,例如光电二极管。发光二极管可以是红外发光二极管。电子设备100通过发光二极管向外发射红外光。电子设备100使用光电二极管检测来自附近物体的红外反射光。当检测到充分的反射光时,可以确定电子设备100附近有物体。当检测到不充分的反射光时,电子设备100可以确定电子设备100附近没有物体。电子设备100可以利用接近光传感器180G检测用户手持电子设备100贴近耳朵通话,以便自动熄灭屏幕达到省电的目的。接近光传感器180G也可用于皮套模式,口袋模式自动解锁与锁屏。
环境光传感器180L用于感知环境光亮度。电子设备100可以根据感知的环境光亮度自适应调节显示屏194亮度。环境光传感器180L也可用于拍照时自动调节白平衡。环境光传感器180L还可以与接近光传感器180G配合,检测电子设备100是否在口袋里,以防误触。
指纹传感器180H用于采集指纹。电子设备100可以利用采集的指纹特性实现指纹解锁,访问应用锁,指纹拍照,指纹接听来电等。
温度传感器180J用于检测温度。在一些实施例中,电子设备100利用温度传感器180J检测的温度,执行温度处理策略。例如,当温度传感器180J上报的温度超过阈值,电子设备100执行降低位于温度传感器180J附近的处理器的性能,以便降低功耗实施热保护。在另一些实施例中,当温度低于另一阈值时,电子设备100对电池142加热,以避免低温导致电子设备100异常关机。在其他一些实施例中,当温度低于又一阈值时,电子设备100对电池142的输出电压执行升压,以避免低温导致的异常关机。
触摸传感器180K,也称“触控面板”。触摸传感器180K可以设置于显示屏194,由触摸传感器180K与显示屏194组成触摸屏,也称“触控屏”。触摸传感器180K用于检测作用于其上或附近的触摸操作。触摸传感器可以将检测到的触摸操作传递给应用处理器,以确定触摸事件类型。可以通过显示屏194提供与触摸操作相关的视觉输出。在另一些实施例中,触摸传感器180K也可以设置于电子设备100的表面,与显示屏194所处的位置不同。
骨传导传感器180M可以获取振动信号。在一些实施例中,骨传导传感器180M可以获取人体声部振动骨块的振动信号。骨传导传感器180M也可以接触人体脉搏,接收血压跳动信号。在一些实施例中,骨传导传感器180M也可以设置于耳机中,结合成骨传导耳机。音频模块170可以基于所述骨传导传感器180M获取的声部振动骨 块的振动信号,解析出语音信号,实现语音功能。应用处理器可以基于所述骨传导传感器180M获取的血压跳动信号解析心率信息,实现心率检测功能。
按键190包括开机键,音量键等。按键190可以是机械按键。也可以是触摸式按键。电子设备100可以接收按键输入,产生与电子设备100的用户设置以及功能控制有关的键信号输入。
马达191可以产生振动提示。马达191可以用于来电振动提示,也可以用于触摸振动反馈。例如,作用于不同应用(例如拍照,音频播放等)的触摸操作,可以对应不同的振动反馈效果。作用于显示屏194不同区域的触摸操作,马达191也可对应不同的振动反馈效果。不同的应用场景(例如:时间提醒,接收信息,闹钟,游戏等)也可以对应不同的振动反馈效果。触摸振动反馈效果还可以支持自定义。
指示器192可以是指示灯,可以用于指示充电状态,电量变化,也可以用于指示消息,未接来电,通知等。
SIM卡接口195用于连接SIM卡。SIM卡可以通过插入SIM卡接口195,或从SIM卡接口195拔出,实现和电子设备100的接触和分离。电子设备100可以支持1个或N个SIM卡接口,N为大于1的正整数。SIM卡接口195可以支持Nano SIM卡,Micro SIM卡,SIM卡等。同一个SIM卡接口195可以同时插入多张卡。所述多张卡的类型可以相同,也可以不同。SIM卡接口195也可以兼容不同类型的SIM卡。SIM卡接口195也可以兼容外部存储卡。电子设备100通过SIM卡和网络交互,实现通话以及数据通信等功能。在一些实施例中,电子设备100采用eSIM,即:嵌入式SIM卡。eSIM卡可以嵌在电子设备100中,不能和电子设备100分离。
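The software system of the electronic device 100 may use a layered architecture, an event-driven architecture, a microkernel architecture, a microservice architecture, or a cloud architecture. This embodiment uses the Android system with a layered architecture as an example to describe the software structure of the electronic device 100.
Refer to FIG. 2, which is a block diagram of the software structure of the electronic device 100 provided in this embodiment. The layered architecture divides the software into several layers, each with a clear role and division of labor. The layers communicate with each other through software interfaces. In some embodiments, the Android system is divided into four layers, which from top to bottom are the application layer, the application framework layer, the Android runtime and system libraries, and the kernel layer.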
应用程序层可以包括一系列应用程序包。
如图2所示,应用程序包可以包括语音助手,图库,日历,通话,地图,导航,WLAN,蓝牙,音乐,视频,短信息等应用程序。
应用程序框架层为应用程序层的应用程序提供应用编程接口(application programming interface,API)和编程框架。应用程序框架层包括一些预先定义的函数。
如图2所示,应用程序框架层可以包括窗口管理器,内容提供器,视图系统,电话管理器,资源管理器,通知管理器等。
窗口管理器用于管理窗口程序。窗口管理器可以获取显示屏大小,判断是否有状态栏,锁定屏幕,截取屏幕等。
内容提供器用来存放和获取数据,并使这些数据可以被应用程序访问。所述数据可以包括视频,图像,音频,拨打和接听的电话,浏览历史和书签,电话簿等。
视图系统包括可视控件,例如显示文字的控件,显示图片的控件等。视图系统可 用于构建应用程序。显示界面可以由一个或多个视图组成的。例如,包括短信通知图标的显示界面,可以包括显示文字的视图以及显示图片的视图。
电话管理器用于提供电子设备100的通信功能。例如通话状态的管理(包括接通,挂断等)。
资源管理器为应用程序提供各种资源,比如本地化字符串,图标,图片,布局文件,视频文件等等。
通知管理器使应用程序可以在状态栏中显示通知信息,可以用于传达告知类型的消息,可以短暂停留后自动消失,无需用户交互。比如通知管理器被用于告知下载完成,消息提醒等。通知管理器还可以是以图表或者滚动条文本形式出现在系统顶部状态栏的通知,例如后台运行的应用程序的通知,还可以是以对话窗口形式出现在屏幕上的通知。例如在状态栏提示文本信息,发出提示音,电子设备振动,指示灯闪烁等。
Android Runtime包括核心库和虚拟机。Android runtime负责安卓系统的调度和管理。
核心库包含两部分:一部分是java语言需要调用的功能函数,另一部分是安卓的核心库。
应用程序层和应用程序框架层运行在虚拟机中。虚拟机将应用程序层和应用程序框架层的java文件执行为二进制文件。虚拟机用于执行对象生命周期的管理,堆栈管理,线程管理,安全和异常的管理,以及垃圾回收等功能。
系统库可以包括多个功能模块。例如:表面管理器(surface manager),媒体库(Media Libraries),三维图形处理库(例如:OpenGL ES),2D图形引擎(例如:SGL)等。
表面管理器用于对显示子系统进行管理,并且为多个应用程序提供了2D和3D图层的融合。
媒体库支持多种常用的音频,视频格式回放和录制,以及静态图像文件等。媒体库可以支持多种音视频编码格式,例如:MPEG4,H.264,MP3,AAC,AMR,JPG,PNG等。
三维图形处理库用于实现三维图形绘图,图像渲染,合成,和图层处理等。
2D图形引擎是2D绘图的绘图引擎。
内核层是硬件和软件之间的层。内核层至少包含显示驱动,摄像头驱动,音频驱动,传感器驱动。
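By way of example, the technical solutions in the following embodiments can all be implemented in an electronic device 100 having the above hardware and software architectures. The voice interaction method provided in this application is described in detail below with reference to the drawings and specific application scenarios, taking the case in which the electronic device 100 is a mobile phone as an example.
To achieve scenario-based recommendation of voice skill instructions, cover as many usage scenarios as possible, and help the user input correct voice skill instructions, an embodiment of this application provides a voice interaction method. As shown in (a) of FIG. 3, the method includes steps S301 to S303:
It should be noted that the voice assistant must be woken up before step S301. Specifically, the user can wake up the voice assistant by speaking a voice keyword (for example, "Xiaoyi Xiaoyi"), by tapping the voice assistant icon on the display interface, or by long-pressing a hardware key of the electronic device (for example, pressing the power key for 1 s). After being woken up, the voice assistant is in a listening state and can receive voice skill instructions input by the user.
S301. After being woken up, the voice assistant determines a first application scenario according to one or more information items.
Optionally, the voice assistant obtains one or more information items by using the application recognition mechanism of the electronic device, and then determines the first application scenario that includes these information items. The information items include the current display interface of the electronic device, the current time, the current location of the electronic device, the current motion state of the electronic device, the current event, or the application currently running on the electronic device.
The current display interface of the electronic device can be further classified, according to whether it contains recognizable text, pictures, or scenario intents, into: a display interface with no recognizable text, picture, or scenario intent; a display interface with recognizable text or pictures; and a display interface with a recognizable scenario intent (the instant messaging (IM) type). The current motion state of the electronic device is determined from its current speed and/or acceleration. Recognizable text is valid (that is, meaningful) text, and a recognizable picture is a valid (that is, meaningful) picture. From recognizable text or pictures in the display interface, specific things such as hotels, scenic spots, TV series, and movies can be identified through a natural language understanding (NLU) entity recognition interface or the HiTouch capability. As for recognizable scenario intents, an NLU scenario recognition interface recognizes information such as text and/or pictures in the current display interface to obtain structured data and can determine the category of that data, for example an address, a phone number, or a URL. A specific scenario intent can then be determined from this information, for example navigating to a place, calling someone, or copying a URL.
For example, the current motion state of the electronic device may be stationary, walking, or driving a motor vehicle. If the current speed and acceleration of the electronic device are both 0, the device is stationary, that is, the user's motion state is stationary. If the current speed or acceleration is greater than a first preset threshold but less than a second preset threshold, the device is in a walking state, that is, the user holding it is walking. If the current speed and/or acceleration is greater than the second preset threshold, the device is in a motor vehicle driving state, that is, the user is driving. The current location of the electronic device may be home, the office, a shopping center, and so on. The current time may be a specific time on the 24-hour clock or a particular period of the day, such as morning, noon, evening, or late night. If the current time is 6:50 and the electronic device is located at home, the current event can be determined to be getting up.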
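The threshold rule for the motion state can be sketched as follows. This is a simplified illustration that uses speed only; the numeric threshold values and all names are hypothetical, because the text refers only to a "first" and a "second" preset threshold. Values that fall between 0 and the first threshold, or exactly on a threshold, are not covered by the text, so the sketch reports them as "unknown" rather than guessing.

```python
# Hypothetical thresholds in metres per second; the source gives no numbers.
FIRST_THRESHOLD = 0.3
SECOND_THRESHOLD = 5.0


def classify_motion(speed: float,
                    first: float = FIRST_THRESHOLD,
                    second: float = SECOND_THRESHOLD) -> str:
    """Map a speed reading to the motion states named in the description."""
    if speed == 0:
        return "stationary"
    if first < speed < second:
        return "walking"
    if speed > second:
        return "driving"
    # Gaps left open by the description (for example 0 < speed <= first).
    return "unknown"
```

A fuller version would combine speed and acceleration as the text allows ("and/or"), but how the two readings are combined is not specified.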
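Optionally, after being woken up, the voice assistant can show prompt information as text or play it as speech, for example "What can I help you with?", to prompt the user to input a voice skill instruction, so that the voice assistant can receive the user's instruction as soon as possible. In the same way, if the voice assistant was woken up by an accidental touch, the prompt reminds the user that the voice assistant is open so that the user can close it promptly.
S302. The voice assistant determines the commonly used voice skill instructions in the first application scenario according to the first application scenario and the historical voice skill usage records.
A voice skill instruction is used to invoke a service provided by the voice assistant, that is, a voice skill. The historical voice skill usage records include one or more records, which indicate, for a past period of time, the times at which the voice assistant invoked voice skills, the voice skill instructions used to invoke them, and the application scenarios in which they were invoked.
Step S302 can be implemented in two ways, Implementation 1 and Implementation 2, which are described separately below:
Implementation 1:
Optionally, after the voice assistant determines the first application scenario, if the application scenario of an invocation in the historical voice skill usage records includes all the information items of the first application scenario, the voice skill corresponding to that invocation scenario is a commonly used voice skill in the first application scenario. The preset voice skill instruction corresponding to the commonly used voice skill is a commonly used voice skill instruction in the first application scenario; alternatively, the voice skill instruction that was used most often in the historical records to invoke the commonly used voice skill is a commonly used voice skill instruction in the first application scenario. A preset voice skill instruction is a voice skill instruction manually configured by the developer in the dialogue development platform and corresponds to a voice skill.
For example, the first application scenario includes two information items, information item 1 and information item 2: information item 1 is that the current time is 10 a.m., and information item 2 is that the electronic device is currently located at home. The historical voice skill usage records contain four records: the usage records of voice skill A, voice skill B, voice skill C, and voice skill D. The scenario in which voice skill A was invoked includes information item 1. The scenario in which voice skill B was invoked includes information items 1 and 2. The scenario in which voice skill C was invoked includes information items 1, 2, and 3, where information item 3 is, for example, that the current motion state of the electronic device is stationary. The scenario in which voice skill D was invoked includes information item 3. From this, the voice assistant can determine that the commonly used voice skills in the first application scenario are voice skill B and voice skill C, and that the preset voice skill instructions corresponding to voice skills B and C are the commonly used voice skill instructions in the first application scenario.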
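Implementation 1 amounts to a subset test: a recorded invocation scenario qualifies when it contains every information item of the first application scenario. The sketch below reproduces the A/B/C/D example; the `UsageRecord` type, its field names, and the string encoding of information items are hypothetical choices made for illustration only.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class UsageRecord:
    skill: str                  # voice skill that was invoked
    instruction: str            # voice skill instruction that invoked it
    scenario: FrozenSet[str]    # information items of the invocation scenario


def common_skills_all_items(records: Iterable[UsageRecord],
                            first_scenario: Iterable[str]) -> List[str]:
    """Skills whose recorded scenario contains ALL items of the first scenario."""
    wanted = frozenset(first_scenario)
    skills: List[str] = []
    for record in records:
        if wanted <= record.scenario and record.skill not in skills:
            skills.append(record.skill)
    return skills


ITEM1, ITEM2, ITEM3 = "time=10:00", "location=home", "motion=stationary"
HISTORY = [
    UsageRecord("A", "A1", frozenset({ITEM1})),
    UsageRecord("B", "B1", frozenset({ITEM1, ITEM2})),
    UsageRecord("C", "C1", frozenset({ITEM1, ITEM2, ITEM3})),
    UsageRecord("D", "D1", frozenset({ITEM3})),
]

print(common_skills_all_items(HISTORY, {ITEM1, ITEM2}))  # ['B', 'C']
```

Exact string equality is used to compare information items here; how a real system decides that two times or locations "match" is not stated in the text.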
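Implementation 2:
When commonly used voice skills are determined with Implementation 1, the number of commonly used voice skills may turn out to be 0. The embodiments of this application therefore provide another possible implementation, Implementation 2:
Optionally, after the voice assistant determines the first application scenario, if the application scenario of an invocation in the historical voice skill usage records includes at least one information item of the first application scenario, the voice skill corresponding to that invocation scenario is a commonly used voice skill in the first application scenario, and the voice skill instruction corresponding to that commonly used voice skill in the historical records is a commonly used voice skill instruction in the first application scenario. If the example given for Implementation 1 of step S302 is analyzed according to Implementation 2, the voice assistant determines that voice skills A, B, and C are the commonly used voice skills in the first application scenario, and that the voice skill instructions corresponding to voice skills A, B, and C in the historical records are the commonly used voice skill instructions in the first application scenario.
For example, in the historical voice skill usage records, the voice skill instructions used to invoke voice skill A include A1, A2, and A3, and A1 occurs more often than A2 and A3. The voice skill instructions used to invoke voice skill B include B1, B2, and B3, which occur equally often. The voice skill instructions used to invoke voice skill C include C1, C2, C3, and C4, and C2 occurs more often than C1, C3, and C4. The voice skill instructions corresponding to voice skill D include D1, D2, D3, D4, and D5, and D5 occurs more often than D1, D2, D3, and D4. Therefore, the voice skill instructions corresponding to voice skills A, B, C, and D are A1, B1/B2/B3, C2, and D5, respectively.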
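Implementation 2 relaxes the subset test to "shares at least one information item", and the follow-up example picks, for each skill, the instruction that appears most often in the history. The sketch below illustrates both steps with plain tuples; the function names and the tuple layout `(skill, instruction, scenario_items)` are assumptions. When several instructions tie (as with B1/B2/B3 in the text), all of them are returned, since the text does not say how a tie is broken.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Record = Tuple[str, str, Set[str]]  # (skill, instruction, scenario items)


def common_skills_any_item(records: Iterable[Record],
                           first_scenario: Iterable[str]) -> List[str]:
    """Skills whose recorded scenario shares AT LEAST ONE item with the first scenario."""
    wanted = set(first_scenario)
    skills: List[str] = []
    for skill, _instruction, items in records:
        if wanted & set(items) and skill not in skills:
            skills.append(skill)
    return skills


def top_instructions(records: Iterable[Record]) -> Dict[str, List[str]]:
    """For each skill, the instruction(s) used most often to invoke it."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for skill, instruction, _items in records:
        counts[skill][instruction] += 1
    result: Dict[str, List[str]] = {}
    for skill, counter in counts.items():
        best = max(counter.values())
        # Keep every instruction tied for the highest count, in sorted order.
        result[skill] = sorted(i for i, n in counter.items() if n == best)
    return result
```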
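It should be noted that the commonly used voice skill instructions differ for each information item of the first application scenario. The commonly used voice skill instructions corresponding to each information item are described below:
1. The current display interface of the electronic device has no recognizable text and/or picture or scenario intent.
For details about recognizable text, recognizable pictures, and recognizable scenario intents, refer to the description in step S301 above; they are not repeated here.
(1) If the historical voice skill usage records contain no record of invoking a voice skill based on the current display interface, the voice assistant takes the voice skills corresponding to the clickable controls (for example, application icons) in the current display interface as the commonly used voice skills for the current display interface, and the voice skill instructions used to invoke those voice skills are the commonly used voice skill instructions for the current display interface.
(2) If the historical voice skill usage records contain records of invoking voice skills based on the current display interface, the voice assistant takes the voice skills that correspond to clickable controls (for example, application icons) in the current display interface and that have been invoked more than a preset number of times in the historical records as the commonly used voice skills for the current display interface, and the voice skill instructions used to invoke those voice skills are the commonly used voice skill instructions for the current display interface.
2. The current display interface of the electronic device has recognizable text and/or pictures.
When the current display interface of the electronic device has recognizable text and/or pictures, the commonly used voice skills are the voice skills corresponding to the recognizable text and/or pictures in the current display interface, and the voice skill instructions used to invoke those voice skills are the commonly used voice skill instructions for the current display interface.
For example, if "Fast & Furious" is identified from the recognizable text and/or pictures in the display interface through the NLU entity recognition interface or the HiTouch capability, the commonly used voice skill instructions are those related to "Fast & Furious", for example "Search for Fast & Furious".
3. The current display interface of the electronic device has a recognizable scenario intent.
When the current display interface of the electronic device has a recognizable scenario intent, the commonly used voice skills are the voice skills related to the recognizable scenario intent in the current display page, and the voice skill instructions used to invoke those voice skills are the commonly used voice skill instructions for the current display interface.
For example, "Xiaolongkan Hotpot Restaurant on Huadong Road" is identified from the recognizable text and/or pictures in the display interface through the NLU entity recognition interface or the HiTouch capability. Through the NLU intent recognition interface, "Xiaolongkan Hotpot Restaurant on Huadong Road" can be recognized as an address. In the current display interface, the commonly used voice skills are then the voice skills related to the address "Xiaolongkan Hotpot Restaurant on Huadong Road", and a commonly used voice skill instruction may be "Navigate to Xiaolongkan Hotpot Restaurant on Huadong Road".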
4、当前时刻。
在当前时刻下,常用的语音技能为基于当前时刻的规律行为、第三方应用或者原生应用对应的语音技能,用于调用该常用的语音技能的语音技能指令,为当前显示界面下的常用的语音技能指令。其中,基于当前时刻的规律行为是用户过去在当前时刻下经常发生的行为。第三方应用为用户从应用商店中下载的应用程序,例如“微信”等,原生应用为电子设备本身自带的系统应用,也成嵌入式应用,例如“相机”等。
示例性的,当前时刻为上午10:28,在历史语音技能使用记录中,当前时刻下,用户经常使用“语音翻译”,并使用软件“有道词典”,则常用的语音技能指令可以为“打开语音翻译”、“打开有道词典”。
5、电子设备的当前所在位置。
在电子设备的当前所在位置下,常用的语音技能为基于电子设备的当前所在位置、第三方应用或者原生应用的规律行为对应的语音技能,用于调用该常用的语音技能的语音技能指令,为当前显示界面下的常用的语音技能指令。其中,基于当前时刻的规律行为是用户过去在电子设备的当前所在位置下经常发生的行为。
示例性的,电子设备的当前所在位置为家,在历史语音技能使用记录中,电子设备的当前所在位置下,用户经常使用“腾讯视频”,则常用的语音技能指令可以为“打开腾讯视频”。
6、当前电子设备的运动状态。
在当前电子设备的运动状态下,常用的语音技能为基于当前电子设备的运动状态的规律行为、第三方应用或者原生应用对应的语音技能,用于调用该常用的语音技能的语音技能指令,为当前显示界面下的常用的语音技能指令。其中,基于当前电子设备的运动状态的规律行为是用户过去当前电子设备的运动状态下经常发生的行为。
示例性的,当前电子设备的运动状态为跑步,在历史语音技能使用记录中,电子设备的当前所在位置下,用户经常使用“音乐”,则常用的语音技能指令可以为“打开音乐”。
7、当前事件。
在当前事件下,常用的语音技能为预设事件(例如预先设定的日程、闹钟等)、或者涉及到多个应用的操作(也称关联操作)对应的语音技能,用于调用该常用的语音技能的语音技能指令,为当前显示界面下的常用的语音技能指令。其中,当前事件可以是根据当前时刻、电子设备的当前所在位置、电子设备的当前显示界面等信息项来确定的当前所发生的事件,例如若当前时刻为6:50,且电子设备所在位置为家,则可以确定当前事件为起床。关联操作是涉及到多个应用的操作,例如在播放音乐后,打开“财经新闻”应用,然后打开“支付宝”应用。
示例性的,当前事件为起床,在历史语音技能使用记录中,用户经常会使用关联操作对应的语音技能,例如关联操作为先播放天气,然后打开“音乐”应用并播放,这个关联操作对应的语音技能指令为“我起床了”。另外,用户经常预先设定起床后的日程为“早上8点学习英语”。因此,常用的语音技能指令可能为“我起床了”以及“打开’流利说英语’”等。
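The way a current event is derived from other information items can be illustrated with a small Python sketch. The text gives only one data point (6:50 at home means the event is getting up); the 6:00 to 8:00 window, the function name and the event label below are hypothetical choices made for this sketch.

```python
from datetime import time

def infer_event(now, location):
    """Derive a current event from the current moment and the device location."""
    # Hypothetical rule: an early-morning time at home is read as "wake_up".
    if location == "home" and time(6, 0) <= now <= time(8, 0):
        return "wake_up"
    return None  # no event recognised by this sketch

print(infer_event(time(6, 50), "home"))
```

A real assistant would presumably hold many such rules, or learn them from the usage records; a single rule is enough to show how several information items combine into one event.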
8、当前电子设备上运行的应用。
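For the application currently running on the electronic device, the common voice skills are those corresponding to operations with a long path (that is, operations that require the user to tap manually several times), to functions within the running application, and/or to related cross-application services; the voice skill commands that invoke these skills are the common voice skill commands for the running application.
For example, the application currently running on the electronic device is “WeChat”, and the current display interface is the “Discover” page of “WeChat”. To open the payment code from here, the user has to tap several times: first “我的” (Me), then “支付” (Pay), then “收付款” (Receive and pay), and finally “二维码收款” (QR code collection). Because this takes several operations, “Open payment code” is a long-path operation when “WeChat” is running and its “Discover” page is displayed. In this case, a related cross-application service may be “Open Music”, and a function within the running application may be “View Moments”. In summary, the common voice skill commands may be “Open payment code”, “View Moments” and/or “Open Music”.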
另外,根据第一应用场景中所包含的信息项的不同,语音助手所进行的语音技能指令的推荐(语音助手在显示界面上显示语音技能指令的过程,即语音助手进行语音技能指令的推荐的过程),可以划分为系统级推荐和模块级推荐。其中,语音助手基于包括当前时刻、电子设备当前所在位置、当前事件和电子设备的当前显示界面中的至少一个信息项的第一应用场景,对用户进行的语音技能指令的推荐为系统级推荐。语音助手基于包括当前电子设备中运行的应用的第一应用场景,对用户进行的语音技能指令的推荐为模块级推荐,也可以说是应用级推荐。
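The split between system-level and module-level recommendation can be expressed as a lookup on the information items that make up the first application scenario. The sketch below is an illustration only; the item identifiers are names invented for this example. The paragraph above does not assign the motion state to either level, so the sketch leaves it unclassified.

```python
# Information items that the text associates with each recommendation level.
SYSTEM_ITEMS = {"current_time", "device_location", "current_event", "display_interface"}
MODULE_ITEMS = {"running_app"}

def recommendation_levels(info_items):
    """Return which recommendation levels apply to a scenario's information items."""
    items = set(info_items)
    levels = set()
    if items & SYSTEM_ITEMS:
        levels.add("system")
    if items & MODULE_ITEMS:
        levels.add("module")  # also described as application-level
    return levels

print(recommendation_levels({"current_time", "running_app"}))
```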
可选的,在另一种可能的实现方式中,语音助手可以结合语音技能指令推荐算法,对历史语音技能使用记录以及第一应用场景进行分析,确定第一应用场景下的常用语音技能,并将该常用语音技能对应的语音技能指令推荐给用户。其中,语音技能指令推荐算法可以为机器学习算法等。
S303、语音助手在显示界面上显示第一应用场景下的常用的语音技能指令。
通过上述步骤S302,语音助手可以确定第一应用场景下的常用的语音技能指令,随后,语音助手在显示界面上显示所述常用的语音技能指令,以便于用户根据显示界面上显示的语音技能指令输入格式正确的语音技能指令。
在步骤S302之后,语音助手所确定的第一应用场景下的常用的语音技能指令可能会较多,语音助手无法在显示界面上显示所有的常用的语音技能指令,则在步骤S302之后,语音助手还需要确定常用的语音技能指令的显示位置,以及所述常用的语音技能指令是否在显示界面上显示。
可选的,语音助手可以根据历史语音技能使用记录以及第一应用场景,确定常用的语音技能指令调用的语音技能在第一应用场景下的出现频率。随后,语音助手根据常用的语音技能在第一应用场景下的出现频率,确定常用的语音技能的优先级,出现频率越高,则优先级越高。最后,语音助手根据常用的语音技能的优先级,确定常用的语音技能指令在显示界面上的显示位置,以及所述常用的语音技能指令是否在显示界面上显示。优先级高的常用的语音技能优先在显示界面上显示,且优先级越高的常用的语音技能对应的语音技能指令,在显示界面上的显示位置越靠上/左。
示例性的,常用的语音技能为语音技能A、语音技能B和语音技能C,在历史语音技能使用记录中,语音技能A在第一应用场景下的出现次数为5,语音技能B在第一应用场景下的出现次数为3,语音技能C在第一应用场景下的出现次数为4,则常用的语音技能A、B、C按照优先级顺序从高到低进行排序为语音技能A、语音技能C和语音技能B,例如,语音技能A、B、C对应的语音技能指令分别为“打开语音翻译”、“你能做什么”、“手机特色功能”。若显示界面上所显示的语音技能指令的数量为2个,则优先级最低的语音技能B对应的语音技能指令不显示。
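The ranking in this example can be reproduced with a short Python sketch. The function name and dictionary layout are assumptions, and the command strings are English renderings of the three commands in the example; the occurrence counts (5, 3, 4) and the two display slots come from the text.

```python
def commands_to_display(occurrences, commands, slots):
    """Rank skills by occurrence count and return the commands that fit on screen."""
    ranked = sorted(occurrences, key=lambda skill: occurrences[skill], reverse=True)
    # Only the highest-priority skills get a slot; the rest are not displayed.
    return [commands[skill] for skill in ranked[:slots]]

occurrences = {"A": 5, "B": 3, "C": 4}
commands = {"A": "Open voice translation", "B": "What can you do", "C": "Phone features"}
print(commands_to_display(occurrences, commands, 2))
```

With two slots, skill B (the lowest count) is dropped, and the list order gives the top-to-bottom or left-to-right display position.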
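In the above example, the display interface of the electronic device may be as shown in FIG. 4: on any interface of the electronic device, the voice assistant is displayed in floating state. Referring to FIG. 4, the content shown at 401, 402, 403 and 404 is all displayed floating. Optionally, the content shown at 402 and 403 may be displayed in the same card. The prompt graphic shown at 401 is a sound wave, indicating that the voice assistant is listening. Optionally, as shown at 402, the voice skill commands are displayed in the priority order of their corresponding common voice skills A, B and C (from high priority to low priority), for example “Open voice translation”, “Phone features”, “What can you do” from top to bottom (the order may of course also run from low priority to high priority, not shown in the figure). Optionally, the prompt text shown at 403, for example “You can try saying to me”, prompts the user to input a voice skill command. Optionally, the prompt graphic shown at 404 switches the voice assistant from floating state to full-screen state; in full-screen state, the ratio of the voice assistant's application interface to the overall display interface of the electronic device is 1. If the user taps the prompt graphic shown at 404, the voice assistant is displayed in full-screen state, as shown in FIG. 5.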
参照图5所示,电子设备的当前显示界面发生改变,则常用的语音技能指令及其优先级可能会发生改变,也可能与图4中的402所示的常用的语音技能指令及其优先级相同。本示例中以常用的语音技能指令及其优先级未发生改变为例进行说明,如501所示,语音技能指令按照其对应的常用的语音技能A、B、C的优先级顺序(从高优先级到低优先级)显示,例如从上到下依次显示“打开语音翻译”、“手机特色功能”、“你能做什么”(当然也可以是从下到上的语音技能指令的优先级逐渐降低,图中未示出)。可选的,用户可以通过点击501所示的各项来实现语音技能指令的输入。可选的,语音助手的“帮助”按钮如502所示,用户可以通过点击该“帮助”按钮,来熟悉语音助手的使用方法。可选的,语音助手“设置”按钮如503所示,用户可以通过点击该“设置”按钮,来打开语音助手的设置项界面,从而实现对语音助手的设置项的修改。其中,语音助手的设置项界面中包括语音助手的打开方式、用户自行设定的常用语音技能指令等。可选的,504所示的提示图形为音波,用于表示语音助手处于收音状态。可选的,504所示的音波两侧分别有按钮“1”和按钮“2”,其中,“1”和“2”用于切换语音技能指令的输入方式,示例性的,点击按钮“1”,则语音技能指令的输入方式切换为键盘输入,点击按钮“2”,则语音技能指令的输入方式切换为视频输入。
在上述示例中,语音助手也可以是以半屏态的形式显示的,此时语音助手的应用界面与电子设备的整体显示界面的比例大于0,且小于1,语音助手可以与其他应用分屏显示,也可以显示主页界面的一部分,例如主页界面上的一部分应用图标。在本示例中,以语音助手与“邮件”应用的应用界面分屏显示为例进行说明,如图6所示。参照图6所示,601所示的提示图形和提示文本用于表示语音助手处于收音状态,其中提示图形可以为音波,提示文本可以为“嗨,我在听…”。可选的,如602所示,语音技能指令按照其对应的常用的语音技能A、C、B的优先级从高到低的顺序显示,例如从左到右依次显示“打开语音翻译”、“手机特色功能”、“你能做什么”(当然也可以是从右到左的语音技能指令的优先级逐渐降低,图中未示出)。
由上述步骤S302的具体描述可知,语音助手在确定第一应用场景下的常用的语音技能指令时,可以有两种实现方式,分别为实现方式1和实现方式2。针对上述步骤S302中的实现方式2,本申请还可以提供步骤S303的另一种具体实现方式,下面以具体示例的形式对步骤S303的另一种具体实现方式进行介绍:
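The more information items the first application scenario contains, the more useful the common voice skill commands displayed by the voice assistant are. Therefore, for a common voice skill, the more information items the application scenario in which it was invoked shares with the first application scenario, the higher its priority. When two voice skills share the same number of information items with the first application scenario, the one invoked more often has the higher priority.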
示例性的,第一应用场景中包括两个信息项,这两个信息项为信息项1和信息项2,例如信息项1为当前时刻为上午10点,信息项2为当前电子设备的所在位置为公司。对于常用的语音技能A、B、C、D来说,历史语音技能使用记录中包含6条语音技能的使用记录,在这6条语音技能的使用记录中,有2条语音技能A的使用记录,2条语音技能B的使用记录,1条语音技能C的使用记录,以及1条语音技能D的使用记录,这6条使用记录分别用a1、a2、b1、b2、c、d来表示。在a1中,调用语音技能A的应用场景中包括信息项1。在a2中,调用语音技能A的应用场景中包括信息项2。在b1和b2中,调用语音技能B的应用场景中均包括信息项1和信息项2。在c中,调用语音技能C的应用场景中包括信息项1和信息项2。在d中,调用语音技能D的应用场景中包括信息项1。其中,调用语音技能B的应用场景中包括信息项1和信息项2的次数为2次,调用语音技能C的应用场景中包括信息项1和信息项2的次数为1次,而调用语音技能A和调用语音技能D时的应用场景中均只包括信息项1和信息项2中的一项。因此语音技能B、C的优先级高于语音技能A、D的优先级,且语音技能B的优先级高于语音技能C的优先级。调用语音技能A、D的应用场景中只包括信息项1和信息项2中的一项时,语音技能A被调用2次,语音技能D被调用1次,则语音技能A的优先级高于语音技能D的优先级。语音助手按照语音技能A、B、C、D的优先级顺序,先后在显示界面上显示语音技能B、C、A、D对应的语音技能指令。语音助手根据常用的语音技能的优先级顺序,确定语音技能指令在显示界面上的显示位置的方式可以参见上述示例,在此不再赘述。
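The two-level ordering in this example (first by the number of matching information items, then by the number of invocations) can be checked with a small Python sketch. The record layout, the item identifiers and the function name are assumptions made for illustration; the six records mirror a1, a2, b1, b2, c and d from the text.

```python
def rank_by_scene_match(records, first_scene):
    """records: list of (skill, info_items) pairs from the usage history.
    Returns the skills ordered from highest to lowest priority."""
    best = {}  # skill -> (max matching items, invocations at that match level)
    for skill, scene in records:
        matched = len(set(scene) & set(first_scene))
        top, times = best.get(skill, (0, 0))
        if matched > top:
            best[skill] = (matched, 1)
        elif matched == top:
            best[skill] = (top, times + 1)
    # Tuples compare by match count first, then by invocation count.
    return sorted(best, key=lambda skill: best[skill], reverse=True)

first_scene = {"item1", "item2"}
records = [
    ("A", {"item1"}), ("A", {"item2"}),                    # a1, a2
    ("B", {"item1", "item2"}), ("B", {"item1", "item2"}),  # b1, b2
    ("C", {"item1", "item2"}),                             # c
    ("D", {"item1"}),                                      # d
]
print(rank_by_scene_match(records, first_scene))
```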
需要说明的是,若在当前电子设备上,语音助手是第一次被唤醒,则语音助手无需确定第一应用场景,而是对当前用户的用户类型、网络连接状况、和/或语音助手是否能正常获取当前网络中的高频语音技能指令进行判断,并得到判断结果。随后,语音助手根据所述判断结果,在显示界面上显示相应的语音技能指令。其中,当前用户的用户类型包括新用户和老用户,网络连接状况包括网络正常连接和网络未正常连接。
示例性的,若当前用户的账号注册时长未超过预设注册时长(如6个月),则当前用户的用户类型为新用户。若当前用户的账号注册时长超过预设注册时长,或者当前用户的账号注册时长未超过预设注册时长,但当前用户在电子设备上进行了云备份恢复操作,则当前用户的用户类型为老用户。另外,网络未正常连接的情况可以包括网络连接缓慢、网络信号不佳、网络断开,或者网络故障等。
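Specifically, after the voice assistant is woken up for the first time on the current electronic device: if the current user (the user type) is a new user, the network is connected normally, and the voice assistant can obtain the high-frequency voice skill commands in the current network, the voice assistant displays those high-frequency voice skill commands on the display interface. A high-frequency voice skill command may be a voice skill command whose number of occurrences in the current network exceeds a preset count threshold. If the current user (the user type) is an old user, the voice assistant uses the usage count of each voice skill command in the historical voice skill usage records to determine the current user's common voice skill commands, namely the commands whose usage count exceeds a preset count threshold or the top n (n >= 1) commands by usage count, and then displays them on the display interface. For details of the historical voice skill usage records, see the description in step S302 above. If the network is not connected normally, the voice assistant informs the user of the network exception by text or voice, and displays on the display interface a voice skill command for opening the network system settings, which include setting items such as turning on the data connection and turning on the Wi-Fi connection. If the voice assistant cannot obtain the high-frequency voice skill commands in the current network, it displays preset voice skill commands on the display interface. Preset voice skill commands are voice skill commands set manually by developers on the dialog development platform.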
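The first-wake-up branches can be sketched as a small decision function in Python. The text lists the conditions without saying which one wins when several hold (for instance an old user with no network), so the order of the checks below is an assumption of this sketch, as are the function names and the returned labels. The 6-month default is the example value given for the preset registration duration.

```python
def is_new_user(registered_months, restored_from_cloud, preset_months=6):
    """New user: registration does not exceed the preset duration and no cloud restore."""
    return registered_months <= preset_months and not restored_from_cloud

def first_wake_source(new_user, network_ok, can_fetch_hot):
    """Pick where the commands displayed on first wake-up come from."""
    if not new_user:
        return "user_history"        # old user: common commands from usage records
    if not network_ok:
        return "network_settings"    # report the error, offer to open network settings
    if not can_fetch_hot:
        return "preset"              # developer-defined preset commands
    return "network_hot"             # high-frequency commands in the current network

print(first_wake_source(is_new_user(3, False), network_ok=True, can_fetch_hot=True))
```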
可选的,在另一种可能的实现方式中,在电子设备上,语音助手在第一次被唤醒后,若当前用户为新用户、网络正常连接、且语音助手可正常获取当前网络中的高频语音技能,则语音助手在显示界面上显示当前网络中的高频语音技能对应的语音技能指令。其中,所述高频语音技能可以是在当前网络中出现次数超过预设次数阈值的语音技能。若当前用户为老用户,则语音助手根据历史语音技能使用记录中各个语音技能的调用次数,先确定使用次数超过预设次数阈值的语音技能,或者使用次数较多的前n(n>=1)位的语音技能为当前用户的常用语音技能,然后语音助手在显示界面上显示当前用户的常用语音技能对应的语音技能指令。其中,关于历史语音技能使用记录的详细描述可以参见下述步骤S302中的描述,在此不再赘述。若网络未正常连接,则语音助手通过文本或者语音来告知用户网络异常,并在显示界面上显示用于打开网络系统设置的语音技能对应的语音技能指令。若语音助手不能正常获取当前网络中的高频语音技能,则语音助手在显示界面上显示预设语音技能对应的语音技能指令。其中,预设语音技能是开发者在对话开发平台上手动设置的语音技能。
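Similarly, if the network is not connected normally and this is not the first time the voice assistant has been woken up on the current electronic device, the voice assistant displays a text message on the display interface, for example “Network error”, to inform the user that the network is not connected normally. Optionally, while displaying the text message “Network error”, the voice assistant may also play it by voice. Optionally, the voice assistant may also display on the display interface a voice skill command for opening the network system settings, which include setting items such as turning on the data connection and turning on the Wi-Fi connection.
This application provides a voice interaction method. After being woken up, the voice assistant determines a first application scenario from one or more information items, and then determines the common voice skill commands in the first application scenario from that scenario and the historical voice skill usage records. In other words, the voice assistant uses the user's habits and the current application scenario to determine which voice skill commands the user is likely to input in the current scenario. Finally, the voice assistant displays these common voice skill commands on the display interface. In this way, the voice assistant recommends the voice skill commands commonly used in the first application scenario to the user, achieving scenario-based recommendation of voice skill commands. The user can then invoke the desired voice skill by following the recommended commands, which reduces cases in which the voice assistant fails to recognize the user's voice skill command or fails to invoke a voice skill from it, and improves the interaction experience between the user and the voice assistant.
To further realize scenario-based recommendation of voice skill commands and help the user input correct voice skill commands, this application further provides a voice interaction method. As shown in FIG. 3(b), after step S303 the method further includes step S304: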
S304、语音助手响应于用户输入的语音技能指令,调用与用户输入的语音技能指令对应的语音技能。
可选的,语音助手通过语音交互、键盘交互、视频交互等形式接收用户输入的语音技能指令。其中,该语音技能指令是用户根据显示界面上显示的语音技能指令(即语音助手推荐给用户的语音技能指令)输入的,也可以是用户自行输入的。随后,语音助手响应于用户输入的语音技能指令,调用与该语音技能指令对应的语音技能。若语音助手调用语音技能失败,则语音助手与用户之间进行多轮对话,提示用户输入与完成语音技能的调用相关的其他语音技能指令。可选的,语音助手在接收到用户输入的语音技能指令后,根据历史语音技能使用记录中调用语音技能的时间,确定用户在输入该语音技能指令之后输入的其他语音技能指令。随后,语音助手重新确定所述其他语音技能指令为在第一应用场景下的常用语音技能指令,并在显示界面上显示。
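For example, after receiving a voice skill command from the user such as “Set an alarm”, the voice assistant cannot determine the alarm time. It therefore cannot invoke the corresponding voice skill from this command and deliver the service the skill provides, so it needs several rounds of voice interaction with the user, that is, a multi-turn dialog, to determine the alarm time, as shown in FIG. 7. Referring to FIG. 7, the voice assistant can use the prompt text shown at 701, for example “What time do you want to set the alarm for”, to prompt the user for the alarm time. In addition, from the times at which voice skills were invoked in the historical voice skill usage records, the voice assistant can determine that after inputting “Set an alarm” the user often goes on to input “8 a.m.” or “8 p.m.”, so the other voice skill commands related to “Set an alarm” shown at 702 are “8 a.m.”, “8 p.m.” and so on. The sound wave shown at 703 indicates that the voice assistant is listening; at this point the user can input a new voice skill command, for example “8 a.m.”, based on the commands shown at 702. The voice assistant can then invoke the voice skill successfully from the two commands “Set an alarm” and “8 a.m.”, completing the operation “set an alarm for 8 a.m.”. The voice assistant can then stop listening and enter the sleep state. Note that the user may also input another related command as needed, such as “7 a.m.”; the voice assistant then invokes the voice skill from “Set an alarm” and “7 a.m.”, completing the operation “set an alarm for 7 a.m.”.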
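One way to derive follow-up suggestions from invocation times is to count which command most often comes directly after a given command in the history. The Python sketch below illustrates that idea only; the text does not specify the method, and the `(timestamp, command)` layout, the function name and the sample history are all hypothetical.

```python
from collections import Counter

def follow_up_commands(history, command, top_n=2):
    """history: list of (timestamp, command) pairs from the usage records."""
    ordered = sorted(history)  # order records by invocation time
    followers = Counter(
        nxt for (_, cur), (_, nxt) in zip(ordered, ordered[1:]) if cur == command
    )
    # The most frequent immediate successors become the suggested follow-ups.
    return [cmd for cmd, _ in followers.most_common(top_n)]

# Hypothetical history: timestamps are plain integers for brevity.
history = [
    (1, "Set an alarm"), (2, "8 a.m."),
    (10, "Set an alarm"), (11, "8 p.m."),
    (20, "Set an alarm"), (21, "8 a.m."),
    (30, "Play music"),
]
print(follow_up_commands(history, "Set an alarm"))
```

A production version would presumably also bound the time gap between the two commands so that unrelated later commands are not counted as follow-ups.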
在另一种可能的实现方式中,若语音助手调用语音技能失败,则语音助手与用户之间进行多轮对话,提示用户输入与完成语音技能的调用相关的其他语音技能指令。可选的,语音助手根据开发者在相应的对话节点上手动设置的其他语音技能指令,重新确定这些其他语音技能指令为第一应用场景下的常用语音技能指令,并显示在显示界面上。
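For example, after receiving a voice skill command from the user such as “Make a call”, the voice assistant cannot determine who to call. It therefore cannot invoke the corresponding voice skill from this command and deliver the service the skill provides, so it needs several rounds of voice interaction with the user to determine the call recipient, as shown in FIG. 8. Referring to FIG. 8, the voice assistant can use the prompt text shown at 801, for example “Who do you want to call”, to prompt the user for the call recipient. In addition, from the user's command “Make a call” the voice assistant can determine the corresponding dialog node and the other voice skill commands related to “Make a call” that the developer set manually on that node, for example “Mom” or “Wang Xiaohei”. The other voice skill commands shown at 802 are therefore “Mom”, “Wang Xiaohei” and so on. The sound wave shown at 803 indicates that the voice assistant is listening; at this point the user can input a new voice skill command, for example “Wang Xiaohei”, based on the commands shown at 802. The voice assistant can then invoke the voice skill successfully from the two commands “Make a call” and “Wang Xiaohei”, completing the operation “call Wang Xiaohei”. The voice assistant can then stop listening and enter the sleep state. Note that the user may also input another related command as needed, such as “Zhang Xiaobai”; the voice assistant then invokes the voice skill from “Make a call” and “Zhang Xiaobai”, completing the operation “call Zhang Xiaobai”.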
可选的,若语音助手响应于用户输入的语音技能指令,成功调用与该语音技能指令对应的语音技能,则语音助手完成该语音技能对应的操作后,停止收音,进入休眠状态。或者,语音助手在通过语音助手的多轮语音交互所确定的信息,成功实现语音技能的调用,并完成该语音技能对应的操作后,停止收音,进入休眠状态。
可选的,语音助手在进入休眠状态后,还可以重新确定第一应用场景,并根据上述步骤S302-步骤S304的技术方案,在显示界面上显示推荐给用户的语音技能指令。可选的,语音助手进入休眠状态后,也可以接收用户输入的特定的语音技能指令。
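For example, in a driving scenario, after the voice assistant receives the voice skill command “Navigate to Haidilao” from the user, the display interface of the electronic device is as shown in FIG. 9(a). Referring to FIG. 9(a), the prompt text shown at 901, for example “Driving scenario entered”, indicates that the current scenario is the driving scenario. If the user taps the prompt graphic shown at 901, for example “×”, the driving scenario is exited. The prompt text shown at 902 is the voice skill command input by the user, for example “Navigate to Haidilao”. The prompt graphic shown at 903 is a floating ball, indicating that the voice assistant has stopped listening and is in the sleep state. Searching on the command “Navigate to Haidilao” returns several destinations, so the voice assistant cannot determine the destination, and cannot invoke the corresponding voice skill from this command and deliver its service. The voice assistant therefore needs several rounds of voice interaction with the user to determine the destination and carry out the navigation; the display interface is then as shown in FIG. 9(b). Referring to FIG. 9(b), the content shown at 901 and 902 is the same as in FIG. 9(a) and is not described again. The prompt graphic shown at 903 switches to a sound wave, indicating that the voice assistant is listening and the user can input a voice skill command. The prompt shown at 904, for example “Found several destinations. Which one do you want to navigate to?”, tells the user that there are several destinations nearby and prompts the user to choose. Optionally, while displaying the prompt shown at 904, the voice assistant may also play it by voice. The content shown at 905 is the nearby destinations the voice assistant found from the user's command; in practice, the number of candidate destinations found may differ from the 5 shown in the figure. The user can tap any row of the search results shown at 905, for example the row containing “Haidilao Hot Pot (Zhongshan South Road branch)”, to set that branch as the destination. The content shown at 906 is the voice skill commands recommended to the user, for example “The first one”, “The fifth one”, “Next page” and “Cancel”. The user can also tap “The first one” shown at 906, or say it to the voice assistant, to set the destination to “Haidilao Hot Pot (Zhongshan South Road branch)”. If “Haidilao Hot Pot (Shangyuan Street branch)” is the user's destination, the user can also say directly to the voice assistant “The 4th one”, “Navigate to Haidilao Hot Pot (Shangyuan Street branch)” or “Navigate to the 4th destination”. If the user does not find the correct destination among the addresses shown at 905, the user can tap “Next page” among the commands shown at 906, or say it to the voice assistant, to view further search results; alternatively, the user can tap or say “Cancel” and then give the voice assistant a new navigation command. After the user inputs “The first one” based on the commands shown at 906, the voice assistant sets the destination to “Haidilao Hot Pot (Zhongshan South Road branch)” and starts navigating. The voice assistant then enters the sleep state, and the display interface is as shown in FIG. 9(c). Referring to FIG. 9(c), the content shown at 901, 902 and 904 is the same as in FIG. 9(b). The prompt graphic shown at 903 switches to a floating ball, indicating that the voice assistant has entered the sleep state. In addition, 907 shows a prompt, for example “You can say to me directly: exit navigation, play music, call…, or wake me up with Xiaoyi Xiaoyi”, together with the voice skill commands that the voice assistant recommends based on the re-determined first application scenario and the historical voice skill usage records, for example “Exit navigation”, “Play music” or “Call…”, which tell the user what can be input in the current scenario. Following the prompt and commands shown at 907, the user can exit the driving scenario or perform other operations through voice interaction with the voice assistant. The text shown at 908 is the voice skill command “The first one” that the user input based on the commands shown at 906.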
可选的,若语音助手在被唤醒后,由于麦克风异常或者用户未输入语音技能指令等原因,未在第一预设时间段内接收到用户输入的语音技能指令,则语音助手确定第二应用场景。随后,语音助手根据第二应用场景以及历史语音技能使用记录,确定第二应用场景下常用的语音技能指令,并且语音助手在显示界面上显示第二应用场景下的常用的语音技能指令。其中,第一预设时间段可以由用户根据实际应用需求来确定,也可以由语音助手预先设定。另外,关于第二应用场景的介绍以及确定第二应用场景下的常用的语音技能指令并显示的具体实现过程,可以参见上述内容中对于第一应用场景的描述,以及确定第一应用场景下的常用的语音技能指令并显示的具体实现过程的描述,在此不再赘述。
可选的,若语音助手在被唤醒后,未在第二预设时间段内接收到用户输入的语音技能指令,则语音助手关闭。其中,第二预设时间段长于第一预设时间段,且第二预设时间段与第一预设时间段从同一时间点开始计时。第二预设时间段可以由用户根据实际应用需求来确定,也可以由语音助手预先设定。通过该过程,可以减少因用户误触唤醒语音助手而造成的资源浪费。
可选的,在另一种可能的实现方式中,语音助手在第一预设时间段后确定第二应用场景,并在显示界面上显示第二应用场景下的常用语音技能指令后。若语音助手在第二预设时间段内未接收到用户输入的语音技能指令,则语音助手关闭。此时,第二预设时间段位于第一预设时间段之后,且第二预设时间段可以长于第一预设时间段,也可以短于第一预设时间段。第一预设时间段和第二预设时间段均可由用户根据实际应用需求来确定,也可由语音助手预先设定。
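The two timing variants described above (both periods counted from the same starting point, or the second period starting when the first one ends) can be captured in one small Python function. This is a sketch under stated assumptions: the function name, the returned labels and the example durations are hypothetical, and real code would be driven by timers rather than by an elapsed-time argument.

```python
def idle_action(idle_s, first_period_s, second_period_s, sequential=False):
    """Decide what the assistant does after idle_s seconds without user input.

    sequential=False: both periods start at wake-up (second must be longer).
    sequential=True:  the second period starts when the first one ends.
    """
    close_at = first_period_s + second_period_s if sequential else second_period_s
    if idle_s >= close_at:
        return "close"                    # second preset period elapsed
    if idle_s >= first_period_s:
        return "recommend_second_scene"   # first preset period elapsed
    return "wait"

print(idle_action(6, first_period_s=5, second_period_s=20))
```

In the sequential variant the second period may be shorter than the first, which is why the closing time is computed as a sum in that branch.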
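Optionally, after being woken up, the voice assistant may be unable to invoke the voice skill corresponding to a voice skill command, either because the requested voice skill is outside the range of skills the voice assistant can invoke or because the voice assistant did not correctly recognize the command. In this case, the voice assistant determines a second application scenario, then determines the common voice skill commands in the second application scenario from that scenario and the historical voice skill usage records, and displays them on the display interface, so as to recommend again the voice skill commands the user may want to use. For the second application scenario and the specific process of determining and displaying its common voice skill commands, see the description above of the first application scenario and of determining and displaying its common voice skill commands; details are not repeated here.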
需要说明的是,语音助手会自动记录用户输入的语音技能指令、该语音技能指令调用的语音技能、该语音技能指令调用语音技能的时间,以及当前的应用环境,并将这些信息存入历史语音技能使用记录中,以进一步提升语音助手向用户推荐的语音技能指令的实用性。
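A record of the kind described here could look like the following Python sketch. The class name, field names and helper function are assumptions made for illustration; only the four pieces of information (the command, the skill it invoked, the invocation time and the application environment) come from the text.

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class SkillUsageRecord:
    command: str          # voice skill command the user input
    skill: str            # voice skill that the command invoked
    invoked_at: datetime  # time of the invocation
    scene: dict = field(default_factory=dict)  # application environment (information items)

def log_usage(history, command, skill, scene, now=None):
    """Append one usage record to the history and return it."""
    record = SkillUsageRecord(command, skill, now or datetime.now(), dict(scene))
    history.append(record)
    return record

history = []
log_usage(history, "Open Music", "music", {"motion_state": "running"},
          now=datetime(2019, 9, 30, 10, 28))
print(history[0])
```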
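In the voice interaction method provided in the embodiments of this application, the voice assistant can, in response to a voice skill command input by the user, invoke the voice skill corresponding to that command. If invoking the voice skill from the user's command fails, the voice assistant recommends voice skill commands to the user again and displays them on the display interface, so that the user can follow the recommended commands to invoke the corresponding voice skill. This improves the interaction experience between the user and the voice assistant and reduces cases in which the voice assistant fails to recognize the user's voice skill command or fails to invoke a voice skill from it.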
在现有技术中,语音交互与传统的触摸操作之间相互独立,用户可能不会因为要完成一项操作而专门调用语音交互。因此,为了实现语音交互与触摸交互的协同工作,本申请还提供了一种语音交互方法,用户可以在语音助手根据接收到的语音技能指令完成相应的操作,并得到反馈结果后,将反馈结果分享至其他应用。下面以语音助手以半屏态形式与其它应用分屏显示为例,其中,语音助手的半屏态形式即语音助手的显示界面与电子设备的整体显示界面的比例大于0且小于1,对本申请实施例所提供的语音交互方法进行说明,如图3的(c)所示,该方法中包括步骤S305-S306:
S305、语音助手将语音技能指令的反馈结果显示在语音助手的显示界面上。
语音助手根据用户输入的语音技能指令,调用该语音技能指令对应的语音技能,完成相应的操作后,得到该语音技能指令的反馈结果。可选地,语音助手将得到的反馈结果以卡片的形式显示。
S306、语音助手响应于用户的操作指令,将语音技能指令的反馈结果分享给其他应用。
其中,用户的操作指令包括按压操作和拖动操作。
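Optionally, the voice assistant selects the card carrying the feedback result in response to the user pressing it. Then, in response to the user dragging the card, the voice assistant moves the card to the user interface of another application. Once the card is selected, it is displayed floating, and a lighter-colored copy of it remains at its original position. The user must keep pressing until the card has been dragged into the other application's interface to complete the share. If the user lets go before the card reaches the other application's user interface, the card springs back: it is deselected, the display interface of the electronic device is restored to its state before the card was selected, and the share fails. If the user lets go once the card is over the other application's user interface, the share succeeds. The data form in which the card is shared to the other application is determined by the type of the card content, for example picture, text or link.
For example, the voice assistant is displayed in half-screen state, split-screen with “Mail” on the electronic device; the ratio of the voice assistant's display interface to that of the “Mail” application is 5:3, with the voice assistant on top and “Mail” below. After receiving a voice skill command from the user, for example “What's the weather like in Shanghai this weekend”, the voice assistant invokes the corresponding voice skill to query the weather and obtains a feedback result. The display interface of the electronic device is then as shown in FIG. 10(a). Referring to FIG. 10(a), the prompt graphic shown at 1001 is a floating ball, indicating that the voice assistant is in the sleep state and has stopped listening. The prompt text shown at 1001 is the feedback text obtained after the weather query, “Sunny in Shanghai this weekend”. Optionally, the card shown at 1002 is the feedback card obtained after the weather query, containing more detailed weather information for Shanghai this weekend. Optionally, the keywords shown at 1003 are common voice skill commands recommended to the user, for example “Will it rain today” and “Tomorrow's weather”. Long-pressing the feedback card shown at 1002 for a preset duration, for example 0.5 s, changes the display interface to that shown in FIG. 10(b). Referring to FIG. 10(b), the content shown at 1001, 1002 and 1003 is the same as in FIG. 10(a). As shown at 1004, the feedback card originally shown at 1002 is scaled down and displayed floating, and the “Mail” interface is highlighted to prompt the user to drag the floating card onto it. After the user has dragged the card some distance, as shown in FIG. 10(c), the floating card shown at 1004 has not yet reached the “Mail” interface; if the user lets go now, the share fails and the floating card springs back, and the display interface returns to that shown in FIG. 10(a). If the user drags the floating card onto the “Mail” interface, as shown in FIG. 10(d), and then lets go, the Shanghai weather information in the card can be shared to “Mail” as a picture. After the share succeeds, the display interface is as shown in FIG. 10(a), except that the “Mail” interface differs from its state before the share, which is not shown here.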
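The press, drag and release behavior in this example can be summarized as a small outcome function. The Python sketch below is an illustration only: the function name, the rectangle representation of the other application's area and the returned labels are assumptions, while the 0.5 s value is the example long-press duration given in the text.

```python
LONG_PRESS_S = 0.5  # example preset long-press duration from the text

def share_by_drag(press_s, release_point, target_area, content_type):
    """Return the outcome of a press-and-drag gesture on a feedback card."""
    if press_s < LONG_PRESS_S:
        return "not_selected"                # press too short: card is never picked up
    x, y = release_point
    left, top, right, bottom = target_area
    if left <= x <= right and top <= y <= bottom:
        return "shared_as_" + content_type   # released over the other application
    return "bounced_back"                    # released early: card returns, share fails

mail_area = (0, 500, 300, 800)  # hypothetical screen rectangle of the other app
print(share_by_drag(0.6, (100, 600), mail_area, "image"))
```

Passing the content type through to the result mirrors the statement that the shared data form (picture, text or link) depends on what the card contains.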
需要说明的是,若语音助手以全屏态形式显示,即语音助手的显示界面与电子设备的整体显示界面的比例为1,或者语音助手以悬浮态形式在电子设备的显示界面上显示,即语音助手在显示界面上以悬浮球(停止收音)或者音波(收音)的形式显示时,用户可以通过长按卡片,并对长按卡片后出现的选择项进行点击的方式,来完成卡片内容的分享。其中,卡片内容分享的数据形式取决于卡片内容的类型,例如图片、文本、链接等。
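For example, the voice assistant is displayed in floating state. After receiving a voice skill command from the user, for example “Search for intellectual property”, the voice assistant invokes the corresponding voice skill to look up an introduction to intellectual property and obtains a feedback result. After the voice assistant receives the command, the display interface of the electronic device is as shown in FIG. 11(a). Referring to FIG. 11(a), the voice assistant is displayed in floating state, and the prompt graphic shown at 1101 is a floating ball, indicating that the voice assistant is in the sleep state and has stopped listening. The prompt text shown at 1102, for example “Search for intellectual property”, indicates the voice skill command received from the user. The voice assistant then completes the search, and the display interface is as shown in FIG. 11(b). Referring to FIG. 11(b), the prompt graphic shown at 1101 is still a floating ball, and the card containing the feedback text obtained from the search is shown at 1103; optionally, the voice assistant plays the content of the card shown at 1103 by voice: “Intellectual property, also called knowledge ownership…”. Long-pressing the card shown at 1103 (for example for 0.5 s) pops up an options card, as shown in FIG. 11(c). Referring to FIG. 11(c), the options card shown at 1104 contains operations the user may want to perform on the content of the card shown at 1103, for example “Copy”, “Select” and “Share”. If the user taps “Share” in the options card, the voice assistant recommends applications to share to, for example “WeChat”, “QQ” and “Email”; then, as in the prior art, the user can tap to choose to share the card content to another application in the form of a link.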
当然,用户也可以直接向语音助手下发“通过微信分享搜索结果”的语音技能指令,通过语音助手与用户之间的语音交互,来实现内容的分享。
通过上述过程,在语音助手调用语音技能,完成相应的操作后,用户可以通过对语音助手得到的反馈内容进行操作,将反馈内容分享至其他应用,从而实现语音交互与触摸交互的协同工作,提升用户体验。
本申请实施例可以根据上述方法示例对上述终端等进行功能模块的划分,例如,可以对应各个功能划分各个功能模块,也可以将两个或两个以上的功能集成在一个处理模块中。上述集成的模块既可以采用硬件的形式实现,也可以采用软件功能模块的形式实现。需要说明的是,本申请实施例中对模块的划分是示意性的,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式。
在采用集成的单元的情况下,图12示出了上述实施例中所涉及的电子设备的一种可能的结构示意图。该电子设备1200包括:处理模块1201、存储模块1202和显示模块1203。处理模块1201用于对电子设备1200的动作进行控制管理。显示模块1203用于显示处理模块1201生成的图像。存储模块1202,用于保存终端的程序代码和数据。例如,存储模块1202中保存有终端中注册的预置唤醒词,以及第一声纹模型,所述第一声纹模型用于在唤醒所述语音助手时进行声纹校验,所述第一声纹模型表征所述预置唤醒词的声纹特征。可选的,电子设备1200还可以包括通信模块用于支持终端与其他网络实体的通信。电子设备1200包括的各个单元的详细描述可以参考上述各方法实施例中的描述,这里不再赘述。
其中,处理模块1201可以是处理器或控制器,例如可以是中央处理器(central processing unit,CPU),通用处理器,数字信号处理器(digital signal processor,DSP),专用集成电路(application-specific integrated circuit,ASIC),现场可编程门阵列(field programmable gate array,FPGA)或者其他可编程逻辑器件、晶体管逻辑器件、硬件部件或者其任意组合。其可以实现或执行结合本申请公开内容所描述的各种示例性的逻辑方框,模块和电路。所述处理器也可以是实现计算功能的组合,例如包含一个或多个微处理器组合,DSP和微处理器的组合等等。通信模块可以是收发器、收发电路或通信接口等。存储模块1202可以是存储器。
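When the processing module 1201 is a processor (such as the processor 110 shown in FIG. 1), the communication module includes a Wi-Fi module and a Bluetooth module (such as the mobile communication module 150 and the wireless communication module 160 shown in FIG. 1; communication modules such as the Wi-Fi module and the Bluetooth module may be collectively called a communication interface), the storage module 1202 is a memory (such as the internal memory 121 shown in FIG. 1 and an external SD card connected to the electronic device 1200 through the external memory interface 120), and the display module 1203 is a touchscreen (including the display 194 shown in FIG. 1), the terminal provided in the embodiments of this application may be the electronic device 100 shown in FIG. 1. The processor, communication interface, touchscreen and memory may be coupled together through a bus.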
本申请实施例还提供一种芯片系统,该芯片系统包括至少一个处理器1301和至少一个接口电路1302。处理器1301和接口电路1302可通过线路互联。例如,接口电路1302可用于从其它装置(例如电子设备100的存储器)接收信号。又例如,接口电路1302可用于向其它装置(例如处理器1301)发送信号。示例性的,接口电路1302可读取存储器中存储的指令,并将该指令发送给处理器1301。当所述指令被处理器1301执行时,可使得电子设备执行上述实施例中的电子设备100(比如,手机)执行的各个步骤。当然,该芯片系统还可以包含其他分立器件,本申请实施例对此不作具体限定。
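An embodiment of this application further provides a computer storage medium that includes computer instructions. When the computer instructions run on an electronic device, the electronic device performs the related method steps shown in FIG. 3 (a), (b) and (c), such as S301, S302, S303, S304, S305 and S306, to implement the voice interaction method in the foregoing embodiments.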
本申请实施例还提供了一种包含指令的计算机程序产品,当该计算机程序产品在计算机上运行时,使得计算机执行如图3的(a)、(b)和(c)中的相关方法步骤,如S301、S302、S303、S304、S305、S306,实现上述实施例中的语音交互方法。
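An embodiment of this application further provides a voice interaction apparatus that has the function of implementing the behavior of the voice assistant in the electronic device in the methods above. The function may be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the function.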
其中,本申请实施例提供的电子设备、计算机存储介质或计算机程序产品均用于执行上文所提供的对应的方法,因此,其所能达到的有益效果可参考上文所提供的对应的方法中的有益效果,此处不再赘述。
所属领域的技术人员可以清楚地了解到,为描述的方便和简洁,上述描述的系统、装置和单元的具体工作过程,可以参考前述方法实施例中的对应过程,在此不再赘述。
上述实施例可以全部或部分通过软件,硬件,固件或者其任意组合实现。当使用软件程序实现时,上述实施例可以全部或部分地以计算机程序产品的形式出现,计算机程序产品包括一个或多个计算机指令。在计算机上加载和执行计算机程序指令时,全部或部分地产生按照本申请实施例的流程或功能。
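The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server or data center to another website, computer, server or data center in a wired manner (for example coaxial cable, optical fiber, or digital subscriber line (DSL)) or a wireless manner (for example infrared, radio, or microwave). The computer-readable storage medium may be any usable medium accessible to a computer, or a data storage device such as a server or data center that integrates one or more usable media. The usable medium may be a magnetic medium (for example a floppy disk, hard disk, or magnetic tape), an optical medium (for example a DVD), or a semiconductor medium (for example a solid state disk (SSD)).
From the description of the foregoing implementations, a person skilled in the art will clearly understand that, for convenience and brevity of description, only the division into the functional modules above is used as an example. In practice, the functions may be assigned to different functional modules as needed; that is, the internal structure of the apparatus may be divided into different functional modules to complete all or some of the functions described above.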
在本申请所提供的几个实施例中,应该理解到,所揭露的装置和方法,可以通过其它的方式实现。例如,以上所描述的装置实施例仅仅是示意性的,例如,所述模块或单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个装置,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或单元的间接耦合或通信连接,可以是电性,机械或其它的形式。
所述作为分离部件说明的单元可以是物理上分开的,或者也可以不是物理上分开的,作为单元显示的部件可以是一个物理单元或多个物理单元,即可以位于一个地方,或者也可以分布到多个不同地方。在应用过程中,可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。
另外,在本申请各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。上述集成的单元既可以采用硬件的形式实现,也可以采用软件功能单元的形式实现。
所述集成的单元如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储介质中。基于这样的理解,本申请实施例的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一个设备(可以是个人计算机,服务器,网络设备,单片机或者芯片等)或处理器(processor)执行本申请各个实施例所述方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、只读存储器(read-only memory,ROM)、随机存取存储器(random access memory,RAM)、磁碟或者光盘等各种可以存储程序代码的介质。
以上所述,仅为本申请的具体实施方式,但本申请的保护范围并不局限于此,任何在本申请揭露的技术范围内的变化或替换,都应涵盖在本申请的保护范围之内。
Claims (25)
- 一种语音交互方法,应用于电子设备中,所述电子设备安装有语音助手,其特征在于,包括:所述语音助手在被唤醒之后,根据一个或多个信息项,确定第一应用场景;所述信息项包括电子设备的当前显示界面、当前时刻、电子设备当前所在位置、当前电子设备的运动状态、当前事件、或者当前电子设备上运行的应用;所述语音助手根据所述第一应用场景,以及历史语音技能使用记录,确定所述第一应用场景下的常用的语音技能指令,所述语音技能指令用于调用语音技能,所述语音技能为语音助手提供的服务,所述历史语音技能使用记录包括一个或多个记录,所述记录用于指示调用语音技能的时间、调用语音技能的语音技能指令,以及调用语音技能的应用场景;语音助手在显示界面上显示所述第一应用场景下的常用的语音技能指令。
- 根据权利要求1所述的语音交互方法,其特征在于,在所述语音助手被第一次唤醒的情况下,所述方法还包括:若当前用户为新用户、网络正常连接,且所述语音助手能正常获取当前网络中的高频语音技能指令,则所述语音助手在显示界面上显示所述高频语音技能指令;若所述当前用户为老用户,则所述语音助手根据所述历史语音技能使用记录,确定所述当前用户的常用语音技能指令,所述语音助手在显示界面上显示所述当前用户的常用语音技能指令;若网络未正常连接,则所述语音助手告知用户网络异常,并在显示界面上显示用于打开网络系统设置的语音技能指令;若所述语音助手不能正常获取当前网络中的高频语音技能,则语音助手在显示界面上显示预设语音技能指令。
- 根据权利要求1或2所述的语音交互方法,其特征在于,在所述语音助手根据所述第一应用场景,以及历史语音技能使用记录,确定所述第一应用场景下的常用的语音技能指令之后,所述方法还包括:所述语音助手根据所述历史语音技能使用记录以及所述第一应用场景,确定常用的语音技能在所述第一应用场景下的出现频率,所述常用的语音技能与所述常用的语音技能指令对应;所述语音助手根据所述常用的语音技能指令在所述第一应用场景下的出现频率,确定所述常用的语音技能的优先级;所述语音助手根据所述常用的语音技能的优先级,确定所述常用的语音技能指令在显示界面上的位置。
- 根据权利要求1-3中任一项所述的语音交互方法,其特征在于,所述方法还包括:语音助手响应于用户输入的语音技能指令,调用与所述用户输入的语音技能指令对应的语音技能。
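- The voice interaction method according to claim 4, wherein the method further comprises: when the voice assistant fails to invoke a voice skill, re-determining, according to the times at which voice skills were invoked in the historical voice skill usage records, the common voice skill commands in the first application scenario, and displaying them on the display interface.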
- 根据权利要求1-5中任一项所述的语音交互方法,其特征在于,所述方法还包括:若所述语音助手在第一预设时间段内未接收到用户输入的语音技能指令,则所述语音助手确定第二应用场景;所述语音助手根据第二应用场景,以及历史语音技能使用记录,确定第二应用场景下的常用的语音技能指令;所述语音助手在显示界面上显示所述第二应用场景下的常用语音技能指令。
- 根据权利要求1-6中任一项所述的语音交互方法,其特征在于,所述方法还包括:若语音助手在第二预设时间段内未接收到用户输入的语音技能指令,则语音助手关闭。
- 根据权利要求1-7中任一项所述的语音交互方法,其特征在于,所述方法还包括:若网络未正常连接,则语音助手告知用户网络异常,并在显示界面上显示用于打开网络系统设置的语音技能指令。
- 根据权利要求1-8中任一项所述的语音交互方法,其特征在于,常用的语音技能指令用于调用显示界面中能够被点击的控件对应的语音技能;和/或,所述常用的语音技能指令用于调用显示界面中可识别的文本或图片对应的语音技能;和/或,所述常用的语音技能指令用于调用显示界面中可识别的场景意图对应的语音技能;和/或,所述常用的语音技能指令用于调用基于当前时刻的规律行为对应的语音技能;和/或,所述常用的语音技能指令用于调用基于电子设备当前所在位置的规律行为对应的语音技能;和/或,所述常用的语音技能指令用于调用基于当前电子设备的运动状态的规律行为对应的语音技能;和/或,所述常用的语音技能指令用于调用原生应用对应的语音技能;和/或,所述常用的语音技能指令用于调用第三方应用对应的语音技能;和/或,所述常用的语音技能指令用于调用预设事件对应的语音技能;和/或,所述常用的语音技能指令用于调用涉及到多个应用的操作对应的语音技能;和/或,所述常用的语音技能指令用于调用路径较长的操作对应的语音技能;和/或,所述常用的语音技能指令用于调用当前电子设备上运行的应用中的功能对应的语音技能。
- 根据权利要求1-9中任一项所述的语音交互方法,其特征在于,语音助手以半屏态显示,所述半屏态为语音助手的应用界面与电子设备的整体显示界面的比例大于0且小于1;所述方法包括:语音助手将语音技能指令的反馈结果显示在语音助手的用户界面上;语音助手响应于用户的操作指令,将语音技能指令的反馈结果分享给其他应用。
- 根据权利要求10所述的语音交互方法,其特征在于,所述语音助手将语音技能指令的反馈结果显示在语音助手的用户界面上,包括:语音助手将所述反馈结果以卡片的形式显示在语音助手的用户界面上;所述语音助手响应于用户的操作指令,将语音技能指令的反馈结果分享给其他应用,包括:语音助手响应于用户对所述卡片的按压操作,选中所述卡片;语音助手响应于用户对所述卡片的拖动操作,将所述卡片从语音助手的用户界面拖动到所述其他应用的用户界面。
- 一种电子设备,所述电子设备上安装有语音助手,其特征在于,所述电子设备包括处理器、存储器和显示器;所述存储器、所述显示器与所述处理器耦合;所述显示器用于显示所述处理器生成的图像;所述存储器用于存储计算机程序代码;所述计算机程序代码包括计算机指令,当所述处理器执行上述计算机指令时,所述处理器,用于所述语音助手在被唤醒之后,根据一个或多个信息项,确定第一应用场景;所述信息项包括电子设备的当前显示界面、当前时刻、电子设备当前所在位置、当前电子设备的运动状态、当前事件、或者当前电子设备上运行的应用;所述处理器,用于根据所述第一应用场景,以及历史语音技能使用记录,确定所述第一应用场景下的常用的语音技能指令;所述语音技能指令用于调用语音技能,所述语音技能为语音助手提供的服务,所述历史语音技能使用记录包括一个或多个记录,所述记录用于指示调用语音技能的时间、调用语音技能的语音技能指令,以及调用语音技能的应用场景;所述处理器,用于在显示界面上显示所述第一应用场景下的常用的语音技能指令。
- 根据权利要求12所述的电子设备,其特征在于,在所述语音助手被第一次唤醒的情况下,所述处理器还用于:若当前用户为新用户、网络正常连接,且所述语音助手能正常获取当前网络中的高频语音技能指令,则所述语音助手在显示界面上显示所述高频语音技能指令;若所述当前用户为老用户,则所述语音助手根据所述历史语音技能使用记录,确定所述当前用户的常用语音技能指令,所述语音助手在显示界面上显示所述当前用户的常用语音技能指令;若网络未正常连接,则所述语音助手告知用户网络异常,并在显示界面上显示用于打开网络系统设置的语音技能指令;若所述语音助手不能正常获取当前网络中的高频语音技能,则语音助手在显示界面上显示预设语音技能指令。
- 根据权利要求12或13所述的电子设备,其特征在于,所述处理器,用于根据所述第一应用场景,以及历史语音技能使用记录,确定所述第一应用场景下的常用的语音技能指令之后,所述处理器还用于根据所述历史语音技能使用记录以及所述第一应用场景,确定常用的语音技能在所述第一应用场景下的出现频率,所述常用的语音技能与所述常用的语音技能指令对应;所述处理器,还用于根据所述常用的语音技能指令在所述第一应用场景下的出现频率,确定所述常用的语音技能的优先级;所述处理器,还用于根据所述常用的语音技能的优先级,确定所述常用的语音技能指令在显示界面上的位置。
- 根据权利要求12-14中任一项所述的电子设备,其特征在于,所述处理器,还用于响应于用户输入的语音技能指令,调用与所述用户输入的语音技能指令对应的语音技能。
- 根据权利要求15所述的电子设备,其特征在于,所述处理器还用于在调用语音技能失败的情况下,根据所述历史语音技能使用记录中调用语音技能的时间,重新确定所述第一应用场景下的常用语音技能指令,并在显示界面上显示。
- 根据权利要求12-16中任一项所述的电子设备,其特征在于,所述处理器还用于,若所述语音助手在第一预设时间段内未接收到用户输入的语音技能指令,则所述语音助手确定第二应用场景;所述语音助手根据第二应用场景,以及历史语音技能使用记录,确定第二应用场景下的常用的语音技能指令;所述语音助手在显示界面上显示所述第二应用场景下的常用语音技能指令。
- 根据权利要求12-17中任一项所述的电子设备,其特征在于,所述处理器还用于,若语音助手在第二预设时间段内未接收到用户输入的语音技能指令,则语音助手关闭。
- 根据权利要求12-18中任一项所述的电子设备中,其特征在于,所述处理器还用于,若网络未正常连接,则语音助手告知用户网络异常,并在显示界面上显示用于打开网络系统设置的语音技能指令。
- 根据权利要求12-19中任一项所述的电子设备,其特征在于,常用的语音技能指令用于调用显示界面中能够被点击的控件对应的语音技能;和/或,所述常用的语音技能指令用于调用显示界面中可识别的文本或图片对应的语音技能;和/或,所述常用的语音技能指令用于调用显示界面中可识别的场景意图对应的语音技能;和/或,所述常用的语音技能指令用于调用基于当前时刻的规律行为对应的语音技能;和/或,所述常用的语音技能指令用于调用基于电子设备当前所在位置的规律行为对应的语音技能;和/或,所述常用的语音技能指令用于调用基于当前电子设备的运动状态的规律行为对应的语音技能;和/或,所述常用的语音技能指令用于调用原生应用对应的语音技能;和/或,所述常用的语音技能指令用于调用第三方应用对应的语音技能;和/或,所述常用的语音技能指令用于调用预设事件对应的语音技能;和/或,所述常用的语音技能指令用于调用涉及到多个应用的操作对应的语音技能;和/或,所述常用的语音技能指令用于调用路径较长的操作对应的语音技能;和/或,所述常用的语音技能指令用于调用当前电子设备上运行的应用中的功能对应的语音技能。
- 根据权利要求12-20中任一项所述的电子设备,其特征在于,语音助手以半屏态显示,所述半屏态为语音助手的应用界面与电子设备的整体显示界面的比例大于0且小于1;所述处理器还用于,将语音技能指令的反馈结果显示在语音助手的用户界面上;语音助手响应于用户的操作指令,将语音技能指令的反馈结果分享给其他应用。
- 根据权利要求21所述的电子设备,其特征在于,所述处理器,用于将语音技能指令的反馈结果显示在语音助手的用户界面上具体为:所述处理器,用于语音助手将所述反馈结果以卡片的形式显示在语音助手的用户界面上;所述处理器,用于所述语音助手响应于用户的操作指令,将语音技能指令的反馈结果分享给其他应用具体为:所述处理器,用于语音助手响应于用户对所述卡片的按压操作,选中所述卡片;语音助手响应于用户对所述卡片的拖动操作,将所述卡片从语音助手的用户界面拖动到所述其他应用的用户界面。
- 一种计算机存储介质,其特征在于,所述计算机存储介质包括计算机指令,当所述计算机指令在电子设备上运行时,使得所述电子设备执行如权利要求1-11中任一项所述的语音交互方法。
- 一种计算机程序产品,其特征在于,当所述计算机程序产品在计算机上运行时,使得所述计算机执行如权利要求1-11中任一项所述的语音交互方法。
- 一种芯片系统,其特征在于,包括一个或多个处理器,当所述一个或多个处理器执行指令时,所述一个或多个处理器执行如权利要求1-11中任一项所述的语音交互方法。
Priority Applications (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| EP20872498.9A EP4030422B1 (en) | 2019-09-30 | 2020-09-29 | Voice interaction method and device |
| US17/707,666 US12190878B2 (en) | 2019-09-30 | 2022-03-29 | Voice interaction method and apparatus |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201910941167.0 | 2019-09-30 | ||
| CN201910941167.0A CN110910872B (zh) | 2019-09-30 | 2019-09-30 | 语音交互方法及装置 |
Related Child Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US17/707,666 Continuation US12190878B2 (en) | 2019-09-30 | 2022-03-29 | Voice interaction method and apparatus |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2021063343A1 true WO2021063343A1 (zh) | 2021-04-08 |
Family
ID=69815351
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/CN2020/118748 Ceased WO2021063343A1 (zh) | 2019-09-30 | 2020-09-29 | 语音交互方法及装置 |
Country Status (4)
| Country | Link |
|---|---|
| US (1) | US12190878B2 (zh) |
| EP (1) | EP4030422B1 (zh) |
| CN (2) | CN116564304A (zh) |
| WO (1) | WO2021063343A1 (zh) |
Families Citing this family (46)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US8977255B2 (en) | 2007-04-03 | 2015-03-10 | Apple Inc. | Method and system for operating a multi-function portable electronic device using voice-activation |
| US8676904B2 (en) | 2008-10-02 | 2014-03-18 | Apple Inc. | Electronic devices with voice command and contextual data processing capabilities |
| US9715875B2 (en) | 2014-05-30 | 2017-07-25 | Apple Inc. | Reducing the need for manual start/end-pointing and trigger phrases |
| US9338493B2 (en) | 2014-06-30 | 2016-05-10 | Apple Inc. | Intelligent automated assistant for TV user interactions |
| US9886953B2 (en) | 2015-03-08 | 2018-02-06 | Apple Inc. | Virtual assistant activation |
| US10460227B2 (en) | 2015-05-15 | 2019-10-29 | Apple Inc. | Virtual assistant in a communication session |
| US10747498B2 (en) | 2015-09-08 | 2020-08-18 | Apple Inc. | Zero latency digital assistant |
| US10331312B2 (en) | 2015-09-08 | 2019-06-25 | Apple Inc. | Intelligent automated assistant in a media environment |
| US12197817B2 (en) | 2016-06-11 | 2025-01-14 | Apple Inc. | Intelligent device arbitration and control |
| DK201670540A1 (en) | 2016-06-11 | 2018-01-08 | Apple Inc | Application integration with a digital assistant |
| US10818288B2 (en) | 2018-03-26 | 2020-10-27 | Apple Inc. | Natural assistant interaction |
| DK180639B1 (en) | 2018-06-01 | 2021-11-04 | Apple Inc | DISABILITY OF ATTENTION-ATTENTIVE VIRTUAL ASSISTANT |
| US11462215B2 (en) | 2018-09-28 | 2022-10-04 | Apple Inc. | Multi-modal inputs for voice commands |
| US11348573B2 (en) | 2019-03-18 | 2022-05-31 | Apple Inc. | Multimodality in digital assistant systems |
| CN111488443B (zh) * | 2020-04-08 | 2022-07-12 | 思必驰科技股份有限公司 | 技能选择方法及装置 |
| CN113572798B (zh) * | 2020-04-29 | 2023-03-28 | 华为技术有限公司 | 设备控制方法、系统、设备和存储介质 |
| WO2021223232A1 (zh) * | 2020-05-08 | 2021-11-11 | 赣州市牧士电子有限公司 | 一种基于Gaia AI语音控制的智能电视多语种识别系统 |
| US12301635B2 (en) | 2020-05-11 | 2025-05-13 | Apple Inc. | Digital assistant hardware abstraction |
| CN111552794B (zh) * | 2020-05-13 | 2023-09-19 | 海信电子科技(武汉)有限公司 | 提示语生成方法、装置、设备和存储介质 |
| CN111599362A (zh) * | 2020-05-20 | 2020-08-28 | 湖南华诺科技有限公司 | 一种自定义智能音箱技能的系统、方法及存储介质 |
| CN113703656B (zh) * | 2020-05-22 | 2026-01-13 | 苹果公司 | 数字助理用户界面和响应模式 |
| CN111625094B (zh) * | 2020-05-25 | 2023-07-14 | 阿波罗智联(北京)科技有限公司 | 智能后视镜的交互方法、装置、电子设备和存储介质 |
| JP7491147B2 (ja) * | 2020-08-31 | 2024-05-28 | セイコーエプソン株式会社 | 表示システムの制御方法、表示システム、及び、表示装置の制御方法 |
| CN112291428B (zh) * | 2020-10-23 | 2021-10-01 | 北京蓦然认知科技有限公司 | 一种语音助手的智能呼叫方法、装置 |
| CN112463106A (zh) * | 2020-11-12 | 2021-03-09 | 深圳Tcl新技术有限公司 | 基于智能屏幕的语音交互方法、装置、设备及存储介质 |
| CN113205807B (zh) * | 2021-04-06 | 2023-08-29 | 珠海格力电器股份有限公司 | 一种语音设备的控制方法、装置、存储介质及语音设备 |
| CN115407909A (zh) * | 2021-05-27 | 2022-11-29 | Oppo广东移动通信有限公司 | 内容分享方法、装置、终端及存储介质 |
| CN115421682A (zh) * | 2021-05-31 | 2022-12-02 | Oppo广东移动通信有限公司 | 设备控制方法、装置、电子设备及存储介质 |
| CN113568556A (zh) * | 2021-07-05 | 2021-10-29 | 深圳康佳电子科技有限公司 | 分屏控制方法、装置、智能终端及计算机可读存储介质 |
| CN116489420A (zh) * | 2022-01-17 | 2023-07-25 | 北京达佳互联信息技术有限公司 | 交互方法、装置、设备及存储介质 |
| CN116798418A (zh) * | 2022-03-14 | 2023-09-22 | 华为技术有限公司 | 基于语音助手的控制方法和装置 |
| US11995457B2 (en) | 2022-06-03 | 2024-05-28 | Apple Inc. | Digital assistant integration with system interface |
| JP7582263B2 (ja) * | 2022-06-20 | 2024-11-13 | トヨタ自動車株式会社 | 健康増進システム及び健康増進方法 |
| CN116168696A (zh) * | 2022-12-23 | 2023-05-26 | 博泰车联网科技(上海)股份有限公司 | 一种车载多指令执行方法、装置、电子设备和存储介质 |
| CN119003059A (zh) * | 2023-09-22 | 2024-11-22 | 北京字跳网络技术有限公司 | 一种信息处理方法、系统、设备及介质 |
| CN117278710B (zh) * | 2023-10-20 | 2024-06-25 | 联通沃音乐文化有限公司 | 一种通话交互功能确定方法、装置、设备和介质 |
| CN121905167A (zh) * | 2023-10-31 | 2026-04-21 | 华为技术有限公司 | 语音助手交互的方法和电子设备 |
| CN117238322B (zh) * | 2023-11-10 | 2024-01-30 | 深圳市齐奥通信技术有限公司 | 一种基于智能感知的自适应语音调控方法及系统 |
| WO2025192898A1 (ko) * | 2024-03-14 | 2025-09-18 | 삼성전자 주식회사 | 전자 장치 및 이를 이용한 음성 어시스턴트 기능과 관련된 사용자 인터페이스 표시 방법 |
| CN119311361A (zh) * | 2024-06-20 | 2025-01-14 | 华为技术有限公司 | 对话方法和电子设备 |
| CN120276795A (zh) * | 2024-06-20 | 2025-07-08 | 华为技术有限公司 | 一种智慧助手界面显示方法以及电子设备 |
| CN118658479B (zh) * | 2024-08-21 | 2024-11-15 | 南京科睿金信技术有限公司 | 一种基于语音识别的人机交互系统 |
| WO2026055988A1 (zh) * | 2024-09-14 | 2026-03-19 | 荣耀终端股份有限公司 | 一种虚拟助手协同使用方法及电子设备 |
| CN120407778A (zh) * | 2024-10-23 | 2025-08-01 | 荣耀终端股份有限公司 | 一种个性化推荐方法、电子设备及存储介质 |
| CN119068864B (zh) * | 2024-11-05 | 2025-04-04 | 宝略科技(浙江)有限公司 | 一种语言识别和大语言模型融合的智能交互系统和方法 |
| CN119512408A (zh) * | 2024-11-08 | 2025-02-25 | 北京稀宇极智科技有限公司 | 一种应用程序的交互方法及交互装置 |
Citations (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP2010128144A (ja) * | 2008-11-27 | 2010-06-10 | Toyota Central R&D Labs Inc | 音声認識装置及びプログラム |
| CN109584879A (zh) * | 2018-11-23 | 2019-04-05 | 华为技术有限公司 | 一种语音控制方法及电子设备 |
| CN109829107A (zh) * | 2019-01-23 | 2019-05-31 | 华为技术有限公司 | 一种基于用户运动状态的推荐方法及电子设备 |
| CN110012151A (zh) * | 2019-02-22 | 2019-07-12 | 维沃移动通信有限公司 | 一种信息显示方法及终端设备 |
| CN110138959A (zh) * | 2019-04-10 | 2019-08-16 | 华为技术有限公司 | 显示人机交互指令的提示的方法及电子设备 |
| CN110164427A (zh) * | 2018-02-13 | 2019-08-23 | 阿里巴巴集团控股有限公司 | 语音交互方法、装置、设备以及存储介质 |
| CN110175012A (zh) * | 2019-04-17 | 2019-08-27 | 百度在线网络技术(北京)有限公司 | 技能推荐方法、装置、设备及计算机可读存储介质 |
Family Cites Families (20)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US6622119B1 (en) * | 1999-10-30 | 2003-09-16 | International Business Machines Corporation | Adaptive command predictor and method for a natural language dialog system |
| US10705794B2 (en) * | 2010-01-18 | 2020-07-07 | Apple Inc. | Automatically adapting user interfaces for hands-free interaction |
| KR101781640B1 (ko) * | 2011-02-14 | 2017-09-25 | 엘지전자 주식회사 | 원격 제어 서비스 제공 방법 및 그를 이용한 영상 표시 기기 |
| US9842299B2 (en) * | 2011-01-25 | 2017-12-12 | Telepathy Labs, Inc. | Distributed, predictive, dichotomous decision engine for an electronic personal assistant |
| US9691115B2 (en) * | 2012-06-21 | 2017-06-27 | Cellepathy Inc. | Context determination using access points in transportation and other scenarios |
| EP2865203A4 (en) * | 2012-06-21 | 2016-02-17 | Cellepathy Ltd | DEVICE CONTEXT DETERMINATION |
| CN103795677B (zh) * | 2012-10-23 | 2015-03-04 | 腾讯科技(深圳)有限公司 | 确保安全软件客户端连接云端可靠性的方法和客户端 |
| US9570090B2 (en) * | 2015-05-26 | 2017-02-14 | Google Inc. | Dialog system with automatic reactivation of speech acquiring mode |
| EP3036924A4 (en) * | 2013-08-23 | 2017-04-12 | Cellepathy Ltd. | Mobile device context aware determinations |
| US9489171B2 (en) * | 2014-03-04 | 2016-11-08 | Microsoft Technology Licensing, Llc | Voice-command suggestions based on user identity |
| KR20160045353A (ko) * | 2014-10-17 | 2016-04-27 | 현대자동차주식회사 | 에이브이엔 장치, 차량, 및 에이브이엔 장치의 제어방법 |
| JP6246142B2 (ja) * | 2015-01-14 | 2017-12-13 | キヤノン株式会社 | 情報処理装置、情報処理方法及びプログラム |
| US10018977B2 (en) * | 2015-10-05 | 2018-07-10 | Savant Systems, Llc | History-based key phrase suggestions for voice control of a home automation system |
| JP2019521449A (ja) * | 2016-03-31 | 2019-07-25 | ジボ インコーポレイテッド | 永続的コンパニオンデバイス構成及び配備プラットフォーム |
| US10679608B2 (en) * | 2016-12-30 | 2020-06-09 | Google Llc | Conversation-aware proactive notifications for a voice interface device |
| US10547729B2 (en) * | 2017-03-27 | 2020-01-28 | Samsung Electronics Co., Ltd. | Electronic device and method of executing function of electronic device |
| US10950228B1 (en) * | 2017-06-28 | 2021-03-16 | Amazon Technologies, Inc. | Interactive voice controlled entertainment |
| US10579641B2 (en) * | 2017-08-01 | 2020-03-03 | Salesforce.Com, Inc. | Facilitating mobile device interaction with an enterprise database system |
| CN108470034B (zh) * | 2018-02-01 | 2019-09-20 | 百度在线网络技术(北京)有限公司 | 一种智能设备服务提供方法及系统 |
| CN116564304A (zh) * | 2019-09-30 | 2023-08-08 | 华为终端有限公司 | 语音交互方法及装置 |
2019
- 2019-09-30 CN CN202310426733.0A patent/CN116564304A/zh active Pending
- 2019-09-30 CN CN201910941167.0A patent/CN110910872B/zh active Active
2020
- 2020-09-29 WO PCT/CN2020/118748 patent/WO2021063343A1/zh not_active Ceased
- 2020-09-29 EP EP20872498.9A patent/EP4030422B1/en active Active
2022
- 2022-03-29 US US17/707,666 patent/US12190878B2/en active Active
Non-Patent Citations (1)
| Title |
|---|
| See also references of EP4030422A4 |
Cited By (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20220223154A1 (en) * | 2019-09-30 | 2022-07-14 | Huawei Technologies Co., Ltd. | Voice interaction method and apparatus |
| US12190878B2 (en) * | 2019-09-30 | 2025-01-07 | Huawei Technologies Co., Ltd. | Voice interaction method and apparatus |
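| WO2023159536A1 (zh) * | 2022-02-28 | 2023-08-31 | 华为技术有限公司 | Human-computer interaction method and apparatus, and terminal device |
| CN116775963A (zh) * | 2022-03-10 | 2023-09-19 | 比亚迪股份有限公司 | Information push method, electronic apparatus, vehicle, and electronic device |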
Also Published As
| Publication number | Publication date |
|---|---|
| US20220223154A1 (en) | 2022-07-14 |
| CN110910872B (zh) | 2023-06-02 |
| EP4030422B1 (en) | 2024-06-19 |
| CN110910872A (zh) | 2020-03-24 |
| US12190878B2 (en) | 2025-01-07 |
| CN116564304A (zh) | 2023-08-08 |
| EP4030422A1 (en) | 2022-07-20 |
| EP4030422A4 (en) | 2023-05-31 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
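| CN110910872B (zh) | | Voice interaction method and apparatus |
| KR102470275B1 (ko) | | Voice control method and electronic device |
| WO2021052263A1 (zh) | | Voice assistant display method and apparatus |
| WO2021213164A1 (zh) | | Application interface interaction method, electronic device, and computer-readable storage medium |
| WO2021013158A1 (zh) | | Display method and related apparatus |
| WO2020151387A1 (zh) | | Recommendation method based on user motion state and electronic device |
| WO2020211701A1 (zh) | | Model training method, emotion recognition method, and related apparatus and device |
| WO2022068819A1 (zh) | | Interface display method and related apparatus |
| WO2021249281A1 (zh) | | Interaction method for electronic device and electronic device |
| WO2020077540A1 (zh) | | Information processing method and electronic device |
| WO2020073288A1 (zh) | | Method for triggering electronic device to execute function and electronic device |
| WO2022068483A1 (zh) | | Application startup method and apparatus, and electronic device |
| WO2021036770A1 (zh) | | Split-screen processing method and terminal device |
| US12217069B2 (en) | | Operation sequence adding method, electronic device, and system |
| US12282761B2 (en) | | Application module startup method and electronic device |
| WO2021238371A1 (zh) | | Method and apparatus for generating virtual character |
| CN116798418A (zh) | | Voice assistant-based control method and apparatus |
| WO2022037726A1 (zh) | | Split-screen display method and electronic device |
| WO2022135157A1 (zh) | | Page display method and apparatus, electronic device, and readable storage medium |
| CN111835904A (zh) | | Method for starting application based on context awareness and user profile, and electronic device |
| WO2024037542A1 (zh) | | Touch input method and system, electronic device, and storage medium |
| WO2024012346A1 (zh) | | Task migration method, electronic device, and system |
| CN116450026A (zh) | | Method and system for identifying touch operation |
| CN118131891A (zh) | | Human-computer interaction method and apparatus |
| WO2023016347A1 (zh) | | Voiceprint authentication response method and system, and electronic device |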
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 20872498; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | ENP | Entry into the national phase | Ref document number: 2020872498; Country of ref document: EP; Effective date: 20220412 |