WO2021027267A1 - Voice interaction method, device, terminal, and storage medium - Google Patents
Voice interaction method, device, terminal, and storage medium
- Publication number
- WO2021027267A1 (PCT/CN2020/074988)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- service type
- target
- service
- mapping relationship
- type set
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/16—Sound input; Sound output
- G06F3/167—Audio in a user interface, e.g. using voice commands for navigating, audio feedback
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/1815—Semantic context, e.g. disambiguation of the recognition hypotheses based on word meaning
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/02—Methods for producing synthetic speech; Speech synthesisers
- G10L13/04—Details of speech synthesis systems, e.g. synthesiser structure or memory management
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/223—Execution procedure of a spoken command
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/225—Feedback of the input speech
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M2201/00—Electronic components, circuits, software, systems or apparatus used in telephone systems
- H04M2201/40—Electronic components, circuits, software, systems or apparatus used in telephone systems using speech recognition
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M2250/00—Details of telephonic subscriber devices
- H04M2250/74—Details of telephonic subscriber devices with voice recognition means
Definitions
- This application relates to the field of terminal technology, and in particular to a voice interaction method, device, terminal, and storage medium.
- With the development of terminal technology, more and more terminals support voice interaction. Users can interact with a terminal by speaking, which frees both hands and improves the efficiency of human-machine interaction.
- The voice interaction process usually works as follows. When the user wants to interact with the terminal, the user first speaks the wake-up word; the terminal collects the voice command and determines whether it contains the wake-up word. If it does, the terminal switches from the standby state to the working state, that is, the terminal is awakened. The user then speaks the service that the terminal needs to process; the terminal collects the voice command again, determines the service to be processed according to that command, and processes the service.
- For example, the wake-up word of a vehicle-mounted terminal is "Hello, Xiaohua".
- The user turns on the radio while driving and wants to play a song through the vehicle-mounted terminal.
- After the terminal is awakened, the user says "Help play the songs of 'Eastern Music Radio'".
- The vehicle-mounted terminal then automatically tunes the radio to "Eastern Music Radio" and plays its songs.
- With the above method, the user must speak the wake-up word to wake the terminal before it will process a service, which makes the operation cumbersome and inefficient.
- The embodiments of the present application provide a voice interaction method, device, terminal, and storage medium, which can solve the technical problems of cumbersome operation and low efficiency of voice interaction in the related art.
- The technical solution is as follows:
- In a first aspect, a voice interaction method is provided. The method includes: determining that a target event is detected, where the target event is an event that can trigger voice interaction; querying a mapping relationship according to the target event to obtain a service type set, where the service type set includes one or more target service types; collecting a voice command; obtaining, according to semantic information corresponding to the voice command, a first service corresponding to the semantic information; and, if the service type of the first service is any target service type in the service type set, executing the first service according to the voice command.
- This embodiment provides a method for triggering voice interaction without a wake-up word.
- From the target event, the set of service types for which the user is likely to have a voice interaction intention is predicted. If the service type of the first service expressed by the voice command is a target service type in that set, the first service is executed.
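- The flow of the method can be sketched in a few lines of Python. This is a minimal illustration only: the mapping contents, the event names, the `Service` class, and the keyword-based `toy_semantics` parser are all hypothetical stand-ins, not part of the application.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Set

# Hypothetical mapping relationship: target event -> set of target service types.
MAPPING: Dict[str, Set[str]] = {
    "incoming_call": {"answer_call"},
    "song_about_to_end": {"music"},
}


@dataclass
class Service:
    service_type: str  # e.g. "music"
    action: str        # what the terminal would do, e.g. "play a song"


def toy_semantics(command_text: str) -> Optional[Service]:
    """Stand-in for semantic analysis: a keyword lookup, purely illustrative."""
    text = command_text.lower()
    if "answer" in text:
        return Service("answer_call", "answer the call")
    if "play" in text:
        return Service("music", "play a song")
    return None


def handle_voice_command(
    target_event: str,
    command_text: str,
    parse: Callable[[str], Optional[Service]] = toy_semantics,
) -> Optional[str]:
    """Return the executed action, or None when the command is not acted on."""
    # Query the mapping relationship to obtain the service type set.
    service_type_set = MAPPING.get(target_event, set())
    # Obtain the first service from the semantic information of the command.
    first_service = parse(command_text)
    if first_service is None:
        return None
    # Execute only if the service type is one of the target service types.
    if first_service.service_type in service_type_set:
        return first_service.action
    return None
```

- In this sketch, "answer it" spoken after an incoming call is acted on without any wake-up word, while "play a song" spoken after the same event is not, because music is not in the service type set for that event.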
- Determining that the target event is detected includes: determining that a first operation of the user is detected.
- Querying the mapping relationship according to the target event to obtain the service type set includes: querying the mapping relationship according to the first operation to obtain the service type set, where each target service type in the service type set is a service type corresponding to one or more target second operations, and the one or more target second operations are consecutive operations associated with the first operation.
- Because user operations tend to be continuous, after one operation is performed the next operation is likely to follow, so the user develops a voice interaction intention for the service corresponding to the next operation.
- This optional method makes full use of the continuity of operations: it maps the user's current operation to the service type corresponding to the operation that is likely to be performed next, so that when the user performs the operation, the terminal can accurately predict which service type the user wants to interact with by voice, which ensures the accuracy of the target service type.
- Determining that the target event is detected includes: receiving a notification message from an operating system or an application.
- Querying the mapping relationship according to the target event to obtain the service type set includes: querying the mapping relationship according to the notification message to obtain the service type set, where the target service type in the service type set is message viewing or message processing corresponding to the notification message.
- When the terminal receives a notification message, the user needs to view or process the notification message, so a voice interaction intention for message viewing or message processing arises.
- This optional method fully considers the user's need to view or process notification messages: it maps the event of receiving a notification message to the service type of message viewing or message processing, so that when a notification message is received, the terminal can accurately predict which service type the user wants to interact with by voice, which ensures the accuracy of the target service type.
- The notification message includes at least one of an incoming call notification, a short message, an instant messaging message, and an alarm message. Querying the mapping relationship according to the notification message to obtain the service type set includes at least one of the following:
- querying the mapping relationship according to the incoming call notification to obtain the service type set, where the target service type in the service type set is answering the call;
- querying the mapping relationship according to the short message or the instant messaging message to obtain the service type set, where the target service type in the service type set is message viewing or message reply;
- querying the mapping relationship according to the alarm message to obtain the service type set, where the target service type in the service type set is fault handling or information query.
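- The notification cases above amount to a small lookup table. The following sketch shows one possible shape; the key and value strings are hypothetical identifiers chosen for illustration.

```python
from typing import Dict, FrozenSet

# Hypothetical identifiers for notification kinds and service types.
NOTIFICATION_TO_SERVICE_TYPES: Dict[str, FrozenSet[str]] = {
    "incoming_call": frozenset({"answer_call"}),
    "short_message": frozenset({"message_viewing", "message_reply"}),
    "instant_message": frozenset({"message_viewing", "message_reply"}),
    "alarm": frozenset({"fault_handling", "information_query"}),
}


def service_types_for(notification_kind: str) -> FrozenSet[str]:
    """Query the mapping; an unknown notification kind yields an empty set."""
    return NOTIFICATION_TO_SERVICE_TYPES.get(notification_kind, frozenset())
```

- Returning an empty set for an unknown notification kind means that no voice command would be acted on without a wake-up word for that event, which is a design choice of this sketch rather than something the application states.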
- Determining that the target event is detected includes: determining that a current environmental parameter satisfies a first condition.
- Querying the mapping relationship according to the target event to obtain the service type set includes: querying the mapping relationship according to the environmental parameter to obtain the service type set, where the target service type in the service type set is adjusting the environmental parameter.
- The environment affects the user's perception, and the user needs to respond to it. For example, if an environmental parameter changes, the user may need to adjust that parameter, so a voice interaction intention for adjusting the environmental parameter arises. This optional method fully considers the user's need to respond to the environment: it maps the event that the environmental parameter satisfies the first condition to the service type of adjusting the environmental parameter, so that when the environmental parameter satisfies the first condition, the terminal accurately predicts which service type the user wants to interact with by voice, which ensures the accuracy of the target service type.
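- As a rough illustration of the environmental case, the sketch below checks sensor readings against a "first condition" and returns the matching service types. The parameter names and thresholds are assumptions made for this example; the application does not specify what the first condition is.

```python
from typing import Callable, Dict, Set, Tuple

# Hypothetical first conditions: parameter name -> (condition, service type).
FIRST_CONDITIONS: Dict[str, Tuple[Callable[[float], bool], str]] = {
    "temperature_c": (lambda v: v > 30.0 or v < 10.0, "adjust_temperature"),
    "humidity_pct": (lambda v: v > 80.0, "adjust_humidity"),
}


def service_types_from_environment(readings: Dict[str, float]) -> Set[str]:
    """Return the service types whose first condition the readings satisfy."""
    result: Set[str] = set()
    for name, value in readings.items():
        rule = FIRST_CONDITIONS.get(name)
        if rule is not None and rule[0](value):
            result.add(rule[1])
    return result
```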
- Determining that the target event is detected includes: determining that the progress of a current service satisfies a second condition.
- Querying the mapping relationship according to the target event to obtain the service type set includes: querying the mapping relationship according to the current service to obtain the service type set, where the target service type in the service type set is the service type of the current service.
- The progress of the current service affects the user's perception, and the user needs to respond to it. For example, if the current service is about to end, the user usually wants to re-execute the current service, stop executing it, or adjust it. This optional method fully considers the user's need to respond to service changes: it maps the event that the progress of the current service satisfies the second condition to the service type of the current service, so that when the progress satisfies the second condition, the terminal accurately predicts which service type the user wants to interact with by voice, which ensures the accuracy of the target service type.
- The target event can cover multiple modalities, and a target event of any modality can trigger the voice interaction function for the corresponding service type, so that wake-up without a wake-up word is supported in a variety of application scenarios, which expands the scope of application.
- The process of establishing the mapping relationship includes:
- obtaining, from a historical record, the historical service executed in association with a historical target event, and writing the service type of the historical service and the historical target event into the mapping relationship.
- Because the personal behavior pattern of the same user is regular, the voice interaction intention that arises after the current target event is, with high probability, the same as or similar to the intention that arose after the historical target event. The service executed after the current target event is therefore likely to be the same as or similar to the service executed after the historical target event. Predicting the target service type of the current voice interaction intention from historical records thus improves the accuracy of the target service type.
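- Building the mapping relationship from a historical record can be sketched as follows. The record format, a sequence of (historical target event, service type) pairs, is an assumption made for illustration.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Iterable, Set, Tuple


def build_mapping(history: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Write each (historical target event, service type) pair into a mapping."""
    mapping: DefaultDict[str, Set[str]] = defaultdict(set)
    for event, service_type in history:
        mapping[event].add(service_type)
    return dict(mapping)
```

- For instance, a history in which music was played twice and navigation was started once after the hypothetical event "engine_start" yields a mapping from that event to the two service types music and navigation.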
- The process of establishing the mapping relationship includes: invoking a machine learning model, inputting a sample target event into the machine learning model, outputting a service type, and writing the output service type and the sample target event into the mapping relationship, where the machine learning model is used to predict a service type according to an event.
- The method further includes: if the service type of the first service differs from every target service type in the service type set, writing the service type of the first service into the mapping relationship.
- In this way, the service type set obtained subsequently will include the service type of the first service. Then, when the user expresses the service type of the first service through a voice command, the terminal will respond to the voice command and execute the first service.
- For example, if the terminal detected event X during historical operation and the user then expressed by voice an intention to interact with service type Y, event X and service type Y can be added to the mapping relationship. On the one hand, as voice interaction proceeds, the correlation between events and service types can be mined, and the service types corresponding to events and semantic information can be supplemented and refined.
- On the other hand, the terminal can add new events and new service types to the mapping relationship, which improves the scalability and timeliness of the mapping relationship.
- Querying the mapping relationship according to the target event to obtain the service type set includes: querying the mapping relationship according to the target event to obtain the service type set and the probability corresponding to each target service type in the service type set, where the probability indicates how likely the service corresponding to the target service type is to be executed.
- Before the first service is executed according to the voice command (when the service type of the first service is any target service type in the service type set), the method further includes: filtering out, from the service type set, the target service types whose probability does not meet a probability threshold.
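- The probability filter can be sketched as below. Treating "meets the probability threshold" as "greater than or equal to the threshold" is an assumption of this example, as are the sample values.

```python
from typing import Dict, Set


def filter_by_probability(probabilities: Dict[str, float], threshold: float) -> Set[str]:
    """Keep only the target service types whose probability meets the threshold."""
    return {
        service_type
        for service_type, probability in probabilities.items()
        if probability >= threshold
    }
```

- With probabilities of 0.8 for music, 0.3 for navigation, and 0.5 for radio and a threshold of 0.5, only music and radio remain in the service type set.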
- The method further includes: updating the probability in the mapping relationship according to the semantic information corresponding to the voice command.
- The probability can be adjusted dynamically according to the semantic information the user expresses this time, so that the correctness of the predicted service type is evaluated in a self-learning manner. The probability is corrected iteratively, so that the probability of each service type in the mapping relationship is continuously optimized as target events occur and the user expresses semantics, gradually approaching the user's personal behavior habits and making the mapping relationship more accurate.
- Updating the probability in the mapping relationship according to the semantic information corresponding to the voice command includes:
- if the service type of the first service is any target service type in the service type set, increasing the probability corresponding to the service type of the first service in the mapping relationship.
- For example, suppose the target event X corresponds to the target service type Y.
- If, after the target event X occurs, the user always requests by voice a service of the target service type Y, this indicates that the target service type Y is indeed the service type for which the user has a voice interaction intention after the target event X occurs.
- The probability of the target service type Y will then keep increasing, so when the target event X is detected again, the probability of the target service type Y will meet the probability threshold and the target service type Y will be selected. If the voice command expresses semantic information of the target service type Y, the terminal processes the service in response to the voice command.
- Updating the probability in the mapping relationship according to the semantic information corresponding to the voice command includes: if the user does not request, by voice, a service of a target service type in the service type set, decreasing the probability corresponding to that target service type in the mapping relationship.
- For example, suppose the target event X corresponds to the target service type Y.
- If, after the target event X occurs, the user does not request by voice a service of the target service type Y, this indicates that the target service type Y is not the service type for which the user has a voice interaction intention after the target event X occurs.
- The probability of the target service type Y will then keep decreasing, so when the target event X is detected again, the probability of the target service type Y will not meet the probability threshold and the target service type Y will be filtered out. The terminal will not process a service of the target service type Y, which avoids false wake-up.
- Updating the probability in the mapping relationship according to the semantic information corresponding to the voice command includes: if the semantic information includes a wake-up word, increasing the probability corresponding to the service type of the first service in the mapping relationship.
- For example, suppose the target event X corresponds to the target service type Y.
- If, after the target event X occurs, the user speaks a wake-up word, this indicates that the target service type Y is indeed the service type for which the user has a voice interaction intention after the target event X occurs.
- The probability of the target service type Y will then keep increasing, so when the target event X is detected again, the probability of the target service type Y will meet the probability threshold and the target service type Y will be selected. If the voice command expresses semantic information of the target service type Y, the terminal processes the service in response to the voice command.
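- The increase and decrease rules can be sketched as a single update step over the probabilities stored for one target event. The fixed step size and the clamping to the range 0 to 1 are assumptions of this example; the application does not say how large each adjustment is. The wake-up word case would go through the same increase branch.

```python
from typing import Dict


def update_probabilities(
    probabilities: Dict[str, float],
    requested_type: str,
    step: float = 0.1,  # hypothetical adjustment size
) -> Dict[str, float]:
    """Raise the probability of the requested service type and lower the others."""
    updated: Dict[str, float] = {}
    for service_type, probability in probabilities.items():
        if service_type == requested_type:
            probability += step   # the user asked for this type after the event
        else:
            probability -= step   # the user did not ask for this type
        updated[service_type] = min(1.0, max(0.0, probability))
    return updated
```

- Repeated requests for the same service type after the same event push its probability toward 1 and the others toward 0, so that a later threshold filter keeps the former and drops the latter.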
- The method further includes: if the service type of the first service differs from every target service type in the service type set, discarding the voice command.
- In this case the terminal does not respond to the voice command but discards it, which avoids false wake-up caused by processing a service based on the voice command and saves the buffer space the voice command would occupy.
- In a second aspect, a voice interaction device is provided. The device has the function of implementing the voice interaction in the first aspect or any optional manner of the first aspect.
- The device includes at least one module, and the at least one module is configured to implement the voice interaction method provided in the first aspect or any optional manner of the first aspect.
- In a third aspect, a terminal is provided. The terminal includes one or more processors and one or more memories, and at least one instruction is stored in the one or more memories. The instruction is loaded and executed by the one or more processors to implement the voice interaction method provided in the first aspect or any optional manner of the first aspect.
- A computer-readable storage medium is also provided. At least one instruction is stored in the storage medium, and the instruction is loaded and executed by a processor to implement the voice interaction method provided in the first aspect or any optional manner of the first aspect.
- A computer program product is also provided. The computer program product includes computer program code, and when the computer program code is run by a terminal, the terminal executes the voice interaction method provided in the first aspect or any optional manner of the first aspect.
- A chip is also provided. The chip includes a processor configured to call, from a memory, and execute instructions stored in the memory, so that a terminal on which the chip is installed executes the voice interaction method provided in the first aspect or any optional manner of the first aspect.
- Another chip is also provided. It includes an input interface, an output interface, a processor, and a memory, which are connected by an internal connection path. The processor is configured to execute code in the memory, and when the code is executed, the processor executes the voice interaction method provided in the first aspect or any optional manner of the first aspect.
- FIG. 1 is a schematic diagram of an implementation environment of a voice interaction method provided by an embodiment of the present application.
- FIG. 2 is a schematic structural diagram of a terminal 100 provided by an embodiment of the present application.
- FIG. 3 is a functional architecture diagram of a terminal 100 provided by an embodiment of the present application.
- FIG. 4 is a flowchart of a voice interaction method provided by an embodiment of the present application.
- FIG. 5 is a software architecture diagram of a voice interaction system provided by an embodiment of the present application.
- FIG. 6 is a schematic structural diagram of a vehicle-mounted terminal provided by an embodiment of the present application.
- FIG. 7 is a schematic structural diagram of a voice interaction device provided by an embodiment of the present application.
- Service type: a general term for a category of services; a service type can also be called a service domain.
- Service types can include message viewing, message processing, adjustment of environmental parameters, navigation, schedule consultation, air conditioning, radio, music, car control, mileage inquiry, question-and-answer consultation, games, system settings, vehicle control, charging, maintenance, communication, and so on.
- Message viewing may include viewing short messages, viewing instant messaging messages of instant messaging applications, and viewing push messages of resource recommendation applications.
- Message processing may include answering calls, replying to messages, troubleshooting, information query, and so on.
- Adjusting environmental parameters may include adjusting dust concentration, humidity, light, noise intensity, temperature, and so on.
- A service of the message viewing service type can be: viewing the conversation message X sent by user A, viewing the latest group announcement Y in a group chat, viewing the discount news released today by a shopping application, and so on.
- A service of the environmental parameter adjustment service type can be: adjusting the temperature to 25° through the air conditioner.
- A service of the music service type can be: playing the latest song Z of singer B.
- A service of the navigation service type can be: navigating to residential area F on E Road, D District, C City.
- A service of the call answering service type can be: answering the call from caller ZZ.
- A service of the message reply service type can be: replying to contact G with the short message "I am driving, I will reply later".
- A service of the radio service type can be: opening "Oriental Music Radio".
- FIG. 1 is a schematic diagram of an implementation environment of a voice interaction method provided by an embodiment of the present application.
- The implementation environment includes a terminal 100 and a voice interaction platform 200.
- The terminal 100 is connected to the voice interaction platform 200 through a wireless network or a wired network.
- The terminal 100 can be at least one of a smart phone, a smart speaker, a robot, a smart car, a vehicle-mounted terminal, a home device, a game console, a desktop computer, a tablet computer, an e-book reader, a smart TV, an MP3 (moving picture experts group audio layer III) player, an MP4 (moving picture experts group audio layer IV) player, and a laptop computer.
- The terminal 100 installs and runs an application that supports voice interaction.
- The application can be a voice assistant, an intelligent question answering application, and so on.
- The terminal 100 is a terminal used by a user, and a user account is logged in to the application running on the terminal 100.
- The voice interaction platform 200 includes at least one of a server, multiple servers, a cloud computing platform, and a virtualization center.
- The voice interaction platform 200 is used to provide background services for applications that support voice interaction.
- The voice interaction platform 200 may construct the mapping relationship described in the following method embodiments and send the mapping relationship to the terminal 100, so that the terminal 100 can perform voice interaction based on the mapping relationship.
- The voice interaction platform 200 includes a voice interaction server 201 and a database 202.
- The voice interaction server 201 is used to provide background services related to voice interaction.
- There may be one or more voice interaction servers 201.
- When there are multiple voice interaction servers 201, at least two of them may provide different services, and/or at least two of them may provide the same service, for example in a load-balancing manner; this is not limited in the embodiments of the present application.
- The database 202 can be used to store the mapping relationship.
- The database 202 may store sample events and sample service types, so that the voice interaction server 201 can read the sample events and sample service types from the database 202, train a machine learning model on them, and establish the mapping relationship through the machine learning model.
- The terminal 100 may generally refer to one of multiple terminals, and this embodiment uses only the terminal 100 as an example. Those skilled in the art will appreciate that there may be more or fewer terminals 100. For example, there may be only one terminal 100, or there may be dozens, hundreds, or more, in which case the voice interaction system also includes other terminals. The embodiments of the present application do not limit the number or device type of the terminals 100.
- FIG. 2 is a schematic structural diagram of a terminal 100 provided by an embodiment of the present application.
- The terminal 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, a sensor module 180, buttons 190, a motor 191, an indicator 192, a camera 193, a display screen 194, a subscriber identification module (SIM) card interface 195, and so on.
- The sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, and so on.
- The structure illustrated in the embodiment of the present application does not constitute a specific limitation on the terminal 100.
- The terminal 100 may include more or fewer components than shown, combine certain components, split certain components, or arrange the components differently.
- The illustrated components can be implemented in hardware, software, or a combination of software and hardware.
- The processor 110 may include one or more processing units.
- For example, the processor 110 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU).
- The different processing units may be independent devices or may be integrated in one or more processors.
- The controller can generate operation control signals according to the instruction operation code and timing signals, to control instruction fetching and execution.
- A memory may also be provided in the processor 110 to store instructions and data.
- The memory in the processor 110 is a cache memory.
- The memory can store instructions or data that the processor 110 has just used or uses repeatedly. If the processor 110 needs the instruction or data again, it can be fetched directly from this memory, which avoids repeated accesses, reduces the waiting time of the processor 110, and improves system efficiency.
- The processor 110 may include one or more interfaces.
- The interface may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (SIM) interface, and/or a universal serial bus (USB) interface.
- the I2C interface is a two-way synchronous serial bus, including a serial data line (SDA) and a serial clock line (SCL).
- the processor 110 may include multiple sets of I2C buses.
- the processor 110 may be coupled to the touch sensor 180K, charger, flash, camera 193, etc. through different I2C bus interfaces.
- the processor 110 may couple the touch sensor 180K through an I2C interface, so that the processor 110 and the touch sensor 180K communicate through the I2C bus interface to implement the touch function of the terminal 100.
- the I2S interface can be used for audio communication.
- the processor 110 may include multiple sets of I2S buses.
- the processor 110 may be coupled with the audio module 170 through an I2S bus to realize communication between the processor 110 and the audio module 170.
- the audio module 170 may transmit audio signals to the wireless communication module 160 through an I2S interface, so as to realize the function of answering calls through a Bluetooth headset.
- the PCM interface can also be used for audio communication to sample, quantize and encode analog signals.
- the audio module 170 and the wireless communication module 160 may be coupled through a PCM bus interface.
- the audio module 170 may also transmit audio signals to the wireless communication module 160 through the PCM interface, so as to realize the function of answering calls through the Bluetooth headset. Both the I2S interface and the PCM interface can be used for audio communication.
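The three PCM steps named above (sampling, quantizing, and encoding an analog signal) can be sketched as follows. The function name `pcm_encode`, the 8 kHz sample rate, the 16-bit depth, and the 1 kHz test tone are illustrative assumptions, not values taken from the application.

```python
import math


def pcm_encode(signal, sample_rate_hz, duration_s, bits=16):
    """Sample a continuous-time signal, quantize each sample, and encode it as a signed integer."""
    max_code = 2 ** (bits - 1) - 1
    n_samples = round(sample_rate_hz * duration_s)
    codes = []
    for n in range(n_samples):
        t = n / sample_rate_hz                # sampling instant
        x = max(-1.0, min(1.0, signal(t)))    # clip to the nominal range [-1, 1]
        codes.append(round(x * max_code))     # quantize and encode
    return codes


def tone(t):
    """Hypothetical analog input: a 1 kHz sine wave."""
    return math.sin(2 * math.pi * 1000 * t)


samples = pcm_encode(tone, sample_rate_hz=8000, duration_s=0.001)
print(samples)
```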
- the UART interface is a universal serial data bus used for asynchronous communication.
- the bus can be a two-way communication bus. It converts the data to be transmitted between serial communication and parallel communication.
- the UART interface is generally used to connect the processor 110 and the wireless communication module 160.
- the processor 110 communicates with the Bluetooth module in the wireless communication module 160 through the UART interface to implement the Bluetooth function.
- the audio module 170 may transmit audio signals to the wireless communication module 160 through a UART interface, so as to realize the function of playing music through a Bluetooth headset.
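The conversion between parallel and serial data that a UART performs can be sketched as below. The 8N1 frame layout (one low start bit, eight data bits sent least significant bit first, no parity, one high stop bit) is a common configuration assumed here for illustration; the application does not state which frame format is used.

```python
def uart_serialize(byte):
    """Parallel-to-serial: frame one byte as start bit, 8 data bits (LSB first), stop bit."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("byte must be in 0..255")
    data_bits = [(byte >> i) & 1 for i in range(8)]
    return [0] + data_bits + [1]  # start bit is low, stop bit is high


def uart_deserialize(frame):
    """Serial-to-parallel: check the framing bits and rebuild the byte."""
    if len(frame) != 10 or frame[0] != 0 or frame[9] != 1:
        raise ValueError("framing error")
    return sum(bit << i for i, bit in enumerate(frame[1:9]))


frame = uart_serialize(0x41)
print(frame, hex(uart_deserialize(frame)))
```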
- the MIPI interface can be used to connect the processor 110 with the display screen 194, the camera 193 and other peripheral devices.
- the MIPI interface includes camera serial interface (camera serial interface, CSI), display serial interface (display serial interface, DSI), etc.
- the processor 110 and the camera 193 communicate through a CSI interface to implement the shooting function of the terminal 100.
- the processor 110 and the display screen 194 communicate through a DSI interface to realize the display function of the terminal 100.
- the GPIO interface can be configured through software.
- the GPIO interface can be configured as a control signal or as a data signal.
- the GPIO interface can be used to connect the processor 110 with the camera 193, the display screen 194, the wireless communication module 160, the audio module 170, the sensor module 180, and so on.
- GPIO interface can also be configured as I2C interface, I2S interface, UART interface, MIPI interface, etc.
- the USB interface 130 is an interface that complies with the USB standard specification, and specifically may be a Mini USB interface, a Micro USB interface, a USB Type C interface, and so on.
- the USB interface 130 can be used to connect a charger to charge the terminal 100, and can also be used to transfer data between the terminal 100 and peripheral devices. It can also be used to connect headphones and play audio through the headphones. This interface can also be used to connect to other terminals, such as AR devices.
- the interface connection relationship between the modules illustrated in the embodiment of the present application is merely a schematic description, and does not constitute a structural limitation of the terminal 100.
- the terminal 100 may also adopt different interface connection modes in the foregoing embodiments, or a combination of multiple interface connection modes.
- the charging management module 140 is used to receive charging input from the charger.
- the charger can be a wireless charger or a wired charger.
- the charging management module 140 may receive the charging input of the wired charger through the USB interface 130.
- the charging management module 140 may receive the wireless charging input through the wireless charging coil of the terminal 100. While the charging management module 140 charges the battery 142, it can also supply power to the terminal through the power management module 141.
- the power management module 141 is used to connect the battery 142, the charging management module 140 and the processor 110.
- the power management module 141 receives input from the battery 142 and/or the charging management module 140, and supplies power to the processor 110, the internal memory 121, the display screen 194, the camera 193, and the wireless communication module 160.
- the power management module 141 can also be used to monitor parameters such as battery capacity, battery cycle times, and battery health status (leakage, impedance).
- the power management module 141 may also be provided in the processor 110.
- the power management module 141 and the charging management module 140 may also be provided in the same device.
- the wireless communication function of the terminal 100 can be implemented by the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, the modem processor, and the baseband processor.
- the antenna 1 and the antenna 2 are used to transmit and receive electromagnetic wave signals.
- Each antenna in the terminal 100 can be used to cover a single or multiple communication frequency bands. Different antennas can also be reused to improve antenna utilization.
- antenna 1 can be multiplexed as a diversity antenna of a wireless local area network.
- the antenna can be used in combination with a tuning switch.
- the mobile communication module 150 can provide a wireless communication solution including 2G/3G/4G/5G and the like applied to the terminal 100.
- the mobile communication module 150 may include at least one filter, switch, power amplifier, low noise amplifier (LNA), etc.
- the mobile communication module 150 can receive electromagnetic waves via the antenna 1, filter and amplify the received electromagnetic waves, and transmit them to the modem processor for demodulation.
- the mobile communication module 150 can also amplify the signal modulated by the modem processor, and convert it into electromagnetic waves for radiation via the antenna 1.
- at least part of the functional modules of the mobile communication module 150 may be provided in the processor 110.
- at least part of the functional modules of the mobile communication module 150 and at least part of the modules of the processor 110 may be provided in the same device.
- the modem processor may include a modulator and a demodulator.
- the modulator is used to modulate the low-frequency baseband signal to be sent into a medium-to-high-frequency signal.
- the demodulator is used to demodulate the received electromagnetic wave signal into a low-frequency baseband signal. Then the demodulator transmits the demodulated low-frequency baseband signal to the baseband processor for processing.
- the low-frequency baseband signal is processed by the baseband processor and then passed to the application processor.
- the application processor outputs a sound signal through an audio device (not limited to the speaker 170A, the receiver 170B, etc.), or displays an image or video through the display screen 194.
- the modem processor may be an independent device.
- the modem processor may be independent of the processor 110 and be provided in the same device as the mobile communication module 150 or other functional modules.
- the wireless communication module 160 can provide wireless communication solutions applied to the terminal 100, including wireless local area network (WLAN) (such as a wireless fidelity (Wi-Fi) network), Bluetooth (BT), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), and infrared (IR) technology.
- the wireless communication module 160 may be one or more devices integrating at least one communication processing module.
- the wireless communication module 160 receives electromagnetic waves via the antenna 2, frequency modulates and filters the electromagnetic wave signals, and sends the processed signals to the processor 110.
- the wireless communication module 160 can also receive the signal to be sent from the processor 110, perform frequency modulation on it, amplify it, and convert it into electromagnetic waves for radiation via the antenna 2.
- the antenna 1 of the terminal 100 is coupled with the mobile communication module 150, and the antenna 2 is coupled with the wireless communication module 160, so that the terminal 100 can communicate with the network and other devices through wireless communication technology.
- the wireless communication technology may include global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), time-division synchronous code division multiple access (TD-SCDMA), long term evolution (LTE), BT, GNSS, WLAN, NFC, FM, and/or IR technology, etc.
- the GNSS may include the global positioning system (GPS), the global navigation satellite system (GLONASS), the Beidou navigation satellite system (BDS), the quasi-zenith satellite system (QZSS), and/or satellite-based augmentation systems (SBAS).
- the terminal 100 implements a display function through a GPU, a display screen 194, and an application processor.
- the GPU is a microprocessor for image processing, connected to the display 194 and the application processor.
- the GPU is used to perform mathematical and geometric calculations for graphics rendering.
- the processor 110 may include one or more GPUs, which execute program instructions to generate or change display information.
- the display screen 194 is used to display images, videos, etc.
- the display screen 194 includes a display panel.
- the display panel can adopt a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), Miniled, MicroLed, Micro-oLed, a quantum dot light-emitting diode (QLED), etc.
- the terminal 100 may include one or N display screens 194, and N is a positive integer greater than one.
- the terminal 100 can realize a shooting function through an ISP, a camera 193, a video codec, a GPU, a display screen 194, and an application processor.
- the ISP is used to process the data fed back from the camera 193. For example, when taking a picture, the shutter is opened, the light is transmitted to the photosensitive element of the camera through the lens, the light signal is converted into an electrical signal, and the photosensitive element of the camera transmits the electrical signal to the ISP for processing and transforms it into an image visible to the naked eye.
- ISP can also optimize the image noise, brightness, and skin color. ISP can also optimize the exposure, color temperature and other parameters of the shooting scene.
- the ISP may be provided in the camera 193.
- the camera 193 is used to capture still images or videos.
- the object generates an optical image through the lens and projects it to the photosensitive element.
- the photosensitive element may be a charge coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor.
- the photosensitive element converts the optical signal into an electrical signal, and then transmits the electrical signal to the ISP to convert it into a digital image signal.
- ISP outputs digital image signals to DSP for processing.
- DSP converts digital image signals into standard RGB, YUV and other formats.
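As an illustration of the kind of format conversion mentioned above, the sketch below converts one RGB pixel to YUV. The function name `rgb_to_yuv` and the choice of the BT.601 luma weights with the classic analog U/V scaling factors are assumptions made for the example; the application does not say which conversion the DSP applies.

```python
def rgb_to_yuv(r, g, b):
    """Convert one RGB pixel (components in 0..1) to analog YUV using BT.601 luma weights."""
    y = 0.299 * r + 0.587 * g + 0.114 * b   # luma
    u = 0.492 * (b - y)                     # blue-difference chroma
    v = 0.877 * (r - y)                     # red-difference chroma
    return y, u, v


print(rgb_to_yuv(1.0, 0.0, 0.0))
```

For any gray pixel (equal R, G, and B) both chroma components come out at approximately zero, which is a quick sanity check on the weights.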
- the terminal 100 may include 1 or N cameras 193, and N is a positive integer greater than 1.
- Digital signal processors are used to process digital signals. In addition to digital image signals, they can also process other digital signals. For example, when the terminal 100 selects a frequency point, the digital signal processor is used to perform Fourier transform on the energy of the frequency point.
- Video codecs are used to compress or decompress digital video.
- the terminal 100 may support one or more video codecs.
- the terminal 100 can play or record videos in multiple encoding formats, for example: moving picture experts group (MPEG) 1, MPEG2, MPEG3, MPEG4, and so on.
- the NPU is a neural-network (NN) computing processor.
- through the NPU, applications such as intelligent cognition of the terminal 100 can be implemented, for example image recognition, face recognition, voice recognition, and text understanding.
- the external memory interface 120 may be used to connect an external memory card, such as a Micro SD card, to expand the storage capacity of the terminal 100.
- the external memory card communicates with the processor 110 through the external memory interface 120 to realize the data storage function. For example, save music, video and other files in an external memory card.
- the internal memory 121 may be used to store computer executable program code, where the executable program code includes instructions.
- the internal memory 121 may include a storage program area and a storage data area.
- the storage program area can store an operating system and an application program required by at least one function (such as a sound playback function or an image playback function).
- the data storage area can store data (such as audio data, phone book, etc.) created during the use of the terminal 100.
- the internal memory 121 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, a universal flash storage (UFS), etc.
- the processor 110 executes various functional applications and data processing of the terminal 100 by running instructions stored in the internal memory 121 and/or instructions stored in a memory provided in the processor.
- the terminal 100 can implement audio functions through the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the earphone interface 170D, and the application processor. For example, music playback, recording, etc.
- the audio module 170 is used for converting digital audio information into an analog audio signal for output, and also for converting an analog audio input into a digital audio signal.
- the audio module 170 can also be used to encode and decode audio signals.
- the audio module 170 may be provided in the processor 110, or part of the functional modules of the audio module 170 may be provided in the processor 110.
- the speaker 170A, also called a "loudspeaker", is used to convert audio electrical signals into sound signals.
- the user can listen to music or to a hands-free call through the speaker 170A of the terminal 100.
- the receiver 170B, also called an "earpiece", is used to convert audio electrical signals into sound signals.
- when the terminal 100 answers a call or plays a voice message, the user can listen by bringing the receiver 170B close to the ear.
- the microphone 170C, also called a "mic", is used to convert sound signals into electrical signals.
- the user can speak with the mouth close to the microphone 170C to input a sound signal into the microphone 170C.
- the terminal 100 may be provided with at least one microphone 170C. In other embodiments, the terminal 100 may be provided with two microphones 170C, which can implement noise reduction functions in addition to collecting sound signals. In other embodiments, the terminal 100 may also be provided with three, four or more microphones 170C to collect sound signals, reduce noise, identify sound sources, and realize directional recording functions.
- the earphone interface 170D is used to connect wired earphones.
- the earphone interface 170D may be the USB interface 130, a 3.5 mm open mobile terminal platform (OMTP) standard interface, or a cellular telecommunications industry association of the USA (CTIA) standard interface.
- the pressure sensor 180A is used to sense the pressure signal and can convert the pressure signal into an electrical signal.
- the pressure sensor 180A may be provided on the display screen 194.
- the capacitive pressure sensor may include at least two parallel plates with conductive material. When a force is applied to the pressure sensor 180A, the capacitance between the electrodes changes.
- the terminal 100 determines the strength of the pressure according to the change in capacitance.
- the terminal 100 detects the intensity of the touch operation according to the pressure sensor 180A.
- the terminal 100 may also calculate the touched position according to the detection signal of the pressure sensor 180A.
- touch operations that act on the same touch position but have different touch operation strengths may correspond to different operation instructions. For example: when a touch operation whose intensity is less than the first pressure threshold is applied to the short message application icon, an instruction to view the short message is executed. When a touch operation with a touch operation intensity greater than or equal to the first pressure threshold acts on the short message application icon, an instruction to create a new short message is executed.
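The short-message example above amounts to a threshold comparison. A minimal sketch follows; the function name, the string identifiers, and the normalized threshold value are hypothetical and chosen only for illustration.

```python
FIRST_PRESSURE_THRESHOLD = 0.5  # hypothetical normalized value; the source gives no number


def instruction_for_touch(icon, intensity, threshold=FIRST_PRESSURE_THRESHOLD):
    """Map a touch on the short message icon to an instruction based on touch intensity."""
    if icon != "short_message":
        return None  # other icons are outside the scope of this example
    if intensity < threshold:
        return "view_short_message"
    # Intensity greater than or equal to the first pressure threshold.
    return "create_short_message"


print(instruction_for_touch("short_message", 0.2))
print(instruction_for_touch("short_message", 0.8))
```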
- the gyro sensor 180B may be used to determine the movement posture of the terminal 100.
- the gyro sensor 180B can determine the angular velocity of the terminal 100 around three axes (i.e., the x, y, and z axes).
- the gyro sensor 180B can be used for image stabilization.
- the gyro sensor 180B detects the shake angle of the terminal 100, calculates the distance that the lens module needs to compensate according to the angle, and allows the lens to counteract the shake of the terminal 100 through a reverse movement to achieve anti-shake.
- the gyro sensor 180B can also be used for navigation and somatosensory game scenes.
- the air pressure sensor 180C is used to measure air pressure.
- the terminal 100 calculates the altitude based on the air pressure value measured by the air pressure sensor 180C to assist positioning and navigation.
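One common way to turn an air pressure reading into an altitude estimate is the international barometric formula, sketched below. The application does not say which formula the terminal 100 uses, so the formula, the function name, and the standard sea-level pressure of 1013.25 hPa are assumptions for illustration.

```python
def altitude_from_pressure(pressure_hpa, sea_level_hpa=1013.25):
    """Estimate altitude in metres from air pressure with the international barometric formula."""
    if pressure_hpa <= 0 or sea_level_hpa <= 0:
        raise ValueError("pressures must be positive")
    return 44330.0 * (1.0 - (pressure_hpa / sea_level_hpa) ** (1.0 / 5.255))


print(round(altitude_from_pressure(900.0)))
```

The estimate depends on the assumed sea-level pressure, which changes with the weather, so such a value is better suited to assisting positioning than to replacing it.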
- the magnetic sensor 180D includes a Hall sensor.
- the terminal 100 may use the magnetic sensor 180D to detect the opening and closing of a flip holster.
- the terminal 100 can also detect the opening and closing of a flip cover according to the magnetic sensor 180D.
- features such as automatic unlocking when the flip cover opens can then be set according to the detected open or closed state.
- the acceleration sensor 180E can detect the magnitude of the acceleration of the terminal 100 in various directions (generally three axes). When the terminal 100 is stationary, the magnitude and direction of gravity can be detected. The sensor can also be used to identify the posture of the terminal 100 and is applied in horizontal/vertical screen switching, pedometers, and other applications.
- the terminal 100 can measure the distance by infrared or laser. In some embodiments, when shooting a scene, the terminal 100 may use the distance sensor 180F to measure the distance to achieve fast focusing.
- the proximity light sensor 180G may include, for example, a light emitting diode (LED) and a light detector such as a photodiode.
- the light emitting diode may be an infrared light emitting diode.
- the terminal 100 emits infrared light to the outside through the light emitting diode.
- the terminal 100 uses a photodiode to detect infrared reflected light from nearby objects. When sufficient reflected light is detected, it can be determined that there is an object near the terminal 100. When insufficient reflected light is detected, the terminal 100 may determine that there is no object near the terminal 100.
- the terminal 100 can use the proximity light sensor 180G to detect that the user holds the terminal 100 close to the ear to talk, so as to automatically turn off the screen to save power.
- the proximity light sensor 180G can also be used in leather case mode and pocket mode to automatically unlock and lock the screen.
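The proximity decision described above can be sketched as a threshold on the detected reflected light, combined with the call state to decide when to turn the screen off. The function names and the normalized threshold of 0.6 are hypothetical; the application only speaks of "sufficient" reflected light.

```python
def object_nearby(reflected_light, threshold=0.6):
    """Return True when enough reflected infrared light is detected."""
    return reflected_light >= threshold


def screen_should_turn_off(in_call, reflected_light):
    """Turn the screen off only while the user talks with the terminal held close to the ear."""
    return in_call and object_nearby(reflected_light)


print(screen_should_turn_off(True, 0.9))
print(screen_should_turn_off(True, 0.1))
```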
- the ambient light sensor 180L is used to sense the brightness of the ambient light.
- the terminal 100 can adaptively adjust the brightness of the display screen 194 according to the perceived brightness of the ambient light.
- the ambient light sensor 180L can also be used to automatically adjust the white balance when taking pictures.
- the ambient light sensor 180L can also cooperate with the proximity light sensor 180G to detect whether the terminal 100 is in a pocket to prevent accidental touch.
- the fingerprint sensor 180H is used to collect fingerprints.
- the terminal 100 can use the collected fingerprint characteristics to implement fingerprint unlocking, application lock access, fingerprint-triggered photographing, fingerprint-based call answering, and so on.
- the temperature sensor 180J is used to detect temperature.
- the terminal 100 uses the temperature detected by the temperature sensor 180J to execute a temperature processing strategy. For example, when the temperature reported by the temperature sensor 180J exceeds a threshold, the terminal 100 reduces the performance of the processor located near the temperature sensor 180J, so as to reduce power consumption and implement thermal protection.
- when the temperature is lower than another threshold, the terminal 100 heats the battery 142 to avoid abnormal shutdown of the terminal 100 due to low temperature.
- the terminal 100 may also boost the output voltage of the battery 142 to avoid abnormal shutdown caused by low temperature.
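The temperature processing strategy above can be sketched as a set of threshold checks. All three threshold values, the action names, and the use of a separate, lower threshold for boosting the battery output voltage are assumptions made for this example; the text gives no numbers and does not state the condition for the voltage boost.

```python
def thermal_actions(temp_c, high=45.0, low=0.0, very_low=-10.0):
    """Return the actions a terminal might take for a reported temperature (thresholds are hypothetical)."""
    actions = []
    if temp_c > high:
        # Too hot: throttle the processor near the sensor for thermal protection.
        actions.append("reduce_processor_performance")
    if temp_c < low:
        # Cold: warm the battery to avoid an abnormal shutdown.
        actions.append("heat_battery")
    if temp_c < very_low:
        # Assumed colder threshold: also raise the battery output voltage.
        actions.append("boost_battery_output_voltage")
    return actions


print(thermal_actions(50.0))
print(thermal_actions(-15.0))
```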
- the touch sensor 180K is also called a "touch device".
- the touch sensor 180K may be disposed on the display screen 194; together, the touch sensor 180K and the display screen 194 form a touch screen.
- the touch sensor 180K is used to detect touch operations acting on or near it.
- the touch sensor can pass the detected touch operation to the application processor to determine the type of touch event.
- the visual output related to the touch operation can be provided through the display screen 194.
- the touch sensor 180K may also be disposed on the surface of the terminal 100, which is different from the position of the display screen 194.
- the bone conduction sensor 180M can acquire vibration signals.
- the bone conduction sensor 180M can obtain the vibration signal of the bone mass that vibrates when a person speaks.
- the bone conduction sensor 180M can also contact the human pulse and receive the blood pressure pulse signal.
- the bone conduction sensor 180M may also be provided in the earphone, combined with the bone conduction earphone.
- the audio module 170 can parse out a voice command from the vibration signal, obtained by the bone conduction sensor 180M, of the bone mass that vibrates when a person speaks, so as to implement the voice function.
- the application processor can derive heart rate information from the blood pressure pulse signal obtained by the bone conduction sensor 180M, so as to implement the heart rate detection function.
- the button 190 includes a power button, a volume button, and so on.
- the button 190 may be a mechanical button. It can also be a touch button.
- the terminal 100 may receive key input, and generate key signal input related to user settings and function control of the terminal 100.
- the motor 191 can generate vibration prompts.
- the motor 191 can be used for incoming call vibration notification, and can also be used for touch vibration feedback.
- touch operations applied to different applications can correspond to different vibration feedback effects.
- touch operations acting on different areas of the display screen 194 can also correspond to different vibration feedback effects of the motor 191.
- different application scenarios (for example, time reminders, received messages, alarm clocks, and games) can also correspond to different vibration feedback effects.
- the touch vibration feedback effect can also support customization.
- the indicator 192 may be an indicator light, which may be used to indicate the charging status, power change, or to indicate messages, missed calls, notifications, and so on.
- the SIM card interface 195 is used to connect to the SIM card.
- the SIM card can be inserted into the SIM card interface 195 or pulled out from the SIM card interface 195 to achieve contact and separation with the terminal 100.
- the terminal 100 may support 1 or N SIM card interfaces, and N is a positive integer greater than 1.
- the SIM card interface 195 can support Nano SIM cards, Micro SIM cards, SIM cards, etc.
- multiple cards can be inserted into the same SIM card interface 195 at the same time. The types of the multiple cards can be the same or different.
- the SIM card interface 195 can also be compatible with different types of SIM cards.
- the SIM card interface 195 may also be compatible with external memory cards.
- the terminal 100 interacts with the network through the SIM card to implement functions such as call and data communication.
- the terminal 100 adopts an eSIM, that is, an embedded SIM card.
- the eSIM card can be embedded in the terminal 100 and cannot be separated from the terminal 100.
- the software system of the terminal 100 may adopt a layered architecture, an event-driven architecture, a microkernel architecture, a microservice architecture, or a cloud architecture.
- the software structure of the terminal 100 is described below by way of example, using an Android system with a layered architecture.
- FIG. 3 is a functional architecture diagram of a terminal 100 provided by an embodiment of the present application.
- the layered architecture divides the software into several layers, and each layer has a clear role and division of labor. The layers communicate with each other through software interfaces.
- the Android system is divided into four layers, from top to bottom, the application layer, the application framework layer, the Android runtime and system library, and the kernel layer.
- the application layer can include a series of application packages.
- the application package can include applications such as camera, gallery, calendar, call, map, navigation, WLAN, Bluetooth, music, video, short message, etc.
- the application framework layer provides application programming interfaces (application programming interface, API) and programming frameworks for applications in the application layer.
- the application framework layer includes some predefined functions.
- the application framework layer can include a window manager, a content provider, a view system, a phone manager, a resource manager, and a notification manager.
- the window manager is used to manage window programs.
- the window manager can obtain the size of the display, determine whether there is a status bar, lock the screen, take a screenshot, etc.
- the content provider is used to store and retrieve data and make these data accessible to applications.
- This data can include videos, images, audios, calls made and received, browsing history and bookmarks, phone book, etc.
- the view system includes visual controls, such as controls that display text and controls that display pictures.
- the view system can be used to build applications.
- the display interface can be composed of one or more views.
- a display interface that includes a short message notification icon may include a view that displays text and a view that displays pictures.
- the phone manager is used to provide the communication function of the terminal 100. For example, the management of the call status (including connecting, hanging up, etc.).
- the resource manager provides various resources for the application, such as localized strings, icons, pictures, layout files, video files, etc.
- the notification manager enables an application to display notification information in the status bar. It can be used to convey notification-type messages, which can disappear automatically after a short stay without user interaction.
- for example, the notification manager is used to notify the user of a completed download or to give a message reminder.
- the notification manager can also present a notification in the status bar at the top of the system in the form of a chart or scroll-bar text, such as a notification from an application running in the background, or a notification that appears on the screen in the form of a dialog window.
- for example, text information is prompted in the status bar, a prompt tone sounds, the terminal vibrates, or the indicator light flashes.
- Android Runtime includes core libraries and virtual machines. Android runtime is responsible for the scheduling and management of the Android system.
- the core library consists of two parts: the functions that the Java language needs to call, and the core library of Android.
- the application layer and the application framework layer run in a virtual machine.
- the virtual machine converts the Java files of the application layer and the application framework layer into binary files and executes them.
- the virtual machine is used to perform functions such as object life cycle management, stack management, thread management, security and exception management, and garbage collection.
- the system library can include multiple functional modules. For example: surface manager (surface manager), media library (Media Libraries), three-dimensional graphics processing library (for example: OpenGL ES), 2D graphics engine (for example: SGL), etc.
- the surface manager is used to manage the display subsystem and provides a combination of 2D and 3D layers for multiple applications.
- the media library supports playback and recording of a variety of commonly used audio and video formats, as well as still image files.
- the media library can support multiple audio and video encoding formats, such as: MPEG4, H.264, MP3, AAC, AMR, JPG, PNG, etc.
- the 3D graphics processing library is used to realize 3D graphics drawing, image rendering, synthesis, and layer processing.
- the 2D graphics engine is a drawing engine for 2D drawing.
- the kernel layer is the layer between hardware and software.
- the kernel layer contains at least display driver, camera driver, audio driver, and sensor driver.
- the process of detecting the trigger operation may include: when the touch sensor 180K receives the touch operation, a corresponding hardware interrupt is sent to the kernel layer.
- the kernel layer processes touch operations into original input events (including touch coordinates, time stamps of touch operations, etc.).
- the original input events are stored in the kernel layer.
- the application framework layer obtains the original input event from the kernel layer, identifies the control corresponding to the input event, and detects that the touch operation occurs.
- taking as an example a touch operation that is a click operation, where the control corresponding to the click operation is the icon of a music application: the music application calls the interface of the application framework layer to start the music application, and then displays the interface of the music application through the display screen 194.
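The flow above (hardware interrupt, original input event stored in the kernel layer, control identified by the application framework layer, application started) can be sketched as follows. All class and function names, the list used as an event queue, and the rectangular hit test are hypothetical simplifications and do not represent the Android input stack.

```python
from dataclasses import dataclass


@dataclass
class RawInputEvent:
    """Original input event produced by the kernel layer."""
    x: int
    y: int
    timestamp_ms: int


@dataclass
class Control:
    """An on-screen control, such as an application icon, with its bounding box."""
    name: str
    left: int
    top: int
    right: int
    bottom: int

    def contains(self, event):
        return self.left <= event.x < self.right and self.top <= event.y < self.bottom


def kernel_handle_interrupt(x, y, timestamp_ms, event_queue):
    """Kernel layer: turn a touch interrupt into an original input event and store it."""
    event_queue.append(RawInputEvent(x, y, timestamp_ms))


def framework_dispatch(event_queue, controls):
    """Framework layer: fetch stored events and identify the control each one hits."""
    started = []
    while event_queue:
        event = event_queue.pop(0)
        for control in controls:
            if control.contains(event):
                started.append(control.name)  # e.g. start the music application
                break
    return started


controls = [Control("music", 0, 0, 100, 100), Control("camera", 100, 0, 200, 100)]
queue = []
kernel_handle_interrupt(50, 50, 1, queue)
print(framework_dispatch(queue, controls))
```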
- the embodiments of the present application can be applied to a scenario where a terminal is awakened for voice interaction.
- In such a scenario, wake-up-word-free voice interaction can be realized.
- Taking the terminal's wake-up word "Hello, Xiaohua" as an example, scenario 1 to scenario 8 below compare, for each scenario, the flow that relies on the wake-up word with the flow of this embodiment:
- Scenario 1 During driving, the user wants to play a song on the vehicle terminal.
- With a conventional wake-up word, this scenario includes the following steps 1 to 7:
- Step 1 The user says "Hello, Xiaohua".
- Step 2 The vehicle-mounted terminal collects the voice command and determines that it contains "Hello, Xiaohua"; the vehicle-mounted terminal is awakened and plays the voice "Are you there", thereby responding to the user's voice command. After that, if the vehicle-mounted terminal does not collect a voice command within a preset duration, it goes back to sleep.
- Step 3 The user turns on the radio switch.
- Step 4 The user says "Hello, Xiaohua".
- Step 5 The vehicle-mounted terminal collects the voice command and determines that it contains "Hello, Xiaohua"; the vehicle-mounted terminal is re-awakened and plays the voice "Are you there", thereby responding to the user's voice command.
- Step 6 The user says "Help play the songs of Eastern Music Radio".
- Step 7 The vehicle-mounted terminal collects the voice command and determines that it contains "Help play the songs of Eastern Music Radio"; the vehicle-mounted terminal then tunes the radio to "Eastern Music Radio" and plays its songs.
- With the method of this embodiment, this scenario may include the following steps 1 to 4:
- Step 1 The user turns on the radio switch.
- Step 2 The radio switch sends a signal to the vehicle-mounted terminal, and the vehicle-mounted terminal determines that the opening operation of the radio switch is detected. It queries the mapping relationship according to the opening operation of the radio switch and obtains the target service type music, thereby predicting that the user intends to conduct voice interaction for services of the music type.
- Step 3 The user says "Help play the songs of Eastern Music Radio".
- Step 4 The vehicle-mounted terminal collects the voice command, obtains that the service type corresponding to "Help play the songs of Eastern Music Radio" is music, determines that the predicted target service type is the same as the service type expressed by the user through voice, and then plays the songs of "Eastern Music Radio". In addition, if what the user says in step 3 has nothing to do with music, the vehicle-mounted terminal may not respond to it.
- In this scenario, the user can activate the wake-up-word-free voice interaction function of the vehicle-mounted terminal for the music service type, which eliminates the step of frequently speaking the wake-up word.
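The wake-up-word-free flow of Scenario 1 can be sketched in a few lines. This is only an illustration under stated assumptions, not the implementation of this application: the event name `radio_switch_on`, the keyword table, and the functions `classify_service_type` and `handle_utterance` are hypothetical, and keyword matching merely stands in for real semantic understanding.

```python
from typing import Optional, Set

# Hypothetical mapping relationship: detected event -> predicted service types.
MAPPING = {"radio_switch_on": {"music"}}

# Hypothetical keyword table standing in for a real semantic-understanding module.
KEYWORDS = {
    "music": ("song", "radio", "play"),
    "navigation": ("navigate", "route"),
}


def classify_service_type(utterance: str) -> Optional[str]:
    """Return the first service type whose keyword appears in the utterance."""
    text = utterance.lower()
    for service_type, words in KEYWORDS.items():
        if any(word in text for word in words):
            return service_type
    return None


def handle_utterance(event: str, utterance: str) -> bool:
    """Respond without a wake-up word only if the utterance matches the prediction."""
    predicted: Set[str] = MAPPING.get(event, set())
    return classify_service_type(utterance) in predicted
```

With these assumptions, `handle_utterance("radio_switch_on", "Help play the songs of Eastern Music Radio")` is accepted, while an utterance about navigation after the same event is ignored, which mirrors the "may not respond" branch of step 4.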
- Scenario 2 During driving, the user wants to view instant messaging messages.
- With a conventional wake-up word, this scenario includes the following steps 1 to 7:
- Step 1 Application A running on the mobile phone receives an instant messaging message.
- Step 2 The user says "Hello, Xiaohua".
- Step 3 The mobile phone collects the voice command and determines that it contains "Hello, Xiaohua"; the mobile phone is awakened and plays the voice "Are you there", thereby responding to the user's voice command. After that, if the mobile phone does not collect a voice command within a preset duration, it goes back to sleep.
- Step 4 The user says "Hello, Xiaohua".
- Step 5 The mobile phone collects the voice command and determines that it contains "Hello, Xiaohua"; the mobile phone is re-awakened and plays the voice "Are you there", thereby responding to the user's voice command.
- Step 6 The user says "Look at what application A is saying".
- Step 7 The mobile phone collects the voice command and determines that it contains "Look at what application A is saying"; the mobile phone then obtains the instant messaging message "Go eat hot pot at 7 pm" received by application A and plays the voice "Go eat hot pot at 7 pm".
- With the method of this embodiment, this scenario may include the following steps 1 to 4:
- Step 1 Application A running on the mobile phone receives an instant messaging message.
- Step 2 The mobile phone queries the mapping relationship according to the received instant messaging message and obtains the target service type application A, thereby predicting that the user intends to conduct voice interaction for services of the application A type.
- Step 3 The user says "Look at what application A is saying".
- Step 4 The mobile phone collects the voice command and obtains that the service type corresponding to "Look at what application A is saying" is application A. The mobile phone determines that the predicted target service type is the same as the service type expressed by the user through voice, obtains the instant messaging message "Go eat hot pot at 7 pm" received by application A, and plays the voice "Go eat hot pot at 7 pm". In addition, if what the user says in step 3 has nothing to do with application A, the mobile phone may not respond to it, thereby avoiding false wake-up.
- In this scenario, the wake-up-word-free voice interaction function of the mobile phone for instant messaging application services can be activated, which eliminates the need for the user to frequently speak the wake-up word.
- Scenario 3 When a song is about to end, the user wants to continue playing another song.
- With a conventional wake-up word, this scenario includes the following steps 1 to 5:
- Step 1 Song A currently playing on the smart speaker is about to end.
- Step 2 The user says "Hello, Xiaohua".
- Step 3 The smart speaker collects the voice command and determines that it contains "Hello, Xiaohua"; the smart speaker is awakened and plays the voice "Are you there", thereby responding to the user's voice command.
- Step 4 The user says "Play song B".
- Step 5 The smart speaker collects the voice command and determines that it contains "Play song B", and the smart speaker plays song B.
- With the method of this embodiment, this scenario may include the following steps 1 to 3:
- Step 1 Song A currently played by the smart speaker is about to end. The smart speaker determines that the progress of the current service meets the condition, queries the mapping relationship according to the current service (music), and obtains the target service type music, thereby predicting that the user intends to conduct voice interaction for services of the music type.
- Step 2 The user says "Play song B".
- Step 3 The smart speaker collects the voice command and obtains that the service type corresponding to "Play song B" is music. It determines that the predicted target service type is the same as the service type expressed by the user through voice, and plays song B. In addition, if what the user says in step 2 has nothing to do with music, the smart speaker may not respond to it.
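- Scenario 4 It rains during driving, and the user wants to turn on the wipers.
- With a conventional wake-up word, this scenario includes the following steps 1 to 4: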
- Step 1 The user says "Hello, Xiaohua".
- Step 2 The vehicle-mounted terminal collects the voice command and determines that it contains "Hello, Xiaohua"; the vehicle-mounted terminal is awakened and plays the voice "Are you there", thereby responding to the user's voice command.
- Step 3 The user says "Turn on the wipers".
- Step 4 The vehicle-mounted terminal collects the voice command and determines that it contains "Turn on the wipers"; the vehicle-mounted terminal then sends a signal to the car's controller, and the controller controls the wiper drive circuit to drive the wipers.
- With the method of this embodiment, this scenario may include the following steps 1 to 3:
- Step 1 The vehicle-mounted terminal determines through the raindrop sensor that rain is detected, queries the mapping relationship according to the rain, and obtains the target service type wiper, thereby predicting that the user intends to conduct voice interaction for services of the wiper type.
- Step 2 The user says "Turn on the wipers".
- Step 3 The vehicle-mounted terminal collects the voice command and obtains that the service type corresponding to "Turn on the wipers" is wiper. It determines that the predicted target service type is the same as the service type expressed by the user through voice, and sends a signal to the car's controller; the controller instructs the wiper drive circuit to drive the wipers. In addition, if what the user says in step 2 has nothing to do with the wipers, the vehicle-mounted terminal may not respond to it.
- In this scenario, the vehicle-mounted terminal can activate the wake-up-word-free voice interaction function for the wiper service type, which eliminates the need for the user to frequently speak the wake-up word.
- Scenario 5 During driving, the car's fuel is insufficient.
- With a conventional wake-up word, this scenario includes the following steps 1 to 4:
- Step 1 The user says "Hello, Xiaohua".
- Step 2 The vehicle-mounted terminal collects the voice command and determines that it contains "Hello, Xiaohua"; the vehicle-mounted terminal is awakened and plays the voice "Are you there", thereby responding to the user's voice command.
- Step 3 The user says "Where is the nearest gas station".
- Step 4 The vehicle-mounted terminal collects the voice command and determines that it contains "Where is the nearest gas station"; the vehicle-mounted terminal then calls the interface of the navigation application to query the gas station address and plays the voice "The gas station address is Y Road, X District".
- With the method of this embodiment, this scenario may include the following steps 1 to 3:
- Step 1 The vehicle-mounted terminal detects the fuel level of the car and determines that it is lower than the threshold. It then queries the mapping relationship according to the fuel level and obtains a service type set that includes gas station and navigation, thereby predicting that the user intends to conduct voice interaction for services of the gas station type or the navigation type.
- Step 2 The user says "Where is the nearest gas station".
- Step 3 The vehicle-mounted terminal collects the voice command and obtains that the service type corresponding to "Where is the nearest gas station" is navigation. It determines that the predicted target service type is the same as the service type expressed by the user through voice, calls the interface of the navigation application to query the gas station address, and plays the voice "The gas station address is Y Road, X District". In addition, if what the user says in step 2 has nothing to do with navigation, the vehicle-mounted terminal may not respond to it.
- In this scenario, the vehicle-mounted terminal can activate the wake-up-word-free voice interaction function for the two service types, gas station and navigation, which eliminates the need for the user to frequently speak the wake-up word.
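- Scenario 6 The mobile phone receives an incoming call, and the user wants to answer it.
- With a conventional wake-up word, this scenario includes the following steps 1 to 5: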
- Step 1 When the mobile phone receives a call request from the calling party, the operating system of the mobile phone pushes an incoming call notification.
- Step 2 The user says "Hello, Xiaohua".
- Step 3 The mobile phone collects the voice command and determines that it contains "Hello, Xiaohua"; the mobile phone is awakened and plays the voice "Are you there".
- Step 4 The user says "Help me answer the call".
- Step 5 The mobile phone collects the voice command and determines that it contains "Help me answer the call", and the mobile phone answers the call.
- With the method of this embodiment, this scenario may include the following steps 1 to 3:
- Step 1 When the mobile phone receives a call request from the calling party, the operating system of the mobile phone pushes an incoming call notification, and the mobile phone detects the incoming call notification. It queries the mapping relationship according to the incoming call notification and obtains the target service type communication, thereby predicting that the user intends to conduct voice interaction for services of the communication type.
- Step 2 The user says "Help me answer the call".
- Step 3 The mobile phone collects the voice command, obtains the semantic information "answer the call" corresponding to the voice command, and obtains from this semantic information that the corresponding service type is communication. It determines that the predicted target service type is the same as the service type expressed by the user through voice, and answers the call. In addition, if what the user says in step 2 has nothing to do with communication, the mobile phone may not respond to it.
- In this scenario, when the mobile phone receives an incoming call, its wake-up-word-free voice interaction function for the communication service type can be activated, which eliminates the need for the user to frequently speak the wake-up word.
- Scenario 7 The car drives into an area with poor air quality.
- With a conventional wake-up word, this scenario includes the following steps 1 to 4:
- Step 1 The user says "Hello, Xiaohua".
- Step 2 The vehicle-mounted terminal collects the voice command and determines that it contains "Hello, Xiaohua"; the vehicle-mounted terminal is awakened and plays the voice "Are you there", thereby responding to the user's voice command.
- Step 3 The user says "Turn on the air purifier".
- Step 4 The vehicle-mounted terminal collects the voice command and determines that it contains "Turn on the air purifier"; the vehicle-mounted terminal then sends a signal to the car's controller, and the controller turns on the air purifier.
- With the method of this embodiment, this scenario may include the following steps 1 to 3:
- Step 1 The sensor of the air purifier detects the dust concentration and sends it to the vehicle-mounted terminal. The vehicle-mounted terminal determines that the dust concentration exceeds the threshold, queries the mapping relationship according to the dust concentration, and obtains the target service type air purifier, thereby predicting that the user intends to conduct voice interaction for services of the air purifier type.
- Step 2 The user says "Turn on the air purifier".
- Step 3 The vehicle-mounted terminal collects the voice command, obtains that the service type corresponding to "Turn on the air purifier" is air purifier, and determines that the predicted target service type is the same as the service type expressed by the user through voice. The vehicle-mounted terminal then sends a signal to the car's controller, and the controller turns on the air purifier. In addition, if what the user says in step 2 has nothing to do with the air purifier, the vehicle-mounted terminal may not respond to it.
- In this scenario, the vehicle-mounted terminal can activate the wake-up-word-free voice interaction function for the air purifier service type, which eliminates the need for the user to frequently speak the wake-up word.
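- Scenario 8 The sun shade is being put down, and the user wants to stop it.
- With a conventional wake-up word, this scenario includes the following steps 1 to 4: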
- Step 1 The user says "Hello, Xiaohua".
- Step 2 The vehicle-mounted terminal collects the voice command and determines that it contains "Hello, Xiaohua"; the vehicle-mounted terminal is awakened and plays the voice "Are you there", thereby responding to the user's voice command.
- Step 3 The user says "Stop putting down the sun shade".
- Step 4 The vehicle-mounted terminal collects the voice command and determines that it contains "Stop putting down the sun shade". The vehicle-mounted terminal then sends a stop signal to the car's controller; the stop signal instructs the controller to stop opening the sun shade. After receiving the stop signal, the controller controls the drive circuit of the sun shade to stop opening the sun shade any further.
- With the method of this embodiment, this scenario may include the following steps 1 to 3:
- Step 1 The sun shade sends its current state to the vehicle-mounted terminal. According to the state of the sun shade, the vehicle-mounted terminal determines that the opening degree of the sun shade satisfies the condition, and then queries the mapping relationship according to the sun shade to obtain the target service type "sun shade", thereby predicting that the user intends to conduct voice interaction for services of the "sun shade" type.
- Step 2 The user says "Stop putting down the sun shade".
- Step 3 The vehicle-mounted terminal collects the voice command, obtains the semantic information "Stop putting down the sun shade" corresponding to the voice command, and obtains from this semantic information that the corresponding service type is sun shade. The vehicle-mounted terminal then sends a stop signal to the car's controller; the stop signal instructs the controller to stop opening the sun shade. After receiving the stop signal, the controller controls the drive circuit of the sun shade to stop opening the sun shade any further. In addition, if what the user says in step 2 has nothing to do with the sun shade, the vehicle-mounted terminal may not respond to it.
- In this scenario, the vehicle-mounted terminal can activate the wake-up-word-free voice interaction function for the sun shade service type, which eliminates the need for the user to frequently speak the wake-up word.
- Fig. 4 is a flowchart of a voice interaction method provided by an embodiment of the present application. This embodiment is described by taking the execution subject as a terminal as an example. Referring to Fig. 4, the method includes:
- the terminal determines that a target event has been detected.
- the target event is an event that can trigger voice interaction. If the target event occurs, the user will have a certain probability of generating the intention of voice interaction, and has the need to wake up the terminal for voice interaction. In view of this, the terminal will detect the target event, so that when it is determined that the target event is detected, the target event and the voice command are combined to determine whether to activate the voice interaction function.
- The target event may have one or more modalities; a modality refers to the form or dimension of the target event.
- For example, the modality of the target event can be an operation of the user.
- The modality of the target event can also be a notification message.
- The modality of the target event can also be that an environmental parameter satisfies a first condition.
- The modality of the target event can also be that the progress of the current service satisfies a second condition.
- The modality of the target event can also be replaced with other modalities according to service requirements; this embodiment does not limit the modality of the target event.
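The four modalities listed above could be represented as a small data model. The following is a hedged sketch only: the names `Modality` and `TargetEvent`, and the example identifier `radio_switch_on`, are assumptions introduced for illustration and do not appear in this application.

```python
from dataclasses import dataclass
from enum import Enum


class Modality(Enum):
    """Form or dimension of a target event (names are illustrative)."""
    USER_OPERATION = "user_operation"
    NOTIFICATION_MESSAGE = "notification_message"
    ENVIRONMENTAL_PARAMETER = "environmental_parameter"  # parameter meets the first condition
    SERVICE_PROGRESS = "service_progress"                # progress meets the second condition


@dataclass(frozen=True)
class TargetEvent:
    """A detected event that may trigger voice interaction."""
    modality: Modality
    name: str  # hypothetical identifier, e.g. "radio_switch_on"


event = TargetEvent(Modality.USER_OPERATION, "radio_switch_on")
```

Making `TargetEvent` frozen keeps it hashable, so an event can be used directly as a key when querying a mapping relationship.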
- The user's operation may be, but is not limited to, one or more of an operation on a physical button, an operation on an interface, a voice command, and a browsing behavior.
- The physical button may be a button on the terminal body, or a button on another device that has established a communication connection with the terminal.
- For example, if the terminal is a vehicle-mounted terminal, the physical button can be a button on any device mounted on the car, such as an air-conditioning switch or a radio switch. If the user operates the physical button, the physical button can send a signal to the terminal, and the terminal determines that the operation on the physical button is detected.
- The interface can be the system interface of the terminal or the interface of an application program.
- If the user operates the interface, the terminal determines that the operation on the interface is detected.
- Voice commands can be collected through a microphone.
- the browsing behavior may be the behavior of the user browsing the interface of the terminal. If the user performs the browsing behavior, the terminal may capture the user's line of sight through a camera, thereby determining that the browsing behavior is detected.
- the operation may be a pressing operation, a clicking operation or a sliding operation, etc. The specific type of operation is not limited in this embodiment.
- the notification message may be a message pushed by the operating system or an application program.
- the notification message may be one or more of incoming call notification, short message, instant messaging message, alarm message, and resource recommendation message.
- The alarm message may indicate that the terminal has malfunctioned, for example, that the remaining power is below 10% of the total power, that memory is insufficient, or that the network is under attack; the alarm message may also indicate that another device that has established a communication connection with the terminal has failed.
- For example, an alarm message of a vehicle-mounted terminal can indicate that the motor of the car is malfunctioning.
- A resource recommendation message can indicate a resource recommended to the user, for example, news recommended by a news application, goods or services recommended by an e-commerce application, or virtual items recommended by a game application.
- Environmental parameters can be, but are not limited to, one or more of noise, temperature, humidity, brightness, dust concentration, and fuel level.
- The first condition may be, but is not limited to, that the environmental parameter exceeds a parameter threshold or that the change amount of the environmental parameter exceeds a change amount threshold.
- A sensor may collect the environmental parameter in real time or periodically, and send the collected environmental parameter to the terminal.
- For example, a temperature sensor can collect the temperature and send it to the terminal; a humidity sensor can collect the humidity and send it to the terminal; a dust sensor can collect the dust concentration and send it to the terminal; a brightness sensor can collect the brightness and send it to the terminal; a microphone can collect the noise intensity and send it to the terminal; and a fuel level sensor can collect the current remaining fuel level of the vehicle and send it to the terminal.
- The terminal may receive the environmental parameter from the sensor and determine whether the environmental parameter exceeds the parameter threshold; if it does, the terminal determines that the detected environmental parameter satisfies the first condition.
- Alternatively, the terminal can obtain the change amount of the environmental parameter from the currently obtained environmental parameter and a historically obtained environmental parameter, and determine whether the change amount exceeds the change amount threshold. If it does, the environment has changed, and the terminal determines that the detected environmental parameter satisfies the first condition.
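The two ways of satisfying the first condition can be expressed as one check. This is a minimal sketch assuming a single numeric reading per parameter; the function name `meets_first_condition` and its arguments are hypothetical, and the thresholds in the usage note are made-up values.

```python
from typing import Optional


def meets_first_condition(
    current: float,
    previous: Optional[float],
    parameter_threshold: float,
    change_threshold: float,
) -> bool:
    """Return True if the parameter exceeds its threshold or has changed too much."""
    # Case 1: the environmental parameter itself exceeds the parameter threshold.
    if current > parameter_threshold:
        return True
    # Case 2: the change relative to a historical reading exceeds the change threshold.
    if previous is not None and abs(current - previous) > change_threshold:
        return True
    return False
```

For instance, with a hypothetical dust threshold of 70 and change threshold of 10, a reading of 80 satisfies the condition by itself, and a jump from 18 to 30 satisfies it through the change amount. A parameter with a lower bound, such as the fuel level in Scenario 5, would need the comparison reversed.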
- the current service may be the service currently performed by the terminal or the service currently performed by other devices that have established a communication connection with the terminal.
- For example, if the terminal is a vehicle-mounted terminal, the vehicle-mounted terminal can establish a communication connection with the air conditioner, sun shade, and wipers on the vehicle through the vehicle controller; the current service can then be the navigation service or music service currently performed by the vehicle-mounted terminal, or it can be the temperature adjustment service of the air conditioner, the wiping service of the wipers, or the lowering service of the sun shade.
- The second condition may be, but is not limited to, that the progress of the service changes, that the progress of the service exceeds a threshold, or that the change amount of the progress of the service exceeds a change amount threshold.
- For example, the second condition may be that the service is about to end, that the service is half completed, and so on.
- For example, the progress of the current service satisfying the second condition can be that the air conditioner starts to perform a temperature adjustment service, such as starting to set the temperature or starting to increase the air volume; it can also be that the progress of the temperature adjustment service performed by the air conditioner changes, for example, the temperature sensor of the air conditioner detects a temperature change.
- The terminal can obtain the progress of the current service and determine whether it exceeds the threshold; if it does, the terminal determines that the progress of the current service satisfies the second condition. Alternatively, the terminal can obtain the change amount of the progress from the current progress and the historical progress of the current service, and determine whether the change amount exceeds the change amount threshold. If it does, the current service has changed, and the terminal determines that the progress of the current service satisfies the second condition.
- The change amount and the change amount threshold may be expressed as a percentage or as a time; they may also be expressed with data of other dimensions, which is not limited in this embodiment.
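As an illustration of expressing progress either as a percentage or as a time, the "song is about to end" case from Scenario 3 could be checked as follows. This is a sketch under assumptions: the function names and the default values `0.9` and `10.0` seconds are hypothetical and are not taken from this application.

```python
def progress_fraction(elapsed_s: float, total_s: float) -> float:
    """Progress of the current service as a fraction in [0, 1]."""
    if total_s <= 0:
        raise ValueError("total_s must be positive")
    return min(max(elapsed_s / total_s, 0.0), 1.0)


def about_to_end_by_fraction(elapsed_s: float, total_s: float, threshold: float = 0.9) -> bool:
    """Second condition expressed as a percentage: progress exceeds the threshold."""
    return progress_fraction(elapsed_s, total_s) > threshold


def about_to_end_by_time(elapsed_s: float, total_s: float, remaining_s: float = 10.0) -> bool:
    """Second condition expressed as a time: less than remaining_s seconds are left."""
    return (total_s - elapsed_s) < remaining_s
```

Both forms agree for a 240-second song at 235 seconds; which one to use depends on whether the service has a known total length.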
- The target event is not limited to the above. It should be understood that the modality of the target event can be expanded according to the actual services of the terminal; any event that can trigger voice interaction can serve as a target event, and this embodiment does not limit the target event.
- the terminal queries the mapping relationship according to the target event to obtain a service type set.
- the service type set includes one or more target service types, and each target service type is a service type corresponding to the voice interaction intention.
- the terminal can predict the user's voice interaction intention based on the target event, and obtain a set of service types.
- the target service type can be the type of service performed by the terminal, or the type of service performed by other devices that have established a communication connection with the terminal.
- For example, the target service type can be the type of service performed by equipment on the vehicle, such as the air conditioner and the lights.
- the target service type may be one or more of navigation, schedule consultation, air conditioning, radio, music, car control, mileage inquiry, question and answer consultation, games, system settings, vehicle control, charging, maintenance, and communication.
- the mapping relationship may include one or more events and one or more business types.
- the mapping relationship may indicate the corresponding relationship between the event and the business type.
- Each event in the mapping relationship may correspond to one or more business types.
- In the mapping relationship, an event can be a first entry and the service type corresponding to the event can be a second entry, and the positions of the first entry and the second entry correspond to each other; for example, the first entry and the second entry can be located in the same row.
- the mapping relationship can be shown in Table 1 below.
- The terminal can use the target event as an index and query the mapping relationship to obtain the service type set. For example, if the detected target event is an operation of the master switch, looking up Table 1 yields the service type set (navigation, music, schedule consultation).
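Using the target event as an index amounts to a dictionary lookup. The sketch below encodes only the one row that the text spells out (master switch operation); the key string `master_switch_operation` and the function name `query_service_types` are hypothetical.

```python
from typing import Dict, FrozenSet

# Mapping relationship: event (first entry) -> service types (second entry).
MAPPING: Dict[str, FrozenSet[str]] = {
    "master_switch_operation": frozenset(
        {"navigation", "music", "schedule consultation"}
    ),
}


def query_service_types(target_event: str) -> FrozenSet[str]:
    """Return the service type set for the event, or an empty set if unmapped."""
    return MAPPING.get(target_event, frozenset())
```

Returning an empty set for an unmapped event means no wake-up-word-free interaction is predicted, so the terminal would fall back to its normal wake-up behaviour.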
- The mapping relationship may specifically include a first mapping relationship between the user's operation and the service type corresponding to the operation, a second mapping relationship between the notification message and message viewing or message processing, and a third mapping relationship between the environmental parameter and the adjustment of the environmental parameter.
- the first mapping relationship may include one or more operations and service types corresponding to the one or more operations, and any operation may correspond to one or more service types.
- the first mapping relationship may be as shown in Table 2 below.
- the first mapping relationship may be constructed based on operation continuity rules.
- the operation continuity rule refers to: if operation A and operation B are continuous operations, then if the user performs operation A, it can be predicted that the user has the intention to perform operation B. Continuous operation means that after the user performs operation A, operation B is performed immediately.
- For ease of description, operation A (the user's current operation) is called the first operation, and operation B (the continuous operation associated with the first operation) is called the target second operation.
- The target second operation is the operation predicted to follow the execution of the first operation; it may or may not actually be performed. The target second operation may be the next operation performed after the first operation.
- The first mapping relationship may include a mapping relationship between the first operation and one or more service types; each service type in the first mapping relationship is a service type corresponding to a target second operation, and the one or more target second operations are continuous operations associated with the first operation. For example, referring to Table 2, turning on the master switch (first operation) and navigating to the destination (target second operation) are continuous operations, turning on the master switch (first operation) and clicking the song play button (target second operation) are also continuous operations, and turning on the master switch (first operation) and viewing today's schedule (target second operation) are also continuous operations.
- The service type corresponding to navigating to the destination is navigation, the service type corresponding to clicking the song play button is music, and the service type corresponding to viewing today's schedule is schedule consultation. Therefore, when constructing the first mapping relationship, the operation of the master switch can be taken as the first operation, and navigation, music, and schedule consultation can be taken as the service types corresponding to the target second operations; the operation of the master switch, navigation, music, and schedule consultation are then written into the first mapping relationship.
- For another example, opening the navigation interface (first operation) and entering a navigation destination on the navigation interface (target second operation) are continuous operations, and the service type corresponding to entering the navigation destination is navigation.
- Therefore, opening the navigation interface can be taken as the first operation and navigation as the service type corresponding to the target second operation, and the mapping relationship between opening the navigation interface and navigation is stored in the first mapping relationship.
- In this way, if the terminal detects that the navigation interface is opened, it can predict navigation as the target service type.
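The construction of the first mapping relationship from continuous operations can be sketched as follows. The operation pairs and service types are the ones named in the examples above; the data layout and the function name `build_first_mapping` are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# (first operation, target second operation) pairs from the examples in the text.
CONTINUOUS_OPERATIONS = [
    ("turn on master switch", "navigate to destination"),
    ("turn on master switch", "click song play button"),
    ("turn on master switch", "view today's schedule"),
    ("open navigation interface", "enter navigation destination"),
]

# Service type corresponding to each target second operation.
SERVICE_TYPE_OF = {
    "navigate to destination": "navigation",
    "click song play button": "music",
    "view today's schedule": "schedule consultation",
    "enter navigation destination": "navigation",
}


def build_first_mapping(
    pairs: Iterable[Tuple[str, str]], service_type_of: Dict[str, str]
) -> Dict[str, Set[str]]:
    """Map each first operation to the service types of its target second operations."""
    mapping: Dict[str, Set[str]] = defaultdict(set)
    for first_op, second_op in pairs:
        mapping[first_op].add(service_type_of[second_op])
    return dict(mapping)


FIRST_MAPPING = build_first_mapping(CONTINUOUS_OPERATIONS, SERVICE_TYPE_OF)
```

Under these assumptions, turning on the master switch maps to navigation, music, and schedule consultation, and opening the navigation interface maps to navigation only.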
- the second mapping relationship includes one or more notification messages and message viewing or message processing.
- Message viewing can include viewing short messages, viewing instant messaging messages of instant messaging applications, and viewing push messages of resource recommendation applications.
- Message processing can include answering calls, replying to messages, troubleshooting, and information inquiry.
- The second mapping relationship may be as shown in Table 3 below.
- The second mapping relationship may be constructed based on the user's need to view or process notification messages. Specifically, if a notification message is received, it can be predicted that the user intends to view or process the notification message. Therefore, the second mapping relationship may be a mapping relationship between notification messages and message viewing, or a mapping relationship between notification messages and message processing. For example, referring to Table 3 above, if an instant messaging application pushes an instant messaging message, it can be predicted that the user intends to view the instant messaging message, and the service type corresponding to viewing the instant messaging message can be the instant messaging application. Therefore, when constructing the second mapping relationship, the instant messaging message and the identifier of the instant messaging application can be written into the second mapping relationship.
- the third mapping relationship includes one or more environmental parameters and the corresponding adjustment of environmental parameters.
- the third mapping relationship may be as shown in Table 4 below.
- the third mapping relationship may be constructed based on the user's needs to respond to environmental changes. Specifically, considering that the environmental parameters will affect the user's perception when certain conditions are met, it can be predicted that the user has the intention to adjust the environmental parameters. Therefore, the third mapping relationship may include the mapping relationship between the environmental parameters and the business types corresponding to adjusting the environmental parameters. For example, referring to Table 4 above, if the temperature change meets the threshold, it can be predicted that the user needs to adjust the temperature, and the service type corresponding to temperature adjustment is air conditioning. Therefore, when the third mapping relationship is constructed, the mapping relationship between the temperature change meeting the threshold and air conditioning can be stored in the third mapping relationship.
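As a rough illustration of the third mapping relationship, the sketch below maps an environmental parameter whose change meets a threshold to the service type used to adjust it. The names `THIRD_MAPPING` and `predict_from_environment`, the threshold value of 3.0, and the use of an absolute difference with `>=` are all assumptions made for this example; the text does not specify them.

```python
# Hypothetical third mapping relationship:
# environmental parameter -> (change threshold, service type that adjusts it).
THIRD_MAPPING = {
    "temperature": (3.0, "air conditioning"),  # 3.0 is an assumed placeholder threshold
}


def predict_from_environment(parameter, previous_value, current_value):
    """Return the service type if the parameter change meets its threshold, else None."""
    entry = THIRD_MAPPING.get(parameter)
    if entry is None:
        return None
    threshold, service_type = entry
    if abs(current_value - previous_value) >= threshold:
        return service_type
    return None
```

For instance, under these assumptions a change from 22.0 to 26.0 would yield "air conditioning", while a change from 22.0 to 23.0 would yield no prediction.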
- the fourth mapping relationship may include one or more current services and service types of the current services. Schematically, the fourth mapping relationship may be as shown in Table 5 below.
- the fourth mapping relationship may be constructed based on the user's response requirements. Specifically, considering that the progress of the current business will affect the user's perception when certain conditions are met, it can be predicted that the user will restart the current business, stop the current business, or adjust the current business. Therefore, the fourth mapping relationship can be the mapping relationship between the current business and the business type of the current business.
- the mapping relationship between the music about to end and the music can be stored in the fourth mapping relationship.
- the process of establishing the mapping relationship may include the following implementation manner 1 to implementation manner 2:
- Implementation mode 1: The terminal obtains, according to the historical record, the historical business executed in association with the historical target event, and writes the business type of the historical business and the historical target event into the mapping relationship.
- the mapping relationship can be constructed according to the historical voice interaction process. Specifically, if, after the terminal detected a historical target event at a historical point in time, the user interacted with the terminal by voice so that the terminal executed a certain historical service in response to the user's voice command, the terminal can establish a mapping relationship between the historical target event and the service type of the historical service. Subsequently, after the terminal detects the target event, it can be woken up, and when the service type corresponding to the voice command is the same as the service type of the historical service, it can execute the service in response to the voice command.
- the historical record includes historical target events and historical services executed in association with the historical target events.
- the historical services executed in association with the historical target events are the services executed after the historical target event is detected; for example, they can be the services executed through the first voice interaction after the historical target event is detected. For example, if an operation triggered on the master switch was detected yesterday, and the service performed by the first voice interaction was navigating to cell A, the history record can include the operation triggered on the master switch and the navigation to cell A, and the operation triggered on the master switch and the service type of navigating to cell A, namely navigation, are written into the mapping relationship. Then, if the terminal currently detects an operation triggered on the master switch, it can query the mapping relationship according to the operation on the master switch, and the target service types in the service type set will include navigation.
- the terminal may write the service type of the recently executed historical service into the mapping relationship.
- the terminal can obtain the historical time period according to the current time point and the preset duration, obtain, according to the historical record, the historical service executed in association with the historical target event within the historical time period, and write the service type of the historical service and the historical target event into the mapping relationship.
- the historical time period may be the last day, the last week, or the last month.
- the end point of the historical time period may be the current time point, and the preset duration may be one day, one week, and so on. In this way, the timeliness of the mapping relationship can be guaranteed, and the mapping relationship can better reflect the user's recent behavior habits.
- the terminal may write historical services executed at a high frequency into the mapping relationship.
- the terminal can obtain, according to the historical record, the number of executions of each historical service executed in association with the target event; the terminal can then select the historical service with the most executions from the plurality of historical services, and write the service type of that historical service and the historical target event into the mapping relationship.
- the terminal may select historical services whose execution times exceed the threshold, and write the service types and historical target events corresponding to the historical services whose execution times exceed the threshold into the mapping relationship.
- the voice interaction intention that the user generates after the current target event is highly likely to be the same as or similar to the voice interaction intention generated after the historical target event. Consequently, the business executed after the current target event occurs has a high probability of being the same as or similar to the business executed after the historical target event occurred. Therefore, predicting the target business type targeted by the current voice interaction intention in combination with historical records can improve the accuracy of the target business type.
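Implementation mode 1, including the optional restriction to a recent time period and to frequently executed services, might be sketched as follows. The record layout `(timestamp, target_event, service_type)`, the function name `build_mapping`, and the use of `>=` for the execution-count threshold are assumptions for illustration only.

```python
from collections import Counter, defaultdict
from datetime import datetime, timedelta


def build_mapping(history, now, window, min_count=1):
    """Build {target_event: {service types}} from history records.

    history: iterable of (timestamp, target_event, service_type) tuples.
    Only records inside [now - window, now] are counted, and only
    (event, service type) pairs executed at least `min_count` times are kept.
    """
    start = now - window
    counts = Counter(
        (event, service_type)
        for timestamp, event, service_type in history
        if start <= timestamp <= now
    )
    mapping = defaultdict(set)
    for (event, service_type), count in counts.items():
        if count >= min_count:
            mapping[event].add(service_type)
    return dict(mapping)


now = datetime(2020, 1, 8, 12, 0)
history = [
    (now - timedelta(days=1), "master switch", "navigation"),
    (now - timedelta(days=2), "master switch", "navigation"),
    (now - timedelta(days=3), "master switch", "music"),
    (now - timedelta(days=30), "master switch", "schedule consultation"),
]
# Last week only, and only services executed at least twice.
mapping = build_mapping(history, now, timedelta(weeks=1), min_count=2)
```

With these sample records, the 30-day-old entry falls outside the one-week window and music is executed only once, so only navigation remains associated with the master switch.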
- Implementation mode 2: The terminal calls the machine learning model, inputs the sample target event into the machine learning model, outputs the business type, and writes the output business type and the sample target event into the mapping relationship.
- the machine learning model is used to predict the type of business based on the event. For example, it can predict the type of business executed in association with the event based on the current event.
- multiple sample events and multiple sample business types can be used for model training to obtain the machine learning model.
- the sample event can be an event executed by the sample user's terminal, or an event recorded in the terminal's history log.
- the sample business type is the business type of the business executed in association with the sample event.
- the machine learning model can be, but is not limited to, a neural network model.
- the machine learning model can learn the mapping relationship between events and business types from a large number of samples in advance. The machine learning model can then accurately predict, based on the current target event, the target business type targeted by the voice interaction intention, thereby improving the accuracy of the target business type.
- the mapping relationship can be constructed by the terminal, or by a device other than the terminal. In the latter case, the other device sends the constructed mapping relationship to the terminal, and the terminal obtains the mapping relationship by receiving it.
- the other device may be, but is not limited to, the voice interaction platform 200 shown in FIG. 1.
- the terminal can also obtain the mapping relationship in other ways.
- the mapping relationship can be published through a certain link address, and the terminal can access the link address and download the mapping relationship from the Internet. This embodiment does not specifically limit how the terminal obtains the mapping relationship.
- step 402 may include, but is not limited to, one or more of the following cases (1) to (4):
- the terminal queries the mapping relationship according to the first operation to obtain the service type set.
- the target service type included in the service type set is the service type corresponding to one or more target second operations, and the one or more target second operations are continuous operations associated with the first operation.
- the terminal may use the service type corresponding to the target second operation as the target service type.
- one first operation may correspond to one or more target second operations. For example, if the user triggers a confirmation operation on the air conditioner button, generally speaking, the user will adjust the temperature immediately after the confirmation operation on the air conditioner button, then the confirmation operation on the air conditioner button is the first operation, and adjusting the temperature is the target second operation.
- the service type corresponding to the temperature adjustment is air conditioner, so the terminal can obtain air conditioner as the target service type. As another example, if the user clicks on the music search option, generally speaking, the user fills in the song name after clicking on the music search option. Clicking on the music search option is then the first operation, filling in the song name is the target second operation, and the service type corresponding to filling in the song name is music, so the terminal can obtain music as the target service type. As yet another example, if the user clicks the fault display option, generally speaking, the user will then search for a solution to the fault or resolve the fault according to the fault information viewed. Clicking on the fault display option is then the first operation, searching for the fault solution or resolving the fault is the target second operation, and the service type corresponding to searching for the fault solution or resolving the fault is fault, vehicle control, or search, so the terminal can obtain fault, vehicle control, or search as the target service type.
- the next operation will be executed continuously, so the intention of voice interaction for the business corresponding to the next operation will be generated.
- this rule is used to map the operation currently performed by the user to the service type corresponding to the next operation that will be performed with a certain probability, so that when the user performs the operation, the terminal can accurately predict for which service type the user wants to perform voice interaction, thus ensuring the accuracy of the target business type.
- the terminal may query the first mapping relationship according to the first operation to obtain the service type set.
- Case (2): The terminal queries the mapping relationship to obtain the service type set, and the target service type included in the service type set is the message viewing or message processing corresponding to the notification message.
- message viewing may be viewing the notification message through the application that pushed the notification message.
- if the notification message is an instant messaging message, message viewing may be viewing the instant messaging message through the instant messaging application.
- if the notification message is a news recommendation, message viewing may be viewing the news through the news application. Alternatively, message viewing may be playing the notification message, displaying the notification message on the screen, projecting the notification message, and so on. Message processing may be searching based on the notification message, replying to the notification message, or handling the fault corresponding to the notification message.
- if the terminal receives a notification message, the user will have the need to view the message or process the message, so a voice interaction intention to view or process the message will be generated.
- the user's need to view or process the message is fully considered: the target event of receiving a notification message is mapped to two target service types, message viewing and message processing, so that when the notification message is received, the terminal can accurately predict for which service type the user wants to perform voice interaction, thus ensuring the accuracy of the target service type.
- case (2) may include one or more of the following cases (2.1) to (2.3).
- message viewing can be converting short messages or instant messaging messages from text to speech and playing the resulting speech.
- message reply can be obtaining, based on the short message or instant messaging message, the corresponding reply information and sending the reply information to the sender of the short message or instant messaging message; or receiving reply information entered by the user and sending the reply information to the sender of the short message or instant messaging message.
- fault handling can be outputting a fault handling plan, repair and maintenance, etc. For example, if the warning message is a low battery message, fault handling can be a mileage inquiry or a charging station inquiry; if the alarm message is a motor fault message, fault handling can be Q&A consultation, motor maintenance, etc. Information query can be querying the solution to the fault, querying the source of the fault, etc.
- the corresponding target service type is predicted for each notification message, so that it can support various application scenarios in which the notification message is received, and expand the application range.
- the terminal may query the second mapping relationship according to the notification message to obtain one or more target service types.
- Adjusting environmental parameters can include reducing dust concentration through air purifiers, increasing humidity through humidifiers, adjusting temperature through air conditioners or car windows, adjusting light intensity through sunshades or car windows, and adjusting rainfall through wipers.
- the environment will affect the user's perception, and the user will have the need to respond to the environment. For example, if a certain environmental parameter changes, the user will need to adjust this environmental parameter, so a voice interaction intention for adjusting the environmental parameter will be generated. In this way, the user's need to respond to the environment is fully considered, and the target event that the environmental parameter meets the first condition is mapped to the target business type of adjusting the environmental parameter, so that when the environmental parameter meets the first condition, the terminal can accurately predict for which service type the user wants to perform voice interaction, thus ensuring the accuracy of the target service type.
- the terminal can query the third mapping relationship according to the environmental parameters to obtain one or more target service types.
- the service type of the current service can be, but is not limited to, re-executing the current service, stopping the current service, or adjusting the current service. For example, if the current service is playing music, the service type of the current service can be playing other music, replaying the music, stopping the music, etc.
- the progress of the current business will affect the user's perception, and the user will have the need to respond to the current business. For example, if the current business is about to end, the user usually wants to re-execute the current business, stop executing the current business, or adjust the current business. In this way, the user's need to respond to business changes is fully considered, and the target event that the current business progress meets the second condition is mapped to the business type of the current business, so that when the current business progress meets the second condition, the terminal can accurately predict for which service type the user wants to perform voice interaction, thus ensuring the accuracy of the target service type.
- step 402 may be replaced with: the terminal obtains the historical service executed in association with the historical target event according to the historical record, and obtains the service type corresponding to the historical service as the target service type.
- the target business type can be obtained by querying historical records without establishing a mapping relationship based on historical records.
- step 402 can be replaced with: the terminal invokes a machine learning model, inputs the target event into the machine learning model, and outputs the one or more target service types, and the machine learning model is used to predict the target service type according to the target event. That is, the target business type can be obtained through the machine learning model, without the need to establish a mapping relationship based on the machine learning model.
- the mapping relationship may also include the probability of each target service type.
- Step 402 may be replaced with: the terminal queries the mapping relationship according to the target event to obtain the service type set and the probability corresponding to each target service type in the service type set.
- the probability indicates the probability that the business corresponding to the target business type will be executed.
- the greater the probability, the greater the possibility that the service corresponding to the target service type will be executed after the corresponding event is detected. For example, if there is a mapping relationship among the target event i, the target business type j with its corresponding probability 1, and the target business type k with its corresponding probability 2, then the mapping relationship indicates that when the target event i is detected, it is predicted that a service of the target service type j or the target service type k will be executed, where the probability of a service of the target service type j being executed is probability 1, and the probability of a service of the target service type k being executed is probability 2.
- i is the identifier of the target event
- j and k are the identifiers of the target business type.
- the mapping relationship may be as shown in Table 6 below, where "/" in Table 6 means none.
- the set of service types can be obtained as (navigation, music, schedule query), where the probability corresponding to navigation is 0.6, the probability corresponding to music is 0.7, and the probability corresponding to schedule query is 0.4.
- the mapping relationship shown in Table 6 can be provided as a multi-modal wake-up model.
- the input parameters of the multi-modal wake-up model include target events, and the output parameters of the multi-modal wake-up model include service type set and probability
- the multi-modal wake-up model can be used to predict the service type set according to the detected target event.
- the user can run the multi-modal wake-up model on the terminal or sell the multi-modal wake-up model to a third party.
- the terminal filters out target service types whose probability does not meet the probability threshold from the service type set.
- the terminal can compare the probability corresponding to the target service type with the probability threshold. If the probability corresponding to the target service type meets the probability threshold, the terminal retains (selects) the target service type; then, if the service type of the first service corresponding to a subsequently collected voice command is this target service type, the first service will be executed. If the probability corresponding to the target service type does not meet the probability threshold, the terminal filters out the target service type, and services of this target business type will not be executed subsequently.
- the same probability threshold can be preset for each target service type, and the probability threshold is pre-stored in the terminal, and the terminal can compare each target service type with the same probability threshold.
- a corresponding probability threshold can be set for each target business type, and the probability threshold corresponding to each target business type can be written into the mapping relationship. The terminal can then query the mapping relationship to obtain the probability threshold corresponding to each target business type, and compare the probability of each target service type with its corresponding probability threshold.
- the probability thresholds corresponding to different service types may be the same or different, which is not limited in this embodiment.
- the mapping relationship may include Table 7 below. If the service type set is (navigation, music, schedule consultation), then according to Table 6 above, the probability corresponding to navigation is 0.6, the probability corresponding to music is 0.7, and the probability corresponding to schedule consultation is 0.4. According to Table 7, the probability threshold corresponding to navigation is 0.5, the probability threshold corresponding to music is 0.5, and the probability threshold corresponding to schedule consultation is 0.5. For navigation, the probability 0.6 is greater than the probability threshold 0.5. For music, the probability 0.7 is greater than the probability threshold 0.5. For schedule consultation, the probability 0.4 is less than the probability threshold 0.5. Therefore, navigation and music are retained, and schedule consultation is filtered out.

| Target business type | Probability threshold |
| --- | --- |
| Navigation | 0.5 |
| Music | 0.5 |
| Radio station | 0.5 |
| Communication | 0.7 |
| SMS | 0.7 |
| WeChat | 0.7 |
| Schedule consultation | 0.5 |
| News | 0.4 |
| Q&A | 0.4 |
| Air conditioning | 0.4 |
| Car control | 0.8 |
| System | 0.7 |
| Maintenance | 0.5 |

- if the probability of the target service type does not meet the probability threshold, it indicates that the target service type is less likely to be the service type targeted by the user's voice interaction intention. If the target service type were processed subsequently, the probability of false wake-up would be higher. A false wake-up interferes with the user and places excessive load on the terminal. Therefore, by filtering out the target service type, subsequent voice commands whose semantic information corresponds to a service of this target service type will not be responded to, which reduces the probability of false wake-up and thereby avoids the interference to the user and the load on the terminal caused by false wake-up.
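The filtering step can be illustrated with the numbers from the example above. This is a minimal sketch: the function name `filter_service_types` is hypothetical, and treating "meets the probability threshold" as "greater than or equal to" is an assumption, since the text only shows cases that are strictly greater or strictly less.

```python
# Probabilities from the example (Table 6) and thresholds from Table 7.
probabilities = {"navigation": 0.6, "music": 0.7, "schedule consultation": 0.4}
thresholds = {"navigation": 0.5, "music": 0.5, "schedule consultation": 0.5}


def filter_service_types(probabilities, thresholds, default_threshold=0.5):
    """Return the service types whose probability meets their threshold.

    `default_threshold` (an assumed value) is used when a service type has
    no threshold of its own, which covers the single shared threshold case.
    """
    return {
        service_type
        for service_type, probability in probabilities.items()
        if probability >= thresholds.get(service_type, default_threshold)
    }


retained = filter_service_types(probabilities, thresholds)
print(sorted(retained))  # ['music', 'navigation']
```

Passing an empty `thresholds` dictionary makes every service type use the same shared threshold, which corresponds to the variant in which one probability threshold is preset for all target service types.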
- step 403 is an optional step, not a mandatory step.
- the following steps are executed according to all the obtained target service types.
- the terminal collects voice instructions.
- the terminal may start monitoring when it is determined that the target event has been detected, and end monitoring when the monitoring time has elapsed.
- the terminal can collect the voice command through the microphone.
- the time period from the start of monitoring to the end of monitoring can be referred to as the receiving window, and the monitoring time can be set according to experiment, experience, or demand, and can be stored in the terminal in advance.
- the same monitoring duration can be set for each service type, or different monitoring durations can be set for different service types, and the mapping relationship between the service type and the monitoring duration is stored in the terminal, which is not limited in this embodiment.
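The receiving window can be sketched as a simple time check. The function names, the per-service durations, the default of 5 seconds, and the choice to listen for the longest duration among the predicted service types are all assumptions made for illustration; the text only states that the monitoring duration may be shared or differ per service type.

```python
# Hypothetical per-service-type monitoring durations, in seconds.
MONITORING_DURATION = {"navigation": 10.0, "music": 5.0}


def in_receiving_window(event_time, command_time, monitoring_duration):
    """True if a voice command arrives inside the receiving window.

    The window starts when the target event is detected (`event_time`)
    and lasts `monitoring_duration` seconds.
    """
    return event_time <= command_time <= event_time + monitoring_duration


def window_length(service_types, default=5.0):
    """Assumed policy: listen long enough to cover every predicted service type."""
    return max(
        (MONITORING_DURATION.get(s, default) for s in service_types),
        default=default,
    )
```

Timestamps are passed in explicitly rather than read from a clock, which keeps the check deterministic and easy to test.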
- the terminal obtains the first service corresponding to the semantic information according to the semantic information corresponding to the voice command.
- the terminal may perform automatic speech recognition (ASR) on the voice command to obtain text information; perform semantic recognition on the text information to obtain semantic information; and obtain the first service by querying according to the semantic information. For example, if the semantic information is "Xiaohua, help plan the route to Building YY in XX Community", the first service is "Navigate to Building YY in XX Community", and if the semantic information is "Xiaohua, help play a ZZ song", the first business is "Play ZZ songs".
- ASR: automatic speech recognition.
- the terminal executes the first service according to the voice instruction.
- the terminal can compare the service type of the first service with each target service type in the service type set. If the service type of the first service is the same as any target service type, it indicates that the service type expressed by the user's voice is within the range of the predicted service types, that is, the terminal predicted the service type correctly and the user does have the intention of voice interaction. The wake-up then succeeds: the terminal activates the voice interaction function and, in response to the voice command, executes the first service according to the voice command, that is, executes the business expressed by the voice instruction.
- for example, if the service type set is (navigation, music) and the first service is "Navigate to Building YY in XX Community", the service type of the first service is navigation, which is the same as the target service type navigation in the service type set, so the terminal will navigate to Building YY in XX Community.
- the terminal discards the voice command.
- if the service type set does not include the service type of the first service, it indicates that the service type expressed by the user's voice is outside the range of the predicted service types, that is, the terminal's prediction of the service type is incorrect and the user has no intention of voice interaction. The terminal may then not respond to the voice command and may discard it, so as to avoid false wake-ups caused by service processing based on the voice command, and to save the buffer space occupied by the voice command.
- for example, if the service type set is (navigation, music) and the first service is "Turn on the air conditioner", the service type of the first service is air conditioning, which differs from every target service type in the service type set, so the terminal will not respond to the voice command, that is, the air conditioner will not be turned on.
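The decision between executing the first service and discarding the voice command reduces to a membership test on the predicted service type set. The sketch below assumes that ASR and semantic recognition have already produced the first service and its service type; `handle_voice_command` and the returned tuples are hypothetical names chosen for this example.

```python
def handle_voice_command(service_type_set, first_service, first_service_type):
    """Execute the first service only if its service type was predicted.

    Returns ("execute", first_service) when the service type is in the
    predicted set, and ("discard", None) otherwise.
    """
    if first_service_type in service_type_set:
        return ("execute", first_service)
    return ("discard", None)


predicted = {"navigation", "music"}
print(handle_voice_command(predicted, "Navigate to Building YY in XX Community", "navigation"))
print(handle_voice_command(predicted, "Turn on the air conditioner", "air conditioning"))
```

The two calls mirror the two examples in the text: the navigation request is executed, and the air conditioner request is discarded.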
- the terminal can start timing when it detects the occurrence of the target event. If the recorded duration reaches the preset duration and the terminal does not receive a voice command, the terminal will exit the monitoring.
- step 407 is an optional step, not a mandatory step.
- the terminal updates the probability in the mapping relationship according to the semantic information corresponding to the voice command.
- the probability can be dynamically adjusted according to the semantic information expressed by the user this time, so as to evaluate, in a self-learning manner, the correctness of the predicted target business type and to continuously correct the probability through iteration. In this way, the mapping relationship is continuously optimized as target events occur and the user expresses semantics, and gradually approaches the user's personal behavior habits, ensuring that the mapping relationship is more accurate.
- the update method may include one or more of the following methods (1) to (3).
- the terminal will increase the probability of navigation.
- if, each time a target event X is detected, the user always requests by voice to perform a service of the target service type Y, this indicates that the target service type Y is exactly the service type for which the user has a voice interaction intention after the target event X occurs. With this optional manner, the probability of the target service type Y will continue to increase. Therefore, when the target event X is detected again later, the probability of the target service type Y will meet the probability threshold, so that the target service type Y will be retained. Then, if the first service corresponding to the semantic information of the voice command is a service of the target service type Y, the terminal will respond to the voice command and execute the service of the target service type Y.
- the terminal can write the service type of the first service into the mapping relationship. Specifically, if the target event is not included in the mapping relationship, or the business type of the first business is not included in the mapping relationship, or the target event in the mapping relationship does not correspond to the business type of the first business, then in any of these three cases, the terminal may write the target event and the service type of the first service into the mapping relationship, so that the target event and the service type of the first service are added to the mapping relationship.
- the mapping relationship can be queried to obtain the service type of the first service, and the service type of the first service can be obtained as the target service type.
- if the terminal detects event X during historical operation and the user expresses by voice the intention of voice interaction for service type Y, then by writing event X and service type Y into the mapping relationship, event X and service type Y are newly added to the mapping relationship. On the one hand, as the voice interaction process is executed, the correlation between events and service types can be mined to supplement and improve the mapping relationship.
- on the other hand, the terminal can add new events and new service types to the mapping relationship to improve the scalability and timeliness of the mapping relationship.
- the probability corresponding to the service type of the first service may be generated, and the generated probability may be written into the mapping relationship.
- it is also possible to use the default probability as the probability corresponding to the service type of the first service write the service type and default probability of the first service to the mapping relationship, and subsequently adjust the default probability by performing the process shown in step 408.
- the terminal will reduce the probabilities corresponding to navigation and music.
- If the target event X corresponds to the target service type Y, but after the target event X is detected the user does not request by voice a service of the target service type Y, this indicates that the target service type Y is not a service type for which the user has a voice interaction intention after the target event X occurs. The probability of the target service type Y therefore keeps decreasing. When the target event X is subsequently detected again, the probability of the target service type Y no longer meets the probability threshold, so the target service type Y is filtered out and the terminal does not process services of the target service type Y, thereby avoiding false wake-up.
- the terminal will increase the probability of navigation.
- If the target event X corresponds to the target service type Y and the user speaks a wake-up word, this indicates that the target service type Y is indeed a service type for which the user has a voice interaction intention after the target event X occurs. The probability of the target service type Y therefore keeps increasing. When the target event is detected again later, the probability of the target service type Y meets the probability threshold, so the target service type Y is selected (retained) by the filter; if the voice command then expresses semantic information of the target service type Y, the terminal processes the service in response to the voice command.
- step 408 is an optional step, not a mandatory step.
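The three update rules discussed for step 408 can be sketched as below. This is an illustrative reading only: the fixed step size `STEP`, the clamping to the range 0 to 1, the starting value of 0.0 for a service type first seen together with a wake-up word, and the name `update_probabilities` are assumptions; the text only says that probabilities are increased or reduced. For simplicity the sketch also treats every service type stored for the event as the service type set.

```python
STEP = 0.1  # assumed adjustment size; not specified in the description


def update_probabilities(mapping, event, service_type, has_wake_word=False, step=STEP):
    """Adjust probabilities for `event` after a voice command was understood.

    `mapping` is a hypothetical layout: event -> {service type -> probability}.
    - Rule (1): the requested type is already associated with the event,
      so its probability is increased.
    - Rule (2): the requested type is not associated with the event,
      so every associated type is decreased.
    - Rule (3): the command contained a wake-up word, so the requested
      type is increased (starting from 0.0 if it was unknown).
    """
    probs = mapping.setdefault(event, {})
    if has_wake_word or service_type in probs:
        probs[service_type] = min(1.0, probs.get(service_type, 0.0) + step)
    else:
        for known_type in probs:
            probs[known_type] = max(0.0, probs[known_type] - step)
    return probs
```

Repeated confirmations push a service type above its threshold, while repeated mismatches push all predicted types below it, which is the mechanism the text credits with avoiding false wake-up.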
- This embodiment provides a method for triggering voice interaction without wake-up words.
- Based on a target event that can trigger voice interaction, the set of service types for which the user has a voice interaction intention is predicted. If the service type of the first service expressed by the voice command is one of the predicted target service types, the first service is executed.
- the target event covers multiple modalities, and a target event in any modality can trigger the voice interaction function of the corresponding service type, so that wake-up without a wake-up word is supported in a variety of application scenarios, which expands the scope of application.
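The decision summarized in this embodiment can be sketched end to end as follows. This is a toy illustration under stated assumptions: `parse_service_type` is a hypothetical keyword lookup standing in for real semantic understanding, the event and service type names are invented, and the mapping is simplified to event -> set of target service types.

```python
def parse_service_type(text):
    """Toy stand-in for semantic understanding: keyword -> service type."""
    keywords = {"navigate": "navigation", "play": "music", "answer": "answer_call"}
    lowered = text.lower()
    for word, service_type in keywords.items():
        if word in lowered:
            return service_type
    return None  # no recognizable service in the utterance


def on_voice_command(event, text, mapping):
    """Execute the command only if its service type was predicted for `event`.

    `mapping` is a hypothetical layout: event -> set of target service types.
    Returns a (decision, service_type) pair.
    """
    target_types = mapping.get(event, set())
    first_service_type = parse_service_type(text)
    if first_service_type in target_types:
        return ("execute", first_service_type)
    return ("discard", first_service_type)
```

For example, with `{"incoming_call": {"answer_call"}}`, the utterance "Answer it" after an incoming call is executed without a wake-up word, while "play some jazz" is discarded.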
- FIG. 5 is a software architecture diagram of a voice interaction system provided by an embodiment of the present application.
- the system includes the following functional modules, and each module may be a software module.
- Voice activity detection (VAD) module or front-end speech module: used to collect audio signals, perform noise reduction and enhancement on the collected audio signals, and detect whether an audio signal is a voice command or a non-voice command. If the audio signal is a voice command, the voice command is input to the ASR module. A non-voice command may be a noise signal, a music signal, or the like.
- Automatic speech recognition (ASR) module: used to receive voice commands from the VAD module or front-end speech module, convert the voice commands into text information, and input the text information into the dialogue understanding module or dialogue management module.
- Multi-modal detection module: used to detect the target event. If the target event is detected, the target event is input into the user intention prediction module.
- the multi-modal detection module can receive notification messages pushed by the operating system or application programs, such as short messages, incoming calls, application recommendation messages, and alarm messages; or it can detect operations in one or more modalities, such as operations on physical buttons or the interface and voice commands; or it can detect environmental changes or service changes, such as a temperature drop, excessive air pollution, or music that is about to end.
- User intention prediction module: used to receive target events from the multi-modal detection module; predict the user's voice interaction intention by considering the continuity of the user's operations, the user's need to view, process, or respond, and the impact of the environment or service on the user's perception; and output the service type set and the probability corresponding to each target service type in the service type set.
- Spoken language understanding (SLU) module.
- Dialogue management (DM) module.
- Historical data learning module: based on the user's historical operation data, iteratively updates the data in the user intention prediction module.
- Response generator (RG) module.
- Text-to-speech (TTS) module.
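One way to picture how the speech modules listed above might be chained is the sketch below. The ordering VAD, ASR, SLU, DM, RG, TTS follows conventional spoken dialogue systems and is an assumption, since the list only names the last four modules; the `Pipeline` class and the callable signatures are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Pipeline:
    """Hypothetical wiring of the speech modules as plain callables."""
    vad: Callable[[bytes], bool]   # True if the audio is a voice command
    asr: Callable[[bytes], str]    # audio -> text
    slu: Callable[[str], str]      # text -> semantic information
    dm: Callable[[str], str]       # semantic information -> action
    rg: Callable[[str], str]       # action -> reply text
    tts: Callable[[str], bytes]    # reply text -> audio

    def run(self, audio: bytes) -> Optional[bytes]:
        # Non-voice signals (noise, music) stop at the VAD stage.
        if not self.vad(audio):
            return None
        text = self.asr(audio)
        semantics = self.slu(text)
        action = self.dm(semantics)
        reply = self.rg(action)
        return self.tts(reply)
```

Modelling each module as a callable keeps the sketch independent of any particular speech library; the multi-modal detection and user intention prediction modules would sit beside this chain and gate whether the dialogue manager acts on a command.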
- the vehicle-mounted terminal can be implemented by a combination of hardware and software.
- the structure of the vehicle-mounted terminal can be as shown in Figure 6, including:
- Central processing unit (CPU): used to access each functional module in the memory or other storage and run each functional module; it can also access the storage and the audio manager through the data bus (D-BUS).
- the processor can access various cloud services and cloud service management modules through a network interface.
- the processor can also access the controller area network (CAN) bus through the gateway, read data of the vehicle and of the devices carried by the vehicle, and control the vehicle and those devices.
- the storage includes memory and disk storage, and the stored content includes the functional modules shown in Figure 5.
- the audio manager is used to manage car speakers, microphone arrays or other audio devices.
- the voice interaction method of the embodiment of the present application is introduced above, and the voice interaction device provided by the embodiment of the present application is introduced below. It should be understood that the voice interaction device has any function of the terminal in the voice interaction method described above.
- FIG. 7 is a schematic structural diagram of a voice interaction device provided by an embodiment of the present application. As shown in FIG. 7, the device includes:
- the determining module 701 is used to perform step 401; the query module 702 is used to perform step 402; the collection module 703 is used to perform step 404; the obtaining module 704 is used to perform step 405; the processing module 705 is used to perform step 406.
- the determining module 701 is configured to determine that the first operation of the user is detected; the query module 702 is configured to execute the case (1) in step 402.
- the determining module 701 is configured to receive notification messages from the operating system or an application program; the query module 702 is configured to execute the case (2) in step 402.
- the determining module 701 is configured to determine that the current environmental parameter satisfies the first condition; the query module 702 is configured to execute the case (3) in step 402.
- the determining module 701 is configured to determine that the progress of the current service meets the second condition; the query module 702 is configured to perform the case (4) in step 402.
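The four kinds of target events handled by the determining module (a user operation, a notification message, an environmental parameter meeting a first condition, and service progress meeting a second condition) can be sketched as a small detector. Everything concrete here is an assumption: the `state` dictionary keys, the temperature threshold standing in for the first condition, and the playback ratio standing in for the second condition are illustrative only.

```python
from dataclasses import dataclass
from enum import Enum


class EventKind(Enum):
    USER_OPERATION = 1      # case (1): first operation of the user
    NOTIFICATION = 2        # case (2): message from OS or application
    ENVIRONMENT = 3         # case (3): environmental parameter condition
    SERVICE_PROGRESS = 4    # case (4): progress of the current service


@dataclass(frozen=True)
class TargetEvent:
    kind: EventKind
    detail: str


def detect_events(state, min_temp_c=16.0, progress_ratio=0.95):
    """Return the target events implied by a snapshot of terminal state."""
    events = []
    if state.get("last_operation"):
        events.append(TargetEvent(EventKind.USER_OPERATION, state["last_operation"]))
    for message in state.get("notifications", []):
        events.append(TargetEvent(EventKind.NOTIFICATION, message))
    temp = state.get("cabin_temp_c")
    if temp is not None and temp < min_temp_c:  # assumed "first condition"
        events.append(TargetEvent(EventKind.ENVIRONMENT, "temperature"))
    progress = state.get("playback_progress")
    if progress is not None and progress >= progress_ratio:  # assumed "second condition"
        events.append(TargetEvent(EventKind.SERVICE_PROGRESS, "music"))
    return events
```

Each detected event would then be passed to the query step, which looks up the service type set for that kind of event.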
- the device further includes: a writing module, configured to write the service type of the first service into the mapping relationship if the service type of the first service is different from each target service type in the service type set.
- the query module 702 is further configured to query the mapping relationship according to the target event to obtain the service type set and the probability corresponding to each target service type in the service type set;
- the device also includes: a filtering module for performing step 408.
- the device further includes: an update module for performing step 408.
- the update module is specifically configured to execute one or more of the methods (1) to (3) in step 408.
- the device further includes: a discarding module for performing step 407.
- the voice interaction device provided in the above embodiment is described using the above division of functional modules only as an example. In practice, the above functions may be allocated to different functional modules as needed; that is, the internal structure of the terminal is divided into different functional modules to complete all or part of the functions described above.
- the voice interaction device provided in the foregoing embodiment and the voice interaction method embodiment belong to the same concept, and the specific implementation process is detailed in the method embodiment, which will not be repeated here.
- a computer-readable storage medium such as a memory including instructions, which may be executed by a processor of a terminal to complete the voice interaction method in each of the foregoing embodiments.
- the computer-readable storage medium may be non-transitory.
- the computer-readable storage medium may be a read-only memory (ROM), a random access memory (RAM), a compact disc read-only memory (CD-ROM), a magnetic tape, a floppy disk, or an optical data storage device.
- a computer program product includes computer program code; when the computer program code is run by a terminal, the terminal is caused to execute the voice interaction method in each of the foregoing embodiments.
- a chip including a processor, configured to call, from a memory, and execute instructions stored in the memory, so that a device in which the chip is installed executes the voice interaction method in each of the foregoing embodiments.
- another chip including: an input interface, an output interface, a processor, and a memory.
- the input interface, the output interface, the processor, and the memory are connected by an internal connection path, and the processor is used to execute the code in the memory; when the code is executed, the processor executes the voice interaction method in each of the foregoing embodiments.
- the computer program product includes one or more computer program instructions.
- the computer may be a general-purpose computer, a dedicated computer, a computer network, or other programmable devices.
- the computer instructions can be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium.
- the computer program instructions can be passed from a website, computer, server, or data center.
- the computer-readable storage medium may be any available medium that can be accessed by a computer or a data storage device such as a server or data center integrated with one or more available media.
- the usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a digital video disc (DVD)), or a semiconductor medium (for example, a solid-state drive).
- the term "multiple" in this application means two or more; for example, multiple data packets refers to two or more data packets.
- the program can be stored in a computer-readable storage medium.
- the storage medium can be read-only memory, magnetic disk or optical disk, etc.
Description
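| Target service type | Probability threshold |
|---|---|
| Navigation | 0.5 |
| Music | 0.5 |
| Radio | 0.5 |
| Communication | 0.7 |
| SMS | 0.7 |
| WeChat | 0.7 |
| Schedule inquiry | 0.5 |
| News | 0.4 |
| Q&A | 0.4 |
| Air conditioning | 0.4 |
| Vehicle control | 0.8 |
| System | 0.7 |
| Maintenance and repair | 0.5 |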
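The per-type thresholds in the table can be applied as a simple filter, sketched below. The threshold values are taken from the table; the English identifiers, the `filter_by_threshold` name, the reading of "meets the threshold" as greater than or equal, and the choice to drop service types that have no threshold are assumptions.

```python
# Thresholds from the table above, keyed by assumed English identifiers.
THRESHOLDS = {
    "navigation": 0.5,
    "music": 0.5,
    "radio": 0.5,
    "communication": 0.7,
    "sms": 0.7,
    "wechat": 0.7,
    "schedule_inquiry": 0.5,
    "news": 0.4,
    "qa": 0.4,
    "air_conditioning": 0.4,
    "vehicle_control": 0.8,
    "system": 0.7,
    "maintenance": 0.5,
}


def filter_by_threshold(probabilities, thresholds=THRESHOLDS):
    """Keep only service types whose probability meets their own threshold.

    `probabilities` maps service type -> predicted probability for one event.
    Types without a configured threshold are dropped (an assumption).
    """
    return {
        service_type
        for service_type, probability in probabilities.items()
        if service_type in thresholds and probability >= thresholds[service_type]
    }
```

The table assigns higher thresholds to service types such as vehicle control (0.8) than to news or air conditioning (0.4), so the same predicted probability can pass for one type and fail for another.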
Claims (26)
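- A voice interaction method, wherein the method comprises: determining that occurrence of a target event is detected, the target event being an event capable of triggering voice interaction; querying a mapping relationship according to the target event to obtain a service type set, the service type set comprising one or more target service types; collecting a voice command; obtaining, according to semantic information corresponding to the voice command, a first service corresponding to the semantic information; and if the service type of the first service is any target service type in the service type set, executing the first service according to the voice command.
- The method according to claim 1, wherein the determining that occurrence of a target event is detected comprises: determining that a first operation of a user is detected; and the querying a mapping relationship according to the target event to obtain a service type set comprises: querying the mapping relationship according to the first operation to obtain the service type set, wherein the target service types comprised in the service type set are service types corresponding to one or more target second operations, and the one or more target second operations are continuity operations associated with the first operation.
- The method according to claim 1, wherein the determining that occurrence of a target event is detected comprises: receiving a notification message from an operating system or an application program; and the querying a mapping relationship according to the target event to obtain a service type set comprises: querying the mapping relationship according to the notification message to obtain the service type set, wherein the target service type comprised in the service type set is message viewing or message processing corresponding to the notification message.
- The method according to claim 3, wherein the notification message comprises at least one of an incoming call notification, a short message, an instant messaging message, and an alarm message, and the querying the mapping relationship according to the notification message to obtain the service type set comprises at least one of the following: querying the mapping relationship according to the incoming call notification to obtain the service type set, wherein the target service type comprised in the service type set is answering the incoming call; querying the mapping relationship according to the short message or the instant messaging message to obtain the service type set, wherein the target service type comprised in the service type set is message viewing or message reply; querying the mapping relationship according to the alarm message to obtain the service type set, wherein the target service type comprised in the service type set is fault handling or information query.
- The method according to claim 1, wherein the determining that occurrence of a target event is detected comprises: determining that a current environmental parameter satisfies a first condition; and the querying a mapping relationship according to the target event to obtain a service type set comprises: querying the mapping relationship according to the environmental parameter to obtain the service type set, wherein the target service type comprised in the service type set is adjusting the environmental parameter.
- The method according to claim 1, wherein the determining that occurrence of a target event is detected comprises: determining that the progress of a current service satisfies a second condition; and the querying a mapping relationship according to the target event to obtain a service type set comprises: querying the mapping relationship according to the current service to obtain the service type set, wherein the target service type comprised in the service type set is the service type of the current service.
- The method according to any one of claims 1 to 6, wherein a process of establishing the mapping relationship comprises: obtaining, according to a history record, a historical service executed in association with a historical target event, and writing the service type of the historical service and the historical target event into the mapping relationship; invoking a machine learning model, inputting a sample target event into the machine learning model, outputting a service type, and writing the output service type and the sample target event into the mapping relationship, wherein the machine learning model is used to predict a service type according to an event.
- The method according to claim 1, wherein after the obtaining, according to semantic information corresponding to the voice command, a first service corresponding to the semantic information, the method further comprises: if the service type of the first service is different from each target service type in the service type set, writing the service type of the first service into the mapping relationship.
- The method according to any one of claims 1 to 8, wherein the querying a mapping relationship according to the target event to obtain a service type set comprises: querying the mapping relationship according to the target event to obtain the service type set and a probability corresponding to each target service type in the service type set, the probability indicating the likelihood that a service of the corresponding target service type will be executed; and before the executing the first service according to the voice command if the service type of the first service is any target service type in the service type set, the method further comprises: filtering out, from the service type set, target service types whose probabilities do not meet a probability threshold.
- The method according to claim 9, wherein after the obtaining, according to semantic information corresponding to the voice command, a first service corresponding to the semantic information, the method further comprises: updating the probability in the mapping relationship according to the semantic information corresponding to the voice command.
- The method according to claim 10, wherein the updating the probability in the mapping relationship according to the semantic information corresponding to the voice command comprises any one of the following: if the service type of the first service is any target service type in the service type set, increasing the probability corresponding to the service type of the first service in the mapping relationship; if the service type of the first service is different from each target service type in the service type set, decreasing the probability corresponding to each target service type in the service type set in the mapping relationship; if the semantic information contains a wake-up word, increasing the probability corresponding to the service type of the first service in the mapping relationship.
- The method according to any one of claims 1 to 11, wherein after the obtaining, according to semantic information corresponding to the voice command, a first service corresponding to the semantic information, the method further comprises: if the service type of the first service is different from each target service type in the service type set, discarding the voice command.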
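- A voice interaction device, wherein the device comprises: a determining module, configured to determine that occurrence of a target event is detected, the target event being an event capable of triggering voice interaction; a query module, configured to query a mapping relationship according to the target event to obtain a service type set, the service type set comprising one or more target service types; a collection module, configured to collect a voice command; an obtaining module, further configured to obtain, according to semantic information corresponding to the voice command, the service type corresponding to the semantic information; and a service execution module, configured to execute the first service according to the voice command if the service type of the first service is any target service type in the service type set.
- The device according to claim 13, wherein the determining module is configured to determine that a first operation of a user is detected; and the query module is configured to query the mapping relationship according to the first operation to obtain the service type set, wherein the target service types comprised in the service type set are service types corresponding to one or more target second operations, and the one or more target second operations are continuity operations associated with the first operation.
- The device according to claim 13, wherein the determining module is configured to receive a notification message from an operating system or an application program; and the query module is configured to query the mapping relationship according to the notification message to obtain the service type set, wherein the target service type comprised in the service type set is message viewing or message processing corresponding to the notification message.
- The device according to claim 15, wherein the notification message comprises at least one of an incoming call notification, a short message, an instant messaging message, and an alarm message, and the query module is configured to perform at least one of the following: querying the mapping relationship according to the incoming call notification to obtain the service type set, wherein the target service type comprised in the service type set is answering the incoming call; querying the mapping relationship according to the short message or the instant messaging message to obtain the service type set, wherein the target service type comprised in the service type set is message viewing or message reply; querying the mapping relationship according to the alarm message to obtain the service type set, wherein the target service type comprised in the service type set is fault handling or information query.
- The device according to claim 13, wherein the determining module is configured to determine that a current environmental parameter satisfies a first condition; and the query module is configured to query the mapping relationship according to the environmental parameter to obtain the service type set, wherein the target service type comprised in the service type set is adjusting the environmental parameter.
- The device according to claim 13, wherein the determining module is configured to determine that the progress of a current service satisfies a second condition; and the query module is configured to query the mapping relationship according to the current service to obtain the service type set, wherein the target service type comprised in the service type set is the service type of the current service.
- The device according to any one of claims 13 to 18, wherein a process of establishing the mapping relationship comprises: obtaining, according to a history record, a historical service executed in association with a historical target event, and writing the service type of the historical service and the historical target event into the mapping relationship; invoking a machine learning model, inputting a sample target event into the machine learning model, outputting a service type, and writing the output service type and the sample target event into the mapping relationship, wherein the machine learning model is used to predict a service type according to an event.
- The device according to claim 13, wherein the device further comprises: a writing module, configured to write the service type of the first service into the mapping relationship if the service type of the first service is different from each target service type in the service type set.
- The device according to any one of claims 13 to 20, wherein the query module is further configured to query the mapping relationship according to the target event to obtain the service type set and a probability corresponding to each target service type in the service type set, the probability indicating the likelihood that a service of the corresponding target service type will be executed; and the device further comprises: a filtering module, configured to filter out, from the one or more target service types, target service types whose probabilities do not meet a probability threshold.
- The device according to claim 21, wherein the device further comprises: an update module, configured to update the probability in the mapping relationship according to the semantic information corresponding to the voice command.
- The device according to claim 22, wherein the update module is configured to perform any one of the following: if the service type of the first service is any target service type in the service type set, increasing the probability corresponding to the service type of the first service in the mapping relationship; if the service type of the first service is different from each target service type in the service type set, decreasing the probability corresponding to each target service type in the service type set in the mapping relationship; if the semantic information contains a wake-up word, increasing the probability corresponding to the service type of the first service in the mapping relationship.
- The device according to any one of claims 13 to 23, wherein the device further comprises: a discarding module, configured to discard the voice command if the service type of the first service is different from each target service type in the service type set.
- A terminal, wherein the terminal comprises one or more processors and one or more memories, the one or more memories store at least one instruction, and the instruction is loaded and executed by the one or more processors to implement the voice interaction method according to any one of claims 1 to 12.
- A computer-readable storage medium, wherein the storage medium stores at least one instruction, and the instruction is loaded and executed by a processor to implement the voice interaction method according to any one of claims 1 to 12.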
Priority Applications (4)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP2021569122A JP7324313B2 (ja) | 2019-08-15 | 2020-02-13 | 音声対話方法及び装置、端末、並びに記憶媒体 |
| EP20803428.0A EP3933830B1 (en) | 2019-08-15 | 2020-02-13 | Speech interaction method and apparatus, terminal and storage medium |
| EP25184716.6A EP4664269A3 (en) | 2019-08-15 | 2020-02-13 | Voice interaction method and apparatus, terminal, and storage medium |
| US17/179,764 US11922935B2 (en) | 2019-08-15 | 2021-02-19 | Voice interaction method and apparatus, terminal, and storage medium |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201910755150.6 | 2019-08-15 | ||
| CN201910755150.6A CN112397062B (zh) | 2019-08-15 | 2019-08-15 | 语音交互方法、装置、终端及存储介质 |
Related Child Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US17/179,764 Continuation US11922935B2 (en) | 2019-08-15 | 2021-02-19 | Voice interaction method and apparatus, terminal, and storage medium |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2021027267A1 true WO2021027267A1 (zh) | 2021-02-18 |
Family
ID=74569776
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/CN2020/074988 Ceased WO2021027267A1 (zh) | 2019-08-15 | 2020-02-13 | 语音交互方法、装置、终端及存储介质 |
Country Status (5)
| Country | Link |
|---|---|
| US (1) | US11922935B2 (zh) |
| EP (2) | EP3933830B1 (zh) |
| JP (1) | JP7324313B2 (zh) |
| CN (2) | CN119296533A (zh) |
| WO (1) | WO2021027267A1 (zh) |
Cited By (8)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN113364669A (zh) * | 2021-06-02 | 2021-09-07 | 中国工商银行股份有限公司 | 消息处理方法、装置、电子设备及介质 |
| CN113739382A (zh) * | 2021-08-12 | 2021-12-03 | 武汉慧联无限科技有限公司 | 空调控制方法及装置、控制设备和存储介质 |
| WO2023273321A1 (zh) * | 2021-06-29 | 2023-01-05 | 荣耀终端有限公司 | 一种语音控制方法及电子设备 |
| CN115691492A (zh) * | 2022-10-29 | 2023-02-03 | 重庆长安汽车股份有限公司 | 一种车载语音控制系统及方法 |
| WO2023040692A1 (zh) * | 2021-09-14 | 2023-03-23 | 北京车和家信息技术有限公司 | 语音控制方法、装置、设备及介质 |
| US11960914B2 (en) * | 2021-11-19 | 2024-04-16 | Samsung Electronics Co., Ltd. | Methods and systems for suggesting an enhanced multimodal interaction |
| JP2024540681A (ja) * | 2021-11-30 | 2024-10-31 | 華為技術有限公司 | デバイス制御方法および装置 |
| TWI918023B (zh) | 2023-07-17 | 2026-03-11 | 美商博姆雲360公司 | 自適應及智慧提示系統及控制介面 |
Families Citing this family (24)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP7338489B2 (ja) * | 2020-01-23 | 2023-09-05 | トヨタ自動車株式会社 | 音声信号制御装置、音声信号制御システム及び音声信号制御プログラム |
| CN111667831B (zh) * | 2020-06-08 | 2022-04-26 | 中国民航大学 | 基于管制员指令语义识别的飞机地面引导系统及方法 |
| CN114765027A (zh) * | 2021-01-15 | 2022-07-19 | 沃尔沃汽车公司 | 用于车辆语音控制的控制设备、车载系统和方法 |
| CN116868158A (zh) * | 2021-03-18 | 2023-10-10 | 深圳传音控股股份有限公司 | 智能交互方法、终端及存储介质 |
| CN113099054A (zh) * | 2021-03-30 | 2021-07-09 | 中国建设银行股份有限公司 | 语音交互的方法、装置、设备和计算机可读介质 |
| CN113282355B (zh) * | 2021-05-18 | 2025-03-04 | Oppo广东移动通信有限公司 | 基于状态机的指令执行方法、装置、终端及存储介质 |
| CN113609266B (zh) * | 2021-07-09 | 2024-07-16 | 阿里巴巴创新公司 | 资源处理方法以及装置 |
| CN115705844B (zh) * | 2021-08-12 | 2025-11-25 | 上海擎感智能科技有限公司 | 语音交互配置方法、电子设备和计算机可读介质 |
| CN113752966B (zh) * | 2021-09-14 | 2022-12-23 | 合众新能源汽车有限公司 | 车机系统的交互方法、装置和计算机可读介质 |
| CN116126509A (zh) * | 2021-11-12 | 2023-05-16 | 华为技术有限公司 | 基于多设备提供服务的方法、相关装置及系统 |
| CN114281967A (zh) * | 2021-12-17 | 2022-04-05 | 深圳市欧瑞博科技股份有限公司 | 人机对话的智能处理方法、装置、电子设备及存储介质 |
| CN116415061A (zh) * | 2021-12-30 | 2023-07-11 | 华为技术有限公司 | 一种服务推荐方法及相关装置 |
| CN114783434A (zh) * | 2022-04-18 | 2022-07-22 | 国汽智控(北京)科技有限公司 | 人机交互系统及交互方法 |
| KR20230150499A (ko) * | 2022-04-22 | 2023-10-31 | 에스케이텔레콤 주식회사 | 사용자 의도의 매핑을 이용하는 대화시스템 |
| CN115150507A (zh) * | 2022-05-07 | 2022-10-04 | Oppo广东移动通信有限公司 | 服务调度方法及系统、电子设备及计算机可读存储介质 |
| US20240013776A1 (en) * | 2022-07-11 | 2024-01-11 | GM Global Technology Operations LLC | System and method for scenario context-aware voice assistant auto-activation |
| CN117690423A (zh) * | 2022-09-05 | 2024-03-12 | 华为技术有限公司 | 人机交互方法及相关装置 |
| CN115531858B (zh) * | 2022-11-16 | 2023-04-18 | 北京集度科技有限公司 | 一种交互方法、终端设备以及车辆 |
| CN117922455A (zh) * | 2023-03-20 | 2024-04-26 | 小米汽车科技有限公司 | 联动系统、方法、车辆、存储介质与芯片 |
| CN118734796A (zh) * | 2023-03-31 | 2024-10-01 | 北京罗克维尔斯科技有限公司 | 车辆控制方法、装置、设备及存储介质 |
| CN116682427A (zh) * | 2023-06-15 | 2023-09-01 | 一汽奔腾轿车有限公司 | 一种免唤醒车载语音交互系统 |
| CN119232837A (zh) * | 2023-06-29 | 2024-12-31 | 荣耀终端有限公司 | 语音控制方法、设备及存储介质 |
| CN119785276B (zh) * | 2025-03-12 | 2025-05-09 | 中科南京软件技术研究院 | 基于多模态大模型在人机协同环境中的意图理解方法 |
| CN119865719A (zh) * | 2025-03-25 | 2025-04-22 | 深圳三基同创电子有限公司 | 一种智能穿戴设备对讲功能的实现方法及系统 |
Citations (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20030220793A1 (en) * | 2002-03-06 | 2003-11-27 | Canon Kabushiki Kaisha | Interactive system and method of controlling same |
| EP1887482A1 (en) * | 2006-08-08 | 2008-02-13 | Accenture Global Services GmbH | Mobile audio content delivery system |
| CN107293294A (zh) * | 2016-03-31 | 2017-10-24 | 腾讯科技(深圳)有限公司 | 一种语音识别处理方法及装置 |
| CN107424601A (zh) * | 2017-09-11 | 2017-12-01 | 深圳怡化电脑股份有限公司 | 一种基于语音识别的信息交互系统、方法及其装置 |
| CN107437416A (zh) * | 2017-05-23 | 2017-12-05 | 阿里巴巴集团控股有限公司 | 一种基于语音识别的咨询业务处理方法及装置 |
| CN109243450A (zh) * | 2018-10-18 | 2019-01-18 | 深圳供电局有限公司 | 一种交互式的语音识别方法及系统 |
| CN109473100A (zh) * | 2018-11-12 | 2019-03-15 | 四川驹马科技有限公司 | 基于语音识别的业务场景语音人机交互方法及其系统 |
Family Cites Families (20)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP3963698B2 (ja) * | 2001-10-23 | 2007-08-22 | 富士通テン株式会社 | 音声対話システム |
| JP4802522B2 (ja) * | 2005-03-10 | 2011-10-26 | 日産自動車株式会社 | 音声入力装置および音声入力方法 |
| BR102012024861B1 (pt) * | 2011-09-30 | 2021-02-09 | Apple Inc. | sistema para desambiguar entrada de usuário para realizar uma tarefa |
| US11386886B2 (en) * | 2014-01-28 | 2022-07-12 | Lenovo (Singapore) Pte. Ltd. | Adjusting speech recognition using contextual information |
| DE112014006614B4 (de) * | 2014-04-22 | 2018-04-12 | Mitsubishi Electric Corporation | Benutzerschnittstellensystem, Benutzerschnittstellensteuereinrichtung, Benutzerschnittstellensteuerverfahren und Benutzerschnittstellensteuerprogramm |
| US20180039478A1 (en) * | 2016-08-02 | 2018-02-08 | Google Inc. | Voice interaction services |
| CN107316643B (zh) * | 2017-07-04 | 2021-08-17 | 科大讯飞股份有限公司 | 语音交互方法及装置 |
| KR102428148B1 (ko) * | 2017-08-31 | 2022-08-02 | 삼성전자주식회사 | 가전 기기의 음성 인식을 위한 시스템과 서버, 방법 |
| CN109753264A (zh) * | 2017-11-08 | 2019-05-14 | 阿里巴巴集团控股有限公司 | 一种任务处理方法和设备 |
| US10636424B2 (en) * | 2017-11-30 | 2020-04-28 | Apple Inc. | Multi-turn canned dialog |
| JP2019105868A (ja) * | 2017-12-08 | 2019-06-27 | エヌ・ティ・ティ・コムウェア株式会社 | 入力支援装置、入力支援方法、及びプログラム |
| US10051600B1 (en) * | 2017-12-12 | 2018-08-14 | Amazon Technologies, Inc. | Selective notification delivery based on user presence detections |
| CN108337362A (zh) * | 2017-12-26 | 2018-07-27 | 百度在线网络技术(北京)有限公司 | 语音交互方法、装置、设备和存储介质 |
| CN108320742B (zh) * | 2018-01-31 | 2021-09-14 | 广东美的制冷设备有限公司 | 语音交互方法、智能设备及存储介质 |
| CN108597519B (zh) * | 2018-04-04 | 2020-12-29 | 百度在线网络技术(北京)有限公司 | 一种话单分类方法、装置、服务器和存储介质 |
| CN108694947B (zh) * | 2018-06-27 | 2020-06-19 | Oppo广东移动通信有限公司 | 语音控制方法、装置、存储介质及电子设备 |
| CN109545206B (zh) | 2018-10-29 | 2024-01-30 | 百度在线网络技术(北京)有限公司 | 智能设备的语音交互处理方法、装置和智能设备 |
| CN109326289B (zh) * | 2018-11-30 | 2021-10-22 | 深圳创维数字技术有限公司 | 免唤醒语音交互方法、装置、设备及存储介质 |
| CN109285547B (zh) * | 2018-12-04 | 2020-05-01 | 北京蓦然认知科技有限公司 | 一种语音唤醒方法、装置及系统 |
| CN109671435B (zh) * | 2019-02-21 | 2020-12-25 | 三星电子(中国)研发中心 | 用于唤醒智能设备的方法和装置 |
-
2019
- 2019-08-15 CN CN202411377831.0A patent/CN119296533A/zh active Pending
- 2019-08-15 CN CN201910755150.6A patent/CN112397062B/zh active Active
-
2020
- 2020-02-13 JP JP2021569122A patent/JP7324313B2/ja active Active
- 2020-02-13 WO PCT/CN2020/074988 patent/WO2021027267A1/zh not_active Ceased
- 2020-02-13 EP EP20803428.0A patent/EP3933830B1/en active Active
- 2020-02-13 EP EP25184716.6A patent/EP4664269A3/en active Pending
-
2021
- 2021-02-19 US US17/179,764 patent/US11922935B2/en active Active
Non-Patent Citations (1)
| Title |
|---|
| See also references of EP3933830A4 |
Cited By (10)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN113364669A (zh) * | 2021-06-02 | 2021-09-07 | 中国工商银行股份有限公司 | 消息处理方法、装置、电子设备及介质 |
| CN113364669B (zh) * | 2021-06-02 | 2023-04-18 | 中国工商银行股份有限公司 | 消息处理方法、装置、电子设备及介质 |
| WO2023273321A1 (zh) * | 2021-06-29 | 2023-01-05 | 荣耀终端有限公司 | 一种语音控制方法及电子设备 |
| CN113739382A (zh) * | 2021-08-12 | 2021-12-03 | 武汉慧联无限科技有限公司 | 空调控制方法及装置、控制设备和存储介质 |
| WO2023040692A1 (zh) * | 2021-09-14 | 2023-03-23 | 北京车和家信息技术有限公司 | 语音控制方法、装置、设备及介质 |
| US11960914B2 (en) * | 2021-11-19 | 2024-04-16 | Samsung Electronics Co., Ltd. | Methods and systems for suggesting an enhanced multimodal interaction |
| JP2024540681A (ja) * | 2021-11-30 | 2024-10-31 | 華為技術有限公司 | デバイス制御方法および装置 |
| JP7820516B2 (ja) | 2021-11-30 | 2026-02-25 | 深▲ジェン▼引望智能技術有限公司 | デバイス制御方法および装置 |
| CN115691492A (zh) * | 2022-10-29 | 2023-02-03 | 重庆长安汽车股份有限公司 | 一种车载语音控制系统及方法 |
| TWI918023B (zh) | 2023-07-17 | 2026-03-11 | 美商博姆雲360公司 | 自適應及智慧提示系統及控制介面 |
Also Published As
| Publication number | Publication date |
|---|---|
| EP3933830A1 (en) | 2022-01-05 |
| EP3933830A4 (en) | 2022-06-08 |
| EP4664269A2 (en) | 2025-12-17 |
| US11922935B2 (en) | 2024-03-05 |
| EP3933830B1 (en) | 2025-07-30 |
| JP7324313B2 (ja) | 2023-08-09 |
| EP4664269A3 (en) | 2026-03-04 |
| US20210183386A1 (en) | 2021-06-17 |
| CN112397062B (zh) | 2024-10-18 |
| CN119296533A (zh) | 2025-01-10 |
| CN112397062A (zh) | 2021-02-23 |
| JP2022534371A (ja) | 2022-07-29 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN112397062B (zh) | 语音交互方法、装置、终端及存储介质 | |
| US12190878B2 (en) | Voice interaction method and apparatus | |
| AU2019385366B2 (en) | Voice control method and electronic device | |
| US11871328B2 (en) | Method for identifying specific position on specific route and electronic device | |
| EP4064284A1 (en) | Voice detection method, prediction model training method, apparatus, device, and medium | |
| WO2021052263A1 (zh) | 语音助手显示方法及装置 | |
| CN114697348B (zh) | 分布式实现方法、分布式系统、可读介质及电子设备 | |
| WO2020244622A1 (zh) | 一种通知的提示方法、终端及系统 | |
| WO2020192456A1 (zh) | 一种语音交互方法及电子设备 | |
| CN114694646B (zh) | 一种语音交互处理方法及相关装置 | |
| WO2020073288A1 (zh) | 一种触发电子设备执行功能的方法及电子设备 | |
| US12217069B2 (en) | Operation sequence adding method, electronic device, and system | |
| CN111835904A (zh) | 一种基于情景感知和用户画像开启应用的方法及电子设备 | |
| CN115333941B (zh) | 获取应用运行情况的方法及相关设备 | |
| WO2020253694A1 (zh) | 一种用于识别音乐的方法、芯片和终端 | |
| WO2023241482A1 (zh) | 一种人机对话方法、设备及系统 | |
| WO2023098467A1 (zh) | 语音解析方法、电子设备、可读存储介质及芯片系统 | |
| CN113487272A (zh) | 日程活动冲突判断方法、电子设备及存储介质 | |
| CN118131891A (zh) | 一种人机交互的方法和装置 | |
| CN121000813A (zh) | 一种语音交互方法、电子设备和存储介质 | |
| CN117687814A (zh) | 异常处理方法、系统以及存储介质 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 20803428 Country of ref document: EP Kind code of ref document: A1 |
|
| ENP | Entry into the national phase |
Ref document number: 2021569122 Country of ref document: JP Kind code of ref document: A Ref document number: 2020803428 Country of ref document: EP Effective date: 20201117 |
|
| NENP | Non-entry into the national phase |
Ref country code: DE |
|
| WWG | Wipo information: grant in national office |
Ref document number: 2020803428 Country of ref document: EP |






