CN111192581A - Voice wake-up method, device and storage medium

Info

Publication number: CN111192581A
Application number: CN202010014501.0A
Authority: CN
Inventors: 于德鸿
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Baidu Online Network Technology Beijing Co Ltd; Shanghai Xiaodu Technology Co Ltd
Priority date: 2020-01-07
Filing date: 2020-01-07
Publication date: 2020-05-22

Abstract

The application discloses a voice wake-up method, device, and storage medium, and relates to the technical field of artificial intelligence. In the specific implementation, the smart device collects a first voice in real time within a preset duration while in a full-duplex state with a preset keyword, and executes a first instruction corresponding to the first voice when it detects that the first voice contains the preset keyword. Because the smart device executes the corresponding instruction when it detects voice containing the preset keyword, voice interaction between the user and the smart device can be accurately identified, and the user experience is improved.

Description

Voice wake-up method, device and storage medium

Technical Field

The application relates to the field of Internet technology, and in particular to artificial intelligence technology.

Background

With the development of artificial intelligence technology, smart speakers have become increasingly full-featured. A smart speaker can serve as a tool for household users to access the Internet by voice, for example to request songs, shop online, or check the weather forecast. It can also control smart home appliances, for example opening curtains, setting the refrigerator temperature, or heating the water heater in advance.

In the prior art, a user can set a smart speaker to a full-duplex state so that, whenever the speaker needs to be controlled, the user can simply speak to it. However, existing smart speakers cannot accurately distinguish speech exchanged between the user and other people from speech addressed to the smart speaker, which can cause frequent misoperation.

Disclosure of Invention

The embodiments of the application provide a voice wake-up method, device, and storage medium, which aim to solve the technical problem of frequent misoperation in the prior art.

A first aspect of an embodiment of the present application provides a voice wake-up method, where the method is applied to an intelligent device, and the intelligent device is in a full-duplex state with a preset keyword, where the method includes:

collecting a first voice within a preset time length;

and executing a first instruction corresponding to the first voice when the first voice is monitored to contain the preset keyword.

In the embodiment of the application, the smart device collects a first voice in real time within a preset duration while in a full-duplex state with a preset keyword, and executes a first instruction corresponding to the first voice when it detects that the first voice contains the preset keyword. Because the smart device is in the full-duplex state with the preset keyword, it executes the corresponding instruction only when it detects voice containing the preset keyword; that is, instructions corresponding to voice that does not contain the preset keyword are not executed. Voice interaction between the user and the smart device can therefore be accurately identified, and the user experience is improved.

Optionally, executing the first instruction corresponding to the first voice when it is monitored that the first voice contains the preset keyword includes:

sending the first voice to a server;

receiving a first instruction corresponding to the first voice sent by the server, wherein the first instruction is obtained by analyzing the first voice by the server and is sent when the first voice is monitored to contain the preset keyword;

the first instruction is executed.

Optionally, before the collecting the first voice within the preset time length, the method further includes:

receiving a starting setting request, wherein the starting setting request is used for indicating that the full-duplex state with the preset keywords of the intelligent equipment is started;

and starting the full duplex state with the preset keywords of the intelligent equipment.

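Optionally, before the collecting the first voice within the preset time length, the method further includes:

sending a wake-up prompt message when a second voice with a preset wake-up word is monitored;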

collecting a third voice;

executing a second instruction corresponding to the third voice;

and starting the full-duplex state with the preset keywords of the intelligent equipment, and setting a timer, wherein the timing threshold of the timer is equal to the preset duration.

Optionally, the executing the second instruction corresponding to the third voice includes:

sending the third voice to a server;

receiving a second instruction corresponding to the third voice sent by the server, wherein the second instruction is obtained by analyzing the third voice by the server;

executing the second instruction.

Optionally, the method further comprises:

the timer is reset.

Optionally, the method further comprises:

if the preset duration is exceeded, closing the full duplex state with the preset keywords of the intelligent equipment, and enabling the intelligent equipment to be in a state to be awakened.

A second aspect of the embodiments of the present application provides a voice wake-up method, where the method is applied to a server, and the method includes:

receiving first voice sent by intelligent equipment; the intelligent equipment is in a full-duplex state with preset keywords;

analyzing the first voice;

and when the situation that the first voice contains the preset keywords is monitored, sending a first instruction corresponding to the first voice to the intelligent equipment.

In the embodiment of the application, the server receives a first voice sent by the smart device and analyzes the first voice, where the first voice is collected by the smart device within the preset duration of the full-duplex state with the preset keyword. When the server detects that the first voice contains the preset keyword, it sends a first instruction corresponding to the first voice to the smart device, so that the smart device can execute the first instruction. Because the server sends the corresponding instruction to the smart device only when it detects voice containing the preset keyword, the smart device in the full-duplex state with the preset keyword executes only instructions corresponding to voice that contains the preset keyword; that is, instructions corresponding to voice that does not contain the preset keyword are not executed. Voice interaction between the user and the smart device can therefore be accurately identified, and the user experience is improved.

Optionally, before the receiving the first voice sent by the smart device, the method further includes:

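receiving a start setting request, wherein the start setting request is used to indicate that the full-duplex state with the preset keyword of the smart device is to be started;

and sending the start setting request to the smart device.

Optionally, the method further includes:

receiving a third voice sent by the smart device, wherein the third voice is collected by the smart device after it detects a second voice with a preset wake-up word;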

analyzing the third voice to obtain a second instruction;

and sending the second instruction to the intelligent equipment.

A third aspect of the embodiments of the present application provides an intelligent device, where the intelligent device is in a full duplex state with a preset keyword, and the intelligent device includes:

the first acquisition module is used for acquiring first voice within a preset time length;

and the first execution module is used for executing a first instruction corresponding to the first voice when the first voice is monitored to contain the preset keyword.

Optionally, the first execution module includes:

the sending unit is used for sending the first voice to a server;

the receiving unit is used for receiving a first instruction corresponding to the first voice sent by the server, wherein the first instruction is obtained by analyzing the first voice by the server and is sent when the first voice is monitored to contain the preset keyword;

an execution unit to execute the first instruction.

Optionally, the smart device further comprises:

a receiving module, configured to receive a start setting request, wherein the start setting request is used to indicate that the full-duplex state with the preset keyword of the smart device is to be started;

the first starting module is used for starting the full duplex state with the preset keywords of the intelligent equipment.

Optionally, the smart device further comprises:

the prompting module is used for sending out awakening prompting information when monitoring a second voice with a preset awakening word;

the second acquisition module is used for acquiring third voice;

the second execution module is used for executing a second instruction corresponding to the third voice;

and the second starting module is used for starting the full duplex state with the preset keywords of the intelligent equipment and setting a timer, wherein the timing threshold of the timer is equal to the preset duration.

Optionally, the second execution module includes:

a sending unit, configured to send the third voice to a server;

a receiving unit, configured to receive a second instruction corresponding to the third voice sent by the server, where the second instruction is obtained by analyzing the third voice by the server;

an execution unit to execute the second instruction.

Optionally, the smart device further comprises:

and the resetting module is used for resetting the timer.

Optionally, the smart device further comprises:

and the closing module is used for closing the full duplex state with the preset keywords of the intelligent equipment if the preset duration is exceeded, so that the intelligent equipment is in a state to be awakened.

A fourth aspect of the embodiments of the present application provides a server, including:

the first receiving module is used for receiving a first voice sent by the intelligent equipment; the intelligent equipment is in a full-duplex state with preset keywords;

the first analysis module is used for analyzing the first voice;

and the first sending module is used for sending a first instruction corresponding to the first voice to the intelligent device when the situation that the first voice contains the preset keyword is monitored.

Optionally, the server further comprises:

the second receiving module is used for receiving a start setting request, wherein the start setting request is used to indicate that the full-duplex state with the preset keyword of the smart device is to be started;

and the second sending module is used for sending the starting setting request to the intelligent equipment.

Optionally, the server further comprises:

the third receiving module is used for receiving a third voice sent by the intelligent device, wherein the third voice is collected by the intelligent device after monitoring a second voice with a preset awakening word;

the second analysis module is used for analyzing the third voice to obtain a second instruction;

and the third sending module is used for sending the second instruction to the intelligent equipment.

A fifth aspect of an embodiment of the present application provides an electronic device, including:

at least one processor; and a memory communicatively coupled to the at least one processor;

wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of the first or second aspects as described above.

A sixth aspect of embodiments of the present application provides a non-transitory computer-readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any of the first or second aspects described above.

In summary, the embodiment of the present application has the following beneficial effects with respect to the prior art:

In the voice wake-up method, device, and storage medium provided by the embodiments of the application, the smart device collects a first voice in real time within a preset duration while in a full-duplex state with a preset keyword, and executes a first instruction corresponding to the first voice when it detects that the first voice contains the preset keyword. Because the smart device is in the full-duplex state with the preset keyword and executes the corresponding instruction when it detects voice containing the preset keyword, the technical problem of frequent misoperation in the prior art is solved: voice interaction between the user and the smart device can be accurately identified, and the user experience is improved.

Other effects of the above optional implementations will be described below with reference to specific embodiments.

Drawings

The drawings are provided for a better understanding of the present solution and do not limit the present application. In the drawings:

Fig. 1 is a schematic diagram of an application scenario provided in an embodiment of the present application;

Fig. 2 is a schematic flowchart of a voice wake-up method according to an embodiment of the present application;

Fig. 3 is a schematic flowchart of a voice wake-up method according to another embodiment of the present application;

Fig. 4 is a schematic flowchart of a voice wake-up method according to another embodiment of the present application;

Fig. 5 is a schematic flowchart of a voice wake-up method according to another embodiment of the present application;

Fig. 6 is a schematic flowchart of a voice wake-up method according to another embodiment of the present application;

Fig. 7 is a schematic structural diagram of an intelligent device provided in an embodiment of the present application;

Fig. 8 is a schematic structural diagram of an intelligent device provided in an embodiment of the present application;

Fig. 9 is a block diagram of an electronic device for implementing a voice wake-up method according to an embodiment of the present application.

Detailed Description

The following describes exemplary embodiments of the present application with reference to the accompanying drawings. The description includes various details of the embodiments to aid understanding, and these details are to be considered exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present application. Descriptions of well-known functions and constructions are omitted below for clarity and conciseness.

First, an application scenario and some of the terms involved in the embodiments of the present application are explained.

Fig. 1 is a schematic diagram of an application scenario provided in an embodiment of the present application. As shown in fig. 1, the application scenario of the embodiment of the present application may include: the intelligent device 10 and the server 11; of course, other devices, such as mobile terminal 12, etc., may also be included.

The smart device 10 related in the embodiment of the present application may include: an intelligent sound box or an intelligent remote controller; of course, other devices with voice control function may also be included, which is not limited in the embodiment of the present application.

The mobile terminal 12 according to the embodiment of the present application may include: a mobile phone, a tablet computer or a notebook computer; of course, other devices may be included, which is not limited in the embodiments of the present application.

When the smart device 10 is in an awake state or in a full duplex state with a preset keyword, the collected voice may be sent to the server 11, so that the server 11 may analyze the received voice. It should be understood that, if the data processing capability of the smart device 10 reaches the preset data processing capability threshold, the smart device 10 may also parse the collected voice without sending the voice to the server 11 for parsing.

The server 11 may parse the received voice to obtain a corresponding instruction, and then send the instruction to the smart device 10, so that the smart device 10 executes the instruction. It should be noted that, if the intelligent device 10 is in a full duplex state with a preset keyword, the server 11 may send the instruction to the intelligent device 10 only when it is monitored that the voice includes the preset keyword.

The user may request the server 11 to set the relevant configuration of the smart device 10 through the mobile terminal 12, for example, to start a full duplex state with a preset keyword of the smart device 10. It should be understood that the user may also set the relevant configuration of the smart device 10 via the smart device 10. Of course, the user may set the relevant configuration of the smart device 10 in other ways.

The preset wake-up word in the embodiments of the application is used to wake up the smart device, so that the smart device switches from the state to be awakened to the awake state.

The preset keywords mentioned in the embodiment of the present application are used to identify the voice of the user interacting with the smart device 10. For example, if the voice 1 includes the preset keyword, it may be determined that the voice 1 is a voice of a user interacting with the smart device 10, and the smart device 10 needs to execute an instruction corresponding to the voice 1; if the voice 2 does not include the preset keyword, it may be determined that the voice 2 is not the voice of the user interacting with the intelligent device 10, and the intelligent device 10 does not need to execute the instruction corresponding to the voice 2.
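The keyword test described above can be sketched as follows. This is a minimal illustration that assumes the voice has already been transcribed to text; the keyword `assistant`, the function name `instruction_for`, and the plain substring match are hypothetical choices that the application does not specify.

```python
from typing import Optional

# Hypothetical preset keyword; the application does not name one.
PRESET_KEYWORD = "assistant"


def instruction_for(transcript: str, keyword: str = PRESET_KEYWORD) -> Optional[str]:
    """Return the command text when the transcript contains the keyword, else None."""
    text = transcript.strip().lower()
    key = keyword.lower()
    if key not in text:
        # Treated as speech between people: no instruction is executed.
        return None
    # Remove the first occurrence of the keyword; the remainder is the command.
    return text.replace(key, "", 1).strip(" ,.")


voice_1 = instruction_for("Assistant, play some music")  # contains the keyword
voice_2 = instruction_for("Shall we play some music?")   # does not contain it
```

Here `voice_1` corresponds to voice 1 in the text (an instruction is produced) and `voice_2` to voice 2 (nothing is executed).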

In the embodiment of the present application, when the intelligent device 10 is in a full duplex state with a preset keyword, the intelligent device 10 may perform uplink and downlink information transmission simultaneously with the server 11, and the intelligent device 10 may execute an instruction corresponding to a voice including the preset keyword, that is, not execute an instruction corresponding to a voice not including the preset keyword.

When the intelligent device related to the embodiment of the present application is in the wake-up state, the intelligent device may collect voice, perform uplink and downlink information transmission simultaneously with the server 11, and may execute the instruction corresponding to the voice.

When the smart device in the embodiments of the application is in the state to be awakened, the smart device can collect voice and monitor whether the voice contains the preset wake-up word.
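The three device states described above can be summarized in a short sketch. The enum and function names are hypothetical and only restate, in code form, which state leads to an executed instruction.

```python
from enum import Enum, auto


class DeviceState(Enum):
    TO_BE_AWAKENED = auto()       # collects voice, only checks for the wake-up word
    AWAKE = auto()                # executes the instruction for collected voice
    FULL_DUPLEX_KEYWORD = auto()  # executes only voice containing the preset keyword


def executes_instruction(state: DeviceState, has_keyword: bool) -> bool:
    """Whether a collected voice leads to an executed instruction in this state."""
    if state is DeviceState.AWAKE:
        return True
    if state is DeviceState.FULL_DUPLEX_KEYWORD:
        return has_keyword
    return False  # to be awakened: the device only monitors for the wake-up word
```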

In a first prior-art approach, the user must wake the smart speaker with a wake-up word every time the speaker is used, and can input voice only after receiving the speaker's wake-up response. User operation in this existing voice interaction mode is therefore cumbersome, and the user experience is poor.

To address the first prior-art approach, in the embodiments of the present application the smart device 10 is in a full-duplex state with a preset keyword for a preset duration after being awakened. The smart device 10 may collect a first voice within the preset duration, and then execute a first instruction corresponding to the first voice when it detects that the first voice contains the preset keyword. With this voice interaction mode, a user who uses the smart device again within the preset duration after waking it does not need to repeat the wake-up word, and can directly input voice containing the preset keyword to control the smart device, which then executes the corresponding instruction. User operation is simple, the voice interaction between the user and the smart device is optimized, and the user experience is improved.

In a second prior-art approach, the user can set the smart speaker to a full-duplex state so that, whenever the speaker needs to be controlled, the user can simply speak to it. However, existing smart speakers cannot accurately distinguish speech exchanged between the user and other people from speech addressed to the smart speaker, which can cause frequent misoperation.

In view of the second prior art, in this embodiment of the present application, the intelligent device 10 is in a full duplex state with a preset keyword, and can collect a first voice within a preset time period, and then execute a first instruction corresponding to the first voice when it is monitored that the first voice contains the preset keyword. It can be seen that, in the embodiment of the present application, when the smart device 10 monitors the voice containing the preset keyword, the instruction corresponding to the voice is executed, that is, the instruction corresponding to the voice not containing the preset keyword is not executed, so that the voice interaction between the user and the smart device can be accurately identified, and the user experience is improved.

The technical solution of the present application will be described in detail below with specific examples. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments.

Fig. 2 is a schematic flowchart of a voice wake-up method according to an embodiment of the present application. The execution subject in this embodiment may be the smart device 10 or a voice wake-up apparatus in the smart device 10 (for convenience, the description below takes the smart device 10 as the execution subject). Illustratively, the voice wake-up apparatus may be implemented by software and/or hardware. As shown in Fig. 2, the voice wake-up method provided in this embodiment may include:

step S201, collecting a first voice within a preset time length.

In this embodiment, the intelligent device 10 is in a full duplex state with a preset keyword, so that a user can interact with the intelligent device 10 through a voice with the preset keyword at any time, and the intelligent device 10 executes an instruction corresponding to the voice only when monitoring the voice with the preset keyword.

In this step, the intelligent device 10 may collect the first voice in real time within a preset duration. For example, the smart device 10 may collect the first voice in real time through a voice collecting device (e.g., a microphone, etc.) in the smart device 10. Of course, the smart device 10 may also collect the first voice in other manners, which is not limited in the embodiment of the present application.

Step S202, when it is monitored that the first voice contains the preset keyword, executing a first instruction corresponding to the first voice.

In this step, when it is monitored that the first voice collected in step S201 includes the preset keyword, the smart device 10 executes a first instruction corresponding to the first voice. For example, the smart device 10 may send the first voice collected in step S201 to the server 11, so that the server 11 analyzes the first voice and monitors whether the first voice contains a preset keyword. For another example, the smart device 10 may monitor whether the first voice collected in step S201 includes the preset keyword.

In a possible implementation manner, the smart device 10 sends the first voice to the server 11; further, the smart device 10 receives a first instruction corresponding to the first voice sent by the server 11, where the first instruction is obtained by analyzing the first voice by the server 11 and is sent when it is monitored that the first voice contains the preset keyword; further, the smart device executes the first instruction.

In this implementation manner, the smart device 10 may send the first voice to the server 11, so that the server 11 analyzes the received first voice, and when it is monitored that the first voice includes the preset keyword, may send a first instruction corresponding to the first voice to the smart device 10; further, the intelligent device 10 receives the first instruction corresponding to the first voice sent by the server 11, and executes the first instruction corresponding to the first voice, so that the intelligent device 10 executes the instruction corresponding to the first voice only when monitoring the voice including the preset keyword.

It should be understood that, when it is monitored that the first voice does not include the preset keyword, the server 11 may abandon the first voice, and does not need to send the first instruction corresponding to the first voice to the intelligent device 10, and correspondingly, the intelligent device 10 does not need to execute the first instruction corresponding to the first voice.
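The server-side variant above can be sketched as follows. This is a simplified illustration, assuming voices are already text and replacing the network round trip with a direct function call; `server_parse`, `SmartDevice`, and the keyword `assistant` are hypothetical names.

```python
from typing import List, Optional

PRESET_KEYWORD = "assistant"  # hypothetical keyword


def server_parse(transcript: str) -> Optional[str]:
    """Server side: return an instruction only if the keyword is present."""
    text = transcript.lower()
    if PRESET_KEYWORD not in text:
        return None  # the first voice is discarded; nothing is sent back
    return text.replace(PRESET_KEYWORD, "", 1).strip(" ,.")


class SmartDevice:
    def __init__(self) -> None:
        self.executed: List[str] = []

    def on_first_voice(self, transcript: str) -> None:
        # The device uploads every first voice and executes whatever comes back.
        instruction = server_parse(transcript)  # stands in for a network round trip
        if instruction is not None:
            self.executed.append(instruction)


device = SmartDevice()
device.on_first_voice("Assistant, open the curtain")
device.on_first_voice("Did you open the curtain yesterday?")
```

In this sketch the device itself never checks for the keyword; the gating happens entirely in `server_parse`.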

In another possible implementation manner, the smart device 10 may analyze the first voice collected in step S201, and when it is monitored that the first voice includes the preset keyword, send the first voice to the server 11, so that the server 11 analyzes the received first voice and sends a first instruction corresponding to the first voice to the smart device 10; further, the intelligent device 10 receives the first instruction corresponding to the first voice sent by the server 11, and executes the first instruction corresponding to the first voice, so that the intelligent device 10 executes the instruction corresponding to the first voice only when monitoring the voice including the preset keyword.

It should be understood that, the smart device 10 analyzes the first voice collected in the step S201, and when it is monitored that the first voice does not include the preset keyword, the first voice may be discarded without being sent to the server 11.
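For comparison, the device-side variant can be sketched with the keyword check moved in front of the upload. As before, the class names, the keyword, and the text-based stand-in for voice are assumptions made for illustration.

```python
from typing import List

PRESET_KEYWORD = "assistant"  # hypothetical keyword


class Server:
    def __init__(self) -> None:
        self.received: List[str] = []

    def parse(self, transcript: str) -> str:
        # In this variant the server does not repeat the keyword check.
        self.received.append(transcript)
        return transcript.lower().replace(PRESET_KEYWORD, "", 1).strip(" ,.")


class SmartDevice:
    def __init__(self, server: Server) -> None:
        self.server = server
        self.executed: List[str] = []

    def on_first_voice(self, transcript: str) -> None:
        if PRESET_KEYWORD not in transcript.lower():
            return  # discarded locally, never uploaded
        self.executed.append(self.server.parse(transcript))


server = Server()
device = SmartDevice(server)
device.on_first_voice("Assistant, set the fridge to 4 degrees")
device.on_first_voice("The fridge is making a noise again")
```

Only one of the two utterances in this sketch reaches the server.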

In another possible implementation manner, the intelligent device 10 may analyze the first voice collected in step S201, and execute a first instruction corresponding to the first voice when it is monitored that the first voice includes the preset keyword, so that the intelligent device 10 executes the instruction corresponding to the first voice when it is monitored that the voice includes the preset keyword.

It should be understood that, the smart device 10 analyzes the first voice collected in the step S201, and when it is monitored that the first voice does not include the preset keyword, the first voice may be discarded.

To sum up, in this embodiment of the application, the intelligent device 10 may collect a first voice in real time within a preset time duration in a full duplex state with a preset keyword, and then execute a first instruction corresponding to the first voice when it is monitored that the first voice includes the preset keyword. Therefore, in the embodiment of the application, because the intelligent device is in a full duplex state with the preset keyword, the instruction corresponding to the voice can be executed only when the voice containing the preset keyword is monitored, that is, the instruction corresponding to the voice not containing the preset keyword is not executed, so that the voice interaction between the user and the intelligent device can be accurately identified, and the user experience is improved.

Optionally, if the preset duration is exceeded, the intelligent device 10 may close the full-duplex state with the preset keyword of the intelligent device, so that the intelligent device is in a state to be woken up, that is, when the user uses the intelligent device 10, the user needs to wake up the intelligent device 10 by using a preset wake-up word, and then the user may input a voice after receiving a wake-up prompt message (or a wake-up response) of the intelligent device 10. It can be seen that the smart device 10 does not need to be in the full duplex state all the time, and resources, such as power resources and processor resources of the smart device 10 and/or transmission resources between the smart device 10 and the server 11, can be saved.
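The expiry behavior can be sketched with explicit timestamps instead of a real clock, which keeps the example deterministic. The class name `KeywordWindow`, the state strings, and the 30-second value are hypothetical.

```python
from typing import Optional


class KeywordWindow:
    """Tracks whether the full-duplex state with the preset keyword is open."""

    def __init__(self, preset_duration_s: float) -> None:
        self.preset_duration_s = preset_duration_s
        self.opened_at: Optional[float] = None

    def open(self, now: float) -> None:
        self.opened_at = now

    def state(self, now: float) -> str:
        if self.opened_at is not None and now - self.opened_at > self.preset_duration_s:
            self.opened_at = None  # preset duration exceeded: close the state
        return "to_be_awakened" if self.opened_at is None else "full_duplex_keyword"


window = KeywordWindow(preset_duration_s=30.0)  # 30 s is an arbitrary example value
window.open(now=0.0)
during = window.state(now=10.0)
after = window.state(now=31.0)
```

Once `state` reports `to_be_awakened`, the device in this sketch stays there until `open` is called again, mirroring the need for a new wake-up.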

Optionally, before the smart device 10 executes step S201, the full-duplex state with the preset keyword may be started, so that the smart device 10 is in the full-duplex state with the preset keyword. The following embodiments of the present application describe ways of starting the full-duplex state with the preset keyword of the smart device 10.

In a possible implementation manner, a start setting request is received, where the start setting request is used to indicate that the full duplex state with the preset keyword of the smart device is started; and further, starting the full duplex state with the preset keywords of the intelligent equipment.

For example, a user may input the start setting request to the smart device 10 through a preset key on the smart device 10; correspondingly, the smart device 10 receives the start setting request input by the user and then starts its full-duplex state with the preset keyword. Of course, the user may also input the start setting request to the smart device 10 in other ways, which is not limited in the embodiments of the present application.

For another example, the user may send the start setting request to the server 11 through an Application (APP) in the mobile terminal 12, and then the server 11 sends the received start setting request to the smart device 10. Correspondingly, the smart device 10 may receive the start setting request sent by the server 11, and then start the full duplex state with the preset keyword of the smart device 10.

It should be understood that the preset keyword may be preset by the system, or may be preset by the user (for example, the user may carry the preset keyword in the start setting request, etc.); of course, the preset keywords may also be set in other manners, which is not limited in the embodiment of the present application.

It should be understood that the start setting request may or may not carry a preset duration set by the user. If it does not, the preset duration may, for example, be preset by the system, or may be the difference between the time at which the smart device 10 receives a close setting request and the time at which it receives the start setting request, where the close setting request is used to indicate that the full-duplex state with the preset keyword of the smart device is to be closed. For the manner in which the smart device 10 receives the close setting request, reference may be made to the manner in which the start setting request is received, which is not described again here.
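One way to picture the optional fields of the start setting request is the sketch below. The field names, the fallback values, and the helper functions are hypothetical; the application does not define a message format.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

SYSTEM_KEYWORD = "assistant"  # hypothetical system preset
SYSTEM_DURATION_S = 60.0      # hypothetical system preset, in seconds


@dataclass
class StartSettingRequest:
    keyword: Optional[str] = None              # may be carried by the request
    preset_duration_s: Optional[float] = None  # may be carried by the request


def resolve(request: StartSettingRequest) -> Tuple[str, float]:
    """Fall back to system presets for anything the request does not carry."""
    keyword = request.keyword or SYSTEM_KEYWORD
    duration = request.preset_duration_s
    if duration is None:
        duration = SYSTEM_DURATION_S
    return keyword, duration


def duration_from_close(start_received_at: float, close_received_at: float) -> float:
    """Preset duration as the gap between the start and close setting requests."""
    return close_received_at - start_received_at
```

For example, `resolve(StartSettingRequest(keyword="speaker"))` keeps the user's keyword and falls back to the system duration.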

In another possible implementation, when a second voice containing a preset wake-up word is detected, a wake-up prompt message is sent; a third voice is then collected and a second instruction corresponding to the third voice is executed; further, the full-duplex state with the preset keyword of the smart device is started and a timer is set, where the timing threshold of the timer is equal to the preset duration.

In this implementation, the smart device 10 is initially in a to-be-awakened state. When it detects a second voice containing the preset wake-up word, it may switch to the wake-up state and send a wake-up prompt message to the user, where the wake-up prompt message is used to prompt the user that the smart device 10 is in the wake-up state, so that the user can input a third voice. Illustratively, the wake-up prompt message may include, but is not limited to, a voice prompt message and/or an indicator light prompt message.

Further, the smart device 10 collects a third voice and executes a second instruction corresponding to the third voice. For example, the smart device 10 may send the third voice to the server 11, so that the server 11 parses the received third voice to obtain the second instruction corresponding to the third voice; the smart device 10 then receives the second instruction sent by the server 11 and executes it. For another example, the smart device 10 may itself analyze the collected third voice to obtain the second instruction corresponding to the third voice, and then execute that second instruction.
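The two alternatives above (parsing on the server 11 or on the smart device 10) can be sketched as follows. This is only an illustration: plain text stands in for audio, and `COMMANDS`, `parse_locally`, and `resolve_second_instruction` are hypothetical names that do not appear in the application.

```python
from typing import Callable, Optional

# Hypothetical phrase-to-instruction table; the application does not define one.
COMMANDS = {"play music": "PLAY_MUSIC", "open the curtain": "OPEN_CURTAIN"}


def parse_locally(voice: str) -> Optional[str]:
    """Resolve the instruction on the device itself (text stands in for audio)."""
    lowered = voice.lower()
    for phrase, instruction in COMMANDS.items():
        if phrase in lowered:
            return instruction
    return None


def resolve_second_instruction(
    third_voice: str,
    send_to_server: Optional[Callable[[str], Optional[str]]] = None,
) -> Optional[str]:
    """Use the server when a transport is supplied, otherwise parse locally."""
    if send_to_server is not None:
        return send_to_server(third_voice)
    return parse_locally(third_voice)
```

Passing the server call as a parameter keeps the device-side logic identical in both alternatives; only the source of the second instruction changes.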

To facilitate voice interaction between the user and the smart device 10, the smart device 10 may further start the full-duplex state with the preset keyword and set a timer. Until the timing duration of the timer reaches the timing threshold (that is, the preset duration), the user may control the smart device 10 with a voice containing the preset keyword, and the smart device 10 executes the instruction corresponding to that voice.

It should be noted that the smart device 10 may also first start the full-duplex state with the preset keyword and set the timer, and then execute the second instruction corresponding to the third voice, or may perform these steps at the same time.

Optionally, after the smart device 10 performs step S202, the timer may be reset; for example, the timing duration of the timer is reset to an initial default value (e.g., 0), or the timing threshold of the timer is reset to the preset duration.
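The timer behaviour described above (a timing duration compared against a timing threshold, with an optional reset) can be sketched as below. The class name `KeywordWindowTimer` and the injectable `clock` parameter are assumptions made for illustration and testability; the application does not prescribe a timer implementation.

```python
import time
from typing import Callable


class KeywordWindowTimer:
    """Tracks how long the full-duplex state with the preset keyword has been open."""

    def __init__(self, threshold_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.threshold_s = threshold_s   # timing threshold = preset duration
        self._clock = clock              # injectable so tests need not sleep
        self._started_at = clock()

    def elapsed(self) -> float:
        """Timing duration since the timer was set or last reset."""
        return self._clock() - self._started_at

    def expired(self) -> bool:
        """True once the timing duration reaches the timing threshold."""
        return self.elapsed() >= self.threshold_s

    def reset(self) -> None:
        """Reset the timing duration to its initial value (0)."""
        self._started_at = self._clock()
```

A monotonic clock is used by default so that wall-clock adjustments do not shorten or extend the window.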

Fig. 3 is a flowchart illustrating a voice wake-up method according to another embodiment of the present application. On the basis of the foregoing embodiment, this embodiment describes an implementable manner in which the smart device 10 switches from the to-be-awakened state to the wake-up state and then enters the full-duplex state with the preset keyword. As shown in fig. 3, the method of the embodiment of the present application may include:

Step S301, monitoring whether a second voice containing a preset wake-up word is received.

In this step, the smart device 10 is in the to-be-awakened state and monitors in real time whether a second voice containing the preset wake-up word is received. If such a second voice is detected, step S302 is executed; otherwise, step S301 is executed again.

Step S302, sending a wake-up prompt message.

In this step, the smart device 10 sends a wake-up prompt message to the user, where the wake-up prompt message is used to prompt the user that the smart device 10 is in a wake-up state, so that the user can input a third voice.

Step S303, collecting a third voice and executing a second instruction corresponding to the third voice.

Step S304, starting the full-duplex state with the preset keyword, and setting a timer.

In this step, the intelligent device 10 starts the full duplex state with the preset keyword, and sets a timer, where a timing threshold of the timer is equal to the preset duration.

Step S305, monitoring whether the timing duration of the timer reaches the timing threshold.

If the timing duration of the timer reaches the timing threshold, the process returns to step S301; otherwise, step S306 is executed.

Step S306, collecting the first voice.

Step S307, monitoring whether the first voice contains a preset keyword.

If it is monitored that the first voice contains a preset keyword, executing step S308; otherwise, the procedure returns to step S305.

It should be noted that, if it is detected that the first voice does not contain the preset keyword, the process may alternatively return to step S304 to reset the timer.

Step S308, executing the first instruction corresponding to the first voice.

It should be noted that, for the realizable manner of each step in this embodiment, reference may be made to relevant contents in the foregoing embodiments of the present application, and details are not described here again.
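The flow of steps S301 to S308 can be sketched as a small state machine. This is an illustrative simulation only: text stands in for audio, each utterance carries a timestamp in seconds, and `WAKE_WORD`, `KEYWORD`, `WINDOW_S`, and `run_device` are hypothetical names and values. The sketch does not reset the timer after step S308, which is one of the options the text leaves open.

```python
from typing import List, Tuple

WAKE_WORD = "hello device"   # hypothetical preset wake-up word
KEYWORD = "assistant"        # hypothetical preset keyword
WINDOW_S = 10.0              # hypothetical preset duration (timing threshold)


def run_device(utterances: List[Tuple[float, str]]) -> List[str]:
    """Process (timestamp, text) utterances and return the voices acted upon."""
    executed: List[str] = []
    state = "to_be_awakened"
    timer_start = 0.0
    for ts, text in utterances:
        lowered = text.lower()
        # S305: leave the full-duplex state once the timing threshold is reached.
        if state == "full_duplex" and ts - timer_start >= WINDOW_S:
            state = "to_be_awakened"
        if state == "to_be_awakened":
            # S301/S302: only the wake-up word is acted on in this state.
            if WAKE_WORD in lowered:
                state = "awake"
        elif state == "awake":
            # S303: the next voice is the third voice; execute its instruction.
            executed.append(text)
            # S304: start the full-duplex state and set the timer.
            state = "full_duplex"
            timer_start = ts
        else:
            # S306-S308: act on the first voice only if it contains the keyword.
            if KEYWORD in lowered:
                executed.append(text)
    return executed
```

For example, with the utterances `[(1, "Hello device"), (2, "play some jazz"), (4, "did you eat lunch"), (6, "assistant, turn it up")]`, the second and fourth voices are acted upon, while the third (ordinary conversation without the keyword) is ignored.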

To sum up, in this embodiment of the application, the smart device 10 is in the full-duplex state with the preset keyword within a preset duration after being awakened, can collect a first voice in real time, and executes a first instruction corresponding to the first voice when it detects that the first voice contains the preset keyword. Therefore, with the voice interaction mode provided by this embodiment, a user who uses the smart device again within the preset duration after waking it up does not need to repeat the wake-up word, and can directly input a voice containing the preset keyword to control the smart device, so that the smart device executes the instruction corresponding to that voice. This simplifies the user's operation, optimizes the voice interaction between the user and the smart device, and improves the user experience.

Fig. 4 is a flowchart illustrating a voice wake-up method according to another embodiment of the present application. On the basis of the above embodiments, this embodiment describes the server side. The execution subject in the embodiment of the present application may be the server 11 or a voice wake-up apparatus in the server 11 (for convenience of description, this embodiment takes the server 11 as the execution subject by way of example). Illustratively, the voice wake-up apparatus may be implemented by software and/or hardware. As shown in fig. 4, the voice wake-up method provided in this embodiment may include:

Step S401, receiving a first voice sent by the smart device.

In this step, the server 11 may receive the first voice sent by the smart device 10; the first voice is a voice collected by the smart device 10 within the preset duration of the full-duplex state with the preset keyword.

Step S402, analyzing the first voice.

In this step, the server 11 may analyze the first voice received in step S401, where the specific voice analysis method may be an existing voice analysis method or another voice analysis method.

Step S403, when it is detected that the first voice contains the preset keyword, sending a first instruction corresponding to the first voice to the smart device.

In this step, when it is monitored that the first voice includes the preset keyword, the server 11 sends a first instruction corresponding to the first voice to the intelligent device 10, so that the intelligent device 10 executes the first instruction corresponding to the first voice.

To sum up, in the embodiment of the present application, the server 11 receives the first voice sent by the smart device 10 and analyzes the first voice, where the first voice is a voice collected by the smart device 10 within the preset duration of the full-duplex state with the preset keyword; further, when it detects that the first voice contains the preset keyword, the server 11 sends a first instruction corresponding to the first voice to the smart device 10, so that the smart device 10 executes the first instruction. It can be seen that, in this embodiment, the server 11 sends the instruction corresponding to a voice to the smart device 10 only when it detects that the voice contains the preset keyword. As a result, while in the full-duplex state with the preset keyword, the smart device 10 executes the instructions corresponding to voices that contain the preset keyword and does not execute instructions corresponding to voices that do not, so that voice interaction between the user and the smart device is accurately identified and the user experience is improved.
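Steps S401 to S403 on the server side can be sketched as a single handler. This is a simplified illustration under stated assumptions: text stands in for audio, "analysis" is reduced to text normalization and phrase lookup, and `PRESET_KEYWORD`, `INSTRUCTIONS`, `parse_voice`, and `handle_first_voice` are hypothetical names that do not come from the application.

```python
from typing import Dict, Optional

PRESET_KEYWORD = "assistant"   # hypothetical preset keyword
# Hypothetical phrase-to-instruction table.
INSTRUCTIONS: Dict[str, str] = {
    "turn up the volume": "VOLUME_UP",
    "open the curtain": "OPEN_CURTAIN",
}


def parse_voice(voice: str) -> str:
    """S402: stand-in for voice analysis; lowercases and normalizes spacing."""
    return " ".join(voice.lower().replace(",", " ").split())


def handle_first_voice(voice: str) -> Optional[str]:
    """S401-S403: return the first instruction, or None if nothing is sent."""
    text = parse_voice(voice)
    if PRESET_KEYWORD not in text.split():
        return None                      # no preset keyword: nothing is sent
    for phrase, instruction in INSTRUCTIONS.items():
        if phrase in text:
            return instruction           # S403: instruction sent to the device
    return None                          # keyword present but no known command
```

Returning `None` models the case in which the server sends nothing back, so the smart device has no instruction to execute for ordinary conversation.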

Optionally, before executing step S401, the server 11 may further receive a start setting request, where the start setting request indicates that the full-duplex state with the preset keyword of the smart device 10 is to be started. The start setting request may be sent by the user through the mobile terminal 12, or may be sent by other devices.

Further, the server 11 may send the start setting request to the smart device 10, so that the smart device 10 starts a full duplex state with a preset keyword.

Optionally, before executing step S401, the server 11 may further receive a third voice sent by the smart device 10, where the third voice may be collected after the smart device 10, while in the to-be-awakened state, detects the second voice containing the preset wake-up word and switches to the wake-up state. Further, the server 11 analyzes the third voice to obtain a second instruction corresponding to the third voice, and sends the second instruction to the smart device 10, so that the smart device 10 executes the second instruction.

Fig. 5 is a flowchart illustrating a voice wake-up method according to another embodiment of the present application. On the basis of the foregoing embodiments, this embodiment describes an implementable manner of the voice wake-up method in combination with the server 11 and the smart device 10 in the full-duplex state with the preset keyword. As shown in fig. 5, the method of the embodiment of the present application may include:

Step S501, the smart device 10 collects a first voice within a preset duration.

Step S502, the smart device 10 transmits the first voice to the server 11.

Step S503, the server 11 analyzes the received first voice.

Step S504, when it is detected that the first voice contains the preset keyword, the server 11 sends a first instruction corresponding to the first voice to the smart device 10.

Step S505, the smart device 10 executes a first instruction corresponding to the first voice.

To sum up, the intelligent device 10 may collect a first voice in real time and send the first voice to the server 11 within a preset duration of a full duplex state with a preset keyword, so that the server 11 may analyze the first voice, and send a first instruction corresponding to the first voice to the intelligent device 10 when it is monitored that the first voice includes the preset keyword, so that the intelligent device 10 executes the first instruction corresponding to the first voice. Therefore, in the embodiment of the present application, when the smart device 10 monitors the voice containing the preset keyword, the instruction corresponding to the voice is executed, so that the voice interaction between the user and the smart device can be accurately identified, and the user experience is improved.

Fig. 6 is a flowchart illustrating a voice wake-up method according to another embodiment of the present application. On the basis of the foregoing embodiment, in the embodiment of the present application, a description is given of another implementation manner of the voice wakeup method in combination with the foregoing smart device 10 and the foregoing server 11. As shown in fig. 6, the method of the embodiment of the present application may include:

Step S601, the smart device 10 monitors whether a second voice containing a preset wake-up word is received.

In this step, the smart device 10 is in the to-be-awakened state and monitors in real time whether a second voice containing the preset wake-up word is received. If such a second voice is detected, step S602 is executed; otherwise, step S601 is executed again.

Step S602, the smart device 10 sends a wake-up prompt message.

Step S603, the smart device 10 collects the third voice.

Step S604, the smart device 10 sends the third voice to the server 11.

Step S605, the server 11 analyzes the received third voice to obtain a second instruction corresponding to the third voice.

Step S606, the server 11 sends the second instruction corresponding to the third voice to the smart device 10.

Step S607, the smart device 10 executes the second instruction corresponding to the third voice.

Step S608, the intelligent device 10 starts the full duplex state with the preset keyword, and sets a timer.

Step S609, the intelligent device 10 monitors whether the timing duration of the timer reaches the timing threshold.

If the timing duration of the timer reaches the timing threshold, the process returns to step S601; otherwise, step S610 is executed.

Step S610, the smart device 10 collects the first voice.

Step S611, the smart device 10 sends the first voice to the server 11.

Step S612, the server 11 analyzes the first voice.

Step S613, when it is detected that the first voice contains the preset keyword, the server 11 sends a first instruction corresponding to the first voice to the smart device 10.

It should be understood that, if the server 11 detects that the first voice does not include the preset keyword, the first voice may be discarded.

Step S614, the smart device 10 executes the first instruction corresponding to the first voice.

It should be understood that the steps S609 to S614 in this embodiment may be executed in a loop, until the smart device 10 monitors that the timing duration of the timer reaches the timing threshold, and then the step S601 is executed again.

To sum up, in this embodiment of the application, the intelligent device 10 is in a full duplex state with a preset keyword within a preset time period after being awakened, and can collect a first voice in real time and send the first voice to the server 11, so that the server 11 can analyze the first voice, and send a first instruction corresponding to the first voice to the intelligent device 10 when monitoring that the first voice contains the preset keyword, so that the intelligent device 10 executes the first instruction corresponding to the first voice. Therefore, according to the voice interaction mode provided by the embodiment of the application, when the user uses the intelligent device again within the preset time after waking up the intelligent device, the user does not need to wake up the intelligent device by using the wake-up word, and the voice with the preset keyword can be directly input to control the intelligent device, so that the intelligent device can execute the instruction corresponding to the voice with the preset keyword, the operation of the user is simple, the voice interaction mode of the user and the intelligent device is optimized, meanwhile, the voice interaction of the user and the intelligent device can be accurately identified, and the user experience is improved.
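The interaction of steps S601 to S614 between the smart device 10 and the server 11 can be sketched with two cooperating classes. This is an illustrative simulation rather than the implementation of the application: direct method calls stand in for network messages, text stands in for audio, and the names `Server`, `SmartDevice`, `WAKE_WORD`, `KEYWORD`, and the `"do:"` instruction format are all hypothetical.

```python
from typing import List, Optional

WAKE_WORD = "hello device"   # hypothetical preset wake-up word
KEYWORD = "assistant"        # hypothetical preset keyword


class Server:
    """Stand-in for server 11; 'parsing' is a text check, not speech analysis."""

    def second_instruction(self, third_voice: str) -> str:
        # S605-S606: the third voice always yields a second instruction.
        return "do:" + third_voice.lower()

    def first_instruction(self, first_voice: str) -> Optional[str]:
        # S612-S613: reply only when the preset keyword is present;
        # otherwise the first voice is discarded.
        lowered = first_voice.lower()
        if KEYWORD in lowered:
            return "do:" + lowered
        return None


class SmartDevice:
    """Stand-in for smart device 10."""

    def __init__(self, server: Server, window_s: float) -> None:
        self.server = server
        self.window_s = window_s          # timing threshold = preset duration
        self.state = "to_be_awakened"
        self.timer_start = 0.0
        self.executed: List[str] = []

    def hear(self, ts: float, voice: str) -> None:
        # S609: return to S601 once the timing threshold is reached.
        if self.state == "full_duplex" and ts - self.timer_start >= self.window_s:
            self.state = "to_be_awakened"
        if self.state == "to_be_awakened":
            if WAKE_WORD in voice.lower():            # S601-S602
                self.state = "awake"
        elif self.state == "awake":
            # S603-S607: send the third voice, execute the second instruction.
            self.executed.append(self.server.second_instruction(voice))
            self.state = "full_duplex"                # S608: start state, set timer
            self.timer_start = ts
        else:
            # S610-S613: send the first voice and wait for a possible reply.
            instruction = self.server.first_instruction(voice)
            if instruction is not None:
                self.executed.append(instruction)     # S614
```

Calling `hear` repeatedly with increasing timestamps reproduces the loop over steps S609 to S614: voices without the keyword are forwarded but produce no instruction, and the device falls back to the to-be-awakened state after the window elapses.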

Fig. 7 is a schematic structural diagram of an intelligent device provided in an embodiment of the present application. The intelligent device provided by the embodiment of the application is in a full duplex state with preset keywords. As shown in fig. 7, the smart device provided in the embodiment of the present application may include: a first acquisition module 701 and a first execution module 702.

The first acquisition module 701 is configured to acquire a first voice within a preset time duration;

a first executing module 702, configured to execute a first instruction corresponding to the first voice when it is monitored that the first voice includes the preset keyword.

Optionally, the first executing module 702 includes:

a sending unit, configured to send the first voice to a server;

an execution unit, configured to execute the first instruction.

Optionally, the smart device further comprises:

a second acquisition module, configured to collect a third voice;

Optionally, the second execution module includes:

a sending unit, configured to send the third voice to a server;

an execution unit, configured to execute the second instruction.

Optionally, the smart device further comprises:

a resetting module, configured to reset the timer.

Optionally, the smart device further comprises:

The intelligent device provided in this embodiment is configured to execute the technical solution related to the intelligent device 10 in the above voice wakeup method embodiment of the present application, and the technical principle and the technical effect are similar, which are not described herein again.

Fig. 8 is a schematic structural diagram of a server provided in an embodiment of the present application. As shown in fig. 8, a server provided in an embodiment of the present application may include: a first receiving module 801, a first parsing module 802 and a first sending module 803.

The first receiving module 801 is configured to receive a first voice sent by the smart device, where the smart device is in a full-duplex state with a preset keyword;

a first parsing module 802, configured to parse the first voice;

a first sending module 803, configured to send a first instruction corresponding to the first voice to the intelligent device when it is monitored that the first voice includes the preset keyword.

Optionally, the server further comprises:

The server provided in this embodiment is configured to execute the technical solution related to the server 11 in the above voice wakeup method embodiment of the present application, and the technical principle and the technical effect are similar, which are not described herein again.

According to an embodiment of the present application, an electronic device and a readable storage medium are also provided.

Fig. 9 is a block diagram of an electronic device according to the voice wake-up method of the embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the present application that are described and/or claimed herein.

As shown in fig. 9, the electronic device includes: one or more processors 901, memory 902, and interfaces for connecting the various components, including a high-speed interface and a low-speed interface. The various components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions for execution within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output apparatus (such as a display device coupled to the interface). In other embodiments, multiple processors and/or multiple buses may be used, along with multiple memories, as desired. Also, multiple electronic devices may be connected, with each device providing portions of the necessary operations (e.g., as a server array, a group of blade servers, or a multi-processor system). Fig. 9 takes one processor 901 as an example.

Memory 902 is a non-transitory computer readable storage medium as provided herein. The memory stores instructions executable by at least one processor to cause the at least one processor to perform the voice wake-up method provided herein. The non-transitory computer readable storage medium of the present application stores computer instructions for causing a computer to perform the voice wake-up method provided herein.

The memory 902, which is a non-transitory computer readable storage medium, can be used to store non-transitory software programs, non-transitory computer executable programs, and modules, such as program instructions/modules corresponding to the voice wake-up method in the embodiment of the present application (for example, the first acquisition module 701 and the first execution module 702 shown in fig. 7, or the first receiving module 801, the first parsing module 802, and the first sending module 803 shown in fig. 8). The processor 901 executes various functional applications and data processing of the electronic device by executing non-transitory software programs, instructions and modules stored in the memory 902, that is, implements the voice wake-up method in the above method embodiments.

The memory 902 may include a program storage area and a data storage area, where the program storage area may store an operating system and an application program required for at least one function, and the data storage area may store data created according to the use of the electronic device described above, and the like. Further, the memory 902 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, the memory 902 may optionally include memory located remotely from the processor 901, which may be connected to the electronic device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The electronic device of the voice wake-up method in the embodiment of the application may further include: an input device 903 and an output device 904. The processor 901, the memory 902, the input device 903 and the output device 904 may be connected by a bus or other means, and fig. 9 illustrates the connection by a bus as an example.

The input device 903 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic apparatus, such as a touch screen, a keypad, a mouse, a track pad, a touch pad, a pointing stick, one or more mouse buttons, a track ball, a joystick, or other input devices. The output devices 904 may include a display device, auxiliary lighting devices (e.g., LEDs), tactile feedback devices (e.g., vibrating motors), and the like. The display device may include, but is not limited to, a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display, and a plasma display. In some implementations, the display device can be a touch screen.

Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, ASICs (application-specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.

These computer programs (also known as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.

To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.

The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.

The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.

According to the technical solution of the embodiment of the application, the smart device can collect the first voice in real time within the preset duration of the full-duplex state with the preset keyword, and then execute the first instruction corresponding to the first voice when it detects that the first voice contains the preset keyword. Therefore, when the smart device is in the full-duplex state with the preset keyword, it executes the instruction corresponding to a voice when it detects that the voice contains the preset keyword. This addresses the technical problem of frequent misoperation in the prior art, so that voice interaction between the user and the smart device can be accurately identified and the user experience is improved.

It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present application may be executed in parallel, sequentially, or in different orders, and the present invention is not limited thereto as long as the desired results of the technical solutions disclosed in the present application can be achieved.

The above-described embodiments should not be construed as limiting the scope of the present application. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims

1. A voice wake-up method, applied to a smart device, wherein the smart device is in a full-duplex state with a preset keyword, the method comprising:

collecting a first voice within a preset time length;

2. The method according to claim 1, wherein when it is monitored that the first voice includes the preset keyword, executing a first instruction corresponding to the first voice, includes:

sending the first voice to a server;

executing the first instruction.

3. The method according to claim 1 or 2, wherein before the collecting the first voice within the preset time period, the method further comprises:

4. The method according to claim 1 or 2, wherein before the collecting the first voice within the preset time period, the method further comprises:

collecting a third voice;

executing a second instruction corresponding to the third voice;

5. The method of claim 4, wherein executing the second instruction corresponding to the third speech comprises:

sending the third voice to a server;

executing the second instruction.

6. The method of claim 4, wherein after the executing the first instruction corresponding to the first voice, the method further comprises:

resetting the timer.

7. The method of claim 4, further comprising:

8. A voice wake-up method, applied to a server, the method comprising:

analyzing the first voice;

9. The method of claim 8, wherein prior to receiving the first voice sent by the smart device, the method further comprises:

sending the start setting request to the smart device.

10. The method of claim 8, wherein prior to receiving the first voice sent by the smart device, the method further comprises:

analyzing the third voice to obtain a second instruction;

sending the second instruction to the smart device.

11. An intelligent device, wherein the intelligent device is in a full duplex state with a preset keyword, the intelligent device comprising:

12. The apparatus of claim 11, wherein the first execution module comprises:

a sending unit, configured to send the first voice to a server;

an execution unit, configured to execute the first instruction.

13. The device of claim 11 or 12, wherein the smart device further comprises:

14. The device of claim 11 or 12, wherein the smart device further comprises:

a second acquisition module, configured to collect a third voice;

15. The apparatus of claim 14, wherein the second execution module comprises:

a sending unit, configured to send the third voice to a server;

an execution unit to execute the second instruction.

16. The device of claim 14, wherein the smart device further comprises:

a resetting module, configured to reset the timer.

17. The device of claim 14, wherein the smart device further comprises:

18. A server, comprising:

a first parsing module, configured to parse the first voice;

19. The server of claim 18, further comprising:

20. The server of claim 18, further comprising:

21. An electronic device, comprising:

wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-7 or 8-10.

22. A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1-7 or 8-10.