WO2024076114A1 - Electronic device and method for controlling execution of voice commands - Google Patents
Electronic device and method for controlling execution of voice commands
- Publication number
- WO2024076114A1 (PCT/KR2023/015158)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- electronic device
- voice
- function
- utterance
- engine
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/16—Sound input; Sound output
- G06F3/167—Audio in a user interface, e.g. using voice commands for navigating, audio feedback
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/78—Detection of presence or absence of voice signals
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/28—Constructional details of speech recognition systems
- G10L15/32—Multiple recognisers used in sequence or in parallel; Score combination systems therefor, e.g. voting systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/221—Announcement of recognition results
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/223—Execution procedure of a spoken command
Definitions
- This disclosure relates to an electronic device and method for controlling the execution of voice commands.
- the interface between the electronic device and the user may include a keyboard and/or mouse.
- the types of interfaces between the electronic device and the user may be expanded.
- an electronic device may use a microphone to identify a user's speech for controlling the electronic device.
- an electronic device may include a communication circuit, a microphone, a speaker, a memory for storing at least one instruction, and at least one processor.
- the at least one processor may be configured to execute the at least one instruction to identify a first audio signal received through the microphone.
- the at least one processor may be configured to execute the at least one instruction to perform at least one first function among a plurality of functions, based on identifying that the first audio signal includes an utterance for sequentially executing the plurality of functions.
- the at least one processor may be configured to execute the at least one instruction to cause an external electronic device, via the communication circuit, to perform at least one second function among the plurality of functions, based on identifying that the first audio signal includes the utterance.
- the at least one second function may be different from the at least one first function among the plurality of functions.
- the at least one processor may be configured to execute the at least one instruction to sequentially output, through the speaker, second audio signals representing at least one result of performing the at least one first function and the at least one second function, based on an order associated with the utterance.
- a method of an electronic device may include identifying an utterance in a first audio signal based on receiving the first audio signal through a microphone of the electronic device.
- the method may include, based on identifying a plurality of voice commands from the utterance, obtaining a state of a first voice engine executed by the electronic device and a state of a second voice engine executed by an external electronic device.
- the external electronic device may be connected through a communication circuit of the electronic device.
- the method may include obtaining, based on the state of the first voice engine and the state of the second voice engine, information for dividing the plurality of voice commands into at least one first voice command to be performed by the first voice engine and at least one second voice command to be performed by the second voice engine.
- the method may include executing a first function corresponding to the at least one first voice command using the first voice engine, and requesting the external electronic device to execute a second function corresponding to the at least one second voice command.
- the method may include outputting, through a speaker of the electronic device, at least one second audio signal representing at least one result of executing the first function and the second function, based on the order of the plurality of voice commands indicated by the utterance.
- a method of an electronic device may include identifying a first audio signal received through a microphone of the electronic device.
- the method may include performing at least one first function among a plurality of functions based on identifying that the first audio signal includes an utterance for sequentially performing the plurality of functions.
- the method may include, based on identifying that the first audio signal includes the utterance, causing an external electronic device connected through a communication circuit of the electronic device to perform at least one second function among the plurality of functions.
- the at least one second function may be different from the at least one first function.
- the method may include sequentially outputting, through a speaker of the electronic device, second audio signals representing at least one result of performing the at least one first function and the at least one second function, based on an order related to the utterance.
- an electronic device may include a communication circuit, a microphone, a speaker, and a processor.
- the processor may be configured to identify a first audio signal through the microphone.
- the processor may be configured to execute at least one first function among a plurality of functions based on identifying, from the first audio signal, an utterance for sequentially executing the plurality of functions. Based on identifying the utterance, the processor may be configured to execute at least one second function, among the plurality of functions, that is different from the at least one first function, using an external electronic device connected through the communication circuit.
- the processor may be configured to sequentially output, through the speaker, second audio signals representing results of executing the at least one first function and the at least one second function, in an order related to the utterance.
- a method of an electronic device may include identifying a first audio signal through a microphone of the electronic device.
- the method may include executing, by a processor of the electronic device, at least one first function among a plurality of functions based on identifying, from the first audio signal, an utterance for sequentially executing the plurality of functions.
- the method may include, based on identifying the utterance, executing at least one second function, among the plurality of functions, that is different from the at least one first function, using an external electronic device connected through a communication circuit of the electronic device.
- the method may include sequentially outputting, through a speaker of the electronic device, second audio signals representing results of executing the at least one first function and the at least one second function, in an order related to the utterance.
- an electronic device may include a communication circuit, a microphone, a speaker, and a processor.
- the processor may be configured to identify an utterance included in a first audio signal based on receiving the first audio signal through the microphone.
- the processor may be configured to obtain, based on identifying a plurality of voice commands from the utterance, states of a first voice engine executed by the processor to process voice commands and of a second voice engine executed by an external electronic device connected through the communication circuit.
- the processor may be configured to obtain, based on the states, information for dividing the plurality of voice commands into a first voice command corresponding to the first voice engine and a second voice command corresponding to the second voice engine.
- the processor may be configured to execute a first function corresponding to the first voice command using the first voice engine, and to request the external electronic device to execute a second function corresponding to the second voice command.
- the processor may be configured to output, through the speaker, a second audio signal representing results of executing the first function and the second function, based on the order of the plurality of voice commands indicated by the utterance.
- a method of an electronic device may include identifying an utterance included in a first audio signal based on receiving the first audio signal through a microphone of the electronic device.
- the method may include obtaining, based on identifying a plurality of voice commands from the utterance, states of a first voice engine executed by a processor of the electronic device to process voice commands and of a second voice engine executed by an external electronic device connected through a communication circuit of the electronic device.
- the method may include obtaining, based on the states, information for dividing the plurality of voice commands into a first voice command corresponding to the first voice engine and a second voice command corresponding to the second voice engine.
- the method may include executing a first function corresponding to the first voice command using the first voice engine, and requesting the external electronic device to execute a second function corresponding to the second voice command.
- the method may include outputting, through a speaker of the electronic device, a second audio signal representing the results of executing the first function and the second function, based on the order of the plurality of voice commands indicated by the utterance.
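The method summarized above (obtain the states of two voice engines, divide the voice commands between them, execute each group, and report results in utterance order) can be illustrated with a minimal Python sketch. Everything named here is hypothetical: the disclosure does not specify the engine states, the division policy, or any API, so `EngineState`, `VoiceCommand`, `split_commands`, `run_in_utterance_order`, and the `remote_only` parameter are assumptions made only for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class EngineState(Enum):
    """Hypothetical engine states; the disclosure does not enumerate them."""
    READY = "ready"
    BUSY = "busy"
    UNAVAILABLE = "unavailable"


@dataclass
class VoiceCommand:
    order: int  # position of the command within the utterance
    text: str   # recognized command text


def split_commands(commands, local_state, remote_state, remote_only=()):
    """Divide commands between the first (local) and second (external) engine.

    Assumed policy: a command goes to the external engine only when that
    engine is READY and either the command is listed in `remote_only` or the
    local engine is not READY. Every other command stays local as a fallback.
    """
    local, remote = [], []
    for cmd in commands:
        wants_remote = (cmd.text in remote_only
                        or local_state is not EngineState.READY)
        if wants_remote and remote_state is EngineState.READY:
            remote.append(cmd)
        else:
            local.append(cmd)
    return local, remote


def run_in_utterance_order(commands, local_state, remote_state,
                           run_local, request_remote, remote_only=()):
    """Execute both groups, then return result texts sorted by utterance order."""
    local, remote = split_commands(commands, local_state, remote_state,
                                   remote_only)
    results = [(c.order, run_local(c)) for c in local]
    results += [(c.order, request_remote(c)) for c in remote]
    return [text for _, text in sorted(results, key=lambda r: r[0])]
```

In this sketch `run_local` and `request_remote` are caller-supplied callables standing in for the first voice engine and for the request sent to the external electronic device. Sorting by `order` is one simple way to make the reported results follow the order of the commands in the utterance regardless of which engine finishes first.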
- FIG. 1 is a block diagram of an electronic device in a network environment, according to one embodiment.
- FIG. 2 illustrates an example of an operation in which an electronic device executes one or more functions based on speech, according to an embodiment.
- FIG. 3 is a block diagram of an electronic device and an external electronic device, according to one embodiment.
- FIG. 4 is a block diagram for explaining a program executed by an electronic device, according to an embodiment.
- FIG. 5 is a block diagram of a voice command database included in an electronic device, according to one embodiment.
- FIG. 6 is a block diagram for explaining a program executed by an electronic device, according to an embodiment.
- FIG. 7 is a block diagram for explaining a program executed by an electronic device, according to an embodiment.
- FIGS. 8A and 8B illustrate example states in which an electronic device displays a user interface (UI), according to an embodiment.
- FIG. 9 illustrates an operation in which an electronic device executes a plurality of functions based on an utterance, according to an embodiment.
- FIG. 10 is a flowchart of operations performed by an electronic device, according to one embodiment.
- FIG. 11 is a flowchart of operations performed by an electronic device, according to one embodiment.
- FIG. 12 is a flowchart of operations performed by an electronic device, according to one embodiment.
- FIG. 13 is a block diagram showing an artificial intelligence (AI) system according to an embodiment.
- FIG. 14 is a diagram showing a schema for storing relationship information between concepts and actions in a database, according to an embodiment.
- FIG. 15 is a diagram illustrating a user terminal displaying a screen for processing voice input received through an intelligent app, according to an embodiment.
- the components are not limited. When a (e.g., first) component is said to be "coupled" or "connected" to another (e.g., second) component, with or without the terms "functionally" or "communicatively", the component may be connected to the other component directly or through another (e.g., third) component.
- the term "module" used in this document includes a unit comprised of hardware, software, or firmware, and may be used interchangeably with terms such as logic, logic block, component, or circuit.
- a module may be an integrated part, a minimum unit that performs one or more functions, or a part thereof.
- a module may be comprised of an application-specific integrated circuit (ASIC).
- FIG. 1 is a block diagram of an electronic device 101 in a network environment 100, according to one embodiment.
- an electronic device 101 may communicate with an external electronic device 102 through a first network 198 (e.g., a short-range wireless communication network), or may communicate with at least one of an external electronic device 104 or a server 108 through a second network 199 (e.g., a long-distance wireless communication network). According to one embodiment, the electronic device 101 may communicate with the external electronic device 104 through the server 108.
- the electronic device 101 may include a processor 120, a memory 130, an input module 150, a sound output module 155, a display module 160, an audio module 170, a sensor module 176, an interface 177, a connection terminal 178, a haptic module 179, a camera module 180, a power management module 188, a battery 189, a communication module 190, a subscriber identification module 196, or an antenna module 197.
- at least one of these components (e.g., the connection terminal 178) may be omitted, or one or more other components may be added to the electronic device 101.
- some of these components (e.g., the sensor module 176, the camera module 180, or the antenna module 197) may be integrated into one component (e.g., the display module 160).
- the processor 120 may, for example, execute software (e.g., program 140) to control at least one other component (e.g., a hardware or software component) of the electronic device 101 connected to the processor 120, and may perform various data processing or computation. According to one embodiment, as at least part of the data processing or computation, the processor 120 may store commands or data received from another component (e.g., sensor module 176 or communication module 190) in volatile memory 132, process the commands or data stored in the volatile memory 132, and store the resulting data in non-volatile memory 134.
- the processor 120 may include a main processor 121 (e.g., a central processing unit or an application processor) or an auxiliary processor 123 (e.g., a graphics processing unit, a neural processing unit (NPU), an image signal processor, a sensor hub processor, or a communication processor) that can operate independently of or together with the main processor 121.
- when the electronic device 101 includes a main processor 121 and an auxiliary processor 123, the auxiliary processor 123 may be set to use lower power than the main processor 121 or to be specialized for a designated function.
- the auxiliary processor 123 may be implemented separately from the main processor 121 or as part of it.
- the auxiliary processor 123 may, for example, control at least some of the functions or states related to at least one of the components of the electronic device 101 (e.g., the display module 160, the sensor module 176, or the communication module 190), on behalf of the main processor 121 while the main processor 121 is in an inactive (e.g., sleep) state, or together with the main processor 121 while the main processor 121 is in an active (e.g., application execution) state.
- the auxiliary processor 123 (e.g., an image signal processor or a communication processor) may be implemented as part of another functionally related component (e.g., camera module 180 or communication module 190).
- the auxiliary processor 123 may include a hardware structure specialized for processing artificial intelligence models.
- Artificial intelligence models can be created through machine learning. For example, such learning may be performed in the electronic device 101 itself, on which the artificial intelligence model is executed, or may be performed through a separate server (e.g., server 108).
- Learning algorithms may include, for example, supervised learning, unsupervised learning, semi-supervised learning, or reinforcement learning, but are not limited to these examples.
- An artificial intelligence model may include multiple artificial neural network layers.
- An artificial neural network may be one of a deep neural network (DNN), a convolutional neural network (CNN), a recurrent neural network (RNN), a restricted Boltzmann machine (RBM), a deep belief network (DBN), a bidirectional recurrent deep neural network (BRDNN), or deep Q-networks, or a combination of two or more of these, but is not limited to the examples described above.
- artificial intelligence models may additionally or alternatively include software structures.
- the memory 130 may store various data used by at least one component (e.g., the processor 120 or the sensor module 176) of the electronic device 101. Data may include, for example, input data or output data for software (e.g., program 140) and instructions related thereto.
- Memory 130 may include volatile memory 132 or non-volatile memory 134.
- the program 140 may be stored as software in the memory 130 and may include, for example, an operating system 142, middleware 144, or application 146.
- the input module 150 may receive commands or data to be used in a component of the electronic device 101 (e.g., the processor 120) from outside the electronic device 101 (e.g., a user).
- the input module 150 may include, for example, a microphone, mouse, keyboard, keys (e.g., buttons), or digital pen (e.g., stylus pen).
- the sound output module 155 may output sound signals to the outside of the electronic device 101.
- the sound output module 155 may include, for example, a speaker or a receiver. Speakers can be used for general purposes such as multimedia playback or recording playback.
- the receiver can be used to receive incoming calls. According to one embodiment, the receiver may be implemented separately from the speaker or as part of it.
- the display module 160 can visually provide information to the outside of the electronic device 101 (e.g., a user).
- the display module 160 may include, for example, a display, a hologram device, or a projector, and a control circuit for controlling the device.
- the display module 160 may include a touch sensor configured to detect a touch, or a pressure sensor configured to measure the intensity of force generated by the touch.
- the audio module 170 can convert sound into an electrical signal or, conversely, convert an electrical signal into sound. According to one embodiment, the audio module 170 may acquire sound through the input module 150, or may output sound through the sound output module 155 or an external electronic device 102 (e.g., a speaker or headphones) connected directly or wirelessly to the electronic device 101.
- the sensor module 176 may detect the operating state (e.g., power or temperature) of the electronic device 101 or the external environmental state (e.g., user state) and generate an electrical signal or data value corresponding to the detected state.
- the sensor module 176 may include, for example, a gesture sensor, a gyro sensor, an air pressure sensor, a magnetic sensor, an acceleration sensor, a grip sensor, a proximity sensor, a color sensor, an infrared (IR) sensor, a biometric sensor, a temperature sensor, a humidity sensor, or a light sensor.
- the interface 177 may support one or more designated protocols that can be used to connect the electronic device 101 directly or wirelessly with an external electronic device (e.g., the external electronic device 102).
- the interface 177 may include, for example, a high definition multimedia interface (HDMI), a universal serial bus (USB) interface, an SD card interface, or an audio interface.
- the connection terminal 178 may include a connector through which the electronic device 101 can be physically connected to an external electronic device (e.g., the external electronic device 102).
- the connection terminal 178 may include, for example, an HDMI connector, a USB connector, an SD card connector, or an audio connector (e.g., a headphone connector).
- the haptic module 179 can convert electrical signals into mechanical stimulation (e.g., vibration or movement) or electrical stimulation that the user can perceive through tactile or kinesthetic senses.
- the haptic module 179 may include, for example, a motor, a piezoelectric element, or an electrical stimulation device.
- the camera module 180 can capture still images and moving images.
- the camera module 180 may include one or more lenses, image sensors, image signal processors, or flashes.
- the power management module 188 can manage power supplied to the electronic device 101.
- the power management module 188 may be implemented as at least a part of, for example, a power management integrated circuit (PMIC).
- Battery 189 may supply power to at least one component of electronic device 101.
- the battery 189 may include, for example, a non-rechargeable primary battery, a rechargeable secondary battery, or a fuel cell.
- the communication module 190 may support establishment of a direct (e.g., wired) communication channel or a wireless communication channel between the electronic device 101 and an external electronic device (e.g., external electronic device 102, external electronic device 104, or server 108), and communication through the established communication channel. The communication module 190 may include one or more communication processors that operate independently of the processor 120 (e.g., an application processor) and support direct (e.g., wired) communication or wireless communication.
- the communication module 190 may include a wireless communication module 192 (e.g., a cellular communication module, a short-range wireless communication module, or a global navigation satellite system (GNSS) communication module) or a wired communication module 194 (e.g., a local area network (LAN) communication module or a power line communication module).
- the corresponding communication module may communicate with the external electronic device 104 through a first network 198 (e.g., a short-range communication network such as Bluetooth, wireless fidelity (WiFi) direct, or infrared data association (IrDA)) or a second network 199 (e.g., a telecommunication network such as a legacy cellular network, a 5G network, a next-generation communication network, the Internet, or a computer network (e.g., LAN or WAN)).
- the wireless communication module 192 may use subscriber information (e.g., International Mobile Subscriber Identifier (IMSI)) stored in the subscriber identification module 196 to identify or authenticate the electronic device 101 within a communication network such as the first network 198 or the second network 199.
- the wireless communication module 192 may support a 5G network following the 4G network and next-generation communication technology, for example, new radio (NR) access technology.
- NR access technology may support high-speed transmission of high-capacity data (enhanced mobile broadband (eMBB)), minimization of terminal power and access by multiple terminals (massive machine type communications (mMTC)), or high reliability and low latency (ultra-reliable and low-latency communications (URLLC)).
- the wireless communication module 192 may support high frequency bands (e.g., mmWave bands), for example, to achieve high data rates.
- the wireless communication module 192 may support various technologies for securing performance in high frequency bands, for example, beamforming, massive multiple-input and multiple-output (massive MIMO), full-dimensional MIMO (FD-MIMO), array antenna, analog beamforming, or large-scale antenna.
- the wireless communication module 192 may support various requirements specified in the electronic device 101, an external electronic device (e.g., external electronic device 104), or a network system (e.g., second network 199).
- the wireless communication module 192 may support a peak data rate (e.g., 20 Gbps or more) for realizing eMBB, loss coverage (e.g., 164 dB or less) for realizing mMTC, or U-plane latency (e.g., 0.5 ms or less for each of downlink (DL) and uplink (UL), or 1 ms or less round trip) for realizing URLLC.
- the antenna module 197 may transmit signals or power to or receive signals or power from the outside (e.g., an external electronic device).
- the antenna module 197 may include an antenna including a radiator made of a conductor or a conductive pattern formed on a substrate (e.g., a PCB).
- the antenna module 197 may include a plurality of antennas (e.g., an array antenna). In this case, at least one antenna suitable for a communication method used in a communication network such as the first network 198 or the second network 199 may be selected from the plurality of antennas by, for example, the communication module 190. Signals or power may be transmitted or received between the communication module 190 and an external electronic device through the at least one selected antenna.
- other components (e.g., a radio frequency integrated circuit (RFIC)) may be additionally formed as part of the antenna module 197.
- a mmWave antenna module may include: a printed circuit board; an RFIC disposed on or adjacent to a first side (e.g., the bottom side) of the printed circuit board and capable of supporting a designated high frequency band (e.g., the mmWave band); and a plurality of antennas (e.g., array antennas) disposed on or adjacent to a second side (e.g., the top or a lateral side) of the printed circuit board and capable of transmitting or receiving signals in the designated high frequency band.
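- at least some of the components may be connected to each other through a communication method between peripheral devices (e.g., bus, general purpose input and output (GPIO), serial peripheral interface (SPI), or mobile industry processor interface (MIPI)) and may exchange signals (e.g., commands or data) with each other.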
- commands or data may be transmitted or received between the electronic device 101 and the external electronic device 104 through the server 108 connected to the second network 199.
- Each of the external electronic devices 102 or 104 may be of the same or different type as the electronic device 101.
- all or part of the operations performed in the electronic device 101 may be executed in one or more of the external electronic devices 102, 104, or 108.
- when the electronic device 101 needs to perform a function or service, it may, instead of executing the function or service on its own, request one or more external electronic devices to perform at least part of the function or service.
- One or more external electronic devices that have received the request may execute at least part of the requested function or service, or an additional function or service related to the request, and transmit the result of the execution to the electronic device 101.
- the electronic device 101 may provide the result, as is or after additional processing, as at least part of a response to the request.
- cloud computing, distributed computing, mobile edge computing (MEC), or client-server computing technology can be used.
- the electronic device 101 may provide an ultra-low latency service using, for example, distributed computing or mobile edge computing.
- the external electronic device 104 may include an Internet of Things (IoT) device.
- Server 108 may be an intelligent server using machine learning and/or neural networks.
- the external electronic device 104 or server 108 may be included in the second network 199.
- the electronic device 101 may be applied to intelligent services (e.g., smart home, smart city, smart car, or healthcare) based on 5G communication technology and IoT-related technology.
- FIG. 2 illustrates an operation in which the electronic device 101 executes one or more functions based on an utterance 220, according to an embodiment.
- the electronic device 101 of FIG. 2 may include the electronic device 101 of FIG. 1 .
- the electronic device 101 may be a terminal owned by a user.
- Terminals may include, for example, personal computers (PCs) such as laptops and desktops, smartphones, smartpads, and tablet PCs.
- the terminal may include smart accessories such as a smartwatch and/or a head-mounted device (HMD).
- the electronic device 101 may interact with the user using sound.
- the electronic device 101 may use a microphone (e.g., the input module 150 in FIG. 1) to acquire an audio signal, which is an electrical signal dependent on the sound of an external space including the electronic device 101.
- the electronic device 101 may identify the user's utterance 220 from the audio signal.
- electronic device 101 may execute one or more functions related to utterance 220 .
- the electronic device 101 may output information including results of executing the one or more functions to the user.
- the electronic device 101 may generate an audio signal including an utterance 250.
- the electronic device 101 may control a speaker (e.g., the sound output module 155 of FIG. 1) based on the audio signal to reproduce a sound dependent on the audio signal.
- the structure of the electronic device 101 including a microphone and/or speaker is described with reference to FIG. 3 .
- the electronic device 101 may identify an input indicating execution of at least one function related to the electronic device 101 from the utterance 220.
- a voice command may include a command (e.g., a command in natural language) input to the electronic device 101 based on a sound, such as the utterance 220.
- the voice command may be a unit of a function and/or a task.
- Executing a voice command may include an operation of the electronic device 101 executing at least one function corresponding to the voice command.
- the electronic device 101 may identify one or more voice commands included in the utterance 220 based on the natural language included in the utterance 220.
- the external electronic device 210 connected to the electronic device 101 may include a server (e.g., server 108 in FIG. 1) connected to the electronic device 101 to process audio signals including the utterance 220.
- the electronic device 101 identifies a plurality of voice commands (e.g., a first voice command 230 and/or a second voice command 240) from an audio signal identified through a microphone.
- the electronic device 101 may identify an utterance 220 for sequentially executing a plurality of functions from the audio signal.
- the utterance 220 may include designated natural language (e.g., words such as “Hey, Bixby”) for triggering recognition of the utterance 220 by the electronic device 101.
- the electronic device 101 that has identified the specified natural language may recognize the utterance 220 and identify one or more voice commands included in the utterance 220.
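The trigger step described above can be illustrated with a minimal Python sketch. It assumes the utterance has already been converted to a text transcript (the device described here works on audio signals), and the names `WAKE_PATTERN` and `strip_wake_phrase` are hypothetical, introduced only for illustration. Only the phrase "Hey, Bixby" comes from the text.

```python
import re
from typing import Optional

# Assumption: the designated natural language is matched at the start of a
# text transcript, ignoring case and an optional comma.
WAKE_PATTERN = re.compile(r"^\s*hey,?\s+bixby\b[\s,.!]*", re.IGNORECASE)


def strip_wake_phrase(transcript: str) -> Optional[str]:
    """Return the text that follows the wake phrase, or None when the
    transcript does not start with it (recognition is not triggered)."""
    match = WAKE_PATTERN.match(transcript)
    if match is None:
        return None
    return transcript[match.end():].strip()
```

In this sketch, a return value of `None` means the utterance is ignored, while any other value is the text from which voice commands would be identified.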
- the utterance 220 recognized by the electronic device 101 may include natural language (eg, words, phrases, and/or sentences) expressing each of the plurality of functions.
- the utterance 220 recognized by the electronic device 101 may include a name assigned to the group of the plurality of functions.
- the name assigned to the group may be referred to as a quick command and/or shortcut.
- the electronic device 101 can execute a plurality of functions included in the group distinguished by the name.
- a group of voice commands for executing a plurality of functions may be registered with the electronic device 101 based on the interaction between the electronic device 101 and the user.
- An example of a user interface (UI) displayed by the electronic device 101 to identify an input for registering the group is described with reference to FIG. 8A.
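The relationship between a name (quick command) and its group of voice commands can be sketched as a simple lookup table. This is an illustrative Python sketch only: the class `QuickCommandRegistry`, its methods, and the sample commands are assumptions, not the structure used by the electronic device 101.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class QuickCommandRegistry:
    """Maps a registered name to an ordered group of voice commands."""

    groups: dict[str, list[str]] = field(default_factory=dict)

    def register(self, name: str, commands: list[str]) -> None:
        # Normalize the name so that lookup is case-insensitive.
        self.groups[name.strip().lower()] = list(commands)

    def expand(self, utterance_text: str) -> list[str]:
        """Return the group's commands when the text is a registered name;
        otherwise treat the text itself as a single voice command."""
        key = utterance_text.strip().lower()
        return list(self.groups.get(key, [utterance_text]))


registry = QuickCommandRegistry()
registry.register("Good morning", ["tell me the weather", "read my schedule"])
```

Keeping the commands in a list preserves the order in which the functions of the group are meant to be executed.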
- based on identifying the utterance 220 for sequential execution of the first voice command 230 and the second voice command 240, the electronic device 101 can execute functions corresponding to the first voice command 230 and the second voice command 240.
- the above functions may be executed based on applications executed by the electronic device 101 and/or the external electronic device 210.
- the application executed by the electronic device 101 and/or the external electronic device 210 may be referred to as a voice engine and/or a natural language unit (NLU).
- An electronic device executing a voice engine (eg, the electronic device 101 and/or the external electronic device 210) may be controlled to perform an operation matched to a voice command.
- the duration and/or speed at which the operation matched to the voice command is performed by execution of the voice engine may depend on the performance and/or state of resources of the electronic device on which the voice engine is running.
- the functions corresponding to the first voice command 230 and the second voice command 240 may be executed by at least one of the voice engines.
- the electronic device 101 that identifies the utterance 220 for sequential execution of the first voice command 230 and the second voice command 240 can match each of the first function corresponding to the first voice command 230 and the second function corresponding to the second voice command 240 to one of a first voice engine executed by the electronic device 101 and a second voice engine executed by the external electronic device 210.
- by matching the first voice engine and the second voice engine with the first voice command 230 and the second voice command 240, the electronic device 101 can control or schedule the execution of the first function and the second function.
- both the first function and the second function may be centrally executed by either the first voice engine or the second voice engine.
- each of the first function and the second function may be distributed and executed by the first voice engine and the second voice engine.
- the electronic device 101 may select a voice engine to be used for execution of at least one voice command included in the utterance 220, based on the states of voice engines executed by different electronic devices including the electronic device 101. For example, the electronic device 101 may monitor at least one of the state of the first voice engine executed by the electronic device 101 or the state of the second voice engine executed by the external electronic device 210.
- the state of a voice engine may include whether the function corresponding to the voice command can be executed based on the execution of the voice engine, the time required to execute the function, and/or the latency before the function is executed.
- the electronic device 101 may select, from the first voice engine and the second voice engine, a voice engine to be used for executing each of the first voice command 230 and the second voice command 240 included in the utterance 220. Within the example case of FIG. 2, the electronic device 101 may match the first voice engine with the first voice command 230 and the second voice engine with the second voice command 240.
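One way to picture state-based engine selection is the Python sketch below. It models the three state items named in the text (whether the function can be executed, the time required, and the latency) and picks the engine with the smallest total expected delay. The selection rule, the class `EngineState`, the function `select_engine`, and all numbers are assumptions made for illustration; the text does not specify how the states are weighed.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class EngineState:
    name: str
    supported_commands: frozenset  # commands the engine can execute
    execution_time_s: float        # estimated time required to execute
    latency_s: float               # estimated wait before execution starts


def select_engine(command: str,
                  engines: Sequence[EngineState]) -> Optional[EngineState]:
    """Pick the engine that can execute the command with the lowest
    latency plus execution time; return None if no engine supports it."""
    candidates = [e for e in engines if command in e.supported_commands]
    if not candidates:
        return None
    return min(candidates, key=lambda e: e.latency_s + e.execution_time_s)


# Hypothetical states: the on-device engine has no network latency but
# supports fewer commands than the server-side engine.
first_engine = EngineState(
    "first (on-device)", frozenset({"set volume"}), 0.2, 0.0)
second_engine = EngineState(
    "second (server)", frozenset({"set volume", "check weather"}), 0.1, 0.5)
```

With these sample states, a command supported by both engines goes to the on-device engine, and a command supported only by the server-side engine goes to the server.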
- the electronic device 101 may execute a first function corresponding to the first voice command 230.
- the electronic device 101 may obtain first information 235 including the result of executing the first function based on execution of the first voice engine.
- the electronic device 101 may request the external electronic device 210, which executes the second voice engine, to execute a second function corresponding to the second voice command 240.
- the electronic device 101 may receive second information 245 related to the second function from the external electronic device 210 as a response to the request.
- the second information 245 may include a result of executing the second function based on the external electronic device 210 and/or the second voice engine.
- the second information 245 may include information for controlling the electronic device 101 based on execution of the second function.
- based on obtaining the first information 235 and/or the second information 245, the electronic device 101 may generate an audio signal including an utterance 250 expressing the results of executing the first function matched to the first voice command 230 and the second function matched to the second voice command 240.
- the electronic device 101 may obtain the utterance 250, which expresses, in one or more natural language sentences, the first information 235 including the result of executing the first function and the second information 245 including the result of executing the second function.
- the electronic device 101 may generate the audio signal including the utterance 250.
- the electronic device 101 may control a speaker based on the generated audio signal and output a sound dependent on the audio signal through the speaker.
- the form in which the electronic device 101 outputs the utterance 250 is not limited to the audio signal and/or sound shown in FIG. 2.
- An embodiment in which the electronic device 101 uses hardware such as the display 260 to visualize the result of executing at least one voice command for the utterance 220 is described with reference to FIG. 8B.
- based on identifying, from the audio signal output from the microphone, the utterance 220 for sequentially executing the plurality of functions, the electronic device 101 may schedule the voice engines that execute the plurality of functions.
- based on the state of at least one of the voice engines, the electronic device 101 may divide the plurality of functions into at least one first function to be executed by the first voice engine and at least one second function to be executed by the second voice engine.
- the electronic device 101 may execute at least one first function among the plurality of functions and, using the external electronic device 210, execute at least one second function among the plurality of functions that is different from the at least one first function.
- the first information 235 of FIG. 2 may be an example of a result of executing the at least one first function based on the first voice engine executed by the electronic device 101.
- the second information 245 of FIG. 2 may be an example of a result of executing the at least one second function based on the second voice engine executed by the external electronic device 210.
- the electronic device 101 may output second audio signals expressing results of executing the at least one first function and the at least one second function through a speaker.
- the order in which the electronic device 101 outputs the second audio signals may match the order of the plurality of functions related to the utterance 220. Because the plurality of functions corresponding to the utterance 220 are executed by distributed processing based on voice engines executed by different electronic devices, the electronic device 101 can obtain the results of executing the plurality of functions more quickly.
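The combination of distributed execution and ordered output can be sketched in Python with a thread pool. This is an illustration under stated assumptions: `run_on_device` and `run_on_server` are hypothetical stand-ins for the first and second voice engines, and the sleep merely simulates a network round trip.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_on_device(command: str) -> str:
    # Stand-in for the first voice engine (hypothetical).
    return f"on-device result for '{command}'"


def run_on_server(command: str) -> str:
    # Stand-in for the second voice engine; the sleep simulates the
    # round trip to the external electronic device (hypothetical).
    time.sleep(0.05)
    return f"server result for '{command}'"


def execute_in_order(assignments):
    """Run every (command, engine) pair concurrently and return the
    results in the order in which the commands were given."""
    with ThreadPoolExecutor(max_workers=len(assignments) or 1) as pool:
        futures = [pool.submit(engine, command)
                   for command, engine in assignments]
        # Reading the futures in submission order (rather than in
        # completion order) preserves the order of the utterance.
        return [future.result() for future in futures]
```

Even though the on-device call finishes first in this sketch, its result is reported second when it was second in the utterance.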
- FIG. 3 is a block diagram of an electronic device 101 and an external electronic device 210, according to an embodiment.
- the electronic device 101 of FIG. 3 may include the electronic device 101 of FIGS. 1 and 2 .
- the external electronic device 210 of FIG. 3 may include the external electronic device 210 of FIG. 2 .
- the electronic device 101 may include at least one of a processor 120, a memory 130, a display 260, a speaker 310, a microphone 320, or a communication circuit 330.
- the processor 120, memory 130, display 260, speaker 310, microphone 320, and communication circuit 330 may be electrically and/or operably coupled to each other by an electronic component such as a communication bus 305.
- hardware being operably coupled may mean that a direct or indirect connection between the hardware is established, by wire or wirelessly, such that second hardware is controlled by first hardware among the hardware. Although shown based on different blocks, the embodiment is not limited thereto, and some of the hardware in FIG.
- SoC SoC
- the electronic device 101 may include only some of the hardware shown in FIG. 3 .
- the processor 120 of the electronic device 101 may include hardware components for processing data based on one or more instructions.
- Hardware components for processing data include, for example, an arithmetic and logic unit (ALU), a floating point unit (FPU), a field programmable gate array (FPGA), a central processing unit (CPU), and/or an application processor (AP).
- the processor 120 may include a plurality of processors.
- the processor 120 may have the structure of a multi-core processor such as dual core, quad core, or hexa core.
- the processor 120 of FIG. 3 may include the processor 120 of FIG. 1 .
- the memory 130 of the electronic device 101 may include hardware components for storing data and/or instructions input and/or output to the processor 120 .
- Memory 130 may include, for example, volatile memory such as random-access memory (RAM) and/or non-volatile memory such as read-only memory (ROM).
- Volatile memory may include, for example, at least one of dynamic RAM (DRAM), static RAM (SRAM), cache RAM, and pseudo SRAM (PSRAM).
- Non-volatile memory includes, for example, programmable ROM (PROM), erasable PROM (EPROM), electrically erasable PROM (EEPROM), flash memory, hard disk, compact disk, solid state drive (SSD), and embedded multi media card (eMMC).
- the display 260 of the electronic device 101 may output visualized information (eg, at least one of the screens of FIGS. 8A and 8B) to the user.
- the display 260 may be controlled by a controller such as a graphic processing unit (GPU) and/or the processor 120 to output visualized information to the user.
- the display 260 may include a flat panel display (FPD) and/or electronic paper.
- the FPD may include a liquid crystal display (LCD), a plasma display panel (PDP), and/or one or more light emitting diodes (LED).
- the LED may include an organic LED (OLED).
- the display 260 of FIG. 3 may include the display module 160 of FIG. 1 .
- the display 260 of the electronic device 101 may include a sensor (e.g., touch sensor panel (TSP)) for detecting an external object (e.g., a user's finger) on the display 260.
- the electronic device 101 may detect an external object that is in contact with the display 260 or hovering over the display 260.
- the electronic device 101 may execute a function associated with a specific visual object, among the visual objects being displayed within the display 260, that corresponds to the location of the external object on the display 260.
- the electronic device 101 may include a speaker 310 as an output means for outputting information in a form other than a visualized form.
- the speaker 310 may include a circuit element that is vibrated by an audio signal received from the processor 120 (e.g., an audio signal including the utterance 250 of FIG. 2).
- the number of speakers 310 included in the electronic device 101 is not limited to the example shown in FIG. 3, and the electronic device 101 may include one or more speakers.
- the electronic device 101 may include other output means for outputting information in forms other than visual and auditory forms.
- the electronic device 101 may include a motor to provide haptic feedback based on vibration.
- the microphone 320 of the electronic device 101 may output an electrical signal indicating vibration in the atmosphere.
- the electronic device 101 may identify the user's utterance (e.g., utterance 220 in FIG. 2) from an audio signal, which is an electrical signal output from the microphone 320.
- the user's utterance included in the audio signal may be converted into information in a format recognizable by the electronic device 101, based on a speech recognition model and/or a natural language understanding model, which is an application and/or process executed by the processor 120.
- the electronic device 101 may recognize a user's utterance and execute one or more functions among a plurality of functions that can be provided by the electronic device 101.
- the speaker 310 and/or microphone 320 of FIG. 3 may include the sound output module 155 and/or audio module 170 of FIG. 1 .
- the communication circuit 330 of the electronic device 101 may include hardware to support transmission and/or reception of electrical signals between the electronic device 101 and the external electronic device 210.
- the number of external electronic devices 210 connected to the electronic device 101 through the communication circuit 330 is not limited to the embodiment of FIG. 2 and/or FIG. 3 .
- the communication circuit 330 may include, for example, at least one of a modem (MODEM), an antenna, and an optical/electronic (O/E) converter.
- the communication circuit 330 may support transmission and/or reception of electrical signals based on various types of protocols, such as Ethernet, local area network (LAN), wide area network (WAN), wireless fidelity (WiFi), Bluetooth, Bluetooth low energy (BLE), ZigBee, long term evolution (LTE), and 5G new radio (NR).
- the communication circuit 330 of FIG. 3 may include the communication module 190, subscriber identification module 196, and/or antenna module 197 of FIG. 1.
- one or more instructions indicating computations and/or operations to be performed by the processor 120 on data may be stored in the memory 130.
- the set of one or more instructions may be referred to as firmware, an operating system (e.g., operating system 142 in FIG. 1), a program (e.g., program 140 in FIG. 1), a process, a routine, a sub-routine, and/or an application (e.g., application 146 in FIG. 1).
- the electronic device 101 and/or the processor 120 may perform at least one of the operations of FIGS. 10 to 12 when a set of a plurality of instructions distributed in the form of an operating system, firmware, driver, and/or application is executed.
- This may mean stored in a format executable by the processor 120 (eg, a file with an extension specified by the operating system of the electronic device 101).
- the external electronic device 210 may be connected to the electronic device 101 by wire or wirelessly to perform functions related to voice recognition.
- the external electronic device 210 may include a server.
- a server may include one or more PCs and/or workstations.
- the server may provide a service (for example, a voice recognition service) that executes one or more functions corresponding to an utterance identified from an audio signal output from the microphone 320 of the electronic device 101.
- the external electronic device 210 may include at least one of a processor 120, a memory 130, and a communication circuit 330. Within external electronic device 210 , processor 120 , memory 130 , and communication circuitry 330 may be electrically and/or operationally coupled via communication bus 305 .
- the processor 120, memory 130, and communication circuit 330 included in the external electronic device 210 may include hardware components and/or circuits corresponding to the processor 120, memory 130, and communication circuit 330 of the electronic device 101.
- descriptions of the processor 120, memory 130, and communication circuit 330 included in the external electronic device 210 may be omitted to the extent that they overlap with those of the processor 120, memory 130, and communication circuit 330 of the electronic device 101.
- an input receiver 340, a quick command processor 350, a speech processor 360, and/or a first voice engine 370 are shown as programs executed by the processor 120 of the electronic device 101.
- a set of a plurality of instructions stored in memory 130 and/or processes executed by processor 120 may be divided into the input receiver 340, quick command processor 350, speech processor 360, and/or first voice engine 370.
- the input receiver 340, quick command processor 350, speech processor 360, and/or first voice engine 370 may be included in an application installed in the electronic device 101.
- the application may be executed by the processor 120 of the electronic device 101 to control the electronic device 101 based on utterances containing natural language sentences.
- a second voice engine 380 is shown as a program executed by the processor 120 of the external electronic device 210.
- the second voice engine 380 may include a set of a plurality of applications (eg, applications) stored in the memory 130.
- the second voice engine 380 can be executed by the processor 120 of the external electronic device 210 to execute a voice command requested from another electronic device (e.g., the electronic device 101) connected to the external electronic device 210.
- the electronic device 101 can establish a communication link with the external electronic device 210 using the communication circuit 330 in order to execute at least one voice command identified by the electronic device 101 using the second voice engine 380 executed by the external electronic device 210. For example, the electronic device 101 can establish the communication link based on identifying, from an audio signal output from the microphone 320, a designated natural language for guiding utterance of a natural language sentence corresponding to at least one voice command.
- the electronic device 101 may identify an utterance from the audio signal output from the microphone 320 based on the execution of the input receiver 340. With the input receiver 340 running, the electronic device 101 may perform an operation to identify the utterance included in the audio signal based on receiving the audio signal through the microphone 320.
- the audio signal processed by the input receiver 340 may be transmitted to the second voice engine 380 of the external electronic device 210 by the processor 120 of the electronic device 101.
- the input receiver 340 can identify text corresponding to the user's utterance not only from an audio signal output from the microphone 320, but also from one or more characters input through a software keyboard displayed through the display 260 and/or a hardware keyboard connected to the electronic device 101.
- the electronic device 101 may obtain text representing the utterance included in the audio signal from the external electronic device 210 executing the second voice engine 380. Based on the execution of the quick command processor 350, the electronic device 101 can identify whether the text obtained from the external electronic device 210 includes a name (e.g., quick command) assigned to a group of one or more voice commands. The electronic device 101 may obtain the name and a group of one or more voice commands corresponding to the name from the user based on execution of the quick command processor 350. Based on the execution of the quick command processor 350, the electronic device 101 may determine whether the user's utterance includes a name assigned to the group or process a user input for registering the group. An exemplary structure of the quick command processor 350 is described with reference to FIG. 6.
- when an utterance included in an audio signal acquired through the microphone 320 includes a name assigned to a group of one or more voice commands, the electronic device 101 may, based on the execution of the speech processor 360, control the execution of the one or more voice commands included in the group.
- the electronic device 101 may select a voice engine through which the one or more voice commands will be input, from the first voice engine 370 or the second voice engine 380, based on the execution of the speech processor 360. For example, the electronic device 101 may schedule execution of one or more functions corresponding to the one or more voice commands based on the execution of the speech processor 360.
- based on identifying a plurality of voice commands from the utterance, the electronic device 101 can obtain the states of one or more voice engines (e.g., the first voice engine 370 and/or the second voice engine 380) running to process the voice commands. Based on the states, the electronic device 101 may schedule execution of the one or more functions. Based on the execution of the speech processor 360, the electronic device 101 may input one or more voice commands in the group to the first voice engine 370, or transmit them to the external electronic device 210 through the communication circuit 330 of the electronic device 101.
- based on the execution of the speech processor 360, the electronic device 101 may obtain information (e.g., the first information 235 and/or the second information 245 of FIG. 2) including results of executing one or more functions corresponding to the one or more voice commands from the first voice engine 370 and/or the second voice engine 380.
- the electronic device 101 may output the information obtained using the speech processor 360 through the speaker 310 and/or the display 260.
- the electronic device 101 may transmit an audio signal representing the information to the speaker 310.
- the electronic device 101 may control the display 260 to display a visual object and/or a screen including the information.
- An example structure of the speech processor 360 executed by the electronic device 101 for scheduling voice commands and/or obtaining results of executing functions corresponding to voice commands is described with reference to FIGS. 4 to 7.
- the electronic device 101 may identify at least one voice command matched to the first voice engine 370 by the speech processor 360, based on the execution of the first voice engine 370.
- the electronic device 101 may execute at least one function corresponding to the at least one voice command.
- the at least one function includes a function for controlling hardware included in the electronic device 101 (e.g., a function to adjust the volume of the speaker 310) and a function supported by an application installed in the memory 130 (e.g. , alarm, music playback), and/or a function for retrieving information from the network through the communication circuit 330 (e.g., weather retrieval).
- the electronic device 101 may store the result of executing the at least one voice command using the speech processor 360 or output it to the user.
- the processor 120 of the external electronic device 210 may execute at least one function corresponding to at least one voice command requested from the electronic device 101 based on the execution of the second voice engine 380. Based on the speech processor 360, the electronic device 101 may request the external electronic device 210 to execute at least one voice command. While the second voice engine 380 is running, the processor 120 of the external electronic device 210 may execute at least one function corresponding to the at least one voice command included in the request. The processor 120 of the external electronic device 210 may transmit the result of executing the at least one function to the electronic device 101 using the communication circuit 330. The results transmitted to the electronic device 101 may be output to the user by the processor 120 of the electronic device 101 executing the speech processor 360.
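The request and response exchange between the two devices can be sketched as a pair of JSON messages in Python. The message layout, the field names (`voice_commands`, `results`, `executed`), and the functions `build_request` and `handle_request` are all hypothetical; the text does not describe a wire format.

```python
import json


def build_request(commands):
    """Device side: serialize the voice commands to be executed remotely."""
    return json.dumps({"voice_commands": commands})


def handle_request(payload, handlers):
    """Server side: execute each requested command for which a handler
    exists and report the outcome of every command in request order."""
    commands = json.loads(payload)["voice_commands"]
    results = []
    for command in commands:
        handler = handlers.get(command)
        results.append({
            "command": command,
            "executed": handler is not None,
            "result": handler() if handler is not None else None,
        })
    return json.dumps({"results": results})
```

The device side would parse the returned message and hand each result to the component that produces the spoken or displayed output.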
- based on an application such as the speech processor 360, the electronic device 101 may schedule or control execution of at least one function to be executed by a plurality of voice engines (e.g., the first voice engine 370 and the second voice engine 380). Based on the scheduling of the plurality of voice engines, the electronic device 101 can use the plurality of voice engines efficiently while reducing the resources of the electronic device 101 and/or the external electronic device 210 that are occupied for execution of the plurality of voice engines.
- the electronic device 101 may identify attributes of voice commands included in the group while registering a group of voice commands by interacting with a user.
- the attribute can be used for scheduling of the voice engine to execute functions corresponding to voice commands included in the group.
- Figure 4 is a block diagram of a program executed by the electronic device 101, according to one embodiment.
- the electronic device 101 of FIG. 4 may be an example of the electronic device 101 of FIG. 3 .
- the electronic device 101 of FIG. 4 may include the memory 130, the input receiver 340, the quick command processor 350, the speech processor 360, and the first voice engine 370.
- instructions and/or sub-routines of the first voice engine 370 for executing functions related to the user's utterance are included in the utterance recognizer 450.
- TTS text-to-speech
- natural language processor 460
- dependency manager 470
- event handler 475
- Instructions and/or sub-routines of the natural language processor 460 may be divided into a domain classifier 462 and/or a goal classifier 464.
- instructions and/or sub-routines of the speech processor 360 for identifying one or more voice commands from a user's utterance may be divided into an utterance type classifier 410, a voice command interpreter 420, and/or a voice command controller 440.
- the electronic device 101 can manage the voice command database 430 in the memory 130.
- An example of information stored in the voice command database 430 is described with reference to FIG. 5 .
- Instructions and/or sub-routines within the voice command interpreter 420 may be divided into a metadata analyzer 422 and/or a voice command generator 424.
- Instructions and/or sub-routines within the voice command controller 440 may be divided into a voice command manager 442, a voice engine scheduler 446, and an utterance storage 448.
- the electronic device 101 may control the execution of one or more voice commands that match the user's intention included in the speech.
- The electronic device 101 may obtain properties for each of the plurality of voice commands included in the group based on the execution of the utterance type classifier 410.
- the electronic device 101 may generate metadata of a voice command based on the above properties.
- the metadata may be stored in the memory 130 of the electronic device 101. Table 1 may include an example of properties included in the metadata of a voice command.
- the names in Table 1 may be example names assigned to attributes.
- An attribute with the name Prompt can indicate whether additional parameters are required to execute the function of the voice command, based on a boolean data type.
- An attribute with the name Time may indicate whether the result of executing the above function varies depending on time.
- An attribute with the Location name may indicate whether the result of executing the function varies depending on the location of the electronic device 101.
- An attribute with the name Device context may indicate whether the result of executing the function varies depending on the state of the electronic device 101, based on a Boolean data type.
- An attribute with the name onDevice supported may indicate whether the function is executable by the first voice engine 370, based on a Boolean data type.
- Attributes included in metadata e.g., attributes with Prompt, Time, Location, Device context, onDevice supported names
- dependency manager 470 e.g., dependency map
- an attribute with the name Event list may include a list of one or more functions corresponding to a voice command.
- the names of the one or more functions may be stored.
- Storing the text 'message' in an attribute with the name Event list may indicate that the voice command corresponding to the metadata requests execution of a function to display a message.
- storing the text 'TTS_Response' in an attribute with the Event list name may indicate that a voice command corresponding to metadata requests execution of the TTS function.
- the fact that the text 'appLaunch' is stored in an attribute with the Event list name may indicate that a voice command corresponding to metadata requests execution of at least one application installed on the electronic device 101.
- The electronic device 101 may obtain metadata for a voice command based on the first voice engine 370 running together with the utterance type classifier 410. Based on the execution of the natural language processor 460 in the first voice engine 370, the electronic device 101 may perform preprocessing on the natural language included in the voice command. For example, the electronic device 101 may identify whether the voice command can be processed by the first voice engine 370 based on the execution of the domain classifier 462. Whether the voice command can be processed by the first voice engine 370 may be stored in the attribute with the name onDevice supported in Table 1. The electronic device 101 may identify the target of the function (or action) to be executed by the voice command based on the execution of the goal classifier 464.
- the electronic device 101 may perform post-processing on the preprocessed voice command. Based on the execution of the dependency manager 470, the electronic device 101 can identify device dependencies of resources used for processing voice commands. The device dependency may include, for example, whether a function corresponding to the voice command can be processed by the first voice engine 370 and/or the electronic device 101. Based on execution of the event handler 475, the electronic device 101 may obtain a list of functions corresponding to the voice command. The list acquired by the electronic device 101 may be stored in an attribute with the name Event list within metadata.
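The attributes described for Table 1 can be pictured as a small record per voice command. The following is a minimal sketch, not the implementation described in the application: the class name, the field names, the default values, and the helper `requires_external_engine` are all hypothetical, and the `weather` example with a `TTS_Response` event is an assumption chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class VoiceCommandMetadata:
    """Illustrative container for the Table 1 attributes (names are assumptions)."""
    prompt: bool = False               # Prompt: additional parameters are required
    time: bool = False                 # Time: result varies depending on time
    location: bool = False             # Location: result varies with device location
    device_context: bool = False       # Device context: result varies with device state
    on_device_supported: bool = True   # onDevice supported: first voice engine can execute it
    event_list: List[str] = field(default_factory=list)  # Event list: function names


def requires_external_engine(meta: VoiceCommandMetadata) -> bool:
    """A command the on-device engine cannot execute needs another voice engine."""
    return not meta.on_device_supported


# Hypothetical example: a weather query that is answered through TTS.
weather = VoiceCommandMetadata(on_device_supported=False, event_list=["TTS_Response"])
```

Keeping the attributes as plain Booleans mirrors the Boolean data type the description mentions, and lets a scheduler test each one directly.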
- the electronic device 101 may obtain metadata corresponding to one or more voice commands grouped by a quick command.
- the acquired metadata may be stored in the memory 130 of the electronic device 101.
- Registering a quick command by the electronic device 101 may include storing the metadata in the memory 130.
- Registering a quick command by the electronic device 101 may include storing the one or more voice commands in the voice command database 430.
- the electronic device 101 may process the metadata obtained based on execution of the utterance type classifier 410 based on the voice command interpreter 420. Based on the execution of the voice command interpreter 420, the electronic device 101 can identify information to be stored in the voice command database 430 from the metadata. Based on the execution of the metadata analyzer 422, the electronic device 101 may parse metadata stored in the memory 130 based on the utterance type classifier 410. Based on the execution of the voice command generator 424, the electronic device 101 may store the information parsed by the metadata analyzer 422 in the voice command database 430. An example of a schema of the voice command database 430 is described with reference to FIG. 5 .
- The electronic device 101 may obtain metadata corresponding to one or more voice commands matching the quick command based on an input indicating registration of the quick command.
- The electronic device 101 may store information corresponding to the one or more voice commands in the voice command database 430.
- The electronic device 101 may use the stored metadata to identify the one or more voice commands to be executed and the properties of the one or more voice commands.
- The electronic device 101 may quickly identify the one or more voice commands and the properties. Operations performed by the electronic device 101 based on identifying the utterance including the quick command are described with reference to FIG. 6.
- FIG. 5 is a block diagram of a voice command database 430 included in an electronic device, according to one embodiment.
- the electronic device in FIG. 5 may be an example of the electronic device 101 in FIGS. 3 and 4 .
- the voice command database 430 of FIG. 4 may include the voice command database 430 of FIG. 5 .
- the voice command database 430 of FIG. 5 may be managed based on the execution of the speech processor 360 of FIGS. 3 and 4.
- The electronic device may store the information included in the voice command database 430 separately in a first table 510 for storing parameters corresponding to voice commands and a second table 520 for storing parameters related to one or more events to be executed by the voice command.
- the electronic device may update the voice command database 430 based on one or more voice commands while registering a quick command including one or more voice commands.
- the electronic device may store parameters corresponding to voice commands.
- Table 2 may include an example of the parameters stored in the first table 510 of the voice command database 430.
- a parameter with an ID name may include an identifier (e.g., key value) that is uniquely assigned to records in the first table 510 of the voice command database 430.
- Parameters with Command names may include natural language corresponding to voice commands.
- a parameter with the Engine type name may include a value for identifying a voice engine that can execute one or more functions corresponding to a voice command.
- A parameter with the Engine type name may include a value indicating whether the one or more functions can be executed by each of a first voice engine executed by the electronic device (e.g., the first voice engine 370 of FIG. 3) and a second voice engine executed by an external electronic device (e.g., the second voice engine 380).
- a parameter with the name Static event list may indicate whether the function corresponding to the voice command is changed.
- A parameter with the name Event list may indicate how the results of executing one or more functions corresponding to a voice command are output.
- A parameter with the name Event list may include an identifier uniquely assigned to a record of the second table 520.
- Based on the parameter with the name Event list, the electronic device may identify, in the second table 520, the record that stores how the results of executing one or more functions of a voice command matched to a specific record of the first table 510 are output.
- the parameter with the Event list name can be used for combining the first table 510 and the second table 520 (eg, combining tables based on a join operation).
- The electronic device may indicate how the results of executing one or more functions of the voice command are output based on a parameter (or field) having the name Event message.
- the parameter may represent the method based on structured text such as JSON (javascript object notation).
- A parameter with the name Event message may indicate how the results are output based on XML (extensible markup language).
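The two-table layout and the join through the Event list parameter can be sketched with an in-memory SQLite database. This is only an illustration under stated assumptions: the table and column names, the `engine_type` value `"remote"`, the value stored for the static event list, and the shape of the JSON event message are hypothetical and are not taken from FIG. 5.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (               -- stands in for the second table 520
    id INTEGER PRIMARY KEY,
    event_message TEXT              -- structured text such as JSON
);
CREATE TABLE commands (             -- stands in for the first table 510
    id INTEGER PRIMARY KEY,         -- ID: unique key of the record
    command TEXT,                   -- Command: natural language of the voice command
    engine_type TEXT,               -- Engine type: which voice engine can execute it
    static_event_list INTEGER,      -- Static event list (hypothetical encoding)
    event_list INTEGER REFERENCES events(id)  -- Event list: key into the second table
);
""")

# Hypothetical records for one voice command and its output method.
conn.execute(
    "INSERT INTO events VALUES (1, ?)",
    (json.dumps({"type": "TTS_Response"}),),
)
conn.execute(
    "INSERT INTO commands VALUES (1, ?, ?, ?, ?)",
    ("Tell me the weather", "remote", 1, 1),
)

# Combine the two tables with a join on the Event list identifier.
row = conn.execute("""
    SELECT c.command, c.engine_type, e.event_message
    FROM commands AS c JOIN events AS e ON c.event_list = e.id
""").fetchone()
event = json.loads(row[2])
```

Storing the output method in a separate table means several voice commands can share one event record, which is one plausible reason for the split.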
- The electronic device may process an input indicating registration of a quick command for executing a group of a plurality of functions.
- For example, based on a simplified natural language phrase such as “good morning,” the quick command can sequentially execute functions arranged by the user (e.g., a function to play music, a function to cancel an alarm, and a function to output a weather forecast).
- the electronic device may store information used to execute the functions in units of voice commands corresponding to each of the functions in the memory of the electronic device.
- the information stored by the electronic device may include parameters used in scheduling the voice commands (eg, the type of electronic device and/or voice engine on which the voice command can be executed).
- the electronic device may execute one or more functions corresponding to the one or more registered voice commands in response to an utterance including the quick command.
- the electronic device may control scheduling and/or execution of the one or more functions based on execution of the voice command controller 440 of FIG. 4 .
- Hereinafter, with reference to FIGS. 6 and 7, an example of an operation performed by an electronic device based on identifying a quick command from an audio signal output from a microphone is described, according to an embodiment.
- FIG. 6 is a block diagram for explaining a program executed by the electronic device 101, according to an embodiment.
- the electronic device 101 of FIG. 6 may be an example of the electronic device 101 of FIGS. 3 and 4 .
- The electronic device 101, memory 130, input receiver 340, quick command processor 350, speech processor 360, and first voice engine 370 of FIG. 3 may include the electronic device 101, memory 130, input receiver 340, quick command processor 350, speech processor 360, and first voice engine 370 of FIG. 6.
- The instructions and/or sub-routines of the quick command processor 350 for identifying whether the user's utterance includes a quick command may be divided into a comparator 610, an executor 620, and a quick command generator 630.
- The electronic device 101 may manage a quick command database 640 containing information about quick commands in the memory 130. In the state of registering a quick command from a user, the electronic device 101 may store, in the quick command database 640, the natural language corresponding to the quick command, a language type (e.g., English), an identifier uniquely assigned to the quick command, and the time at which the quick command was registered.
- the information stored in the quick command database 640 may be stored in an external electronic device connected to the electronic device 101 (e.g., memory 130 of the external electronic device 210 in FIGS. 2 and 3).
- the electronic device 101 may identify a user's speech from an audio signal output from a microphone (eg, microphone 320 in FIG. 3).
- Identifying the utterance by the electronic device 101 may include an operation of obtaining text corresponding to the audio signal based on the execution of the utterance recognizer 450 in the first voice engine 370 and/or the second voice engine of the external electronic device (e.g., the second voice engine 380 in FIG. 2).
- the electronic device 101 can compare the text and the quick command stored in the quick command database 640 to identify whether a quick command is included in the utterance.
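The comparison between the recognized text and the registered quick commands can be sketched as a lookup. This is a simplified illustration, not the comparator 610 itself: the dictionary standing in for the quick command database 640, the function names, and the choice of an exact match after whitespace and case normalization are all assumptions.

```python
from typing import Dict, Optional

# Hypothetical stand-in for the quick command database 640:
# normalized natural language of the quick command -> unique identifier.
quick_commands: Dict[str, int] = {"good morning": 1}


def normalize(text: str) -> str:
    """Lowercase the text and collapse runs of whitespace."""
    return " ".join(text.lower().split())


def find_quick_command(recognized_text: str) -> Optional[int]:
    """Return the identifier of the matching quick command, or None."""
    return quick_commands.get(normalize(recognized_text))
```

For example, `find_quick_command("  Good   Morning ")` finds the registered entry, while an ordinary voice command such as "Tell me the weather" yields `None` and would be handled as a regular utterance.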
- the electronic device 101 may identify one or more voice commands included in the utterance based on the execution of the utterance processor 360.
- The electronic device 101 may identify metadata corresponding to the one or more voice commands from the memory 130 using the utterance type classifier 410. If metadata corresponding to the one or more voice commands exists in the memory 130, the electronic device 101 may identify properties of the one or more voice commands based on the metadata. Based on the identified properties, the electronic device 101 may select the voice engine to be used for executing the one or more voice commands from among the first voice engine 370 and the second voice engine (e.g., the second voice engine 380 in FIG. 3).
- The electronic device 101 may process the plurality of voice commands using at least one of the first voice engine 370 or the second voice engine (e.g., the second voice engine 380 of FIG. 3). For example, the electronic device 101 may select a voice engine in an idle state among the first voice engine 370 and the second voice engine. In the above example, the electronic device 101 may sequentially execute a plurality of functions corresponding to the plurality of voice commands using the selected voice engine. To select an idle voice engine, the electronic device 101 may establish a communication link with an external electronic device (e.g., the external electronic device 210 of FIGS. 2 and 3) on which the second voice engine runs, or may maintain an established communication link.
- The electronic device 101 may perform an operation related to the quick command based on the executor 620. For example, the electronic device 101 may control execution of the speech processor 360 based on the executor 620. Within the state of identifying the quick command included in the utterance, the electronic device 101 may execute the utterance processor 360 to identify metadata related to the quick command and parameters stored in the voice command database 430. The metadata and the parameters may indicate at least one voice command matching the quick command.
- When the quick command is registered in the electronic device 101 for execution of a plurality of voice commands, the user can execute the functions corresponding to the plurality of voice commands by uttering the relatively short quick command, independently of the sentences corresponding to each of the plurality of voice commands.
- When voice commands frequently executed by a user are grouped into a quick command, the electronic device 101 can execute the plurality of voice commands by recognizing the relatively short quick command instead of recognizing the sentences corresponding to the voice commands.
- Within the state of identifying the metadata related to the quick command and the information in the voice command database 430, the electronic device 101 may obtain a list of at least one voice command corresponding to the quick command based on the execution of the voice command controller 440. The electronic device 101 may schedule a voice engine to process the at least one voice command based on the list. While the voice command controller 440 is running, the electronic device 101 can obtain information about the first voice engine 370 and the second voice engine executed by an external electronic device. Information on the second voice engine may include latency and/or delay generated when the external electronic device executing the second voice engine processes a voice command. Information about the second voice engine may include the status of the second voice engine (e.g., whether it is in an idle state).
- The electronic device 101 may classify the at least one voice command corresponding to the quick command based on whether each of the first voice engine 370 and the second voice engine, as indicated by their states, can execute at least one function corresponding to the voice command. For example, when a plurality of voice commands are identified in the list, the electronic device 101 may classify the plurality of voice commands, based on the states of the first voice engine 370 and the second voice engine, into a first group to be executed by the first voice engine 370 and a second group to be executed by the second voice engine.
- the electronic device 101 may identify a plurality of voice commands corresponding to the quick command based on identifying the quick command from the user's utterance.
- Based on information (e.g., metadata and/or the voice command database 430) stored in the memory 130 while registering the quick command, the electronic device 101 may obtain the information used for scheduling the plurality of voice commands.
- The electronic device 101 may execute the voice command controller 440 based on the information and select the voice engine to process each of the plurality of voice commands from among different voice engines (e.g., the first voice engine 370 and the second voice engine).
- Figure 7 is a block diagram of a program executed by the electronic device 101, according to one embodiment.
- the electronic device 101 of FIG. 7 may be an example of the electronic device 101 of FIGS. 3 to 6 .
- the first voice engine 370 of FIGS. 3 to 6 may include the first voice engine 370 of FIG. 7 .
- the voice command controller 440 and the voice command database 430 of FIG. 4 may include the voice command controller 440 and the voice command database 430 of FIG. 7 .
- the external electronic device 210 of FIG. 3 may include the external electronic device 210 of FIG. 7 .
- Instructions and/or sub-routines of the voice command controller 440 executed by the electronic device 101 may be divided into the voice command manager 442, the voice engine scheduler 446, and/or the utterance storage 448. Based on identifying the quick command registered in the electronic device 101 from the utterance, the electronic device 101 can execute the voice command controller 440.
- the electronic device 101 may obtain information required for execution of the voice command controller 440 from the voice command database 430. From the voice command database 430, the electronic device 101 may obtain information related to at least one voice command corresponding to the quick command.
- Based on the information related to the at least one voice command corresponding to the quick command, the electronic device 101 may determine whether to process the at least one voice command based on a voice engine. For example, when the result of executing at least one function corresponding to a voice command is cached in the memory (e.g., memory 130 of FIG. 3) of the electronic device 101, the electronic device 101 may bypass processing of the voice command based on the voice engine and store the cached result in the utterance storage 448. Whether to store the result in the utterance storage 448 may depend on whether the result, as identified by the electronic device 101, is valid.
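The cache bypass described above can be sketched as a small function. This is a hedged illustration only: the function name `resolve`, the callable parameters `is_valid` and `run_engine`, and the use of a plain list for the utterance storage 448 are assumptions, and the validity check is left to the caller because the description does not specify it.

```python
from typing import Callable, Dict, List


def resolve(
    command: str,
    cache: Dict[str, str],
    is_valid: Callable[[str, str], bool],
    run_engine: Callable[[str], str],
    storage: List[str],
) -> str:
    """Use a valid cached result if one exists; otherwise process with a voice engine."""
    cached = cache.get(command)
    if cached is not None and is_valid(command, cached):
        storage.append(cached)      # bypass the voice engine entirely
        return cached
    result = run_engine(command)    # fall back to voice-engine processing
    storage.append(result)
    return result
```

A time-dependent command (for example, one whose Time attribute is set) would typically be given an `is_valid` check that rejects stale results, so that only still-valid results skip the engine.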
- The electronic device 101 may decide to process the voice command based on the voice engine.
- The electronic device 101 can identify a voice command to be processed by a voice engine among the plurality of voice commands based on execution of the voice command manager 442.
- the electronic device 101 may perform scheduling for one or more voice commands to be processed by the voice engine.
- The scheduling may include an operation of selecting the voice engine to be used to process the one or more voice commands from among the first voice engine 370 executed by the electronic device 101 and the second voice engine 380 executed by the external electronic device 210.
- The electronic device 101 may allocate each of a plurality of voice commands to either the first queue 710 corresponding to the first voice engine 370 or the second queue 720 corresponding to the second voice engine 380.
- the first queue 710 is an area formed in the memory of the electronic device 101 (e.g., the memory 130 in FIG.
- the second queue 720 is another area formed in the memory, and may store information about one or more voice commands to be sequentially transmitted to the external electronic device 210.
- Although the electronic device 101 is shown forming a first queue 710 and a second queue 720 having a first-in, first-out (FIFO) data structure based on the execution of the voice engine scheduler 446, the embodiment is not limited thereto.
- In a state in which the voice engine scheduler 446 is executed, the electronic device 101 may store the voice command in either the first queue 710 or the second queue 720 based on at least one of the states of the first voice engine 370 and/or the second voice engine 380 and the properties of the voice command. For example, based on the property with the onDevice supported name in Table 1, the electronic device 101 may identify whether the voice command should be processed exclusively by a specific voice engine (e.g., the second voice engine 380). In response to identifying a voice command with properties indicating that it should be processed exclusively by the second voice engine 380, the electronic device 101 may place the voice command in the second queue 720 among the first queue 710 and the second queue 720.
- The electronic device 101 may classify voice commands based on whether the first voice engine 370 or the second voice engine 380 is in an idle state. For example, when the second voice engine 380 is in another state different from the idle state, the electronic device 101 may store at least one voice command to be processed by the voice engine in the first queue 710.
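The queue assignment can be sketched with two FIFO queues. The rule below is one possible policy under stated assumptions, not the scheduler 446 itself: the `Command` class, the `assign` function, and in particular the tie-breaking choice (prefer the first queue unless the first engine is busy and the second is idle) are hypothetical.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Iterable, Tuple


@dataclass
class Command:
    text: str
    on_device_supported: bool  # mirrors the onDevice supported attribute


def assign(
    commands: Iterable[Command], first_idle: bool, second_idle: bool
) -> Tuple[Deque[Command], Deque[Command]]:
    """Split commands between the first queue 710 and the second queue 720."""
    first_queue: Deque[Command] = deque()
    second_queue: Deque[Command] = deque()
    for cmd in commands:
        if not cmd.on_device_supported:
            # Must be processed exclusively by the second voice engine.
            second_queue.append(cmd)
        elif second_idle and not first_idle:
            # Assumed policy: offload only when the local engine is busy.
            second_queue.append(cmd)
        else:
            first_queue.append(cmd)
    return first_queue, second_queue
```

With this policy, a busy second voice engine causes every on-device-capable command to land in the first queue, which matches the example in the preceding paragraph.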
- The electronic device 101 can use a cost function to calculate the total duration for which a plurality of voice commands included in the quick command are processed. To minimize the total duration, the electronic device 101 may classify each of the plurality of voice commands into either the first queue 710 or the second queue 720.
- the cost function may depend on the states and/or performances of the first voice engine 370 and the second voice engine 380, respectively.
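The description does not give the cost function, so the following is only one way such a function could look. It assumes that the two queues drain in parallel, that each command has an estimated processing time on each engine, and that every remote command pays a fixed latency; the names `total_duration` and `best_assignment`, the exhaustive search, and all numbers used with them are hypothetical.

```python
import math
from itertools import product
from typing import Sequence, Tuple


def total_duration(
    assignment: Sequence[int],
    local_cost: Sequence[float],
    remote_cost: Sequence[float],
    remote_latency: float,
) -> float:
    """Total time when queue 0 (local) and queue 1 (remote) run in parallel."""
    local = sum(local_cost[i] for i, q in enumerate(assignment) if q == 0)
    remote = sum(
        remote_cost[i] + remote_latency for i, q in enumerate(assignment) if q == 1
    )
    return max(local, remote)


def best_assignment(
    local_cost: Sequence[float],
    remote_cost: Sequence[float],
    remote_latency: float = 0.0,
) -> Tuple[int, ...]:
    """Try every split of the commands and keep the one with the lowest total time."""
    n = len(local_cost)
    return min(
        product((0, 1), repeat=n),
        key=lambda a: total_duration(a, local_cost, remote_cost, remote_latency),
    )


# Hypothetical estimates; math.inf marks a command the local engine cannot run.
local = [1.0, 1.0, math.inf]
remote = [0.5, 0.5, 0.5]
plan = best_assignment(local, remote, remote_latency=0.25)
```

Exhaustive search is exponential in the number of commands, which is acceptable for the handful of commands in a quick command but would need a heuristic for longer lists.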
- Based on the classification of the voice commands, the electronic device 101 may directly or indirectly control the voice engines.
- the electronic device 101 may execute the first voice engine 370 based on at least one voice command accumulated in the first queue 710.
- The electronic device 101 may transmit at least one signal including at least one voice command accumulated in the second queue 720 to the external electronic device 210 executing the second voice engine 380.
- the electronic device 101 can indirectly control the second voice engine 380 executed by the external electronic device 210.
- Based on the execution of the utterance storage 448, the electronic device 101 may store, in the memory of the electronic device 101, the results of executing the voice commands stored in each of the first queue 710 and the second queue 720. Based on the execution of the utterance storage 448, the electronic device 101 may store, in the memory, the result of processing at least one voice command accumulated in the first queue 710 using the first voice engine 370. Based on the execution of the utterance storage 448, the electronic device 101 may store, in the memory, the result of processing at least one voice command accumulated in the second queue 720, the result being included in a signal received from the external electronic device 210.
- The electronic device 101 may store the results of processing voice commands by different voice engines based on the order of processing the plurality of voice commands that was set while registering the quick command. For example, the results accumulated in the memory of the electronic device 101 based on execution of the utterance storage 448 may be sorted based on the order of processing the plurality of voice commands. Based on the execution of the utterance storage 448, the electronic device 101 may form a queue in the memory in which the results of processing the plurality of voice commands are stored.
- From the results of processing the plurality of voice commands corresponding to the quick command, accumulated in the memory of the electronic device 101 based on the utterance storage 448, the electronic device 101 may obtain an audio signal corresponding to each of the results. Based on the order of the plurality of voice commands, the electronic device 101 may sequentially transmit the audio signals to the speaker. Based on the sequential transmission of the audio signals, the electronic device 101 can output sounds representing the results through the speaker.
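Because the two voice engines may finish in any order, the stored results have to be re-sequenced before they are played. A minimal sketch of that idea follows; the class name, its methods, and the decision to release only the contiguous prefix of finished results are assumptions rather than details of the utterance storage 448.

```python
from typing import Dict, List


class UtteranceStorage:
    """Holds results and releases them in the order set at registration."""

    def __init__(self, order: List[str]) -> None:
        self._order = list(order)            # registered order of voice commands
        self._results: Dict[str, str] = {}   # command -> result, in arrival order

    def store(self, command: str, result: str) -> None:
        self._results[command] = result

    def ready_in_order(self) -> List[str]:
        """Return results up to the first command that has not finished yet."""
        ready: List[str] = []
        for command in self._order:
            if command not in self._results:
                break
            ready.append(self._results[command])
        return ready
```

A result from the second voice engine that arrives early is simply held until every command registered before it has also produced a result, so the audio output follows the registered order.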
- the audio signals may include utterances (eg, natural language sentences) expressing the results.
- The electronic device 101 may process a plurality of voice commands corresponding to one quick command separately, using each of the first voice engine 370 and the second voice engine 380. Since the plurality of voice commands are processed separately, the idle time of each of the first voice engine 370 and the second voice engine 380 can be minimized.
- Hereinafter, with reference to FIGS. 8A and 8B, an example of an operation in which the electronic device 101 registers a quick command, and of an operation in which it processes a plurality of voice commands corresponding to the quick command in response to an utterance including the quick command, is described, according to an embodiment.
- FIGS. 8A and 8B illustrate example states 801 and 802 in which the electronic device 101 displays a user interface (UI), according to one embodiment.
- the electronic device 101 of FIGS. 8A and 8B may be an example of the electronic device 101 of FIGS. 3 to 7 .
- the electronic device 101 and the display 260 of FIG. 3 may include the electronic device 101 and the display 260 of FIGS. 8A and 8B.
- Referring to FIGS. 8A and 8B, different states 801 and 802 are shown in which the electronic device 101 displays a UI provided from an application for recognizing speech from an audio signal.
- the application may include one or more instructions for executing the input receiver 340, quick command processor 350, speech processor 360, and/or first voice engine 370 of FIG. 3.
- the electronic device 101 may display a screen for registering a quick command on the display 260.
- the electronic device 101 may receive information required for registration of a quick command from the user through the display 260.
- the electronic device 101 may display a text box 810 on the display 260 to obtain a natural language to be used to identify a quick command.
- the electronic device 101 may display a button 815 for obtaining the natural language based on the utterance.
- the electronic device 101 may obtain the natural language from an audio signal obtained using a microphone.
- the electronic device 101 may display a button 860 on the display 260 for adding one or more voice commands executed by a quick command.
- the electronic device 101 may obtain a voice command through text and/or audio signals.
- The electronic device 101 may display visual objects 820, 830, 840, and 850 corresponding to each of four voice commands in the display 260.
- The order in which the visual objects 820, 830, 840, and 850 are arranged within the display 260 may indicate the order of executing the four voice commands based on the quick command.
- the electronic device 101 may visualize voice commands to be executed by the quick command using visual objects 820 , 830 , 840 , and 850 .
- the visual object 820 may correspond to a voice command for executing an alarm-related function.
- the electronic device 101 may display a natural language sentence (eg, “Set the alarm”) representing the voice command.
- The electronic device 101 may display a button 822 for adjusting the order in which the voice command corresponding to the visual object 820 is executed.
- the electronic device 101 may display a button 824 for excluding the voice command corresponding to the visual object 820 from voice commands to be executed by the quick command.
- the electronic device 101 may display a natural language sentence (e.g., “Read the last text I received”) representing a voice command corresponding to the visual object 830.
- the electronic device 101 may display a natural language sentence (eg, “Register a reminder in 10 minutes”) expressing a voice command corresponding to the visual object 840.
- the electronic device 101 may display a natural language sentence (eg, “Tell me the weather”) expressing a voice command corresponding to the visual object 850.
- The length of the utterance designated for the quick command may be shorter than the natural language sentences displayed in the visual objects 820, 830, 840, and 850.
- the electronic device 101 may display a button 870 for registering a quick command based on information displayed in the display 260.
- the electronic device 101 may display a button 875 for bypassing registration of the quick command.
- the electronic device 101 may register a quick command executed by text entered into the text box 810.
- the electronic device 101 may perform the operations described above with reference to FIGS. 4 and 5 .
- the electronic device 101 may store information about voice commands corresponding to the visual objects 820, 830, 840, and 850 based on the execution of the speech processor 360 of FIGS. 3 and 4.
- the information may include metadata for the voice commands, described above with reference to Table 1, and/or properties for the voice commands, described above with reference to Table 2.
- the electronic device 101 may obtain properties for the voice commands, as shown in Table 3.
- the properties in Table 3 may be stored in metadata for voice commands corresponding to visual objects 820, 830, 840, and 850.
- the first voice command corresponding to the visual object 820 may include a natural language sentence related to an alarm (eg, “Set the alarm”). Since the natural language sentence does not include parameters (eg, time) for controlling the alarm, additional input of the parameters may be required to process the first voice command.
- The electronic device 101 may assign a specified value, indicating that additional input is required for processing the first voice command, to the attribute having the name Prompt in the metadata corresponding to the first voice command.
- the second voice command corresponding to the visual object 830 may include a natural language sentence related to short message service (SMS) (eg, “Read the last text message”).
- the second voice command may depend on the state of the electronic device 101 at the time of processing the second voice command (eg, a state in which at least one text message is stored in the electronic device 101).
- the electronic device 101 may assign, to the attribute named device context in the metadata corresponding to the second voice command, a specified value indicating that the state of the electronic device 101 must be identified in order to process the second voice command.
- the third voice command corresponding to the visual object 840 may include a natural language sentence related to an alarm (e.g., “Register a reminder in 10 minutes”).
- the function executed by the third voice command may be dependent on the timing of processing the third voice command.
- the electronic device 101 may assign a specified value to the attribute named time in the metadata corresponding to the third voice command, thereby indicating that the current time of the electronic device 101 is required to process the third voice command.
- the fourth voice command corresponding to the visual object 850 may include a natural language sentence (eg, “Tell me the weather”) for acquiring information through the network.
- the function executed by the fourth voice command may require communication with an external electronic device different from the electronic device 101 (eg, a third-party server for providing weather information).
- the electronic device 101 may indicate that an external electronic device is required for processing the fourth voice command by assigning a specified value to the attribute named “onDevice supported” in the metadata corresponding to the fourth voice command.
- in this way, the electronic device 101 may obtain the properties of the four voice commands that the user matched to the quick command.
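- The four properties described above can be pictured as flags stored next to each voice command. The following is a minimal sketch under stated assumptions: the class name `CommandMetadata`, the boolean encoding of each "specified value", and the dictionary `QUICK_COMMAND` are hypothetical and are not taken from Table 3.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CommandMetadata:
    """Hypothetical container for the per-command properties described above."""
    sentence: str
    prompt: bool = False              # additional user input (e.g., a time) is required
    device_context: bool = False      # depends on the state of the device
    time: bool = False                # depends on the current time of the device
    on_device_supported: bool = True  # False: an external electronic device is required


# Hypothetical registration of the quick command "command" with four voice commands.
QUICK_COMMAND = {
    "command": [
        CommandMetadata("Set the alarm", prompt=True),
        CommandMetadata("Read the last text message", device_context=True),
        CommandMetadata("Register a reminder in 10 minutes", time=True),
        CommandMetadata("Tell me the weather", on_device_supported=False),
    ]
}


def needs_external_device(meta: CommandMetadata) -> bool:
    # A command that is not supported on-device has to go to an external engine.
    return not meta.on_device_supported


external = [m.sentence for m in QUICK_COMMAND["command"] if needs_external_device(m)]
print(external)
```

- In this sketch only the weather command is flagged as requiring an external electronic device; the other three carry flags that a scheduler could inspect before choosing a voice engine.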
- in state 801 of FIG. 8A, based on identifying the utterance “command” from the audio signal output from the microphone, the electronic device 101 may identify the quick command registered to execute the four voice commands.
- referring to FIG. 8B, a state 802 following the state 801 of FIG. 8A is shown, in which the electronic device 101 has identified the utterance (“command”) for executing the quick command.
- the electronic device 101 may obtain metadata for four voice commands corresponding to the quick command, described above with reference to Table 3. Based on the metadata, the electronic device 101 can match each of the voice commands with one of different voice engines.
- the voice engines may include a first voice engine executed by the electronic device 101 (e.g., the first voice engine 370 in FIG. 3), and a second voice engine executed by an external electronic device different from the electronic device 101 (e.g., the second voice engine 380 of FIG. 3).
- the electronic device 101 may allocate the voice commands to the voice engines so as to minimize the total period during which the voice commands are processed by the voice engines. After the electronic device 101 allocates the voice commands to the voice engines, the processing of the voice commands, directly or indirectly, by the voice engines may be performed based on the operations described above with reference to FIGS. 6 to 7.
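- One way to picture an allocation that shortens the total processing period is a greedy load-balancing heuristic. The sketch below is an illustration only: the source does not state which algorithm is used, the estimated durations are hypothetical, and the heuristic (longest command first, least-loaded engine) does not guarantee the minimum in general. It also assumes that the second (external) engine can process any command.

```python
def allocate(commands):
    """Assign commands to the 'first' (on-device) or 'second' (external) engine.

    commands: list of (name, estimated_seconds, on_device_supported) tuples.
    The estimated durations are hypothetical inputs, not values from the source.
    Returns (plan, total_period), where total_period is the busiest engine's load.
    """
    load = {"first": 0.0, "second": 0.0}
    plan = {"first": [], "second": []}
    # Longest commands first: a common greedy heuristic for balancing load.
    for name, seconds, on_device in sorted(commands, key=lambda c: -c[1]):
        if not on_device:
            engine = "second"                 # must run on the external engine
        else:
            engine = min(load, key=load.get)  # currently least-loaded engine
        plan[engine].append(name)
        load[engine] += seconds
    return plan, max(load.values())


plan, total_period = allocate([
    ("alarm", 1.0, True),
    ("sms", 2.0, True),
    ("reminder", 1.0, True),
    ("weather", 3.0, False),
])
print(plan, total_period)
```

- With these made-up durations, the weather command is pinned to the second engine and the remaining commands fill the first engine, so both engines finish at roughly the same time.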
- the electronic device 101 may sequentially output audio signals representing the results of processing the voice commands according to the order of the voice commands. For example, the electronic device 101 may sequentially output the audio signals based on the order of the visual objects 820, 830, 840, and 850 in the state 801 of FIG. 8A.
- the electronic device 101 may visualize the results of processing voice commands in the display 260.
- the electronic device 101 may display visual objects 882, 884, 886, and 888, corresponding to each of the results, in parallel within a visual object 880 in the form of a pop-up window.
- the order and/or layout in which the electronic device 101 displays the visual objects 882, 884, 886, and 888 is not limited to the embodiment of FIG. 8B.
- the electronic device 101 may display information including the result of executing the first voice command.
- the electronic device 101 may display information including the result of executing the second voice command.
- the electronic device 101 may display information including the result of executing the third voice command.
- the electronic device 101 may display information including the result of executing the fourth voice command.
- the electronic device 101 may individually display the results of executing the voice commands in parallel using different voice engines using visual objects 882, 884, 886, and 888. For example, before the processing of the second voice command is completed, the electronic device 101 may display the result of processing the third voice command using the visual object 886.
- in response to an utterance including a quick command, the electronic device 101 may use different voice engines to quickly execute the plurality of voice commands matched to the quick command.
- the electronic device 101 may output audio signals expressing the results of processing the plurality of voice commands using a speaker.
- the electronic device 101 may display the results in parallel on the display 260.
- the embodiment is not limited to this, and for example, the electronic device 101 may sequentially display the results on the display 260. Because the electronic device 101 displays the results in parallel through the display 260, the electronic device 101 may display to the user the results of processing the other voice commands, independently of a failure in processing a specific voice command among the plurality of voice commands.
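- The independence between results described above can be sketched with a thread pool, where each command's outcome is collected as soon as it completes and a failure of one command does not block the others. The function `run_command` is a hypothetical stand-in for a voice engine call, and the simulated failure of the "sms" command is an assumption made only for illustration.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_command(name):
    # Hypothetical stand-in for a voice engine; "sms" fails to simulate an error.
    if name == "sms":
        raise RuntimeError("no text messages stored")
    return f"{name}: done"


def run_all(names):
    shown = {}
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = {pool.submit(run_command, n): n for n in names}
        # Results are handled in completion order, not submission order.
        for future in as_completed(futures):
            name = futures[future]
            try:
                shown[name] = future.result()
            except Exception as exc:
                # A failed command gets its own entry; the others are unaffected.
                shown[name] = f"{name}: failed ({exc})"
    return shown


results = run_all(["alarm", "sms", "reminder", "weather"])
for name, text in results.items():
    print(text)
```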
- FIG. 9 illustrates an operation in which the electronic device 101 executes a plurality of functions based on an utterance 910, according to one embodiment.
- the electronic device 101 of FIG. 9 may be an example of the electronic device 101 of FIGS. 3 to 7 .
- the electronic device 101 and the display 260 of FIG. 3 may include the electronic device 101 and the display 260 of FIG. 9 .
- the external electronic device 210 of FIG. 9 may include the external electronic device 210 of FIG. 2 and/or FIG. 7 .
- the electronic device 101 may identify an utterance 910 including a plurality of voice commands.
- the utterance 910 may be identified from an audio signal output through a microphone included in the electronic device 101.
- the electronic device 101 may identify, from the first part 912 (e.g., “Set the alarm for 7 o’clock”) of the natural language sentence included in the utterance 910, the first voice command 920 corresponding to the function for scheduling an alarm.
- the electronic device 101 may identify a second voice command 930 from the utterance 910.
- the electronic device 101 may identify the voice commands 920 and 930 from the utterance 910 based on the execution of the utterance processor 360 of FIG. 4 .
- the electronic device 101 may input the plurality of voice commands into at least one of the voice engines running on different electronic devices.
- the voice engines may include a first voice engine (e.g., first voice engine 370 in FIG. 3) executed by the electronic device 101, and a second voice engine (e.g., the second voice engine 380 of FIG. 3) executed by an external electronic device 210 connected to the electronic device 101.
- the electronic device 101 may process the first voice command 920 using the first voice engine, and may obtain first information 925 including the result of processing the first voice command 920.
- the electronic device 101 may transmit a signal including the second voice command 930 to the external electronic device 210 in order to process the second voice command 930 using the second voice engine.
- the electronic device 101 may receive another signal including second information 935 in response to the signal from the external electronic device 210.
- the second information 935 included in the other signal may include the result of processing the second voice command 930 by the external electronic device 210 executing the second voice engine.
- the electronic device 101 may output to the user an utterance 940 including the results of processing the plurality of voice commands (e.g., the first voice command 920 and the second voice command 930) identified from the utterance 910.
- the electronic device 101 may output the utterance 940 using an audio signal transmitted to a speaker.
- the utterance 940 may include a natural language sentence 942 (e.g., "The alarm will sound at 7 o'clock") based on the first information 925 that is the result of processing the first voice command 920, and a natural language sentence 944 (e.g., "It's news. Today, company S is --") based on the second information 935 that is the result of processing the second voice command 930.
- based on an utterance including a quick command (e.g., utterance 220 in FIG. 2) and/or an utterance expressing a plurality of voice commands (e.g., utterance 910 of FIG. 9), the electronic device 101 may identify an input for processing a plurality of voice commands. Based on the input, the electronic device 101 may process the plurality of voice commands in parallel, based on the states of the first voice engine executed by the electronic device 101 and the second voice engine executed by the external electronic device 210.
- the electronic device 101 may sequentially output the results of processing the plurality of voice commands in parallel, according to the order of the plurality of voice commands sorted by the quick command.
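- Processing commands in parallel while still reporting the results in the original command order can be sketched with `asyncio.gather`, which runs its awaitables concurrently and returns their results in argument order. The coroutine names, the delays, and the result strings below are hypothetical placeholders rather than details from the source.

```python
import asyncio


async def first_engine(command):
    # Hypothetical on-device engine; pretend it takes slightly longer.
    await asyncio.sleep(0.02)
    return f"result of {command}"


async def second_engine(command):
    # Hypothetical external engine; pretend the reply arrives first.
    await asyncio.sleep(0.01)
    return f"result of {command}"


async def process(first_command, second_command):
    # Both commands run concurrently; gather() keeps the argument order,
    # so the spoken reply follows the order of the commands in the utterance.
    results = await asyncio.gather(
        first_engine(first_command),
        second_engine(second_command),
    )
    return " ".join(results)


reply = asyncio.run(process("command 920", "command 930"))
print(reply)
```

- Even though the second engine finishes earlier in this sketch, the sentence for the first command still comes first in the combined reply.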
- FIG. 10 shows a flowchart of operations performed by an electronic device, according to one embodiment.
- the electronic device of FIG. 10 may include the electronic device 101 of FIGS. 1 to 9 . At least one of the operations of FIG. 10 may be executed by the electronic device 101 of FIG. 3 and/or the processor 120 within the electronic device 101 of FIG. 3 . In the following embodiments, each operation may be performed sequentially, but is not necessarily performed sequentially. For example, the order of each operation may be changed, and at least two operations may be performed in parallel.
- the electronic device may identify a first audio signal through a microphone.
- the first audio signal may represent vibration of an external space including the microphone.
- the microphone in operation 1010 may include at least one of a microphone connected to the electronic device by wire, such as the microphone 320 of FIG. 3 or headphones, or a microphone connected wirelessly to the electronic device through the communication circuit 330 of FIG. 3, such as wireless earphones.
- the electronic device may determine whether an utterance for sequentially executing a plurality of functions has been identified from the first audio signal.
- the utterance may include natural language registered in the electronic device to execute a quick command.
- the utterance may include natural language sentences expressing a plurality of functions, such as utterance 910 of FIG. 9.
- the electronic device may execute, based on the processor, at least one first function among the plurality of functions.
- the electronic device may use an external electronic device (e.g., the external electronic device 210), connected via a communication circuit (e.g., communication circuit 330 in electronic device 101 in FIG. 3), to execute at least one second function, which is different from the at least one first function, among the plurality of functions.
- the electronic device may divide the plurality of functions matched to the utterance identified from the first audio signal into the at least one first function to be executed by a processor of the electronic device, and the at least one second function.
- the electronic device's division of the plurality of functions into the at least one first function and the at least one second function may be performed based on the operations described above with reference to FIGS. 6 and 7 .
- the electronic device may divide the plurality of functions into the at least one first function and the at least one second function, based on the states of a first voice engine executed by a processor and a second voice engine executed by an external electronic device.
- the electronic device may process at least one first function classified among the plurality of functions using a first voice engine executed by a processor of the electronic device.
- the electronic device may transmit the at least one second function to an external electronic device and obtain a result of executing the at least one second function from the external electronic device.
- the electronic device may output, through a speaker, second audio signals expressing the results of executing the at least one first function and the at least one second function.
- the electronic device may output the second audio signals in response to the utterance identified through the first audio signal.
- Each of the second audio signals may include a natural language sentence expressing the results of processing each of the plurality of functions of operation 1020.
- the order of natural language sentences output from the electronic device through the second audio signals may correspond to the order of a plurality of functions indicated by the utterance.
- if an utterance for executing a single function (e.g., a third function) is identified from the first audio signal (1020-No), based on operation 1060, the electronic device may execute the third function corresponding to the first audio signal using either the processor of the electronic device or an external electronic device.
- the electronic device may identify the state of the first voice engine executed by the processor and/or the state of the second voice engine executed by the external electronic device. For example, the electronic device may execute the third function using a voice engine that is in an idle state among the first voice engine or the second voice engine.
- the electronic device may output a third audio signal expressing the result of executing the third function.
- the electronic device may output second audio signals and/or third audio signals based on the utterance included in the first audio signal.
- the embodiment is not limited thereto, and the electronic device may visualize the results of executing one or more functions corresponding to the utterance identified by operation 1020 using a display within the electronic device, as described above with reference to FIG. 8B.
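- The branch at operation 1020 can be summarized as: several functions are split between the two engines, whereas a single function goes to whichever engine is idle. The sketch below is a simplification under stated assumptions. The helper names, the `needs_external` set used for the split, and the fallback to the first engine when neither engine is idle are hypothetical.

```python
def pick_engine_for_single(engine_idle):
    # Prefer whichever engine is idle. Falling back to the first (on-device)
    # engine when none is idle is an assumption made for this sketch.
    for engine in ("first", "second"):
        if engine_idle.get(engine):
            return engine
    return "first"


def dispatch(functions, engine_idle, needs_external):
    """Return a mapping from engine name to the functions it should execute."""
    if len(functions) > 1:
        # Operation 1020: Yes. Split into first and second functions.
        first = [f for f in functions if f not in needs_external]
        second = [f for f in functions if f in needs_external]
        return {"first": first, "second": second}
    # Operation 1020: No. A single (third) function uses an idle engine.
    engine = pick_engine_for_single(engine_idle)
    return {engine: list(functions)}


multi = dispatch(["alarm", "weather"], {"first": True, "second": True}, {"weather"})
single = dispatch(["alarm"], {"first": False, "second": True}, set())
print(multi, single)
```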
- FIG. 11 shows a flowchart of operations performed by an electronic device, according to one embodiment.
- the electronic device of FIG. 11 may include the electronic device 101 of FIGS. 1 to 9 . At least one of the operations of FIG. 11 may be executed by the electronic device 101 of FIG. 3 and/or the processor 120 within the electronic device 101 of FIG. 3 . At least one of the operations in FIG. 11 may be related to at least one of the operations in FIG. 10 . In the following embodiments, each operation may be performed sequentially, but is not necessarily performed sequentially. For example, the order of each operation may be changed, and at least two operations may be performed in parallel.
- the electronic device may identify an utterance from an audio signal.
- the electronic device may identify utterances using a first voice engine executed by a processor of the electronic device, or may identify utterances using a second voice engine executed by an external electronic device connected to the electronic device. Identifying an utterance by the electronic device may include an operation of acquiring text data corresponding to the utterance.
- the electronic device may determine whether metadata corresponding to the utterance has been identified.
- the metadata may be identified using a speech processor (eg, speech processor 360 of FIG. 3) executed by the electronic device.
- in the state of having identified metadata corresponding to the utterance (1120-Yes), based on operation 1130, the electronic device may select a voice engine to process the utterance, based on the metadata identified by operation 1120.
- the electronic device may select a voice engine for executing one or more functions related to the utterance from among the first voice engine or the second voice engine.
- the electronic device can independently perform operation 1130 for each of the plurality of voice commands. For example, based on operation 1130, the voice engine through which each of the plurality of voice commands will be processed may be independently selected from the first voice engine or the second voice engine.
- the electronic device may select a voice engine through which the utterance will be processed, from the first voice engine or the second voice engine, based on properties in the metadata. For example, when an attribute indicating that the utterance is selectively executed by a specific voice engine is identified through the metadata, the electronic device may select a voice engine to process the utterance based on the attribute. If metadata corresponding to the utterance is not identified (1120-No), selection of the voice engine based on operation 1130 may be bypassed.
- the electronic device may determine, from the utterance, whether a parameter required to execute a function corresponding to the utterance has been identified. If the parameter is not identified from the utterance (1140-No), based on operation 1160, the electronic device may request the user for the parameter required to execute the function. If the parameter has been identified from the utterance, or if the function does not require any parameters (1140-Yes), based on operation 1150, the electronic device may process the utterance using a voice engine. The electronic device may output results of processing the utterance (e.g., results of executing one or more voice commands included in the utterance) using a speaker and/or display.
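- The flow of FIG. 11 (select an engine from metadata, then either ask for a missing parameter or process the utterance) can be sketched as a small decision function. The metadata key `"engine"`, the default engine used when no metadata exists, and the returned tuples are hypothetical choices made for illustration.

```python
def process_utterance(metadata, parameters, required):
    """Decide the next step for an utterance.

    metadata: dict or None (None models the 1120-No branch).
    parameters: dict of parameters identified from the utterance.
    required: names of the parameters the function needs.
    """
    # Operations 1120/1130: select an engine only when metadata was identified.
    engine = None
    if metadata is not None:
        engine = metadata.get("engine")
    # Operation 1140: check that every required parameter was identified.
    missing = [name for name in required if name not in parameters]
    if missing:
        return ("ask_user", missing)        # operation 1160
    # Operation 1150: process the utterance. Defaulting to the first engine
    # when none was selected is an assumption of this sketch.
    return ("process", engine or "first")


alarm_step = process_utterance({"engine": "first"}, {}, ["time"])
weather_step = process_utterance({"engine": "second"}, {}, [])
print(alarm_step, weather_step)
```

- Here the alarm utterance without a time leads to a request for the missing parameter, while the weather utterance, which needs no parameters in this sketch, is handed to the engine named in its metadata.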
- FIG. 12 shows a flowchart of operations performed by an electronic device, according to one embodiment.
- the electronic device of FIG. 12 may include the electronic device 101 of FIGS. 1 to 9 . At least one of the operations of FIG. 12 may be executed by the electronic device 101 of FIG. 3 and/or the processor 120 within the electronic device 101 of FIG. 3 . At least one of the operations in FIG. 12 may be related to at least one of the operations in FIGS. 10 and 11 . In the following embodiments, each operation may be performed sequentially, but is not necessarily performed sequentially. For example, the order of each operation may be changed, and at least two operations may be performed in parallel.
- the electronic device may identify an utterance included in the first audio signal received through the microphone.
- the electronic device may perform operation 1210 of FIG. 12 similar to operation 1110 of FIG. 11 .
- based on identifying a plurality of voice commands from the utterance, the electronic device may obtain the states of a first voice engine executed by a processor and a second voice engine executed by an external electronic device.
- the electronic device may perform operation 1220 based on identifying a quick command for executing the group of the plurality of voice commands.
- the states that the electronic device acquires based on operation 1220 may indicate whether the first voice engine and/or the second voice engine are in an idle state.
- within operation 1230, the electronic device may classify, based on the states, the plurality of voice commands into a first voice command corresponding to the first voice engine and a second voice command corresponding to the second voice engine.
- the electronic device may divide the plurality of functions into at least one first function and at least one second function, based on the possibility, indicated by the states, that the first voice engine or the second voice engine can execute each of the functions.
- the electronic device may perform scheduling for a plurality of voice commands in operation 1220.
- the electronic device may store the first voice command and the second voice command in different areas (e.g., the first queue 710 and the second queue 720 of FIG. 7) within the memory of the electronic device.
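- The classification into two queues can be sketched as follows. The state strings, the `can_run` capability sets, and the tie-breaking rule (prefer an idle engine, then the shorter queue) are assumptions for illustration; the source only states that the classification depends on the engine states and on whether each engine can execute each function.

```python
from collections import deque


def classify(commands, states, can_run):
    """Split commands into a first queue and a second queue.

    commands: command names in utterance order.
    states: e.g. {"first": "idle", "second": "busy"} (hypothetical encoding).
    can_run: {engine: set of commands that engine is able to execute}.
    """
    queues = {"first": deque(), "second": deque()}
    for command in commands:
        candidates = [e for e in ("first", "second") if command in can_run[e]]
        if not candidates:
            raise ValueError(f"no voice engine can execute {command!r}")
        # Prefer an idle engine, then the engine with the shorter queue.
        candidates.sort(key=lambda e: (states[e] != "idle", len(queues[e])))
        queues[candidates[0]].append(command)
    return queues["first"], queues["second"]


first_queue, second_queue = classify(
    ["alarm", "sms", "reminder", "weather"],
    {"first": "idle", "second": "idle"},
    {
        "first": {"alarm", "sms", "reminder"},
        "second": {"alarm", "reminder", "weather"},
    },
)
print(list(first_queue), list(second_queue))
```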
- the electronic device may execute a first function corresponding to a first voice command using a first voice engine. Since the first voice engine is executed by the processor of the electronic device, the electronic device may selectively process the first voice command among the plurality of voice commands based on operation 1240.
- the electronic device may request execution of a second function corresponding to the second voice command from an external electronic device.
- the electronic device may request execution of the second function by transmitting information related to the second voice command to the external electronic device executing the second voice engine.
- the electronic device may receive a signal and/or information including a result of executing the second function from an external electronic device in response to the request.
- the order in which the electronic device executes operations 1240 and 1250 is not limited to the embodiment of FIG. 12 .
- the electronic device may perform operations 1240 and 1250 in parallel.
- the electronic device may output a second audio signal representing a sound that expresses the results of executing the first function and the second function, based on the order of the plurality of voice commands indicated by the utterance.
- the sound output by the electronic device through the second audio signal may include natural language sentences expressing the results of executing the first function and the second function.
- the order in which the natural language sentences are output may depend on the order in which the plurality of voice commands are processed, which is set within the quick command registration state.
- FIG. 13 is a block diagram showing an artificial intelligence (AI) system according to an embodiment.
- the artificial intelligence system 10 of one embodiment may include a user terminal 1300, an intelligent server 1400, and a service server 1500.
- the user terminal 1300 (e.g., the electronic device 101 in FIG. 1) of one embodiment may be a terminal device (or electronic device) capable of connecting to the Internet, for example, a mobile phone, a smartphone, a personal digital assistant (PDA), a laptop computer, a TV, white goods, a wearable device, a head-mounted display (HMD), or a smart speaker.
- the user terminal 1300 may include a communication interface 1310, a microphone 1320, a speaker 1330, a display 1340, a memory 1350, and a processor 1360.
- the components listed above may be operatively or electrically connected to each other.
- the communication interface 1310 may be connected to an external device and configured to transmit and receive data.
- the microphone 1320 may receive sound (eg, a user's speech) and convert it into an electrical signal.
- the speaker 1330 may output an electrical signal as sound (eg, voice).
- the display 1340 may be configured to display images or videos.
- the display 1340 may display a graphic user interface (GUI) of an app (or application program) that is being executed.
- the display 1340 in one embodiment may receive a touch input through a touch sensor.
- the display 1340 may receive text input through a touch sensor in the on-screen keyboard area displayed within the display 1340.
- the memory 1350 may store a client module 1351, a software development kit (SDK) 1353, and a plurality of apps 1355.
- the client module 1351 and SDK 1353 may form a framework (or solution program) for performing general functions. Additionally, the client module 1351 or SDK 1353 may configure a framework for processing user input (eg, voice input, text input, touch input).
- the plurality of apps 1355 may be programs for performing designated functions.
- the plurality of apps 1355 may include a first app 1355_1 and a second app 1355_3.
- each of the plurality of apps 1355 may include a plurality of operations to perform a designated function.
- the plurality of apps 1355 may include at least one of an alarm app, a message app, and a schedule app.
- the plurality of apps 1355 may be executed by the processor 1360 to sequentially execute at least some of the plurality of operations.
- the processor 1360 may control the overall operation of the user terminal 1300.
- the processor 1360 may be electrically connected to the communication interface 1310, the microphone 1320, the speaker 1330, the display 1340, and the memory 1350 to perform designated operations.
- the processor 1360 may also execute a program stored in the memory 1350 to perform a designated function.
- the processor 1360 may execute at least one of the client module 1351 or the SDK 1353 and perform the following operations to process user input.
- the processor 1360 may control the operation of a plurality of apps 1355 through, for example, the SDK 1353.
- the following operations described as operations of the client module 1351 or SDK 1353 may be operations performed by the processor 1360.
- the client module 1351 may receive user input.
- the client module 1351 may generate a voice signal corresponding to a user utterance detected through the microphone 1320.
- the client module 1351 may receive a touch input detected through the display 1340.
- the client module 1351 may receive text input detected through a keyboard or visual keyboard.
- the client module 1351 may receive various types of user input detected through an input module included in the user terminal 1300 or an input module connected to the user terminal 1300.
- the client module 1351 may transmit the received user input to the intelligent server 1400.
- the client module 1351 may transmit status information of the user terminal 1300 to the intelligent server 1400 along with the received user input.
- the status information may be, for example, execution status information of an app.
- the client module 1351 may receive a result corresponding to the received user input.
- the client module 1351 may receive a result corresponding to the user input from the intelligent server 1400.
- the client module 1351 may display the received result on the display 1340. Additionally, the client module 1351 may output the received result as audio through the speaker 1330.
- the client module 1351 may receive a plan corresponding to the received user input.
- the client module 1351 can display the results of executing a plurality of operations of the app according to the plan on the display 1340.
- the client module 1351 can sequentially display execution results of a plurality of operations on a display and output audio through the speaker 1330.
- the user terminal 1300 may display only partial results of executing a plurality of operations (eg, the result of the last operation) and output audio through the speaker 1330.
- the client module 1351 may receive a request from the intelligent server 1400 to obtain information necessary to calculate a result corresponding to the user input.
- Information needed to calculate the result may be, for example, status information of the user terminal 1300.
- the client module 1351 may transmit the necessary information to the intelligent server 1400 in response to the request.
- the client module 1351 may transmit information as a result of executing a plurality of operations according to a plan to the intelligent server 1400.
- the intelligent server 1400 can confirm that the received user input has been correctly processed through the result information.
- the client module 1351 may include a voice recognition module. According to one embodiment, the client module 1351 can recognize voice input that performs a limited function through the voice recognition module. For example, the client module 1351 may run an intelligent app to process voice input to perform an organic action through a designated input (e.g., wake up!).
- the intelligent server 1400 may receive information related to the user's voice input from the user terminal 1300 through a communication network. According to one embodiment, the intelligent server 1400 may change data related to the received voice input into text data. According to one embodiment, the intelligent server 1400 may create a plan for performing a task corresponding to the user's voice input based on the text data.
- the plan may be generated by an artificial intelligence (AI) system.
- the artificial intelligence system may be a rule-based system or a neural network-based system (e.g., a feedforward neural network (FNN) or a recurrent neural network (RNN)). Alternatively, it may be a combination of the above or a different artificial intelligence system.
- a plan may be selected from a set of predefined plans or may be generated in real time in response to a user request. For example, an artificial intelligence system can select at least one plan from a plurality of predefined plans.
- the intelligent server 1400 may transmit a result calculated according to the generated plan to the user terminal 1300 or transmit the generated plan to the user terminal 1300.
- the user terminal 1300 may display results calculated according to the plan on the display.
- the user terminal 1300 may display the results of executing an operation according to the plan on the display.
- the intelligent server 1400 of one embodiment may include a front end 1410, a natural language platform 1420, a capsule DB 1430, an execution engine 1440, an end user interface 1450, a management platform 1460, a big data platform 1470, and an analytic platform 1480.
- the front end 1410 may receive user input received from the user terminal 1300.
- the front end 1410 may transmit a response corresponding to the user input.
- the natural language platform 1420 may include an automatic speech recognition module (ASR module) 1421, a natural language understanding module (NLU module) 1423, a planner module 1425, a natural language generator module (NLG module) 1427, and a text to speech module (TTS module) 1429.
- the automatic voice recognition module 1421 may convert voice input received from the user terminal 1300 into text data.
- the natural language understanding module 1423 can determine the user's intention using text data of voice input. For example, the natural language understanding module 1423 may determine the user's intention by performing syntactic analysis or semantic analysis on user input in the form of text data.
- the natural language understanding module 1423 may use linguistic features (e.g., grammatical elements) of morphemes or phrases to identify the meaning of words extracted from the user input, and may determine the user's intention by matching the identified meaning to an intent.
- the natural language understanding module 1423 can acquire intent information corresponding to the user's utterance.
- Intention information may be information indicating the user's intention determined by interpreting text data.
- Intent information may include information indicating an action or function that the user wishes to perform using the device.
- the planner module 1425 may generate a plan using the intent and parameters determined by the natural language understanding module 1423. According to one embodiment, the planner module 1425 may determine a plurality of domains required to perform the task based on the determined intention. The planner module 1425 may determine a plurality of operations included in each of the plurality of domains determined based on the intention. According to one embodiment, the planner module 1425 may determine parameters required to execute the determined plurality of operations or result values output by executing the plurality of operations. The parameters and the result values may be defined as concepts related to a specified format (or class). Accordingly, the plan may include a plurality of operations and a plurality of concepts determined by the user's intention.
- the planner module 1425 may determine the relationship between the plurality of operations and the plurality of concepts in a stepwise (or hierarchical) manner. For example, the planner module 1425 may determine the execution order of a plurality of operations determined based on the user's intention based on a plurality of concepts. In other words, the planner module 1425 may determine the execution order of the plurality of operations based on parameters required for execution of the plurality of operations and results output by executing the plurality of operations. Accordingly, the planner module 1425 may generate a plan that includes association information (eg, ontology) between a plurality of operations and a plurality of concepts. The planner module 1425 can create a plan using information stored in the capsule database 1430, which stores a set of relationships between concepts and operations.
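- As a minimal sketch of how an execution order could follow from the concepts each operation consumes and produces, the example below builds a dependency graph and sorts it topologically. The operation and concept names (`resolve_week`, `find_events`, `render_schedule`, `date_range`, and so on) are hypothetical illustrations, and the use of a topological sort is an assumption, not the method the planner module 1425 is stated to use.

```python
from graphlib import TopologicalSorter

# Hypothetical operations. Each one needs some concepts (parameters)
# and produces one concept (result value).
OPERATIONS = {
    "find_events": {"needs": {"date_range"}, "produces": "event_list"},
    "resolve_week": {"needs": set(), "produces": "date_range"},
    "render_schedule": {"needs": {"event_list"}, "produces": "schedule_view"},
}


def order_operations(operations):
    """Order operations so that every concept is produced before it is needed."""
    # Map each concept to the operation that outputs it.
    producer = {spec["produces"]: name for name, spec in operations.items()}
    # For each operation, collect the operations it depends on.
    graph = {
        name: {producer[concept] for concept in spec["needs"] if concept in producer}
        for name, spec in operations.items()
    }
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    print(order_operations(OPERATIONS))
```

- In this sketch the association between operations and concepts plays the role of the stored relationship information: the order is derived from it rather than written by hand.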
- the natural language generation module 1427 may change designated information into text form.
- the information changed to the text form may be in the form of natural language speech.
- the text-to-speech conversion module 1429 in one embodiment can change information in text form into information in voice form.
- the capsule database 1430 may store information about the relationship between a plurality of concepts and operations corresponding to a plurality of domains.
- the capsule database 1430 may store a plurality of capsules including a plurality of action objects (action objects or action information) and concept objects (concept objects or concept information) of the plan.
- the capsule database 1430 may store the plurality of capsules in the form of CAN (concept action network).
- a plurality of capsules may be stored in a function registry included in the capsule database 1430.
- the capsule database 1430 may include a strategy registry in which strategy information necessary for determining a plan corresponding to a voice input is stored.
- the strategy information may include standard information for determining one plan when there are multiple plans corresponding to user input.
- the capsule database 1430 may include a follow up registry in which information on follow-up actions is stored to suggest follow-up actions to the user in a specified situation.
- the follow-up action may include, for example, follow-up speech.
- the capsule database 1430 may include a layout registry that stores layout information of information output through the user terminal 1300.
- the capsule database 1430 may include a vocabulary registry where vocabulary information included in capsule information is stored.
- the capsule database 1430 may include a dialogue registry in which information about dialogue (or interaction) with a user is stored.
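- The registries described above can be pictured as fields of one container. The sketch below is only an illustration of that layout: the class names, field types, and the `register` and `capsule_for` helpers are assumptions and are not taken from the capsule database 1430 itself.

```python
from dataclasses import dataclass, field


@dataclass
class Capsule:
    """One capsule: the action objects and concept objects of a domain."""
    domain: str
    actions: list[str] = field(default_factory=list)
    concepts: list[str] = field(default_factory=list)


@dataclass
class CapsuleDatabase:
    """Hypothetical container with one field per registry named in the text."""
    function_registry: dict[str, Capsule] = field(default_factory=dict)
    strategy_registry: dict[str, str] = field(default_factory=dict)
    follow_up_registry: dict[str, list[str]] = field(default_factory=dict)
    layout_registry: dict[str, str] = field(default_factory=dict)
    vocabulary_registry: dict[str, list[str]] = field(default_factory=dict)
    dialogue_registry: dict[str, str] = field(default_factory=dict)

    def register(self, capsule: Capsule) -> None:
        # Capsules are stored in the function registry, keyed by domain.
        self.function_registry[capsule.domain] = capsule

    def capsule_for(self, domain: str):
        # Returns None when no capsule is registered for the domain.
        return self.function_registry.get(domain)
```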
- the capsule database 1430 may update stored objects through a developer tool.
- the developer tool may include, for example, a function editor for updating operation objects or concept objects.
- the developer tool may include a vocabulary editor for updating the vocabulary.
- the developer tool may include a strategy editor that creates and registers a strategy for determining the plan.
- the developer tool may include a dialogue editor that creates a dialogue with the user.
- the developer tool may include a follow up editor that can edit follow-up utterances to activate follow-up goals and provide hints. The subsequent goal may be determined based on currently set goals, user preferences, or environmental conditions.
- the capsule database 1430 may also be implemented within the user terminal 1300.
- the user terminal 1300 may include a capsule database 1430 that stores information for determining an operation corresponding to a voice input.
- the execution engine 1440 may calculate a result using the generated plan.
- the end user interface 1450 may transmit the calculated result to the user terminal 1300.
- the user terminal 1300 can receive the result and provide the received result to the user.
- the management platform 1460 can manage information used in the intelligent server 1400.
- the big data platform 1470 may collect user data.
- the analytic platform 1480 may manage quality of service (QoS) of the intelligent server 1400.
- the analytic platform 1480 may manage the components and processing speed (or efficiency) of the intelligent server 1400.
- the service server 1500 may provide a designated service (eg, food ordering or hotel reservation) to the user terminal 1300.
- the service server 1500 may be a server operated by a third party.
- the service server 1500 may include a first service server 1501, a second service server 1503, and a third service server 1505 operated by different third parties.
- the service server 1500 may provide the intelligent server 1400 with information for creating a plan corresponding to the received voice input.
- the provided information may be stored in capsule database 1430, for example. Additionally, the service server 1500 may provide result information according to the plan to the intelligent server 1400.
- the user terminal 1300 can provide various intelligent services to the user in response to user input.
- the user input may include, for example, input through a physical button, touch input, or voice input.
- the user terminal 1300 may provide a voice recognition service through an internally stored intelligent app (or voice recognition app).
- the user terminal 1300 may recognize a user utterance or voice input received through the microphone and provide a service corresponding to the recognized voice input to the user.
- the user terminal 1300 may perform a designated operation alone or together with the intelligent server and/or service server based on the received voice input. For example, the user terminal 1300 may run an app corresponding to a received voice input and perform a designated operation through the executed app.
- when the user terminal 1300 provides a service together with the intelligent server 1400 and/or the service server, the user terminal may detect a user utterance using the microphone 1320 and generate a signal (or voice data) corresponding to the detected user utterance. The user terminal may transmit the voice data to the intelligent server 1400 using the communication interface 1310.
- the intelligent server 1400, in response to a voice input received from the user terminal 1300, may generate a plan for performing a task corresponding to the voice input, or a result of performing an operation according to the plan.
- the plan may include, for example, a plurality of operations for performing a task corresponding to a user's voice input, and a plurality of concepts related to the plurality of operations.
- the concept may define parameters input to the execution of the plurality of operations or result values output by the execution of the plurality of operations.
- the plan may include association information between a plurality of operations and a plurality of concepts.
- the user terminal 1300 in one embodiment may receive the response using the communication interface 1310.
- the user terminal 1300 may output a voice signal generated inside the user terminal 1300 to the outside using the speaker 1330, or output an image generated inside the user terminal 1300 to the outside using the display 1340.
- FIG. 14 is a diagram illustrating a form in which relationship information between concepts and operations is stored in a database, according to various embodiments.
- the capsule database (e.g., capsule database 1430 of FIG. 13) of the intelligent server may store a plurality of capsules in the form of a concept action network (CAN) 1600.
- the capsule database may store operations for processing tasks corresponding to the user's voice input, and parameters necessary for the operations in CAN (concept action network) format.
- the CAN may represent an organic relationship between an action and a concept that defines the parameters necessary to perform the action.
- the capsule database may store a plurality of capsules (eg, Capsule A (1601), Capsule B (1604)) corresponding to each of a plurality of domains (eg, applications).
- one capsule (e.g., Capsule A (1601)) may correspond to at least one service provider (e.g., CP 1 (1602), CP 2 (1603), CP 3 (1606), or CP 4 (1605)) for performing the functions of the domain related to the capsule.
- one capsule may include at least one operation 1610 and at least one concept 1620 for performing a designated function.
- a natural language platform may generate a plan for performing a task corresponding to a received voice input using a capsule stored in a capsule database.
- for example, the planner module of the natural language platform (e.g., planner module 1425 in FIG. 13) may generate plan 1607 using the operations 1711, 1713 and concepts 1712, 1714 of Capsule A (1601) and the operations 1741 and concepts 1742 of Capsule B (1604).
- Figure 15 is a diagram showing a screen on which a user terminal processes voice input received through an intelligent app according to various embodiments.
- the user terminal 1300 may run an intelligent app to process user input through an intelligent server (eg, the intelligent server 1400 in FIG. 13).
- when the user terminal 1300 recognizes a designated voice input (e.g., "wake up!") or receives an input through a hardware key (e.g., a dedicated hardware key), it may run an intelligent app for processing the voice input. For example, the user terminal 1300 may run an intelligent app while executing a schedule app.
- the user terminal 1300 may display an object (e.g., an icon) 1511 corresponding to an intelligent app on a display (e.g., the display 1340 of FIG. 13).
- the user terminal 1300 may receive voice input through a user's utterance.
- the user terminal 1300 may receive a voice input saying “Tell me this week’s schedule!”
- the user terminal 1300 may display a user interface (UI) 1513 (e.g., input window) of an intelligent app displaying text data of a received voice input on the display.
- the user terminal 1300 may display a result corresponding to the received voice input on the display.
- the user terminal 1300 may receive a plan corresponding to the received user input and display 'this week's schedule' on the display according to the plan.
- an electronic device (e.g., electronic device 101 of FIG. 3) may include a communication circuit (e.g., communication circuit 330 of FIG. 3), a microphone (e.g., microphone 320 in FIG. 3), a speaker (e.g., speaker 310 in FIG. 3), and a processor (e.g., processor 120 in FIG. 3).
- the processor may be configured to identify a first audio signal through the microphone.
- the processor may be configured to, based on identifying, from the first audio signal, an utterance (e.g., utterance 220 in FIG. 2) for sequentially executing a plurality of functions, execute at least one first function of the plurality of functions. The processor may be configured to, based on identifying the utterance, execute at least one second function of the plurality of functions, different from the at least one first function, using an external electronic device (e.g., external electronic device 210 of FIG. 3) connected through the communication circuit. The processor may be configured to sequentially output, through the speaker, second audio signals representing results of executing the at least one first function and the at least one second function, in an order related to the utterance. According to one embodiment, by dividing the functions requested in a single utterance between the electronic device and the external electronic device, the electronic device can execute the plurality of functions more quickly.
- the processor may be configured to obtain states of a first voice engine (e.g., the first voice engine 370 in FIG. 3) executed by the processor and a second voice engine (e.g., the second voice engine 380 of FIG. 3) executed by the external electronic device. The processor may be configured to divide, based on the states, the plurality of functions into the at least one first function to be executed by the first voice engine and the at least one second function to be executed by the second voice engine.
- the processor may be configured to divide each of the plurality of functions into the at least one first function and the at least one second function based on whether each of the first voice engine and the second voice engine, as indicated by the states, is capable of executing the function.
- the processor may be configured to allocate at least one first voice command corresponding to the at least one first function to a first queue (e.g., first queue 710 in FIG. 7) corresponding to the first voice engine.
- the processor may be configured to allocate at least one second voice command corresponding to the at least one second function to a second queue (e.g., second queue 720 in FIG. 7) corresponding to the second voice engine.
- the processor may be configured to execute the first voice engine based on the at least one first voice command accumulated in the first queue, among the voice commands.
- the processor may be configured to transmit, among the voice commands, at least one signal including the at least one second voice command accumulated in the second queue to the external electronic device.
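- The division of voice commands into a first queue and a second queue can be sketched as follows. The representation of each engine's state as a set of commands it can execute, the preference for the first (on-device) engine when both are capable, and the function name `split_commands` are assumptions made for illustration only.

```python
from collections import deque


def split_commands(commands, first_engine_can_run, second_engine_can_run):
    """Assign each voice command to the first or second queue.

    `commands` is in utterance order. Each queue entry keeps the original
    index so that results can later be reordered to match the utterance.
    """
    first_queue, second_queue = deque(), deque()
    for index, command in enumerate(commands):
        if command in first_engine_can_run:
            # Assumption: prefer the on-device engine when it is capable.
            first_queue.append((index, command))
        elif command in second_engine_can_run:
            second_queue.append((index, command))
        else:
            raise ValueError(f"no voice engine can execute: {command!r}")
    return first_queue, second_queue


if __name__ == "__main__":
    first, second = split_commands(
        ["weather", "news", "alarm"],
        first_engine_can_run={"weather", "alarm"},
        second_engine_can_run={"weather", "news"},
    )
    print(list(first), list(second))
```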
- the electronic device may further include memory (eg, memory 130 of FIG. 3).
- the processor may be configured to store information in a memory, including the results, with the results arranged according to the order.
- the information may include a queue in which utterances included in the second audio signals are arranged in the order.
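- Because the two voice engines may finish at different times, results can arrive out of order while the output must follow the order of the utterance. A minimal sketch of such an ordered store is given below; the class name `OrderedResults` and its `put` method are hypothetical and are not taken from the disclosure.

```python
class OrderedResults:
    """Hold results by utterance position and release them strictly in order."""

    def __init__(self, count):
        self._slots = [None] * count  # one slot per voice command
        self._next = 0                # position of the next result to output

    def put(self, index, text):
        """Store a result and return every result that is now ready to output."""
        self._slots[index] = text
        ready = []
        while self._next < len(self._slots) and self._slots[self._next] is not None:
            ready.append(self._slots[self._next])
            self._next += 1
        return ready


if __name__ == "__main__":
    results = OrderedResults(3)
    print(results.put(2, "alarm set"))      # held back until earlier results arrive
    print(results.put(0, "sunny today"))
    print(results.put(1, "top headlines"))
```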
- the electronic device may further include a display (eg, display 260 in FIG. 3).
- the processor may be configured to display, in the display, different visual objects containing the results (e.g., visual objects 882, 884, 886, 888 of FIG. 8) in parallel.
- a method of an electronic device may include an operation of identifying an utterance included in a first audio signal based on receiving the first audio signal through a microphone of the electronic device (e.g., operation 1210 of FIG. 12).
- the method may include an operation of obtaining, based on identifying a plurality of voice commands from the utterance, states of a first voice engine executed by a processor of the electronic device to process voice commands and a second voice engine executed by an external electronic device connected through a communication circuit of the electronic device (e.g., operation 1220 of FIG. 12).
- the method may include an operation of obtaining, based on the states, information for dividing the plurality of voice commands into a first voice command corresponding to the first voice engine and a second voice command corresponding to the second voice engine (e.g., operation 1230 of FIG. 12).
- the method may include an operation of, based on the information, executing a first function corresponding to the first voice command using the first voice engine and requesting the external electronic device to execute a second function corresponding to the second voice command (e.g., operations 1240 and 1250 of FIG. 12).
- the method may include an operation of outputting, through the speaker of the electronic device, a second audio signal representing the results of executing the first function and the second function, based on the order of the plurality of voice commands indicated by the utterance (e.g., operation 1260 of FIG. 12).
- the operation of identifying the utterance may include, in response to receiving the first audio signal, requesting text data corresponding to the first audio signal from the external electronic device.
- the operation of identifying the utterance may include identifying the utterance based on the text data received from the external electronic device.
- the requesting operation may include an operation of receiving, after requesting the external electronic device to execute the second function, information including a result of executing the second function from the external electronic device.
- the requesting operation may include storing the information received from the external electronic device in the memory based on the order.
- the operation of obtaining the information may include an operation of obtaining the information based on the states and on whether each of the first voice engine and the second voice engine is capable of executing the plurality of voice commands.
- the output operation may include an operation of displaying visual objects corresponding to each of the results of executing the first function and the second function on the display of the electronic device.
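- The method above executes the first function on the device while the second function is requested from the external electronic device, and then outputs the results in the order of the utterance. A minimal sketch of that flow using a thread pool is shown below. The functions `run_on_first_engine` and `request_second_engine` are hypothetical stand-ins (a real request would go through the communication circuit), and the use of threads is an assumption, not something the disclosure specifies.

```python
from concurrent.futures import ThreadPoolExecutor


def run_on_first_engine(command):
    # Hypothetical stand-in for executing a function with the on-device engine.
    return f"first engine: {command}"


def request_second_engine(command):
    # Hypothetical stand-in for requesting execution from the external device.
    return f"second engine: {command}"


def execute_in_utterance_order(assignments):
    """Run all commands concurrently and return results in utterance order.

    `assignments` is a list of (command, engine) pairs in utterance order,
    where engine is "first" or "second".
    """
    runners = {"first": run_on_first_engine, "second": request_second_engine}
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [pool.submit(runners[engine], command)
                   for command, engine in assignments]
        # Reading the futures in submission order preserves utterance order,
        # regardless of which engine finishes first.
        return [future.result() for future in futures]


if __name__ == "__main__":
    for line in execute_in_utterance_order([("weather", "first"), ("news", "second")]):
        print(line)
```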
- the method of an electronic device may include an operation of identifying a first audio signal through a microphone of the electronic device (e.g., operation 1010 of FIG. 10).
- the method may include an operation of executing, by a processor of the electronic device, at least one first function of a plurality of functions based on identifying, from the first audio signal, an utterance for sequentially executing the plurality of functions (e.g., operation 1030 of FIG. 10).
- the method may include an operation of executing, based on identifying the utterance, at least one second function of the plurality of functions, different from the at least one first function, using an external electronic device connected through a communication circuit of the electronic device (e.g., operation 1040 of FIG. 10).
- the method may include an operation of sequentially outputting, through a speaker of the electronic device, second audio signals representing results of executing the at least one first function and the at least one second function, in an order related to the utterance (e.g., operation 1050 of FIG. 10).
- the method may include obtaining the states of a first voice engine executed by the processor and a second voice engine executed by the external electronic device.
- the method may include an operation of dividing, based on the states, each of the plurality of functions into the at least one first function to be executed by the first voice engine and the at least one second function to be executed by the second voice engine.
- the dividing operation may include an operation of dividing each of the plurality of functions into the at least one first function and the at least one second function based on whether each of the first voice engine and the second voice engine, as indicated by the states, is capable of executing the function.
- the dividing operation may include allocating at least one first voice command corresponding to the at least one first function to a first queue corresponding to the first voice engine.
- the dividing operation may include allocating at least one second voice command corresponding to the at least one second function to a second queue corresponding to the second voice engine.
- the outputting operation may include an operation of outputting the second audio signals representing the results based on a queue of utterances arranged according to the order.
- an electronic device (e.g., electronic device 101 in FIG. 3) may include a communication circuit (e.g., communication circuit 330 in FIG. 3), a microphone (e.g., microphone 320 in FIG. 3), a speaker (e.g., speaker 310 in FIG. 3), and a processor (e.g., processor 120 in FIG. 3).
- the processor may be configured to identify an utterance included in a first audio signal (e.g., utterance 220 of FIG. 2) based on receiving the first audio signal through the microphone.
- the processor may be configured to obtain, based on identifying a plurality of voice commands (e.g., voice commands 230 and 240 of FIG. 2) from the utterance, states of a first voice engine (e.g., first voice engine 370 in FIG. 3) executed by the processor to process voice commands, and a second voice engine (e.g., the second voice engine 380 of FIG. 3) executed by an external electronic device (e.g., external electronic device 210 in FIG. 3) connected through the communication circuit.
- the processor may be configured to obtain, based on the states, information for dividing the plurality of voice commands into a first voice command corresponding to the first voice engine and a second voice command corresponding to the second voice engine.
- the processor may be configured to execute a first function corresponding to the first voice command using the first voice engine, and to request the external electronic device to execute a second function corresponding to the second voice command.
- the processor may be configured to output, through the speaker, a second audio signal representing the results of executing the first function and the second function, based on the order of the plurality of voice commands indicated by the utterance.
- the electronic device may further include memory (eg, memory 130 of FIG. 3).
- the processor may be configured to identify, within the memory, a designated utterance for triggering execution of the plurality of voice commands based on the order.
- the processor may be configured to identify the plurality of voice commands matched to the utterance based on whether the utterance included in the first audio signal corresponds to the specified utterance.
- the processor may be configured to receive information including a result of executing the second function from the external electronic device after requesting the external electronic device to execute the second function.
- the processor may be configured to store the information received from the external electronic device in the memory based on the order.
- the designated utterance may include another word that is different from one or more words associated with the plurality of voice commands.
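- A designated utterance that triggers several voice commands in a fixed order can be sketched as a lookup table held in memory. The table contents, the name `ROUTINES`, and the text normalization in `commands_for` are hypothetical; the disclosure does not state how the match is performed.

```python
# Hypothetical table: designated utterance -> voice commands in execution order.
# The designated utterance shares no words with the commands it triggers.
ROUTINES = {
    "good morning": [
        "tell me the weather",
        "read today's schedule",
        "play the news",
    ],
}


def commands_for(utterance):
    """Return the voice commands matched to an utterance.

    If the utterance corresponds to a designated utterance, its stored
    commands are returned in order; otherwise the utterance is treated
    as a single voice command.
    """
    # Normalize case, whitespace, and trailing punctuation before matching.
    key = " ".join(utterance.lower().split()).rstrip("!.?")
    return ROUTINES.get(key, [utterance])


if __name__ == "__main__":
    print(commands_for("Good morning!"))
    print(commands_for("play music"))
```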
- the processor may be configured to obtain the information based on the states and on whether each of the first voice engine and the second voice engine is capable of executing the plurality of voice commands.
- the processor may be configured to request text data corresponding to the first audio signal from the external electronic device in response to receiving the first audio signal.
- the processor may be configured to identify the utterance based on the text data received from the external electronic device.
- the electronic device may further include a display.
- the processor may be configured to display, in the display, visual objects corresponding to each of the results of executing the first function and the second function, along with the second audio signal output through the speaker.
- the processor may be configured to obtain the information by matching each of the plurality of voice commands with either the first voice engine or the second voice engine, based on latency times of the first voice engine and the second voice engine corresponding to the states.
- the processor may be configured to obtain, based on the states, the information indicating a schedule for executing the plurality of voice commands using at least one of the first voice engine or the second voice engine.
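- One way to picture latency-based matching is a greedy schedule that gives each voice command to whichever engine would finish it earliest. The sketch below assumes a constant per-command latency for each engine and a greedy rule; both are illustrative assumptions, and the function name `schedule_by_latency` is hypothetical.

```python
def schedule_by_latency(commands, latency):
    """Match each command to the engine with the earliest expected finish time.

    `latency` maps an engine name to its assumed per-command latency in
    seconds. Returns (command, engine, expected_finish_time) tuples in
    utterance order.
    """
    busy_until = {engine: 0.0 for engine in latency}
    schedule = []
    for command in commands:
        # Pick the engine that would complete this command soonest.
        engine = min(latency, key=lambda name: busy_until[name] + latency[name])
        busy_until[engine] += latency[engine]
        schedule.append((command, engine, busy_until[engine]))
    return schedule


if __name__ == "__main__":
    plan = schedule_by_latency(
        ["weather", "news", "alarm"],
        {"first": 1.0, "second": 1.5},
    )
    for entry in plan:
        print(entry)
```

- With the latencies in this usage example, the first and third commands go to the first voice engine and the second command goes to the second voice engine, so the two engines work concurrently.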
- the electronic device may include a communication circuit, a microphone, a speaker, a memory for storing at least one instruction, and at least one processor.
- the at least one processor may be configured to execute the at least one instruction and identify the first audio signal received through the microphone.
- the at least one processor may be configured to execute the at least one instruction to perform at least one first function of a plurality of functions based on identifying that the first audio signal includes an utterance for sequentially executing the plurality of functions.
- the at least one processor may be configured to execute the at least one instruction to perform at least one second function of the plurality of functions using an external electronic device connected via the communication circuit, based on identifying that the first audio signal includes the utterance.
- the at least one second function may be different from the at least one first function among the plurality of functions.
- the at least one processor may be configured to execute the at least one instruction to sequentially output, through the speaker, at least one second audio signal representing at least one result of performing the at least one first function and the at least one second function, based on the order associated with the utterance.
- the at least one processor may be configured to execute the at least one instruction to obtain a state of a first voice engine executed by the electronic device and a state of a second voice engine executed by the external electronic device.
- the at least one processor may be configured to execute the at least one instruction to divide the plurality of functions, based on the state of the first voice engine and the state of the second voice engine, into the at least one first function to be executed by the first voice engine and the at least one second function to be executed by the second voice engine.
- the state of the first voice engine may indicate whether the first voice engine is capable of performing one of the plurality of functions, and the state of the second voice engine may indicate whether the second voice engine is capable of performing one of the plurality of functions.
- the at least one processor may be configured to execute the at least one instruction to allocate at least one first voice command corresponding to the at least one first function to a first queue corresponding to the first voice engine.
- the at least one processor may be configured to execute the at least one instruction to allocate at least one second voice command corresponding to the at least one second function to a second queue corresponding to the second voice engine.
- the at least one processor may be configured to execute the at least one instruction to execute the first voice engine based on the at least one first voice command accumulated in the first queue.
- the at least one processor may be configured to execute the at least one instruction to transmit at least one signal representing the at least one second voice command accumulated in the second queue to the external electronic device.
- the electronic device may include memory.
- the at least one processor may be configured to execute the at least one instruction and store information including the at least one result in the memory based on the order.
- the information may include a queue in which at least one utterance in at least one second audio signal is arranged based on the order.
- the electronic device may include a display.
- the at least one processor may be configured to execute the at least one instruction and control the display to display different visual objects for displaying the at least one result in parallel.
- a method of an electronic device may include an operation of identifying an utterance in a first audio signal based on receiving the first audio signal through a microphone of the electronic device.
- the method includes, based on identifying a plurality of voice commands from the utterance, obtaining a state of a first voice engine executed by the electronic device and a state of a second voice engine executed by an external electronic device.
- the external electronic device may be connected through a communication circuit of the electronic device.
- the method may include an operation of obtaining, based on the state of the first voice engine and the state of the second voice engine, information for dividing the plurality of voice commands into at least one first voice command to be performed by the first voice engine and at least one second voice command to be performed by the second voice engine.
- the method may include an operation of executing a first function corresponding to the at least one first voice command using the first voice engine, and requesting the external electronic device to execute a second function corresponding to the at least one second voice command.
- the method may include an operation of outputting, through the speaker of the electronic device, at least one second audio signal representing at least one result of executing the first function and the second function, based on the order of the plurality of voice commands indicated by the utterance.
- the operation of obtaining the state of the first voice engine and the state of the second voice engine may include an operation of identifying, in the memory of the electronic device, a designated utterance for triggering performance of the plurality of voice commands based on the order.
- the operation of obtaining the state of the first voice engine and the state of the second voice engine may include an operation of identifying the plurality of voice commands based on whether the utterance in the first audio signal corresponds to the designated utterance.
- requesting the external electronic device to perform the second function may include receiving information including a result of performing the second function from the external electronic device.
- the operation of requesting the external electronic device to perform the second function may include storing the information in the memory of the electronic device based on the order.
- the state of the first voice engine may indicate whether the first voice engine can perform one of a plurality of functions.
- the state of the second voice engine may indicate whether the second voice engine can perform one of the plurality of functions.
- the operation of outputting the at least one second audio signal may include an operation of controlling a display of the electronic device to display at least one visual object showing a result of performing the second function and a result of performing the first function.
- a method of an electronic device may include an operation of identifying a first audio signal received through a microphone of the electronic device.
- the method may include an operation of performing at least one first function among a plurality of functions, based on identifying that the first audio signal includes an utterance for sequentially performing the plurality of functions.
- the method may include an operation of performing, based on identifying that the first audio signal includes the utterance, at least one second function among the plurality of functions in an external electronic device connected through a communication circuit of the electronic device.
- the at least one second function may be different from the at least one first function.
- the method may include an operation of sequentially outputting, through a speaker of the electronic device, second audio signals representing at least one result of performing the at least one first function and the at least one second function, based on an order related to the utterance.
- the method may include obtaining the state of a first voice engine running in the electronic device and the state of a second voice engine running in the external electronic device.
- the method may include an operation of dividing, based on the state of the first voice engine and the state of the second voice engine, the plurality of functions into the at least one first function to be performed by the first voice engine and the at least one second function to be performed by the second voice engine.
- the state of the first voice engine may indicate whether the first voice engine can execute any one of the plurality of functions.
- the state of the second voice engine may indicate whether the second voice engine can execute one of the plurality of functions.
- the dividing operation may include an operation of allocating at least one first voice command corresponding to the at least one first function to a first queue corresponding to the first voice engine.
- the dividing operation may include an operation of allocating a second voice command corresponding to the at least one second function to a second queue corresponding to the second voice engine.
- the outputting operation may include outputting the at least one utterance representing the at least one result in the order indicated by the utterance, based on a queue in which the at least one utterance in the at least one second audio signal is arranged.
- Electronic devices may be of various types.
- Electronic devices may include, for example, portable communication devices (e.g., smartphones), computer devices, portable multimedia devices, portable medical devices, cameras, wearable devices, or home appliances.
- Electronic devices according to embodiments of this document are not limited to the above-described devices.
- terms such as "first" and "second" may be used simply to distinguish one component from another, and do not limit the components in other respects (e.g., importance or order).
- when one (e.g., first) component is said to be "coupled" or "connected" to another (e.g., second) component, with or without the terms "functionally" or "communicatively," it means that the component can be connected to the other component directly (e.g., by wire), wirelessly, or through a third component.
- the term "module" used in various embodiments of this document may include a unit implemented in hardware, software, or firmware, and may be used interchangeably with terms such as logic, logic block, component, or circuit. A module may be an integrally formed part, or a minimum unit of the part or a portion thereof, that performs one or more functions. For example, according to one embodiment, the module may be implemented in the form of an application-specific integrated circuit (ASIC).
- Various embodiments of the present document may be implemented as software (e.g., program 140) including one or more instructions stored in a storage medium (e.g., built-in memory 136 or external memory 138) that can be read by a machine (e.g., electronic device 101).
- the one or more instructions may include code generated by a compiler or code that can be executed by an interpreter.
- a storage medium that can be read by a device may be provided in the form of a non-transitory storage medium.
- 'non-transitory' only means that the storage medium is a tangible device and does not contain signals (e.g., electromagnetic waves); the term does not distinguish between cases where data is stored semi-permanently in the storage medium and cases where it is stored temporarily.
- Computer program products are commodities and can be traded between sellers and buyers.
- the computer program product may be distributed in the form of a machine-readable storage medium (e.g., compact disc read only memory (CD-ROM)), or distributed (e.g., downloaded or uploaded) online, either through an application store (e.g., Play Store™) or directly between two user devices (e.g., smartphones).
- at least a portion of the computer program product may be at least temporarily stored or temporarily created in a machine-readable storage medium, such as the memory of a manufacturer's server, an application store's server, or a relay server.
- each component (e.g., module or program) of the above-described components may include a single entity or plural entities, and some of the plural entities may be separately placed in other components.
- one or more of the components or operations described above may be omitted, or one or more other components or operations may be added.
- multiple components (e.g., modules or programs) may be integrated into a single component, in which case the integrated component may perform one or more functions of each of the plurality of components in the same or similar manner as they were performed by the corresponding component before the integration.
- operations performed by a module, program, or other component may be executed sequentially, in parallel, iteratively, or heuristically, or one or more of the operations may be executed in a different order, or omitted. Alternatively, one or more other operations may be added.
- the device described above may be implemented with hardware components, software components, and/or a combination of hardware components and software components.
- the devices and components described in the embodiments may be implemented using one or more general-purpose or special-purpose computers, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable gate array (FPGA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions.
- the processing device may execute an operating system (OS) and one or more software applications running on the operating system. Additionally, a processing device may access, store, manipulate, process, and generate data in response to the execution of software.
- a single processing device may be described as being used; however, those skilled in the art will understand that a processing device may include multiple processing elements and/or multiple types of processing elements.
- a processing device may include a plurality of processors or one processor and one controller. Additionally, other processing configurations, such as parallel processors, are possible.
- Software may include a computer program, code, instructions, or a combination of one or more of these, and may configure a processing device to operate as desired or may command the processing device independently or collectively.
- the software and/or data may be embodied in any type of machine, component, physical device, or computer storage medium or device, in order to be interpreted by the processing device or to provide instructions or data to the processing device.
- Software may be distributed over networked computer systems and stored or executed in a distributed manner.
- Software and data may be stored on one or more computer-readable recording media.
- the method according to the embodiment may be implemented in the form of program instructions that can be executed through various computer means and recorded on a computer-readable medium.
- the medium may continuously store a computer-executable program, or temporarily store it for execution or download.
- the medium may be any of a variety of recording or storage means in the form of a single piece of hardware or several pieces of hardware combined. It is not limited to a medium directly connected to a computer system and may be distributed over a network. Examples of media include magnetic media such as hard disks, floppy disks, and magnetic tapes; optical recording media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and media configured to store program instructions, including ROM, RAM, and flash memory. Additional examples include recording or storage media managed by app stores that distribute applications and by sites or servers that supply or distribute various other software.
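The command-distribution operations listed earlier in this section (obtaining the state of each voice engine, then allocating each voice command to a queue corresponding to the engine that will perform it) can be illustrated with a minimal Python sketch. Everything named here is hypothetical and not taken from the publication: the `VoiceEngine` class, the `distribute` function, the modeling of an engine's "state" as a set of supported function names, and the choice to prefer the first (on-device) engine whenever both engines can perform a function.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class VoiceEngine:
    """Hypothetical stand-in for a voice engine and its reported state."""
    name: str
    supported: set  # assumption: the state is modeled as the set of functions the engine can perform
    queue: deque = field(default_factory=deque)

    def can_perform(self, function: str) -> bool:
        return function in self.supported


def distribute(commands, first_engine, second_engine):
    """Allocate each command to a queue, keeping its position in the utterance.

    Assumption: the first engine is preferred when both engines can perform
    the function. Commands neither engine can perform are returned.
    """
    unassigned = []
    for order, function in enumerate(commands):
        if first_engine.can_perform(function):
            first_engine.queue.append((order, function))
        elif second_engine.can_perform(function):
            second_engine.queue.append((order, function))
        else:
            unassigned.append((order, function))
    return unassigned


local = VoiceEngine("first", {"alarm", "timer"})
remote = VoiceEngine("second", {"weather", "music"})
leftover = distribute(["weather", "alarm", "music"], local, remote)
print(list(local.queue), list(remote.queue), leftover)
```

Each queued entry carries the command's position in the utterance, so results can later be re-ordered no matter which engine finishes first.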
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Multimedia (AREA)
- Computational Linguistics (AREA)
- Acoustics & Sound (AREA)
- Theoretical Computer Science (AREA)
- Signal Processing (AREA)
- General Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Telephone Function (AREA)
Abstract
Description
Claims (15)
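- An electronic device comprising: a communication circuit; a microphone; a speaker; memory for storing at least one instruction; and at least one processor, wherein the at least one processor is configured to execute the at least one instruction to: identify a first audio signal received through the microphone; based on identifying that the first audio signal includes an utterance for sequentially executing a plurality of functions, perform at least one first function among the plurality of functions; based on identifying that the first audio signal includes the utterance, perform, through the communication circuit, at least one second function among the plurality of functions in an external electronic device, the at least one second function being different from the at least one first function among the plurality of functions; and sequentially output, through the speaker, second audio signals representing at least one result of performing the at least one first function and the at least one second function, based on an order associated with the utterance.
- The electronic device of claim 1, wherein the at least one processor is configured to execute the at least one instruction to: obtain a state of a first voice engine executed in the electronic device and a state of a second voice engine executed by the external electronic device; and based on the state of the first voice engine and the state of the second voice engine, divide the plurality of functions into the at least one first function to be performed by the first voice engine and the at least one second function to be executed by the second voice engine.
- The electronic device of claim 2, wherein the state of the first voice engine indicates whether the first voice engine is capable of performing any one of the plurality of functions, and the state of the second voice engine indicates whether the second voice engine is capable of performing one of the plurality of functions.
- The electronic device of claim 2, wherein the at least one processor is configured to execute the at least one instruction to: allocate at least one first voice command corresponding to the at least one first function to a first queue corresponding to the first voice engine; and allocate at least one second voice command corresponding to the at least one second function to a second queue corresponding to the second voice engine.
- The electronic device of claim 4, wherein the at least one processor is configured to execute the at least one instruction to execute the first voice engine based on the at least one first voice command accumulated in the first queue.
- The electronic device of claim 4, wherein the processor is configured to transmit, to the external electronic device, at least one signal indicating the at least one second voice command accumulated in the second queue.
- The electronic device of claim 1, further comprising memory, wherein the at least one processor is configured to execute the at least one instruction to store, in the memory, information including the at least one result, based on the order.
- The electronic device of claim 7, wherein the information includes a queue in which at least one utterance in at least one second audio signal is arranged based on the order.
- The electronic device of claim 1, further comprising a display, wherein the at least one processor is configured to execute the at least one instruction to control the display to display different visual objects for displaying the at least one result in parallel.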
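- A method of an electronic device, the method comprising: identifying, based on receiving a first audio signal through a microphone of the electronic device, an utterance in the first audio signal; obtaining, based on identifying a plurality of voice commands from the utterance, a state of a first voice engine executed by the electronic device and a state of a second voice engine executed by an external electronic device, the external electronic device being connected through a communication circuit of the electronic device; obtaining, based on the state of the first voice engine and the state of the second voice engine, information for dividing the plurality of voice commands into at least one first voice command performed by the first voice engine and a second voice command performed by the second voice engine; executing a first function corresponding to the at least one first voice command using the first voice engine, and requesting the external electronic device to perform a second function corresponding to the at least one second voice command; and outputting, through a speaker of the electronic device, at least one second audio signal representing at least one result of executing the first function and the second function, based on an order of the plurality of voice commands indicated by the utterance.
- The method of claim 10, wherein identifying the utterance in the first audio signal comprises: receiving, from the external electronic device, text data corresponding to the first audio signal; and identifying the utterance in the first audio signal based on the text data.
- The method of claim 10, wherein obtaining the state of the first voice engine and the state of the second voice engine comprises: identifying, in memory of the electronic device, a designated utterance for triggering performance of the plurality of voice commands based on the order; and identifying the plurality of voice commands based on whether the utterance in the first audio signal corresponds to the designated utterance.
- The method of claim 12, wherein requesting the external electronic device to perform the second function comprises: receiving, from the external electronic device, information including a result of performing the second function; and storing the information in the memory of the electronic device based on the order.
- The method of claim 10, wherein the state of the first voice engine indicates whether the first voice engine is capable of performing one of a plurality of functions, and the state of the second voice engine indicates whether the second voice engine is capable of performing any one of the plurality of functions.
- The method of claim 10, wherein outputting the at least one second audio signal comprises controlling a display of the electronic device to display at least one visual object showing a result of performing the second function and a result of performing the first function.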
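The claims above require that results be output in the order indicated by the utterance, even though the first and second voice engines may finish in a different order. A minimal Python sketch of one way to do this follows. The `OrderedResultBuffer` class and its `add` method are hypothetical names, and releasing each result as soon as all earlier results have arrived is an assumption; the claims only state that results are stored and output based on the order.

```python
import heapq


class OrderedResultBuffer:
    """Hypothetical buffer that releases results in utterance order."""

    def __init__(self):
        self._heap = []  # min-heap of (order, result_text)
        self._next = 0   # order index of the next result to output

    def add(self, order, text):
        """Store a result and return every result that is now ready to output."""
        heapq.heappush(self._heap, (order, text))
        ready = []
        # Release results only while the smallest stored order is the next expected one.
        while self._heap and self._heap[0][0] == self._next:
            ready.append(heapq.heappop(self._heap)[1])
            self._next += 1
        return ready


buffer = OrderedResultBuffer()
print(buffer.add(1, "Alarm set for 7 a.m."))    # held back: result 0 has not arrived yet
print(buffer.add(0, "Today will be sunny."))    # releases results 0 and 1 in order
```

In this sketch the returned list would be handed to whatever component synthesizes the second audio signals; that component is outside the scope of the example.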
Priority Applications (3)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202380069810.XA CN119895381A (zh) | 2022-10-08 | 2023-09-27 | Electronic device for controlling execution of voice command and method thereof |
| EP23875178.8A EP4582926A4 (en) | 2022-10-08 | 2023-09-27 | ELECTRONIC DEVICE FOR CONTROLLING THE EXECUTION OF A VOICE COMMAND, AND ASSOCIATED METHOD |
| US18/535,568 US20240127815A1 (en) | 2022-10-08 | 2023-12-11 | Electronic device for controlling execution of voice command and method thereof |
Applications Claiming Priority (4)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| KR20220129140 | 2022-10-08 | | |
| KR10-2022-0129140 | 2022-10-08 | | |
| KR1020220144807A KR20240049507A (ko) | 2022-10-08 | 2022-11-02 | Electronic device for controlling execution of voice command and method thereof |
| KR10-2022-0144807 | 2022-11-02 | | |
Related Child Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US18/535,568 Continuation US20240127815A1 (en) | 2022-10-08 | 2023-12-11 | Electronic device for controlling execution of voice command and method thereof |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2024076114A1 (ko) | 2024-04-11 |
Family
ID=90608732
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/KR2023/015158 Ceased WO2024076114A1 (ko) | 2022-10-08 | 2023-09-27 | Electronic device for controlling execution of voice command and method thereof |
Country Status (4)
| Country | Link |
|---|---|
| US (1) | US20240127815A1 (ko) |
| EP (1) | EP4582926A4 (ko) |
| CN (1) | CN119895381A (ko) |
| WO (1) | WO2024076114A1 (ko) |
Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
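| KR20170087207A (ko) * | 2016-01-20 | 2017-07-28 | Samsung Electronics Co., Ltd. | Electronic device and method for processing voice command of electronic device |
| KR20180116726A (ko) * | 2017-04-17 | 2018-10-25 | Samsung Electronics Co., Ltd. | Voice data processing method and electronic device supporting the same |
| KR20190035454A (ko) * | 2017-09-26 | 2019-04-03 | KT Corporation | Terminal, server, and method for providing speech recognition service |
| KR102026479B1 (ko) * | 2019-03-06 | 2019-09-30 | 주식회사 다이얼로그디자인에이전시 | System for providing artificial intelligence speech recognition service based on parallel processing platform |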
| US20200126538A1 (en) * | 2018-07-20 | 2020-04-23 | Google Llc | Speech recognition with sequence-to-sequence models |
Family Cites Families (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US10311856B2 (en) * | 2016-10-03 | 2019-06-04 | Google Llc | Synthesized voice selection for computational agents |
| US10552204B2 (en) * | 2017-07-07 | 2020-02-04 | Google Llc | Invoking an automated assistant to perform multiple tasks through an individual command |
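| KR102374910B1 (ko) * | 2017-08-22 | 2022-03-16 | Samsung Electronics Co., Ltd. | Voice data processing method and electronic device supporting the same |
| JP7170739B2 (ja) * | 2018-03-08 | 2022-11-14 | Google LLC | Mitigation of client device latency in rendering of remotely generated automated assistant content |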
| WO2019216873A1 (en) * | 2018-05-07 | 2019-11-14 | Google Llc | Determining responsive content for a compound query based on a set of generated sub-queries |
-
2023
- 2023-09-27 WO PCT/KR2023/015158 patent/WO2024076114A1/ko not_active Ceased
- 2023-09-27 CN CN202380069810.XA patent/CN119895381A/zh active Pending
- 2023-09-27 EP EP23875178.8A patent/EP4582926A4/en active Pending
- 2023-12-11 US US18/535,568 patent/US20240127815A1/en active Pending
Non-Patent Citations (1)
| Title |
|---|
| See also references of EP4582926A4 * |
Also Published As
| Publication number | Publication date |
|---|---|
| EP4582926A1 (en) | 2025-07-09 |
| EP4582926A4 (en) | 2025-12-17 |
| US20240127815A1 (en) | 2024-04-18 |
| CN119895381A (zh) | 2025-04-25 |
Similar Documents
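| Publication | Title | Publication Date |
|---|---|---|
| WO2020050475A1 (ko) | Electronic device and method for performing task corresponding to shortcut command | |
| WO2020167006A1 (en) | Method of providing speech recognition service and electronic device for same | |
| WO2020197263A1 (en) | Electronic device and multitasking supporting method thereof | |
| WO2022010157A1 (ko) | Method for providing screen in artificial intelligence virtual assistant service, and user terminal device and server supporting same | |
| WO2020180034A1 (ko) | Method and device for providing information based on user selection | |
| WO2020032381A1 (en) | Electronic apparatus for processing user utterance and controlling method thereof | |
| WO2022220559A1 (en) | Electronic device for processing user utterance and control method thereof | |
| WO2022139420A1 (ko) | Electronic device and method thereof for sharing execution information on user input having continuity | |
| WO2025154986A1 (ko) | Electronic device and method for generating multi-window layout | |
| WO2024063507A1 (ko) | Electronic device and method for processing user utterance of electronic device | |
| WO2024043729A1 (ko) | Electronic device and method for processing response to user of electronic device | |
| WO2024076114A1 (ko) | Electronic device for controlling execution of voice command and method thereof | |
| WO2022177264A1 (ko) | Electronic device and speech recognition processing method of electronic device | |
| WO2022177224A1 (ko) | Electronic device and operating method of electronic device | |
| WO2022163963A1 (ko) | Electronic device and method for performing shortcut command of electronic device | |
| WO2022191395A1 (ko) | Device for processing user command and operating method thereof | |
| WO2025005553A1 (ko) | Voice signal processing method and electronic device performing the method | |
| WO2024072142A1 (ko) | Electronic device, operating method, and storage medium for processing utterance not including predicate | |
| WO2025127383A1 (ko) | Electronic device for executing function corresponding to command by using caching and method thereof | |
| WO2024080729A1 (ko) | Electronic device and method for processing user utterance by using location-based context in the electronic device | |
| WO2024076139A1 (ko) | Electronic device and method for processing user utterance in the electronic device | |
| WO2022139515A1 (ko) | Method for providing voice-based content and electronic device therefor | |
| WO2024010284A1 (ko) | Method for determining endpoint detection time and electronic device performing the method | |
| WO2023043094A1 (ko) | Electronic device and operating method of electronic device | |
| WO2026054323A1 (ko) | Electronic device, method, and non-transitory computer-readable storage medium for displaying user interface for request included in utterance | |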
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 23875178; Country of ref document: EP; Kind code of ref document: A1 |
| | WWE | WIPO information: entry into national phase | Ref document number: 202380069810.X; Country of ref document: CN |
| | WWE | WIPO information: entry into national phase | Ref document number: 202547031735; Country of ref document: IN |
| | WWE | WIPO information: entry into national phase | Ref document number: 2023875178; Country of ref document: EP |
| | ENP | Entry into the national phase | Ref document number: 2023875178; Country of ref document: EP; Effective date: 20250402 |
| | WWP | WIPO information: published in national office | Ref document number: 202547031735; Country of ref document: IN |
| | WWP | WIPO information: published in national office | Ref document number: 202380069810.X; Country of ref document: CN |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | WWP | WIPO information: published in national office | Ref document number: 2023875178; Country of ref document: EP |


