WO2024071469A1 - Artificial intelligence device and operating method thereof - Google Patents
Artificial intelligence device and operating method thereof
- Publication number
- WO2024071469A1 (PCT/KR2022/014593)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- information
- artificial intelligence
- intelligence device
- server
- function
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/3332—Query translation
- G06F16/3334—Selection or weighting of terms from queries, including natural language queries
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/332—Query formulation
- G06F16/3329—Natural language query formulation
- G06F16/33295—Natural language query formulation in dialogue systems
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3343—Query execution using phonetics
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3344—Query execution using natural language analysis
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/338—Presentation of query results
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/16—Sound input; Sound output
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/16—Sound input; Sound output
- G06F3/167—Audio in a user interface, e.g. using voice commands for navigating, audio feedback
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/197—Version control
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/211—Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
- G06F40/35—Discourse or dialogue representation
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/02—Methods for producing synthetic speech; Speech synthesisers
- G10L13/04—Details of speech synthesis systems, e.g. synthesiser structure or memory management
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/1822—Parsing for meaning understanding
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/223—Execution procedure of a spoken command
Definitions
- This disclosure relates to artificial intelligence devices and methods of operating them.
- the device is an artificial intelligence (AI) device that can give commands and have conversations through voice.
- the voice recognition service has a structure that draws on a very large database to select the optimal answer to the user's question.
- the voice search function also converts input voice data into text on a cloud server, analyzes it, and retransmits real-time search results based on the results to the device.
- the cloud server has the computing power to classify numerous words into voice data classified by gender, age, and accent, store them, and process them in real time.
- the present disclosure aims to solve the above-described problems and other problems.
- the purpose of this disclosure is to provide artificial intelligence devices.
- the present disclosure aims to derive optimal intention analysis result information that better matches the intention of the speaker using an artificial intelligence device.
- the present disclosure aims to provide corresponding information or perform a function based on optimal intent analysis result information for the input of a speaker using a voice recognition service.
- An artificial intelligence device includes a display and a processor that controls the display, wherein the processor receives a user input, transmits the user input to a server, receives from the server response information including intention analysis result information for the user input, and performs at least one operation of outputting information and performing a function according to the response information. The intention analysis result information may include a result of intention analysis of the user input that is primarily processed based on at least one intention analysis factor transmitted to the server.
- the processor receives user feedback data according to the at least one operation performed, updates the one or more intent analysis factors based on the feedback data, and may transmit the updated intent analysis factors to the server.
- the artificial intelligence device further includes a memory that communicates with the processor and stores data, wherein the processor parses the response information and, based on the parsed response information, can read information to be output and information related to function performance from the memory.
- when the processor outputs information according to the intent analysis result information in the response information, information consisting of a first version or a second version may be provided based on the one or more intent analysis factors transmitted to the server.
- the processor determines whether the function can be performed and, if the determination shows that it cannot be performed, may provide recommended function information and perform the recommended function instead.
- when an event occurrence is detected, the processor extracts the immediately preceding user input, intention analysis result information, and function performance operation information, and may output information on at least one recommendation compensation function and perform a recommendation compensation function.
- the processor configures routine information regarding function performance and stores it in a memory, and if the received intent analysis result information is related to at least one of the routines defined in the stored routine information, the remaining routines included in the routine information can be automatically executed.
- the processor provides a speech agent including at least one recommendation query, wherein the at least one recommendation query may be created based on a recommended keyword configured based on at least one of the intent analysis factors.
- the method includes receiving a user input; transmitting the user input to a server; receiving response information including intention analysis result information for the user input from the server; and performing at least one operation of outputting information and performing a function according to the response information, wherein the intention analysis result information may include a result of intention analysis of the user input that is primarily processed based on at least one intention analysis factor transmitted to the server.
- the method may further include receiving user feedback data according to the at least one operation performed; updating the one or more intent analysis factors based on the feedback data; and transmitting the updated intent analysis factors to the server.
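The request, response, and feedback steps of the method can be illustrated with a minimal Python sketch. It is not the disclosed implementation: `stub_server`, the `Device` class, the `prefers_brief` factor, and the response fields are all hypothetical names chosen for illustration, and the server is replaced by a local function.

```python
from dataclasses import dataclass, field
from typing import Callable


def stub_server(user_input: str, factors: dict) -> dict:
    """Hypothetical stand-in for the server's intent analysis."""
    # The factor decides which version of the information is returned.
    version = "second" if factors.get("prefers_brief") else "first"
    return {
        "intent": user_input.strip().lower(),
        "operation": "output_information",
        "version": version,
    }


@dataclass
class Device:
    server: Callable[[str, dict], dict]
    factors: dict = field(default_factory=dict)    # intent analysis factors
    performed: list = field(default_factory=list)  # record of operations

    def handle(self, user_input: str) -> dict:
        # Transmit the input with the current factors and receive the response.
        response = self.server(user_input, dict(self.factors))
        # "Perform" the operation named in the response; here it is only recorded.
        self.performed.append((response["operation"], response["version"]))
        return response

    def apply_feedback(self, feedback: dict) -> dict:
        # Update the factors from user feedback; the next request carries them.
        self.factors.update(feedback)
        return self.factors
```

In this sketch, calling `apply_feedback({"prefers_brief": True})` between two identical requests changes the version the stub server returns, which mirrors how updated factors are meant to influence later intent analysis.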
- when outputting information according to the intent analysis result information in the response information, the method may provide information consisting of a first version or a second version based on the one or more intent analysis factors transmitted to the server.
- when performing a function according to the intention analysis result information in the response information, the method determines whether the function can be performed; if the determination shows that it cannot be performed, recommended function information is provided, and the recommended function can be performed instead.
- the method may further include detecting the occurrence of an event; extracting the immediately preceding user input, intention analysis result information, and function performance operation information; outputting information about at least one recommendation compensation function; and performing a recommendation compensation function.
- the method may further include storing routine information regarding function performance; and, if the received intention analysis result information is related to at least one of the routines defined in the stored routine information, automatically executing the remaining routines included in the routine information.
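The routine behaviour can be sketched as a lookup over stored routine information. This is only an illustration under assumptions: the routine name, the step names, and the rule that an intent "relates to" a routine when it equals one of its steps are all hypothetical, since the text does not define how the relationship is determined.

```python
# Hypothetical stored routine information: routine name -> ordered steps.
ROUTINE_INFO = {
    "bedtime": ["turn_off_tv", "dim_lights", "set_alarm"],
}


def match_routine(intent: str, routines: dict) -> tuple:
    """Return (routine name, remaining steps) for the first routine containing the intent."""
    for name, steps in routines.items():
        if intent in steps:
            remaining = [step for step in steps if step != intent]
            return name, remaining
    return None, []


def run_remaining(intent: str, routines: dict, execute) -> tuple:
    """Execute every step of the matched routine other than the one already requested."""
    name, remaining = match_routine(intent, routines)
    for step in remaining:
        execute(step)
    return name, remaining
```

For example, `run_remaining("dim_lights", ROUTINE_INFO, print)` would run the other two steps of the hypothetical `bedtime` routine, while an intent that appears in no routine executes nothing.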
- the method may further include providing a speech agent including at least one recommendation query, wherein the at least one recommendation query may be created based on a recommended keyword configured based on at least one of the intent analysis factors.
- by deriving the optimal intention analysis result that matches various user inputs and providing corresponding information or performing a function, the quality of the voice recognition service is improved and the user's satisfaction with using the device can be maximized.
- FIG. 1 is a diagram for explaining a voice system according to an embodiment of the present invention.
- Figure 2 is a block diagram for explaining the configuration of an artificial intelligence device according to an embodiment of the present disclosure.
- Figure 3 is a block diagram for explaining the configuration of a voice service server according to an embodiment of the present invention.
- Figure 4 is a diagram illustrating an example of converting a voice signal into a power spectrum according to an embodiment of the present invention.
- Figure 5 is a block diagram illustrating the configuration of a processor for voice recognition and synthesis of an artificial intelligence device, according to an embodiment of the present invention.
- Figure 6 is a diagram for explaining the landscape mode and portrait mode of a stand-type artificial intelligence device according to an embodiment of the present disclosure.
- Figure 7 is a block diagram of an artificial intelligence device according to another embodiment of the present disclosure.
- FIG. 8 is an example of a detailed block diagram of the processor of FIG. 7.
- Figure 9 is a flow chart illustrating a method of operating an artificial intelligence device according to an embodiment of the present disclosure.
- FIG. 10 is a diagram illustrating an operation based on intention analysis result information considering time information in an artificial intelligence device according to an embodiment of the present disclosure.
- FIGS. 11 and 12 are diagrams illustrating operations based on intention analysis result information considering content information in an artificial intelligence device according to an embodiment of the present disclosure.
- FIG. 13 is a diagram illustrating an operation based on intention analysis result information considering spatial information in an artificial intelligence device according to an embodiment of the present disclosure.
- FIG. 14 is a diagram illustrating a user input processing method of an artificial intelligence device according to an embodiment of the present disclosure.
- FIG. 15 is a diagram illustrating a user input processing method of an artificial intelligence device according to another embodiment of the present disclosure.
- FIG. 16 is a diagram illustrating a user input processing method of an artificial intelligence device according to another embodiment of the present disclosure.
- FIG. 17 is a diagram illustrating a recommendation query through a speech agent according to an embodiment of the present disclosure.
- 'Artificial intelligence devices' described in this specification include mobile phones, smart phones, laptop computers, artificial intelligence devices for digital broadcasting, personal digital assistants (PDAs), portable multimedia players (PMPs), navigation devices, slate PCs, tablet PCs, ultrabooks, and wearable devices (e.g., a watch-type artificial intelligence device (smartwatch), a glass-type artificial intelligence device (smart glass), and a head mounted display (HMD)).
- artificial intelligence devices may also be applied to fixed artificial intelligence devices such as smart TVs, desktop computers, digital signage, refrigerators, washing machines, air conditioners, and dishwashers.
- artificial intelligence devices can also be applied to fixed or movable robots.
- an artificial intelligence device can perform the function of a voice agent (or speech agent).
- a voice agent may be a program that recognizes the user's voice and outputs a response appropriate for the recognized user's voice as a voice.
- FIG. 1 is a diagram for explaining a voice service system according to an embodiment of the present invention.
- the voice service may include at least one of voice recognition and voice synthesis services.
- the speech recognition and synthesis process may include converting the speaker's (or user's) voice data into text data, analyzing the speaker's intention based on the converted text data, converting the text data corresponding to the analyzed intention into synthesized voice data, and outputting the converted synthesized voice data.
- a voice service system, as shown in Figure 1, can be used.
- the voice service system may include an artificial intelligence device 10, a Speech To Text (STT) server 20, a Natural Language Processing (NLP) server 30, and a voice synthesis server 40.
- a plurality of AI agent servers 50-1 to 50-3 communicate with the NLP server 30 and may be included in the voice service system.
- the STT server 20, NLP server 30, and voice synthesis server 40 may exist as separate servers as shown, or may be included in one server 200.
- a plurality of AI agent servers 50-1 to 50-3 may also exist as separate servers or may be included in one server.
- the artificial intelligence device 10 may transmit a voice signal corresponding to the speaker's voice received through the microphone 122 of FIG. 2 to the STT server 20.
- the STT server 20 can convert voice data received from the artificial intelligence device 10 into text data.
- the STT server 20 can increase the accuracy of voice-to-text conversion by using a language model.
- a language model can refer to a model that can calculate the probability of a sentence or the probability of the next word appearing given the previous words.
- the language model may include probabilistic language models such as Unigram model, Bigram model, N-gram model, etc.
- the unigram model is a model that assumes that the usage of all words is completely independent of each other, and calculates the probability of a word string as the product of the probability of each word.
- the bigram model is a model that assumes that the use of a word depends only on the previous word.
- the N-gram model is a model that assumes that the usage of a word depends on the previous (n-1) words.
- the STT server 20 can use the language model to determine whether text data converted from voice data has been appropriately converted, and through this, the accuracy of conversion to text data can be increased.
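As an illustration of how a bigram model could help choose between candidate transcripts, the following Python sketch scores each candidate as a product of conditional word probabilities. The training corpus, the `<s>` start marker, and the add-one smoothing are assumptions made for this example; the text does not say which corpus or smoothing the STT server 20 uses.

```python
from collections import Counter

# Hypothetical toy training corpus.
CORPUS = ["turn on the tv", "turn off the tv", "turn up the volume"]


def train_bigram(sentences):
    """Count unigrams and bigrams, with <s> marking the sentence start."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split()
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def sentence_probability(sentence, unigrams, bigrams):
    """Product of add-one-smoothed P(word | previous word) over the sentence."""
    vocab_size = len(unigrams)
    tokens = ["<s>"] + sentence.split()
    prob = 1.0
    for prev, word in zip(tokens, tokens[1:]):
        prob *= (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)
    return prob


def pick_best_transcript(candidates, unigrams, bigrams):
    """Return the candidate transcript the bigram model finds most probable."""
    return max(candidates, key=lambda c: sentence_probability(c, unigrams, bigrams))
```

With this corpus, `pick_best_transcript(["turn on the tea", "turn on the tv"], *train_bigram(CORPUS))` prefers "turn on the tv", because the pair ("the", "tv") was observed in training and ("the", "tea") was not.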
- the NLP server 30 may receive text data from the STT server 20.
- the STT server 20 may be included in the NLP server 30.
- the NLP server 30 may perform intent analysis on text data based on the received text data.
- the NLP server 30 may transmit intention analysis information indicating the result of intention analysis to the artificial intelligence device 10.
- the NLP server 30 may transmit intention analysis information to the voice synthesis server 40.
- the voice synthesis server 40 may generate a synthesized voice based on intent analysis information and transmit the generated synthesized voice to the artificial intelligence device 10.
- the NLP server 30 may generate intention analysis information by sequentially performing a morpheme analysis step, a syntax analysis step, a speech act analysis step, and a dialogue processing step on text data.
- the morpheme analysis step classifies text data corresponding to the voice uttered by the user into morphemes, which are the smallest units of meaning, and determines the part of speech of each classified morpheme.
- the syntax analysis step uses the results of the morpheme analysis step to divide text data into noun phrases, verb phrases, adjective phrases, etc., and determines what relationship exists between the divided phrases.
- through the syntax analysis step, the subject, object, and modifiers of the voice uttered by the user can be determined.
- the speech act analysis step analyzes the intention of the voice uttered by the user using the results of the syntax analysis step. Specifically, it determines the intent of the sentence, such as whether the user is asking a question, making a request, or simply expressing an emotion.
- the dialogue processing step uses the results of the speech act analysis step to determine whether to answer the user's utterance, respond to it, or ask a question to obtain additional information.
- the NLP server 30 may generate intention analysis information including one or more of an answer to the intention uttered by the user, a response, and an inquiry for additional information.
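The four sequential steps can be sketched as a chain of small Python functions. This is a deliberately simplified toy, not the NLP server 30's actual processing: the part-of-speech lexicon, the whitespace tokenization used in place of real morpheme analysis, and the rules for classifying the speech act are all hypothetical.

```python
# Hypothetical toy lexicon mapping tokens to part-of-speech tags.
POS_LEXICON = {
    "what": "WH", "is": "VERB", "the": "DET", "weather": "NOUN",
    "play": "VERB", "music": "NOUN",
}


def morpheme_analysis(text):
    """Split text into tokens and tag each one (whitespace split as a stand-in)."""
    tokens = text.lower().strip("?!. ").split()
    return [(token, POS_LEXICON.get(token, "UNK")) for token in tokens]


def syntax_analysis(tagged):
    """Group the tagged tokens by their role in the sentence."""
    return {
        "wh": [t for t, pos in tagged if pos == "WH"],
        "verbs": [t for t, pos in tagged if pos == "VERB"],
        "nouns": [t for t, pos in tagged if pos == "NOUN"],
    }


def speech_act_analysis(parse):
    """Classify the utterance as a question, a request, or a statement."""
    if parse["wh"]:
        return "question"
    if parse["verbs"]:
        return "request"
    return "statement"


def dialogue_processing(act, parse):
    """Decide whether to answer, respond, or ask for additional information."""
    if act in ("question", "request") and not parse["nouns"]:
        return "ask_for_more_information"
    return {"question": "answer", "request": "respond"}.get(act, "acknowledge")


def analyze_intent(text):
    """Run the four steps in order and collect the result."""
    parse = syntax_analysis(morpheme_analysis(text))
    act = speech_act_analysis(parse)
    return {"speech_act": act, "next_step": dialogue_processing(act, parse),
            "objects": parse["nouns"]}
```

Here `analyze_intent("What is the weather?")` is classified as a question to be answered, while a bare `"Play"` has no object and so leads to a request for additional information.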
- the NLP server 30 may transmit a search request to a search server (not shown) and receive search information corresponding to the search request in order to search for information that matches the user's utterance intention.
- the search information may include information about the searched content.
- the NLP server 30 transmits search information to the artificial intelligence device 10, and the artificial intelligence device 10 can output the search information.
- the NLP server 30 may receive text data from the artificial intelligence device 10. For example, if the artificial intelligence device 10 supports a voice-to-text conversion function, the artificial intelligence device 10 converts voice data into text data and transmits the converted text data to the NLP server 30.
- the voice synthesis server 40 can generate a synthesized voice by combining pre-stored voice data.
- the voice synthesis server 40 can record the voice of a person selected as a model and divide the recorded voice into syllables or words.
- the voice synthesis server 40 can store the segmented voice in units of syllables or words in an internal or external database.
- the voice synthesis server 40 may search for syllables or words corresponding to given text data from a database, synthesize a combination of the searched syllables or words, and generate a synthesized voice.
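The lookup-and-combine approach described above can be sketched in Python as follows. The unit database, the sample values, and the silent gap inserted between words are hypothetical; real recorded units would be audio waveforms rather than short lists of numbers.

```python
# Hypothetical database of pre-recorded units: word -> audio samples.
UNIT_DATABASE = {
    "hello": [0.1, 0.2],
    "world": [0.3, 0.4, 0.5],
}


def synthesize(text, database, gap=1):
    """Concatenate the stored unit for each word, separated by `gap` silent samples."""
    samples = []
    for word in text.lower().split():
        if word not in database:
            raise KeyError(f"no recorded unit for {word!r}")
        if samples:
            samples.extend([0.0] * gap)  # short silence between units
        samples.extend(database[word])
    return samples
```

Raising an error for a missing unit is one possible design choice; a production system would more likely fall back to smaller units such as syllables, as the text suggests.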
- the voice synthesis server 40 may store a plurality of voice language groups corresponding to each of a plurality of languages.
- the voice synthesis server 40 may include a first voice language group recorded in Korean and a second voice language group recorded in English.
- the voice synthesis server 40 may translate text data in the first language into text in the second language and generate a synthesized voice corresponding to the translated text in the second language using the second voice language group.
- the voice synthesis server 40 can transmit the generated synthesized voice to the artificial intelligence device 10.
- the voice synthesis server 40 may receive analysis information from the NLP server 30.
- the analysis information may include information analyzing the intention of the voice uttered by the user.
- the voice synthesis server 40 may generate a synthesized voice that reflects the user's intention based on the analysis information.
- At least two of the STT server 20, NLP server 30, and voice synthesis server 40 may be implemented as one server.
- Each function of the STT server 20, NLP server 30, and voice synthesis server 40 described above may also be performed by the artificial intelligence device 10.
- the artificial intelligence device 10 may include one or more processors.
- Each of the plurality of AI agent servers 50-1 to 50-3 may transmit search information to the NLP server 30 or the artificial intelligence device 10 according to a request from the NLP server 30.
- the NLP server 30 transmits the content search request to one or more of the plurality of AI agent servers 50-1 to 50-3, and content search results can be received from the corresponding server.
- the NLP server 30 may transmit the received search results to the artificial intelligence device 10.
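A minimal Python sketch of forwarding one search request to several agent servers and collecting the results is shown below. The agent servers are modelled as plain callables, and the `dispatch_search` function and result format are hypothetical; the text does not specify the request protocol.

```python
def dispatch_search(query, agents, targets=None):
    """Send the query to the selected agents and tag each result with its source."""
    selected = agents if targets is None else {name: agents[name] for name in targets}
    results = []
    for name, search in selected.items():
        for item in search(query):
            results.append({"agent": name, "item": item})
    return results
```

Passing `targets` restricts the request to a subset of agents, which corresponds to sending the request to "one or more" of the servers rather than all of them.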
- Figure 2 is a block diagram for explaining the configuration of an artificial intelligence device 10 according to an embodiment of the present disclosure.
- the artificial intelligence device 10 may include a communication unit 110, an input unit 120, a learning processor 130, a sensing unit 140, an output unit 150, a memory 170, and a processor 180.
- the communication unit 110 can transmit/receive data to/from external devices using wired/wireless communication technology.
- the communication unit 110 may transmit/receive sensor information, user input, learning models, control signals, etc. with external devices.
- communication technologies used by the communication unit 110 include GSM (Global System for Mobile communication), CDMA (Code Division Multiple Access), LTE (Long Term Evolution), LTE-A (advanced), 5G, WLAN (Wireless LAN), Wi-Fi (Wireless-Fidelity), Bluetooth™, RFID (Radio Frequency Identification), IrDA (Infrared Data Association), ZigBee, and NFC (Near Field Communication).
- the input unit 120 can acquire various types of data.
- the input unit 120 may include a camera for inputting video signals, a microphone for receiving audio signals, and a user input unit for receiving information from a user.
- the camera or microphone may be treated as a sensor, and the signal obtained from the camera or microphone may be referred to as sensing data or sensor information.
- the input unit 120 may acquire training data for model learning and input data to be used when obtaining an output using the learning model.
- the input unit 120 may acquire unprocessed input data, and in this case, the processor 180 or the learning processor 130 may extract input features by preprocessing the input data.
- the input unit 120 may include a camera 121 for inputting video signals, a microphone 122 for receiving audio signals, and a user input unit 123 for receiving information from the user.
- Voice data or image data collected by the input unit 120 may be analyzed and processed as a user's control command.
- the input unit 120 is for inputting image information (or signal), audio information (or signal), data, or information input from the user. To input image information, the artificial intelligence device 10 may be provided with one or more cameras 121.
- the camera 121 processes image frames, such as still images or moving images, obtained by an image sensor in video call mode or shooting mode.
- the processed image frame may be displayed on the display unit 151 or stored in the memory 170.
- the microphone 122 processes external acoustic signals into electrical voice data.
- Processed voice data can be used in various ways depending on the function (or application program being executed) being performed by the artificial intelligence device 10. Meanwhile, various noise removal algorithms may be applied to the microphone 122 to remove noise generated in the process of receiving an external acoustic signal.
- the user input unit 123 is for receiving information from the user.
- the processor 180 can control the operation of the artificial intelligence device 10 to correspond to the input information.
- the user input unit 123 may include a mechanical input means (or mechanical key, such as a button located on the front, rear, or side of the artificial intelligence device 10, a dome switch, a jog wheel, a jog switch, etc.) and a touch input means.
- the touch input means may consist of a virtual key, soft key, or visual key displayed on the touch screen through software processing, or of a touch key placed on a part other than the touch screen.
- the learning processor 130 can train a model composed of an artificial neural network using training data.
- the learned artificial neural network may be referred to as a learning model.
- a learning model can be used to infer a result value for new input data other than learning data, and the inferred value can be used as the basis for a decision to perform an operation.
- the learning processor 130 may include memory integrated or implemented in the artificial intelligence device 10. Alternatively, the learning processor 130 may be implemented using the memory 170, an external memory directly coupled to the artificial intelligence device 10, or a memory maintained in an external device.
- the sensing unit 140 may use various sensors to obtain at least one of internal information of the artificial intelligence device 10, information about the surrounding environment of the artificial intelligence device 10, and user information.
- the sensors included in the sensing unit 140 include a proximity sensor, illuminance sensor, acceleration sensor, magnetic sensor, gyro sensor, inertial sensor, RGB sensor, IR sensor, fingerprint recognition sensor, ultrasonic sensor, light sensor, microphone, lidar, radar, etc.
- the output unit 150 may generate output related to vision, hearing, or tactile sensation.
- the output unit 150 may include at least one of a display unit 151, a sound output unit 152, a haptic module 153, and an optical output unit 154.
- the display unit 151 displays (outputs) information processed by the artificial intelligence device 10.
- the display unit 151 may display execution screen information of an application running on the artificial intelligence device 10, or UI (User Interface) and GUI (Graphic User Interface) information according to such execution screen information.
- the display unit 151 can implement a touch screen by forming a layered structure or being integrated with the touch sensor.
- This touch screen functions as the user input unit 123 that provides an input interface between the artificial intelligence device 10 and the user, and can simultaneously provide an output interface between the artificial intelligence device 10 and the user.
- the sound output unit 152 may output audio data received from the communication unit 110 or stored in the memory 170 in call signal reception mode, call mode, recording mode, voice recognition mode, broadcast reception mode, etc.
- the sound output unit 152 may include at least one of a receiver, a speaker, and a buzzer.
- the haptic module 153 generates various tactile effects that the user can feel.
- a representative example of a tactile effect generated by the haptic module 153 may be vibration.
- the optical output unit 154 uses light from the light source of the artificial intelligence device 10 to output a signal to notify the occurrence of an event. Examples of events that occur in the artificial intelligence device 10 may include receiving a message, receiving a call signal, missed call, alarm, schedule notification, receiving email, receiving information through an application, etc.
- the memory 170 can store data supporting various functions of the artificial intelligence device 10.
- the memory 170 may store input data, learning data, learning models, learning history, etc. obtained from the input unit 120.
- the processor 180 may determine at least one executable operation of the artificial intelligence device 10 based on information determined or generated using a data analysis algorithm or a machine learning algorithm. And the processor 180 can control the components of the artificial intelligence device 10 to perform the determined operation.
- the processor 180 may request, retrieve, receive, or utilize data from the learning processor 130 or the memory 170, and may control the components of the artificial intelligence device 10 to execute an operation that is predicted or determined to be desirable among the at least one executable operation.
- the processor 180 may generate a control signal to control the external device and transmit the generated control signal to the external device.
- the processor 180 may obtain intent information regarding user input and determine the user's request based on the obtained intent information.
- the processor 180 may obtain intent information corresponding to the user input by using at least one of an STT engine (410 in FIG. 5) for converting voice input into a character string or an NLP engine (430 in FIG. 5) for acquiring intent information of natural language.
- At least one of the STT engine (410 in FIG. 5) or the NLP engine (430 in FIG. 5) may be configured at least in part with an artificial neural network trained according to a machine learning algorithm. At least one of the STT engine (410 in FIG. 5) or the NLP engine (430 in FIG. 5) may be trained by the learning processor 130, by the learning processor 240 of the AI server 200, or through distributed processing.
- the processor 180 may collect history information including the user's feedback on the operation of the artificial intelligence device 10 and store it in the memory 170 or the learning processor 130, or transmit it to an external device such as the AI server 200. The collected history information can be used to update the learning model.
- the processor 180 may control at least some of the components of the artificial intelligence device 10 to run an application program stored in the memory 170. Furthermore, the processor 180 may operate two or more of the components included in the artificial intelligence device 10 in combination with each other in order to run the application program.
- Figure 3 is a block diagram for explaining the configuration of the voice service server 200 according to an embodiment of the present invention.
- the voice service server 200 may include one or more of the STT server 20, NLP server 30, and voice synthesis server 40 shown in FIG. 1.
- the voice service server 200 may be referred to as a server system.
- the voice service server 200 may include a preprocessor 220, a controller 230, a communication unit 270, and a database 290.
- the preprocessing unit 220 may preprocess the voice received through the communication unit 270 or the voice stored in the database 290.
- the preprocessing unit 220 may be implemented as a separate chip from the controller 230 or may be implemented as a chip included in the controller 230.
- the preprocessor 220 may receive a voice signal (uttered by a user) and filter noise signals from the voice signal before converting the received voice signal into text data.
- if the preprocessor 220 is provided in the artificial intelligence device 10, it can recognize a startup word to activate voice recognition of the artificial intelligence device 10.
- the preprocessor 220 converts the startup word received through the microphone 121 into text data, and if the converted text data corresponds to a pre-stored startup word, it may determine that the startup word has been recognized.
- the preprocessor 220 may convert the noise-removed voice signal into a power spectrum.
- the power spectrum may be a parameter that indicates which frequency components and at what magnitude are included in the temporally varying waveform of a voice signal.
- the power spectrum shows the distribution of squared amplitude values according to the frequency of the waveform of the voice signal.
- FIG. 4 is a diagram illustrating an example of converting a voice signal 410 into a power spectrum 430 according to an embodiment of the present invention.
- the voice signal 410 may be received from an external device or may be a signal previously stored in the memory 170.
- the x-axis of the voice signal 410 may represent time, and the y-axis may represent amplitude.
- the power spectrum processor 225 can convert the voice signal 410, where the x-axis is the time axis, into a power spectrum 430, where the x-axis is the frequency axis.
- the power spectrum processor 225 may convert the voice signal 410 into a power spectrum 430 using Fast Fourier Transform (FFT).
- the x-axis of the power spectrum 430 represents frequency, and the y-axis represents the square value of amplitude.
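The conversion from a time-domain voice signal to a power spectrum can be sketched as follows. This is a minimal illustration, not the implementation of the power spectrum processor 225: it assumes a sample count that is a power of two, uses a textbook recursive radix-2 FFT, and the names `fft`, `power_spectrum`, and the 250 Hz test tone are hypothetical.

```python
import cmath
import math


def fft(samples):
    """Recursive radix-2 Cooley-Tukey FFT; len(samples) must be a power of two."""
    n = len(samples)
    if n == 1:
        return [complex(samples[0])]
    even = fft(samples[0::2])
    odd = fft(samples[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def power_spectrum(samples, sample_rate):
    """Return (frequency_hz, squared_amplitude) pairs for non-negative frequencies."""
    n = len(samples)
    spectrum = fft(samples)
    return [(k * sample_rate / n, abs(spectrum[k]) ** 2) for k in range(n // 2 + 1)]


# Hypothetical input: a 250 Hz sine sampled at 8 kHz (64 samples).
SAMPLE_RATE = 8000
signal = [math.sin(2 * math.pi * 250 * t / SAMPLE_RATE) for t in range(64)]

# x-axis: frequency, y-axis: squared amplitude, as in the power spectrum 430.
peak_freq, peak_power = max(power_spectrum(signal, SAMPLE_RATE), key=lambda p: p[1])
```

With these assumed inputs the strongest component lies in the 250 Hz bin, which is the frequency of the test tone.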
- the functions of the preprocessor 220 and the controller 230 described in FIG. 3 can also be performed by the NLP server 30.
- the preprocessor 220 may include a wave processor 221, a frequency processor 223, a power spectrum processor 225, and an STT converter 227.
- the wave processing unit 221 can extract the waveform of the voice.
- the frequency processing unit 223 can extract the frequency band of the voice.
- the power spectrum processing unit 225 can extract the power spectrum of the voice.
- the power spectrum may be a parameter that indicates which frequency components and at what size are included in the waveform.
- the STT converter 227 can convert voice into text.
- the STT conversion unit 227 can convert voice in a specific language into text in that language.
- the controller 230 can control the overall operation of the voice service server 200.
- the controller 230 may include a voice analysis unit 231, a text analysis unit 232, a feature clustering unit 233, a text mapping unit 234, and a voice synthesis unit 235.
- the voice analysis unit 231 may extract voice characteristic information using one or more of the voice waveform, voice frequency band, and voice power spectrum preprocessed in the preprocessor 220.
- the voice characteristic information may include one or more of the speaker's gender information, the speaker's voice (or tone), the pitch of the sound, the speaker's speaking style, the speaker's speech speed, and the speaker's emotion.
- the voice characteristic information may further include the speaker's timbre.
- the text analysis unit 232 may extract key expressions from the text converted by the STT conversion unit 227.
- when the text analysis unit 232 detects a change in tone between phrases from the converted text, it can extract the phrase with a different tone as the main expression phrase.
- the text analysis unit 232 may determine that the tone has changed when the frequency band between the phrases changes more than a preset band.
- the text analysis unit 232 may extract key words from phrases in the converted text.
- a key word may be a noun that exists within a phrase, but this is only an example.
- the feature clustering unit 233 can classify the speaker's speech type using the voice characteristic information extracted from the voice analysis unit 231.
- the feature clustering unit 233 may classify the speaker's utterance type by assigning weight to each type item constituting the voice characteristic information.
- the feature clustering unit 233 can classify the speaker's utterance type using the attention technique of a deep learning model.
- the text mapping unit 234 may translate the text converted into the first language into the text of the second language.
- the text mapping unit 234 may map the text translated into the second language with the text of the first language.
- the text mapping unit 234 can map key expressions constituting the text in the first language to corresponding phrases in the second language.
- the text mapping unit 234 may map the utterance type corresponding to the main expression phrases constituting the text of the first language to phrases of the second language. This is to apply the classified utterance type to the phrases of the second language.
- the voice synthesis unit 235 can generate a synthesized voice by applying the utterance type and the speaker's tone classified by the feature clustering unit 233 to the main expressions of the text translated into the second language in the text mapping unit 234.
- the controller 230 may determine the user's speech characteristics using one or more of the delivered text data or the power spectrum 430.
- the user's speech characteristics may include the user's gender, the user's pitch, the user's tone, the user's speech topic, the user's speech speed, and the user's voice volume.
- the controller 230 may use the power spectrum 430 to obtain the frequency of the voice signal 410 and the amplitude corresponding to the frequency.
- the controller 230 can determine the gender of the user who uttered the voice using the frequency band of the power spectrum 430.
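- the controller 230 may determine the user's gender as male when the frequency band of the power spectrum 430 falls within a preset first frequency band range.
- the controller 230 may determine the user's gender as female when the frequency band of the power spectrum 430 falls within a preset second frequency band range.
- the second frequency band range may be larger than the first frequency band range.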
- the controller 230 can determine the pitch of the voice using the frequency band of the power spectrum 430.
- the controller 230 may determine the pitch of the sound according to the size of the amplitude within a specific frequency band.
- the controller 230 may determine the user's tone using the frequency band of the power spectrum 430. For example, the controller 230 may determine a frequency band with an amplitude greater than a certain level among the frequency bands of the power spectrum 430 as the user's main sound range, and determine the determined main sound range as the user's tone.
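As a rough illustration of the main sound range described above, the sketch below keeps the frequency bands whose amplitude exceeds a given level and reports their lower and upper bounds. The function name `main_sound_range`, the threshold, and the sample spectrum are assumptions made for this example only.

```python
def main_sound_range(spectrum, min_amplitude):
    """Return (lowest_hz, highest_hz) among bands whose amplitude exceeds min_amplitude."""
    loud = [freq for freq, amplitude in spectrum if amplitude > min_amplitude]
    if not loud:
        return None  # no band is strong enough to define a main sound range
    return min(loud), max(loud)


# Hypothetical (frequency_hz, amplitude) pairs, for illustration only.
spectrum = [(100, 0.1), (150, 0.8), (200, 0.9), (250, 0.7), (300, 0.2)]
tone_range = main_sound_range(spectrum, min_amplitude=0.5)
```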
- the controller 230 may determine the user's speech rate based on the number of syllables uttered per unit time from the converted text data.
- the controller 230 can determine the topic of the user's speech using the Bag-Of-Word Model technique for the converted text data.
- the Bag-Of-Word Model technique is a technique to extract frequently used words based on the frequency of words in a sentence.
- the Bag-Of-Word Model technique is a technique that extracts unique words within a sentence and expresses the frequency of each extracted word as a vector to determine the characteristics of the topic of speech.
- for example, the controller 230 may classify the topic of the user's speech as exercise.
- the controller 230 may determine the topic of the user's speech from text data using a known text categorization technique.
- the controller 230 may extract keywords from text data and determine the topic of the user's speech.
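The Bag-Of-Word Model technique described above can be sketched in a few lines. This is only an illustration under simple assumptions (English text, whitespace-like word boundaries); the transcript and the helper names `bag_of_words` and `to_vector` are hypothetical.

```python
import re
from collections import Counter


def bag_of_words(text):
    """Map each unique lowercase word in the text to its frequency."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def to_vector(counts, vocabulary):
    """Express word frequencies as a vector over a fixed vocabulary order."""
    return [counts[word] for word in vocabulary]


# Hypothetical transcript, for illustration only.
transcript = "I went running today and running made my legs tired"
counts = bag_of_words(transcript)
vocabulary = sorted(counts)          # unique words in a stable order
vector = to_vector(counts, vocabulary)
top_word, top_count = counts.most_common(1)[0]
```

A topic classifier could then compare such vectors, or the most frequent words, against per-topic vocabularies; that step is outside this sketch.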
- the controller 230 may determine the user's voice volume by considering amplitude information in the entire frequency band.
- the controller 230 may determine the user's voice quality based on the average or weighted average of the amplitude in each frequency band of the power spectrum.
- the communication unit 270 may communicate with an external server by wire or wirelessly.
- the database 290 may store the voice of the first language included in the content.
- the database 290 may store a synthesized voice in which the voice of the first language is converted into the voice of the second language.
- the database 290 may store a first text corresponding to a voice in the first language and a second text in which the first text is translated into the second language.
- the database 290 may store various learning models required for voice recognition.
- the processor 180 of the artificial intelligence device 10 shown in FIG. 2 may include the preprocessor 220 and the controller 230 shown in FIG. 3.
- the processor 180 of the artificial intelligence device 10 may perform the functions of the preprocessor 220 and the controller 230.
- FIG. 5 is a block diagram illustrating the configuration of a processor for voice recognition and synthesis of the artificial intelligence device 10, according to an embodiment of the present invention.
- the voice recognition and synthesis process of FIG. 5 may be performed by the learning processor 130 or processor 180 of the artificial intelligence device 10 without going through a server.
- the processor 180 of the artificial intelligence device 10 may include an STT engine 510, an NLP engine 530, and a voice synthesis engine 550.
- Each engine can be either hardware or software.
- the STT engine 510 may perform the function of the STT server 20 of FIG. 1. That is, the STT engine 510 can convert voice data into text data.
- the NLP engine 530 may perform the functions of the NLP server 30 of FIG. 1. That is, the NLP engine 530 can obtain intention analysis information indicating the speaker's intention from the converted text data.
- the voice synthesis engine 550 may perform the function of the voice synthesis server 40 of FIG. 1.
- the speech synthesis engine 550 may search a database for syllables or words corresponding to given text data, synthesize a combination of the searched syllables or words, and generate a synthesized voice.
- the voice synthesis engine 550 may include a preprocessing engine 551 and a TTS engine 553.
- the preprocessing engine 551 may preprocess text data before generating synthetic speech.
- the preprocessing engine 551 performs tokenization by dividing text data into tokens, which are meaningful units.
- the preprocessing engine 551 may perform a cleansing operation to remove unnecessary characters and symbols to remove noise.
- the preprocessing engine 551 can generate the same word token by integrating word tokens with different expression methods.
- the preprocessing engine 551 may remove meaningless word tokens (stopwords).
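The four preprocessing steps above (tokenization, cleansing, integration of differently written tokens, and stopword removal) can be sketched as a small pipeline. The normalization table and stopword list below are hypothetical placeholders, and the sketch assumes English text split on whitespace.

```python
import re

# Hypothetical normalization table and stopword list, for illustration only.
NORMALIZE = {"colour": "color", "centre": "center"}
STOPWORDS = {"the", "a", "an", "is", "of"}


def preprocess(text):
    """Tokenize, cleanse, normalize, and drop stopwords from input text."""
    tokens = text.lower().split()                                 # tokenization
    tokens = [re.sub(r"[^a-z0-9]", "", tok) for tok in tokens]    # cleansing
    tokens = [NORMALIZE.get(tok, tok) for tok in tokens if tok]   # integrate variants
    return [tok for tok in tokens if tok not in STOPWORDS]        # remove stopwords


tokens = preprocess("The colour of the sky is... blue!!")
```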
- the TTS engine 553 can synthesize speech corresponding to preprocessed text data and generate synthesized speech.
- FIG. 6 is a diagram illustrating the landscape mode and portrait mode of the stand-type artificial intelligence device 10 according to an embodiment of the present disclosure.
- a stand type artificial intelligence device 10 is shown.
- a shaft 603 and a stand base 605 may be connected to the artificial intelligence device 10.
- the shaft 603 can connect the artificial intelligence device 10 and the stand base 605.
- the shaft 603 may extend vertically.
- the lower end of the shaft 603 may be connected to the edge of the stand base 605.
- the lower end of the shaft 603 may be rotatably connected to the circumference of the stand base 605.
- the artificial intelligence device 10 and the shaft 603 can rotate about a vertical axis with respect to the stand base 605.
- the upper part of the shaft 603 may be connected to the rear of the artificial intelligence device 10.
- the stand base 605 may serve to support the artificial intelligence device 10.
- the artificial intelligence device 10 may be configured to include a shaft 603 and a stand base 605.
- the artificial intelligence device 10 can rotate around the point where the top of the shaft 603 and the rear of the display 151 meet.
- Figure 6(a) shows the display 151 operating in landscape mode, a position in which the horizontal length is greater than the vertical length.
- Figure 6(b) shows the display 151 operating in portrait mode, a position in which the vertical length is greater than the horizontal length.
- the stand-type artificial intelligence device 10 has improved mobility and the user is not limited by its placement location.
- the NLP server 30 receives, for example, text data for user input from the STT server 20, and can generate intention analysis result information (for convenience, referred to as 'first intention analysis result information') by sequentially performing a morpheme analysis step, a syntax analysis step, a dialogue act analysis step, and a dialogue processing step on the received text data.
- the artificial intelligence device 10 may perform a corresponding operation.
- the corresponding operation may include configuring and providing information (or recommended information) based on the first intent analysis result information, performing a function (or recommended function), etc.
- the artificial intelligence device 10 may derive first intention analysis result information and then derive second intention analysis result information based on the above-described intention analysis factor. Meanwhile, the artificial intelligence device 10 may receive both the first and second intention analysis result information through, for example, the NLP server 30 or the voice synthesis server 40 of FIG. 1, or may receive only the second intention analysis result information.
- a single intention analysis result information may be derived by considering the above-described intention analysis factors together in the intention analysis process.
- in the following, it is assumed as an example that both the first intention analysis result information and the second intention analysis result information are generated in the NLP server 30, and that the artificial intelligence device 10 receives, from the NLP server 30 or the voice synthesis server 40, either both the first and second intention analysis result information or only the second intention analysis result information.
- Intention analysis factors related to deriving the second intention analysis result information may include, for example, time, space, user, schedule, content, etc. Individual intention analysis factors are explained in detail in the relevant section.
- intent analysis factors may be applied individually for intent analysis, or at least two or more intent analysis factors may be applied simultaneously or sequentially for intent analysis.
- each intent analysis factor may or may not be assigned the same priority or weight.
- at least two of the intent analysis factors may be grouped and assigned and applied together for intent analysis. At this time, one intent analysis factor may belong to multiple groups.
- the number and type of intention analysis factors may be registered in advance with the artificial intelligence device 10 and the voice service server 200, or may be arbitrarily determined.
- Figure 7 is a block diagram of an artificial intelligence device 10 according to another embodiment of the present disclosure.
- FIG. 8 is an example of a detailed block diagram of the processor 720 of FIG. 7.
- the voice service server 200 may include the STT server 20 and the NLP server 30 shown in FIG. 1, and may even include a voice synthesis server 40 depending on the embodiment.
- when the 'voice service server 200' is described, it may indicate the NLP server 30, or may mean that it further includes at least one of the STT server 20 and the voice synthesis server 40. However, it is not limited to this.
- some of the functions of the voice service server 200 may be performed by the artificial intelligence device 10.
- the artificial intelligence device 10 may be configured to include a display 150 or 151 and a processing unit 700.
- the processing unit 700 may include a memory 710 and a processor 720.
- the processing unit 700 can be connected to the voice service server 200 in various ways to exchange data.
- the memory 710 may store various data, such as data received or processed by the processing unit 700.
- the memory 710 may store intention analysis result information processed by the processing unit 700 or received from the voice service server 200.
- the memory 710 is controlled by the processing unit 700 or the processor 720 and can store corresponding action information related to the stored intention analysis result information, which can be provided to the user through the display 150 or 151.
- the processor 720 may include a voice data reception module 810, a result reception module 820, and a corresponding operation module.
- the corresponding operation module may include an information generation module 830 and a function generation module 840.
- the present disclosure is not limited to this.
- the voice data receiving module 810 can receive a user's input, that is, a voice input (but is not limited to this), and can transmit the received user's voice input to the voice service server 200.
- the voice data receiving module 810 may receive a user's input (eg, text data) rather than a voice input and transmit it to the voice service server 200 as described above.
- the result receiving module 820 may receive the intent analysis result corresponding to the user's voice input transmitted from the voice service server 200 through the voice data receiving module 810.
- the processor 720 may determine a corresponding action based on the result of parsing the intent analysis result information received through the result receiving module 820. If the determined corresponding action is related to providing information (or recommended information), the information generation module 830 may operate. If the determined corresponding action is related to performing a function (or recommended function), the function generation module 840 may operate.
- the voice data receiving module 810 transmits voice data to the server 200, but this operation may instead be performed by another module (for example, the result receiving module 820, etc.).
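The routing between the two corresponding operation modules can be sketched as a simple dispatch. The JSON layout of the intent analysis result, the action names, and the handler functions below are assumptions made for illustration; the disclosure does not specify a format.

```python
import json


def generate_information(payload):
    """Stand-in for the information generation module 830."""
    return f"INFO: {payload}"


def perform_function(payload):
    """Stand-in for the function generation module 840."""
    return f"FUNCTION: {payload}"


# Hypothetical mapping from action type to the module that handles it.
HANDLERS = {
    "provide_information": generate_information,
    "perform_function": perform_function,
}


def handle_intent_result(raw_result):
    """Parse intent analysis result information and route it to a module."""
    result = json.loads(raw_result)
    handler = HANDLERS.get(result.get("action"))
    if handler is None:
        raise ValueError(f"unsupported action: {result.get('action')!r}")
    return handler(result.get("payload", ""))


reply = handle_intent_result('{"action": "provide_information", "payload": "weather"}')
```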
- the voice data reception module 810 is described as transmitting the user input to the voice service server 200 without any additional processing.
- however, after the user input is processed by the STT engine 510 and the NLP engine 530, the processed data may be transmitted to the voice service server 200. Alternatively, intention analysis result information may be derived in the artificial intelligence device 10 based on the processed data, and only the user input and the derived intention analysis result information may be transmitted to the server 200.
- the intent analysis result information represents, for example, the above-described second intent analysis result information, but is not limited thereto and may include first intent analysis result information depending on the embodiment.
- the processor 720 may have the same configuration as the processor 180 of FIG. 2 described above, but may also have a separate configuration.
- FIG. 9 is a diagram illustrating a user input processing method of a voice service system according to an embodiment of the present disclosure.
- the artificial intelligence device 10 can receive user input (S101).
- user input refers to voice input for convenience of explanation, but is not limited thereto.
- the user's input may be a text input or a combination of text input and voice input.
- the remote control device may include a remote control used by the artificial intelligence device 10.
- the remote control device may include at least one of an AI speaker, smartphone, tablet PC, wearable device, etc.
- the remote control device may be a device installed with firmware/software such as applications, programs, and API (Application Program Interface) required for data communication such as voice input with the artificial intelligence device 10.
- the remote control device may be a device registered in advance with the artificial intelligence device 10.
- the artificial intelligence device 10 may transmit the user input received in step S101 to the STT server 20 (S103).
- if the received user input is text data, the STT server 20 may transfer it to the NLP server 30 as is. Meanwhile, if the received user input is not voice data, the artificial intelligence device 10 may directly transmit the user input to the NLP server 30 rather than to the STT server 20.
- the STT server 20 may derive text data corresponding to the user input received through the artificial intelligence device 10 in step S103 (S105).
- the STT server 20 may transmit text data corresponding to the user input derived in step S105 to the NLP server 30 (S107).
- the NLP server 30 may perform an intention analysis process on the text data received from the STT server 20 through step S107 and derive intention analysis result information (S109).
- the intent analysis process may use at least one intent analysis factor among the intent analysis factors according to the present disclosure.
- the intention analysis result information may correspond to or include the second intention analysis result information.
- the NLP server 30 may return (or transmit) the intention analysis result information derived through step S109 to the artificial intelligence device 10 (S111).
- the artificial intelligence device 10 may parse the intention analysis result information according to the user input returned from the NLP server 30 through step S111 and determine a corresponding action based on the intention analysis result information (S113).
- the artificial intelligence device 10 may perform a function (or recommended function) or output information (e.g., recommended information, information about a function or recommended function, etc.) based on the corresponding action determined through step S113 (S115).
- the NLP server 30 may also transmit operation control information corresponding to the operation to be determined by the artificial intelligence device 10, that is, function performance or information output.
- the artificial intelligence device 10 may recognize this as recommended information or reference information. Accordingly, the artificial intelligence device 10 can select or modify some or all of them and use them to determine a corresponding action.
- the artificial intelligence device 10 may receive additional input from the user, for example, feedback from the user in relation to the function or output information performed through step S115 (S117).
- the artificial intelligence device 10 may transmit the user's feedback received through step S117 to the NLP server 30 (S119).
- the NLP server 30 may update and save the algorithm or artificial intelligence learning model (AI learning model) used to derive the result of the previously performed intention analysis, based on the user's feedback received from the artificial intelligence device 10 through step S119 (S121).
- the NLP server 30 may return the fact that the algorithm has been updated in step S121 to the artificial intelligence device 10, etc.
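The flow of steps S101 to S115 can be summarized with stubbed components. The sketch below is only a schematic under stated assumptions: the STT step is faked by reading a transcript carried with the input, the intent analysis is a single keyword rule, and all function and field names are hypothetical. The feedback loop of steps S117 to S121 is omitted.

```python
def stt_server(user_input):
    """S105: derive text from voice input (stubbed: the input carries its transcript)."""
    return user_input["transcript"]


def nlp_server(text):
    """S109: derive intention analysis result information (stubbed keyword rule)."""
    intent = "weather_request" if "weather" in text.lower() else "unknown"
    return {"intent": intent, "text": text}


def device_handle(user_input):
    """Device-side handling of one user input, steps S101 to S115."""
    if user_input["type"] == "voice":
        text = stt_server(user_input)          # S103 to S107: via the STT server
    else:
        text = user_input["text"]              # non-voice input goes straight to NLP
    result = nlp_server(text)                  # S109 to S111
    if result["intent"] == "weather_request":  # S113: determine corresponding action
        return "show_weather"                  # S115: output information
    return "ask_again"


action = device_handle({"type": "voice", "transcript": "Tell me the weather tomorrow"})
```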
- voice recognition based on voice input related to weather, for example, is increasingly being used.
- in the following, for convenience of explanation, the user input for intent analysis is assumed to be voice input.
- the voice input is described as relating to a weather information request as an example, but is not limited thereto.
- FIG. 10 is a diagram illustrating an operation based on intention analysis result information considering time information in an artificial intelligence device 10 according to an embodiment of the present disclosure.
- time information can represent at least one of a specific time (e.g., 9 a.m., 8 p.m., 10 p.m., etc.), a time zone (e.g., between 9 a.m. and 10 a.m., between 8 p.m. and 10 p.m., morning, evening, night, etc.), a day of the week (e.g., Monday to Friday, weekend, etc.), and the like.
- the voice recognition function is most used in the evening hours (for example, 19:00), and its use is relatively low in the early morning hours (for example, 4 o'clock).
- Table 1 shows examples of voice recognition use (main speech information) at a specific time or time zone in Korea, and Table 2 shows an example of voice recognition use at a specific time or time zone in Italy.
- in Table 1, the utterance 'tomorrow's weather' was the most frequent at 22:00, followed by 'weather', 'what's the weather like tomorrow', 'today's weather', and 'tell me the weather tomorrow'. At 01:00, the utterance 'Tell me the weather tomorrow' was the most frequent, followed by 'today's weather', 'weather', 'weekend weather', and 'tell me the weather'.
- in Table 2, the utterance 'weather forecast' was the most frequent at 22:00, followed by 'What is the weather today', 'What's the weather like tomorrow', 'Forecasts', and 'What is the weather'. At 01:00, the utterances 'What's the weather like tomorrow', 'What's the weather like today', and 'Weather forecast' were the most frequent, followed by 'Trapani weather' and 'Weekend Weather Forecast'.
- a day is set from 0:00 to 23:59:59, and a new day is set from 24:00.
- the same utterance may have different intentions based on the time of utterance.
- an utterance related to tomorrow's weather at 22:00 can naturally be analyzed as requesting tomorrow's weather.
- on the other hand, the utterance 'Tell me the weather tomorrow', which has the highest number of utterances at 01:00, may be recognized by the device as a request for the weather for the day after tomorrow relative to the previous day, not for tomorrow relative to the previous day, because the date has already changed from the device's perspective.
- however, the user is still awake and may still perceive the current time as part of the previous day rather than as a new day in absolute time. From this perspective, 'Tell me the weather tomorrow' may have been intended to mean tomorrow relative to the previous day, not the day after tomorrow relative to the previous day.
- for example, the artificial intelligence device 10 may provide the weather for January 2, 2022, whereas the speaker's intention may be to receive information about the weather for January 1, 2022.
- in this case, interpreting the utterance as a request for weather information for tomorrow relative to the previous day, that is, for the current day, rather than for tomorrow relative to 01:00, is more likely to match the user's intent. Therefore, time information (e.g., timing of utterance) must be referred to in intent analysis to match the intent of the utterance more accurately.
- if the first intention analysis result information by the NLP server 30 according to FIG. 1, that is, the weather information for tomorrow based on 01:00, is provided, it may not match the user's intention.
- therefore, it may be desirable to further consider time information (e.g., utterance time information) in response to the user's input, and to use the second intention analysis result information, which takes the time information into account in addition to the first intention analysis result information.
- time information may represent information about the speaker's utterance point, as described above.
- This time information may also represent time referenced based on at least one of statistical values, user log data, and the general notion of time held by users who use or are registered with the server 200.
- the relative time can be determined by mapping the time-related utterance content among the user's input to 24 hours, which is absolute time. For example, in Table 1, 22:00 and 01:00 are different dates in absolute time, but the artificial intelligence device 10 can recognize 22:00 and 01:00 as the same date in relative time.
- for the same voice input at 22:00 and 01:00, that is, the utterance "tomorrow's weather," the artificial intelligence device 10 or the NLP server 30 can judge that the time-related term "tomorrow" refers to the same date on a relative-time basis. In general, users may not be aware that the date has changed at the time of utterance, and even if they are, they tend to ignore the change of date until they go to sleep, which can lead to errors in intent analysis. Therefore, as in the present disclosure, analyzing and responding based on relative time rather than absolute time may be more consistent with the user's intention. Relative time does not exclude the idea of absolute time. For example, 14:00 represents the same date in either absolute or relative time. In other words, in this disclosure, the concepts of absolute time and relative time may be used together when analyzing the user's intention.
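The relative-time interpretation described above can be illustrated with a minimal sketch. The function names and the `DAY_ROLLOVER_HOUR` cutoff are hypothetical and not taken from the disclosure. The sketch assumes that an utterance made before the cutoff hour is attributed to the previous calendar day.

```python
from datetime import date, datetime, timedelta

# Assumption: utterances made before this hour are treated as belonging
# to the previous day. The value 4 is illustrative only.
DAY_ROLLOVER_HOUR = 4


def relative_today(spoken_at: datetime, rollover_hour: int = DAY_ROLLOVER_HOUR) -> date:
    """Return the date the speaker likely perceives as 'today'."""
    if spoken_at.hour < rollover_hour:
        return spoken_at.date() - timedelta(days=1)
    return spoken_at.date()


def resolve_tomorrow(spoken_at: datetime, rollover_hour: int = DAY_ROLLOVER_HOUR) -> date:
    """Resolve the word 'tomorrow' on a relative-time basis."""
    return relative_today(spoken_at, rollover_hour) + timedelta(days=1)


# 22:00 on Dec 31 and 01:00 on Jan 1 resolve 'tomorrow' to the same date.
late_evening = resolve_tomorrow(datetime(2021, 12, 31, 22, 0))
after_midnight = resolve_tomorrow(datetime(2022, 1, 1, 1, 0))
afternoon = resolve_tomorrow(datetime(2022, 1, 1, 14, 0))
```

With this cutoff, an utterance at 14:00 resolves identically under absolute and relative time, which matches the remark that relative time does not exclude absolute time.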
- when the user utters 'tomorrow's weather' at 01:00, the user may nevertheless be requesting the weather for tomorrow based on absolute time, that is, the day after tomorrow as counted from the previous day.
- because a request based on absolute time is also possible, this ambiguity can be resolved by providing both today's weather and tomorrow's weather by default.
- the artificial intelligence device 10 provides both today's weather and tomorrow's weather, but if, as a result of intention analysis, the probability of requesting today's weather exceeds the probability of requesting tomorrow's weather by more than a threshold value, the weather information provided simultaneously can be composed differentially.
- the artificial intelligence device 10 provides the weather information with the relatively high probability as full (long) information, and the weather information with the low probability as simple (short) information only. Meanwhile, the simple information may be switched to full information according to the user's selection.
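The differential composition described above might be sketched as follows. The function name, the `margin` parameter, and the fallback of showing both days at the same level when neither probability dominates are assumptions made for illustration. The disclosure does not specify them.

```python
def compose_weather_response(p_today: float, p_tomorrow: float, margin: float = 0.2) -> dict:
    """Choose a detail level ('full' or 'simple') for each day's weather.

    The day whose request probability exceeds the other by more than
    `margin` is shown in full; the other is shown in simple form.
    """
    if p_today - p_tomorrow > margin:
        return {"today": "full", "tomorrow": "simple"}
    if p_tomorrow - p_today > margin:
        return {"today": "simple", "tomorrow": "full"}
    # Assumption: when neither dominates, both are shown at the same level.
    return {"today": "full", "tomorrow": "full"}


today_dominant = compose_weather_response(0.7, 0.3)
tomorrow_dominant = compose_weather_response(0.3, 0.7)
ambiguous = compose_weather_response(0.5, 0.45)
```

A user selection that expands the simple view to the full view would then only change the detail level of the lower-probability entry.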
- the artificial intelligence device 10 can configure mapping information for related information and store it in the memory 710, as shown in Table 3 below.
- the present disclosure is not limited to the content defined in Table 3.
- the artificial intelligence device 10 provides only today's (6/11) weather information when the voice input 'How's the weather?' is received between 7:00 and 20:59, but it also provides 'What's the weather like tomorrow'?
- when the voice input 'What's the weather like' is received between 21:00 and 23:59, the weather information for today (6/11) and tomorrow (6/12) is provided together, but when a voice input saying 'what's the weather like tomorrow' is received, only the weather information for tomorrow (6/12) is provided, and between 00:00 and 6:59 a.m.
- Tables 1 and 2 may be information on the results of intent analysis without considering day of the week information.
- when weather information is requested from the artificial intelligence device 10 through voice input on Monday, the intent can be analyzed as a request for weekday weather information for that week. If weather information is requested on Tuesday or Wednesday (Thursday may also be included), it can be analyzed as a request for today's or tomorrow's weather information by referring to the above-mentioned Tables 1 and 2. If weather information is requested on Friday, the intent can be analyzed as a request for weekend weather or weather information from that day through the weekend.
- this is only an example, and the present disclosure is not limited thereto.
- the artificial intelligence device 10 may combine the time information in Tables 1 and 2 and the above-described day information to generate mapping information as shown in Table 3, and use it to analyze the intention of the user's input.
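The day-of-week example above can be sketched as a small lookup. The scope labels and the function name are hypothetical. The handling of Saturday and Sunday is an assumption, since the text gives no rule for those days.

```python
from datetime import date


def weather_scope_for_day(d: date) -> str:
    """Map the day of the request to the weather scope likely intended."""
    weekday = d.weekday()  # Monday == 0 ... Sunday == 6
    if weekday == 0:
        return "weekdays_of_this_week"
    if weekday in (1, 2, 3):
        # Tuesday and Wednesday (Thursday may also be included):
        # fall back to the time-based analysis for today or tomorrow.
        return "today_or_tomorrow"
    if weekday == 4:
        return "through_weekend"
    # Assumption: weekend days are not covered by the text.
    return "today_or_tomorrow"


monday_scope = weather_scope_for_day(date(2022, 1, 3))
tuesday_scope = weather_scope_for_day(date(2022, 1, 4))
friday_scope = weather_scope_for_day(date(2022, 1, 7))
```

In a combined mapping, the result of this lookup would be used alongside the time-of-utterance analysis rather than replacing it.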
- the information provided based on the intention analysis result information may be differentiated, as described above.
- Figures 10 (a) to (c) may be an example of a user interface for weather information
- Figure 10 (d) may be another example of a user interface for weather information.
- the artificial intelligence device 10 may provide a simple version as shown in (a) to (c) of Figure 10, or a full version as shown in (d) of Figure 10.
- the operation of the artificial intelligence device 10 based on the intention analysis result information considering user information is as follows.
- user information may mean, for example, single user/multiple users, logged in user, etc.
- the artificial intelligence device 10 can refer to whether a user is logged in as user information for intention analysis.
- the artificial intelligence device 10 can analyze log data of the logged-in user, extract user history data for intention analysis based on the analyzed log data, and ensure that the extracted user history data is reflected in intention analysis.
- User history data may include, for example, at least one of the user's recent or previous user input-intention analysis results and feedback thereon, recent content usage history, usage patterns of the artificial intelligence device 10, content, voice commands, or voice input, and usage frequency or number of uses, or may be data separately generated from these for reference in intention analysis.
- whether user information indicates a single user or multiple users may be determined based on, for example, whether multiple user inputs were entered simultaneously or sequentially within a predetermined time, whether one or multiple users are watching the artificial intelligence device 10, and whether the user's input is directed at a single user or multiple users (for example, a request to play a two-player game).
- the artificial intelligence device 10 can provide simple information such as (a) to (c) of Figure 10 when the user information is recognized as a single user, and full information such as (d) of FIG. 10 when the user information is recognized as multiple users.
- if the logged-in user matches the subject of the user input, the artificial intelligence device 10 provides full information as shown in (d) of FIG. 10 in response to the intention analysis result information. In other cases, that is, when the logged-in user and the speaker do not match, simple information such as any one of (a) to (c) of Figure 10 can be provided.
- user information may be combined with the time information and day information described above as well as at least one piece of information described later and used for intent analysis.
- scheduling information may represent the user's scheduling information that can be obtained through the user's mobile device or cloud server.
- the artificial intelligence device 10 can obtain related information by accessing the scheduling information with the user's consent so that it can be used.
- the artificial intelligence device 10 may process the input as described above in relation to Tables 1 and 2, but may also obtain more precise intention analysis result information by referring to the scheduling information. For example, suppose it is rainy today and the user has an outdoor exercise schedule for the weekend. In this case, when the user inputs 'what's the weather like?', the artificial intelligence device 10 may simply display today's or tomorrow's weather information, but the input may have been made out of concern about whether the schedule planned for the weekend, that is, the outdoor exercise schedule, can go ahead.
- the artificial intelligence device 10 acquires the user's scheduling information, extracts from it the information whose relevance to the user's input is at or above a threshold, and delivers the extracted information to the NLP server 30.
- the NLP server 30 refers to this information when analyzing the intention of the user input and, for the user's input 'How is the weather?', can decide which weather information to provide by considering not just the weather for today, tomorrow, or the week, but also the weekend weather in light of the outdoor exercise schedule.
- the scheduling information functions as only one intention analysis factor, but when combined with at least one of other intention analysis factors, the accuracy of intention analysis can be further increased.
- the scheduling information may be user scheduling information determined based on the above-described user information.
- the operation of the artificial intelligence device 10 based on the intention analysis result information considering the content information as the intention analysis factor is as follows.
- FIGS. 11 and 12 are diagrams illustrating operations based on intention analysis result information considering content information in an artificial intelligence device 10 according to an embodiment of the present disclosure.
- the content information may indicate information such as the type, attribute, and genre of the content currently being played, played immediately before, or scheduled to be played on the artificial intelligence device 10.
- the artificial intelligence device 10 is currently providing a news or weather application or information and receives a user's input.
- the artificial intelligence device 10 may transmit the user's input and at least one of content information currently being played, content information just before play, or content information scheduled to be played to the NLP server 30.
- the NLP server 30 extracts related words or corpora from the text data of the delivered user input and from the content information delivered by the artificial intelligence device 10 and, when the relevance between the two is greater than a threshold, can further refer to the content information delivered by the artificial intelligence device 10 in the intention analysis of the user input.
- the artificial intelligence device 10 provides news.
- the news is providing information about the weather.
- when a user input such as 'Show me the weather' is received by the artificial intelligence device 10, it is transmitted to the NLP server 30, and information about the news providing the weather information can also be provided as content information.
- when analyzing the intention of the user input 'Show me the weather', the NLP server 30 can perform the intention analysis by referring to the fact that news containing weather information was being played on the artificial intelligence device 10 at the time of the user input.
- the NLP server 30 may provide a corresponding operation using weather information as shown in (b) of FIG. 11, which includes the region and specific weather information, as intention analysis result information.
- as intention analysis result information, the artificial intelligence device 10 or the NLP server 30 may provide detailed weather information (e.g., full version) for the region to which the current user belongs and/or the region related to the weather information mentioned in the news.
- when the artificial intelligence device 10 is providing a drama rather than news containing weather information and a user input of 'Show me the weather' is received, the corresponding operation may differ from the above-mentioned embodiment.
- the artificial intelligence device 10 transmits the content information to the NLP server 30 along with the user input, but if the NLP server 30 determines that the correlation between the content information and the user input is below a threshold or that the two are unrelated, the content information may be disregarded in the intent analysis.
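The relevance check described above can be sketched as follows. The disclosure does not say how relevance is computed. This sketch assumes a simple Jaccard overlap between keyword sets, and the function names and the `threshold` value are hypothetical.

```python
from typing import Optional


def keyword_relevance(user_keywords: set[str], content_keywords: set[str]) -> float:
    """Jaccard overlap between two keyword sets (an assumed relevance measure)."""
    if not user_keywords or not content_keywords:
        return 0.0
    shared = user_keywords & content_keywords
    combined = user_keywords | content_keywords
    return len(shared) / len(combined)


def select_content_context(
    user_keywords: set[str],
    content_keywords: set[str],
    threshold: float = 0.2,
) -> Optional[set[str]]:
    """Return the content keywords only if they are relevant enough to the input."""
    if keyword_relevance(user_keywords, content_keywords) > threshold:
        return content_keywords
    return None  # content information is disregarded in intent analysis


utterance = {"show", "weather"}
news_context = select_content_context(utterance, {"news", "weather", "seoul"})
drama_context = select_content_context(utterance, {"drama", "romance"})
```

Here the news content shares the keyword `weather` with the utterance and is kept as context, while the drama content shares nothing and is dropped.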
- the artificial intelligence device 10 can provide information on the content currently being played along with user input.
- the artificial intelligence device 10 may transmit content information including additional information to the NLP server 30 depending on the content.
- the content in (a) of FIG. 12 is a travel program, and if travel information for a specific region (for example, Denmark) is provided in the corresponding episode, the additional information may include information about that region. If the correlation between the content information including the additional information and the user input is greater than a threshold, the artificial intelligence device 10 performs intention analysis with reference to it and, as a result of the intention analysis, can provide weather information for the relevant region as shown in (b) of FIG. 12 and/or weather information for Korea (e.g., the region to which the artificial intelligence device 10 belongs) as shown in (c) of FIG. 12.
- the operation of the artificial intelligence device 10 based on the intention analysis result information considering spatial information as an intention analysis factor is as follows.
- FIG. 13 is a diagram illustrating an operation based on information as a result of intention analysis considering spatial information in an artificial intelligence device 10 according to an embodiment of the present disclosure.
- the space may represent a space pre-registered in at least one of the artificial intelligence device 10 and the voice service server 200.
- These spaces may be defined and registered in various ways, such as a living room (e.g., space A in FIG. 13), a kitchen (e.g., space B in FIG. 13), a bedroom (e.g., space C in FIG. 13), and a child's study room (e.g., space D in FIG. 13).
- the space does not necessarily have to be physically one space. For example, if the living room can be identified by dividing it into living room 1, living room 2, etc., it can be defined as a separate space.
- the artificial intelligence device 10 must be able to detect or identify entry and exit into the space.
- known technologies related to spatial recognition, detection, and identification are referred to, and separate descriptions thereof are omitted.
- Spatial information may indicate identification information about the space to which the movable artificial intelligence device 10 belongs, or about the space it enters or leaves, as shown in FIG. 6.
- the artificial intelligence device 10 can hold map information about spaces, and can identify each space by assigning an identifier to it. Meanwhile, each space may have different usage patterns depending on the characteristics of the space, which can be referred to in intention analysis and contribute to more accurate intention analysis of user input.
- the artificial intelligence device 10 may belong to any one of spaces A to D or enter another space.
- the same user input e.g., 'How's the weather?'
- space A living room
- space C bedroom
- the NLP server 30 may perform intention analysis on the user input using spatial identification information, depending on whether the artificial intelligence device 10 belongs to space A or space C.
- if the artificial intelligence device 10 belongs to space C, the NLP server 30 may determine from the space identification information that the user input made in the bedroom ('what's the weather like') is a request for weather information for tomorrow's workplace location rather than for the current weather in the area.
- the NLP server 30 may refer to the spatial identification information of the artificial intelligence device 10, perform intent analysis on the user input, and derive more accurate intent analysis result information.
- the NLP server 30 may further refer to at least one of the various intention analysis factors described above (eg, time information) to derive analysis result information that better matches the user's intention.
- the above-described embodiments concern one of the response operations performed by the artificial intelligence device 10 according to the present disclosure, based on the intention analysis result information received through the NLP server 30 in response to user input, and can be seen as embodiments that provide information (recommendation information).
- FIG. 14 is a diagram illustrating a user input processing method of the artificial intelligence device 10 according to an embodiment of the present disclosure.
- FIG. 15 is a diagram illustrating a user input processing method of the artificial intelligence device 10 according to another embodiment of the present disclosure.
- the user input may not necessarily be a direct request for execution of the function.
- the artificial intelligence device 10 transmits the user input to the NLP server 30 and can receive intention analysis result information for the user input from the NLP server 30 (S203).
- For specific details of steps S201 to S203, refer to the above-described content in the present disclosure; redundant description is omitted here.
- When the artificial intelligence device 10 receives intention analysis result information from the NLP server 30 through step S203, it can determine a function corresponding to the user input based on the intention analysis result information (S205).
- the artificial intelligence device 10 may determine whether the function determined in step S205 can currently be performed (S207).
- if the artificial intelligence device 10 determines in step S207 that the function can currently be performed, it can perform and apply the function (S213).
- if the function cannot currently be performed, the artificial intelligence device 10 may configure and provide recommended function information (S209).
- the artificial intelligence device 10 may determine whether the recommended function provided through step S209 has been selected (S211).
- if the recommended function is selected, the artificial intelligence device 10 can perform the corresponding function (S213).
- the artificial intelligence device 10 may transmit feedback data, including the user's previous input, to the NLP server 30 and request the intention analysis result for the user input again with the feedback data further considered, or may perform the function determination process of step S205 and the subsequent procedures again based on the feedback data.
- if the user input received in step S201 is, for example, 'Darken the screen' or 'Turn off the artificial intelligence device 10 in 30 minutes,' an eye protection function, for example, is determined in step S205, and guide information (e.g., ‘Would you like me to set up the vision protection function?’) may also be provided.
- alternatively, if the user input received in step S201 is, for example, 'Darken the screen' or 'Turn off the artificial intelligence device 10 in 30 minutes', the screen brightness function, for example, is determined in step S205 and the screen can be darkened.
- the artificial intelligence device 10 may additionally determine an eye protection function (or eye protection mode) as a recommended function and provide guide information recommending it (e.g., 'Shall I set up the eye protection function?').
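The decision flow of steps S205 to S213 can be summarized in a minimal sketch. The callables `can_perform`, `recommend`, and `user_accepts` are hypothetical stand-ins for device capabilities the disclosure does not define. The mapping of the "not performable" and "not selected" branches to a feedback path is an assumption drawn from the surrounding description.

```python
from typing import Callable, Optional


def handle_determined_function(
    function_name: str,
    can_perform: Callable[[str], bool],
    recommend: Callable[[str], Optional[str]],
    user_accepts: Callable[[str], bool],
) -> tuple[str, Optional[str]]:
    """Run the S207 -> S213 decision flow for a function chosen in S205."""
    # S207: can the determined function be performed right now?
    if can_perform(function_name):
        return ("performed", function_name)  # S213
    # S209: configure and provide a recommended function instead.
    recommended = recommend(function_name)
    # S211: was the recommended function selected?
    if recommended is not None and user_accepts(recommended):
        return ("performed", recommended)  # S213
    # Otherwise send feedback and repeat intent analysis or S205.
    return ("feedback", None)


direct = handle_determined_function(
    "screen_brightness", lambda f: True, lambda f: None, lambda f: False
)
accepted = handle_determined_function(
    "sleep_timer", lambda f: False, lambda f: "eye_protection", lambda f: True
)
declined = handle_determined_function(
    "sleep_timer", lambda f: False, lambda f: "eye_protection", lambda f: False
)
```

The function names passed in the usage lines (`screen_brightness`, `sleep_timer`, `eye_protection`) are illustrative labels only.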
- Figure 15 may represent a compensation function that can be automatically provided following or separately from Figure 14.
- FIG. 15 is illustrated by taking the case after the first function (recommended function) is set based on the user input and intent analysis result information in FIG. 14 as an example.
- the artificial intelligence device 10 can detect the occurrence of an event (S301).
- the event may be at least one of receiving user input, receiving input (such as a function request) from a remote control device, turning on the artificial intelligence device 10, turning on/off the power of a device linked to or surrounding the artificial intelligence device 10, etc.
- the user input does not necessarily need to be input related to the above-described compensation function, nor does it necessarily need to be limited to a specific type (eg, voice).
- the artificial intelligence device 10 can configure and provide the first screen (S303).
- the first screen may be configured differently depending on the event detected in step S301. For example, if the event is a request to power on the artificial intelligence device 10, the first screen may be the initial screen. On the other hand, if the artificial intelligence device 10 is already powered on and the event is the power on/off of a device linked to or around the artificial intelligence device 10, the first screen may be an OSD message displayed over the currently playing content screen or a separately configured user interface screen.
- the artificial intelligence device 10 can extract the immediately preceding user input-intention analysis result information-corresponding action information (S305).
- the artificial intelligence device 10 may determine and provide compensation information or a compensation recommendation function based on the information extracted through step S305 (S307, S309).
- the artificial intelligence device 10 may determine whether the user selects the compensation information or compensation recommendation function provided through step S309 (S311).
- if it is selected, the artificial intelligence device 10 may set the corresponding function in the artificial intelligence device 10 (S313).
- the compensation information or compensation recommendation function may represent, for example, an action that is in a compensating relationship with the corresponding action of the previous or immediately preceding user input-intention analysis result information-corresponding action, but is not necessarily limited thereto.
- the compensation information or compensation recommendation function may be the same as the corresponding action based on the intention analysis result information for the previous user input, but its level or intensity may be different.
- the compensation information or compensation recommendation function may be a response to the content currently set in the artificial intelligence device 10, regardless of when the currently set content was set.
- the operation of FIG. 15 is not necessarily activated only when a user input is directly received, and may be performed automatically or manually as an operation corresponding to or following the previous or immediately preceding user input.
- for example, it may be performed automatically when it is determined, based on the current state of the artificial intelligence device 10 or surrounding situation information, that the response action (information or function) according to the previous user input may make the user uncomfortable, or when such user input is expected based on log data analysis results, user history, or the like.
- this automatic compensation operation may be based on the average contents (settings, requests, etc.) of various users registered in the voice service server rather than on the individual user. This may be because it is an automatic compensation operation rather than a compensation operation provided based on the user's input. Meanwhile, in another embodiment, the reverse is also possible.
- for example, user A's usage pattern is to set the volume to 30 in the morning, watch channel ABC, and ask about the weather by voice.
- when the user turns on the artificial intelligence device 10 in the morning, based on the above-mentioned usage pattern, it can respond with 'Good morning', 'Shall we switch to channel ABC?', and 'Do you want to see today's weather information?'
- the artificial intelligence device 10 can operate as follows. First, in relation to the results of the operation performed the previous evening, the artificial intelligence device 10 changes the volume from 15 to 30, changes to channel BCD instead of channel ABC, and, in response to the voice input 'what's the weather like', may provide weather information for the area on that day and/or for the expected area based on scheduling information as compensation information or a compensation operation. Alternatively, the artificial intelligence device 10 may perform compensation information or compensation operations corresponding only to voice input, excluding input through a remote control device (volume change and channel change) made the evening before.
- FIGS. 14 and 15 describe performing a function or making a recommendation through a corresponding action in response to a user input.
- the content is not performed according to the user input.
- even when it is not a function corresponding to the intention analysis result information, it may be interpreted as providing a function related to the information or recommended information provided based on that information.
- FIG. 16 is a diagram illustrating a user input processing method of the artificial intelligence device 10 according to another embodiment of the present disclosure.
- the artificial intelligence device 10 can configure and store routine information based on user, time, space, etc., and provide services by applying the routine according to preset conditions. However, when a user input corresponds to or falls within one of these pieces of routine information, the processing operation may need to be defined.
- the artificial intelligence device 10 can configure and store routine information (S401).
- the artificial intelligence device 10 may receive user input after S401 (S403).
- the artificial intelligence device 10 may determine whether the user input received in step S403 matches (or is related to) at least one of the routine information stored in step S401 (S405).
- if the user input in step S403 is found in step S405 to match at least one of the routine information stored in step S401, the artificial intelligence device 10 may determine whether to execute the routine according to the remaining stored routine information (S407).
- according to the result of the determination in step S407, the artificial intelligence device 10 can execute the routine according to the remaining stored routine information (S409).
- the artificial intelligence device 10 can manually or automatically determine whether to execute the routine according to the remaining routine information stored in step S407.
- a guide is provided as to whether or not to execute the routine, and a decision can be made based on the user's input.
- whether to execute the routine may be determined with reference to at least one of the intention analysis factors.
- the artificial intelligence device 10 can refer to user information, time information, and spatial information and, if they match the execution content or pattern preset in the routine or the relevance is above a threshold, automatically determine whether to execute the routine and apply it.
- in step S409, if the user's input matches (or is related to) at least one piece of routine information among routines set to be executed sequentially in a set order, the artificial intelligence device 10 may execute only the remaining routine information scheduled in the subsequent order.
- the artificial intelligence device 10 reads routine information corresponding to the user input and, when the user input matches (or is related to) at least one routine in the read routine information, may execute all routines included in the read routine information even though the user input relates to a specific routine.
- the artificial intelligence device 10 reads routine information corresponding to the user input, but when there are multiple pieces of routine information to be read, one of them may be selected manually or automatically and processed as described above.
- the artificial intelligence device 10 provides a guide message on the screen for selecting specific routine information, and performs the procedure from step S407 onward for the specific routine information selected according to the user's input.
- the artificial intelligence device 10 further refers to at least one of the intention analysis factors and the execution schedule information of the original corresponding routines, selects the most relevant, that is, optimal, specific routine information, and can perform the procedures from step S407 onward based on the selected routine information. In this case, the unselected routine information may or may not be briefly provided through the screen.
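The two routine-handling variants described above (executing only the steps that follow the matched one, or executing the whole routine) can be sketched as follows. The function name, the `run_all` flag, and the step labels are hypothetical. The sketch assumes a routine is an ordered list of step names.

```python
def routine_steps_to_run(routine: list[str], matched_step: str, run_all: bool = False) -> list[str]:
    """Decide which routine steps to execute after a user input matches one step.

    With run_all=False, only the steps scheduled after the matched step run,
    since the matched step is assumed to be handled by the user input itself.
    With run_all=True, every step in the routine runs.
    """
    if matched_step not in routine:
        return []  # S405: no match, so the routine is not triggered
    if run_all:
        return list(routine)
    position = routine.index(matched_step)
    return routine[position + 1:]


morning_routine = ["turn_on_tv", "set_volume", "show_weather"]
remaining_only = routine_steps_to_run(morning_routine, "set_volume")
whole_routine = routine_steps_to_run(morning_routine, "set_volume", run_all=True)
no_match = routine_steps_to_run(morning_routine, "play_music")
```

Whether the returned steps actually run would still depend on the manual or automatic decision made in step S407.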
- FIG. 17 is a diagram illustrating a recommendation query 1720 through the speech agent 1710 according to an embodiment of the present disclosure.
- the artificial intelligence device 10 can provide the speech agent 1710 on one area of the display 150 or 151.
- the artificial intelligence device 10 may also provide a recommendation query 1720 when providing the speech agent 1710.
- the recommended query 1720 may be for the convenience of users using the speech agent 1710.
- the speech agent 1710 may be provided in response to a remote trigger word or upon occurrence of an arbitrary event, but is not limited thereto.
- the recommended query 1720 may be provided with at least one piece of query information randomly determined at the time the speech agent 1710 is provided.
- when providing the speech agent 1710, the artificial intelligence device 10 may determine and provide a recommended query by referring to at least one of the intent analysis factors.
- the recommended query included in the speech agent 1710 may or may not change each time the speech agent 1710 is provided.
- the artificial intelligence device 10 may determine the reason for providing the speech agent 1710 by comparison with at least one of the intention analysis factors at the time the speech agent 1710 is provided, and determine the recommendation query based on the result of that determination.
- the recommended query may be determined based on at least one of user, time, and space. That is, the artificial intelligence device 10 may create at least one recommendation query 1720 based on a recommendation keyword configured based on at least one of the intent analysis factors.
- the query information may include at least one of previous or immediately preceding utterance information, compensation function-related information, recommended utterance information based on the previous or immediately preceding utterance, help information for using the voice recognition function, utterance information related to the current content, and utterance information arbitrarily determined in consideration of the various intention analysis factors described above in the present disclosure.
- Table 4 shows an example of a recommended query (or recommended keyword) based on time information, which is one of the intent analysis factors.
- Table 4 may exist in a form stored or embedded in the artificial intelligence device 10, or may be stored in the NLP server 30. Meanwhile, the recommended queries (recommended keywords) in Table 4 can be continuously updated. The update may be customized to the user or may be updated based on the user's usage information or all artificial intelligence devices registered with the voice service server 200. However, the present disclosure is not limited to the content disclosed in Table 4.
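A time-based recommended-query lookup of the kind attributed to Table 4 might look like the sketch below. The contents of Table 4 are not reproduced in this section, so the time windows and query strings here are hypothetical placeholders, not the disclosed values.

```python
# Hypothetical stand-in for Table 4: (start hour inclusive, end hour exclusive, queries).
RECOMMENDED_QUERIES_BY_HOUR = [
    (0, 7, ["What's the weather like today?"]),
    (7, 21, ["How's the weather?"]),
    (21, 24, ["What's the weather like tomorrow?", "How's the weather?"]),
]


def recommended_queries(hour: int) -> list[str]:
    """Return the recommended queries for the hour at which the speech agent is shown."""
    if not 0 <= hour <= 23:
        raise ValueError("hour must be in the range 0..23")
    for start, end, queries in RECOMMENDED_QUERIES_BY_HOUR:
        if start <= hour < end:
            return list(queries)
    return []


evening_queries = recommended_queries(22)
daytime_queries = recommended_queries(9)
```

Keeping the table as plain data makes it straightforward to update it per user or from aggregated usage, as the text suggests.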
- At least one of the operations performed by the artificial intelligence device 10 may be performed by the NLP server 30, and vice versa.
- At least some of the operations disclosed in this disclosure may be performed simultaneously or in an order different from the order described above, and some operations may be omitted or added.
- The above-described method can be implemented as processor-readable code on a medium on which a program is recorded.
- Examples of processor-readable media include ROM, RAM, CD-ROM, magnetic tape, floppy disk, and optical data storage devices.
- The display device described above is not limited to the configurations and methods of the above-described embodiments; all or part of each embodiment may be selectively combined so that various modifications can be made.
- According to the present disclosure, the quality of the voice recognition service can be improved and the user's satisfaction with device use can be maximized by deriving the optimal intent analysis result that matches various user inputs and providing corresponding information or performing a corresponding function. The disclosure therefore has industrial applicability.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Health & Medical Sciences (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Human Computer Interaction (AREA)
- General Health & Medical Sciences (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Mathematical Physics (AREA)
- Machine Translation (AREA)
- User Interface Of Digital Computer (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Description
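| Country | Hour | Rank | Utterance | Share of utterance count (%) |
|---|---|---|---|---|
| Korea | 22:00 | 1 | 내일 날씨 (Tomorrow's weather) | 12.7 |
| Korea | 22:00 | 2 | 날씨 (Weather) | 10.3 |
| Korea | 22:00 | 3 | 내일 날씨 어때 (How's the weather tomorrow) | 7.3 |
| Korea | 22:00 | 4 | 오늘 날씨 (Today's weather) | 6.7 |
| Korea | 22:00 | 5 | 내일 날씨 알려줘 (Tell me tomorrow's weather) | 5.5 |
| Korea | 01:00 | 1 | 내일 날씨 알려줘 (Tell me tomorrow's weather) | 12.8 |
| Korea | 01:00 | 2 | 오늘의 날씨 (Today's weather) | 12.8 |
| Korea | 01:00 | 3 | 날씨 (Weather) | 10.3 |
| Korea | 01:00 | 4 | 주말 날씨 (Weekend weather) | 7.7 |
| Korea | 01:00 | 5 | 날씨 알려줘 (Tell me the weather) | 5.1 |
| Italy | 22:00 | 1 | Meteo (Weather forecast) | 7.3 |
| Italy | 22:00 | 2 | Che tempo fa oggi (What's the weather today) | 6.7 |
| Italy | 22:00 | 3 | Che tempo fa domani (What's the weather like tomorrow) | 6.1 |
| Italy | 22:00 | 4 | Previsioni (Forecast) | 4.9 |
| Italy | 22:00 | 5 | Che tempo fa (What's the weather) | 2.4 |
| Italy | 01:00 | 1 | Che tempo fa domani (What's the weather like tomorrow) | 9.7 |
| Italy | 01:00 | 2 | Che tempo fa oggi? (What's the weather like today?) | 9.7 |
| Italy | 01:00 | 3 | Meteo (Weather forecast) | 9.7 |
| Italy | 01:00 | 4 | Meteo Trapani (Trapani weather) | 6.5 |
| Italy | 01:00 | 5 | Previsioni Meteo del Fine Settimana (Weekend Weather Forecast) | 6.5 |

| User utterance | 7-20:59 | 21-23:59 | 0-6:59 |
|---|---|---|---|
| 'How's the weather' | Shown as today's (6/11) weather | Today's (6/11) weather + tomorrow's (6/12) weather. Ex. 'It is now 10 o'clock on June 11; tonight will be muggy, with a temperature of 00 degrees and humidity of 00, and tomorrow, June 12, the morning temperature will be 00 degrees and the high 00 degrees, with low humidity, so it is expected to be sunny.' | Shown as today's (6/12) weather |
| 'How's the weather tomorrow' | Shown as tomorrow's (6/12) weather | Shown as tomorrow's (6/12) weather | Today's (6/12) weather + tomorrow's (6/13) weather. Ex. 'It is now 2 o'clock on June 12, and this morning is expected to be clear and sunny with a temperature of 00 degrees. Tomorrow, June 13, the temperature will be 00 degrees ~' |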
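The table above maps the same weather utterance to different answers depending on the hour at which it is spoken. A minimal sketch of that mapping follows, assuming the three time windows shown in the table; the names `time_window`, `RESPONSE_SCOPE`, and `days_to_report`, and the utterance labels `"weather"` and `"weather_tomorrow"`, are hypothetical and chosen only for illustration.

```python
from datetime import date, datetime, timedelta


def time_window(hour: int) -> str:
    """Classify an hour (0-23) into one of the three windows in the table."""
    if not 0 <= hour <= 23:
        raise ValueError(f"hour out of range: {hour}")
    if 7 <= hour <= 20:
        return "day"      # 7-20:59
    if hour >= 21:
        return "night"    # 21-23:59
    return "early"        # 0-6:59


# (utterance kind, window) -> day offsets to report (0 = today, 1 = tomorrow).
RESPONSE_SCOPE = {
    ("weather", "day"): [0],
    ("weather", "night"): [0, 1],
    ("weather", "early"): [0],
    ("weather_tomorrow", "day"): [1],
    ("weather_tomorrow", "night"): [1],
    ("weather_tomorrow", "early"): [0, 1],
}


def days_to_report(utterance_kind: str, now: datetime) -> list[date]:
    """Return the calendar dates whose weather should be included in the answer."""
    offsets = RESPONSE_SCOPE[(utterance_kind, time_window(now.hour))]
    return [now.date() + timedelta(days=offset) for offset in offsets]
```

For example, under these assumptions `days_to_report("weather", ...)` called at 22:00 on June 11 yields June 11 and June 12, matching the 21-23:59 column, while the same call at 10:00 yields June 11 only.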
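| Recommended query (recommended keyword) / current time slot | Dawn (24:00~04:59) | Morning (05:00~11:59) | Afternoon (12:00~17:59) | Evening (18:00~23:59) |
|---|---|---|---|---|
| How's the weather today? | O | O | | |
| Tell me tomorrow's weather | O | O | | |
| What's the weather this weekend? | O | O | O | |
| Find aerobic exercises | O | O | | |
| Turn on hearing protection mode | O | O | | |
| Turn on eye protection mode | O | O | | |
| Set a sleep timer (turn off the TV in 30 minutes) | O | O | | |
| What time is it now? | O | O | O | O |
| Set an alarm for 10 minutes from now / Set an alarm for 7 o'clock | O | O | | |
| Turn off the TV at 12 o'clock | O | | | |
| Show the external input list | O | O | O | |
| Play the channel I watched at this time yesterday | O | O | O | |
| Make the screen darker | O | | | |
| Make the screen brighter | O | | | |
| Turn on the air conditioner | O | O | O | |
| Is the dishwasher done? | O | O | O | |
| Connect the Bluetooth speaker | O | O | O | |
| LG Fitness | O | O | O | |
| Anything worth watching? (Magic Link) | O | O | O | O |
| Magic Explorer | O | O | O | O |
| What's this music? | O | O | O | O |
| Screen settings | O | O | O | O |
| Sports alarm | O | O | O | O |
| Channel lock | O | O | O | O |
| Game home | O | O | O | O |
| Show multi-action | O | O | O | O |
| Sound settings | O | O | O | O |
| What have I watched recently? | O | O | O | O |
| Who is {person name}? | O | O | O | O |
| Turn off the screen | O | O | O | O |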
Claims (15)
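- An artificial intelligence device comprising: a display; and a processor that controls the display, wherein the processor receives a user input, transmits the user input to a server, receives from the server response information including intent analysis result information for the user input, and performs at least one of information output and function execution according to the response information, and wherein the intent analysis result information includes a result of intent analysis of the user input that has been primarily processed based on at least one intent analysis factor transmitted to the server.
- The artificial intelligence device of claim 1, wherein the processor receives feedback data of the user according to the at least one performed operation, updates the one or more intent analysis factors based on the feedback data, and transmits the updated intent analysis factors to the server.
- The artificial intelligence device of claim 2, further comprising a memory that communicates with the processor and stores data, wherein the processor parses the response information and reads, from the memory, information to be output and information related to function execution based on the parsed response information.
- The artificial intelligence device of claim 3, wherein, when outputting information according to the intent analysis result information in the response information, the processor provides information configured as a first version or a second version based on the one or more intent analysis factors transmitted to the server.
- The artificial intelligence device of claim 4, wherein, when performing a function according to the intent analysis result information in the response information, the processor determines whether the function can be performed and, when it is determined that the function cannot be performed, provides recommended function information and performs the recommended function instead.
- The artificial intelligence device of claim 1, wherein, when occurrence of an event is detected, the processor extracts the immediately preceding user input, the intent analysis result information, and function execution operation information, outputs at least one piece of information on a recommended compensation function, and performs the recommended compensation function.
- The artificial intelligence device of claim 1, wherein the processor configures routine information on function execution and stores it in a memory, and, when the received intent analysis result information is related to at least one of the routines defined in the stored routine information, automatically executes the remaining routines included in the routine information.
- The artificial intelligence device of claim 1, wherein the processor provides a speech agent including at least one recommended query, and the at least one recommended query is created based on a recommended keyword configured based on at least one of the intent analysis factors.
- A method of operating an artificial intelligence device, the method comprising: receiving a user input; transmitting the user input to a server; receiving, from the server, response information including intent analysis result information for the user input; and performing at least one of information output and function execution according to the response information, wherein the intent analysis result information includes a result of intent analysis of the user input that has been primarily processed based on at least one intent analysis factor transmitted to the server.
- The method of claim 9, further comprising: receiving feedback data of the user according to the at least one performed operation; updating the one or more intent analysis factors based on the feedback data; and transmitting the updated intent analysis factors to the server.
- The method of claim 10, wherein, when information according to the intent analysis result information in the response information is output, information configured as a first version or a second version is provided based on the one or more analysis factors transmitted to the server.
- The method of claim 11, wherein, when a function according to the intent analysis result information in the response information is performed, whether the function can be performed is determined and, when it is determined that the function cannot be performed, recommended function information is provided and the recommended function is performed instead.
- The method of claim 9, further comprising: detecting occurrence of an event; extracting the immediately preceding user input, the intent analysis result information, and function execution operation information; outputting at least one piece of information on a recommended compensation function; and performing the recommended compensation function.
- The method of claim 9, further comprising: storing routine information on function execution; and, when the received intent analysis result information is related to at least one of the routines defined in the stored routine information, automatically executing the remaining routines included in the routine information.
- The method of claim 9, further comprising providing a speech agent including at least one recommended query, wherein the at least one recommended query is created based on a recommended keyword configured based on at least one of the intent analysis factors.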
Priority Applications (4)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/KR2022/014593 WO2024071469A1 (ko) | 2022-09-28 | 2022-09-28 | Artificial intelligence device and method for operating same |
| US19/116,739 US20260105086A1 (en) | 2022-09-28 | 2022-09-28 | Artificial intelligence device and method for operating same |
| KR1020257013755A KR20250077550A (ko) | 2022-09-28 | 2022-09-28 | Artificial intelligence device and method for operating same |
| EP22961072.0A EP4586132A4 (en) | 2022-09-28 | 2022-09-28 | ARTIFICIAL INTELLIGENCE DEVICE AND METHOD OF OPERATING THE SAME |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/KR2022/014593 WO2024071469A1 (ko) | 2022-09-28 | 2022-09-28 | Artificial intelligence device and method for operating same |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2024071469A1 (ko) | 2024-04-04 |
Family
ID=90478247
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/KR2022/014593 Ceased WO2024071469A1 (ko) | Artificial intelligence device and method for operating same | 2022-09-28 | 2022-09-28 |
Country Status (4)
| Country | Link |
|---|---|
| US (1) | US20260105086A1 (ko) |
| EP (1) | EP4586132A4 (ko) |
| KR (1) | KR20250077550A (ko) |
| WO (1) | WO2024071469A1 (ko) |
Cited By (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN119670901A (zh) * | 2025-02-21 | 2025-03-21 | Sichuan Shutian Information Technology Co., Ltd. | Artificial intelligence agent method and device based on a large model |
Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
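| KR20160142802A (ko) * | 2011-09-30 | 2016-12-13 | Apple Inc. | Using context information to facilitate processing of commands in a virtual assistant |
| KR20180068850A (ko) * | 2016-12-14 | 2018-06-22 | Samsung Electronics Co., Ltd. | Electronic device, method for providing guide thereof, and non-transitory computer-readable recording medium |
| KR20200013152A (ko) * | 2018-07-18 | 2020-02-06 | Samsung Electronics Co., Ltd. | Electronic device and method for providing artificial intelligence services based on pre-gathered conversations |
| US20220111855A1 (en) * | 2020-10-09 | 2022-04-14 | Toyota Jidosha Kabushiki Kaisha | Agent device, agent method and storage medium storing agent program |
| KR20220109238A (ko) * | 2021-01-28 | 2022-08-04 | Samsung Electronics Co., Ltd. | Device and method for providing recommended sentence related to user's utterance input |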
Family Cites Families (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US10437889B2 (en) * | 2013-01-31 | 2019-10-08 | Lf Technology Development Corporation Limited | Systems and methods of providing outcomes based on collective intelligence experience |
| US10811013B1 (en) * | 2013-12-20 | 2020-10-20 | Amazon Technologies, Inc. | Intent-specific automatic speech recognition result generation |
| US20150370787A1 (en) * | 2014-06-18 | 2015-12-24 | Microsoft Corporation | Session Context Modeling For Conversational Understanding Systems |
| US11816435B1 (en) * | 2018-02-19 | 2023-11-14 | Narrative Science Inc. | Applied artificial intelligence technology for contextualizing words to a knowledge base using natural language processing |
| CN111699469B (zh) * | 2018-03-08 | 2024-05-10 | Samsung Electronics Co., Ltd. | Intent-based interactive response method and electronic device thereof |
| WO2020248223A1 (en) * | 2019-06-14 | 2020-12-17 | Beijing Didi Infinity Technology And Development Co., Ltd. | Reinforcement learning method for driver incentives: generative adversarial network for driver-system interactions |
| US20210089860A1 (en) * | 2019-09-24 | 2021-03-25 | Sap Se | Digital assistant with predictions, notifications, and recommendations |
2022
- 2022-09-28 EP EP22961072.0A patent/EP4586132A4/en active Pending
- 2022-09-28 KR KR1020257013755A patent/KR20250077550A/ko active Pending
- 2022-09-28 US US19/116,739 patent/US20260105086A1/en active Pending
- 2022-09-28 WO PCT/KR2022/014593 patent/WO2024071469A1/ko not_active Ceased
Non-Patent Citations (1)
| Title |
|---|
| See also references of EP4586132A4 * |
Also Published As
| Publication number | Publication date |
|---|---|
| US20260105086A1 (en) | 2026-04-16 |
| EP4586132A4 (en) | 2025-08-13 |
| EP4586132A1 (en) | 2025-07-16 |
| KR20250077550A (ko) | 2025-05-30 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| WO2017160073A1 (en) | Method and device for accelerated playback, transmission and storage of media files | |
| WO2020246844A1 (en) | Device control method, conflict processing method, corresponding apparatus and electronic device | |
| WO2021071115A1 (en) | Electronic device for processing user utterance and method of operating same | |
| WO2020222444A1 (en) | Server for determining target device based on speech input of user and controlling target device, and operation method of the server | |
| WO2020017849A1 (en) | Electronic device and method for providing artificial intelligence services based on pre-gathered conversations | |
| WO2020196955A1 (ko) | Artificial intelligence device and method for operating artificial intelligence device | |
| WO2018182202A1 (en) | Electronic device and method of executing function of electronic device | |
| WO2020218650A1 (ko) | Electronic device | |
| WO2018194268A1 (en) | Electronic device and method for processing user speech | |
| WO2020246634A1 (ko) | Artificial intelligence device capable of controlling operation of other devices and operating method thereof | |
| WO2019039834A1 (en) | METHOD FOR PROCESSING VOICE DATA AND ELECTRONIC DEVICE SUPPORTING SAID METHOD | |
| WO2013168860A1 (en) | Method for displaying text associated with audio file and electronic device | |
| WO2019078588A1 (ko) | Electronic device and operating method thereof | |
| WO2016114428A1 (ko) | Method and device for performing speech recognition using a grammar model | |
| WO2017039142A1 (en) | User terminal apparatus, system, and method for controlling the same | |
| WO2019182325A1 (ko) | Electronic device and voice recognition control method of electronic device | |
| WO2023085584A1 (en) | Speech synthesis device and speech synthesis method | |
| WO2013176366A1 (en) | Method and electronic device for easy search during voice record | |
| WO2013176365A1 (en) | Method and electronic device for easily searching for voice record | |
| EP3603040A1 (en) | Electronic device and method of executing function of electronic device | |
| WO2020263016A1 (ko) | Electronic device for processing user utterance and operating method thereof | |
| WO2020218635A1 (ko) | Speech synthesis device using artificial intelligence, method of operating the speech synthesis device, and computer-readable recording medium | |
| WO2018174445A1 (ko) | Electronic device for performing operation according to user input after partial landing | |
| WO2020153720A1 (ko) | Electronic device for processing user voice and control method thereof | |
| WO2020222338A1 (ko) | Artificial intelligence device for providing image information and method thereof |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 22961072; Country of ref document: EP; Kind code of ref document: A1 |
| | WWE | Wipo information: entry into national phase | Ref document number: 2022961072; Country of ref document: EP |
| | ENP | Entry into the national phase | Ref document number: 2022961072; Country of ref document: EP; Effective date: 20250411 |
| | ENP | Entry into the national phase | Ref document number: 20257013755; Country of ref document: KR; Kind code of ref document: A |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | WWP | Wipo information: published in national office | Ref document number: 2022961072; Country of ref document: EP |