WO2025095151A1 - Display device and operating method thereof - Google Patents
- Publication number
- WO2025095151A1 (application PCT/KR2023/016981)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- name
- server
- error
- artificial intelligence
- names
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/332—Query formulation
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/338—Presentation of query results
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/41—Structure of client; Structure of client peripherals
- H04N21/422—Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS]
Definitions
- The present disclosure relates to a display device and to proactively responding to a speaker's mispronunciation or to misrecognition of the speaker's utterance.
- Digital TV services using wired or wireless communication networks are becoming widespread. Digital TV services can provide a variety of services that were not available with existing analog broadcasting services.
- IPTV (Internet Protocol Television) and smart TV services provide interactivity that allows users to actively select the type of program to watch, the viewing time, and so on. Based on this interactivity, IPTV and smart TV services can also provide various additional services, such as Internet search, home shopping, and online games.
- Recent TVs provide voice recognition services that recognize the voice spoken by the user and provide services.
- Mispronunciations and misrecognitions have a certain degree of regularity, yet they are too complex to be covered by a rule-based approach.
- The purpose of this disclosure is to predict in advance the ways in which a user's content name or search term may be mispronounced or misrecognized, and to respond proactively to such mispronunciations and misrecognitions.
- The present disclosure also aims to reduce search failures when a user utters only some of the keywords of a content name or replaces some of the keywords with synonyms.
- An artificial intelligence device may include a memory that stores a name and a plurality of error names matching the name; a communication unit that communicates with an electronic device or a generation AI (Artificial Intelligence) server; and a processor that receives voice data corresponding to a voice command uttered by a user from the electronic device, obtains an analysis result based on the received voice data, and if any one of the plurality of stored error names is included in the obtained analysis result, obtains a name matching the plurality of error names, and transmits a search result for the obtained name to the electronic device.
- the above name can be either a content name or a search term.
- the above processor can reduce the number of the error names as the popularity ranking or the search ranking becomes lower.
- each of the plurality of error names may be a combination of variant words of each keyword.
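As an illustration of an error name formed as a combination of variant words of each keyword, the following is a minimal sketch rather than the disclosed implementation. The function name `build_error_names`, the sample title, and the variant words are all hypothetical; the disclosure obtains error names from a generation AI server, whereas this sketch simply takes per-keyword variants as given and enumerates their combinations.

```python
from itertools import product


def build_error_names(name, variants):
    """Combine per-keyword variant words into candidate error names.

    `variants` maps a lowercase keyword to its (hypothetical) variant words.
    Keywords without an entry are kept unchanged.
    """
    keywords = name.lower().split()
    # Each keyword may stay as-is or be replaced by any one of its variants.
    options = [[kw] + list(variants.get(kw, [])) for kw in keywords]
    canonical = " ".join(keywords)

    error_names = []
    for combo in product(*options):
        candidate = " ".join(combo)
        # Skip the correct name itself and any duplicate combination.
        if candidate != canonical and candidate not in error_names:
            error_names.append(candidate)
    return error_names


# Hypothetical content name and variant words, for illustration only.
sample_variants = {"space": ["spice"], "odyssey": ["odyssee", "oddity"]}
sample_error_names = build_error_names("Space Odyssey", sample_variants)
```

Because the number of combinations grows multiplicatively with the number of keywords, a real system would likely cap or filter the list.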
- a method of operating an artificial intelligence device may include: storing a name and a plurality of error names matching the name; receiving voice data corresponding to a voice command uttered by a user from an electronic device; obtaining an analysis result based on the received voice data; obtaining a name matching the plurality of error names when the obtained analysis result includes any one of the stored plurality of error names; and transmitting a search result for the obtained name to the electronic device.
- the above name can be either a content name or a search term.
- the above method of operation may further include the step of transmitting a prompt requesting generation of an error name for a name of a new content acquired from a new content database to a generation AI server; and the step of receiving error names for the name of the new content from the generation AI server.
- the above operation method may further include a step of transmitting a prompt requesting generation of an error name for the search term having a search frequency of a predetermined frequency or higher to a generation AI server; and a step of receiving error names for the search term from the generation AI server.
- the above operating method may further include a step of transmitting a command to the electronic device to cause the electronic device to output a pop-up window for confirming whether the acquired name matches the user's intention.
- the above operating method may further include a step of adjusting the number of stored error names according to the popularity ranking of the content corresponding to the content name or the search ranking of the search word.
- the above adjusting step may include a step of reducing the number of the error names as the popularity ranking or the search ranking becomes lower.
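The storing, matching, and adjusting steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the class `ErrorNameIndex`, the substring matching rule, and the ranking tiers in `allowed_count` are hypothetical choices made for the example, and the speech analysis result is assumed to be already available as plain text.

```python
def allowed_count(rank):
    """Hypothetical tiers: lower-ranked (numerically larger) names keep fewer error names."""
    if rank <= 10:
        return 10
    if rank <= 100:
        return 5
    return 2


class ErrorNameIndex:
    def __init__(self):
        self._to_name = {}  # error name -> correct name
        self._by_name = {}  # correct name -> stored error names

    def store(self, name, error_names):
        """Store a name together with the error names matching it."""
        self._by_name[name] = list(error_names)
        for err in error_names:
            self._to_name[err.lower()] = name

    def resolve(self, analysis_text):
        """Return the correct name if the analysis result contains a stored error name."""
        text = analysis_text.lower()
        # Assumption: prefer the longest matching error name.
        for err in sorted(self._to_name, key=len, reverse=True):
            if err in text:
                return self._to_name[err]
        return None

    def trim_by_rank(self, name, rank):
        """Reduce the stored error names for `name` according to its ranking."""
        kept = self._by_name[name][: allowed_count(rank)]
        for err in self._by_name[name][len(kept):]:
            if self._to_name.get(err.lower()) == name:
                del self._to_name[err.lower()]
        self._by_name[name] = kept


index = ErrorNameIndex()
index.store("space odyssey", ["space oddity", "spice odyssey", "space odyssee"])
resolved = index.resolve("play space oddity please")
```

In this sketch a search would then be run on `resolved` instead of the misrecognized text; when `resolve` returns `None`, the analysis result would be searched as-is.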
- Search failures caused by frequent recognition errors in speech-to-text (STT) voice recognition can be reduced.
- The user experience of voice search can be improved by reducing search failures when users utter only some of the keywords of a content name or replace some of the keywords with synonyms.
- FIG. 1 is a block diagram illustrating the configuration of a display device according to an embodiment of the present invention.
- FIG. 2 is a block diagram of a remote control device according to an embodiment of the present invention.
- FIG. 3 shows an example of an actual configuration of a remote control device according to an embodiment of the present invention.
- FIG. 4 shows an example of utilizing a remote control device according to an embodiment of the present invention.
- FIG. 5 illustrates an artificial intelligence (AI) server according to one embodiment of the present disclosure.
- FIG. 6 is a diagram for explaining the configuration of an AI system according to one embodiment of the present disclosure.
- FIG. 7 is a ladder diagram for explaining an operation method of an AI system according to an embodiment of the present disclosure.
- FIG. 8 is a diagram illustrating a process of receiving a plurality of error names corresponding to content names from a generation AI server according to one embodiment of the present disclosure.
- FIGS. 9a and 9b are diagrams illustrating the output results of an LLM according to a prompt to generate error names of a content name.
- FIG. 10 is a diagram illustrating a table matching content names and multiple error names according to one embodiment of the present disclosure.
- FIG. 11 is a diagram illustrating a process for providing search results for desired content even when a user incorrectly pronounces the content name.
- FIG. 12 is a diagram illustrating an example of providing a pop-up window to confirm whether the content name intended by the user is correct when the user incorrectly pronounces the content name.
- The display device is, for example, an intelligent display device that adds a computer support function to a broadcast reception function. While remaining faithful to the broadcast reception function, it adds Internet functions and can provide a more convenient interface such as a manual input device, a touch screen, or a space remote control.
- With support for a wired or wireless Internet function, it can connect to the Internet and to a computer, and can also perform functions such as e-mail, web browsing, banking, or games.
- A standardized general-purpose OS can be used for these various functions.
- Because various applications can be freely added to or deleted from, for example, a general-purpose OS kernel, the display device described in the present invention can perform various user-friendly functions. More specifically, the display device can be, for example, a network TV, an HbbTV, a smart TV, an LED TV, or an OLED TV, and in some cases can also be applied to a smartphone.
- FIG. 1 is a block diagram illustrating the configuration of a display device according to one embodiment of the present invention.
- the display device (100) may include a broadcast receiving unit (130), an external device interface (135), a memory (140), a user input interface (150), a controller (170), a wireless communication interface (173), a display (180), a speaker (185), and a power supply circuit (190).
- the broadcast receiving unit (130) may include a tuner (131), a demodulator (132), and a network interface (133).
- the tuner (131) can select a specific broadcast channel according to a channel selection command.
- the tuner (131) can receive a broadcast signal for the selected specific broadcast channel.
- a demodulator (132) can separate a received broadcast signal into a video signal, an audio signal, and a data signal related to a broadcast program, and can restore the separated video signal, audio signal, and data signal into a form that can be output.
- the external device interface (135) can receive an application or a list of applications within an adjacent external device and transmit them to the controller (170) or memory (140).
- the external device interface (135) can provide a connection path between the display device (100) and the external device.
- The external device interface (135) can receive one or more of images and audio output from an external device connected to the display device (100) by wire or wirelessly, and transmit them to the controller (170).
- the external device interface (135) can include a plurality of external input terminals.
- the plurality of external input terminals can include an RGB terminal, one or more HDMI (High Definition Multimedia Interface) terminals, and a component terminal.
- a video signal of an external device input through an external device interface (135) can be output through a display (180).
- a voice signal of an external device input through an external device interface (135) can be output through a speaker (185).
- An external device that can be connected to the external device interface (135) may be any one of a set-top box, a Blu-ray player, a DVD player, a game console, a sound bar, a smartphone, a PC, a USB memory, and a home theater, but these are only examples.
- the network interface (133) can provide an interface for connecting the display device (100) to a wired/wireless network including the Internet.
- the network interface (133) can transmit or receive data to or from another user or another electronic device through the connected network or another network linked to the connected network.
- some of the content data stored in the display device (100) can be transmitted to a selected user or electronic device among other users or other electronic devices pre-registered in the display device (100).
- the network interface (133) can access a predetermined web page through a connected network or another network linked to the connected network. That is, it can access a predetermined web page through a network and transmit or receive data with the corresponding server.
- the network interface (133) can receive content or data provided by a content provider or a network operator. That is, the network interface (133) can receive content such as movies, advertisements, games, VOD, broadcast signals, etc., and information related thereto provided from a content provider or a network provider through a network.
- the network interface (133) can receive firmware update information and update files provided by the network operator, and transmit data to the Internet or content provider or network operator.
- the network interface (133) can select and receive a desired application from among applications open to the public via a network.
- The memory (140) stores programs for signal processing and control within the controller (170), and can store processed image, audio, or data signals.
- the memory (140) may perform a function for temporary storage of image, voice, or data signals input from an external device interface (135) or a network interface (133), and may also store information about a specific image through a channel memory function.
- the memory (140) can store an application or a list of applications input from an external device interface (135) or a network interface (133).
- the display device (100) can play content files (video files, still image files, music files, document files, application files, etc.) stored in the memory (140) and provide them to the user.
- the user input interface (150) can transmit a signal input by the user to the controller (170), or transmit a signal from the controller (170) to the user.
- The user input interface (150) can receive and process control signals such as power on/off, channel selection, and screen setting from the remote control device (200) according to various communication methods such as Bluetooth, Ultra Wideband (UWB), ZigBee, RF (Radio Frequency) communication, or infrared (IR) communication, or can process control signals from the controller (170) so that they are transmitted to the remote control device (200).
- the user input interface (150) can transmit control signals input from local keys (not shown) such as a power key, channel key, volume key, and setting value to the controller (170).
- An image signal processed by the controller (170) may be input to the display (180) and displayed as an image corresponding to the image signal.
- an image signal processed by the controller (170) may be input to an external output device through an external device interface (135).
- the voice signal processed in the controller (170) can be output as audio to the speaker (185). Additionally, the voice signal processed in the controller (170) can be input to an external output device through the external device interface (135).
- The controller (170) can control the display device (100) according to a user command input through the user input interface (150) or according to an internal program, and can connect to a network so that the user can download a desired application or application list to the display device (100).
- the controller (170) enables the user-selected channel information, etc. to be output through the display (180) or speaker (185) together with the processed video or audio signal.
- The controller (170) allows a video signal or audio signal from an external device, for example, a camera or camcorder, input through the external device interface (135) to be output through the display (180) or speaker (185) in accordance with an external device video playback command received through the user input interface (150).
- The controller (170) can control the display (180) to display an image, for example, a broadcast image input through the tuner (131), an external input image input through the external device interface (135), an image input through the network interface (133), or an image stored in the memory (140).
- the image displayed on the display (180) can be a still image or a moving image, and can be a 2D image or a 3D image.
- The controller (170) can control the playback of content stored in the display device (100), received broadcast content, or external input content, and the content can take various forms such as broadcast images, external input images, audio files, still images, connected web screens, and document files.
- the wireless communication interface (173) can perform communication with an external device through wired or wireless communication.
- the wireless communication interface (173) can perform short range communication with an external device.
- The wireless communication interface (173) can support short range communication by using at least one of Bluetooth™, RFID (Radio Frequency Identification), Infrared Data Association (IrDA), UWB (Ultra Wideband), ZigBee, NFC (Near Field Communication), Wi-Fi (Wireless-Fidelity), Wi-Fi Direct, and Wireless USB (Wireless Universal Serial Bus) technologies.
- Such a wireless communication interface (173) can support, via a wireless area network, wireless communication between the display device (100) and a wireless communication system, between the display device (100) and another display device (100), or between the display device (100) and a network in which the display device (100) or an external server is located.
- the wireless area network can be a wireless personal area network.
- the other display device (100) may be a wearable device (e.g., a smartwatch, smart glass, HMD (head mounted display)), a mobile terminal such as a smart phone, etc., which can exchange data with (or be linked to) the display device (100) according to the present invention.
- the wireless communication interface (173) may detect (or recognize) a wearable device capable of communication around the display device (100).
- the controller (170) can transmit at least a portion of the data processed in the display device (100) to the wearable device via the wireless communication interface (173). Accordingly, a user of the wearable device can use the data processed in the display device (100) via the wearable device.
- the display (180) can generate a driving signal by converting an image signal, data signal, OSD signal processed by the controller (170) or an image signal, data signal, etc. received from an external device interface (135) into R, G, and B signals, respectively.
- The display device (100) illustrated in FIG. 1 is only one embodiment of the present invention. Some of the illustrated components may be integrated, added, or omitted depending on the specifications of the display device (100) actually implemented.
- the display device (100) may receive and play back an image through a network interface (133) or an external device interface (135) without having a tuner (131) and a demodulator (132).
- the display device (100) may be implemented separately as an image processing device, such as a set-top box for receiving contents according to broadcast signals or various network services, and a content playback device for playing contents input from the image processing device.
- The operating method of the display device according to the embodiment of the present invention described below may be performed not only by the display device (100) described with reference to FIG. 1, but also by an image processing device such as the separated set-top box, or by a content playback device having a display (180) and a speaker (185).
- FIG. 2 is a block diagram of a remote control device according to an embodiment of the present invention.
- FIG. 3 shows an example of an actual configuration of a remote control device (200) according to an embodiment of the present invention.
- the remote control device (200) may include a fingerprint recognition device (210), a wireless communication circuit (220), a user input interface (230), a sensor (240), an output interface (250), a power supply circuit (260), a memory (270), a controller (280), and a microphone (290).
- the wireless communication circuit (220) transmits and receives signals with any one of the display devices according to the embodiments of the present invention described above.
- the remote control device (200) may be equipped with an RF circuit (221) capable of transmitting and receiving signals with the display device (100) in accordance with RF communication standards, and an IR circuit (223) capable of transmitting and receiving signals with the display device (100) in accordance with IR communication standards.
- the remote control device (200) may be equipped with a Bluetooth circuit (225) capable of transmitting and receiving signals with the display device (100) in accordance with Bluetooth communication standards.
- the remote control device (200) may be equipped with an NFC circuit (227) capable of transmitting and receiving signals with the display device (100) in accordance with NFC (Near Field Communication) communication standards, and a WLAN circuit (229) capable of transmitting and receiving signals with the display device (100) in accordance with WLAN (Wireless LAN) communication standards.
- the remote control device (200) transmits a signal containing information about the movement of the remote control device (200) to the display device (100) through a wireless communication circuit (220).
- the remote control device (200) can receive a signal transmitted by the display device (100) through the RF circuit (221), and, if necessary, can transmit commands for turning the power on/off, changing the channel, changing the volume, etc. to the display device (100) through the IR circuit (223).
- the user input interface (230) may be composed of a keypad, a button, a touch pad, or a touch screen.
- the user may input a command related to the display device (100) to the remote control device (200) by operating the user input interface (230). If the user input interface (230) has a hard key button, the user may input a command related to the display device (100) to the remote control device (200) by pushing the hard key button. This will be described with reference to FIG. 3.
- the remote control device (200) may include a plurality of buttons.
- the plurality of buttons may include a fingerprint recognition button (212), a power button (231), a home button (232), a live button (233), an external input button (234), a volume control button (235), a voice recognition button (236), a channel change button (237), a confirmation button (238), and a back button (239).
- the fingerprint recognition button (212) may be a button for recognizing a user's fingerprint.
- the fingerprint recognition button (212) may be capable of a push operation, and may receive a push operation and a fingerprint recognition operation.
- the power button (231) may be a button for turning the power of the display device (100) on/off.
- the home button (232) may be a button for moving to the home screen of the display device (100).
- the live button (233) may be a button for displaying a real-time broadcast program.
- the external input button (234) may be a button for receiving an external input connected to the display device (100).
- The volume control button (235) may be a button for adjusting the volume level output by the display device (100).
- the voice recognition button (236) may be a button for receiving a user's voice and recognizing the received voice.
- the channel change button (237) may be a button for receiving a broadcast signal of a specific broadcast channel.
- the confirmation button (238) may be a button for selecting a specific function, and the back button (239) may be a button for returning to the previous screen.
- When the user input interface (230) is equipped with a touch screen, the user can input a command related to the display device (100) to the remote control device (200) by touching a soft key of the touch screen.
- the user input interface (230) may be equipped with various types of input means that can be operated by the user, such as a scroll key or a jog key, and this embodiment does not limit the scope of the rights of the present invention.
- the sensor (240) may be equipped with a gyro sensor (241) or an acceleration sensor (243), and the gyro sensor (241) may sense information about the movement of the remote control device (200).
- the gyro sensor (241) can sense information about the operation of the remote control device (200) based on the x, y, and z axes, and the acceleration sensor (243) can sense information about the movement speed of the remote control device (200).
- the remote control device (200) can further be equipped with a distance measuring sensor, so as to sense the distance to the display (180) of the display device (100).
- the output interface (250) can output a video or audio signal corresponding to an operation of the user input interface (230) or a signal transmitted from the display device (100).
- Through the output interface (250), the user can recognize whether the user input interface (230) is being operated or whether the display device (100) is being controlled.
- The output interface (250) may be equipped with an LED (251) that lights up when the user input interface (230) is operated or a signal is exchanged with the display device (100) via the wireless communication circuit (220), a vibrator (253) that generates vibrations, a speaker (255) that outputs sound, or a display (257) that outputs images.
- the power supply circuit (260) supplies power to the remote control device (200), and reduces power waste by stopping the power supply when the remote control device (200) does not move for a predetermined period of time.
- the power supply circuit (260) can resume power supply when a predetermined key provided in the remote control device (200) is operated.
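The power-saving behavior described above (stopping power after a period without movement and resuming on a key press) can be sketched as a small state machine. This is only an illustration: the class name `RemotePowerManager`, the 30-second timeout, and the explicit `now` timestamps are assumptions made so that the example is self-contained, not values taken from the disclosure.

```python
class RemotePowerManager:
    """Tracks whether power is supplied, based on motion and key events."""

    def __init__(self, idle_timeout_s=30.0):
        # Hypothetical "predetermined period of time" without movement.
        self.idle_timeout_s = idle_timeout_s
        self.powered = True
        self._last_motion = 0.0

    def on_motion(self, now):
        """Record that the sensor reported movement at time `now` (seconds)."""
        self._last_motion = now

    def on_key(self, now):
        """A predetermined key was operated: resume the power supply."""
        self.powered = True
        self._last_motion = now

    def tick(self, now):
        """Stop the power supply once the idle timeout has elapsed."""
        if self.powered and now - self._last_motion >= self.idle_timeout_s:
            self.powered = False
        return self.powered


manager = RemotePowerManager(idle_timeout_s=30.0)
manager.on_motion(0.0)
still_on = manager.tick(10.0)    # within the timeout
after_idle = manager.tick(30.0)  # timeout reached
manager.on_key(31.0)
after_key = manager.tick(32.0)
```

Passing timestamps in explicitly keeps the sketch deterministic; firmware would more plausibly read a hardware timer.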
- the memory (270) can store various types of programs, application data, etc. required for the control or operation of the remote control device (200).
- When the remote control device (200) wirelessly exchanges signals with the display device (100) through the RF circuit (221), the remote control device (200) and the display device (100) exchange those signals over a predetermined frequency band.
- the controller (280) of the remote control device (200) can store and reference information about the frequency band, etc., that can wirelessly transmit and receive signals with the display device (100) paired with the remote control device (200), in the memory (270).
- the controller (280) controls all matters related to the control of the remote control device (200).
- The controller (280) can transmit a signal corresponding to a predetermined key operation of the user input interface (230) or a signal corresponding to the movement of the remote control device (200) sensed by the sensor (240) to the display device (100) through the wireless communication circuit (220).
- the microphone (290) of the remote control device (200) can acquire voice.
- a plurality of microphones (290) may be provided.
- FIG. 4 shows an example of utilizing a remote control device according to an embodiment of the present invention.
- FIG. 4(a) illustrates that a pointer (205) corresponding to a remote control device (200) is displayed on a display (180).
- the user can move the remote control device (200) up and down, left and right, or rotate it.
- The pointer (205) displayed on the display (180) of the display device (100) moves in correspondence with the movement of the remote control device (200).
- This remote control device (200) can be called a space remote control because, as shown in the drawing, the pointer (205) moves and is displayed according to the movement in 3D space.
- FIG. 4(b) illustrates that when a user moves the remote control device (200) to the left, the pointer (205) displayed on the display (180) of the display device (100) also moves to the left correspondingly.
- Information about the movement of the remote control device (200) detected through the sensor of the remote control device (200) is transmitted to the display device (100).
- the display device (100) can calculate the coordinates of the pointer (205) from the information about the movement of the remote control device (200).
- the display device (100) can display the pointer (205) to correspond to the calculated coordinates.
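One way the coordinate calculation described above might look is sketched below. The disclosure does not specify the formula, so everything here is an assumption for illustration: the movement information is modeled as yaw and pitch changes in degrees, `gain` converts degrees to pixels, and the screen size defaults to 1920x1080.

```python
from dataclasses import dataclass


@dataclass
class Pointer:
    x: float
    y: float


def update_pointer(pointer, yaw_delta_deg, pitch_delta_deg,
                   width=1920, height=1080, gain=20.0):
    """Return the new pointer position after a reported movement.

    Assumed sign convention: positive yaw moves the pointer right,
    positive pitch moves it up (toward smaller y).
    """
    x = pointer.x + yaw_delta_deg * gain
    y = pointer.y - pitch_delta_deg * gain
    # Keep the pointer inside the visible area of the display.
    x = min(max(x, 0.0), width - 1)
    y = min(max(y, 0.0), height - 1)
    return Pointer(x, y)


# Moving the remote to the left (negative yaw) moves the pointer left.
moved_left = update_pointer(Pointer(960, 540), yaw_delta_deg=-5.0, pitch_delta_deg=0.0)
```

Clamping to the screen bounds is a design choice of the sketch; it keeps the pointer visible when the remote is swung past the edge of the display.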
- FIG. 4(c) illustrates a case where a user moves the remote control device (200) away from the display (180) while pressing a specific button on the remote control device (200).
- In this case, the selection area within the display (180) corresponding to the pointer (205) can be zoomed in and displayed enlarged.
- Conversely, when the user moves the remote control device (200) closer to the display (180), the selection area within the display (180) corresponding to the pointer (205) may be zoomed out and displayed in a reduced size.
- Alternatively, the mapping may be reversed: when the remote control device (200) moves away from the display (180), the selection area may be zoomed out, and when the remote control device (200) moves closer to the display (180), the selection area may be zoomed in.
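The two distance-to-zoom mappings described above can be captured with a single direction flag. The following is a hedged sketch only: the function name `update_zoom`, the `sensitivity` value, and the zoom limits are hypothetical, and the distance is assumed to come from a distance measuring sensor in centimeters.

```python
def update_zoom(current_zoom, prev_distance_cm, new_distance_cm,
                button_pressed=True, away_zooms_in=True,
                sensitivity=0.01, min_zoom=0.5, max_zoom=4.0):
    """Return the new zoom factor for the selection area.

    With away_zooms_in=True, moving the remote away from the display zooms in
    and moving it closer zooms out; with False, the mapping is reversed.
    """
    if not button_pressed:
        # Zooming only applies while the specific button is held.
        return current_zoom
    delta = new_distance_cm - prev_distance_cm
    if not away_zooms_in:
        delta = -delta
    zoom = current_zoom + delta * sensitivity
    return min(max(zoom, min_zoom), max_zoom)


# Remote moved 50 cm away from the display while the button is held.
zoomed = update_zoom(1.0, prev_distance_cm=200.0, new_distance_cm=250.0)
reversed_zoom = update_zoom(1.0, 200.0, 250.0, away_zooms_in=False)
```

A linear mapping is used purely for simplicity; an implementation could equally scale the zoom multiplicatively.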
- the movement speed or movement direction of the pointer (205) can correspond to the movement speed or movement direction of the remote control device (200).
- the pointer in this specification means an object displayed on the display (180) in response to the operation of the remote control device (200). Accordingly, objects of various shapes other than the arrow shape illustrated in the drawing are possible as the pointer (205). For example, it may be a concept including a point, a cursor, a prompt, a thick outline, etc.
- the pointer (205) may be displayed corresponding to one point of the horizontal and vertical axes on the display (180), and may also be displayed corresponding to multiple points such as lines and surfaces.
- FIG. 5 illustrates an artificial intelligence (AI) server according to one embodiment of the present disclosure.
- The AI server (500) may mean a device that trains an artificial neural network using a machine learning algorithm or uses a trained artificial neural network.
- The AI server (500) may be composed of multiple servers to perform distributed processing and may be defined as a 5G network.
- The AI server (500) may be included as part of the configuration of the AI device (100) and may perform at least part of the AI processing.
- The AI server (500) may include a communication unit (510), a memory (530), a learning processor (540), and a processor (560).
- The communication unit (510) can transmit and receive data to and from an external device such as the display device (100).
- The memory (530) may include a model storage unit (531).
- The model storage unit (531) can store a model (or artificial neural network, 531a) that is being trained or has been trained through the learning processor (540).
- The learning processor (540) can train the artificial neural network (531a) using training data.
- The learning model can be used while mounted on the AI server (500), or can be mounted on and used in an external device such as the display device (100).
- The learning model may be implemented in hardware, software, or a combination of hardware and software. If part or all of the learning model is implemented in software, one or more instructions constituting the learning model may be stored in the memory (530).
- The processor (560) can use the learning model to infer a result value for new input data and generate a response or control command based on the inferred result value.
- FIG. 6 is a diagram for explaining the configuration of an AI system according to one embodiment of the present disclosure.
- An artificial intelligence (AI) system may include a display device (100), an AI server (500), and a generative AI server (600).
- The AI server (500) and the generative AI server (600) may be configured as a single server.
- The AI server (500) may also be referred to as an artificial intelligence device.
- Components of the AI system (60) can communicate with each other via the Internet.
- The AI server (500) and the generative AI server (600) may each include the components illustrated in FIG. 5.
- The AI server (500) may be a natural language processing (NLP) server that obtains the intent analysis result of a voice command through natural language processing.
- The display device (100) may be referred to as an electronic device.
- The AI server (500) can obtain multiple error names corresponding to a content name.
- The AI server (500) can match the obtained error names to the content name and store them in the memory (530).
- The display device (100) can obtain a voice command spoken by a user and transmit voice data corresponding to the obtained voice command to the AI server (500) through a network interface (133).
- The AI server (500) can obtain an analysis result of the voice command based on the voice data.
- The AI server (500) can determine whether the obtained analysis result includes a stored error name.
- If the analysis result includes neither a stored error name nor a content name, the AI server (500) can transmit a non-recognition result, indicating that the content name was not recognized, to the display device (100) through the communication unit (510).
- The display device (100) can display the non-recognition result on the display (180).
- If the analysis result includes a stored error name, the AI server (500) can obtain the content name matching the error name from the memory (530).
- The processor (560) of the AI server (500) can obtain search results for the obtained content name and transmit them to the display device (100) through the communication unit (510).
- The display device (100) can output the search results for the content name received from the AI server (500).
- FIG. 7 is a ladder diagram for explaining an operation method of an AI system according to an embodiment of the present disclosure.
- Content names are used as examples in the following description, but the method is not limited to content names and can also be applied to words and sentences spoken by users.
- The processor (560) of the AI server (500) can obtain multiple error names corresponding to the content name (S701).
- The multiple error names may include one or more of a name similar to the content name, a name recognized based on a user's pronunciation error, and an abbreviation of the content name.
- The processor (560) may receive the multiple error names from the generative AI server (600).
- FIG. 8 is a diagram illustrating a process of receiving a plurality of error names corresponding to a content name from a generative AI server according to one embodiment of the present disclosure.
- The processor (560) can generate a prompt that commands the generation of error names based on the content name and transmit the generated prompt to the generative AI server (600) through the communication unit (510).
- The prompt may be a command to generate error names that are likely to result from misrecognition or mispronunciation of the content name.
- The prompt may include an example of a content name and an error name.
- The name included in the prompt can be either the name of new content or a search term with a search frequency greater than a preset frequency.
- The processor (560) can generate a prompt requesting the generation of error names for the names of new content taken from a new content database that stores information about new content.
- The processor (560) may generate a prompt requesting the generation of error names for search terms, obtained from a content provider server or a web server, whose search frequency is greater than a certain frequency.
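The prompt construction described above might look like the following sketch. The prompt wording, the function names `select_names` and `build_error_name_prompt`, and the sample counts are assumptions; the disclosure only states that the prompt commands the generation of error names, may carry an example, and targets new content names or sufficiently frequent search terms.

```python
def select_names(new_content_names, search_counts, min_count):
    """Pick the names to generate error names for.

    Combines names of new content with search terms whose count exceeds
    `min_count` (a stand-in for the preset frequency).
    """
    frequent = [term for term, n in search_counts.items() if n > min_count]
    # dict.fromkeys removes duplicates while keeping first-seen order.
    return list(dict.fromkeys([*new_content_names, *frequent]))


def build_error_name_prompt(content_name, count=5, example=None):
    """Build one prompt asking a language model for likely error names."""
    lines = [
        f"Generate {count} names that could be misrecognized or "
        f"mispronounced in place of the title <{content_name}>.",
        "Return one name per line.",
    ]
    if example is not None:
        example_name, example_errors = example
        shown = ", ".join(f"<{e}>" for e in example_errors)
        lines.append(f"Example: <{example_name}> -> {shown}")
    return "\n".join(lines)


# Hypothetical inputs: one new title and made-up search counts.
names = select_names(["your house"],
                     {"your house": 900, "bb": 50, "news": 500},
                     min_count=100)
prompt = build_error_name_prompt(names[0])
```

Keeping prompt construction in a pure function makes it straightforward to produce several differently worded prompts for the same name.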
- The generative AI server (600) can generate multiple error names as a response to a prompt using a large language model (LLM) (601).
- The large language model (601) may be a model that is trained on a large amount of text data and outputs automatic translation results, question-answering results, and the like in response to prompts.
- The large language model (601) can take as input a prompt that commands it to generate error names based on a content name, and generate multiple error names that may be uttered in place of the content name.
- The processor (560) can transmit multiple prompts of different forms to the generative AI server (600).
- The generative AI server (600) can generate multiple response results, one for each of the multiple prompts, using the LLM (601). Each of the multiple response results can include multiple error names.
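When several prompts yield several response results, the error names they contain have to be collected into one list. The sketch below shows one way to do that; the response format (one name per line, optionally numbered or bulleted) and the function names are assumptions, since the disclosure does not define how responses are parsed.

```python
import re

# Matches a leading list marker such as "1.", "2)", "-" or "*".
_MARKER = re.compile(r"^\s*(?:\d+[.)]|[-*])\s+")


def parse_error_names(response_text):
    """Extract one candidate name per non-empty line of a response."""
    names = []
    for line in response_text.splitlines():
        cleaned = _MARKER.sub("", line).strip().strip("<>").strip()
        if cleaned:
            names.append(cleaned)
    return names


def merge_error_names(content_name, responses):
    """Merge names from several responses, dropping duplicates.

    Comparison is case-insensitive, and the content name itself is
    excluded because it is not an error name.
    """
    seen = {content_name.casefold()}
    merged = []
    for text in responses:
        for name in parse_error_names(text):
            key = name.casefold()
            if key not in seen:
                seen.add(key)
                merged.append(name)
    return merged


responses = [
    "1. Yore howse\n2. Yer house\n3. Your house",
    "- <Yer house>\n- <Youse>",
]
merged = merge_error_names("your house", responses)
```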
- Figures 9a and 9b are diagrams illustrating the output of the LLM in response to prompts requesting error names for a content name.
- FIG. 9a shows the execution screen of a chatbot application providing a generative AI service.
- The execution screen of the chatbot application can be displayed on the AI server (500) or the display device (100).
- The prompt transmitted to the generative AI server (600) can be generated in the AI server (500) or the display device (100).
- The processor (560) of the AI server (500) can transmit the first prompt (901) to the generative AI server (600).
- The LLM (601) of the generative AI server (600) can respond to the first prompt (901) with five error names (902) that may result from mispronouncing <your house>.
- The five error names (902) can be <Yore howse>, <Yer house>, <Yor house>, <Youse> and <Your hice>.
- The processor (560) may store the error names (902) in the memory (530) by matching them to the name <your house>.
- FIG. 9b shows another execution screen of a chatbot application providing a generative AI service.
- The processor (560) of the AI server (500) can transmit the second prompt (911) to the generative AI server (600).
- The LLM (601) of the generative AI server (600) can output a response result (912) describing the principle for generating an abbreviation of a name in response to the second prompt (911).
- The processor (560) of the AI server (500) can transmit the third prompt (913) to the generative AI server (600).
- The LLM (601) of the generative AI server (600) can output a response result (914) including the abbreviation of the name, <bb>, in response to the third prompt (913).
- The processor (560) can obtain multiple error names corresponding to the name through the LLM (601) of the generative AI server (600) based on the prompt.
- The display device (100) can also obtain multiple error names corresponding to the name by executing a chatbot application that provides a generative AI service.
- The obtained error names can be stored in the memory (530) of the AI server (500) or in a separate database.
- The processor (560) can also obtain error names for content names from the display device (100).
- The processor (560) may determine that the content name was mispronounced based on the analysis of the voice command uttered by the user.
- The processor (560) can receive a text input for the content name from the display device (100).
- When the processor (560) recognizes the content name through the text input, it can obtain the text of the mispronounced voice command as the error name.
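The idea of learning an error name from a user's own correction can be sketched as follows. The function name, the dictionary-based table, and the rule that the typed text must exactly match a known content name are assumptions added for illustration; the disclosure only states that the text of the mispronounced voice command is obtained as the error name once the typed name is recognized.

```python
def learn_from_correction(error_table, known_names, failed_voice_text, typed_text):
    """Record a failed utterance as an error name for the typed name.

    Returns True when the utterance is associated with a known content
    name (it is appended only if not already stored), and False when
    the typed text is unknown, the utterance is empty, or both are equal.
    """
    typed = typed_text.strip()
    failed = failed_voice_text.strip()
    if typed not in known_names or not failed:
        return False
    if failed.casefold() == typed.casefold():
        return False  # Nothing to learn: the utterance was already correct.
    errors = error_table.setdefault(typed, [])
    if failed not in errors:
        errors.append(failed)
    return True


table = {}
learned = learn_from_correction(table, {"your house"}, "your hice", "your house")
```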
- The processor (560) of the AI server (500) can match the obtained error names to the content name and store them in the memory (530) (S703).
- The processor (560) can match the plurality of error names obtained for one content name to that content name and store them in the memory (530) or a database.
- The database may be included in the AI server (500) or may be a storage facility provided separately from the AI server (500).
- FIG. 10 is a diagram illustrating a table matching a content name and multiple error names according to one embodiment of the present disclosure.
- A table (1000) is illustrated showing the correspondence between a content name, <your house>, and multiple error names, <Yore howse>, <Your hice>, and <Youse>, that match the content name.
- The table (1000) can be stored in the memory (530) of the AI server (500) or in a content database (not shown).
- Multiple tables of the same form as the table (1000) can be stored, each matching a name to the error names corresponding to that name.
- The name can be either a content name or a search term.
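One plausible in-memory representation of the table (1000) is a mapping from each name to its error names, plus a reverse index for fast lookup. The dictionary layout, the case-insensitive keys, and the name `build_reverse_index` are assumptions; the disclosure does not prescribe a storage format.

```python
# Contents taken from the table (1000) example in the description.
TABLE_1000 = {
    "your house": ["Yore howse", "Your hice", "Youse"],
}


def build_reverse_index(table):
    """Map every casefolded error name (and name) to its stored name."""
    index = {}
    for name, error_names in table.items():
        for error_name in error_names:
            # setdefault keeps the first mapping if two names share
            # the same error name.
            index.setdefault(error_name.casefold(), name)
    for name in table:
        # A correctly spoken name always resolves to itself.
        index[name.casefold()] = name
    return index


index = build_reverse_index(TABLE_1000)
```

With this index, checking whether recognized text is a stored error name is a single dictionary lookup rather than a scan over every table.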
- Variant words of each of the one or more keywords in a content name may be used to generate the error names.
- When the content name contains one or more keywords, the LLM (601) can generate variant words for each keyword and generate an error name by combining the generated variant words.
- For example, assume a content title consists of two words, a first keyword and a second keyword.
- The LLM (601) can generate variant words of the first keyword and variant words of the second keyword.
- The LLM (601) can generate an error name by combining any one of the variant words of the first keyword with any one of the variant words of the second keyword.
- The plurality of error names generated by the LLM (601) by combining variant words can be transmitted to the AI server (500).
- The processor (560) of the AI server (500) may also generate the variant words of the first keyword and the variant words of the second keyword.
- The processor (560) may generate an error name by combining any one of the variant words of the first keyword with any one of the variant words of the second keyword.
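Combining one variant of each keyword amounts to taking a Cartesian product. The sketch below assumes that each keyword's variant list includes the keyword itself and that the variants shown are hypothetical examples; how variants are produced is left to the LLM (601) or the processor (560) and is not modeled here.

```python
from itertools import product


def combine_variants(variants_per_keyword, original):
    """Join one variant per keyword, skipping the original name."""
    combos = (" ".join(words) for words in product(*variants_per_keyword))
    return [combo for combo in combos if combo != original]


# Hypothetical variants for the two keywords of <your house>.
first_keyword_variants = ["your", "yore"]
second_keyword_variants = ["house", "howse"]
error_names = combine_variants(
    [first_keyword_variants, second_keyword_variants], "your house"
)
```

The number of combinations grows multiplicatively with the number of variants per keyword, which is one reason a cap on stored error names can be useful.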
- The processor (560) of the AI server (500) can adjust the number of error names according to the popularity ranking of the content (or, in the case of a search term, the search ranking). For example, the processor (560) can increase the number of error names corresponding to the content name as the content ranks higher in popularity, and decrease the number of error names as the content ranks lower.
- The processor (560) can add or delete error names to manage the capacity of the memory (530).
- The AI server (500) can receive the popularity ranking of content or the search ranking of search terms from a content provider server or a search server.
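The ranking-dependent adjustment could be implemented as a per-name budget, as in the following sketch. The linear rule, the default limits, and the function names are assumptions chosen only to show the direction of the adjustment (more error names for higher-ranked content); the disclosure gives no concrete formula.

```python
def error_name_budget(rank, top_budget=10, min_budget=2, step=100):
    """Return how many error names to keep for content at `rank`.

    Rank 1 is the most popular. Every `step` positions further down
    the ranking reduces the budget by one, never going below `min_budget`.
    """
    return max(min_budget, top_budget - (rank - 1) // step)


def trim_error_names(error_names, rank):
    """Keep only as many error names as the rank's budget allows."""
    return error_names[: error_name_budget(rank)]


kept = trim_error_names(["Yore howse", "Your hice", "Youse"], rank=5000)
```

Trimming from the end assumes the list is ordered from most to least likely error name, which is also an assumption.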
- The controller (170) of the display device (100) can obtain a voice command spoken by a user (S705) and transmit voice data corresponding to the obtained voice command to the AI server (500) through the network interface (133) (S707).
- The AI server (500) may be equipped with an STT (Speech To Text) engine and may convert the voice data into text data using the STT engine.
- Alternatively, the display device (100) can transmit the voice data corresponding to the voice command to an STT server (not shown).
- The STT server can convert the voice data into text data and transmit the converted text data to the AI server (500).
- The processor (560) of the AI server (500) can obtain an analysis result of the voice command based on the voice data (S709).
- The processor (560) can obtain the analysis result using the text data of the voice data.
- The processor (560) can obtain the analysis result from the text data using a natural language processing engine.
- The processor (560) can sequentially perform a morphological analysis step, a syntactic analysis step, a speech act analysis step, and a dialogue processing step on the text data to generate the analysis result.
- The morphological analysis step classifies the text data corresponding to the user's speech into morphemes, which are the smallest units that carry meaning, and determines the part of speech of each classified morpheme.
- The syntactic analysis step uses the results of the morphological analysis step to divide the text data into noun phrases, verb phrases, adjective phrases, and the like, and determines the relationships between the divided phrases.
- Through the syntactic analysis step, the subject, object, and modifiers of the user's speech can be determined.
- The speech act analysis step analyzes the intention of the user's speech using the results of the syntactic analysis step. Specifically, it determines the intention of the sentence, such as whether the user is asking a question, making a request, or simply expressing an emotion.
- The dialogue processing step uses the results of the speech act analysis step to determine whether to answer the user's utterance, respond to it, or ask a question requesting additional information.
- The processor (560) may generate an analysis result including one or more of the user's utterance intention, an answer to the intention, a response, and an inquiry for additional information.
- The processor (560) of the AI server (500) can determine whether the obtained analysis result includes a stored error name (S711).
- The processor (560) can determine whether the analysis result includes an error name by consulting the memory (530) or the content database.
- If the analysis result includes neither a stored error name nor a content name, the processor (560) of the AI server (500) can transmit a non-recognition result, indicating that the content name was not recognized, to the display device (100) through the communication unit (510) (S712).
- The display device (100) can display the non-recognition result on the display (180).
- If the analysis result includes a stored error name, the processor (560) of the AI server (500) can obtain the content name matching the error name from the memory (530) (S713).
- When the text data of the uttered voice command corresponds to an error name, the processor (560) can recognize that the user has uttered the content name matching that error name.
- The processor (560) of the AI server (500) can obtain search results for the obtained content name (S715) and transmit them to the display device (100) through the communication unit (510) (S717).
- The processor (560) can transmit a search request for the content name recognized as spoken by the user to a search server (not shown) and receive a search result for the content name from the search server.
- The search server may be a web server or a content provider server.
- The search results may include one or more of the content's characters, plot, episode information, and access addresses.
- The controller (170) of the display device (100) can output the search results for the content name received from the AI server (500) (S719).
- The controller (170) can display the search results on the display (180).
- The user experience for voice search can be improved by reducing search failures when users speak only some of the key words of the content name or substitute some of the key words with synonyms.
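Steps S711 to S717 can be summarized in a short sketch. The index contents come from the `<your house>` example in the description, while the function names, the returned dictionary shape, and `fake_search` (a stand-in for the search server) are assumptions made for illustration.

```python
# Hypothetical reverse index: casefolded error name -> stored content name.
ERROR_INDEX = {
    "yore howse": "your house",
    "your hice": "your house",
    "youse": "your house",
}
CONTENT_NAMES = {"your house"}


def resolve_content_name(first_text):
    """Return the content name for recognized text, or None (S711, S713)."""
    key = first_text.strip().casefold()
    if key in CONTENT_NAMES:
        return key  # The name was spoken correctly.
    return ERROR_INDEX.get(key)


def handle_analysis_result(first_text, search):
    """Return a non-recognition result (S712) or search results (S715-S717)."""
    name = resolve_content_name(first_text)
    if name is None:
        return {"status": "not_recognized"}
    return {"status": "ok", "content_name": name, "results": search(name)}


def fake_search(name):
    """Stand-in for a request to a search server."""
    return [f"{name}: episode list"]


reply = handle_analysis_result("Your hice", fake_search)
```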
- Figure 11 is a diagram illustrating a process for providing search results for desired content even when a user mispronounces the content name.
- The display device (100) can transmit voice data for the <your hice> voice command uttered by the user to the AI server (500).
- The display device (100) can receive voice commands from the remote control device (200) or through a microphone provided in the display device (100).
- The AI server (500) can analyze the intent of the text data of the voice data and obtain an analysis result.
- The analysis result may indicate a search intent for content corresponding to the content name.
- The search intent may include a first text corresponding to the content name and a second text indicating a search request for content corresponding to the content name.
- The AI server (500) can determine whether the first text included in the analysis result matches any of the error names stored in the content database (1100).
- The AI server (500) can determine whether the first text included in the analysis result is stored in the table (1100) stored in the content database (1100).
- If the first text included in the analysis result is stored in the table (1100), the AI server (500) can extract the content name matching the first text.
- If the first text included in the analysis result is stored in the table (1100), the AI server (500) can determine that the content name has been recognized.
- The AI server (500) can provide search results for the content corresponding to the recognized content name to the display device (100).
- The user experience for voice search can be improved by reducing search failures when users speak only some of the key words of the content name or substitute some of the key words with synonyms.
- Figure 12 is a diagram illustrating an example of providing a pop-up window to confirm whether the content name intended by the user is correct when the user mispronounces the content name.
- The AI server (500) can determine <your house> as the content recognition result for the voice command mispronounced by the user, as in the embodiment of Fig. 11.
- The AI server (500) can transmit a control command to the display device (100) to verify whether the content recognition result is correct.
- According to the control command received from the AI server (500), the display device (100) can display a pop-up window (1200) on the display (180) to confirm whether the content name intended by the user is correct.
- The pop-up window (1200) may include the recognized content name and text asking whether the content name is correct.
- When the display device (100) receives, through the pop-up window (1200), a confirmation input indicating that the content name is correct, it can transmit a confirmation command indicating that the confirmation input has been received to the AI server (500).
- The AI server (500) can perform a search for the content name according to the received confirmation command and transmit the search result to the display device (100).
- Display of the pop-up window (1200) of Fig. 12 can be performed between steps S713 and S715 of Fig. 7.
- The above-described method can be implemented as processor-readable code on a medium in which a program is recorded.
- Examples of the processor-readable medium include a ROM, a RAM, a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
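The confirmation step between S713 and S715 can be sketched as a callback inserted before the search. The callables `confirm` (standing in for the pop-up window (1200) round trip) and `search` (standing in for the search server), the function name, and the choice to return `None` when the user declines are all assumptions; the disclosure does not describe what happens on rejection.

```python
def search_after_confirmation(error_text, error_index, confirm, search):
    """Resolve an error name, ask the user to confirm, then search."""
    name = error_index.get(error_text.strip().casefold())  # S713
    if name is None:
        return None  # No stored error name matched.
    if not confirm(name):  # Pop-up window asks whether the name is correct.
        return None  # Assumed behavior: skip the search on rejection.
    return search(name)  # S715


error_index = {"your hice": "your house"}
searched = []


def record_search(name):
    """Stand-in search that records which name was searched."""
    searched.append(name)
    return [f"results for {name}"]


accepted = search_after_confirmation(
    "your hice", error_index, lambda name: True, record_search
)
declined = search_after_confirmation(
    "your hice", error_index, lambda name: False, record_search
)
```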
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Software Systems (AREA)
- Mathematical Physics (AREA)
- Artificial Intelligence (AREA)
- Multimedia (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Computing Systems (AREA)
- Medical Informatics (AREA)
- Evolutionary Computation (AREA)
- Acoustics & Sound (AREA)
- Human Computer Interaction (AREA)
- General Health & Medical Sciences (AREA)
- Signal Processing (AREA)
- Information Transfer Between Computers (AREA)
Abstract
The present disclosure relates to an artificial intelligence device which, according to one embodiment, may comprise: a memory for storing a name and multiple error names matched to the name; a communication unit for communicating with an electronic device or a generative artificial intelligence (AI) server; and a processor for receiving, from the electronic device, voice data corresponding to a voice command uttered by a user, obtaining an analysis result on the basis of the received voice data, obtaining the name matched to the multiple error names when any one of the stored multiple error names is included in the obtained analysis result, and outputting a search result for the obtained name to the electronic device.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/KR2023/016981 WO2025095151A1 (fr) | 2023-10-30 | 2023-10-30 | Display device and operating method thereof |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/KR2023/016981 WO2025095151A1 (fr) | 2023-10-30 | 2023-10-30 | Display device and operating method thereof |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2025095151A1 (fr) | 2025-05-08 |
Family
ID=95582456
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/KR2023/016981 Pending WO2025095151A1 (fr) | 2023-10-30 | 2023-10-30 | Display device and operating method thereof |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2025095151A1 (fr) |
Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
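| KR20160069329A (ko) * | 2014-12-08 | 2016-06-16 | Samsung Electronics Co., Ltd. | Method and apparatus for training a language model, and method and apparatus for speech recognition |
| US20160253989A1 (en) * | 2015-02-27 | 2016-09-01 | Microsoft Technology Licensing, Llc | Speech recognition error diagnosis |
| KR20170063037A (ko) * | 2015-11-30 | 2017-06-08 | Samsung Electronics Co., Ltd. | Speech recognition apparatus and method |
| KR20210047173A (ko) * | 2019-10-21 | 2021-04-29 | LG Electronics Inc. | Artificial intelligence device for recognizing speech by correcting misrecognized words, and method therefor |
| KR20220039075A (ko) * | 2020-09-21 | 2022-03-29 | Samsung Electronics Co., Ltd. | Electronic device, content search system, and search method |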
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| WO2014003283A1 (fr) | | Display apparatus, method for controlling display apparatus, and interactive system |
| WO2014107101A1 (fr) | | Display apparatus and method for controlling the same |
| WO2021060590A1 (fr) | | Display device and artificial intelligence system |
| WO2015194693A1 (fr) | | Video display device and operating method thereof |
| WO2021060575A1 (fr) | | Artificial intelligence server and operating method thereof |
| WO2019135433A1 (fr) | | Display device and system comprising same |
| WO2021025245A1 (fr) | | Display device and surround sound system |
| WO2019164049A1 (fr) | | Display device and operating method thereof |
| WO2020122271A1 (fr) | | Display device |
| WO2021015319A1 (fr) | | Display device and control method thereof |
| WO2021033785A1 (fr) | | Display device and artificial intelligence server capable of controlling a home appliance through a user's voice |
| WO2025084447A1 (fr) | | Artificial intelligence device and operating method thereof |
| WO2021045278A1 (fr) | | Display apparatus |
| WO2021054495A1 (fr) | | Display device and artificial intelligence server |
| WO2022014738A1 (fr) | | Display device |
| WO2020230923A1 (fr) | | Display device for providing a speech recognition service, and operating method thereof |
| WO2019164020A1 (fr) | | Display device |
| WO2025095151A1 (fr) | | Display device and operating method thereof |
| WO2021177495A1 (fr) | | Natural language processing device |
| WO2021060570A1 (fr) | | Home appliance and server |
| WO2025075220A1 (fr) | | Display device and operating method thereof |
| WO2024005226A1 (fr) | | Display device |
| WO2025110257A1 (fr) | | Display device and operating method thereof |
| WO2022050433A1 (fr) | | Display device for adjusting the recognition sensitivity of a speech recognition starting word, and operating method thereof |
| WO2021015307A1 (fr) | | Display device and artificial intelligence server capable of controlling a home appliance through a user's voice |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 23957749; Country of ref document: EP; Kind code of ref document: A1 |