WO2024253376A1 - Method and system for contextual device wake-up in multi-device multi-reality environments


Publication number
WO2024253376A1
Authority
WO
WIPO (PCT)
Prior art keywords
devices
world devices
virtual
world
real
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/KR2024/007274
Other languages
French (fr)
Inventor
Manjunath Belgod Lokanath
Vishwanath Pethri KAMATH
Rishabh SHUKLA
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Samsung Electronics Co Ltd
Original Assignee
Samsung Electronics Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Samsung Electronics Co Ltd
Priority to CN202480037270.1A (published as CN121241329A)
Priority to EP24819528.1A (published as EP4619855A4)
Priority to US18/883,463 (published as US20250006195A1)
Publication of WO2024253376A1

Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/22Procedures used during a speech recognition process, e.g. man-machine dialogue
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16Sound input; Sound output
    • G06F3/167Audio in a user interface, e.g. using voice commands for navigating, audio feedback
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/22Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223Execution procedure of a spoken command

Definitions

  • the disclosure relates to multi-device multi-reality environments. More particularly, the disclosure relates to contextual device wake-up in multi-device multi-reality environments.
  • Multi-Device Multi-Reality environments represent a dynamic landscape where users engage with a plurality of devices across both physical and virtual realms.
  • the user can seamlessly transition between tangible devices in the physical world, such as smartphones, tablets, and smart home appliances, and VR smart devices in the virtual world.
  • FIG. 1 illustrates an example of a Multi-Device Multi-Reality environment implemented according to the related art.
  • FIG. 1 depicts the Multi-Device Multi-Reality environment 100 where the user is communicating with multiple devices present in the XR environment 101 and in the physical environment 103.
  • the device in the physical room may wake up based on static parameters.
  • when the user utters a wake command, for example "Hi xxx", a television (TV) 105 in the physical environment 103 enters a listening state.
  • the devices in both worlds become active. For example, both the TV 105 of the physical world and the virtual TV 107 respond to the user command. Accordingly, the devices existing only within the virtual environment do not actively participate in the process of differentiating or distinguishing real-world stimuli in order to wake up other devices.
  • an aspect of the disclosure is to provide contextual device wake-up in the multi-device-multi reality environments.
  • a method for waking up a device among a plurality of devices in a multi-reality environment includes detecting a voice input from a user, receiving, based on the voice input, a first plurality of wakeup parameters associated with a plurality of real-world devices in the real world and a second plurality of wakeup parameters associated with a plurality of virtual-world devices in a virtual world, feeding the first plurality of wakeup parameters and the second plurality of wakeup parameters into a pre-trained artificial intelligence (AI)-based model including a correlation of one or more of a context of the user, a history, a device state of each of the plurality of real-world devices and the plurality of virtual-world devices, a device wakeup operation information of each of the plurality of real-world devices and the plurality of virtual-world devices, a task execution information of each of the plurality of real-world devices and the plurality of virtual-world devices and error information in wake-up of each of the pluralit
  • an apparatus for waking up a device among a plurality of devices in a multi-reality environment includes memory storing one or more computer programs, and one or more processors communicatively coupled to the memory, wherein the one or more computer programs include computer-executable instructions that, when executed by the one or more processors, cause the apparatus to detect a voice input from a user, receive, based on the voice input, a first plurality of wakeup parameters associated with a plurality of real-world devices in the real world and a second plurality of wakeup parameters associated with a plurality of virtual-world devices in a virtual world, feed the first plurality of wakeup parameters and the second plurality of wakeup parameters into a pre-trained artificial intelligence (AI)-based model including a correlation of one or more of a context of the user, a history, a device state of each of the plurality of real-world devices and the plurality of virtual-world devices, a device wakeup operation information of each of the pluralit
  • one or more non-transitory computer-readable storage media storing one or more computer programs including computer-executable instructions that, when executed by one or more processors of an apparatus, cause the apparatus to perform operations.
  • the operations include detecting a voice input from a user, receiving, based on the voice input, a first plurality of wakeup parameters associated with a plurality of real-world devices in the real-world and a second plurality of wakeup parameters associated with a plurality of virtual-world devices in a virtual world, feeding the first plurality of wakeup parameters and the second plurality of wakeup parameters into a pre-trained artificial intelligence (AI)-based model including a correlation of one or more of a context of the user, a history, a device state of each of the plurality of real-world devices and the plurality of virtual-world devices, a device wakeup operation information of each of the plurality of real-world devices and the plurality of virtual-world devices, a task execution information of each of the plurality of real-
  • FIG. 1 illustrates an example of a Multi-Device Multi-Reality environment implemented according to the related art
  • FIG. 2 illustrates a general architecture of an apparatus for waking up a device in a multi-reality environment, according to an embodiment of the disclosure
  • FIG. 3 illustrates various components of modules of FIG. 2, according to an embodiment of the disclosure
  • FIG. 4 illustrates an example of calculating an egocentric distance between a user and each of a plurality of virtual world devices, according to an embodiment of the disclosure
  • FIG. 5 illustrates an example training phase of a pre-trained AI-based model, according to an embodiment of the disclosure
  • FIG. 6 illustrates an operation flow of a Deep Neural Network-based Contextual Device Selector (DCDS) module 303, according to an embodiment of the disclosure
  • FIG. 7a illustrates a flowchart for waking up a device among a plurality of devices in a multi-reality environment, according to an embodiment of the disclosure
  • FIG. 7b illustrates an operational flow for waking up a device in a multi-reality environment, according to an embodiment of the disclosure
  • FIG. 8 illustrates a use case scenario for waking up a device among a plurality of devices in a multi-reality environment, according to an embodiment of the disclosure.
  • FIG. 9 illustrates another use case scenario for waking up a device among a plurality of devices in a multi-reality environment, according to an embodiment of the disclosure.
  • the disclosure provides an apparatus implemented with a method for waking up a device among a plurality of devices in a multi-device multi-reality environment (hereinafter referred to as multi-reality environment).
  • the process involves waking up the most appropriate device from a list of candidate devices that are predicted by a pre-trained AI-based model.
  • a wake-up signal is sent to the most appropriate device for turning the most appropriate device into a listening state.
  • the AI-based model is trained based on a plurality of parameters associated with the user and the plurality of devices in the multi-device multi-reality environment.
  • each flowchart and combinations of the flowcharts may be performed by one or more computer programs which include instructions.
  • the entirety of the one or more computer programs may be stored in a single memory device or the one or more computer programs may be divided with different portions stored in different multiple memory devices.
  • the one processor or the combination of processors is circuitry performing processing and includes circuitry like an application processor (AP, e.g., a central processing unit (CPU)), a communication processor (CP, e.g., a modem), a graphics processing unit (GPU), a neural processing unit (NPU) (e.g., an artificial intelligence (AI) chip), a Wi-Fi chip, a Bluetooth® chip, a global positioning system (GPS) chip, a near field communication (NFC) chip, connectivity chips, a sensor controller, a touch controller, a finger-print sensor controller, a display drive integrated circuit (IC), an audio CODEC chip, a universal serial bus (USB) controller, a camera controller, an image processing IC, a microprocessor unit (MPU), a system on chip (SoC), an integrated circuit (IC), or the like.
  • FIG. 2 illustrates a general architecture of an apparatus for waking up a device in a multi-reality environment, according to an embodiment of the disclosure.
  • FIG. 2 describes various components of apparatus 200 for waking up a device in the multi-reality environment.
  • the apparatus 200 includes electronic devices such as a central hub, a smart monitoring system, a voice assistant system, and a head-mounted device (HMD).
  • the HMD may act as the central hub which enables seamless communication between physical world devices and virtual world devices.
  • the apparatus 200 may be an MDW server that acts as the central hub and enables seamless communication between the physical world devices and the virtual world devices.
  • the physical world devices are depicted in block 233, and the virtual world devices are depicted in block 231.
  • the physical world devices may be collectively referred to as 215 and the virtual world devices may be collectively referred to as 217.
  • the physical world devices with voice-enabled services may include smart home devices, Internet of Things (IoT) enabled devices, voice assistants, and the like.
  • the virtual world devices may include a digital representation of the physical world devices.
  • an apparatus 200 includes a processor(s) 201, memory 203, a module(s) 205, a database 207, a receiving unit 209, and a network interface (NI) 211 coupled with each other.
  • the processor 201 may be a single processing unit or a number of units, all of which could include multiple computing units.
  • the processor 201 may be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, logical processors, virtual processors, state machines, logic circuitries, and/or any devices that manipulate signals based on operational instructions.
  • the processor 201 is configured to fetch and execute computer-readable instructions and data stored in the memory 203.
  • the memory 203 may include any non-transitory computer-readable medium known in the art including, for example, volatile memory, such as static random access memory (SRAM) and dynamic random access memory (DRAM), and/or non-volatile memory, such as read-only memory (ROM), erasable programmable ROM, flash memories, hard disks, optical disks, and magnetic tapes.
  • the module(s) 205 may include a program, a subroutine, a portion of a program, a software component, or a hardware component capable of performing a stated task or function. As used herein, the module(s) 205 may be implemented on a hardware component such as a server independently of other modules, or a module can exist with other modules on the same server, or within the same program. The module(s) 205 may be implemented on a hardware component such as one or more processors, microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, logic circuitries, and/or any devices that manipulate signals based on operational instructions. The module(s) 205, when executed by the processor(s) 201, may be configured to perform any of the described functionalities of the module(s) 205. The various components of the module(s) 205 are explained with reference to FIG. 3 in later sections.
  • the database 207 may be implemented with integrated hardware and software.
  • the hardware may include a hardware disk controller with programmable search capabilities or a software system running on general-purpose hardware.
  • the examples of the database 207 are, but are not limited to, in-memory databases, cloud databases, distributed databases, embedded databases, and the like.
  • the database 207 serves as a repository for storing data processed, received, and generated by one or more of the processors, and the modules/engines/units.
  • the module(s) 205 may be implemented using one or more AI modules that may include a plurality of neural network layers.
  • neural networks include but are not limited to, Convolutional Neural Network (CNN), Deep Neural Network (DNN), Recurrent Neural Network (RNN), and Restricted Boltzmann Machine (RBM).
  • the module(s) 205 may be implemented using one or more generative AI modules that may include Variational Autoencoders (VAEs), Generative Adversarial Networks (GANs), flow-based generative models, auto-regressive models, and the like.
  • 'learning' may be referred to in the disclosure as a method for training a predetermined target device (for example, a robot) using a plurality of learning data to cause, allow, or control the target device to make a determination or prediction.
  • learning techniques include but are not limited to supervised learning, unsupervised learning, semi-supervised learning, or reinforcement learning.
  • At least one of a plurality of CNN, DNN, RNN, RBM, VAEs, GANs, flow-based generative models, auto-regressive models, and the like may be implemented to achieve execution of the present subject matter's mechanism through an AI model or generative AI models.
  • a function associated with an AI module or the generative AI models may be performed through the non-volatile memory, the volatile memory, and the processor.
  • the processor may include one or a plurality of processors.
  • One or a plurality of processors may be a general-purpose processor, such as a central processing unit (CPU), an application processor (AP), or the like, a graphics-only processing unit such as a graphics processing unit (GPU), a visual processing unit (VPU), and/or an AI-dedicated processor such as a neural processing unit (NPU).
  • processors or neural processors control the processing of the input data in accordance with a predefined operating rule or artificial intelligence (AI) model or generative AI models stored in the non-volatile memory and the volatile memory.
  • the predefined operating rule or artificial intelligence model is provided through training or learning.
  • the receiving unit 209 receives a command from the user.
  • the NI unit 211 establishes a network connection with a network like a home network, a public network, a private network, a cloud server, and the like for communication purposes.
  • FIG. 3 illustrates various components of modules of FIG. 2, according to an embodiment of the disclosure.
  • module(s) 205 of an apparatus 200 includes a Hybrid Multi-Device Aggregator module (HMDA) 301, a DNN-based Contextual Device Selector (DCDS) Module 303, a Hybrid MDE device Controller (HMDC) module 305, a Physical MDW (PMDW) module 307, and a Virtual MDW (VMDW) module 309.
  • the PMDW module 307 is in communication with all the physical world devices 215 in the physical environment 103.
  • the PMDW module 307 takes the physical world devices 215 into consideration as candidates for selecting possible target devices for waking up based on the voice input command received from the user.
  • the target device further processes the command.
  • the command may be provided by the user or the apparatus 200.
  • the PMDW module 307 obtains a first plurality of wakeup parameters for each of the physical world devices 215 based on which the candidates are considered for waking up.
  • the first plurality of wakeup parameters comprises at least one of a signal-to-noise ratio (SNR) value of each of the physical world devices 215, the user's environmental information, a current state of each of the physical world devices 215, a first device status of each of the physical world devices 215, a first device context of each of the physical world devices 215, a direction of the voice input of the user, a distance of the user from each of the physical world devices 215, the user's location information, voice profile information, user profile information, or time information related to the usage of each of the physical world devices 215.
  • the VMDW module 309 is in communication with all the virtual world devices 217 in the virtual environment 101.
  • the virtual world devices 217 might be completely in a Metaverse or in a mixed reality where the virtual world devices 217 are the digital replicas of the physical world devices 215 in the respective scene.
  • the VMDW module 309 takes the virtual world devices 217 into consideration as candidates for selecting possible target devices for waking up based on the voice input command received from the user.
  • the target device further processes the command.
  • the command may be provided by the user or the apparatus 200.
  • the VMDW module 309 obtains a second plurality of wakeup parameters for each of the virtual world devices 217 based on which the candidates are considered for waking up.
  • the second plurality of wakeup parameters comprises at least one of a second device state of each of the plurality of virtual world devices 217, a second device context of each of the plurality of virtual world devices 217, an egocentric distance between the user and each of the plurality of virtual world devices 217, user profile information, time information related to the usage of each of the plurality of virtual world devices 217, or a normalized signal-to-noise ratio (SNR) value of each of the plurality of virtual world devices 217.
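  • The two parameter sets described above can be pictured as two record types that share a device identifier. The following is a minimal sketch only: the class names, field names, and units are illustrative assumptions and cover just a few of the listed parameters, not the disclosure's actual data format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PhysicalWakeupParams:
    """A few of the first wakeup parameters for one physical world device (hypothetical fields)."""
    device_id: str
    snr_db: float                      # SNR value measured at the device
    device_state: str                  # e.g. "Active" or "Idle"
    distance_m: float                  # distance of the user from the device
    voice_direction_deg: Optional[float] = None


@dataclass
class VirtualWakeupParams:
    """A few of the second wakeup parameters for one virtual world device (hypothetical fields)."""
    device_id: str
    device_state: str
    egocentric_distance: float         # distance from the HMD in scene units
    normalized_snr: Optional[float] = None


def as_candidates(physical, virtual):
    """Tag each record with its world so both sets can share one candidate list."""
    return [("physical", p) for p in physical] + [("virtual", v) for v in virtual]
```

  • Keeping the world tag next to each record lets later stages treat physical and virtual devices uniformly while still knowing where a wake-up signal has to be routed.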
  • FIG. 4 illustrates an example of calculating an egocentric distance between a user and each of a plurality of virtual world devices 217, according to an embodiment of the disclosure.
  • the distance of each of the virtual world devices in a virtual space in a virtual world could be perceived with egocentric distance.
  • the egocentric distance is a measure of the distance of an object from the observer (i.e., the user 301). According to an example scenario, the egocentric distance would be the distance between an HMD device 401 and a virtual world device 217 in the XR environment 101. In general, the egocentric distance can be measured using multiple methods, such as depth perception of the scene and the like.
  • the egocentric distance ‘d’ is given by Equation 1 below.
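  • Equation 1 itself does not survive in this text. Purely as an illustration, and assuming the HMD device 401 and each virtual world device 217 have 3-D coordinates in a common scene frame, the straight-line (Euclidean) distance can stand in for d. The disclosure's actual equation may differ; the function names and coordinates below are hypothetical.

```python
import math


def egocentric_distance(hmd_pos, device_pos):
    """Euclidean distance between the observer (HMD) and one device.

    Assumption: both positions are (x, y, z) tuples in the same scene frame.
    This is a stand-in for Equation 1, which is not reproduced in the text.
    """
    return math.dist(hmd_pos, device_pos)


def distances_from_user(hmd_pos, devices):
    """Map each device name to its egocentric distance from the HMD."""
    return {name: egocentric_distance(hmd_pos, pos) for name, pos in devices.items()}
```

  • For example, `egocentric_distance((0, 0, 0), (3, 4, 0))` evaluates to `5.0`.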
  • the HMDA module 310 takes the inputs from the PMDW module 307 and the VMDW module 309. In particular, the HMDA module 310 receives the first wakeup parameters and the second plurality of wakeup parameters. According to an embodiment, the HMDA module 310 pre-processes the first wakeup parameters and the second plurality of wakeup parameters by performing a plurality of operations.
  • the plurality of operations may include multi-reality feature normalization such as attribute selection, normalization between physical world devices 215 and virtual world devices 217, parameter identification, and softset techniques.
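  • The text names normalization between physical world devices 215 and virtual world devices 217 without giving a formula. One common choice, used here only as an assumption, is min-max scaling of a shared feature over the devices of both worlds so that the values become directly comparable. The function names and key prefixes are hypothetical.

```python
def min_max_normalize(values):
    """Scale a {device_id: value} mapping to [0, 1]; constant inputs map to 0.0."""
    lo, hi = min(values.values()), max(values.values())
    span = hi - lo
    if span == 0:
        return {key: 0.0 for key in values}
    return {key: (value - lo) / span for key, value in values.items()}


def normalize_across_worlds(physical, virtual):
    """Normalize one feature jointly over physical and virtual devices.

    Keys are prefixed with the world name so identical device ids cannot collide.
    """
    merged = {f"physical:{key}": value for key, value in physical.items()}
    merged.update({f"virtual:{key}": value for key, value in virtual.items()})
    return min_max_normalize(merged)
```

  • Normalizing jointly rather than per world keeps a distance of 3 in the virtual scene on the same scale as a distance of 3 in the physical room, which is the point of a multi-reality normalization step.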
  • the DCDS module 303 is implemented with a pre-trained AI-based model.
  • a Deep Neural Network (DNN)-based AI model is used as the pre-trained AI-based model.
  • FIG. 5 illustrates an example training phase of the pre-trained AI-based model, according to an embodiment of the disclosure.
  • the pre-trained AI-based model is trained over various input samples 501.
  • the input samples 501 consist of different scenarios of device selection. Considerations for the physical world and the virtual world scenarios are given as input samples for training the pre-trained AI-based model.
  • An example of the input samples 501 is given in Tables 1a to 1c.
  • the input samples may include one or more of a context of the user, a history, a device state of each of the plurality of physical world devices 215 and the plurality of virtual world devices 217, the device wakeup operation information of each of the plurality of physical world devices 215 and the plurality of virtual world devices 217, the task execution information of each of the plurality of physical world devices 215 and the plurality of virtual world devices 217 and the error information in wake-up of each of the plurality of the physical world devices 215 and the plurality of virtual-world devices 217.
  • the pre-trained AI-based model includes a correlation between the one or more of the context of the user, the history, the device state of each of the plurality of physical world devices 215 and the plurality of virtual world devices 217, the device wakeup operation information of each of the plurality of physical world devices 215 and the plurality of virtual world devices 217, the task execution information of each of the plurality of physical world devices 215 and the plurality of virtual world devices 217 and the error information in wake-up of each of the plurality of the physical world devices 215 and the plurality of virtual world devices 217.
  • This correlation information is used in the inference stage for predicting a target device.
  • FIG. 6 illustrates an operation flow of a DCDS module 303, according to an embodiment of the disclosure.
  • the HMDA module 310 performs data pre-processing and data normalization of the first plurality of wakeup parameters and the second plurality of wakeup parameters before the data points are fed to the Deep Neural Network (DNN)-based AI model.
  • the DCDS module 303 at block 603, predicts a list of candidate devices, from the physical world devices 215 and the virtual-world devices 217.
  • the Deep Neural Network (DNN)-based AI model generates a recommendation for the inputs shared from the HMDA module 310.
  • the recommendations are based on the inputs in the current scenario for the virtual world devices 217 and the real world devices 215, user preference, and context.
  • Table 4 depicts an example of the inputs (i.e. first wakeup parameters and the second wakeup parameters) in the current scenario for the virtual world devices 217, the real world devices 215, user preference, and context.
  • Table 4 (flattened and truncated in extraction). Columns: FOV Devices | Real World Devices | SNR Value | Device State | Context | User ENV | User Direction | Device Distance | Wake Command | Time | Voice Profile | Device Selection. Recoverable cells, in order: {device1: Hall TV, device2: Hall Speaker, device3: Hall TableLamp}; {device1: BedTV, device2: BedLamp, device3: Mobile}; [FOV: {device1: Active, device2: Idle, device3: Idle}, RealWorld: {device1: Active, device2: Active, device3: Idle}] (repeated in three consecutive cells); {userEnv:"Bed
  • the list of candidate devices includes devices predicted from the plurality of real-world devices and the plurality of virtual-world devices. Further, at block 605, the DCDS module 303 assigns, in accordance with a determined prediction function, a score to each candidate device in the list of candidate devices. The DCDS module 303 ranks each of the candidate devices based on the assigned score. The DCDS module 303 takes candidate device inputs at block 607. For example, the candidate device input gives context information such as successful execution (turning into listening mode), device state, previous preferred history (device preferences), execution history (execution of prior utterance), and error rate (device errors). These data points are fed to the model at block 605 in predicting the most appropriate device for wakeup with a higher success rate of execution. As another example, the predictive function is a contextual deep neural network that, when given normalized inputs, generates predictions with suitable scores for selecting candidates. As an example, the assigning of the device score is given in Table 5.
  • the DCDS module 303 predicts a list of devices along with the ranks of each device using the pre-trained AI model 609.
  • the DCDS module 303 chooses top candidates by assigning scores to each of the devices according to the desired prediction function.
  • An example of the list of devices along with the ranks of each device is given in Table 6.
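  • The score-then-rank step can be sketched as follows. In the disclosure the prediction function is a contextual deep neural network; the weighted sum below is only a stand-in so the ranking logic can be shown, and the feature names and weights are illustrative assumptions.

```python
def score_candidate(features, weights):
    """Stand-in prediction function: weighted sum of normalized features.

    Assumption: the real scorer is a DNN; this linear form is for illustration.
    Features missing from a candidate are treated as 0.0.
    """
    return sum(weight * features.get(name, 0.0) for name, weight in weights.items())


def rank_candidates(candidates, weights):
    """Return [(rank, device_id, score)] with rank 1 assigned to the highest score."""
    scored = [(device_id, score_candidate(features, weights))
              for device_id, features in candidates.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return [(index + 1, device_id, score)
            for index, (device_id, score) in enumerate(scored)]


# Hypothetical weights: reward past success and preference, penalize errors.
EXAMPLE_WEIGHTS = {"success_rate": 0.5, "preference": 0.3, "error_rate": -0.2}
```

  • The ranked list, not just the top device, is what gets handed on, because the lower-ranked entries serve as fallbacks if the first wake-up attempt fails.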
  • the information about the ranks of each of the candidate devices is passed on to the HMDC module 305.
  • the HMDC module 305 receives the information about the ranks of each of the candidate devices from the DCDS module 303.
  • the DCDS module 303 determines, as the target device for waking up, the candidate device having the highest rank in the list of candidate devices.
  • the HMDC module 305 sends a wake-up signal to the target device having the highest rank. Further, the HMDC module 305 waits for a success signal in response to the wake-up signal. If the target device does not return a success signal within a determined time frame, or returns an error signal, the next-highest-ranked device is selected as the target device and is sent the wake-up signal.
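  • The retry behaviour described above amounts to walking the ranked list until one device acknowledges. The sketch below assumes a hypothetical `send_wake` callable supplied by the caller; its signature, the default timeout value, and the use of `TimeoutError` to model a missing success signal are assumptions made for illustration.

```python
def wake_with_fallback(ranked_devices, send_wake, timeout_s=2.0):
    """Try devices in rank order; return the first that acknowledges, else None.

    send_wake(device_id, timeout_s) is a hypothetical callable that returns
    True on a success signal, returns False on an error signal, and raises
    TimeoutError when no signal arrives within the time frame.
    """
    for device_id in ranked_devices:
        try:
            if send_wake(device_id, timeout_s):
                return device_id
        except TimeoutError:
            # No success signal in time: fall through to the next-ranked device.
            pass
    return None
```

  • Returning `None` when every candidate fails leaves the decision about what to do next (for example, prompting the user) to the caller.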
  • the HMDC module 305 records every successful or failed transaction, and this record is fed back into the training of the DCDS module 303. The DCDS module 303 then uses it when selecting and ranking target devices in later predictions.
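  • One simple way to picture this feedback loop is a transaction log from which a per-device error rate can be derived and used as a model input. The class and method names are hypothetical, and how the disclosure actually stores or consumes these transactions is not specified in the text.

```python
from dataclasses import dataclass, field


@dataclass
class WakeTransactionLog:
    """Append-only record of wake-up outcomes (illustrative structure only)."""
    records: list = field(default_factory=list)

    def record(self, device_id, success):
        """Store one wake-up attempt and whether it returned a success signal."""
        self.records.append({"device_id": device_id, "success": bool(success)})

    def error_rate(self, device_id):
        """Fraction of failed attempts for a device; 0.0 if it has no history."""
        outcomes = [r["success"] for r in self.records if r["device_id"] == device_id]
        if not outcomes:
            return 0.0
        return outcomes.count(False) / len(outcomes)
```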
  • FIG. 7a illustrates a flowchart for waking up a device among a plurality of devices in a multi-reality environment, according to an embodiment of the disclosure.
  • the method 700 is implemented by the apparatus 200 of FIG. 2. Further, the method 700 is implemented through operations 701 to 709 performed by various components of the module(s) 205. According to some embodiments, the functions of the module(s) 205 may alternatively be performed by the processor 201. However, for ease of understanding, the operations 701 to 709 will be explained by referring to the various modules 205. Further, a detailed explanation of each of the modules is given in the paragraphs above and, for the sake of brevity, is not repeated here.
  • FIG. 7b illustrates an operational flow for waking up a device in a multi-reality environment, according to an embodiment of the disclosure.
  • the method 700B will be explained collectively with method 700 of FIG. 7a for ease of understanding.
  • the user 301 is in the multi-device multi-reality environment having the physical world devices 215 (e.g., device x1, device x2, device x3, and device x4) and the virtual world devices 217 (e.g., device y1, device y2, device y3, and device y4) in the physical environment 103 and the XR environment 101, respectively.
  • the user sends a wakeup signal for example by sending the wakeup command "Hi xxx" as a voice input.
  • the virtual world devices 217 and the physical world devices 215 receive the wakeup command at block 723.
  • the PMDW module 307, and the VMDW module 309 detect the voice input received from the user 301. Based on the reception of the wake-up signal x, the PMDW module 307, and the VMDW module 309, at operation 703, receive the first plurality of wakeup parameters associated with the plurality of physical world devices 215 in the physical environment 103 and the second plurality of wakeup parameters associated with the plurality of virtual world devices 217 in the virtual world (e.g. XR world environment 101).
  • the device x1, the device x2, the device x3, and the device x4 receive the wake-up command and calculate respective values for SNR, direction, processed distance, time, voice analyser values, etc., as the first wake-up parameters.
  • the first wake-up parameters are passed to DB 207 and further given in standard format to HMDA module 310 for processing.
  • the device y1, the device y2, the device y3, and the device y4 receive the wake-up command and calculate respective values such as egocentric distance, non-speech inputs including device parameters, user direction inputs, etc., as the second wake-up parameters.
  • the second wake-up parameters are passed to DB 207 and further given in the standard format to HMDA module 310 for processing.
  • the operations performed at operation 703 correspond to block 725.
  • the first plurality of wakeup parameters and the second plurality of wakeup parameters are fed to the HMDA module 310 for pre-processing and normalization.
  • the normalized values of the first plurality of wakeup parameters and the second plurality of wakeup parameters are fed, by the HMDA module 310, to the pre-trained artificial intelligence (AI)-based model of the DCDS module 303. Further, the HMDA module 310 also triggers a determination request to the DCDS module 303 for predicting the target device.
  • the operation 705 corresponds to the operation at block 727. Accordingly, the DCDS module 303 at block 729 receives the first plurality of wakeup parameters, the second plurality of wakeup parameters, and the determination request from the HMDA module 310.
  • the pre-trained artificial intelligence (AI)-based model includes the correlation of one or more of the context of the user, the history, the device state of each of the plurality of real-world devices and the plurality of virtual-world devices, the device wakeup operation information of each of the plurality of real-world devices and the plurality of virtual-world devices, the task execution information of each of the plurality of physical world devices 215 and the plurality of virtual world devices 217 and the error information in wake-up of each of the plurality of the physical world devices 215 and the plurality of virtual-world devices 217.
  • the training of the pre-trained artificial intelligence (AI)-based model is explained with reference to FIG. 5 above.
  • the DCDS module 303 predicts a target device based on the first plurality of wakeup parameters, and the second plurality of wakeup parameters.
  • the prediction of the target device is explained above with respect to the explanation of the DCDS module 303.
  • device y4, having the highest rank among the virtual world devices 217 of the XR environment 101, is selected.
  • the predicted target device may be referred to as a winner device.
  • the HMDC module 305 sends the wake-up signal to the target device to turn the target device into a listening state.
  • the operation 709 corresponds to the operation in the block 731.
  • in response to the sent wake-up signal, at block 733, the target device sends back a success signal if it is available.
  • the device y4 receives the wake-up signal. If the device y4 is available, then the device y4 sends a success signal status in response to the wake-up signal and goes into a listening stage. On the other hand, if the device y4 is unavailable then an error or failure response will be sent by the device y4. In such a scenario, the device which has the next highest rank in the list of target devices will be sent with the wake-up signal. Accordingly, an appropriate device will be activated when the user is in the multi-device multi reality environment.
  • FIG. 8 illustrates a use case scenario for waking up a device among a plurality of devices in a multi-reality environment, according to an embodiment of the disclosure.
  • FIG. 8 illustrates the multi-reality environment where a user 301 is communicating with physical world devices 215 and virtual world devices 217.
  • user 301 is in a kitchen room with multiple voice-enabled/controlled devices around in the physical world environment 103. Further, the user 301 also is in a living room in the virtual scene with multiple virtual voice-enabled/controlled devices (i.e., 801v, 803v, 805v, 301v and 200v) around in a virtual world environment 800.
  • the user 301 wants to give a command and hence wants to wake up one device.
  • the kitchen room which is the physical reality of the user includes multiple smart devices such as a family hub, smart chimney, hob, tablet, smart bulb, etc.
  • the living room in the virtual world 800 includes multiple voice-enabled devices such as a TV, a phone, a tablet, speakers, a smart bulb, etc.
  • the user's 301 wake-up command is passed to all the devices for inputs.
  • the inputs i.e. the first wake-up parameters and the second wake-up parameters are processed by HMDA module 310.
  • the processed and normalized inputs are then fed to the DCDS module 303 for prediction.
  • the DCDS module 303 further predicts the target device having the highest rank.
  • the HMDC module 305 sends the wake-up signal to the target device to turn it into listening mode.
  • the DCDS module 303, based on multiple factors such as the signal-to-noise ratio at each device and non-speech inputs including normalized device parameters from both the physical and virtual reality environments (such as user direction, user distance from the device, time factor, voice analyser inputs, egocentric distance, etc.), predicts the TV display in the virtual world as the highest ranked.
  • the wake-up signal is dispatched to the TV display by the HMDC module 305.
  • the HMDA module 310, the DCDS module 303, and the HMDC module 305 are collectively referred to as a contextual device selection engine.
  • FIG. 9 illustrates another case scenario for waking up a device among a plurality of devices in a multi-reality environment, according to an embodiment of the disclosure.
  • FIG. 9 illustrates the multi-reality environment where a user 301 is communicating with physical world devices 215 and virtual world devices 217.
  • a scenario considers that the user 301 is in a living room with multiple voice-enabled/controlled devices around in the physical environment 103.
  • the user 301 is in the virtual scene bedroom with multiple virtual voice-enabled/controlled devices around in the virtual environment 800.
  • the user is not static and is moving around in the virtual space.
  • the user 301 wants to give the command and hence wants to wake up one device.
  • the living room which is the physical reality of the user 301 includes multiple voice-enabled devices such as TV, Phone, tablet, speaker, smart bulb, etc.
  • the bedroom which is the virtual scene of the user 301 includes multiple smart devices such as smart TV, smart Bulb, etc.
  • the user's 301 wake-up command is passed to all the devices for inputs.
  • the inputs i.e. the first wake-up parameters and the second wake-up parameters are processed by HMDA module 310.
  • the processed and normalized inputs are then fed to the DCDS module 303 for prediction.
  • the DCDS module 303 further predicts the target device having the highest rank.
  • the HMDC module 305 sends the wake-up signal to the target device to turn it into listening mode.
  • TV 901 of the virtual world is selected as the target device.
  • the DCDS module 303, based on multiple factors such as the signal-to-noise ratio at each device and non-speech inputs including normalized device parameters from both the physical and virtual reality environments (such as user direction, user distance from the device, time factor, voice analyser inputs, egocentric distance, etc.), predicts the TV display 901 in the virtual environment 800 as the highest ranked.
  • the wake-up signal is dispatched to the TV display 901 by the HMDC module 305.
  • the HMDA module 310, the DCDS module 303, and the HMDC module 305 are collectively referred to as a contextual device selection engine.
  • Non-transitory computer-readable storage media store one or more computer programs (software modules). The one or more computer programs include computer-executable instructions that, when executed by one or more processors of an electronic device, cause the electronic device to perform a method of the disclosure.
  • Any such software may be stored in the form of volatile or non-volatile storage such as, for example, a storage device like read only memory (ROM), whether erasable or rewritable or not, or in the form of memory such as, for example, random access memory (RAM), memory chips, device or integrated circuits or on an optically or magnetically readable medium such as, for example, a compact disk (CD), digital versatile disc (DVD), magnetic disk or magnetic tape or the like.
  • the storage devices and storage media are various embodiments of non-transitory machine-readable storage that are suitable for storing a computer program or computer programs comprising instructions that, when executed, implement various embodiments of the disclosure. Accordingly, various embodiments provide a program comprising code for implementing apparatus or a method as claimed in any one of the claims of this specification and a non-transitory machine-readable storage storing such a program.
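
The operational flow of FIG. 7b described above (collecting wake-up parameters, normalizing them, ranking the candidates, and dispatching the wake-up signal with fallback to the next highest rank) can be sketched as follows. This is only a minimal illustration under stated assumptions: the names `Candidate`, `normalize`, `rank`, and `wake_target`, the min-max scaling, the treatment of unreported parameters as 0.0, and the pluggable `score` function standing in for the pre-trained AI-based model are all hypothetical and are not taken from the disclosure.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Candidate:
    device_id: str             # e.g. "x1" (physical) or "y4" (virtual)
    world: str                 # "physical" or "virtual"
    params: Dict[str, float]   # raw wake-up parameters reported by the device


def normalize(candidates: List[Candidate]) -> Dict[str, Dict[str, float]]:
    """Min-max scale every parameter to [0, 1] across the devices reporting it.

    Assumption for this sketch: a parameter a device does not report is 0.0.
    """
    keys = sorted({k for c in candidates for k in c.params})
    out: Dict[str, Dict[str, float]] = {c.device_id: {} for c in candidates}
    for key in keys:
        reported = [c.params[key] for c in candidates if key in c.params]
        lo, hi = min(reported), max(reported)
        span = hi - lo
        for c in candidates:
            if key not in c.params or span == 0:
                out[c.device_id][key] = 0.0
            else:
                out[c.device_id][key] = (c.params[key] - lo) / span
    return out


def rank(normalized: Dict[str, Dict[str, float]],
         score: Callable[[Dict[str, float]], float]) -> List[str]:
    """Order device ids from highest to lowest score.

    `score` is a placeholder for the prediction of the pre-trained model.
    """
    return sorted(normalized, key=lambda d: score(normalized[d]), reverse=True)


def wake_target(ranked: List[str],
                send_wake_up: Callable[[str], bool]) -> Optional[str]:
    """Send the wake-up signal in rank order until a device reports success."""
    for device_id in ranked:
        if send_wake_up(device_id):   # True models the success signal
            return device_id          # this device enters the listening state
    return None                       # no candidate was available
```

In this sketch, `wake_target` returns the identifier of the first device that acknowledges the wake-up signal, so an unavailable highest-ranked device is skipped in favour of the next one in the list, and `None` is returned when every candidate reports a failure.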

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Acoustics & Sound (AREA)
  • Biophysics (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

A method for waking up a device among a plurality of devices in a multi-reality environment is provided. The method includes detecting a voice input from a user, receiving, based on the voice input, a first plurality of wakeup parameters associated with a plurality of real-world devices in the real world and a second plurality of wakeup parameters associated with a plurality of virtual-world devices in a virtual world, feeding the first plurality of wakeup parameters and the second plurality of wakeup parameters into a pre-trained artificial intelligence (AI)-based model including a correlation of one or more of a context of the user, a history, a device state of each of the plurality of real-world devices and the plurality of virtual-world devices, a device wakeup operation information of each of the plurality of real-world devices and the plurality of virtual-world devices, a task execution information of each of the plurality of real-world devices and the plurality of virtual-world devices, and error information in wake-up of each of the plurality of real-world devices and the plurality of virtual-world devices, predicting, by the pre-trained AI-based model, a target device based on the first plurality of wakeup parameters and the second plurality of wakeup parameters, and sending a wake-up signal to the target device for turning the target device into a listening state.

Description

METHOD AND SYSTEM FOR CONTEXTUAL DEVICE WAKE-UP IN MULTI-DEVICE MULTI-REALITY ENVIRONMENTS
The disclosure relates to multi-device-multi reality environments. More particularly, the disclosure relates to contextual device wake-up in the multi-device-multi reality environments.
Recently, as virtual reality (VR), augmented reality (AR), and extended reality (XR) technologies have advanced and become more accessible, Multi-Device Multi-Reality environments have begun to gain popularity. The Multi-Device Multi-Reality environment represents a dynamic landscape where users engage with a plurality of devices across both physical and virtual realms. In the Multi-Device Multi-Reality environment, the user can seamlessly transition between tangible devices in the physical world, such as smartphones, tablets, and smart home appliances, and VR smart devices in the virtual world.
In a multi-device environment spanning various realities, there are currently no established methods for initiating device wakeups.
FIG. 1 illustrates an example of a Multi-Device Multi-Reality environment implemented according to the related art. FIG. 1 depicts the Multi-Device Multi-Reality environment 100 where the user is communicating with multiple devices present in the XR environment 101 and in the physical environment 103.
Referring to FIG. 1, when the user utters a wake command, for example "Hi xxx", to wake up the voice assistant, the device in the physical room may wake up based on static parameters. In the example scenario, when the user utters the wake command, for example "Hi xxx", a television (TV) 105 of the physical environment 103 gets into a listening stage. According to another example scenario, consider that the user utters a command, for example "What's the time", to the voice assistant, but the devices in both worlds become active. For example, both the TV 105 of the physical world and the virtual TV 107 answer the user command. Accordingly, the devices existing only within the virtual environment do not actively participate in the process of differentiating or distinguishing real-world stimuli in order to wake up other devices.
In the current scenario, as explained, multiple devices wake up and enter the listening mode. When the user then delivers the voice command, responses can be delivered from multiple devices, which leads to a bad user experience.
The above information is presented as background information only to assist with an understanding of the disclosure. No determination has been made, and no assertion is made, as to whether any of the above might be applicable as prior art with regard to the disclosure.
Aspects of the disclosure are to address at least the above-mentioned problems and/or disadvantages and to provide at least the advantages described below. Accordingly, an aspect of the disclosure is to provide contextual device wake-up in the multi-device-multi reality environments.
Additional aspects will be set forth in part in the description which follows and, in part, will be apparent from the description, or may be learned by practice of the presented embodiments.
In accordance with an aspect of the disclosure, a method for waking up a device among a plurality of devices in a multi-reality environment is provided. The method includes detecting a voice input from a user, receiving, based on the voice input, a first plurality of wakeup parameters associated with a plurality of real-world devices in the real world and a second plurality of wakeup parameters associated with a plurality of virtual-world devices in a virtual world, feeding the first plurality of wakeup parameters and the second plurality of wakeup parameters into a pre-trained artificial intelligence (AI)-based model including a correlation of one or more of a context of the user, a history, a device state of each of the plurality of real-world devices and the plurality of virtual-world devices, a device wakeup operation information of each of the plurality of real-world devices and the plurality of virtual-world devices, a task execution information of each of the plurality of real-world devices and the plurality of virtual-world devices and error information in wake-up of each of the plurality of real-world devices and the plurality of virtual-world devices, predicting, by the pre-trained AI-based model, a target device based on the first plurality of wakeup parameters, the second plurality of wakeup parameters, and sending a wake-up signal to the target device for turning the target device into a listening state.
In accordance with another aspect of the disclosure, an apparatus for waking up a device among a plurality of devices in a multi-reality environment is provided. The apparatus includes memory storing one or more computer programs, and one or more processors communicatively coupled to the memory, wherein the one or more computer programs include computer-executable instructions that, when executed by the one or more processors, cause the apparatus to detect a voice input from a user, receive, based on the voice input, a first plurality of wakeup parameters associated with a plurality of real-world devices in the real world and a second plurality of wakeup parameters associated with a plurality of virtual-world devices in a virtual world, feed the first plurality of wakeup parameters and the second plurality of wakeup parameters into a pre-trained artificial intelligence (AI)-based model including a correlation of one or more of a context of the user, a history, a device state of each of the plurality of real-world devices and the plurality of virtual-world devices, a device wakeup operation information of each of the plurality of real-world devices and the plurality of virtual-world devices, a task execution information of each of the plurality of real-world devices and the plurality of virtual-world devices and error information in wake-up of each of the plurality of real-world devices and the plurality of virtual-world devices, predict, by the pre-trained AI-based model, a target device based on the first plurality of wakeup parameters, the second plurality of wakeup parameters, and send a wake-up signal to the target device for turning the target device into a listening state.
In accordance with another aspect of the disclosure, one or more non-transitory computer-readable storage media storing one or more computer programs including computer-executable instructions that, when executed by one or more processors of an apparatus, cause the apparatus to perform operations are provided. The operations include detecting a voice input from a user, receiving, based on the voice input, a first plurality of wakeup parameters associated with a plurality of real-world devices in the real-world and a second plurality of wakeup parameters associated with a plurality of virtual-world devices in a virtual world, feeding the first plurality of wakeup parameters and the second plurality of wakeup parameters into a pre-trained artificial intelligence (AI)-based model including a correlation of one or more of a context of the user, a history, a device state of each of the plurality of real-world devices and the plurality of virtual-world devices, a device wakeup operation information of each of the plurality of real-world devices and the plurality of virtual-world devices, a task execution information of each of the plurality of real-world devices and the plurality of virtual-world devices, and error information in wake-up of each of the plurality of real-world devices and the plurality of virtual-world devices, predicting, by the pre-trained AI-based model, a target device based on the first plurality of wakeup parameters, the second plurality of wakeup parameters, and sending a wake-up signal to the target device for turning the target device into a listening state.
Other aspects, advantages, and salient features of the disclosure will become apparent to those skilled in the art from the following detailed description, which, taken in conjunction with the annexed drawings, discloses various embodiments of the disclosure.
The above and other aspects, features, and advantages of certain embodiments of the disclosure will be more apparent from the following description taken in conjunction with the accompanying drawings, in which:
FIG. 1 illustrates an example of a Multi-Device Multi-Reality environment implemented according to the related art;
FIG. 2 illustrates a general architecture of an apparatus for waking up a device in a multi-reality environment, according to an embodiment of the disclosure;
FIG. 3 illustrates various components of modules of FIG. 2, according to an embodiment of the disclosure;
FIG. 4 illustrates an example of calculating an egocentric distance between a user and each of a plurality of virtual world devices, according to an embodiment of the disclosure;
FIG. 5 illustrates an example training phase of a pre-trained AI-based model, according to an embodiment of the disclosure;
FIG. 6 illustrates an operation flow of a Deep Neural Network-based Contextual Device Selector (DCDS) module 303, according to an embodiment of the disclosure;
FIG. 7a illustrates a flowchart for waking up a device among a plurality of devices in a multi-reality environment, according to an embodiment of the disclosure;
FIG. 7b illustrates an operational flow for waking up a device in a multi-reality environment, according to an embodiment of the disclosure;
FIG. 8 illustrates a use case scenario for waking up a device among a plurality of devices in a multi-reality environment, according to an embodiment of the disclosure; and
FIG. 9 illustrates another case scenario for waking up a device among a plurality of devices in a multi-reality environment, according to an embodiment of the disclosure.
Throughout the drawings, like reference numerals will be understood to refer to like parts, components, and structures.
The following description with reference to the accompanying drawings is provided to assist in a comprehensive understanding of various embodiments of the disclosure as defined by the claims and their equivalents. It includes various specific details to assist in that understanding but these are to be regarded as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the various embodiments described herein can be made without departing from the scope and spirit of the disclosure. In addition, descriptions of well-known functions and constructions may be omitted for clarity and conciseness.
The terms and words used in the following description and claims are not limited to the bibliographical meanings, but, are merely used by the inventor to enable a clear and consistent understanding of the disclosure. Accordingly, it should be apparent to those skilled in the art that the following description of various embodiments of the disclosure is provided for illustration purpose only and not for the purpose of limiting the disclosure as defined by the appended claims and their equivalents.
It is to be understood that the singular forms "a," "an," and "the" include plural referents unless the context clearly dictates otherwise. Thus, for example, reference to "a component surface" includes reference to one or more of such surfaces.
The term "some" as used herein is defined as "none, or one, or more than one, or all." Accordingly, the terms "none," "one," "more than one," "more than one, but not all" or "all" would all fall under the definition of "some." The term "some embodiments" may refer to no embodiments, to one embodiment or to several embodiments or to all embodiments. The term "some embodiments" is defined as meaning "no embodiment, or one embodiment, or more than one embodiment, or all embodiments."
The terminology and structure employed herein is for describing, teaching, and illuminating some embodiments and their specific features and elements and does not limit, restrict, or reduce the spirit and scope of the claims or their equivalents.
Any terms used herein such as but not limited to "includes," "comprises," "has," "consists," and grammatical variants thereof do NOT specify an exact limitation or restriction and certainly do NOT exclude the possible addition of one or more features or elements, unless otherwise stated, and furthermore must NOT be taken to exclude the possible removal of one or more of the listed features and elements, unless otherwise stated with the limiting language "MUST comprise" or "NEEDS TO include."
Whether or not a certain feature or element was limited to being used only once, either way, it may still be referred to as "one or more features" or "one or more elements" or "at least one feature" or "at least one element." The use of the terms "one or more" or "at least one" feature or element does NOT preclude there being none of that feature or element, unless otherwise specified by limiting language such as "there NEEDS to be one or more . . . " or "one or more element is REQUIRED."
Unless otherwise defined, all terms, and especially any technical and/or scientific terms, used herein may be taken to have the same meaning as commonly understood by one having ordinary skill in the art. The reference numerals are kept same all over for the similar components, entities, and environments throughout the disclosure for ease of understanding.
Various embodiments of the disclosure will be described below in detail with reference to the accompanying drawings.
According to one embodiment, the disclosure provides an apparatus implemented with a method for waking up a device among a plurality of devices in a multi-device multi-reality environment (hereinafter referred to as multi-reality environment). According to another embodiment, in a multi-device multi-reality environment, the process involves waking up the most appropriate device from a list of candidate devices that are predicted by a pre-trained AI-based model. According to an embodiment, a wake-up signal is sent to the most appropriate device for turning the most appropriate device into a listening state. The AI-based model is trained based on a plurality of parameters associated with the user and the plurality of devices in the multi-device multi-reality environment.
A detailed methodology is explained in the following paragraphs of the disclosure.
It should be appreciated that the blocks in each flowchart and combinations of the flowcharts may be performed by one or more computer programs which include instructions. The entirety of the one or more computer programs may be stored in a single memory device or the one or more computer programs may be divided with different portions stored in different multiple memory devices.
Any of the functions or operations described herein can be processed by one processor or a combination of processors. The one processor or the combination of processors is circuitry performing processing and includes circuitry like an application processor (AP, e.g. a central processing unit (CPU)), a communication processor (CP, e.g., a modem), a graphics processing unit (GPU), a neural processing unit (NPU) (e.g., an artificial intelligence (AI) chip), a Wi-Fi chip, a Bluetooth® chip, a global positioning system (GPS) chip, a near field communication (NFC) chip, connectivity chips, a sensor controller, a touch controller, a finger-print sensor controller, a display drive integrated circuit (IC), an audio CODEC chip, a universal serial bus (USB) controller, a camera controller, an image processing IC, a microprocessor unit (MPU), a system on chip (SoC), an integrated circuit (IC), or the like.
FIG. 2 illustrates a general architecture of an apparatus for waking up a device in a multi-reality environment, according to an embodiment of the disclosure.
FIG. 2 describes various components of apparatus 200 for waking up a device in the multi-reality environment. In a non-limiting example, the apparatus 200 includes electronic devices such as a central hub, a smart monitoring system, a voice assistant system, and a head-mounted device (HMD). According to various embodiments, the HMD may act as the central hub which enables seamless communication between physical world devices and virtual world devices. According to some embodiments, the apparatus 200 may be an MDW server that acts as the central hub and enables seamless communication between the physical world devices and the virtual world devices. In an embodiment, the physical world devices are depicted in block 233, and the virtual world devices are depicted in block 231. The physical world devices may be collectively referred to as 215 and the virtual world devices may be collectively referred to as 217. In a non-limiting example, the physical world devices are devices with voice-enabled services and may include smart home devices, Internet of Things (IoT) enabled devices, voice assistants, and the like. Further, in yet another non-limiting example, the virtual world devices may include a digital representation of the physical world devices.
According to an embodiment, an apparatus 200 includes a processor(s) 201, memory 203, a module(s) 205, a database 207, a receiving unit 209, and a network interface (NI) 211 coupled with each other.
For example, the processor 201 may be a single processing unit or a number of units, all of which could include multiple computing units. The processor 201 may be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, logical processors, virtual processors, state machines, logic circuitries, and/or any devices that manipulate signals based on operational instructions. Among other capabilities, the processor 201 is configured to fetch and execute computer-readable instructions and data stored in the memory 203.
The memory 203 may include any non-transitory computer-readable medium known in the art including, for example, volatile memory, such as static random access memory (SRAM) and dynamic random access memory (DRAM), and/or non-volatile memory, such as read-only memory (ROM), erasable programmable ROM, flash memories, hard disks, optical disks, and magnetic tapes.
The module(s) 205 may include a program, a subroutine, a portion of a program, a software component, or a hardware component capable of performing a stated task or function. As used herein, the module(s) 205 may be implemented on a hardware component such as a server independently of other modules, or a module can exist with other modules on the same server, or within the same program. The module(s) 205 may be implemented on a hardware component such as a processor, one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, logic circuitries, and/or any devices that manipulate signals based on operational instructions. The module(s) 205, when executed by the processor(s) 201, may be configured to perform any of the described functionalities of the module(s) 205. The various components of module(s) 205 will be explained with reference to FIG. 3 in the later sections.
As another example, the database 207 may be implemented with integrated hardware and software. The hardware may include a hardware disk controller with programmable search capabilities or a software system running on general-purpose hardware. The examples of the database 207 are, but are not limited to, in-memory databases, cloud databases, distributed databases, embedded databases, and the like. The database 207, amongst other things, serves as a repository for storing data processed, received, and generated by one or more of the processors, and the modules/engines/units.
The module(s) 205 may be implemented using one or more AI modules that may include a plurality of neural network layers. Examples of neural networks include, but are not limited to, Convolutional Neural Network (CNN), Deep Neural Network (DNN), Recurrent Neural Network (RNN), and Restricted Boltzmann Machine (RBM). According to other embodiments, the module(s) 205 may be implemented using one or more generative AI modules that may include Variational Autoencoders (VAEs), Generative Adversarial Networks (GANs), flow-based generative models, auto-regressive models, and the like. Further, 'learning' may be referred to in the disclosure as a method for training a predetermined target device (for example, a robot) using a plurality of learning data to cause, allow, or control the target device to make a determination or prediction. Examples of learning techniques include, but are not limited to, supervised learning, unsupervised learning, semi-supervised learning, or reinforcement learning. At least one of a plurality of CNN, DNN, RNN, RBM, VAEs, GANs, flow-based generative models, auto-regressive models, and the like may be implemented to thereby achieve execution of the present subject matter's mechanism through an AI model or generative AI models. A function associated with an AI module or the generative AI models may be performed through the non-volatile memory, the volatile memory, and the processor. The processor may include one or a plurality of processors. One or a plurality of processors may be a general-purpose processor, such as a central processing unit (CPU), an application processor (AP), or the like, a graphics-only processing unit such as a graphics processing unit (GPU), a visual processing unit (VPU), and/or an AI-dedicated processor such as a neural processing unit (NPU).
One or a plurality of processors or neural processors control the processing of the input data in accordance with a predefined operating rule or artificial intelligence (AI) model or generative AI models stored in the non-volatile memory and the volatile memory. The predefined operating rule or artificial intelligence model is provided through training or learning.
For example, the receiving unit 209 receives a command from the user. As a further example, the NI unit 211 establishes a network connection with a network like a home network, a public network, a private network, a cloud server, and the like for communication purposes.
FIG. 3 illustrates various components of modules of FIG. 2, according to an embodiment of the disclosure.
Referring to FIG. 3, the module(s) 205 of an apparatus 200 include a Hybrid Multi-Device Aggregator module (HMDA) 301, a DNN-based Contextual Device Selector (DCDS) module 303, a Hybrid MDE device Controller (HMDC) module 305, a Physical MDW (PMDW) module 307, and a Virtual MDW (VMDW) module 309. The forthcoming paragraphs briefly describe the working of each of the module(s) 205. The detailed working of each of the components of the modules of FIG. 2 will be explained in the forthcoming paragraphs through FIGS. 2 to 6. Further, the reference numerals are kept the same for similar components for ease of understanding.
In an embodiment, the PMDW module 307 is in communication with all the physical world devices 215 in the physical environment 103. The PMDW module 307 takes the physical world devices 215 into consideration as candidates for selecting possible target devices for waking up based on the voice input command received from the user. The target device further processes the command. According to another embodiment, the command may be provided by the user or the apparatus 200. According to yet another embodiment, the PMDW module 307 obtains a first plurality of wakeup parameters for each of the physical world devices 215, based on which the candidates are considered for waking up. In a non-limiting example, the first plurality of wakeup parameters comprises at least one of a signal-to-noise ratio (SNR) value of each of the physical world devices 215, user's environmental information, a current state of each of the physical world devices 215, a first device status of each of the physical world devices 215, a first device context of each of the physical world devices 215, a direction of the voice input of the user, a distance of the user from each of the physical world devices 215, user's location information, voice profile information, user profile information, or time information related to the usage of each of the physical world devices 215. The first plurality of wakeup parameters is given to the HMDA module 310 in a standard format for further processing.
According to one embodiment, the VMDW module 309 is in communication with all the virtual world devices 217 in the virtual environment 101. The virtual world devices 217 might be completely in a Metaverse or in a mixed reality where the virtual world devices 217 are digital replicas of the physical world devices 215 in the respective scene. The VMDW module 309 takes the virtual world devices 217 into consideration as candidates for selecting possible target devices for waking up based on the voice input command received from the user. The target device further processes the command. The command may be provided by the user or the apparatus 200. According to another embodiment, the VMDW module 309 obtains a second plurality of wakeup parameters for each of the virtual world devices 217, based on which the candidates are considered for waking up. In a non-limiting example, the second plurality of wakeup parameters comprises at least one of a second device state of each of the plurality of virtual world devices 217, a second device context of each of the plurality of virtual world devices 217, an egocentric distance between the user and each of the plurality of virtual world devices 217, user profile information, time information related to the usage of each of the plurality of virtual world devices 217, or a normalized signal-to-noise ratio (SNR) value of each of the plurality of virtual world devices 217. The second plurality of wakeup parameters is given to the HMDA module 310 in a standard format for further processing.
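As a rough illustration of how the two parameter sets might be carried in a common "standard format", the sketch below models each set as a Python dataclass and flattens it into a plain dictionary. The class names, field names, units, and the `to_standard_format` helper are assumptions made for this example only; the disclosure does not specify a schema.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class PhysicalWakeupParams:
    """Hypothetical record for one physical world device (first plurality)."""
    device_id: str
    snr_db: float                 # signal-to-noise ratio measured at the device
    device_state: str             # e.g. "Active" or "Idle"
    user_distance_m: float        # distance of the user from the device
    voice_direction_deg: Optional[float] = None  # direction of the voice input


@dataclass
class VirtualWakeupParams:
    """Hypothetical record for one virtual world device (second plurality)."""
    device_id: str
    normalized_snr: float         # normalized SNR value, assumed to lie in [0, 1]
    device_state: str
    egocentric_distance_m: float  # distance between the user and the virtual device


def to_standard_format(params) -> dict:
    """Flatten a record into a plain dict tagged with its originating world."""
    world = "physical" if isinstance(params, PhysicalWakeupParams) else "virtual"
    return {"world": world, **asdict(params)}


x1 = PhysicalWakeupParams("x1", snr_db=18.0, device_state="Active", user_distance_m=3.0)
y4 = VirtualWakeupParams("y4", normalized_snr=0.8, device_state="Idle", egocentric_distance_m=2.0)
records = [to_standard_format(x1), to_standard_format(y4)]
```

Tagging each record with its world keeps the two parameter sets distinguishable after they are merged into one list for aggregation.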
FIG. 4 illustrates an example of calculating an egocentric distance between a user and each of a plurality of virtual world devices 217, according to an embodiment of the disclosure.
The distance of each of the virtual world devices in a virtual space in a virtual world (here, an XR environment 101) could be perceived as an egocentric distance. The egocentric distance is a measure of the distance of an object from the observer (i.e., the user 301). According to an example scenario, the egocentric distance would be the distance between an HMD device 401 and one of the virtual world devices 217 in the XR environment 101. In general, the egocentric distance can be measured using multiple methods, such as depth perception of the scene and the like. The egocentric distance 'd' is given by Equation 1 below.
Figure PCTKR2024007274-appb-img-000001
(Where EH is eye height and AoD is the angle of distance)
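Equation 1 is available only as an image in this text, so its exact form cannot be confirmed here. One formulation commonly used for ground-plane distance estimation computes the distance from eye height and the angular declination below eye level as d = EH / tan(AoD). The sketch below assumes that form, and assumes that "AoD" denotes an angle of declination measured in degrees; the function name and units are likewise illustrative.

```python
import math


def egocentric_distance(eye_height_m: float, aod_deg: float) -> float:
    """Distance along the ground to an object seen aod_deg below eye level.

    Assumes d = EH / tan(AoD); this form is an assumption, not Equation 1 itself.
    """
    if not 0.0 < aod_deg < 90.0:
        raise ValueError("aod_deg must lie strictly between 0 and 90 degrees")
    return eye_height_m / math.tan(math.radians(aod_deg))


# With an eye height of 1.6 m and a 45 degree declination, the distance
# equals the eye height (up to floating-point rounding).
d = egocentric_distance(1.6, 45.0)
```

Under this assumed form, a smaller declination angle corresponds to a device that is farther from the user.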
The HMDA module 310 takes the inputs from the PMDW module 307 and the VMDW module 309. In particular, the HMDA module 310 receives the first plurality of wakeup parameters and the second plurality of wakeup parameters. According to an embodiment, the HMDA module 310 pre-processes the first plurality of wakeup parameters and the second plurality of wakeup parameters by performing a plurality of operations. The plurality of operations may include multi-reality feature normalization such as attribute selection, normalization between physical world devices 215 and virtual world devices 217, parameter identification, and softset techniques.
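The disclosure names normalization between physical and virtual devices without fixing a method. As a minimal sketch, the example below applies min-max scaling across the combined candidate set so that physical distances and virtual (egocentric) distances land on a common [0, 1] scale. The choice of min-max scaling, the function name, and the key naming are assumptions; the distance values are the example distances listed in Table 4.

```python
def min_max_normalize(values: dict[str, float]) -> dict[str, float]:
    """Scale values to [0, 1]; if all values are equal, map each to 0.0."""
    lo, hi = min(values.values()), max(values.values())
    span = hi - lo
    if span == 0:
        return {k: 0.0 for k in values}
    return {k: (v - lo) / span for k, v in values.items()}


# Distances in metres for FOV (virtual) and real-world devices, from Table 4.
distances = {
    "FOV:device1": 2.0,
    "FOV:device2": 5.0,
    "FOV:device3": 2.5,
    "RealWorld:device1": 3.0,
    "RealWorld:device2": 3.5,
    "RealWorld:device3": 1.0,
}
normalized = min_max_normalize(distances)
```

Normalizing over the union of both device sets, rather than per world, is what makes a virtual distance directly comparable with a physical one in this sketch.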
According to another embodiment, the DCDS module 303 is implemented with a pre-trained AI-based model. According to an example embodiment, a Deep Neural Network (DNN)-based AI model is used as the pre-trained AI-based model.
FIG. 5 illustrates an example training phase of the pre-trained AI-based model, according to an embodiment of the disclosure.
Referring to FIG. 5, according to an embodiment, the pre-trained AI-based model is trained over various input samples 501. The input samples 501 consist of different scenarios of device selection. Considerations for the physical world and the virtual world scenarios are given as input samples for the pre-trained AI-based model for training. An example of the input samples 501 is given in Tables 1a to 1c. In an embodiment, the input samples may include one or more of a context of the user, a history, a device state of each of the plurality of physical world devices 215 and the plurality of virtual world devices 217, the device wakeup operation information of each of the plurality of physical world devices 215 and the plurality of virtual world devices 217, the task execution information of each of the plurality of physical world devices 215 and the plurality of virtual world devices 217, and the error information in wake-up of each of the plurality of physical world devices 215 and the plurality of virtual world devices 217. As the pre-trained AI-based model trains on the given input samples, the pre-trained AI-based model includes a correlation between the one or more of the context of the user, the history, the device state of each of the plurality of physical world devices 215 and the plurality of virtual world devices 217, the device wakeup operation information of each of the plurality of physical world devices 215 and the plurality of virtual world devices 217, the task execution information of each of the plurality of physical world devices 215 and the plurality of virtual world devices 217, and the error information in wake-up of each of the plurality of physical world devices 215 and the plurality of virtual world devices 217. This correlation information is used in the inference stage for predicting a target device.
FIG. 6 illustrates an operation flow of a DCDS module 303, according to an embodiment of the disclosure.
Referring to FIG. 6, according to an embodiment, at block 601, the HMDA module 310 performs data pre-processing and data normalization of the first plurality of wakeup parameters and the second plurality of wakeup parameters before the data points are fed to the Deep Neural Network (DNN)-based AI model. The normalized first plurality of wakeup parameters and the normalized second plurality of wakeup parameters, which are obtained from the HMDA module 310, are fed into the Deep Neural Network (DNN)-based AI model. Further, the DCDS module 303, at block 603, predicts a list of candidate devices from the physical world devices 215 and the virtual world devices 217.
Figure PCTKR2024007274-appb-img-000002
Figure PCTKR2024007274-appb-img-000003
Figure PCTKR2024007274-appb-img-000004
In particular, the Deep Neural Network (DNN)-based AI model generates a recommendation for the inputs shared from the HMDA module 310. The recommendations are based on the inputs in the current scenario for the virtual world devices 217 and the real world devices 215, user preference, and context. Table 4 depicts an example of the inputs (i.e. first wakeup parameters and the second wakeup parameters) in the current scenario for the virtual world devices 217, the real world devices 215, user preference, and context.
Table 4 (one example row; the remaining rows are elided in the source):
FOV Devices: {device1: Hall TV, device2: Hall Speaker, device3: Hall TableLamp}
Real World Devices: {device1: BedTV, device2: BedLamp, device3: Mobile}
SNR Value: [FOV: {device1: Active, device2: Idle, device3: Idle}, RealWorld: {device1: Active, device2: Active, device3: Idle}]
Device State: [FOV: {device1: Active, device2: Idle, device3: Idle}, RealWorld: {device1: Active, device2: Active, device3: Idle}]
Context: [FOV: {device1: Active, device2: Idle, device3: Idle}, RealWorld: {device1: Active, device2: Active, device3: Idle}]
User ENV: {userEnv: "Bedroom1"}
User Direction: {device1: "HallTV", device2: "HallSpeaker"}
Device Distance: [FOV: {device1: 2m, device2: 5m, device3: 2.5m}, RealWorld: {device1: 3m, device2: 3.5m, device3: 1m}]
Wake Command Time: 10:01pm
Voice Profile: {UName: 'John', userType: All Access}
Device Selection: {env: "FOV", device: "HallTV"}
...
The list of candidate devices includes devices predicted from the plurality of real-world devices and the plurality of virtual-world devices. Further, at block 605, the DCDS module 303 assigns, in accordance with a determined prediction function, a score to each candidate device in the list of candidate devices. The DCDS module 303 ranks each of the candidate devices based on the assigned score. The DCDS module 303 takes candidate device inputs at block 607. For example, the candidate device input gives context information such as successful execution (turning into listening mode), device state, previous preferred history (device preferences), execution history (execution of prior utterance), and error rate (device errors). These data points are fed to the model at block 605 in predicting the most appropriate device for wakeup with a higher success rate of execution. As another example, the predictive function is a contextual deep neural network that, when given normalized inputs, generates predictions with suitable scores for selecting candidates. As an example, the assigning of the device score is given in Table 5.
Device scores will be shared as below: {Y4: 0.95, Y1: 0.76, X4: 0.71, X2: 0.65, ….}
Accordingly, the DCDS module 303, at block 611, predicts a list of devices along with the rank of each device using the pre-trained AI model 609. The DCDS module 303 chooses the top candidates by assigning scores to each of the devices according to the desired prediction function. An example of the list of devices along with the rank of each device is given in Table 6.
Device Prediction: {FOV:device1: 0.95, Realworld:device1: 0.76, Realworld:device3: 0.71, FOV:device3: 0.65}
The information about the ranks of each of the candidate devices is passed on to the HMDC module 305.
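The ranking step can be sketched as a sort over the per-device scores. The example below reuses the scores from the Table 6 example; the function name `rank_candidates` and the dictionary layout are assumptions for illustration, and the scores themselves would come from the pre-trained model rather than being hard-coded.

```python
def rank_candidates(scores: dict[str, float]) -> list[tuple[str, float]]:
    """Return (device, score) pairs sorted from highest to lowest score."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Scores as in the Table 6 example, deliberately given out of order.
scores = {
    "FOV:device3": 0.65,
    "Realworld:device1": 0.76,
    "FOV:device1": 0.95,
    "Realworld:device3": 0.71,
}
ranked = rank_candidates(scores)
target = ranked[0][0]  # highest-ranked candidate
```

Keeping the full ranked list, rather than only the top entry, leaves the lower-ranked devices available if the first choice cannot be woken.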
According to an embodiment, the HMDC module 305 receives the information about the ranks of each of the candidate devices from the DCDS module 303. According to another embodiment, the DCDS module 303 determines, from the list of candidate devices, the target device having a first highest rank for waking up. The HMDC module 305 sends a wake-up signal to the target device having the highest rank. Further, the HMDC module 305 waits for a success signal in response to the wake-up signal. In case the target device does not return a success signal within a determined time frame, or the target device returns an error signal, the device with the next highest rank is selected as the target device and is sent the wake-up signal. The HMDC module 305 records every successful or failed transaction, which is fed back into the training of the DCDS module 303. This is further used by the DCDS module 303 in selecting and ranking devices for target device prediction.
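The fallback behaviour described above can be sketched as a loop over the ranked list. In the example below, `send_wakeup` is a hypothetical callback standing in for the actual wake-up transport; it is assumed to return True on a success signal and False on an error signal or a timeout. The function name and the transaction log format are also assumptions.

```python
from typing import Callable, Optional


def dispatch_wakeup(
    ranked_devices: list[str],
    send_wakeup: Callable[[str], bool],
    log: Optional[list] = None,
) -> Optional[str]:
    """Try devices in rank order; return the first one that reports success.

    send_wakeup is a hypothetical callback that returns True on a success
    signal and False on an error signal or timeout.
    """
    for device in ranked_devices:
        ok = send_wakeup(device)
        if log is not None:
            log.append((device, ok))  # transaction record, usable as training feedback
        if ok:
            return device
    return None


# Example: y4 is unavailable, so the next-ranked device y1 is woken instead.
unavailable = {"y4"}
transactions: list = []
woken = dispatch_wakeup(
    ["y4", "y1", "x4", "x2"],
    lambda device: device not in unavailable,
    transactions,
)
```

Timeout handling is left inside the callback in this sketch, so the loop itself only needs to distinguish success from failure.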
FIG. 7a illustrates a flowchart for waking up a device among a plurality of devices in a multi-reality environment, according to an embodiment of the disclosure.
Referring to FIG. 7a, according to an embodiment, the method 700 is implemented by the apparatus 200 of FIG. 2. Further, the method 700 is implemented through operations 701 to 709 performed by various components of the module 205. According to some embodiments, the functions of the modules 205 may alternatively be performed by the processor 201. However, for ease of understanding, the operations 701 to 709 will be explained by referring to the various modules 205. Further, a detailed explanation of each of the modules is covered in the above paragraphs and is therefore omitted here for the sake of brevity.
FIG. 7b illustrates an operational flow for waking up a device in a multi-reality environment, according to an embodiment of the disclosure. The method 700B will be explained collectively with method 700 of FIG. 7a for ease of understanding.
Referring to FIGS. 2 and 3, consider that the user 301 is in the multi-device multi-reality environment having the physical world devices 215 (e.g. device x1, device x2, device x3, and device x4) and the virtual world devices 217 (e.g. device y1, device y2, device y3, and device y4) in the physical environment 103 and the XR environment 101 respectively. According to an embodiment, consider that, at block 721, the user sends a wakeup signal, for example by sending the wakeup command "Hi xxx" as a voice input. As the user sends the wake-up signal x, the virtual world devices 217 and the physical world devices 215 receive the wakeup command at block 723. As the user sends the wakeup command, the PMDW module 307 and the VMDW module 309, at operation 701, detect the voice input received from the user 301. Based on the reception of the wake-up signal x, the PMDW module 307 and the VMDW module 309, at operation 703, receive the first plurality of wakeup parameters associated with the plurality of physical world devices 215 in the physical environment 103 and the second plurality of wakeup parameters associated with the plurality of virtual world devices 217 in the virtual world (e.g. XR world environment 101).
For example, the device x1, the device x2, the device x3, and the device x4 receive the wake-up command and calculate respective values for SNR, direction, processed distance, time, voice analyser values, etc., as the first wake-up parameters. The first wake-up parameters are passed to DB 207 and further given in standard format to HMDA module 310 for processing.
As a further example, the device y1, the device y2, the device y3, and the device y4 receive the wake-up command and calculate respective values such as egocentric distance, non-speech inputs including device parameters, user direction inputs, etc., as the second wake-up parameters. The second wake-up parameters are passed to DB 207 and further given in the standard format to HMDA module 310 for processing. The operations performed at operation 703 correspond to block 725.
The first plurality of wakeup parameters and the second plurality of wakeup parameters are fed to the HMDA module 310 for pre-processing and normalization. A detailed operation of the pre-processing and normalizing of the first plurality of wakeup parameters and the second plurality of wakeup parameters is explained in the above paragraphs.
Further, at operation 705, the normalized values of the first plurality of wakeup parameters and the second plurality of wakeup parameters are fed, by the HMDA module 310, to the pre-trained artificial intelligence (AI)-based model of the DCDS module 303. Further, the HMDA module 310 also triggers a determination request to the DCDS module 303 for predicting the target device. The operation 705 corresponds to the operation at block 727. Accordingly, the DCDS module 303 at block 729 receives the first plurality of wakeup parameters, the second plurality of wakeup parameters, and the determination request from the HMDA module 310.
The pre-trained artificial intelligence (AI)-based model includes the correlation of one or more of the context of the user, the history, the device state of each of the plurality of real-world devices and the plurality of virtual-world devices, the device wakeup operation information of each of the plurality of real-world devices and the plurality of virtual-world devices, the task execution information of each of the plurality of physical world devices 215 and the plurality of virtual world devices 217 and the error information in wake-up of each of the plurality of the physical world devices 215 and the plurality of virtual-world devices 217. The training of the pre-trained artificial intelligence (AI)-based model is explained with reference to FIG. 5 above.
At operation 707, the DCDS module 303 predicts a target device based on the first plurality of wakeup parameters and the second plurality of wakeup parameters. In an embodiment, the prediction of the target device is explained above with respect to the explanation of the DCDS module 303. Referring to FIG. 3, device y4, having the highest rank among the virtual world devices 217 from the XR environment 101, is selected. In an embodiment, the predicted target device may be referred to as a winner device. Further, at operation 709, the HMDC module 305 sends the wake-up signal to the target device to turn the target device into a listening state. The operation 709 corresponds to the operation in the block 731. In response to the sent wake-up signal, at block 733, if the target device is available, then it sends back a success signal. According to the example embodiment, the device y4 receives the wake-up signal. If the device y4 is available, then the device y4 sends a success signal status in response to the wake-up signal and goes into a listening state. On the other hand, if the device y4 is unavailable, then an error or failure response will be sent by the device y4. In such a scenario, the wake-up signal will be sent to the device which has the next highest rank in the list of target devices. Accordingly, an appropriate device will be activated when the user is in the multi-device multi-reality environment.
FIG. 8 illustrates a use case scenario for waking up a device among a plurality of devices in a multi-reality environment, according to an embodiment of the disclosure.
FIG. 8 illustrates the multi-reality environment where a user 301 is communicating with physical world devices 215 and virtual world devices 217. In an example scenario, consider that the user 301 is in a kitchen with multiple voice-enabled/controlled devices around in the physical world environment 103. Further, the user 301 is also in a living room in the virtual scene with multiple virtual voice-enabled/controlled devices (i.e., 801v, 803v, 805v, 301v and 200v) around in a virtual world environment 800. According to an example embodiment, the user 301 wants to give a command and hence wants to wake up one device. As an example, the kitchen, which is the physical reality of the user, includes multiple smart devices such as a family hub, smart chimney, hob, tablet, smart bulb, etc. The living room in the virtual world 800 includes multiple voice-enabled devices such as a TV, a phone, a tablet, speakers, a smart bulb, etc. Thus, as the user provides the voice input, the wake-up command of the user 301 is passed to all the devices for inputs. The inputs, i.e., the first wake-up parameters and the second wake-up parameters, are processed by the HMDA module 310. The processed and normalized inputs are then fed to the DCDS module 303 for prediction. The DCDS module 303 further predicts the target device having the highest rank. Further, the HMDC module 305 sends the wake-up signal to the target device to turn it into listening mode. In the example scenario, consider that the TV of the virtual world is selected as the target device. In particular, the DCDS module 303, based on multiple factors such as the signal-to-noise ratio at each device and non-speech inputs including normalized device parameters from both the physical and virtual reality environments (such as user direction, user distance from the device, time factor, voice analyser inputs, egocentric distance, etc.), predicts the TV display in the virtual world as the highest ranked.
Thus, the wake-up signal is dispatched to the TV display by the HMDC module 305. According to some embodiments, the HMDA module 310, the DCDS module 303, and the HMDC module 305 are collectively referred to as a contextual device selection engine.
FIG. 9 illustrates another case scenario for waking up a device among a plurality of devices in a multi-reality environment, according to an embodiment of the disclosure.
FIG. 9 illustrates the multi-reality environment where a user 301 is communicating with physical world devices 215 and virtual world devices 217. In an example scenario, consider that the user 301 is in a living room with multiple voice-enabled/controlled devices around in the physical environment 103. The user 301 is in a bedroom in the virtual scene with multiple virtual voice-enabled/controlled devices around in the virtual environment 800. The user is not static and is moving around in the virtual space. The user 301 wants to give the command and hence wants to wake up one device. As an example, the living room, which is the physical reality of the user 301, includes multiple voice-enabled devices such as a TV, phone, tablet, speaker, smart bulb, etc. Further, the bedroom, which is the virtual scene of the user 301, includes multiple smart devices such as a smart TV, smart bulb, etc. Thus, as the user provides the voice input, the wake-up command of the user 301 is passed to all the devices for inputs. Further, the inputs, i.e., the first wake-up parameters and the second wake-up parameters, are processed by the HMDA module 310. The processed and normalized inputs are then fed to the DCDS module 303 for prediction. The DCDS module 303 further predicts the target device having the highest rank. The HMDC module 305 sends the wake-up signal to the target device to turn it into listening mode. In the example scenario, consider that the TV 901 of the virtual world is selected as the target device. In particular, the DCDS module 303, based on multiple factors such as the signal-to-noise ratio at each device and non-speech inputs including normalized device parameters from both the physical and virtual reality environments (such as user direction, user distance from the device, time factor, voice analyser inputs, egocentric distance, etc.), predicts the TV display 901 in the virtual environment 800 as the highest ranked. The wake-up signal is dispatched to the TV display 901 by the HMDC module 305.
According to some embodiments, the HMDA module 310, the DCDS module 303, and the HMDC module 305 are collectively referred to as a contextual device selection engine.
While specific language has been used to describe the disclosure, any limitations arising on account of the same are not intended. As would be apparent to a person in the art, various working modifications may be made to the method in order to implement the inventive concept as taught herein.
The drawings and the foregoing description give examples of embodiments. Those skilled in the art will appreciate that one or more of the described elements may well be combined into a single functional element. Certain elements may be split into multiple functional elements. Elements from one embodiment may be added to another embodiment. For example, orders of processes described herein may be changed and are not limited to the manner described herein.
The actions of any flow diagram need not be implemented in the order shown; nor do all of the acts necessarily need to be performed. Also, those acts that are not dependent on other acts may be performed in parallel with the other acts. The scope of embodiments is by no means limited by these specific examples. Numerous variations, whether explicitly given in the specification or not, such as differences in structure, dimension, and use of material, are possible. The scope of embodiments is at least as broad as given by the following claims.
It will be appreciated that various embodiments of the disclosure according to the claims and description in the specification can be realized in the form of hardware, software or a combination of hardware and software.
Any such software may be stored in non-transitory computer readable storage media. The non-transitory computer readable storage media store one or more computer programs (software modules), the one or more computer programs include computer-executable instructions that, when executed by one or more processors of an electronic device, cause the electronic device to perform a method of the disclosure.
Any such software may be stored in the form of volatile or non-volatile storage such as, for example, a storage device like read only memory (ROM), whether erasable or rewritable or not, or in the form of memory such as, for example, random access memory (RAM), memory chips, device or integrated circuits or on an optically or magnetically readable medium such as, for example, a compact disk (CD), digital versatile disc (DVD), magnetic disk or magnetic tape or the like. It will be appreciated that the storage devices and storage media are various embodiments of non-transitory machine-readable storage that are suitable for storing a computer program or computer programs comprising instructions that, when executed, implement various embodiments of the disclosure. Accordingly, various embodiments provide a program comprising code for implementing apparatus or a method as claimed in any one of the claims of this specification and a non-transitory machine-readable storage storing such a program.
While the disclosure has been shown and described with reference to various embodiments thereof, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the disclosure as defined by the appended claims and their equivalents.

Claims (15)

  1. A method for waking up a device among a plurality of devices in a multi-reality environment, the method comprising:
    detecting a voice input from a user;
    receiving, based on the voice input, a first plurality of wakeup parameters associated with a plurality of real-world devices in the real-world and a second plurality of wakeup parameters associated with a plurality of virtual-world devices in a virtual world;
    feeding the first plurality of wakeup parameters and the second plurality of wakeup parameters into a pre-trained artificial intelligence (AI)-based model including:
    a correlation of one or more of a context of the user,
    a history,
    a device state of each of the plurality of real-world devices and the plurality of virtual-world devices,
    a device wakeup operation information of each of the plurality of real-world devices and the plurality of virtual-world devices,
    a task execution information of each of the plurality of real-world devices and the plurality of virtual-world devices, and
    error information in wake-up of each of the plurality of real-world devices and the plurality of virtual-world devices;
    predicting, by the pre-trained AI-based model, a target device based on the first plurality of wakeup parameters and the second plurality of wakeup parameters; and
    sending a wake-up signal to the target device for turning the target device into a listening state.
  2. The method of claim 1, wherein the first plurality of wakeup parameters comprises at least one of:
    a signal to noise ratio (SNR) value of each of the plurality of real-world devices;
    user's environmental information;
    a current state of each of the plurality of real-world devices;
    a first device status of each of the plurality of real-world devices;
    a first device context of each of the plurality of real-world devices;
    a direction of the voice input of the user;
    a distance of the user from each of the plurality of real-world devices;
    user's location information;
    a voice profile information;
    user profile information; or
    time information related to usage of each of the plurality of real-world devices.
  3. The method of claim 1, wherein the second plurality of wakeup parameters comprises at least one of:
    a second device state of each of the plurality of virtual-world devices;
    a second device context of each of the plurality of virtual-world devices;
    an egocentric distance between the user and each of the plurality of virtual-world devices;
    user profile information;
    time information related to usage of each of the plurality of virtual-world devices; or
    a normalized signal to noise ratio (SNR) value of each of the plurality of virtual-world devices.
  4. The method of claim 1, further comprising:
    pre-processing the first plurality of wakeup parameters and the second plurality of wakeup parameters by performing a plurality of operations; and
    normalizing the first plurality of wakeup parameters and the second plurality of wakeup parameters based on the pre-processing,
    wherein the normalized first plurality of wakeup parameters and the normalized second plurality of wakeup parameters are fed into the pre-trained AI-based model.
  5. The method of claim 1, wherein the predicting of the target device comprises:
    predicting, by the pre-trained AI-based model, a list of candidate devices, from the plurality of real-world devices and the plurality of virtual-world devices, wherein the list of candidate devices includes devices predicted from the plurality of real-world devices and the plurality of virtual-world devices,
    assigning, in accordance with a determined prediction function, a score to each candidate device in the list of candidate devices by the pre-trained AI-based model,
    ranking each of the candidate devices based on the assigned score by the pre-trained AI-based model, and
    determining the target device from the list of candidate devices having a first highest rank for waking-up.
  6. The method of claim 5, further comprising:
    determining an availability of the target device based on the first plurality of wakeup parameters and the second plurality of wakeup parameters; and
    sending the wake-up signal to the target device for turning the target device in the listening state based on the determination of the availability of the target device.
  7. The method of claim 6, wherein, based on the determination of unavailability of the target device, the method further comprises:
    determining a next target device from the list of candidate devices having a second highest rank for waking-up; and
    sending the wake-up signal to the next target device for turning the next target device into the listening state based on a result of the determination.
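The scoring, ranking, and fallback steps of claims 5 to 7 can be sketched as follows. This is an illustration under stated assumptions, not the claimed model: a weighted sum stands in for the "determined prediction function", and the feature names, weights, device identifiers, and the functions `rank_candidates` and `select_target` are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

Features = Dict[str, float]


def rank_candidates(
    candidates: Dict[str, Features],
    score_fn: Callable[[Features], float],
) -> List[Tuple[str, float]]:
    """Score every candidate device and return them ordered best first."""
    scored = [(device, score_fn(features)) for device, features in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def select_target(
    ranked: List[Tuple[str, float]],
    is_available: Callable[[str], bool],
    max_rank: int = 2,
) -> Optional[str]:
    """Return the highest-ranked available device among the top `max_rank`.

    With max_rank=2 this mirrors claims 6 and 7: try the first-ranked device,
    then fall back to the second-ranked one. Checking availability of the
    fallback device as well is an assumption of this sketch.
    """
    for device, _score in ranked[:max_rank]:
        if is_available(device):
            return device
    return None


# Stand-in for the model's prediction function (assumed weights).
WEIGHTS = {"snr": 0.5, "proximity": 0.5}


def weighted_score(features: Features) -> float:
    return sum(WEIGHTS[name] * features.get(name, 0.0) for name in WEIGHTS)


candidates = {
    "tv": {"snr": 0.9, "proximity": 0.8},               # real-world device
    "virtual_speaker": {"snr": 0.6, "proximity": 0.9},  # virtual-world device
    "phone": {"snr": 0.2, "proximity": 0.1},            # real-world device
}
ranked = rank_candidates(candidates, weighted_score)

busy = {"tv"}  # e.g. the TV is already executing a task
target = select_target(ranked, lambda device: device not in busy)
```

Here `tv` ranks first but is marked busy, so the wake-up signal would go to `virtual_speaker`, the second-ranked candidate. Returning `None` when neither of the top two is available is a choice of this sketch; the claims do not say what happens in that case.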
  8. An apparatus for waking up a device among a plurality of devices in a multi-reality environment, the apparatus comprising:
    memory storing one or more computer programs; and
    one or more processors communicatively coupled to the memory,
    wherein the one or more computer programs include computer-executable instructions that, when executed by the one or more processors, cause the apparatus to:
    detect a voice input from a user;
    receive, based on the voice input, a first plurality of wakeup parameters associated with a plurality of real-world devices in the real-world and a second plurality of wakeup parameters associated with a plurality of virtual-world devices in a virtual world;
    feed the first plurality of wakeup parameters and the second plurality of wakeup parameters into a pre-trained artificial intelligence (AI)-based model including:
    a correlation of one or more of a context of the user,
    a history, a device state of each of the plurality of real-world devices and the plurality of virtual-world devices,
    a device wakeup operation information of each of the plurality of real-world devices and the plurality of virtual-world devices,
    a task execution information of each of the plurality of real-world devices and the plurality of virtual-world devices, and
    an error information in wake-up of each of the plurality of real-world devices and the plurality of virtual-world devices;
    predict, by the pre-trained AI-based model, a target device based on the first plurality of wakeup parameters and the second plurality of wakeup parameters; and
    send a wake-up signal to the target device for turning the target device into a listening state.
  9. The apparatus of claim 8, wherein the first plurality of wakeup parameters comprises at least one of:
    a signal to noise ratio (SNR) value of each of the plurality of real-world devices;
    user's environmental information;
    a current state of each of the plurality of real-world devices;
    a first device status of each of the plurality of real-world devices;
    a first device context of each of the plurality of real-world devices;
    a direction of the voice input of the user;
    a distance of the user from each of the plurality of real-world devices;
    user's location information;
    a voice profile information;
    user profile information; or
    time information related to usage of each of the plurality of real-world devices.
  10. The apparatus of claim 8, wherein the second plurality of wakeup parameters comprises at least one of:
    a second device state of each of the plurality of virtual-world devices;
    a second device context of each of the plurality of virtual-world devices;
    an egocentric distance between the user and each of the plurality of virtual-world devices;
    user profile information;
    time information related to usage of each of the plurality of virtual-world devices; or
    a normalized signal to noise ratio (SNR) value of each of the plurality of virtual-world devices.
  11. The apparatus of claim 8,
    wherein the one or more computer programs further include computer-executable instructions that, when executed by the one or more processors, cause the apparatus to:
    pre-process the first plurality of wakeup parameters and the second plurality of wakeup parameters by performing a plurality of operations, and
    normalize the first plurality of wakeup parameters and the second plurality of wakeup parameters based on the pre-processing, and
    wherein the normalized first plurality of wakeup parameters and the normalized second plurality of wakeup parameters are fed into the pre-trained AI-based model.
  12. The apparatus of claim 8, wherein, to predict the target device, the one or more computer programs further include computer-executable instructions that, when executed by the one or more processors, cause the apparatus to:
    predict, by the pre-trained AI-based model, a list of candidate devices, from the plurality of real-world devices and the plurality of virtual-world devices, wherein the list of candidate devices includes devices predicted from the plurality of real-world devices and the plurality of virtual-world devices,
    assign, in accordance with a determined prediction function, a score to each candidate device in the list of candidate devices by the pre-trained AI-based model,
    rank each of the candidate devices based on the assigned score by the pre-trained AI-based model, and
    determine the target device from the list of candidate devices having a first highest rank for waking-up.
  13. The apparatus of claim 12, wherein the one or more computer programs further include computer-executable instructions that, when executed by the one or more processors, cause the apparatus to:
    determine an availability of the target device based on the first plurality of wakeup parameters and the second plurality of wakeup parameters, and
    send the wake-up signal to the target device for turning the target device into the listening state based on the determination of the availability of the target device.
  14. The apparatus of claim 13, wherein, based on the determination of unavailability of the target device, the one or more computer programs further include computer-executable instructions that, when executed by the one or more processors, cause the apparatus to:
    determine a next target device from the list of candidate devices having a second highest rank for waking-up, and
    send the wake-up signal to the next target device for turning the next target device into the listening state based on a result of the determination.
  15. The apparatus of claim 8, wherein input samples for the pre-trained AI-based model comprise a context of the user, a history, a device state of each of a plurality of physical world devices and a plurality of virtual world devices, task execution information of each of the plurality of physical world devices and the plurality of virtual world devices, and error information in wake-up of each of the plurality of the physical world devices and the plurality of virtual-world devices.
PCT/KR2024/007274 2023-06-05 2024-05-29 Method and system for contextual device wake-up in multi-device multi-reality environments Ceased WO2024253376A1 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN202480037270.1A CN121241329A (en) 2023-06-05 2024-05-29 Methods and systems for context-based device wake-up in multi-device, multi-reality environments
EP24819528.1A EP4619855A4 (en) 2023-06-05 2024-05-29 Method and system for contextually waking up a device in environments with multiple devices and multiple realities
US18/883,463 US20250006195A1 (en) 2023-06-05 2024-09-12 Method and system for contextual device wake-up in multi-device multi-reality environments

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
IN202341038628 2023-06-05
IN202341038628 2024-02-15

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/883,463 Continuation US20250006195A1 (en) 2023-06-05 2024-09-12 Method and system for contextual device wake-up in multi-device multi-reality environments

Publications (1)

Publication Number Publication Date
WO2024253376A1 (en) 2024-12-12

Family

ID=93794543

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2024/007274 Ceased WO2024253376A1 (en) 2023-06-05 2024-05-29 Method and system for contextual device wake-up in multi-device multi-reality environments

Country Status (4)

Country Link
US (1) US20250006195A1 (en)
EP (1) EP4619855A4 (en)
CN (1) CN121241329A (en)
WO (1) WO2024253376A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190392834A1 (en) * 2019-06-25 2019-12-26 Lg Electronics Inc. Method and apparatus for selecting voice-enabled device
US20200005766A1 (en) * 2019-08-15 2020-01-02 Lg Electronics Inc. Deeplearning method for voice recognition model and voice recognition device based on artificial neural network
CN112507799A (en) * 2020-11-13 2021-03-16 幻蝎科技(武汉)有限公司 Image identification method based on eye movement fixation point guidance, MR glasses and medium
US20210233537A1 (en) * 2020-01-28 2021-07-29 Lg Electronics Inc. Device, system and method for controlling a plurality of voice recognition devices
US20220051677A1 (en) * 2019-04-25 2022-02-17 Lg Electronics Inc. Intelligent voice enable device searching method and apparatus thereof

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9812126B2 (en) * 2014-11-28 2017-11-07 Microsoft Technology Licensing, Llc Device arbitration for listening devices
US10607081B2 (en) * 2016-01-06 2020-03-31 Orcam Technologies Ltd. Collaboration facilitator for wearable devices
US12406671B2 (en) * 2021-10-27 2025-09-02 Samsung Electronics Co., Ltd. Method of identifying target device based on reception of utterance and electronic device therefor


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP4619855A4 *

Also Published As

Publication number Publication date
US20250006195A1 (en) 2025-01-02
EP4619855A1 (en) 2025-09-24
CN121241329A (en) 2025-12-30
EP4619855A4 (en) 2026-03-18

Similar Documents

Publication Publication Date Title
WO2019172704A1 (en) Method for intent-based interactive response and electronic device thereof
JP2020173462A (en) Computer-based selection of synthetic speech for agents
CN111552888A (en) Content recommendation method, device, equipment and storage medium
CN110162604B (en) Statement generation method, device, equipment and storage medium
US20180108352A1 (en) Robot Interactive Communication System
CN113762585B (en) Data processing method, account type identification method and device
JP2021068455A (en) Method of recognizing and utilizing user's face based on image and computer system
CN115205925A (en) Expression coefficient determining method and device, electronic equipment and storage medium
CN117908736A (en) Interaction method, device, equipment and storage medium
CN110955390A (en) Data processing method, apparatus and electronic equipment
CN110309339A (en) Picture tag generation method and device, terminal and storage medium
CN107968890A (en) Theme setting method and device, terminal equipment and storage medium
WO2025180460A1 (en) Model training method and apparatus, and computer device and storage medium
CN114168332A (en) Task processing method, device, electronic device and storage medium
WO2024253376A1 (en) Method and system for contextual device wake-up in multi-device multi-reality environments
KR102772461B1 (en) Method and system for searching media message using keyword extracted from media file
CN116030375A (en) Video feature extraction, model training method, device, equipment and storage medium
CN114066098A (en) Method and device for estimating completion duration of learning task
WO2021006620A1 (en) Method and system for processing a dialog between an electronic device and a user
CN111291694B (en) Dish image recognition method and device
US20210397991A1 (en) Predictively setting information handling system (ihs) parameters using learned remote meeting attributes
TWI698757B (en) Smart engine with dynamic profiles and method of operating smart engine
CN115525554B (en) Automatic test method, system and storage medium for model
CN118245525A (en) Content searching method, device, electronic equipment and computer readable storage medium
EP4616366A1 (en) Method and electronic device for estimating a landmark point of body part of subject

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 24819528

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 2024819528

Country of ref document: EP

ENP Entry into the national phase

Ref document number: 2024819528

Country of ref document: EP

Effective date: 20250617

WWP Wipo information: published in national office

Ref document number: 2024819528

Country of ref document: EP

NENP Non-entry into the national phase

Ref country code: DE