WO2025042784A9 - Empathetic dialogue generation system - Google Patents

Empathetic dialogue generation system

Info

Publication number
WO2025042784A9
Authority
WO
WIPO (PCT)
Prior art keywords
user
llm
guardrail
message
observers
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
PCT/US2024/042810
Other languages
English (en)
Other versions
WO2025042784A1 (fr
Inventor
Judy L. BARKAL
Edison TING
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Health2047 Inc
Original Assignee
Health2047 Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Health2047 Inc filed Critical Health2047 Inc
Publication of WO2025042784A1 publication Critical patent/WO2025042784A1/fr
Publication of WO2025042784A9 publication Critical patent/WO2025042784A9/fr
Anticipated expiration legal-status Critical
Pending legal-status Critical Current

Links

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/004Artificial life, i.e. computing arrangements simulating life
    • G06N3/006Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • G06F40/35Discourse or dialogue representation
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • G06N3/0455Auto-encoder networks; Encoder-decoder networks
    • GPHYSICS
    • G09EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09BEDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B19/00Teaching not covered by other main groups of this subclass
    • G09B19/0092Nutrition

Definitions

  • the disclosed technology generally relates to generating empathetic dialogue. More particularly, embodiments relate to using multiple large language models as observer guardrails to generate empathetic dialogue.
  • Such persons may benefit from routine interactions.
  • technological techniques to enable such routine interactions are deficient.
  • current software-based conversational techniques do not enable such guidance and companionship.
  • the current software-based conversational techniques are ill-suited to addressing this technical problem.
  • LLM primary large language model
  • API application programming interface
  • the communications with the end-user may be adjusted using a multitude of LLMs, with each of the LLMs accounting for one conversational aspect.
  • the multitude of LLMs may be run by the system.
  • the system can enhance the primary LLM’s sensitivity and applicability to the end-user.
  • LMMs Large Multimodal Models
  • the models may output text, images, audio, and so on.
  • LLMs are increasingly being used to communicate with users while addressing requests and/or questions from the users. For example, a user may ask an LLM for a recipe which uses a list of ingredients. As another example, a user may ask an LLM to solve a particular programming problem. As another example, a user may ask an LLM to draft a poem or portion of a paper based on information input by the user.
  • LLMs are ill-suited for sensitive communications with users who may benefit from periodic conversations which are delivered with empathetic dialogue, optionally along with images, sound, and other types of media / expressions to help facilitate engagement.
  • a health professional e.g., a mental health specialist
  • certain patients, however, may lack friends or family, or they may be uncomfortable communicating with friends or family about mental health issues.
  • the above-described patient may turn to existing LLMs; however, existing LLMs suffer numerous technological defects which make them problematic for such a use case.
  • an LLM at present is difficult to constrain.
  • Current techniques include a complex prompt which is provided with any interaction with a user 108.
  • the complex prompt may outline the role of the LLM (e.g., helper) and may try to introduce safeguards.
  • these safeguards are easily broken due to actions of the user or malicious actions by others (e.g., prompt injection attacks).
  • There is also no guarantee that such safeguards are maintained during a conversation.
  • the LLM is not a reasonable technological solution.
  • the health professional may prefer to have the conversations with the user to be geared towards certain goals.
  • the health professional may prefer that the user be nudged towards social interactions in the real world.
  • the health professional may prefer that the user receive encouragement to follow the health professional’s nutritional advice.
  • the health professional may prefer that the user be periodically reminded to take his/her medicine.
  • an LLM which is largely a black box, can guide the user toward the goals.
  • guardrail observers may monitor communications with a primary LLM and adjust, or otherwise update, the communications.
  • the guardrail observers may include information in a prompt (e.g., a message) to the primary LLM to segue to a different topic.
  • the guardrail observers may include information in a prompt to the primary LLM to introduce a particular topic.
  • Additional examples may include adjusting prompts to introduce goals, to recall information or stories of interest to the user, and so on.
  • the guardrail observers may adjust, or even block, a message being provided by the primary LLM to the user. For example, a guardrail observer may remove reference to political discussion, or other topics known to be problematic to the user (e.g., based on a history of conversations with the user).
  • Example guardrail observers may include a watcher, reasoner, and planner. These guardrail observers will be described in more detail below.
  • the watcher may help to guide conversations between the user and the system to avoid, or otherwise mitigate, harmful content.
  • the watcher may cause a message from the user to be disregarded, or otherwise altered, and cause the primary LLM to change topics.
  • the watcher may similarly disregard, or otherwise alter, a message from the primary LLM and cause it to generate a new message which changes topics.
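  • The watcher behavior described above can be illustrated with a minimal sketch. It assumes a simple keyword match standing in for the watcher's LLM; the names `BLOCKED_TOPICS`, `WatcherDecision`, and `watch` are hypothetical and do not appear in the source.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical topic list. In the described system the watcher is an LLM
# with its own conditions and instructions, not a keyword filter.
BLOCKED_TOPICS = frozenset({"politics", "election"})


@dataclass
class WatcherDecision:
    blocked: bool
    matched_topics: Tuple[str, ...] = ()
    instruction: Optional[str] = None


def watch(message: str, blocked_topics=BLOCKED_TOPICS) -> WatcherDecision:
    """Flag a message that mentions a blocked topic and request a topic change."""
    # Normalize each word by stripping surrounding punctuation and lowercasing.
    words = {w.strip(".,!?;:\"'").lower() for w in message.split()}
    hits = tuple(sorted(words & blocked_topics))
    if not hits:
        return WatcherDecision(blocked=False)
    return WatcherDecision(
        blocked=True,
        matched_topics=hits,
        instruction="Disregard the previous message and change to a neutral topic.",
    )
```

  • The same check could be applied to a message from the user or to a message from the primary LLM, since the watcher is described as acting in both directions.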
  • the reasoner may have access to a conversational history, or features extracted from the conversational history (e.g., preferences, likes, interests, dislikes). As will be described, the reasoner may look for times at which it can inject messages or prompts. For example, the reasoner may have access to goals set by a health professional or the user. In this example, the reasoner may monitor a conversation and identify when to add a message which is geared towards furtherance of a goal. As will be described, the reasoner may adjust a prompt which is being provided to the primary LLM. For example, the reasoner may add language to a message received from the user which informs the primary LLM that it should reference, or otherwise further, a particular goal. Thus, the output of the primary LLM, which may also be analyzed by the reasoner, may cause introduction of the particular goal.
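  • As a rough illustration of the reasoner adding goal-related language to a user message before it reaches the primary LLM, the sketch below uses keyword triggers. The function `add_goal_hint`, the keyword table, and the bracketed instruction format are assumptions for illustration only.

```python
def add_goal_hint(user_message, goals, keywords):
    """Append an instruction for the primary LLM when a goal looks relevant.

    goals:    list of goal descriptions (e.g., set by a health professional).
    keywords: hypothetical mapping from a goal to trigger words.
    """
    lowered = user_message.lower()
    for goal in goals:
        if any(k in lowered for k in keywords.get(goal, [])):
            return (
                f"{user_message}\n\n"
                f"[Instruction: where natural, encourage the user toward this goal: {goal}]"
            )
    # No goal is relevant, so the message is passed through unchanged.
    return user_message
```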
  • the planner may analyze a current conversation with a user and look for potential segues.
  • the segues may be used to provide real-time segues towards particular goals.
  • the user may be communicating with the system about an upcoming trip to a particular country.
  • the system may have knowledge, for example based on the current or prior conversation(s), that the user enjoys trying new foods.
  • the planner may determine to segue into a discussion about healthy eating while trying new foods on a trip.
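  • The travel example above can be sketched as a lookup from conversation topics to goal-directed segues. The `Segue` structure, the topic labels, and `plan_segue` are hypothetical; in the described system the planner is an LLM rather than a rule table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segue:
    required_topics: frozenset  # topics that must be present to use the segue
    goal: str                   # goal the segue leads toward
    suggestion: str             # instruction handed to the primary LLM


# Hypothetical segue table built from the trip / new foods example.
SEGUES = [
    Segue(
        frozenset({"travel", "food"}),
        "healthy eating",
        "Discuss healthy eating while trying new foods on the trip.",
    ),
]


def plan_segue(conversation_topics, user_interests, segues=SEGUES):
    """Return the first segue whose required topics are all known, else None."""
    known = set(conversation_topics) | set(user_interests)
    for segue in segues:
        if segue.required_topics <= known:
            return segue
    return None
```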
  • the coordination of the above-described guardrails may be performed by the system.
  • the system may determine whether to adjust a conversation with a user based on the output from the guardrail observers. For example, the system may determine to ignore the guardrail observers or may determine to implement one of the adjustments from one of the guardrail observers. Additional description regarding coordination, and use of the guardrail observers, is included below with respect to at least Figures 1B-1C.
  • a user can interact with a user device to provide a message to the system, either spoken or provided via input of text.
  • the system may receive the message and, as described above, determine whether to adjust the message prior to transmission to an outside LLM (e.g., the above-described primary LLM).
  • the outside LLM may thus respond to the received message, with the response being routed to the system described herein. Similar to the above, the system may determine whether to adjust the received message prior to transmission to the user device.
  • the system may render an avatar on the user device to speak, or otherwise output, the messages being provided to the user device.
  • the avatar may be user-selectable (e.g., in Figure 1A, a cat is selected).
  • the avatar may be rendered in realistic motion, for example using machine learning techniques, such that the user may communicate with the avatar while the complex back-end processing described herein is masked from his/her view.
  • Guardrail observers can include one or more additional LLMs with sets of conditions and instructions which, when a message is received, cause the LLMs to generate response adjustments, composed of responses generated by one or more LLMs which can supplement or replace a response generated by the external LLM.
  • the system can include dialogue history, a record of received messages and generated replies; world context, a database including information on a host of topics, both plainly relevant to a user’s living situation and personal circumstances, and those tangentially related; a user model, which can include descriptions and factors concerning user’s physical and mental condition and potential ailments as well as health and personal goals the user has been prescribed and sought out themself; and a configuration, which can include personal preferences of a user.
  • the guardrail observers can include a planner, reasoner, and watcher. As described above, these observers may use individual LLMs with specialized conditions and instructions for generating response adjustments after a message from the user is received.
  • the watcher can be configured to review the message for harmful content and ignore the harmful content when generating an adjustment. Harmful content can be inflammatory or controversial topics which are likely to cause distress in a user if mentioned in conversation.
  • the planner can be configured to review the user prompt and dialogue history to generate response adjustments including conversation segues which introduce into a conversation advice or a recommendation which can lead a user to take actions to achieve one or more goals.
  • the reasoner can review one or more databases to find contextual information to generate a response adjustment which include the contextual information.
  • the contextual information can be situational factors that are broadly applicable to persons in the position of the user.
  • contextual information can include factors common to a user’s housing situation, climate, and geographical location. For each of the guardrail observers, if relevant information is not identified, a response adjustment will not be generated.
  • each response adjustment can be received by a queue, a data structure which stores response adjustments and provides them to a resolver.
  • the resolver may select a response adjustment based on one or more factors.
  • the factors can be an inherent priority wherein a response adjustment from a watcher has highest priority, followed by the planner and then the reasoner.
  • a response adjustment may be selected by the system (e.g., by a dialogue generator of the system)
  • a response may be obtained.
  • the response may be from the external LLM in response to the response adjustment.
  • the response may reflect output from the external LLM as adjusted according to the response adjustment.
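  • The queue and resolver described above can be sketched with a priority heap, assuming the inherent priority order of watcher, then planner, then reasoner. The class and field names are illustrative, and discarding the unselected adjustments is an assumption the source does not state.

```python
import heapq
from dataclasses import dataclass, field
from itertools import count

# Lower number means higher priority, following the order given in the text.
PRIORITY = {"watcher": 0, "planner": 1, "reasoner": 2}


@dataclass(order=True)
class ResponseAdjustment:
    priority: int
    sequence: int                      # tie-breaker: earlier adjustments win
    observer: str = field(compare=False)
    text: str = field(compare=False)


class AdjustmentQueue:
    """Stores response adjustments and hands the winner to the resolver."""

    def __init__(self):
        self._heap = []
        self._counter = count()

    def put(self, observer, text):
        item = ResponseAdjustment(PRIORITY[observer], next(self._counter), observer, text)
        heapq.heappush(self._heap, item)

    def resolve(self):
        """Return the highest-priority adjustment, or None if the queue is empty."""
        if not self._heap:
            return None
        best = heapq.heappop(self._heap)
        self._heap.clear()  # assumption: unselected adjustments are dropped
        return best
```

  • If no observer produced an adjustment, `resolve` returns `None`, which corresponds to the case where the external LLM's response is used without modification.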
  • the response can be converted into a text message and provided to the text-to-speech module, wherein a corresponding audible and/or visual reply can be generated and presented to a user.
  • Applying the aforementioned methods and system can allow conversation with a user which accounts for a user’s sensitivities, health conditions, and personal preferences, and includes replies which both provide stimulation to a user as well as offer advice and actionable steps which can be taken by a user to achieve their personal and health goals.
  • the system can further be configured to identify and record social determinants of health, aspects of a user’s socio-economic condition which are correlated with health-related outcomes.
  • the system can be configured to identify Social Determinants of Learning (e.g., physical health, psychosocial health, physical environment, social environment, economic stability and self-motivation).
  • the system can be manually modified by a health professional so as to supply corrections to one or more generated replies and direct future replies to be generated in a manner which accommodates the corrections.
  • Example description regarding modification, or other control, of the system is included below with respect to mentor mode.
  • the empathetic dialogue generation system can also include a turn tracking module which identifies topics of conversation and prevents the topics from being reintroduced into a future reply until a sufficient amount of time has passed and/or a sufficient number of replies have been generated.
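  • A minimal sketch of the turn tracking idea follows. It assumes a topic becomes available again once either a reply-count threshold or a time threshold is met (the source says "and/or"); `TurnTracker` and its default thresholds are hypothetical.

```python
class TurnTracker:
    """Tracks when topics were last used so they are not reintroduced too soon."""

    def __init__(self, min_replies=5, min_seconds=600.0):
        self.min_replies = min_replies
        self.min_seconds = min_seconds
        self._last_seen = {}   # topic -> (reply index, timestamp in seconds)
        self._reply_index = 0

    def record_reply(self, topics, now):
        """Register one generated reply and the topics it mentioned."""
        self._reply_index += 1
        for topic in topics:
            self._last_seen[topic] = (self._reply_index, now)

    def may_reintroduce(self, topic, now):
        """True if the topic is new, or enough replies or enough time have passed."""
        if topic not in self._last_seen:
            return True
        index, timestamp = self._last_seen[topic]
        enough_replies = (self._reply_index - index) >= self.min_replies
        enough_time = (now - timestamp) >= self.min_seconds
        return enough_replies or enough_time
```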
  • the empathetic dialogue generation system can further be configured to monitor a user’s mood, identify when a user is experiencing a negative mood, and generate one or more replies and images intended to improve the user’s mood.
  • an avatar generation system can be implemented to receive or create images of persons, animals, or objects, which are then animated so as to imitate movement which matches audible replies.
  • the method includes accessing a message which forms part of a conversation history with an end-user, wherein the system maintains communications between the end-user and an outside large language model (LLM) being executed by an outside system, wherein the conversation history reflects messages being provided between the end-user and the outside LLM, and wherein the message was received from the end-user for routing to the outside LLM; determining an adjustment to the message based on a plurality of guardrail observers, wherein individual guardrail observers reflect individual LLMs which are executed by the system, wherein the guardrail observers adjust messages based on individual conversational characteristics, and wherein a first guardrail observer adjusts messages based on an individual conversational characteristic that identifies segues in which the first guardrail observer can identify a particular goal set by a domain expert associated with the end-user; obtaining a response from the outside LLM based on the adjusted message, wherein the response is output via an avatar presented on a
  • a method, and system or computer storage media implementing the method are described.
  • the method includes implementing a plurality of guardrail observers, each guardrail observer being associated with a locally executed large language model (LLM), wherein each LLM has a system prompt defining its role; receiving a portion of a conversation history between an external LLM and a user, wherein the guardrail observers generate respective output based on the portion; adjusting a message to one of the external LLM or the user based on the output.
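  • The method above, in which each guardrail observer pairs a locally executed LLM with a system prompt defining its role, might be organized as in the sketch below. The LLM is represented by a plain callable, and `keyword_stub` is a stand-in used only so the example runs; none of these names come from the source.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Hypothetical LLM interface: (system_prompt, conversation_text) -> output or None.
LLMFn = Callable[[str, str], Optional[str]]


@dataclass
class GuardrailObserver:
    name: str
    system_prompt: str  # defines the observer's role
    llm: LLMFn          # locally executed model, stubbed here

    def observe(self, history: List[str]) -> Optional[str]:
        return self.llm(self.system_prompt, "\n".join(history))


def run_observers(observers: List[GuardrailObserver], history: List[str]) -> Dict[str, str]:
    """Collect output from every observer that has something to contribute."""
    outputs = {}
    for observer in observers:
        result = observer.observe(history)
        if result:
            outputs[observer.name] = result
    return outputs


def keyword_stub(system_prompt: str, conversation: str) -> Optional[str]:
    """Stand-in for a real model call; reacts to a single keyword."""
    return "Suggest a short walk." if "tired" in conversation.lower() else None
```

  • The returned outputs would then be used to adjust a message bound for either the external LLM or the user.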
  • LLM locally executed large language model
  • a method, and system or computer storage media implementing the method are described.
  • the method includes rendering an avatar, wherein the avatar outputs messages received from a system; transmitting a message from a user of the user device to the system, wherein the system adjusts the message based on a plurality of guardrail observers, and wherein the adjusted message is transmitted to an external large language model (LLM); and updating the rendered avatar to output a response from the external LLM.
  • LLM large language model
  • Figure 1A is a block diagram of an example empathetic dialogue generation system in communication with a user device.
  • Figure 1B is a detailed block diagram of the example empathetic dialogue generation system.
  • Figure 1C is a block diagram illustrating an example architecture and functioning of guardrail observers included in the example empathetic dialogue generation system.
  • Figure 1D is a block diagram illustrating an example architecture and functioning of the empathetic dialogue generation system integrating with one or more health devices 152 and receiving and integrating health data from a health data store.
  • Figure 2 is a block diagram illustrating an example architecture and functioning of the avatar generation system.
  • Figures 3A-3B are block diagrams describing a method of generating empathetic conversations with a user.
  • Figure 4 is a block diagram illustrating an example implementation of a topic transition structure.
  • Figure 5 is a block diagram illustrating a method of employing the mentor mode.
  • Figure 6 is a block diagram illustrating an example embodiment of the method of monitoring and responding to a user’s level of engagement.
  • Figure 7A is a block diagram illustrating an example method of categorizing health-related information from user messages and presenting the information to mentors.
  • Figures 7B and 7C are example user interfaces that include summarizations of observed messages including information relating to health states and/or conditions as displayed within mentor mode.
  • Figure 7D is an example user interface describing the system performing topic transition structure analysis during the follow-up step.
  • This application describes techniques and methods to direct generation of empathetic dialogue from a large language model (“LLM”) to account for sensitive conversational topics, user goals, and broad contextual conversational backgrounds (e.g., multi-lingual functionality).
  • a system described herein e.g., the empathetic dialogue generation system 100
  • LMMs large multimodal models
  • health professionals e.g., other domain experts, such as education professionals, may leverage the disclosed technology.
  • an LLM is a type of generative machine learning (“ML”) technique designed to process and generate human-like text and which is trained based on vast amounts of textual data.
  • LLMs can utilize deep learning techniques, particularly variants of neural networks, to understand and produce language with a high degree of accuracy and contextuality.
  • an LLM may leverage attention-based layers, optionally using a mixture-of-experts architecture.
  • an LLM may be used as a conversational tool such that a user can engage in a simulated conversation with an LLM by sending messages (either text messages or spoken speech converted into text messages) to the LLM and receiving replies which account for context of messages and include words and phrases statistically likely to be appropriate and relevant.
  • LLM technology can be so robust as to enable directed prompts, where replies can be focused by providing instructions within prompts.
  • an LLM can be provided the following user prompt, “Analyze the following sentences and summarize their contents,” followed by a series of sentences to be analyzed.
  • the LLM can be effectively directed to analyze the provided texts.
  • Providing properly phrased instructions can allow tailoring of replies according to needs of a user.
  • LLMs function by being trained on large quantities of text and developing probabilistic models that can predict which sequences of words are likely to be associated in regular communication.
  • an LLM can iteratively select words to form a reply. Iterations may cease when the LLM determines a probability of a reply being complete is sufficiently high.
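  • The iterative word selection described above can be illustrated with a toy next-word table. This is a sketch under strong assumptions: real LLMs operate on tokens with learned neural networks, and the table `MODEL`, its probabilities, and the `stop_threshold` parameter are invented for illustration.

```python
import random

# Toy next-word probabilities; the values are made up for illustration.
MODEL = {
    "<start>": {"Today": 1.0},
    "Today": {"is": 1.0},
    "is": {"Monday": 1.0},
    "Monday": {"<end>": 0.9, "morning": 0.1},
    "morning": {"<end>": 1.0},
}


def generate(model, stop_threshold=0.8, max_words=20, seed=0):
    """Select words one at a time until the reply is likely to be complete."""
    rng = random.Random(seed)
    word, reply = "<start>", []
    for _ in range(max_words):
        options = model[word]
        # Stop once the probability that the reply is complete is high enough.
        if options.get("<end>", 0.0) >= stop_threshold:
            break
        candidates = {w: p for w, p in options.items() if w != "<end>"}
        if not candidates:
            break
        word = rng.choices(list(candidates), weights=list(candidates.values()))[0]
        reply.append(word)
    return " ".join(reply)
```

  • With the table above, generation stops after "Monday" because the probability of completion (0.9) exceeds the threshold.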
  • How the probabilistic models are designed and how they function depends on the architecture of the specific probabilistic model being applied.
  • Each word and symbol in a prompt reflects an added piece of probabilistic context which is considered by a model, but because replies are based on probabilities, generated replies may not include or reference all context included in a prompt.
  • prompts including a large number of words and symbols can include words or symbols which are not reflected in (or significantly influential on) generated replies.
  • a first prompt of “What day of the week is it?” can cause generation of a reply “Today is Monday”
  • a second prompt of “I am wondering about how safe waterskiing is and what time of the day is best suited for water sports” can cause generation of a reply “Between three o’clock and five o’clock is the best time for water sports due to the increased air temperature.”
  • the reply was highly specific and focused.
  • in the second prompt, multiple topics were mentioned (safety concerns about waterskiing and an optimal time of day to participate in waterskiing), causing a diluting effect for each individual term in the prompt, which manifested as the reply failing to include reference to safety levels of waterskiing.
  • LLMs can be further limited in their ability to draw upon context which is not immediately present within a prompt or a previous prompt.
  • Because LLMs function by applying probabilistic models, if context of previously communicated prompts is not included in a current prompt, the previously communicated context will likely not be included in a reply. While for questions and requests which are not significantly context sensitive there might not be an issue, for questions and requests which are context sensitive, including such context can be necessary to generate properly applicable replies.
  • Some LLMs can address this issue by including a record of previous prompts and replies and adding relevant context to new prompts (such as by adding instructions to additionally analyze the record), allowing the probabilistic models to account for previous context.
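  • The record-keeping approach described above can be sketched as a bounded history that is prepended to each new prompt. `ConversationMemory`, the turn limit, and the prompt layout are assumptions; the sketch also shows how older context drops out once the limit is exceeded.

```python
from collections import deque


class ConversationMemory:
    """Keeps the most recent prompt/reply pairs and adds them to new prompts."""

    def __init__(self, max_turns=10):
        # Older turns are discarded automatically once max_turns is reached.
        self._turns = deque(maxlen=max_turns)

    def add(self, prompt, reply):
        self._turns.append((prompt, reply))

    def build_prompt(self, new_prompt):
        lines = []
        if self._turns:
            lines.append("Consider the earlier conversation when answering.")
            for prompt, reply in self._turns:
                lines.append(f"User: {prompt}")
                lines.append(f"Assistant: {reply}")
        lines.append(f"User: {new_prompt}")
        return "\n".join(lines)
```

  • Because the record is bounded, context that falls outside the retained turns no longer reaches the model, which is one way the functional "forgetting" discussed in this section can arise.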
  • Because the LLM is not actually “remembering” the user’s birthday and instead is altering future prompts to account for context gathered from prompts and replies, if a few days later the LLM receives a prompt of “When is my birthday?”, the model may fail to generate an appropriate response. Furthermore, if, after receiving the initial birthday prompt and generating the reply, other prompts are received in which the term “birthday” is mentioned a sufficient number of times in different contexts (perhaps with regard to other persons’ birthdays), the LLM can functionally “forget” when the user’s birthday is and generate replies which assume the only relevant birthday belongs to a friend or other mentioned acquaintance.
  • LLMs have limited ability to functionally remember relevant aspects of a user. This limitation can be especially relevant if a user wishes to engage with an LLM in a manner which facilitates reminders and suggestions of desired activities or other goals.
  • Just as a user can set an alarm on a clock as a reminder to engage in an action, the user may desire conversation with an LLM to periodically reference broader goals external to context immediately expressed in recent prompts.
  • a user who is diabetic may desire for an LLM to only recommend food suggestions which are low in sugar, or may desire periodic reminders to check blood sugar levels, potentially ignoring or deemphasizing context of a user’s prompt in order to appeal to such goals.
  • current LLM technology is limited in its ability to track or implement such goals due to limitations in preserving and accessing goals and requests in a manner which can effectively influence prompt generation.
  • LLMs can additionally have difficulty generating replies which account for sensitivity or empathy.
  • Because replies generated by LLMs are dependent upon probabilistic models, if training data used to generate an LLM includes harsh, vulgar, offensive, or other phrases likely to cause aggravation if seen or heard by a user, then replies generated for a user are likely to include such offensive material.
  • This issue can be addressed by “flagging” (e.g., marking) certain words or phrases as offensive and inappropriate, but tone itself can be seen as jarring or aggravating even if explicit words are not obviously so. For example, a user can create a prompt of, “I am feeling sick today.
  • LLMs can be limited by context included in prompts and training data and unable to generate replies which fully integrate all potentially relevant context.
  • an LLM can be trained on training data which does not include any facts relevant to farming or life in farming communities and similarly the LLM (once trained) can receive prompts which do not include significant words or phrases related to farming or life in farming communities.
  • a user who sends the LLM a prompt “I live on a farm, what should I do with my free time?” can receive a reply “Free time is a great opportunity to develop your skills and interests” or other similar replies which fail to account for context of a person living in a farming community and who engages in regular farming. Even if a user supplies prompts which include relevant context which potentially could be used to generate appropriate replies, such prompts are unlikely to include context in a sufficient quantity to allow generation of replies which include an optimal amount of context.
  • the disclosed technology includes one or more externally implemented LLMs to be supported by guardrail observers which, in some embodiments, are internal to the system (e.g., the empathetic dialogue generation system 100).
  • Each of the guardrail observers may be implemented, at least in part, by an LLM which may be executed by the system.
  • the system may use the guardrail observers to effectively generate empathetic, relevant, and appropriate dialogue which is tailored to a user’s individual needs and goals. While reference herein is made to three guardrail observers being implemented by three LLMs, in some embodiments there may be 1, 2, 4, 5 LLMs and so on. For example, one LLM may perform the functionality of two or more guardrail observers.
  • the system can direct, or otherwise control, generation of LLM responses from an externally located LLM system and supplement, or otherwise change, the LLM responses with adjustments generated from the guardrail observers.
  • Application of the system can cause generation of replies to user messages which account for harmful and offensive content, user goals, and broader situational context.
  • the system in some embodiments, is guided to achieve these functions based on a topic transition structure which includes a set of prompts to guide generation of LLM replies and adjustments.
  • the topic transition structure may enable the system to transition between initial chitchat to more empathetic discussion with a user which may guide the user towards specific goals (e.g., goals set by a domain expert, such as a health professional).
  • Example description of the topic transition structure is included below, with respect to Figure 4.
  • the system can monitor a conversation (e.g., the user messages) to generate a theory of mind of the user, enabling a modeling of the user’s mood and emotional state.
  • the system can generate, or cause generation of, an animated avatar which visually mimics an audible reply.
  • a message from a user can be an audible question, “I’m feeling tired, what should I do to feel better?” and an audible reply can be “Taking a nap can be a great way to feel more refreshed.”
  • the avatar can be rendered to mimic movement which emulates speaking the reply.
  • Operation of the system may rely on a foundation of LLM operation and direction. By applying specific prompts to an external LLM, either by supplementing user prompts provided by the user to the system or as prompts provided discretely to the system, the system can direct the external LLM to perform analytical operations which achieve complex functionality.
  • the analysis performed by the system can include determining a user’s mental / emotional state, determining if a topic transition is appropriate, determining additional context for a user message by searching a database, identifying sensitive topics mentioned in messages or LLM responses, identifying segues to introduce user goals, among other operations.
  • FIG. 1A is a block diagram illustrating an example empathetic dialogue generation system 100 (“system”) generating empathetic dialogue based on use of an external LLM system 120 to generate replies 104.
  • the empathetic dialogue generation system 100 can generate, or otherwise cause rendering of, an avatar 106 to mimic recitation of the generated replies 104.
  • the empathetic dialogue generation system 100 may receive a message 118 from a user 108 of a user device 102 (e.g., a mobile device, a wearable device, a computer, and so on).
  • the message 118 can form part of a conversation between the user 108 and the system 100, and may include a question, an answer, a comment, a statement, or any combination thereof.
  • the user device 102 can provide the message 118 to the empathetic dialogue generation system 100 for analysis by the system 100.
  • the system 100 may generate the user prompt 112 for transmission to the LLM system 120.
  • the system 100 may adjust the received message 118.
  • the system 100 may reformat the message 118 to include removal of spelling errors and grammar errors.
  • a message can be “What tim of da is it?”, wherein the letters “tim” are meant to be “time” and the letters “da” are meant to be “day.”
  • pre-processing can include changing the message 118 to be “What time of the day is it?”
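  • One simple way to approximate the spelling pre-processing described above is fuzzy matching against a vocabulary, sketched here with Python's standard `difflib`. The vocabulary, cutoff value, and function name are assumptions; this sketch only corrects individual words and does not insert missing words such as "the".

```python
import difflib
import re

# Hypothetical vocabulary; a real system would use a far larger word list.
VOCABULARY = ["what", "time", "of", "day", "is", "it"]


def correct_spelling(message, vocabulary=VOCABULARY, cutoff=0.75):
    """Replace each unknown word with its closest vocabulary match, if any."""

    def fix(match):
        word = match.group(0)
        lower = word.lower()
        if lower in vocabulary:
            return word
        candidates = difflib.get_close_matches(lower, vocabulary, n=1, cutoff=cutoff)
        if not candidates:
            return word  # leave unknown words untouched
        fixed = candidates[0]
        # Preserve an initial capital letter from the original word.
        return fixed.capitalize() if word[0].isupper() else fixed

    # Only alphabetic runs are examined; punctuation and spacing are kept as-is.
    return re.sub(r"[A-Za-z]+", fix, message)
```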
  • generating the user prompt 112 can include receiving an audible message and generating a corresponding text message with a text-to-speech module 138 (described in further detail in relation to Figure 1B).
  • Generation of a user prompt 112 can further include adding one or more words or phrases to text of the message 118 to direct the outside LLM system 120 to analyze the message 118 in an intended manner.
  • the message 118 can have appended the phrase, “Analyze this message and determine the user’s mental state.”
  • the message 118 can have appended the phrase, “Analyze this and the previous ten messages to determine if the user would be receptive to suggestions concerning health-related goals.”
  • the phrases appended to the message 118 can be referred to herein as “system prompts.”
  • system prompts can be received discretely.
  • a system prompt may be “What time is it?”
  • system prompts can be received discretely and can include references to a preceding message or a following user prompt 112.
  • a system prompt can be “Analyze the following message for indications that the user is in an aggravated emotional state.”
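  • The appending of system prompts to a message can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the `UserPrompt` class and `build_user_prompt` function are hypothetical names, and joining the parts with a single space is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class UserPrompt:
    """Hypothetical container for the text sent to the external LLM system."""
    message: str
    appended: list[str] = field(default_factory=list)

    def render(self) -> str:
        # Appended system prompts follow the (possibly corrected) message text.
        return " ".join([self.message, *self.appended])


def build_user_prompt(message: str, system_prompts: list[str]) -> UserPrompt:
    # Strip stray whitespace from the message; copy the list so later
    # changes by the caller do not alter the prompt.
    return UserPrompt(message=message.strip(), appended=list(system_prompts))
```

  • A discrete system prompt would instead be sent as its own request; only the appended case is shown here.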
  • the message 118 can be provided to guardrail observers (e.g., guardrail observers 122 of Figure 1B) to cause generation of adjustments which can be used to replace or modify the messages. These adjustments may be included in the user prompt 112.
  • the LLM system 120 can be located external to the empathetic dialogue generation system 100 and can be interacted with by sending user prompts 112 to the LLM system 120.
  • the LLM system 120 can be any commercially available, or open source, large language model. For example, some available LLMs include ChatGPT™, Claude™, Grok™, and so on. In some embodiments, the LLM system 120 may be interacted with using an application programming interface (API), network endpoints, and so on.
  • the user prompt 112 can be received by the LLM system 120 and an LLM response 114 can be correspondingly generated. Operation of the LLM system 120 is dependent upon which LLM is being used, but generally, the LLM response 114 will include a text message including words which are selected to appropriately respond or complement the received user prompt 112.
  • the LLM response 114 can be received by the empathetic dialogue generation system 100, and the system 100 may adjust the LLM response 114 prior to transmission to, or presentation via, the user device 102. For example, and as described below, the system 100 may adjust the LLM response 114 using guardrail observers (e.g., guardrail observers 122 as illustrated in Figure 1B).
  • the empathetic dialogue generation system 100 can additionally interact with one or more alternative LLM systems. These alternative LLMs can similarly receive the user prompt 112 and generate respective LLM responses. Thus, the system 100 can obtain multiple responses from different LLMs. The system 100 may select from among the responses. For example, the system may prefer a response which is received first, which response is shorter or longer, a priority assigned to an LLM, which LLM best responds to the user prompt 112, and so on. With respect to the best response, the system 100 may utilize the guardrail observers (e.g., the guardrail observers 122). For example, the guardrail observers, such as the watcher, may indicate that certain of the responses include harmful content or may upset the user.
  • the guardrail observers 122 may have adjusted the message 118 to include text indicating the LLM systems are to segue towards a goal.
  • the guardrail observers 122 may identify that certain of the responses do not reference the goal or a segue towards the goal.
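  • One way to picture selection among responses from several LLMs is the sketch below. It is an assumption-laden illustration: the `Candidate` fields, the rule that a lower `priority` value is preferred, and the tie-break on shorter text are all hypothetical, and the two boolean flags stand in for verdicts that the guardrail observers would supply.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    llm_name: str
    text: str
    priority: int          # lower value = preferred LLM (assumed convention)
    flagged_harmful: bool  # stand-in for the watcher's verdict
    mentions_goal: bool    # stand-in for the goal/segue check


def select_response(candidates: list[Candidate],
                    goal_required: bool = False) -> Optional[Candidate]:
    # Drop responses flagged as harmful or likely to upset the user.
    usable = [c for c in candidates if not c.flagged_harmful]
    # If a segue toward a goal was requested, prefer responses that include it,
    # but fall back to the remaining responses when none do.
    if goal_required:
        with_goal = [c for c in usable if c.mentions_goal]
        usable = with_goal or usable
    # Among the remainder, pick by LLM priority, then by shorter text.
    return min(usable, key=lambda c: (c.priority, len(c.text)), default=None)
```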
  • the LLM response 114 may be adjusted by the system 100 to form the adjusted response 116 via implementation of guardrail observers.
  • individual guardrail observers may be implemented, in part, by individual LLMs executed by the system 100 or otherwise accessible to the system 100.
  • the LLMs may represent open source or otherwise publicly accessible language models, such as Llama.
  • the LLMs may be pre-trained.
  • the system 100 may refine the weights of the LLMs through additional training. For example, the system 100 may provide additional input and expected responses for refining of the LLMs.
  • the system 100 may compute forward passes through the LLMs or may instruct an outside system (e.g., a cloud system) to execute the LLMs.
  • the system may thus optionally adjust the LLM response 114 with content from one or more of the guardrail observers.
  • the system may also route the LLM response 114 without adjustment to the user device 102. Additional description regarding guardrail observers is included below with respect to Figure 1B.
  • the system may provide the adjusted response 116 to the user 108 for presentation via the user device.
  • a reply 104 to the user’s 108 message may be presented as a visual text message or as an audible message produced by passing the adjusted response 116.
  • the large language model system 120 may implement a large multimodal model (LMM).
  • the guardrail observers may implement LLMs.
  • the system 120 may output an image, audio, video, and so on in response to the user’s message 118.
  • the user’s message 118 may include an image, audio, video, and so on.
  • the system may output an infographic (e.g., infographic 160 of Figure 1B) which is relevant to the user’s 108 message or conversation.
  • the infographic 160 may graphically describe techniques to reduce calorie consumption.
  • the LLM system 120 may generate the infographic 160 using generative AI techniques.
  • the infographic 160 may be provided by a health professional, or other domain expert, associated with the system 100.
  • the health professional may engage in a conversation with the system 100 (e.g., in mentor mode, described below) and describe the infographic 160 and when it should be used.
  • the infographic 160 can be provided by a teacher, a counselor, a family member, or any other domain expert (e.g., a person with knowledge in a domain of interest).
  • a domain of interest can be the field of health, mental health, nutrition, exercise science, pedagogy, amongst others.
  • the system 100 may cause the infographic 160 to be included in the response, for example via the guardrail observers described herein.
  • An infographic may provide suggestions, and in some embodiments, not include advice.
  • an infographic may state or explain facts about certain benefits of exercise but may not prescribe an exercise regimen.
  • An infographic can state “Exercise can be a good way to manage stress” but an infographic may avoid stating, “You should exercise at least 30 minutes every day.” In this way, the system 100 may provide helpful advice for the user 108.
  • the adjusted response 116 may represent a safety message or pre-determined reply provided to the user 108.
  • a safety message can be provided when generation of an LLM response 114 from the LLM system 120 is taking longer than a threshold amount of time. The safety message may also be provided if the system 100, such as via the guardrail observers, is taking longer than a threshold amount of time to review the LLM response 114 or the system 100 is delayed or interrupted for another reason.
  • An example safety message can be “I hear you” or “I see.”
  • Safety messages can be included within one or more databases within the empathetic dialogue generation system 100. These safety messages may therefore further convey a realistic conversation between the system 100 and user 108, for example by masking network or other technological delay.
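  • The threshold behavior for safety messages can be sketched with a timeout around the call that produces a reply. The function names, the thread-based approach, and the random choice among stored messages are assumptions made for illustration; `slow_llm_call` is a hypothetical stand-in for a delayed external LLM call.

```python
import concurrent.futures
import random
import time
from typing import Callable

SAFETY_MESSAGES = ["I hear you", "I see"]  # example messages from the text


def reply_or_safety_message(generate: Callable[[], str], timeout_s: float) -> str:
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(generate)
    try:
        # Return the real reply if it arrives within the threshold.
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        # Otherwise mask the delay with a pre-determined safety message.
        return random.choice(SAFETY_MESSAGES)
    finally:
        # Do not block on the slow call; it finishes in the background.
        pool.shutdown(wait=False)


def slow_llm_call() -> str:
    # Hypothetical stand-in for a delayed external LLM call.
    time.sleep(0.3)
    return "Here is a longer answer."
```

  • In this sketch the late reply is simply discarded; a fuller design might deliver it after the safety message.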
  • Figure 1B is a detailed block diagram of the empathetic dialogue generation system 100.
  • the empathetic dialogue generation system 100 can include a dialogue generator 110, a text-to-speech module 138, guardrail observers 122, and a set of databases including dialogue history 130, world context 132, user model 134, and configuration 136 data.
  • a user e.g., a patient or someone under advisement from a health professional or other domain expert
  • the user may communicate via audibly speaking words into the user device 102 (e.g., a microphone of the device 102).
  • the user may also provide text via interacting with an analog or digital keyboard of the user device 102.
  • An audible message received by the user device 102 can be provided to the text-to-speech module 138 of the system 100, or processed locally on the user device 102, such that corresponding text can be generated.
  • the system may therefore receive a message 118 from the user device 102 (e.g., either received from a user 108 or generated from the text-to-speech module 138).
  • a dialogue generator 110 of the system 100 can convert the message 118 into the user prompt 112 (e.g., described in Figure 1A) which can be provided to the LLM system 120.
  • the dialogue generator 110 can generate system prompts to instruct the LLM system 120 and/or one or more guardrail observers 122.
  • the system prompts can cause adjustment of the user prompt 112, for example adjusting the received message 118 to add the system prompts.
  • the system 100 may analyze information included within the databases to determine the system prompts. For example, the system 100 may add text in a system prompt related to a particular goal to which the outside LLM system 120 is to segue.
  • the dialogue generator 110 may control the transmission of information between the LLM system 120 and the user device 102.
  • the dialogue generator 110 may adjust a message (e.g., an LLM response 114) from the LLM system prior to being transmitted to the user device 102.
  • harmful content may be removed from the message.
  • the dialogue generator 110 may adjust the message and cause a new message (e.g., a new LLM response 114) from the LLM system 120 to be generated.
  • the system may update a system prompt being provided to the LLM system 120 (e.g., the updated system prompt may instruct the LLM system 120 to remove harmful content, such as a discussion of a particular scenario, to update or add language regarding a particular goal, and so on).
  • the dialogue generator 110 may similarly adjust a message from the user device 102 (e.g., message 118) prior to transmission to the LLM system 120.
  • the generator 110 may add system prompts.
  • the generator 110 may leverage input from the guardrail observers 122 to ensure that conversations satisfy the empathetic constraints described herein.
  • the dialogue history 130 may include records or information reflecting received messages and generated replies.
  • the history 130 may include a conversation history with a user 108 of the user device 102.
  • Dialogue history 130 can additionally include summaries of conversation histories, for example topics of discussion, times of discussion, and so on. Additional description regarding the dialogue history is included below.
  • World context 132 may include information related to a number of relevant topics like one or more of weather, date, time, geolocation descriptions of businesses, utilities, and recreational activities, recent world events, or any other ancillary topic which may be relevant to the user. This information may optionally be personalized to the user, such as to the user’s location, interests, family members, friends, and so on.
  • world context 132 can be linked to, or otherwise include information from, one or more externally located sources.
  • world history can be linked to Wikipedia™.
  • the system 100 may make network calls to outside systems, such as to search for information relevant to personalizing conversations with the user. For example, the system may search for recent events related to the conversation or may search for upcoming weather related to a location of the user.
  • the system 100 may leverage the world context 132 to provide accurate and engaging conversation to the user.
  • User model 134 may include information describing the user’s physical and mental health conditions. These may be provided, for example, by one or more health professionals.
  • the user model 134 may include particular goals, such as goals a health professional wants to see furthered for the user.
  • the user model 134 may also include preferences of the user.
  • information describing the user’s physical condition can include a description of how the user’s mobility is limited to a wheelchair.
  • information describing the user’s mental conditions can include a description of the user’s diagnosis of dementia.
  • the system 100 may require that the person be added to a list of credentialed personnel or otherwise authorized. While health conditions are described above, in some embodiments other information may be included.
  • the user model 134 may reflect educational goals associated with a student, mentee, and so on.
  • a goal such as one specified or indicated by a health professional, can be a health-related achievement or a personal achievement.
  • a health-related achievement can be to reduce blood pressure.
  • a personal achievement can be to read for thirty minutes each day.
  • a goal can be received by the system by being explicitly stated by the user.
  • the message 118 from the user can include, “It is a goal of mine to read thirty minutes every day.”
  • the goal e.g., a summary thereof
  • the system 100 e.g., via a guardrail observer
  • a goal can be provided by a health professional interacting with the system 100.
  • a nurse can say or enter text into the system 100 which states, “The user has a goal of reducing their blood pressure.”
  • the third party must receive authorization by the user or a person who established the user’s 108 profile.
  • Receiving authorization can include having a third party’s credentials added to a list of persons allowed to directly generate and modify goals.
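  • The authorization check for goals entered by a third party can be sketched as a lookup against a list of permitted persons. The `UserModel` fields and the `add_goal` function are hypothetical, and identifying persons by a string ID is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class UserModel:
    goals: list[str] = field(default_factory=list)
    # IDs of third parties allowed to generate and modify goals (assumed format).
    authorized_goal_editors: set[str] = field(default_factory=set)


def add_goal(model: UserModel, author_id: str, goal: str, user_id: str) -> bool:
    # The user may always state their own goals; third parties must be listed.
    if author_id != user_id and author_id not in model.authorized_goal_editors:
        return False
    model.goals.append(goal)
    return True
```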
  • the user model 134 can additionally include follow-up topics, health data, and user preferences.
  • follow-up topics may include conversational points which can be mentioned when the user initially engages with the system 100 after a break from a prior conversation.
  • the system 100 detects a conversation is ending (described more in relation to the topic transitional structure of Figure 4) one or more follow-up topics can be generated and held in the user model 134.
  • the system 100 may access a topic in a subsequent conversation, and may cause the LLM system 120 (e.g., for inclusion in a system prompt) to mention the accessed topic in the subsequent conversation.
  • Health data can include descriptions of the user’s past and present mental and physical condition (a topic which will be discussed in further detail in relation to Figure 1D).
  • Preferences within the user model 134 can describe the user’s affinity for certain types of food, music, movies, activities, places, and so on.
  • the user model 134 may additionally indicate topics the user is not interested in talking about.
  • the user model 134 may additionally indicate content which is considered harmful for the user to discuss, such as politics, particular family matters, and so on.
  • Configurations 136 may include a user profile reflecting a set of parameters and settings which personalize an experience the user has with the system 100.
  • the configurations 136 can include the user’s name, age, sex, gender, demographic information, language settings, home address, list of persons allowed to engage in mentor mode with the user’s profile, among other options.
  • the user profile can be generated by the user or a third person on the user’s behalf.
  • the user may interact with an application executing on their device 102 (e.g., the user may establish a username, password, and so on) or the user may interact with a web application associated with the system 100.
  • the dialogue generator 110 directs the LLM system 120 (e.g., by modifying a user prompt 112 with a system prompt or sending a discrete system prompt to the LLM 120) to consider contextual information included in the databases 130-136, the LLM system 120 can generate an LLM response 114 which includes, or is otherwise based on, the contextual information.
  • the dialogue generator 110 can generate system prompts including both the information to be considered and instructions to consider the information when generating the LLM response 114.
  • the dialogue generator 110 can receive a message 118 from the user 108 of, “What should I do today?” and can modify a corresponding user prompt 112 by adding a system prompt of, “Considering my arthritis” to the user prompt 112, with the condition of arthritis being derived from the user model database 134.
  • a final prompt sent to the LLM system 120 will be “Considering my arthritis, what should I do today?” or “When responding to the following question, please assume the person receiving the answer has arthritis: what should I do today?”
  • the LLM system 120 can generate an appropriate LLM response 114 or an adjustment which includes or comments on context relevant to the user prompt.
  • the dialogue generator 110 can generate system prompts which direct the LLM system 120 to consider any health-related information, or other information, included in the databases 130-136 based on adding system prompts.
  • a system prompt can be, “When considering the following prompt from a user, consider these health conditions: arthritis, high blood pressure, and diabetes.”
  • the health conditions of arthritis, high blood pressure, and diabetes are included in the user model 134 in a format which the dialogue generator 110 can identify and (through application of internal logic) add to system prompts.
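  • The construction of a health-aware system prompt from the user model can be sketched as below. The function names are hypothetical, and storing conditions as a list of strings is an assumption about the format the dialogue generator 110 can identify.

```python
def health_context_prompt(conditions: list[str]) -> str:
    """Build a system prompt listing the user's health conditions."""
    if not conditions:
        return ""
    if len(conditions) == 1:
        listed = conditions[0]
    elif len(conditions) == 2:
        listed = f"{conditions[0]} and {conditions[1]}"
    else:
        # Three or more items: comma-separated with a final "and".
        listed = ", ".join(conditions[:-1]) + ", and " + conditions[-1]
    return ("When considering the following prompt from a user, "
            f"consider these health conditions: {listed}.")


def final_prompt(message: str, conditions: list[str]) -> str:
    # Place the system prompt ahead of the user's message.
    system_prompt = health_context_prompt(conditions)
    return f"{system_prompt} {message}".strip()
```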
  • system prompts generated by the dialogue generator 110 can include all information included within the databases and instructions to consider the information when responding to the user’s prompt 112.
  • system prompts generated by the dialogue generator 110 can include at least a portion of information included within the databases and instructions to consider the information when responding to the user’s prompt.
  • multiple system prompts can be generated for a single user prompt 112, with each system prompt including at least a portion of information from a database and instructions to consider the information when responding to the user’s prompt 112.
  • the system prompts can include instructions providing context for the LLM system 120 concerning how many system prompts will be used and how the LLM system 120 should appropriately provide the LLM response 114.
  • a first system prompt can be “There will be three prompts to follow, each including a portion of a database you are to consider, with the last prompt also including a statement or question from the user. Consider all prompts and provide a single response.”
  • each statement in the three following system prompts will contain a portion of the information contained in the databases.
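  • Splitting database information across several system prompts, preceded by a prompt announcing how many will follow, can be sketched as below. The function name, the "Database portion:" and "User message:" labels, and the even contiguous split are assumptions for illustration only.

```python
def chunked_system_prompts(facts: list[str], user_message: str,
                           n_chunks: int) -> list[str]:
    # Never create more chunks than there are facts, and always at least one.
    n_chunks = max(1, min(n_chunks, len(facts) or 1))
    # Split the facts into n_chunks nearly equal, contiguous groups.
    size, extra = divmod(len(facts), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < extra else 0)
        chunks.append(facts[start:end])
        start = end
    header = (f"There will be {n_chunks} prompts to follow, each including a "
              "portion of a database you are to consider, with the last prompt "
              "also including a statement or question from the user. "
              "Consider all prompts and provide a single response.")
    prompts = [header]
    for i, chunk in enumerate(chunks):
        body = "Database portion: " + "; ".join(chunk)
        if i == n_chunks - 1:
            # The final prompt also carries the user's statement or question.
            body += f" User message: {user_message}"
        prompts.append(body)
    return prompts
```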
  • the system prompts may be generated based on other information.
  • the system 100 may generate a system prompt that instructs the LLM system 120 to stop using a particular word or to generate text in a particular way.
  • the system 100 may indicate that the LLM system 120 should use a softer tone, a more playful tone, a tone appropriate to an age of the user, and so on.
  • the system 100 may leverage guardrail observers to effectuate the empathetic conversations with users. These will be described in more detail below with respect to the guardrail observers 122 of Figure 1C.
  • the guardrail observers may include three modules or engines with internal logic for generating system prompts and communicating with associated LLMs included within the empathetic dialogue generation system 100. For example, there may be three LLMs executed by the system 100.
  • each of the guardrail observers may control its LLM, at least in part, through system prompts.
  • a particular guardrail observer (referred to as the watcher) may identify harmful content.
  • the particular guardrail observer may provide a system prompt to its LLM informing the LLM that it is to watch out for, or otherwise flag, particular harmful content.
  • the system prompt may include examples of the harmful content, specific scenarios to watch out for, and so on.
  • each module can, in some embodiments, be configured to generate system prompts including information from the databases and instructions for the LLMs associated with the guardrail observers 122 to perform specific analysis of the information.
  • Guardrail observers 122 can be configured to detect harmful content within user prompts 112, detect segues to direct a user 108 towards one or more of the user’s 108 goals, and identify broader context which can be applied to, or can replace, an LLM response 114.
  • an adjusted response 116 may be provided to the user device 102.
  • the adjusted response 116 may reflect an LLM response 114 (e.g., from the LLM system 120) which has incorporated, or been replaced by, an adjustment (e.g., from the guardrail observers 122).
  • the system 100 may have instructed the LLM system 120 via a system prompt.
  • the system 100 may have adjusted the output of the LLM system 120 (e.g., using the guardrail observers, which as described herein may be in communication with the dialogue generator 110).
  • the system may have removed certain text (e.g., remove harmful content), replaced certain words (e.g., use different language), and so on.
  • the system 100 may also provide the adjusted response 116 which represents an unadjusted version of the response 114 from the LLM system 120.
  • the adjusted response 116 can be sent to the text-to-speech module 138 (e.g., an engine) to be converted into an audio signal where it will be delivered to an audio speaker to cause generation of an audible reply 104. If the adjusted response 116 is not sent to the text-to-speech module 138, the adjusted response 116 can be sent to and displayed on a screen of the user device 102. For example, the avatar 106 of Figure 1A may be rendered to output the audio. The adjusted response 116 can also be sent to the dialogue history database to be catalogued. In some embodiments, the user device 102 may receive text and output corresponding audio. The user device 102 may also render the avatar.
  • the LLM system 120 can be a Large Multimodal Model (“LMM”), a machine learning model which functions similarly to an LLM, but which can handle different types of data.
  • an LMM can receive a text description 208 and generate a corresponding image.
  • the system 100 can additionally send a system prompt instructing the LMM to generate or retrieve an image which complements a corresponding response from the LMM (equivalent to an LLM response 114).
  • an adjusted response 116 may be “Be sure to eat at least three servings of vegetables every day.”
  • the system 100 may generate a system prompt and send the system prompt to the LMM to generate an image corresponding to the adjusted response 116.
  • the LMM can then generate an image, and the system can cause the adjusted response 116 to be displayed alongside the image.
  • the image may include an infographic 160.
  • the infographic 160 may be provided by a health professional or other domain expert. For example, using the mentor mode described herein the health professional may upload, or otherwise provide, the infographic 160 to the system 100.
  • the health professional may additionally describe when the infographic 160 is to be used. For example, with respect to the example of eating vegetables, the health professional may indicate that when discussing healthy eating, which may be a goal of the user, the system 100 may present the infographic 160.
  • the system 100 may include the infographic 160 in a system prompt to the LLM system 120.
  • the system 100 may additionally include a system prompt that textually describes the infographic 160 or its relevance, and the LLM system 120 may generate text that informs the user of the infographic 160.
  • the system 100 may additionally add the infographic 160 to the output of the LLM system 120 (e.g., the LLM response 114) to form the adjusted response 116.
  • the adjusted response 116 may include text generated by the system 100 which is related to the infographic 160.
  • the text may be generated by the system 100, such as via one of its internal LLMs described herein, based on the description by the health professional.
  • FIG. 1C is a block diagram illustrating an example architecture and functioning of the guardrail observers 122.
  • the dialogue generator 110 receiving a user prompt 112 (e.g., from the user) and/or an LLM response 114 (e.g., from the external LLM system 120), the user prompt 112 and/or LLM response 114 can be provided to the guardrail observers 122 for generation of adjustments.
  • the guardrail observers 122 may include, in some embodiments, at least three large language models which may be used in part to generate adjustments.
  • the large language models include a watcher 124, a planner 126, and a reasoner 128.
  • guardrail observers 122 are configured with computer-executable instructions to provide the internally located LLMs modified user prompts 112, modified LLM responses 114, or system prompts (e.g., generated by the observers 122) to direct the LLMs to generate adjustments related to specific subject matter or objectives.
  • the guardrail observers 122 may reflect engines which leverage respective LLMs.
  • each LLM may receive a system prompt that identifies its purpose and explains how it is to accomplish the purpose.
  • the LLM may be a pretrained LLM which receives the system prompt.
  • the LLM may be further refined, or trained, based on examples (e.g., training examples of input and expected or intended output may be provided). Output of the LLM may be used, as one example, to generate a system prompt for inclusion in the user prompt to the LLM system 120.
  • the watcher 124 may include an engine which leverages an LLM that identifies harmful content in a message 118 or LLM response 114.
  • harmful content can be phrases or words related to topics that if mentioned to a user are likely to cause the user anxiety, sadness, anger, or any other negative emotional state. Harmful content can include explicitly or implicitly political, derogatory, inflammatory, or emotionally stimulating content.
  • a message 118 can be “What time is the presidential debate tonight?”
  • the harmful content can be the reference to the presidential debate, an inherently political topic - although the system 100 can generate replies which affirm or support the user’s political opinions, engaging in such conversation may still cause the user anxiety and thus, in some embodiments or preferences of the user or domain expert, may be avoided.
  • the watcher 124 may generate information for review by the system 100 (e.g., via the resolver 148 described below) which indicates no response should be directly given.
  • the watcher’s 124 LLM may identify the harmful content and recommend a switch to a different topic. This different topic may be included in a system prompt to the LLM system 120.
  • the user prompt 112 may be generated to include the message 118 and the system prompt.
  • the watcher’s 124 LLM may also note that no direct response should be given to the question.
  • the watcher 124 may identify a topic change with a factual answer.
  • an LLM response 114 can be “The presidential debate is tonight at 5:30pm, how are things going with [Topic A].”
  • a message 118 can be “I’m going to visit my useless son today, what is the best route I should take?”.
  • the phrase “my useless son” may correspond to feelings of resentment between the user and their son and can be identified by the watcher 124 (e.g., based on the databases described herein, for example the watcher 124 may cause its LLM to monitor for specific harmful content).
  • the watcher’s 124 LLM may analyze this message from the user and identify it as including harmful content. Similar to the above, the watcher 124 may indicate a topic switch in the user prompt 112.
  • the watcher 124 may adjust an LLM response 114 to form the adjusted response 116.
  • the watcher 124 may identify harmful content in the LLM response 114 and cause its removal or adjustment.
  • the watcher’s 124 LLM may remove harmful content after identifying it.
  • the watcher’s 124 LLM may adjust the LLM response 114 itself.
  • the watcher’s 124 LLM may generate a system prompt and cause an updated user prompt 112 that includes the system prompt to be provided to the LLM system 120.
  • the system 120 may generate new output which the watcher 124 may review.
  • the instructions to generate prompts may include a set of pre-defined system prompts which can modify or replace user prompts 112 or LLM responses 114 depending upon context.
  • System prompts from the watcher 124 concern commanding the LLM associated with the watcher 124 to identify harmful topics.
  • a system prompt for the watcher 124 can be “Identify harmful and controversial topics in the conversation with the user.”
  • a system prompt for the watcher 124 can be “Generate a response to the following message 118 but avoid any political or inflammatory language in the message 118.”
  • the watcher 124 may therefore leverage information specific to the user, such as via access to the databases. For example, the watcher 124 may learn topics which tend to cause issues or distress with the user 108. In this example, the watcher 124 may analyze conversation histories to detect topics which inflame, or are otherwise associated with negative outcomes with, the user 108. The watcher 124 may use this information with its LLM, such as via including portions of it in system prompt(s) to the LLM.
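  • The watcher's flow (identify harmful content, then recommend a topic switch through a system prompt) can be sketched as below. Keyword matching is used here purely as a stand-in for the watcher's LLM, which the text describes as doing the actual identification; the `WatcherVerdict` type, the `watch` function, and the wording of the generated prompt are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class WatcherVerdict:
    harmful: bool
    matched_topics: list[str]
    system_prompt: str  # empty when nothing was flagged


def watch(message: str, harmful_topics: dict[str, list[str]],
          safe_topic: str) -> WatcherVerdict:
    text = message.lower()
    # A topic matches when any of its terms appears as a whole phrase.
    matched = [
        topic for topic, terms in harmful_topics.items()
        if any(re.search(rf"\b{re.escape(t.lower())}\b", text) for t in terms)
    ]
    if not matched:
        return WatcherVerdict(False, [], "")
    prompt = (f"Do not discuss these topics: {', '.join(matched)}. "
              f"Steer the conversation toward {safe_topic}.")
    return WatcherVerdict(True, matched, prompt)
```

  • The per-user topic list passed as `harmful_topics` corresponds to the user-specific information the watcher 124 may draw from the databases.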
  • the planner 126 may include an engine which leverages an LLM that identifies segues in a conversation to introduce / mention the user’s goals. While generating replies 104 which include context and syntax sufficient to engage a user 108 and mimic conversation with another person, for a prolonged conversation with the system 100 to effectively mimic human conversation (and not simply responses) there may be a structure or flow to responses. For example, a conversation history may include: message: “What time is my son coming over today?”
  • the above conversation while including contextually relevant replies, includes replies which, in some embodiments, may be overly reactive and do not add any additional context or encourage user interaction.
  • the conversation may be observed by the user 108 as “robotic” or “stiff.”
  • the system 100 can apply a transitional topic structure for generating replies which directs generation of responses in a manner which effectively engages and directs the user 108.
  • the transitional topic structure is described in greater detail with respect to Figure 4 but will be briefly described here with respect to its interactions with the planner 126.
  • the planner 126 can generate system prompts which instruct the planner’s LLM to explicitly identify if there is a good segue for any goals recorded in the user model 134 database and can also draw upon the dialogue history database for context.
  • a system prompt may be “Given the next message and the prior five messages, would it be appropriate for the next reply to introduce a goal?” and an adjustment can be “It is not a good time to segue to a goal” or simply “No.”
  • a goal included in the user model 134 can be to walk for thirty minutes a day.
  • a system prompt can be “Given the next message and the prior ten messages, if it is appropriate to segue the conversation to a goal, generate a segue, if not, say ‘no.’”, and an adjustment can be “It sounds like you’re interested in exercise. Today is a great day to go for a walk.”
  • the planner 126 may identify a segue and generate a system prompt based on the segue.
  • the system prompt may indicate that the LLM system 120 is to segue towards a particular goal.
  • the system prompt may include detail regarding how to segue, for example stating that the LLM system 120 is to respond to the last message with the user while seamlessly including a segue towards the goal.
  • the planner 126 can generate adjustments based upon analysis of health data derived from a health data store 154, a health device 152 integrated with the system 100, or provided by the user 108 and included within the user model 134 database.
  • the planner 126 may adjust an LLM response 114 to form the adjusted response 116.
  • the planner 126 may identify an unsuccessful segue.
  • the planner 126 LLM may identify too abrupt of a segue (e.g., based on an instruction provided to the LLM, based on training examples, and so on).
  • the planner 126 may additionally identify that a particular goal wasn’t mentioned.
  • the planner 126 may cause a new output from the LLM system 120.
  • the planner 126 may generate a system prompt that identifies the error (e.g., lack of a segue, not directly mentioning the goal) such that the LLM system 120 will update its output.
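  • The planner's segue check described above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the function names, the `llm` callable (any function mapping a prompt string to a text completion), and the rules for recognizing a declined segue are all assumptions introduced here.

```python
from typing import Callable, Optional


def build_planner_prompt(goals: list[str], history: list[str], window: int = 10) -> str:
    """Assemble a system prompt asking the planner's LLM for a segue or 'no'."""
    lines = [
        f"Given the next message and the prior {window} messages, if it is "
        "appropriate to segue the conversation to a goal, generate a segue; "
        "if not, say 'no.'"
    ]
    # Label goals the same way the text labels goal records.
    lines += [f"This is a goal: {goal}" for goal in goals]
    # Only the most recent `window` turns are supplied as context.
    lines += history[-window:]
    return "\n".join(lines)


def parse_planner_output(text: str) -> Optional[str]:
    """Return the segue text, or None when the LLM declined (hypothetical rules)."""
    cleaned = text.strip()
    if cleaned.lower().strip(".'\"") == "no":
        return None
    if cleaned.lower().startswith("it is not a good time"):
        return None
    return cleaned or None


def plan_segue(llm: Callable[[str], str], goals: list[str],
               history: list[str], window: int = 10) -> Optional[str]:
    """Ask the planner's LLM for a segue; None means no adjustment is generated."""
    return parse_planner_output(llm(build_planner_prompt(goals, history, window)))
```

  • Returning `None` when the LLM declines mirrors the behavior in which the planner does not generate an adjustment if no appropriate segue exists.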
  • the reasoner 128 may include an engine that leverages an LLM that identifies relevant topics and concepts in a message 118 or LLM response 114. For example, the reasoner 128 may identify, utilizing the world context 132 database, contextual information relevant and pertaining to the message 118 or LLM response 114. In this example, the reasoner 128 may then generate an adjustment which accounts for the relevant information. Relevant information can be directly or adjacently related to the user’s living situation (e.g., whether it is rural, suburban, or urban), economic situation, occupation, and the climate to which the user is regularly exposed, among other factors. By identifying the relevant information, the reasoner 128 can generate an adjustment which can address the user’s situation more succinctly than otherwise would be possible.
  • the reasoner 128 can operate by generating system prompts which modify the user’s messages 118 (e.g., to form the user prompt 112) or LLM responses 114 or which are presented discretely, where the system prompts direct the reasoner’s LLM to perform desired analysis.
  • a system prompt generated by the reasoner 128 can be, “Review previous message from the user and generate a reply to the user’s latest message which accounts for their living situation.”
  • a system prompt can be, “Generate a reply to the user’s latest message while accounting for the user’s living situation.”
  • a system prompt can be, “Generate a reply to the user’s latest message 118 which accounts for hobbies which are likely to appeal to people living in the user’s 108 area.”
  • the reasoner 128 may adjust an LLM response 114 to form the adjusted response 116. For example, the reasoner 128 may identify that the LLM response 114 did not address the user’s living situation. The reasoner 128 may therefore update a system prompt to the LLM system 120 to cause expected output.
  • the guardrail observers 122 can have access to two or more of the databases when generating adjustments. Having access to multiple databases can be functionally treated as one larger database. Discerning between data included in either database can be allowed by a plurality of means, including structuring the data with appropriate labels.
  • each message 118 or reply can be modified with the text “This is a message 118:” or “This is a reply:”, corresponding to a saved reply of “You could take a nap.” being saved in the dialogue history database as “This is a reply: You could take a nap.”
  • goals saved in the user model 134 can include modification with the text “This is a goal:”, corresponding to a goal of “Walk thirty minutes a day” being saved as “This is a goal: Walk thirty minutes a day.”
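  • The labeling scheme above, which lets several databases be treated as one larger store, can be sketched as follows. The helper names and the `LABELS` table are hypothetical, and the message label is written without the reference numeral.

```python
# Hypothetical label table; the prefixes follow the examples in the text.
LABELS = {
    "message": "This is a message:",
    "reply": "This is a reply:",
    "goal": "This is a goal:",
}


def label_record(kind: str, text: str) -> str:
    """Prefix a record with its label before saving it to a shared store."""
    return f"{LABELS[kind]} {text}"


def split_record(record: str) -> tuple[str, str]:
    """Recover (kind, text) from a labeled record read back from the store."""
    for kind, prefix in LABELS.items():
        if record.startswith(prefix):
            return kind, record[len(prefix):].strip()
    raise ValueError(f"unlabeled record: {record!r}")
```

  • With labels in place, a single query over the combined store can still distinguish dialogue history entries from user model entries.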
  • the system 100 may leverage a multiplexer 150 - a computational module including a data structure for holding and processing messages 118 and LLM responses 114 - to receive, organize, and provide the messages 118 or LLM responses 114 to the guardrail observers 122. Conversation can progress at a quick pace, with the conversation being generated rapidly. To ensure each message 118 is correctly associated with an LLM response 114, each message 118 and LLM response 114 may be sent to and categorized within the multiplexer 150.
  • the multiplexer 150 can determine which messages 118 and LLM responses 114 get sent to the guardrail observers 122. In some embodiments, a message 118 will be sent to the guardrail observers 122 followed by a corresponding LLM response 114.
  • a user prompt 112 can be sent to the guardrail observers 122 and corresponding adjustments can be catalogued so as to be identified with a corresponding LLM response 114 sent to the guardrail observers 122 at a later time.
  • a dialogue generator can receive a first and second message from the user.
  • several sets of operations can be performed to process the user prompts 112:
  • the first message such as an associated first user prompt
  • the first user prompt can be sent to the LLM system 120, such that a first LLM response can be received.
  • the first user prompt can be sent to the multiplexer 150, followed by the first LLM response 114.
  • the second message, such as an associated second user prompt, can be sent to the LLM system 120, such that a second LLM response can be received.
  • the second user prompt can be sent to the multiplexer 150, followed by the second LLM response.
  • the guardrail observers 122 may process the first message (to form the first user prompt) and then the first LLM response 114, where corresponding generated adjustments can be analyzed together to determine which is more appropriate (performed in the later described resolver 148).
  • the guardrail observers may process the second user prompt and second LLM response 114, wherein corresponding generated adjustments can be analyzed together.
  • the first user prompt can be sent to the multiplexer 150, followed by the second user prompt being sent to the multiplexer 150.
  • the first user prompt can be sent to the LLM system 120 to generate a first LLM response and the second user prompt can be sent to the LLM system 120 to generate a second LLM response.
  • the first LLM response can be sent to the multiplexer 150 and then the second LLM response can be sent to the multiplexer 150.
  • once contents of the multiplexer 150 are sent to the guardrail observers 122 and corresponding adjustments are generated, the adjustments can be analyzed by the resolver 148 in the order in which they were received.
  • analyzing adjustments in relation to appropriate sets of messages and LLM responses can be dictated by logic internal to the resolver 148.
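  • One way to keep each user prompt associated with its LLM response, regardless of the order in which the two arrive, is sketched below. The `Multiplexer` class, its method names, and the integer turn identifiers are assumptions made for illustration; the text only requires a data structure that holds and organizes messages and responses.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Exchange:
    prompt: str
    response: Optional[str] = None


class Multiplexer:
    """Pairs each user prompt with its LLM response (illustrative sketch)."""

    def __init__(self) -> None:
        self._exchanges: dict[int, Exchange] = {}  # insertion-ordered
        self._next_id = 0

    def add_prompt(self, prompt: str) -> int:
        """Store a prompt and return the identifier used to attach its response."""
        turn_id = self._next_id
        self._next_id += 1
        self._exchanges[turn_id] = Exchange(prompt)
        return turn_id

    def add_response(self, turn_id: int, response: str) -> None:
        self._exchanges[turn_id].response = response

    def drain_complete(self) -> list[tuple[int, str, str]]:
        """Pop, in arrival order, every leading exchange that has both halves."""
        ready = []
        for turn_id in list(self._exchanges):
            exchange = self._exchanges[turn_id]
            if exchange.response is None:
                break  # preserve ordering: stop at the first incomplete exchange
            ready.append((turn_id, exchange.prompt, exchange.response))
            del self._exchanges[turn_id]
        return ready
```

  • Both orderings described above (prompt, response, prompt, response or prompt, prompt, response, response) produce the same paired output, which can then be handed to the guardrail observers.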
  • potential adjustments by the guardrail observers 122 may be routed to the resolver 148 (e.g., via queue 146). When these adjustments are presented to the resolver 148 in an order corresponding to generation by a first message, a second message, a first LLM response 114, and a second LLM response 114, the resolver 148 can review every other set of adjustments as a set, so that the adjustments for each message are considered together with those for its corresponding LLM response.
  • the resolver 148 can review two adjacent sets of adjustments together. Analyzing contents of a multiplexer 150 can be done by examining which types of adjustments are present and selecting adjustments based on system preferences. For example, if a first user prompt 112 corresponds to an adjustment being generated from the planner 126 and a first LLM response 114 corresponds to an adjustment being generated from the reasoner 128, the resolver 148 can select from the available adjustments based on internal priority. In some embodiments, the priority ranks highest for adjustments from the watcher 124, then from the planner 126, and then from the reasoner 128. In instances in which multiple adjustments from the same guardrail observer 122 are available, the resolver 148 can select whichever adjustment was received first.
  • the guardrail observers 122 may generate adjustments individually, and not, in some embodiments, as a combined unit.
  • the watcher 124, planner 126, and/or reasoner 128 may or may not generate an adjustment. If the watcher 124 does not identify any harmful content, the watcher 124 will not generate an adjustment. If the planner 126 does not identify any goals for which segues can be appropriately generated, the planner 126 will not generate an adjustment. If the reasoner 128 does not identify any applicable, broader context which could be added to or replace an LLM response 114, the reasoner 128 will not generate an adjustment.
  • adjustments are passed by the guardrail observers 122 to a queue 146, which includes a data structure capable of holding the adjustments and presenting the adjustments to the resolver 148.
  • the queue 146 can be any data structure capable of holding the adjustments in a usable format.
  • the resolver 148 receives adjustments from the queue 146 and determines which adjustment will be selected. In some embodiments, selection of adjustments can be dependent upon a prescribed priority, wherein adjustments are selected based upon a hierarchical preference for adjustments derived from either the watcher 124, planner 126, or reasoner 128. In some embodiments, the hierarchical preference can include a first preference for adjustments from a watcher 124, a second preference for adjustments from a planner 126, and a third preference for adjustments from a reasoner 128.
  • the guardrail observers 122 may include a priority associated with their adjustment. For example, the output of their associated LLMs may include the recommended adjustment for inclusion in a system prompt to the LLM system 120 along with an associated priority. Thus, the resolver 148 may prefer a higher priority. In some embodiments, the resolver 148 may select adjustments from two or more guardrail observers for inclusion in a system prompt to the LLM system 120.
  • adjustments can further be prioritized depending upon whether they derive from a message 118 or an LLM response 114. Adjustments derived from a message 118 can be preferred over adjustments derived from a corresponding LLM response 114.
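  • The selection rules above can be sketched as a single ordering key. The `Adjustment` record and its field names are hypothetical, and applying the observer hierarchy before the message-versus-response preference is an assumption; the text does not state how the two criteria combine.

```python
from dataclasses import dataclass
from typing import Optional

# Lower rank wins. Watcher, then planner, then reasoner, as in the text.
OBSERVER_RANK = {"watcher": 0, "planner": 1, "reasoner": 2}
# Adjustments derived from a message are preferred over those from a response.
SOURCE_RANK = {"message": 0, "llm_response": 1}


@dataclass(frozen=True)
class Adjustment:
    observer: str   # "watcher", "planner", or "reasoner"
    source: str     # "message" or "llm_response"
    text: str
    arrival: int    # position in the queue; smaller means received earlier


def resolve(adjustments: list[Adjustment]) -> Optional[Adjustment]:
    """Pick one adjustment by observer rank, then source, then arrival order."""
    if not adjustments:
        return None
    return min(
        adjustments,
        key=lambda a: (OBSERVER_RANK[a.observer], SOURCE_RANK[a.source], a.arrival),
    )
```

  • An embodiment in which each observer's LLM supplies its own priority could replace `OBSERVER_RANK` with that value without changing the rest of the sketch.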
  • adjustments generated from messages 118 can be used to modify or replace user prompts to generate adjusted prompts, wherein the adjusted prompts can be sent to the LLM system 120 so as to receive an LLM response 114.
  • the dialogue generator 110 may include a system prompt in the user prompt 112 to the LLM system 120 based on the output from the resolver 148.
  • an adjustment can replace an LLM response 114 and can be used to generate an audible or visual reply 104.
  • content from an adjustment can be applied to an LLM response 114 so as to adjust the LLM response 114.
  • an LLM response 114 may be “Today, you should take a nap.” and an adjustment may be “go for a walk”, wherein the adjustment corresponds to a goal of the user.
  • the dialogue generator 110 can adjust the LLM response 114 to include content of the adjustment so that the adjusted response 116 will be “Today, you could go for a walk.”
  • the generator 110 may additionally use a current weather, for example obtained based on a network call by the system 100, to respond with, “Today is sunny, it may be nice to go for a walk.” Determining where to insert adjustments to modify LLM responses 114 can be done with the LLM system 120 or one or more guardrail observers 122.
  • an LLM response 114 may be “Today, you could read a book.” and an adjustment can be “call your mother.”
  • modifying the LLM response 114 can involve sending the following prompt to an LLM: “Modify the message 118 ‘Today, you could read a book.’ with the adjustment ‘call your mother.’ in an appropriate manner.”
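  • The two ways of applying an adjustment described above, replacement and LLM-assisted modification, can be sketched as follows. The function name, the `replace` flag, and the fallback to replacement when no LLM is supplied are assumptions made for illustration.

```python
from typing import Callable, Optional


def apply_adjustment(llm_response: str, adjustment: Optional[str],
                     llm: Optional[Callable[[str], str]] = None,
                     replace: bool = False) -> str:
    """Form an adjusted response from an LLM response and a selected adjustment."""
    if adjustment is None:
        # Nothing was selected: the LLM response is passed through unchanged.
        return llm_response
    if replace or llm is None:
        # The adjustment replaces the LLM response outright.
        return adjustment
    # Otherwise ask an LLM to merge the adjustment into the response.
    prompt = (
        f"Modify the message '{llm_response}' with the adjustment "
        f"'{adjustment}' in an appropriate manner."
    )
    return llm(prompt)
```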
  • Figure 1D is a block diagram illustrating an example architecture and functioning of the empathetic dialogue generation system 100 integrating with one or more health devices 152 and receiving and integrating health data from a health data store 154.
  • a user model 134 can be a database including information defining factors and descriptions of a user’s 108 health history and conditions, among other topics.
  • the user model 134 can include information describing if a user 108 has dementia, diabetes, cancer, paralyzed lower extremities, or any other condition.
  • adjustments can be generated which account for the user’s 108 health condition and history.
  • a user 108 can engage with the empathetic dialogue generation system 100 to manually enter descriptions of the health data; the empathetic dialogue generation system 100 can integrate with a health data store 154; or the empathetic dialogue generation system 100 can integrate with one or more health device(s) 152 to receive measurements and readings of a user 108 (e.g., sensor measurements).
  • Manually entering descriptions of health data can be performed while establishing a user 108 profile (either by the user 108 or someone acting on behalf of the user 108, like a family member or health staff) and can involve providing a text message 118 or audible message 118 to the empathetic dialogue generation system 100 describing the user’s 108 health data.
  • Receiving health data from a health data store 154 can include integrating the empathetic dialogue generation system 100 with an externally located health data store 154, which can be a web-based database which can be integrated with and accessed so as to retrieve health data relevant to the user 108.
  • a health device 152 can be a heart rate monitor, a blood glucose monitor, a blood pressure monitor, a temperature sensor, or any other device configured to receive readings of a user’s 108 health condition.
  • the empathetic dialogue generation system 100 can integrate with a health device 152 by the user device 102 integrating (physically or through a wireless communication method) with the health device 152 and receiving health data from the health device 152.
  • the empathetic dialogue generation system can add additional context to adjusted responses 116. For example, if health data states that a user 108 is a diabetic, the guardrail observers 122 can generate adjustments which: recommend food choices which are low in sugar, remind the user 108 to monitor glucose when an appropriate time is additionally detected from the world context 132 database, or remind the user 108 to speak with their health professional periodically. In another example, if health data corresponds to the user 108 having high blood pressure, the guardrail observers 122 can generate adjustments which advise the user 108 to speak with a health professional about their blood pressure.
  • the health data can be processed by the system 100 to perform analyses.
  • An example analysis may include identifying one or more trends and points of interest.
  • the system 100 can direct the guardrail observers 122 to analyze user prompts to identify health-related factors to enable detection of health-related trends.
  • health data can include a user’s blood pressure readings.
  • the system 100 can identify trends in the user’s blood pressure as they relate to conversation with the system 100.
  • the system 100 can determine whether the conversations are causing a decrease in the user’s blood pressure.
  • the system 100 can determine whether the conversations are resulting in more frequent glucose monitor readings by the user.
  • This information may be saved by the system 100 and reviewed by the user or a health professional, or other mentor, using the mentor mode described herein. For example, a health professional can ask the system if the user has been more frequently using their glucose monitor.
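  • As one hedged illustration of trend identification, the sketch below fits a least-squares slope to timestamped readings such as blood pressure values. The function name and the choice of a linear fit are assumptions; the text does not specify an analysis method, and a slope indicates only an association over time, not that conversations caused the change.

```python
def trend_slope(readings: list[tuple[float, float]]) -> float:
    """Least-squares slope of value against time for (time, value) pairs.

    A negative slope means the readings are falling over the period.
    """
    n = len(readings)
    if n < 2:
        raise ValueError("need at least two readings")
    mean_t = sum(t for t, _ in readings) / n
    mean_v = sum(v for _, v in readings) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in readings)
    if var_t == 0:
        raise ValueError("readings must span more than one time point")
    cov = sum((t - mean_t) * (v - mean_v) for t, v in readings)
    return cov / var_t
```

  • The same function could be applied to daily counts of glucose monitor readings to check whether monitoring has become more frequent.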
  • health device 152 may include environmental sensors, humidity sensor, water quality sensor, temperature sensor, physiological / biometric sensors (e.g., blood pressure sensor, glucose sensor, heart rate sensor, oxygen sensor, and so on).
  • Avatar Generation
  • the empathetic dialogue generation system 100 can include an avatar generation system 200 which can generate an avatar, a visual image which can be animated so as to mimic dialogue corresponding to an audible reply.
  • an avatar may be generated using deep learning image techniques such as pix2pix, vid2vid, and so on.
  • FIG. 2 is a block diagram illustrating an example architecture and functioning of the avatar generation system 200.
  • the system 200 may, in some embodiments, be implemented or form part of the system 100.
  • the system 200 may also, in some embodiments, be executed by an outside system (e.g., a cloud system) in communication with the system 100.
  • Generating an avatar involves receiving an image or a description of an image, performing image generation 202 (if a description of an image was received), uploading the image into an avatar image dataset 204, receiving a generated reply and implementing motion transfer 206, and generating a moving avatar.
  • An image can be uploaded 210 to the avatar generation system 200 by a user 108 interacting with a user device 102 which includes the empathetic dialogue generation system 100 and the avatar generation system 200.
  • An image can be of an animal (including the animal’s face), a person, a cartoon, a plant, fungi, or an object, although generally the image will be of an animal.
  • when a text description 208 is received, the text description 208 can be received from a user 108 physically entering text describing an image, or from the user 108 audibly describing an image, with the audible description being received by the user device 102 and converted (via the text-to-speech module 138) into a text description 208.
  • a text description 208 can be “A monkey” or “A cat.”
  • the text description 208 can then be used in conjunction with a generative model to generate an image matching the text description 208.
  • the image can be provided to the avatar image dataset 204, a data structure including images used to generate avatars.
  • the avatar images can be provided to a motion transfer 206 module and analyzed so as to identify portions of the images which best correspond to a mouth which can be simulated to move. If the image includes a non-animal, like a fruit or another object, a portion of the image will still be identified as a mouth and will be accordingly processed.
  • the image will be used to generate a set of alternative frames - alterations of the image with portions of the subject of the image warped to simulate different stages of speech - which, when combined, simulate movement of a mouth.
  • the motion transfer 206 module can additionally receive generated replies and can provide the alternative frames and the image in a progression so as to match an audible reply. Whenever a new reply is audibly generated, the alternative frames and the image (collectively referred to as the “avatar”) can be displayed to match the new reply.
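  • The progression of frames for an audible reply can be sketched as a simple schedule. Everything here is an assumption for illustration: the function name, the fixed frame rate, and cycling through the alternative frames in order; an actual motion transfer module would likely align frames with the audio content.

```python
def frame_schedule(num_alt_frames: int, duration_s: float, fps: int = 12) -> list[int]:
    """Return the frame indices to display while a reply is spoken.

    Index 0 is the original (resting) image; 1..num_alt_frames are the
    alternative frames showing different stages of speech.
    """
    total = round(duration_s * fps)
    if total <= 0 or num_alt_frames <= 0:
        return [0]
    # Cycle through the alternative frames for the length of the audio.
    sequence = [1 + (i % num_alt_frames) for i in range(total)]
    sequence.append(0)  # return to the resting image when speech ends
    return sequence
```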
  • the avatar and/or conversations may have different modes.
  • a companion mode may be selected which will cause the system 100 to engage the user in conversations to improve mood, offer humor, actively listen and respond.
  • the system 100 may use user goals and topics to offer responses that include humor, up-lifting stories, and so on.
  • Another example mode may include nap mode. This mode may include the avatar lying down and snoozing quietly. The avatar may stay quiet until the user initiates conversation or has been set up to start a conversation or remind the user at a particular time.
  • Another example mode may include memory mode. For this mode, the avatar may engage its user in conversations aimed at improving memory recall.
  • the avatar may keep track of the user’s stories from the past, for example the system may store the conversation history as described herein. At random days and times, the avatar may comment or ask a question about the stories to nudge the user into recalling a memory.
  • This memory might be an event, such as a wedding, or an experience, such as a trip. Memory mode may be used for important reminders, such as appointments or medication times.
  • Another example mode may include report mode.
  • the system may provide a tool for users of the system to access context-sensitive situational awareness of the user.
  • Another example mode may include coach mode.
  • the system may support the user by gently nudging conversations to talk about movement and exercise, or other goals, as described herein.
  • Another example mode may include helper mode.
  • the system may take notes that it can later recall.
  • the system may store summaries of conversations, such as likes, dislikes, of the user, topics frequently mentioned, and so on. The system may use these to bring up specific likes later on, or to avoid particular dislikes.
  • Another example mode may include sharing mode, in which the system may share information with authorized users via, for example, an API.
  • Figure 3A is a block diagram describing a method 300 of generating conversation with a user 108.
  • the process 300 may be performed, for example, by the system 100. Additional description related to Figure 3A may be found in at least Figures 1A-1D with respect to the system.
  • a message 118 is received from a user 108 in the form of text or audio.
  • pre-processing is applied to the message 118 to generate a user prompt 112.
  • the user prompt 112 is then provided to the LLM system and an LLM response 114 is received at step 306.
  • the system can generate system prompts to be provided to the guardrail observers to direct the corresponding LLMs to perform specific actions.
  • the system prompts either supplement or include reference to the user prompt 112 or LLM response 114 and, at step 310, upon being received at one of the guardrail observers, cause analysis of the user prompt 112, LLM response 114, and/or one or more databases.
  • one or more adjustments are received from the guardrail observers and provided to a resolver 148, which selects an adjustment, wherein the system can apply the adjustment to the LLM response 114, corresponding to either modifying the LLM response 114 or replacing the LLM response 114 with content from the adjustment, generating an adjusted response 116.
  • the LLM response 114 is still considered an adjusted response 116 for computational purposes.
  • the adjusted response 116 is then provided to a user 108 either in the form of a text message 118 or an audible message 118 after the adjusted response 116 is provided to a text-to-speech module 138 and converted into an audible signal.
  • the system may additionally analyze the message 118 using the guardrail observers to form the user prompt 112.
  • the guardrail observers 122 may analyze the message 118, optionally along with at least a portion of a prior conversation history.
  • the guardrail observers may generate system prompts, or other adjustments, to the message 118.
  • the system such as by the resolver described herein, may select one or more of the adjustments and the system, such as via the dialogue generator, may generate the user prompt 112.
  • the system may additionally analyze the LLM response 114, optionally along with at least a portion of a prior conversation history.
  • the guardrail observers may generate system prompts, or other adjustments, to the LLM response 114.
  • the system may cause an updated LLM response to be generated, for example by sending one or more of the system prompts to the LLM system.
  • the system may also adjust the LLM response 114 based on the guardrail observers, for example it may remove text, adjust text, and so on.
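  • The overall flow of method 300 can be sketched end to end with stand-in callables. The type aliases, the function name, and the choice to replace the LLM response with the selected adjustment (rather than modify it) are assumptions for this illustration; the observers and resolver are stubs supplied by the caller.

```python
from typing import Callable, Optional

LLM = Callable[[str], str]
# An observer inspects (user_prompt, llm_response) and may return an adjustment.
Observer = Callable[[str, str], Optional[str]]
Resolver = Callable[[list[str]], Optional[str]]


def generate_reply(message: str, llm: LLM, observers: list[Observer],
                   resolver: Resolver,
                   preprocess: Callable[[str], str] = str.strip) -> str:
    """Message in, adjusted response out (simplified sketch of method 300)."""
    user_prompt = preprocess(message)      # pre-processing forms the user prompt
    llm_response = llm(user_prompt)        # LLM response received (step 306)
    # Guardrail observers analyze the prompt and response (step 310).
    adjustments = [
        adjustment
        for observer in observers
        if (adjustment := observer(user_prompt, llm_response)) is not None
    ]
    chosen = resolver(adjustments)         # resolver selects one adjustment
    # With no adjustment, the LLM response itself serves as the adjusted response.
    return llm_response if chosen is None else chosen
```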
  • Figure 3B is a block diagram describing a method 320 of generating adjustments with guardrail observers.
  • the process 320 may be performed, for example, by the system 100. Additional description related to Figure 3B may be found in at least Figures 1A-1D with respect to the system.
  • a message 118 or LLM response 114 is received by the system.
  • the system can then generate system prompts, commands which can complement or modify messages or LLM responses provided to guardrail observers.
  • the system prompts may be generated to apply to specific guardrail observers, with certain system prompts for the planner, reasoner, and watcher individually.
  • the message 118 or LLM response 114 and the system prompts can be sent to the planner, reasoner, and watcher to cause corresponding analysis and generation of adjustments.
  • the planner is configured to receive system prompts which cause the planner to analyze the message or LLM response, optionally along with a conversation history, and one or more databases to determine appropriate segues to support one or more user 108 goals.
  • the reasoner is configured to receive system prompts which cause analyses of the message or LLM response, optionally along with a conversation history, and one or more databases to generate adjustments which account for broader situational context of the user.
  • the watcher is configured to receive system prompts which cause the watcher to analyze the message or LLM response, optionally along with a conversation history, and one or more databases to generate adjustments which have removed harmful subject matter.
  • the adjustments are provided to a queue 146, a data structure which holds and organizes the adjustments, wherein the queue 146 provides the adjustments to a resolver 148 which selects an adjustment for application to the message (to form the user prompt) or LLM response 114.
  • Topic Transition Structure
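  • The dispatch step of method 320 can be sketched as below, using Python's standard `queue.Queue` as the holding structure. The wording of the per-observer system prompts and the observer call signature are hypothetical; each observer here is any callable that takes a system prompt and the content to analyze and returns an adjustment or `None`.

```python
import queue
from typing import Callable, Optional

# Hypothetical per-observer system prompts; the wording is illustrative only.
OBSERVER_PROMPTS = {
    "watcher": "Identify and remove harmful subject matter.",
    "planner": "Generate a segue to a user goal, or return 'No.'",
    "reasoner": "Account for the broader situational context of the user.",
}

ObserverFn = Callable[[str, str], Optional[str]]


def run_observers(content: str, observers: dict[str, ObserverFn],
                  out_queue: "queue.Queue[tuple[str, str]]") -> None:
    """Send a message or LLM response to each observer and queue any adjustments."""
    for name, observer in observers.items():
        adjustment = observer(OBSERVER_PROMPTS[name], content)
        if adjustment is not None:
            # Observers that find nothing to adjust contribute nothing.
            out_queue.put((name, adjustment))
```

  • A resolver can then read the queued `(observer, adjustment)` pairs in arrival order and select among them.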
  • the empathetic dialogue generation system 100 can include a conversational structure.
  • FIG. 4 is a block diagram illustrating an example implementation of a topic transition structure 400.
  • the topic transition structure includes operational phases which correspond to sets of instructions which control and guide which types of LLM responses 114 and adjustments are generated. Driving the topic transition structure is generation of system prompts, commands which can complement or modify messages 118 or LLM responses 114 to induce specific operations.
  • the topic transition structure includes a set of steps corresponding to specific system prompts being sent to specific guardrail observers and / or the LLM system 120.
  • a greeting can be generated by a system prompt being generated which instructs an LLM to generate a greeting discretely or in connection with user prompt 112.
  • the system can receive a message 118 of “What time is it?” and generate a system prompt of “Generate a greeting and respond to the following message 118.”, wherein a corresponding LLM response 114 can be “Good morning. It is nine-thirty a.m.”.
  • the system may obtain a current time (e.g., the question may trigger access to an outside system or service to respond to the question).
  • a threshold number of replies may be required prior to seeking to perform any other task or introduce another topic (e.g., introduce a goal).
  • Engaging in chit-chat can involve replying to messages 118 in a straight-forward manner. For example, a message 118 can be “What time is it?” and a reply can be “It is nine-fifteen in the morning.”
  • the threshold number of replies is 3, 4, 5, or set by a health professional.
  • the system 100 can avoid segueing into messages regarding goals without some initial discussion (e.g., chitchat).
  • the system can proceed to step 406, in which the system can begin sending prompts to the guardrail observers with instructions to generate adjustments which segue into one or more topics corresponding to goals 412 included in the user model 134 database.
  • the system may optionally provide messages to the guardrail observers (e.g., to identify harmful content, and so on).
  • System prompts to generate segues to goals can optionally be explicit requests to do so.
  • a system prompt in step 406 can be, “Generate a segue to a user goal. If such a statement is not appropriate at this phase of the conversation, return the word ‘No.’” If the guardrails do not generate an adjustment after receiving the system prompt, the system can return to step 404 to resume the chit-chat phase.
  • at step 408, deliberation can occur to select which adjustment to utilize when generating an LLM response 114. If a single adjustment is received, the system may select the adjustment and either modify an LLM response 114 - by sending both the adjustment and LLM response 114 to an LLM and requesting the LLM perform the modification - or replace the LLM response 114 with the adjustment. If there are multiple adjustments, the adjustments can be organized into a queue 146. In some embodiments, the adjustments can be selected in order or based on priority information. Upon selection, adjusted responses 116 will be generated and presented to a user 108 which recommend topics supportive of one or more user 108 goals.
  • the system can proceed to step 410, wherein the system can follow up on topics 414 mentioned in relation to the user 108 goal.
  • follow-up can involve asking clarifying questions or performing analysis on future messages 118 to detect if the user 108 has made progress towards accomplishing one or more goals. If progress is identified, the system can record within the user model 134 that the one or more goals are being achieved.
  • System prompts corresponding to step 410 can include directions to one or more LLM to consider a recent or upcoming message 118 in relation to the mentioned goal.
  • a reply concerning a goal can be “Consider going for a walk, it is great exercise.”
  • a system prompt generated during follow-up can be “Focus on the recommended activity in the previous reply when reviewing the following message 118 and identify if progress has been made towards achieving any user 108 goals.”
  • the system can generate a reply of “Are you making progress with healthy eating?” After follow-up, the system can return to step 404 to engage in chit-chat.
  • Transitioning to each structure involves generating and applying one or more system prompts which ask one or more LLMs if transitioning to the next conversational phase is appropriate.
  • if a corresponding LLM generates an LLM response 114 (e.g., an external LLM) or adjustment which instructs the system to return to a preceding step, the system can oblige.
  • conversations can be adaptive as they respond to organic changes and flow of each conversation.
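  • The phases of the topic transition structure can be sketched as a small state machine. The `Phase` names and the boolean and counter inputs are assumptions; in the described system, whether to advance is decided by asking one or more LLMs, which this sketch replaces with plain arguments.

```python
from enum import Enum


class Phase(Enum):
    GREETING = "greeting"
    CHIT_CHAT = "chit-chat (step 404)"
    SEGUE = "segue to goal (step 406)"
    DELIBERATE = "deliberation (step 408)"
    FOLLOW_UP = "follow-up (step 410)"


def next_phase(phase: Phase, replies_in_chit_chat: int = 0,
               chit_chat_threshold: int = 3,
               segue_available: bool = False) -> Phase:
    """Return the next conversational phase (illustrative transition rules)."""
    if phase is Phase.GREETING:
        return Phase.CHIT_CHAT
    if phase is Phase.CHIT_CHAT:
        # A threshold number of replies is required before introducing a goal.
        if replies_in_chit_chat >= chit_chat_threshold:
            return Phase.SEGUE
        return Phase.CHIT_CHAT
    if phase is Phase.SEGUE:
        # If the guardrails produce no adjustment, resume chit-chat.
        return Phase.DELIBERATE if segue_available else Phase.CHIT_CHAT
    if phase is Phase.DELIBERATE:
        return Phase.FOLLOW_UP
    # After follow-up, the conversation returns to chit-chat.
    return Phase.CHIT_CHAT
```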
  • Pre-made replies can be questions ascertaining a user’s 108 condition or opinion or can be statements which generally relate to one or more topics. For example, a pre-made reply can be “While I figure that out, tell me more about your day.” In another example, a pre-made reply can be “Let me consider that.”
  • turn counting is a functionality wherein the system 100 can identify topics which have been mentioned to a user 108 and can count how many messages 118 and replies take place between successive discussions of the topic.
  • a turn can refer to a message 118 or reply.
  • Applying turn counting includes restricting which topics can be covered until a minimum number of turns have taken place since a previous instance of a topic being discussed. For example, the system can suggest a user 108 eat more vegetables and can refrain from mentioning the topic of eating vegetables until a minimum number of pre-configured turns (e.g., a threshold number of turns, such as 2, 5, 75, 100) have taken place since the topic was discussed.
  • the turn count can be used by the topic transition structure to select an adjustment to recommend user 108 take actions to progress towards achieving a goal.
  • the method can select between recommended topics by selecting a topic with a smallest corresponding turn count.
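The turn-counting behavior described above can be sketched as follows. This is a minimal illustration, not the system's implementation: the `TurnCounter` class, its method names, and the default threshold of 5 are assumptions, and treating a never-discussed topic as having a count of zero during selection is a choice made here for simplicity.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class TurnCounter:
    """Tracks how many turns have passed since each topic was last discussed."""
    min_turns: int = 5  # hypothetical pre-configured threshold
    turns_since: Dict[str, int] = field(default_factory=dict)

    def record_turn(self, topic: Optional[str] = None) -> None:
        # Every message or reply counts as one turn for all tracked topics.
        for known in self.turns_since:
            self.turns_since[known] += 1
        # Discussing a topic resets its count to zero.
        if topic is not None:
            self.turns_since[topic] = 0

    def is_allowed(self, topic: str) -> bool:
        # A topic that was never discussed is always allowed.
        count = self.turns_since.get(topic)
        return count is None or count >= self.min_turns

    def select(self, candidates: List[str]) -> Optional[str]:
        # Keep only topics past the threshold, then follow the text and
        # pick the one with the smallest turn count (assumed 0 if unseen).
        allowed = [t for t in candidates if self.is_allowed(t)]
        if not allowed:
            return None
        return min(allowed, key=lambda t: self.turns_since.get(t, 0))
```

With `min_turns=2`, a topic recorded via `record_turn("eating vegetables")` stays blocked until two further turns have been recorded.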
  • the disclosed technology enables ‘adaptive topic trajectories’ in empathic dialog systems working within multi-session environments.
  • the system may ‘remember’ previous topics discussed from previous sessions then make progress on topics that need further deliberation. Further, empathic dialog systems may avoid excessive discussions on a single topic within sessions and across multiple sessions.
  • topics that are brought up by empathic systems at some defined time, such as topics required to give reminders to users, may require empathic dialog systems to support timely changes in the topic trajectories that the empathic dialog system is considering.
  • Non-limiting examples of reminders include user-defined reminders and system-defined reminders.
  • An example of user-defined reminders might be appointments and/or medication reminders.
  • An example of system-defined reminders might be prompting the user to recall a memory and/or reminding the user of new lifestyle behaviors, such as taking a regular walk.
  • Adaptive topic trajectories may, in some embodiments, build on the system architecture described above; for example, the system may track the state of the dialog using a conversation state structure.
  • nodes which form a conversation policy graph structure may point to tables including lists of transitional goals and topics.
  • the ‘introduction’ state points via a topic trajectory to the table containing a list comprising the following transition goals and topics: (1) ‘healthy eating’, and (2) ‘regular exercise’.
  • topic of ‘healthy eating’ points to a list comprising the following sub-topics: (1) ‘low sodium diet’, (2) ‘balanced diet’, and (3) ‘reduce fat intake,’ and the topic of ‘regular exercise’ points to a list comprising the following sub-topics: (1) ‘low-impact walk’, and (2) ‘swimming’.
  • While the dialog is in an ‘introduction’ state, the dialog may transition either to the topic of ‘healthy eating’ or ‘regular exercise’. In the instance where the dialog has transitioned to the topic of ‘healthy eating,’ and when the dialog is in the ‘deliberation’ stage, the dialog may now transition to the sub-topic of ‘low sodium diet’, the sub-topic of ‘balanced diet’, or the sub-topic of ‘reduce fat intake.’
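The topic tables in this example can be pictured as a small lookup structure. The sketch below is only illustrative: the dictionary layout and the `next_options` and `transition` helpers are assumptions, while the topic and sub-topic names come from the example above.

```python
from typing import Dict, List

# Each node of the conversation policy graph points to a table (here, a list)
# of the transition goals and topics reachable from it.
TOPIC_TABLE: Dict[str, List[str]] = {
    "introduction": ["healthy eating", "regular exercise"],
    "healthy eating": ["low sodium diet", "balanced diet", "reduce fat intake"],
    "regular exercise": ["low-impact walk", "swimming"],
}


def next_options(node: str) -> List[str]:
    """Return the topics or sub-topics the dialog may move to from `node`."""
    return TOPIC_TABLE.get(node, [])


def transition(current: str, target: str) -> str:
    """Move to `target` only if the table for `current` lists it."""
    if target not in next_options(current):
        raise ValueError(f"no trajectory from {current!r} to {target!r}")
    return target
```

Keeping the tables separate from the state nodes means topics can be added or reordered without changing the conversation states themselves.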
  • Each dialog session may optionally be assigned a number. For example, session numbers may be further annotated with the current date and time.
  • Each node ‘reached’ in a dialog may be marked with (1) a session number (a monotonically increasing number representing the session the dialog is in), (2) the current turn-count, and (3) the detected sentiment in the user’s response.
  • For example, during the first session (i.e., session 1), a ‘greeting’ node, ‘chit-chat’ node, ‘introduction’ node, and the ‘deliberation’ node may be marked with a ‘1’ corresponding to the session number.
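One way to picture the node markings is a small record per visited node. The following is a sketch under stated assumptions: `NodeMark`, `DialogTrace`, and the sentiment labels are hypothetical names, and the turn count is kept as a single running counter across sessions.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, List, Optional


@dataclass(frozen=True)
class NodeMark:
    node: str
    session_number: int  # monotonically increasing session identifier
    turn_count: int      # turn count at the moment the node was reached
    sentiment: str       # detected sentiment in the user's response


class DialogTrace:
    def __init__(self) -> None:
        self.session_number = 0
        self.turn_count = 0
        self.session_started: Dict[int, datetime] = {}
        self.marks: List[NodeMark] = []

    def start_session(self, now: Optional[datetime] = None) -> int:
        # Annotate each new session number with the current date and time.
        self.session_number += 1
        self.session_started[self.session_number] = now or datetime.now(timezone.utc)
        return self.session_number

    def mark(self, node: str, sentiment: str) -> NodeMark:
        # Record the node together with the session, turn count, and sentiment.
        self.turn_count += 1
        mark = NodeMark(node, self.session_number, self.turn_count, sentiment)
        self.marks.append(mark)
        return mark
```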
  • the system may optionally schedule topics in a future session based on a request from the user.
  • the user indicates in this session dialog that the user would like to be reminded about the topic of managing stress.
  • the system inquires what days/times such a reminder is required. Based on when such a reminder is required, the system 100 stores information scheduling a topic trajectory change by enqueuing the topic ‘manage stress’ and sets a timer that triggers the update on the requested schedule. For example, a guardrail observer may segue into the topic change.
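The enqueue-and-trigger behavior can be sketched with a priority queue keyed by due time. This is an illustration only: `TopicScheduler` and its methods are hypothetical, and instead of a real timer the caller polls with the current time.

```python
import heapq
from datetime import datetime, timedelta
from typing import List, Tuple


class TopicScheduler:
    """Queues topics for a future session and releases them when due."""

    def __init__(self) -> None:
        self._queue: List[Tuple[datetime, str]] = []

    def enqueue(self, topic: str, due: datetime) -> None:
        # The heap keeps the earliest scheduled topic at the front.
        heapq.heappush(self._queue, (due, topic))

    def pop_due(self, now: datetime) -> List[str]:
        # Return every topic whose scheduled time has arrived, earliest first.
        due_topics = []
        while self._queue and self._queue[0][0] <= now:
            due_topics.append(heapq.heappop(self._queue)[1])
        return due_topics
```

A topic returned by `pop_due` would then be handed to whichever component segues into the topic change, such as a guardrail observer.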
  • Because LLM responses 114 often reference and include the same or similar content as corresponding user prompts 112, a user prompt 112 including sensitive information, such as protected health information or personally identifiable information, can cause generation of an LLM response 114 which includes some of or all of the sensitive information or includes variations of the sensitive information which would allow an inductive reconstruction of the sensitive information. And even in situations in which an LLM is programmed in a manner which does not learn from user prompts 112 or LLM responses 114, the act of sharing a user’s 108 sensitive information is itself an ethical violation, especially when the sensitive information relates to a user’s 108 health condition.
  • a mentor can be a family member of the user 108, a friend of the user 108, a health professional who works with the user 108, an education professional, or any other authorized person (e.g., with valid credentials).
  • the mentor may be a domain expert.
  • the mentor can provide - either via text or audible message - alternative versions of adjustments, allowing the system 100 to later review the alternative versions of adjustments when the guardrail observers are generating adjustments.
  • a user prompt 112 recorded in the dialogue history database can be “What should I do today?” and a recorded adjustment can be “You could go for a walk today.”
  • the mentor can provide an alternative adjustment of “You could call your friends today.”
  • When the guardrail observers 122 are responding to future user prompts 112 and/or LLM responses 114, the guardrail observers 122 will be able to access the alternative adjustment and not the original adjustment, thus allowing the alternative adjustment to shape generation of future adjustments.
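The override behavior, in which a mentor's alternative replaces the original adjustment for later lookups, might look like the sketch below. The `AdjustmentRecord` class and its field names are assumptions introduced for illustration; the example strings are taken from the text above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AdjustmentRecord:
    """One dialogue-history entry: a user prompt and the adjustment given."""
    user_prompt: str
    original: str
    alternative: Optional[str] = None  # set by a mentor in mentor mode

    def visible(self) -> str:
        # Guardrail observers see the mentor's alternative when one exists,
        # and never the original adjustment it replaced.
        return self.alternative if self.alternative is not None else self.original


record = AdjustmentRecord(
    user_prompt="What should I do today?",
    original="You could go for a walk today.",
)
record.alternative = "You could call your friends today."
```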
  • a health professional may specify goals which they prefer specific user(s) 108 to further. For example, the health professional may indicate that user 108 A should try to go for a walk every day in a park. In this example, the health professional may communicate with the system 100 to indicate these goals. The guardrail observers 122 may then ensure that future conversations with user 108 A introduce, or segue, into these goals at times appropriate. For example, the guardrail observers 122 may be provided with system prompts that indicate, or otherwise identify locations of, the goals. Thus, the observers 122 may identify the goals and determine appropriate times at which to introduce them into discussion.
  • FIG. 5 is a block diagram illustrating a process 500 of employing the mentor mode.
  • the process 500 may be performed, for example, by the system 100.
  • the system will receive login information from a domain expert (e.g., a health professional) associated with a user’s 108 profile, wherein the user’s 108 profile will include a list of persons authorized to access mentor mode.
  • the system will present to the domain expert a user interface including options to interact with the user’s 108 profile.
  • the user interface will enable review of the user’s 108 dialogue history, displaying identified instances in which the user 108 stated something which suggests or is analogous to an aspect of a domain of interest.
  • the domain expert may cause the system to search for instances of text or instances of specific scenarios (e.g., “did you discuss [topic A] with user?”).
  • the user interface enables modification of replies to affect future generated replies to be more in line with a mentor’s desired output.
  • the system can offer an exemplar of future generated replies for the mentor to review and potentially change further.
  • when a domain expert is interacting with a user’s 108 profile at step 506, the domain expert can provide messages audibly or through text to the system 100 corresponding to questions about a user’s 108 profile, and the system 100 can provide replies 106 which answer the questions.
  • a domain expert can be a teacher and can ask the system 100 “Did you discuss eating healthy foods with the user?” and the system 100 can generate a response “Yes, I spoke with the user about eating healthy foods” or “No, I did not speak with the user about eating healthy foods.”
  • communicating with the system 100 may include the domain expert, such as a health professional, communicating with an LLM.
  • the LLM may be, in some embodiments, locally run by the system 100 to ensure privacy associated with conversations.
  • the LLM may have access to conversation history with a user, such as a patient of the example health professional.
  • the LLM may receive at least a portion of the conversation history in its context window (e.g., as a system prompt).
  • the system may search for specific queries related to the health professional’s conversation.
  • the health professional may ask if the system discussed healthy foods.
  • the system may search for text related to healthy foods and may provide responsive results in the context window of the LLM.
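The search-then-provide-context step can be sketched as a simple keyword lookup over stored turns. This is an assumption-laden illustration: the history layout (dictionaries with `speaker` and `text` keys), the helper names, and the use of plain substring matching are all hypothetical stand-ins for whatever retrieval the system actually uses.

```python
from typing import Dict, List


def search_history(history: List[Dict[str, str]], query_terms: List[str]) -> List[Dict[str, str]]:
    """Return the turns whose text mentions any of the query terms."""
    terms = [t.lower() for t in query_terms]
    return [turn for turn in history if any(t in turn["text"].lower() for t in terms)]


def build_context(matches: List[Dict[str, str]], limit: int = 5) -> str:
    """Format matching turns as text to place in the LLM's context window."""
    lines = [f'{m["speaker"]}: {m["text"]}' for m in matches[:limit]]
    return "Relevant conversation excerpts:\n" + "\n".join(lines)
```

The `limit` parameter caps how many excerpts reach the context window, since only a portion of the conversation history may fit.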
  • the domain expert may ask questions to the system 100, such as “Did you speak with [user A] about her health condition recently?”
  • the system 100 may use an LLM to respond with, “Yes, I talked about her challenges walking up stairs and I described that I had a friend who had similar challenges.”
  • the domain expert may then respond with, “Please avoid saying that you had a friend with similar experiences and instead inquire more about her condition next time.”
  • the system may store information reflecting that it is to avoid stating that it has a friend with similar experiences.
  • the guardrail observers described herein may, for example, reject a message from an outside LLM if it responds with an imaginary friend.
  • the guardrail observers may respond with an adjusted message noting that the outside LLM is to not imagine a friend but instead merely inquire about the user’s condition.
  • the guardrail observers may analyze a subsequent received response from the outside LLM, and if it is appropriate to send, then send to the user. In this way, the guardrail observers may effectuate the domain expert’s preferences.
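The reject, adjust, and re-check cycle can be sketched as a bounded retry loop. This is a simplified stand-in: in the described system the guardrail observers are themselves LLM-driven, whereas here a substring check plays that role, and `guarded_reply`, `call_llm`, and `banned_phrases` are hypothetical names.

```python
from typing import Callable, List, Optional


def guarded_reply(
    prompt: str,
    call_llm: Callable[[str], str],
    banned_phrases: List[str],
    max_attempts: int = 3,
) -> Optional[str]:
    """Ask the outside LLM, rejecting responses that break a stored preference."""
    current_prompt = prompt
    for _ in range(max_attempts):
        response = call_llm(current_prompt)
        hits = [p for p in banned_phrases if p.lower() in response.lower()]
        if not hits:
            return response  # appropriate to send to the user
        # Reject the response and send back an adjusted message.
        current_prompt = (
            f"{prompt}\n\nAdjustment: do not mention {', '.join(hits)}; "
            "instead inquire about the user's condition."
        )
    return None  # nothing acceptable was produced within the attempt budget
```

Returning `None` after the attempt budget leaves it to the caller to fall back, for instance to a pre-made reply.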
  • mentor mode may cause an update of information used by the guardrail observers.
  • the system 100 interacting with the domain expert may store data identifying the additional goal.
  • the goal may be stored as being associated with the planner described in Figure 1C.
  • the goal may be included in a prompt which informs the planner (e.g., the associated LLM) its intended purpose and identifies goals associated with the user.
  • the goal may be stored as data in a database which is accessible to the planner.
  • the system 100 can provide user prompts 112 to the LLM system 120 with system prompts to identify if a user 108 is engaged in conversation or not.
  • a system prompt can precede a user prompt 112 or can modify the user prompt 112.
  • a system prompt can be “Analyze the following messages and identify if the user appears engaged.”
  • the system 100 can also direct one or more guardrail observers to review contents of the dialogue history database to identify trends in conversation which suggest engagement or lack of engagement.
  • Such a system prompt can be “Review the previous [user-selectable threshold number or pre-configured number of] messages and identify if the user is engaged in the conversation or not.”
  • the empathetic dialogue generation system 100 can also include access to a timer, either internally included or included within the world knowledge database. By referencing the timer, the system 100 can determine when a user 108 has not responded to replies for a period long enough to imply disinterest. For example, the system 100 can be configured to determine a user 108 is disinterested after a period of five minutes without a new message 118 from a user 108. In this example, a user 108 can be engaging with the system 100 regularly but then not provide a new message 118 for a period of five minutes, after which the system 100 will determine that the user 108 is disinterested and will take action to address the disinterest. In some embodiments, a duration of time corresponding to detected disinterest in a user 108 can be between thirty seconds and five minutes.
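The timer check reduces to comparing elapsed time against a configured duration. A minimal sketch, assuming a hypothetical `is_disinterested` helper and a five-minute default taken from the example above:

```python
from datetime import datetime, timedelta

# Default from the example in the text; some embodiments use a duration
# anywhere between thirty seconds and five minutes.
DEFAULT_TIMEOUT = timedelta(minutes=5)


def is_disinterested(
    last_message_at: datetime,
    now: datetime,
    timeout: timedelta = DEFAULT_TIMEOUT,
) -> bool:
    """True once `timeout` has elapsed with no new message from the user."""
    return now - last_message_at >= timeout
```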
  • the system 100 can generate or retrieve an image and/or video to present to the user 108 to induce the user 108 to become more aware of conversation with the system 100.
  • Generating an image may include providing a text description 208 to a generative model which can generate a corresponding image, searching through the world context 132 for relevant images, or searching through an avatar image database for relevant images.
  • Generating a video may involve searching through one or more pre-made videos held within a database.
  • the system 100 can wait an additional amount of time before either providing another video or image or continuing to wait for the user 108 to initiate interaction with the system 100 by sending a new message 118.
  • Figure 6 is a block diagram illustrating an example embodiment of process 600 for monitoring and responding to a user’s 108 level of engagement.
  • the process 600 may be performed, for example, by the system 100.
  • the process 600 can begin at step 602 by monitoring dialogue between the user 108 and the system, wherein monitoring the dialogue includes instructing one or more of the guardrail observers to review the dialogue history database and prepare a report on the user’s 108 likely level of engagement.
  • If the method determines that the user 108 is not engaged - such as by receiving an adjustment from the guardrail observers which indicates that the user 108 is not engaged - the method can proceed to step 606 by engaging with the LLM system 120 (which may be an LMM) to generate an interaction (an audible or text reply, an image, or a video) determined to cause the user 108 to engage with the system.
  • the system can, for example, retrieve a summary of the user 108 by sending a system prompt to the guardrail observers, receiving an adjustment, and then sending a second system prompt with the adjustment to the LMM including a command to generate a corresponding interaction.
  • a format which the interaction takes may depend on which medium of communication was previously being used.
  • step 606 can include a modal switch in which the system 100 can begin to use videos or images to interact with the user.
  • step 606 can include a modal switch to using audible or text replies.
  • the interaction can then be provided to the user 108.
  • the interaction may be provided to the guardrail observers for analysis (e.g., to identify harmful content).
  • the system can instruct the guardrail observers to analyze messages 118 received after the interaction is provided to identify a user’s 108 level of engagement. If the user 108 is still not engaged, the system can repeat the method 600 or go to a silent mode for a pre-configured amount of time or until the user initiates conversation.
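The overall shape of process 600 can be sketched as a short loop. Several points here are assumptions rather than statements from the text: the rule that text or audio switches to an image while visual media switch back to text, the `max_attempts` budget, and the `is_engaged` and `send` callables that stand in for the guardrail observers' engagement report and the delivery of an interaction.

```python
from typing import Callable


def switch_modality(previous: str) -> str:
    # Assumed rule: alternate between verbal and visual media.
    return "image" if previous in ("text", "audio") else "text"


def reengage(
    is_engaged: Callable[[], bool],
    send: Callable[[str], None],
    previous_modality: str,
    max_attempts: int = 2,
) -> str:
    """Try to regain the user's attention, then fall back to silent mode."""
    modality = previous_modality
    for _ in range(max_attempts):
        if is_engaged():
            return "engaged"
        modality = switch_modality(modality)
        send(modality)  # provide an interaction in the new modality
    return "engaged" if is_engaged() else "silent"
```

A result of `"silent"` corresponds to waiting for a pre-configured time or until the user initiates conversation.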
  • the system 100 can additionally send system prompts to the guardrail observers 122 or the LLM system 120 to analyze the dialogue history database (and future user prompts 112) to identify a user’s 108 mental and/or emotional state. If the system 100 determines that the user 108 has a negative mental state or emotional state, the system 100 can generate system prompts which instruct generation of adjustments intended to address the negative mental or emotional state.
  • a first system prompt can be “Analyze the last twenty messages 118 to determine if the user 108 is experiencing emotional or mental distress.” If the user 108 is determined to be experiencing a negative mental or emotional state, the system 100 can generate a system prompt of “Generate a reply to the user’s 108 message 118 which relieves their negative mental or emotional state”, wherein a corresponding adjustment can be generated.
  • the system may offer or otherwise advise the user to seek help from a qualified mental health professional.
  • one of the LLMs which implements the guardrail observers described herein may identify potential harm and generate a direct message to the user or add language, or a prompt, for review by an outside LLM which requests generation of text to seek help from a mental health professional.
  • the system 100 can monitor user 108 messages 118 to identify patterns suggestive of or directly corresponding to health conditions or descriptions of the user 108.
  • the system can direct the guardrail observers (via application of system prompts) to identify language within each message 118 which suggests or is related to a health condition or state.
  • a message 118 may be “My head is spinning” and the system may identify that the message 118 likely corresponds to a health condition and can categorize the message 118 within the dialogue history database (by further application of system prompts and the guardrail observers). Categorization of health conditions can involve sending a system prompt to a guardrail observer requesting the guardrail observer place a phrase or message 118 into a pre-defined health category.
  • the pre-defined health categories can include anxiety, attention disorder, depression, blood pressure, temperature, pain, among others.
  • the system can also categorize the messages 118 for later review by mentors engaging in mentor mode. For example, the previously mentioned message 118 of “My head is spinning” can be categorized as “Condition relating to head.” The system can thus summarize messages 118 into categories for presenting to mentors engaging in the mentor mode. Information categorized in the described method can include timestamps, i.e., a record of the calendar date and time at which a corresponding message 118 was received.
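Categorization with timestamps can be illustrated with a keyword table. This is a stand-in sketch: the described system asks a guardrail observer to place each message in a pre-defined category via system prompts, whereas the keyword lists, `CategorizedMessage`, and `categorize` below are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List

# Hypothetical keyword lists standing in for the guardrail observer's judgment.
CATEGORY_KEYWORDS: Dict[str, List[str]] = {
    "Condition relating to head": ["head is spinning", "headache", "dizzy"],
    "pain": ["pain", "hurts"],
    "anxiety": ["anxious", "worried"],
}


@dataclass(frozen=True)
class CategorizedMessage:
    text: str
    category: str
    received_at: datetime  # calendar date and time the message was received


def categorize(text: str, received_at: datetime) -> List[CategorizedMessage]:
    """Return one record per pre-defined category the message matches."""
    lowered = text.lower()
    return [
        CategorizedMessage(text, category, received_at)
        for category, keywords in CATEGORY_KEYWORDS.items()
        if any(keyword in lowered for keyword in keywords)
    ]
```

Records of this shape could later be grouped by category and sorted by `received_at` when summaries are shown in mentor mode.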
  • FIG. 7A is a block diagram illustrating an example process 700 of categorizing health related information from user 108 messages 118 and presenting the information to mentors. The process 700 may be performed, for example, by the system 100.
  • the system identifies and categorizes health related information within user messages 118.
  • a user interface will be presented which includes the categorized information in a format which can be read by the mentor.
  • upon receipt of interactions between the mentor and the user interface at step 706, the system, at step 708, can retrieve and display portions of categorized information along with summaries, wherein the summaries can include time stamps and other descriptors of the information.
  • Figures 7B and 7C are example user interfaces that include summarizations of observed messages 118 including information relating to health states and/or conditions as displayed within mentor mode.
  • In FIG. 7B, a series of messages 118 and replies are shown in which the system identified language suggestive of a health condition.
  • the system has identified language corresponding to having irritable bowel syndrome (“IBS”) and experiencing flare ups of symptoms at 712.
  • the identified language can also be correlated with received health data from a health device, such as illustrated in Figure 1D, to identify patterns and associations between a user’s symptoms and language spoken by the user.
  • the system didn’t respond with health advice. Instead, the health symptoms were identified, assigned a confidence value and a timestamp, and summarized into a category.
  • the system summarized the health-related conversations and displayed them on a graph for the mentor to view.
  • Figure 7C similarly includes identification 714 of a message 118 corresponding to a user 108 altering their eating and experiencing significant weight loss.
  • Figure 7D is a user interface describing the system performing topic transition structure analysis during the follow-up step.
  • the system identifies 716 contents of a message 118 suggestive of the user making progress towards achieving one or more goals and can record a notification of the progress in the dialogue history and/or user model 134 databases.
  • the system can analyze the user message 118 to determine if progress was made towards achieving a goal, along with a confidence value associated with the analysis.
  • the system can perform cyclical analysis, analyzing text explanations for an analysis to assist determining confidence values.
  • the progress and confidence values can be saved to a database and included as a component of a summary.
  • the above-described user interfaces may be presented on a user device, for example for use by a health professional.
  • SDOH can pertain to a user’s 108 geographic location, individual context, behavioral context, and health context.
  • a SDOH pertaining to user’s 108 geographic location can be based upon a user’s 108 address and location of residence, and can include:
  • SDOH pertaining to user’s 108 behavioral context relate to information which describes how the user 108 spends their time and on what activities, which can include:
  • SDOH pertaining to a user’s 108 health context relate to information describing a user’s 108 physical and mental health condition(s) and biosensor readings, which can include:
  • the system 100 can determine a user’s 108 SDOH by engaging in conversation with a user 108 and - through instructing the guardrail observers 122 to do so - identifying words or phrases which correspond to relevant topics. After identifying one or more factors indicative of SDOH, the system 100 can generate replies which encourage the user 108 to seek assistance to alleviate issues commonly associated with negative SDOH. For example, the system 100 can determine a user 108 lives in an area with poor accessibility of fresh foods. In this example, the system 100, in response to the determination, can generate a reply which states “Have you considered ordering groceries for home delivery?”
  • the system 100 can determine that a user 108 receives a likely sub-optimal amount of sleep each night and in response can generate a reply which states “If you’re feeling tired, sometimes it can be helpful to take a nap. Getting enough rest is important for physical and mental well-being.”
  • Determining SDOH can be done by generating one or more system prompts to instruct the guardrail observers 122 to analyze each user prompt 112 so as to identify relevant information. If found, the relevant information concerning the SDOH can be recorded in the user model 134.
  • SDOH can be identified by comparing user prompts 112 against a set of words and phrases pre-associated with certain SDOH. For example, if the word “hungry” is identified as being common in user prompts 112, the system 100 can determine that the user 108 is likely suffering from food insecurity and can record the information in the user model 134 database.
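The word-list comparison can be sketched by counting how many prompts mention a pre-associated term. Only the “hungry” association comes from the text; the `min_mentions` threshold used to decide that a word is “common”, and the `SDOH_TERMS` and `detect_sdoh` names, are assumptions.

```python
import re
from collections import Counter
from typing import Dict, List, Set

# Words pre-associated with certain SDOH.
SDOH_TERMS: Dict[str, str] = {
    "hungry": "food insecurity",
}


def detect_sdoh(prompts: List[str], min_mentions: int = 3) -> Set[str]:
    """Return the SDOH factors whose terms appear in enough user prompts."""
    counts: Counter = Counter()
    for prompt in prompts:
        # Compare whole words so that partial matches are not counted.
        words = set(re.findall(r"[a-z']+", prompt.lower()))
        for term, factor in SDOH_TERMS.items():
            if term in words:
                counts[factor] += 1
    return {factor for factor, n in counts.items() if n >= min_mentions}
```

Any factor returned would then be recorded in the user model 134 database.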
  • All of the processes described herein may be embodied in, and fully automated, via software code modules executed by a computing system that includes one or more computers or processors.
  • the code modules may be stored in any type of non-transitory computer-readable medium or other computer storage device. Some or all the methods may be embodied in specialized computer hardware.
  • a processor can be a microprocessor, but in the alternative, the processor can be a controller, microcontroller, or state machine, combinations of the same, or the like.
  • a processor can include electrical circuitry configured to process computer-executable instructions.
  • In another embodiment, a processor includes an FPGA or other programmable device that performs logic operations without processing computer-executable instructions.
  • a processor can also be implemented as a combination of computing devices, for example, a combination of a DSP and a microprocessor, a multitude of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration.
  • a processor may also include primarily analog components. For example, some or all of the signal processing algorithms described herein may be implemented in analog circuitry or mixed analog and digital circuitry.
  • a computing environment can include any type of computer system, including, but not limited to, a computer system based on a microprocessor, a mainframe computer, a digital signal processor, a portable computing device, a device controller, or a computational engine within an appliance, to name a few.
  • Conditional language such as, among others, “can,” “could,” “might” or “may,” unless specifically stated otherwise, are understood within the context as used in general to convey that certain embodiments include, while other embodiments do not include, certain features, elements and/or steps. Thus, such conditional language is not generally intended to imply that features, elements and/or steps are in any way required for one or more embodiments or that one or more embodiments necessarily include logic for deciding, with or without user 108 input or prompting, whether these features, elements and/or steps are included or are to be performed in any particular embodiment.
  • Disjunctive language such as the phrase “at least one of X, Y, or Z,” unless specifically stated otherwise, is understood with the context as used in general to present that an item, term, etc., may be either X, Y, or Z, or any combination thereof (for example, X, Y, and/or Z). Thus, such disjunctive language is not generally intended to, and should not, imply that certain embodiments require at least one of X, at least one of Y, or at least one of Z to each be present.
  • Phrases such as “a device configured to” are intended to include one or more recited devices. Such one or more recited devices can also be collectively configured to carry out the stated recitations.
  • a processor configured to carry out recitations A, B and C can include a first processor configured to carry out recitation A working in conjunction with a second processor configured to carry out recitations B and C.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

Systems and methods for empathic dialogue generation are disclosed. An example method includes receiving a message from a user; generating a user prompt from the message; providing the user prompt to a large language model (LLM); receiving an LLM response from the large language model; generating a system prompt corresponding to the message; providing the system prompt, the user prompt, or the LLM response to a set of guardrail observers; receiving, from the guardrail observers, one or more adjustments; and generating a reply with an adjustment of the one or more adjustments.
PCT/US2024/042810 2023-08-19 2024-08-16 Empathic dialogue generation system Pending WO2025042784A1 (fr)

Applications Claiming Priority (14)

Application Number Priority Date Filing Date Title
US202363533641P 2023-08-19 2023-08-19
US202363533645P 2023-08-19 2023-08-19
US202363533639P 2023-08-19 2023-08-19
US202363533643P 2023-08-19 2023-08-19
US202363533647P 2023-08-19 2023-08-19
US202363533640P 2023-08-19 2023-08-19
US202363533644P 2023-08-19 2023-08-19
US63/533,640 2023-08-19
US63/533,641 2023-08-19
US63/533,639 2023-08-19
US63/533,644 2023-08-19
US63/533,647 2023-08-19
US63/533,645 2023-08-19
US63/533,643 2023-08-19

Publications (2)

Publication Number Publication Date
WO2025042784A1 WO2025042784A1 (fr) 2025-02-27
WO2025042784A9 true WO2025042784A9 (fr) 2025-03-20

Family

ID=94732521

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2024/042810 Pending WO2025042784A1 (fr) 2023-08-19 2024-08-16 Empathic dialogue generation system

Country Status (1)

Country Link
WO (1) WO2025042784A1 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20250219970A1 (en) * 2024-01-03 2025-07-03 International Business Machines Corporation Contextual conversational user assistance

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040179039A1 (en) * 2003-03-03 2004-09-16 Blattner Patrick D. Using avatars to communicate
WO2019133710A1 (fr) * 2017-12-29 2019-07-04 DMAI, Inc. Système et procédé de gestion de dialogue
US11302314B1 (en) * 2021-11-10 2022-04-12 Rammer Technologies, Inc. Tracking specialized concepts, topics, and activities in conversations
US12033265B2 (en) * 2022-04-28 2024-07-09 Theai, Inc. Artificial intelligence character models with goal-oriented behavior
US20240296219A1 (en) * 2023-03-05 2024-09-05 Microsoft Technology Licensing, Llc Adverse or malicious input mitigation for large language models

Also Published As

Publication number Publication date
WO2025042784A1 (fr) 2025-02-27

Similar Documents

Publication Publication Date Title
US11862339B2 (en) Model optimization and data analysis using machine learning techniques
US20230047253A1 (en) System and Method for Dynamic Goal Management in Care Plans
US12248826B2 (en) Cloud-based healthcare platform
Radovic et al. Adolescents’ perspectives on using technology for health: qualitative study
Miller et al. Apps, avatars, and robots: The future of mental healthcare
US20210391083A1 (en) Method for providing health therapeutic interventions to a user
WO2021150617A1 (fr) Système et procédé de génération autonome de plans de soins personnalisés
US20180096738A1 (en) Method for providing health therapeutic interventions to a user
US20180060494A1 (en) Patient Treatment Recommendations Based on Medical Records and Exogenous Information
Fletcher et al. Process evaluation of text-based support for fathers during the transition to fatherhood (SMS4dads): mechanisms of impact
US20180107962A1 (en) Stress and productivity insights based on computerized data
CA3093066A1 (fr) Procedes et systemes de traitement de signal vocal
US20220384003A1 (en) Patient viewer customized with curated medical knowledge
US20240087700A1 (en) System and Method for Steering Care Plan Actions by Detecting Tone, Emotion, and/or Health Outcome
Kulkarni et al. Speech and language practitioners’ experiences of commercially available voice-assisted technology: web-based survey study
Callejas et al. Conversational agents for mental health and wellbeing
US12198814B2 (en) Tracking infectious disease using a comprehensive clinical risk profile and performing actions in real-time via a clinic portal
US20230115939A1 (en) Evaluation of comprehensive clinical risk profiles of infectious disease in real-time
US20220384001A1 (en) System and method for a clinic viewer generated using artificial-intelligence
WO2021086988A1 (fr) Extraction d'image et d'informations pour une prise de décisions faisant appel à des connaissances médicales conservées
US20220391730A1 (en) System and method for an administrator viewer using artificial intelligence
US20240355470A1 (en) System for condition tracking and management and a method thereof
WO2025042784A9 (fr) Empathic dialogue generation system
Hietbrink et al. Exploring the Acceptance of Just-in-Time Adaptive Lifestyle Support for People With Type 2 Diabetes: Qualitative Acceptability Study
Su et al. Robot-assisted homecare for older adults: A user study on needs and challenges: [version 2; peer review: 2 approved, 1 not approved]

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 24857111

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE