WO2021052875A1 - Systems and methods for incorporating multimodal data to enhance attention mechanisms - Google Patents
- Publication number
- WO2021052875A1 (international application PCT/EP2020/075425)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- attention
- data
- artificial intelligence
- context vector
- multimodal
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
- G06N3/08: Learning methods
- G06F18/251: Fusion techniques of input or preprocessed data
- G06N3/044: Recurrent networks, e.g. Hopfield networks
- G06N3/0442: Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
- G06N3/045: Combinations of networks
- G06N3/0455: Auto-encoder networks; Encoder-decoder networks
- G06N3/0464: Convolutional networks [CNN, ConvNet]
- G06N3/09: Supervised learning
Definitions
- the present disclosure is directed to systems and methods for assisting with treating medical patients. More specifically, but not exclusively, the present disclosure is directed to systems and methods for enhancing attention mechanisms using multimodal patient data.
- the present disclosure is generally directed to systems and methods of utilizing multimodal information to enrich attention mechanisms, specifically to improve on the aligned position, p, and weights, a, from which the context variable, c, is generated.
- a set of correlated enhanced labels is selected based on pre-learned correlations between enhanced labels and multimodal data. These correlated enhanced labels are associated with the enhanced labels but are more relevant to the interpretation of the input data. Predetermined attention maps for the correlated enhanced labels improve the formulation of the aligned position p and the weight a, and therefore of the context vector c, of the artificial intelligence attention mechanism.
- a method for enhancing attention mechanisms using multimodal data comprises: retrieving input data from a first database; applying the input data to an attention layer within an artificial intelligence model; generating a context vector within the attention layer based at least in part on the input data; retrieving a predetermined attention map from a second database; modifying the attention layer based at least in part on the predetermined attention map, wherein the modification to the attention layer results in a modification to the context vector; and generating an output of the artificial intelligence model based on the context vector, wherein the output is used to provide a medical diagnosis or a treatment of a patient.
- the method further comprises the step of generating predetermined attention maps based on pre-learned correlations between enhanced labels and multimodal data.
- the enhanced labels and multimodal data are stored in or retrieved from the second database.
- the context vector is computed in the attention layer at least based on an aligned position and local weights.
- the pre-learned correlations are determined by a Pearson correlation.
- the generation of predetermined attention maps further comprises generating correlated enhanced labels based on multimodal data and enhanced labels.
- the artificial intelligence model uses recurrent neural network (RNN), convolutional neural network (CNN), or encoder-decoder architecture.
- a spread of a distribution utilized by the artificial intelligence model is set at least in part based on the multimodal data.
- a computer program product for enhancing attention mechanisms using multimodal data.
- the computer program product has a plurality of non-transitory computer readable instructions, the plurality of non-transitory computer readable instructions arranged to be stored and executed on a memory and a processor of a computing device, respectively.
- the plurality of non-transitory computer readable instructions are operative to cause the processor to: retrieve input data from a first database; apply the input data to an attention layer within an artificial intelligence model; generate a context vector within the attention layer based at least in part on the input data; retrieve a predetermined attention map from a second database; modify the attention layer based at least in part on the predetermined attention map, wherein the modification to the attention layer results in a modification to the context vector; and generate an output of the artificial intelligence model based on the context vector, wherein the output is used to provide a medical diagnosis or a treatment of a patient.
- the non-transitory computer readable instructions are further operative to cause the processor to generate predetermined attention maps based on pre-learned correlations between enhanced labels and multimodal data.
- the enhanced labels and multimodal data are stored in or retrieved from the second database.
- the context vector is computed in the attention layer at least based on an aligned position and local weights.
- the pre-learned correlations are determined by a Pearson correlation.
- the generation of predetermined attention maps further comprises generating correlated enhanced labels based on multimodal data and enhanced labels.
- the artificial intelligence model uses recurrent neural network (RNN), convolutional neural network (CNN), or encoder-decoder architecture.
- FIG. 1 is a schematic representation of a unimodal attention mechanism.
- FIG. 2 is a schematic representation of a unimodal attention mechanism.
- FIG. 3 is a schematic representation of a multimodal attention mechanism according to aspects of the present disclosure.
- FIG. 4 is a flowchart illustrating the steps of a method for enhancing attention mechanisms according to aspects of the present disclosure.
- FIG. 5 is an illustration of a system for enhancing attention mechanisms according to aspects of the present disclosure.
- the present disclosure is generally directed to systems and methods of utilizing multimodal information to enrich attention mechanisms, specifically to improve on the aligned position, p, and weights, a, from which the context variable, c, is generated in an artificial intelligence attention mechanism.
- When applied to existing deep learning architectures, attention mechanisms improve model performance over the baseline architectures. Attention mechanisms allow the model to focus on a region of interest and suppress background clutter. However, current attention mechanisms are derived exclusively from the input source data, or from a transformation of the input data. Information other than what the model is tasked with interpreting is not considered in the formulation of attention, so related information that could help further focus the region of interest goes unused.
- the present disclosure is directed to a multimodal attention mechanism that incorporates available data other than the input source data. Multimodal attention mechanisms yield attention models that are better informed. Additionally, multimodal attention mechanisms yield models that are not completely dependent on the level of detail inherent in the labels, which is often not ideal for deep learning attention mechanisms because of limitations in the available data.
- FIG. 1 is an illustration of current attention mechanisms in deep learning which depend exclusively on the direct input source data that the algorithm intends to classify, annotate, translate, etc.
- the attention model takes n arguments, $x_1, \ldots, x_n$ (reference numbers 7, 14, 21, 28), and a context, $c$ (reference number 35). It returns a vector, $y$ (reference number 42), which is a summary of the individual inputs $x_i$ (reference numbers 7, 14, 21, 28) that focuses on the information linked to the context $c$ 35.
- the returned vector $y$ 42 is a weighted mean of the individual $x_i$ (reference numbers 7, 14, 21, 28), where the weights are chosen according to the relevance of each $x_i$ given the context $c$ 35.
- the inputs (reference numbers 7, 14, 21, 28) can also be the transformed representations $h_1, \ldots, h_n$ of the $x_i$ produced by an encoding recurrent neural network (RNN) or convolutional neural network (CNN), or the outputs of their intermediate layers.
- the context $c$ 35 is an input, and the individual inputs $x_i$ (reference numbers 7, 14, 21, 28) are what the attention mechanism addresses.
- using a hyperbolic tangent (tanh) layer 49, the system computes $m_1, \ldots, m_n$ (reference numbers 56, 63, 70, 77), each of which is an aggregation of the value of $x_i$ (reference number 7, 14, 21, or 28) and the context $c$ 35.
- each $m_i$ value (reference numbers 56, 63, 70, 77) is computed independently. This means that $m_1$ 56 is computed using $x_1$ 7 without considering $x_2$ 14.
- each weight is then computed from the $m_i$ values.
- the softmax function 84 takes as input a vector of K real numbers and normalizes it into a probability distribution of K probabilities proportional to the exponentials of the input numbers. Before softmax is applied, some vector components could be negative or greater than one, and the components might not sum to 1; after softmax is applied, each component lies in the interval (0, 1) and the components add up to 1, so that they can be interpreted as probabilities. Furthermore, larger input components correspond to larger probabilities.
- the softmax function 84 is used to map the non-normalized output of a network to a probability distribution over predicted output classes.
- the $s_i$ (reference numbers 91, 98, 105, 112) are the softmax of the $m_i$ (reference numbers 56, 63, 70, 77) projected onto a learned direction. Accordingly, the softmax can be thought of as a soft, differentiable maximum of the relevance of the inputs 7, 14, 21, 28 according to the context 35.
- the output is the weighted arithmetic mean of all the $x_i$ (reference numbers 7, 14, 21, or 28), where each weight represents the relevance of the corresponding variable according to the context $c$ 35.
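The FIG. 1 walkthrough (tanh aggregation of each input with the context, projection onto a learned direction, softmax, weighted mean) can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the disclosed implementation: the additive form $m_i = \tanh(W_x x_i + W_c c)$ and the parameter names `W_x`, `W_c`, and `w` are hypothetical, since the text does not give the exact parameterization.

```python
import math


def softmax(scores):
    """Numerically stable softmax: maps K real scores to K probabilities summing to 1."""
    top = max(scores)
    exps = [math.exp(v - top) for v in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    """Plain-Python matrix-vector product."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def soft_attention(xs, c, W_x, W_c, w):
    """FIG. 1-style attention (sketch; parameter names are hypothetical).

    xs: n input vectors x_1..x_n; c: context vector.
    W_x, W_c: projection matrices; w: learned direction.
    Returns (y, s): the summary vector y and the weights s_1..s_n.
    """
    proj_c = matvec(W_c, c)
    # m_i = tanh(W_x x_i + W_c c), computed independently for each x_i
    ms = [[math.tanh(a + b) for a, b in zip(matvec(W_x, x), proj_c)] for x in xs]
    # s_i = softmax of each m_i projected onto the learned direction w
    s = softmax([sum(wi * mi for wi, mi in zip(w, m)) for m in ms])
    # y = sum_i s_i x_i, the weighted arithmetic mean of the inputs
    y = [sum(si * x[k] for si, x in zip(s, xs)) for k in range(len(xs[0]))]
    return y, s


identity = [[1.0, 0.0], [0.0, 1.0]]
y, s = soft_attention([[1.0, 0.0], [0.0, 1.0]], [1.0, 0.0], identity, identity, [1.0, 0.0])
print(s)  # the first input, aligned with the context, receives the larger weight
```

Setting `w` to the zero vector makes every score equal, so the weights become uniform and `y` reduces to the plain mean; a learned `w` is what lets the context shift weight toward relevant inputs.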
- In another illustration of current attention mechanism models, shown in FIG. 2, a more detailed formulation of the context variable $c$ 35 is illustrated.
- the model first predicts a single aligned position p 119.
- a window 117 centered around the aligned position $p_t$ 119 is then used to compute the context $c$ 35, which is a weighted average of the source input states $h_s$ 126 in the window.
- the weights $a$ 133 are inferred from the source states $h_s$ 126 in the window (which are the transformed representations of inputs 26) and, if applicable, from the current target state $h_t$ 140.
- the optional subscript $t$ denotes the time step when the attention mechanism is adapted to an RNN framework.
- the attention mechanisms illustrated in FIG. 1 and FIG. 2 are derived from a single modality of information: the input data $x_1$ 7 through $x_n$ 28, or their transformed representations.
- in image analysis, for example, the attention mechanisms are derived exclusively from the input images.
- the $x_i$ (reference numbers 7, 14, 21, 26, or 28) are then the input image divided into n parts.
- in translation, the attention mechanism is again learned exclusively from the input sentences to be translated.
- the $x_i$ (reference numbers 7, 14, 21, 26, or 28) are then the parts of the input sentence.
- Multimodal data can be used to enrich attention mechanisms.
- multimodal data can be clinical data which are often correlated.
- for example, attention mechanisms applied to input images, such as diagnostic medical images like an MRI, can be assessed using clinical data.
- This clinical data can be doctor’s notes regarding patient visits, patient symptoms, current or past medical diagnoses, test results from other diagnostic tests, current or past patient information, etc.
- attention mechanisms applied to clinical notes such as, for example, clinical notes from a healthcare professional interpreting a diagnostic medical image such as an MRI, can be assessed using other multimodal clinical data.
- the multimodal clinical data of a patient will likely inform detailed signs observable on medical images and clinical notes.
- the present disclosure is directed to multimodal attention mechanisms that incorporate data other than the input source data alone, going beyond the current unimodal approach. Applicant has recognized that not only is the attention model better informed, but modelling is not completely dependent on the level of detail inherent in the labels, which is often not ideal for deep learning tasks due to limitations in available data.
- the input data 7, 14, 21, 28, 26 applied to the artificial intelligence model may be the MRI image.
- the output 42 from the artificial intelligence model may be information that is used to provide medical diagnosis or treatment of a patient, such as for example, identification of an enlarged heart.
- the output 42 from the artificial intelligence model for example, identification of an enlarged heart, does not rely on the input data 26, the inputted MRI image, alone.
- the output 42 is determined using the context vector 35 in an attention layer 147 of an artificial intelligence model utilizing attention mechanisms.
- the context vector 35 is computed in the attention layer 147 at least based on an aligned position, p, 119 and local weights, a, 133.
- the context vector 35 is generated at least in part based on the input data 26, the MRI image, and the attention layer 147 is modified at least in part based on a predetermined attention map 154 which also modifies the context vector 35.
- the predetermined attention map 154, which modifies the context vector 35 and the attention layer 147 from which the output 42 of the artificial intelligence model is determined, is generated based on pre-learned correlations 33 between enhanced labels and multimodal data.
- multimodal data regarding medical diagnosis and treatment of many patients can be used to learn correlations 33 between the multimodal data and enhanced labels.
- Correlations 33 can be learned between the multimodal medical data and enhanced labels, for example using known statistical techniques such as the Pearson correlation. Other examples of correlation techniques include k-means clustering, Gaussian mixture modeling, and Correlation Explanation.
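As a minimal sketch of learning correlations 33 with the Pearson coefficient, the code below computes r for every pair of a multimodal feature and an enhanced label across a cohort. The dictionary layout (one list of per-patient values per name) and the five-patient cohort are made up for illustration; for 0/1 indicators, Pearson's r is the phi coefficient.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient r between two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("r is undefined when a sequence is constant")
    return cov / (sx * sy)


def learn_correlations(features, labels):
    """Pearson r for every (multimodal feature, enhanced label) pair.

    features, labels: dict name -> per-patient values, aligned by patient index
    (a hypothetical data layout).
    """
    return {(f, lbl): pearson(fv, lv)
            for f, fv in features.items() for lbl, lv in labels.items()}


# Toy, made-up cohort of five patients (1 = present, 0 = absent)
features = {"low_cardiac_output": [1, 1, 0, 0, 1]}
labels = {"enlarged_heart": [1, 1, 0, 0, 0]}
print(learn_correlations(features, labels))  # r is about 0.667 for this toy data
```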
- the pre-learned correlations 33 between multimodal data and enhanced labels can be, for example, the correlation between specific symptoms (for example, patient-identified symptoms or medical test results) and a possible disease that is correlated with those symptoms.
- correlated enhanced labels 37 can be identified for the target patient.
- the correlated enhanced labels 37 which are identified for the target patient can be, for example, a particular disease or information focusing medical diagnosis or treatment on a particular area or system of the body. For example, from multimodal data 30 from the target patient indicating lack of cardiac output, the correlated enhanced label 37 may indicate focusing on the heart.
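One way to identify correlated enhanced labels 37 for a target patient is sketched below, assuming the pre-learned correlations are stored as a table keyed by (finding, label) and that a label is kept when any finding present in the patient's multimodal data 30 correlates with it above a cut-off. The threshold of 0.5, the label names, and the correlation values are all hypothetical.

```python
def select_correlated_labels(correlations, patient_findings, threshold=0.5):
    """Pick enhanced labels correlated with the target patient's multimodal data.

    correlations: dict (finding, label) -> pre-learned correlation r.
    patient_findings: set of findings present for the target patient.
    threshold: hypothetical cut-off on r; labels are returned strongest first.
    """
    best = {}
    for (finding, label), r in correlations.items():
        if finding in patient_findings and r >= threshold:
            best[label] = max(r, best.get(label, r))
    return sorted(best, key=lambda label: best[label], reverse=True)


correlations = {  # made-up values for illustration only
    ("low_cardiac_output", "focus:heart"): 0.8,
    ("low_cardiac_output", "focus:lungs"): 0.3,
    ("dyspnea", "focus:lungs"): 0.6,
    ("fever", "focus:liver"): 0.9,
}
print(select_correlated_labels(correlations, {"low_cardiac_output", "dyspnea"}))
# -> ['focus:heart', 'focus:lungs']
```

Here "focus:liver" is not selected despite its high r, because fever is absent from this patient's data.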
- pre-determined attention maps 154 can be applied to the artificial intelligence attention mechanism.
- the output of the attention mechanism will not be based solely on the input data, for example the MRI image, but also on the multimodal data, for example the multimodal data 30 of the target patient, which includes clinical information correlated with a disease or with an area or system of the body.
- the pre-determined attention maps 154 are applied to the artificial intelligence attention mechanism by modifying the attention layer 147 and context vector 35 generated by the artificial intelligence model.
- the attention layer 147 may decrease in size, placing greater focus on a narrower set of input data.
- the pre-determined attention map 154 may focus the attention mechanism on a smaller area of the body, such as the heart, on the MRI image.
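The text does not specify how the predetermined attention map 154 is combined with the attention layer 147. A minimal sketch, assuming a multiplicative combination followed by renormalization, is shown below; zeros in the map remove input parts from consideration, which narrows the attended set as described above, and the context vector changes accordingly. The map values and the four-region toy "image" are hypothetical.

```python
def apply_attention_map(weights, attention_map):
    """Reweight attention weights by a predetermined map and renormalize.

    The multiplicative combination is an illustrative assumption.
    """
    if len(weights) != len(attention_map):
        raise ValueError("the map must cover every input position")
    combined = [w * m for w, m in zip(weights, attention_map)]
    total = sum(combined)
    if total == 0:
        raise ValueError("the map suppresses every attended position")
    return [v / total for v in combined]


def context_vector(weights, states):
    """c = sum_i a_i h_i over the (encoded) input parts."""
    return [sum(a * h[k] for a, h in zip(weights, states)) for k in range(len(states[0]))]


states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]  # four toy image regions
weights = [0.25, 0.25, 0.25, 0.25]                          # unimodal attention
heart_map = [0.0, 1.0, 1.0, 0.0]                            # hypothetical "focus on the heart" map
print(context_vector(weights, states))                      # [0.5, 0.5]
focused = apply_attention_map(weights, heart_map)
print(focused, context_vector(focused, states))             # [0.0, 0.5, 0.5, 0.0] [0.5, 1.0]
```

A soft map (values between 0 and 1) would down-weight rather than remove regions; adding log-map values to the scores before softmax is an equivalent alternative formulation.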
- x is the source input data
- y is the target output
- s is the representation computed by an encoder for each source x
- p is the aligned position.
- y is the output
- x is the source input data
- s is the representation computed by an encoder for each source x.
- $W_s$ is a learned weight matrix
- $h_t$ is the attentional hidden layer.
- the context vector $c_t$ is then derived as a weighted average over the set of source hidden states within the window $[p_t - D, p_t + D]$, where $D$ is selected empirically.
- $c_t = \sum_{s = p_t - D}^{p_t + D} a_t(s)\, h_s$
- $a_t(s)$ is the weight
- $h_s$ is the source hidden state
- $s$ is the representation computed by an encoder for each source $x$
- $h_t$ is a decoder hidden unit in an RNN or an output of the conditional layer in the CNN
- $h_s$ is the source hidden state
- $p_t$ is the aligned position
- $S$ is the input dimension (the number of source positions)
- $v_p$ is a learned parameter vector
- $W_p$ is a learned weight matrix
- $h_t$ is a decoder hidden unit in an RNN or an output of the conditional layer in the CNN.
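The variable lists above read as the "where" clauses of equations that did not survive extraction. Their symbols match the local attention formulation of Luong et al. (2015, "Effective Approaches to Attention-based Neural Machine Translation"). Assuming the disclosure follows that formulation, the missing equations would plausibly read as below, where $\operatorname{score}$ is a learned compatibility function (an assumption, since the text does not name one) and the sum in $\operatorname{align}$ runs over the window:

$$p(y_t \mid y_{<t}, x) = \operatorname{softmax}(W_s h_t)$$

$$p_t = S \cdot \operatorname{sigmoid}\!\left(v_p^{\top} \tanh(W_p h_t)\right)$$

$$a_t(s) = \operatorname{align}(h_t, h_s)\, \exp\!\left(-\frac{(s - p_t)^2}{2\sigma^2}\right), \qquad \operatorname{align}(h_t, h_s) = \frac{\exp(\operatorname{score}(h_t, h_s))}{\sum_{s'} \exp(\operatorname{score}(h_t, h_{s'}))}$$

Because the sigmoid lies in (0, 1), $p_t$ falls in $(0, S)$. In Luong et al. the standard deviation is set empirically to $\sigma = D/2$, and the Gaussian factor plays the role of the distribution $\mathrm{dist}(s, p, \sigma)$ that the disclosure proposes to specify from multimodal data.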
- the standard deviation $\sigma$ can be set empirically as $\sigma = D/2$.
- the distribution $\mathrm{dist}(s, p_t, \sigma)$ may also be specified by the multimodal data 30, for example so that attention does not have to be centered at the aligned position $p_t$ but is instead weighted by other distributions (for example, edges surrounding $p_t$, or a segmentation of sub-regions of an image if the input is an image).
- the spread $\sigma$ of the distribution can also be informed by multimodal data. For example, the spread can be smaller for focal image signs and larger for diffuse image signs. As another example, the spread can be smaller for early detection in a sentence and larger for sentiment detection.
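The sketch below illustrates Gaussian-damped local attention with a spread chosen from multimodal data. It assumes the Luong-style local-p form (predicted position, window $[p_t - D, p_t + D]$, Gaussian factor) and a dot-product score; the function names and the focal/diffuse scaling factors (0.5 and 2.0 times the default $D/2$) are hypothetical choices made only to show the effect of a narrower or wider spread.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def predict_position(h_t, W_p, v_p, S):
    """p_t = S * sigmoid(v_p . tanh(W_p h_t)), a real position in (0, S)."""
    z = dot(v_p, [math.tanh(dot(row, h_t)) for row in W_p])
    return S / (1.0 + math.exp(-z))


def spread_from_multimodal(sign_type, D):
    """Hypothetical rule: narrower spread for focal signs, wider for diffuse ones.
    Falls back to sigma = D / 2 when the sign type is unknown."""
    scale = {"focal": 0.5, "diffuse": 2.0}.get(sign_type, 1.0)
    return scale * D / 2.0


def local_attention(h_t, source_states, p_t, D, sigma):
    """Gaussian-damped local attention over integer positions s in [p_t - D, p_t + D].

    Uses a dot-product score (an assumption); returns (c_t, {s: a_t(s)}).
    """
    S = len(source_states)
    window = range(max(0, math.ceil(p_t - D)), min(S - 1, math.floor(p_t + D)) + 1)
    if len(window) == 0:
        raise ValueError("window contains no source positions")
    scores = [dot(h_t, source_states[s]) for s in window]
    top = max(scores)
    exps = [math.exp(v - top) for v in scores]
    total = sum(exps)
    # a_t(s) = align(h_t, h_s) * exp(-(s - p_t)^2 / (2 sigma^2))
    weights = {s: e / total * math.exp(-((s - p_t) ** 2) / (2 * sigma ** 2))
               for s, e in zip(window, exps)}
    c_t = [sum(a * source_states[s][k] for s, a in weights.items())
           for k in range(len(source_states[0]))]
    return c_t, weights


states = [[1.0, 0.0]] * 5  # identical toy states, so only the Gaussian shapes the weights
h_t, D = [1.0, 0.0], 2
_, a_focal = local_attention(h_t, states, 2.0, D, spread_from_multimodal("focal", D))
_, a_diffuse = local_attention(h_t, states, 2.0, D, spread_from_multimodal("diffuse", D))
print(a_focal[0] < a_diffuse[0])  # True: a diffuse sign spreads attention further from p_t
```

Replacing the Gaussian factor with another function of `s` (for example, a segmentation mask) would correspond to letting the multimodal data specify $\mathrm{dist}$ directly rather than only its spread.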
- FIG. 4 shows the steps in an exemplary method 200 for enhancing attention mechanisms using multimodal data according to the present disclosure.
- predetermined attention maps 154 based on pre-learned correlations 33 between enhanced labels and multimodal data are generated (step 210); input data 26 is retrieved from a first database 364 (step 220); the input data 26 is applied to an attention layer 147, 154 within an artificial intelligence model (step 230); a context vector 35 is generated within the attention layer 147, 154 based at least in part on the input data 26 (step 240); a predetermined attention map 154 is retrieved from a second database 366 (step 250); the attention layer 147 is modified based at least in part on the predetermined attention map 154, wherein the modification to the attention layer results in a modification to the context vector 35 (step 260); and an output 42 of the artificial intelligence model is generated based on the context vector 35, wherein the output 42 is used to provide a medical diagnosis or a treatment of a patient (step 270).
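The flow of method 200 can be sketched end to end with in-memory dictionaries standing in for the first database 364 and second database 366. This is a toy under stated assumptions: `score` and `classify` are hypothetical stand-ins for the trained model's layers, the multiplicative map combination is an assumption, and step 210 is assumed to have already populated the second database.

```python
import math


def softmax(scores):
    top = max(scores)
    exps = [math.exp(v - top) for v in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_mean(weights, states):
    return [sum(w * h[k] for w, h in zip(weights, states)) for k in range(len(states[0]))]


def method_200(first_db, second_db, patient_id, label, score, classify):
    """Sketch of steps 220-270 of FIG. 4; step 210 is assumed to have filled second_db.

    first_db: patient_id -> list of encoded input parts (hypothetical layout).
    second_db: correlated enhanced label -> predetermined attention map.
    score, classify: hypothetical stand-ins for the trained model's layers.
    """
    states = first_db[patient_id]                       # step 220: retrieve input data
    weights = softmax([score(h) for h in states])       # step 230: attention layer
    context = weighted_mean(weights, states)            # step 240: context vector
    attention_map = second_db[label]                    # step 250: retrieve the map
    masked = [w * m for w, m in zip(weights, attention_map)]
    total = sum(masked)
    if total == 0:
        raise ValueError("attention map removes all attended inputs")
    weights = [w / total for w in masked]               # step 260: modify attention layer
    new_context = weighted_mean(weights, states)        # ...which modifies the context vector
    return classify(new_context), context, new_context  # step 270: model output


first_db = {"patient-1": [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]}
second_db = {"focus:heart": [0.0, 1.0, 1.0, 0.0]}
output, before, after = method_200(
    first_db, second_db, "patient-1", "focus:heart",
    score=lambda h: 0.0,  # toy: uniform attention before the map is applied
    classify=lambda c: "flag for review" if c[1] > 0.75 else "no flag",  # toy rule
)
print(before, after, output)  # [0.5, 0.5] [0.5, 1.0] flag for review
```

The toy classifier only flags the case when the map-modified context differs from the unimodal one; it is not a diagnostic rule.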
- FIG. 5 is an exemplary schematic representation of system 300 for enhancing attention mechanisms using multimodal data.
- System 300 may be any of the systems described or otherwise envisioned herein, and may comprise any of the components described or otherwise envisioned herein.
- system 300 comprises one or more of a processor 320, memory 330, user interface 340, communications interface 350, and storage 360, interconnected via one or more system buses 310. It will be understood that the actual organization of the components of the system 300 may be different and more complex than illustrated.
- system 300 comprises a processor 320 capable of executing instructions stored in memory 330 or storage 360, or otherwise processing data, to, for example, perform one or more steps of the method 200 for enhancing attention mechanisms using multimodal data (shown in FIG. 4).
- Processor 320 may be formed of one or multiple modules.
- Processor 320 may take any suitable form, including but not limited to a microprocessor, microcontroller, multiple microcontrollers, circuitry, field programmable gate array (FPGA), application-specific integrated circuit (ASIC), a single processor, or plural processors.
- Memory 330 can take any suitable form, including a non-volatile memory and/or RAM.
- the memory 330 may include various memories such as, for example, L1, L2, or L3 cache or system memory.
- the memory 330 may include static random access memory (SRAM), dynamic RAM (DRAM), flash memory, read only memory (ROM), or other similar memory devices.
- the memory can store, among other things, an operating system.
- the RAM is used by the processor for the temporary storage of data.
- an operating system may contain code which, when executed by the processor, controls operation of one or more components of system 300. It will be apparent that, in embodiments where the processor implements one or more of the functions described herein in hardware, the software described as corresponding to such functionality in other embodiments may be omitted.
- User interface 340 may include one or more devices for enabling communication with a user.
- the user interface can be any device or system that allows information to be conveyed and/or received, and may include a display, a mouse, and/or a keyboard for receiving user commands.
- user interface 340 may include a command line interface or graphical user interface that may be presented to a remote terminal via communication interface 350.
- the user interface may be located with one or more other components of the system, or may be located remotely from the system and communicate with it via a wired and/or wireless communications network.
- Communication interface 350 may include one or more devices for enabling communication with other hardware devices.
- communication interface 350 may include a network interface card (NIC) configured to communicate according to the Ethernet protocol.
- communication interface 350 may implement a TCP/IP stack for communication according to the TCP/IP protocols.
- Various alternative or additional hardware or configurations for communication interface 350 will be apparent.
- Storage 360 may include one or more machine-readable storage media such as read only memory (ROM), random-access memory (RAM), magnetic disk storage media, optical storage media, flash-memory devices, or similar storage media.
- storage 360 may store instructions for execution by processor 320 or data upon which processor 320 may operate.
- storage 360 may store an operating system for controlling various operations of system 300.
- memory 330 may also be considered to constitute a storage device and storage 360 may be considered a memory.
- memory 330 and storage 360 may both be considered to be non-transitory machine-readable media.
- non-transitory will be understood to exclude transitory signals but to include all forms of storage, including both volatile and non-volatile memories.
- processor 320 may include multiple microprocessors that are configured to independently execute the methods described herein or are configured to perform steps or subroutines of the methods described herein such that the multiple processors cooperate to achieve the functionality described herein.
- processor 320 may include a first processor in a first server and a second processor in a second server. Many other variations and configurations are possible.
- storage 360 may store one or more algorithms and/or instructions to carry out one or more functions or steps of the methods described or otherwise envisioned herein.
- storage 360 may comprise, among other instructions, user interface instructions 362, first database 364, and second database 366
- user interface instructions 362 direct the system to receive information from and/or provide information to a user via user interface 340.
- the user interface instructions 362 may also direct the system to provide the output to the user to provide a medical diagnosis or treatment.
- first database 364 is arranged to store input data, where the input data is applied to an attention layer within an artificial intelligence model.
- second database 366 is arranged to store multimodal data, enhanced labels, and predetermined attention maps.
- the methods and algorithms disclosed herein may be applied to, for example, other variants of attention frameworks, such as hard attention mechanisms and global attention mechanisms.
- the disclosed enhanced attention mechanisms can be incorporated in any deep learning framework, such as RNN, CNN, any encoder-decoder architecture, or any other known deep learning frameworks or combinations thereof.
- the general approach of incorporating multimodal data can be applied for tasks such as speech recognition, translation, reasoning, and general classification tasks, as well as to image analysis (for example, focusing on parts of the sentence for translation or focusing on part of the image for image captioning).
- the weights learned by attention models can be readily plotted and serve as an aid for interpreting the output of the deep learning model.
- the phrase “at least one,” in reference to a list of one or more elements, should be understood to mean at least one element selected from any one or more of the elements in the list of elements, but not necessarily including at least one of each and every element specifically listed within the list of elements and not excluding any combinations of elements in the list of elements.
- This definition also allows that elements may optionally be present other than the elements specifically identified within the list of elements to which the phrase “at least one” refers, whether related or unrelated to those elements specifically identified.
- the present disclosure may be implemented as a system, a method, and/or a computer program product at any possible technical detail level of integration
- the computer program product may include a computer readable storage medium (or media) having computer readable program instructions thereon for causing a processor to carry out aspects of the present disclosure
- the computer readable storage medium can be a tangible device that can retain and store instructions for use by an instruction execution device.
- the computer readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing.
- a non-exhaustive list of more specific examples of the computer readable storage medium includes the following: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disk (DVD), a memory stick, a floppy disk, a mechanically encoded device such as punch-cards or raised structures in a groove having instructions recorded thereon, and any suitable combination of the foregoing.
- a computer readable storage medium is not to be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (e.g., light pulses passing through a fiber-optic cable), or electrical signals transmitted through a wire.
- Computer readable program instructions described herein can be downloaded to respective computing/processing devices from a computer readable storage medium or to an external computer or external storage device via a network, for example, the Internet, a local area network, a wide area network and/or a wireless network.
- the network may comprise copper transmission cables, optical transmission fibers, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers.
- a network adapter card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium within the respective computing/processing device.
- Computer readable program instructions for carrying out operations of the present disclosure may be assembler instructions, instruction-set-architecture (ISA) instructions, machine instructions, machine dependent instructions, microcode, firmware instructions, state-setting data, configuration data for integrated circuitry, or either source code or object code written in any combination of one or more programming languages, including an object oriented programming language such as Smalltalk, C++, or the like, and procedural programming languages, such as the “C” programming language or similar programming languages.
- the computer readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server.
- the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).
- electronic circuitry including, for example, programmable logic circuitry, field-programmable gate arrays (FPGA), or programmable logic arrays (PLA) may execute the computer readable program instructions by utilizing state information of the computer readable program instructions to personalize the electronic circuitry, in order to perform aspects of the present disclosure.
- the computer readable program instructions may be provided to a processor of a special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
- These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, a programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable storage medium having instructions stored therein comprises an article of manufacture including instructions which implement aspects of the function/act specified in the flowchart and/or block diagram block or blocks.
- the computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other device to cause a series of operational steps to be performed on the computer, other programmable apparatus or other device to produce a computer implemented process, such that the instructions which execute on the computer, other programmable apparatus, or other device implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
- each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s).
- the functions noted in the blocks may occur out of the order noted in the Figures.
- two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Health & Medical Sciences (AREA)
- Software Systems (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Biophysics (AREA)
- Biomedical Technology (AREA)
- Mathematical Physics (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Measuring And Recording Apparatus For Diagnosis (AREA)
Abstract
The present invention relates to methods for improving attention mechanisms using multimodal data. These methods comprise the steps of: retrieving input data from a first database; applying the input data to an attention layer within an artificial intelligence model; generating a context vector within the attention layer based, at least in part, on the input data; retrieving a predetermined attention map from a second database; modifying the attention layer based, at least in part, on the predetermined attention map, the modification to the attention layer resulting in a modification of the context vector; and generating an output of the artificial intelligence model based on the context vector, the output being used to provide a medical diagnosis or treatment of a patient.
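The abstract does not say how the predetermined attention map modifies the attention layer. The sketch below is a minimal illustration under stated assumptions, not the patented method: it assumes scaled dot-product attention over n input positions, treats the predetermined map as a non-negative vector over those positions (for example, regions marked by a clinician), and blends it into the learned weights with a hypothetical parameter `mix`. The context vector is the weighted sum c = Σᵢ αᵢ vᵢ, so changing the weights αᵢ changes c, which is the effect the abstract describes. The names `attention_with_prior` and `mix` are illustrative only.

```python
import numpy as np


def softmax(x: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    z = x - np.max(x, axis=-1, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=-1, keepdims=True)


def attention_with_prior(query, keys, values, prior_map=None, mix=0.5):
    """Scaled dot-product attention whose weights are blended with a
    predetermined attention map (hypothetical formulation).

    query:     (d,)      query vector
    keys:      (n, d)    one key per input position
    values:    (n, d_v)  one value per input position
    prior_map: (n,)      non-negative predetermined map (assumed format)
    mix:       share of the final weights taken from the prior, in [0, 1]
    Returns (weights, context).
    """
    d = query.shape[-1]
    scores = keys @ query / np.sqrt(d)  # (n,) similarity of each key to the query
    weights = softmax(scores)           # learned attention distribution
    if prior_map is not None:
        prior = np.asarray(prior_map, dtype=float)
        prior = prior / prior.sum()     # normalise the map into a distribution
        # Convex blend: weights stay non-negative and still sum to 1.
        weights = (1.0 - mix) * weights + mix * prior
    context = weights @ values          # (d_v,) context vector = weighted sum of values
    return weights, context


rng = np.random.default_rng(0)
n, d, d_v = 4, 8, 3
query = rng.normal(size=d)
keys = rng.normal(size=(n, d))
values = rng.normal(size=(n, d_v))
prior_map = np.array([0.0, 0.0, 1.0, 0.0])  # hypothetical map focusing on position 2

w_plain, c_plain = attention_with_prior(query, keys, values)
w_prior, c_prior = attention_with_prior(query, keys, values, prior_map, mix=0.5)
```

Here the prior shifts attention toward position 2 and therefore moves the context vector toward `values[2]`; with `mix=1.0` the layer ignores the learned scores entirely, and with `mix=0.0` it reduces to ordinary attention. A convex blend is only one choice; adding the log of the map to the scores before the softmax would be another plausible design.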
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US201962903162P | 2019-09-20 | 2019-09-20 | |
| US62/903,162 | 2019-09-20 | | |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2021052875A1 (fr) | 2021-03-25 |
Family ID: 72474313
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/EP2020/075425 (Ceased; published as WO2021052875A1 (fr)) | Systems and methods for incorporating multimodal data to improve attention mechanisms | 2019-09-20 | 2020-09-11 |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2021052875A1 (fr) |
Cited By (15)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| GB202108294D0 (en) | 2021-06-10 | 2021-07-28 | Prec Planting Llc | Agricultural sampling system and related methods |
| GB202108314D0 (en) | 2021-06-10 | 2021-07-28 | Prec Planting Llc | Agricultural sampling system and related methods |
| GB202108293D0 (en) | 2021-06-10 | 2021-07-28 | Prec Planting Llc | Agricultural sampling system and related methods |
| CN113822178A (zh) * | 2021-09-06 | 2021-12-21 | 中车工业研究院有限公司 | Weld seam defect identification method based on a cross-modal attention mechanism |
| CN113887582A (zh) * | 2021-09-15 | 2022-01-04 | 南方科技大学 | Image classification method, apparatus, device and storage medium |
| GB202116899D0 (en) | 2021-11-24 | 2022-01-05 | Prec Planting Llc | Agricultural sample handling system and related methods |
| WO2022243796A1 (fr) | 2021-05-20 | 2022-11-24 | Precision Planting Llc | Agricultural sampling system and related methods |
| WO2022259071A1 (fr) | 2021-06-09 | 2022-12-15 | Precision Planting Llc | Micropump |
| WO2023031725A1 (fr) | 2021-08-31 | 2023-03-09 | Precision Planting Llc | Agricultural sampling system and related methods |
| WO2023042032A1 (fr) | 2021-09-17 | 2023-03-23 | Precision Planting Llc | System and method for unloading a sample container containing an agricultural sample |
| WO2023161728A1 (fr) | 2022-02-23 | 2023-08-31 | Precision Planting Llc | Agricultural slurry sample preparation system and related methods |
| WO2023170482A1 (fr) | 2022-03-09 | 2023-09-14 | Precision Planting Llc | Soil analysis methods, systems and kits |
| WO2023227960A1 (fr) | 2022-05-24 | 2023-11-30 | Precision Planting Llc | Agricultural sample slurry analysis system and related methods |
| WO2024023731A1 (fr) | 2022-07-28 | 2024-02-01 | Precision Planting Llc | Agricultural sample packaging system |
| CN118507035A (zh) * | 2024-07-16 | 2024-08-16 | 北京大学 | Knowledge-graph-enhanced medical diagnosis method and application |
Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20180350459A1 (en) * | 2017-06-05 | 2018-12-06 | University Of Florida Research Foundation, Inc. | Methods and apparatuses for implementing a semantically and visually interpretable medical diagnosis network |
| US10380236B1 (en) * | 2017-09-22 | 2019-08-13 | Amazon Technologies, Inc. | Machine learning system for annotating unstructured text |
2020
- 2020-09-11: WO application PCT/EP2020/075425, published as WO2021052875A1 (fr); status: not active (Ceased)
Cited By (23)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2022243796A1 (fr) | 2021-05-20 | 2022-11-24 | Precision Planting Llc | Agricultural sampling system and related methods |
| WO2022243794A1 (fr) | 2021-05-20 | 2022-11-24 | Precision Planting Llc | Agricultural sampling system and related methods |
| WO2022259073A1 (fr) | 2021-06-09 | 2022-12-15 | Precision Planting Llc | Microvalve |
| WO2022259071A1 (fr) | 2021-06-09 | 2022-12-15 | Precision Planting Llc | Micropump |
| GB202108294D0 (en) | 2021-06-10 | 2021-07-28 | Prec Planting Llc | Agricultural sampling system and related methods |
| GB202108293D0 (en) | 2021-06-10 | 2021-07-28 | Prec Planting Llc | Agricultural sampling system and related methods |
| GB202108314D0 (en) | 2021-06-10 | 2021-07-28 | Prec Planting Llc | Agricultural sampling system and related methods |
| WO2023031725A1 (fr) | 2021-08-31 | 2023-03-09 | Precision Planting Llc | Agricultural sampling system and related methods |
| WO2023031727A1 (fr) | 2021-08-31 | 2023-03-09 | Precision Planting Llc | Agricultural sampling system and related methods |
| WO2023031726A1 (fr) | 2021-08-31 | 2023-03-09 | Precision Planting Llc | Agricultural sampling system and related methods |
| CN113822178A (zh) * | 2021-09-06 | 2021-12-21 | 中车工业研究院有限公司 | Weld seam defect identification method based on a cross-modal attention mechanism |
| CN113822178B (zh) * | 2021-09-06 | 2024-04-02 | 中车工业研究院有限公司 | Weld seam defect identification method based on a cross-modal attention mechanism |
| CN113887582A (zh) * | 2021-09-15 | 2022-01-04 | 南方科技大学 | Image classification method, apparatus, device and storage medium |
| WO2023042032A1 (fr) | 2021-09-17 | 2023-03-23 | Precision Planting Llc | System and method for unloading a sample container containing an agricultural sample |
| GB202116899D0 (en) | 2021-11-24 | 2022-01-05 | Prec Planting Llc | Agricultural sample handling system and related methods |
| WO2023161728A1 (fr) | 2022-02-23 | 2023-08-31 | Precision Planting Llc | Agricultural slurry sample preparation system and related methods |
| WO2023170482A1 (fr) | 2022-03-09 | 2023-09-14 | Precision Planting Llc | Soil analysis methods, systems and kits |
| WO2023227960A1 (fr) | 2022-05-24 | 2023-11-30 | Precision Planting Llc | Agricultural sample slurry analysis system and related methods |
| WO2024023731A1 (fr) | 2022-07-28 | 2024-02-01 | Precision Planting Llc | Agricultural sample packaging system |
| WO2024023729A1 (fr) | 2022-07-28 | 2024-02-01 | Precision Planting Llc | System for packaging an agricultural sample and related methods |
| WO2024023728A1 (fr) | 2022-07-28 | 2024-02-01 | Precision Planting Llc | Agricultural product sample packaging system |
| CN118507035A (zh) * | 2024-07-16 | 2024-08-16 | 北京大学 | Knowledge-graph-enhanced medical diagnosis method and application |
| CN118507035B (zh) * | 2024-07-16 | 2024-10-29 | 北京大学 | Knowledge-graph-enhanced medical diagnosis method and application |
Similar Documents
| Publication | Title |
|---|---|
| WO2021052875A1 (fr) | Systems and methods for incorporating multimodal data to improve attention mechanisms |
| US11688518B2 (en) | Deep neural network based identification of realistic synthetic images generated using a generative adversarial network |
| US11901047B2 (en) | Medical visual question answering |
| Kurmann et al. | Simultaneous recognition and pose estimation of instruments in minimally invasive surgery |
| JP7391846B2 (ja) | Computer-aided diagnosis using deep neural networks |
| US11544529B2 (en) | Semi-supervised classification with stacked autoencoder |
| US10606982B2 (en) | Iterative semi-automatic annotation for workload reduction in medical image labeling |
| US20210319340A1 (en) | Machine learning model confidence score validation |
| US20240152767A1 (en) | Visual question answering with unlabeled image augmentation |
| Alsharid et al. | Captioning ultrasound images automatically |
| WO2021247962A1 (fr) | Classification of out-of-distribution data using a contrastive loss |
| KR20240073790A (ko) | Method and system for training an image classification model for multi-label images, and method for classifying images using the image classification model |
| Spinks et al. | Justifying diagnosis decisions by deep neural networks |
| Khan et al. | Surgical scene understanding in the era of foundation AI models: A comprehensive review |
| CN114266777B (zh) | Segmentation model training method, segmentation method, apparatus, electronic device and medium |
| JP7658692B2 (ja) | Normalizing OCT image data |
| CN110706200B (zh) | Data prediction method and apparatus |
| US20250384666A1 (en) | Selecting in-context demonstration examples using difficulty classifications |
| Hasan et al. | Stamp: A self-training student-teacher augmentation-driven meta pseudo-labeling framework for 3D cardiac MRI image segmentation |
| Møller et al. | NEMt: Fast targeted explanations for medical image models via neural explanation masks |
| EP4616380A1 (fr) | Image processing method, electronic device and program product |
| Dornier et al. | Scaf: Skip-connections in auto-encoder for face alignment with few annotated data |
| Gupta et al. | Liver Tumor Segmentation with U-Net, V-Net, and AH-Net Using MONAI |
| US20260050795A1 (en) | Visual retrieval augmented generation for multimodal large language models |
| Xia et al. | A medical visual question-answering model based on multi-scale feature fusion and question feature enhancement |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 20771823; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | EP: PCT application non-entry in European phase | Ref document number: 20771823; Country of ref document: EP; Kind code of ref document: A1 |