WO2021134027A1 - Predicting and addressing severe disease in individuals with sepsis - Google Patents
- Publication number
- WO2021134027A1 (PCT/US2020/067038)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- individual
- sample
- level
- residue sum
- phosphatidylcholine
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H40/00—ICT specially adapted for the management or administration of healthcare resources or facilities; ICT specially adapted for the management or operation of medical equipment or devices
- G16H40/60—ICT specially adapted for the management or administration of healthcare resources or facilities; ICT specially adapted for the management or operation of medical equipment or devices for the operation of medical equipment or devices
- G16H40/67—ICT specially adapted for the management or administration of healthcare resources or facilities; ICT specially adapted for the management or operation of medical equipment or devices for the operation of medical equipment or devices for remote operation
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/30—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for calculating health indices; for individual health risk assessment
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B5/00—Measuring for diagnostic purposes; Identification of persons
- A61B5/145—Measuring characteristics of blood in vivo, e.g. gas concentration or pH-value ; Measuring characteristics of body fluids or tissues, e.g. interstitial fluid or cerebral tissue
- A61B5/14546—Measuring characteristics of blood in vivo, e.g. gas concentration or pH-value ; Measuring characteristics of body fluids or tissues, e.g. interstitial fluid or cerebral tissue for measuring analytes not otherwise provided for, e.g. ions, cytochromes
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B5/00—Measuring for diagnostic purposes; Identification of persons
- A61B5/48—Other medical applications
- A61B5/4836—Diagnosis combined with treatment in closed-loop systems or methods
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B5/00—Measuring for diagnostic purposes; Identification of persons
- A61B5/48—Other medical applications
- A61B5/4836—Diagnosis combined with treatment in closed-loop systems or methods
- A61B5/4839—Diagnosis combined with treatment in closed-loop systems or methods combined with drug delivery
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B5/00—Measuring for diagnostic purposes; Identification of persons
- A61B5/48—Other medical applications
- A61B5/4842—Monitoring progression or stage of a disease
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B5/00—Measuring for diagnostic purposes; Identification of persons
- A61B5/72—Signal processing specially adapted for physiological signals or for diagnostic purposes
- A61B5/7235—Details of waveform analysis
- A61B5/7264—Classification of physiological signals or data, e.g. using neural networks, statistical classifiers, expert systems or fuzzy systems
- A61B5/7267—Classification of physiological signals or data, e.g. using neural networks, statistical classifiers, expert systems or fuzzy systems involving training the classification device
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B5/00—Measuring for diagnostic purposes; Identification of persons
- A61B5/72—Signal processing specially adapted for physiological signals or for diagnostic purposes
- A61B5/7271—Specific aspects of physiological measurement analysis
- A61B5/7275—Determining trends in physiological measurement data; Predicting development of a medical condition based on physiological measurements, e.g. determining a risk factor
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
- G06N20/20—Ensemble learning
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
- G16B40/20—Supervised data analysis
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H20/00—ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance
- G16H20/10—ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance relating to drugs or medications, e.g. for ensuring correct administration to patients
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H20/00—ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance
- G16H20/40—ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance relating to mechanical, radiation or invasive therapies, e.g. surgery, laser therapy, dialysis or acupuncture
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/20—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/50—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for simulation or modelling of medical disorders
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/70—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A90/00—Technologies having an indirect contribution to adaptation to climate change
- Y02A90/10—Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation
Definitions
- Described herein are methods, systems, and computational environments for stratifying individuals with sepsis or at risk of developing sepsis, and for predicting severe disease in individuals with sepsis or at risk of developing sepsis. Also described are systems and methods for generating topological networks and clusters identifying disease-response phenotypes, systems and methods for selecting prognostic or diagnostic features and host biomarkers, and systems and methods for predicting clinical outcomes. Also described are methods of detecting panels of host biomarkers, methods of assessing risk factors in an individual with sepsis or at risk of developing sepsis, and methods of treating a patient determined to have an elevated risk of severe disease from sepsis.
- Expeditious and accurate information for clinical decision-making is critical for improving outcomes for infectious disease patients, particularly if a dysregulated host response to the infection leads to the potentially life-threatening organ dysfunction known as sepsis.
- Early recognition and characterization of an infection and the ensuing host response are essential components for preventing the development and/or mitigating the severity of sepsis.
- current diagnostic and prognostic assays are either insensitive or not expediently useful, if available at all.
- the use of specific host response biomarkers can improve our ability to quickly and accurately phenotype infectious disease states and predict their clinical course. This will be highly informative not just in traditional clinical settings, but also in low resource environments, military operations, and for at-home monitoring.
- Described herein are methods of stratifying individuals with sepsis or at risk of developing sepsis; predicting severe disease in an individual with sepsis, including prior to the detection of symptoms thereof and/or prior to the onset of any detectable symptoms thereof; identifying disease-response phenotypes and associated diagnostic or prognostic host biomarker panels; and related methods of treatment targeted toward disease-response phenotypes.
- the present disclosure also provides methods of treating individuals with sepsis determined to have an increased risk of severe disease, optionally before the onset of any detectable symptoms thereof, such as before there are perceivable, noticeable, or measurable signs of severe disease in the individual.
- treatments may include: initiation or broadening of antibiotic therapy, balancing fluids and electrolytes, renal replacement therapy, adjustment of mechanical ventilation, targeted or empiric anti-inflammatory or immunomodulatory drugs, hemodynamic adjustments, calcium channel blocker medications, or surgical intervention.
- Benefits of such early treatment may include: reduced severity or duration of symptoms, reduced need for organ support (e.g., ventilation, renal replacement therapy, or vasoactive medications), reduced length of stay in a hospital or intensive care unit, reduced risk of mortality, reduced long-term morbidity (e.g., time to returning to activities or quality of life), decreased incidence of long-term sequelae of infectious diseases (e.g., chronic kidney disease, cardiovascular disease, or chronic pulmonary disease), decreased re-hospitalization rates, and/or reduced medical costs.
- methods for predicting severe disease in an individual with sepsis or at risk of developing sepsis comprising: generating a discovery database storing first values of a plurality of clinical parameters and clinical outcomes associated with a plurality of first subjects; executing a plurality of data quality control algorithms to select a subset of clinical parameters from the plurality of clinical parameters; executing a plurality of topological data analysis and/or clustering algorithms for the plurality of subsets of clinical parameters; executing a plurality of feature selection machine learning and/or ensemble learning models based on a plurality of classification and/or time-to-event analysis algorithms; and outputting a model for predicting severe disease in the individual with sepsis or at risk of developing sepsis.
- methods for generating a model predicting severe disease in an individual with sepsis or at risk of developing sepsis comprising: generating a discovery database storing first values of a plurality of clinical parameters and clinical outcomes associated with a plurality of first subjects; executing a plurality of data quality control algorithms to select a subset of clinical parameters from the plurality of clinical parameters; executing a plurality of topological data analysis and/or clustering algorithms for the plurality of subsets of clinical parameters; executing a plurality of feature selection machine learning and/or ensemble learning models based on a plurality of classification and/or time-to-event analysis algorithms; and outputting a model for predicting severe disease in the individual with sepsis or at risk of developing sepsis.
- methods for pre-processing data that is stored in the discovery database including: determining that a first value of at least one of the plurality of clinical parameters is missing; estimating a reference value for the at least one of the plurality of clinical parameters that is missing; and storing the reference value as the first value of the at least one of the plurality of clinical parameters in the discovery database.
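The pre-processing steps above (detect a missing value, estimate a reference value, store it) can be sketched as follows. The text does not say how the reference value is estimated, so this sketch assumes the mean of the observed values; the record layout, the `lactate` parameter, and the function name `impute_missing` are hypothetical.

```python
from statistics import mean


def impute_missing(records, parameter):
    """Return copies of the records with missing values of one clinical
    parameter replaced by a reference value.

    Assumption: the reference value is the mean of the observed values.
    The source only says a reference value is "estimated".
    """
    observed = [r[parameter] for r in records if r.get(parameter) is not None]
    if not observed:
        raise ValueError(f"no observed values for {parameter!r}")
    reference = mean(observed)

    completed = []
    for record in records:
        row = dict(record)  # copy, so the input records are left unchanged
        if row.get(parameter) is None:
            row[parameter] = reference
        completed.append(row)
    return completed


# Hypothetical discovery records; the second subject has no lactate value.
discovery = [
    {"subject": "S1", "lactate": 1.0},
    {"subject": "S2", "lactate": None},
    {"subject": "S3", "lactate": 3.0},
]
completed = impute_missing(discovery, "lactate")
```

A k-nearest neighbor imputation, which the disclosure lists among its data quality control algorithms, would replace the single cohort mean with an average over the most similar subjects.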
- the plurality of data quality control algorithms comprise at least one of: differential expression algorithms, principal component analysis, k-nearest neighbor imputation algorithms, three-sigma rule algorithms, and empirical Bayes method algorithms. While these algorithms are enumerated for data quality control, many others are contemplated.
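As one illustration of the listed quality control algorithms, the three-sigma rule flags values that lie more than three standard deviations from the mean. This is a minimal sketch using the population standard deviation; the function name and the example readings are hypothetical, and the disclosure does not state what is done with flagged values.

```python
from statistics import mean, pstdev


def three_sigma_outliers(values, k=3.0):
    """Return the indices of values more than k standard deviations
    from the mean (k = 3 gives the three-sigma rule)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values identical, nothing to flag
    return [i for i, v in enumerate(values) if abs(v - mu) > k * sigma]


# Hypothetical readings: twenty similar values and one extreme value.
readings = [10.0] * 20 + [100.0]
flagged = three_sigma_outliers(readings)
```

With the population standard deviation, a single value can exceed three sigma only when the sample has at least eleven observations, so the rule is not informative on very small cohorts.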
- the clinical parameter data is stratified using topological data analysis and/or cluster analysis, wherein disease-response phenotypes are defined based on the identified clusters.
- the cluster analysis comprises at least one of: k-means clustering, hierarchical clustering, nearest neighbor clustering, non-linear clustering (e.g., t-distributed stochastic neighbor embedding), consensus clustering, or spectral clustering. While these algorithms are enumerated for cluster analysis, many others are contemplated.
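Of the listed methods, k-means is the simplest to show in full. The sketch below is a plain-Python version of Lloyd's algorithm, offered as an illustration rather than the implementation used in the disclosure. It seeds the centroids from the first k points so that the result is reproducible, which is an assumption; the two-dimensional example points are hypothetical.

```python
import math


def kmeans(points, k, max_iter=100):
    """Group points into k clusters by alternating assignment and
    centroid update until the labels stop changing."""
    # Assumption: deterministic seeding from the first k points.
    centroids = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(
                    sum(axis) / len(members) for axis in zip(*members)
                )
    return labels, centroids


# Hypothetical two-parameter measurements for six subjects.
subjects = [(1.0, 1.0), (9.0, 9.0), (1.2, 0.8), (8.8, 9.2), (0.9, 1.1), (9.1, 8.9)]
labels, centroids = kmeans(subjects, k=2)
```

Each resulting label would correspond to one candidate disease-response phenotype in the sense used above.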
- topological data analysis uses the Mapper algorithm as an alternative to canonical cluster analysis.
- a topological network is generated in which individuals or samples group together based on their similarities in the plurality of subsets of clinical parameters, as well as the algebraic topology of the same data. Clusters are then delineated based on the persistent homology of node density and connectivity (edges).
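The network construction can be illustrated with a heavily simplified Mapper-style sketch: cover the range of a lens function with overlapping intervals, cluster the samples that fall in each interval, and connect clusters that share samples. The lens, the interval count, the overlap, and the distance threshold `eps` are all assumptions, and the persistent-homology step that delineates clusters is not implemented here.

```python
import math


def _single_linkage(points, members, eps):
    """Connected components of the graph that joins members whose
    distance is at most eps."""
    remaining = set(members)
    clusters = []
    while remaining:
        seed = remaining.pop()
        cluster, frontier = {seed}, [seed]
        while frontier:
            current = frontier.pop()
            near = {
                j for j in remaining
                if math.dist(points[current], points[j]) <= eps
            }
            remaining -= near
            cluster |= near
            frontier.extend(near)
        clusters.append(frozenset(cluster))
    return clusters


def mapper_graph(points, lens, n_intervals=2, overlap=0.25, eps=1.5):
    """Return (nodes, edges): nodes are sets of sample indices, edges
    join nodes that have at least one sample in common."""
    values = [lens(p) for p in points]
    lo, hi = min(values), max(values)
    length = (hi - lo) / n_intervals or 1.0  # guard against a flat lens
    nodes = []
    for i in range(n_intervals):
        start = lo + i * length - overlap * length
        end = lo + (i + 1) * length + overlap * length
        members = [j for j, v in enumerate(values) if start <= v <= end]
        nodes.extend(_single_linkage(points, members, eps))
    edges = {
        (a, b)
        for a in range(len(nodes))
        for b in range(a + 1, len(nodes))
        if nodes[a] & nodes[b]
    }
    return nodes, edges


# Hypothetical one-dimensional samples; the lens is the coordinate itself.
samples = [(0.0,), (1.0,), (2.0,), (3.0,), (4.0,), (5.0,)]
nodes, edges = mapper_graph(samples, lens=lambda p: p[0])
```

Here the two overlapping intervals each yield one node, and the shared samples produce the single edge between them.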
- the feature selection machine learning models comprise at least one of: unsupervised machine learning algorithm, supervised machine learning algorithm, minimum redundancy maximum relevance, Student’s t-test, Mann-Whitney U test, random forest, logistic regression, or neural networks.
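As an example of one listed option, candidate biomarkers can be ranked by how well they separate two outcome groups using the Mann-Whitney U statistic. This sketch ranks features by the magnitude of the rank-biserial correlation derived from U; using that effect size as the ranking score is an assumption, and the marker names and values are hypothetical.

```python
def mann_whitney_u(a, b):
    """U statistic for group a: the number of (x, y) pairs with x > y,
    counting ties as one half."""
    return sum(
        1.0 if x > y else 0.5 if x == y else 0.0
        for x in a
        for y in b
    )


def rank_features(severe, non_severe):
    """Order feature names by |rank-biserial correlation|, strongest first.

    severe, non_severe: dicts mapping feature name -> list of values.
    """
    scores = {}
    for name in severe:
        n_pairs = len(severe[name]) * len(non_severe[name])
        u = mann_whitney_u(severe[name], non_severe[name])
        scores[name] = abs(2 * u / n_pairs - 1)
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical marker levels in subjects with and without severe disease.
severe = {"marker_a": [5, 6, 7], "marker_b": [1, 5, 3]}
non_severe = {"marker_a": [1, 2, 3], "marker_b": [2, 4, 3]}
ranking = rank_features(severe, non_severe)
```

In practice the statistic would be accompanied by a p-value and a multiple-testing correction, neither of which is shown.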
- the feature selection ensemble learning models include combinations of the models described herein for cluster analysis and machine learning.
- the feature selection ensemble learning models may comprise: cluster analysis, unsupervised machine learning algorithm, supervised machine learning algorithm, minimum redundancy maximum relevance, Student’s t-test, Mann-Whitney U test, random forest, logistic regression, neural networks, or a combination thereof.
- Ensembles may also comprise: Bayes optimal classifier, classification and regression tree, bootstrap aggregating, boosting, Bayesian model averaging, Bayesian model combination, a bucket of models, stacking, or a combination thereof.
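Bootstrap aggregating, one of the listed ensemble techniques, can be sketched as follows: fit one simple model per bootstrap resample and combine the models by majority vote. The base learner here is a one-feature threshold placed midway between the class means, chosen only to keep the example short; the data, names, and parameters are hypothetical.

```python
import random
from statistics import mean


def fit_stump(sample):
    """Fit a one-feature threshold classifier at the midpoint between
    the two class means. sample: list of (value, label) with labels 0/1."""
    m0 = mean(x for x, y in sample if y == 0)
    m1 = mean(x for x, y in sample if y == 1)
    threshold = (m0 + m1) / 2
    high_label = 1 if m1 > m0 else 0
    return lambda x: high_label if x > threshold else 1 - high_label


def bagging(data, n_models=25, seed=0):
    """Train n_models stumps on bootstrap resamples and return a
    majority-vote predictor."""
    rng = random.Random(seed)
    models = []
    while len(models) < n_models:
        sample = rng.choices(data, k=len(data))  # sample with replacement
        # Skip resamples that happen to contain only one class.
        if {y for _, y in sample} == {0, 1}:
            models.append(fit_stump(sample))

    def predict(x):
        votes = sum(model(x) for model in models)
        return 1 if votes * 2 > len(models) else 0

    return predict


# Hypothetical biomarker level paired with a severe-disease label.
training = [(1.0, 0), (2.0, 0), (3.0, 0), (7.0, 1), (8.0, 1), (9.0, 1)]
predict = bagging(training)
```

An odd number of models avoids tied votes for a two-class outcome.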
- the plurality of biological parameters comprise one or more protein data markers, one or more nucleic acid data markers, one or more metabolite data markers, one or more clinical outcomes data, one or more administrative health data, or a combination thereof.
- systems for generating a machine learning engine for predicting severe disease in an individual with sepsis or at risk of developing sepsis comprising: one or more processors; a memory; a communication platform; a discovery database configured to store first values of a plurality of clinical parameters and clinical outcomes associated with a plurality of first subjects; a machine learning engine configured to: execute a plurality of data quality control algorithms to select a subset of clinical parameters from the plurality of clinical parameters; execute a plurality of topological data analysis and/or clustering algorithms for the plurality of subsets of clinical parameters; execute a plurality of feature selection machine learning and/or ensemble learning models based on a plurality of classification and/or time-to-event analysis algorithms; output
- systems for predicting severe disease in an individual with sepsis or at risk of developing sepsis comprising: one or more processors; a memory; a communication platform; a discovery database configured to store first values of a plurality of clinical parameters and the clinical outcomes associated with a plurality of first subjects with sepsis or at risk of developing sepsis; a machine learning engine configured to pre-train a model for severe disease in an individual with sepsis or at risk of developing sepsis, wherein the model is pre-trained by performing operations comprising: executing a plurality of data quality control algorithms to select a subset of clinical parameters from the plurality of clinical parameters; executing a plurality of topological data analysis and/or clustering algorithms for the plurality of subsets of clinical parameters; executing a plurality of feature selection machine learning and/or ensemble learning models based on a plurality of classification and/or time-to-event analysis algorithms; outputting a model for predicting severe disease in the individual
- a non-transitory computer-readable medium having information recorded thereon for generating a model for predicting severe disease in an individual with sepsis or at risk of developing sepsis, wherein the information, when read by a computer, causes the computer to perform operations of: generating a discovery database storing first values of a plurality of clinical parameters and clinical outcomes associated with a plurality of first subjects; executing a plurality of data quality control algorithms to select a subset of clinical parameters from the plurality of clinical parameters; executing a plurality of topological data analysis and/or clustering algorithms for the plurality of subsets of clinical parameters; executing a plurality of feature selection machine learning and/or ensemble learning models based on a plurality of classification and/or time-to-event analysis algorithms; outputting a model for predicting severe disease in the individual with sepsis or at risk of developing sepsis.
- FIG. 1 depicts a method of predicting severe disease in individuals with sepsis or at risk of developing sepsis, through a process of acquisition of discovery data, data quality control in a data quality control engine, topological data analysis and/or clustering in a data stratification engine, feature selection and classification and/or time-to-event analyses in a feature selection and outcome modeling engine, and predicting severe disease in individuals with sepsis or at risk of developing sepsis in a prediction engine.
- FIG. 2 illustrates a block diagram for a severe disease in sepsis prediction system for predicting severe disease in an individual with sepsis or at risk of developing sepsis, as described herein.
- FIG. 3 illustrates a flow-chart for a severe disease in sepsis prediction system and the data flow at each stage of the system.
- FIG. 4 illustrates an embodiment of a computational environment that involves a computing device, a network, and a remote device.
- FIG. 5 illustrates an example of an Austere Environments Consortium for Enhanced Sepsis Outcomes (ACESO) flow chart for a sepsis host biomarker discovery phase.
- FIG. 6 illustrates an example of topological data analysis networks of blood plasma gene expression in an ACESO discovery cohort.
- FIG. 7 illustrates an example of topological data analysis networks of blood plasma protein expression in an ACESO discovery cohort.
- FIG. 8 illustrates an example of a classification and regression tree output from an ensemble machine learning model prognosing risk of hospital admission in COVID-19 patients based on blood cytokine levels and basic demographics.
- the present disclosure provides methods of predicting severe disease and adjusting treatments for individuals with sepsis or at risk of developing sepsis, optionally before the onset of detectable symptoms thereof, such as before there are perceivable, noticeable, or measurable signs of severe disease in the individual.
- the individuals may be undergoing established treatment, and based on the clinical outcome predicted by the methods described herein adjustment can be made for more appropriate treatment.
- the present disclosure provides methods for predicting severe disease and adjusting treatments for individuals with sepsis or at risk of developing sepsis that are applicable to most, if not all, populations in different parts of the world.
- the present disclosure also provides methods of treating individuals with sepsis determined to have an increased risk of severe disease, optionally before the onset of detectable symptoms thereof, such as before there are perceivable, noticeable or measurable signs of severe disease in the individual.
- treatments may include: initiation or broadening of antibiotic therapy, balancing fluids and electrolytes, renal replacement therapy, adjustment of mechanical ventilation, targeted or empiric anti-inflammatory or immunomodulatory drugs, hemodynamic adjustments, calcium channel blocker medications, or surgical intervention.
- Benefits of such early treatment may include: reduced severity or duration of sepsis, reduced need for organ support (e.g., ventilation, renal replacement therapy, or vasoactive medications), reduced length of stay in a hospital or intensive care unit, reduced risk of mortality, reduced long-term morbidity (e.g., time to returning to activities or quality of life), decreased incidence of long-term sequelae of infectious diseases (e.g., chronic kidney disease, cardiovascular disease, or chronic pulmonary disease), decreased re-hospitalization rates, and/or reduced medical costs.
- adjusting current treatment comprises changing the dose of the current antibiotic, changing to a different antibiotic, changing the dose of non-steroidal anti-inflammatory drugs, or initiating or adjusting insulin therapy.
- the present disclosure also provides using the methods described herein to monitor patients to help clinicians make decisions on adjusting treatments, when necessary.
- administer refers to (1) providing, giving, dosing and/or prescribing, such as by either a health professional or his or her authorized agent or under their direction, and (2) putting into, taking or consuming, such as by a health professional or the individual, and is not limited to any specific dosage forms or routes of administration, unless otherwise stated.
- treat include alleviating, abating or ameliorating sepsis or one or more symptoms thereof, whether or not sepsis is considered to be “cured” or “healed” and whether or not all symptoms are fully resolved.
- the terms “ameliorating” or “preventing” progression of sepsis include alleviating or preventing the development of one or more symptoms thereof, or impeding or preventing an underlying mechanism of severe disease, and achieving any therapeutic and/or prophylactic benefit.
- sepsis refers to the potentially life-threatening physical reaction of the host to an infection.
- the term “at risk of developing sepsis” refers to an individual being infected by a pathogen, which may result in them developing sepsis.
- pathogens include, but are not limited to: viruses (e.g., influenza, ebolaviruses, SARS-CoV-2), bacteria (e.g., Escherichia coli, Mycobacterium tuberculosis, Salmonella sp., Leptospira sp., Rickettsia sp., Burkholderia pseudomallei), fungi (e.g., Aspergillus sp., Candida sp., Histoplasma sp., Pneumocystis jirovecii), or parasites (e.g., Plasmodium sp., Trypanosoma cruzi). Whilst infection by a pathogen is a prerequisite for developing sepsis, it is understood that not all infected
- severe disease is defined as sepsis with any degree of end organ damage (e.g., kidney, respiratory, or liver failure). Sepsis patients who go on to develop severe disease will require significant medical intervention (e.g., admission to a hospital or intensive care unit, ventilation, renal replacement therapy) in order to avert permanent physical damage, long-term sequelae, and/or death.
- the terms “marker” and “biomarkers” are used interchangeably to refer to a measurable substance from a biological sample.
- these can comprise one or more protein data markers, one or more nucleic acid data markers, one or more metabolite data markers, or a combination thereof.
- the term “host biomarker” further indicates that the measurable substance is derived from the infected individual, rather than the infecting pathogen.
- stratification refers to the division of a group of individuals into subgroups, based on one or more shared characteristics, such as derived from the observable or measured biological parameters.
- the division can be based on a characteristic already known to be relevant to the outcome, such as age, sex, or having a pre-existing condition, or it can be based on clusters identified in observable or measured biological parameters using any of a variety of data cluster analysis techniques.
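Stratification on a known characteristic amounts to grouping individuals by a key. The sketch below is a minimal illustration; the 65-year cut-off, the record fields, and the function name are hypothetical and are not taken from the disclosure.

```python
from collections import defaultdict


def stratify(individuals, key):
    """Divide individuals into subgroups keyed by key(individual)."""
    strata = defaultdict(list)
    for person in individuals:
        strata[key(person)].append(person["id"])
    return dict(strata)


def age_band(person):
    """Assumed two-band split; any clinically motivated cut-off could be used."""
    return "65+" if person["age"] >= 65 else "<65"


# Hypothetical cohort records.
cohort = [
    {"id": "S1", "age": 70},
    {"id": "S2", "age": 40},
    {"id": "S3", "age": 66},
]
strata = stratify(cohort, age_band)
```

Replacing `age_band` with a function that returns a cluster label gives the cluster-based form of stratification described in the same passage.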
- clustering refers to the grouping of individuals or samples based on one or more shared characteristics, such as derived from the observable or measured biological parameters. For example, these can comprise one or more host biomarkers, one or more clinical outcomes data, one or more administrative health data, or a combination thereof. Clustering is performed using dedicated mathematical algorithms, here primarily via topological data analysis or cluster analysis methods.
- data quality control refers to analytic approaches including visual and mathematical approaches to cleaning data, reformatting data, applying missing data algorithms, normalizing data, standardizing data, and/or reducing the dimensionality of data based on specific criteria.
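Standardizing a parameter is one of the steps named above. A common form, assumed here because the text does not specify one, is the z-score: subtract the mean and divide by the standard deviation so that parameters measured in different units become comparable. The function name and example values are hypothetical.

```python
from statistics import mean, pstdev


def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # constant parameter carries no signal
    return [(v - mu) / sigma for v in values]


# Hypothetical measurements of one clinical parameter.
z_scores = standardize([2.0, 4.0, 6.0])
```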
- topological data analysis refers to the analysis of datasets using techniques from topology, a study of the properties of a geometric space which allows defining continuous deformation of subspaces. Extraction of information from datasets that are high-dimensional, incomplete, and noisy is generally challenging.
- TDA methods, such as the “Mapper” algorithm, enable dimensionality reduction, visualization, and clustering of complex data sets.
- ensemble learning refers to the use of multiple learning algorithms described herein to obtain better predictive performance than could be obtained from any of the constituent learning algorithms alone.
- the terms “individual”, “subject”, “patient”, or “test individual” indicate a mammal, in particular a human or non-human primate.
- the test individual may or may not be in need of an assessment of sepsis and/or severe disease.
- the test individual is assessed prior to the detection of symptoms of sepsis.
- the test individual is assessed prior to the onset of any detectable symptoms of sepsis.
- the test individual does not have detectable symptoms of any type of sickness or condition.
- the test individual has an exposure, injury, wound, or condition that puts them at risk of developing sepsis, such as: having a viral or bacterial infection, such as but not limited to: urinary tract infection, meningitis, endocarditis, or septic arthritis; undergoing a medical, surgical, or dental procedure; having an open wound or trauma, such as but not limited to: a blast injury, a crush injury, an extremity wound, a gunshot wound, or a wound received in combat; suffering a nosocomial infection; having undergone medical interventions such as central line placement or intubation; having diabetes; being HIV positive; undergoing hemodialysis; and/or undergoing an organ transplant procedure (donor or receiver).
- the individual does not have a condition that puts them at risk of severe disease from sepsis, prior to application of the methods described herein.
- the individual has a condition that puts them at risk of severe disease from sepsis.
- the term “clinical outcome” indicates a measurable status or change in the health, function or quality of life of an individual with sepsis or at risk of developing sepsis. Examples include, but are not limited to: severity or duration of symptoms, need for organ support (e.g., ventilation, renal replacement therapy, or vasoactive medications), response to treatment, admission to a hospital or intensive care unit, length of stay in a hospital or intensive care unit, mortality, long-term morbidity (e.g., time to returning to activities or quality of life), incidence of long-term sequelae of infectious diseases (e.g., chronic kidney disease, cardiovascular disease, or chronic pulmonary disease), and re-hospitalization.
- Clinical outcomes may be recorded as categorical data (e.g., “yes/no”, “presence/absence”, an ordinal scale), continuous data (e.g., blood pressure), temporal data (e.g., duration of symptoms, days hospitalized), or time-to-event data (e.g., days to death, time to return to normal daily activities).
- the term “increased risk” or “elevated risk” indicates that the test individual has an increased chance of severe disease from sepsis.
- the reference individual is the test individual at an earlier time point, including prior to having an exposure, injury, wound, or condition that puts them at risk of severe disease from sepsis, or at an earlier point in time after having such an exposure, injury, wound, or condition.
- the increased risk may be relative or absolute and may be expressed qualitatively or quantitatively. For example, an increased risk may be expressed as simply determining the individual’s risk profile and placing them in an “increased risk” category, based upon previous studies. Alternatively, a numerical expression of the individual’s increased risk may be determined based upon the risk profile.
- examples of expressions of an increased risk include, but are not limited to: odds, probability, odds ratio, p-value, attributable risk, biomarker index score, relative frequency, positive predictive value, negative predictive value, risk, relative risk, hazard, and hazard ratio.
- Risk may be determined based on predicting a specific clinical outcome in the individual; for example, predicted outcome may include an indication of whether the individual will or will not experience a specific clinical event within a specific timeframe, or an indication of a likelihood that the individual will or will not experience a specific clinical event within a specific timeframe.
- the attributable risk (AR) can also be used to express an increased risk.
- the AR describes the proportion of individuals in a population exhibiting a specific outcome (e.g., mortality, hospitalization, or long-term sequelae) that is attributable to a specific member of the risk profile.
- AR may also be important in quantifying the role of individual components (specific member) in condition etiology and in terms of the public health impact of the individual risk factor.
- the public health relevance of the AR measurement lies in estimating the proportion of cases of a clinical outcome among individuals in the population that could be prevented if the profile or individual factor were absent.
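- As an illustrative sketch only, several of the risk expressions listed above (relative risk, odds ratio, and the population attributable fraction that the AR estimates) can be computed from a two-by-two table of risk factor against outcome. The function name, argument names, and counts below are hypothetical and are not taken from the disclosure.

```python
def risk_measures(a, b, c, d):
    """Risk expressions from a 2x2 table (hypothetical helper).

    a: risk factor present, outcome present
    b: risk factor present, outcome absent
    c: risk factor absent,  outcome present
    d: risk factor absent,  outcome absent
    """
    risk_with_factor = a / (a + b)
    risk_without_factor = c / (c + d)
    risk_overall = (a + c) / (a + b + c + d)
    return {
        "relative_risk": risk_with_factor / risk_without_factor,
        "odds_ratio": (a * d) / (b * c),
        # Share of all cases that would be avoided if the factor were absent.
        "attributable_fraction": (risk_overall - risk_without_factor) / risk_overall,
    }


# Made-up counts for illustration only.
measures = risk_measures(a=30, b=70, c=10, d=90)
print(measures)
```

With these made-up counts, the risk is 0.3 with the factor and 0.1 without it, so the relative risk is about 3 and half of all cases are attributable to the factor.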
- Clinical parameters include various factors associated with an individual experiencing symptoms of a disease or condition, or with measurable changes in health, function, or quality of life.
- Examples of clinical parameters of an individual include, but are not limited to: proteins, nucleic acids, metabolites, clinical outcomes, clinical laboratory data, physiological monitoring data, and administrative health data.
- nucleic acids include, but are not limited to the level of any one or more of the following in a biological sample from the individual: adhesion G protein-coupled receptor E1 (ADGRE1), adrenoceptor β2 (ADRB2), angiotensin II receptor associated protein (AGTRAP), AKT serine/threonine kinase 1 (AKT1), 5'-aminolevulinate synthase 2 (ALAS2), alkaline phosphatase, biomineralization associated (ALPL), ankyrin repeat domain 22 (ANKRD22), annexin A3 (ANXA3), arginase 1 (ARG1), BCL2 like 1 (BCL2L1), BMX non-receptor tyrosine kinase (BMX), chromosome 6 open reading frame 62 (C6orf62), carbonic anhydrase 2 (CA2), C-C motif chemokine ligand 5 (CCL5),
- the genes are protein-coding genes.
- the genes are at least one or more of: adrenoceptor β2 (ADRB2), CD177 molecule (CD177), carboxypeptidase vitellogenic like (CPVL), C-X3-C motif chemokine receptor 1 (CX3CR1), defensin α3 (DEFA3), Fc receptor like 5 (FCRL5), G protein subunit γ2 (GNG2), interleukin 10 receptor subunit α (IL10RA), kinesin light chain 3 (KLC3), oleoyl-ACP hydrolase (OLAH), pyruvate kinase M1/2 (PKM), radical S-adenosyl methionine domain containing 2 (RSAD2), STE20 related adaptor β (STRADB), tyrosylprotein sulfotransferase 1 (TPST1), tetraspanin 5 (TSPAN5), t
- proteins include, but are not limited to the level of any one or more of the following in a biological sample from the individual: a disintegrin and metalloproteinase with thrombospondin motifs 13 (ADAMTS13), Angiopoietin-1 (ANGPT1), Angiopoietin-2 (ANGPT2), C-C chemokine receptor ligand 2/monocyte chemoattractant protein 1 (CCL2/MCP-1), C-C chemokine receptor ligand 3/macrophage inflammatory protein 1-alpha (CCL3/MIP-1-α), C-C chemokine receptor ligand 5/regulated on activation, normal T cell expressed and secreted (CCL5/RANTES), cluster of differentiation 163 (CD163), cluster of differentiation 40 ligand (CD40L), chitinase-3-like protein 1 (CHI3L1), C-reactive protein (CRP), C-X-C motif chemokine ligand 10/interferon
- the proteins are at least one or more of: C-reactive protein (CRP), C-X-C motif chemokine ligand 10/interferon gamma-induced protein 10 (CXCL10/IP-10), D-dimer, ferritins, fibrinogens, (soluble) intercellular adhesion molecule 1 (ICAM-1), interferon gamma (IFNγ), interleukin-1 receptor antagonist (IL-1RA), interleukin-5 (IL-5), interleukin-6 (IL-6), interleukin-6 receptor α (IL-6Rα), interleukin-18 (IL-18), interleukin-18-binding protein (IL-18BP), lipocalin-2 (LCN-2), matrix metalloproteinase-8 (MMP-8), procalcitonin (PCT), (soluble) receptor for advanced glycation end products (RAGE), tumor necrosis factor receptor 1 (TNF-R1), tumor necrosis factor alpha (TNFα), vascular endothelial RI, INFa
- Examples of the metabolites include, but are not limited to the level of any one or more of the following in a biological sample from the individual: fatty acyls and their constituent molecular species, glycerolipids and their constituent molecular species, glycerophospholipids and their constituent molecular species, sphingolipids and their constituent molecular species, sterol lipids and their constituent molecular species, prenol lipids and their constituent molecular species, saccharolipids and their constituent molecular species, polyketides and their constituent molecular species, carbohydrates and their constituent molecular species, organic acids and their derivatives and constituent molecular species, organo-heterocyclic compounds and their constituent molecular species, organo-oxygen compounds and their constituent molecular species, organo-nitrogen compounds and their constituent molecular species, amino acids and their constituent molecular species, peptides and their constituent molecular species, and nucleosides and their constituent molecular species.
- the metabolites are at least one or more of: carnitine, acetylcarnitine, propionylcarnitine, malonylcarnitine, methylmalonylcarnitine, hydroxypropionylcarnitine, propenoylcarnitine, butyrylcarnitine, hydroxybutyrylcarnitine, fumarylcarnitine, valerylcarnitine, glutarylcarnitine, hydroxyvalerylcarnitine, tiglylcarnitine, hexanoylcarnitine, hydroxyhexanoylcarnitine, pimeloylcarnitine, decanoylcarnitine, decadienylcarnitine, tetradecenoylcarnitine, hydroxytetradecenoylcarnitine, hydroxytetradecadienylcarnitine, hexadecanoylcarnitine, hexadecenoylcarnitine, oc
- Examples of clinical outcome data include, but are not limited to any one or more of the following: severity or duration of symptoms, time to symptom onset or abatement, need for organ support, duration of organ support, response to treatment, admission to a hospital or intensive care unit, length of stay in a hospital or intensive care unit, mortality, time to death, duration of morbidity (e.g., time to returning to normal daily activities or quality of life), incidence of long-term sequelae of infectious diseases, and re-hospitalization.
- Examples of administrative health data include, but are not limited to any one or more of: baseline demographics (e.g., age, sex, ethnicity), physiological parameters (e.g., body mass index, heart rate, respiratory rate, body temperature), comorbid conditions including but not limited to immunocompromising conditions (e.g., history of chronic kidney disease, history of hepatic disease, pulmonary hypertension, dementia, having diabetes, being HIV positive, tobacco use, alcohol use, drug use, or pregnancy), past surgical history (e.g., central line placement, organ transplant donor or recipient), and environmental or social exposures (e.g., living situation, travel history, contact with livestock).
- the clinical parameters may include one or more biological effectors and/or one or more non-biological effectors.
- biological effector is used to mean a molecule, such as, but not limited to: a protein, a peptide, a carbohydrate, a complex lipid, a fatty acid, an amino acid, a biogenic amine, a nucleic acid, a glycoprotein, or a proteoglycan, that can be assayed.
- Specific examples of biological effectors can include: cytokines, growth factors, antibodies, hormones, cell surface receptors, cell surface proteins, lipid mediators, or carbohydrates. More specific examples of biological effectors include, but are not limited to the genes, proteins, and metabolites described herein.
- the biological effectors are soluble.
- the biological effectors are membrane-bound, such as a cell surface receptor.
- the biological effectors are intracellular.
- the biological effectors are nucleic acids (e.g., messenger RNA, transfer RNA, micro RNA, long-noncoding RNA, silencing RNA, short hairpin RNA, or DNA).
- the biological effectors are detectable in a fluid sample of an individual, such as serum and/or plasma.
- the biological effectors are measurable in a biological sample of an individual, such as blood plasma, wound effluent, or sputum.
- non-biological effector is a clinical parameter that is generally considered not to be a specific molecule. Although not a specific molecule, a non-biological effector may nonetheless still be quantifiable, either through routine measurements or through measurements that stratify the data being assessed. For example, heart rate, change in heart rate over time, respiratory rate, body temperature, blood pressure, body mass index, and other parameters would be non-biological effector components of the risk profile. All these components are measurable or quantifiable using routine methods and equipment. Other non-biological components include data that may not be readily or routinely quantifiable or that may require a practitioner’s judgment or opinion.
- peripheral vascular disease may be a quantifiable aspect of the risk profile. While there may be published guidance on classifying and diagnosing these aspects of the risk profile, assigning a numerical value to the severity still involves observation and, to a certain extent, judgment or opinion.
- the quantity or measurement assigned to a non-biological effector could be binary, e.g., “0” if absent or “1” if present.
- the non-biological effector aspect of the risk profile may involve qualitative components that cannot or should not be quantified.
- Levels of the clinical parameters can be assayed, detected, measured, and/or determined in a sample taken or isolated from an individual. “Sample” and “test sample” are used interchangeably herein.
- test samples or sources of clinical parameters include, but are not limited to: biological fluids and/or tissues isolated from an individual or patient, which can be tested by the methods of the present application described herein, and include but are not limited to: whole blood, peripheral blood, capillary blood, serum, plasma, cerebrospinal fluid, wound effluent, urine, amniotic fluid, peritoneal fluid, pleural fluid, lymph fluids, various external secretions of the respiratory, intestinal, and genitourinary tracts, various components of exhaled breath, tears, sweat, saliva, white blood cells, tissue biopsies, and combinations thereof.
- data quality control involves at least one of differential expression algorithms, principal component analysis, k-nearest neighbor imputation algorithms, three-sigma rule algorithms, or empirical Bayes method algorithms.
- Differential expression algorithms determine the fold-change from a reference sample and the p-value for the statistical difference between the sample and the reference value, which are used as decision metrics for inclusion or exclusion.
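- A minimal sketch of such an inclusion decision is given below, assuming strictly positive marker levels. The disclosure does not name a particular statistical test, so an exact two-sided permutation test on the difference in means is used here as a stand-in; the function names, the thresholds (an absolute log2 fold-change of 1 and a p-value of 0.05), and the measurements are all hypothetical.

```python
import math
from itertools import combinations
from statistics import mean


def differential_expression(sample, reference):
    """Return (log2 fold-change, exact permutation p-value) for one marker."""
    sample, reference = list(sample), list(reference)
    log2_fc = math.log2(mean(sample) / mean(reference))  # needs positive means

    pooled = sample + reference
    n, total = len(sample), sum(pooled)
    observed = abs(mean(sample) - mean(reference))

    extreme = splits = 0
    # Enumerate every way of relabelling n of the pooled values as "sample".
    for idx in combinations(range(len(pooled)), n):
        s = sum(pooled[i] for i in idx)
        diff = abs(s / n - (total - s) / (len(pooled) - n))
        splits += 1
        if diff >= observed - 1e-12:  # tolerance for floating-point ties
            extreme += 1
    return log2_fc, extreme / splits


def include_marker(sample, reference, min_abs_log2_fc=1.0, max_p=0.05):
    """Hypothetical decision rule: keep markers that pass both thresholds."""
    log2_fc, p_value = differential_expression(sample, reference)
    return abs(log2_fc) >= min_abs_log2_fc and p_value < max_p


case_levels = [8.0, 9.0, 10.0, 11.0]     # made-up marker levels
reference_levels = [2.0, 2.5, 3.0, 2.5]  # made-up reference levels
log2_fc, p_value = differential_expression(case_levels, reference_levels)
keep = include_marker(case_levels, reference_levels)
```

Exhaustive enumeration is only practical for small groups; larger data sets would normally use a parametric test or a random subset of permutations.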
- Principal component analysis identifies the key variables in a multidimensional data set that explain the differences in the observations (variance) and can be used to determine if groups separate according to a priori knowledge about the samples.
- Nearest neighbor imputation utilizes k-nearest neighbor algorithms to predict discrete and continuous values for a potential missing value.
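- The following sketch illustrates k-nearest neighbor imputation for continuous values under simplifying assumptions: missing entries are marked with None, distance is the root-mean-square difference over the parameters two records both have, and the imputed value is the unweighted mean of the k nearest donors. The function name and data are hypothetical.

```python
import math


def knn_impute(rows, k=2):
    """Fill None entries with the mean of the k nearest donors' values."""

    def distance(r1, r2):
        shared = [(x, y) for x, y in zip(r1, r2)
                  if x is not None and y is not None]
        if not shared:
            return math.inf  # nothing in common, so treat as infinitely far
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    completed = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Donors are the other records that have column j observed.
            donors = [(distance(row, other), other[j])
                      for n, other in enumerate(rows)
                      if n != i and other[j] is not None]
            donors.sort(key=lambda pair: pair[0])
            nearest = [v for _, v in donors[:k]]
            if nearest:
                completed[i][j] = sum(nearest) / len(nearest)
    return completed


records = [
    [1.0, 2.0, 3.0],
    [1.1, 2.1, None],  # third parameter is missing
    [0.9, 1.9, 2.8],
    [9.0, 9.0, 9.0],   # dissimilar record, should not act as a donor
]
filled = knn_impute(records, k=2)
```

Here the missing value is filled from the two similar records (3.0 and 2.8), giving approximately 2.9, while the dissimilar fourth record is ignored.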
- biomarker data (protein-based, nucleic acid-based, or metabolite-based) may be subsetted by a variance metric, wherein a threshold for variance is set as an inclusion or exclusion criterion (e.g., only markers with a variance greater than three standard deviations will be included).
- Empirical Bayes method algorithms utilize the estimated distributions from the data to establish prior distributions, and are used to approximate values in a data set and subset data based on the parameters of the estimated distribution.
- feature selection involves at least one of: unsupervised machine learning algorithm, supervised machine learning algorithm, minimum redundancy maximum relevance, Student’s t-test, Mann-Whitney U test, random forest, or logistic regression.
- Minimum redundancy maximum relevance involves selecting features that have high correlation to the classification variable but are mathematically far away from each other.
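- A greedy form of this selection can be sketched as follows. Published minimum redundancy maximum relevance methods often use mutual information; this sketch instead assumes absolute Pearson correlation as both the relevance and the redundancy measure, which is a simplification. The feature names and values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def mrmr(features, target, n_select):
    """Greedily pick features with high relevance and low mutual redundancy."""
    selected, candidates = [], list(features)

    def score(name):
        relevance = abs(pearson(features[name], target))
        if not selected:
            return relevance
        redundancy = sum(abs(pearson(features[name], features[s]))
                         for s in selected) / len(selected)
        return relevance - redundancy

    while candidates and len(selected) < n_select:
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


outcome = [0, 0, 0, 1, 1, 1]  # made-up classification variable
markers = {
    "f1": [1, 2, 3, 7, 8, 9],
    "f2": [2, 4, 6, 14, 16, 18],  # exact multiple of f1, fully redundant
    "f3": [5, 1, 4, 6, 3, 8],     # less relevant but less redundant
}
chosen = mrmr(markers, outcome, n_select=2)
```

In this toy data the second pick is f3 rather than the remaining duplicate marker, because the duplicate adds no information beyond the first pick.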
- a Student’s t-test utilizes the mean and variance of two distributions to generate a t-statistic and calculate the probability of observing data at least as extreme as the measured data if the null hypothesis were true.
- a Mann-Whitney U test is a non-parametric test that utilizes a rank-order approach to test the null hypothesis that it is equally likely that a randomly selected value from one sample will be less than or greater than a randomly selected value from a second sample.
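- The two test statistics described above can be sketched as follows. The sketch computes only the statistics (a pooled-variance Student’s t and the Mann-Whitney U counts) and leaves the conversion to a p-value to a statistics library; the function names and data are hypothetical.

```python
import math
from statistics import mean, variance


def student_t_statistic(x, y):
    """Two-sample Student's t-statistic with a pooled variance estimate."""
    n, m = len(x), len(y)
    pooled_var = ((n - 1) * variance(x) + (m - 1) * variance(y)) / (n + m - 2)
    return (mean(x) - mean(y)) / math.sqrt(pooled_var * (1 / n + 1 / m))


def mann_whitney_u(x, y):
    """U statistics: number of (x, y) pairs won by each sample, ties count 0.5."""
    u_x = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    return u_x, len(x) * len(y) - u_x


group_a = [1.0, 2.0, 3.0]  # made-up values
group_b = [4.0, 5.0, 6.0]
t_stat = student_t_statistic(group_a, group_b)
u_a, u_b = mann_whitney_u(group_a, group_b)
```

Because every value in the second group exceeds every value in the first, the first group wins none of the nine pairwise comparisons, which is the most extreme U value possible for samples of this size.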
- Random forest approaches include a large number (100s-10,000s) of decision trees, each of which is generated by bootstrap aggregating, where for each decision tree the discovery data is randomly sampled with replacement to generate a randomly sampled set of discovery data, and subsequently the decision tree is trained on the randomly sampled set of discovery data.
- the discovery data is sampled based on the reduced set of variables from variable selection (as opposed to sampling based on all variables).
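- The sampling step of bootstrap aggregating can be sketched as below. Only the construction of one resampled training set per tree is shown, with the records restricted to the reduced set of variables; training the decision trees themselves is omitted. The function name, variable names, and records are hypothetical.

```python
import random


def bootstrap_training_sets(discovery_data, selected_variables, n_trees, seed=0):
    """Draw one bootstrap sample (with replacement) per decision tree."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    # Keep only the variables retained by variable selection.
    reduced = [{v: row[v] for v in selected_variables} for row in discovery_data]
    # Each sample has as many records as the original data.
    return [[rng.choice(reduced) for _ in reduced] for _ in range(n_trees)]


discovery = [  # made-up records
    {"crp": 12.0, "il6": 40.0, "age": 61, "severe": 1},
    {"crp": 3.0, "il6": 5.0, "age": 45, "severe": 0},
    {"crp": 20.0, "il6": 90.0, "age": 70, "severe": 1},
    {"crp": 4.0, "il6": 8.0, "age": 38, "severe": 0},
]
training_sets = bootstrap_training_sets(
    discovery, ["crp", "il6", "severe"], n_trees=3
)
```

Because sampling is with replacement, a given record may appear several times in one training set and not at all in another.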
- feature selection may involve ensemble learning methods. Ensemble methods use multiple learning algorithms to obtain better predictive performance than could be obtained from any of the constituent learning algorithms alone.
- the feature selection ensemble learning models include combinations of the models described herein for cluster analysis and machine learning.
- the feature selection ensemble learning models may comprise: cluster analysis, unsupervised machine learning algorithm, supervised machine learning algorithm, minimum redundancy maximum relevance, Student’s t-test, Mann-Whitney U test, random forest, logistic regression, neural networks, or a combination thereof.
- Ensembles may also comprise: Bayes optimal classifier, classification and regression tree, bootstrap aggregating, boosting, Bayesian model averaging, Bayesian model combination, a bucket of models, stacking, or a combination thereof.
- data may be stratified prior to feature selection.
- This data stratification may be achieved by using unsupervised or supervised machine learning models, including but not limited to: topological data analysis, k-means clustering, hierarchical clustering, nearest neighbor clustering, non-linear clustering (e.g., t-distributed stochastic neighbor embedding), consensus clustering, or spectral clustering.
- systems, methods, and a non-transitory computer-readable medium of the present disclosure can execute a process by which data is aggregated about one or more individuals, machine learning algorithms perform data-mining procedures, pattern recognition, intelligent prediction, and other artificial intelligence procedures, such as for enabling prognostic or diagnostic predictions (e.g., predicting hospitalization, predicting mortality, diagnosing sepsis phenotype, detecting pathogen or pathogen class) based on clinical data (e.g., age, sex, medical history) and/or biological data (e.g., protein-based biomarkers, nucleic acid-based biomarkers, metabolite-based biomarkers, organ system function, or physiologic parameters such as heart rate).
- Machine learning and ensemble learning algorithms are increasingly being implemented to reveal knowledge structures to guide decisions in conditions of limited certainty, which can lead to improved decision making. This would not be possible with the use of manual techniques or traditional algorithmic approaches, because of the large number of data points involved, as well as the specific approach and data pipeline used in the analysis. However, in order to use machine learning algorithms effectively and obtain optimal results out of existing data, a machine learning engine comprising a specific sequence of approaches and feature selection implemented by machine learning or ensemble learning algorithms may be required.
- Constructing such a machine learning engine and executing these machine learning or ensemble learning algorithms can improve the performance of diagnostic and prognostic prediction technologies. These improvements may include, but are not limited to increasing the accuracy, selectivity, and/or specificity of models used to perform the diagnostic or prognostic predictions. Therefore, such an engine can improve decision-making for, and delivery of treatments to, individuals and patients. While various machine learning or ensemble learning algorithms can be used for such purposes, generating a machine learning engine with desired performance characteristics can be highly domain-specific, requiring rigorous modeling, testing, and validation to select appropriate algorithms (or combinations thereof) and the parameters modeled with the algorithms to generate the machine learning system.
- the machine learning engine may be constructed to include five major components: (1) initial data exploration, (2) data quality control, (3) stratification, (4) feature selection and outcome modeling, and (5) deployment and self-improvement. It will be understood by those possessing ordinary skill in the art that these stages may not be discrete entities and there may be overlap between them, and that the output from each stage may be used to inform, calibrate, and/or improve other stages of the machine learning engine.
- the initial data stage may include data preparation.
- Data preparation may include cleaning data (e.g., searching for outlying data, applying missing data algorithms, altering data formats), transforming data, and selecting subsets of records in case of data sets with large numbers of variables (“fields or dimensions”).
- the data on which data preparation is performed may be referred to as “discovery data”.
- data preparation can include executing pre-processing operations on the data.
- missing data may be handled through the execution of imputation algorithms that interpolate and/or estimate missing values.
- imputation involves generating a distribution (e.g., Gaussian, Poisson, binomial, zero-inflation, beta, pert) of available data for a clinical parameter having missing data, and interpolating values for the missing data based on the distribution.
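- As a sketch of distribution-based imputation, the example below fits a Gaussian to the available values of one clinical parameter and draws replacements for the missing entries from it. The Gaussian is only one of the distributions listed above, drawing random values is one possible reading of interpolating from the distribution, and the function name and data are hypothetical.

```python
from statistics import NormalDist


def impute_from_gaussian(values, seed=0):
    """Replace None entries with draws from a Gaussian fitted to the rest."""
    observed = [v for v in values if v is not None]
    fitted = NormalDist.from_samples(observed)  # needs at least two values
    draws = iter(fitted.samples(values.count(None), seed=seed))
    return [next(draws) if v is None else v for v in values]


lactate = [4.1, None, 3.9, 4.3, None, 4.0]  # made-up measurements
completed = impute_from_gaussian(lactate)
```

Observed values are returned unchanged, and the fixed seed makes the imputed values reproducible between runs.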
- Other examples of handling missing data may involve k-nearest neighbor imputation.
- data may be screened for outliers and non-random variation (e.g., batch effects related to analytical platform, collection site, or operator that are known or suspected a priori).
- Data outliers and non-random variation may be initially identified using the ‘three-sigma rule’ or principal component analysis and assessed on a case-by-case basis.
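- The three-sigma rule can be sketched as follows: a value is flagged when it lies more than three sample standard deviations from the mean. The function name and readings are hypothetical, and flagged values would still be assessed case by case as described above.

```python
from statistics import mean, stdev


def three_sigma_outliers(values):
    """Return the values lying more than three standard deviations from the mean."""
    mu, sigma = mean(values), stdev(values)
    return [v for v in values if abs(v - mu) > 3 * sigma]


readings = [10.0] * 19 + [100.0]  # made-up data with one aberrant reading
flagged = three_sigma_outliers(readings)
```

The rule needs a reasonable number of observations to work: with n values, no single point can lie more than (n - 1) / sqrt(n) sample standard deviations from the mean, so very small data sets can never trigger it.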
- Non-random variation in the data may be corrected primarily using empirical Bayesian methods.
- the R software function “ComBat” is widely used in biomedical research to correct data sets that contain known batch effects.
- data quality control can include reducing the dimensionality of data (e.g., protein marker data, nucleic acid marker data, metabolite marker data, clinical outcome data, administrative health data) via specific algorithms or analytic approaches.
- Subsetting such data may be performed by conducting a differential expression analysis wherein the fold-change from a reference sample and the p-value for the statistical difference between the sample and the reference value are used as decision metrics for inclusion or exclusion.
- biomarker data (protein-based, nucleic acid-based, or metabolite-based) may be subsetted by a variance metric, wherein a threshold for variance is set as an inclusion or exclusion criterion (e.g., only markers with a variance greater than three standard deviations will be included).
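- A variance-based subset can be sketched as below. The sketch assumes a plain numeric threshold on the sample variance of each marker; how that threshold is chosen is left open, and the function name, marker names, and values are hypothetical.

```python
from statistics import variance


def filter_by_variance(marker_levels, threshold):
    """Keep only the markers whose sample variance exceeds the threshold."""
    return {name: levels for name, levels in marker_levels.items()
            if variance(levels) > threshold}


marker_levels = {  # made-up measurements across four samples
    "marker_a": [1.0, 1.1, 0.9, 1.0],  # nearly constant
    "marker_b": [1.0, 5.0, 9.0, 3.0],  # varies widely
}
kept = filter_by_variance(marker_levels, threshold=1.0)
```

Only the widely varying marker survives the filter, which reduces the number of variables passed on to stratification and feature selection.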
- data quality control algorithms may comprise: supervised machine learning algorithm, differential expression algorithms, principal component analysis, k-nearest neighbor imputation algorithms, three-sigma rule algorithms, or empirical Bayes method algorithms, or a combination thereof.
- clinical parameter data can be stratified using cluster analysis algorithms, which discretize information based on measures of similarity.
- individuals or samples are assigned to a discrete set of groups (clusters) based on one or more shared characteristics, such as derived from the observable or measured clinical parameters.
- these can comprise one or more host biomarkers, one or more clinical outcomes data, one or more administrative health data, or a combination thereof.
- a “phenotype” may thus be defined as the set of clinical parameter values that underlies a distinct cluster of individuals or samples.
- the cluster analysis algorithms may comprise: k-means clustering, hierarchical clustering, nearest neighbor clustering, non-linear clustering (e.g., t-distributed stochastic neighbor embedding), consensus clustering, or spectral clustering.
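- As one example from this list, k-means clustering can be sketched in a few lines. The sketch assumes Euclidean distance, a fixed number of iterations, and a naive initialization that uses the first k points as starting centroids; practical implementations use better initialization and a convergence check. The function name and data points are hypothetical.

```python
import math


def k_means(points, k, iterations=20):
    """Assign each point to one of k clusters; return (labels, centroids)."""
    centroids = [list(p) for p in points[:k]]  # naive initialization (assumption)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Two made-up clinical parameters per individual, forming two obvious groups.
individuals = [(1.0, 1.0), (9.0, 9.0), (1.2, 0.8),
               (8.8, 9.2), (0.9, 1.1), (9.1, 8.9)]
labels, centroids = k_means(individuals, k=2)
```

Each resulting cluster label is a candidate group in the sense used above; the centroid gives the average parameter values of its members.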
- clinical parameter data can be stratified using topological data analysis (TDA).
- Unsupervised TDA approaches, such as the “Mapper” algorithm, can be used to represent highly complex data in a structured, two-dimensional network that retains the geometric “shape” (topology) of the data.
- Individuals or samples with a high degree of similarity, for example in host gene, protein, and/or metabolite expression profiles, form groups of highly interconnected nodes which represent distinct subgroups/populations within the dataset.
- TDA is able to reflect the continuous nature of many types of biological data. For example, it can capture how groups of individuals with different characteristics relate to one another, or form trends along specific axes.
- Groups of individuals or samples within a TDA network can be delineated based on the persistent homology of their node density and connectivity (edges).
- a “phenotype” may thus be defined as the set of clinical parameter values that underlies a distinct TDA group of individuals or samples. Differences in the biological effectors, non-biological effectors, and/or additional metadata between phenotypes can be independently assessed for their statistical significance. Membership of a specific disease-response phenotype constitutes valuable information about an individual, and stratifying heterogeneous data sets in this manner can improve feature selection, machine learning, and predictive modelling approaches.
- In FIG. 1, the process of predicting severe disease among individuals with sepsis or at risk of developing sepsis, and its components, are shown and described below.
- the process begins with the acquisition of discovery data 100 and executes a data quality control 112 process in a data quality control engine 114, performs topological data analysis and/or clustering 118 in a data stratification engine 120, executes feature selection and classification and/or time-to-event analyses 124 in a feature selection and outcome modeling engine 126, and the model(s) is/are deployed for prediction 132 in a prediction engine 134.
- the discovery data 102 comprises protein data 104, nucleic acid data 106, metabolite data 111, clinical outcomes data 108, and administrative health data 110.
- the protein data 104 may include, but are not limited to one or more of: a disintegrin and metalloproteinase with thrombospondin motifs 13 (ADAMTS13), Angiopoietin-1 (ANGPT1), Angiopoietin-2 (ANGPT2), C-C chemokine receptor ligand 2/monocyte chemoattractant protein 1 (CCL2/MCP-1), C-C chemokine receptor ligand 3/macrophage inflammatory protein 1-alpha (CCL3/MIP-1-α), C-C chemokine receptor ligand 5/regulated on activation, normal T cell expressed and secreted (CCL5/RANTES), cluster of differentiation 163 (CD163), cluster of differentiation 40 ligand (CD40L), chitinase-3-like protein 1 (CHI3L1), C-reactive protein (CRP), C-X-C motif chemokine ligand 10/interferon gamma-induced protein 10 (
- the protein markers are at least one or more of: C-reactive protein (CRP), C-X-C motif chemokine ligand 10/interferon gamma-induced protein 10 (CXCL10/IP-10), D-dimer, ferritins, fibrinogens, (soluble) intercellular adhesion molecule 1 (ICAM-1), interferon gamma (IFNγ), interleukin-1 receptor antagonist (IL-1RA), interleukin-5 (IL-5), interleukin-6 (IL-6), interleukin-6 receptor α (IL-6Rα), interleukin-18 (IL-18), interleukin-18-binding protein (IL-18BP), lipocalin-2 (LCN-2), matrix metalloproteinase-8 (MMP-8), procalcitonin (PCT), (soluble) receptor for advanced glycation end products (RAGE), tumor necrosis factor receptor 1 (TNF-R1),
- the nucleic acid data 106 may include, but are not limited to one or more of: adhesion G protein-coupled receptor E1 (ADGRE1), adrenoceptor β2 (ADRB2), angiotensin II receptor associated protein (AGTRAP), AKT serine/threonine kinase 1 (AKT1), 5'-aminolevulinate synthase 2 (ALAS2), alkaline phosphatase, biomineralization associated (ALPL), ankyrin repeat domain 22 (ANKRD22), annexin A3 (ANXA3), arginase 1 (ARG1), BCL2 like 1 (BCL2L1), BMX non-receptor tyrosine kinase (BMX), chromosome 6 open reading frame 62 (C6orf62), carbonic anhydrase 2 (CA2), C-C motif chemokine ligand 5 (CCL5), C-C motif chemokine receptor 3
- the nucleic acid markers are at least one or more of: adrenoceptor β2 (ADRB2), CD177 molecule (CD177), carboxypeptidase vitellogenic like (CPVL), C-X3-C motif chemokine receptor 1 (CX3CR1), defensin α3 (DEFA3), Fc receptor like 5 (FCRL5), G protein subunit γ2 (GNG2), interleukin 10 receptor subunit α (IL10RA), kinesin light chain 3 (KLC3), oleoyl-ACP hydrolase (OLAH), pyruvate kinase M1/2 (PKM), radical S-adenosyl methionine domain containing 2 (RSAD2), STE20 related adaptor β (STRADB), tyrosylprotein sulfotransferase 1 (TPST1), tetraspanin 5 (TSPAN5), tetratricopeptide repeat domain 9
- the metabolite data 111 may include, but are not limited to one or more of: fatty acyls and their constituent molecular species, glycerolipids and their constituent molecular species, glycerophospholipids and their constituent molecular species, sphingolipids and their constituent molecular species, sterol lipids and their constituent molecular species, prenol lipids and their constituent molecular species, saccharolipids and their constituent molecular species, polyketides and their constituent molecular species, carbohydrates and their constituent molecular species, organic acids and their derivatives and constituent molecular species, organo-heterocyclic compounds and their constituent molecular species, organo-oxygen compounds and their constituent molecular species, organo-nitrogen compounds and their constituent molecular species, amino acids and their constituent molecular species, peptides and their constituent molecular species, and nucleosides and their constituent molecular species.
- the metabolite markers are at least one or more of: carnitine, acetylcarnitine, propionylcarnitine, malonylcarnitine, methylmalonylcarnitine, hydroxypropionylcarnitine, propenoylcarnitine, butyrylcarnitine, hydroxybutyrylcarnitine, fumarylcarnitine, valerylcarnitine, glutarylcarnitine, hydroxyvalerylcarnitine, tiglylcarnitine, hexanoylcarnitine, hydroxyhexanoylcarnitine, pimeloylcarnitine, decanoylcarnitine, decadienylcarnitine, tetradecenoylcarnitine, hydroxytetradecenoylcarnitine, hydroxytetradecadienylcarnitine, hexadecanoylcarnitine, hexadecenoylcarnitine, o
- the clinical outcomes data 108 may include, but are not limited to one or more of: severity or duration of symptoms, time to symptom onset or abatement, need for organ support, duration of organ support, response to treatment, admission to a hospital or intensive care unit, length of stay in a hospital or intensive care unit, mortality, time to death, duration of morbidity, incidence of long-term sequelae of infectious diseases, and re-hospitalization.
- the administrative health data 110 may include, but are not limited to one or more of: baseline demographics, physiologic parameters, comorbid conditions including but not limited to immunocompromising conditions, past surgical history, and environmental or social exposures.
- data quality control 112 occurs in the data quality control engine 114, which executes a series of data quality control algorithms 116A-116N (hereinafter referred to individually as “item 116A,” and generically as “item 116”) which subset data to be used in topological data analysis and/or clustering 118.
- the data quality control algorithms and general approach may vary depending on the characteristics of each unique data set. For example, host biomarkers (protein-based, nucleic acid-based, or metabolite-based) may be measured using multiplex assays that generate data on thousands of markers.
- Subsetting such data may be performed by conducting a differential expression analysis wherein the fold-change from a reference sample and the p-value for the statistical difference between the sample and the reference value are used as decision metrics for inclusion or exclusion.
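The fold-change and p-value filter described above can be sketched as follows. This is a minimal illustration, not the method of the application: the exact permutation test, the cutoffs `min_fold` and `max_p`, and the function names are all assumptions chosen so that the example runs with the Python standard library alone. It also assumes strictly positive marker levels and small groups (the permutation test enumerates every split).

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(sample, reference):
    """Exact two-sided permutation p-value for a difference in group means."""
    pooled = list(sample) + list(reference)
    n = len(sample)
    total = sum(pooled)
    observed = abs(mean(sample) - mean(reference))
    extreme = 0
    count = 0
    # Enumerate every way of relabeling n of the pooled values as "sample".
    for idx in combinations(range(len(pooled)), n):
        group_sum = sum(pooled[i] for i in idx)
        group_mean = group_sum / n
        rest_mean = (total - group_sum) / (len(pooled) - n)
        if abs(group_mean - rest_mean) >= observed - 1e-12:
            extreme += 1
        count += 1
    return extreme / count


def select_markers(sample_levels, reference_levels, min_fold=2.0, max_p=0.05):
    """Keep markers whose fold-change and p-value both pass the cutoffs."""
    kept = []
    for marker, sample in sample_levels.items():
        reference = reference_levels[marker]
        fold = mean(sample) / mean(reference)
        # Assumption: an N-fold decrease counts the same as an N-fold increase.
        fold_ok = fold >= min_fold or fold <= 1.0 / min_fold
        if fold_ok and permutation_p_value(sample, reference) <= max_p:
            kept.append(marker)
    return kept
```

In practice a multiplex data set with thousands of markers would likely use a parametric or rank-based test with multiple-testing correction; the permutation test here only keeps the sketch dependency-free.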
- biomarker data (protein-based, nucleic acid-based, or metabolite-based) may also be subsetted by a variance metric, wherein a threshold for variance is set as an inclusion or exclusion criterion (e.g., only markers with a variance greater than three standard deviations will be included). While these methods of data quality control are discussed, many more are contemplated.
- the topological data analysis and/or clustering 118 occur in the data stratification engine 120, wherein topological data analysis and/or cluster analysis algorithms 122A-122N (hereinafter referred to individually as “item 122A,” and generically as “item 122”) are deployed upon the subsetted data from the data quality control engine 114.
- Cluster analysis algorithms 122 use supervised or unsupervised approaches to discretize highly complex data based on similarities in the observable or measured clinical parameters.
- topological data analysis algorithms 122, such as the “Mapper” algorithm, use unsupervised approaches to represent such data in a structured, two-dimensional network that retains the geometric ‘shape’ (topology) of the data correlations.
- Individuals or samples with a high degree of similarity form groups of highly interconnected nodes which represent distinct subgroups/populations within the dataset.
- Such groups within a TDA network can be delineated based on the persistent homology of their node density and connectivity (edges).
- Both cluster analysis 122 and topological data analysis 122 algorithms result in the assignment of individuals or samples to a discrete set of groups/clusters based on multiple shared characteristics, thereby enabling the definition of disease-response phenotypes.
- Sepsis response phenotypes can thus be defined as the profile of biomolecular, clinical, administrative health, and/or physiologic profile data of each distinct cluster.
- Differences in the biological effectors, non-biological effectors, and/or additional metadata between phenotypes can be assessed independently for their statistical significance. Membership in a specific sepsis response phenotype constitutes valuable information about an individual, and stratifying heterogeneous data sets in this manner can improve feature selection, machine learning, and predictive modelling approaches.
- feature selection and classification and/or time-to-event analysis 124 occur in the feature selection and outcome modeling engine 126.
- Feature selection 124 involves the use of feature selection algorithms 128A-128N (hereinafter referred to individually as “item 128A,” and generically as “item 128”) to select features (e.g., variables, parameters) for improving outcome modeling performance (as measured by model performance metrics), optimizing computational resources, removing confounders and/or mediating factors, and for temporal and/or causational interpretation.
- data may be stratified in the data stratification engine 120 prior to feature selection.
- data stratification prior to feature selection may be achieved by using other unsupervised or supervised machine learning models, including but not limited to: k-means clustering, hierarchical clustering, nearest neighbor clustering, non-linear clustering (e.g., t-distributed stochastic neighbor embedding), consensus clustering, or spectral clustering.
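As one concrete instance of the clustering options listed above, k-means can be sketched in a few lines. This is an illustrative sketch only: the deterministic initialization from the first `k` points, the function name `k_means`, and the two-dimensional example data are assumptions, and a production analysis would more likely rely on an established library.

```python
from math import dist


def k_means(points, k, max_iterations=100):
    """Lloyd's algorithm; returns one cluster label per input point."""
    pts = [tuple(p) for p in points]
    # Assumption: seed the centroids with the first k points (deterministic).
    centroids = pts[:k]
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: dist(p, centroids[c])) for p in pts
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(pts, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = tuple(
                    sum(col) / len(members) for col in zip(*members)
                )
    return labels
```

Each returned label identifies a stratum; feature selection could then be run within each stratum separately.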
- the data on which the feature selection is performed may be referred to as “discovery data”.
- classification and/or time-to-event analysis 124 involves the use of the classification and time-to-event analysis algorithms 130A-130N (hereinafter referred to individually as “item 130A,” and generically as “item 130”) to calculate the prediction score for clinical outcomes in individuals with sepsis or at risk of developing sepsis (outcome modeling).
- the prediction 132 involves the prediction of severe disease in individuals with sepsis or at risk of developing sepsis. This is performed in the prediction engine 134, which houses trained machine learned algorithms (e.g., trained data quality control algorithms, trained data stratification algorithms, trained feature selection algorithms, trained classification and/or time-to-event analysis algorithms).
- the prediction engine 134 utilizes the trained machine learned algorithms to calculate and provide a clinical outcome prediction score 136 for predicting severe disease in individuals with sepsis or at risk of developing sepsis.
- the classification and/or time-to-event analysis algorithms 130 may include incidence rates by categorical variables or continuous variables.
- the classification and/or time-to-event analysis algorithms 130 may also include Kaplan-Meier estimators, Cox proportional-hazards models, cumulative incidence functions, or accelerated failure time models. While these classification and time-to-event analysis algorithms are discussed, others are contemplated.
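Of the time-to-event methods named above, the Kaplan-Meier estimator is the simplest to illustrate. The sketch below computes the product-limit survival estimate at each observed event time; the function name and the encoding of censoring (1 for an observed event, 0 for censored) are assumptions made for the example.

```python
def kaplan_meier(durations, events):
    """Return [(time, survival probability)] at each distinct event time.

    durations: follow-up time for each individual
    events:    1 if the event (e.g., death) was observed, 0 if censored
    """
    survival = 1.0
    curve = []
    event_times = sorted({d for d, e in zip(durations, events) if e})
    for t in event_times:
        # Individuals still under observation just before time t.
        at_risk = sum(1 for d in durations if d >= t)
        # Events observed exactly at time t.
        occurred = sum(1 for d, e in zip(durations, events) if d == t and e)
        survival *= 1.0 - occurred / at_risk
        curve.append((t, survival))
    return curve
```

Calling `kaplan_meier` separately on each phenotype or cluster gives one survival curve per group, which is how a categorical predictor would be used with this estimator.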
- the Severe Disease in Sepsis Prediction System 200 includes discovery data 202, a machine learning engine 204 that is comprised of data quality control algorithms 206, topological data analysis and/or clustering algorithms 208, feature selection and classification and/or time-to-event analysis algorithms 210, and a prediction engine 212.
- An additional prediction engine 214 is housed outside the machine learning engine but is connected to the Severe Disease in Sepsis Prediction System 200 and can feed data and models bi-directionally.
- the prediction engine 212 can predict severe disease from sepsis specific to at least one second individual.
- the prediction engine 212 can receive, for the at least one second individual, a second value of at least one clinical parameter of the plurality of clinical parameters.
- at least one of the received second values corresponds to a model parameter of the subset of model parameters used in the feature selection and outcome modeling engine 126. If the prediction engine 212 receives several second values of clinical parameters, of which at least one does not correspond to a model parameter of the subset of model parameters, the prediction engine 212 may execute an imputation algorithm to generate a value for such a missing parameter.
- the prediction engine 212 can execute the feature selection and classification and time-to-event analysis algorithms 210 using the second value of the at least one clinical parameter to calculate the severe disease risk to the at least one second individual.
- the classification and time-to-event analysis algorithms 210 may include a Kaplan-Meier estimator wherein the topological data analysis and/or clustering 208 and feature selection algorithms 128 may provide a categorical variable as the predictor for the Kaplan-Meier estimator, resulting in a severe disease risk prediction and a confidence interval for each category by providing a hazard ratio for each group.
- a Cox Proportional-Hazards model may include the categorical variable provided from the topological data analysis and/or clustering 208, and at least one or more clinical parameters as covariates to improve the accuracy of the model, resulting in the Cox Proportional-Hazards model providing a hazard ratio for each group provided by the topological data analysis and/or clustering 208, as well as the confidence intervals for the categorical variable and each of the covariates.
- the prediction engine 212 can output a prediction that the second individual will experience severe disease from sepsis based on the overall probabilities (e.g., based on a ratio of the overall probabilities).
- the additional prediction engine 214 may be housed outside the Severe Disease in Sepsis Prediction System 200 and may contain machine learned models, but is connected to the Severe Disease in Sepsis Prediction System 200 and may feed data and models bi-directionally.
- the discovery data 300 comprises protein data 104, nucleic acid data 106, metabolite data 111, clinical outcomes data 108, and administrative health data 110.
- preprocessing is executed on the discovery data 300. Pre-processing may be performed before data quality control 302 and/or topological data analysis and/or clustering 304 are performed on the data.
- an imputation algorithm can be executed to generate values for missing data in the discovery data 300.
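The text does not say which imputation algorithm is used. As a minimal sketch, assuming simple per-parameter median imputation, missing values could be filled as follows; the function name and the parameter names `lactate` and `heart_rate` in the usage note are hypothetical.

```python
from statistics import median


def impute_missing(records, parameters):
    """Replace missing (None or absent) values with the per-parameter median."""
    medians = {}
    for name in parameters:
        observed = [r[name] for r in records if r.get(name) is not None]
        # Raises StatisticsError if a parameter has no observed values at all.
        medians[name] = median(observed)
    return [
        {
            name: r[name] if r.get(name) is not None else medians[name]
            for name in parameters
        }
        for r in records
    ]
```

For example, given records for `lactate` and `heart_rate` in which one individual lacks a lactate value, that value is replaced by the median lactate of the remaining individuals. Model-based approaches (e.g., multiple imputation) are common alternatives.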
- at least one of up-sampling or predictor rank transformations is executed on the data of the discovery database. Up-sampling and/or predictor rank transformation can be executed only for variable selection to accommodate class imbalance and non-normality in the data. While up-sampling or predictor rank transformations are discussed, many others are contemplated.
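Both preprocessing steps can be sketched briefly. The rank transformation below replaces each predictor value by its rank (ties share the average rank), which removes dependence on the shape of the original distribution; the up-sampling routine resamples smaller classes with replacement until every class matches the largest. Function names and the fixed random seed are assumptions for illustration.

```python
import random


def rank_transform(values):
    """Replace each value by its 1-based rank; tied values share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = average
        start = end + 1
    return ranks


def up_sample(rows, labels, seed=0):
    """Resample smaller classes with replacement until all classes are equal in size."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(group) for group in by_class.values())
    out_rows, out_labels = [], []
    for label, group in by_class.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        out_rows.extend(group + extra)
        out_labels.extend([label] * target)
    return out_rows, out_labels
```

Consistent with the text, such transformed or up-sampled data would be used only for variable selection, not for estimating the final model.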
- the dimensionality of data may be reduced via specific algorithms or analytic approaches.
- protein data 104, nucleic acid data 106 and/or metabolite data 111 may be generated using multiplex assays that generate data on thousands of markers. Subsetting such data may be performed by conducting a differential expression analysis wherein the fold-change from a reference sample and the p-value for the statistical difference between the sample and the reference value are used as decision metrics for inclusion or exclusion.
- biomarker data may be subsetted by a variance metric, wherein a threshold for variance is set as an inclusion or exclusion criteria (e.g., only markers with a variance greater than three standard deviations will be included). While these methods of data quality control are discussed, many more are contemplated.
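A variance filter of this kind can be sketched as below. The text does not state how the "three standard deviations" cutoff is calibrated, so the sketch simply takes a numeric `threshold` supplied by the caller; that parameter, the use of population variance, and the function name are assumptions.

```python
from statistics import pvariance


def filter_by_variance(marker_levels, threshold):
    """Return only the markers whose variance across samples exceeds threshold."""
    return {
        marker: levels
        for marker, levels in marker_levels.items()
        if pvariance(levels) > threshold
    }
```

Markers that barely vary across individuals carry little information for stratification, which is the usual motivation for dropping them before clustering.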
- the cluster analysis discretizes highly complex data based on similarities in the plurality of subsets of clinical parameters.
- the topological data analysis groups individuals or samples based on similarities in the plurality of subsets of clinical parameters, as well as the algebraic topology of the same data, and clusters are delineated based on the persistent homology of the node density and connectivity. Sepsis response phenotypes are then defined based on the identified clusters using either approach.
- At feature selection and classification and/or time-to-event analysis 306, one or more feature selection machine learning or ensemble learning models, and classification and/or time-to-event analysis algorithms, are executed.
- the subsets of model parameters are selected from the plurality of clinical parameters of the discovery data 300, such that a count of each subset of model parameters is less than a count of the clinical parameters.
- Feature selection machine learning engines such as constraint-based algorithms, constraint-based structure learning algorithms, and/or constraint-based local discovery learning algorithms can be used to select the subsets of model parameters.
- the machine learning engine 204 can execute machine learning algorithms such as minimum redundancy maximum relevance, Student’s t-test, Mann-Whitney U test, random forest, and logistic regression.
- the clinical parameters are randomly re-ordered prior to feature selection.
- data may be stratified in the data stratification engine 120 prior to feature selection.
- data stratification prior to feature selection may be achieved by using other unsupervised or supervised machine learning models, including but not limited to topological data analysis, k-means clustering, hierarchical clustering, nearest neighbor clustering, non-linear clustering (e.g., t-distributed stochastic neighbor embedding), consensus clustering, or spectral clustering.
- one or more models and/or algorithms that are designed to classify the probability that a given individual or given sample belongs to a particular group may be used.
- the machine learning engine 204 can execute a regression model, a pattern recognition algorithm, a decision tree, or other machine learning algorithm to calculate a risk, risk ratio, odds, odds ratio, or other probability output. While these models and/or algorithms are discussed, others are contemplated.
- one or more models and/or algorithms that are designed to forecast or predict duration of time until one or more events (e.g., death of a biological organism) may be used.
- the machine learning engine 204 can execute a log-rank test, a Kaplan-Meier function, a survival function, a hazard function, a Cox Proportional-Hazards regression, survival trees, survival random forests, or calculate life tables. While these models and/or algorithms are discussed, others are contemplated.
- At risk prediction 308, second values of clinical parameters are received.
- the second values may be received for at least one second individual.
- at least one of the received second values corresponds to a model parameter of the subset of model parameters used in the classification and/or time-to-event analysis machine learning algorithm 306.
- an imputation algorithm may be executed to generate a value for such a missing parameter.
- the candidate classification machine learning algorithm is executed using the corresponding subset of model parameters and the second value of the at least one clinical parameter to calculate the prediction of the clinical outcome specific to the at least one second individual.
- the predicted outcome specific to the at least one second individual is outputted.
- the predicted outcome may be displayed on an electronic device to a user or may be provided as an audio output.
- the predicted outcome may be transmitted to another device.
- the predicted outcome may include at least one of an indication that the second individual has sepsis, that the second individual is likely to have sepsis (e.g., relative to a confidence threshold), or that the second individual has an increased risk for experiencing severe disease from sepsis relative to a reference risk level.
- methods for predicting severe disease in an individual with sepsis and/or assessing risk factors comprising, consisting of, or consisting essentially of measuring, assessing, detecting, assaying, and/or determining one or more clinical parameters, such as one or more selected from level of the following in a sample from the individual: adhesion G protein-coupled receptor E1 (ADGRE1), adrenoceptor b2 (ADRB2), angiotensin II receptor associated protein (AGTRAP), AKT serine/threonine kinase 1 (AKT1), 5'-aminolevulinate synthase 2 (ALAS2), alkaline phosphatase, biomineralization associated (ALPL), ankyrin repeat domain 22 (ANKRD22), annexin A3 (ANXA3), arginase 1 (ARG1), BCL2 like 1 (BCL2L1), BMX
- 2, 3, 4, 5, 6, 7, or 8 clinical parameters are measured, assessed, detected, assayed, and/or determined.
- one or more samples is taken or isolated from the individual. In embodiments, at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, or at least 20 samples are taken or isolated from the individual.
- the one or more samples may or may not be processed prior to assaying levels of the factors, risk factors, biomarkers, clinical parameters, and/or components.
- whole blood may be taken from an individual and the blood sample may be processed, e.g., centrifuged, to isolate plasma or serum from the blood.
- the one or more samples may or may not be stored, e.g., frozen, prior to processing or analysis.
- levels of individual biomarkers in a sample isolated from an individual are assessed, detected, measured, and/or determined using one or more biological methods, such as but not limited to: ELISA assays; Western Blot; multiplexed immunoassays; quantitative arrays; PCR; RNA sequencing; DNA sequencing; Northern Blot analysis; Luminex proteomic data; RNA-seq; transcriptomic data; quantitative polymerase chain reaction (qPCR) data; microarray; mass spectrometry (MS); MS in conjunction with liquid chromatography (LC), gas chromatography (GC), or supercritical fluid chromatography (SFC); or quantitative bacteriology data.
- the biomarkers include nucleic acids, proteins, and metabolites isolated from biological samples, for example tissue, organ, exhaled breath, or biological fluids of an individual.
- biological fluids include: whole blood, serum, plasma, sweat, urine, saliva, sputum, peritoneal fluid, wound effluent, and spinal fluid.
- To determine levels of clinical parameters, particularly biomarkers, it is not necessary that an entire biomarker molecule, e.g., a full-length protein or an entire RNA transcript, be present or fully sequenced. In other words, determining levels of, for example, a fragment of protein being analyzed may be sufficient to conclude or assess that an individual component of the risk profile being analyzed is increased or decreased. Similarly, if, for example, arrays or blots are used to determine biomarker levels, the presence, absence, and/or strength of a detectable signal may be sufficient to assess levels of biomarkers.
- clinical parameters are detected, measured, assayed, assessed, and/or determined in a sample isolated from the individual at different time points, such as before, at a first time point after, and/or at a subsequent time point after the individual has an exposure, injury, wound, or condition that puts them at risk of severe disease from sepsis, such as having a viral or bacterial infection, undergoing a medical, surgical, or dental procedure, having an open wound or trauma, undergoing hemodialysis, or undergoing an organ transplant procedure.
- embodiments of the methods described herein may comprise detecting biomarkers at two, three, four, five, six, seven, eight, nine, 10 or even more time points over a period of time, such as a week or more, two weeks or more, three weeks or more, four weeks or more, a month or more, two months or more, three months or more, four months or more, five months or more, six months or more, seven months or more, eight months or more, nine months or more, ten months or more, 11 months or more, a year or more or even two years or longer.
- the methods also include embodiments in which the individual is assessed before and/or during and/or after treatment for sepsis.
- the methods are useful for monitoring the efficacy of treatment of sepsis, and comprise detecting clinical parameters, such as biomarkers in a sample isolated from the individual, at least one, two, three, four, five, six, seven, eight, nine or 10 or more different time points prior to beginning treatment for sepsis and subsequently detecting clinical parameters, such as at least one, two, three, four, five, six, seven, eight, nine or 10 or more different time points after beginning of treatment for sepsis, and determining the changes, if any, in the levels detected.
- a risk profile for severe disease in individuals with sepsis or at risk of developing sepsis wherein the risk of severe disease consists essentially of one or more components based on one or more clinical parameters selected from the following: adhesion G protein-coupled receptor E1 (ADGRE1), adrenoceptor b2 (ADRB2), angiotensin II receptor associated protein (AGTRAP), AKT serine/threonine kinase
- adhesion G protein-coupled receptor E1 (ADGRE1)
- adrenoceptor beta 2 (ADRB2)
- angiotensin II receptor associated protein (AGTRAP)
- AKT serine/threonine kinase 1 (AKT1)
- 5'-aminolevulinate synthase 2 (ALAS2)
- alkaline phosphatase, biomineralization associated (ALPL)
- ankyrin repeat domain 22 (ANKRD22)
- annexin A3 (ANXA3)
- arginase 1 (ARG1)
- BCL2 like 1 (BCL2L1)
- BMX non-receptor tyrosine kinase (BMX)
- chromosome 6 open reading frame 62 (C6orf62)
- carbonic anhydrase 2 (CA2)
- C-C motif chemokine ligand 5 (CCL5)
- C-C motif chemokine receptor 3 (CCR3)
- CD4 molecule (CD4)
- CD24 molecule (CD24)
- CD177 molecule (CD177)
- CD274 molecule (CD274)
- cell division cycle 34 ubiquitin conjugating enzyme (CDC34)
- complement factor D (CFD)
- chitinase 3 like 1 (CHI3L1)
- carbohydrate sulfotransferase 2 (CHST2)
- C-type lectin domain family 4 member E (CLEC4E)
- cytochrome C oxidase assembly factor 1 homolog (COA1)
- carnitine palmitoyltransferase 1A (CPT1A)
- carboxypeptidase vitellogenic like (CPVL)
- chondroitin sulfate N-acetylgalactosaminyltransferase 1 (CSGALNACT1)
- cystatin C (CST3)
- C-X3-C motif chemokine receptor 1 (CX3CR1)
- DNA damage inducible transcript 4 (DDIT4)
- defensin alpha 3 (DEFA3)
- defensin alpha 4 (DEFA4)
- DnaJ heat shock protein family (Hsp40) member C1 (DNAJC1)
- DNA damage regulated autophagy modulator 1 (DRAM1)
- deoxyuridine triphosphatase (DUT)
- the risk of severe disease in an individual with sepsis or at risk of developing sepsis is calculated from one or more clinical parameters, two or more clinical parameters, three or more clinical parameters, four or more clinical parameters, five or more clinical parameters, six or more clinical parameters, seven or more clinical parameters, eight or more clinical parameters, nine or more clinical parameters, ten or more clinical parameters, 11 or more clinical parameters, 12 or more clinical parameters, 13 or more clinical parameters, 14 or more clinical parameters, 15 or more clinical parameters, 16 or more clinical parameters, 17 or more clinical parameters, 18 or more clinical parameters, 19 or more clinical parameters, 20 or more clinical parameters, 21 or more clinical parameters, 22 or more clinical parameters, 23 or more clinical parameters, 24 or more clinical parameters, 25 or more clinical parameters, 26 or more clinical parameters, 27 or more clinical parameters, 28 or more clinical parameters, 29 or more clinical parameters, 30 or more clinical parameters, 31 or more clinical parameters, 32 or more clinical parameters, 33 or more clinical parameters, 34 or more clinical parameters, 35 or more clinical parameters, 36 or more clinical parameters, 37 or more clinical parameters
- the risk of severe disease in an individual with sepsis or at risk of developing sepsis is calculated from 2, 3, 4, 5, 6, 7, or 8 clinical parameters such as selected from those set forth above.
- an individual is diagnosed as having an increased risk of experiencing severe disease from sepsis if five, four, three, two or even one of the individual’s components or factors herein are at abnormal levels. It should be understood that the individual level of each risk factor need not be correlated with increased risk in order for the risk profile value to indicate that the individual has an increased risk of experiencing severe disease from sepsis.
- one or more clinical parameters are detected in a sample from the individual that is a biological fluid or tissue isolated from the individual.
- Biological fluids or tissues include but are not limited to: whole blood, peripheral blood, capillary blood, serum, plasma, cerebrospinal fluid, wound effluent, urine, amniotic fluid, peritoneal fluid, pleural fluid, lymph fluids, various external secretions of the respiratory, intestinal, and genitourinary tracts, various components of exhaled breath, tears, sweat, saliva, white blood cells, and tissue biopsies.
- the measurements of the individual components themselves are used in the risk profile for severe disease in an individual with sepsis or at risk of developing sepsis, and these levels can be used to provide a “binary” value to each component, e.g., “elevated” or “not elevated.”
- Each of the binary values can be converted to a number, e.g., “1” or “0,” respectively.
- the “risk of severe disease in an individual with sepsis or at risk of developing sepsis” can be a single value, number, factor or score given as an overall collective value to the individual components of the profile. For example, if each component is assigned a value, such as above, the component value may simply be the overall score of each individual or categorical value. For example, if a single categorical variable is used as the basis of the risk profile for predicting severe disease, then a hazard ratio of 2.5 might be used to convey a 2.5-fold risk of severe disease (i.e., a 150% increase) compared to a reference group.
- the “risk of severe disease in an individual with sepsis or at risk of developing sepsis value” could be a useful single number or score, the actual value or magnitude of which could be an indication of the actual risk of severe disease, e.g., the “more positive” the value, the greater the risk of severe disease.
- the “risk of severe disease in an individual with sepsis or at risk of developing sepsis value” can be a series of values, numbers, factors or scores given to the individual components of the overall profile.
- the “risk of severe disease in an individual with sepsis or at risk of developing sepsis value” may be a combination of values, numbers, factors or scores given to individual components of the profile as well as values, numbers, factors or scores collectively given to a group of components, such as a host biomarker portion.
- the risk profile value may comprise or consist of individual values, numbers or scores for specific components as well as values, numbers or scores for a group of components.
- individual values from the “risk of severe disease in an individual with sepsis or at risk of developing sepsis profile” can be used to develop a single score, such as a “combined risk index,” which may utilize weighted scores from the individual component values reduced to a diagnostic number value.
- the combined risk index may also be generated using non-weighted scores from the individual component values.
- the threshold value may be set by the combined risk index from a population of one or more control (normal) subjects.
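The binary scoring and combined risk index described above can be sketched as follows. This is one possible reading, not the application's definitive scoring scheme: the per-component cutoffs, the optional weights, and the choice of the highest control-subject index as the threshold are all assumptions, as are the function names.

```python
def combined_risk_index(levels, cutoffs, weights=None):
    """Score each component 1 ("elevated") or 0 ("not elevated") and sum.

    levels:  measured level per component for one individual
    cutoffs: level above which a component counts as elevated (assumed)
    weights: optional per-component weights; None gives a non-weighted index
    """
    index = 0.0
    for name, cutoff in cutoffs.items():
        elevated = 1 if levels[name] > cutoff else 0
        weight = 1.0 if weights is None else weights[name]
        index += weight * elevated
    return index


def exceeds_threshold(index, control_indices):
    """Assumption: the threshold is the highest index among control subjects."""
    return index > max(control_indices)
```

With `weights=None` the index is simply the count of elevated components; supplying weights lets more informative components contribute more to the final score.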
- the value of the “risk of severe disease in an individual with sepsis or at risk of developing sepsis profile” can be the collection of data from the individual measurements, and need not be converted to a scoring system, such that the “risk profile value” is a collection of the individual measurements of the individual components of the profile.
- the individual’s “risk of severe disease in an individual with sepsis or at risk of developing sepsis profile” is compared to a reference “risk of severe disease in an individual with sepsis or at risk of developing sepsis profile”.
- the reference “risk of severe disease in an individual with sepsis or at risk of developing sepsis profile” value is calculated from clinical parameters previously detected for the individual.
- the present application also includes methods of monitoring the progression of sepsis toward severe disease in an individual, with the methods comprising determining the individual’s risk profile at more than one time point.
- embodiments of the methods of the present application will comprise determining the individual’s “risk of severe disease in an individual with sepsis or at risk of developing sepsis profile” at two, three, four, five, six, seven, eight, nine, 10 or even more time points over a period of time, such as a week or more, two weeks or more, three weeks or more, four weeks or more, a month or more, two months or more, three months or more, four months or more, five months or more, six months or more, seven months or more, eight months or more, nine months or more, ten months or more, 11 months or more, a year or more or even two years or longer.
- the methods described herein also include embodiments in which the individual’s risk profile is assessed before and/or during and/or after treatment of sepsis.
- the present application also includes methods of monitoring the efficacy of treatment of sepsis by assessing the individual’s “risk of severe disease in an individual with sepsis or at risk of developing sepsis profile” over the course of the treatment and after the treatment.
- the methods of monitoring the efficacy of treatment of sepsis comprise determining the individual’s “risk of severe disease in an individual with sepsis or at risk of developing sepsis profile” at least one, two, three, four, five, six, seven, eight, nine or 10 or more different time points prior to the receipt of treatment for sepsis and subsequently determining the individual’s “risk of severe disease in an individual with sepsis or at risk of developing sepsis profile” at least one, two, three, four, five, six, seven, eight, nine or 10 or more different time points after beginning of treatment for sepsis, and determining the changes, if any, in the “risk of severe disease in an individual with sepsis or at risk of developing sepsis profile” of the individual.
- the treatment may be any treatment designed to cure, remove or diminish the symptoms and/or cause(s) of sepsis.
- the reference “risk of severe disease in an individual with sepsis or at risk of developing sepsis profile” value is calculated from clinical parameters detected for a population of one or more reference subjects when the reference subjects did not have detectable signs that would put them at risk for severe disease.
- the reference “risk of severe disease in an individual with sepsis or at risk of developing sepsis profile” value is calculated from clinical parameters detected for a population of reference subjects having an exposure, injury, wound, or condition that puts them at risk of developing sepsis and severe disease from sepsis, such as an infection.
- the levels or values of the clinical parameters compared to reference levels can vary.
- the levels or values of any one or more of the factors, risk factors, biomarkers, clinical parameters, and/or components are at least 1.05, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 30, 40, 50, 60, 70, 80, 90, 100, 500, 1,000, or 10,000-fold higher than reference levels or values.
- the levels or values of any one or more of the factors, risk factors, biomarkers, clinical parameters, and/or components are at least 1.05, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 30, 40, 50, 60, 70, 80, 90, 100, 500, 1,000, or 10,000-fold lower than reference levels or values.
- the levels or values of the factors or components may be normalized to a standard and these normalized levels or values can then be compared to one another to determine if a factor or component is lower, higher or about the same.
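Normalizing to a standard and then classifying a level as higher, lower, or about the same can be sketched as below. The sketch assumes that "N-fold lower" means the normalized ratio is at most 1/N, and it uses the smallest fold listed above (1.05) as the default cutoff; the function and parameter names are hypothetical.

```python
def compare_to_reference(level, standard, reference_level, reference_standard,
                         min_fold=1.05):
    """Classify a normalized level relative to a normalized reference level."""
    normalized = level / standard
    normalized_reference = reference_level / reference_standard
    fold = normalized / normalized_reference
    if fold >= min_fold:
        return "higher"
    if fold <= 1.0 / min_fold:  # assumption: N-fold lower means ratio <= 1/N
        return "lower"
    return "about the same"
```

Dividing each measurement by its own standard before comparison removes assay-to-assay scale differences, so the fold value reflects the biology rather than the run.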
- an increase in the individual’s “risk of severe disease in an individual with sepsis or at risk of developing sepsis profile” value as compared to a reference “risk of severe disease in an individual with sepsis or at risk of developing sepsis profile” value indicates that the individual has an increased risk of severe disease from sepsis.
- the individual’s “risk of severe disease in an individual with sepsis or at risk of developing sepsis profile” is compared to the profile that is deemed to be a “normal” “risk of severe disease in an individual with sepsis or at risk of developing sepsis profile”.
- to establish a “normal” “risk of severe disease in an individual with sepsis or at risk of developing sepsis profile”, an individual or group of individuals may first be assessed to ensure they have no signs, symptoms or diagnostic indicators that they may experience severe disease from sepsis.
- the “risk of severe disease in an individual with sepsis profile” of the individual or group of individuals can be determined to establish a “normal risk of severe disease in an individual with sepsis or at risk of developing sepsis profile.”
- a “normal risk of severe disease in an individual with sepsis or at risk of developing sepsis profile” can be ascertained from the same individual when the individual is deemed healthy, such as when the individual does not have an exposure, injury, wound, or condition that puts the individual at risk of experiencing severe disease from sepsis.
- a “risk of severe disease in an individual with sepsis or at risk of developing sepsis profile” from a “normal individual,” e.g., a “normal risk of severe disease in an individual with sepsis or at risk of developing sepsis profile,” is from an individual who has sepsis but does not have any concurrent conditions that may increase the risk of severe disease.
- a “normal” “risk of severe disease in an individual with sepsis or at risk of developing sepsis profile” is assessed in the same individual from whom the sample is taken, prior to the onset of any signs, symptoms or diagnostic indicators that they may experience severe disease from sepsis.
- the “normal risk of severe disease in an individual with sepsis or at risk of developing sepsis profile” may be assessed in a longitudinal manner based on data regarding the individual at an earlier point in time, enabling a comparison between the “risk of severe disease in an individual with sepsis or at risk of developing sepsis profile” (and values thereof) over time.
- a “normal risk of severe disease in an individual with sepsis or at risk of developing sepsis profile” is assessed in a sample from a different individual (from the individual being analyzed) and this different individual does not have, or is not suspected of, experiencing severe disease from sepsis.
- the “normal risk of severe disease in an individual with sepsis or at risk of developing sepsis profile” is assessed in a population of healthy individuals, the constituents of which display no signs, symptoms or diagnostic indicators that they may have sepsis.
- the individual’s “risk of severe disease in an individual with sepsis or at risk of developing sepsis profile” can be compared to a normal risk profile generated from a single normal sample or a risk profile generated from more than one normal sample.
- a Wilcoxon rank-sum test can be used to identify which biomarkers from specific patient groups are associated with a specific indication, outcome, or specific phenotype.
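- as an illustration of the rank-sum comparison mentioned above, the following sketch computes the test statistic for one biomarker measured in two patient groups. It uses the large-sample normal approximation without a tie correction, which is a simplifying assumption; the function names and example values are hypothetical, and a statistical package would normally be used in practice.

```python
import math


def rank_with_ties(values):
    """Assign 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def rank_sum_test(group_a, group_b):
    """Return (z, two-sided p) for the rank sum of group_a (normal approximation)."""
    n1, n2 = len(group_a), len(group_b)
    ranks = rank_with_ties(list(group_a) + list(group_b))
    w = sum(ranks[:n1])                       # rank sum of the first group
    mean = n1 * (n1 + n2 + 1) / 2             # expected rank sum under H0
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))      # two-sided tail probability
    return z, p


# Hypothetical biomarker levels in two patient groups.
survivors = [1.0, 2.0, 3.0]
non_survivors = [4.0, 5.0, 6.0]
z, p = rank_sum_test(survivors, non_survivors)
print(round(z, 3), round(p, 3))
```

With groups this small an exact test would be preferable; the approximation is shown only to make the arithmetic explicit.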
- the assessment of the levels of the individual components of the “risk of severe disease in an individual with sepsis or at risk of developing sepsis profile” can be expressed as absolute or relative values and may or may not be expressed in relation to another component, a standard, an internal standard or another molecule or compound known to be in the sample. If the levels are assessed as relative to a standard or internal standard, the standard or internal standard may be added to the test sample prior to, during or after sample processing.
- proteins and nucleic acids can be linked to chips, such as microarray chips (see U.S. Patent 6,040,138 and U.S. Patent 7,148,058). Binding to proteins or nucleic acids on arrays can be detected by scanning the microarray with a variety of laser or charge coupled device (CCD)-based scanners, and extracting features with software packages, for example, Imagene (Biodiscovery, Hawthorne, CA), Feature Extraction Software (Agilent), Scanalyze (Eisen, M.
- An array panel including one or more biomarkers for severe disease in an individual with sepsis can be used for predicting the risk of an individual for experiencing a specified clinical outcome and/or for monitoring a patient undergoing treatment for sepsis.
- the array is a microarray.
- RNA-sequencing techniques may be used, which may include single cell RNA-sequencing, direct RNA-sequencing, and/or next-gen RNA-sequencing.
- methods to measure metabolites may be used. For example, these techniques may include mass spectrometry, gas chromatography, liquid chromatography, supercritical fluid chromatography, or capillary electrophoresis, or a combination thereof.
- the arrays described herein can be used to predict severe disease of an individual with sepsis or at risk of developing sepsis.
- the arrays can be used to predict mortality of an individual with sepsis.
- the method includes using the arrays to detect or obtain the levels of one or more biomarkers described herein.
- the method can also include comparing the results of an array to a respective control to predict severe disease of an individual with sepsis or at risk of developing sepsis.
- the respective control can be an array for a normal individual.
- the methods described herein include predicting severe disease of an individual with sepsis or at risk of developing sepsis, comprising detecting and/or measuring one or more biomarkers described herein.
- the method can include comparing the results of the detection and/or measured level of one or more biomarkers to a respective control.
- the respective control can include markers of a normal individual.
- aspects of the present disclosure may be embodied as a system, method or computer program product. Accordingly, aspects of the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a “circuit,” “engine,” “module,” or “system.” Furthermore, aspects of the present disclosure may take the form of a computer program product embodied in one or more computer readable medium(s) having computer readable program code embodied thereon.
- aspects of the present disclosure may be implemented using one or more analog and/or digital electrical or electronic components, and may include a microprocessor, a microcontroller, an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA), programmable logic and/or other analog and/or digital circuit elements configured to perform various input/output, control, analysis and other functions described herein, such as by executing instructions of a computer program product.
- the computer device, computer readable media, network, and remote device may be arranged in the architecture depicted in FIG. 4.
- the computing device 400 houses at least, but is not limited to: a processor(s) 402, input/output device(s) 404, a display device 406, memory 408, a machine learning engine 420, and a prediction engine 432.
- the memory includes at least, but is not limited to, an application programming interface 410, a client-facing application 412, machine learned models 414, training application 416, a discovery database 418, and a machine learning engine 420 that comprises data quality control algorithms 422, topological data analysis and clustering algorithms 424, feature selection algorithms 426, classification and time-to-event analysis algorithms 428, and trained prediction models 430.
- the memory also includes a prediction engine 432.
- the computing device(s) can be accessed through a network 434 by a remote device 436.
- the network enables communication via internet with a secure and protected host website operating the machine learning engine and prediction engine and providing an output after predictive variables are entered.
- the remote device 436 can be connected to the network using any number or combination of communication standards (e.g., Bluetooth, GSM, CDMA, TDMA, WCDMA, OFDM, GPRS, EV-DO, Wi-Fi, WiMAX, 802.xx, UWB, LTE, satellite).
- the connections may also be through wired communication features, such as USB ports, serial ports, IEEE 1394 ports, optical ports, parallel ports, and/or any other suitable wired communication port.
- the input/output device(s) 404 may include one or more of: a computer, a keyboard, a mouse, a mobile device (e.g., a mobile phone, a tablet, a laptop), a screen, a microphone, or a printing device.
- the user input device can include various user interface elements such as keys, buttons, sliders, knobs, touchpads (e.g., resistive or capacitive touchpads), or microphones.
- the user interface device includes a touchscreen display device and user input device, such that the user interface device can receive user inputs as touch inputs and determine commands indicated by the user inputs based on detecting location, intensity, duration, or other parameters of the touch inputs.
- the application programming interface 410 and the client-facing application 412 may be implemented using various software environments, including but not limited to: SAS and R software packages.
- SAS Statistical Analysis Software
- R is a free, general purpose, open-source software package that compiles and runs on a variety of UNIX platforms. Many additional packages run within R, including packages for topological data analysis, cluster analysis, and machine learning. While these are discussed, many other statistical and/or machine learning software packages are contemplated.
- Any combination of one or more computer readable medium(s) may be utilized to store the machine-learned models 414, the training application 416, and the discovery database 418.
- the one or more computer readable medium(s) may also be utilized to store the machine learning engine 420 and the data quality control algorithms 422, the topological data analysis and clustering algorithms 424, the feature selection algorithms 426, and the classification and time-to-event analysis algorithms 428.
- the trained prediction models 430 may be stored in the machine learning engine 420.
- the computer readable medium may be a computer readable signal medium or a computer readable storage medium.
- a computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
- a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
- a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof.
- a computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
- Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
- Computer program code for carrying out operations for aspects of the present disclosure may be written in any combination of one or more programming languages, including an object-oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the “C” programming language or similar programming languages.
- the program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server.
- the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).
- These computer program instructions may also be stored in a computer readable medium that can direct a computer, other programmable data processing apparatus, or other devices to function in a particular manner, such that the instructions stored in the computer readable medium produce an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
- the computer program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
- each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s).
- the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.
- each embodiment disclosed herein can comprise, consist essentially of, or consist of its particular stated element, step, ingredient or component.
- the terms “include” or “including” should be interpreted to recite: “comprise, consist of, or consist essentially of.”
- the transition term “comprise” or “comprises” means includes, but is not limited to, and allows for the inclusion of unspecified elements, steps, ingredients, or components, even in major amounts.
- the transition phrase “consisting essentially of” limits the scope of the embodiment to the specified elements, steps, ingredients or components and to those that do not materially affect the embodiment.
- a method of generating a model predicting severe disease in an individual with sepsis or at risk of developing sepsis comprising: generating a discovery database storing first values of a plurality of clinical parameters and clinical outcomes associated with a plurality of first subjects; executing a plurality of data quality control algorithms to select a subset of clinical parameters from the plurality of clinical parameters; executing topological data analysis and/or clustering for the plurality of subsets of clinical parameters; executing a plurality of feature selection machine learning and/or ensemble learning models based on a plurality of classification and/or time-to-event analysis algorithms; and outputting a model for predicting severe disease in the individual with sepsis or at risk of developing sepsis.
- topological data analysis groups individuals or samples based on similarities in the plurality of subsets of clinical parameters, as well as the algebraic topology of the same data, wherein clusters are delineated based on the persistent homology of the node density and connectivity, and wherein sepsis response phenotypes are defined based on the identified clusters.
- cluster analysis discretizes the plurality of subsets of clinical parameters based on measures of similarity, wherein clusters are delineated based on the persistent homology of the node density and connectivity, and wherein sepsis response phenotypes are defined based on the identified clusters.
- the feature selection machine learning models comprise at least one of: unsupervised machine learning algorithm, supervised machine learning algorithm, minimum redundancy maximum relevance, Student’s t-test, Mann-Whitney U test, random forest, logistic regression, or neural networks.
- the feature selection ensemble learning models comprise at least one of: cluster analysis, unsupervised machine learning algorithm, supervised machine learning algorithm, minimum redundancy maximum relevance, Student’s t-test, Mann-Whitney U test, random forest, logistic regression, neural networks, Bayes optimal classifier, classification and regression tree, bootstrap aggregating, boosting, Bayesian model averaging, Bayesian model combination, a bucket of models, or stacking.
- the plurality of clinical parameters comprise one or more nucleic acid data markers, one or more protein data markers, one or more metabolite data markers, one or more clinical outcomes data, one or more administrative health data, or a combination thereof.
- nucleic acid data markers comprise one or more of: level of adhesion G protein-coupled receptor E1 (ADGRE1) in a sample from the individual, level of adrenoceptor b2 (ADRB2) in a sample from the individual, level of angiotensin II receptor associated protein (AGTRAP) in a sample from the individual, level of AKT serine/threonine kinase 1 (AKT1) in a sample from the individual, level of 5'-aminolevulinate synthase 2 (ALAS2) in a sample from the individual, level of alkaline phosphatase, biomineralization associated (ALPL) in a sample from the individual, level of ankyrin repeat domain 22 (ANKRD22) in a sample from the individual, level of annexin A3 (ANXA3) in a sample from the individual, level of arginase 1 (ARG1) in a sample from the individual, level of BCL2 like 1 (BCL2L
- a method for predicting severe disease in an individual with sepsis or at risk of developing sepsis comprising: receiving, from a second individual, a second value of at least one clinical parameter of a plurality of clinical parameters; executing a pre-trained model for predicting severe disease from sepsis of the second individual using the second value of at least one clinical parameter, wherein the model is pre-trained by performing operations comprising: generating a discovery database storing first values of the plurality of clinical parameters and clinical outcomes associated with a plurality of first subjects; executing a plurality of data quality control algorithms to select a subset of clinical parameters from the plurality of clinical parameters; executing topological data analysis and/or clustering for the plurality of subsets of clinical parameters; executing a plurality of feature selection machine learning and/or ensemble learning models based on a plurality of classification and/or time-to-event analysis algorithms; and outputting a model for predicting severe disease in the individual with sepsis or at risk of developing sepsis; and outputting
- the plurality of data quality control algorithms comprise at least one of: differential expression algorithms, k-nearest neighbor imputation algorithms, three-sigma rule algorithms, or empirical Bayes method algorithms.
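- of the data quality control algorithms listed above, the three-sigma rule is the simplest to illustrate. The sketch below flags measurements lying more than three sample standard deviations from the mean; the function name and example data are hypothetical, and how flagged values are subsequently handled (removal, imputation, review) is not specified here.

```python
import statistics


def three_sigma_outliers(values):
    """Return indices of values more than 3 sample SDs from the mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1)
    if sd == 0:
        return []                  # all values identical: nothing to flag
    return [i for i, v in enumerate(values) if abs(v - mean) > 3 * sd]


# Hypothetical measurements: nineteen typical values and one extreme value.
measurements = [10.0] * 19 + [100.0]
print(three_sigma_outliers(measurements))
```

Note that with very small samples no single point can lie more than three standard deviations from the mean, so the rule is only informative for reasonably large cohorts.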
- topological data analysis groups individuals or samples based on similarities in the plurality of subsets of clinical parameters, as well as the algebraic topology of the same data, wherein clusters are delineated based on the persistent homology of the node density and connectivity, and wherein sepsis response phenotypes are defined based on the identified clusters.
- cluster analysis discretizes the plurality of subsets of clinical parameters based on measures of similarity, wherein clusters are delineated based on the persistent homology of the node density and connectivity, and wherein sepsis response phenotypes are defined based on the identified clusters.
- the feature selection machine learning models comprise at least one of: unsupervised machine learning algorithm, supervised machine learning algorithm, minimum redundancy maximum relevance, Student’s t-test, Mann-Whitney U test, random forest, logistic regression, or neural networks.
- the feature selection ensemble learning models comprise at least one of: cluster analysis, unsupervised machine learning algorithm, supervised machine learning algorithm, minimum redundancy maximum relevance, Student’s t-test, Mann-Whitney U test, random forest, logistic regression, neural networks, Bayes optimal classifier, classification and regression tree, bootstrap aggregating, boosting, Bayesian model averaging, Bayesian model combination, a bucket of models, or stacking.
- the plurality of clinical parameters comprise one or more nucleic acid data markers, one or more protein data markers, one or more metabolite data markers, one or more clinical outcomes data, one or more administrative health data, or a combination thereof.
- nucleic acid data markers comprise one or more of: level of adhesion G protein-coupled receptor E1 (ADGRE1) in a sample from the individual, level of adrenoceptor b2 (ADRB2) in a sample from the individual, level of angiotensin II receptor associated protein (AGTRAP) in a sample from the individual, level of AKT serine/threonine kinase 1 (AKT1) in a sample from the individual, level of 5'-aminolevulinate synthase 2 (ALAS2) in a sample from the individual, level of alkaline phosphatase, biomineralization associated (ALPL) in a sample from the individual, level of ankyrin repeat domain 22 (ANKRD22) in a sample from the individual, level of annexin A3 (ANXA3) in a sample from the individual, level of arginase 1 (ARG1) in a sample from the individual, level of BCL2 like 1 (BCL2L
- treating the individual comprises at least one of initiation or broadening of antibiotic therapy, balancing fluids and electrolytes, renal replacement therapy, adjustment of mechanical ventilation, targeted or empiric anti-inflammatory or immunomodulatory drugs, hemodynamic adjustments, calcium channel blocker medications, or surgical intervention.
- adjusting current treatment comprises changing dose of current antibiotic, changing to a different antibiotic, changing dose of non-steroidal anti-inflammatory drugs, or initiating or adjusting insulin therapy.
- a system for generating a machine learning engine for predicting severe disease in an individual with sepsis or at risk of developing sepsis comprising: one or more processors; a memory; a communication platform; a discovery database configured to store first values of a plurality of clinical parameters and clinical outcomes associated with a plurality of first subjects; and a machine learning engine configured to: execute a plurality of data quality control algorithms to select a subset of clinical parameters from the plurality of clinical parameters; execute topological data analysis and/or clustering for the plurality of subsets of clinical parameters; execute a plurality of feature selection machine learning and/or ensemble learning models based on a plurality of classification and/or time-to-event analysis algorithms; and output a model for predicting severe disease in the individual with sepsis or at risk of developing sepsis.
- the communication platform comprises at least one of: a mobile device, a secured network, a server that stores and receives messages, and a database.
- a system for predicting severe disease in an individual with sepsis or at risk of developing sepsis comprising: one or more processors; a memory; a communication platform; a discovery database configured to store first values of a plurality of clinical parameters and the clinical outcomes associated with a plurality of first subjects; a machine learning engine configured to pre-train a model for severe disease in an individual with sepsis or at risk of developing sepsis, wherein the model is pre-trained by performing operations comprising: executing a plurality of data quality control algorithms to select a subset of clinical parameters from the plurality of clinical parameters; executing topological data analysis and/or clustering for the plurality of subsets of clinical parameters; executing a plurality of feature selection machine learning and/or ensemble learning models based on a plurality of classification and/or time-to-event analysis algorithms; outputting a model for predicting severe disease in the individual with sepsis or at risk of developing sepsis; a prediction engine configured to: receive, from a second
- a non-transitory computer-readable medium having information recorded thereon for generating a model for predicting severe disease in an individual with sepsis or at risk of developing sepsis, wherein the information, when read by a computer, causes the computer to perform operations of: generating a discovery database storing first values of a plurality of clinical parameters and clinical outcomes associated with a plurality of first subjects; executing a plurality of data quality control algorithms to select a subset of clinical parameters from the plurality of clinical parameters; executing topological data analysis and/or clustering for the plurality of subsets of clinical parameters; executing a plurality of feature selection machine learning and/or ensemble learning models based on a plurality of classification and/or time-to-event analysis algorithms; and outputting a model for predicting severe disease in the individual with sepsis or at risk of developing sepsis.
- An array of host-biomarkers for sepsis wherein the array of biomarkers comprises two or more of: adhesion G protein-coupled receptor E1 (ADGRE1), adrenoceptor b2 (ADRB2), angiotensin II receptor associated protein (AGTRAP), AKT serine/threonine kinase 1 (AKT1), 5'-aminolevulinate synthase 2 (ALAS2), alkaline phosphatase, biomineralization associated (ALPL), ankyrin repeat domain 22 (ANKRD22), annexin A3 (ANXA3), arginase 1 (ARG1), BCL2 like 1 (BCL2L1), BMX non-receptor tyrosine kinase (BMX), chromosome 6 open reading frame 62 (C6orf62), carbonic anhydrase 2 (CA2), C-C motif chemokine ligand 5 (CCL5), C-C motif chemokine receptor
- the array of biomarkers of embodiment 27 or 28, wherein the array comprises three or more biomarkers, four or more biomarkers, five or more biomarkers, six or more biomarkers, seven or more biomarkers, eight or more biomarkers, nine or more biomarkers, 10 or more biomarkers, 15 or more biomarkers, 20 or more biomarkers, 25 or more biomarkers, 30 or more biomarkers, 35 or more biomarkers, 40 or more biomarkers, 45 or more biomarkers, or 48 biomarkers.
- biomarkers wherein the array comprises two or more of the following biomarkers: adrenoceptor b2 (ADRB2), CD177 molecule (CD177), carboxypeptidase vitellogenic like (CPVL), C-X3-C motif chemokine receptor 1 (CX3CR1), defensin a3 (DEFA3), Fc receptor like 5 (FCRL5), G protein subunit y2 (GNG2), interleukin 10 receptor subunit a (IL10RA), kinesin light chain 3 (KLC3), oleoyl-ACP hydrolase (OLAH), pyruvate kinase M1/2 (PKM), radical S-adenosyl methionine domain containing 2 (RSAD2), STE20 related adaptor b (STRADB), tyrosylprotein sulfotransferase 1 (TPST1), tetraspanin 5 (TSPAN5), tetratricopeptid
- a method of predicting mortality in an individual with sepsis comprising: obtaining a biological sample from the individual; measuring one or more of the following biomarkers: adrenoceptor b2 (ADRB2), CD177 molecule (CD177), carboxypeptidase vitellogenic like (CPVL), C-X3-C motif chemokine receptor 1 (CX3CR1), defensin a3 (DEFA3), Fc receptor like 5 (FCRL5), G protein subunit y2 (GNG2), interleukin 10 receptor subunit a (IL10RA), kinesin light chain 3 (KLC3), oleoyl-ACP hydrolase (OLAH), pyruvate kinase M1/2 (PKM), radical S-adenosyl methionine domain containing 2 (RSAD2), STE20 related adaptor b (STRADB), tyrosylprotein sulfotransferase 1 (TPST1), tetraspan
- Example 1. The Austere environments Consortium for Enhanced Sepsis Outcomes (ACESO) follows a multi-omics systems biology approach for profiling sepsis patients into disease-response phenotypes, which informs the development of robust and accurate host-biomarker panels for sepsis diagnosis and prognosis (FIG. 5).
- the aim of this study was to use topological data analysis (TDA) to identify gene and protein expression phenotypes in sepsis patients enrolled in an ACESO observational study from sites in Cambodia, Ghana and the USA.
- Concentrations of 48 proteins representing a range of biologic pathways were measured by Luminex multiplex immunoassay in peripheral blood samples from 586 sepsis patients.
- RNA sequencing was performed on 506 patients from the same cohort, and the 1000 protein-coding genes with the largest standard deviation were selected for analysis.
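- the variance-based gene filter described above (keeping the protein-coding genes with the largest standard deviation across patients) can be sketched as follows. The data layout, function name, and gene labels are assumptions for illustration; the study's actual preprocessing code is not given in the text.

```python
import statistics


def top_variable_genes(expression, k):
    """Return the k genes with the largest sample standard deviation.

    `expression` maps a gene identifier to its list of per-patient
    expression values (hypothetical layout).
    """
    sd_by_gene = {gene: statistics.stdev(values) for gene, values in expression.items()}
    ranked = sorted(sd_by_gene, key=sd_by_gene.get, reverse=True)
    return ranked[:k]


# Toy example with placeholder gene labels; the study used k = 1000.
toy_expression = {
    "GENE_A": [1.0, 1.0, 1.0, 1.0],    # no variation
    "GENE_B": [0.0, 10.0, 0.0, 10.0],  # high variation
    "GENE_C": [4.0, 5.0, 6.0, 5.0],    # modest variation
}
print(top_variable_genes(toy_expression, 2))
```

Filtering on variability before unsupervised analysis keeps the genes most likely to distinguish patient subgroups while reducing dimensionality.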
- Topological data analysis was used as an unsupervised method for identifying clusters of patients with similar gene or protein expression profiles (molecular phenotypes), as well as broader trends across the TDA network.
- differences in demographic, clinical and basic laboratory measurements between TDA clusters were tested for statistical significance to inform on sepsis endotypes associated with the gene and protein expression phenotypes.
- TDA networks of gene expression in the ACESO discovery cohort show the heterogeneity of sepsis in an unsupervised, data-driven manner.
- in TDA, a 2-dimensional topological network is created based on the similarity between data points, as well as the overall distribution of the data in n-dimensional space.
- Nodes represent groups of patients with shared characteristics, whereas edges (lines) indicate that one or more patients are shared between two nodes.
- Any (meta)data available for the same patient set can be used to generate a gray-scale overlay (average values are calculated for each node) (FIG. 6).
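- the network structure described above (nodes as patient groups, edges where two nodes share at least one patient, and an overlay of per-node averages) can be sketched as follows. The sketch assumes node membership has already been produced by a TDA step, which is not shown; all identifiers and values are hypothetical.

```python
from itertools import combinations


def build_edges(nodes):
    """Connect every pair of nodes that share one or more patients.

    `nodes` maps a node id to the set of patient ids it contains.
    """
    return [
        (a, b)
        for a, b in combinations(sorted(nodes), 2)
        if nodes[a] & nodes[b]  # non-empty intersection means shared patients
    ]


def node_overlay(nodes, metadata):
    """Average a per-patient variable within each node (the overlay values)."""
    return {
        node_id: sum(metadata[p] for p in members) / len(members)
        for node_id, members in nodes.items()
    }


# Toy network: patient p2 belongs to two nodes, so n1 and n2 are linked.
toy_nodes = {"n1": {"p1", "p2"}, "n2": {"p2", "p3"}, "n3": {"p4"}}
toy_age = {"p1": 40, "p2": 60, "p3": 80, "p4": 50}
print(build_edges(toy_nodes))
print(node_overlay(toy_nodes, toy_age))
```

Because a patient may belong to several nodes, the overlay averages are not independent between connected nodes.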
- TDA analysis distinguished 5 distinct sepsis phenotypes based on gene expression, with significantly different levels of mortality (at 28 days post-enrollment).
- Using feature selection and machine learning a set of 13 genes was identified for predicting mortality in the discovery cohort (top right of FIG. 6) with a sensitivity of 90-96% for the high-mortality TDA groups.
- the distributions of genes across the TDA network highlight biological pathways relevant to the different sepsis phenotypes.
- TDA networks of protein expression in the ACESO discovery cohort indicated two major trends within the protein data and identified six overlapping patient clusters. Four of these clusters, comprising two-thirds of the study cohort, form a continuous spectrum along the primary axis of the network. Protein concentrations along this spectrum are predictive of mortality risk within the first 28 days of disease, representing a two-fold increase in risk between patients, independent of site of enrollment, at either end of the spectrum. In addition, there are significant differences between these phenotypes in terms of clinical presentation, laboratory measurements and blood cell counts (FIG. 7).
- Example 2. Sepsis is a major risk factor in patients with COVID-19, and those who go on to develop (severe) sepsis and require hospital or intensive care unit admission have poorer outcomes (mortality and long-term morbidity).
- Host biomarker data were collected from a cohort of COVID-19 patients in order to elucidate their role in the pathogenesis of this disease, as well as to assess the feasibility of using host biomarker levels to prognose disease severity and long-term (post 90-day) morbidity in COVID-19 patients. Baseline levels of 15 cytokines were measured in peripheral blood samples using the Ella multiplex assay. In addition, a wide range of demographic, clinical and laboratory variables were collected.
- The ensemble machine learning algorithm was a combination of random forest (RF) and classification and regression tree (CART), along with extreme gradient boosting. 10,000 simulations of the model were performed in order to minimize or account for the two common sources of uncertainty in prediction models: (1) errors introduced by the use of imperfect initial conditions, and (2) errors introduced by imperfections in the model formulation. The accuracies of the individual models (RF and CART) were assessed prior to creating the ensemble model based on a combination of the two. The ensemble model showed 20% higher accuracy compared with either of these individual methods. The area under the curve (AUC) was 0.88 for the training dataset and 0.83 for the testing set.
- AUC: area under the curve
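As an illustration of how an AUC such as those reported above can be computed for a combined model, the following Python sketch averages the scores of several models and then evaluates the result with the rank-based definition of AUC. The averaging rule (`soft_vote`), the function names, and the toy labels and scores are assumptions; the source does not state how the RF, CART and gradient boosting outputs were combined.

```python
def soft_vote(*score_lists):
    """Average per-patient scores from several models (one assumed way to form an ensemble)."""
    return [sum(scores) / len(scores) for scores in zip(*score_lists)]


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical outcomes (1 = hospitalized) and hypothetical scores from two models.
labels = [0, 0, 1, 1]
model_a = [0.125, 0.5, 0.25, 0.875]
model_b = [0.125, 0.25, 0.5, 0.75]
ensemble = soft_vote(model_a, model_b)
auc = roc_auc(labels, ensemble)
```

Computing the same quantity separately on training and held-out data gives the pair of values that the text compares (0.88 versus 0.83).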
- FIG. 8 depicts the CART tree based on the ensemble model.
- NODE 2: the number of patients diagnosed with COVID-19.
- Patients with IL-6 levels between 2.34 pg/ml and 5.74 pg/ml who were also older than 74 had a very high likelihood of requiring hospitalization (NODE 8), and the same was true for any patients with IL-6 levels equal to or greater than 5.74 pg/ml (NODE 9).
- For patients with IL-6 levels between 2.34 pg/ml and 5.74 pg/ml who were younger than 74 years old, the CRP levels were relevant.
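The branching rules stated above can be written out as a small Python function. This is only a sketch of the rules as worded in this excerpt, not the fitted CART model of FIG. 8: the function name `triage_node`, the handling of boundary values (IL-6 of exactly 2.34 pg/ml, age of exactly 74), and the returned labels are assumptions, and the CRP threshold and the branch for IL-6 below 2.34 pg/ml are not given here, so they are left unresolved.

```python
def triage_node(il6_pg_ml, age_years):
    """Map IL-6 (pg/ml) and age (years) to the branch described in the text."""
    if il6_pg_ml >= 5.74:
        # Stated in the text: IL-6 equal to or greater than 5.74 pg/ml.
        return "NODE 9: very high likelihood of hospitalization"
    if il6_pg_ml >= 2.34:  # assumption: lower bound treated as inclusive
        if age_years > 74:
            return "NODE 8: very high likelihood of hospitalization"
        # Assumption: age of exactly 74 falls here; the CRP cut-off is not in this excerpt.
        return "CRP-dependent branch"
    return "not described in this excerpt"


# Hypothetical inputs exercising each branch.
examples = [triage_node(6.0, 40), triage_node(3.0, 80), triage_node(3.0, 60), triage_node(1.0, 80)]
```

Writing the rules this way makes the unspecified cases explicit: the excerpt gives no outcome for IL-6 below 2.34 pg/ml and no numeric CRP threshold.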
Landscapes
- Health & Medical Sciences (AREA)
- Engineering & Computer Science (AREA)
- Medical Informatics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Public Health (AREA)
- General Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Physics & Mathematics (AREA)
- Pathology (AREA)
- Data Mining & Analysis (AREA)
- Epidemiology (AREA)
- Biophysics (AREA)
- Primary Health Care (AREA)
- Artificial Intelligence (AREA)
- Surgery (AREA)
- Databases & Information Systems (AREA)
- Molecular Biology (AREA)
- Animal Behavior & Ethology (AREA)
- Heart & Thoracic Surgery (AREA)
- Veterinary Medicine (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Theoretical Computer Science (AREA)
- Evolutionary Computation (AREA)
- Software Systems (AREA)
- Bioinformatics & Computational Biology (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Evolutionary Biology (AREA)
- Biotechnology (AREA)
- Chemical & Material Sciences (AREA)
- Physiology (AREA)
- Signal Processing (AREA)
- Psychiatry (AREA)
- Mathematical Physics (AREA)
- Bioethics (AREA)
- Medicinal Chemistry (AREA)
- Urology & Nephrology (AREA)
- Business, Economics & Management (AREA)
- General Business, Economics & Management (AREA)
- Optics & Photonics (AREA)
Priority Applications (7)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| EP20906754.5A EP4082028A4 (en) | 2019-12-27 | 2020-12-24 | PREDICTING AND ADDRESSING SERIOUS ILLNESS IN INDIVIDUALS WITH SEPSIS |
| AU2020411504A AU2020411504B2 (en) | 2019-12-27 | 2020-12-24 | Predicting and addressing severe disease in individuals with sepsis |
| CA3163000A CA3163000A1 (en) | 2019-12-27 | 2020-12-24 | Predicting and addressing severe disease in individuals with sepsis |
| US17/757,984 US20230018537A1 (en) | 2019-12-27 | 2020-12-24 | Predicting and addressing severe disease in individuals with sepsis |
| JP2022539379A JP7754815B2 (en) | 2019-12-27 | 2020-12-24 | Predicting and managing critical illness in individuals with sepsis |
| IL294285A IL294285A (en) | 2019-12-27 | 2022-06-26 | Prediction and treatment of serious illness in people with sepsis |
| JP2025117616A JP2025163050A (en) | 2019-12-27 | 2025-07-11 | Predicting and managing critical illness in individuals with sepsis |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US201962954298P | 2019-12-27 | 2019-12-27 | |
| US62/954,298 | 2019-12-27 | | |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2021134027A1 (en) | 2021-07-01 |
Family
ID=76575149
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/US2020/067038 Ceased WO2021134027A1 (en) | 2019-12-27 | 2020-12-24 | Predicting and addressing severe disease in individuals with sepsis |
Country Status (7)
| Country | Link |
|---|---|
| US (1) | US20230018537A1 (en) |
| EP (1) | EP4082028A4 (en) |
| JP (2) | JP7754815B2 (en) |
| AU (1) | AU2020411504B2 (en) |
| CA (1) | CA3163000A1 (en) |
| IL (1) | IL294285A (en) |
| WO (1) | WO2021134027A1 (en) |
Cited By (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20210315511A1 (en) * | 2020-03-21 | 2021-10-14 | Tata Consultancy Services Limited | Discriminating features based sepsis prediction |
| CN114617899A (en) * | 2022-04-14 | 2022-06-14 | 苏州大学附属儿童医院 | Application of S-adenosylmethionine in preparing medicine for treating sepsis related encephalopathy |
| WO2023019093A3 (en) * | 2021-08-07 | 2023-03-23 | Venn Biosciences Corporation | Detection of peptide structures for diagnosing and treating sepsis and covid |
| WO2024018372A3 (en) * | 2022-07-20 | 2024-03-07 | Sri Sathya Sai Institute Of Higher Learning | A machine learning platform for predicting uropathogens and their resistance for prescribing suitable urinary infection therapy |
| WO2024216394A1 (en) * | 2023-04-20 | 2024-10-24 | Cardiai Technologies Ltd. | System and method for health analysys based on pharmacogenomics and/or microbiome |
| WO2024261218A1 (en) * | 2023-06-22 | 2024-12-26 | Grifols Worldwide Operations Limited | Biomarkers of acute-on-chronic liver failure (aclf) progression |
Families Citing this family (12)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20230251274A1 (en) * | 2020-06-25 | 2023-08-10 | Humanitas Mirasole S.P.A. | Ptx3 as prognostic marker in covid-19 |
| US12288603B2 (en) * | 2021-04-09 | 2025-04-29 | Endocanna Health, Inc. | Machine-learning based efficacy predictions based on genetic and biometric information |
| US20230162019A1 (en) * | 2021-11-23 | 2023-05-25 | International Business Machines Corporation | Topological signatures for disease characterization |
| US12587274B2 (en) | 2023-03-28 | 2026-03-24 | Quantum Generative Materials Llc | Satellite optimization management system based on natural language input and artificial intelligence |
| WO2024216182A1 (en) * | 2023-04-14 | 2024-10-17 | Prenosis, Inc. | Systems and methods of artificial intelligence-based sepsis prediction |
| WO2024259269A1 (en) * | 2023-06-16 | 2024-12-19 | Health Outlook Corp. | Systems and methods for using machine-learning to predict patient health characteristics |
| CN117012375B (en) * | 2023-10-07 | 2024-03-26 | 之江实验室 | Clinical decision support method and system based on patient topological feature similarity |
| US12368503B2 (en) | 2023-12-27 | 2025-07-22 | Quantum Generative Materials Llc | Intent-based satellite transmit management based on preexisting historical location and machine learning |
| US12603701B2 (en) | 2023-12-27 | 2026-04-14 | Quantum Generative Materials Llc | Distributed satellite constellation management and control system |
| US20250279200A1 (en) * | 2024-02-29 | 2025-09-04 | Mayaminer Company Ltd. | Method and System for Establishing Death in Hemodialysis Patients Prediction Model, and Method and Non-transitory Computer Readable Medium for Predicting Death in Hemodialysis Patients |
| US12450242B2 (en) | 2024-03-26 | 2025-10-21 | International Business Machines Corporation | Ranking cross-disorder features in multiway data interactions |
| CN119007990B (en) * | 2024-10-23 | 2025-01-24 | 山东第一医科大学附属省立医院(山东省立医院) | Sepsis prediction method and system based on machine learning |
Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2015031996A1 (en) * | 2013-09-05 | 2015-03-12 | University Health Network | Biomarkers for early determination of a critical or life threatening response to illness and/or treatment response |
| US20170009297A1 (en) * | 2014-02-11 | 2017-01-12 | The Secretary Of State For Defence | Apparatus, kits and methods for the prediction of onset of sepsis |
| US20170022568A1 (en) * | 2014-04-07 | 2017-01-26 | The University Court Of The University Of Edinburg | Molecular predictors of sepsis |
| KR20180058466A (en) * | 2016-11-24 | 2018-06-01 | 주식회사 셀바스에이아이 | Method and apparatus for machine-learning of a model predicting probability of outbreak of disease |
| KR20190131267A (en) * | 2018-05-16 | 2019-11-26 | 고려대학교 산학협력단 | Method and system for prediction of Coronary Artery Disease by using machine learning |
Family Cites Families (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| GB0426982D0 (en) * | 2004-12-09 | 2005-01-12 | Secr Defence | Early detection of sepsis |
| EP3114617A4 (en) * | 2014-03-05 | 2017-07-26 | Ayasdi Inc. | Systems and methods for capture of relationships within information |
| US20160070879A1 (en) * | 2014-09-09 | 2016-03-10 | Lockheed Martin Corporation | Method and apparatus for disease detection |
| WO2017027432A1 (en) * | 2015-08-07 | 2017-02-16 | Aptima, Inc. | Systems and methods to support medical therapy decisions |
| WO2018129413A1 (en) * | 2017-01-08 | 2018-07-12 | The Henry M. Jackson Foundation For The Advancement Of Military Medicine, Inc. | Systems and methods for using supervised learning to predict subject-specific bacteremia outcomes |
| WO2018140256A1 (en) * | 2017-01-17 | 2018-08-02 | Duke University | Gene expression signatures useful to predict or diagnose sepsis and methods of using the same |
| US11901080B1 (en) * | 2019-12-30 | 2024-02-13 | C/Hca, Inc. | Predictive modeling for user condition prediction |
2020
- 2020-12-24 AU AU2020411504A patent/AU2020411504B2/en active Active
- 2020-12-24 WO PCT/US2020/067038 patent/WO2021134027A1/en not_active Ceased
- 2020-12-24 CA CA3163000A patent/CA3163000A1/en active Pending
- 2020-12-24 US US17/757,984 patent/US20230018537A1/en not_active Abandoned
- 2020-12-24 EP EP20906754.5A patent/EP4082028A4/en active Pending
- 2020-12-24 JP JP2022539379A patent/JP7754815B2/en active Active
2022
- 2022-06-26 IL IL294285A patent/IL294285A/en unknown
2025
- 2025-07-11 JP JP2025117616A patent/JP2025163050A/en active Pending
Non-Patent Citations (1)
| Title |
|---|
| See also references of EP4082028A4 * |
Cited By (8)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20210315511A1 (en) * | 2020-03-21 | 2021-10-14 | Tata Consultancy Services Limited | Discriminating features based sepsis prediction |
| US11817217B2 (en) * | 2020-03-21 | 2023-11-14 | Tata Consultancy Services Limited | Discriminating features based sepsis prediction |
| WO2023019093A3 (en) * | 2021-08-07 | 2023-03-23 | Venn Biosciences Corporation | Detection of peptide structures for diagnosing and treating sepsis and covid |
| CN114617899A (en) * | 2022-04-14 | 2022-06-14 | 苏州大学附属儿童医院 | Application of S-adenosylmethionine in preparing medicine for treating sepsis related encephalopathy |
| CN114617899B (en) * | 2022-04-14 | 2023-10-20 | 苏州大学附属儿童医院 | Application of S-adenosylmethionine in preparation of medicines for treating sepsis-related encephalopathy |
| WO2024018372A3 (en) * | 2022-07-20 | 2024-03-07 | Sri Sathya Sai Institute Of Higher Learning | A machine learning platform for predicting uropathogens and their resistance for prescribing suitable urinary infection therapy |
| WO2024216394A1 (en) * | 2023-04-20 | 2024-10-24 | Cardiai Technologies Ltd. | System and method for health analysys based on pharmacogenomics and/or microbiome |
| WO2024261218A1 (en) * | 2023-06-22 | 2024-12-26 | Grifols Worldwide Operations Limited | Biomarkers of acute-on-chronic liver failure (aclf) progression |
Also Published As
| Publication number | Publication date |
|---|---|
| JP2025163050A (en) | 2025-10-28 |
| EP4082028A4 (en) | 2024-01-24 |
| AU2020411504B2 (en) | 2024-04-18 |
| CA3163000A1 (en) | 2021-07-01 |
| EP4082028A1 (en) | 2022-11-02 |
| JP7754815B2 (en) | 2025-10-15 |
| AU2020411504A1 (en) | 2022-07-14 |
| JP2023511658A (en) | 2023-03-22 |
| US20230018537A1 (en) | 2023-01-19 |
| IL294285A (en) | 2022-08-01 |
Similar Documents
| Publication | Title | Publication Date |
|---|---|---|
| AU2020411504B2 (en) | Predicting and addressing severe disease in individuals with sepsis | |
| US20240401107A1 (en) | Methods and systems for processing a nucleic acid sample | |
| US12597488B2 (en) | Use of machine learning models for prediction of clinical outcomes | |
| JP7097370B2 (en) | Systems and methods for using supervised learning to predict subject-specific bloodstream transcriptions | |
| US20220251647A1 (en) | Gene expression signatures useful to predict or diagnose sepsis and methods of using the same | |
| US9238841B2 (en) | Multi-biomarker-based outcome risk stratification model for pediatric septic shock | |
| US20190355473A1 (en) | Systems and methods for using supervised learning to predict subject-specific pneumonia outcomes | |
| CN102803951A (en) | Determination of coronary artery disease risk | |
| US20230019900A1 (en) | Prediction of venous thromboembolism utilizing machine learning models | |
| US20240309469A1 (en) | Methods to detect and treat a fungal infection | |
| WO2025043226A1 (en) | Sepsis mortality prediction model | |
| WO2025199410A1 (en) | Multi-omic patient stratification in inflammatory bowel disease treatment | |
| CN115698712A (en) | Method for predicting disease progression in rheumatoid arthritis | |
| Baghela | Identifying predictive gene expression signatures of sepsis severity | |
| Dong et al. | Blood Transcriptomic and Inflammatory Protein Biomarkers Associated with Imminent Pulmonary Exacerbation Risk in Cystic Fibrosis | |
| Chen et al. | Plasma proteomic profiles predict chronic obstructive pulmonary disease up to 16 years before onset: a multi-national, machine learning-guided biomarker discovery study | |
| Mikhaylov | Integrating Biologic and Clinical Data towards Resolving Heterogeneity in Childhood Inflammatory Diseases |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 20906754; Country of ref document: EP; Kind code of ref document: A1 |
| | ENP | Entry into the national phase | Ref document number: 3163000; Country of ref document: CA |
| | ENP | Entry into the national phase | Ref document number: 2022539379; Country of ref document: JP; Kind code of ref document: A |
| | ENP | Entry into the national phase | Ref document number: 2020411504; Country of ref document: AU; Date of ref document: 20201224; Kind code of ref document: A |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | ENP | Entry into the national phase | Ref document number: 2020906754; Country of ref document: EP; Effective date: 20220727 |