WO2025010357A2 - Systems and methods for predicting variant pathogenicity based on amino acid signal-to-noise ratios - Google Patents

Systems and methods for predicting variant pathogenicity based on amino acid signal-to-noise ratios

Info

Publication number
WO2025010357A2
Authority
WO
WIPO (PCT)
Prior art keywords
gene
amino acid
variants
disease
genes
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
PCT/US2024/036776
Other languages
English (en)
Other versions
WO2025010357A3 (fr
Inventor
Andrew P. LANDSTROM
Leonie M. KURZLECHNER
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Duke University
Original Assignee
Duke University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Duke University
Publication of WO2025010357A2
Publication of WO2025010357A3

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/30 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for calculating health indices; for individual health risk assessment
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/20 Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G16B40/20 Supervised data analysis
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H70/00 ICT specially adapted for the handling or processing of medical references
    • G16H70/60 ICT specially adapted for the handling or processing of medical references relating to pathologies

Definitions

  • the present disclosure relates to systems and methods that enable accurate predictions of variant pathogenicity for genes, often rare genes, in helping to determine disease susceptibility for a patient in view of the gene(s), and more particularly relates to a web-based tool, and related methods, that provide these accurate predictions relying upon, for example, amino acid signal-to-noise ratios.
  • Conditions presenting with SCD include arrhythmogenic cardiomyopathy (ACM), catecholaminergic polymorphic ventricular tachycardia (CPVT), long QT syndrome (LQTS), hypertrophic cardiomyopathy (HCM) and dilated cardiomyopathy (DCM). These have pediatric diagnostic yield of 60-80%. Desmosomal genes are a major cause of ACM (PKP2, DSC2, DSG2, DSP), where fibrofatty tissue replaces cardiac myocytes, increasing risk of fatal arrhythmias. Defects in ion channels (RYR2, KCNQ1, KCNH2, SCN5A) are primarily responsible for channelopathies, including CPVT and LQTS.
  • ACM arrhythmogenic cardiomyopathy
  • CPVT catecholaminergic polymorphic ventricular tachycardia
  • LQTS long QT syndrome
  • HCM hypertrophic cardiomyopathy
  • DCM dilated cardiomyopathy
  • CPVT classically results in syncope or SCD secondary to bidirectional ventricular tachycardia, particularly in heightened adrenergic states.
  • the cardiac ryanodine receptor 2 (RyR2) implicated in CPVT maintains Ca2+ release channel complexes in the sarcoplasmic reticulum of cardiac myocytes.
  • LQTS is marked by delayed cardiac repolarization, with the implicated ion channels crucial for cardiac action potential regulation and in which defects contribute to life-threatening torsades de pointes.
  • MYH7, MYBPC3, and TTN are the major sarcomeric genes that cause cardiomyopathy.
  • HCM is defined by unexplained thickening of the left ventricular wall without dilation.
  • Truncating variants in TTN (TTNtvs) resulting in premature translation termination cause familial cardiomyopathies (CM), including nonischemic DCM and left ventricular noncompaction cardiomyopathy (LVNC).
  • CM familial cardiomyopathies
  • LVNC left ventricular noncompaction cardiomyopathy
  • the present disclosure is directed to systems, methods, and techniques for identifying pathogenic hotspots in SCD-associated genes using amino acid-level signal-to-noise analysis, as well as a web-based precision medicine tool, referred to as DiscoVari or the DiscoVari tool herein, to improve variant evaluation and make it more accessible.
  • the disclosed technology may also leverage artificial intelligence (AI) models, algorithms, and techniques to predict disease risk in various genomic variants on a population-based level. More particularly, the AI model(s) described herein can be trained and used to predict disease association and also disease penetrance, which includes likelihood that a particular individual will manifest a particular associated disease phenotype.
  • variants in the primary genes implicated in ACM, CPVT, LQTS, HCM, and familial CM are considered reportable, medically actionable secondary findings.
  • a population-based approach to determining the likelihood that a given protein region is associated with disease development, so-called amino acid-level signal-to-noise (S:N), can predict disease risk in incidental variants.
  • the present disclosure provides for a bioinformatics tool (e.g., web-based, mobile application, software) to refine variant evaluation in cardiomyopathy-associated genes, channelopathy-associated genes, and/or other genes.
  • the disclosures contained herein use amino acid-level S:N analysis to identify disease-associated genetic hotspots.
  • the minor allele frequency (MAF) of putatively pathogenic variants can be derived from cohort-based cardiomyopathy and channelopathy studies.
  • disease-associated MAFs were normalized to rare variants in an ostensibly healthy population (gnomAD) to calculate amino acid-level S:N.
  • Amino acids with S:N above the gene-specific threshold were defined as hotspots.
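  • As a rough illustration of the two statements above, the following sketch computes a per-residue S:N as the ratio of a disease-cohort MAF to a gnomAD MAF and flags residues above a gene-specific threshold. The function names, the pseudocount used to avoid division by zero, the threshold, and the example numbers are assumptions made for illustration; the text here does not specify the exact normalization.

```python
from typing import Dict, List


def signal_to_noise(disease_maf: Dict[int, float],
                    population_maf: Dict[int, float],
                    pseudocount: float = 1e-6) -> Dict[int, float]:
    """Return S:N per amino acid position (hypothetical helper).

    Both inputs map amino acid position -> summed minor allele frequency.
    The pseudocount is an assumption, added only to avoid dividing by zero
    at positions with no population variants.
    """
    positions = sorted(set(disease_maf) | set(population_maf))
    return {
        pos: disease_maf.get(pos, 0.0)
        / (population_maf.get(pos, 0.0) + pseudocount)
        for pos in positions
    }


def hotspots(sn: Dict[int, float], threshold: float) -> List[int]:
    """Positions whose S:N is above the gene-specific threshold."""
    return [pos for pos, value in sn.items() if value > threshold]


# Made-up frequencies, for illustration only.
disease = {10: 4e-4, 11: 1e-5, 12: 2e-4}
population = {10: 1e-5, 11: 5e-5, 12: 2e-4}

sn = signal_to_noise(disease, population)
print(hotspots(sn, threshold=5.0))
```

  With these invented inputs only position 10 has a disease-cohort frequency well above its population frequency, so it is the only residue flagged.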
  • the disclosed DiscoVari tool can be used to identify pathogenic variants using variants from databases that provide information about genetic variants and their relationship to human health, like ClinVar, and individuals clinically evaluated with cardiac genetic testing.
  • the DiscoVari tool can be used as an internet-based tool for S:N-based variant hotspots.
  • the DiscoVari tool can reliably be used to identify disease-susceptible amino acid residues to evaluate variants by searching amino acid-specific S:N ratios.
  • One embodiment of a method of analyzing a gene to determine susceptibility to a particular disease includes receiving a gene selection from a user, receiving an amino acid position selection from that user, and outputting information.
  • the gene selection is one gene of a plurality of genes stored in a database, with the plurality of genes being linked to one or more associated diseases.
  • the outputted information includes at least one of: a signal-to-noise ratio of the selected gene-amino acid combination, a relative risk level of the one or more associated diseases, an indication of whether the signal-to-noise ratio corresponds to a statistical mutation hotspot, or a disease susceptibility determination.
  • the outputted information is based on the received gene selection and amino acid position selection.
  • the outputted information can further include information about a functional domain for the selected gene-amino acid combination.
  • the outputted information can include data generated from population predictions for one or more of the plurality of genes stored in the database. The population predictions can be determined, for example, based on providing the selected gene-amino acid combination as input to an artificial intelligence (AI) model.
  • the outputted information can include data generated by an AI model. The AI model can have been trained to generate the data based on receiving, as model input, the selected gene-amino acid combination.
  • An embodiment of a gene analysis tool includes a database of a plurality of genes and a processor.
  • the database includes information for each gene of the plurality of genes that includes: one or more associated diseases for the gene, a threshold signal-to-noise value for each disease of the one or more associated diseases, and a signal-to-noise value for each amino acid position of the gene.
  • the processor is configured to receive user input that includes a gene selection from the database of a plurality of genes and an amino acid position selection, as well as output information about disease susceptibility in view of the gene selection and the amino acid position selection based on the information in the database.
  • the information for each gene of the plurality of genes can include a signal-to-noise ratio of the selected gene-amino acid combination and/or a signal-to-noise threshold value of the selected gene and the one or more associated diseases for the gene.
  • the information for each gene of the plurality of genes can include a functional domain for the selected gene-amino acid combination.
  • the processor can be further configured to compare the signal-to-noise ratio of the selected gene-amino acid combination and the signal-to-noise threshold value of the selected gene and the one or more associated diseases for the gene, as well as determine, based on the comparison, whether the signal-to-noise threshold value may be exceeded by the signal-to-noise ratio of the selected gene-amino acid combination. Still further, the processor can be configured to output the determination about the signal-to-noise ratio as compared to the signal-to-noise threshold value.
  • the information for each gene of the plurality of genes can include a functional domain for the selected gene-amino acid combination, and the processor can be further configured to identify the functional domain for the selected gene-amino acid combination and output the identified functional domain.
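  • The database-and-processor arrangement described above can be sketched as a small record type plus an evaluation function that compares the stored S:N with each disease threshold and reports the functional domain. This is a minimal sketch under stated assumptions: `GeneRecord`, `find_domain`, `evaluate`, the gene name, the domain, and all numeric values are hypothetical placeholders, not the tool's actual schema or data.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class GeneRecord:
    """Hypothetical per-gene record mirroring the fields listed above."""
    diseases: List[str]
    thresholds: Dict[str, float]        # disease -> threshold S:N value
    sn_by_position: Dict[int, float]    # amino acid position -> S:N value
    # Functional domains as (start, end, name), positions inclusive.
    domains: List[Tuple[int, int, str]] = field(default_factory=list)


def find_domain(record: GeneRecord, position: int) -> Optional[str]:
    """Return the name of the domain containing the position, if any."""
    for start, end, name in record.domains:
        if start <= position <= end:
            return name
    return None


def evaluate(db: Dict[str, GeneRecord], gene: str, position: int) -> dict:
    """Compare the stored S:N with each disease threshold for the gene."""
    record = db[gene]
    sn = record.sn_by_position[position]
    return {
        "gene": gene,
        "position": position,
        "signal_to_noise": sn,
        "exceeds_threshold": {
            disease: sn > record.thresholds[disease]
            for disease in record.diseases
        },
        "functional_domain": find_domain(record, position),
    }


# Placeholder data: the gene, disease, domain, and values are invented.
db = {
    "GENE1": GeneRecord(
        diseases=["disease A"],
        thresholds={"disease A": 5.0},
        sn_by_position={100: 12.5, 101: 0.8},
        domains=[(90, 150, "example domain")],
    )
}

result = evaluate(db, "GENE1", 100)
print(result)
```

  Keeping the threshold per disease rather than per gene follows the wording above, which lists a threshold signal-to-noise value for each associated disease.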
  • the tool can be web-based.
  • the processor can be further configured to present a graphical user interface (GUI) in a display of a computing device of a user.
  • GUI graphical user interface
  • the GUI can include, for example, a first graphical element that can correspond to the gene selection and a second graphical element that can correspond to the amino acid position selection.
  • at least one of the first and second graphical elements can include dropdown menus, and the user input can include selection of an option in a respective dropdown menu.
  • the information about disease susceptibility can be presented in a graph.
  • An embodiment of a computer-readable medium storing instructions is also provided.
  • When the instructions stored on that computer-readable medium are executed by a processor, they cause the processor to receive a gene selection from a user, receive an amino acid position selection from the user, and output information.
  • the gene selection is one gene of a plurality of genes stored in a database, with the plurality of genes being linked to one or more associated diseases.
  • the outputted information includes at least one of: a signal-to-noise ratio of the selected gene-amino acid combination, a relative risk level of the one or more associated diseases, an indication of whether the signal-to-noise ratio corresponds to a statistical mutation hotspot, or a disease susceptibility determination.
  • the outputted information is based on the received gene selection and amino acid position selection.
  • the outputted information can include data generated by an AI model.
  • the AI model can have been trained to generate the data based on receiving, as model input, the selected gene-amino acid combination.
  • the processor can be further configured to present a graphical user interface (GUI) in a display of a computing device of the user.
  • the GUI can include, for example, a first graphical element that can correspond to the gene selection and a second graphical element that can correspond to the amino acid position selection.
  • the information for each gene of the plurality of genes further can include a functional domain for the selected gene-amino acid combination.
  • the processor can be further configured to identify the functional domain for the selected gene-amino acid combination and output the identified functional domain in a GUI at a computing device of the user.
  • One embodiment of a method of analyzing a gene to determine susceptibility to a particular disease includes receiving a gene selection from a user, receiving an amino acid position selection from the user that made the gene selection, and outputting information in view of the same.
  • the gene selection is one gene of a plurality of genes stored in a database, with the plurality of genes being linked to one or more associated diseases.
  • the information that is output includes at least one of: a signal-to-noise ratio of the selected gene-amino acid combination, a relative risk level of the one or more associated diseases, an indication of whether the signal-to-noise ratio corresponds to a statistical mutation hotspot, and/or a disease susceptibility determination.
  • the outputted information can include information about a functional domain for the selected gene-amino acid combination.
  • the outputted information can include data generated from population predictions for one or more of the plurality of genes stored in the database.
  • the population predictions can be determined based on providing the selected gene-amino acid combination as input to an artificial intelligence (AI) model.
  • the outputted information can include data generated by an AI model.
  • the AI model can have been trained to generate the data based on receiving, as model input, the selected gene-amino acid combination.
  • One embodiment of a gene analysis tool includes a database of a plurality of genes and a processor.
  • the database includes information for each gene of the plurality of genes. Such information includes: one or more associated diseases for the gene; a threshold signal-to-noise value for each disease of the one or more associated diseases; and a signal-to-noise value for each amino acid position of the gene.
  • the processor is configured to receive user input that includes a gene selection from the database of a plurality of genes and an amino acid position selection.
  • the processor is also configured to output information about disease susceptibility in view of the gene selection and the amino acid position selection based on the information in the database.
  • the information for each gene of the plurality of genes can include a signal-to-noise ratio of the selected gene-amino acid combination, as well as a signal-to-noise threshold value of the selected gene and the one or more associated diseases for the gene.
  • the information for each gene of the plurality of genes can additionally, or alternatively, include a functional domain for the selected gene-amino acid combination.
  • the processor can be configured to perform other functions as well.
  • the processor can be configured to compare the signal-to-noise ratio of the selected gene-amino acid combination and the signal-to-noise threshold value of the selected gene and the one or more associated diseases for the gene, as well as determine, based on the comparison, whether the signal-to-noise threshold value is exceeded by the signal-to-noise ratio of the selected gene-amino acid combination.
  • the processor can be configured to output the determination about the signal-to-noise ratio as compared to the signal-to-noise threshold value.
  • the information about disease susceptibility can be presented in a graph.
  • the information for each gene of the plurality of genes can include a functional domain for the selected gene-amino acid combination.
  • the processor can be further configured to identify the functional domain for the selected gene-amino acid combination, and output the identified functional domain.
  • the tool can be web-based.
  • the processor can be further configured to present a graphical user interface (GUI) in a display of a computing device of a user.
  • the GUI can include, for example, a first graphical element that corresponds to the gene selection and a second graphical element that corresponds to the amino acid position selection.
  • at least one of the first and second graphical elements can include dropdown menus and the user input can include selection of an option in a respective dropdown menu.
  • One embodiment of a computer-readable medium stores instructions that, when executed by a processor, cause the processor to perform a variety of functions.
  • the gene selection is one gene of a plurality of genes stored in a database, with the plurality of genes being linked to one or more associated diseases.
  • the output information includes at least one of: a signal-to-noise ratio of the selected gene-amino acid combination; a relative risk level of the one or more associated diseases; an indication of whether the signal-to-noise ratio corresponds to a statistical mutation hotspot; and/or a disease susceptibility determination.
  • the outputted information can include data generated by an AI model.
  • the AI model can have been trained to generate the data based on receiving, as model input, the selected gene-amino acid combination.
  • the processor can be further configured to present a graphical user interface (GUI) in a display of a computing device of the user.
  • the GUI can include a first graphical element that corresponds to the gene selection and a second graphical element that corresponds to the amino acid position selection.
  • the information for each gene of the plurality of genes can include a functional domain for the selected gene-amino acid combination.
  • the processor can be further configured to identify the functional domain for the selected gene-amino acid combination and output the identified functional domain in a GUI at a computing device of the user.
  • the disclosed technology can provide one or more of the following advantages.
  • the disclosed technology can provide for accurately predicting and identifying pathogenic variants for various diseases at a population level.
  • the disclosed techniques further provide that S:N can be useful in identifying disease-causative variants and may also distinguish variants correlated with sub-clinical evidence of cardiac disease. Improved identification of disease-causing variants can allow for preemptive evaluation and treatment of patients and their family members at increased risk for developing cardiac disease.
  • the disclosed techniques’ usefulness in downgrading variants can also reduce patient anxiety and healthcare expenditure in patients without disease found to have variants in CM-associated genes. In a research setting, this tool may allow for more accurate prediction of whether currently unidentified or uncertain genetic variants will lead to cardiac disease.
  • the disclosed technology can use a complex collection of algorithms, AI, and/or machine learning techniques to analyze data related to different genes for a population of users and to inform the relevant users about potential disease risks derived from genomic variances.
  • This complex collection of algorithms, AI, and/or machine learning techniques can provide an unconventional solution to the problem of trying to accurately detect various disease risks for populations of users (and/or individual users).
  • This unconventional solution can be rooted in technology and provides information that was not available in conventional systems.
  • This unconventional solution also represents an improvement in the subject technical field otherwise unrealized by conventional systems.
  • the disclosed technology may predict, with high accuracy and by using minimal processing time and compute resources, disease risks at a population level.
  • the disclosed technology may also display relevant information and data using a graphical user interface (GUI) on a display of computing devices of the relevant users in a unique and easy-to-understand format.
  • information about predicted disease risks for at least the following reasons: (i) the significant processing power required to predict the disease risks at a population level in real time or near real time; (ii) the considerable data storage requirements for maintaining information collected and determined by the disclosed technology; (iii) a large enough pool of parameter data to provide accurate thresholds for the disclosed algorithms, AI, and/or machine learning techniques; and/or (iv) algorithms, AI, and/or machine learning techniques that allow for the thresholds to be self-updated in light of additional data that can be added to the pool of relevant parameter data.
  • the GUI(s) can display results of the execution of these complex algorithms, AI, and/or machine learning techniques in a manner that can be easily understandable by a human user.
  • an exemplary algorithm from this complex collection of algorithms can require: receiving genomic data from a variety of computing sources; selecting some data provided by the computing sources; ignoring some of the data that was provided by the computing sources; performing multiple calculations on a selected subset of the data; combining the data from these multiple calculations; and then outputting that data within a short amount of time (e.g., preferably less than a minute), all for multiple relevant users.
  • the disclosed technology may require analyzing millions of data points to accurately predict and quantify risk of different types of disease on a population level, generating and outputting information based on the analysis and predictions, and then repeating the above operations over a relatively short time period (e.g., every day, every half day, every hour, every 10 minutes, every 5 minutes, every 1 minute, etc.) and for many different users and/or input data (e.g., genes, diseases).
  • FIG. 1A is a schematic illustration of derivation of disease-associated variants
  • FIG. 1B is a diagram of a methodology for population variant inclusion
  • FIG. 1C depicts a schematic illustration of an S:N modeling design
  • FIG. 2A is an S:N validation workflow using variants in ClinVar
  • FIG. 2B is a bar graph of percent of variants located within S:N hotspots between ClinVar likely benign or benign (LB/B) and likely pathogenic and pathogenic (LP/P) variants;
  • FIG. 2C is a bar graph of the S:N values for LB/B and LP/P variants in hotspots
  • FIG. 2D is a workflow illustrating CM and channelopathy variants with a classification change in ClinVar from initial variant of uncertain/unknown significance (VUS) to LP/P or LB/B (or vice versa);
  • VUS uncertain/unknown significance
  • FIG. 2E is a bar graph depicting percent of variants located within S:N hotspots of variants that were reclassified from VUS to LB/B, from LB/B or LP/P to VUS, or from VUS to LP/P;
  • FIG. 2F is a bar graph illustrating S:N values for reclassified variants;
  • FIG. 3A is a bar graph of percent of variants in a clinical validation cohort located in S:N hotspots stratified by variants classified as LB/B, VUS, and LP/P at time of genetic testing;
  • FIG. 3B is a bar graph of S:N values for VUSs and LP/P variants in the clinical cohort
  • FIG. 3C is a bar graph of percent of VUSs located in S:N hotspots stratified by those considered disease-causative, of unknown significance, or suspected benign;
  • FIG. 4A is an example architecture of an application for performing the disclosed techniques
  • FIG. 4B is one embodiment of a graphical user interface (GUI) based on using the application described in reference to FIG. 4A;
  • FIG. 4C is the GUI of FIG. 4B illustrating a gene selection being made
  • FIG. 4D is the GUI of FIG. 4C illustrating both a gene selection and an amino acid position having been selected
  • FIG. 4E is the GUI of FIG. 4D illustrating a GUI output after the gene selection and the amino acid position of FIG. 4D were searched;
  • FIG. 4F is the GUI of FIG. 4C illustrating the same gene selection as FIG. 4D but with a different amino acid position having been selected;
  • FIG. 4G is the GUI of FIG. 4F illustrating a GUI output after the gene selection and the amino acid position of FIG. 4F were searched;
  • FIG. 5 is a flowchart of an example process for incorporating S:N analysis into ACMG criteria when evaluating incidental variants
  • FIG. 6A illustrates a workflow for derivation of UK Biobank (UKBB) variants for prediction of variants that can cause disease at a population level;
  • UKBB UK Biobank
  • FIG. 6B is a bar graph of the variant prevalence in the UKBB, stratified by DiscoVari genes, HCM-associated variants, and TTN variants;
  • FIG. 6C is a bar graph of a proportion of cohort variants where classification between LB/B, VUS, and/or LP/P, as described by ACMG criteria, hinges upon whether the variant does, or does not, satisfy ACMG PM1 Criteria, the PM1 Criteria asking whether the variant does, or does not, lie in a genetic “hotspot,” which increases chances that the variant will cause disease, stratified by DiscoVari genes, HCM-associated variants, and TTN variants;
  • FIG. 7A is a workflow of missense variants identified in HCM-associated genes among UKBB participants and their classification change to DiscoVari classification;
  • FIG. 7B is a bar graph of prevalence of HCM by ICD-10 code in UKBB participants included in an example cohort
  • FIG. 7C is a bar graph of a proportion of variants identified in S:N hotspots stratified by variants in participants with no evidence of HCM or with HCM;
  • FIG. 7D is a bar graph of signal-to-noise values for variants in HCM negative participants compared to variants in participants with HCM;
  • FIG. 7E is a bar graph of a proportion of variants hosted in individuals with HCM where ACMG PM1 criteria was met or not by InterVar and using DiscoVari S:N analysis;
  • FIG. 7F is a bar graph of LP/P variants in HCM negative individuals and those with HCM using InterVar and DiscoVari S:N for applying PM1 criteria;
  • FIG. 8A is a workflow of identification of HCM negative individuals and phenotypic analysis for sub-clinical evidence of disease among UKBB participants;
  • FIG. 8B is a bar graph of a proportion of missense HCM-associated variants hosted in individuals with evidence of an HCM-associated phenotype, including atrial fibrillation/flutter, cardiomyopathy, cardiac murmur, and chest pain, stratified by variants outside S:N hotspots or in S:N hotspots;
  • FIG. 8C is a bar graph of a proportion of variants hosted in HCM-negative individuals with Afib/flutter in downgraded or upgraded variants with DiscoVari incorporated for classification;
  • FIG. 8D is a bar graph of a proportion of variants hosted in HCM-negative individuals with chest pain in downgraded or upgraded variants with DiscoVari incorporated for classification;
  • FIG. 9A is a workflow of identification of CM negative individuals and phenotypic analysis for sub-clinical evidence of disease;
  • FIG. 9B is a bar graph of a proportion of variants identified in S:N hotspots stratified by variants in participants with no evidence of CM or with CM;
  • FIG. 9C is a bar graph of signal-to-noise values for variants localizing to hotspots in CM negative participants compared to variants in participants with CM;
  • FIG. 9D is a bar graph of a proportion of variants hosted in CM negative individuals with dizziness and a murmur outside S:N hotspots compared to in S:N hotspots;
  • FIG. 9E is a bar graph of a proportion of variants hosted in individuals with syncope in variants with no classification change versus upgraded with DiscoVari incorporated for classification;
  • FIG. 10 illustrates a conceptual diagram of a system for predicting risk of disease on a population-level using the disclosed techniques and artificial intelligence (AI) techniques;
  • FIG. 11 is a schematic diagram that shows an example of a computing device and a mobile computing device for use in conjunction with the present disclosures.
  • ACM arrhythmogenic cardiomyopathy
  • ACMG American College of Medical Genetics and Genomics
  • B benign
  • CM cardiomyopathy
  • CPVT catecholaminergic polymorphic ventricular tachycardia
  • DCM dilated cardiomyopathy
  • ES exome sequencing
  • gnomAD Genome Aggregation Database
  • GS genome sequencing
  • HCM hypertrophic cardiomyopathy
  • LB likely benign
  • LP likely pathogenic
  • LQTS long QT syndrome
  • MAF minor allele frequency
  • P pathogenic
  • SCD sudden cardiac death
  • S:N signal-to-noise
  • TTN titin
  • TTNtvs truncating variants in TTN
  • VUS variant of uncertain/unknown significance
  • Articles “a” and “an” are used herein to refer to one or to more than one (i.e., at least one) of the grammatical object of the article.
  • an element means at least one element and can include more than one element.
  • “About” is used to provide flexibility to a numerical range endpoint by providing that a given value may be “slightly above” or “slightly below” the endpoint without affecting the desired result.
  • the terms “subject” and “patient” are used interchangeably herein and refer to both human and nonhuman animals.
  • the subject comprises a human who is undergoing a medical procedure using a system or method as prescribed herein.
  • the disclosed techniques provide for improved specificity in putatively pathogenic variant identification by downgrading most LP/P variants in HCM negative individuals to VUS.
  • The importance of this finding is highlighted by the markedly high general population prevalence of 1:149 for LP/P variants in HCM-associated genes, despite a notably low disease penetrance in the range of about 1.3% to about 2.1%.
  • Similar work in dilated CM also found markedly low penetrance in the UK Biobank (UKBB), which, as an illustrative example, is a large-scale biomedical database and research resource.
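  • As a worked illustration of the prevalence and penetrance figures above, the following sketch estimates how many carriers of an LP/P variant in an HCM-associated gene would be expected to develop disease. The population size of 100,000 is a hypothetical round number chosen only for illustration; the prevalence and penetrance values are those quoted above.

```python
# Worked arithmetic only; POPULATION is a hypothetical round number.
POPULATION = 100_000
CARRIER_PREVALENCE = 1 / 149        # LP/P variant in an HCM-associated gene
PENETRANCE_RANGE = (0.013, 0.021)   # about 1.3% to about 2.1%

# Expected number of carriers in the hypothetical population.
carriers = POPULATION * CARRIER_PREVALENCE

# Expected number of those carriers who develop disease, at each end of the range.
affected_low = carriers * PENETRANCE_RANGE[0]
affected_high = carriers * PENETRANCE_RANGE[1]

print(f"carriers per {POPULATION:,}: {carriers:.0f}")
print(f"carriers expected to develop HCM: {affected_low:.0f} to {affected_high:.0f}")
```

  • Under these assumptions, the large majority of carriers would not be expected to develop disease, which is why improved specificity in variant classification can matter.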
  • variant classification changes in individuals with an HCM diagnosis can be correlated with ClinVar and in silico predictions.
  • The DiscoVari tool enables a more accurate prediction of how likely a gene(s), often a rare gene(s), is to result in a patient developing a particular disease. While in the past the detection of such rare genes often led to an over-inclusive determination that a particular disease was likely to occur in view of the detection of that gene, the disclosures herein provide a much more refined approach, reducing false determinations that a disease is likely to occur in view of a particular gene by at least one-third (1/3).
  • A platform provides a user the ability to input a gene and an amino acid position of interest, and the user then obtains a signal-to-noise (S:N) value for the given amino acid residue.
  • a user can select a gene from a given list of genes, such as provided by a dropdown menu.
  • The user can also select the amino acid position, which can be any whole number (e.g., 1, 2, 3 ... 135, 136, 137 ... etc.).
  • The tool can search the database to identify the S:N value for that gene’s amino acid position in conjunction with a disease(s) for which the gene-disease pair has been validated.
  • the result can be compared to an S:N threshold value for that gene-disease pair.
  • the tool can flag when the S:N ratio has or has not exceeded the S:N threshold for that gene-disease pair, among other information. Refer to FIGs. 10B and 10C for further discussion.
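  • The lookup described above can be sketched as follows. This is a minimal illustration, not the tool's actual implementation: the in-memory dictionaries, the `SNResult` type, and the `lookup_sn` function are hypothetical names, and the stored values are those given for the KCNH2 example described in reference to FIGs. 4D-4G.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SNResult:
    gene: str
    position: int
    disease: str
    sn_ratio: float
    threshold: float
    above_threshold: bool


# Hypothetical in-memory stand-ins for the tool's database.
# gene -> (validated disease, gene-specific S:N threshold)
GENE_DISEASE = {"KCNH2": ("long QT syndrome", 1.56)}
# (gene, amino acid position) -> S:N ratio
SN_BY_POSITION = {("KCNH2", 200): 0.00, ("KCNH2", 600): 73.10}


def lookup_sn(gene: str, position: int) -> Optional[SNResult]:
    """Return the S:N result for a gene and position, or None if nothing is stored."""
    if gene not in GENE_DISEASE or (gene, position) not in SN_BY_POSITION:
        return None
    disease, threshold = GENE_DISEASE[gene]
    sn_ratio = SN_BY_POSITION[(gene, position)]
    # The flag is set only when the S:N ratio exceeds the gene-disease threshold.
    return SNResult(gene, position, disease, sn_ratio, threshold, sn_ratio > threshold)
```

  • For example, `lookup_sn("KCNH2", 600)` would return a result flagged as above the threshold, while `lookup_sn("KCNH2", 200)` would return a result flagged as not above it.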
  • FIGs. 1A, 1B, and 1C depict an illustrative summary of cohorts and methodology for performing the disclosed techniques.
  • FIG. 1A is a schematic of derivation of disease-associated variants.
  • TTN truncating variants were included in a CM variant (e.g., 3722 Probands TTN).
  • ACM, CPVT, LQTS, and HCM variants were also included (e.g., 1421 Probands PKP2, DSC2, DSG2, DSP; 155 Probands RYR2; 4356 Probands KCNQ1, KCNH2, SCN5A; and 5520 Probands MYH7, MYBPC3, TNNT2, TNNI3, ACTC1, TPM1, MYL2, MYL3, respectively).
  • FIG. 1B is a diagram of methodology for gnomAD variant inclusion. Variants with an MAF greater than the highest pathologic MAF for that gene in the disease cohort can be excluded.
  • Pathogenic variant MAF approximately ranged from about 0.15% (ACTC1) to about 40.3% (RYR2).
  • The control cohort consisted of rare genetic variants derived from gnomAD with genotyping of 141,456 individuals, as shown by FIG. 1B.
  • Aggregate gnomAD MAF approximately ranged from about 0.87% (TTNtvs) to about 5.38% (ACM-associated genes), highlighting intragenic heterogeneity in MAF.
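  • The gnomAD inclusion rule described in reference to FIG. 1B can be sketched as follows. The `Variant` type and function names are hypothetical, and the handling of genes with no disease-cohort variants (they are dropped) is an assumption of this sketch rather than something stated above.

```python
from typing import Dict, List, NamedTuple


class Variant(NamedTuple):
    gene: str
    position: int  # amino acid position
    maf: float     # minor allele frequency as a fraction (0.001 == 0.1%)


def max_pathogenic_maf(case_variants: List[Variant]) -> Dict[str, float]:
    """Highest disease-cohort MAF observed for each gene."""
    ceiling: Dict[str, float] = {}
    for v in case_variants:
        ceiling[v.gene] = max(ceiling.get(v.gene, 0.0), v.maf)
    return ceiling


def filter_control_variants(
    control_variants: List[Variant], case_variants: List[Variant]
) -> List[Variant]:
    """Exclude control variants whose MAF is greater than the gene's highest pathologic MAF."""
    ceiling = max_pathogenic_maf(case_variants)
    # Assumption: genes absent from the disease cohort have no ceiling and are dropped.
    return [
        v for v in control_variants
        if v.gene in ceiling and v.maf <= ceiling[v.gene]
    ]
```

  • The per-gene ceiling keeps common population variation, which is unlikely to be disease-causative, out of the control ("noise") frequencies.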
  • FIG. 1C shows a schematic of an S:N modeling design. Variant prevalence and gene-level S:N were evaluated by averaging the S:N across all amino acid positions to compare variant frequencies in the case cohorts and gnomAD, as shown by FIG. 1A. Disease-associated MAF was normalized to gnomAD MAF to establish S:N ratios at the amino acid-level. Pathogenic hotspots were areas with S:N above calculated gene-specific thresholds. Amino acid residues were considered a “hotspot” if the S:N ratio exceeded respective gene-specific thresholds.
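  • The amino acid-level S:N calculation described in reference to FIG. 1C can be sketched as follows. This is a minimal illustration under stated assumptions: per-position frequencies are taken as the sum of variant MAFs at that position, the `floor` used to avoid division by zero where no control variants exist is a hypothetical choice, only positions with at least one variant are averaged, and the gene-specific threshold is taken as an input because its derivation is not described here.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple

PositionMaf = Tuple[int, float]  # (amino acid position, variant MAF)


def maf_by_position(variants: Iterable[PositionMaf]) -> Dict[int, float]:
    """Sum variant MAFs at each amino acid position."""
    totals: Dict[int, float] = defaultdict(float)
    for position, maf in variants:
        totals[position] += maf
    return dict(totals)


def signal_to_noise(
    case: Iterable[PositionMaf], control: Iterable[PositionMaf], floor: float = 1e-6
) -> Dict[int, float]:
    """Disease-cohort MAF normalized to control MAF at each position."""
    case_maf = maf_by_position(case)
    control_maf = maf_by_position(control)
    positions = sorted(set(case_maf) | set(control_maf))
    # Assumption: a small floor stands in for a zero control frequency.
    return {
        p: case_maf.get(p, 0.0) / max(control_maf.get(p, 0.0), floor)
        for p in positions
    }


def gene_level_sn(sn: Dict[int, float]) -> float:
    """Average S:N across the positions present in the mapping."""
    return mean(sn.values())


def hotspots(sn: Dict[int, float], threshold: float) -> List[int]:
    """Positions whose S:N ratio exceeds the gene-specific threshold."""
    return [p for p, ratio in sn.items() if ratio > threshold]
```

  • A position with frequent disease-cohort variants and little control variation yields a high ratio, whereas a position dominated by control variation yields a ratio near zero.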
  • In topology maps overlaying S:N across domains, high S:N was found in the A-band of TTN and pore regions of LQTS genes, which are known genetic hotspots.
  • Topology maps for ACM-, CM-, CPVT-, LQTS-, and HCM-associated genes may also demonstrate the S:N across functional domains.
  • FIGs. 2A, 2B, 2C, 2D, 2E, and 2F illustrate workflows and bar graphs of LP/P ClinVar variants as being commonly found in S:N hotspots compared to LB/B variants.
  • S:N was applied to all ClinVar LB/B and LP/P variants in DiscoVari genes, excluding any variants from the provided disease cohorts used to identify hotspots, as shown by FIG. 2A, which represents an S:N validation workflow.
  • CM and channelopathy variants that were not found in the provided disease cohorts were included.
  • S:N was applied to all LB/B and LP/P variants.
  • FIG. 2B illustrates a bar graph of the percent of variants located within S:N hotspots for all LB/B and all LP/P variants.
  • The proportion of LP/P variants in hotspots (about 43.1% [40.4-45.9]) was higher than that of LB/B variants (about 17.8% [13.4-23.2], P < 0.0001), as shown by FIG. 2B.
  • The mean S:N ratio for LB/B variants (about 13.8 [4.9-22.7]) was lower than for LP/P variants (about 31.0 [26.8-35.3], P < 0.0001), as shown in FIG. 2C, which plots the S:N values for LB/B and LP/P variants in hotspots.
  • FIG. 2D illustrates that CM and channelopathy variants with a classification change in ClinVar from initial VUS to LP/P or LB/B (or vice versa) were included.
  • S:N was next applied to ClinVar variants reclassified over time to determine whether hotspots could predict those reassessed from LB/B or VUS to LP/P, as shown by FIG. 2D.
  • Variants in the provided disease cohorts were excluded. Variants were assessed using S:N analysis.
  • FIG. 2E illustrates a bar graph depicting the percent of variants located within S:N hotspots of variants that were reclassified from VUS to LB/B, from LB/B or LP/P to VUS, or from VUS to LP/P.
  • the lowest proportion of variants found in hotspots was among ClinVar variants reevaluated as LB/B (about 23.4% [14.7-35.2]), as shown in FIG. 2E.
  • The percentage of variants reinterpreted as VUSs within an S:N peak was also higher than that of LB/B variants (P < 0.01).
  • FIG. 2F illustrates plotted S:N values for reclassified variants.
  • The mean S:N for hotspot variants reclassified to LP/P (about 37.2, [24.0-50.4]) was greater than for those reassessed to LB/B (about 5.10, [2.00-8.20], P < 0.0001), as shown by FIG. 2F.
  • Labels indicate most recent variant classification. **, P < 0.01. ****, P < 0.0001.
  • LB/B, likely benign/benign; LP/P, likely pathogenic/pathogenic; S:N, signal-to-noise; VUS, variant of uncertain significance.
  • FIGs. 3A, 3B, and 3C illustrate bar graphs showing that clinically re-evaluated LP/P variants can more likely fall within S:N hotspots compared to LB/B variants and VUSs.
  • An illustrative clinical cohort of patients with cardiovascular genetic disease evaluated using the disclosed techniques can include 152 variants in 103 individuals meeting inclusion criteria, including 47 (45.6%) male subjects.
  • the illustrative mean age of genetic testing can be about 26.5 years [21.7-31.3], though the date of genetic testing may be unavailable for some.
  • PM1 is one of the criteria in the 2015 ACMG criteria for variant interpretation. This criterion is satisfied if the variant is located within a genetic “hotspot” that is more likely to cause disease or a functional domain contributing to disease when disrupted.
  • PM1 was applied to variants with an S:N ratio above the respective gene-specific threshold.
  • FIG. 3A illustrates a bar graph of the percent of variants in the clinical cohort located in S:N hotspots, stratified by variants classified as LB/B, VUS, and LP/P at time of genetic testing and with PM1 incorporated.
  • FIG. 3B illustrates the plotted S:N values for VUSs and LP/P variants in the clinical cohort.
  • The mean S:N ratio for LP/P variants in hotspots was about 37.9 [25.1-50.7] compared to about 10.6 [4.5-16.8] for VUSs (P < 0.0001), as shown by FIG. 3B.
  • the application of S:N in this cohort suggests that the analysis is clinically valid and aligns with pathogenicity assignments in individuals with diagnostic gene testing.
  • Variants interpreted as VUSs at the time of genetic test reporting were categorized based on suspected clinical re-evaluation of the variant in a clinical cohort. Of all VUSs identified in the clinical cohort, about 32.9% [23.7-43.7] were located in hotspots. Upon re-evaluation, variants were considered disease-associated if there was a phenotypic match to the disease associated with the gene in question, and if the variant was suspected to be pathogenic due to significant family history or proven co-segregation of the variant with affected family members.
  • FIG. 3C illustrates a bar graph of the percent of VUSs located in S:N hotspots stratified by those considered disease-causative, of unknown significance, or suspected benign based on clinical presentation, family history, and/or co-segregation of variants with disease.
  • Of suspected disease-associated VUSs, about 80.0% [49.0-96.4] localized to S:N hotspots, compared to only about 23.3% [13.2-37.7] of suspected benign variants (P < 0.001), as shown by FIG. 3C.
  • Similar proportions of hotspot variants were found when considering variants called VUSs by ACMG criteria before PM1 incorporation.
  • FIG. 4A is an example architecture 400 of an application for performing the disclosed techniques.
  • a frontend computing system 402 can provide a user interface for end users 406.
  • A backend computing system 404 can be configured to process information with programming logic. 256-bit Advanced Encryption Standard, or other similar standards, may be used in the backend 404 to store data, and each encryption key can be encoded with a rotating set of master keys.
  • JAVASCRIPT ES6 programming language can be used to build the frontend 402 and the backend 404 of the web application described herein.
  • the application can be accessed using any standard web browser (e.g., SAFARI, GOOGLE CHROME, FIREFOX).
  • the user(s) 406 can provide/collect data using the frontend 402 (408).
  • The frontend 402, as described above, can be configured to provide user interfaces at the computing device of the user(s) 406 (e.g., computer, laptop, tablet, mobile device, smartphone) through which the user(s) 406 interacts.
  • Providing/collecting data can include providing a gene and/or amino acid position to the frontend 402 using GUI features presented in a display of the user(s) 406 computing device.
  • the frontend 402 can then make a request of the backend system 404 to generate and obtain a S:N ratio for the user-inputted position (410).
  • the backend system 404 can access one or more application logic, file systems, databases, and/or web servers to determine the S:N ratio and/or generate results/information to be presented to the user(s) 406 (412). Accordingly, the backend system 404 can transmit a response back to the frontend 402 and the frontend 402 can display the results at the user(s) 406 computing device (414).
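  • The request and response exchange between the frontend 402 and the backend system 404 (steps 410 through 414) can be sketched as follows. The sketch is written in Python for illustration only, whereas the application itself is described as being built with JAVASCRIPT ES6. The JSON field names, the error messages, and the `handle_sn_request` function are hypothetical, and the in-memory store stands in for whatever databases or file systems the backend actually uses.

```python
import json

# Hypothetical stand-ins for the backend's data store.
THRESHOLDS = {"KCNH2": 1.56}
SN_STORE = {("KCNH2", 200): 0.00, ("KCNH2", 600): 73.10}


def _error(message: str) -> str:
    return json.dumps({"ok": False, "error": message})


def handle_sn_request(body: str) -> str:
    """Validate a JSON request from the frontend and return a JSON response."""
    try:
        payload = json.loads(body)
        gene, position = payload["gene"], payload["position"]
    except (json.JSONDecodeError, KeyError, TypeError):
        return _error("malformed request")

    if not isinstance(gene, str) or gene not in THRESHOLDS:
        return _error("gene not available")
    # The position must be a whole number; bool is rejected because it subclasses int.
    if isinstance(position, bool) or not isinstance(position, int) or position < 1:
        return _error("position must be a whole number of at least 1")
    if (gene, position) not in SN_STORE:
        return _error("no S:N value stored for position")

    sn_ratio = SN_STORE[(gene, position)]
    threshold = THRESHOLDS[gene]
    return json.dumps({
        "ok": True,
        "gene": gene,
        "position": position,
        "sn_ratio": sn_ratio,
        "threshold": threshold,
        "above_threshold": sn_ratio > threshold,
    })
```

  • Validating the gene and position on the backend, rather than relying only on the dropdown menu and numeric field of the GUI, is a design choice of this sketch that keeps malformed requests from reaching the data store.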
  • the disclosed tool can be built and used to improve variant analysis based on S:N analysis.
  • An example of the tool is provided at https://discovarilab.duke.edu/, the content at the provided web link of which is incorporated by reference herein in its entirety, including all links and sublinks associated with the provided web link.
  • the user(s) 406 can input a gene and amino acid position to obtain a S:N ratio for the position.
  • the results include the disease in which the disclosed tool can be validated for that gene, the gene-specific S:N threshold, a gene-level S:N plot, and/or the S:N and applicable functional domain(s) for that amino acid residue.
  • Illustrations of an example graphical user interface (GUI) based on using the application described in reference to FIG. 4A are provided in FIGs. 4B-4G, the GUI serving as the frontend 402.
  • FIG. 4B illustrates a GUI 420, which can be presented for the user(s) 406 at their respective computing device as the frontend 402.
  • the GUI 420 can include, for example, selectable options or fields 422 and 424, also referred to as graphical elements or graphical input elements.
  • the selectable option 422 represents a selection of genes that can be listed in a dropdown menu 422d, as shown in FIG. 4C, while the selectable option 424 represents an amino acid (AA) position that can be manually entered, for instance by entering a number.
  • any such option or field can include a dropdown menu, a text field, or other way by which data can be inputted by a user.
  • FIG. 4D shows an example GUI input, in which the gene KCNH2 (NM_000238.4) is illustrated as gene selection 428 and AA position 200 is illustrated as AA position selection 429.
  • the user can then click on or otherwise choose the select button 426 to cause his or her selections to be processed using the architecture 400.
  • Results from such processing can be presented at the user’s device, for example as shown by the GUI output 480 in FIG. 4E.
  • The example GUI output 480 of FIG. 4E includes an identified associated disease 486 (as shown, long QT syndrome), a visual result 488, and a tabular result 490.
  • the output 480 may additionally display overlap between diseases associated with particular genes.
  • the visual result 488 and the tabular result 490 can be considered graphical elements or graphical output elements.
  • the visual result 488 includes a graph 488g comparing AA position of the gene and the LQTS/gnomAD variant frequency, as well as a bar line 488b illustrating various aspects of the result: PAS, Pore, SF, cNBD, Hinge, and STK.
  • The visual result 488 may include functional domains (e.g., the bar line 488b) mapped to a plot (e.g., the graph 488g).
  • PAS, Pore, SF, cNBD, Hinge, and STK are indications that can represent the relevant functional domains of the hERG potassium channel encoded by the KCNH2 gene. These are relevant as mutations overlapping with these areas would potentially disrupt crucial elements of the resulting protein, thereby impacting its normal function and contributing to disease mechanism.
  • The tabular result 490 includes relevant results for the selected AA position, including an S:N threshold, a Signal-to-Noise ratio, a determination as to whether the S:N is above the threshold, and the functional domain.
  • Being above the threshold denotes a variant that would be considered to fulfill the PM1 criteria described herein and is therefore predicted to be more likely to cause disease, while being below the threshold denotes that the variant is more likely to be a benign population genetic variation.
  • the visual and/or tabular results 488 and/or 490 can be used to indicate whether a variant of the gene may exist at the user-indicated position and/or whether a mutation exists (or does not exist) within a particular hotspot.
  • the S:N ratio at position 200 is 0.00, and thus that value can be flagged as not being above a threshold value, which for the illustrated gene-disease pair may be 1.56. There is thus not a functional domain that is applicable, as represented by the “NA” in the tabular results 490.
  • FIGs. 4F and 4G illustrate a second example using the GUI input 420 having a different resulting GUI output 480'. More particularly, as shown in FIG. 4F, a gene selection 428 made in the selection portion 422 is again the gene KCNH2 (NM_000238.4), but now an AA position selection 429' made in the selectable option 424 is position 600. Again, after clicking on or otherwise choosing the select button 426, using the techniques described herein, the disclosed tool processes the selections and outputs the GUI output 480' using the architecture 400, the output 480' being shown in FIG. 4G. In the illustrated embodiment, the output 480' again includes an associated disease 486', a visual result 488', and a tabular result 490'.
  • the associated disease 486' is again long QT syndrome.
  • The visual result 488' again includes a graph 488g' and a bar line 488b', the graph and bar line conveying similar information as for the output 480. Both the graphs 488g and 488g' and the bar lines 488b and 488b' are the same.
  • the tabular result 490' includes the same relevant results for the selected AA position, including an S:N threshold, a Signal-to-Noise ratio, a determination as to whether the S:N is above the statistical threshold for being labeled as a genetic “hotspot,” and the corresponding functional domain of the protein if present.
  • the S:N ratio at position 600 is 73.10, and thus that value is flagged as being above the threshold value of 1.56, meaning that the variant localizes in an area of the gene and protein that is a genetic “hotspot.” Further, the resulting functional domain is identified as “Pore.” As shown by bar-line 488b', an AA position of 775 would yield a functional domain of “cNBD.”
  • genes for which the disclosed tools and methods have been validated for missense variants in long QT syndrome may include but are not limited to KCNQ1 and SCN5A.
  • the disclosed tools and methods can be used for missense variants in: arrhythmogenic cardiomyopathy for genes PKP2, DSC2, DSG2, and DSP; catecholaminergic polymorphic ventricular tachycardia for gene RYR2; and hypertrophic cardiomyopathy for genes MYH7, MYBPC3, TNNT2, TNNI3, ACTC1, TPM1, MYL2, and MYL3.
  • the disclosed tools and methods can be used for cardiomyopathy-associated truncating variants in gene TTN.
  • FIG. 5 is a flowchart of an example process 500 for incorporating S:N analysis into ACMG criteria when evaluating incidental variants.
  • the process 500 can be performed by a computing system as described herein (e.g., refer to the backend computing system 404 of FIG. 4A).
  • the process 500 can be performed by any other type of computing system and/or network of computing systems configured to perform the disclosed techniques.
  • the process 500 is described from the perspective of a computing system.
  • Individuals with incidental variants in CM- or channelopathy-associated genes should be referred to a multi-disciplinary center specializing in cardiovascular genetic testing.
  • the disclosed tool may be used and deployed by the computing system as a component of a comprehensive clinical evaluation by providing a correlate for ACMG PM1 criteria (where FHx corresponds to family history and PHx corresponds to personal history).
  • the computing system can identify an incidental variant in a CM and/or channelopathy associated gene (block 502).
  • the computing system can identify the incidental variant using input provided by a relevant user, as described in reference to FIGs. 4A-4G.
  • the incidental variant can be identified, for example, using a variety of genetic testing techniques, including but not limited to exome sequencing, genome sequencing, and/or targeted panel sequencing.
  • the computing system can generate a referral recommendation for an associated user or patient to a multi-disciplinary cardiovascular genetics team (block 504).
  • a clinical evaluation of the associated user can be performed, as indicated in block 506.
  • Results for the clinical evaluation can be documented, recorded, and/or processed.
  • the results may include, but are not limited to, PHx, physical examination, and/or FHx.
  • the results of the clinical examination can be compared against one or more rules and/or criteria by the computing system to perform the clinical evaluation.
  • Performing the clinical evaluation may include determining whether there is a low suspicion of disease risk (block 508) or a high suspicion of disease risk (block 510).
  • The computing system may apply one or more ML models and/or AI models in block 506 to evaluate the clinical results and determine the associated user’s level of disease risk.
  • the computing system can proceed to block 512.
  • the computing system may search for one or more related variants using the disclosed tool (e.g., the DiscoVari tool).
  • The computing system can identify a low S:N area (block 514) or an S:N hotspot PM1 being met (block 516). For example, the computing system can match the inputted variant value and inputted gene value to data (e.g., in a database, data repository, data store) that correlates to the same gene position.
  • the low S:N area in block 514 can indicate a variant that does not meet a predetermined threshold level/value and therefore does not meet the PM1 criteria described herein. Accordingly, the variant is predicted as more likely to be benign. If the S:N hotspot PM1 is met in block 516, then the S:N of the variant is predicted to be above the predetermined threshold, and therefore is predicted to be more likely disease causative.
  • the computing system can identify a low S:N area (block 518) or an S:N hotspot PM1 being met (block 520). Refer to blocks 514 and 516 for further discussion.
  • the computing system can apply one or more ACMG criteria (block 522).
  • a relevant user can provide input indicating that the one or more criteria should be applied.
  • the computing system may apply the one or more criteria without first receiving user input indicating such action to be taken.
  • applying the ACMG criteria can lead to the computing system generating an LP/P indication in block 534.
  • the computing system can generate a recommendation for continued follow-up in block 536, and can subsequently recommend cascade testing of relatives (e.g., first degree relatives).
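  • The early branching of the process 500 can be sketched as a trace of the blocks visited. This sketch rests on an assumed reading of the flow that is not fully spelled out above: that block 512 (the tool search) is reached from both the low-suspicion block 508 and the high-suspicion block 510, that blocks 514 and 516 follow low suspicion, and that blocks 518 and 520 follow high suspicion. The function name and return shape are hypothetical, and the sketch deliberately stops before the ACMG criteria of block 522 because the intermediate outcomes are not described here.

```python
from typing import List, Tuple


def trace_process_500(
    high_suspicion: bool, sn_ratio: float, threshold: float
) -> Tuple[List[int], bool]:
    """Return the block numbers visited and whether the PM1 correlate is met."""
    # Identify incidental variant, recommend referral, perform clinical evaluation.
    path = [502, 504, 506]
    path.append(510 if high_suspicion else 508)

    # Assumed: the tool is searched for the variant's S:N on either branch.
    path.append(512)

    # PM1 is treated as met when the S:N ratio exceeds the gene-specific threshold.
    pm1_met = sn_ratio > threshold
    if high_suspicion:
        path.append(520 if pm1_met else 518)
    else:
        path.append(516 if pm1_met else 514)
    return path, pm1_met
```

  • For example, under this reading a low-suspicion variant with an S:N ratio of 0.00 against a threshold of 1.56 would end at block 514, while a high-suspicion variant with an S:N ratio of 73.10 would end at block 520.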
  • Such information can be presented in GUI displays at a computing device of the associated relevant user. Refer to at least FIGs. 4A-4G for further discussion.
  • the process 500 is one example of an implementation of the techniques and systems provided for herein. It is intended to illustrate an example of how the disclosed DiscoVari tool can be used as a first-in-kind tool for assisting users in identifying disease variant hotspots (e.g. cardiovascular disease variant hotspots), or a lack of hotspots, so there can be a better identification of those most likely to be susceptible to medical issues.
  • A person skilled in the art, in view of the present disclosures, will appreciate other workflows that can be followed in view of the present disclosures and/or result from the present disclosures.
  • identifying individuals at-risk of developing disease before symptom onset can reduce morbidity and mortality.
  • cascade testing of a LP/P variant in a proband can identify family members at risk of disease development (see, e.g., block 538 in the process 500 of FIG. 5). This can be essential as it relates to heart disease, for example, as SCD can be the first presentation of inherited cardiac disorders.
  • Accurate determination of whether a variant may, or may not, be disease-associated is therefore critical, and the disclosed technology can support such determinations.
  • a population-based amino acid-level S:N analysis can be used by the computing system to identify genetic hotspots where variants may be more likely to be disease-causative.
  • S:N analysis can be incorporated into ACMG criteria by using S:N as a correlate for PM1, which can define variants located in mutational hotspots or functional domains with no benign variation. PM1 implementation can sometimes be inconsistent across laboratories; thus, S:N can be used to standardize application.
  • The disclosed tool (e.g., DiscoVari) can help determine whether variants fulfill this criterion, offering an easy, precise search to pinpoint hotspots with a higher probability of causing disease and low background variation.
  • This user-friendly method for incorporating S:N into ACMG criteria was validated across ClinVar variants and in clinically evaluated patients.
  • ClinVar LP/P variants were found in areas of higher signal than VUSs and LB/B variants, which also extended to variants with an increase in pathogenicity.
  • LP/P variants were more commonly found in hotspots and with a higher mean S:N than VUSs. Incorporating S:N into variant classification may help determine whether individuals with genetic variants need to be followed or evaluated to ensure early identification and treatment of SCD-predisposing diseases.
  • a compounding factor among incidental variants is the often limited clinical history, phenotyping, and family history available. Such phenotyping can be critical to variant interpretation but may be absent when variants are found incidentally rather than through diagnostic testing. Moreover, variant calls can be fluid and change over time, adding additional burden on various entities. Overcalling pathogenic variants can cause significant stress for patients and their families, potentially resulting in misdiagnoses and inappropriate clinical management. Pre-genetic sequencing counseling can remain key to discussing potential findings with patients. Multi-disciplinary interpretation of genetic testing results can be important before reporting to appropriately determine variant pathogenicity and risk.
  • DiscoVari can help distinguish between VUS and LP/P variants by assessing variants when clinical data is unavailable and there is low pre-genetic test suspicion of cardiac disease.
  • S:N can provide clarity when determining variant pathogenicity, evidenced by retrospectively evaluated clinical cohorts.
  • DiscoVari represents one part of the return of results by providing a basis for ACMG PM1 criteria, particularly for diagnostic VUSs and for incidental variants. It can be part of a full clinical evaluation at a multi-disciplinary center specializing in cardiovascular genetic testing. Importantly, decisions regarding clinical patient management should not be based on any single evaluation component, including DiscoVari.
  • the variant should be clinically reevaluated in the context of a complete clinical evaluation (personal history, physical exam for relevant disease, family history).
  • Amino acid-level analysis can also offer an additional tool to define the diagnostic weight of variants by identifying those in genetic hotspots, which therefore carry greater diagnostic relevance.
  • Amino acid-level signal-to-noise analysis can be a more precise way to determine variant disease-risk and may also impact ACMG criteria.
  • The PM1 “hotspot” criterion can represent a binary instrument for incorporating the continuous relative-risk variable that the disclosed S:N analysis provides.
  • S:N for residues 5 and 502 (42.56); S:N threshold of 1.42.
  • The region around residue 5 may not be considered a hotspot based on previous studies, and there may be two variants at this position listed in ClinVar without evidence of pathogenicity.
  • HCM-associated variants in MYBPC3 can include p.Arg502Trp.
  • PM1 may not reflect heterogeneity within current “hotspots,” nor does it allow for a quantitative assessment of disease association.
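  • The difference between the binary PM1 flag and the continuous S:N value can be illustrated with a short sketch. It reads the values above as an S:N of 42.56 at residue 502 against a gene-specific threshold of 1.42, which is an interpretation of the fragmentary listing; the second entry, an S:N of 1.50, is a hypothetical value added only to show a residue that barely exceeds the threshold. The `pm1_and_fold` function is likewise a hypothetical name.

```python
from typing import Tuple

THRESHOLD = 1.42  # gene-specific S:N threshold, as read from the text above


def pm1_and_fold(sn_ratio: float, threshold: float) -> Tuple[bool, float]:
    """Return the binary PM1 correlate and the S:N expressed as fold over threshold."""
    return sn_ratio > threshold, sn_ratio / threshold


examples = {
    "residue 502": 42.56,           # value read from the text above
    "hypothetical residue": 1.50,   # hypothetical, barely above the threshold
}

for label, sn_ratio in examples.items():
    met, fold = pm1_and_fold(sn_ratio, THRESHOLD)
    print(f"{label}: PM1 met={met}, fold over threshold={fold:.1f}")
```

  • Both entries satisfy the binary criterion, yet their fold-over-threshold values differ by more than an order of magnitude, which is the heterogeneity that a binary instrument cannot express.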
  • the disclosed DiscoVari tool can provide a more nuanced consideration of genetic hotspots and S:N analysis for various frameworks of variant interpretation.
  • Variants with a clinically significant change in their ClinVar classification (VUS to LB/B or LP/P and vice versa) can be used as another method of validation.
  • listed variants and their classifications can be curated and regularly reviewed by expert groups or other relevant users. While this strategy may not replicate the strength of an individualized clinical validation in a large, prospective cohort, it can support utility of S:N analysis given the stringent criteria used by ClinVar working groups for variant classification.
  • S:N can also be a relative measure of effect size, such that the strength of the provided results can be confined by the number of disease-associated variants incorporated into the present analysis.
  • SCN5A can be associated with LQTS, Brugada syndrome, and/or DCM, which have different mechanisms of disease (with gain of function variants associated with LQTS, compared to loss of function variants in Brugada).
  • Evaluating S:N across the spectrum of cardiomyopathies for individual genes can be an important future direction to expand DiscoVari's utility and understand differences and similarities in phenotype-specific genetic hotspots.
  • FIGs. 6A, 6B, and 6C illustrate example variants and methodology for performing the disclosed techniques.
  • A large population-based cohort can be used, such as the UKBB, as a merely illustrative example.
  • FIG. 6A provides a schematic of derivation of UKBB variants for analysis.
  • Only participants with ES and variants in DiscoVari genes may be included in this illustrative example.
  • Variants can be categorized into variants in HCM-associated genes, or TTN variants.
  • FIG. 6B illustrates a bar graph of the variant prevalence in the UKBB, stratified by DiscoVari genes 600, HCM-associated variants 602, and TTN variants 604. Notably, in this example, about 80.8% [80.6-80.9] of individuals in the UKBB hosted a variant in a DiscoVari gene, as shown by FIG. 6B. Prevalence can be shown across all variants, or those classified as LB/B, VUS, or LP/P.
  • FIG. 6C illustrates a bar graph of the proportion of cohort variants hinging on PM1 stratified by DiscoVari genes 600, HCM-associated variants 602, and TTN variants 604. Variants hinging on PM1 were defined as those that changed in classification from VUS to LP/P or LB/B and vice versa. When PM1 was added, about 54.1% [52.4-55.7] of TTNtvs changed classification, as shown by FIG. 6C.
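  • The definition of variants "hinging on PM1" can be sketched as a comparison of each variant's classification without and with PM1 applied. The function names and the string labels for the classification tiers are hypothetical conventions of this sketch; how the classifications themselves are derived from the full set of ACMG criteria is outside its scope.

```python
from typing import List, Tuple

# (classification without PM1, classification with PM1)
ClassificationPair = Tuple[str, str]


def hinges_on_pm1(without_pm1: str, with_pm1: str) -> bool:
    """True when adding PM1 moves a variant between VUS and LP/P or LB/B."""
    changed = without_pm1 != with_pm1
    return changed and "VUS" in (without_pm1, with_pm1)


def proportion_hinging(pairs: List[ClassificationPair]) -> float:
    """Fraction of variants whose classification hinges on PM1."""
    if not pairs:
        return 0.0
    return sum(hinges_on_pm1(a, b) for a, b in pairs) / len(pairs)
```

  • For example, a variant classified as VUS without PM1 and LP/P with PM1 counts as hinging on PM1, while a variant that remains a VUS in both cases does not.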
  • DiscoVari S:N can be interpreted by using it as a correlate for ACMG PM1 criteria.
  • DiscoVari may represent one component of the return of results, which can be done in a multi-disciplinary fashion at a specialized cardiovascular genetics center incorporating evidence-based guidelines to appropriately determine variant pathogenicity and risk.
  • Clinical patient management generally should not be decided upon any individual aspect of the evaluation.
  • the disclosed techniques can be implemented in the context of a full clinical evaluation, to include a personal and family history, as well as a physical exam and testing for the disease in question. Guidelines recommend a probabilistic, Bayesian approach for variant interpretation by incorporating the pre-test suspicion of cardiac disease in addition to the diagnostic weight of a genetic test.
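  • The probabilistic, Bayesian approach mentioned above can be illustrated with the odds form of Bayes' rule, in which the pre-test suspicion of disease is combined with the diagnostic weight of a test result expressed as a likelihood ratio. This is a generic sketch of the arithmetic only: the function name is hypothetical, and no particular prior or likelihood ratio is prescribed by the present disclosure.

```python
def posterior_probability(prior: float, likelihood_ratio: float) -> float:
    """Combine a pre-test probability with a likelihood ratio via Bayes' rule."""
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    if likelihood_ratio <= 0.0:
        raise ValueError("likelihood_ratio must be positive")
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)
```

  • With the same likelihood ratio, a higher pre-test suspicion yields a higher post-test probability, which is why the same variant can warrant different interpretation in a patient with a matching phenotype than in an individual with an incidental finding.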
  • the amino acid-level S:N analysis provided by DiscoVari can provide an additional tool for quantifying diagnostic relevance of variants in S:N-predicted genetic hotspots.
  • FIGs. 7A, 7B, 7C, 7D, 7E, and 7F illustrate an example Signal-to-Noise implementation using the disclosed techniques to enrich for HCM-associated genes.
  • The disclosed DiscoVari-based S:N analysis can be validated to accurately identify putatively pathogenic variants in individuals with disease.
  • FIG. 7A illustrates a schematic of missense variants identified in HCM-associated genes and their classification change to DiscoVari classification. As an illustrative example, a total of 9,378 missense variants were identified in HCM-associated genes, with 36 of these variants being identified in participants with an ICD-10 code-based diagnosis of HCM, as shown by FIG. 7A.
  • 1,371 variants were downgraded in HCM negative participants when incorporating S:N, representing a downgrade of about 56.2% [54.2-58.1] of the original LP/P variants. Only 47 variants were upgraded from VUS to LP/P in HCM negative participants using DiscoVari.
  • FIG. 7B illustrates a bar graph of the prevalence of HCM by ICD-10 code in UKBB participants included in the example cohort.
  • FIG. 7C illustrates a bar graph of the proportion of variants identified in S:N hotspots (defined as those with a S:N ratio above the gene-specific threshold) stratified by variants in participants with no evidence of HCM (HCM negative) or with HCM.
  • FIG. 7D illustrates plotted signal-to-noise values for variants in HCM negative participants compared to variants in participants with HCM. There was a higher median S:N value for variants found in people with HCM (p < 0.05), shown in FIG. 7D. Such results support that S:N hotspots can correlate with evidence of HCM in the UKBB.
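The comparisons in FIGs. 7C and 7D rest on two simple quantities per participant group: the proportion of variants whose S:N ratio exceeds the gene-specific threshold, and the median S:N value. A minimal sketch of that bookkeeping follows. The gene names, thresholds, and S:N values are hypothetical placeholders and are not data from the disclosure.

```python
from statistics import median


def is_hotspot(sn_ratio: float, gene_threshold: float) -> bool:
    """A variant is in a hotspot when its S:N ratio exceeds the gene's threshold."""
    return sn_ratio > gene_threshold


def summarize(variants, thresholds):
    """Return the hotspot proportion and median S:N for each participant group.

    variants: iterable of (gene, sn_ratio, group) tuples.
    thresholds: mapping of gene -> gene-specific S:N threshold.
    """
    by_group = {}
    for gene, sn, group in variants:
        by_group.setdefault(group, []).append((sn, is_hotspot(sn, thresholds[gene])))
    summary = {}
    for group, rows in by_group.items():
        hot = sum(1 for _, flag in rows if flag)
        summary[group] = {
            "n": len(rows),
            "hotspot_proportion": hot / len(rows),
            "median_sn": median(sn for sn, _ in rows),
        }
    return summary


# Hypothetical toy data, for illustration only.
thresholds = {"GENE_A": 2.0, "GENE_B": 5.0}
variants = [
    ("GENE_A", 0.5, "HCM negative"),
    ("GENE_A", 2.5, "HCM negative"),
    ("GENE_B", 1.0, "HCM negative"),
    ("GENE_B", 4.0, "HCM negative"),
    ("GENE_A", 3.0, "HCM"),
    ("GENE_B", 6.5, "HCM"),
    ("GENE_B", 2.0, "HCM"),
]
result = summarize(variants, thresholds)
for group, stats in result.items():
    print(group, stats)
```

Because the threshold is looked up per gene, the same S:N value can be a hotspot in one gene and not in another, which is the point of a gene-specific threshold.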
  • FIG. 7E illustrates a bar graph of the proportion of variants hosted in individuals with HCM where ACMG PM1 criteria was met or not by Interpretation and Validation of Genomic Variants (InterVar) (700) (which is a non-limiting example of a tool that can be used to assist in interpreting genetic variants identified through sequencing technologies) and using DiscoVari signal-to-noise analysis (702). Specifically, with PM1 criteria not met, about 0.26% [0.
  • variants may be hosted in individuals with HCM, compared to about 0.59% [0.39-0.90] with PM1 applied by S:N (p < 0.01), as shown by FIG. 7E.
  • InterVar, which is currently used in the UKBB to predict variant pathogenicity, did not show enrichment for HCM by its PM1 assignment (about 0.54% [0.23-1.26] when PM1 was not met and about 0.37% [0.26-0.52] when PM1 was met).
  • FIG. 7F illustrates a proportion of LP/P variants in HCM negative individuals and those with HCM using InterVar (700) and DiscoVari signal-to-noise (702) for applying PM1 criteria.
  • * p < 0.05.
  • **** p < 0.001.
  • Using DiscoVari to evaluate these variants reduced the burden of LP/P variants in individuals without an HCM diagnosis (about 12.0% [11.3-12.6]) compared to InterVar (about 26.1% [25.2-27.0], p < 0.0001), as shown in FIG. 7F.
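The proportions above are reported with bracketed ranges that read as confidence intervals. The disclosure does not state which interval method was used, so the following sketch uses the Wilson score interval as one common choice, purely as an assumption. The counts in the usage lines are hypothetical.

```python
import math


def wilson_interval(successes: int, total: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 for about 95%)."""
    if total <= 0 or not 0 <= successes <= total:
        raise ValueError("need 0 <= successes <= total and total > 0")
    p = successes / total
    denom = 1.0 + z * z / total
    center = (p + z * z / (2.0 * total)) / denom
    half_width = z * math.sqrt(p * (1.0 - p) / total + z * z / (4.0 * total * total)) / denom
    return center - half_width, center + half_width


# Hypothetical counts: 120 LP/P variants out of 1,000 variants evaluated.
low, high = wilson_interval(120, 1000)
print(f"{120 / 1000:.1%} [{low:.1%}-{high:.1%}]")
```

The Wilson interval stays within 0 and 1 and behaves sensibly for small proportions, which matters for figures such as the sub-1% rates reported for FIG. 7E.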
  • FIGs. 8A, 8B, 8C, and 8D illustrate that sub-clinical HCM phenotypes can be more prevalent in variants upgraded in classification by using the disclosed techniques.
  • the disclosed techniques can be used to determine whether S:N predicted hotspot variants correlated with sub-clinical evidence of disease, as shown by FIG. 8A.
  • FIG. 8A illustrates a schematic of identification of HCM negative individuals and phenotypic analysis for sub-clinical evidence of disease.
  • FIG. 8B illustrates a bar graph of the proportion of missense HCM-associated variants hosted in individuals with evidence of an HCM-associated phenotype (e.g., Afib/flutter, cardiomyopathy, murmur, chest pain) stratified by variants outside S:N hotspots 800 or in S:N hotspots 802.
  • FIGs. 8C and 8D provide example bar graphs of the proportion of variants hosted in HCM-negative individuals with Afib/flutter (refer to FIG. 8C) or chest pain (refer to FIG. 8D) in downgraded or upgraded variants with DiscoVari incorporated for classification.
  • FIGs. 9A, 9B, 9C, 9D, and 9E provide examples of cardiomyopathy in TTNtvs localizing to S:N hotspots.
  • the disclosed technology can demonstrate the utility of S:N analysis in TTNtv evaluation, as TTNtvs can be a common source of population-based variants in the UKBB described above.
  • Participants with TTNtvs can be categorized into those with a CM-associated ICD10 and CM negative participants, as shown by FIG. 9A.
  • FIG. 9A illustrates a schematic 900 of identification of CM negative individuals and phenotypic analysis for sub-clinical evidence of disease.
  • Patients can be evaluated with truncating variants in TTN (block 902). After review and analysis of the patient’s medical records, ECG(s), and/or cardiac MRI (block 904), the patients can be evaluated as CM negative (block 906) or CM positive individuals (block 910) to distinguish variants found in individuals with or without disease. In individuals with no clear diagnosis (block 906), further analysis and evaluation can be performed for sub-clinical evidence of disease that is related to cardiomyopathies (block 908). Block 908 may not warrant a full diagnosis, in some implementations.
  • FIG. 9C demonstrates plotted signal-to-noise values for variants localizing to hotspots in CM negative participants compared to variants in participants with CM. As with HCM, there was a higher median S:N of hotspot variants in CM (p < 0.01), shown by FIG. 9C.
  • FIG. 9D illustrates a bar graph of the proportion of variants hosted in CM negative individuals with dizziness and a murmur outside S:N hotspots 900 compared to in S:N hotspots 902.
  • FIG. 9E illustrates a proportion of variants hosted in individuals with syncope in variants with no classification change versus upgraded with DiscoVari incorporated for classification.
  • In CM negative individuals, a higher proportion of variants was hosted in individuals with reported syncope among upgraded variants (about 6.5% [4.5-9.4]) compared to variants that did not change classification (about 3.8% [3.2-4.5]) (p < 0.01), as shown by FIG. 9E.
  • S:N hotspots correlated with evidence of CM in TTNtvs, and correlated with sub-clinical symptoms in CM negative UKBB participants.
  • FIG. 10 illustrates a conceptual diagram of a system 1000 for predicting risk of disease on a population-level using the disclosed techniques and artificial intelligence (AI) techniques.
  • the system 1000 can include a backend computing system 1002, a user computing device 1004, and/or a data store 1008 in communication (e.g., wired, wireless) via network(s) 1006.
  • the system components 1002, 1004, and/or 1008 are similar to other computer components described herein.
  • the backend computing system 1002 can be configured to generate predictions of disease risks based on inputs such as particular genes and/or amino acid (AA) positions.
  • the backend computing system 1002 can generate such predictions using the disclosed techniques and technology.
  • the backend computing system 1002 can generate the predictions using AI techniques, models, and/or algorithms.
  • the user computing device 1004 can be configured to present the disclosed tool in one or more web applications and/or mobile applications. Refer to at least FIGs. 4A-4G for further discussion.
  • the backend computing system 1002 can transmit code for launching an application for predicting disease risk at the user computing device 1004 (block A, 1010).
  • the code can be transmitted in response to a user providing input at their device 1004 to launch a web-based application in a web browser (e.g., by typing in a URL or searching for the application using a search engine in their browser).
  • the user computing device 1004 can execute the code so that the application can be launched and displayed in one or more GUIs at the device 1004 (block B, 1012). Refer to at least FIGs. 4B-4G for further discussion about the application presented in the GUIs.
  • the user computing device 1004 can receive user input indicating one or more genes and/or amino acid positions in block C (1014). The user input can be provided using GUI features presented as part of launching the application at the device 1004. Refer to FIGs. 4B-4G for further discussion.
  • the user input can be provided for a particular user/patient and/or a population of users/patients.
  • the disclosed techniques can be used as a diagnostic variant classifier to predict disease risk on a population level.
  • the user computing device 1004 can transmit the user input to the backend computing system 1002 in block D (1016).
  • the backend computing system 1002 can retrieve an AI model from the data store 1008 in block E (1018).
  • the AI model can be locally stored at the backend computing system 1002 and/or the data store 1008 can be part of the backend computing system 1002.
  • the backend computing system 1002 can train gene-specific machine learning/AI models. Then, the backend computing system 1002 can retrieve the model(s) that corresponds to the particular gene(s) provided as part of the user input.
  • the gene-specific model(s) can provide for post-genetic testing diagnostic analyses and high performance prediction of variant pathogenicity.
  • the backend computing system 1002 can also train and use a model that is trained to automatically compare different model configurations with ensembling and cross-validation to identify high-performing models for a particular gene of interest. The identified model(s) can then be retrieved and used by the backend computing system 1002 to perform the disclosed techniques.
  • the backend computing system 1002 can generate datasets for one or more gene variants.
  • Each of the gene variants can be associated with a ground truth pathogenic label or benign label.
  • a model for each gene can be applied to a model deployment dataset of VUS for that particular gene to reclassify those VUS as likely benign, still uncertain, or likely pathogenic, and perform functional and clinical evaluation of model predictions on VUS.
  • Model input features for each gene variant can include consensus amino acid, mutant amino acid, amino acid position, domain, evolutionary conservation, rate of evolution, signal-to-noise ratio, position-specific scoring matrix score, and/or any combination thereof.
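One way to picture the model input features listed above is as a fixed-length numeric vector per variant. The sketch below is an assumption about how such an encoding could look: the class name, field names, the one-hot encoding of amino acids and domains, and the example values are all hypothetical and are not specified by the disclosure.

```python
from dataclasses import dataclass

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter codes


@dataclass(frozen=True)
class VariantFeatures:
    consensus_aa: str      # consensus (reference) amino acid
    mutant_aa: str         # mutant amino acid
    position: int          # amino acid position in the protein
    domain: str            # functional domain name, gene-specific
    conservation: float    # evolutionary conservation score
    evolution_rate: float  # rate of evolution
    sn_ratio: float        # amino acid-level signal-to-noise ratio
    pssm_score: float      # position-specific scoring matrix score


def one_hot_amino_acid(aa: str) -> list:
    """Encode a single amino acid letter as a 20-element indicator vector."""
    if len(aa) != 1 or aa not in AMINO_ACIDS:
        raise ValueError(f"unknown amino acid code: {aa!r}")
    return [1.0 if aa == code else 0.0 for code in AMINO_ACIDS]


def to_vector(v: VariantFeatures, domains: list) -> list:
    """Flatten a variant into a numeric feature vector for a gene-specific model."""
    domain_vec = [1.0 if v.domain == d else 0.0 for d in domains]
    return (
        one_hot_amino_acid(v.consensus_aa)
        + one_hot_amino_acid(v.mutant_aa)
        + [float(v.position)]
        + domain_vec
        + [v.conservation, v.evolution_rate, v.sn_ratio, v.pssm_score]
    )


# Hypothetical variant in a gene with two annotated domains.
example = VariantFeatures("A", "V", 10, "domain_1", 0.9, 0.1, 3.2, -2.0)
vector = to_vector(example, ["domain_1", "domain_2"])
print(len(vector))
```

Tree-based models such as those named in the disclosure can often consume categorical inputs in other forms; the one-hot layout is used here only because it keeps the example self-contained.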
  • the backend computing system 1002 can perform cross-validation of each model configuration on the training dataset (e.g., the dataset of pathogenic and benign variants) to calculate an area under a receiver operating characteristic, an average precision, and/or a calibration curve slope.
  • the backend computing system 1002 may then automatically select a preferred model configuration for each gene based on the calibration curve slope (e.g., being close to 1) and average precision (e.g., as high as possible).
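The selection step can be sketched as a small ranking rule over cross-validation results. How the two criteria are combined is not stated in the disclosure, so the rule below is an assumption: keep configurations whose calibration slope is within a tolerance of 1, then pick the one with the highest average precision. The configuration names, metric values, and the tolerance are hypothetical.

```python
def select_configuration(cv_results, slope_tolerance=0.2):
    """Pick a model configuration from cross-validation metrics.

    cv_results: list of dicts with keys "name", "average_precision",
    and "calibration_slope".
    """
    if not cv_results:
        raise ValueError("cv_results must not be empty")

    def slope_error(result):
        return abs(result["calibration_slope"] - 1.0)

    well_calibrated = [r for r in cv_results if slope_error(r) <= slope_tolerance]
    if well_calibrated:
        # Highest average precision wins; ties go to the slope closest to 1.
        return max(well_calibrated, key=lambda r: (r["average_precision"], -slope_error(r)))
    # Fallback (assumption): nothing is well calibrated, so take the closest slope.
    return min(cv_results, key=slope_error)


# Hypothetical cross-validation summary for one gene.
cv_results = [
    {"name": "random_forest_shallow", "average_precision": 0.81, "calibration_slope": 0.95},
    {"name": "random_forest_deep", "average_precision": 0.88, "calibration_slope": 1.60},
    {"name": "gradient_boosting_small", "average_precision": 0.85, "calibration_slope": 1.10},
]
chosen = select_configuration(cv_results)
print(chosen["name"])
```

In this toy case the configuration with the best average precision is rejected because its calibration slope is far from 1, which illustrates why both criteria are considered together.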
  • the backend computing system 1002 can then train a model with the selected model configuration using the benign and pathogenic variants in the training dataset mentioned above. This trained model can then be applied to the model deployment dataset of VUS. Based on the final trained model for each gene, the backend computing system 1002 can also calculate thresholds for VUS reclassification based on negative predictive values and positive predictive values.
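The threshold calculation can be sketched as a search over the scores of the labeled training variants: an upper cutoff above which the positive predictive value (PPV) meets a target, and a lower cutoff below which the negative predictive value (NPV) meets a target. The target values, the search rule, and the toy scores below are assumptions made for illustration; the disclosure does not specify them.

```python
def ppv_at(scores, labels, threshold):
    """PPV when every variant scoring >= threshold is called pathogenic."""
    called = [y for s, y in zip(scores, labels) if s >= threshold]
    return sum(called) / len(called) if called else 0.0


def npv_at(scores, labels, threshold):
    """NPV when every variant scoring <= threshold is called benign."""
    called = [y for s, y in zip(scores, labels) if s <= threshold]
    return (len(called) - sum(called)) / len(called) if called else 0.0


def reclassification_thresholds(scores, labels, target_ppv=0.9, target_npv=0.9):
    """Return (lower, upper) score cutoffs; either may be None if no cutoff qualifies."""
    candidates = sorted(set(scores))
    upper = next((t for t in candidates if ppv_at(scores, labels, t) >= target_ppv), None)
    lower = next((t for t in reversed(candidates) if npv_at(scores, labels, t) >= target_npv), None)
    return lower, upper


def reclassify(score, lower, upper):
    """Map a VUS model score onto the three output categories."""
    # The pathogenic check runs first, so it wins if the cutoffs ever overlap.
    if upper is not None and score >= upper:
        return "likely pathogenic"
    if lower is not None and score <= lower:
        return "likely benign"
    return "still uncertain"


# Hypothetical model scores for labeled variants (1 = pathogenic, 0 = benign).
scores = [0.05, 0.10, 0.20, 0.35, 0.50, 0.65, 0.80, 0.90, 0.95]
labels = [0, 0, 0, 1, 0, 1, 1, 1, 1]
lower, upper = reclassification_thresholds(scores, labels)
for vus_score in (0.10, 0.40, 0.70):
    print(vus_score, reclassify(vus_score, lower, upper))
```

Scores between the two cutoffs stay uncertain, which matches the three-way outcome described for VUS reclassification.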
  • an AI model can be trained using one or more of the following: (i) known disease-associated variants (or benign variants) through databases (e.g., ClinVar); (ii) whether each (or one or more) of the variants lands in a functional domain of a resultant protein; (iii) how evolutionarily-conserved an amino acid residue hosting the variant is across species; and/or (iv) a signal-to-noise analysis at the amino acid level. Accordingly, the model can be trained to identify and correlate user inputs with different types of diseases based, at least in part, on validated gene-disease associations.
  • Various types of machine learning models may be used with the disclosed technology, including but not limited to Random Forest models and/or Gradient Boosting models.
  • the backend computing system 1002 can provide the user input as inputs to the AI model.
  • the AI model can be used to improve calculations described herein on the user input, thereby yielding accurate thresholds that can serve for measuring whether one or more detected genes are more likely than not to lead to particular diseases.
  • the backend computing system 1002 can receive, as output from the AI model, indications of disease risk for the gene(s) and/or the AA position(s) (block G, 1022).
  • the model output can include, for example, a probabilistic value of a likelihood that the particular variant will result in a penetrant disease. This can be an improvement over existing techniques, which may be designed around how likely it is that the variant is disease-associated. Using existing techniques, a variant can be predicted to be 100% disease-associated, but an individual hosting the variant may never develop any signs of disease.
  • the AI model described herein, on the other hand, may not only predict disease association, but more specifically, disease penetrance, which includes the likelihood that the individual will actually manifest the particular disease phenotype.
  • the backend computing system 1002 may then generate disease risk results and/or recommendations based on the model output and/or one or more criteria. Refer to the process 500 of FIG. 5 for further discussion about determining how the disease risk can factor into treatments, additional testing, and/or other recommendations for the relevant user(s).
  • the backend computing system 1002 can return the results and/or recommendations to the user computing device 1004 and/or the data store 1008 (for storage and/or later retrieval) in block I (1026).
  • the results and/or recommendations can be presented at the user computing device 1004, as described at least in reference to FIGs. 4B-4G.
  • FIG. 11 is a schematic diagram that shows an example of a computing system 1100.
  • the computing system 1100 includes one or more computing devices (e.g., computing device 1110), which can be in wired and/or wireless communication with various peripheral device(s) 1180, data source(s) 1190, and/or other computing devices (e.g., over network(s) 1170).
  • the computing device 1110 can represent various forms of stationary computers 1112 (e.g., workstations, kiosks, servers, mainframes, edge computing devices, quantum computers, etc.) and mobile computers 1114 (e.g., laptops, tablets, mobile phones, personal digital assistants, wearable devices, etc.).
  • the computing device 1110 can be included in (and/or in communication with) various other sorts of devices, such as data collection devices (e.g., devices that are configured to collect data from a physical environment, such as microphones, cameras, scanners, sensors, etc.), robotic devices (e.g., devices that are configured to physically interact with objects in a physical environment, such as manufacturing devices, maintenance devices, object handling devices, etc.), vehicles (e.g., devices that are configured to move throughout a physical environment, such as automated guided vehicles, manually operated vehicles, etc.), or other such devices.
  • Each of the devices
  • the computing device 1110 can be part of a computing system that includes a network of computing devices, such as a cloud-based computing system, a computing system in an internal network, or a computing system in another sort of shared network.
  • Processors of the computing device (1110) and other computing devices of a computing system can be optimized for different types of operations, secure computing tasks, etc.
  • the components shown herein, and their functions, are meant to be examples, and are not meant to limit implementations of the technology described and/or claimed in this document.
  • the computing device 1110 includes processor(s) 1120, memory device(s) 1130, storage device(s) 1140, and interface(s) 1150. Each of the processor(s) 1120, the memory device(s) 1130, the storage device(s) 1140, and the interface(s) 1150 are interconnected using a system bus 1160.
  • the processor(s) 1120 are capable of processing instructions for execution within the computing device 1110, and can include one or more single-threaded and/or multithreaded processors.
  • the processor(s) 1120 are capable of processing instructions stored in the memory device(s) 1130 and/or on the storage device(s) 1140.
  • the memory device(s) 1130 can store data within the computing device 1110, and can include one or more computer-readable media, volatile memory units, and/or non-volatile memory units.
  • the storage device(s) 1140 can provide mass storage for the computing device 1110, can include various computer-readable media (e.g., a floppy disk device, a hard disk device, a tape device, an optical disk device, a flash memory or other similar solid state memory device, or an array of devices, including devices in a storage area network or other configurations), and can provide data security/encryption capabilities.
  • the interface(s) 1150 can include various communications interfaces (e.g., USB, Near-Field Communication (NFC), Bluetooth, WiFi, Ethernet, wireless Ethernet, etc.) that can be coupled to the network(s) 1170, peripheral device(s) 1180, and/or data source(s) 1190 (e.g., through a communications port, a network adapter, etc.).
  • Communication can be provided under various modes or protocols for wired and/or wireless communication. Such communication can occur, for example, through a transceiver using a radio-frequency. As another example, communication can occur using light (e.g., laser, infrared, etc.) to transmit data.
  • the interface(s) 1150 can include a control interface that receives commands from an input device (e.g., operated by a user) and converts the commands for submission to the processors 1120.
  • the interface(s) 1150 can include a display interface that includes circuitry for driving a display to present visual information to a user.
  • the interface(s) 1150 can include an audio codec which can receive sound signals (e.g., spoken information from a user) and convert it to usable digital data. The audio codec can likewise generate audible sound, such as through an audio speaker. Such sound can include real-time voice communications, recorded sound (e.g., voice messages, music files, etc.), and/or sound generated by device applications.
  • the network(s) 1170 can include one or more wired and/or wireless communications networks, including various public and/or private networks.
  • Examples of communication networks include a LAN (local area network), a WAN (wide area network), and/or the Internet.
  • the communication networks can include a group of nodes (e.g., computing devices) that are configured to exchange data (e.g., analog messages, digital messages, etc.), through telecommunications links.
  • the telecommunications links can use various techniques (e.g., circuit switching, message switching, packet switching, etc.) to send the data and other signals from an originating node to a destination node.
  • the computing device 1110 can communicate with the peripheral device(s) 1180, the data source(s) 1190, and/or other computing devices over the network(s) 1170. In some implementations, the computing device 1110 can directly communicate with the peripheral device(s) 1180, the data source(s) 1190, and/or other computing devices.
  • the peripheral device(s) 1180 can provide input/output operations for the computing device 1110.
  • Input devices (e.g., keyboards, pointing devices, touchscreens, microphones, cameras, scanners, sensors, etc.) can provide input to the computing device 1110, and output devices (e.g., display units such as display screens or projection devices for displaying graphical user interfaces (GUIs), audio speakers for generating sound, tactile feedback devices, printers, motors, hardware control devices, etc.) can provide output from the computing device 1110 (e.g., user-directed output and/or other output that results in actions being performed in a physical environment).
  • input from a user can be received in any form, including visual, auditory, or tactile input
  • feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback).
  • the data source(s) 1190 can provide data for use by the computing device 1110, and/or can maintain data that has been generated by the computing device 1110 and/or other devices (e.g., data collected from sensor devices, data aggregated from various different data repositories, etc.).
  • one or more data sources can be hosted by the computing device 1110 (e.g., using the storage device(s) 1140).
  • one or more data sources can be hosted by a different computing device. Data can be provided by the data source(s) 1190 in response to a request for data from the computing device 1110 and/or can be provided without such a request.
  • a pull technology can be used in which the provision of data is driven by device requests, and/or a push technology can be used in which the provision of data occurs as the data becomes available (e.g., real-time data streaming and/or notifications).
  • Various sorts of data sources can be used to implement the techniques described herein, alone or in combination.
  • a data source can include one or more data store(s) 1190a.
  • the database(s) can be provided by a single computing device or network (e.g., on a file system of a server device) or provided by multiple distributed computing devices or networks (e.g., hosted by a computer cluster, hosted in cloud storage, etc.).
  • DBMS database management system
  • APIs application programming interfaces
  • the database(s) can include relational databases, object databases, structured document databases, unstructured document databases, graph databases, and other appropriate types of databases.
  • a data source can include one or more blockchains 1190b.
  • a blockchain can be a distributed ledger that includes blocks of records that are securely linked by cryptographic hashes. Each block of records includes a cryptographic hash of the previous block, and transaction data for transactions that occurred during a time period.
  • the blockchain can be hosted by a peer-to-peer computer network that includes a group of nodes (e.g., computing devices) that collectively implement a consensus algorithm protocol to validate new transaction blocks and to add the validated transaction blocks to the blockchain.
  • the blockchain can maintain data quality (e.g., through data replication) and can improve data trust (e.g., by reducing or eliminating central data control).
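The hash linking described for the blockchain data source can be sketched in a few lines. This is a minimal illustration only: it shows blocks that each store the hash of the previous block and a check that detects tampering, and it leaves out the peer-to-peer network and the consensus protocol. The block layout, function names, and the choice of SHA-256 over a JSON serialization are assumptions.

```python
import hashlib
import json


def block_hash(block: dict) -> str:
    """Hash a block's contents deterministically with SHA-256."""
    payload = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_block(chain: list, transactions: list) -> dict:
    """Append a block that records the hash of the previous block."""
    prev_hash = block_hash(chain[-1]) if chain else "0" * 64
    block = {"index": len(chain), "prev_hash": prev_hash, "transactions": transactions}
    chain.append(block)
    return block


def is_valid(chain: list) -> bool:
    """Check that every block still points at the true hash of its predecessor."""
    return all(cur["prev_hash"] == block_hash(prev) for prev, cur in zip(chain, chain[1:]))


# Hypothetical records stored as transactions.
chain = []
append_block(chain, ["record-1"])
append_block(chain, ["record-2", "record-3"])
append_block(chain, ["record-4"])
print(is_valid(chain))
```

Editing an earlier block changes its hash, so the next block's stored `prev_hash` no longer matches and `is_valid` returns `False`.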
  • a data source can include one or more machine learning systems 1190c.
  • the machine learning system(s) 1190c can be used to analyze data from various sources (e.g., data provided by the computing device 1110, data from the data store(s) 1190a, data from the blockchain(s) 1190b, and/or data from other data sources), to identify patterns in the data, and to draw inferences from the data patterns.
  • training data 1192 can be provided to one or more machine learning algorithms 1194, and the machine learning algorithm(s) can generate a machine learning model 1196. Execution of the machine learning algorithm(s) can be performed by the computing device 1110, or another appropriate device.
  • Machine learning approaches can be used to generate machine learning models, such as supervised learning (e.g., in which a model is generated from training data that includes both the inputs and the desired outputs), unsupervised learning (e.g., in which a model is generated from training data that includes only the inputs), reinforcement learning (e.g., in which the machine learning algorithm(s) interact with a dynamic environment and are provided with feedback during a training process), or another appropriate approach.
  • a variety of different types of machine learning techniques can be employed, including but not limited to convolutional neural networks (CNNs), deep neural networks (DNNs), recurrent neural networks (RNNs), and other
  • Various implementations of the systems and techniques described herein can be realized in digital electronic circuitry, integrated circuitry, specially designed ASICs (application specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof.
  • a computer program product can be tangibly embodied in an information carrier (e.g., in a machine-readable storage device), for execution by a programmable processor.
  • Various computer operations (e.g., methods described in this document)
  • the described features can be implemented in one or more computer programs that are executable on a programmable system including at least one programmable processor coupled to receive data and instructions from, and to transmit data and instructions to, a data storage system, at least one input device, and at least one output device.
  • a computer program is a set of instructions that can be used, directly or indirectly, by a computer to perform a certain activity or bring about a certain result.
  • a computer program can be written in any form of programming language, including compiled or interpreted languages, and can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment.
  • a computer program product can be a computer- or machine-readable medium, such as a storage device or memory device.
  • machine-readable medium and computer-readable medium refer to any computer program product, apparatus and/or device (e.g., magnetic discs, optical disks, memory, etc.) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal.
  • machine-readable signal refers to any signal used to provide machine instructions and/or data to a programmable processor.
  • Suitable processors for the execution of a program of instructions include, by way of example, both general and special purpose microprocessors, and can be a single processor or one of multiple processors of any kind of computer.
  • a processor will receive instructions and data from a read-only memory or a random access memory or both.
  • the elements of a computer are a processor for executing instructions and one or more memory devices for storing instructions and data.
  • a computer can also include, or can be operatively coupled to communicate with, one or more mass storage devices for storing data files.
  • Such devices can include magnetic disks (e.g., internal hard disks and/or removable disks), magneto-optical disks, and optical disks.
  • Storage devices suitable for tangibly embodying computer program instructions and data can include all forms of non-volatile memory, including by way of example semiconductor memory devices, flash memory devices, magnetic disks (e.g., internal hard disks and removable disks), magneto-optical disks, and optical disks.
  • the systems and techniques described herein can be implemented in a computing system that includes a back end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front end component (e.g., a client computer having a graphical user interface or a Web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back end, middleware, or front end components.
  • the components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network).
  • the computer system can include clients and servers, which can be generally remote from each other and typically interact through a network, such as the described one. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
  • a method of analyzing a gene to determine susceptibility to a particular disease comprising: receiving a gene selection from a user, the gene selection being one gene of a plurality of genes stored in a database, the plurality of genes being linked to one or more associated diseases; receiving an amino acid position selection from the user that made the gene selection; and outputting information that includes at least one of: a signal-to-noise ratio of the selected gene-amino acid combination, a relative risk level of the one or more associated diseases, an indication of whether the signal-to-noise ratio corresponds to a statistical mutation hotspot, or a disease susceptibility determination.
  • a gene analysis tool comprising: a database of a plurality of genes, the database including information for each gene of the plurality of genes that includes: one or more associated diseases for the gene; a threshold signal-to-noise value for each disease of the one or more associated diseases; and a signal-to-noise value for each amino acid position of the gene; a processor configured to: receive user input that includes a gene selection from the database of a plurality of genes and an amino acid position selection; and output information about disease susceptibility in view of the gene selection and the amino acid position selection based on the information in the database.
  • the information for each gene of the plurality of genes comprises: a signal-to-noise ratio of the selected gene-amino acid combination; and a signal-to-noise threshold value of the selected gene and the one or more associated diseases for the gene.
  • processor is further configured to: compare the signal-to-noise ratio of the selected gene-amino acid combination and the signal-to-noise threshold value of the selected gene and the one or more associated diseases for the gene; determine, based on the comparison, whether the signal-to-noise threshold value is exceeded by the signal-to-noise ratio of the selected gene-amino acid combination; and output the determination about the signal-to-noise ratio as compared to the signal-to-noise threshold value.
  • GUI graphical user interface
  • a computer-readable medium storing instructions that, when executed by a processor, cause the processor to: receive a gene selection from a user, the gene selection being one gene of a plurality of genes stored in a database, the plurality of genes being linked to one or more associated diseases; receive an amino acid position selection from the user that made the gene selection; and output information that includes at least one of: a signal-to-noise ratio of the selected gene-amino acid combination, a relative risk level of the one or more associated diseases, an indication of whether the signal-to-noise ratio corresponds to a statistical mutation hotspot, or a disease susceptibility determination.
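The lookup-and-compare behavior recited in the items above can be illustrated with a minimal Python sketch. It is not the applicants' implementation: the names `GeneRecord`, `assess_position`, `GENE_A`, `disease_x`, and `disease_y`, and all numeric values, are hypothetical placeholders chosen for illustration. The sketch assumes the database maps each gene to per-disease threshold values and per-position signal-to-noise values, and it treats "exceeded" as strictly greater than the threshold.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class GeneRecord:
    """Hypothetical database entry for one gene."""
    # Associated disease name -> threshold signal-to-noise value.
    disease_thresholds: Dict[str, float]
    # Amino acid position -> signal-to-noise value at that position.
    position_snr: Dict[int, float]


def assess_position(database: Dict[str, GeneRecord], gene: str, position: int) -> dict:
    """Compare the signal-to-noise value at one position with each disease threshold."""
    record = database[gene]
    snr = record.position_snr[position]
    return {
        disease: {
            "snr": snr,
            "threshold": threshold,
            # "Exceeded" is interpreted here as strictly greater than the threshold.
            "exceeds_threshold": snr > threshold,
        }
        for disease, threshold in record.disease_thresholds.items()
    }


# Placeholder data only; these are not values from the application.
DATABASE = {
    "GENE_A": GeneRecord(
        disease_thresholds={"disease_x": 2.0, "disease_y": 5.0},
        position_snr={10: 3.5, 11: 0.4},
    )
}

result = assess_position(DATABASE, "GENE_A", 10)
for disease, info in result.items():
    print(disease, info)
```

In this sketch the user input is the pair (gene, amino acid position), and the output is one determination per associated disease. A real tool would also need to handle unknown genes or positions, which are omitted here.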

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Public Health (AREA)
  • Data Mining & Analysis (AREA)
  • Epidemiology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Databases & Information Systems (AREA)
  • Biophysics (AREA)
  • Primary Health Care (AREA)
  • Theoretical Computer Science (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biotechnology (AREA)
  • Genetics & Genomics (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Molecular Biology (AREA)
  • Analytical Chemistry (AREA)
  • Artificial Intelligence (AREA)
  • Bioethics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Chemical & Material Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Pathology (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention relates to tools and associated methods for analyzing the pathogenicity of gene variants, the analysis being based on signal-to-noise ratios of the amino acids for the gene. The tools allow a user to enter a gene, together with an amino acid position, and the tool then outputs information about susceptibility to a particular disease based on the entered data, by comparison with information in a database concerning the gene, the amino acid position, and/or other information. The information in the database may include data from the rest of the population and/or predicted information about the public, for example obtained using artificial intelligence and other associated predictive measures. The result is a tool and associated methodologies that predict a patient's actual susceptibility to having or eventually contracting a particular disease more accurately than current tools, which tend to overestimate the likelihood of having or contracting such a disease.
PCT/US2024/036776 2023-07-03 2024-07-03 Systems and methods for predicting variant pathogenicity based on amino acid signal-to-noise ratios Pending WO2025010357A2 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202363511798P 2023-07-03 2023-07-03
US63/511,798 2023-07-03

Publications (2)

Publication Number Publication Date
WO2025010357A2 WO2025010357A2 (fr) 2025-01-09
WO2025010357A3 WO2025010357A3 (fr) 2025-04-03

Family

ID=94172172

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2024/036776 Pending WO2025010357A2 (fr) 2023-07-03 2024-07-03 Systems and methods for predicting variant pathogenicity based on amino acid signal-to-noise ratios

Country Status (1)

Country Link
WO (1) WO2025010357A2 (fr)

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20260039788A (ko) * 2017-10-10 2026-03-20 Seattle Project Corporation Neoantigen identification using hotspots
US12297426B2 (en) * 2019-10-01 2025-05-13 The Broad Institute, Inc. DNA damage response signature guided rational design of CRISPR-based systems and therapies

Also Published As

Publication number Publication date
WO2025010357A3 (fr) 2025-04-03

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 24836581

Country of ref document: EP

Kind code of ref document: A2

NENP Non-entry into the national phase

Ref country code: DE