WO2022015998A1 - Gene panels and methods of use thereof for screening and diagnosis of congenital heart defects and diseases - Google Patents
Gene panels and methods of use thereof for screening and diagnosis of congenital heart defects and diseases
- Publication number
- WO2022015998A1 (PCT/US2021/041853)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- genes
- heart
- gene
- enhancer
- embryonic
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6876—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
- C12Q1/6883—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q2600/00—Oligonucleotides characterized by their use
- C12Q2600/156—Polymorphic or mutational markers
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q2600/00—Oligonucleotides characterized by their use
- C12Q2600/16—Primer sets for multiplex assays
Definitions
- the present disclosure generally describes the finding that well-connected hub genes with heart-specific expression that are targeted by embryonic heart-specific enhancers are likely disease candidates. These functional annotations will allow for better interpretation of whole genome sequencing data in the large number of patients affected by congenital heart defects. Accordingly, provided herein are methods of identifying a subject as at risk of having a congenital heart defect.
- the method of identifying a subject as at risk of having a congenital heart defect comprises assessing a plurality of genes in a sample obtained from the subject.
- the plurality of genes is selected from the genes listed in Table 1.
- the plurality of genes comprises at least 5 genes listed in Table 1.
- the plurality of genes comprises at least 10 genes listed in Table 1.
- the plurality of genes comprises at least 25 genes listed in Table 1. In some embodiments, the panel of genes comprises at least 50 genes listed in Table 1. In some embodiments, the panel of genes comprises at least 100 genes listed in Table 1. In some embodiments, the plurality of genes comprises at least 150 genes listed in Table 1. In some embodiments, the plurality of genes comprises at least 200 genes listed in Table 1. In some embodiments, the plurality of genes comprises at least 250 genes listed in Table 1. In some embodiments, the plurality of genes comprises all of the genes listed in Table 1. In some embodiments, assessing the plurality of genes comprises determining a copy number for at least one of the plurality of genes.
- assessing the plurality of genes comprises detecting the presence of one or more single nucleotide polymorphisms (SNPs) for at least one of the plurality of genes. In some embodiments, assessing the plurality of genes comprises measuring expression of a product of at least one of the plurality of genes.
- a panel of genes for assessing risk of congenital heart defects is also provided. In some embodiments, the panel comprises at least 10 genes listed in Table 1. In some embodiments, the panel comprises at least 25 genes listed in Table 1. In some embodiments, the panel comprises at least 50 genes listed in Table 1. In some embodiments, the panel comprises at least 100 genes listed in Table 1. In some embodiments, the panel comprises at least 150 genes listed in Table 1.
- the panel comprises at least 200 genes listed in Table 1. In some embodiments, the panel comprises at least 250 genes listed in Table 1. In some embodiments, the panel comprises all of the genes listed in Table 1.
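The "at least N genes listed in Table 1" language above amounts to a set-membership count. The following is a minimal sketch of that check, not the applicants' implementation. The gene symbols are hypothetical placeholders (Table 1 is not reproduced in this section), and the function names are illustrative assumptions.

```python
def panel_coverage(assessed_genes, table1_genes):
    """Return the sorted Table 1 genes that are present among the assessed genes.

    Symbols are compared case-insensitively; this is an assumption made for
    the sketch, not something stated in the source text.
    """
    table1 = {g.upper() for g in table1_genes}
    return sorted({g.upper() for g in assessed_genes} & table1)


def meets_threshold(assessed_genes, table1_genes, minimum):
    """True if at least `minimum` Table 1 genes were assessed."""
    return len(panel_coverage(assessed_genes, table1_genes)) >= minimum


# Hypothetical placeholder symbols standing in for Table 1 entries.
TABLE1_PLACEHOLDER = ["GENE_A", "GENE_B", "GENE_C", "GENE_D", "GENE_E", "GENE_F"]
assessed = ["gene_a", "GENE_B", "OTHER_1", "GENE_D", "GENE_E", "GENE_F"]

covered = panel_coverage(assessed, TABLE1_PLACEHOLDER)
print(len(covered), meets_threshold(assessed, TABLE1_PLACEHOLDER, 5))
```

The same function can be called with 10, 25, 50, and so on to test the successive embodiments against a real gene list.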
- FIG. 1A shows representative images of primary human embryonic heart tissue at indicated Carnegie stages (CS).
- FIG. 1A shows the data types collected and downstream analyses performed in this study.
- FIG. 1B shows the principal component analysis of genome-wide primary and imputed chromatin immunoprecipitation followed by next-generation sequencing (ChIP-seq) signals. Each mark is indicated by a separate color. Primary samples are shown as triangles and imputed data as circles. Grouping of marks and overall function are indicated in normal and bold text, respectively.
- FIG. 1C shows the total numbers of each ChromHMM (chromatin state) identified in segmentation of each individual embryonic tissue sample. Samples are ordered from left to right as the earliest to latest time points. Legend of colors is located below using conventions defined by Roadmap Epigenome.
- FIG. 1D shows the average numbers of each chromatin state for all heart samples (red) and all Roadmap Epigenome samples (gray). Error bars represent SDs for each chromatin state and tissue group. PC indicates principal component; PCW, postconception weeks; and PTM, post-translational modification.
- FIG. 2A shows the tSNE (t-distributed stochastic neighbor embedding) projection of imputed H3K27ac P-value signals at 444,413 enhancer segments from tissues profiled by Roadmap Epigenome and in this study. Dots are color coded by tissue as indicated and labeled for each individual tissue sample as profiled by Roadmap Epigenome or in this study.
- FIG. 2B shows the fraction of each of the 25 ChromHMM States, EMERGE, and Dickel datasets that overlap with either active heart enhancers (unshaded) or enhancers active in tissues other than heart (shaded) as tested by the Vista Enhancer Browser (enhancer.lbl.gov).
- FIG. 2D shows the most significantly enriched motifs in putative EHEs calculated by Hypergeometric Optimization of Motif Enrichment (HOMER). Shown are the position weight matrix for each motif, the transcription factor predicted to bind that motif, and the HOMER P value, for HOMER known motifs (top) and de novo motifs (bottom).
- AF-2 indicates activation function 2; AV, atrioventricular; ESC, embryonic stem cell; ESD, ESC-derived; FDR, false discovery rate; GI, gastrointestinal; and GO, gene ontology.
- FIG. 3A shows the delineation of 3 major stages of heart development during the embryonic period based on Carnegie staging (CS).
- FIG. 3B is a heat map of signal at putative enhancers differentially marked with H3K27ac.
- FIG. 3C is the same as FIG. 3B but with H3K4me2.
- FIG. 3D is a heat map of Z scores for level of significance of motifs enriched in each class of differentially regulated enhancers based on pairwise comparisons of replicates of H3K27ac signal at all embryonic heart enhancer segments using DiffBind. Comparisons are indicated as follows: early up vs mid (EVM), early up vs late (EVL), mid up vs early (MVE), mid up vs late (MVL), late up vs mid (LVM), and late up vs early (LVE). The more significantly enriched motifs are colored yellow.
- FIG. 3E is the same as in FIG. 3D but using H3K4me2 signals.
- FIG. 4A is a University of California Santa Cruz (UCSC) Browser shot of NKX2.5 gene locus showing individual embryo ChromHMM (chromatin state) annotations from this study and Roadmap Epigenome. Samples are ordered from top to bottom based on developmental age, earliest to latest. Chromatin states are indicated by color segments using color convention from FIG. 1C.
- UCSC University of California Santa Cruz
- FIG. 4B is a UCSC Browser shot of locus near the TBX20 gene using the same conventions as in FIG. 4A.
- the region upstream of the TBX20 gene is a human embryonic heart–specific super enhancer (orange bar).
- the strong HEH-specific enhancer states track, as well as the experimentally validated enhancer elements with images to the right.
- all the Roadmap Epigenome ChromHMM segmentations are stacked showing the region is not similarly active in any other profiled tissue.
- FIG. 4C shows box plots of fold enrichment of overlap of each indicated chromatin state in the human embryonic heart or brain with anchor points identified by capture Hi-C interactions in iPSC-derived cardiomyocytes over matched randomly selected segments. Solid boxes represent embryonic heart chromatin segments while dotted boxes represent adult brain chromatin segments. Significance of difference between embryonic heart and adult brain fold enrichments was calculated using the Mann-Whitney U test and is shown at top (*P<0.05, **P<0.01, ***P<0.001, ****P<0.0001). The largest increases in fold enrichments for embryonic heart were identified for strong enhancer states 13 and 14.
- FIG. 5A is a scatterplot of the log2 fold enrichment and log10 Bonferroni-adjusted significance level of genome-wide association study (GWAS) variants associated with systolic blood pressure in all enhancer segments identified in the strong enhancer states for each embryonic heart sample (bright red), the total reproducible strong enhancers from the whole dataset (dark red), or other tissues in Roadmap Epigenome (blue). All values were calculated using only variants with P<5×10⁻⁸ from the GWAS catalog using GREGOR (genomic regulatory elements and Gwas overlap algoRithm).
- FIG. 5B is the same as in FIG. 5A using GWAS variants associated with electrocardiograph traits and measures.
- FIG. 5C is the same as in FIG. 5A using GWAS variants associated with resting heart rate.
- FIG. 5D is the same as in FIG. 5A using GWAS variants associated with QRS complex traits.
- FIG. 5E shows the enrichment of GWAS analysis P values for atrial fibrillation in all ChromHMM (chromatin state) annotations as determined by GARFIELD (GWAS analysis of regulatory and functional information enrichment with LD correction). Scatterplot of the odds ratio of atrial fibrillation GWAS SNPs (single nucleotide polymorphisms) using the 1×10⁻⁸ threshold by the log10 GARFIELD Bonferroni-adjusted P value.
- FIG. 5F is the same as in FIG. 5E using GWAS summary statistics for systemic lupus erythematosus.
- Lupus shows the greatest enrichment in strong enhancers identified in immune cell types sorted from blood. Lupus also shows enrichment in repressed and bivalent states in human embryonic heart.
- FIG. 6A shows a heat map showing specificity of expression for 5167 genes identified with elevated Gini scores (>0.5) for 25 tissues from GTEx and embryonic heart.
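The Gini score used as the tissue-specificity measure for FIG. 6A can be illustrated with the standard Gini coefficient over one gene's expression values across tissues. This is a minimal sketch under that assumption. The source does not state the exact formula or normalization used, and the expression values below are invented for illustration.

```python
def gini(values):
    """Gini coefficient of non-negative values (0 = uniform, near 1 = one tissue dominates).

    Uses the sorted-rank form G = 2*sum(i*x_i)/(n*sum(x)) - (n+1)/n,
    with x sorted ascending and i running from 1 to n.
    """
    xs = sorted(values)
    if any(x < 0 for x in xs):
        raise ValueError("expression values must be non-negative")
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return (2.0 * weighted) / (n * total) - (n + 1.0) / n


# Invented expression values across four tissues for two hypothetical genes.
broadly_expressed = [1.0, 1.0, 1.0, 1.0]
tissue_restricted = [0.0, 0.0, 0.0, 10.0]

print(gini(broadly_expressed), gini(tissue_restricted))
```

With this form, a gene expressed in only one of n tissues scores (n-1)/n, so a cutoff such as >0.5 selects genes whose expression is concentrated in few tissues.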
- FIG. 6B shows the gene ontology enrichments for genes identified as specific for heart, spleen, and embryonic heart, respectively, based on genes from indicated color-coded clusters in FIG. 6A.
- FIG. 6C shows a heat map of Z scores of normalized gene expression for genes identified as differentially expressed in pairwise comparisons of replicates from each Carnegie stage in our developmental series. The dendrogram on the left is a hierarchical clustering of genes across the developmental series. The genes were color coded by cutting the dendrogram at a height that results in 4 groups. Purple genes are most highly expressed early. Pink and green genes are expressed most strongly in intermediate stages of the series.
- FIG. 6D shows the gene ontology enrichment maps from the purple (left) and blue (right) gene sets identified in FIG. 6C.
- the size of each dot represents the number of genes, and the color scale represents the −log2-transformed Benjamini and Hochberg–adjusted P value of each ontology. Darker colors indicate higher significance.
- the edges connect overlapping gene sets. The location of each dot is determined by the overlap ratio (OvR) calculated by enrichplot.
- OvR overlap ratio
- FIG. 7A is a plot of gene expression values from embryonic heart (red), adult heart (purple), brain (green), or all other tissues (gray) for genes assigned indicated number of EHEs as determined by Genomic Regions Enrichment of Annotations Tool. Genes assigned multiple EHEs are more strongly expressed in the embryonic heart than in other tissues. Significant differences in distributions of gene expression values in each comparison were determined based on Mann-Whitney U test.
- FIG. 7B is a histogram of distances of EHEs (red) or randomly selected sets of enhancers (gray) to the nearest heart-specific gene (Gini, >0.75) in 10-kb bins up to 100 kb. Overall EHEs are enriched near heart-specific genes over all distances up to 100 kb.
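The binning behind a histogram like FIG. 7B can be sketched as follows: for each enhancer, find the distance to the nearest heart-specific gene and count it in a 10-kb bin if it falls within 100 kb. This is a simplified illustration that assumes a single chromosome and point coordinates. The positions are invented, and the function names are not from the source.

```python
import bisect


def nearest_distance(pos, sorted_gene_pos):
    """Distance from `pos` to the closest entry of a sorted, non-empty list."""
    i = bisect.bisect_left(sorted_gene_pos, pos)
    candidates = []
    if i < len(sorted_gene_pos):
        candidates.append(sorted_gene_pos[i] - pos)
    if i > 0:
        candidates.append(pos - sorted_gene_pos[i - 1])
    return min(candidates)


def distance_histogram(enhancer_pos, gene_pos, bin_size=10_000, max_dist=100_000):
    """Count enhancers per distance bin; distances >= max_dist are dropped."""
    genes = sorted(gene_pos)
    counts = [0] * (max_dist // bin_size)
    for p in enhancer_pos:
        d = nearest_distance(p, genes)
        if d < max_dist:
            counts[d // bin_size] += 1
    return counts


# Invented coordinates on one hypothetical chromosome.
genes = [50_000, 500_000]
enhancers = [45_000, 75_000, 400_001, 300_000]
print(distance_histogram(enhancers, genes))
```

Running the same function on randomly selected enhancer sets gives the background (gray) distribution to compare against.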
- FIG. 7C is a network plot of gene modules identified by WGCNA using embryonic heart gene expression data. A Pearson correlation of the module eigenvectors was calculated for the edges. Positive correlations of ≥0.5 were included. The location of each module is determined by multidimensional scaling (MDS) of the module eigengene vectors. Modules are color coded based on names assigned by WGCNA. Size of dots indicates the number of genes in each module. Each module is labeled based on the most significant biological process category gene ontology enrichment determined by Database for Annotation, Visualization, and Integrated Discovery (DAVID). Modules are grouped based on related functional category enrichments and distance in MDS space.
- MDS multidimensional scaling
- FIG. 7D shows the trajectories of expression based on eigenvectors reported by WGCNA for each module across the developmental series. Groups and color coding are the same as in FIG. 7C.
- Group 1 modules have generally declining expression and include many genes involved in developmental patterning.
- Group 3 modules generally have increasing expression.
- Groups 2 and 4 have multiphasic but offset expression and contain genes involved in chromatin regulation and muscle cell differentiation and function.
- GTEx indicates Genotype-Tissue Expression Project.
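The edge rule described for the module network of FIG. 7C (Pearson correlation between module eigengenes, keeping positive correlations at or above 0.5) can be sketched in plain Python. This is an illustration only. WGCNA itself is an R package and is not called here, the eigengene values are invented, and treating the 0.5 cutoff as inclusive is an assumption.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def module_edges(eigengenes, min_r=0.5):
    """Return (module_a, module_b, r) for every pair with r >= min_r."""
    edges = []
    for (a, va), (b, vb) in combinations(sorted(eigengenes.items()), 2):
        r = pearson(va, vb)
        if r >= min_r:
            edges.append((a, b, r))
    return edges


# Invented eigengene trajectories across four developmental time points.
eigengenes = {
    "violet": [1.0, 2.0, 3.0, 4.0],
    "brown": [2.0, 4.0, 6.0, 8.5],
    "green": [4.0, 3.0, 2.0, 1.0],
}
edges = module_edges(eigengenes)
print([(a, b) for a, b, _ in edges])
```

Negatively correlated modules (here, the declining "green" trajectory against the rising ones) produce no edge, which matches the description that only positive correlations were included.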
- FIG. 8A shows dot plots of gene enrichment within the WGCNA modules. The lists of genes used are curated from multiple sources, while embryonic heart–specific enhancer segments (EHEs) and Gini are from this article. The groups correspond to FIG. 5.
- FIG. 8B shows a network of multidimensional scaling coordinates and pairwise correlation scores for the violet module in group 4 in FIG. 8D, which is enlarged to show detail. All genes with correlation value >0.88 with any other gene are plotted. Size of shape indicates highly connected hub genes. Diamonds represent genes assigned EHEs. Purple filled shapes indicate heart-specific gene expression (Gini, ≥0.5). Hub genes are labeled with gene symbol. Genes directly positively regulated by NKX2-5 (NK2 homeobox 5) binding are indicated with yellow. Several hub genes that meet all these criteria are listed in larger yellow text.
- FIG. 8C is a histogram of loss-of-function observed/expected upper bound fraction (LOEUF) deciles of hub genes or randomly selected nonhub genes from all modules in the WGCNA network.
- LOEUF loss-of-function observed/expected upper bound fraction
- FIG. 8D is a histogram of the number of gene-scrambled modules that have protein-protein interaction (ppi) enrichment at a Bonferroni-adjusted P of <0.05. The vertical orange line marks the 15 modules that have significant ppi in the actual WGCNA network.
- CHD indicates congenital heart defects; MYOZ2, myozenin 2; NCC, neural crest cell; NHE, novel heart enhancer; OR, odds ratio; and pLI, probability of loss of function intolerance.
- FIG. 9 shows example images of embryos at each Carnegie stage.
- FIG. 10A shows the Pearson correlation of primary ChIP-Seq signals by 10kb bins across the genome showing general correlation by mark.
- FIG. 10B shows the Pearson correlation of imputed ChIP-Seq signals by 10kb bins across the genome showing better correlation between marks than the primary signal.
- FIG. 11 shows the individual state names/classifications and color coding for each model of the 15, 18, and 25 state models.
- FIG. 12A shows the distance of segments from protein coding TSS in base pairs per state with Dickel and EMERGE.
- FIG. 12B shows the length of segments per state with Dickel and EMERGE in base pairs.
- FIG. 12D shows the CADD scores per state with Dickel and EMERGE.
- FIG. 12E shows the LINSIGHT scores per state with Dickel and EMERGE.
- FIG. 13A shows the Pearson correlation heatmap of H3K27ac signal at putative enhancer segments showing general correlation by tissue type. Red indicates greater correlation, blue lower correlation.
- FIG. 13B shows the tSNE projection of imputed H3K27ac p-value signals at 444,413 enhancer segments from tissues profiled by Roadmap Epigenome, in embryonic craniofacial tissue, in embryonic heart tissue from this study, and from a study looking at sorted nuclei from heart tissue from multiple developmental stages.
- FIG. 14A shows the 15 state ChromHMM model analysis of the embryonic heart data. Boxplot where solid bar indicates fraction of overlap of VISTA heart positive enhancers with each state, shaded bar indicates overlap with enhancers from VISTA positive in tissues other than heart.
- FIG. 14B shows the 18-state model with the same conventions as FIG. 14A.
- FIG. 15A is a boxplot representing the fraction of each state in our embryonic heart samples that overlaps with peaks called from EMERGE bedgraph showing the greatest overlap between EMERGE and our TSS and Promoter States (specifically States 1-3), followed by strong enhancer states (States 13-15), and transcribed and regulatory states (States 9-10).
- FIG. 15B is a violin plot of the distribution of the scores calculated over EMERGE peaks from bedgraph signal, with again the highest concentration of higher scores being seen in the TSS and promoter states (States 1-4).
- FIG. 15C are ROC curves calculated for each of our embryonic heart samples, as well as the Dickel and EMERGE datasets, showing similar or higher AUCs for our heart samples when compared to the other datasets.
- FIG. 16A shows the overlap of enhancers from Dickel et al. by state as represented by percentage of overlap.
- FIG. 16B (Left) shows the overlap of the reproducible human embryonic heart enhancers (177,412) with the Dickel compendium of heart enhancers (82,119); (Right) shows the significance of overlap, shown by 1000 iterations of overlap with shuffled coordinates of the larger dataset, p<0.001.
- FIG. 16C (Left) shows the overlap of the strong novel human embryonic heart enhancers (12,395) with the Dickel compendium of heart enhancers (82,119); (Right) shows the significance of overlap, shown by 1000 iterations of overlap with shuffled coordinates of the larger dataset, p<0.001.
- FIG. 16D (Left) shows the overlap of the strong novel human embryonic heart enhancers (12,395) with the enhancers from the Dickel compendium that show a prenatally biased score (>2 prenatal/postnatal) (9,953); (Right) shows the significance of overlap, shown by 1000 iterations of overlap with shuffled coordinates of the larger dataset, p<0.001.
- FIG. 16E (Left) shows the overlap of the strong novel human embryonic heart enhancers (12,395) with the enhancers from the Dickel compendium that came from peaks in human fetal data (5,042); (Right) shows the significance of lack of overlap, shown by 1000 iterations of overlap with shuffled coordinates of the larger dataset, p<0.001.
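The significance panels of FIG. 16 describe a shuffle-based test: the observed overlap is compared with the overlap obtained after repeatedly shuffling the coordinates of the larger dataset. The sketch below shows one common form of such a test on a single linear "genome" with half-open intervals. It is an illustration under those simplifying assumptions, not the procedure actually used (which would typically shuffle within chromosomes using a dedicated tool), and all coordinates are invented.

```python
import random


def count_overlaps(query, reference):
    """Number of query intervals overlapping at least one reference interval."""
    return sum(
        1 for qs, qe in query
        if any(rs < qe and qs < re for rs, re in reference)
    )


def shuffle_intervals(intervals, genome_length, rng):
    """Place each interval at a uniformly random start, preserving its length."""
    out = []
    for s, e in intervals:
        length = e - s
        new_s = rng.randrange(0, genome_length - length + 1)
        out.append((new_s, new_s + length))
    return out


def permutation_p(query, larger, genome_length, iterations=1000, seed=0):
    """Empirical one-sided p-value for enrichment of overlap."""
    rng = random.Random(seed)
    observed = count_overlaps(query, larger)
    at_least = sum(
        count_overlaps(query, shuffle_intervals(larger, genome_length, rng)) >= observed
        for _ in range(iterations)
    )
    return observed, (at_least + 1) / (iterations + 1)


# Invented data: 20 query intervals, each overlapped by one interval of the other set.
query = [(k * 50_000, k * 50_000 + 100) for k in range(20)]
larger = [(k * 50_000 + 50, k * 50_000 + 150) for k in range(20)]
observed, p = permutation_p(query, larger, genome_length=1_000_000)
print(observed, p < 0.001)
```

With 1000 iterations the smallest attainable empirical p-value is 1/1001, which is why results of this kind are reported as p<0.001. Testing for a lack of overlap, as in FIG. 16E, would count shuffles with overlap at or below the observed value instead.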
- FIG. 17 is a heatmap of the signal of putative enhancers differentially marked by H3K27ac and H3K4me2.
- FIG. 18A (Top) shows the relative effects of enhancer sequences harboring the alternate allele on luciferase gene expression in HL1 cardiomyocytes across the SCN5A/SCN10A gene locus.
- Each dot indicates a variant centered amplicon tested in the pGL4.23 luciferase vector ordered by genomic position as tested by Kapoor et al. PNAS 2019.
- Bottom shows individual embryo chromatin state annotations from this study across the SCN5A/SCN10A gene locus. Samples are ordered from top to bottom based on developmental age, earliest to latest. Chromatin states are indicated by color segments using color convention from FIG. 1C. Below chromatin state segmentations is stranded RNA-Seq data from representative CS23 heart sample indicating robust SCN5A expression and virtually no expression from SCN10A. Super enhancers are indicated by labeled orange bars.
- the former variant directly overlaps a strong embryonic heart enhancer segment and an experimentally validated heart enhancer, hs2177, (in dark orange) as shown in panel to right.
- FIG. 18B shows the UCSC browser shot of the HAND2 gene locus; conventions are the same as in FIG. 18A.
- FIG. 18C shows the UCSC browser shot of MYOCD gene locus, conventions are the same as in FIG.
- FIG. 19A shows the scatterplot of the log2 fold enrichment and log10 Bonferroni-adjusted significance level of GWAS variants associated with atrial fibrillation in all enhancer segments identified in the strong enhancer states for each embryonic heart sample (bright red), the total reproducible strong enhancers from the whole dataset (dark red), or other tissues in Roadmap Epigenome (blue). All values were calculated using only variants with p values <5×10⁻⁸ from the GWAS Catalog using GREGOR.
- FIG. 19B uses the same conventions as FIG. 19A, for QT Interval associations.
- FIG. 19C uses the same conventions as FIG. 19A, for Crohn's disease associations.
- FIG. 19D uses the same conventions as FIG. 19A, for Congenital and Conotruncal associations.
- FIG. 19E shows a scatterplot of the odds ratio of lupus GWAS SNPs using the 1E-8 threshold by the log10 GARFIELD Bonferroni-adjusted p-values.
- Roadmap blood and immune cell types (triangle symbol) and non-immune cell types (star symbol) are colored by chromatin state as indicated by the color key. Significant enrichments in immune cell types are labeled.
- FIG. 19F shows the enrichment of GWAS analysis p-values for atrial fibrillation in strong enhancer state annotations (E13_EnhA1, E14_EnhA2, E15_EnhAc) as determined by GARFIELD.
- FIG. 19G is the same as in FIG. 19E using GWAS summary statistics for systemic lupus erythematosus. Lupus shows greatest enrichment in strong enhancers identified in immune cell types sorted from blood.
- FIG. 19H is the same as FIG. 19E for resting heart rate.
- FIG. 19I is the same as FIG. 19E, for QRS Interval.
- FIG. 19J is the same as FIG. 19E, for P-wave duration.
- FIG. 19K is the same as FIG. 19F, for Crohn's disease.
- FIG. 20A shows the tSNE plot using all genes annotated by Gencode (v25) quantified by Rail-RNA from this study or GTEx and retrieved from recount2. All samples profiled in this study (red) cluster well with one another relative to other human tissues including adult heart (purples) and brain (greens).
- FIG. 20B shows box plots of expression values based on log10 scaled counts for samples from embryonic heart (red), adult heart (purple), and adult brain (green).
- FIGS. 20C–E show PCA plots for human embryonic heart RNA-Seq data. Samples are colored and labeled by CS stage (FIG. 20C), gender (FIG. 20D), or RIN score of the RNA (FIG. 20E).
- FIG. 21A shows a heatmap of all genes found to be differentially expressed across the time series.
- FIG. 21B is a horizontal bar plot of the number of genes differentially expressed across the time series. Blue bars indicate downregulated genes and pink bars indicate upregulated genes. Note that the early time points have the largest number of differentially expressed genes when compared to the later time points.
- FIG. 22A shows the gene ontologies for the CS16 (pink cluster, left panel) time point.
- FIG. 23A inset shows the cyclical pattern of gene expression identified for Group 2 genes from WGCNA; bars indicate normalized enrichment scores for significantly enriched functional categories from gene set enrichment analysis based on comparisons of gene expression between CS16 and CS18 (left) or CS18 and CS20 (right).
- FIGS. 23B-C show the enrichment plots of genes across heart valve development category based on ranked order from pairwise comparisons between CS16 and CS18 (FIG. 23B) or CS18 and CS20 (FIG.23C).
- FIG. 24 shows the enrichment of curated gene lists in WGCNA of human time-series brain RNA-seq.
- the enrichment of heart-specific lists (EHE, GINI) does not line up with cardiomyocytes as was observed for the embryonic heart network. Also, there is significant enrichment of embryonic heart-specific genes in its null module (unassigned, grey).
- FIG. 25 shows the full violet module network.
- FIG. 26A shows the brown WGCNA module network.
- FIG. 26B shows the grey60 WGCNA module network.
- FIG. 26C shows the mediumpurple3 WGCNA module network.
- FIG. 26D shows the green WGCNA module network.
- peptide amphiphile is a reference to one or more peptide amphiphiles and equivalents thereof known to those skilled in the art, and so forth.
- the term “comprise(s)” and linguistic variations thereof denote the presence of recited feature(s), element(s), method step(s), etc. without the exclusion of the presence of additional feature(s), element(s), method step(s), etc.
- the term “consisting of” and linguistic variations thereof denotes the presence of recited feature(s), element(s), method step(s), etc.
- congenital heart defect refers to an abnormality in the heart that is present in an infant at birth.
- a congenital heart defect may affect the structure of the heart and/or affect the way the heart operates.
- infant when used herein in reference to a human subject refers to a subject less than 1 year of age.
- sample is used in the broadest sense and is inclusive of many sample types that may be obtained from the subject. Samples may be obtained from animals (including humans) and encompass fluids (e.g., urine, blood, blood products, sputum, saliva, etc.), solids, tissues, and gases. In some embodiments, the sample is a blood sample, a serum sample, or a plasma sample.
- single nucleotide polymorphism refers to a variation in the sequence of a gene in the genome of a population that arises as the result of a single base change, such as an insertion, a deletion, or a change in a single base.
- subject refers to any vertebrate, including, but not limited to, a mammal (e.g., cow, pig, camel, llama, horse, goat, rabbit, sheep, hamster, guinea pig, cat, dog, rat, and mouse), a non-human primate (for example, a monkey, such as a cynomolgus or rhesus monkey, chimpanzee, etc.), and a human.
- the subject may be a human or a non-human.
- the subject is an infant.
- the subject is a child.
- “Treat,” “treating,” and “treatment” are each used interchangeably herein to describe reversing, alleviating, or inhibiting the progress of a disease and/or injury, or one or more symptoms of such disease, to which such term applies.
- the term also refers to preventing a disease, and includes preventing the onset of a disease, or preventing the symptoms associated with a disease.
- a treatment may be performed in either an acute or a chronic way.
- the term also refers to reducing the severity of a disease or symptoms associated with such disease prior to affliction with the disease.
- Such prevention or reduction of the severity of a disease prior to affliction refers to administration of a pharmaceutical composition to a subject that is not at the time of administration afflicted with the disease.
- Preventing also refers to preventing the recurrence of a disease or of one or more symptoms associated with such disease.
- CHDs congenital heart defects
- CHD arises through combinations of otherwise benign mutations in a large number of genes, unappreciated genetic-environmental interactions, or disruption of regulatory sequences that control heart development.
- regulatory regions are causative for CHDs.
- Patients homozygous for rare variation in a heart-specific regulatory sequence controlling the cardiac TF (transcription factor) TBX5 (T-box TF 5) have isolated CHD.
- Recent analysis of gene expression from the heart at multiple stages of human development indicates the dynamics of gene expression occur primarily during the embryonic period of human development or the first 8 postconception weeks, aligning well with known structural and functional changes in the developing heart.
- H3K27ac associated with active chromatin with functions ranging from active transcription to active enhancers
- H2A.Z histone 2A variant Z
- DNA accessibility DNAse hypersensitivity
- chromatin states from human embryonic hearts at 4 to 8 postconception weeks using hidden Markov models of chromatin state in a manner that allows for direct comparison with over 100 tissues profiled by Roadmap Epigenome.
- methods of identifying a subject as at risk of having a congenital heart defect are provided herein.
- the method of identifying a subject as at risk of having a congenital heart defect comprises assessing a plurality of genes in a sample obtained from the subject.
- the plurality of genes are selected from the genes listed in Table 1.
- the plurality of genes comprises at least 5 genes listed in Table 1.
- the plurality of genes may comprise at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, at least 50, at least 60, at least 70, at least 80, at least 90, at least 100, at least 120, at least 130, at least 140, at least 150, at least 160, at least 170, at least 180, at least 190, at least 200, at least 210, at least 220, at least 230, at least 240, or at least 250 genes listed in Table 1.
- the plurality of genes comprises all of the genes listed in Table 1.
- Table 1. Genes that may be assessed to determine risk of CHD
- the method further comprises identifying the subject as at risk of having a congenital heart defect based upon the assessment of the plurality of genes.
- “Assessing” or “assessment” is used in the broadest sense and is inclusive of many types of evaluations that may be performed that are indicative of gene expression and/or mutation(s) in a gene.
- assessing the plurality of genes comprises determining a copy number for at least one of the plurality of genes.
- the subject may be identified as at risk of having a congenital heart defect based upon the assessment of the copy number for the at least one gene.
- the copy number for at least one gene, at least 2 genes, at least 3 genes, at least 4 genes, at least 5 genes, at least 6 genes, at least 7 genes, at least 8 genes, at least 9 genes, at least 10 genes, or more than 10 genes in the panel of genes (e.g. at least 10, at least 25, at least 50, etc.) is assessed and used to determine the risk of having a congenital heart defect.
- the subject may be identified as at risk of having a congenital heart defect if the copy number for the at least one gene is increased.
- the gene may be associated with increased risk of congenital heart defects, and therefore an increased copy number of the gene may identify the subject as at risk of having a congenital heart defect.
- the subject may be identified as at risk of having a congenital heart defect if the copy number for the at least one gene is decreased.
- the gene may be associated with protection from (e.g., resistance to) congenital heart defects, and therefore a decreased copy number or the absence of a copy of the gene may indicate that the subject is at risk of having a congenital heart defect.
- assessing the plurality of genes comprises detecting the presence of one or more single nucleotide polymorphisms (SNPs) for at least one of the plurality of genes.
- the presence of one or more SNPs for at least one gene, at least 2 genes, at least 3 genes, at least 4 genes, at least 5 genes, at least 6 genes, at least 7 genes, at least 8 genes, at least 9 genes, at least 10 genes, or more than 10 genes in the panel of genes (e.g. at least 10, at least 25, at least 50, etc.) may be assessed to determine the risk of a subject having a CHD.
- Any suitable method for assessing the plurality of genes (e.g. assessing copy number variation, assessing SNPs, measuring expression of a gene product, etc.) may be used.
- Suitable methods for assessing a gene e.g.
- PCR polymerase chain reaction
- The term “polymerase chain reaction” (PCR) refers to the method of K.B. Mullis, U.S. Patent Nos. 4,683,195, 4,683,202, and 4,965,188, which describe a method for increasing the concentration of a segment of a target sequence in a mixture of genomic or other DNA or RNA, without cloning or purification. This process for amplifying the target sequence consists of introducing a large excess of two oligonucleotide primers to the DNA mixture containing the desired target sequence, followed by a precise sequence of thermal cycling in the presence of a DNA polymerase.
- the two primers are complementary to their respective strands of the double stranded target sequence.
- the mixture is denatured and the primers then annealed to their complementary sequences within the target molecule.
- the primers are extended with a polymerase so as to form a new pair of complementary strands.
- the steps of denaturation, primer annealing, and polymerase extension can be repeated many times (i.e., denaturation, annealing and extension constitute one “cycle”; there can be numerous “cycles”) to obtain a high concentration of an amplified segment of the desired target sequence.
- the length of the amplified segment of the desired target sequence is determined by the relative positions of the primers with respect to each other, and therefore, this length is a controllable parameter.
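- The two relationships described above (segment length set by primer positions, and accumulation of product over repeated cycles) can be sketched numerically. The following is a minimal illustration only, not part of the disclosed method: the function names, the coordinate convention (1-based, inclusive, both primer 5' ends mapped onto the same reference strand), and the `efficiency` parameter are assumptions introduced here, and real reactions plateau rather than doubling indefinitely.

```python
def amplicon_length(forward_5p: int, reverse_5p: int) -> int:
    """Length of the amplified segment.

    Assumes 1-based inclusive reference coordinates for the 5' end of the
    forward primer and the 5' end of the reverse primer (hypothetical
    convention chosen for this sketch).
    """
    if reverse_5p < forward_5p:
        raise ValueError("reverse primer must lie downstream of forward primer")
    return reverse_5p - forward_5p + 1


def ideal_copies(initial_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Idealized copy number after a number of cycles.

    efficiency = 1.0 means perfect doubling per cycle; lower values model
    incomplete extension. This ignores reagent depletion and plateau.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles
```

- For example, primers whose 5' ends map to positions 101 and 250 bound a 150 bp segment, and a single template undergoing 10 perfectly efficient cycles yields 2^10 = 1024 copies in this idealized model.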
- the method is referred to as the “polymerase chain reaction” (“PCR”).
- PCR encompasses many variants of the originally described method using, e.g., real time PCR, nested PCR, digital PCR, droplet-digital PCR, reverse transcription PCR (RT-PCR), single primer and arbitrarily primed PCR, etc.
- Sequencing may be performed to determine an RNA sequence. Sequencing methods generally involve amplifying the target sequence (e.g. by PCR, as described above), purifying the amplicon, sequencing the amplicon, and analyzing the resulting sequence. RNA sequencing methods involve a first step of converting the desired RNA into complementary DNA fragments (i.e., cDNA), prior to amplifying and isolating the desired amplicon. Analyzing the sequence may identify, for example, copy number variants or single nucleotide polymorphisms present for a desired sequence.
- The term “hybridization methods” refers to a variety of methods that involve the use of probes (e.g., DNA probes) that are complementary to a given SNP site.
- hybridization methods may involve the use of a labeled probe, which binds to a given SNP site and thereby gives a signal indicating the presence of a given SNP in a sample.
- the hybridization method is dynamic allele-specific hybridization.
- the hybridization method uses an array, such as a high-density oligonucleotide SNP array. The use of such an array allows for the investigation of multiple SNPs simultaneously.
- assessing the plurality of genes comprises measuring expression of a gene product.
- assessing the plurality of genes comprises measuring expression of a gene product, such as a protein, and determining the risk of having a congenital heart defect based upon the expression of said gene product.
- the method comprises assessing the quantity or amount of a given protein in a biological sample, and using the amount of protein to infer the expression of a gene or the presence of one or more hyperactive (e.g., activity increasing) or hypoactive (e.g. activity decreasing) mutations in the gene.
- Such information may thereby be used to assess the risk of CHD in the subject.
- a single assessment type is performed to determine the risk of the subject having a congenital heart defect.
- multiple assessment types are performed to determine the risk of the subject having a congenital heart defect.
- the sample may be any suitable sample obtained from the subject. Suitable sample types include, for example, biological fluids (e.g., urine, blood, serum, plasma, sputum, saliva, etc.), solids, tissues, and gases.
- the sample is a urine sample.
- the sample is a blood sample.
- the sample is a serum sample.
- the sample is a plasma sample.
- the sample is a saliva sample.
- the subject is a human subject. In some embodiments, the subject is an infant.
- the subject is a child. In some embodiments, the subject is an adult.
- the subject may be a human infant or a human child suspected of having a congenital heart defect based upon the presence of one or more symptoms indicative of a potential CHD.
- congenital heart defect is inclusive of many types of defects, including Atrial Septal Defect, Atrioventricular Septal Defect, Coarctation of the Aorta, Double-outlet Right Ventricle, d-Transposition of the Great Arteries, Ebstein Anomaly, Hypoplastic Left Heart Syndrome, Interrupted Aortic Arch, Pulmonary Atresia, Single Ventricle, Tetralogy of Fallot, Total Anomalous Pulmonary Venous Return, Tricuspid Atresia, Truncus Arteriosus, and Ventricular Septal Defect. Signs and symptoms for CHDs depend on the type and severity of the particular defect. Some defects might have few or no signs or symptoms.
- the assessment of risk of having a CHD performed using the panels and/or methods described herein may be corroborated by additional investigations of the infant. For example, additional procedures such as monitoring any one of the above signs/symptoms of CHD may be performed. As another example, imaging procedures may be performed to evaluate heart contractions, rhythms, blood flow, etc.
- In some embodiments, provided herein is a panel of genes for assessing risk of congenital heart defects in a subject. In some embodiments, the panel comprises at least 10 genes listed in Table 1.
- the panel comprises at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, at least 50, at least 60, at least 70, at least 80, at least 90, at least 100, at least 120, at least 130, at least 140, at least 150, at least 160, at least 170, at least 180, at least 190, at least 200, at least 210, at least 220, at least 230, at least 240, or at least 250 genes listed in Table 1.
- the panel comprises all of the genes listed in Table 1.
- the panel may be used to assess the risk of congenital heart defects in a subject.
- the panel may be used to assess the risk of congenital heart defects in the subject by any of the assessments performed above.
- the panel may be used to assess the risk of congenital heart defects in a subject by assessing one or more genes in the panel for changes in copy number.
- the panel may be used to assess the risk of CHD in the subject by assessing the one or more genes in the panel for mutations, such as SNPs.
- multiple assessment types e.g., copy number variation, mutations such as SNPs, etc.
- the methods for assessing the risk of congenital heart defects may further comprise providing an appropriate treatment to the subject, if the subject is determined as having a high risk of a congenital heart defect.
- Suitable treatments for congenital heart defect may depend on the nature and/or severity of the defect, along with the subject’s age, size, and general health.
- treatment includes surgical procedures, cardiac catheterizations, and heart transplant.
- treatment includes therapeutic agents, such as anticoagulants, agents to control blood pressure, agents to control heart rate/rhythm, etc. Selection of the appropriate treatment may be controlled by a physician, such as a physician having knowledge of the assessments performed using any of the methods described herein.
- kits may be used to assess the panel of genes described herein. The kits may be used to perform any of the methods for assessing risk of congenital heart defect in a subject as described herein.
- the kit comprises at least one component for assessing risk of congenital heart defect.
- the kit may comprise components for a hybridization based assay, sequencing, a PCR-based assay, etc. to assess a plurality of genes or gene products in a sample (e.g. primers, probes, labels, buffers, enzymes, plates, tubes, etc.).
- the kit may further comprise a means for obtaining and/or storing a sample, or components for processing a sample (e.g., sample collection tubes, sample storage tubes, preserving agents, buffers, etc.).
- the kit further comprises instructions for assessing the plurality of genes in a sample.
- the kit may comprise instructions for performing a sequencing based method, a PCR based method, or a protein expression based method for evaluating the plurality of genes.
- Instructions included in kits can be affixed to packaging material or can be included as a package insert. While the instructions are typically written or printed materials they are not limited to such. Any medium capable of storing such instructions and communicating them to an end user is contemplated by this disclosure. Such media include, but are not limited to, electronic storage media (e.g., magnetic discs, tapes, cartridges, chips), optical media (e.g., CD ROM), and the like.
- the term “instructions” can include the address of an internet site that provides the instructions.
- EXAMPLES Example: Epigenomic and Transcriptomic Dynamics During Human Heart Organogenesis
- the systematic characterization of chromatin states from human embryonic hearts at 4 to 8 postconception weeks is described, using hidden Markov models of chromatin state in a manner that allows for direct comparison with over 100 tissues profiled by Roadmap Epigenome.
- Previously validated in vivo heart enhancers were confirmed and thousands of previously unknown embryonic heart–specific regulatory sequences were identified. Integration of chromatin and gene expression through coexpression analysis identified groups of genes that are coordinately expressed during early heart development and likely regulated by these heart-specific enhancers.
- For ChIP-seq and RNA-seq, frozen human embryonic heart tissues were removed from tubes to a petri dish with cold PBS using 1 ml of cold PBS and a cut pipette tip. Hearts were photographed from at least two aspects and the tube with embryo ID photographed for records. Hearts were homogenized by mechanical disruption and divided between samples for RNA-seq and ChIP-seq as necessary.
- tissues were fixed by incubation in 1% formaldehyde for 15 minutes at room temperature with agitation before being quenched by addition of 2.5M glycine to a concentration of 150mM with rotation/agitation for 10 minutes.
- For RNA-Seq, homogenized tissue was added to Qiazol (Qiagen) in a non-stick 1.5 ml Eppendorf tube, inverted to mix, and placed in a dry ice-ethanol slurry to flash freeze.
- For ChIP-Seq, fixed cell pellets were processed for ChIP as previously described (ref. 105). Briefly, samples were thawed in 1 mL of 1x Cell Lysis buffer and incubated on ice for 20 minutes. Cells were lysed with Dounce homogenization and nuclei were collected by centrifugation (5 min, 2500g, 4°C).
- Nuclei were resuspended in 300 µL of 1x Nuclear Lysis buffer + 0.3% SDS + 2 mM sodium butyrate and incubated on ice for 20 minutes. Chromatin was sheared with a Qsonica Q800R1 sonicator system operating at amplitude 23 and 2°C for 30 minutes (10 seconds duty, 10 seconds rest). Samples were cleared by centrifugation (5 min, 20,000g, 4°C) and soluble chromatin was transferred equally into seven separate tubes with 10% reserved as an input control. SDS concentration was reduced to 0.18% with ChIP-seq Dilution buffer.
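- The quenching and dilution steps above both follow from conservation of solute (C1·V1 = C2·V2). The sketch below is illustrative only: the function names are hypothetical, the 1000 µL fixation volume in the usage note is an assumed figure not given in the text, and the SDS calculation assumes the dilution buffer itself contains no SDS.

```python
def stock_volume_to_add(start_volume: float, stock_conc: float, final_conc: float) -> float:
    """Volume of concentrated stock to add to a solute-free sample.

    Solves V_add * C_stock = (V_start + V_add) * C_final for V_add.
    Units of volume are arbitrary but must be consistent; the two
    concentrations must share a unit.
    """
    if final_conc >= stock_conc:
        raise ValueError("final concentration must be below stock concentration")
    return start_volume * final_conc / (stock_conc - final_conc)


def diluent_volume_to_add(start_volume: float, start_conc: float, final_conc: float) -> float:
    """Volume of solute-free diluent needed to lower a concentration.

    Solves V_start * C_start = (V_start + V_add) * C_final for V_add.
    """
    if final_conc <= 0 or final_conc > start_conc:
        raise ValueError("final concentration must be in (0, start_conc]")
    return start_volume * start_conc / final_conc - start_volume
```

- Under these assumptions, bringing 300 µL of lysate from 0.3% to 0.18% SDS requires `diluent_volume_to_add(300, 0.3, 0.18)`, about 200 µL of buffer, and quenching a hypothetical 1000 µL fixation reaction to 150 mM glycine from a 2.5 M stock requires `stock_volume_to_add(1000, 2.5, 0.15)`, about 64 µL.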
- Antibodies used in this study were as follows: anti-H3K27ac (C15410196, Diagenode), anti-H3K4me1 (C15410194, Diagenode), anti-H3K4me2 (ab7766, Abcam), anti-H3K4me3 (C15410003, Diagenode), anti-H3K27me3 (C16410195, Diagenode), anti-H3K9me3 (C15410193, Diagenode) and anti-H3K36me3 (C15410192, Diagenode).
- Combined eluates for each ChIP were subjected to crosslink reversal overnight at 65°C. Samples were then sequentially treated with RNAse A and proteinase K, purified with a PCR Purification Kit (Qiagen), and eluted in 40 µL of EB.
- ChIP samples were then quantified with picoGreen (ThermoFisher) and ChIP-seq libraries were prepared (SMARTer® ThruPLEX® DNA-seq 48S Kit, R400427, Takara Bio USA), then quantified by qPCR (NEBNext Library Quant Kit for Illumina E7630L), multiplexed, and sequenced for 75 cycles across multiple flow cells on an Illumina NextSeq 500 instrument using a NextSeq 500/550 High Output v2 kit (75 cycles, Cat No. FC-404-2005).
- ChIP-Seq Data Analysis
- Quality control was performed on ChIP-seq reads using FastQC (version [v.] 0.11.5) and MultiQC (v.1.1).
- Trimming for adapters, quality and length was performed using Trimmomatic (v.0.36) for single end data.
- ChIP-seq reads were aligned to the human genome (hg19) using Bowtie2 (v. 2.2.5). Fragment sizes of each library were estimated using PhantomPeakQualTools (v.1.14). P value-based signal tracks were generated relative to appropriate input controls based on estimated library fragment size using MACS2 (2.1.1.20160309).
- Bedgraph files for all p-value signals from primary ChIP-Seq data were converted to 25 bp resolution and processed for model training and generation of imputed signals for all samples using ChromImpute (v1.0.3) as previously described (ref. 37). Resulting imputed signal tracks were converted to bigWig format for display in UCSC genome browser and converted for use with ChromHMM (v1.12), using ChromImpute’s ExportToChromHMM. Signal files for individual chromosomes for each epigenome were binarized and segmentation was performed using the previously published 25-state chromatin models using ChromHMM as previously described. Following segmentation, annotation of states and generation of genome browser files was performed based on annotations provided by Roadmap Epigenome.
- the x and y dimensions were combined with sample and group labels for plotting with ggplot2 in R.
- H3K27ac and H3K4me2 signals were compared at enhancer chromatin state segmentations independently using DiffBind (V2.6.6) in R (V3.4.1) (ref. 111).
- uniquely aligned reads from two to four replicates of each time period were quantified and normalized for input signal at enhancer segments (states 13 through 18 from the 25-state model) using fragment sizes determined by PhantomPeakQualTools (v.1.14) (ref. 107) and the DBA_SCORE_TMM_MINUS_FULL_CPM score setting of DiffBind.
- Z-scores were calculated for each motif across the comparisons and plotted as heatmaps. Differentially enriched regions were assigned to the single nearest gene up to 1 Mb away and resulting gene lists were assessed for gene ontology enrichments using GREAT (ref. 113). All results from GREAT were retrieved programmatically using rGREAT (V1.17.1 https://doi.org/doi:10.18129/B9.bioc.rGREAT). Z-scores were calculated for each -log10 transformed p-value for each gene ontology enrichment across a single comparison and plotted as a heatmap.
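- The z-score transformation of -log10 p-values described above can be sketched as follows. This is a minimal illustration with hypothetical function names; it uses the sample standard deviation (n - 1 denominator), which is an assumption, since the text does not state which estimator was used.

```python
import math
import statistics


def neg_log10(pvalues):
    """Convert p-values to -log10 scale (larger means more significant)."""
    return [-math.log10(p) for p in pvalues]


def z_scores(values):
    """Center and scale a list of values to zero mean and unit variance.

    Uses the sample standard deviation (assumption of this sketch).
    Returns all zeros when the values are constant.
    """
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mean) / sd for v in values]
```

- For instance, enrichment p-values of 1e-2, 1e-4, and 1e-6 become 2, 4, and 6 on the -log10 scale and then -1, 0, and 1 as z-scores, which places terms from different comparisons on a common scale for a heatmap.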
- Enrichment of GWAS signals in enhancer chromatin state segmentations
- Two linkage-disequilibrium (LD) aware approaches were used to determine enrichment of cardiovascular and cardiac development related GWAS signals in enhancer chromatin state signals.
- ROC Curves
- The ROC curves for each of the three types of data (ChromHMM, EMERGE, Dickel) used enhancers verified by the Vista Enhancer browser.
- For the ChromHMM prediction values for enhancers in heart versus non-heart, the posterior probabilities were used. Each chromatin state has a posterior probability output file from ChromHMM segmented into 200 bp bins. The mean of the sum of the posterior probabilities for states 13, 14, 15, and 18 was calculated for the ROC curves.
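- The scoring described above (sum the posteriors of states 13, 14, 15, and 18 in each 200 bp bin, then average over the bins covering an element) can be sketched as below, together with an area-under-the-curve summary of a ROC curve. This is an illustrative sketch under stated assumptions: the data layout (one dict of state to posterior per bin), the function names, and the rank-based AUC computation are choices made here, not details taken from the source.

```python
STRONG_ENHANCER_STATES = (13, 14, 15, 18)


def enhancer_score(bin_posteriors):
    """Mean, over bins, of the summed posterior for the strong enhancer states.

    bin_posteriors: list with one dict per 200 bp bin overlapping the tested
    element, mapping state number -> posterior probability (hypothetical layout).
    """
    if not bin_posteriors:
        raise ValueError("element must overlap at least one bin")
    sums = [sum(b.get(s, 0.0) for s in STRONG_ENHANCER_STATES) for b in bin_posteriors]
    return sum(sums) / len(sums)


def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    labels are truthy for heart-positive elements; ties count as one half.
    """
    pos = [s for s, l in zip(scores, labels) if l]
    neg = [s for s, l in zip(scores, labels) if not l]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

- An AUC of 1.0 means every heart-positive element scores above every heart-negative element, while 0.5 corresponds to no discrimination.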
- RNA-Seq
- RNA was extracted using miRNeasy RNA extraction kit with on-column DNAse treatment according to the manufacturer’s protocol (Qiagen). RNA integrity was checked using Agilent Tapestation 2200. RNA-seq libraries were prepared from 100–200 ng total RNA using the TruSeq stranded mRNA kit (Illumina). Libraries were quantified using NEBNext Library Quant Kit for Illumina and library quality checked using Agilent Tapestation 2200. Libraries were pooled and diluted to 1.8 pM and sequenced on the NextSeq500 Illumina platform using 75 bp paired end sequencing according to manufacturer’s recommendations.
- RNA-Seq Data Processing
- Quality control was performed on RNA-seq reads using FastQC (version [v.] 0.11.5) and MultiQC (v.1.1). Trimming for adapters, quality and length was performed using Trimmomatic (v.0.36). Trimmed fastqs were aligned with Rail-RNA (ref. 77) using human assembly GRCh38/hg38.
- GTEx tSNE Analysis
- The rse_gene R data object, or counts table of all GTEx tissues, was retrieved from the Recount2 database (https://jhubiostatistics.shinyapps.io/recount/). The GTEx counts table was generated using the same Rail-RNA & Recount pipeline we used to generate the counts tables for our embryonic heart data, which is described above.
- the GTEx data contained 9,662 samples that were combined with the 24 embryonic heart samples to make one matrix containing 9,686 total samples by 58,037 genes.
- the meta-data for GTEx is provided in a link under the phenotype column from the Recount2 database, and the tissue assignments located under the column named smts was used, resulting in a total of 31 unique tissues.
- the counts were transformed using the scale_counts() function from the R library recount (v1.8.2) as is recommended from the workflow in the Recount2 F1000 paper (ref. 114). Genes whose mean across all GTEx and embryonic heart tissues was lower than or equal to 1 were removed, resulting in 36,990 genes.
- the filtered counts matrix was transformed by log10() with a pseudo count of 1 added to all values. The transpose of the log10 transformed matrix was then converted to a distance matrix using the dist() function in R.
- This distance matrix was used as input for the tsne model generated by using the Rtsne() function from the R package Rtsne (v0.15).
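- The preprocessing that precedes the t-SNE step (drop low-expression genes, log10 transform with a pseudocount of 1, then compute sample-to-sample distances) can be sketched in a few lines. This is an illustration only: the source used R, the function names and the dict-of-lists layout are hypothetical, and Euclidean distance is assumed because it is the default of R's dist(); the t-SNE embedding itself is not reproduced here.

```python
import math


def filter_genes(counts, min_mean=1.0):
    """Keep genes whose mean count across all samples is strictly above min_mean.

    counts: dict mapping gene -> list of per-sample counts (hypothetical layout).
    """
    return {g: v for g, v in counts.items() if sum(v) / len(v) > min_mean}


def log10_transform(counts, pseudocount=1.0):
    """Apply log10(x + pseudocount) to every value."""
    return {g: [math.log10(x + pseudocount) for x in v] for g, v in counts.items()}


def sample_distance_matrix(counts):
    """Euclidean distances between samples (columns), i.e. on the transposed matrix."""
    genes = list(counts)
    n_samples = len(counts[genes[0]])
    samples = [[counts[g][i] for g in genes] for i in range(n_samples)]
    return [[math.dist(a, b) for b in samples] for a in samples]
```

- The resulting square matrix plays the role of the distance input described above; with the real data it would be 9,686 by 9,686.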
- Tissue Specificity of Gene Expression (GINI)
- the combined, filtered GTEx and human embryonic heart counts matrix used for the tsne analysis was also used to calculate the tissue specificity for each gene.
- the Gini Index for each gene was calculated using the Gini() function from R library Ineq (v0.2-13) on the average counts per tissue. A gene was given a tissue assignment based on the tissue with the maximum count for that gene.
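- As a rough guide to what the Gini index measures here, the sketch below computes the standard (uncorrected) Gini coefficient of a gene's average counts across tissues and assigns the gene to its highest-expressing tissue. It is a plain-Python illustration with hypothetical function names, not the R code used in the study; returning 0.0 for an all-zero gene is a choice made for this sketch.

```python
def gini(values):
    """Uncorrected Gini coefficient of non-negative values.

    0 means perfectly even expression across tissues; values approaching
    (n - 1) / n mean expression concentrated in a single tissue.
    """
    x = sorted(values)
    n = len(x)
    total = sum(x)
    if n == 0 or total == 0:
        return 0.0  # sketch-specific convention for undefined cases
    weighted = sum((2 * i - n - 1) * v for i, v in enumerate(x, start=1))
    return weighted / (n * total)


def assign_tissue(mean_counts_by_tissue):
    """Return the tissue with the maximum average count for a gene."""
    return max(mean_counts_by_tissue, key=mean_counts_by_tissue.get)
```

- A gene expressed equally in four tissues has a Gini index of 0, while a gene expressed in only one of four tissues has an index of 0.75, so higher values indicate greater tissue specificity.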
- the z-scores of this average expression per tissue matrix were plotted using heatmap.2 with the gene rows organized by the dendrogram calculated from hclust().
- the genes for the embryonic heart, spleen and adult heart were determined by cutree() such that it would result in 25 groups to correspond to the 25 tissues.
- RDAVIDWebService (v1.22.0) was used to obtain the gene ontology enrichments using the original 36,990 genes as the background.
- Detected modules were merged based on their eigengene correlation. To do this a dendrogram of the module eigengenes was generated and a threshold value of 0.18 was chosen as input for the function mergeCloseModules(). The intra-modular connectivity of each gene was calculated using the intramodularConnectivity() function from the WGCNA library and was used to determine the hub and non-hub designation. The resulting network contained 29 modules.
- Plotting of Modules
- A multidimensional scaling of the module eigenvectors output from WGCNA was generated to plot the modules in 2-D space using the function cmdscale() from the stats v3.6.1 package. A Pearson correlation of the module eigenvectors was calculated for the edges. Positive correlations of 0.5 and greater were included.
- Modules were plotted that fulfilled the criteria of having significant adjusted p-values (<0.05) from the GO analysis, significant permutation p-values (<0.05) of embryonic heart specific enhancers, and/or of embryonic heart specific Gini genes. This resulted in exclusion of 9 modules from the plot.
- the module eigenvectors of each of the 20 modules were plotted using ggplot2, geom_smooth() function with the loess smoothing method. The confidence intervals were removed for ease of visualization.
- Gene Ontology and Functional Enrichments
- RDAVIDWebService (v1.22.0) was used to obtain gene ontology enrichment of the genes within each of the 29 WGCNA modules.
- the gene background list used was all the genes input into the WGCNA.
- the module enrichments of embryonic heart specific Gini genes, embryonic heart specific enhancers, various disease gene lists, and the NKX2-5 bound gene lists were determined by a permutation test with 1000 iterations.
- the non-hub genes were randomly sampled using the R function sample().
- the number of non-hub genes sampled was the same as the total number of hub genes within the network: 10% of 26,122 genes, or 2,612, drawn from the 23,510 non-hub gene list. This process was iterated 1000 times to get a mean (gray bars) and standard deviation (shown as error bars) for each LOEUF decile.
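- The resampling described above can be sketched as follows: repeatedly draw a hub-sized random sample from the non-hub genes, count how many sampled genes fall in each LOEUF decile, and summarize the counts across iterations. This is a minimal sketch, not the authors' R code; the function name, the list-of-decile-labels input, and the fixed seed are assumptions made for illustration.

```python
import random
import statistics


def permutation_decile_counts(non_hub_deciles, sample_size, iterations=1000, seed=0):
    """Null distribution of per-decile gene counts.

    non_hub_deciles: list with one decile label per non-hub gene
        (hypothetical input layout).
    sample_size: number of genes to draw each iteration (the hub gene count).
    Returns a dict mapping decile -> (mean count, standard deviation).
    """
    rng = random.Random(seed)  # seeded only so the sketch is reproducible
    deciles = sorted(set(non_hub_deciles))
    per_iteration = {d: [] for d in deciles}
    for _ in range(iterations):
        drawn = rng.sample(non_hub_deciles, sample_size)  # without replacement
        for d in deciles:
            per_iteration[d].append(drawn.count(d))
    return {
        d: (statistics.fmean(c), statistics.stdev(c))
        for d, c in per_iteration.items()
    }
```

- With the figures given above, this would be called with the decile labels of the 23,510 non-hub genes and `sample_size=2612`; the observed hub gene count in each decile can then be compared against the returned mean and standard deviation.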
- the LOEUF score and decile designation for each gene are freely available through gnomAD v2.1.1.
- Protein-protein interaction analysis
- 100 randomized versions of the WGCNA network were made. This was done by randomly assigning the 26,122 genes to 29 modules with the same gene counts as the modules of the original network, using the R function sample().
- the PPI enrichment of up to 500 randomly chosen genes for each module of each of the 100 randomized versions was then determined using the STRINGdb (v1.24.0) package. Up to 500 genes were used due to constraints from the STRINGdb package.
- the output p-value of the STRINGdb call get_ppi_enrichment() was adjusted using the Bonferroni method. The number of modules that met the adjusted p-value cut-off of 0.05 was counted for each iteration to produce frequency values.
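- The adjustment and counting step can be illustrated with a short sketch. It assumes the Bonferroni correction is applied across the modules of a single randomized network and that "met the cut-off" means strictly below 0.05; neither detail is stated in the text, and the function names are hypothetical.

```python
def bonferroni(pvalues):
    """Bonferroni-adjust p-values: multiply by the number of tests, cap at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def count_significant(pvalues, alpha=0.05):
    """Number of tests whose adjusted p-value falls below alpha."""
    return sum(1 for p in bonferroni(pvalues) if p < alpha)
```

- Applying `count_significant` to the 29 module p-values of each randomized network gives one count per iteration, and tabulating those counts over the 100 iterations yields the frequency values referred to above.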
- Table 2
- Having uniform epigenomic datasets for each heart sample across the developmental series, we then applied the previously generated 15-, 18-, and 25-state models of chromatin activity developed by Roadmap Epigenome to segment the genome into chromatin states.
- the individual state classifications and color coding for each model are provided in FIG. 11 for easy reference.
- the number of segments identified for each of the 25 chromatin states was similar across all of our 18 samples (FIG. 1C).
- the pattern of chromatin state segments identified in our human embryonic heart samples was similar to that identified in 127 tissues from Roadmap Epigenome, with the one exception of significantly increased numbers of poised promoter segments (state 22 from 25-state model; FIG. 1D).
- Promoter-associated states 2 through 4 (2_PromU, 3_PromD1, and 4_PromD2) and bivalent promoter state 23 (PromBiv) were generally located within 1000 bp of known TSS. The remaining states from 5 to 21 were progressively more distant from known TSS with heterochromatic segment annotations (21_Het) being the most distant. Transcription-associated states (5_Tx5, 6_Tx, 7_Tx3, and 8_TxWk) and regulatory states located in introns of active genes (9_TxReg, 10_TxEnh5, 11_TxEnh3, and 12_TxEnhW) were closer to TSS than other active regulatory states.
- Regions assembled in a compendium of heart enhancers by Dickel et al showed similar distributions of distance to TSS as strong enhancer states 14, 15, and 18 but were more distant than state 13. These states were driven in the original Roadmap model by H3K27ac and H3K4me1 with the Dickel resource being heavily biased toward H3K27ac and its depositor P300.
- the strongest enhancer state, state 13, was identified in Roadmap Epigenome by many more features including the presence of H3K27ac, H3K4me1, H3K4me2, and DNase and conversely the absence of H3K9me3, H3K27me3, H3K36me3, and H3K79me2, which are uniquely captured in our study.
- H3K27ac signals from each of these samples since it has been frequently shown to be highly tissue specific in its distribution and generally associated with enhancer activation. Due to the overall better performance of normalized, imputed signals relative to primary ChIP-seq data, we extracted imputed H3K27ac P-value signals from 174 samples at 444,413 enhancer segments across the genome. We found the strongest global correlations between related tissue types, such as immune cell types, brain region tissues, and the embryonic heart samples. These data separated largely into adult versus embryonic groups and subsequently by tissue type (FIG. 2A; FIG. 13A and FIG. 13B).
- Enhancers identified by the 15-state model performed the poorest by this metric with EMERGE showing significantly higher specificity (FIG. 14A). These results suggest the 25-state chromatin model is best able to identify heart-specific regulatory sequences, particularly state 13 enhancers. However, this analysis does not take into account the ranking of enhancers for heart specificity provided by both Dickel et al and EMERGE. It is thus possible that higher ranking regulatory sequences in either resource may be better able to identify true heart positives. To test whether this was indeed the case, the ability of strong enhancer states from the 25-state model to recover heart-positive versus heart-negative enhancers compared with performance of ranked lists from both EMERGE and Dickel et al was measured.
- the Dickel compendium is based almost entirely on these 2 marks, and the EMERGE framework has been tuned to find positives from this database.
- chromatin state segmentations herein are capable of annotating regulatory sequences identified by different means.
- a large set of sequences characterized by binding of 7 cardiac TFs in fetal and adult mouse hearts was analyzed. This work demonstrated that regions bound by ≥5 TFs showed significantly more activity than regions with H3K27ac signal but lacking TF binding when systematically tested using a massively parallel reporter assay (MPRA).
- EHEs embryonic heart–specific enhancer segments
- the enriched molecular functions contain multiple terms related to microfibril and tubulin binding along with voltage-gated channel activity in the atrioventricular node.
- the sequence content of the EHEs for enrichment of TF-binding sites was subsequently analyzed using Hypergeometric Optimization of Motif Enrichment (FIG. 2D).
- We identified significantly enriched motifs that matched binding sites of TFs involved in heart development such as the GATA family, the MEF2 (myocyte enhancer factor-2) family, and TBX20 (T-box TF 20) among others (FIG. 2D, top).
- Among putative enhancer segments differentially activated based on H3K27ac signals, the greatest differences in motif enrichment were observed between early and later stages of development (FIG. 3D).
- Putative enhancer segments active early were specifically enriched for pluripotency-related TFs like SOX2 and OCT4 and multiple members of the KLF (Kruppel-like factor) and Forkhead families of TFs.
- Motifs enriched in late putative enhancer segments included many zinc finger–containing TFs and multiple members of the T-box, GATA, and PAX (paired box) families of TFs.
- enhancer segments more strongly active in the mid period of embryonic heart development based on H3K4me2 signals showed the most pronounced enrichment of TF motifs (FIG. 3E).
- Many of the same TF motifs enriched in early and late putative enhancer segments based on H3K27ac were shifted to enrichment in the mid period, suggesting dynamics in TF utilization related to chromatin state.
- Given these differentially activated putative enhancer segments, it was hypothesized that the location of these segments and subsequently the genes that they control might uncover distinct patterns of biological pathway utilization during this developmental series. Differentially activated putative enhancer segments from each comparison were assigned to the nearest gene and the most significantly enriched gene ontology terms were identified (FIG. 3F).
- NKX2-5 results in dysregulation of many downstream target genes in mice and human cardiomyocytes, suggesting it is a master regulator of cardiac development. Therefore, identifying regulatory sequences that control NKX2-5 expression specifically in the developing heart could be valuable information for understanding the unknown genetic causes of CHDs.
- When the genomic locus containing this gene was inspected, it was found to be surrounded by a plethora of strong enhancer state segments, including an EHE immediately downstream of the coding exons (FIG. 4A). There are a number of strong enhancer segments identified uniformly from CS14 to CS23 ~50 kb upstream that are largely repressed in fetal and adult hearts.
- These regions are particularly interesting as they cannot be readily identified through sequence conservation based on comparisons of 100 vertebrate genomes (FIG. 4A).
- Such regions have been associated with tissue-specification loci, further reinforcing the central role NKX2-5 plays in heart development and the novel information our resource has identified.
- The locus harboring SCN5A, which has been implicated in multiple cardiac diseases and is specifically known to cause ~20% to 30% of cases of Brugada syndrome, was then inspected (FIG.
- These variants included rs41312411 associated with establishment of resting heart rate and P-wave duration, rs3922844 associated with establishment of ECG traits and measures, and rs11708996 associated with Brugada syndrome. Variants in potential regulatory sequences across this entire locus have been tested for effects on enhancer activity in cultured cardiomyocytes.
- The large noncoding region adjacent to this gene contains 60 putative enhancers, nearly one-third of which are EHEs, as well as multiple heart-positive in vivo enhancers (FIG. 4B).
- EHEs: embryonic heart–specific enhancers
- The specific nature of our annotations is readily apparent at this location: the distal noncoding region and the sequences surrounding the TBX20 TSS are sparsely annotated across 127 tissues and cell types in Roadmap Epigenome (FIG. 4B).
- Another putative embryonic heart–specific super enhancer of note is ~200 kb in length and resides in the large noncoding region upstream of gap junction protein GJA1. Sites throughout this ~1-Mb region form long-range interactions with GJA1 in human induced pluripotent stem cell–derived cardiomyocytes.
- Loss of heart-specific enhancers of Gja1 in mice is sufficient to decrease its expression, a decrease that has in turn been previously linked to arrhythmias.
- This set of embryonic heart–specific super enhancers includes many additional loci that are not currently known to play a role in cardiac development making them good candidates for future study. While the examples above demonstrated cis-regulatory landscapes surrounding a single gene, as indicated for enhancers of GJA1, such regulatory sequences can interact with their targets over long distances through chromatin looping. Such loops can be difficult to predict in silico, and given the tissue-specific nature of enhancers, appropriate tissues or cell types have to be utilized to identify biologically relevant interactions.
- Embryonic heart samples were tightly clustered and distinct from adult heart samples and other GTEx tissues (FIG. 20A). Principal component analysis of only the embryonic heart samples showed good clustering by stage and minimal effects from sex and RNA quality in the first 2 components (FIGS. 20C–E). Overall, these results suggest that our expression data are of high quality and likely informative for understanding early human heart development. Genes that are expressed in a limited number of tissues are more likely to be disease-related genes than those with broader expression patterns. Based on these general trends, it was hypothesized that genes expressed specifically during embryonic heart development are likely involved in cardiac defects.
- The Gini coefficient, a metric originally used to measure income inequality, accurately identified genes with tissue- and cell type–specific expression.
- The same tissue-specific functional trends were observed when genes with elevated Gini coefficients and the highest expression in either the brain or the spleen were examined, demonstrating the unbiased nature of this analysis and the specificity of our findings (FIG. 6B, top).
- Genes with the highest degree of embryonic heart specificity included known heart developmental TFs (NKX2-5, NKX2-6, and TBX20), myosin light chain genes (MYL3, MYL4, and MYL7), the long noncoding RNA BANCR, and the sinoatrial node–associated channel gene, HCN4 (FIG. 20B).
- The single highest Gini coefficient gene assigned to embryonic heart was LRRC10, a leucine-rich repeat–containing protein previously identified as having cardiomyocyte-specific expression and linked to human dilated cardiomyopathy.
- Modules with gene ontology enrichments expected for early heart development such as embryonic patterning (green, 2573 genes), muscle cell differentiation (brown, 4945 genes), and sarcomere assembly (violet, 1267 genes; FIG. 7C) were identified.
- Multidimensional scaling of the module eigengenes revealed intermodule coexpression, suggesting some modules were more closely related in their expression than others (FIG. 7C).
- Comparison of trajectories of expression of eigengenes from each module revealed 4 groups of gene expression patterns that reflect positioning of each module in multidimensional scaling space (FIG. 7D). Groups 1 and 3, which are diametrically opposed to one another in multidimensional scaling space, showed downward and upward trends of expression throughout the embryonic period, respectively.
- Groups 2 and 4, which are also opposite one another in multidimensional scaling space, showed multiphasic but offset patterns of expression.
- Group 2 showed a particularly strong wave-like pattern between CS16 and CS20.
- When gene set enrichment analyses were performed across gene expression from CS16, CS18, and CS20, cyclical enrichment of a number of pathways was readily observed (FIGS. 23A–C). These included heart valve development, tissue migration, mitochondrial gene expression, and several metabolic processes.
- Significance Tests Give Context to WGCNA of Early Developing Heart. To further characterize and validate the WGCNA network, the enrichment of several curated gene lists was evaluated. As demonstrated above, binding sites for TFs expressed specifically in the embryonic heart were significantly enriched in embryonic heart–specific enhancers.
- The brown4 and violet modules, which are enriched for gene ontologies related to the sarcomere and muscle cell development, concordantly show significant enrichment for the cardiomyocyte cell types (FIG. 8A).
- The combined enrichment of both specific gene expression and specific enhancer activation in these modules suggests that the network that was constructed is particularly meaningful during embryonic heart development (FIG. 8A).
- A coexpression network constructed for the developing human brain was leveraged. Analysis of embryonic heart–specific genes on this network supported this hypothesis, as only 2 modules showed enrichment for heart high Gini genes (FIG. 24). No modules were significantly enriched for known CHD genes.
- LOEUF: loss-of-function observed/expected upper bound fraction
- Hub genes in our network might be generally intolerant to loss-of-function mutations in otherwise healthy individuals.
- When hub genes across the entire network were interrogated, significant enrichment of low LOEUF score genes and significant depletion of genes from the tenth decile were found (FIG. 8C).
- Genes that have not been implicated in CHD but are characterized by high connectivity in our network, heart-specific expression, and low LOEUF scores thus represent novel candidate CHD genes (FIG. 8C).
- WGCNA reveals NKX2-5 regulatory program. Herein, many of these results have confirmed a central role of NKX2-5 in human heart organogenesis.
- NKX2-5 bound the promoters and directly regulated the expression of many genes involved in heart development, specifically those involved in voltage-gated ion channel activity.
- Two prominent genes in the violet module that are directly regulated by NKX2-5 in human cardiomyocytes are the TFs HEY2 and IRX4 (FIG. 8B; FIG. 25). Both of these have been shown to play a role in ventricular myogenesis in mice and are linked to heart abnormalities in humans.
- NKX2-5 target genes were also found in 4 other modules (green, lightgreen, skyblue3, and mediumpurple3) that are enriched for embryonic patterning, ion channel function, and mitosis (FIG. 7C; FIGS. 26A–D).
- In addition to individual regulatory sequences, the localized landscapes of coactivated enhancers, commonly referred to as super enhancers, were characterized. Large regulatory landscapes have been implicated in tissue specification and tumorigenesis and encompass genes important for these processes. In this data, thousands of super enhancers across all time points of embryonic heart development were identified, 1611 of which had not been previously annotated. These include embryonic heart–specific super enhancers near important heart genes, TBX20, GATA4, HAND1, HCN4, IRX3, IRX4, and IRX6, among others.
- This WGCNA identified modules of genes in an unbiased way, yet when these groups of genes were analyzed, coherent biological functions and expression characteristics were uncovered across this developmental trajectory. It was found that a subset of modules is significantly enriched for both heart-specific gene expression and heart-specific enhancer activation. Many of these same modules are also enriched for known CHD genes, and well-connected or hub genes in these modules are generally intolerant to loss-of-function mutations in otherwise healthy individuals. When these networks were systematically interrogated with binding and functional data for the cardiac TF NKX2-5, clear physical regulatory connections were uncovered both within the NKX2-5–containing module and with other modules.
- NKX2-5 connected modules were enriched for functions this gene has been suggested to regulate, including activation of sarcomere and ion channel genes and repression of neurogenesis.
- These findings demonstrate that these networks represent real biological connections relevant to heart development.
- Genes with characteristics similar to NKX2-5, such as specificity of expression, high connectivity to other genes in WGCNA modules, regulation by heart-specific enhancers, and low tolerance to gene disruption, are therefore prime candidates for CHD genes. All these datasets are openly available in commonly used formats and are directly comparable to those of other large consortia. They can be downloaded from Gene Expression Omnibus, retrieved via public track hub functionality on the UCSC Genome Browser, or obtained directly from the Cotney Lab website.
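The Gini-based specificity measure described in the excerpts above can be sketched as follows. This is a minimal illustration only, assuming non-negative expression values per tissue. The function names `gini` and `most_specific_tissue` are hypothetical, and the source does not state the exact formula or thresholds it used.

```python
def gini(values):
    """Gini coefficient of non-negative values.

    Returns 0 for perfectly uniform expression and approaches 1 when
    expression is concentrated in a single tissue.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted sum over the ascending-sorted values (ranks start at 1).
    weighted = sum((rank + 1) * x for rank, x in enumerate(xs))
    return (2 * weighted) / (n * total) - (n + 1) / n


def most_specific_tissue(expr_by_tissue):
    """Return (tissue with highest expression, Gini coefficient) for one gene.

    expr_by_tissue is a hypothetical mapping of tissue name -> expression level.
    """
    top_tissue = max(expr_by_tissue, key=expr_by_tissue.get)
    return top_tissue, gini(expr_by_tissue.values())
```

A gene would then be called heart-specific when its top tissue is embryonic heart and its Gini coefficient exceeds some cutoff; the cutoff itself is an analysis choice not given in the text.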
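The nearest-gene assignment of enhancer segments mentioned in the excerpts above can be illustrated with a short sketch. It assumes a single chromosome, a pre-sorted list of transcription start site (TSS) coordinates, and the segment midpoint as the reference position. These are simplifying assumptions, and the helper name `nearest_gene` is hypothetical rather than taken from the source.

```python
from bisect import bisect_left


def nearest_gene(midpoint, tss_positions, gene_names):
    """Return the gene whose TSS is closest to an enhancer segment midpoint.

    tss_positions must be sorted ascending and aligned index-by-index
    with gene_names (both on the same chromosome).
    """
    if not tss_positions:
        raise ValueError("no genes to assign to")
    i = bisect_left(tss_positions, midpoint)
    # Only the TSS just left and just right of the midpoint can be nearest.
    candidates = [j for j in (i - 1, i) if 0 <= j < len(tss_positions)]
    best = min(candidates, key=lambda j: abs(tss_positions[j] - midpoint))
    return gene_names[best]
```

The resulting gene list per comparison would then be passed to a gene ontology enrichment tool; that step is not shown here.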
Landscapes
- Chemical & Material Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Health & Medical Sciences (AREA)
- Organic Chemistry (AREA)
- Wood Science & Technology (AREA)
- Analytical Chemistry (AREA)
- Zoology (AREA)
- Genetics & Genomics (AREA)
- Engineering & Computer Science (AREA)
- Pathology (AREA)
- Immunology (AREA)
- Microbiology (AREA)
- Molecular Biology (AREA)
- Biotechnology (AREA)
- Biophysics (AREA)
- Physics & Mathematics (AREA)
- Biochemistry (AREA)
- Bioinformatics & Cheminformatics (AREA)
- General Engineering & Computer Science (AREA)
- General Health & Medical Sciences (AREA)
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
Abstract
The present invention relates to gene panels and methods of using them for assessing the risk of a congenital heart defect or disease in a subject.
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US202063051983P | 2020-07-15 | 2020-07-15 | |
| US63/051,983 | 2020-07-15 | | |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2022015998A1 (fr) | 2022-01-20 |
Family
ID=79554292
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/US2021/041853 (Ceased), WO2022015998A1 (fr) | Gene panels and methods of using them for the screening and diagnosis of congenital heart defects and diseases | 2020-07-15 | 2021-07-15 |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2022015998A1 (fr) |
Cited By (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN114792548A (zh) * | 2022-06-14 | 2022-07-26 | 北京贝瑞和康生物技术有限公司 | Method, device, and medium for correcting sequencing data and detecting copy number variation |
Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20130231258A1 (en) * | 2011-12-09 | 2013-09-05 | Veracyte, Inc. | Methods and Compositions for Classification of Samples |
| US20170166965A1 (en) * | 2013-11-27 | 2017-06-15 | William Beaumont Hospital | Method for Predicting Congenital Heart Defect |
| US20180305689A1 (en) * | 2015-04-22 | 2018-10-25 | Mina Therapeutics Limited | Sarna compositions and methods of use |
| WO2018204764A1 (fr) * | 2017-05-05 | 2018-11-08 | Camp4 Therapeutics Corporation | Identification and targeted modulation of gene signaling networks |
| US20200176078A1 (en) * | 2008-11-17 | 2020-06-04 | Veracyte, Inc. | Algorithms for disease diagnostics |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| Marand et al. | A cis-regulatory atlas in maize at single-cell resolution | |
| Viñuela et al. | Genetic variant effects on gene expression in human pancreatic islets and their implications for T2D | |
| Werling et al. | An analytical framework for whole-genome sequence association studies and its implications for autism spectrum disorder | |
| Panopoulos et al. | iPSCORE: a resource of 222 iPSC lines enabling functional characterization of genetic variation across a variety of cell types | |
| Auer et al. | Rare variant association studies: considerations, challenges and opportunities | |
| Holm et al. | A rare variant in MYH6 is associated with high risk of sick sinus syndrome | |
| EP2437191B1 (fr) | Method and system for detecting chromosomal abnormalities | |
| Farré et al. | Evolution of gene regulation in ruminants differs between evolutionary breakpoint regions and homologous synteny blocks | |
| Garg et al. | A survey of inter-individual variation in DNA methylation identifies environmentally responsive co-regulated networks of epigenetic variation in the human genome | |
| Beauchemin et al. | Temporal dynamics of the developing lung transcriptome in three common inbred strains of laboratory mice reveals multiple stages of postnatal alveolar development | |
| Benaglio et al. | Allele-specific NKX2-5 binding underlies multiple genetic associations with human electrocardiographic traits | |
| Costa et al. | Massive-scale RNA-Seq analysis of non ribosomal transcriptome in human trisomy 21 | |
| EP3666902B1 (fr) | Multiplexed parallel analysis of targeted genomic regions for non-invasive prenatal testing | |
| Bittel et al. | Gene expression in cardiac tissues from infants with idiopathic conotruncal defects | |
| Powell et al. | The genetic architecture of variation in the sexually selected sword ornament and its evolution in hybrid populations | |
| JP2016165286A (ja) | Gene expression profiling with a reduced number of transcript measurements | |
| Ricci et al. | Myocardial alternative RNA splicing and gene expression profiling in early stage hypoplastic left heart syndrome | |
| US20220073986A1 (en) | Method of characterizing a neurodegenerative pathology | |
| Natri et al. | Cell type-specific and disease-associated eQTL in the human lung | |
| Xu et al. | The interplay between host genetics and the gut microbiome reveals common and distinct microbiome features for human complex diseases | |
| Li et al. | The functional impact of rare variation across the regulatory cascade | |
| Ma et al. | Molecular convergence of risk variants for congenital heart defects leveraging a regulatory map of the human fetal heart | |
| WO2022015998A1 (fr) | Gene panels and methods of using them for the screening and diagnosis of congenital heart defects and diseases | |
| Han et al. | A Population-scale Single-cell Spatial Transcriptomic Atlas of the Human Cortex | |
| Arthur et al. | Multi-omic QTL mapping in early developmental tissues reveals phenotypic and temporal complexity of regulatory variants underlying GWAS loci |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 21841257; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 21841257; Country of ref document: EP; Kind code of ref document: A1 |