WO2025024111A1 - Single point amplicon fragment sequencing and methods of diagnosing and monitoring a disease

Single point amplicon fragment sequencing and methods of diagnosing and monitoring a disease

Publication number
WO2025024111A1
Authority
WO
WIPO (PCT)
Prior art keywords
species
mcfdna
microbial
disease
fragments
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
PCT/US2024/036971
Other languages
English (en)
Inventor
Daniel Van Der Lelie
Lisa OUELLETTE
Safiyh Taghavi
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Gusto Global LLC
Original Assignee
Gusto Global LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Gusto Global LLC

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6844 Nucleic acid amplification reactions
    • C12Q1/686 Polymerase chain reaction [PCR]
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6888 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for detection or identification of organisms
    • C12Q1/689 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for detection or identification of organisms for bacteria

Definitions

  • the presently disclosed subject matter relates to a high-throughput, high-resolution and low-cost method for early detection and monitoring of disease including cancer and IBD.
  • Liquid biopsy based on circulating cell-free DNA provides a new prospect for the diagnosis, monitoring and risk assessment of a range of diseases.
  • cfDNA molecules circulating in peripheral blood originate from dying human cells as well as from viruses, parasites, and colonizing or invasive microbes that release their nucleic acids into the blood as they die and break down (setting et al, 2001).
  • Human-derived cfDNA has evolved into an indispensable biomarker in clinical practice for rapid and noninvasive diagnosis in prenatal screening, organ transplantation, and oncology (Decker and Sholl, 2020; Liang et al, 2019; Sun et al, 2019; Wu et al, 2020).
  • mcfDNA detection offers the potential to reliably identify a wide variety of infections, such as invasive fungal infection, tuberculosis, sepsis, cystic fibrosis (Rassoulian Barrett et al, 2020) and chorioamnionitis (Witt et al, 2020; for review see Han et al, 2020).
  • 2019; Zeng et al, 2019 may also harbor microbiota with distinctive compositions (for review, see Sepich-Poore et al, 2021), including fungi (Narunsky-Haziza et al, 2022). Both Nejman et al. (2020) and Poore et al. (2020) suggested the existence of distinct intratumoral microbiomes among >30 cancer types; these microbiomes also vary in composition at different developmental stages of the tumor, thus providing biomarkers for disease progression and prognosis for patient outcomes.
  • the tumor-associated bacteria will release distinct mcfDNA into the bloodstream, and this led Poore et al (2020) to propose the analysis of mcfDNA from the peripheral blood as a tool to gain valuable information regarding the progression of various types of cancers.
  • IBD inflammatory bowel disease
  • existing stool and peripheral blood inflammatory biomarkers are not informative to monitor patient disease status, medication performance and treatment outcomes as they fail to address a growing body of evidence that increased intestinal permeability and gut microbiome dysbiosis are correlated to disease progression, symptom severity, and patient outcomes.
  • the lactulose-mannitol test is the most common, and the only direct, test of a leaky gut. However, this test does not provide insight into an imbalanced gut microbiome or a dysfunctional gut epithelial barrier.
  • leakage of microbial material from the gut into the bloodstream can result in an overactive immune response that underlies multiple inflammatory conditions besides IBD, including rheumatoid arthritis, juvenile idiopathic arthritis, psoriatic arthritis, ankylosing spondylitis and plaque psoriasis, and even inflammatory neurological diseases such as Alzheimer’s disease. Therefore, there is an urgent need for new ways of testing to assess the level of microbial material in the bloodstream as a surrogate biomarker in IBD and other inflammatory diseases.
  • amplicon-based sequencing approaches are routinely used to determine microbial community composition in a wide range of biological samples.
  • the most used approach is amplicon sequencing of the 16S rRNA gene based on its variable regions, such as the V1-V2 and V3-V4 regions (Gupta et al, 2019).
  • Shahir et al (2020) applied 16S rRNA gene sequencing to identify region-specific composition and aerotolerance profiles of mucosally adherent bacteria in biopsy samples taken from the colon and ileum of Crohn’s disease and non-IBD patients.
  • single-copy, protein-encoding housekeeping genes, including the genes for the DNA gyrase subunit B (gyrB) (Poirier et al, 2018), RNA polymerase subunit B (rpoB) (Vos et al, 2012; Ogier et al, 2019), the heat shock protein 60 (hsp60), the superoxide dismutase A (sodA), the elongation factor Tu (tuf) (Ghebremedhin et al, 2008) and the 60 kDa chaperonin protein (cpn60) (Links et al, 2012), have been proposed as phylogenetic marker genes.
  • Liquid biopsy samples, especially peripheral blood, represent unique challenges for the analysis of microbial signatures.
  • the majority of mcfDNA fragments in blood were found to be approximately 40-100 bp in size (Burnham et al, 2016), as was confirmed by Rassoulian Barrett et al (2020).
  • Due to the small size of mcfDNA fragments, conventional amplicon-based sequencing approaches that target DNA fragments of several hundred nucleotides (>400) are not suitable for determining the composition of colonizing or invasive microorganisms using mcfDNA from liquid biopsy samples.
  • the V1-V2 and the V3-V4 regions of the 16S rRNA gene have an average length of 437 and 443 nucleotides, respectively.
  • the concentration of plasma cfDNA in healthy individuals varies greatly, generally within the range of 0-100 ng per milliliter of plasma, sometimes exceeding 1500 ng per milliliter.
  • Human cfDNA accounts for the vast majority (>90% or even >99%), while mcfDNA accounts for only a small fraction with 0.08%-4.85% from bacteria, 0.00%-0.01% from fungi, and 0.00%-0.16% from viruses/phages.
  • elevated levels of mcfDNA can sometimes be observed in certain pathological conditions, including infection, sepsis, trauma, and autoimmune diseases (Han et al, 2020). Because the analysis of mcfDNA requires deep next generation sequencing (NGS) of plasma cfDNA to overcome the limitations of small mcfDNA fragment size and low concentration, this approach is unsuitable for the testing of large patient cohorts or routine health screening.
  • NGS next generation sequencing
  • a method is provided of amplifying microbial cell free DNA (mcfDNA), comprising: (a) providing a degenerate primer comprising complementarity to a conserved region, wherein the conserved region spans at least 18 nucleotides of a microbial phylogenetic marker gene, wherein at least 25 adjacent nucleotides upstream or downstream of an end of the conserved region comprises a hypervariable region, and wherein the degenerate primer is oriented to prime polymerase extension of the hypervariable region; (b) providing an additional primer complementary to a repaired version of an adaptor ligated to ends of the mcfDNA; (c) performing, on a sample comprising the mcfDNA, an enrichment polymerase chain reaction (E-PCR) using the degenerate primer and the additional primer, thereby generating enriched mcfDNA fragments; and (d) performing an amplification PCR (A-PCR) on
  • a method is provided of amplifying microbial cell free DNA (mcfDNA), comprising: (a) performing an enrichment polymerase chain reaction (E-PCR) on a sample comprising the microbial cell-free DNA (mcfDNA), wherein enriched mcfDNA fragments are generated using: (i) a degenerate primer comprising complementarity to a conserved region, wherein the conserved region spans at least 18 nucleotides of a microbial phylogenetic marker gene, and wherein the degenerate primer comprises an affinity purification group, and (ii) an additional primer complementary to an end of the mcfDNA, wherein at least 25 adjacent nucleotides upstream or downstream of an end of the conserved region comprises a hypervariable region, wherein the degenerate primer is oriented to prime polymerase extension of the hypervariable region; and (b) performing an amplification PCR (A-PCR) on the enriched m
  • a method is provided of diagnosing or monitoring a disease or disorder, comprising: (a) providing a degenerate primer comprising complementarity to a conserved region, wherein the conserved region spans at least 18 nucleotides of a microbial phylogenetic marker gene, wherein at least 25 adjacent nucleotides upstream or downstream of an end of the conserved region comprises a hypervariable region, and wherein the degenerate primer is oriented to prime polymerase extension of the hypervariable region; (b) providing an additional primer complementary to a repaired version of an adaptor ligated to ends of the mcfDNA; (c) performing, on a sample comprising the mcfDNA, an enrichment polymerase chain reaction (E-PCR) using the degenerate primer and the additional primer, thereby generating enriched mcfDNA fragments; (d) performing an amplification PCR (A-PCR) on the enriched mcfDNA fragments
  • E-PCR enrichment polymerase chain reaction
  • a method is provided of diagnosing or monitoring a disease or disorder, comprising: (a) performing, on a sample comprising microbial cell-free DNA (mcfDNA), a polymerase chain reaction (PCR) using: (i) a degenerate primer comprising complementarity to a conserved region, wherein the conserved region spans at least 18 nucleotides of a microbial phylogenetic marker gene, and (ii) an additional primer complementary to a repaired version of an adaptor ligated to ends of the mcfDNA, wherein at least 25 adjacent nucleotides upstream or downstream of an end of the conserved region comprises a hypervariable region, and the degenerate primer is oriented to prime polymerase extension of the hypervariable region, thereby generating amplified mcfDNA fragments; (b) sequencing the amplified mcfDNA fragments or obtaining the sequences of the amplified mcfDNA fragments; (a) performing, on a sample comprising m
  • a kit for amplifying microbial cell free DNA (mcfDNA), comprising: (a) an adaptor for ligating to ends of mcfDNA; (b) a degenerate primer comprising complementarity to a conserved region, wherein the degenerate primer comprises an affinity purification group, wherein the conserved region spans at least 18 nucleotides of a microbial phylogenetic marker gene, wherein at least 25 adjacent nucleotides upstream or downstream of an end of the conserved region comprises a hypervariable region on the phylogenetic marker gene, and the degenerate primer is oriented to prime polymerase extension of the hypervariable region; (c) an additional primer complementary to a repaired version of the adaptor; (d) a modified degenerate primer comprising a 3’-end extension of at least three nucleotides relative to the degenerate primer; and (e) instructions for performing: (i) an enrichment of a microbial cell free DNA (mcf
  • a method is provided of diagnosing or monitoring a disease or disorder, comprising: (a) providing mcfDNA sequences, wherein the mcfDNA sequences are derived from a sample, wherein the mcfDNA sequences comprise sequence corresponding to (i) a conserved region spanning at least 18 nucleotides of a microbial phylogenetic marker gene and (ii) a hypervariable region at least 25 adjacent nucleotides upstream or downstream of an end of the conserved region; (b) assigning one or more of the mcfDNA sequences as comprising a microbial DNA signature or as belonging to a particular microbial species based on the closest sequence match of the hypervariable region to a reference microbial database; and (c) optionally, calculating a microbial community composition for the sample, based on a relative abundance of the mcfDNA sequences assigned to each microbial species, wherein one or a combination of: the
  • a method is provided of diagnosing or monitoring a disease or disorder, comprising: (a) sequencing amplified mcfDNA fragments derived from a sample, wherein the amplified mcfDNA fragments comprise sequence corresponding to (i) a conserved region spanning at least 18 nucleotides of a microbial phylogenetic marker gene and (ii) a hypervariable region at least 25 adjacent nucleotides upstream or downstream of an end of the conserved region; (b) assigning one or more of the mcfDNA sequences as comprising a microbial DNA signature or as belonging to a particular microbial species based on the closest sequence match of the hypervariable region to a reference microbial database; and (c) optionally, calculating a microbial community composition for the sample, based on a relative abundance of the mcfDNA sequences assigned to each microbial species, wherein one or a combination of: the microbial DNA signature;
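The assignment and composition steps described in the two preceding items (closest match of the hypervariable region to a reference database, then relative abundance per species) can be sketched as follows. This is a minimal illustration only: the reference sequences, the species names, the mismatch-count distance and the `max_mismatches` cutoff are all assumptions made for the example and are not taken from the application, which does not specify a matching algorithm in this passage.

```python
from collections import Counter

# Hypothetical toy reference: species -> hypervariable-region sequence.
REFERENCE = {
    "species_A": "ACGTACGTAC",
    "species_B": "ACGTTTGTAC",
    "species_C": "GGGTACGTCC",
}


def mismatches(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def assign_species(read, reference, max_mismatches=2):
    """Return the reference species with the fewest mismatches.

    Returns None when even the closest match exceeds max_mismatches
    (an assumed cutoff, used here to leave poor matches unassigned).
    """
    best = min(reference, key=lambda sp: mismatches(read, reference[sp]))
    if mismatches(read, reference[best]) > max_mismatches:
        return None
    return best


def community_composition(reads, reference):
    """Relative abundance of assigned reads per species (fractions sum to 1)."""
    counts = Counter()
    for read in reads:
        species = assign_species(read, reference)
        if species is not None:
            counts[species] += 1
    total = sum(counts.values())
    return {sp: n / total for sp, n in counts.items()} if total else {}


reads = ["ACGTACGTAC", "ACGTACGTAA", "ACGTTTGTAC", "TTTTTTTTTT"]
composition = community_composition(reads, REFERENCE)
```

In this toy input the last read matches no reference closely enough and is left out of the composition; a production pipeline would more likely use an alignment-based or k-mer-based classifier against a curated database.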
  • a method is provided of diagnosing or monitoring a disease or disorder, comprising: (a) performing, on a sample comprising microbial cell-free DNA (mcfDNA), a polymerase chain reaction (PCR) using: (i) a degenerate primer comprising complementarity to a conserved region, wherein the conserved region spans at least 18 nucleotides of a microbial phylogenetic marker gene, and (ii) an additional primer complementary to a repaired version of an adaptor ligated to ends of the mcfDNA, wherein at least 25 adjacent nucleotides upstream or downstream of an end of the conserved region comprises a hypervariable region, and the degenerate primer is oriented to prime polymerase extension of the hypervariable region, thereby generating amplified mcfDNA fragments; and (b) sequencing the amplified mcfDNA fragments or obtaining the sequences of the amplified mcfDNA fragment
  • a method is provided of detecting infection or inflammation, comprising: (a) performing, on a sample comprising mcfDNA from a subject, an enrichment polymerase chain reaction (E-PCR) using a degenerate primer and an additional primer, thereby generating enriched mcfDNA fragments, wherein: (i) the degenerate primer comprises complementarity to a conserved region, wherein the conserved region spans at least 18 nucleotides of a microbial phylogenetic marker gene, wherein at least 25 adjacent nucleotides upstream or downstream of an end of the conserved region comprises a hypervariable region, and wherein the degenerate primer is oriented to prime polymerase extension of the hypervariable region, and (ii) the additional primer is complementary to a repaired version of an adaptor ligated to ends of the mcfDNA; (b) performing an amplification PCR (A-PCR) on the enriched mcfDNA fragment
  • a method is provided of detecting infection or inflammation, comprising: (a) performing, on a sample comprising microbial cell-free DNA (mcfDNA) from a subject, a polymerase chain reaction (PCR) using: (i) a degenerate primer comprising complementarity to a conserved region, wherein the conserved region spans at least 18 nucleotides of a microbial phylogenetic marker gene, and (ii) an additional primer complementary to a repaired version of an adaptor ligated to ends of the mcfDNA, wherein at least 25 adjacent nucleotides upstream or downstream of an end of the conserved region comprises a hypervariable region, and the degenerate primer is oriented to prime polymerase extension of the hypervariable region, thereby generating amplified mcfDNA fragments; (b) sequencing the amplified mcfDNA fragments or obtaining sequences of the amplified mcfDNA
  • a method is provided of detecting infection or inflammation, comprising: (a) providing mcfDNA sequences, wherein the mcfDNA sequences are derived from a sample, wherein the mcfDNA sequences comprise sequence corresponding to (i) a conserved region spanning at least 18 nucleotides of a microbial phylogenetic marker gene, and (ii) a hypervariable region at least 25 adjacent nucleotides upstream or downstream of an end of the conserved region; and (b) assigning one or more of the mcfDNA sequences as belonging to a pathogenic bacterial species based on the closest sequence match of the hypervariable region to a reference microbial database, wherein the pathogenic bacterial species comprise one or more of Escherichia species, Escherichia coli, Shigella species, Klebsiella species, Klebsiella pneumoniae, Klebsiella oxytoca, Kluyvera species, Erwin
  • a system for amplifying microbial cell free DNA (mcfDNA), comprising: (a) a reaction vessel; (b) a reagent dispensing module; and (c) software to execute any of the methods provided herein, wherein the method is executed at least partially robotically.
  • mcfDNA microbial cell free DNA
  • Fig. 1 is a schematic of SPA fragment generation.
  • the arrow indicates the position of the SPA primer (5’ to 3’).
  • the SPA fragment refers to the mcfDNA fragment region that will be amplified that is shown in dark shading.
  • Fig. 2 is a schematic overview of one embodiment of the protocol for generating single point amplification (SPA) fragments for sequencing. The various steps are in order of their successive execution. Once single point amplicon fragments are generated, they are sequenced using the standard protocol for next generation paired-end Illumina sequencing.
  • SPA single point amplification
  • Fig. 3 is a schematic representation of the ligation and PCR steps that can be included in the SPA fragment sequencing protocol. Details of the SPA-linker used in the ligation step, the Linker.v2-amp primer and the 16SV4-R785-Bio primer used in the enrichment (E-PCR) step, the six phased primers R1-linker.v2-SP0 to SP5 and the 16SV4-R781-Rd2 primer used in the amplification (A-PCR) step, and the indexing primers used in the indexing (I-PCR) step are shown.
  • sequences of the UMI region (with sequence NNTTNN (SEQ ID NO: 1)) and the non-complementary 3’-end sequence of the shorter SPA-linker fragment (with sequence 3’-ACG) are indicated via shading.
  • the 3’-end extension to the degenerate primer used in the A-PCR reaction compared to the degenerate primer used in the E-PCR reaction is highlighted in bold/italics.
  • the Illumina Read-1 and Read-2 sequences for use in multiplex ILLUMINA NEXTERA sequencing are included in the primers used in the A-PCR and these sequences are shown in bold, with the linker sequences being underlined.
  • A adenine
  • G guanine
  • C cytosine
  • T thymine
  • B not A (T, C, or G)
  • V not T (A, C, or G)
  • H not G (A, T or C)
  • D not C (A, T or G)
  • N any nucleotide (A, G, C or T).
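The nucleotide codes listed above can be turned into a small helper for working with degenerate primers. The sketch below expands each code into a regular-expression character class, reverse-complements a degenerate sequence, and counts how many concrete oligonucleotides a degenerate primer represents. The codes R (A or G) and Y (C or T) are standard IUPAC codes added here for completeness; the function names are illustrative and not part of the application.

```python
import re

# IUPAC nucleotide codes: code -> set of bases it stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT",
    "B": "CGT", "V": "ACG", "H": "ACT", "D": "AGT",
    "N": "ACGT",
}

# Complement of each code (e.g. B = not A pairs with V = not T).
COMPLEMENT = {
    "A": "T", "T": "A", "C": "G", "G": "C",
    "R": "Y", "Y": "R",
    "B": "V", "V": "B", "H": "D", "D": "H",
    "N": "N",
}


def to_regex(degenerate: str) -> re.Pattern:
    """Compile a degenerate sequence into a regex over A/C/G/T."""
    return re.compile("".join(f"[{IUPAC[base]}]" for base in degenerate))


def reverse_complement(degenerate: str) -> str:
    """Reverse complement of a (possibly degenerate) sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(degenerate))


def degeneracy(degenerate: str) -> int:
    """Number of distinct concrete sequences the degenerate sequence covers."""
    count = 1
    for base in degenerate:
        count *= len(IUPAC[base])
    return count


pattern = to_regex("GAYN")
```

For example, `pattern.fullmatch("GATC")` succeeds while `pattern.fullmatch("GAGA")` does not, because Y only admits C or T at the third position.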
  • Fig. 4 is an overview of an exemplary method used for SPA primer selection.
  • Fig. 5 shows a comparison between SPA fragment sequencing, 16S rRNA V3-V4 amplicon sequencing and 16S rRNA V6 amplicon sequencing to determine the microbial community composition in water after a spike-in of 2.5 pg/µl MCI DNA.
  • the 16S rRNA gene amplicon sequencing results are based on sequencing the 130 bp 5’ and 3’ ends of the V3-V4 region using the Rd1-16S-V3-F primer and the Rd2-16S-V4-R primer, respectively, and sequencing of the V6 region using the Rd1-16S-V6-F and Rd2-16S-V6-R primers. Only bacteria identified with >50% confidence at the genus level are reported. For comparison, the expected theoretical composition of MCI, after correction for genome size and 16S rRNA gene copy number, is provided.
  • Fig. 6 shows a comparison between SPA fragment sequencing, 16S rRNA V3-V4 amplicon sequencing and 16S rRNA V6 amplicon sequencing to determine the microbial community composition in blood plasma after a spike-in of 2.5 pg/µl MCI DNA.
  • the 16S rRNA gene amplicon sequencing results are based on sequencing the 130 bp 5’ and 3’ ends of the V3-V4 region using the Rd1-16S-V3-F primer and the Rd2-16S-V4-R primer, respectively, and sequencing of the V6 region using the Rd1-16S-V6-F and Rd2-16S-V6-R primers. Only bacteria identified with >50% confidence at the genus level are reported. For comparison, the expected theoretical composition of MCI, after correction for genome size and 16S rRNA gene copy number, is provided.
  • Fig. 7 shows a comparison between SPA fragment sequencing and 16S rRNA V3-V4 amplicon sequencing to determine the microbial community composition in blood plasma after a spike-in of 5.0 pg/µl MCI DNA.
  • the 16S rRNA gene amplicon sequencing results are based on sequencing the 3’ end of the V3-V4 region using the Rd2-16S-V4-R primer. Only bacteria identified at >50% confidence level at the genus level are reported. For comparison, the expected theoretical composition of MCI, after correction for genome size and 16S rRNA gene copy number, is provided.
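The "expected theoretical composition ... after correction for genome size and 16S rRNA gene copy number" mentioned for Figs. 5 to 7 can be illustrated with a short calculation. For a given DNA mass, the number of genome copies scales with mass divided by genome size, and the number of 16S rRNA gene copies is that value multiplied by the per-genome copy number. The organism names and all numeric values below are hypothetical placeholders; the actual composition of the spike-in community is not given in this passage.

```python
def expected_16s_fractions(community):
    """Expected share of 16S rRNA gene copies contributed by each organism.

    `community` maps organism name -> dict with:
      dna_mass        relative DNA mass contributed to the mix
      genome_size_bp  genome size in base pairs
      copies_16s      16S rRNA gene copies per genome
    """
    weights = {
        name: m["dna_mass"] / m["genome_size_bp"] * m["copies_16s"]
        for name, m in community.items()
    }
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


# Hypothetical two-member community mixed at equal DNA mass.
toy_community = {
    "organism_X": {"dna_mass": 1.0, "genome_size_bp": 2_000_000, "copies_16s": 4},
    "organism_Y": {"dna_mass": 1.0, "genome_size_bp": 4_000_000, "copies_16s": 4},
}
fractions = expected_16s_fractions(toy_community)
```

With equal DNA mass and equal copy number, the organism with the smaller genome contributes twice as many genome copies and therefore twice the expected 16S signal.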
  • Fig. 8 shows the successful stratification of colon cancer patients and non-small cell lung cancer patients using principal component analysis of their mcfDNA microbial signatures obtained with SPA fragment sequencing of the 16S rRNA gene V4 hypervariable region.
  • the type of cancer as well as the stage of the cancer are indicated for each patient.
  • the light shading represents colon cancer and the dark fill represents non-small cell lung cancer.
  • Fig. 9 shows the successful stratification of colon cancer patients and breast cancer patients using principal component analysis of their mcfDNA microbial signatures obtained with SPA fragment sequencing of the 16S rRNA gene V4 hypervariable region.
  • the type of cancer as well as the stage of the cancer are indicated for each patient.
  • the light shading represents breast cancer and the dark fill represents colon cancer.
  • Fig. 10 shows the successful stratification of non-small cell lung cancer patients and breast cancer patients using principal component analysis of their mcfDNA microbial signatures obtained with SPA fragment sequencing of the 16S rRNA gene V4 hypervariable region.
  • the type of cancer as well as the stage of the cancer are indicated for each patient.
  • the dark fill represents breast cancer and the light shading represents non-small cell lung cancer.
  • Fig. 11 shows stratification of colorectal cancer patients and healthy individuals using principal component analysis of their mcfDNA microbial signatures obtained with SPA fragment sequencing of the 16S rRNA gene V4 hypervariable region.
  • the stage of colorectal cancer is indicated for each patient.
  • a healthy individual is defined as an individual not having a cancer diagnosis and presumably cancer-free.
  • the dark fill represents colorectal cancer and the light shading represents healthy cancer-free; the water control is represented by a double triangle shape.
  • Fig. 12 shows nucleotide statistics for the rpoB gene region 1327-1355 and degenerate sequence (GAYGAYATYGAYCAYYTNGGHAAYCGHGC (SEQ ID NO: 2)) which from position 1327-1352 is the reverse complement sequence of degenerate primer RpoB1-R1327.
  • the relative abundance of a nucleotide at a specific position was calculated using the nucleotide sequences of 47,505 aligned unique rpoB genes from the PATRIC database and used to design the degenerate sequence, which is provided from 5’ to 3’ using the following nucleotide codes: A: adenine; G: guanine; C: cytosine; T: thymine; R: purine (A or G); Y: pyrimidine (T or C); H: not G (A, T or C); N: any nucleotide (A, G, C or T); N*: presence of any nucleotide (N) at a specific rpoB gene position. The percentages of highly conserved nucleotide sequences used to determine the consensus sequence for the degenerate primer are highlighted. The position of the region is based on the nucleotide sequence of the Escherichia coli rpoB gene.
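Deriving a degenerate sequence from per-position nucleotide abundances in an alignment, as described for Fig. 12, can be sketched as below. The inclusion threshold `min_fraction` is an assumption made for this example (the passage does not state which cutoff was applied), and the four toy sequences stand in for the aligned rpoB genes.

```python
from collections import Counter

# Standard IUPAC codes keyed by the set of bases they represent.
CODE_FOR_SET = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}


def degenerate_consensus(aligned, min_fraction=0.05):
    """Build a degenerate consensus from equal-length aligned sequences.

    A base is kept at a position when its relative abundance among the
    A/C/G/T characters at that position is at least min_fraction
    (hypothetical cutoff). Columns with no A/C/G/T at all become N.
    """
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must have equal length")
    consensus = []
    for i in range(length):
        counts = Counter(seq[i] for seq in aligned if seq[i] in "ACGT")
        total = sum(counts.values())
        kept = frozenset(b for b, n in counts.items() if n / total >= min_fraction)
        consensus.append(CODE_FOR_SET.get(kept, "N"))
    return "".join(consensus)


toy_alignment = ["GATGAC", "GACGAT", "GATGAT", "GACGAC"]
consensus = degenerate_consensus(toy_alignment)
```

Raising `min_fraction` trades coverage of rare variants for lower primer degeneracy, which is the usual design tension when choosing such a cutoff.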
  • Fig. 13 shows the machine learning informed diagnosis/classification of patients using their rpoB gene-derived mcfDNA signatures.
  • the scikit-learn (https://scikit-learn.org/stable/) package in Python was used for stratification.
  • Fig. 14A shows box plots of the blood pathogenicity index versus the disease status for ulcerative colitis. Disease status was determined as being in remission or “not in remission”, respectively, with “not in remission” being defined as an active disease flare or medication failure (medication stopped working or serious side effects of the medication were observed).
  • the blood pathogenicity index determined for the samples used to create Table 7, was calculated as the percentage of mcfDNA fragments coming from a defined set of pro-inflammatory bacteria as a fraction of the total amount of mcfDNA fragments found in a blood plasma sample.
  • the blood pathogenicity median is shown as a line inside the box extending from the first quartile (Q1) to the third quartile (Q3).
  • the whiskers extend from the box to the farthest data point lying within 1.5x the inter-quartile range (IQR) from the box. Outlier points are those past the end of the whiskers.
  • Fig. 14B shows box plots of the blood pathogenicity index versus the disease status for Crohn’s disease. Disease status was determined as being in remission or “not in remission”, respectively, with “not in remission” being defined as an active disease flare or medication failure (medication stopped working or serious side effects of the medication were observed).
  • the blood pathogenicity index determined for the samples used to create Table 7, was calculated as the percentage of mcfDNA fragments coming from a defined set of pro-inflammatory bacteria as a fraction of the total amount of mcfDNA fragments found in a blood plasma sample.
  • the blood pathogenicity median is shown as a line inside the box extending from the first quartile (Q1) to the third quartile (Q3).
  • the whiskers extend from the box to the farthest data point lying within 1.5x the inter-quartile range (IQR) from the box. Outlier points are those past the end of the whiskers.
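The blood pathogenicity index and the box-plot conventions described for Figs. 14A and 14B can be expressed in a few lines. The sketch computes the index as the percentage of mcfDNA fragments assigned to a defined set of bacteria, and summarizes a series of values with quartiles, 1.5x IQR whiskers and outliers. The species in `PRO_INFLAMMATORY`, the fragment counts and the choice of the "inclusive" quartile method are assumptions for illustration; the passage does not list the defined set or the quartile convention.

```python
import statistics

# Hypothetical set; the source refers to "a defined set of pro-inflammatory
# bacteria" without enumerating it in this passage.
PRO_INFLAMMATORY = {"Escherichia coli", "Klebsiella pneumoniae"}


def blood_pathogenicity_index(fragment_counts, pro_inflammatory=PRO_INFLAMMATORY):
    """Percentage of mcfDNA fragments that come from the defined species set."""
    total = sum(fragment_counts.values())
    if total == 0:
        raise ValueError("sample contains no mcfDNA fragments")
    flagged = sum(n for sp, n in fragment_counts.items() if sp in pro_inflammatory)
    return 100.0 * flagged / total


def box_plot_summary(values):
    """Quartiles, whisker ends (within 1.5x IQR of the box) and outliers."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }


sample = {
    "Escherichia coli": 30,
    "Klebsiella pneumoniae": 10,
    "Faecalibacterium prausnitzii": 60,
}
bpi = blood_pathogenicity_index(sample)
summary = box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])
```

Plotting libraries differ slightly in how they interpolate quartiles, so whisker positions computed this way may not match a given figure exactly.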
  • Fig. 15A shows box plots of the fecal calprotectin levels versus the disease status for ulcerative colitis.
  • Disease status was determined as being in remission or “not in remission”, respectively, with “not in remission” being defined as an active disease flare or medication failure (medication stopped working or serious side effects of the medication were observed).
  • Calprotectin levels were determined as µg/g fecal material for the samples used to generate the data for Table 8. The null hypothesis that the distribution of calprotectin levels of patients in remission is the same as the distribution of patients not in remission is rejected for ulcerative colitis patients with a p-value of 0.0212876 (Mann-Whitney U test).
  • the calprotectin median is shown as a line inside the box extending from the first quartile (Q1) to the third quartile (Q3).
  • the whiskers extend from the box to the farthest data point lying within 1.5x the inter-quartile range (IQR) from the box. Outlier points are those past the end of the whiskers.
  • Fig. 15B shows box plots of the fecal calprotectin levels versus the disease status for Crohn’s disease. Disease status was determined as being in remission or “not in remission”, respectively, with “not in remission” being defined as an active disease flare or medication failure (medication stopped working or serious side effects of the medication were observed). Calprotectin levels were determined as µg/g fecal material for the samples used to generate the data for Table 8. The null hypothesis that the distribution of calprotectin levels of patients in remission is the same as the distribution of patients not in remission is not rejected for Crohn’s disease patients, with a p-value of 0.8428739 (Mann-Whitney U test).
  • the calprotectin median is shown as a line inside the box extending from the first quartile (Q1) to the third quartile (Q3).
  • the whiskers extend from the box to the farthest data point lying within 1.5x the inter-quartile range (IQR) from the box. Outlier points are those past the end of the whiskers.
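The Mann-Whitney U test used for Figs. 15A and 15B compares two groups by rank rather than by raw value. The sketch below computes the U statistic with average ranks for ties and a two-sided p-value from the normal approximation with continuity and tie corrections. Whether the reported p-values were obtained with an exact or an asymptotic method is not stated in this passage, so the approximation is an assumption, and the input values are invented for illustration.

```python
import math


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation).

    Returns (U, p) where U is the smaller of the two U statistics.
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    tie_term = 0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        average_rank = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = average_rank
        t = j - i
        tie_term += t ** 3 - t
        i = j
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u1 = rank_sum_x - n1 * (n1 + 1) / 2
    u2 = n1 * n2 - u1
    n = n1 + n2
    mean_u = n1 * n2 / 2
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:
        return min(u1, u2), 1.0
    z = max((abs(u1 - mean_u) - 0.5) / math.sqrt(variance), 0.0)
    p_value = math.erfc(z / math.sqrt(2))  # two-sided tail probability
    return min(u1, u2), p_value


# Invented example values, not data from the application.
u_stat, p_val = mann_whitney_u([1, 2, 3], [4, 5, 6])
```

A p-value above the chosen significance level means the null hypothesis is not rejected; it does not show that the two distributions are the same. For very small samples an exact test is generally preferable to the normal approximation used here.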
  • Fig. 16 shows a schematic representation of the journey of Crohn’s disease patient CD-4 and how the blood-based pathogenicity index (BPI) can be successfully used to influence a better disease outcome.
  • the patient is in remission (12/2018; 06/2019) on a combination of azathioprine and adalimumab, the latter being discontinued due to side effects (10/2019). Stopping this medication results in full remission (01/2020).
  • clinical use of the BPI would have identified inflammatory markers in blood that alter CD-4 patient pathway by changing medication.
  • the side effects from adalimumab were missed by the calprotectin (Calpr) levels and the sCDAI score (sCDAI).
  • Data points marked by open circles or grey circles indicate remission or active disease, respectively. Arrows indicate medication is being taken over a certain time period.
  • a STOP symbol indicates that a medication is being discontinued at a given timepoint.
  • Fig. 17 shows a schematic representation of the journey of ulcerative colitis patient UC-3 and how the blood-based pathogenicity index (BPI) can be successfully used to influence a better disease outcome. It is assumed that the initial combination of tofacitinib, vedolizumab and mesalamine is working before 11/2018. Issues are reported for mesalamine, which is removed from the medication (02/2019). In retrospect (11/2018), clinical use of BPI would have identified inflammatory markers in blood that alter the UC-3 patient pathway by removing mesalamine as medication, resulting in a decrease of the BPI (03/2019) indicative of remission, in line with fecal urgency.
  • the side effects from mesalamine were missed by the calprotectin levels (Calpr) and the Mayo-6 score (Mayo-6).
  • Data points marked by open circles or grey dots indicate remission or active disease, respectively.
  • a data point marked by a black dot indicates that no sample to determine the BPI was available.
  • Arrows indicate medication is being taken over a certain time period.
  • a STOP symbol indicates that a medication is being discontinued at a given timepoint.
  • Fig. 18 shows a schematic representation of the journey of ulcerative colitis patient UC-9 and how the blood-based pathogenicity index (BPI) can be successfully used to influence a better disease outcome.
  • mesalamine is working as medication but starts to lose efficiency (11/2017) based on disease symptoms and calprotectin levels.
  • Adalimumab is added as medication (12/2017), resulting in remission (11/2019).
  • clinical use of the BPI would have identified inflammatory markers in blood that alter the UC-9 patient pathway by changing medication to include adalimumab, resulting in a decrease of the BPI (11/2019).
  • Data points marked by open circles or grey dots indicate remission or active disease, respectively.
  • a data point marked by a black dot indicates that no sample to determine the BPI was available. Arrows indicate medication is being taken over a certain time period.
  • the term “about” when used in connection with one or more numbers or numerical ranges should be understood to refer to all such numbers, including all numbers in a range and modifies that range by extending the boundaries above and below the numerical values set forth.
  • the recitation of numerical ranges by endpoints includes all numbers, e.g., whole integers, including fractions thereof, subsumed within that range (for example, the recitation of 1 to 5 includes 1, 2, 3, 4, and 5, as well as fractions thereof, e.g., 1.5, 2.25, 3.75, 4.1, and the like) and any range within that range.
  • the term “about”, when referring to a value can encompass variations of, in some embodiments +/-20%, in some embodiments +/-10%, in some embodiments +/-5%, in some embodiments +/-1%, in some embodiments +/-0.5%, and in some embodiments +/-0.1%, from the specified amount, as such variations are appropriate in the disclosed compositions and methods.
  • the term can mean within an order of magnitude, preferably within 5-fold, and more preferably within 2-fold, of a value.
  • the term “subject” includes humans and animals and can be used interchangeably with the term “human” and the term “patient”.
  • the terms “SPA fragment” and “SPA fragment sequence” are herein used interchangeably.
  • the term “microbial phylogenetic marker gene” as used herein means any conserved gene from any organism, including but not limited to bacteria, fungi, parasites, and viruses, that is suitable for phylogenetic identification.
  • the terms “amplified mcfDNA fragments” and “amplified mcfDNA fragment sequences” are used interchangeably herein for the purposes of the specification and claims.
  • Deep microbial metagenome sequencing is the most informative approach when it comes to microbial community analysis, as it will provide detailed information regarding community composition as well as the key functions encoded by the community members.
  • Despite advances in metagenome sequencing technologies to reduce its costs, it is currently still too expensive for routine screening purposes of human associated microbial communities in large population screenings.
  • Another disadvantage of deep microbial metagenome sequencing is the need for relatively large amounts of high-quality microbial DNA. This has hindered its application to study the microbial communities associated with liquid and solid biopsy samples, where only a small fraction of the total DNA is of microbial origin.
  • the amplification and subsequent sequencing of phylogenetic marker genes provides an alternative, cheaper high throughput method for microbial community analysis.
  • For tissue biopsy samples, where there is a sufficient concentration of DNA having an average fragment length of about 5,000 bp or more, amplification-based sequencing approaches have been successfully applied to identify differences in microbial communities between healthy individuals and patients suffering from a wide range of diseases.
  • Advantages of the amplification and subsequent sequencing method include that it requires significantly less DNA than metagenome sequencing, and because specific DNA primers are used to amplify phylogenetic target genes, there is little contamination with host DNA, making this method suitable to analyze the microbial communities associated with tissue biopsy samples, from which small amounts of high molecular weight DNA can be obtained.
  • analysis of microbial signatures in liquid biopsy samples, especially peripheral blood samples, results in additional challenges as compared to tissue biopsy samples, due to the low concentration of mcfDNA having small fragment sizes.
  • Human cfDNA accounts for the vast majority of cfDNA (>90% or even >99%), while mcfDNA accounts for only a small fraction with 0.08%-4.85% from bacteria, 0.00%-0.01% from fungi, and 0.00%-0.16% from viruses/phages (Han et al, 2020).
  • the percentage of mcfDNA compared to cfDNA should be placed in the context of the human genome size and the size of an average microbial genome, with sizes of 6.4 billion and approximately 6 million nucleotides, respectively, therefore providing similar coverage.
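  • The coverage argument above can be checked with a few lines of arithmetic. The sketch below uses only the genome sizes and the bacterial mcfDNA range quoted in the text; treating all mcfDNA as coming from a single microbial genome is a simplifying assumption of this illustration, and the function name is hypothetical.

```python
HUMAN_GENOME_NT = 6.4e9    # human genome size stated in the text
MICROBIAL_GENOME_NT = 6e6  # approximate microbial genome size stated in the text


def per_genome_coverage_ratio(mcf_fraction):
    """Microbial-to-human per-nucleotide coverage ratio.

    mcf_fraction is the fraction of total cfDNA that is microbial.
    Assumption: all mcfDNA is attributed to one microbial genome.
    """
    human_fraction = 1.0 - mcf_fraction
    microbial_cov = mcf_fraction / MICROBIAL_GENOME_NT
    human_cov = human_fraction / HUMAN_GENOME_NT
    return microbial_cov / human_cov


# Genome size ratio: a microbial genome is roughly 0.09 % the size of the human genome.
size_ratio = MICROBIAL_GENOME_NT / HUMAN_GENOME_NT

low = per_genome_coverage_ratio(0.0008)   # 0.08 % bacterial mcfDNA
high = per_genome_coverage_ratio(0.0485)  # 4.85 % bacterial mcfDNA
```

  • Under these assumptions, the lower bound of 0.08% mcfDNA already corresponds to a per-genome coverage close to that of the human genome, which is the sense in which the two are comparable.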
  • mcfDNA represents an important signal that is largely being ignored in liquid biopsy testing.
  • the present inventors developed a single point amplification sequencing approach that exploits the combination of a degenerate primer for a conserved region of a marker gene located adjacent to a phylogenetic hypervariable region of the gene for a wide range of microbes.
  • the method is based on the targeted amplification of high-resolution phylogenetic identifier fragments from mcfDNA, which comprises a fraction of the total cfDNA isolated from, for example, biopsy samples.
  • a hypervariable DNA region with high phylogenetic resolution is targeted.
  • the fragments resulting from specific amplification of the hypervariable DNA regions are referred to as SPA fragments.
  • methods and kits are provided herein for generating the SPA fragments from a sample, such as a blood sample.
  • the methods and kits provided herein can be used to determine the presence of one or more microbial species, microbial DNA signatures, and/or microbial community compositions for the sample.
  • the microbial phylogenetic marker gene can be eukaryotic, fungal, viral, or bacterial, and combinations thereof.
  • the microbial phylogenetic marker gene is a eubacterial phylogenetic marker gene.
  • the length of the SPA fragment is determined by the distance between the end of the mcfDNA fragment and the 3’-end of the primer annealing site. Only mcfDNA fragments that contain the primer annealing site will give SPA fragments, which can be subsequently sequenced and used for high resolution phylogenetic identification of microbial species and/or microbial signatures. Some embodiments include analysis of microbial community composition.
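  • The relationship between fragment position and SPA fragment length can be sketched as below. The coordinate convention (0-based, half-open positions on the marker gene), the reverse primer orientation, and the function name are assumptions made for this illustration; whether the primer itself is counted in the reported length is a matter of convention and is left out here.

```python
def spa_fragment_length(frag_start, frag_end, site_start, site_end):
    """Distance from the fragment end to the primer 3'-end, or None.

    The mcfDNA fragment covers [frag_start, frag_end) on the marker gene.
    The primer anneals to [site_start, site_end) and extends toward lower
    coordinates, so its 3'-end sits at site_start (assumed orientation).
    """
    contains_site = frag_start <= site_start and site_end <= frag_end
    if not contains_site:
        return None  # no complete annealing site, so no SPA fragment
    return site_start - frag_start


# Hypothetical fragments around an annealing site at positions 766-785.
long_fragment = spa_fragment_length(600, 800, 766, 785)
short_fragment = spa_fragment_length(760, 900, 766, 785)
no_site = spa_fragment_length(700, 780, 766, 785)
```

  • Two fragments from the same organism therefore give SPA fragments of different lengths (166 and 6 nucleotides here), while a fragment that only partially covers the annealing site gives none.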
  • the methods provided herein use a single conserved DNA sequence as the primer annealing site to initiate PCR amplification.
  • the amplification initiated from this single conserved DNA sequence allows for targeted amplification of the hypervariable region located adjacent to the primer annealing site, independent of the size of the fragment, followed by sequencing of the amplified fragment.
  • This method may be referred to herein as Single Point Amplification (SPA) fragment sequencing.
  • a method is provided of amplifying microbial cell free DNA (mcfDNA), comprising: (a) providing a degenerate primer comprising complementarity to a conserved region, wherein the conserved region spans at least 18 nucleotides of a microbial phylogenetic marker gene, wherein at least 25 adjacent nucleotides upstream or downstream of an end of the conserved region comprises a hypervariable region, and wherein the degenerate primer is oriented to prime polymerase extension of the hypervariable region; (b) providing an additional primer complementary to a repaired version of an adaptor ligated to ends of the mcfDNA; (c) performing, on a sample comprising the mcfDNA, an enrichment polymerase chain reaction (E-PCR) using the degenerate primer and the additional primer, thereby generating enriched mcfDNA fragments; and (d) performing an amplification PCR (A-PCR) on the enriched mcfDNA fragment
  • a method is provided of amplifying microbial cell free DNA (mcfDNA), comprising: (a) performing an enrichment polymerase chain reaction (E-PCR) on a sample comprising the microbial cell-free DNA (mcfDNA), wherein enriched mcfDNA fragments are generated using: (i) a degenerate primer comprising complementarity to a conserved region, wherein the conserved region spans at least 18 nucleotides of a microbial phylogenetic marker gene, and wherein the degenerate primer comprises an affinity purification group, and (ii) an additional primer complementary to an end of the mcfDNA, wherein at least 25 adjacent nucleotides upstream or downstream of an end of the conserved region comprises a hypervariable region, wherein the degenerate primer is oriented to prime polymerase extension of the hypervariable region; and (b) performing an amplification PCR (A-PCR) on the enriched mcfDNA fragments using E-PCR
  • Additional advantages of the present invention include that it enables identification of microbial species and/or microbial DNA signatures in a sample, such as a blood sample, that allows for diagnosis and/or stratification of cancer patients by type of cancer, stage of cancer, and specific tumor subtype driven by the location of the tumor.
  • methods are provided for diagnosing a disease or disorder in a sample, including a blood sample.
  • the methods include assigning one or more mcfDNA fragment sequences derived from the sample as belonging to a particular microbial species based on the closest sequence match of the hypervariable region to a reference microbial database, or as having a microbial DNA signature.
  • the method can include calculating a microbial community composition for the sample, based on the relative abundance of the mcfDNA fragment sequences assigned to each microbial species. Diagnosing the sample as having a disease or disorder includes linking the microbial DNA signature, the microbial community composition, or one or a combination of the presence, the absence, or the relative abundance of the microbial species to the disease or disorder.
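  • The community composition step described above can be sketched as a relative abundance calculation over the species assignments. This is a minimal sketch: the function name and the example assignments are hypothetical, and the upstream assignment of each sequence to its closest database match is assumed to have been done already.

```python
from collections import Counter


def community_composition(assignments):
    """Relative abundance per species from a list of per-fragment assignments."""
    counts = Counter(assignments)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {species: n / total for species, n in counts.items()}


# Made-up assignments: one species label per sequenced mcfDNA fragment.
assigned = (
    ["Escherichia coli"] * 5
    + ["Klebsiella pneumoniae"] * 3
    + ["Shigella species"] * 2
)
composition = community_composition(assigned)
```

  • The resulting fractions sum to 1 and can then be compared across samples or against disease-associated profiles.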
  • a method is provided of diagnosing or monitoring a disease or disorder, comprising: (a) providing a degenerate primer comprising complementarity to a conserved region, wherein the conserved region spans at least 18 nucleotides of a microbial phylogenetic marker gene, wherein at least 25 adjacent nucleotides upstream or downstream of an end of the conserved region comprises a hypervariable region, and wherein the degenerate primer is oriented to prime polymerase extension of the hypervariable region; (b) providing an additional primer complementary to a repaired version of an adaptor ligated to ends of the mcfDNA; (c) performing, on a sample comprising the mcfDNA, an enrichment polymerase chain reaction (E-PCR) using the degenerate primer and the additional primer, thereby generating enriched mcfDNA fragments; (d) performing an amplification PCR (A-PCR) on the enriched mcfDNA fragments using (i)
  • a method is provided of diagnosing or monitoring a disease or disorder, comprising: (a) performing, on a sample comprising microbial cell-free DNA (mcfDNA), a polymerase chain reaction (PCR) using: (i) a degenerate primer comprising complementarity to a conserved region, wherein the conserved region spans at least 18 nucleotides of a microbial phylogenetic marker gene, and (ii) an additional primer complementary to a repaired version of an adaptor ligated to ends of the mcfDNA, wherein at least 25 adjacent nucleotides upstream or downstream of an end of the conserved region comprises a hypervariable region, and the degenerate primer is oriented to prime polymerase extension of the hypervariable region, thereby generating amplified mcfDNA fragments; (b) sequencing the amplified mcfDNA fragments or obtaining the sequences of the amplified mcfDNA fragments; (c) assigning one
  • a method is provided of diagnosing or monitoring a disease or disorder, comprising: (a) providing mcfDNA sequences, wherein the mcfDNA sequences are derived from a sample, wherein the mcfDNA sequences comprise sequence corresponding to (i) a conserved region spanning at least 18 nucleotides of a microbial phylogenetic marker gene and (ii) a hypervariable region at least 25 adjacent nucleotides upstream or downstream of an end of the conserved region; (b) assigning one or more of the mcfDNA sequences as comprising a microbial DNA signature or as belonging to a particular microbial species based on the closest sequence match of the hypervariable region to a reference microbial database; and (c) optionally, calculating a microbial community composition for the sample, based on a relative abundance of the mcfDNA sequences assigned to each microbial species, wherein one or a combination of: the microbial DNA
  • a method of diagnosing or monitoring a disease or disorder, comprising: (a) sequencing amplified mcfDNA fragments derived from a sample, wherein the amplified mcfDNA fragments comprise sequence corresponding to (i) a conserved region spanning at least 18 nucleotides of a microbial phylogenetic marker gene and (ii) a hypervariable region at least 25 adjacent nucleotides upstream or downstream of an end of the conserved region; (b) assigning one or more of the mcfDNA sequences as comprising a microbial DNA signature or as belonging to a particular microbial species based on the closest sequence match of the hypervariable region to a reference microbial database; and (c) optionally, calculating a microbial community composition for the sample, based on a relative abundance of the mcfDNA sequences assigned to each microbial species, wherein one or a combination of: the microbial DNA signature; the microbial community
  • a method of diagnosing or monitoring a disease or disorder, comprising: (a) performing, on a sample comprising microbial cell-free DNA (mcfDNA), a polymerase chain reaction (PCR) using: (i) a degenerate primer comprising complementarity to a conserved region, wherein the conserved region spans at least 18 nucleotides of a microbial phylogenetic marker gene, and (ii) an additional primer complementary to a repaired version of an adaptor ligated to ends of the mcfDNA, wherein at least 25 adjacent nucleotides upstream or downstream of an end of the conserved region comprises a hypervariable region, and the degenerate primer is oriented to prime polymerase extension of the hypervariable region, thereby generating amplified mcfDNA fragments; and (b) sequencing the amplified mcfDNA fragments or obtaining the sequences of the amplified mcfDNA fragments, wherein the closer the
  • a blood pathogenicity index (BPI) is provided as an example of a microbial signature that can be used to monitor disease progression, medication efficiency and medication side effects in IBD patients or in patients suffering from other inflammatory conditions.
  • the BPI is calculated as the percentage of mcfDNA fragments coming from pro-inflammatory bacteria as part of the total amount of mcfDNA fragments found in a blood plasma sample.
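  • The BPI definition above amounts to a simple percentage. The sketch below assumes per-species fragment counts are already available; which species are treated as pro-inflammatory is not specified in this passage, so the set used here, the species names, and the counts are hypothetical.

```python
def blood_pathogenicity_index(fragment_counts, pro_inflammatory):
    """Percentage of mcfDNA fragments assigned to pro-inflammatory bacteria.

    fragment_counts maps species name to number of mcfDNA fragments.
    pro_inflammatory is the set of species counted toward the index
    (its membership is an assumption of this sketch).
    """
    total = sum(fragment_counts.values())
    if total == 0:
        raise ValueError("no mcfDNA fragments in sample")
    flagged = sum(n for sp, n in fragment_counts.items() if sp in pro_inflammatory)
    return 100.0 * flagged / total


# Made-up counts for one plasma sample.
counts = {
    "Escherichia coli": 30,
    "Klebsiella pneumoniae": 10,
    "Faecalibacterium prausnitzii": 60,
}
bpi = blood_pathogenicity_index(counts, {"Escherichia coli", "Klebsiella pneumoniae"})
```

  • Tracking this value across time points for one patient gives the kind of trajectory shown schematically in Figs. 16-18.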
  • a method is provided of detecting infection or inflammation, comprising: (a) performing, on a sample comprising mcfDNA from a subject, an enrichment polymerase chain reaction (E-PCR) using a degenerate primer and an additional primer, thereby generating enriched mcfDNA fragments, wherein: (i) the degenerate primer comprises complementarity to a conserved region, wherein the conserved region spans at least 18 nucleotides of a microbial phylogenetic marker gene, wherein at least 25 adjacent nucleotides upstream or downstream of an end of the conserved region comprises a hypervariable region, and wherein the degenerate primer is oriented to prime polymerase extension of the hypervariable region, and (ii) the additional primer is complementary to a repaired version of an adaptor ligated to ends of the mcfDNA; (b) performing an amplification PCR (A-PCR) on the enriched mcfDNA fragments using (i)
  • a method is provided of detecting infection or inflammation, comprising: (a) performing, on a sample comprising microbial cell-free DNA (mcfDNA) from a subject, a polymerase chain reaction (PCR) using: (i) a degenerate primer comprising complementarity to a conserved region, wherein the conserved region spans at least 18 nucleotides of a microbial phylogenetic marker gene, and (ii) an additional primer complementary to a repaired version of an adaptor ligated to ends of the mcfDNA, wherein at least 25 adjacent nucleotides upstream or downstream of an end of the conserved region comprises a hypervariable region, and the degenerate primer is oriented to prime polymerase extension of the hypervariable region, thereby generating amplified mcfDNA fragments; (b) sequencing the amplified mcfDNA fragments or obtaining sequences of the amplified mcfDNA fragments; and (c) assigning
  • a method of detecting infection or inflammation, comprising: (a) providing mcfDNA sequences, wherein the mcfDNA sequences are derived from a sample, wherein the mcfDNA sequences comprise sequence corresponding to (i) a conserved region spanning at least 18 nucleotides of a microbial phylogenetic marker gene, and (ii) a hypervariable region at least 25 adjacent nucleotides upstream or downstream of an end of the conserved region; and (b) assigning one or more of the mcfDNA sequences as belonging to a pathogenic bacterial species based on the closest sequence match of the hypervariable region to a reference microbial database, wherein the pathogenic bacterial species comprise one or more of Escherichia species, Escherichia coli, Shigella species, Klebsiella species, Klebsiella pneumoniae, Klebsiella oxytoca, Kluyvera species, Erwinia species, Hae
  • the degenerate primer can be used in combination with an adaptor, such as, for example, an asymmetric linker cassette which is attached to the ends of the cfDNA fragments in the sample.
  • a PCR amplification reaction is performed using the degenerate primer and a primer complementary to the 5’ asymmetrical end of the linker cassette.
  • the degenerate primer is designed to allow for DNA synthesis into the hypervariable region. However, successful PCR amplification of the hypervariable region occurs only when the asymmetric linker cassette is repaired.
  • the asymmetric linker cassette will be repaired only when located downstream from the degenerate primer annealing site, i.e., when the asymmetric linker cassette has been ligated to a mcfDNA fragment that contains the conserved region of the phylogenetic marker gene. In this manner, microbial DNA fragments that originate from the hypervariable region are selectively amplified.
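  • The selection principle described above, in which only fragments containing the annealing site for the degenerate primer are amplified, can be illustrated with a small matcher based on the standard IUPAC nucleotide ambiguity codes. The primer and fragment sequences are toy examples, and for simplicity the primer is written in the same orientation as the strand being searched, so reverse complement handling and mismatch tolerance are left out of this sketch.

```python
# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def find_annealing_site(primer, template):
    """Return the first index where the degenerate primer matches, else -1."""
    k = len(primer)
    for i in range(len(template) - k + 1):
        window = template[i:i + k]
        if all(base in IUPAC[code] for code, base in zip(primer, window)):
            return i
    return -1


def amplifiable(fragments, primer):
    """Keep only fragments that contain an annealing site for the primer."""
    return [f for f in fragments if find_annealing_site(primer, f) != -1]


toy_primer = "ACRYGT"  # hypothetical degenerate primer (R = A/G, Y = C/T)
toy_fragments = ["TTACATGTCC", "TTACGCGTCC", "TTACTAGTCC"]
selected = amplifiable(toy_fragments, toy_primer)
```

  • The first two toy fragments contain a sequence compatible with the degenerate positions and are retained; the third does not and would remain unamplified background.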
  • an affinity tagged (e.g., a biotinylated) version of the degenerate primer can be used.
  • the amplified and biotinylated fragments can be purified using streptavidin coated magnetic beads. This step results in removal of the background cfDNA, especially human cfDNA, that lacks the annealing site for the degenerate primer. This step is referred to as “enrichment PCR (E-PCR)” (see Figure 3).
  • the purified biotinylated fragments are subsequently used in a second PCR reaction.
  • the degenerate primer used in this PCR reaction, referred to as the “amplification PCR (A-PCR)” reaction, recognizes the same conserved DNA region as the primer used in the E-PCR reaction, except that its annealing site located in the conserved region is shifted in the 3’-direction compared to the primer used in the E-PCR reaction (see, e.g., Figure 3).
  • the degenerate primer used in the A-PCR comprises a 3’-end extension of at least three nucleotides relative to the degenerate primer of the E-PCR. This allows for further enrichment of DNA fragments that contain the conserved DNA region while avoiding the amplification of DNA fragments that originated from nonspecific hybridization of the E-PCR primer.
  • phased primers recognizing the linker sequence can be used for the A-PCR reaction (see, e.g., Figure 3).
  • the present inventors experimentally exemplified the methods provided herein by exploiting the phylogenetic resolution of the V3-V4 hypervariable region of the 16S rRNA gene, a multi-copy gene.
  • the present inventors exemplified an approach that exploits the phylogenetic resolution of the rpoB gene, a single copy housekeeping gene. See EXAMPLES 1-5 herein below.
  • Additional embodiments of the invention include use of a conserved DNA sequence as the primer annealing site for more than one site on a phylogenetic marker gene or for a site on two or more different phylogenetic marker genes in a single amplification reaction.
  • the methods provided herein can further include providing one or more additional degenerate primers comprising complementarity to one or more additional conserved regions of the microbial phylogenetic marker gene or to one or more additional genes.
  • the one or more additional genes can be one or more additional microbial phylogenetic marker genes.
  • the one or more additional genes can encode virulence factors, secondary metabolites, or drug degradation pathways, and combinations thereof.
  • the methods provided herein can include a degenerate primer for both the 16S rRNA gene and the rpoB gene.
  • the use of two or more degenerate primers for annealing to two or more conserved regions on a single or two different phylogenetic marker genes may be referred to herein as “multi-loci SPA fragment sequencing”.
  • In EXAMPLES 1-5, the 16S rRNA gene and the RNA polymerase subunit B (rpoB) gene were used, but the SPA fragment sequencing method is very broadly applicable to conserved housekeeping genes, including, but not limited to, the prokaryotic genes coding for the DNA gyrase subunit B (gyrB), the heat shock protein 60 (hsp60), the superoxide dismutase A protein (sodA), the Tu elongation factor (tuf), and the DNA recombinase proteins (including recA, recE).
  • the SPA fragment sequencing method can also be applied on the eukaryotic internal transcribed spacer (ITS) regions ITS1, which is located between the 18S and 5.8S rRNA genes, and ITS2, which is located between the 5.8S and 28S rRNA genes.
  • the SPA fragment sequencing method can also be applied to genes that are unique to pathogenic fungi including the trr1 gene that encodes for thioredoxin reductase; the rim8 gene that encodes for a protein involved in the proteolytic activation of a transcriptional factor in response to alkaline pH; the kre2 gene that encodes for α-1,2-mannosyltransferase; and the erg6 gene that encodes for Δ(24)-sterol C-methyltransferase (Abadio et al, 2011); or any conserved gene from any organism, including bacteria, fungi, parasites, and viruses that is suitable for phylogenetic identification.
  • EBV Epstein-Barr Virus
  • HPV Human Papillomavirus
  • HBV Hepatitis B virus
  • HHV-8 Human Herpesvirus-8
  • MCPyV Merkel Cell Polyomavirus
  • Advantages of the disclosed SPA fragment sequencing methods include an increase in the diversity of hypervariable regions that can be targeted for amplicon analysis as the method only requires one neighboring conserved region to bind the primer (compared with the two required by dual primer approaches). As such, the SPA fragment sequencing method is more adaptable, flexible, and offers greatly improved resolution over current methods, especially when it comes to the analysis of short DNA fragments such as mcfDNA.
  • an asymmetric linker cassette can be used to introduce a DNA sequence that is targeted by an additional primer in the PCR amplification reaction.
  • the adaptors are “defective” or in other words “asymmetric”. This can be accomplished by designing an adaptor as an asymmetric linker cassette where the strand that serves as the template for primer annealing is missing.
  • Typical asymmetric linker cassette configurations include, but are not limited to:
  • a “Y”-shaped linker cassette where two single stranded DNA fragments that are only partially complementary are annealed. This results in an asymmetric linker cassette where one end is double stranded, allowing for ligation, but where the other end is comprised of two single stranded non-complementary DNA strands.
  • a “single arm” linker cassette where a shorter single stranded DNA fragment is annealed to the complementary 3’-end of a longer single stranded DNA fragment. This results in an asymmetric linker cassette with a single stranded 5’-end and a double stranded 3’-end.
  • the single strands of the asymmetric linker cassette can be complementary over a stretch of at least about 16 nucleotides with an annealing temperature of approximately 50°C or higher, allowing for a linker cassette that is stable at room temperature.
  • the single strand of the asymmetric linker can also contain 4 to 6 random nucleotides that constitute a Unique Molecular Identifier (UMI) to correct PCR induced errors including PCR amplification bias, improve sequencing accuracy, and correct for fragment copy number.
  • the asymmetric linker cassette includes a 3’ sticky end.
  • the 3’ sticky end can be formed by a single nucleotide, such as, for example, thymine.
  • the terminal 3’ nucleotide can be a dideoxy nucleotide that functions as a chain-elongation inhibitor of DNA polymerase.
  • SPA fragment sequences derived from the same microorganisms are identical except for the length of the sequenced fragment, which varies as a function of the distance between the gene specific primer annealing site (e.g., 16SV4-R781-Rd2 primer) and the end of the mcfDNA fragment, to which the UMI is attached.
  • the uniqueness of a sequenced DNA fragment can be inferred from a combination of its UMI, the length of the SPA fragment, and the sequence of the SPA fragment.
  • the number of UMIs can be smaller than the number of fragments that cover a specific DNA region from the same microorganism.
  • because the pool of UMIs does not have to be in excess of the number of fragments that cover a specific DNA region from the same microorganism, each of the UMIs present on the DNA or RNA fragments is not required to be unique. Therefore, the term “UMI” is not intended to be understood according to the dictionary definition of “unique”.
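  • The idea that fragment uniqueness is inferred from the combination of UMI, SPA fragment length, and SPA fragment sequence can be sketched as a deduplication key. This is a minimal sketch under stated assumptions: reads are given as (UMI, sequence) pairs for one target region, matching is exact, and no correction of sequencing or PCR errors is attempted. The function name and the reads are hypothetical.

```python
def count_unique_fragments(reads):
    """Count distinct original fragments among PCR-amplified reads.

    reads is an iterable of (umi, sequence) tuples. Two reads are treated
    as copies of the same original fragment only if UMI, fragment length,
    and fragment sequence all agree (exact matching, an assumption here).
    """
    keys = {(umi, len(seq), seq) for umi, seq in reads}
    return len(keys)


# Made-up reads: the first two are PCR duplicates of one fragment.
toy_reads = [
    ("ACGT", "AAGGCC"),
    ("ACGT", "AAGGCC"),
    ("ACGT", "AAGG"),    # same UMI, different length: a different fragment
    ("TTGA", "AAGGCC"),  # same sequence, different UMI: a different fragment
]
unique_fragments = count_unique_fragments(toy_reads)
```

  • Because the fragment length adds information beyond the UMI, the same UMI can recur on different fragments without those fragments being collapsed.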
  • the UMIs can range in length from 4 to 6 nucleotides, encoding for 256 or 4096 unique sequences, respectively.
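  • The UMI counts quoted above follow from four possible nucleotides at each random position. A short check, with a hypothetical helper name:

```python
from itertools import product


def umi_space(length):
    """Number of distinct UMIs of a given length over the alphabet A, C, G, T."""
    return 4 ** length


sizes = {n: umi_space(n) for n in (4, 5, 6)}

# Cross-check the formula by explicit enumeration of all 4-nucleotide UMIs.
all_4mers = ["".join(p) for p in product("ACGT", repeat=4)]
```

  • For lengths 4 and 6 this gives 256 and 4096 sequences, matching the figures in the text.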
  • N any nucleotide (A, G, C or T)
  • A adenine
  • G guanine
  • C cytosine
  • T thymine.
  • the presence of the two thymine nucleotides allows for determination of the frequency of errors that occur in the UMI sequence during PCR amplification.
  • the asymmetric linker cassette will only be repaired when located downstream from the degenerate primer annealing site.
  • the term "repaired" when used in the context of the asymmetric linker cassette means that a new DNA strand is created in the PCR reaction that is complementary at the 5' end of the asymmetric linker cassette. DNA synthesis initiated from the degenerate primer into the asymmetric linker cassette will restore the defective DNA strand complementary to the 5’-end of the linker and in this manner the asymmetric linker cassette is repaired. In subsequent PCR cycles this strand is used for primer annealing, allowing for the amplification of the hypervariable region.
  • the resulting amplicons can be further amplified in a third PCR reaction to introduce two Unique Dual Indexes (UDI), one at each end of the amplicons, and, for example, the Illumina sequencing anchors P5 and P7.
  • These indexes provide distinct barcode sequences for each of the i5 and i7 index reads for a specific sequencing library, allowing for multiplexing of sequencing libraries. After sequencing, the barcodes are used to assign each sequenced fragment to its specific library.
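  • The assignment of sequenced fragments to libraries by their dual index pair can be sketched as a lookup. The index sequences, library names, and function name below are hypothetical, and the sketch uses exact matching only; production demultiplexers typically also tolerate a limited number of index mismatches.

```python
def demultiplex(reads, libraries):
    """Assign reads to libraries by their (i7, i5) index pair.

    reads is a list of (i7, i5, sequence) tuples.
    libraries maps an (i7, i5) pair to a library name.
    Reads with an unknown index combination are returned separately.
    """
    by_library = {name: [] for name in libraries.values()}
    unassigned = []
    for i7, i5, seq in reads:
        name = libraries.get((i7, i5))
        if name is None:
            unassigned.append(seq)
        else:
            by_library[name].append(seq)
    return by_library, unassigned


# Hypothetical index pairs for two libraries.
toy_libraries = {("AAGGTT", "CCTTAA"): "sample_1", ("GGCCAA", "TTAACC"): "sample_2"}
toy_index_reads = [
    ("AAGGTT", "CCTTAA", "ACGTAC"),
    ("GGCCAA", "TTAACC", "TTGCAA"),
    ("AAGGTT", "TTAACC", "GGGGAA"),  # mixed pair: not a valid combination
]
assigned_reads, unassigned_reads = demultiplex(toy_index_reads, toy_libraries)
```

  • Requiring both indexes to match a known pair is what lets unique dual indexing flag reads whose i5 and i7 indexes come from different libraries.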
  • the method includes one or more of the following steps as outlined in Figure 2:
  • Blood plasma collection and isolation of cfDNA using standard protocols.
  • Cell-free DNA can be extracted from 0.5 ml blood plasma, typically yielding 0.1 ng to 10 ng to be used for sequencing.
  • cfDNA can also be isolated from urine, saliva, stool, spinal fluid, cerebral fluid, and other biopsy samples. For this step many commercial kits are available that either use columns or magnetic beads for cfDNA isolation from liquid biopsy samples, including blood and urine.
  • total DNA that has been fragmented to smaller sizes via mechanical shearing or enzymatic digestion can be used.
  • End repair and 5’-phosphorylation of cfDNA fragments followed by the 3’ addition of a deoxy-adenine to create a 3’-sticky end formed by a single adenine nucleotide using standard protocols.
  • a typical protocol to process cfDNA includes end repair (blunting and 5' phosphorylation), 3' A-tailing, followed by adaptor ligation.
  • the fragment ends are repaired by blunting and 5' phosphorylation with a mixture of enzymes, such as T4 polynucleotide kinase (PNK) and T4 DNA polymerase (T4 DNA pol).
  • This end repair step is followed by 3' A-tailing at 37 °C using a mesophilic polymerase such as Klenow Fragment 3'-5' exonuclease minus (Head et al, 2014). Many commercial kits are available to perform this step.
  • Annealing of two partially complementary single stranded DNA fragments results in a “Y”-shaped or single arm DNA linker cassette.
  • On one end, the two strands of the linker cassette are not complementary.
  • On the other end, where the two strands are complementary, the linker cassette includes a 3’ sticky end formed by a single thymine nucleotide. Due to the sticky ends, the only possible ligation is between cfDNA fragments and asymmetric linker cassettes, while self-ligation of linker cassettes and repaired cfDNA fragments is blocked.
  • Single point linker cassette repair as part of the Enrichment PCR step (E-PCR).
  • the terms “ePCR” and “E-PCR” are herein used interchangeably. PCR is performed on the ligation product using the following primers: (a) the Linker.v2-amp primer that recognizes the repaired 5’ asymmetrical end of the SPA-Linker-UMI.v2-Y-SP linker cassette; (b) one or more primers that recognize the primer annealing site specific for the conserved region of the one or more phylogenetic marker genes. DNA amplification initiated from the gene-specific SPA primer will result in the repair of the asymmetric linker cassette but only when this cassette is bound to a cfDNA fragment that contains the primer annealing site on the conserved region.
  • the primer that recognizes the repaired 5’ asymmetrical end of the linker cassette can anneal, and PCR amplification is initiated.
  • With the 16SV4-R785-bio primer, this will result in the amplification of DNA sequences located upstream of position 785 of the 16S rRNA gene.
  • a 5’-biotinylated version of the one or more gene specific primers, e.g., the 16SV4-R785-bio primer
  • Using streptavidin-coated beads in the subsequent purification step allows for the removal of the non-biotinylated DNA fragments.
  • the enrichment of the amplified fragments is referred to herein as performing an “affinity purification” and the biotin is referred to as an affinity purification group or an affinity tag.
  • Affinity purification can be performed using any combination of steps and types of affinity purification groups known to those of ordinary skill in the art.
  • an additional PCR step referred to as the amplification PCR (A-PCR) step, is used to reduce background amplification resulting from nonspecific primer annealing to e.g. human DNA fragments.
  • the terms “aPCR” and “A-PCR” are herein used interchangeably.
  • the A-PCR can use the Linker.v2-amp primer in combination with one primer annealing to the conserved region extended at its 3’-end by 3 or more nucleotides (e.g. primer 16SV4-R781-Rd2) compared to the primer used in the E-PCR step (STEP 4) (e.g. 16SV4-R785-bio; see Figure 3).
  • a set of primers can be used that are identical except that their sequences are phased by a single nucleotide, as shown in Figure 3. Therefore, as an alternative to using a single Linker.v2-amp primer, a set of six primers, referred to as Linker-aPCR mix, can be used (see Table 1 and Figure 3 for their sequences).
  • the forward (Linker-aPCR mix) and reverse (e.g. 16SV4-R781-Rd2) primers include an additional 5’ sequence corresponding to the Illumina Read-1 and Read-2 sequences, respectively, to allow for indexing as part of sequencing library preparation, and these sequences are shown in bold in Figure 3, with the linker sequence being underlined.
  • Indexing PCR reaction (I-PCR)
  • Unique Dual Indexes (UDI) and Illumina sequencing anchors (P5 and P7) are added to the amplified SPA fragments using P5-I5-Rd1 and P7-I7-Rd2 primers (see Table 1).
  • the I-PCR reactions are performed using unique sets of UDI for each sample, subsequently allowing the pooling/multiplexing of the libraries, after which fragments are paired-end sequenced using NGS Illumina sequencing, e.g. on the Illumina NEXTSEQ 1000 (Illumina, Inc., San Diego, CA).
  • sequenced fragments that all share the sequence of the gene specific primer (e.g., 16SV4-R781-Rd2 primer) followed by sequences that vary in length and nucleotide composition. Sequences derived from the same microorganisms will be identical except for the length of the sequenced fragment, which will vary as a function of the distance between the gene specific primer annealing site (e.g., 16SV4-R781-Rd2 primer) and the end of the mcfDNA fragment.
  • Table 1 Overview of primer sequences.
  • The following nucleotide codes were used: A: adenine; G: guanine; C: cytosine; T: thymine; R: purine (A or G); V: not T (A, C, or G); H: not G (A, T or C); D: not C (A, T or G); N: any nucleotide (A, G, C or T).
  • the 3’-end extension to the degenerate primers used in the A-PCR reaction compared to the degenerate primers used in the E-PCR reaction is highlighted in bold/italics.
  • the Rd1 and Rd2 sequences plus the standard linker sequences to connect the original primer sequences (i.e. linker and degenerate primer sequences) to the Rd1 or Rd2 sequences used for multiplex ILLUMINA NEXTERA sequencing are included in A-PCR primers and are shown in bold and underlined, respectively. *: Phosphorothioated modification to protect nucleotide from nuclease degradation; /5Phos/: 5’-end phosphorylation modification; /5Biosg/: 5’-end biotin modification.
  • the pooling ratio indicates the ratio of primers combined in a specific primer mix. n/a: not applicable.
  • the processing and analysis of the SPA fragment sequences includes one or more of the following steps:
  • Paired-end reads are filtered based on read quality, expected amplicon structure, and length.
  • the UMI (unique molecular identifier) of each read is recorded for downstream deduplication and denoising.
  • the longest read in each bin is searched against a database of bacterial 16S rRNA genes or genes corresponding to the microbial phylogenetic marker gene being targeted to aid the assembly into consensus sequences.
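The UMI recording and longest-read selection described above can be sketched as follows. This is a minimal illustration only: it assumes that reads are binned by their UMI, which the text does not state explicitly, and the function name `longest_read_per_umi` and the example reads are hypothetical.

```python
from collections import defaultdict


def longest_read_per_umi(reads):
    """Group (umi, sequence) pairs by UMI and keep the longest read per bin.

    Assumption: a "bin" is the set of reads sharing one UMI.
    """
    bins = defaultdict(list)
    for umi, seq in reads:
        bins[umi].append(seq)
    # max() returns the first of several equally long reads
    return {umi: max(seqs, key=len) for umi, seqs in bins.items()}


# Invented example reads; the sequences carry no biological meaning.
reads = [
    ("ACGT", "GGATTAGATACCC"),
    ("ACGT", "GGATTAGATACCCTGGTA"),
    ("TTAG", "GGATTAGA"),
]
representatives = longest_read_per_umi(reads)
```

Each representative read would then be searched against the marker gene database; that search step is not shown here.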
  • EXAMPLE 4 describes the design and use of rpoB gene specific primers.
  • The utility of the methods of the invention is exemplified in EXAMPLES 1 - 5 of the present disclosure.
  • In EXAMPLE 1 of the present disclosure, the inventors describe an improved SPA fragment sequencing protocol targeting the conserved region upstream of the hypervariable V4 region of the 16S rRNA gene.
  • the improved method resulted in the percentage of 16S rRNA gene derived fragments (as identified by Blastn) increasing from less than 0.01% to an average of 35%, a 3,500 fold increase (see Table 2).
  • In EXAMPLE 2 of the present disclosure, a comparison between SPA fragment sequencing and amplicon sequencing targeting the V3-V4 or the V6 regions of the 16S rRNA gene is described using samples that were spiked with DNA of defined microbial communities. SPA fragments targeting the 16S rRNA gene V4 were superior to amplicon-based methods, providing a more accurate description of the relative abundances and the phylogenetic identification of the bacteria present in the samples. SPA fragment sequencing was also less sensitive to the presence of contaminating DNA (see Figures 5 to 7).
  • In EXAMPLE 3 of the present disclosure, the inventors demonstrate that the mcfDNA in blood plasma of cancer patients can be used to generate SPA fragments targeting the 16S gene V4 region. These SPA fragments enabled distinguishing between cancer-free individuals and those with colorectal cancer as well as between patients with lung, breast and colorectal cancer, including patients with early-stage cancer (see Figures 8 to 11).
  • In EXAMPLE 4 of the present disclosure, the inventors demonstrate that the mcfDNA in blood plasma of cancer patients can be used to generate SPA fragments targeting the hypervariable region of the rpoB gene located upstream of position 1327. These SPA fragments made it possible to distinguish, with 85 percent accuracy, between healthy individuals and patients with lung and breast cancer, including patients with early-stage cancer.
  • In EXAMPLE 5 of the present disclosure, the inventors demonstrate that the mcfDNA in blood plasma of IBD patients can be used to monitor disease severity and progression in IBD patients.
  • SPA fragment sequences were used to determine the blood pathogenicity index (BPI), which is calculated as the percentage of mcfDNA fragments coming from pro-inflammatory bacteria as a fraction of the total amount of mcfDNA fragments found in a blood plasma sample.
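The BPI definition above amounts to a simple percentage. The following sketch illustrates that arithmetic under stated assumptions: the function name `blood_pathogenicity_index`, the per-taxon fragment counts, and the choice of taxa in the example are hypothetical and are not taken from the disclosure's data.

```python
def blood_pathogenicity_index(fragment_counts, pro_inflammatory_taxa):
    """Percentage of mcfDNA fragments assigned to pro-inflammatory taxa.

    fragment_counts: mapping of taxon name to number of assigned fragments.
    pro_inflammatory_taxa: set of taxon names counted as pro-inflammatory.
    """
    total = sum(fragment_counts.values())
    if total == 0:
        raise ValueError("sample contains no assigned mcfDNA fragments")
    pathogenic = sum(
        n for taxon, n in fragment_counts.items() if taxon in pro_inflammatory_taxa
    )
    return 100.0 * pathogenic / total


# Invented counts for illustration only.
counts = {"Escherichia": 30, "Klebsiella": 10, "Bacteroides": 40, "Faecalibacterium": 20}
bpi = blood_pathogenicity_index(counts, {"Escherichia", "Klebsiella"})
```

In practice the set of pro-inflammatory taxa would follow the list given in this disclosure rather than the two genera used here.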
  • sequencing 300,000 fragments yields approximately 100,000 fragments that can be identified as 16S rRNA gene sequences. Since 100,000 sequencing reads represent the standard depth for amplicon-based sequencing for complex microbial community analysis, the latest Illumina NEXTSEQ instruments allow for an unprecedented number of samples to be sequenced in parallel. For example, the Illumina NEXTSEQ 6000 can theoretically collect 20 billion reads in a single run, which would correspond to approximately 65,000 paired-end sequenced samples. This is in sharp contrast to the scalability of deep sequencing of total cfDNA, where only approximately 200 samples can be processed in a single paired-end sequencing run (Poore et al, 2020).
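The throughput estimate can be checked with a short calculation. The sketch below uses only the figures quoted in the text (20 billion reads per run, 300,000 fragments per sample); the variable names are illustrative.

```python
reads_per_run = 20_000_000_000   # theoretical reads in a single run, as quoted above
reads_per_sample = 300_000       # fragments sequenced per sample, as quoted above

# Whole samples that fit in one run at this depth.
samples_per_run = reads_per_run // reads_per_sample

# Fraction of fragments expected to be identifiable as 16S rRNA gene sequences.
fraction_16s = 100_000 / reads_per_sample
```

The integer division gives roughly 66,000 samples, in line with the approximate figure of 65,000 stated in the text.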
  • SPA fragment sequencing can be useful as part of the general health screening. Unlike the stool microbiome, the microbiome of colonizing and infecting bacteria will be relatively stable, with changes occurring when the relation between host and microbes is changing.
  • SPA fragment sequencing provides an “open” diagnostics approach to detect any bacterium or fungus based on the presence of its mcfDNA in peripheral blood.
  • SPA fragment sequencing can provide an important non-invasive method for (early) detection and identification of infectious and colonizing bacteria using mcfDNA from peripheral blood samples, which can subsequently be linked to a broad range of diseases, including: screening for tuberculosis and other diseases caused by Mycobacterium species; determining pulmonary infection risks and causes in cystic fibrosis patients; determining the risk and onset of sepsis in patients with compromised immune systems; detection of opportunistic bacterial pathogens originating from the oral cavity that have been linked to Alzheimer’s disease, pancreatic cancer and other serious conditions such as endocarditis; women’s health issues including Chlamydia linked to mucopurulent cervicitis, pelvic inflammatory disease, tubal factor infertility, ectopic pregnancy and cervical cancer; detection and monitoring of progression of cancer; monitoring of minimal residual disease after oncology treatments; detection and monitoring of progression and minimal residual disease of breast cancer including triple negative breast cancer; detection of esophageal cancer, precancerous
  • SPA fragment sequencing represents a quantum leap forward in applying mcfDNA sequencing as a high-resolution, high-throughput and low-cost routine test in disease detection, patient monitoring, risk assessment and large-scale population screenings using mcfDNA informed biomarkers.
  • the microbial footprint obtained with SPA fragment sequencing combined with the mutational footprint and methylation footprint that are currently being used as biomarkers for the detection, monitoring and prognostics of cancers can provide a powerful tool for improved early detection and monitoring of progression of various types of cancer.
  • the term “microbial footprint” is used herein interchangeably with the terms “microbial community composition” and “community composition”.
  • the microbial footprint is used in combination with alternative tests, such as tests that focus on the analysis of circulating tumor DNA (ctDNA), for the detection of cancer, guiding tumor-specific treatments, monitoring of cancer treatment, and monitoring periods of symptom free remission and identifying potential early recurrence of cancer.
  • Including the microbial footprint can increase the specificity and sensitivity of these screening tests, e.g. tests for the detection of specific cancers, such as early stage adenomas and carcinomas in colorectal cancer; or tests for early multi-cancer detection, such as the GALLERI cancer screening test and the GUARDANT test.
  • the microbial footprint can be used to calculate the blood pathogenicity index (BPI). Included as pathogenic or pro-inflammatory bacteria in the BPI are the relative abundances of Enterobacteriaceae including Enterobacter species, Escherichia species (e.g. E. coli), Shigella species, Klebsiella species (e.g. K. pneumoniae and K. oxytoca), Kluyvera species, Erwinia species, Haemophilus species, Salmonella species (e.g. S. typhi and S. enterica)
  • Listeria species L. monocytogenes
  • pathogenic Clostridium species such as C. difficile, C. perfringens and C. botulinum
  • Veillonella species such as V. atypica, V. dispar and V. oral
  • Pseudomonas species like P. aeruginosa
  • Stenotrophomonas species like S. maltophilia
  • pathogenic Staphylococcus species like S. aureus
  • Porphyromonas species
  • Fusobacterium species like F. nucleatum and F. varium.
  • The results described in EXAMPLE 5 show that SPA fragment sequencing of mcfDNA in blood can be successfully used to determine the BPI of patients suffering from inflammatory diseases such as IBD.
  • the BPI can be used as a surrogate biomarker for both medication performance and assessing the risk for serious or life-threatening complications, such as an increased risk of inflammation.
  • the BPI can be used as a surrogate biomarker for diseases where patients are at risk of serious or life-threatening infections due to treatment with immune repressing drugs, including rheumatoid arthritis, juvenile idiopathic arthritis, psoriatic arthritis, ankylosing spondylitis, plaque psoriasis, acute Graft versus Host disease.
  • Immune suppressing medications that can benefit from patient monitoring using the BPI to determine the risk of serious or life-threatening infections include, but are not limited to, abatacept (Orencia); adalimumab (Humira); infliximab (Remicade); Tofacitinib (Xeljanz); risankizumab (Skyrizi); anakinra (Kineret); azathioprine (Azasan); certolizumab (Cimzia); cyclosporine (Gengraf, Neoral, Sandimmune); etanercept (Enbrel); golimumab (Simponi); infliximab (Remicade); methotrexate (Otrexup, Rasuvo, Trexall); rituximab (Rituxan); steroids including dexamethasone, methylprednisolone (Medrol), prednisolone (Orapred ODT, Prelone
  • the BPI can be used to monitor the performance and possible side effects, when used for a prolonged period, of immune suppressing medications including corticosteroids, 5-aminosalicylic acid, inflammatory response modulators, calcineurin phosphatase inhibitors, purine metabolism antagonists, and biologicals (antibodies) including α4β7 integrin blockers, anti-TNFα, anti-IL12/IL23 and JAK-kinase inhibitors.
  • the BPI can be used to monitor the performance and possible side effects of other medications/treatments that can cause complications of inflammation, including the gastrointestinal tract, or affect the immune response, such as immune checkpoint inhibitor drugs like Pembrolizumab, Nivolumab, and Cemiplimab as anti-PD-1 antibodies, Ipilimumab as an anti-CTLA-4 antibody, as well as Atezolizumab, Avelumab, and Durvalumab as anti-PD-L1 antibodies.
  • the BPI can be used in combination with alternative tests, such as tests that focus on the analysis of inflammatory biomarkers in stool and blood, such as the stool calprotectin levels or C-reactive protein (CRP), erythrocyte sedimentation rate (ESR) and plasma viscosity (PV) in blood.
  • the BPI can also be used in combination with alternative microbial biomarkers, such as lipo-polysaccharide (LPS) levels.
  • the SPA fragment sequencing approach provided herein is applicable to analyzing microbial DNA compositions in any sample type, especially samples having low amounts of small fragment microbial DNA.
  • a DNA region is identified in a suitable microbial phylogenetic marker gene that has the following characteristics:
  • An overview of an exemplary SPA primer design method is shown in Figure 4.
  • phylogenetic marker gene such as rpoB, cpn60, 16S rRNA, gyrB, tuf or other phylogenetic marker gene or conserved housekeeping gene including, but not limited to, those used by CheckM (Parks et al, 2015)
  • 50-100 species are initially selected that cover the prokaryotic diversity, including members of the phylum Proteobacteria (including representative α-, β-, γ-, δ- and ε-Proteobacteria), the phylum Firmicutes (including representatives for the classes Bacilli, Clostridia, Erysipelotrichia and Negativicutes), and the phyla Actinobacteria and Fusobacteria.
  • Marker genes for these species can be aligned using a multiple sequence alignment tool like ClustalW.
  • the SPA algorithm can subsequently be used to identify conserved regions as putative annealing sites for primer candidates by looking for the highest “average sequence variance” scores over 25 nucleotide-long DNA regions among this limited set of sequences. This can be performed as follows:
  • a completely conserved nucleotide position will have 100% of one nucleotide and 0% for the other three nucleotides, and a variance of 0.25.
  • a completely nonconserved region will have 25% of each nucleotide and a variance of 0.
  • Primer candidates are prioritized based on their “average sequence variance” scores.
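The scoring described above can be sketched as follows. The reference values in the text (0.25 for a fully conserved position, 0 for an evenly mixed one) match the sample variance of the four nucleotide frequencies, so this sketch uses Python's `statistics.variance`. The function names, the treatment of gaps (ignored), and the toy alignment are assumptions made for illustration; this is not presented as the SPA algorithm itself.

```python
from statistics import variance

NUCLEOTIDES = "ACGT"


def position_variance(column):
    """Sample variance of the A/C/G/T frequencies in one alignment column."""
    bases = [b for b in column.upper() if b in NUCLEOTIDES]  # gaps are ignored
    if not bases:
        raise ValueError("column contains no nucleotides")
    freqs = [bases.count(n) / len(bases) for n in NUCLEOTIDES]
    return variance(freqs)


def average_sequence_variance(alignment, start, width=25):
    """Mean per-position variance over a window of aligned sequences."""
    columns = zip(*(seq[start:start + width] for seq in alignment))
    scores = [position_variance("".join(col)) for col in columns]
    return sum(scores) / len(scores)


def rank_windows(alignment, width=25):
    """Return (score, start) pairs, most conserved window first."""
    length = min(len(seq) for seq in alignment)
    scored = [
        (average_sequence_variance(alignment, i, width), i)
        for i in range(length - width + 1)
    ]
    return sorted(scored, reverse=True)


# Toy alignment: the first four columns are conserved, the last four are not.
toy = ["ACGTACGT", "ACGTTGCA", "ACGTGTAC", "ACGTCATG"]
best_score, best_start = rank_windows(toy, width=4)[0]
```

For primer annealing sites the highest-scoring windows are of interest, whereas for the adjacent hypervariable regions the lowest scores are preferred.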
  • Primer candidates are evaluated for key properties including the level of primer degeneracy and annealing temperature (>50°C).
  • the sequences from the complete curated marker gene database are aligned to these conserved regions to determine their nucleotide compositions.
  • the conservation of their 3’ nucleotide is evaluated (it must be >99% conserved among entries) and their “average sequence variance” scores are calculated (highly conserved regions have the highest score) and used to rank and select primer leads, prioritizing primers with the highest score.
  • a curated marker gene database and an algorithm are used to determine the “average sequence variance” for the regions adjacent to the primer annealing site.
  • Primers with adjacent 25 nucleotide-long and 50 nucleotide-long regions with ideally an average sequence variance of ≤0.15 and ≤0.075, respectively, are prioritized based on the lowest score.
  • the algorithm also identifies the resolution of phylogenetic identification for the regions adjacent to each primer lead by determining the number of unique SPA fragments. SPA primers with the highest phylogenetic resolution are added to the SPA primer repository.
  • the same approach can be used for other organisms, including fungi and viruses. For fungi, the ITS1 and ITS2 regions can be used as phylogenetic markers.
  • Figure 12 shows nucleotide statistics for the rpoB gene region 1327-1355 and degenerate sequence (GAYGAYATYGAYCAYYTNGGHAAYCGHGC (SEQ ID NO: 2)), which is the reverse complement sequence recognized by the degenerate primers RpoBl-R1327 and RpoB-R1330.
  • the relative abundance of a nucleotide at a specific position was calculated using the nucleotide sequences of 47,505 aligned unique rpoB genes from the PATRIC database and used to design the degenerate sequence, which is provided from 5’ to 3’ using the following nucleotide codes: A: adenine; G: guanine; C: cytosine; T: thymine; R: purine (A or G); Y: pyrimidine (T or C); H: not G (A, T or C); N: any nucleotide (A, G, C or T); *: presence of an ambiguous sequence at a specific rpoB gene position.
  • the percentages of highly conserved nucleotide sequences used to determine the consensus sequence for the degenerate primer are highlighted. The position of the region is based on the nucleotide sequence of the Escherichia coli rpoB gene.
  • the proposed degenerate primer sequences are matched to the human genome sequence and the number of hits with increased number of allowed mismatches is determined.
  • a primer should ideally have two or more mismatches with the human genome.
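The degeneracy of a candidate sequence and its mismatches against a target can be illustrated with the standard IUPAC nucleotide codes. This is a simplified sketch: the function names are hypothetical, only one strand is scanned, and a real screen against the human genome would use a dedicated alignment tool rather than the naive window scan shown here.

```python
# Standard IUPAC nucleotide codes mapped to the bases they allow.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer):
    """Number of distinct sequences encoded by a degenerate sequence."""
    count = 1
    for code in primer:
        count *= len(IUPAC[code])
    return count


def mismatches(primer, target):
    """Positions where the target base is not allowed by the degenerate code."""
    return sum(base not in IUPAC[code] for code, base in zip(primer, target))


def min_mismatches(primer, genome):
    """Fewest mismatches over all same-length windows of one strand."""
    k = len(primer)
    return min(mismatches(primer, genome[i:i + k]) for i in range(len(genome) - k + 1))


# Degenerate sequence quoted in the text (SEQ ID NO: 2).
seq_id_2 = "GAYGAYATYGAYCAYYTNGGHAAYCGHGC"
n_variants = degeneracy(seq_id_2)
```

A candidate would pass the criterion above if `min_mismatches` against the human reference were two or more.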
  • a method for amplifying microbial cell free DNA including: (a) providing one or more degenerate primers that are complementary to one or more conserved regions, wherein the one or more conserved regions span at least 18 nucleotides of one or more phylogenetic marker genes designated for a set of reference microbes, wherein at least 25 adjacent nucleotides upstream or downstream of an end of the one or more conserved regions comprise a hypervariable region, and wherein the one or more degenerate primers are oriented to prime polymerase extension of the hypervariable region; (b) providing one or more additional primers that are complementary to a repaired version of an adaptor ligated to ends of the mcfDNA; (c) performing, on a sample comprising the mcfDNA, an enrichment polymerase chain reaction (E-PCR) using the one or more degenerate primers and the one or more additional primers, thereby generating one or
  • a method for amplifying microbial cell free DNA that includes: performing an enrichment polymerase chain reaction (E-PCR) on a sample comprising microbial cell-free DNA (mcfDNA) wherein enriched mcfDNA fragments are generated using: (i) one or more degenerate primers comprising complementarity to one or more conserved regions, wherein the one or more conserved regions span at least 18 nucleotides of one or more phylogenetic marker genes designated for a set of reference microbes, and wherein at least one of the one or more degenerate primers comprises an affinity purification group, and (ii) an additional primer comprising complementarity to an end of the mcfDNA.
  • At least 25 adjacent nucleotides upstream or downstream of an end of the one or more conserved regions comprise a hypervariable region
  • the one or more degenerate primers are oriented to prime polymerase extension of the hypervariable region
  • enriched amplified mcfDNA fragments are generated in the E-PCR.
  • the method includes performing an amplification PCR (A-PCR) on the enriched mcfDNA fragments using: (i) a modification of at least one of the one or more degenerate primers of the E-PCR, wherein the modified degenerate primer comprises a 3’-end extension of at least three nucleotides relative to a degenerate primer of the one or more degenerate primers of the E-PCR, and (ii) the additional primer of the E-PCR, thereby generating enriched amplified mcfDNA fragments.
  • the methods of the invention may further include sequencing the enriched amplified mcfDNA fragments.
  • the methods of the invention may further include performing an affinity purification reaction on the enriched mcfDNA fragments.
  • the methods of the invention include, using a computer system, a step: (a) aligning the amplified mcfDNA fragment sequences on a sequence of the one or more degenerate primers and assigning matching sequences from the hypervariable region as representative of the same microbial species; (b) for the microbial species in part (a), searching a database of the one or more phylogenetic marker genes against the amplified mcfDNA fragment sequences and assigning a microbial species based on the closest match; and (c) optionally, for the one or more phylogenetic marker genes, calculating a microbial community composition based on the relative abundance of the amplified mcfDNA fragment sequences assigned to each microbial species.
  • the phylogenetic marker gene of the one or more phylogenetic marker genes can be a multicopy gene, in which case the method includes correcting for copy number variation between each species.
  • the method includes determining the microbial community composition by calculating a mathematical mean of the relative abundance of each microbial species for each of the two or more phylogenetic marker genes.
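The community composition steps above (relative abundance, copy number correction for multicopy genes, and averaging over two or more marker genes) can be sketched as follows. The function names, species labels, fragment counts and copy numbers are all hypothetical values chosen for illustration.

```python
def relative_abundance(fragment_counts, copy_numbers=None):
    """Relative abundance per species, optionally corrected for gene copy number.

    copy_numbers maps species to marker gene copies per genome; species not
    listed are treated as single-copy.
    """
    copy_numbers = copy_numbers or {}
    corrected = {
        species: n / copy_numbers.get(species, 1)
        for species, n in fragment_counts.items()
    }
    total = sum(corrected.values())
    return {species: value / total for species, value in corrected.items()}


def mean_composition(per_gene):
    """Arithmetic mean of each species' relative abundance across marker genes."""
    species = set().union(*per_gene)
    return {
        sp: sum(comp.get(sp, 0.0) for comp in per_gene) / len(per_gene)
        for sp in species
    }


# Invented counts: a multicopy gene (with assumed copy numbers) and a single-copy gene.
comp_16s = relative_abundance(
    {"species_A": 70, "species_B": 30},
    copy_numbers={"species_A": 7, "species_B": 3},
)
comp_rpob = relative_abundance({"species_A": 60, "species_B": 40})
community = mean_composition([comp_16s, comp_rpob])
```

Without the copy number correction, species_A would appear more abundant in the multicopy gene data than in the single-copy gene data.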
  • the microbial community composition can include one or more members of Eukaryotes, bacteria, viruses, or fungi.
  • a system for amplifying microbial cell free DNA (mcfDNA), including: a reaction vessel; a reagent dispensing module; and software to execute the method of any of the methods provided herein, wherein the method is executed at least partially robotically.
  • one or more nucleotides at positions represented by D or N can be replaced by inosine.
  • a kit for amplifying microbial cell free DNA (mcfDNA).
  • the kit includes: (a) an adaptor for ligating to ends of mcfDNA; (b) a degenerate primer comprising complementarity to a conserved region, wherein the degenerate primer comprises an affinity purification group, wherein the conserved region spans at least 18 nucleotides of a microbial phylogenetic marker gene, wherein at least 25 adjacent nucleotides upstream or downstream of an end of the conserved region comprises a hypervariable region, and the degenerate primer is oriented to prime polymerase extension of the hypervariable region; (c) an additional primer complementary to a repaired version of the adaptor; (d) a modified degenerate primer comprising a 3’-end extension of at least three nucleotides relative to the degenerate primer; and instructions for performing: (i) an enrichment polymerase chain reaction (E-PCR) on the mcfDNA having the adapt
  • the kit can further include one or more additional degenerate primers comprising complementarity to one or more additional conserved regions of the microbial phylogenetic marker gene or to one or more additional genes.
  • the one or more additional genes can be one or more additional microbial phylogenetic marker genes.
  • the one or more additional genes can encode virulence factors, secondary metabolites, or drug degradation pathways, and combinations thereof.
  • a method for diagnosing or monitoring a disease or disorder includes (a) providing one or more degenerate primers that are complementary to one or more conserved regions, wherein the one or more conserved regions span at least 18 nucleotides of one or more phylogenetic marker genes designated for a set of reference microbes, wherein at least 25 adjacent nucleotides upstream or downstream of an end of the one or more conserved regions comprise a hypervariable region, and wherein the one or more degenerate primers are oriented to prime polymerase extension of the hypervariable region; (b) providing one or more additional primers that are complementary to a repaired version of an adaptor ligated to ends of the mcfDNA; (c) performing, on a sample comprising the mcfDNA, an enrichment polymerase chain reaction (E-PCR) using the one or more degenerate primers and the one or more additional primers, thereby generating one or more enriched mcfDNA fragments; (d)
  • method of diagnosing or monitoring a disease or disorder, that includes: performing, on a sample comprising microbial cell-free DNA (mcfDNA), a polymerase chain reaction (PCR) using: (i) one or more degenerate primers comprising complementarity to one or more conserved regions, wherein the one or more conserved regions span at least 18 nucleotides of one or more phylogenetic marker genes designated for a set of reference microbes and (ii) an additional primer comprising complementarity to a repaired version of an adaptor ligated to ends of the mcfDNA.
  • At least 25 adjacent nucleotides upstream or downstream of an end of the one or more conserved regions comprise a hypervariable region
  • the one or more degenerate primers are oriented to prime polymerase extension of the hypervariable region, wherein amplified mcfDNA fragments are generated in the PCR.
  • the method includes sequencing the amplified mcfDNA fragments or obtaining the sequences of the amplified mcfDNA fragments and assigning one or more of the amplified mcfDNA fragment sequences as comprising a microbial DNA signature or as belonging to a particular microbial species based on the closest sequence match of the hypervariable region to a reference microbial database.
  • the method includes calculating a microbial community composition for the sample, based on the relative abundance of the mcfDNA fragment sequences assigned to each microbial species.
  • one or a combination of: the microbial DNA signature; the microbial community composition; or (i) a presence, (ii) an absence, or (iii) a relative abundance of one or a combination of the microbial species, associated with a disease or disorder indicates that the subject has the disease or disorder.
  • a method of diagnosing or monitoring a disease or disorder that includes: providing mcfDNA sequences, wherein the mcfDNA sequences are derived from a sample, wherein the mcfDNA sequences comprise sequence corresponding to (i) a conserved region spanning at least 18 nucleotides of at least one microbial phylogenetic marker gene and (ii) a hypervariable region at least 25 adjacent nucleotides upstream or downstream of an end of the conserved region.
  • the method includes assigning one or more of the mcfDNA fragment sequences as comprising a microbial DNA signature or as belonging to a particular microbial species based on the closest sequence match of the hypervariable region to a reference microbial database.
  • the method may include calculating a microbial community composition for the sample, based on the relative abundance of the mcfDNA fragment sequences assigned to each microbial species.
  • one or a combination of: the microbial DNA signature; the microbial community composition; or (i) a presence, (ii) an absence, or (iii) a relative abundance of one or a combination of the microbial species, associated with a disease or disorder indicates that the subject has the disease or disorder.
  • a method of diagnosing or monitoring a disease or disorder including sequencing amplified mcfDNA fragments derived from a sample, wherein the mcfDNA sequences comprise sequence corresponding to (i) a conserved region spanning at least 18 nucleotides of at least one microbial phylogenetic marker gene and (ii) a hypervariable region at least 25 adjacent nucleotides upstream or downstream of an end of the conserved region.
  • the method includes assigning one or more of the amplified mcfDNA fragment sequences as comprising a microbial DNA signature or as belonging to a particular microbial species based on the closest sequence match of the hypervariable region to a reference microbial database.
  • the method can optionally include calculating a microbial community composition for the sample, based on the relative abundance of the mcfDNA fragment sequences assigned to each microbial species.
  • one or a combination of: the microbial DNA signature; the microbial community composition; or (i) a presence, (ii) an absence, or (iii) a relative abundance of one or a combination of the microbial species, associated with a disease or disorder indicates that the subject has the disease or disorder.
  • a method is provided of diagnosing or monitoring a disease or disorder, that includes: performing, on a sample comprising microbial cell-free DNA (mcfDNA), a polymerase chain reaction (PCR) using: (i) one or more degenerate primers comprising complementarity to one or more conserved regions, wherein the one or more conserved regions span at least 18 nucleotides of one or more phylogenetic marker genes designated for a set of reference microbes and (ii) an additional primer comprising complementarity to a repaired version of an adaptor ligated to ends of the mcfDNA, wherein at least 25 adjacent nucleotides upstream or downstream of an end of the one or more conserved regions comprise a hypervariable region, and the one or more degenerate primers are oriented to prime polymerase extension of the hypervariable region, wherein amplified mcfDNA fragments are generated in the PCR.
  • the method includes sequencing the amplified mcfDNA fragments or obtaining the sequences of the amplified mcfDNA fragments.
  • the disease or disorder comprises detection of infection or inflammation in one or a combination of Irritable Bowel Disease, Crohn’s disease, Ulcerative colitis, rheumatoid arthritis, juvenile idiopathic arthritis, psoriatic arthritis, ankylosing spondylitis, or plaque psoriasis.
  • a method of detecting infection or inflammation comprising: (a) performing, on a sample comprising mcfDNA from a subject, an enrichment polymerase chain reaction (E-PCR) using one or more degenerate primers and one or more additional primers, thereby generating one or more enriched mcfDNA fragments, wherein: (i) the one or more degenerate primers are complementary to one or more conserved regions, wherein the one or more conserved regions span at least 18 nucleotides of one or more phylogenetic marker genes designated for a set of reference microbes, wherein at least 25 adjacent nucleotides upstream or downstream of an end of the one or more conserved regions comprise a hypervariable region, and wherein the one or more degenerate primers are oriented to prime polymerase extension of the hypervariable region, and (ii) the one or more additional primers are complementary to a repaired version of an adaptor ligated to ends of the mcfDNA;
  • a method of detecting infection or inflammation comprising: performing, on a sample comprising microbial cell-free DNA (mcfDNA) from a subject, a polymerase chain reaction (PCR) using: (i) one or more degenerate primers comprising complementarity to one or more conserved regions, wherein the one or more conserved regions span at least 18 nucleotides of one or more phylogenetic marker genes designated for a set of reference microbes and (ii) an additional primer comprising complementarity to a repaired version of an adaptor ligated to ends of the mcfDNA, wherein at least 25 adjacent nucleotides upstream or downstream of an end of the one or more conserved regions comprise a hypervariable region, and the one or more degenerate primers are oriented to prime polymerase extension of the hypervariable region, thereby generating amplified mcfDNA fragments; sequencing the amplified mcfDNA fragments or obtaining the sequence
  • the disease or disorder can include detection of infection or inflammation in Irritable Bowel Disease and the one or more pathogenic bacterial members can further include one or more of Ruminococcus gnavus, Enterococcus faecium, or Enterococcus faecalis.
  • the methods provided herein for detecting infection or inflammation can include calculating a percentage of the amplified mcfDNA fragments coming from the one or more bacterial pathogenic members as a fraction of a total amount of the amplified mcfDNA fragments in the sample, wherein the calculated percentage of less than one percent indicates an absence of infection or inflammation and the calculated percentage of greater than one percent indicates the presence of infection or inflammation.
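The percentage rule above can be sketched as follows. The function names and example counts are hypothetical. The source states only "less than one percent" and "greater than one percent", so how a value of exactly 1% is treated here is an assumption.

```python
def pathogen_percentage(counts, pathogens):
    """Percentage of amplified mcfDNA fragments assigned to pathogenic members.

    counts: {species: number of amplified fragments}
    pathogens: set of species names treated as pathogenic members
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no amplified mcfDNA fragments were counted")
    pathogenic = sum(n for species, n in counts.items() if species in pathogens)
    return 100.0 * pathogenic / total


def indicates_infection(percentage, threshold=1.0):
    """True when the percentage exceeds the threshold.

    Assumption: exactly 1% is treated as 'not above threshold', since the
    source does not say how that boundary case is handled.
    """
    return percentage > threshold


# Hypothetical fragment counts for one sample.
example_counts = {
    "Enterococcus faecalis": 3,
    "Enterococcus faecium": 2,
    "other": 195,
}
example_pathogens = {"Enterococcus faecalis", "Enterococcus faecium"}
example_pct = pathogen_percentage(example_counts, example_pathogens)
```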
  • the presence of infection or inflammation can be in the gastrointestinal tract.
  • the methods for detecting infection or inflammation provided herein can be used in combination with determination of a level of lipo-polysaccharide (LPS) in the sample.
  • the disease or disorder may include tuberculosis and other diseases caused by Mycobacterium species; determining pulmonary infection risks and causes in cystic fibrosis patients; determining the risk and onset of sepsis in patients with compromised immune systems; detection of opportunistic bacterial pathogens originating from the oral cavity that have been linked to Alzheimer’s disease, pancreatic cancer and other serious conditions such as endocarditis; women’s health issues including Chlamydia linked to mucopurulent cervicitis, pelvic inflammatory disease, tubal factor infertility, ectopic pregnancy and cervical cancer; detection and monitoring of progression of cancer; monitoring of minimal residual disease after oncology treatments; detection and monitoring of progression and minimal residual disease of breast cancer including triple negative breast cancer; detection of esophageal cancer, precancerous colonic polyps and early stage colorectal cancer, and detection and monitoring of progression and minimal residual disease of gastrointestinal cancers in general; detection
  • the disease or disorder can include colon cancer, breast cancer, or non-small cell lung cancer.
  • the disease or disorder can include a specific type of cancer, an early or later stage cancer, or a specific tumor subtype driven by the location of the tumor, and combinations thereof.
  • the disease or disorder can include IBD.
  • the disease or disorder can include Crohn’s disease or ulcerative colitis.
  • calculating the blood pathogenicity index is used to monitor disease progression, medication efficiency and medication side effects in IBD patients.
  • the methods provided herein can be used with one or a combination of inflammatory biomarkers in stool and blood, such as the stool calprotectin levels or C-reactive protein (CRP), erythrocyte sedimentation rate (ESR) and plasma viscosity (PV) in blood that provide information on the disease or disorder.
  • the combination with these inflammatory biomarkers can indicate whether the sample is likely to have the disease or the disorder.
  • the BPI can be used in combination with alternative microbial biomarkers, such as lipo-polysaccharide (LPS) levels.
  • the additional primer in the A-PCR can be used with a set of at least two phased primers, wherein the phased primers are identical to the additional primer except that the phased primers are phased by a single nucleotide.
  • the set of phased primers comprises five or more phased primers.
  • the conserved region of the microbial phylogenetic marker gene can have an average sequence variance score of greater than 0.175.
  • the hypervariable region can have an average sequence variance score of less than 0.15, 0.1, or 0.075.
  • the conserved region can span 18 to 40 nucleotides, 20 to 30 nucleotides, or 22 to 28 nucleotides of the phylogenetic marker gene.
  • the at least 25 adjacent nucleotides upstream or downstream of an end of the conserved region can include less than 150 adjacent nucleotides, less than 75 adjacent nucleotides, or less than 50 adjacent nucleotides.
  • the ends of the mcfDNA can include an adaptor and the additional primer is complementary to a repaired version of the adaptor.
  • the adaptor can be a double stranded asymmetric linker cassette comprising a 5’ asymmetrical end and a 3’ end where the two strands are complementary.
  • the asymmetric linker cassette can be a Y-shaped linker cassette or a single arm linker cassette.
  • the additional primer is complementary to a repaired 5’ end of the asymmetric linker cassette and, in the E-PCR, polymerase extension from the degenerate primer results in repair of the asymmetric linker cassette.
  • the mcfDNA can have an average fragment length of less than about 100 bp.
  • the percentage of the mcfDNA in the sample can be less than about 0.05%, less than about 0.1%, less than about 1%, less than about 5%, or less than about 15%.
  • the set of reference microbes can be eubacterial microbes. In other instances, the set of reference microbes are eukaryotic, fungal, viral, or bacterial.
  • the methods and kits provided herein can include performing one or more reactions to repair the ends of the mcfDNA.
  • the degenerate primer can include one or more sequencing adaptor sequences.
  • the method can include adding one or more sequencing adaptor sequences to the amplified mcfDNA fragments in a second amplification reaction or to the enriched mcfDNA fragments in the A-PCR.
  • the microbial phylogenetic marker gene comprises rpoB.
  • the microbial phylogenetic marker gene comprises 16S rRNA.
  • the microbial phylogenetic marker gene comprises a combination of two or more of rpoB, cpn60, tuf, or 16S rRNA.
  • the phylogenetic marker gene can include rpoB and the conserved region can include nucleotide positions 1327 - 1355 based on the Escherichia coli rpoB gene sequence.
  • the microbial phylogenetic marker gene comprises 16S rRNA and the conserved region comprises nucleotide positions 781-805 based on the Escherichia coli 16S rRNA gene sequence.
  • the degenerate primer can include one or both of RpoB-ePCR mix and RpoB-aPCR mix.
  • the degenerate primer comprises one or both of 16SV4-ePCR and 16SV4-aPCR.
  • the microbial phylogenetic marker gene comprises 16S rRNA and the conserved region comprises a V1, V2, V3, V4, V5, or V6 region of the 16S rRNA phylogenetic marker gene.
  • the degenerate primer can include RpoB-ePCR mix, RpoB-aPCR mix, 16SV4-ePCR, or 16SV4-aPCR, and combinations thereof.
  • the microbial phylogenetic marker gene can include DNA gyrase subunit B (gyrB), heat shock protein 60 (hsp60), superoxide dismutase A protein (sodA), TU elongation factor (tuf), or DNA recombinase proteins (including recA and recE).
  • trr1 gene that encodes for thioredoxin reductase
  • rim8 gene that encodes for a protein involved in the proteolytic activation of a transcriptional factor in response to alkaline pH
  • kre2 gene that encodes for α-1,2-mannosyltransferase
  • erg6 gene that encodes for Δ(24)-sterol C-methyltransferase.
  • the phylogenetic marker gene can be a fungal phylogenetic marker gene or a human fungal phylogenetic marker gene.
  • the human fungal phylogenetic marker gene comprises nuclear ribosomal internal transcribed spacer region 1 (ITS1) or nuclear ribosomal internal transcribed spacer region 2 (ITS2).
  • the amplified mcfDNA fragments or enriched amplified mcfDNA fragments can include mcfDNA from one or a combination of members of the Ascomycota, Basidiomycota and Mucoromycota, including Alternaria species, Aspergillus species, Blastomyces species, Candida species, Capnodiales species, Cladosporium species, Malassezia species, Phaeosphaeria species, Pseudozyma species, Saccharomyces species, Sporobolomyces species, Vishniacozyma species, and Yarrowia species.
  • the methods provided herein for amplifying mcfDNA can include adding, in the PCR or in both the E-PCR and the A-PCR, a primer to a functional gene to determine the presence of the functional gene in the sample.
  • the functional gene can be a pathogenicity factor, a PKS gene cluster essential for colibactin synthesis, or a choline trimethylamine-lyase gene.
  • the methods for amplifying mcfDNA can include adding, in the PCR or in both the E-PCR and the A-PCR, a primer to a viral gene to determine the presence of the viral gene in the sample.
  • the viral gene can include a human DNA- or RNA-based oncovirus gene.
  • the oncovirus can be one or a combination of Epstein-Barr Virus (EBV), Human Papillomavirus (HPV), Hepatitis B virus (HBV), Human Herpesvirus-8 (HHV-8), or Merkel Cell Polyomavirus (MCPyV).
  • the sample can include a bodily fluid, a tissue, or an extracellular bodily substance.
  • the bodily fluid can include whole blood, a blood fraction, serum, plasma, or combinations thereof.
  • the sample includes a biopsy sample from a solid tumor, a skin graft, a liquid biopsy sample other than blood, or combinations thereof.
  • the sample includes a stool sample.
  • the amplified mcfDNA fragments or the enriched amplified mcfDNA fragments can include mcfDNA from one or more members of: Flavobacterium sp., Staphylococcus auricularis, Pseudomonas toyotomiensis, Rheinheimera sediminis, Finegoldia magna, Parvularcula sp., Pseudomonas stutzeri, Pseudomonas soyae, Pseudomonas saponiphila, Pseudomonas sp., Peptoniphilus harei, Quisquiliibacterium sp., Azoarcus sp., Sphingopyxis terrae, uncultured Clostridiales bacterium strain UMGS460, Staphylococcus schweitzeri, Flavobacterium erciyesense, Rhodoc
  • tuberculosis-simiae (“tuberculosis-simiae” clade), Staphylococcus aureus, Staphylococcus argenteus, Staphylococcus schweitzeri, Pseudomonas aeruginosa, Burkholderia cepacia complex, Burkholderia ubonensis, Burkholderia species Nov., Burkholderia multivorans, Burkholderia pseudomultivorans, Burkholderia pseudomallei, Burkholderia mallei, Trinickia species, Burkholderia thailandensis, Haemophilus species, Haemophilus influenzae, Haemophilus parainfluenzae, Streptococcus species at the various group and species levels, Streptococcus dysgalactiae, Streptococcus pyogenes, Streptococcus mutans, Streptococcus suis, Streptococc
  • Streptococcus oralis, Streptococcus gordonii, Streptococcus uberis, Streptococcus parasanguinis, Streptococcus sanguinis, Streptococcus parauberis, Streptococcus infantarius, Streptococcus iniae, Streptococcus salivarius, Streptococcus thermophilus, Streptococcus vestibularis, Streptococcus bovis, Streptococcus gallolyticus subsp. gallolyticus, Streptococcus gallolyticus subsp. macedonicus, Streptococcus gallolyticus subsp.
  • Fusobacterium hwasookii, Fusobacterium canifelinum, Fusobacterium nucleatum subsp. animalis, Fusobacterium periodonticum, Fusobacterium necrophorum subsp. funduliforme, Fusobacterium mortiferum, Fusobacterium varium, Fusobacterium nucleatum subsp. nucleatum, Fusobacterium ulcerans, Fusobacterium nucleatum subsp.
  • Enterobacter species, Escherichia species, Escherichia coli, or Klebsiella species, Klebsiella pneumoniae, Klebsiella oxytoca, Shigella species, Kluyvera species, Erwinia species, Salmonella species, Salmonella typhi, Salmonella enterica, Listeria species, Listeria monocytogenes, Veillonella species, Veillonella atypica, Veillonella dispar, Veillonella oral, Stenotrophomonas species, Stenotrophomonas maltophilia, and combinations thereof.
  • the present invention may be implemented using hardware, software, or a combination thereof and may be implemented in one or more computer systems or other processing systems. In one aspect, the invention is directed toward one or more computer systems capable of carrying out the functionality described herein.
  • cfDNA isolation was performed using the Qiagen QIAamp ccfDNA/RNA Kit on 1.0 ml blood plasma from a healthy volunteer. To confirm the presence of mcfDNA in the blood sample, total cfDNA was isolated from 1.0 ml blood plasma and deep sequencing was used to determine the percentage of mcfDNA. In the case of this healthy donor, the percentage of mcfDNA was approximately 0.1% to 1.0% of the total cfDNA (data not shown). This is considerably lower than typically found in blood samples from cancer patients, where it ranged from approximately 1% to 4% (Poore et al, 2020).
  • the E-PCR reaction was introduced during protocol iteration 3. Together with the A-PCR reaction, it results in enrichment of target fragments and reduces the level of fragments amplified as the result of non-specific primer hybridization.
  • the E-PCR reaction was further improved during protocol iteration 7 by using the biotinylated 16S-V4-R785-Bio primer, which targets position 785 - 805 upstream of the V4 region of the 16S rRNA gene.
  • the linker-ePCR primer is not biotinylated.
  • Non-specific primer hybridization was further reduced by increasing the E-PCR annealing temperature from 55°C to 60°C (protocol iteration 4). o Prepare the Reaction according to the table below.
  • This step is essential to remove unincorporated biotinylated primer from the E-PCR reaction before streptavidin bead purification.
  • o Let the AMPURE beads equilibrate at RT and vortex to resuspend the beads.
  • o Add 50 µl of beads into each tube (containing 25 µl E-PCR reaction from step 4), mix by pipetting 10 times and incubate at RT for 5 min.
  • o Place the tubes on a magnetic stand for 1 min. Remove and discard the supernatant.
  • o Keep the tubes on the magnetic stand and add 200 µl of 80% ethanol. Incubate at RT for 30 seconds. Remove and discard the ethanol supernatant. Repeat once for a total of two washes.
  • o Remove the tubes from the magnetic stand.
  • This step removes non-biotinylated DNA fragments, especially human cfDNA, not amplified during the E-PCR reaction and is essential to reduce non-specific PCR amplification during the A-PCR and I-PCR steps (protocol iteration 7).
  • o Vortex the Streptavidin C1 beads and add 88 µl into a LOBIND tube.
  • o Add 200 µl of 1X B&W Buffer and mix by pipetting 10 times. Place the tube on a magnet for 1 min and discard the supernatant.
  • o Remove the tube from the magnet and resuspend the beads in 88 µl of 1X B&W Buffer by pipetting 10 times. Place the tube on a magnet for 1 min and discard the supernatant.
  • Linker-aPCR primer mix can be used in combination with SPA-linker-UMI.v2-Y-SP for the A-PCR reaction (protocol iteration
  • AMPURE bead cleanup at 2X 8 samples
  • o Let the AMPURE beads equilibrate at RT and vortex to resuspend the beads.
  • o Add 100 µl of beads into each tube (containing 50 µl A-PCR reaction, with streptavidin C1 beads, from step 7), mix by pipetting 10 times and incubate at RT for 5 min.
  • o Place the tubes on a magnetic stand for 1 min. Remove and discard the supernatant.
  • o Keep the tubes on the magnetic stand and add 200 µl of 80% ethanol. Incubate at RT for 30 seconds. Remove and discard the ethanol supernatant. Repeat once for a total of two washes.
  • o Remove the tubes from the magnetic stand.
  • This step allows for subsequent sequencing and multiplexing of the SPA libraries.
  • the total number of amplification cycles for the E-PCR, A-PCR and I-PCR reaction should not exceed 40 to 45 (protocol iteration 5).
  • o Prepare the Reaction on ice according to the table below, using a unique set of NEXTERA UDI primers for each sample from step 8.
  • o Place the reactions in the thermocycler and run the following program: heated lid on, 95°C for 10 min, [95°C for 30 sec, 55°C for 30 sec, 72°C for 30 sec] for 7 cycles, 72°C for 5 min, 4°C on hold.
  • PhiX DNA as Spike-IN instead of 5% was introduced during protocol iteration 2.
  • Dilute the library pool to 150 pM including 5-30% PhiX DNA as Spike-IN, using Dilution Buffer (10 mM Tris-HCl, pH 8.0 + 0.05% Tween 20) in 100 µl final volume.
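The dilution step can be worked through with the usual C1·V1 = C2·V2 relation. The sketch below assumes that the PhiX percentage is a molar fraction of the final 150 pM pool and that both stock concentrations are known in pM. Neither assumption is stated in the source, and the stock values in the example are hypothetical.

```python
def dilution_volumes(library_stock_pm, phix_stock_pm, phix_fraction,
                     final_pm=150.0, final_volume_ul=100.0):
    """Return (library_ul, phix_ul, buffer_ul) for the final loading pool.

    Assumption: phix_fraction (e.g. 0.05 to 0.30) is the molar share of PhiX
    in the final pool, so the library contributes final_pm * (1 - fraction).
    """
    if not 0.0 <= phix_fraction <= 1.0:
        raise ValueError("phix_fraction must be between 0 and 1")
    # C1 * V1 = C2 * V2, solved for V1 for each component.
    library_ul = final_pm * (1.0 - phix_fraction) * final_volume_ul / library_stock_pm
    phix_ul = final_pm * phix_fraction * final_volume_ul / phix_stock_pm
    buffer_ul = final_volume_ul - library_ul - phix_ul
    if buffer_ul < 0:
        raise ValueError("stocks are too dilute to reach the final concentration")
    return library_ul, phix_ul, buffer_ul


# Hypothetical stocks: library pool at 1000 pM, PhiX at 1000 pM, 20% PhiX.
library_ul, phix_ul, buffer_ul = dilution_volumes(1000.0, 1000.0, 0.20)
```

With these hypothetical stocks the three volumes add up to the 100 µl final volume, and the Dilution Buffer makes up the remainder after the library and PhiX are added.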
  • the library pool was subsequently sequenced following the 150 bp paired-end read sequencing protocol from Illumina using the Illumina iSEQ100 i1 Reagent v2 (300-cycle) kit on an Illumina iSEQ100 instrument according to the manufacturer’s instructions.
  • Adaptors and the primer (16SV4-R781) are trimmed from the sequences.
  • the remaining reads of different lengths are deduplicated, while recording the number of duplicates per sequence to retain read abundance.
  • the deduplicated reads are aligned to the 16S gene database using the basic local alignment search tool (BLAST, Altschul et al, 1990). Only BLAST hits where the read starts aligning at the first nucleotide, and where the read aligned at 90% identity over at least 90% of its length, are retained along with the taxonomy of the best hit.
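The deduplication and hit-filtering steps above can be sketched in Python. The dictionary keys `qstart`, `qend` and `pident` mirror the column names of BLAST tabular output. Reading "at 90% identity" as "at least 90%" is an assumption, and the function names are hypothetical.

```python
from collections import Counter


def deduplicate(reads):
    """Collapse identical reads, keeping the duplicate count per sequence."""
    return Counter(reads)


def keep_hit(hit, read_length, min_identity=90.0, min_coverage=0.90):
    """Apply the retention rule to one BLAST hit for one read.

    hit: dict with 'qstart' and 'qend' (1-based query coordinates) and
    'pident' (percent identity). The alignment must start at the first
    nucleotide of the read and cover at least min_coverage of its length.
    """
    aligned_length = hit["qend"] - hit["qstart"] + 1
    return (
        hit["qstart"] == 1
        and hit["pident"] >= min_identity
        and aligned_length / read_length >= min_coverage
    )


# Hypothetical trimmed reads and one hypothetical hit for a 100 nt read.
read_counts = deduplicate(["ACGT", "ACGT", "TTGA"])
good_hit = {"qstart": 1, "qend": 95, "pident": 92.0}
```

Keeping the counts from `deduplicate` alongside the retained hits is what allows read abundance to be restored after alignment of the unique sequences.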
  • Table 2 summarizes the improvement steps evaluated and links them to two important criteria: the percentage of reads that can be identified, based on sequence homology using Blastn, as 16S rRNA gene derived SPA sequences; and the number of unique 16S rRNA gene derived SPA sequences.
  • Table 2 Description of the various changes implemented in chronological order during subsequent iterations of the SPA fragment sequencing protocol. The effect of each change on improving the ability to generate unique SPA sequencing fragments was evaluated by determining the number of unique reads representing SPA fragment sequences obtained from the 16S rRNA V4 hypervariable region.
  • Surprisingly, the improved method resulted in the percentage of 16S rRNA gene derived fragments (as identified by Blastn) increasing from less than 0.01% to an average of 35%, a 3,500-fold increase (see Table 2). Compared to the original protocol, major improvements were made, especially with the introduction of the enrichment (E-PCR) step (protocol iteration 3).
  • Protocol improvements included increasing the PCR annealing temperature (protocol iteration 4), decreasing the total number of PCR cycles to avoid over amplification of fragments (protocol iteration 5), and use of the biotinylated primer for affinity purification (protocol iteration 7) as part of the E-PCR step.
  • Although the use of the biotinylated primer in the E-PCR reaction did not increase the percentage of reads identified as derived from the 16S rRNA V4 gene region, it significantly increased the number of unique reads, thereby improving the sensitivity of the SPA fragment sequencing method.
  • the improved SPA fragment sequencing protocol was successfully used for the stratification of cancer patients.
  • the improved SPA fragment sequencing protocol can be fully automated in a 96-well format, or multiples thereof, starting with the DNA extraction step to the library quantification step.
  • the cfDNA isolation step, which is currently done using the column-based Qiagen QIAamp ccfDNA/RNA Kit, should preferably be performed using a magnetic bead-based extraction kit, such as the Maxwell® HT ccfDNA Kit (Promega) or the APOSTLE MINIMAX system (Beckman Coulter).
  • the current standard for community composition analysis is 16S rRNA gene-based sequencing.
  • this method has several shortcomings, including PCR-based biases resulting in over- and/or underrepresentation of species, or the formation of chimeric sequences, resulting in misrepresentation of community composition. This can have major effects when analyzing microbial communities linked to disease, including mcfDNA signatures in blood originating from tumor microbiomes or indicative of a leaky gut in the case of IBD patients, among others.
  • Dilution series (10-fold dilution steps) were made of sheared MCI DNA to a final concentration of 250 pg/µl. Subsequently, 10 µl was added to 0.5 ml blood plasma or 0.5 ml water. DNA from the spiked samples was extracted, resuspended in 30 µl water, and SPA fragment sequencing libraries were prepared using 18 µl DNA solution according to the protocol described in EXAMPLE 1. In addition, 2 µl DNA solution was used for DNA amplification and sequencing library construction using the V3-V4 16S rRNA primer pair or the V6 rRNA primer pair. The primer sequences are shown in Table 1. Libraries were sequenced on an iSEQ100 instrument using the 150 bp paired-end sequencing protocol from Illumina as mentioned in EXAMPLE 1.
  • Adaptors and the primers are trimmed from the sequences.
  • DADA2 (Callahan et al, 2016) is used to filter reads and create amplicon sequence variants (ASVs) with read counts for community composition.
  • consensus sequences and community composition are computed: a. The UMI (unique molecular identifier) of each read is recorded for downstream deduplication and denoising. b. Forward and reverse reads are merged if possible and clustered into bins of similar sequence. c. The longest read in each bin is searched against a database of bacterial 16S rRNA genes to aid the assembly into consensus sequences. d. Consensus sequence abundances are calculated based on the number of reads contributing to each consensus sequence, taking into consideration the number of deduplicated reads identified in step 4a.
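Step d can be illustrated with a small UMI-aware counting sketch. It assumes that reads sharing a UMI within the same consensus sequence are PCR duplicates and should be counted once. The source does not spell out the exact weighting, so this is one plausible reading, and the identifiers are hypothetical.

```python
from collections import defaultdict


def consensus_abundances(reads):
    """Relative abundance per consensus sequence from (umi, consensus_id) pairs.

    Assumption: each distinct UMI within a consensus sequence represents one
    original molecule, so repeated UMIs are collapsed before counting.
    """
    umis_per_consensus = defaultdict(set)
    for umi, consensus_id in reads:
        umis_per_consensus[consensus_id].add(umi)
    counts = {cid: len(umis) for cid, umis in umis_per_consensus.items()}
    total = sum(counts.values())
    return {cid: n / total for cid, n in counts.items()} if total else {}


# Hypothetical reads: the UMI "AAT" appears twice for consensus "c1".
example_reads = [("AAT", "c1"), ("AAT", "c1"), ("GGC", "c1"), ("TTA", "c2")]
abundances = consensus_abundances(example_reads)
```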
  • Amplicon sequence variants (ASV) and consensus sequence genera are assigned with at least 50% confidence.
  • With V3-V4 16S amplicon sequencing, after removal of the sequencing adaptors, fragments of approximately 130 bp are obtained that all cover the same positions of the V3 and V4 regions, so ASVs longer than 130 bp cannot be built in silico.
  • the large majority of mcfDNA fragments derived from the V3-V4 region will either contain the V3 or the V4 16S rRNA gene primer annealing site, but not both.
  • SPA fragment sequencing provided a significantly better representation of the species present in MCI when spiked into water or blood plasma. For instance, when MCI was spiked into blood plasma, bacteria belonging to the genera Enterococcus, Salmonella, Listeria, Bacillus and Staphylococcus were either underrepresented or could not be identified at the genus level based on V3 and V4 16S rRNA gene sequences, thus showing that SPA fragment sequencing has significantly better phylogenetic resolution and quantification of community composition.
  • SPA fragment sequencing showed superior community composition and species identification on blood plasma instead of water. Similar results were obtained for water or plasma spiked with 0.5 pg/µl MCI DNA.
  • V6 16S rRNA gene amplicon sequencing gave the highest number of sequence reads that could not be assigned to a bacterial genus, making it the least suitable method for the analysis of mcfDNA. It also gave a significant overrepresentation of Escherichia/Shigella but failed to identify the closely related genera Enterococcus and Salmonella.
  • the MCI sample spiked into blood plasma contained some longer DNA fragments, allowing in silico construction of a larger consensus sequence representing Salmonella, which could be assigned with 99% confidence to the genus Salmonella. Since the sequence of the V3-V4 16S rRNA gene region of Salmonella is very similar to that of other Enterobacteriaceae, only SPA fragment sequencing, which can generate consensus sequences longer than the 130 bp V3 and V4 amplicon sequences, can distinguish between Enterobacteriaceae and successfully identify Salmonella at levels similar to the MCI spike (see Figure 7).
  • SPA fragment sequencing was performed on cfDNA isolated from the blood plasma of 8 colorectal cancer (CRC) patients, 12 breast cancer patients, and 12 non-small cell lung cancer patients. All 32 blood plasma samples were obtained from the same hospital. Based on their histology the CRC and non-small cell lung cancers were identified as adenocarcinomas. Metadata for all 32 patients are provided in Table 3.
  • The type of cancer as well as the stage of the cancer are indicated in Figures 8 to 10.
  • the figures illustrate that CRC, breast and non-small cell lung cancer patients with either early or later stage cancer could be correctly stratified. Furthermore, one of the CRC patients, although correctly diagnosed based on the PCA analysis, was found to be an outlier (datapoint located in the lower right quadrant of Figure 8). It turned out that the location of the tumor was rectal instead of in the colon, as was the case for the other CRC patients. Thus, the mcfDNA signatures obtained by SPA fragment sequencing are specific enough to distinguish tumor subtypes, e.g. driven by the location of the tumor.
  • Table 3 Metadata for the 12 lung cancer patients, 12 breast cancer patients and 8 colorectal cancer patients used in this study. Information on the gender, ethnicity, race, cancer diagnosis, results of the tumor histology, tumor site and development stage are provided.
  • SPA identified indicates if a patient was successfully stratified based on its mcfDNA pattern using SPA fragment sequencing of the 16S rRNA gene V4 hypervariable region followed by pairwise PCA analysis using the combinations (a) Colon Cancer versus Non-Small Cell Lung Cancer, and (b) Colon Cancer versus Breast Cancer, of which the results are shown in Figure 8 and Figure 9, respectively.
  • nucleotide sequence upstream of the conserved region 1327-1352 is the most variable, as indicated by the lowest average variance scores of 0.0667 for both the 25 nucleotide-long and 50 nucleotide-long regions.
  • the identification of a hypervariable DNA region in the rpoB gene upstream of the conserved region 1327-1352 was unexpected, as it falls outside of the region that has previously been identified and used for RpoB gene amplicon sequencing (Ogier et al, 2019).
  • the number of putative annealing sites of the proposed degenerate primer sequences to the human genome sequence (Reference: GCF_000001405.40_GRCh38.p14_genomic.fna) with increased number of allowed mismatches is determined.
  • Results for the degenerate primers 16SV4-R785 and RpoB-R1327 are shown in Table 5.
  • a primer should have no hits with zero or one mismatch, and ideally no more than 10 hits with two mismatches, to the human genome. Based on the results from this analysis, both the 16S-V4-R and RpoB-R1327 primers should be suitable for SPA fragment sequencing.
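The off-target screen described above can be illustrated with a brute-force scan that counts sites matching a degenerate primer with zero, one or two mismatches on both strands. The source does not name the tool used for this analysis. The sketch below is only meant to show the counting logic on short sequences, since a whole-genome scan would need an indexed aligner, and the sequences in the example are hypothetical.

```python
# IUPAC nucleotide codes mapped to the bases they allow.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper-case ACGT sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def count_hits(primer, genome, max_mismatches=2):
    """Return {mismatch count: number of sites} over both strands of genome."""
    hits = {m: 0 for m in range(max_mismatches + 1)}
    k = len(primer)
    for strand in (genome, reverse_complement(genome)):
        for i in range(len(strand) - k + 1):
            mismatches = 0
            for code, base in zip(primer, strand[i:i + k]):
                if base not in IUPAC[code]:
                    mismatches += 1
                    if mismatches > max_mismatches:
                        break  # too many mismatches, skip this site
            else:
                hits[mismatches] += 1
    return hits


# Hypothetical degenerate primer scanned against a hypothetical short sequence.
example_hits = count_hits("ACGY", "ACGTAAAA", max_mismatches=1)
```

A primer passing the criterion above would show zero counts for the zero- and one-mismatch keys and at most 10 for the two-mismatch key.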
  • both the 16SV4-R785-Bio and 16SV4-R781 primers have a degeneracy of 9 possible combinations.
  • For the RpoB-ePCR and RpoB-aPCR primers for the E-PCR and A-PCR steps, we noticed that, as a result of the degeneracy, the region 1327-1355 had a total of 2,304 possible combinations (see consensus sequence in Figure 12). Using primers that included all degeneracies combined with an E. coli specific rpoB gene primer failed to amplify a target fragment ranging from position 1220 to 1352 of the rpoB gene using E. coli DNA as positive control (results not shown).
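Primer degeneracy is the product of the number of bases allowed at each position. The sketch below computes it from IUPAC codes. The example sequences are hypothetical and are not the primers of Table 1 or the consensus of Figure 12. They are chosen only so that the products equal the degeneracies quoted in the text (9 and 2,304).

```python
from math import prod

# Number of bases each IUPAC code allows.
IUPAC_OPTIONS = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,
    "B": 3, "D": 3, "H": 3, "V": 3,
    "N": 4,
}


def degeneracy(primer):
    """Number of distinct sequences represented by a degenerate primer."""
    return prod(IUPAC_OPTIONS[code] for code in primer.upper())


# Hypothetical sequences: two three-fold positions give 3 * 3 = 9, while
# eight two-fold and two three-fold positions give 2**8 * 3**2 = 2304.
small_example = degeneracy("ACHVGT")
large_example = degeneracy("RYRYRYRYHB")
```

Because the count is multiplicative, fixing even a few degenerate positions, as in the reduced-degeneracy primer mixes, lowers the number of sequences in a mix sharply.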
  • nucleotide linkage analysis was performed for this region to detect patterns of nucleotide preference. For instance, if the nucleotide at degenerate position 1329 of the rpoB gene is a cytosine, what is the frequency for a specific nucleotide at the next degenerate nucleotide position? This analysis was performed for the sequences of the regions covered by the degenerate primers RpoBl-R1327 and RpoBl-R1330 with either thymine or cytosine in position 1329 or position 1332, respectively.
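The linkage question posed above (given a fixed base at one degenerate position, how often does each base occur at another?) amounts to a conditional frequency over aligned sequences. The sketch below uses 0-based indices into hypothetical aligned sequences. It does not reproduce the rpoB alignment or the coordinates used in the source.

```python
from collections import Counter


def conditional_frequencies(sequences, fixed_pos, fixed_base, other_pos):
    """Base frequencies at other_pos among sequences with fixed_base at fixed_pos.

    sequences: equal-length aligned sequences; positions are 0-based indices.
    """
    selected = [s[other_pos] for s in sequences if s[fixed_pos] == fixed_base]
    total = len(selected)
    if total == 0:
        return {}
    return {base: n / total for base, n in Counter(selected).items()}


# Hypothetical alignment: fix "C" at index 1 and inspect index 2.
example_alignment = ["ACT", "ACT", "ACG", "ATG"]
linked = conditional_frequencies(example_alignment, 1, "C", 2)
```

A base whose conditional frequency exceeds a chosen preference cutoff could then replace the degenerate code at that position in a reduced-degeneracy primer.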
  • the individual primer sequences for these mixes, referred to as the RpoB-ePCR mix and the RpoB-aPCR mix, are listed in Table 1.
  • the degeneracy for the primers in these mixes ranged from 64 to 1536 for the RpoB-ePCR mix and 32 to 768 for the RpoB-aPCR mix.
  • Using the RpoB-ePCR mix and the RpoB-aPCR mix in combination with the E. coli specific rpoB gene primer (RpoB-colil247-F) on E. coli DNA allowed for successful amplification of the target fragment, confirming the importance of reduced primer degeneracy.
  • Table 4 Average sequence variance for the primer regions and the regions upstream or downstream of candidate primer annealing regions recognizing conserved rpoB gene sequences. For each region adjacent to the primer region, the variance is shown for 25, 50, 75, 100 or 200 nucleotides (nt) upstream (5’) or downstream (3’) of the beginning or end of the primer annealing sequence. The variance score is calculated as the average of the variance of the percentage of the nucleotides adenine, guanine, cytosine and thymine at each position of the rpoB gene. A lower number is indicative of more variance, while a higher number is indicative of less variance and a more conserved DNA sequence.
  • the maximum theoretical variance score for a region is 0.25 (would represent a 100% conserved DNA region). Regions with a variance score < 0.1 are highlighted. The coordinates of the regions recognized by the primers are based on the nucleotide sequence of the Escherichia coli rpoB gene.
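The variance score can be sketched as follows. Per position, the fractions of A, G, C and T are computed across the aligned sequences and their variance is taken, and the score for a region is the mean over its positions. Using fractions with the sample-variance (n - 1) denominator is an assumption, chosen because it reproduces the stated maximum of 0.25 for a fully conserved position. Function names and the example alignment are hypothetical.

```python
from statistics import variance


def position_variance(column):
    """Sample variance of the A/G/C/T fractions at one aligned position."""
    n = len(column)
    fractions = [column.count(base) / n for base in "AGCT"]
    return variance(fractions)  # n - 1 denominator over the four fractions


def region_variance_score(aligned_sequences, start, end):
    """Mean per-position variance over columns start..end-1 (0-based)."""
    columns = list(zip(*aligned_sequences))[start:end]
    return sum(position_variance(col) for col in columns) / len(columns)


# Hypothetical alignment: column 0 is fully conserved, column 1 is uniform.
example_score = region_variance_score(["AA", "AG", "AC", "AT"], 0, 2)
```

A fully conserved column scores 0.25 and a column with all four bases equally frequent scores 0, which matches the convention that lower scores mark more variable regions.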
  • Table 5 Number of hits for primers to the human genome. For each primer, the number of hits with zero, one or two mismatches are presented. The number of hits was determined based on homology to the nucleotide sequences on both DNA strands (+ and - strand) of the human chromosome (Reference: GCF_000001405.40_GRCh38.p14_genomic.fna).
  • Table 6 Linkage between nucleotide preferences for the region 1327 - 1355 of the rpoB gene.
  • the nucleotide at position 1329 was either fixed as a C or a T, after which the preferred nucleotides for the other degenerate positions were calculated. This information was subsequently used to design primers with fewer degenerate nucleotides that recognize the region 1327 - 1356 of the rpoB gene. Preferred nucleotides with a preference score over 80% are highlighted in grey.
  • protocol iteration 8 was followed, using a set of six phased primers referred to as the Linker-aPCR primer mix in combination with SPA-linker-UMI.v2-Y-SP for the A-PCR reaction.
  • SPA fragment sequencing was performed on cfDNA isolated from the blood plasma of 10 breast and 10 lung cancer patients, as well as 4 healthy controls. Twenty blood plasma samples from cancer patients were obtained from the same hospital, while the healthy control samples were obtained locally from healthy volunteers.
  • There are notable differences between SPA fragment sequencing of a single copy housekeeping gene, such as the rpoB gene, and SPA fragment sequencing of the 16S rRNA gene.
  • the rpoB gene derived SPA fragments provide better phylogenetic resolution than fragments derived from the 16S rRNA gene.
  • the number of unique 16S rRNA gene derived SPA fragments obtained per sample is more than 300 times higher than what is observed for single copy housekeeping genes. It is therefore proposed to combine SPA fragment sequencing on the 16S rRNA gene and a single copy housekeeping gene, exemplified by the rpoB gene, for improved insights in community composition and phylogenetic identification, respectively.
  • The use of two or more degenerate primers for annealing to two or more conserved regions on a single or two different phylogenetic marker genes may be referred to herein as “multi-loci SPA fragment sequencing”.
  • the lactulose - mannitol test is the most common and only direct test of a leaky gut.
  • this test doesn’t provide insights in imbalanced gut microbiome bacteria and a dysfunctional gut epithelial barrier, with leakage of microbial material from the gut into the bloodstream resulting in an overactive immune response that underlies multiple inflammatory conditions, including but not limited to IBD. Therefore, a need exists for biomarkers that provide this clinically relevant information.
  • SPA fragment sequencing was successfully used to evaluate the levels of mcfDNA in the blood of IBD patients.
  • BPI Blood Pathogenicity Index
  • the BPI can be integral to clinical decision-making throughout the treatment continuum in IBD patients and patients suffering from other inflammatory conditions, as it provides a surrogate biomarker that can successfully measure the effect of a specific treatment and correlate with a real clinical endpoint: the load of mcfDNA derived from pathogens in blood as an indication for treatment performance and the risk for medical complications from infections.
  • the BPI is calculated as the percentage of mcfDNA fragments coming from pro-inflammatory bacteria as a fraction of the total amount of mcfDNA fragments found in a blood plasma sample, and comprises pathogenic members of the following bacterial genera/species: Enterobacteriaceae including Enterobacter species, Escherichia species (e.g. E. coli), Shigella species, Klebsiella species (e.g. K. pneumoniae and K. oxytoca), Kluyvera species, Erwinia species, Haemophilus species, Salmonella species (e.g. S. typhi and S. enterica), and Listeria species (L. monocytogenes); pathogenic Clostridium species such as C. difficile, C.
  • Veillonella species such as V. atypica, V. dispar and V. oral; Pseudomonas species like P. aeruginosa; Stenotrophomonas species like S. maltophilia; Burkholderia species like B. cepacia; pathogenic Staphylococcus species like S. aureus; Porphyromonas species; and Fusobacterium species like F. nucleatum and F. varium.
  • additional bacteria including Ruminococcus gnavus, Enterococcus faecium and Enterococcus faecalis can be included in the BPI.
  • a total of 94 blood plasma samples (0.5 ml aliquots), coming from 18 Crohn’s disease patients and 18 ulcerative colitis patients, were provided by the Crohn’s and Colitis Foundation IBD Plexus (https://www.crohnscolitisfoundation.org/research/current-research-initiatives/ibd-plexus/about), together with the patient metadata. All patients were using immune response suppressing medication on at least one of the timepoints when blood samples were collected.
  • the samples were processed following the protocol described in EXAMPLE 1 using the MAXWELL HT ccfDNA Kit (Promega) for mcfDNA extraction in a 96-well format on the KingFisher (Thermo Fisher) platform.
  • the SPA fragment libraries were sequenced on the iSeq 100 sequencer from Illumina. The analysis of the SPA fragment sequences included the following steps:
  • Paired-end reads are filtered based on read quality, expected amplicon structure, and length.
  • the UMI (unique molecular identifier) of each read is recorded for downstream deduplication and denoising.
  • the longest read in each bin is searched against a database of bacterial 16S rRNA genes to aid the assembly into consensus sequences.
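  • The filtering, UMI recording and per-bin selection steps listed above can be sketched as follows. This is a simplified illustration under stated assumptions: reads are binned by exact UMI (the source does not specify how bins are formed), the `Read` type and the quality and length thresholds are hypothetical, and the database search and consensus assembly are not shown.

```python
from collections import defaultdict
from typing import NamedTuple


class Read(NamedTuple):
    umi: str             # unique molecular identifier recorded for the read
    sequence: str        # insert sequence after primer/linker trimming
    mean_quality: float  # mean Phred quality of the read


def longest_read_per_umi(reads, min_quality=20.0, min_length=50):
    """Filter reads, bin them by UMI, and return the longest read per bin.

    min_quality and min_length are placeholder thresholds, not values
    taken from the source.
    """
    bins = defaultdict(list)
    for read in reads:
        if read.mean_quality >= min_quality and len(read.sequence) >= min_length:
            bins[read.umi].append(read)
    return {
        umi: max(members, key=lambda r: len(r.sequence))
        for umi, members in bins.items()
    }
```

  • In this sketch the representative read of each bin would then be the query sequence for the 16S rRNA gene database search.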
  • Figures 14A and 14B show that UC and CD patients found to be in remission had very low BPI scores. Similarly, patients with clinical events had significantly elevated BPI scores at the time of these events. These results show that the BPI provides an excellent biomarker to monitor disease activity status.
  • Fecal calprotectin levels above 150 µg/g are considered indicative of inflammation of the intestinal epithelial lining.
  • Table 7 Prediction of an IBD patient’s disease status based on the Blood Pathogenicity Index.
  • a BPI >1.0 is considered as active disease.
  • a clinical event refers to an active disease flare or medication failure (i.e., medication stopped working or serious side effects of the medication were observed). False positive is defined as an elevated BPI with no recorded clinical event. False negative is defined as a BPI <1.0 while a clinical event has been recorded.
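  • The BPI definition and the Table 7 outcome categories can be expressed compactly. The sketch below assumes per-taxon mcfDNA fragment counts are already available and that pro-inflammatory taxa are identified by exact label match; the function names and the input format are hypothetical. The 1.0 threshold is the one stated for Table 7.

```python
def blood_pathogenicity_index(fragment_counts, pro_inflammatory_taxa):
    """BPI: percentage of mcfDNA fragments assigned to pro-inflammatory taxa.

    fragment_counts maps a taxon label to its number of mcfDNA fragments.
    """
    total = sum(fragment_counts.values())
    if total == 0:
        raise ValueError("sample contains no mcfDNA fragments")
    pathogenic = sum(
        count for taxon, count in fragment_counts.items()
        if taxon in pro_inflammatory_taxa
    )
    return 100.0 * pathogenic / total


def classify(bpi, clinical_event, threshold=1.0):
    """Compare a BPI-based call (BPI > threshold) with the clinical record."""
    elevated = bpi > threshold
    if elevated and clinical_event:
        return "true positive"
    if elevated:
        return "false positive"
    if clinical_event:
        return "false negative"
    return "true negative"
```

  • For example, a sample with 3 fragments from listed pathogens out of 100 total fragments would have a BPI of 3.0 and would be called active disease under this threshold.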
  • Table 8 Prediction of an IBD patient’s disease status based on fecal calprotectin levels.
  • the calprotectin levels were determined on stool samples that were taken around the same time as the corresponding 63 blood samples that had passed the QC criteria after SPA fragment sequencing.
  • the samples included 29 CD and 31 UC patient samples where calprotectin results were available, and patient metadata were used to confirm the disease status for each sample.
  • Calprotectin levels >150 µg/g fecal material are considered as active disease.
  • Calprotectin levels were determined on 63 IBD samples that passed QC for SPA fragment sequencing.
  • a clinical event refers to an active disease flare or medication failure (i.e., medication stopped working or serious side effects of the medication were observed).
  • False positive is defined as an elevated BPI with no recorded clinical event. False negative is defined as Calprotectin levels <150 µg/g fecal material while a clinical event has been recorded.
  • Linking the BPI back to the patient journey showed that for many CD and UC patients an increase in BPI was observed when their medication stopped working, resulting in active disease, or when a patient on the medication reported undesirable side effects and/or disease symptoms, showing that the BPI can be successfully used as a surrogate biomarker to monitor disease severity, treatment efficacy and risk of complications in IBD patients.
  • the BPI would have also correctly predicted remission in CD and UC patients. Examples of the correlation between the BPI and the patient’s health status are presented in Table 9 and Table 10 for CD or UC patients, respectively.
  • Patient CD-4 reported side effects while taking the anti-TNFα medication Adalimumab (Humira), which coincided with an increase in the BPI. After the medication was eliminated, the BPI returned to 0. The side effects from adalimumab were missed by the calprotectin levels and the SCDAI score.
  • the journey of patient CD-4 is schematically presented in Figure 16. Specifically, Figure 16 shows a schematic representation of the journey of Crohn’s disease patient CD-4 and how the BPI can be successfully used to influence a better disease outcome. The patient is in remission (12/2018; 06/2019) on a combination of azathioprine and adalimumab, the latter being discontinued due to side effects (10/2019). Stopping this medication results in full remission (01/2020).
  • Patient CD-6 was initially treated with corticosteroids. With disease progression, the patient was successfully treated with an anti-TNFα medication, resulting in remission, which was supported by a low BPI value <1.0.
  • Patient CD-7 was initially treated with Vedolizumab, an α4β7 integrin blocker. Once this medication stopped working and was omitted, the BPI went up, indicating the need for changes to the patient’s treatment regime.
  • Table 9 Examples of the correlation between the Blood Pathogenicity Index (BPI), disease progression and the performance of medication in Crohn’s disease (CD) patients.
  • the following information is provided: CD patient number (CD-n); Sample Date; Calprotectin levels (µg/mg stool); Patient’s SCDAI score; BPI, indicating the percentage (%) of pathogens present in the mcfDNA signature as determined via SPA fragment sequencing; Medication name and type before or at time of sample; Medication usage history and comments.
  • the BPI was calculated as the percentage of mcfDNA fragments coming from pro-inflammatory bacteria as part of the total amount of mcfDNA fragments found in a blood plasma sample.
  • An increase in the BPI indicates that a medication is no longer working, and/or that a patient on the medication is experiencing an increased level of infection by pathogenic bacteria.
  • BPI values >1.0 are in bold; * indicates that a medication (in bold) is removed or no longer working. The results were obtained in a retrospective study.
  • Table 10 Examples of the correlation between the Blood Pathogenicity Index (BPI), disease progression and the performance of medication in Ulcerative Colitis (UC) patients.
  • the following information is provided: UC patient number (UC-n); Sample Date; Calprotectin levels (µg/mg stool); Patient’s Mayo-6 score; BPI, indicating the percentage (%) of pathogens present in the mcfDNA signature as determined via SPA fragment sequencing; Medication name and type before or at time of sample; Medication usage history and comments.
  • the BPI was calculated as the percentage of mcfDNA fragments coming from pro-inflammatory bacteria as part of the total amount of mcfDNA fragments found in a blood plasma sample.
  • An increase in the BPI indicates that a medication is no longer working, and/or that a patient on the medication is experiencing an increased level of infection by pathogenic bacteria.
  • BPI values >1.0 are in bold; * indicates that a medication (in bold) is removed or no longer working. The results were obtained in a retrospective study.
  • Patients UC-1 and UC-2 provide examples where patients indicate remission, but where the levels of calprotectin and the BPI are elevated, indicating that the medication is not meeting its primary endpoint of lowering inflammation of the gut epithelium.
  • Patient UC-3 was on a combination of medication that included Mesalamine, which started to have side effects and stopped working. This coincided with an increase in the BPI. After the medication was eliminated, the BPI returned to 0.
  • the journey of patient UC-3 is schematically presented in Figure 17. Specifically, Figure 17 shows a schematic representation of the journey of ulcerative colitis patient UC-3 and how the BPI can be successfully used to influence a better disease outcome. It is assumed that the initial combination of tofacitinib, vedolizumab and mesalamine is working before 11/2018. Issues are reported for mesalamine, which is removed from the medication (02/2019).
  • Patient UC-4 is an example of a false positive. An elevated BPI is observed despite no issues with medication being reported.
  • Patient UC-9 is an example where the initial medication, Mesalamine, did not work.
  • the BPI determined on a blood sample taken from the patient at this time is elevated.
  • Subsequent addition of an anti-TNFα medication resulted in remission, which correlates with a decrease in the BPI.
  • the journey of patient UC-9 is schematically presented in Figure 18.
  • Figure 18 shows a schematic representation of the journey of ulcerative colitis patient UC-9 and how the BPI can be successfully used to influence a better disease outcome.
  • mesalamine is working as medication but starts to lose efficiency (11/2017) based on disease symptoms and calprotectin levels.
  • Adalimumab is added as medication (12/2017), resulting in remission (11/2019).
  • Patient UC-10 is initially treated with an integrin α4β7 blocker, which stops working (BPI not available). Subsequently, the patient is treated with an anti-TNFα medication. This medication initially results in remission, which correlates with a BPI of approximately 0.0. However, the anti-TNFα medication stopped working, requiring adjustment of the patient’s medication. The BPI is increased at the time of the anti-TNFα medication not working.
  • SPA fragment sequencing of mcfDNA in blood can be successfully used to determine the BPI of patients suffering from inflammatory diseases such as IBD.
  • the BPI was successfully used as a surrogate biomarker for both medication performance and assessing the risk for serious or life-threatening complications, such as an increased risk of inflammation.
  • the BPI can be used as a surrogate biomarker for diseases where patients are at similar risk of serious or life-threatening infections due to treatment with immune repressing drugs, including rheumatoid arthritis, juvenile idiopathic arthritis, psoriatic arthritis, ankylosing spondylitis, plaque psoriasis, and acute Graft versus Host disease.
  • Immune suppressing medications that can benefit from patient monitoring using the BPI to determine the risk of serious or life-threatening infections include, but are not limited to, abatacept (Orencia); adalimumab (Humira); infliximab (Remicade); Tofacitinib (Xeljanz); risankizumab (Skyrizi); anakinra (Kineret); azathioprine (Azasan); certolizumab (Cimzia); cyclosporine (Gengraf, Neoral, Sandimmune); etanercept (Enbrel); golimumab (Simponi); infliximab (Remicade); methotrexate (Otrexup, Rasuvo, Trexall); rituximab (Rituxan); steroids including dexamethasone, methylprednisolone (Medrol), prednisolone (Orapred ODT, Prelone), and pre
  • the BPI can also be used to assess the side effects for other medications/treatments that can cause complications of inflammation, including the gastrointestinal tract, or affect the immune response, such as immune checkpoint inhibitor drugs like Pembrolizumab, Nivolumab, and Cemiplimab as anti-PD-1 antibodies, Ipilimumab as an anti-CTLA-4 antibody, as well as Atezolizumab, Avelumab, and Durvalumab as anti-PD-L1 antibodies.


Abstract

Methods for amplifying microbial cell-free DNA (mcfDNA) in blood samples are provided. Unlike amplicon sequencing, in which regions between two conserved sequences are targeted for PCR amplification, the methods of the invention use a single conserved sequence as the primer annealing site to initiate PCR amplification. Enrichment PCR (E-PCR) is performed using a degenerate primer complementary to a conserved region in a microbial phylogenetic marker gene and an additional primer at the opposite end of the mcfDNA. The conserved region is adjacent to a hypervariable region. Amplification PCR is performed on the enriched mcfDNA fragments using a modified degenerate primer having a 3' end extension relative to the degenerate primer, and the additional primer, thereby generating amplified enriched mcfDNA fragments. Sequencing of the amplified mcfDNA allows identification of microbial species, which is useful for diagnosing and monitoring inflammatory diseases and cancer, including stratification of cancer by stage, type, and tumor subtype.
PCT/US2024/036971 2023-07-25 2024-07-05 Single point amplicon fragment sequencing and methods for diagnosing and monitoring disease Pending WO2025024111A1 (fr)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US202363515480P 2023-07-25 2023-07-25
US63/515,480 2023-07-25
US202463549246P 2024-02-02 2024-02-02
US63/549,246 2024-02-02

Publications (1)

Publication Number Publication Date
WO2025024111A1 true WO2025024111A1 (fr) 2025-01-30

Family

ID=94375657

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2024/036971 Pending WO2025024111A1 (fr) 2023-07-25 2024-07-05 Single point amplicon fragment sequencing and methods for diagnosing and monitoring disease

Country Status (1)

Country Link
WO (1) WO2025024111A1 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP4469607A4 (fr) * 2022-01-24 2026-01-07 Gusto Global Llc Single-loci and multi-loci targeted single point amplicon fragment sequencing

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070072206A1 (en) * 2005-06-07 2007-03-29 Hill Janet E Strong PCR primers and primer cocktails
US20160244817A1 (en) * 2013-09-30 2016-08-25 President And Fellows Of Harvard College Methods of Determining Polymorphisms
WO2019224548A2 (fr) * 2018-05-23 2019-11-28 Lucite International Uk Limited Process for the production of methacrylates
WO2020055887A1 (fr) * 2018-09-10 2020-03-19 T2 Biosystems, Inc. Methods and compositions for high sensitivity sequencing in complex samples
US20200131582A1 (en) * 2016-06-07 2020-04-30 The Regents Of The University Of California Cell-free dna methylation patterns for disease and condition analysis
WO2021072439A1 (fr) * 2019-10-11 2021-04-15 Life Technologies Corporation Compositions and methods for assessing microbial populations
WO2021110833A1 (fr) * 2019-12-04 2021-06-10 Consejo Superior De Investigaciones Cientificas Tools and methods to detect and isolate colibactin producing bacteria
US20210310085A1 (en) * 2013-11-07 2021-10-07 The Board Of Trustees Of The Leland Stanford Junior University Cell-free nucleic acids for the analysis of the human microbiome and components thereof
US20220010376A1 (en) * 2012-03-20 2022-01-13 University Of Washington Through Its Center For Commercialization Methods of lowering the error rate of massively parallel dna sequencing using duplex consensus sequencing
US20220195496A1 (en) * 2020-12-17 2022-06-23 Karius, Inc. Sequencing microbial cell-free dna from asymptomatic individuals
WO2023141347A2 (fr) * 2022-01-24 2023-07-27 Gusto Global, Llc Single-loci and multi-loci targeted single point amplicon fragment sequencing

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 24846199

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE