WO2024252401A1 - Markers - Google Patents
- Publication number
- WO2024252401A1 (PCT/IL2024/050563)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- lung cancer
- cfdna
- methylation
- marker
- dna
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6876—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
- C12Q1/6883—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
- C12Q1/6886—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material for cancer
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q2600/00—Oligonucleotides characterized by their use
- C12Q2600/154—Methylation markers
Definitions
- the present invention relates to methods, systems and kits for diagnosing lung cancer (and particularly early-stage and/or high-grade lung cancer) in a subject, staging and grading the cancer, evaluating post-treatment disease recurrence, monitoring treatment efficacy and providing prognosis, by analysing DNA methylation markers in cell-free DNA from a sample of the subject.
- Lung cancer is one of the most common and serious types of cancer.
- NSCLC non-small cell lung cancer
- SCLC small cell lung cancer
- the general prognosis of lung cancer is poor as it does not usually cause noticeable symptoms until it has spread through the lungs and sometimes also into other parts of the body. Therefore, detection of cancer at the earliest possible stage is of paramount importance for treatment of the disease.
- Lung cancers may be classified according to ‘stage’ and/or ‘grade’ (with the classification process known as ‘staging’ or ‘grading’, respectively).
- stage of a lung cancer indicates its size and degree of spread around the body, so provides information about the progression of the disease.
- a typical lung cancer staging system comprises stages 1 to 4, with progression being from stage 1 to stage 4.
- the grade of a lung cancer is determined by specific morphological features of the lung cancer tumor cells and generally indicates the similarity of the tumor cells to non-tumor cells when viewed under a microscope.
- Lung cancer grade provides information about the aggressiveness of the cancer, i.e., how quickly the lung cancer cells are likely to be able to divide and spread around the body.
- Various grading systems may be used, comprising a different number of possible grades.
- SCLCs typically have a high grade. Both stage and grade can influence treatment efficacy, for instance certain treatments may not be as effective in a high-grade cancer compared to a low-grade cancer. Stage and grade will also affect the balance of risk against benefit for any particular treatment option.
- the present invention addresses these needs by providing methods, systems and kits for diagnosing and staging/grading lung cancer based on the methylation of CpG sites in DNA, specifically cell-free DNA (cfDNA), at one or more of the genomic loci in the appended sequence listing.
- cfDNA cell-free DNA
- the genomic loci in the sequence listing are also referred to as ‘markers’ or ‘marker loci’.
- DNA methylation is the conversion of a cytosine in a DNA site with the dinucleotide sequence ‘CG’ (known as a CpG site) to 5-methylcytosine (5mC).
- CG dinucleotide sequence
- 5mC 5-methylcytosine
- Changes in DNA methylation are known to occur in many types of cancer, but the pattern of DNA methylation across the genome also varies over time, between different individuals and between different instances of the disease. So, it is difficult to link specific changes in DNA methylation to the presence or absence of lung cancer (or its stage/grade) and there is currently insufficient knowledge about methylation markers that are highly correlated with lung cancer.
- the inventors have discovered that a change in methylation of one or more of the CpG markers in the sequence listing indicates the presence of lung cancer with surprisingly high specificity and high sensitivity. Accordingly, measuring the methylation level of one or more of the markers in the sequence listing allows for improved diagnosis of lung cancer.
- the change for any particular marker can be an increase in methylation (hypermethylation) or a decrease (hypomethylation) compared to an index methylation level for the marker in cfDNA from an individual/individuals without lung cancer (which can include individuals without lung cancer, but who are at high risk of developing lung cancer).
- the invention involves analysis of the methylation of CpG sites in cfDNA, i.e., fragmented genomic DNA which is found in vivo in an animal within a bodily fluid rather than within an intact cell. This allows for non-invasive diagnosis (so-called ‘liquid biopsy’).
- cfDNA from an individual with lung cancer will comprise small amounts of DNA derived from the lung cancer tumor cells, so analysing methylation in cfDNA allows for cancer-associated marker methylation changes to be detected without having to take samples of the tumor itself.
- the small amount of tumor-derived cfDNA will be mixed with a massive excess of DNA derived from non-tumor cells. This is even more the case when the individual has an early-stage lung cancer.
- the invention provides a method for determining a likelihood of the presence of lung cancer in a human subject, comprising steps of:
- the known source may be cfDNA from an individual without lung cancer or from an individual known to have lung cancer. If the source is an individual known to have lung cancer, in some embodiments they may be known to have a particular type of lung cancer (i.e., NSCLC or SCLC), a particular stage of lung cancer, or a particular grade of lung cancer. Comparison of the methylation level of the sample to an index methylation level derived from an individual having a known lung cancer status permits the likelihood of the presence (or absence) of lung cancer to be determined.
- TNM Tumor, Node and Metastasis
- the TNM system can also be used to divide lung cancers into ‘number stages’ (stage 1, stage 2, stage 3 or stage 4) and/or sub-stages (stage 1A, stage 1B, stage 2A, stage 2B, stage 3A, stage 3B, stage 3C, stage 4A or stage 4B).
- number stages stage 1, stage 2, stage 3 or stage 4
- sub-stages stage 1A, stage 1B, stage 2A, stage 2B, stage 3A, stage 3B, stage 3C, stage 4A or stage 4B.
- for SCLC, an alternative staging system is to classify the cancer as either ‘limited’ or ‘extensive’.
- a system for grading lung cancer classifies the cancer into grade 1 (well differentiated, lepidic predominant), grade 2 (moderately differentiated, acinar or papillary predominant) or grade 3 (poorly differentiated, solid or micropapillary predominant).
- the known source may be cfDNA from an individual with stage 1, stage 2, stage 3 or stage 4 lung cancer. Alternatively or additionally, the known source may be cfDNA from an individual with stage 1A, stage 1B, stage 2A, stage 2B, stage 3A, stage 3B, stage 3C, stage 4A or stage 4B lung cancer. The known source may also be cfDNA from an individual with limited SCLC or extensive SCLC. The known source may also be cfDNA from an individual with grade 1, grade 2 or grade 3 lung cancer.
- the index level may be based on cfDNA from a plurality of individuals with the same cancer status.
- the known source may be a plurality of individuals without lung cancer, a plurality of individuals known to have lung cancer, a plurality of individuals known to have a particular type of lung cancer (i.e., NSCLC or SCLC), a plurality of individuals known to have a particular stage of lung cancer, or a plurality of individuals known to have a particular grade of lung cancer.
- the index methylation level may be an average of the methylation levels for the same marker in the known sources.
- the comparison methylation level may be the average of the methylation level calculated for the same marker in cfDNA from many individuals without lung cancer. The average may be the arithmetic mean.
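The averaging and comparison described above can be sketched in a few lines of Python. This is a minimal illustration only: the function names and the methylation values (expressed here as fractions between 0 and 1) are hypothetical and do not come from the sequence listing or from any measured data.

```python
from statistics import mean


def index_methylation_level(reference_levels):
    """Arithmetic mean of one marker's methylation level across reference individuals."""
    if not reference_levels:
        raise ValueError("at least one reference measurement is required")
    return mean(reference_levels)


def change_from_index(sample_level, index_level):
    """Signed difference: positive means more methylated than the index level."""
    return sample_level - index_level


# Hypothetical methylation fractions for one marker in cfDNA from
# four individuals without lung cancer (assumed values, not real data).
reference = [0.10, 0.12, 0.08, 0.10]
index_level = index_methylation_level(reference)

# Hypothetical measurement for the same marker in the subject's sample.
delta = change_from_index(0.35, index_level)
direction = "hyper" if delta > 0 else "hypo" if delta < 0 else "unchanged"
print(f"index={index_level:.3f} delta={delta:+.3f} direction={direction}")
```

How large a difference is needed before it counts as a meaningful change is not specified here; in practice it would depend on the assay and on the statistical treatment described elsewhere in the document.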
- as a cancer becomes increasingly severe (i.e., progresses from stage 1 through to stage 4 and/or from a low grade to a high grade), additional changes in CpG methylation can accrue.
- the methylation level of the markers of the sequence listing may also change with lung cancer severity, so providing information about lung cancer stage and/or grade.
- Methods of the invention are particularly useful for diagnosing early-stage lung cancer, i.e., stage 1 (encompassing stage 1A and stage 1B).
- Methods of the invention are also particularly useful for diagnosing high-grade lung cancer (e.g., grade 2 or 3), preferably at an early stage (e.g., stage 1 or 2).
- the methylation level of the markers of the sequence listing may also differ depending on lung cancer type (NSCLC or SCLC). So, methods of the invention are also useful for diagnosing whether NSCLC or SCLC is present in a human subject.
- the invention provides a method for determining a likelihood of the presence of lung cancer of a particular type, stage and/or grade in a human subject, comprising:
- the measured methylation level(s) may be compared to more than one index methylation level, each index level being for the same marker in individual(s) having a different type, stage and/or grade of lung cancer, and the likelihood of the presence of lung cancer of a particular type, stage and/or grade is based on all the comparisons performed.
- the indicative value of a marker can be quantified by measuring the area under the receiver operating characteristic (ROC) curve (AUC) for the marker.
- AUC area under the receiver operating characteristic (ROC) curve
- an AUC of greater than 0.5 (for hypermethylated markers) or less than 0.5 (for hypomethylated markers) is useful for disease prediction.
- the difference between the marker's AUC and 0.5 is >0.25, >0.30, >0.35, >0.40, >0.45, or greater.
- the indicative value of a marker can also be measured statistically by comparing its mean methylation level in cfDNA samples from patients with lung cancer to its mean methylation level in cfDNA samples from patients without lung cancer e.g. using Student’s t-test to produce a p-value.
- a p-value <0.05 is generally used as the cut-off for statistical significance.
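The statistical comparison of mean methylation levels mentioned above can be illustrated with the classical equal-variance Student's t statistic. The sketch below uses only the Python standard library and stops at the t statistic and its degrees of freedom; converting these to a p-value needs the cumulative distribution of Student's t, which is assumed to come from a statistics package and is not shown. The function name and the input values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def student_t(cancer_levels, control_levels):
    """Two-sample Student's t statistic (pooled variance) and degrees of freedom."""
    n1, n2 = len(cancer_levels), len(control_levels)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two measurements")
    # Pooled sample variance, weighting each group by its degrees of freedom.
    pooled = ((n1 - 1) * variance(cancer_levels)
              + (n2 - 1) * variance(control_levels)) / (n1 + n2 - 2)
    if pooled == 0:
        raise ValueError("pooled variance is zero; t is undefined")
    t = (mean(cancer_levels) - mean(control_levels)) / sqrt(pooled * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


# Hypothetical methylation levels for one marker (arbitrary units).
t_stat, dof = student_t([2.0, 4.0, 6.0], [1.0, 2.0, 3.0])
print(f"t = {t_stat:.4f}, degrees of freedom = {dof}")
```

A positive t here corresponds to higher mean methylation in the cancer group, and a negative t to lower mean methylation.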
- the invention provides a method for determining a likelihood of the presence of lung cancer in a human subject, comprising:
- the methylation level of a marker is measured by cfDNA digestion using methylation-sensitive and/or methylation-dependent restriction endonucleases (MSREs or MDREs, respectively) followed by downstream analytical steps which quantify the degree of digestion.
- MSRE methylation-sensitive restriction endonuclease
- MDRE methylation-dependent restriction endonuclease
- Digestion with a plurality of MSREs or MDREs, for instance, two MSREs, is also encompassed in the methods of the invention. MSREs and MDREs are described in more detail below.
- this term refers to the mixing of active restriction enzyme(s) with cfDNA in conditions under which digestion can occur. If there are no recognition sites for the restriction enzyme in question (e.g., because it is a MSRE and all of the recognition sequences are fully methylated) then a step of ‘digestion’ still takes place even though DNA cleavage does not occur.
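As an illustration of why digestion with a MSRE reports on methylation, the sketch below models a single cfDNA fragment in silico. It assumes HpaII as the example enzyme (recognition sequence CCGG, with cleavage blocked when the internal cytosine is methylated); the text above does not name a specific enzyme, so this choice, the function names and the toy sequences are assumptions made purely for illustration.

```python
SITE = "CCGG"  # HpaII recognition sequence, used only as an example MSRE


def find_sites(seq, site=SITE):
    """0-based start positions of every occurrence of the recognition site."""
    return [i for i in range(len(seq) - len(site) + 1) if seq[i:i + len(site)] == site]


def survives_msre(seq, methylated_positions, site=SITE):
    """True if the fragment stays intact after digestion.

    The fragment is cut at any site whose internal cytosine (offset 1 within
    CCGG) is unmethylated. A fragment with no site at all is never cut, even
    though the 'digestion' step has still been carried out.
    """
    methylated = set(methylated_positions)
    return all((start + 1) in methylated for start in find_sites(seq.upper(), site))


fragment = "AACCGGTT"                      # one CCGG site starting at index 2
print(survives_msre(fragment, {3}))        # internal C (index 3) methylated
print(survives_msre(fragment, set()))      # unmethylated, so the site is cut
print(survives_msre("AATTAATT", set()))    # no recognition site present
```

Because only intact fragments can serve as amplification templates across the site, the proportion of molecules that survive digestion tracks the methylation level at that CpG.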
- the methylation level of a marker is measured by mixing cfDNA with one or more reagents that chemically modify nucleobases within DNA in a methylation-conditional manner, followed by downstream analytical steps which quantify the degree of modification.
- a suitable reagent is sodium bisulfite, which converts unmethylated cytosine to uracil (further details below).
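The effect of bisulfite treatment on a sequence can be sketched as follows. The example models only the treated strand as it would be read after amplification (uracil is read as thymine), and the function names, sequence and methylated positions are hypothetical.

```python
def bisulfite_convert(seq, methylated_positions):
    """Read-out of a strand after bisulfite treatment and amplification.

    Unmethylated cytosine is converted to uracil and read as T;
    methylated cytosine (5mC) is protected and still read as C.
    """
    methylated = set(methylated_positions)
    converted = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in methylated:
            converted.append("T")
        else:
            converted.append(base)
    return "".join(converted)


def methylation_fraction(calls):
    """Fraction of reads calling C (methylated) at one CpG cytosine.

    `calls` is a string of base calls at that position, one per read.
    """
    c, t = calls.count("C"), calls.count("T")
    if c + t == 0:
        raise ValueError("no informative reads at this position")
    return c / (c + t)


# Cytosine at index 1 is methylated; those at indices 4 and 5 are not.
print(bisulfite_convert("ACGTCCGA", {1}))
print(methylation_fraction("CCTC"))
```

Counting C versus T calls at a CpG across many reads gives a per-site methylation level that can then be compared with an index level.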
- the downstream analytical steps comprise (i) amplification of a sequence comprising a CpG site located within the marker, or (ii) high throughput sequencing.
- the amplification is by polymerase chain reaction (PCR), specifically by real time PCR (rtPCR, also known as quantitative PCR or qPCR).
- rtPCR real time PCR
- the amplification is by another real-time amplification reaction, for instance, an isothermal real-time amplification reaction such as real-time accelerated reverse transcription loop-mediated isothermal amplification (real-time RT-LAMP).
- the invention provides a method for determining a likelihood of the presence of lung cancer in a human subject, comprising:
- the invention also provides primer pairs comprising a first primer and a second primer, for amplifying a CpG site within a marker of the sequence listing. So, for each marker listed in the sequence listing, the invention provides a primer pair consisting of one primer binding upstream of a CpG site within the marker and one primer binding downstream of the CpG site, wherein the primer pair is suitable for use in a PCR to generate an amplification product comprising the CpG site.
- measuring a methylation level comprises using a fluorescently-labelled polynucleotide probe to obtain a signal intensity for an amplification product generated in the rtPCR.
- the labelled probe is typically between 15-30 nucleotides in length and comprises sequence that is complementary to a sub-sequence within the amplicon of interest.
- the melting temperature of the probe is comparable to that of the primers used in the rtPCR.
- the invention also provides fluorescently-labelled oligonucleotide probes for detecting an amplification product of a primer pair of the invention.
- the invention also provides primer sets, each primer set having 4-6 primers, for amplifying a CpG site within a marker of the sequence listing by an isothermal real-time amplification reaction such as real-time RT-LAMP.
- the invention also provides fluorescently-labelled polynucleotide probes for obtaining a signal intensity for an amplification product generated in the isothermal real-time amplification reaction.
- the invention also provides a nucleic acid construct comprising a pair of sequencing adapters flanking a nucleic acid insert, wherein the insert is a marker listed in the sequence listing (or a fragment thereof).
- the sequencing adapters can include one or more of: a site recognised by a universal primer; a flow cell binding sequence, such as a P5 or P7 sequence; an index sequence, such as an i5 or i7 index; and/or a molecular barcode.
- the two adapters within the construct may differ e.g. one may include a P7 and i7 sequence, whereas the other includes a P5 and i5 sequence.
- where the insert is a fragment of a marker in the sequence listing, it is ideally at least 20 nucleotides long e.g.
- constructs can be prepared by ligating sequencing adapters to a digested cfDNA sample e.g. by ligating Y-shaped adapters.
- the digested cfDNA sample may be subjected to end repair and/or A-tailing prior to the ligation.
- the nucleic acid construct is suitable for sequencing by a NGS technique.
- the invention provides a method for determining a likelihood of the presence of lung cancer in a human subject, comprising:
- the amount of cfDNA that can be isolated from a typical sample is generally not limiting.
- the invention is particularly useful as an initial evaluation technique, or as part of a screening programme.
- the subject may not be suspected of having lung cancer.
- the subject may be suspected of having lung cancer but is asymptomatic (i.e., does not exhibit any suspicious clinical signs of lung cancer).
- a reason that the subject is suspected of having lung cancer may be that the subject is classified as having a high risk of developing lung cancer, for example, based on age, smoking history, previous history of lung cancer, genetic predisposition, and/or family history.
- the high risk may be classified according to the age and smoking history of the subject. For instance, the subject may be classified as high risk if they are between 55 and 74 years old and smoke or have smoked.
- USPSTF US Preventive Services Task Force
- a ‘pack year’ is a unit of smoking equivalent to an average of 1 pack of cigarettes (such as 20 cigarettes) per day for 1 year.
- a person could have a 20 pack year history by smoking 1 pack a day for 20 years, or 2 packs a day for 10 years.
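The pack-year arithmetic above is simple enough to state as code. The sketch below assumes 20 cigarettes per pack, as in the example given; the function names are hypothetical.

```python
CIGARETTES_PER_PACK = 20  # assumption taken from the example above


def pack_years(packs_per_day, years):
    """Pack years = average packs smoked per day multiplied by years of smoking."""
    return packs_per_day * years


def pack_years_from_cigarettes(cigarettes_per_day, years):
    """Same quantity, starting from cigarettes per day."""
    return pack_years(cigarettes_per_day / CIGARETTES_PER_PACK, years)


print(pack_years(1, 20))                   # 1 pack a day for 20 years
print(pack_years(2, 10))                   # 2 packs a day for 10 years
print(pack_years_from_cigarettes(10, 40))  # half a pack a day for 40 years
```

All three histories come to 20 pack years, which shows that the unit captures cumulative exposure rather than intensity or duration alone.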
- the subject has a smoking history of about 40 pack years or more.
- the subject is at least about 50 years old, at least about 55 years old, at least about 60 years old, or at least about 65 years old.
- the subject may exhibit suspicious clinical signs of cancer and/or is suspected of having lung cancer based on other prior assay(s) e.g., based on testing of other biomarker(s).
- the subject is at risk of recurrence of lung cancer.
- the subject shows at least one symptom or characteristic of lung cancer.
- Symptoms or characteristics of lung cancer include, but are not limited to: a persistent (and potentially worsening) cough, recurring chest infections, coughing up blood, aches and/or pains when breathing and/or coughing, persistent breathlessness, persistent lack of energy, loss of appetite, unexplained weight loss, finger clubbing, difficulty swallowing or pain when swallowing, wheezing, a hoarse voice, swelling of the face or neck and persistent chest and/or shoulder pain.
- the subject was not previously diagnosed with lung cancer. In some embodiments, the subject was previously diagnosed and treated for lung cancer. In some embodiments, such a subject is in need of monitoring for the recurrence of lung cancer.
- methods include a step of preparing a report in paper or electronic form based on the assessment of the likelihood of lung cancer or the diagnosis of the presence or absence of lung cancer, and optionally communicating the report to the subject and/or a healthcare provider of the subject.
- the invention can also be embodied as a method for: assessment of a subject with lung cancer, assessment of a subject without any symptoms of lung cancer, assessment of a subject with at least one symptom of lung cancer, ruling out lung cancer in a subject with at least one symptom of lung cancer, determining the presence or absence of high-grade lung cancer in a subject, or ruling out high-grade lung cancer in a subject.
- the invention can also be used as an initial step in existing lung cancer diagnostic techniques, to target such techniques on patients where the invention indicates that lung cancer is present.
- the invention also provides a method for detecting lung cancer in a subject, comprising determining a likelihood of the presence of lung cancer as disclosed herein, and performing a clinical diagnostic step on the subject.
- the clinical diagnostic step may be one or more of: a chest X-ray; a CT scan; a PET-CT scan; a bronchoscopy and biopsy; a bronchoscopy and endobronchial ultrasound scan; a thoracoscopy; a mediastinoscopy; and/or percutaneous needle biopsy.
- the invention can also be embodied as methods of treatment.
- the invention provides a method for treating or managing lung cancer in a human subject, comprising determining a likelihood of the presence of lung cancer as disclosed herein, and administering, deciding to administer, or recommending the administration of a suitable treatment to the subject based on the likelihood.
- the subject can be taken forward into a suitable method of treatment.
- the treatment may comprise one or more of surgical resection (including wedge resection, segmental resection, sleeve resection, lobectomy and pneumonectomy), laser therapy, photodynamic therapy, cryosurgery, electrocautery, chemotherapy, radiation therapy, immunotherapy, and/or targeted drug therapy (see below for more details).
- the invention also provides a method for treating or managing lung cancer in a human subject, comprising determining a likelihood of the presence of lung cancer of a particular type, stage and/or grade as disclosed herein, and administering a suitable treatment based on the likelihood.
- a likelihood of the presence of lung cancer is determined in a human subject one or more times after the subject has undergone lung cancer treatment. This provides information about treatment response.
- the human subject is identified as non-responsive to the lung cancer treatment and said lung cancer treatment is modified.
- the human subject is identified as non-responsive to the lung cancer treatment and it is decided to modify said lung cancer treatment.
- the human subject is identified as non-responsive to the lung cancer treatment and it is recommended to modify said lung cancer treatment.
- the human subject is categorised as having residual disease or viable tumor cells, and a second-line therapy is administered to the subject.
- the human subject is categorised as having residual disease or viable tumor cells, and it is decided to administer a second-line therapy to the subject.
- the second-line therapy comprises one or more of surgical resection (including wedge resection, segmental resection, sleeve resection, lobectomy and pneumonectomy), laser therapy, photodynamic therapy, cryosurgery, electrocautery, chemotherapy, radiation therapy, immunotherapy, and/or targeted drug therapy.
- said subject is categorised as having residual disease or viable tumor cells, thereby indicating that said subject is at high risk of disease recurrence.
- the invention also provides a method for differentially amplifying tumor-derived cfDNA and non-tumor-derived cfDNA in cfDNA from a sample of a human subject having lung cancer comprising: (a) treating the cfDNA from a sample of a human subject having lung cancer with at least one reagent that differentially affects methylated and non-methylated DNA; and
- the reagent that differentially affects methylated and non-methylated cfDNA is a MSRE or a MDRE.
- the reagent is a MSRE. Treating the cfDNA with a plurality of MSREs or MDREs, for instance, two MSREs, is also encompassed in the methods of the invention.
- the reagent that differentially affects methylated and non-methylated cfDNA may be a reagent that conditionally chemically modifies nucleobases within DNA based on their methylation status.
- a suitable reagent is sodium bisulfite, which converts unmethylated cytosine to uracil.
- the amplification reaction is preferably PCR. Most preferably the amplification reaction is rtPCR because this can be highly sensitive and does not require a separate step for quantifying amplification. Alternatively, the amplification reaction may be an isothermal amplification reaction, such as RT-LAMP.
- the invention also provides a method for preparing data useful for lung cancer diagnosis comprising: treating cell-free DNA (cfDNA) from a sample of a subject with a reagent that differentially affects methylated and non-methylated DNA; measuring a methylation level, based on the effect of the reagent on the cfDNA, for at least one marker of the sequence listing; and recording the measured methylation level(s).
- the CpG markers disclosed herein have been selected based on their ability to identify methylation changes associated with lung cancer. More generally, however, the same markers can also be useful for identifying methylation changes associated with other types of cancer and proliferative disorders. Moreover, they can be used as pan-cancer markers, i.e., for determining the likelihood of the presence of multiple different types of cancer (including lung cancer).
- the inventors have identified the genomic loci listed in the sequence listing as markers for early detection of lung cancer.
- measuring the methylation level of a CpG located within a sequence of the sequence listing can be used for determining the likelihood of the presence of lung cancer in a human subject.
- the sequence listing provides the sequence of each marker and comprises the following additional information for each marker: the marker’s chromosome (in the “chromosome” qualifier of the “source” feature of each sequence); and its start/end coordinates according to the hg38 genome assembly (in the “map” qualifier of the “source” feature of each sequence).
- the markers of SEQ ID NOs: 1-30615 have increased methylation (‘hyper’) in cfDNA from cancer patients compared to healthy control subjects.
- the markers of SEQ ID NOs: 30616-39636 have decreased methylation (‘hypo’) in cfDNA from cancer patients compared to healthy control subjects.
- two or more markers may overlap (e.g. SEQ ID NOs: 25726 & 25727) and, in these instances, the invention also extends to an aggregated marker encompassing the overlapping markers (e.g. for SEQ ID NOs: 25726 & 25727, nucleotides 12,685,047-12,685,301 on chromosome 16).
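Aggregating overlapping markers into a single locus amounts to merging overlapping intervals on the same chromosome. The sketch below shows one way to do this; the function name is hypothetical and the coordinates are toy values chosen for readability, not the actual coordinates of any SEQ ID NO in the sequence listing.

```python
def aggregate_markers(markers):
    """Merge overlapping markers on the same chromosome.

    `markers` is an iterable of (chromosome, start, end) tuples with
    inclusive coordinates. Returns a sorted list of merged intervals.
    """
    merged = []
    for chrom, start, end in sorted(markers):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Overlaps the previous interval: extend it if needed.
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


# Toy coordinates (assumed for illustration only).
toy_markers = [("chr16", 100, 200), ("chr16", 150, 260), ("chr1", 10, 20)]
print(aggregate_markers(toy_markers))
```

The two overlapping chr16 entries collapse into one aggregated marker spanning from the smallest start to the largest end, while the non-overlapping chr1 entry is left unchanged.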
- genomic locus refers to a DNA sequence at a specific region within the genome.
- the specific region may be identified by the molecular location, namely, by the chromosome and the numbers of the starting and ending base pairs on the chromosome.
- Genomic loci include gene sequences as well as other genetic elements (e.g., intergenic sequences).
- a ‘marker locus’ or simply ‘marker’ is a genomic locus that is differentially methylated between different sources of cfDNA (e.g. lung tumor vs. healthy tissue), and therefore analysis of its methylation provides an indication with respect to the source of the DNA.
- hypermethylation of a particular marker indicates the presence of the cancer, where ‘hypermethylation’ means increased methylation of the marker across a sample of DNA molecules containing the marker, compared to an index methylation level for that marker in cfDNA from an individual/individuals without lung cancer.
- hypomethylation of a particular marker indicates the presence of the cancer, where ‘hypomethylation’ means decreased methylation of the marker across a sample of DNA molecules containing the marker, compared to an index methylation level for that marker in cfDNA from an individual/individuals without lung cancer.
- the comparison of a methylation level for a marker in a sample and the index methylation level of that marker can use typical techniques used when comparing measurements in biological systems.
- the comparison may be accompanied by an indication of the confidence in that comparison e.g. based on statistical analysis.
- the degree to which the methylation status of a particular marker is indicative of the presence or absence of lung cancer can be quantified by measuring the area under the receiver operating characteristic (ROC) curve (AUC) for the marker.
- ROC receiver operating characteristic
- a particular methylation level can be chosen as a threshold for a disease prediction model based on methylation of that marker.
- the model would predict the presence of disease for observed methylation levels that cross that threshold, and the absence of disease for observed methylation levels that do not.
- a particular classification threshold will be associated with a true positive rate (sensitivity), i.e., the proportion of observations that are correctly predicted to indicate disease, and a false positive rate (1 - specificity), i.e., the proportion of observations that are incorrectly predicted to indicate disease.
- a ROC curve is obtained by plotting the true positive rate (on the y axis) against the false positive rate (on the x axis) for various classification thresholds.
- a ROC curve that is simply a straight line from the bottom left corner to the top right corner (AUC of 50% or 0.5) occurs if the true and false positive rates are equal at all classification thresholds, and indicates no predictive value.
- Preferred markers herein have an AUC that differs from 0.5 by at least 0.2 e.g. by >0.25, >0.30, >0.35, >0.40, >0.45, or more.
- a hypermethylated marker may thus have an AUC of >0.7, and a hypomethylated marker may have an AUC of <0.2.
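The threshold, true/false positive rate and AUC concepts above can be made concrete with a short sketch. It assumes that disease is predicted when the observed methylation level is above the threshold, and it computes the AUC through the equivalent pairwise formulation (the probability that a randomly chosen diseased sample has a higher level than a randomly chosen healthy one, with ties counted as one half). The function names and values are hypothetical.

```python
def rates_at_threshold(diseased, healthy, threshold):
    """True positive rate and false positive rate when 'level > threshold' predicts disease."""
    tpr = sum(level > threshold for level in diseased) / len(diseased)
    fpr = sum(level > threshold for level in healthy) / len(healthy)
    return tpr, fpr


def roc_auc(diseased, healthy):
    """Area under the ROC curve via pairwise comparison of the two groups."""
    wins = 0.0
    for d in diseased:
        for h in healthy:
            if d > h:
                wins += 1.0
            elif d == h:
                wins += 0.5
    return wins / (len(diseased) * len(healthy))


# Hypothetical methylation fractions for a single marker.
cancer = [0.8, 0.6, 0.7]
control = [0.2, 0.3, 0.65]

print(rates_at_threshold(cancer, control, 0.5))
print(roc_auc(cancer, control))   # above 0.5: behaves like a hypermethylated marker
print(roc_auc(control, cancer))   # groups swapped: below 0.5, like a hypomethylated marker
```

Swapping the groups mirrors the AUC around 0.5, which is why the distance from 0.5, rather than the raw value, is the measure of how informative a marker is.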
- Markers according to the invention are described in the sequence listing.
- the location of the markers is given according to Genome Reference Consortium Human Build 38 patch p13 (‘GRCh38.p13’, generally known as ‘hg38’).
- the markers in the sequence listing cover between around 30 bp and around 500 bp of the human genome.
- the markers in the sequence listing contain at least one CpG site located within a restriction site of a MSRE or MDRE.
- CpG site(s) may be at any position within a particular marker in the sequence listing.
- the invention can be based on analysis of any CpG found within the markers in the sequence listing.
- these marker loci can be detected in cell-free DNA, particularly in cfDNA from plasma samples, enabling non-invasive disease detection and characterization.
- the methods disclosed herein are particularly useful for analysing cell-free DNA (cfDNA) i.e., fragmented genomic DNA which is found in vivo in an animal within a bodily fluid rather than within an intact cell.
- the origin of cfDNA is not fully understood, but it is generally believed to be released from cells in processes such as apoptosis and necrosis.
- cfDNA is highly fragmented compared to intact genomic DNA (e.g., see Alcaide et al. (2020) Scientific Reports 10, article 12564), and in general circulates as fragments between 120-220 bp long, with a peak around 168 bp (in humans).
- cfDNA is present in many bodily fluids, including but not limited to blood and urine, and the methods and compositions disclosed herein can use any suitable source of cfDNA e.g., a blood sample (such as venous blood) or a urine sample.
- cfDNA is isolated from blood, and the blood may be treated to yield plasma (i.e., the liquid remaining after a whole blood sample is subjected to a separation process to remove the blood cells, typically involving centrifugation) or serum (i.e., blood plasma without clotting factors such as fibrinogen).
- plasma i.e., the liquid remaining after a whole blood sample is subjected to a separation process to remove the blood cells, typically involving centrifugation
- serum i.e., blood plasma without clotting factors such as fibrinogen.
- the methods and compositions disclosed herein can be used
- Methods disclosed herein may thus include a step of purifying cfDNA from a blood, plasma or serum sample, to provide cfDNA for digestion and analysis. Methods may also include a step of obtaining a blood sample and preparing plasma or serum therefrom, thus providing a source for downstream purification of cfDNA.
- Blood can be collected in tubes that contain an anticoagulant and an agent to inhibit genomic DNA from white blood cells in the sample being released into the plasma component of the blood sample.
- Such tubes are commercially available as glass cfDNA ‘Blood Collection Tubes’ or ‘BCT’ from Streck (La Vista, NE) e.g. as discussed by Diaz et al. (2016) PLoS One 11(11): e0166354, and they can stabilize cfDNA within blood for up to 14 days at 6-37°C (thus providing advantages compared to typical K2EDTA collection tubes).
- Useful anticoagulants include, but are not limited to, EDTA, heparin, or citrate.
- Useful agents to inhibit release of genomic DNA from white blood cells include, but are not limited to, diazolidinyl urea, imidazolidinyl urea, dimethoylol-5,5-dimethylhydantoin, dimethylol urea, 2-bromo-2-nitropropane-1,3-diol, oxazolidines, sodium hydroxymethyl glycinate, 5-hydroxymethoxymethyl-1-1aza-3,7-dioxabicyclo[3.3.0]octane, 5-hydroxymethyl-1-1aza-3,7-dioxabicyclo[3.3.0]octane, 5-hydroxypoly[methyleneoxy]methyl-1-1aza-3,7-dioxabicyclo[3.3.0]octane, quaternary adamantine, and mixtures thereof.
- a quenching agent e.g. lysine, ethylene diamine, arginine, urea, adenine, guanine, cytosine, thymine, spermidine, or any combination thereof
- a tube can include imidazolidinyl urea (or diazolidinyl urea), EDTA and glycine.
- Suitable collection tubes can be found in WO2013/123030 and US2010/0184069.
- Other useful collection tubes are available, including but not limited to various plastic tubes: the ‘Cell-Free DNA Collection Tube’ from Roche, made of PET; the ‘LBgard blood tube’ from Biomatrica, made from plastic and suitable for up to 8.5 mL of blood; and the ‘PAXgene Blood DNA tube’ from PreAnalytiX or Qiagen. These tubes are discussed in more detail in Kerachian et al. (2021) Clinical Epigenetics 13,193 and Grolz et al. (2016) Current Pathobiology Reports 6:275-86.
- These various tubes can store up to 8.5 mL of blood, or sometimes up to 10 mL.
- a blood sample taken from a subject may thus typically have a volume of between 5-10 mL.
- a 10 mL blood sample typically yields between 10-500 ng cfDNA, but can sometimes yield substantially higher amounts e.g. up to around 10 µg, particularly in certain cancer patients. Methods disclosed herein can be performed on the amount of cfDNA contained in a 10 mL blood sample. Methods and compositions disclosed herein may typically use from 10-400 ng of cfDNA, for instance from 10-250 ng or from 10-200 ng.
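- As a rough illustration of what these input masses mean in molecular terms, the following sketch converts a cfDNA mass into haploid genome equivalents. The figure of about 3.3 pg per haploid human genome is a commonly quoted approximation and is an assumption of this example, not a value taken from the present disclosure; the function name is hypothetical.

```python
# Approximate mass of one haploid human genome, in picograms.
# Commonly quoted figure; an assumption of this sketch.
HAPLOID_GENOME_PG = 3.3


def haploid_genome_equivalents(cfdna_ng: float) -> float:
    """Convert a cfDNA mass (ng) into approximate haploid genome copies."""
    if cfdna_ng < 0:
        raise ValueError("mass must be non-negative")
    return cfdna_ng * 1000.0 / HAPLOID_GENOME_PG  # 1 ng = 1000 pg


# Input amounts mentioned in the text: 10, 200 and 400 ng.
for mass_ng in (10, 200, 400):
    copies = haploid_genome_equivalents(mass_ng)
    print(f"{mass_ng} ng is roughly {copies:,.0f} haploid genome copies")
```

- The copy number bounds how many molecules can cover any one marker locus, which in turn limits the resolution of a methylation level measured at that locus.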
- Kits for purifying cfDNA from plasma (and other bodily fluids) are readily available e.g. the MagMAX cfDNA isolation kit from ThermoFisher, the Maxwell RSC ccfDNA plasma kit from Promega, the Apostle MiniMax high efficiency isolation kit from Beckman Coulter, or the QIAamp or EZ1 products from Qiagen.
- Methods disclosed herein may therefore utilise cfDNA extracted from ≤10 mL blood from a subject. Methods may begin with cfDNA which has already been prepared, or may include an upstream step of preparing the cfDNA. Similarly, methods may include an upstream step of obtaining a plasma sample before a step of preparing cfDNA from the plasma sample.
- the cfDNA utilised in methods disclosed herein is substantially free of single-stranded DNA (ssDNA) i.e. where less than 7% of the cfDNA molecules (by number) are single-stranded, and preferably less than 5% or less than 1% (i.e. such that at least 99% of the cfDNA molecules are double-stranded).
- the cfDNA contains less than 0.1% ssDNA, less than 0.01% ssDNA, or may even contain no ssDNA (i.e. free of ssDNA). Extraction of cfDNA to obtain a cfDNA sample substantially free of ssDNA is described, for example, in WO2020/188561.
- Ensuring low levels of ssDNA avoids potential inhibition of restriction digestion, and also avoids undesired amplification of ssDNA.
- Kits are available for quantifying single-stranded DNA in a sample e.g. the Promega QuantiFluor™ kit.
- all extracted cfDNA is used in the methods disclosed herein.
- cfDNA is split into multiple fractions, and one or more fractions is not used in the methods disclosed herein but may instead be used in other analytical methods, or is kept for use in control experiments, or for other purposes.
- cfDNA is quantified prior to digestion. In other embodiments, cfDNA is not quantified prior to digestion.
Measuring a methylation level
- Methods of the invention comprise measuring a methylation level of a marker in the sequence listing.
- a ‘methylation level’ of a marker as used herein is a numerical value conveying information about the proportion or number of cfDNA molecules in a sample of cfDNA which were methylated and/or unmethylated at one or more CpG site(s) in the marker.
- the invention can use any method suitable for measuring a methylation level. Methods encompassed by the invention include those that comprise analysis of DNA upstream and/or downstream of the markers given in the sequence listing, so long as the methylation level of at least one CpG site in a marker in the sequence listing is measured.
- Preferred methods are those comprising cfDNA digestion using methylation sensitive and/or dependent restriction endonucleases (MSREs/MDREs) followed by downstream analytical steps which quantify the degree of digestion of the marker and/or of a CpG site in the marker.
- Preferred downstream analyses are high-throughput sequencing (also known as next-generation sequencing or NGS) or real-time PCR (rtPCR, also known as quantitative PCR or qPCR).
- a methylation level can be expressed as a percentage, a fraction, a normalised value, etc.
- a methylation level of a marker may be expressed as a percentage, ratio or fraction representing the proportion of cfDNA molecules that are methylated at one or more CpG sites in the marker out of the total number of cfDNA molecules comprising the marker.
- a methylation level of a marker may be expressed as a copy number of methylated or unmethylated cfDNA molecules comprising the marker. This may be expressed as a ‘HitspanN’ of a genomic position in the marker (explained in more detail below).
- a methylation level may be expressed as the quantification cycle (Cq) for an amplicon comprising a marker locus.
- the methylation level of a marker would again represent the number of cfDNA molecules comprising the marker which were methylated or unmethylated.
- methylation levels can be determined according to how often the MSREs/MDREs used cleave and/or do not cleave at their recognition site during digestion. For example, where digestion used an MSRE, alignments which span a particular recognition site indicate molecules which were not cleaved, and so which (with complete digestion) were methylated at the CpG site within the recognition site. So, alignments which span a recognition site directly indicate methylation of the site when an MSRE was used for digestion (and conversely, indicate unmethylation when an MDRE was used).
- alignments which start or terminate with the cleaved recognition site indicate molecules which were unmethylated at the site (and therefore cleaved during digestion). So, alignments which start or terminate with the cleaved recognition site directly indicate unmethylation of the site when an MSRE was used for digestion (and indicate methylation of the site when an MDRE was used).
- a methylation level can be determined from alignments that directly indicate methylation and/or alignments that directly indicate unmethylation. Preferably, alignments that directly indicate methylation and alignments that directly indicate unmethylation are considered because this allows for greater accuracy.
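- The alignment-based reasoning above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the coordinate convention (0-based, half-open), the `cut` position and all names are assumptions of the example. It counts each terminating alignment once, so it neither corrects for a cleaved molecule yielding two fragments nor accounts for end repair during library preparation.

```python
from typing import Iterable, Optional, Tuple

Alignment = Tuple[int, int]  # 0-based, half-open [start, end)


def classify(aln: Alignment, site: Tuple[int, int], cut: int) -> str:
    """Classify one alignment relative to one recognition site."""
    start, end = aln
    site_start, site_end = site
    if start <= site_start and end >= site_end:
        return "spans"          # site intact: molecule was not cleaved
    if start == cut or end == cut:
        return "terminates"     # fragment starts or ends at the cleaved site
    return "uninformative"


def methylation_level(alignments: Iterable[Alignment],
                      site: Tuple[int, int],
                      cut: int,
                      enzyme: str = "MSRE") -> Optional[float]:
    """Fraction of informative alignments that indicate methylation."""
    if enzyme not in ("MSRE", "MDRE"):
        raise ValueError("enzyme must be 'MSRE' or 'MDRE'")
    counts = {"spans": 0, "terminates": 0, "uninformative": 0}
    for aln in alignments:
        counts[classify(aln, site, cut)] += 1
    informative = counts["spans"] + counts["terminates"]
    if informative == 0:
        return None
    # MSRE: uncleaved means methylated. MDRE: cleaved means methylated.
    methylated = counts["spans"] if enzyme == "MSRE" else counts["terminates"]
    return methylated / informative


# Hypothetical site occupying positions 100-103, cleaved at position 101.
alignments = [(0, 160), (20, 180), (40, 190), (101, 230), (10, 101), (150, 300)]
print(methylation_level(alignments, site=(100, 104), cut=101))
print(methylation_level(alignments, site=(100, 104), cut=101, enzyme="MDRE"))
```

- Using both classes of alignment in the denominator reflects the stated preference for considering alignments that indicate methylation together with those that indicate unmethylation.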
- methylation levels can be determined according to how often a nucleobase capable of being modified by the reagent(s) is modified by the reagent(s). For instance, in embodiments comprising sodium bisulfite treatment, a methylation level of a CpG can be determined from the number of reads wherein the site has the sequence ‘TG’ instead of ‘CG’.
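- For the bisulfite case, a minimal sketch of the read-counting step is given below. It assumes that the dinucleotide observed at the CpG site has already been extracted from each read, and that all reads derive from the same strand (reads from the opposite strand would show a different conversion signature); the function name is hypothetical.

```python
from collections import Counter
from typing import Iterable, Optional


def bisulfite_cpg_methylation(dinucleotides: Iterable[str]) -> Optional[float]:
    """Methylation level of one CpG from per-read dinucleotide calls.

    'CG' means the cytosine resisted conversion (methylated);
    'TG' means it was converted (unmethylated). Other calls are ignored.
    """
    counts = Counter(d.upper() for d in dinucleotides)
    methylated = counts["CG"]
    unmethylated = counts["TG"]
    total = methylated + unmethylated
    if total == 0:
        return None  # no informative reads at this site
    return methylated / total


# Seven unconverted reads, three converted reads, one uninformative read.
calls = ["CG"] * 7 + ["TG"] * 3 + ["AG"]
print(bisulfite_cpg_methylation(calls))
```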
- the HitspanN of a genomic position corresponds to the number of reads or alignments with a size of ‘N’ nucleotides centred on the position (where N is a positive even integer).
- the Hitspan100 of a genomic position refers to the number of reads or sequence alignments with a size of at least 100 nucleotides centred on the position. So, a Hitspan100 of 90 at a specific position means that there are 90 sequence reads or alignments with a size of at least 100 nucleotides centred on the position.
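- One possible reading of this definition, offered only as a sketch, is that an alignment contributes to the HitspanN of a position when it covers the whole window of N nucleotides centred on that position (N/2 on either side). That interpretation, the 0-based half-open coordinates and the function name are assumptions of this example.

```python
from typing import Iterable, Tuple


def hitspan(alignments: Iterable[Tuple[int, int]], position: int, n: int) -> int:
    """Count alignments covering the N-nucleotide window centred on position.

    Alignments are 0-based, half-open (start, end) intervals.
    """
    if n <= 0 or n % 2:
        raise ValueError("N must be a positive even integer")
    half = n // 2
    window_start, window_end = position - half, position + half
    return sum(1 for start, end in alignments
               if start <= window_start and end >= window_end)


# Window for position 1000 and N = 100 is [950, 1050).
alignments = [(900, 1070), (950, 1050), (960, 1130), (880, 1049), (940, 1100)]
print(hitspan(alignments, position=1000, n=100))
```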
- a methylation level may be normalised with respect to a reference locus and/or a reference DNA sample.
- the methylation level is a methylation ratio between a marker locus and a reference locus (which may be in the cfDNA being analysed or in a reference DNA sample), expressed as a ratio between signals obtained for these loci in downstream analysis following restriction digestion, methylation-conditional nucleobase modification, PCR amplification, etc.
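- Where the signals are quantification cycles from real-time PCR, a ratio between a marker locus and a reference locus can be sketched with the standard relative-quantification relation, ratio = E^(Cq_reference - Cq_marker). Perfect doubling per cycle (E = 2) is assumed by default, and the function name is hypothetical; the disclosure does not prescribe this particular formula.

```python
def methylation_ratio_from_cq(cq_marker: float,
                              cq_reference: float,
                              efficiency: float = 2.0) -> float:
    """Ratio of marker to reference template implied by two Cq values.

    A lower Cq means more amplifiable template survived digestion.
    `efficiency` is the fold-increase per cycle (2.0 = perfect doubling).
    """
    if efficiency <= 1.0:
        raise ValueError("efficiency must be greater than 1")
    return efficiency ** (cq_reference - cq_marker)


# Marker crosses threshold three cycles after the reference: 2**-3 = 0.125.
print(methylation_ratio_from_cq(cq_marker=30.0, cq_reference=27.0))
```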
- the methylation level of a marker can be calculated by dividing the HitspanN of a position in the marker by an expected HitspanN of the position (e.g., the HitspanN which would be expected if the marker was fully methylated, and thus uncleaved by an MSRE).
- the expected HitspanN may be determined using, for instance: (i) the HitspanN of a position in a reference locus that is not cut by the restriction enzyme; (ii) the average HitspanN of positions in a plurality of such reference loci; or (iii) the HitspanN of a position in a reference locus in an undigested reference sample (e.g.
- methylation level may be inferred by comparing the HitspanN in a digested sample to the HitspanN in a reference locus in an undigested sample.
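- The normalisation described above can be sketched as a simple division, here taking the expected HitspanN as the mean HitspanN of one or more reference loci that are not cut by the restriction enzyme. The function name and the example counts are hypothetical, and the sketch assumes that marker and reference loci are sequenced with comparable efficiency.

```python
from statistics import mean
from typing import Sequence


def normalised_methylation_level(marker_hitspan: float,
                                 reference_hitspans: Sequence[float]) -> float:
    """Observed HitspanN of a marker divided by the expected HitspanN."""
    if not reference_hitspans:
        raise ValueError("at least one reference HitspanN is required")
    expected = mean(reference_hitspans)
    if expected <= 0:
        raise ValueError("expected HitspanN must be positive")
    return marker_hitspan / expected


# Marker HitspanN of 90 against three uncut reference loci (mean 300).
print(normalised_methylation_level(90, [280, 310, 310]))
```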
- the non-methylated CpG sites can be taken as sequencing reads whose 5' ends map to a site, as sequencing reads whose 3' ends map to a site, or as the half of the sum of sequencing reads whose 5' ends or 3' ends map to a site.
- some sequencing library preparation methods can result in depletion of small fragments, which are then not sequenced (e.g., in CpG islands, where a starting cfDNA molecule is cleaved by a MSRE at more than one unmethylated site, thus providing 3 or more restriction fragments, some of which are very small)
- the observed number of unmethylated CpG sites may be lower than the true value in the original sample. This distortion can be somewhat addressed by using the larger of the number of reads or alignments whose 3' ends map to a site and the number of reads or alignments whose 5' ends map to a site (or to use the mean).
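- The alternative ways of counting unmethylated molecules from read ends can be compared with a short sketch. The mode names and the function name are hypothetical; note that half of the sum and the mean are the same quantity.

```python
def unmethylated_estimate(five_prime_ends: int,
                          three_prime_ends: int,
                          mode: str = "half_sum") -> float:
    """Estimate the number of cleaved (unmethylated) molecules at a site.

    five_prime_ends / three_prime_ends: reads whose 5' or 3' end maps to the site.
    """
    if five_prime_ends < 0 or three_prime_ends < 0:
        raise ValueError("counts must be non-negative")
    if mode == "five_prime":
        return float(five_prime_ends)
    if mode == "three_prime":
        return float(three_prime_ends)
    if mode == "half_sum":   # identical to the mean of the two counts
        return (five_prime_ends + three_prime_ends) / 2
    if mode == "max":        # favours the side less affected by fragment loss
        return float(max(five_prime_ends, three_prime_ends))
    raise ValueError(f"unknown mode: {mode}")


# Small fragments on one side were depleted, so the 3' count is lower.
for mode in ("five_prime", "three_prime", "half_sum", "max"):
    print(mode, unmethylated_estimate(40, 28, mode))
```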
- the reference locus may be a different locus compared to the marker locus.
- the reference locus and the marker locus may be present in the cfDNA from samples from the one or more first subjects.
- the reference locus may be in DNA from a sample other than those from the one or more first subjects and one or more second subjects, such as an artificial sample comprising a locus with a known methylation level.
- the reference locus may be the same locus as the marker locus, with the reference locus and marker locus in different samples.
- the marker locus may be present in cfDNA from samples from the one or more first subjects, and the reference locus may be in cfDNA from samples from the one or more second subjects.
- the marker locus may be present in cfDNA from samples from one of a plurality of first subjects, and the reference locus may be the same locus in cfDNA from samples from another one of the plurality of first subjects - for example, in first subjects that have different disease classifications.
- Methylation level may also be determined without use of a reference locus.
- the expected read count for a marker locus may be determined as the sum of the read count for the marker locus (indicating methylation, where an MSRE is used) with the read count of loci that start or end at the marker locus (indicating non-methylation), taking account where necessary of any end repair which took place during library preparation. Therefore, a methylation level may be determined without reference to other loci or other samples, based on the ‘raw’ or ‘absolute’ level of methylation at the marker locus.
- Methods of the invention may comprise a methylation-conditional nucleobase modification step in which chemical changes are made to nucleobases within DNA based on their methylation status. Such chemical changes can be detected in downstream analytical steps.
- a suitable downstream step is high-throughput sequencing.
- the methylation-conditional nucleobase modification step is bisulfite conversion.
- DNA is treated with sodium bisulfite to convert unmethylated cytosine to uracil. The differences in sequence between treated and untreated DNA permits methylation to be detected.
- Methods of the invention may comprise bisulfite conversion (including as part of an upstream step when preparing the DNA) followed by downstream analytical steps which can distinguish uracil from cytosine in the markers of the invention.
- the methylation-conditional nucleobase modification step is ten-eleven translocation (TET)-assisted pyridine borane sequencing (TAPS).
- TAPS refers to a nucleobase modification technique and does not include any particular methodology for reading the sequence of treated DNA.
- methylated cytosine is converted to dihydrouracil, which is recognised as thymine.
- Methods of the invention may comprise TAPS (including as part of an upstream step when preparing the DNA) followed by downstream analytical steps which can distinguish dihydrouracil from cytosine in the markers of the invention.
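- The opposite readouts of bisulfite conversion and TAPS can be illustrated in silico. The sketch below considers a single strand only and reports the sequence as it would be read after amplification (uracil and dihydrouracil both read as T); the function name and the example sequence are hypothetical.

```python
from typing import Set


def read_after_conversion(seq: str, methylated: Set[int], chemistry: str) -> str:
    """Return the sequence as read after methylation-conditional conversion.

    methylated: 0-based indices of methylated cytosines in `seq`.
    chemistry: 'bisulfite' (unmethylated C reads as T) or
               'TAPS' (methylated C reads as T).
    """
    if chemistry not in ("bisulfite", "TAPS"):
        raise ValueError("chemistry must be 'bisulfite' or 'TAPS'")
    out = []
    for i, base in enumerate(seq.upper()):
        if base == "C":
            is_methylated = i in methylated
            converted = (not is_methylated) if chemistry == "bisulfite" else is_methylated
            out.append("T" if converted else "C")
        else:
            out.append(base)
    return "".join(out)


seq = "ACGTCGAC"          # cytosines at indices 1, 4 and 7
methylated = {1}          # only the first CpG is methylated
print(read_after_conversion(seq, methylated, "bisulfite"))
print(read_after_conversion(seq, methylated, "TAPS"))
```

- Because TAPS changes only the methylated cytosines, most of the sequence is left unchanged, whereas bisulfite conversion changes every unmethylated cytosine.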
- measuring a methylation level comprises the methodology described in Füllgrabe et al. (2023) Nature Biotechnol. (https://doi.org/10.1038/s41587-022-01652-0; see also WO2022/023753) in which methylation-conditional nucleobase modification is combined with particular downstream DNA sequencing steps.
- preferred methods do not include a step of methylation-conditional nucleobase modification (also called nucleobase conversion).
- preferred methods do not include nucleobase conversion. Instead, preferred methods disclosed herein use restriction enzymes which recognise specific sequences in double-stranded DNA and introduce a double-stranded break into the DNA. More specifically, methods and compositions disclosed herein may use MSREs and/or MDREs.
- MSREs and MDREs recognise specific sequences in double-stranded target DNA and introduce a double-stranded break into the target DNA.
- a MSRE cleaves the target DNA only if a CpG associated with its recognition site is unmethylated, and methylation inhibits the cleavage.
- a MDRE cleaves the target DNA only if a CpG associated with its recognition site is methylated.
- DNA digestion with MSREs and/or MDREs provides information about the methylation status of the CpGs within recognition sites present in the target DNA.
- Type II restriction endonucleases i.e., enzymes where the double-stranded break is introduced within the recognition site, are particularly useful in the invention. The use of multiple restriction enzymes permits simultaneous digestion in parallel within a sample.
- The recognition sites of MSREs and MDREs are also called ‘restriction sites’. Many MSREs and MDREs, with different restriction sites, are commercially available, so a broad coverage of CpG sites across a genome can be obtained using the appropriate combination of MSREs and/or MDREs. Because broad genomic coverage can be obtained, the use of MSREs and/or MDREs in the invention is particularly preferred, with the use of MSREs being most preferable.
- cfDNA from the sample is digested with MSRE(s). In some embodiments, cfDNA from the sample is digested with MDRE(s). In some embodiments, cfDNA from the sample is digested with MSRE(s) and MDRE(s). Use of MSRE(s) without any MDRE is preferred, and use of a combination of two or more MSREs is preferred, as discussed below. In embodiments involving DNA digestion, enzymes and DNA are typically incubated for a long enough period for substantially complete digestion to occur i.e., further incubation does not lead to any measurable increase in DNA cleavage.
- Longer digestion can be performed if desired e.g., for 3 hours, 4 hours, or longer (e.g., overnight). In some embodiments, digestion is performed for 11 hours or less e.g., for between 2-10 hours, 2-9 hours, 2-8 hours, or 2-4 hours. In other embodiments (e.g., where a collection tube is used, as discussed herein) digestion may be performed for longer periods e.g., for 12 hours or more.
- Allowing a digestion reaction to substantially proceed to completion provides information about the cleavability of the restriction site of the restriction endonuclease(s) used in the reaction. For example, if a particular restriction site in a particular DNA molecule is not cleaved after complete digestion, then it can be inferred that the locus in that molecule was not cleavable. Lack of cleavage of a MSRE restriction site thus indicates that a CpG sequence which is within or overlaps with that MSRE recognition sequence was methylated, while cleavage indicates that it was unmethylated.
- restriction enzymes can be inactivated by heating (e.g. to 65°C or 85°C) e.g., by immersing the reaction mixture in a water bath, or by subjecting the mixture to a raised temperature within a thermal cycler which can be used for subsequent PCR.
- Digestion reaction mixtures with cfDNA tend to have a low volume such that the temperature of the whole reaction mixture reaches the elevated temperature very quickly, leading to inactivation of the enzymes. In some embodiments heating at this temperature occurs for longer than 15 minutes, and ideally occurs for at least 20 minutes e.g., for 20-60 minutes.
- the temperature can exceed the temperature required for inactivation if desired, but this is not required. This heating step is adequate for complete inactivation of the restriction enzymes i.e., such that the enzymes’ digestion activity toward cleavable target cfDNA molecules under the digestion conditions employed prior to heating can no longer be measurably detected.
- Preferred methods do not use restriction enzyme isoschizomers, where one of the enzymes recognizes both the methylated and unmethylated forms of the restriction site while the other recognizes only one of these forms.
- Preferred methods do not use a mixture of restriction enzymes in which at least one enzyme has a recognition sequence which includes a CpG but which is neither an MSRE nor an MDRE i.e., an enzyme which digests regardless of the CpG methylation status.
- MSREs and MDREs are readily available from well-known commercial suppliers, such as ThermoFisher, New England Biolabs, Promega, etc.
- MSREs include, but are not limited to: AatII, AccII, AciI, AclI, AfeI, AgeI, Aor13HI, Aor51HI, AscI, AsiSI, AvaI, BceAI, BmgBI, BsaAI, BsaHI, BsiEI, BsiWI, BsmBI, BspDI, BspT104I, BssHII, BstBI, BstUI, Cfr10I, ClaI, CpoI, DpnII, EagI, Eco52I, FauI, FseI, FspI, HaeII, HapII, HgaI, HhaI, HinP1I, HpaII, Hpy99I, HpyCH4IV, KasI, MluI, MspI, NaeI, NarI, NgoMIV, NotI, NruI, NsbI, PaeR7I, PluTI, PmaCI, PmlI, Psp1406I,
- MDREs include, but are not limited to: BspEI, BtgZI, FspEI, GlaI, LpnPI, McrBC, MspJI, XhoI, XmaI.
- Two preferred MSREs are HinP1I and AciI.
- the invention also provides for the use of a plurality of restriction endonucleases, wherein the plurality consists of MSRE and/or MDRE.
- the plurality may include only MSREs, only MDREs, or a mixture of both (e.g. one or more MSRE plus one or more MDRE).
- It is preferred to work with MSREs, without needing MDREs, and thus the plurality includes two or more MSREs.
- Digestion with MSREs leads to digested cfDNA in which methylated CpG sites are intact but unmethylated CpG sites are digested.
- a preferred plurality of MSREs includes both HinP1I and AciI.
- the markers in the sequence listing include a restriction site for HinP1I and/or AciI. This pairing of enzymes covers over 99% of CpG islands in the human genome. With this MSRE pairing it is preferred to include HinP1I at an excess (measured in terms of enzymatic units) to AciI, and ideally an excess of at least 1.2:1 e.g. at least 1.5:1, at least 1.75:1, at least 2:1, at least 3:1, at least 4:1, or at least 5:1.
- Ratios between 2:1 and 5:1 are particularly useful with human cfDNA, and an excess of about 4.5 is preferred.
- Digestion can be performed at about 37°C, until completion. Incubation at 37°C for 2 hours is typically adequate for complete digestion with HinP1I and AciI.
- HinP1I (sometimes known as Hin6I) recognises the sequence GCGC and cleaves after the first G to leave a two nucleotide 5' overhang (5'-G/CGC). It cuts well at 37°C and can be heat-inactivated by heating at 65°C for 20 minutes.
- NEB recommends the use of its rCutSmart™ buffer (50 mM potassium acetate, 20 mM Tris-acetate, 10 mM magnesium acetate, 100 µg/mL recombinant albumin, pH 7.9).
- 1 unit of HinP1I is defined as the amount of enzyme required to digest 1 µg of λ DNA in 1 hour at 37°C in a total reaction volume of 50 µL.
- AciI recognises the sequence CCGC and cleaves after the first C to leave a two nucleotide 5' overhang (5'-C/CGC). It cuts well at 37°C and can be heat-inactivated by heating at 65°C for 20 minutes.
- NEB recommends the use of its rCutSmart™ buffer (50 mM potassium acetate, 20 mM Tris-acetate, 10 mM magnesium acetate, 100 µg/mL recombinant albumin, pH 7.9).
- 1 unit of AciI is defined as the amount of enzyme required to digest 1 µg of λ DNA in 1 hour at 37°C in a total reaction volume of 50 µL. Its recognition site is non-palindromic.
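- As an illustration of how the two recognition sequences given above map onto a DNA fragment, the following sketch locates GCGC and CCGC sites and predicts which would be cleaved by these MSREs given a set of methylated CpG positions. Because the CCGC site is non-palindromic, the sketch also searches for its reverse complement (GCGG), which marks a site on the opposite strand. The function names, the example sequence and the 0-based coordinates are assumptions of this example.

```python
import re
from typing import List, Set, Tuple

# Recognition sequences as given in the text (5'->3').
SITES = {"HinP1I": "GCGC", "AciI": "CCGC"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def find_sites(seq: str) -> List[Tuple[int, str, str]]:
    """Return sorted (start, enzyme, strand) for every recognition site."""
    seq = seq.upper()
    hits = []
    for enzyme, motif in SITES.items():
        patterns = {motif: "+"}
        rc = reverse_complement(motif)
        if rc != motif:              # non-palindromic site: check other strand
            patterns[rc] = "-"
        for pattern, strand in patterns.items():
            # Zero-width lookahead so overlapping sites are all reported.
            for match in re.finditer(f"(?={pattern})", seq):
                hits.append((match.start(), enzyme, strand))
    return sorted(hits)


def cleaved_sites(seq: str, methylated_cpgs: Set[int]) -> List[Tuple[int, str, str]]:
    """Sites an MSRE would cut: those whose CpG cytosine is unmethylated.

    In GCGC, CCGC and GCGG the CpG cytosine is at site start + 1.
    """
    return [hit for hit in find_sites(seq) if hit[0] + 1 not in methylated_cpgs]


fragment = "TTGCGCAACCGCTTGCGGAA"
print(find_sites(fragment))
print(cleaved_sites(fragment, methylated_cpgs={9}))
```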
- λ DNA is a commonly used DNA substrate extracted from bacteriophage lambda (cI857ind1 Sam7), being 48,502 bp long. It is usually stored in 10 mM Tris-HCl (pH 8.0), 1 mM EDTA, and is widely available from commercial suppliers e.g., from NEB under catalogue number N3011S.
- HinP1I and AciI can both be inactivated by heating at 65°C. In some embodiments heating at this temperature occurs for longer than 15 minutes, and ideally occurs for at least 20 minutes e.g., for 20-60 minutes. The temperature can exceed 65°C if desired, but this is not required. This heating step is adequate for complete inactivation of the restriction enzymes i.e., such that the enzymes’ digestion activity which was present during cfDNA digestion can no longer be measurably detected even when cleavable target molecules are present.
- Because the marker loci disclosed herein contain differentially methylated CpG sites located within recognition site(s) of at least one MSRE and/or MDRE, differences in methylation levels between DNA sources result in differences in the degree of digestion, and subsequently in different amplification patterns in subsequent amplification and quantification steps. Such differences enable distinguishing between DNA from different sources, for example, between DNA samples from subjects with lung cancer and DNA samples from healthy subjects.
- methods disclosed herein may include a step of amplification (e.g., PCR) performed on the digested cfDNA.
- this amplification will be targeted to the marker(s) of interest.
- upstream and downstream primers are used which flank a CpG site of interest in the marker, and the intervening CpG-containing sequence will be amplified if it has not been digested by restriction enzymes.
- the resulting amplicons can then be detected e.g., using a labelled probe which is complementary to a sub-sequence within the amplicons of interest.
- Methods may therefore include a step of adding PCR reagents after digestion e.g., suitable buffer/salt components (if required in addition to buffer/salt remaining from digestion), a DNA polymerase (such as a Taq polymerase), dNTPs, primers and (optionally) probes.
- one or more of these components may be present during digestion e.g., it is possible to use a hot start PCR protocol, such that PCR reagents are already present during the digestion step but they do not become active until the reaction mixture is heated (e.g. during heat inactivation of the restriction enzymes).
- Restriction digestion typically takes place in the presence of high levels of Mg++.
- PCR usually relies on Mg++, so standard PCR buffers include Mg++. In this situation, however, addition of a standard PCR buffer can lead to an excess of Mg++ which can inhibit efficiency of amplification. Thus added PCR reagents may include a lower level of Mg++ than would normally be the case.
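- The effect of carrying digestion buffer over into the PCR can be estimated with a simple mixing calculation. The 10 mM figure below corresponds to the magnesium acetate concentration of the digestion buffer described herein; the reaction volumes and the Mg++ content of the added PCR mix are hypothetical values chosen only for illustration.

```python
def mixed_concentration(c1_mM: float, v1_uL: float,
                        c2_mM: float, v2_uL: float) -> float:
    """Final concentration after combining two solutions (C1V1 + C2V2) / (V1 + V2)."""
    total_volume = v1_uL + v2_uL
    if total_volume <= 0:
        raise ValueError("total volume must be positive")
    return (c1_mM * v1_uL + c2_mM * v2_uL) / total_volume


digest_mg_mM, digest_volume_uL = 10.0, 20.0   # digestion mixture
pcr_mix_volume_uL = 20.0                      # hypothetical volume of added PCR mix

# Adding a PCR mix that itself contains 3 mM Mg++ (hypothetical value).
print(mixed_concentration(digest_mg_mM, digest_volume_uL, 3.0, pcr_mix_volume_uL))
# Adding a Mg++-free PCR mix instead.
print(mixed_concentration(digest_mg_mM, digest_volume_uL, 0.0, pcr_mix_volume_uL))
```

- In this hypothetical case even a Mg++-free PCR mix leaves 5 mM Mg++ in the combined reaction, which illustrates why the added reagents may need a reduced Mg++ level.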
- Where PCR primers and probes are present during MSRE digestion, they should be designed so that their sequences do not include the recognition site for the MSRE(s) which is/are being used.
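- A design check of this kind can be sketched as a simple string search over each oligonucleotide, looking for the recognition sequence on either strand. The enzyme table uses the two preferred MSREs and their recognition sequences as stated herein; the oligonucleotide sequences and function names are hypothetical.

```python
from typing import Dict, List

MSRE_SITES: Dict[str, str] = {"HinP1I": "GCGC", "AciI": "CCGC"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def enzymes_with_site_in(oligo: str) -> List[str]:
    """Names of enzymes whose recognition site occurs in the oligo (either strand)."""
    oligo = oligo.upper()
    return [enzyme for enzyme, motif in MSRE_SITES.items()
            if motif in oligo or reverse_complement(motif) in oligo]


# Hypothetical primer and probe sequences.
oligos = {
    "primer_F": "AGGTCCTGACCATTGACTGA",
    "primer_R": "ATGGCGCTTACGATCCAGTT",
    "probe": "TTAGCGGATCAGGTCATCCA",
}
for name, seq in oligos.items():
    hits = enzymes_with_site_in(seq)
    print(name, "OK" if not hits else f"contains site for {', '.join(hits)}")
```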
- Amplification and detection of amplicons may be carried out by conventional PCR using fluorescently-labeled primers followed by capillary electrophoresis of amplification products.
- the amplification products are separated by capillary electrophoresis and fluorescent signals are quantified.
- An electropherogram plotting the change in fluorescent signals as a function of size (bp) or time from injection may be generated, wherein each peak in the electropherogram corresponds to the amplification product of a single locus.
- the peak's height (provided for example using ‘relative fluorescent units’, rFU) may represent the intensity of the signal from the amplified locus.
- Computer software may be used to detect peaks and calculate the fluorescence intensities (peak heights) of a set of loci whose amplification products were run on the capillary electrophoresis machine, and subsequently the ratios between the signal intensities.
- a preferred PCR technique is real-time PCR (also known as qPCR), in which simultaneous amplification and detection of the amplification products are performed.
- Real-time PCR can be used with non-specific detection or sequence-specific detection.
- Non-specific detection can be performed e.g., using a dsDNA-binding dye, such as SYBR Green.
- SYBR Green can be used within the methods disclosed herein, but is not ideal if it is desired to distinguish between multiple different amplicons in the same reaction.
- For sequence-specific detection, methods and compositions may use a labelled oligonucleotide probe (usually with a fluorophore and fluorescence quencher on the same probe, as in the TaqMan system) which is complementary to a specific sequence within nucleic acid amplicon(s) of interest.
- Different probes for amplicons derived from different target CpGs can be labelled with different fluorophores so that multiple different amplicons can be distinguished.
- Real-time PCR may thus be achieved by using a hydrolysis probe based on combined reporter and quencher molecules.
- oligonucleotide probes have a fluorescent moiety (fluorophore) attached to their 5' end and a quencher attached to the 3' end.
- the polynucleotide probes selectively hybridize to their target sequences on the template, and as the polymerase replicates the template it also cleaves the polynucleotide probes due to the polymerase’s 5'-nuclease activity.
- When the polynucleotide probes are intact, the close proximity between the quencher and the fluorescent moiety normally results in a low level of background fluorescence.
- When the polynucleotide probes are cleaved, the quencher is decoupled from the fluorescent moiety, resulting in an increase of intensity of fluorescence.
- the fluorescent signal correlates with the amount of amplification products, i.e., the signal increases as the amplification products accumulate.
- Suitable fluorophores include, but are not limited to, fluorescein, FAM, lissamine, phycoerythrin, rhodamine, Cy2, Cy3, Cy3.5, Cy5, Cy5.5, Cy7, FluorX, JOE, HEX, NED, VIC and ROX.
- Suitable fluorophore/quencher pairs are known in the art, including but not limited to: FAM-TAMRA, FAM-BHQ1, Yakima Yellow-BHQ1, ATTO550-BHQ2 and ROX-BHQ2. Fluorescence may be monitored during each PCR cycle, providing an amplification plot showing the change of fluorescent signals from the probe(s) as a function of cycle number. In the context of real-time PCR, the following terminology is used:
- ‘Quantification cycle’ refers to the cycle number in which fluorescence increases above a threshold, set automatically by software or manually by the user.
- the threshold may be constant for each CpG locus of interest and may be set in advance, prior to carrying out the amplification and detection. In other embodiments, the threshold may be defined separately for each CpG locus after the run, based on the maximum fluorescence level detected for this locus during the amplification cycles.
- Threshold refers to a value of fluorescence used for Cq determination.
- the threshold value may be a value above baseline fluorescence, and/or above background noise, and within the exponential growth phase of the amplification plot.
- Baseline refers to the initial cycles of PCR where there is little to no change in fluorescence.
- Computer software is readily available for analysing amplification plots and determining baseline, threshold and Cq.
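- The relationship between baseline, threshold and Cq can be sketched as follows. This is a simplified illustration rather than the algorithm of any particular instrument software: the baseline is taken as the mean of a fixed window of early cycles, the threshold is supplied by the user, and Cq is found by linear interpolation between the two cycles that bracket the threshold. All names and the example readings are hypothetical.

```python
from statistics import mean
from typing import Optional, Sequence, Tuple


def quantification_cycle(fluorescence: Sequence[float],
                         threshold: float,
                         baseline_cycles: Tuple[int, int] = (3, 15)) -> Optional[float]:
    """Fractional cycle at which baseline-corrected fluorescence crosses threshold.

    fluorescence[i] is the reading at cycle i + 1. Returns None if the
    threshold is never crossed after the baseline window.
    """
    first, last = baseline_cycles
    if not 1 <= first <= last <= len(fluorescence):
        raise ValueError("baseline window lies outside the run")
    baseline = mean(fluorescence[first - 1:last])
    corrected = [value - baseline for value in fluorescence]
    # corrected[i - 1] is cycle i and corrected[i] is cycle i + 1.
    for i in range(last, len(corrected)):
        previous, current = corrected[i - 1], corrected[i]
        if previous < threshold <= current:
            return i + (threshold - previous) / (current - previous)
    return None


# Flat baseline for 15 cycles, then signal doubling each cycle above baseline.
readings = [100.0] * 15 + [101.0, 102.0, 104.0, 108.0, 116.0, 132.0, 164.0]
print(quantification_cycle(readings, threshold=10.0))
```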
- Primers may vary in length, depending on the particular assay format and the particular needs.
- the primers may be at least 15 nucleotides long, such as between 15- 25 nucleotides or 18-25 nucleotides long.
- the primers may be adapted to be suited to a chosen amplification system.
- Primers may be designed to generate amplicons between 60-150 bp long (when the relevant CpG site(s) is/are intact) e.g. between 70-140 bp long.
- Oligonucleotide probes may vary in length. In some embodiments, the probes may include between 15-30 nucleotides, from 20-30 nucleotides, or from 25-30 nucleotides.
- the oligonucleotide probes may be designed to bind to either strand of the double-stranded amplicons. Additional considerations include the melting temperature of the probes, which should preferably be comparable to that of the primers. Where multiple CpG sites are analysed in parallel, with simultaneous amplification of more than one target in the same reaction mixture (co-amplification) using different primer pairs for each CpG site of interest, these different primers may be designed such that they can work at the same annealing temperature during amplification. Thus, primers with similar melting temperature (Tm) can be designed e.g. within ±3-5°C of each other. Similar considerations apply where multiple probes are used.
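- A compatibility check on melting temperatures can be sketched with the Wallace rule of thumb, Tm = 2(A+T) + 4(G+C) in °C. This rule is a crude approximation substituted here for brevity (primer design software generally uses nearest-neighbour thermodynamic models), and the primer sequences and function names are hypothetical.

```python
from typing import Iterable


def wallace_tm(oligo: str) -> int:
    """Rule-of-thumb melting temperature in degrees C: 2(A+T) + 4(G+C)."""
    oligo = oligo.upper()
    if set(oligo) - set("ACGT"):
        raise ValueError("oligo must contain only A, C, G and T")
    at = oligo.count("A") + oligo.count("T")
    gc = oligo.count("G") + oligo.count("C")
    return 2 * at + 4 * gc


def tm_compatible(oligos: Iterable[str], max_spread: float = 5.0) -> bool:
    """True if all estimated Tm values lie within max_spread of each other."""
    tms = [wallace_tm(o) for o in oligos]
    return max(tms) - min(tms) <= max_spread


primers = ["AGGTCCTGACCATTGACTGA", "TTAGCAGATCAGGTCATCCA"]  # hypothetical
print([wallace_tm(p) for p in primers], tm_compatible(primers))
```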
- Methods disclosed herein may include a step of DNA sequencing, such as a step using next-generation sequencing (‘NGS’) techniques (also known as high-throughput sequencing).
- NGS generally involves three basic steps: library preparation; sequencing; and data processing.
- Examples of NGS techniques include sequencing-by-synthesis and sequencing-by-ligation (employed, for example, by Illumina Inc., Life Technologies Inc., PacBio, and Roche), nanopore sequencing methods and electronic detection-based methods such as Ion Torrent™ technology (Life Technologies Inc.).
- NGS may be performed using various high-throughput sequencing instruments and platforms, including but not limited to: Novaseq™, Nextseq™ and MiSeq™ (Illumina), 454 Sequencing (Roche), Ion Chef™ (ThermoFisher), SOLiD® (ThermoFisher) and Sequel II™ (Pacific Biosciences).
- Appropriate platform-designed sequencing adapters are used for preparing the sequencing library, and are readily available from the platforms’ manufacturers.
- Sequencing adapters typically include platform-specific sequences for fragment recognition by a particular sequencer e.g. sequences that enable ligated molecules to bind to the flow cells of Illumina platforms (e.g. the P5 and P7 sequences). Each sequencing instrument provider typically sells a specific set of sequences for this purpose. Further details of library preparation are discussed below.
- Sequencing adapters can include sites for binding to a universal set of PCR primers. This permits multiple adapter-ligated DNA molecules to be amplified in parallel by PCR, using a single set of primers.
- Sequencing adapters can include sample indices, which are sequences that enable multiple samples to be combined, and then sequenced together (i.e. multiplexed) on the same instrument flow cell or chip. Each sample index, typically 6-10 nucleotides, is specific to a given sample and is used for de-multiplexing during downstream data analysis to assign individual sequence reads to the correct sample. Sequencing adapters may contain single or dual sample indexes depending on the number of libraries combined and the level of accuracy desired. Sequencing adapters can include unique molecular identifiers (UMIs) to provide molecular tracking, error correction and increased accuracy during sequencing. UMIs are short sequences, typically 5 to 20 bases in length, used to uniquely identify original molecules in a sample library. As each nucleic acid in the starting material is tagged to provide a unique molecular barcode, bioinformatics software can filter out duplicate reads and PCR errors with a high level of accuracy and report unique reads, removing the identified errors before final data analysis.
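The roles of sample indices and UMIs described above can be sketched as follows. This is an illustrative toy, not the pipeline used in the application: the index sequences, sample names and tuple layout are hypothetical, and matching is exact (real tools usually tolerate sequencing errors in indices and UMIs).

```python
from collections import defaultdict

# Known sample indices (hypothetical 8-nt sequences) mapped to sample names.
SAMPLE_INDEX = {"ACGTACGT": "sample_A", "TTGCAAGC": "sample_B"}


def demultiplex_and_dedupe(reads):
    """Assign reads to samples by index, then keep one read per (UMI, position).

    `reads` is an iterable of (sample_index, umi, mapped_position) tuples.
    Returns {sample_name: number_of_unique_molecules}.
    """
    molecules = defaultdict(set)
    for index, umi, position in reads:
        sample = SAMPLE_INDEX.get(index)
        if sample is None:
            continue  # unrecognised index: read cannot be assigned
        molecules[sample].add((umi, position))
    return {sample: len(keys) for sample, keys in molecules.items()}


reads = [
    ("ACGTACGT", "AACCG", 1000),
    ("ACGTACGT", "AACCG", 1000),  # PCR duplicate of the read above
    ("ACGTACGT", "GGTTA", 1000),  # same position, different original molecule
    ("TTGCAAGC", "AACCG", 2500),
    ("NNNNNNNN", "AACCG", 2500),  # unknown index, dropped
]
print(demultiplex_and_dedupe(reads))
```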
- sequencing adapters include both a sample barcode sequence and a UMI.
- sequencing adapters allow for paired-end sequencing.
- compositions and methods disclosed herein use Y-shaped sequencing adapters i.e., adapters consisting of two single-stranded oligonucleotides which anneal to provide a double-stranded stem and two single-stranded ‘arms’.
- compositions and methods disclosed herein use hairpin sequencing adapters i.e., a single-stranded oligonucleotide whose 5' and 3' termini anneal to provide a double-stranded stem.
- the double-stranded stem can include a short single-stranded overhang e.g., a single A or T nucleotide.
- the double-stranded stem can be ligated to a cfDNA fragment, to prepare a sequencing library.
- Suitable sequencing adapters for use in the compositions and methods disclosed herein may thus be TruSeq™ or AmpliSeq™ or TruSight™ adapters (for use on the Illumina platform) or SMRTbell™ adapters (for use on the PacBio platform).
- where sequencing adapters are added by ligation, this usually occurs at both ends of the DNA to be sequenced.
- Restriction digestion can leave blunt-ends, but typically produces a single-stranded overhang.
- Library preparation steps can either preserve this overhang (i.e., add complementary nucleotides) or remove it.
- if the sequence of a post-digestion terminal single-stranded overhang can include useful information, then it is preferred to add sequencing adapters in a way which preserves the overhang e.g. using enzymatic ligation in which a ligase enzyme covalently links a sequencing adapter to a DNA fragment where the terminal sequence of the adapter is complementary to the terminal sequence obtained using the restriction enzyme, or by using a polymerase to add complementary nucleotides and generate a blunt-ended fragment.
- end repair methods carried out before adapter ligation can ensure that DNA molecules contain 5' phosphate and 3' hydroxyl groups.
- dAMP: deoxyadenosine 5'-monophosphate
- the chelating agent can be added to provide an amplification reaction mix comprising the chelating agent and a divalent cation at a molar ratio of between 1:20 to 2:1.
- the reaction mix may include 8-20 mM Mg ++ e.g., about 10 mM magnesium.
- amplification may be carried out in a reaction mix comprising between 3-4 mM chelating agent and 4 mM Mg ++ .
- the chelating agent may comprise one or both of EDTA and EGTA.
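The molar-ratio window for chelating agent and divalent cation stated above (between 1:20 and 2:1) can be checked with a few lines. This is a hedged sketch: the function name is hypothetical, and the bounds are treated as inclusive on the assumption that ‘between’ includes its end values.

```python
def chelator_ratio_ok(chelator_mM: float, cation_mM: float) -> bool:
    """True if the chelator:cation molar ratio is between 1:20 and 2:1 inclusive."""
    if cation_mM <= 0:
        raise ValueError("cation concentration must be positive")
    ratio = chelator_mM / cation_mM
    return 1 / 20 <= ratio <= 2 / 1


print(chelator_ratio_ok(3.0, 4.0))   # 3 mM chelator with 4 mM Mg++
print(chelator_ratio_ok(0.1, 10.0))  # 1:100, below the stated range
```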
- the prepared DNA molecules can be sequenced, to provide a plurality of ‘sequence reads’.
- Sequence reads from DNA sequencing are then subjected to data processing e.g., to remove sequences which do not fulfil desired quality criteria, to remove duplicates, to correct sequencing errors, to map sequences onto a reference genome, to count the number of sequence reads, etc.
- Computer software is readily available for performing these steps.
- the sequencing may be single-read sequencing or paired-end sequencing. Paired-end sequencing is preferred. In single-read sequencing, individual DNA strands are sequenced from one end. In paired-end sequencing, individual DNA strands are sequenced from both ends of the strand. Paired-end sequencing produces paired-end reads, wherein a single paired-end read contains a forward read derived from one end of the DNA strand and a reverse read derived from the other end of the DNA strand. The forward and reverse reads may or may not overlap.
- Sequence reads can be mapped to a reference genome i.e., a previously identified genome sequence, whether partial or complete, assembled as a representative example of a species or subject.
- a reference genome is typically haploid, and typically does not represent the genome of a single individual of the species but rather is a mosaic of the genomes of several individuals.
- a reference genome for the methods of the present invention is typically a human reference genome e.g., a complete human genome, such as the human genome assemblies available at the website of the National Center for Biotechnology Information or at the University of California, Santa Cruz, Genome Browser.
- An example of a suitable reference genome for human studies is the GRCh38 major assembly (up to patch p13).
- Mapping aligns sequence reads to the reference genome, to identify the location of the reads within the reference genome.
- the sequence reads that align are designated as being ‘mapped’.
- the alignment process aims to maximize the possibility for obtaining regions of sequence identity across the various sequences in the alignment, allowing mismatches, indels and/or clipping of some short fragments on the two ends of the reads.
- the number of sequence reads mapped to a certain genomic locus is referred to as the ‘read count’ or ‘copy number’ of this genomic locus. It is not necessary to map all sequence reads which are obtained; indeed, it is not unusual that a portion of sequence reads obtained in any given experiment will not be mappable.
- the forward and reverse reads of a paired-end read will map upstream and downstream of a locus but not overlap.
- the 5' and 3' ends of DNA molecule that gave rise to the paired-end read have been directly sequenced, but the sequence of the intervening DNA can be indirectly sequenced (as the genomic sequence between the mapped regions). So, a ‘sequence alignment’, or simply ‘alignment’ may contain both direct and indirect sequence information.
- the analysis of sequencing data is preferably based on sequence alignments. In embodiments comprising methylation-conditional nucleobase modification, only direct sequence information can be used.
- alignments used in the analysis are less than about 600 bp and more than about 50 bp in length. More preferably, alignments are less than about 500 bp and more than about 100 bp in length, or less than about 400 bp and more than about 100 bp in length.
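The length window for alignments can be expressed as a simple filter. This is a minimal sketch under stated assumptions: alignments are represented as hypothetical half-open (start, end) coordinates covering the outermost mapped positions of a read pair, and the default bounds use the broadest range given above (more than about 50 bp, less than about 600 bp).

```python
def keep_alignment(start: int, end: int, min_bp: int = 50, max_bp: int = 600) -> bool:
    """True if the alignment spans more than min_bp and fewer than max_bp."""
    length = end - start  # half-open coordinates: [start, end)
    return min_bp < length < max_bp


alignments = [(100, 140), (100, 267), (5000, 5700)]  # hypothetical (start, end)
kept = [a for a in alignments if keep_alignment(*a)]
print(kept)
```

Passing `min_bp=100, max_bp=500` or `min_bp=100, max_bp=400` gives the narrower preferred windows.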
- Another way of expressing coverage that is useful in the analysis methods is to use ‘HitspanN’, such as ‘Hitspan100’.
- Any particular CpG site can feature in multiple sequence reads, which can be sequence reads derived from the same original cfDNA molecule and/or from different cfDNA molecules which span the same CpG site. Sequencing is suitably performed such that CpG site(s) of interest is/are seen in at least 100 sequence reads e.g., in at least 200, 300, 400, 500, 600, 700 or more sequence reads.
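The coverage requirement above (a CpG site seen in at least 100 sequence reads) might be checked as sketched below. The sketch assumes that a site counts as ‘seen’ when an alignment interval contains it; the coordinates and function names are hypothetical.

```python
def reads_covering(site: int, alignments) -> int:
    """Count alignments whose half-open interval [start, end) contains `site`."""
    return sum(1 for start, end in alignments if start <= site < end)


def sufficiently_covered(site: int, alignments, min_reads: int = 100) -> bool:
    """True if at least `min_reads` alignments contain the site."""
    return reads_covering(site, alignments) >= min_reads


# 120 hypothetical alignments spanning position 1050, plus 30 that do not.
alignments = [(1000 + i % 40, 1160 + i % 40) for i in range(120)] + [(2000, 2150)] * 30
print(reads_covering(1050, alignments), sufficiently_covered(1050, alignments))
```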
- genomic locus refers to a specific location within the genome, and may include a single position (a single nucleotide at a defined position in the genome) or a stretch of nucleotides starting and ending at defined positions in the genome.
- the specific position(s) may be identified by the molecular location, namely, by the chromosome and the numbers of the starting and ending base pairs on the chromosome.
- a genomic locus of interest herein contains at least one CpG site.
- the non-methylated CpG sites can be taken as sequencing reads whose 5’ ends map to a site, as sequencing reads whose 3’ ends map to a site, or as half of the sum of sequencing reads whose 5’ ends or 3’ ends map to a site (see above).
- Sequencing may optionally be preceded by a step of ‘hybrid capture’ (also known as ‘hybridization capture’) to enrich the sample to be sequenced for DNA molecules comprising regions of interest, such as one or more markers of the sequence listing.
- hybrid capture a sample of DNA molecules, such as the prepared DNA molecules of the sequencing library, is captured by allowing the DNA molecules to hybridize with single-stranded oligonucleotide ‘baits’ or ‘probes’ specific for the regions of interest.
- the baits may be immobilized on a solid support to capture the DNA molecules.
- the hybridization is carried out in solution with baits that comprise a tag, allowing the subsequent isolation of the DNA:bait hybrids.
- the baits may be biotinylated and the DNA:bait hybrids isolated by allowing them to bind to the surface of streptavidin-coated magnetic beads.
- RNA baits are preferred because RNA:DNA duplexes hybridize more efficiently and are more stable than DNA:DNA duplexes.
- the prepared DNA molecules are subjected to hybrid capture prior to sequencing e.g. using biotinylated RNA bait molecules specific for one or several markers of the sequence listing (or genomic regions close to or overlapping the sequence listing markers).
- Methods disclosed herein do not require differential adapter tagging of methylated vs. unmethylated DNA molecules.
- the same population of adapters can be used for all molecules.
- the invention also provides various systems and kits.
- a system can comprise computer processor(s) for performing and/or controlling the methods disclosed herein, and/or for processing the results e.g., for performing calculations based on the results.
- Methods which are at least partially computer-implemented are provided.
- a system or kit may comprise: a blood, plasma or serum sample of a human subject; components for carrying out a method disclosed herein on at least one CpG site; and computer software stored on a non-transitory computer readable medium, the computer software being able to direct a computer processor to determine a methylation level for the at least one CpG locus based on the methylation assay.
- the software may also be able to link the methylation level to a diagnostic result or prediction e.g. by comparing one or more methylation level(s) to one or more reference levels to assess the presence of a disease in the subject.
- the computer software may receive data from a qPCR and/or a NGS experiment.
- Components for carrying out a method disclosed herein encompass biochemical components (e.g., enzymes, primers, probes, NTPs, etc.), chemical components (e.g., buffers, reagents), and technical components (e.g., a PCR system, such as a real-time PCR system, and equipment such as tubes, vials, plates, pipettes).
- the system may be able to prepare and/or communicate a report to the subject and/or to a healthcare provider of the subject, based on the methylation levels.
- Computer software includes processor-executable instructions that are stored on a non-transitory computer readable medium.
- the computer software may also include stored data.
- the computer readable medium is a tangible computer readable medium, such as a compact disc (CD), magnetic storage, optical storage, random access memory (RAM), read only memory (ROM), or any other tangible medium.
- Computer-related methods and steps described herein are implemented using software stored on non-volatile or non-transitory computer readable instructions that when executed configure or direct a computer processor or computer to perform the instructions.
- Each of the system, server, computing device, and computer described in this application can be implemented on one or more computer systems and be configured to communicate over a network. They all may also be implemented on one single computer system.
- the computer system includes a bus or other communication mechanism for communicating information, and a hardware processor coupled with bus for processing information.
- a computer system also includes a main memory, such as a random-access memory (RAM) or other dynamic storage device, coupled to bus for storing information and instructions to be executed by processor.
- Main memory also may be used for storing temporary variables or other intermediate information during execution of instructions to be executed by processor.
- Such instructions, when stored in non-transitory storage media accessible to processor, render computer system into a special-purpose machine that is customized to perform the operations specified in the instructions.
- a computer system can include read only memory (ROM) or other static storage device coupled to bus for storing static information and instructions for processor.
- a storage device such as a magnetic disk or optical disk, is provided and coupled to bus for storing information and instructions.
- a computer system may be coupled via bus to a display, for displaying information to a computer user.
- An input device including alphanumeric and other keys, can be coupled to bus for communicating information and command selections to processor.
- cursor control such as a mouse, a trackball, or cursor direction keys for communicating direction information and command selections to processor and for controlling cursor movement on display.
- Methods disclosed herein may be performed by a computer system in response to the processor executing one or more sequences of one or more instructions contained in main memory. Such instructions may be read into main memory from another storage medium, such as storage device. Execution of the sequences of instructions contained in main memory causes the processor to perform the process steps described herein. In alternative embodiments, hard-wired circuitry may be used in place of or in combination with software instructions.
- Suitable storage media include any non-transitory media that store data and/or instructions that cause a machine to operate in a specific fashion.
- Common forms of storage media include, for example, a floppy disk, a flexible disk, hard disk, solid state drive, magnetic tape, or any other magnetic data storage medium, a CD-ROM, any other optical data storage medium, any physical medium with patterns of holes, a RAM, a PROM, an EPROM, a FLASH-EPROM, NVRAM, any other memory chip or cartridge.
- Storage media are distinct from, but may be used in conjunction with, transmission media.
- Transmission media participate in transferring information between storage media.
- transmission media include coaxial cables, copper wire and fiber optics, including the wires that comprise bus.
- the invention also provides a kit comprising: (i) a composition comprising one or more restriction enzymes; and (ii) components for analysing cfDNA which has been digested with the composition.
- these components may be e.g. components for performing PCR, or for preparing a sequencing library from digested cfDNA.
- the kit may include one or more of: (a) a buffer solution e.g.
- a kit may include an instruction manual for carrying out the methods as disclosed herein.
- a kit may include a non-transitory computer readable medium storing a computer software comprising instructions that when executed configure or direct a computer processor to perform the method steps disclosed herein.
- Methods disclosed herein can take advantage of positive and negative controls.
- parallel analysis can be performed on one or more of:
- a DNA control which contains a fully methylated recognition sequence for the restriction enzymes used for digestion. If this DNA is digested when a method uses only MSREs, this indicates that the method has not performed correctly (and conversely for MDREs).
- a DNA control which contains a fully unmethylated recognition sequence for the restriction enzymes used for digestion. If this DNA is not fully digested when a method uses only MSREs, this indicates that the method has not performed correctly (and conversely for MDREs).
- DNA controls can also be used as a reference point for analysis, for checking completeness of digestion, etc. As mentioned above, for instance, if fragments are obtained using MSRE digestion then it can be useful in a downstream NGS experiment to know the expected read count, and one way of obtaining this value is to look at the read count for DNA which does not contain the recognition sequence for the MSRE, or at the read count for DNA which contains the recognition sequence but is fully methylated.
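The logic of the methylated and unmethylated digestion controls for an MSRE-only method can be written as a small pass/fail check. This is only a sketch: the fraction of control molecules left intact is assumed to have been measured already, and the 5% tolerance is a hypothetical threshold, not a value taken from the application.

```python
def msre_controls_pass(methylated_intact_frac: float,
                       unmethylated_intact_frac: float,
                       tolerance: float = 0.05) -> bool:
    """MSRE-only digestion: the fully methylated control should stay intact,
    and the fully unmethylated control should be digested."""
    methylated_ok = methylated_intact_frac >= 1.0 - tolerance
    unmethylated_ok = unmethylated_intact_frac <= tolerance
    return methylated_ok and unmethylated_ok


print(msre_controls_pass(0.99, 0.01))  # both controls behave as expected
print(msre_controls_pass(0.99, 0.30))  # incomplete digestion of unmethylated control
```

For a method using only MDREs the two expectations would be swapped.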
- the DNA control should be similar in size and composition to cfDNA molecules which contain CpG sites of interest.
- where synthetic DNA, PCR amplicons or bacterial plasmid DNA are used as an unmethylated control, these are more useful if they have sizes which are similar to cfDNA (e.g., a long synthetic DNA, or an appropriately-sized restriction fragment prepared from a plasmid).
- Control experiments can be performed internally in a sample, or externally.
- control DNA can be present in a sample already (e.g., cfDNA containing a CpG site which is known to be ubiquitously (un)methylated, or cfDNA which does not contain a recognition sequence for the restriction enzymes being used) and/or can be added (e.g., synthetic DNA, added to cfDNA).
- the control DNA can therefore be processed in combination with the cfDNA, and experiences the same conditions as the cfDNA, and so a method can involve co-amplification of a locus including a restriction site and a control locus.
- control DNA is subjected to the same treatment as the cfDNA but not as part of the same reaction mixture.
- Real-time PCR of suitable control loci can give a result that can be used as a reference point.
- the signals obtained from cfDNA at a CpG site of interest and from control DNA can be compared, and the signal ratio can be used to determine the degree of methylation at a CpG site of interest, because the ratio of signal reflects the ratio of methylation.
- methods disclosed herein can be performed without requiring evaluation of absolute methylation levels at genomic loci, but rather by calculating a signal ratio between the analyzed genomic loci and a control. This contrasts with some conventional methods of methylation analysis for distinguishing between tumor-derived and normal DNA, which require determining actual methylation levels at specific genomic loci.
- the methods disclosed herein can thus eliminate the need for standard curves and/or additional laborious steps involved in determination of absolute methylation levels, thereby offering a simple and cost-effective procedure.
- An additional advantage when using an internal control is that signal ratios are obtained for loci amplified in the same reaction mixture under the same reaction conditions, which can help to eliminate sources of potential error (e.g. the potential for differences between reaction mixtures, such as the concentration of template, enzyme, etc.).
- Methods which use qPCR may therefore involve calculating signal intensity ratios between a CpG site and a control locus co-amplified after digestion of DNA as disclosed herein, thereby providing a methylation status for the CpG site.
- This methylation level can then be compared to reference levels (e.g., obtained from healthy subjects, or from subjects having a known disease) and, based on the comparison, a diagnostic result can be derived.
- a method may involve: co-amplifying from restriction endonuclease-digested DNA a CpG site and a control locus, thereby generating co-amplification products; determining a signal intensity for each generated co-amplification product; and calculating a ratio between the signal intensities of the co-amplification products of the CpG site and the control locus.
- the ratio between the signal intensities of the co-amplification products may be calculated by determining the quantification cycle (Cq) for each locus and calculating $2^{(Cq_{\text{control locus}} - Cq_{\text{CpG site}})}$.
- the difference $Cq_{\text{control locus}} - Cq_{\text{CpG site}}$, i.e. the reduction in Cq relative to the control locus, is used as the exponent of 2 to calculate the ratio.
- the difference in Cq for a marker of interest and a control locus (ΔCq) is at least 2 cycles.
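The Cq-based signal ratio can be computed directly. This is a minimal sketch with hypothetical Cq values; using 2 as the base implicitly assumes that the amount of product doubles each cycle for both loci.

```python
def signal_ratio(cq_control: float, cq_site: float) -> float:
    """Ratio of CpG-site signal to control signal: 2 ** (Cq_control - Cq_site)."""
    return 2.0 ** (cq_control - cq_site)


# Hypothetical Cq values from one co-amplification reaction:
# the CpG site crosses the threshold 2 cycles earlier than the control locus.
print(signal_ratio(cq_control=30.0, cq_site=28.0))
```

Because both loci are amplified in the same reaction mixture, the ratio needs no standard curve.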
- a numerical value which represents the degree of methylation of that CpG site in a cfDNA sample.
- This value may be expressed in a variety of ways e.g., as a ratio or percentage of the cfDNA molecules that are methylated at a CpG site, or as an intensity of a signal obtained from a particular CpG site, or as the ratio between a CpG site and a control locus, etc.
- PRC2 is a protein complex that methylates histone H3 at lysine 27 (H3K27).
- the constitutive subunits of PRC2 are polycomb protein SUZ12, histone-lysine N-methyltransferase EZH1 or histone-lysine N-methyltransferase EZH2, polycomb protein EED and histone binding protein RBBP4 or histone binding protein RBBP7.
- PRC2 may also comprise, as accessory subunits, zinc finger protein AEBP2 and protein Jumonji (Jarid2); or one of the PCL proteins (PHF1, MTF2 or PHF19), and EPOP or PALI1/PALI2.
- marker loci according to the invention can comprise the genomic loci targeted by PRC2.
- Marker loci according to the invention also comprise genomic loci that are located fewer than 500 bp, 1000 bp, 2000 bp, 5000 bp, 10000 bp or 20000 bp from a genomic locus targeted by PRC2.
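The proximity criterion above (a marker locus located fewer than a given number of base pairs from a PRC2-targeted locus) can be sketched as an interval-distance test. The coordinates are hypothetical, loci are assumed to be half-open (chromosome, start, end) intervals, and overlapping or adjacent intervals are given a distance of zero.

```python
def distance_bp(a_start: int, a_end: int, b_start: int, b_end: int) -> int:
    """Gap in bp between two half-open intervals on one chromosome (0 if they overlap)."""
    return max(0, max(a_start, b_start) - min(a_end, b_end))


def near_target(marker, target, max_bp: int = 500) -> bool:
    """marker/target are (chrom, start, end); True if same chromosome and gap < max_bp."""
    if marker[0] != target[0]:
        return False
    return distance_bp(marker[1], marker[2], target[1], target[2]) < max_bp


marker = ("chr1", 100, 200)       # hypothetical marker locus
prc2_site = ("chr1", 500, 600)    # hypothetical PRC2-targeted locus
print(near_target(marker, prc2_site))
```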
- Genomic loci targeted by PRC2 are known in the art and include loci identified as being targeted by any constitutive subunit of PRC2, such as polycomb protein SUZ12, or any accessory subunit of PRC2, such as Jarid2.
- Loci targeted by PRC2 or any of subunits of PRC2 can be identified using methods known in the art, such as, but not limited to, chromatin immunoprecipitation (ChIP) followed by microarray analysis or sequencing.
- Loci targeted by PRC2 may also be identified by inference, based on the H3K27 methylation of associated nucleosomes (loci associated with high levels of H3K27 methylation are likely to be PRC2 targets because PRC2 catalyses this methylation).
- H3K27 methylation can be associated with genomic loci, for instance, by performing ChIP using antibodies that selectively recognise tri-methylated H3K27, (i.e., ‘H3K27me3’, the product of PRC2 methylation) followed by microarray analysis or sequencing.
- the H3K27 methylation activity of PRC2 may be involved in the development of lung cancer. Accordingly, the invention also encompasses methods for the treatment or prevention of lung cancer comprising regulating the activity of PRC2.
- the regulating is achieved by contacting at least one subunit of PRC2, such as SUZ12, with a therapeutic compound.
- the at least one subunit contacted may be a constitutive or an accessory subunit.
- the therapeutic compound affects the genomic targeting of PRC2. Additionally or alternatively, the therapeutic compound may regulate the methyltransferase activity of PRC2.
- the therapeutic compound may be able to interact with the methyltransferase active site in EZH2 and/or EZH1 and inhibit methyltransferase activity.
- the therapeutic compound may allosterically regulate the methyltransferase activity of PRC2, for instance, by interacting with EED.
- the invention provides a method for treating or managing lung cancer in a human subject, comprising determining a likelihood of the presence of lung cancer as above, and administering, deciding to administer, or recommending the administration of, a suitable treatment to the subject based on the likelihood.
- the treatment may comprise administration of one or more of: adagrasib, afatinib dimaleate, alectinib, amivantamab, atezolizumab, bevacizumab, brigatinib, capmatinib, carboplatin, cemiplimab, ceritinib, cisplatin, crizotinib, dabrafenib mesylate, dacomitinib, docetaxel, doxorubicin hydrochloride, durvalumab, entrectinib, erlotinib hydrochloride, etoposide, everolimus, fam-trastuzumab deruxtecan, gefitinib,
- the type of treatment may be determined by skilled practitioner(s) according to characteristics of the tumor, including the type, stage and grade of the tumor. The type of treatment is determined typically also based on additional factors such as characteristics of the patient.
- the term ‘comprising’ encompasses ‘including’ as well as ‘consisting’ e.g. a composition ‘comprising’ X may consist exclusively of X or may include something additional e.g. X + Y.
- the word ‘substantially’ does not exclude ‘completely’ e.g., a composition which is ‘substantially free’ from Y may be completely free from Y. Where necessary, the word ‘substantially’ may be omitted from the definition of the invention.
- the term ‘between’ with reference to two values includes those two values e.g., the range ‘between’ 10 mg and 20 mg encompasses inter alia 10, 15, and 20 mg.
- a method comprising a step of mixing two or more components does not require any specific order of mixing.
- components can be mixed in any order. Where there are three components then two components can be combined with each other, and then the combination may be combined with the third component, etc.
- the various steps of methods may be carried out at the same or different times, in the same or different geographical locations, e.g., countries, and by the same or different people or entities.
- ECS: methylation-sensitive restriction enzymes followed by standard library preparation and sequencing
- Mapping rate was 99.6%, 99.7% and 85.7% and unique mapping rate was 94.1%, 94.3% and 81.4% for WGS, ECS, and BS samples, respectively. Copy number integrity showed Pearson correlations of 0.9 for ECS and 0.67 for BS. Somatic mutation analysis identified a subset of cases with relatively high cfDNA shedding that were associated with larger tumors, older age and squamous cell carcinoma histology. This subset was further used to identify tumor derived plasma based markers and assess fragmentation with high confidence.
- methylation levels for CpG markers were measured in plasma and lung tissue from healthy subjects, and in plasma and tissue from subjects known to have early-stage lung cancer. Comparisons were performed both on an early sample set (28 controls, 36 lung cancer patients) and a later set (90 controls, 93 cases). Methylation levels were analyzed and compared in various ways. For instance, methylation levels were rank ordered using Student’s t-test, to compare the mean HitSpan100 in plasma samples taken from lung cancer patients or from healthy controls (and produce an FDR-corrected p-value to compare those means i.e. to indicate how likely it is that the mean HitSpan100 in the plasma of lung cancer patients and in healthy controls is the same).
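FDR correction of per-marker p-values can be sketched as follows. The text does not say which FDR procedure was used; the Benjamini-Hochberg step-up adjustment shown here is an assumption, and the input p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


raw = [0.001, 0.04, 0.03, 0.2]  # hypothetical per-marker t-test p-values
print(benjamini_hochberg(raw))
```

Markers can then be rank ordered by the adjusted values.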
- a logistic regression classifier with Lasso regularization was trained on 100,000 loci, and performance was examined by mean AUC using 5-fold cross validation.
- AUC values for a ROC curve were determined to assess a CpG marker’s ability to distinguish lung cancer samples from healthy controls.
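The AUC of a single marker can be computed without drawing the ROC curve, using its equivalence to the Mann-Whitney statistic. This is an illustrative sketch with hypothetical methylation scores, not the analysis code of the application.

```python
def roc_auc(cancer_scores, control_scores) -> float:
    """AUC as the probability that a cancer sample scores above a control
    (ties count one half); equals the Mann-Whitney U statistic / (n1 * n2)."""
    wins = 0.0
    for c in cancer_scores:
        for h in control_scores:
            if c > h:
                wins += 1.0
            elif c == h:
                wins += 0.5
    return wins / (len(cancer_scores) * len(control_scores))


cancer = [0.9, 0.8, 0.4]    # hypothetical marker signal in cancer plasma
control = [0.5, 0.3, 0.2]   # hypothetical marker signal in control plasma
print(roc_auc(cancer, control))
```

With this convention a marker that is hyper-methylated in cancer gives an AUC above 0.5 and a hypo-methylated marker gives an AUC below 0.5.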
- an updated lung cancer atlas was constructed by collecting additional plasma samples from cancer subjects and high-risk individuals without cancer, and processing these as described above. After rigorous filtering, the updated lung cancer atlas includes a total of 79 tumour tissues, 88 normal lung tissues, 89 plasma cancers and 128 plasma controls, that were used for marker development.
- markers of particular interest were found i.e. about 1/1000 of the CpG sites present in the human genome. Around 9000 are hypo-methylated in cancer samples, and the remainder are hyper-methylated. The full list of markers is shown in the sequence listing.
- the markers have at least one of the following properties: (i) an AUC well above or below 0.5 for hyper-methylated and hypo-methylated markers in the early sample set; (ii) an AUC well above or below 0.5 for hyper-methylated or hypo-methylated markers in the later sample set; (iii) an AUC well above or below 0.5 for hyper-methylated and hypo-methylated markers in the further analysis; (iv) a p-value < 0.01 in the t-test.
- a marker did not meet these criteria when comparing all cancer samples to all controls, but it was still selected where it was found to be useful for classification as being informative for identifying only a specific subset of the cancers.
- a machine learning model trained on CpG markers in category (iii) performed with high accuracy in discriminating lung cancer patients from high-risk healthy individuals.
- markers have a low background (i.e. low methylation levels in plasma from healthy patients) which means that they could not have been detected using bisulfite conversion.
Landscapes
- Chemical & Material Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Health & Medical Sciences (AREA)
- Organic Chemistry (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Engineering & Computer Science (AREA)
- Immunology (AREA)
- Pathology (AREA)
- Analytical Chemistry (AREA)
- Zoology (AREA)
- Genetics & Genomics (AREA)
- Wood Science & Technology (AREA)
- Physics & Mathematics (AREA)
- Biotechnology (AREA)
- Microbiology (AREA)
- Molecular Biology (AREA)
- Hospice & Palliative Care (AREA)
- Biophysics (AREA)
- Oncology (AREA)
- Biochemistry (AREA)
- Bioinformatics & Cheminformatics (AREA)
- General Engineering & Computer Science (AREA)
- General Health & Medical Sciences (AREA)
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
Abstract
The invention relates to methods, systems and kits for diagnosing lung cancer (and in particular early-stage lung cancer and/or high-grade lung cancer) in a subject, staging and classifying the cancer, assessing post-treatment disease recurrence, monitoring treatment efficacy and providing a prognosis, by analysing DNA methylation markers in cell-free DNA from a sample of the subject.
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| GB2308414.8 | 2023-06-06 | | |
| GBGB2308414.8A GB202308414D0 (en) | 2023-06-06 | 2023-06-06 | Markers |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2024252401A1 (fr) | 2024-12-12 |
Family
ID=87156940
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/IL2024/050563 Pending WO2024252401A1 (fr) | 2023-06-06 | 2024-06-06 | Marqueurs |
Country Status (2)
| Country | Link |
|---|---|
| GB (1) | GB202308414D0 (fr) |
| WO (1) | WO2024252401A1 (fr) |
Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2005040399A2 (fr) * | 2003-10-21 | 2005-05-06 | Orion Genomics Llc | Procedes pour quantifier la densite de methylation d'un site d'adn |
| WO2022157764A1 (fr) * | 2021-01-19 | 2022-07-28 | Nucleix Ltd. | Dépistage non invasif du cancer sur la base de changements de méthylation de l'adn |
2023
- 2023-06-06 GB GBGB2308414.8A patent/GB202308414D0/en not_active Ceased
2024
- 2024-06-06 WO PCT/IL2024/050563 patent/WO2024252401A1/fr active Pending
Also Published As
| Publication number | Publication date |
|---|---|
| GB202308414D0 (en) | 2023-07-19 |
Similar Documents
| Publication | Title |
|---|---|
| US20210363597A1 (en) | Identification and use of circulating nucleic acids |
| JP6543569B2 (ja) | Quantitative multiplex methylation-specific PCR method: cMethDNA, reagents, and their use |
| KR102210852B1 (ko) | Systems and methods for detecting rare mutations and copy number variation |
| JP2022525890A (ja) | Methods and systems for detecting methylation changes in DNA samples |
| JP2023550141A (ja) | Detection of methylation changes in DNA samples using restriction enzymes and high-throughput sequencing |
| EP4638781A2 (fr) | Methods using methylation-preserving amplification with error correction |
| CN110741096A (zh) | Compositions and methods for detecting circulating tumor DNA |
| US20250101494A1 (en) | Methods for analyzing cytosine methylation and hydroxymethylation |
| CN119421958A (zh) | Methylation markers for identifying cancer and uses thereof |
| JP2023524067A (ja) | Methods and markers for identification and relative quantification of nucleic acid sequence, mutation, copy number, or methylation changes using combined nuclease, ligation, deamination, DNA repair, and polymerase reactions with carryover prevention |
| US20240093302A1 (en) | Non-invasive cancer detection based on DNA methylation changes |
| JP2025522763A (ja) | Enrichment of aberrantly methylated DNA |
| WO2025029475A1 (fr) | Methods for enriching nucleotide variants by negative selection |
| WO2024252401A1 (fr) | Markers |
| WO2023227954A1 (fr) | Sample preparation for cell-free DNA analysis |
| WO2023228174A9 (fr) | Useful combinations of restriction enzymes |
| WO2023116593A1 (fr) | Tumor test method and application |
| WO2022262831A1 (fr) | Substance and method for tumor assessment |
| US20220307077A1 (en) | Conservative concurrent evaluation of DNA modifications |
| WO2024157256A1 (fr) | Disease markers |
| US20250243550A1 (en) | Minimum residual disease (MRD) detection in early stage cancer using urine |
| US20250101522A1 (en) | BRCA1 promoter methylation in sporadic breast cancer patients detected by liquid biopsy |
| WO2023089613A1 (fr) | Whole-genome CpG analysis |
| WO2025207926A1 (fr) | Selective deamination methods using methyl-sensitive deaminases |
| WO2025224260A1 (fr) | Target enrichment |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 24818925; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |