WO2013097062A1 - A genetic variation detection method - Google Patents

A genetic variation detection method

Info

Publication number
WO2013097062A1
Authority
WO
WIPO (PCT)
Prior art keywords
genetic variation
window
sequencing
sequence
threshold
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/CN2011/002244
Other languages
English (en)
French (fr)
Inventor
陈盛培
张春雷
陈芳
谢伟伟
潘小瑜
汪建
王俊
杨焕明
张秀清
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
BGI HEALTH SERVICE Co Ltd
Original Assignee
BGI HEALTH SERVICE Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by BGI HEALTH SERVICE Co Ltd
Priority to PCT/CN2011/002244 (WO2013097062A1)
Priority to ES11878559T (ES2741966T3)
Priority to DK11878559.1T (DK2772549T3)
Priority to US14/369,615 (US20140370504A1)
Priority to JP2014546264A (JP5993029B2)
Priority to PL11878559T (PL2772549T3)
Priority to CN201180076137.XA (CN104204220B)
Priority to HUE11878559A (HUE047193T2)
Priority to EP11878559.1A (EP2772549B8)
Publication of WO2013097062A1
Anticipated expiration
Ceased (current legal status)


Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6809 Methods for determination or identification of nucleic acids involving differential detection
    • C12Q1/6869 Methods for sequencing
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/18 Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G16B30/10 Sequence alignment; Homology search

Definitions

  • the invention relates to the field of genetic variation detection, in particular the non-invasive detection of copy number variation such as microdeletions/microduplications.
  • Copy number variation refers to a submicroscopic mutation in a DNA fragment ranging from kb to Mb, which is manifested by an increase or decrease in copy number.
  • CNV Copy number variation
  • microdeletion/microduplication syndromes are a class of diseases in which small fragments of human chromosomes are deleted or duplicated, i.e., DNA fragment copy number variation, causing complex and variable phenotypes. They have a high incidence in the perinatal and neonatal periods and lead to serious diseases and abnormalities such as congenital heart disease or cardiac malformation, severe growth retardation, and facial or limb deformities.
  • microdeletion syndrome is, in addition to Down syndrome and fragile X syndrome, one of the main causes of intellectual disability [Knight SJL (ed): Genetics of Mental Retardation. Monogr Hum Genet. Basel, Karger, 2010, Vol 18, pp 101-113 (DOI: 10.1159/000287600)].
  • microdeletion syndromes rank among the leading conditions seen in genetic counseling clinics, alongside intellectual disability, cerebral palsy and congenital deafness.
  • microdeletion syndromes include 22q11 microdeletion syndrome, cri-du-chat (cat cry) syndrome, Angelman syndrome, and AZF (azoospermia factor) deletion.
  • although the incidence of each microdeletion syndrome is low (the incidences of the more common 22q11 microdeletion syndrome, cri-du-chat syndrome, Angelman syndrome and Miller-Dieker syndrome are 1:4000 (live births), 1:50000, 1:10000 and 1:12000, respectively), the limitations of clinical testing techniques mean that a large number of patients with microdeletion syndromes cannot be detected by prenatal screening and prenatal diagnosis; even when typical clinical manifestations appear months or even years after birth and the cause is sought retrospectively, the disease cannot be diagnosed because of the limitations of detection techniques (https://decipher.sanger.ac.uk/syndromes).
  • because some types of microdeletion syndrome cannot be cured and affected children die within a few months or years after birth, these syndromes place a heavy psychological and economic burden on society and families. According to incomplete statistics, the number of patients worldwide with "Happy Puppet Syndrome" (Angelman syndrome) has reached 15,000, and the number of patients with other types of chromosomal microdeletion syndrome increases year by year. Therefore, chromosomal microdeletion/microduplication testing of clinically suspected patients, and of prospective parents with a related adverse pregnancy history before pregnancy, helps provide genetic counseling and a basis for clinical decision-making; early prenatal diagnosis during pregnancy can effectively prevent the birth of affected children or provide a basis for treatment after birth.
  • the prenatal diagnosis of microdeletion/microduplication syndromes is mainly based on invasively obtaining fetal amniotic fluid or other tissues for molecular diagnosis.
  • invasive molecular diagnostic methods mainly include high-resolution karyotype analysis, FISH (fluorescence in situ hybridization), array CGH (array-based comparative genomic hybridization), MLPA (multiplex ligation-dependent probe amplification) and PCR methods.
  • genetic diagnosis is mainly based on FISH testing, and most chromosomal fragments can be detected by […].
  • because invasive sampling requires surgery or cell culture, in terms of time efficiency and resource consumption it is suitable as a diagnostic method, but not as a method for universal clinical screening.
  • the application of sequencing technology in prenatal screening makes it possible to detect genetic variation such as chromosomal copy number variation and aneuploidy by high-throughput sequencing, particularly in the fetus.
  • the analysis of chromosomal aneuploidy has been applied more and more widely.
  • the present invention provides a genetic variation screening method based on high-throughput sequencing technology, which can detect genetic variation such as copy number variation and aneuploidy, and features high throughput, high specificity and accurate localization.
  • the method of the invention comprises obtaining a test sample and extracting
  • the invention provides a genetic variation detection method, which comprises the following steps:
  • sequencing sequences (reads) are obtained from a test sample; for example, the reads may be 25-100 nt in length, and there may be at least 1 million of them.
  • in the method of the invention, a genetic variation site is the midpoint between an inflection point at which the statistic turns from increasing to decreasing and the next inflection point, and the interval between two genetic variation sites includes at least 50, at least 70 or at least 100, preferably 100, window lengths; the above site, inflection point and midpoint refer to the chromosomal location of the window corresponding to the statistic, which may be represented by any position of the window, such as its start point, midpoint or end point.
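The rule above can be read literally as a scan over the per-window statistic. The sketch below is a minimal illustration under that reading, not the FCAPS implementation: `turning_points` and `candidate_sites` are hypothetical helpers, sites are reported as window indices rather than chromosomal coordinates, flat steps are not treated as turns, and the default `min_gap=100` mirrors the preferred spacing of 100 windows.

```python
def turning_points(values):
    """Return (index, kind) pairs: 'peak' where the series turns from increasing
    to decreasing, 'trough' where it turns from decreasing to increasing.
    Flat steps (equal neighbours) are not counted as turns."""
    turns = []
    for i in range(1, len(values) - 1):
        left = values[i] - values[i - 1]
        right = values[i + 1] - values[i]
        if left > 0 and right < 0:
            turns.append((i, "peak"))
        elif left < 0 and right > 0:
            turns.append((i, "trough"))
    return turns


def candidate_sites(values, min_gap=100):
    """Window indices of candidate genetic variation sites.

    Each site is the midpoint between a peak and the next turning point;
    a site closer than `min_gap` windows to the previous site is dropped.
    A peak with no later turning point is ignored."""
    turns = turning_points(values)
    sites = []
    for (i, kind), (j, _next_kind) in zip(turns, turns[1:]):
        if kind != "peak":
            continue
        mid = (i + j) // 2  # midpoint between this peak and the next turn
        if not sites or mid - sites[-1] >= min_gap:
            sites.append(mid)
    return sites
```

For example, `candidate_sites([0, 1, 2, 3, 2, 1, 0, 1, 2], min_gap=1)` returns `[4]`, the midpoint between the peak at index 3 and the trough at index 6.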
  • the method of the present invention further comprises the steps of:
  • step 5 is:
  • the significance of the difference between the two populations of values formed by the statistics of the windows contained in the two segments flanking each genetic variation site is tested, and non-significant genetic variation sites are removed.
  • the significance test can be performed, for example, by a runs test: the genetic variation site with the largest significance value (p value), if that value is greater than the preset threshold, is removed, and the process is repeated until the runs-test p values of all genetic variation sites are less than the preset threshold.
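A possible sketch of this pruning loop, assuming a one-sided Wald-Wolfowitz runs test with the usual normal approximation (the exact variant used by the inventors is not stated here). `runs_test_p` and `prune_breakpoints` are hypothetical names; breakpoints are window indices, ties are broken by group label, and segments with fewer than two windows get p = 1, i.e. they are treated as not significant.

```python
from statistics import NormalDist


def runs_test_p(a, b):
    """One-sided two-sample Wald-Wolfowitz runs test (normal approximation).

    Few runs in the pooled, sorted sample mean the groups are separated,
    so a small p value indicates a real difference between them."""
    n1, n2 = len(a), len(b)
    if n1 < 2 or n2 < 2:
        return 1.0  # too few windows to judge; treat as not significant
    # Pool and sort; ties are broken by group label (a simplification).
    labels = [g for _, g in sorted([(v, 0) for v in a] + [(v, 1) for v in b])]
    runs = 1 + sum(1 for x, y in zip(labels, labels[1:]) if x != y)
    n = n1 + n2
    mu = 2 * n1 * n2 / n + 1
    var = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n * n * (n - 1))
    return NormalDist().cdf((runs - mu) / var ** 0.5)


def prune_breakpoints(values, breakpoints, threshold):
    """Repeatedly drop the breakpoint whose flanking segments have the largest
    runs-test p value, as long as that p value exceeds `threshold`.

    A breakpoint b splits `values` into values[:b] and values[b:]."""
    bps = sorted(breakpoints)
    while bps:
        edges = [0] + bps + [len(values)]
        pvals = [
            runs_test_p(values[edges[k - 1]:edges[k]], values[edges[k]:edges[k + 1]])
            for k in range(1, len(edges) - 1)
        ]
        worst = max(range(len(bps)), key=pvals.__getitem__)
        if pvals[worst] <= threshold:
            break  # every remaining breakpoint is significant
        del bps[worst]
    return bps
```

With `values = [1, 3, 5, 7, 9, 2, 4, 6, 8, 10] + list(range(101, 111))`, the breakpoint at index 5 separates two interleaved groups (p ≈ 0.996) and is removed, while the breakpoint at index 10 (p far below 0.05) is kept.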
  • the preset threshold used in the above step 5 can be obtained by the following steps:
  • the minimum value is taken as the significance threshold.
  • the invention also provides a genetic variation detection method, comprising the steps of:
  • a genetic variation site on a reference genome sequence is obtained according to the method of the present invention;
  • in the above steps, step 2) (confidence selection) is:
  • if the mean statistic of the segment is less than the first threshold, the segment is a deletion, and if it is greater than the second threshold, the segment is a duplication.
  • the first threshold may be the value of the statistic at which its cumulative probability is less than or equal to 0.1, preferably less than or equal to 0.01, most preferably 0.05; and/or the second threshold may be the value of the statistic at which its cumulative probability is greater than or equal to 0.9, preferably greater than or equal to 0.99, most preferably 0.95.
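As an illustration, the two thresholds can be read off either the empirical distribution of the statistic in control samples or, if the statistic is assumed to be standard normal, the normal quantile function. This is a sketch: `empirical_thresholds` and `normal_thresholds` are hypothetical helpers, and the empirical quantile uses the "smallest value whose empirical cumulative probability reaches p" convention. Under the normal assumption, cumulative probabilities of 0.1 and 0.9 give approximately -1.28 and 1.28, while 0.05 and 0.95 give approximately -1.645 and 1.645.

```python
import math
from statistics import NormalDist


def empirical_thresholds(control_stats, low_p=0.05, high_p=0.95):
    """Values of the statistic at cumulative probabilities low_p and high_p
    in the control distribution (smallest x with empirical CDF(x) >= p)."""
    ordered = sorted(control_stats)
    n = len(ordered)

    def quantile(p):
        k = math.ceil(p * n - 1e-9)  # small tolerance guards against float error
        return ordered[min(max(k, 1), n) - 1]

    return quantile(low_p), quantile(high_p)


def normal_thresholds(low_p=0.1, high_p=0.9):
    """The same cut-offs, assuming the statistic is standard normal."""
    dist = NormalDist()
    return dist.inv_cdf(low_p), dist.inv_cdf(high_p)
```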
  • the present invention also provides a computer readable medium carrying a series of instructions capable of performing the genetic variation detection method of the present invention.
  • the invention also provides a method for detecting fetal genetic variation, which comprises the following steps: obtaining a maternal sample containing fetal nucleic acid;
  • the maternal sample is maternal blood.
  • the advantages of the present invention mainly include the following:
  • FIG. 1 is a schematic flow chart of genetic variation analysis of a chromosome according to an embodiment of the present invention.
  • Figure 2A is the chromosome karyotype of S67.
  • Figure 2B is the chromosome karyotype of S10.
  • Figure 2C is the chromosome karyotype of S14.
  • Figure 2D is the chromosome karyotype of S18.
  • Figure 2E is the chromosome karyotype of S49.
  • Figure 2F is the chromosome karyotype of S55.
  • Figure 2G is the chromosome karyotype of S82.
  • Figure 2H is the chromosome karyotype of S103.
  • Table 1 lists the CNV results for each sample of the Example.
  • Table 2 shows the aCGH and karyotyping results for each sample of the Example.
  • the test sample is a sample containing a nucleic acid
  • the type of the nucleic acid is not particularly limited, and may be deoxyribonucleic acid (DNA) or ribonucleic acid (RNA), preferably DNA.
  • DNA deoxyribonucleic acid
  • RNA ribonucleic acid
  • the properties of the test sample are not particularly limited.
  • a genomic DNA sample may be employed, or a portion of the genomic DNA may be used as the test sample.
  • the source of the test sample is not particularly limited.
  • a pregnant woman sample can be used as a test sample, so that a nucleic acid sample containing fetal genetic information can be extracted therefrom, and the genetic information and physiological state of the fetus can be detected and analyzed.
  • maternal samples that may be used in accordance with embodiments of the present invention include, but are not limited to, maternal peripheral blood, maternal urine, fetal trophoblast cells from the cervix of the pregnant woman, cervical mucus of the pregnant woman, and fetal nucleated red blood cells. The inventors have found that by extracting nucleic acid samples from the above maternal samples, genetic variation in the fetal genome can be effectively analyzed, achieving non-invasive prenatal diagnosis or detection of the fetus.
  • the present invention is capable of performing non-invasive fetal genetic variation detection
  • the sample is peripheral blood of a pregnant woman
  • the method of the present invention is also applicable to invasive detection
  • the sample may be derived from fetal cord blood;
  • the tissue may be placental tissue or chorionic tissue;
  • the cells may be uncultured or cultured amniocytes, villous cells.
  • the subject to be tested and the normal subject are of the same species.
  • the mutation detection of the present invention is not necessarily used for disease diagnosis or related purposes. Because of polymorphism, some variations relative to the reference genome do not constitute a disease risk or affect health, and the method may be used purely for scientific research on genetic status.
  • a control sample is defined relative to the test sample.
  • a control sample refers to a normal sample.
  • when the test sample is maternal peripheral blood, the corresponding control sample is peripheral blood of a normal pregnant woman carrying a normal fetus.
  • the method and apparatus for extracting a nucleic acid sample from a test sample are also not particularly limited, and may be carried out using a commercially available nucleic acid extraction kit.
  • a reference unique alignment sequence refers to a chromosomal segment whose sequence can be determined to be located at a single chromosomal location; the reference unique alignment sequences of a chromosome can be constructed from a published chromosomal reference genome sequence such as hg18 or hg19.
  • the process of obtaining a reference unique alignment sequence generally involves cleaving the reference genome into sequences of any fixed length, aligning the sequences back to the reference genome, and selecting the uniquely aligned sequence to the reference genome as a reference unique alignment sequence.
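A toy version of this selection, for illustration only: it treats a fixed-length sequence (k-mer) as unique when it occurs exactly once on the forward strand of the given sequence, whereas the procedure described above aligns the cut sequences back to the reference genome with an aligner, which also accounts for the reverse strand and mismatches. `unique_kmer_positions` is a hypothetical name.

```python
from collections import Counter


def unique_kmer_positions(reference, k):
    """Start positions of length-k substrings that occur exactly once in
    `reference` (forward strand, exact matching only)."""
    starts = range(len(reference) - k + 1)
    counts = Counter(reference[i:i + k] for i in starts)
    return [i for i in starts if counts[reference[i:i + k]] == 1]
```

For example, `unique_kmer_positions("ACGTACGTTT", 4)` returns `[1, 2, 3, 5, 6]`, because `ACGT` occurs at both position 0 and position 4.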
  • the fixed length depends on the read length produced by the sequencer; specifically, the average read length may be used as a reference.
  • the lengths of sequencing results obtained by different sequencers are different.
  • the length of sequencing results may also differ between sequencing runs, so the choice of this length is partly empirical.
  • the length of the reference unique alignment sequences is selected according to the actual read length of the sequencing result, for example 25-100 bp; for the Illumina/Solexa system, for example, 50 bp may be chosen. The number of reference unique alignment sequences contained in each window is controlled between 800,000 and 900,000.
  • the distance between adjacent windows is 1 kb-100 kb, preferably 5 kb-20 kb, more preferably 10 kb. This distance can be adjusted according to the abundance of fetal DNA in the sample.
  • each window corresponds to one statistic and one chromosomal location, so the distance between windows determines the resolution of the detection: the finer the spacing, the higher the resolution, but the higher the maternal background, the harder it is to distinguish the source of a genetic variation.
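The window layout can be sketched as follows, assuming that the sorted start positions of the reference unique alignment sequences on one chromosome are already known and that a read is assigned to a window by its start position. Each window spans a fixed number of unique positions (840,000 in the Example) and window starts advance by a fixed step (S = 10 kb in the Example). `make_windows` and `count_reads` are hypothetical helpers, not part of FCAPS.

```python
from bisect import bisect_left, bisect_right


def make_windows(unique_positions, n_unique, step):
    """Sliding windows that each span `n_unique` sorted reference unique positions.

    Window starts advance by `step` bp; returns inclusive (start, end) coordinates."""
    if n_unique < 1 or step < 1:
        raise ValueError("n_unique and step must be >= 1")
    windows, start, prev_k = [], 0, None
    while True:
        k = bisect_left(unique_positions, start)  # first unique position >= start
        if k + n_unique > len(unique_positions):
            break
        if k != prev_k:  # skip repeats inside gaps that contain no unique positions
            windows.append((unique_positions[k], unique_positions[k + n_unique - 1]))
            prev_k = k
        start += step
    return windows


def count_reads(windows, read_starts):
    """Number of uniquely aligned read start positions falling in each window."""
    read_starts = sorted(read_starts)
    return [bisect_right(read_starts, end) - bisect_left(read_starts, start)
            for start, end in windows]
```

On toy data, `make_windows([0, 10, 20, 30, 40, 50], n_unique=3, step=10)` gives `[(0, 20), (10, 30), (20, 40), (30, 50)]`, and `count_reads` with read starts `[5, 15, 25, 45]` gives `[2, 2, 1, 1]`. Because windows overlap, one read is usually counted in several windows.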
  • the statistic may be the number of sequencing sequences itself, but is preferably an error-corrected (e.g., GC-corrected) and/or normalized statistic, so that the statistic follows a common statistical distribution, such as a normal or standard normal distribution.
  • the normalization process is performed relative to the average number of sequencing sequences of all windows.
  • normalization includes the process of determining the Z value below.
  • the statistic is an approximately normally distributed statistic obtained by normalizing the number of sequencing sequences aligned to the window.
  • the normalization is based on the number of average sequencing sequences aligned to all windows.
  • the statistic is a statistic that approximates a standard normal distribution.
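A minimal sketch of the normalization, under two assumptions that the text above does not spell out: the relative count of a window is its read count divided by the sample's mean count over all windows, and Z is the test sample's relative count minus the control mean for the same window, divided by the control standard deviation. GC correction is omitted here for brevity, and `relative_counts` and `z_scores` are hypothetical names.

```python
from statistics import mean, stdev


def relative_counts(counts):
    """Divide each window's read count by the sample's mean count over all windows."""
    m = mean(counts)
    return [c / m for c in counts]


def z_scores(test_rel, control_rel):
    """Per-window Z = (test value - control mean) / control standard deviation.

    `control_rel` holds one list of relative counts per control sample,
    aligned window by window with `test_rel` (at least two controls needed)."""
    z = []
    for i, value in enumerate(test_rel):
        column = [sample[i] for sample in control_rel]
        z.append((value - mean(column)) / stdev(column))
    return z
```

For instance, `relative_counts([1, 2, 3])` gives `[0.5, 1.0, 1.5]`, and a test value of 6.0 against control values 0.0, 2.0 and 4.0 gives Z = 2.0.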
  • the sequencing sequence refers to a sequence fragment output by the sequencer, i.e., reads, preferably about 25-100 nt.
  • the DNA molecule may be obtained by a conventional DNA extraction method such as a salting out method, a column chromatography method, a method or an SDS method, and preferably a magnetic bead method.
  • the magnetic bead method refers to treating blood, tissue or cells with cell lysis buffer and proteinase K to release DNA molecules, which are then reversibly adsorbed by affinity to specific magnetic beads; after washing with a rinse solution to remove impurities such as proteins and lipids, the DNA molecules are recovered with a purification solution. The method is well known in the art, and kits are commercially available, for example from Tiangen.
  • the object of the present invention can be achieved by directly sequencing the DNA obtained from a sample and performing the subsequent steps; the extracted DNA can be used in the subsequent steps without further treatment.
  • the analysis may be restricted to fragments whose electrophoretic main band is 50-700 bp, preferably 100-500 bp, more preferably 150-300 bp, and especially about 200 bp in size.
  • the DNA molecule can be interrupted so that the electrophoresis main band is concentrated in a fragment of a certain size, for example, 50-700 bp, preferably 100-500 bp, more preferably 150-300 bp, especially about 200 bp, and then the subsequent steps are performed.
  • the interrupting treatment may employ enzymatic, nebulized, ultrasonic, or HydroShear methods.
  • ultrasonication may use, for example, the Covaris S-series (based on AFA technology): when acoustic energy from the transducer passes through the DNA sample, dissolved gas forms bubbles; when the energy is removed, the bubbles collapse and produce energy capable of breaking DNA molecules. By setting a suitable energy intensity and time interval, DNA molecules can be fragmented to the desired extent.
  • for the specific principle and method, see the Covaris S-series instructions.
  • a breakpoint or candidate breakpoint is a potential or confirmed genetic variation site which, by convention, is expressed as a position on the reference genome.
  • the two concepts of genetic variation site and breakpoint are interconvertible in some circumstances and differ only in representation; they may be used at different stages to denote the position coordinates, on the reference genome, of a genetic variation that potentially or definitely exists.
  • the sequencing sequences can be obtained from the test sample by any sequencing method, including but not limited to the dideoxy chain termination method; preferably a high-throughput sequencing method, including but not limited to second-generation sequencing technology or single-molecule sequencing technology.
  • second-generation sequencing platforms (Metzker ML. Sequencing technologies - the next generation. Nat Rev Genet. 2010 Jan;11(1):31-46) include, but are not limited to, Illumina-Solexa (GA™, HiSeq2000™, etc.), ABI-SOLiD and Roche-454 (pyrosequencing) platforms; single-molecule sequencing platforms (technologies) include, but are not limited to, Helicos' True Single Molecule DNA sequencing, Pacific Biosciences' single molecule real-time sequencing (SMRT™), and nanopore sequencing technology from Oxford Nanopore Technologies (Rusk, Nicole (2009-04-01). Cheap Third-Generation Sequencing. Nature Methods 6(4): 244).
  • the sequencing types can be single-end (one-way) sequencing and pair-end (bidirectional) sequencing, and the sequencing length can be 50 bp, 90 bp, or 100 bp.
  • the sequencing platform is Illumina/Solexa
  • the sequencing type is Pair-end sequencing, and a 100 bp DNA sequence molecule having a bidirectional positional relationship is obtained.
  • the sequencing depth can be determined according to the size of the fetal chromosomal variation fragments to be detected: the greater the sequencing depth, the higher the detection sensitivity and the smaller the deletion and duplication fragments that can be detected.
  • the sequencing depth may be 30×, that is, the total amount of data is 30 times the length of the human genome.
  • the sequencing depth may also be 0.1×, that is, about 2.5 × 10^8 bp of data.
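The depth arithmetic is simply total sequenced bases divided by genome length. The helpers below assume a human genome size of about 3.1 × 10^9 bp (an approximation, not a figure from this document); the names are hypothetical.

```python
HUMAN_GENOME_BP = 3.1e9  # approximate human genome size in bp (assumption)


def sequencing_depth(total_bases, genome_size=HUMAN_GENOME_BP):
    """Average depth = total sequenced bases / genome length."""
    return total_bases / genome_size


def bases_needed(depth, genome_size=HUMAN_GENOME_BP):
    """Total bases required to reach a given average depth."""
    return depth * genome_size


def reads_needed(depth, read_length, genome_size=HUMAN_GENOME_BP):
    """Number of reads of `read_length` bp required to reach a given depth."""
    return bases_needed(depth, genome_size) / read_length
```

Under this assumption, 30× corresponds to about 9.3 × 10^10 bases, and the roughly 0.36 Gb of data per sample in the Example corresponds to about 0.12×.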
  • each sample can be labeled with a different tag sequence to distinguish samples during sequencing (Micah Hamady, Jeffrey J Walker, J Kirk Harris et al. Error-correcting barcoded primers …)
  • the human genome reference sequence is a human genome reference sequence in the NCBI database.
  • the genomic sequence is a human genome reference sequence in the NCBI database (hg18; NCBI Build 36).
  • the alignment may be error-tolerant, for example allowing a one-base mismatch.
  • Sequence alignments can be performed by a sequence alignment program, such as the Short Oligonucleotide Analysis Package (SOAP) and the BWA alignment (Burrows-Wheeler Aligner) available to those skilled in the art.
  • SOAP Short Oligonucleotide Analysis Package
  • BWA alignment Burrows-Wheeler Aligner
  • the position of each sequencing sequence on the reference genome is obtained by alignment with the reference genome sequence.
  • the sequence alignment can be performed using the default parameters provided by the program, or can be selected by those skilled in the art as needed.
  • the comparison software employed is SOAPaligner/soap2.
  • the software algorithm is a series of programs developed by BGI-Shenzhen for detecting fetal copy number variation, collectively referred to as FCAPS. Using data generated by next-generation sequencing, it corrects, normalizes and segments the data from the samples to be tested and the control set in order to estimate the extent and magnitude of fetal copy number variation.
  • step 1), obtaining sequencing sequences from the test sample: plasma DNA is extracted from the test and control samples according to the Tiangen DP327-02 Kit manual, and libraries are then constructed according to a modified Illumina/Solexa standard library construction process.
  • Illumina's Multiplexing Sample Preparation Guide Part#1005361; Feb 2010
  • Paired-End SamplePrep Guide Part#1005063; 2010
  • adapters used for sequencing are ligated to the approximately 200 bp DNA molecules, and each sample is labeled with a different tag sequence, so that the data of multiple samples can be distinguished within the data obtained from one sequencing run.
  • sequencing is performed by the second-generation Illumina/Solexa method (other methods such as ABI/SOLiD achieve the same or similar effects), and each sample yields sequencing sequences of a certain length.
  • step 2), alignment: the sequencing sequences from step 1) of the method of the invention are aligned with SOAP2 to the standard human genome reference sequence in the NCBI database to obtain the location of each sequenced DNA sequence on the genome. To avoid interference of repetitive sequences with the CNV analysis, only sequencing sequences that align uniquely to the human genome reference sequence are selected for subsequent analysis.
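The unique-alignment filter can be illustrated with a simplified record layout. The tuples `(read_id, chrom, pos, n_best_hits)` are a hypothetical summary of aligner output, not the native SOAP2 format; the sketch keeps reads with exactly one best hit and groups their start positions by chromosome, ready for per-window counting. `unique_read_starts` is a hypothetical name.

```python
def unique_read_starts(records):
    """Keep reads reported with exactly one best alignment and return their
    sorted start positions grouped by chromosome.

    `records` is an iterable of hypothetical (read_id, chrom, pos, n_best_hits)
    tuples summarising the aligner output."""
    by_chrom = {}
    for _read_id, chrom, pos, n_best_hits in records:
        if n_best_hits == 1:  # uniquely aligned reads only
            by_chrom.setdefault(chrom, []).append(pos)
    for positions in by_chrom.values():
        positions.sort()
    return by_chrom
```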
  • step 3), dividing windows and obtaining the statistic of each window, comprises the steps of: a) for the test and control samples, opening windows of length w on the genome reference sequence, calculating the GC content of each window, and calculating the relative number of sequencing sequences falling in each window; b) GC-correcting and normalizing these numbers.
  • GC correction of the test sample based on the control sample set: because there is a certain GC bias between sequencing batches, copy number deviations appear in high-GC or low-GC regions of the genome. Performing GC correction on the sequencing data based on the control sample set, to obtain the corrected relative number of sequencing sequences in each window, removes this bias and improves the accuracy of copy number variation detection. Normalization of the corrected relative number of sequencing sequences in each window: when fetal copy number variation is detected from maternal plasma, fetal variation is hard to highlight against the maternal DNA background, so normalization is used to reduce maternal background noise and amplify the fetal copy number variation signal.
  • the GC correction comprises the steps of: a) replacing the test sample with the control samples, obtaining the sequences aligned to each window and calculating the relative number of sequencing sequences for each window according to the method of the invention; b) obtaining the functional relationship between the GC content of the sequencing sequences aligned to each window and the relative number of sequencing sequences of that window; c) for each window, correcting the relative number of sequencing sequences of the test sample using the GC content of the test sample's sequencing sequences aligned within that window and the above function, to obtain the corrected relative number of sequencing sequences for the window.
  • the step of dividing the window and obtaining the statistics for the window for step 3) includes the steps of:
  • GC correction and normalization: (1) in a coordinate system with GC content as the abscissa and the relative number of sequencing sequences R as the ordinate, a linear fit of R against GC over the control samples gives the slope $a_i$ and intercept $b_i$ of window $i$; (2) the corrected relative number of sequencing sequences is then calculated for each window of the test sample. The fitted relationship is

  $$R_{i,j} = a_i \times GC_{i,j} + b_i$$

  where the subscripts $i$ and $j$ denote the window and the sample, respectively.
  • the position of the genetic variant site of the test sample obtained in step 4) on the reference genome sequence is performed by the following steps:
  • T (the theoretical limit) is the theoretically detectable fragment size;
  • the window size is W;
  • the window slide length is S;
  • the minimum p value is the termination value of the genomic sequence.
  • the step of confidence selection of the segments between genetic variation sites is: for each fragment between genetic variation sites on the reference genome sequence, calculate the average of the window statistics $Z_{i,j}$, denoted $\bar{Z}$. If $\bar{Z}$ of the fragment is less than -1.28, the fragment is a deletion; if it is greater than 1.28, the fragment is a duplication.
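The segment call can be sketched as below. `classify_segments` is a hypothetical helper; breakpoints are window indices that split the per-window Z series, each segment is labelled from the mean Z of its windows, and the default cut-offs are the -1.28 and 1.28 given above.

```python
from statistics import mean


def classify_segments(z, breakpoints, low=-1.28, high=1.28):
    """Return (start, end, mean_z, call) for each segment between breakpoints.

    `end` is exclusive; call is 'deletion', 'duplication' or 'normal'."""
    edges = [0] + sorted(breakpoints) + [len(z)]
    calls = []
    for start, end in zip(edges, edges[1:]):
        if start == end:
            continue  # empty segment, e.g. a breakpoint at the very start
        z_bar = mean(z[start:end])
        if z_bar < low:
            kind = "deletion"
        elif z_bar > high:
            kind = "duplication"
        else:
            kind = "normal"
        calls.append((start, end, z_bar, kind))
    return calls
```

On `z = [0.1, -0.2, 0.0, -2.0, -1.8, -2.2, 0.0, 0.1, 2.0, 1.5]` with breakpoints `[3, 6, 8]`, the four segments are called normal, deletion, normal and duplication.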
  • the runs test is a non-parametric test that evaluates the significance (p value) of the difference between two groups according to whether the elements of the two groups are evenly interleaved when pooled and ordered.
  • differences between segments can be distinguished during the test, although the segments on both sides of a breakpoint may not reach the level of a variation. Because these differences are not distinguished among the candidate breakpoints at the start of the test, a value N must be defined such that, when the number of breakpoints is N, the test distinguishes these differences well; the threshold obtained in this way is then more accurate when detecting test samples.
  • when the control samples are processed according to steps a) and b), the Z values of the windows follow a normal distribution, and -1.28 and 1.28 are the quantiles of the standard normal distribution at cumulative probabilities of 0.1 and 0.9, respectively.
  • -1.28 and 1.28 are the thresholds selected by the inventors for the present invention.
  • non-invasive fetal CNV screening of a suitable population helps provide genetic counseling and a basis for clinical decision-making; prenatal diagnosis can effectively prevent the birth of an affected child.
  • the present invention is applicable to all pregnant women, and the applicable population is merely illustrative of the present invention and is not intended to limit the scope of the present invention. The embodiments of the present invention will be described in detail below with reference to the accompanying drawings, however,
  • Example 1: detection of fetal large-fragment copy number variation in the plasma of one pregnant woman, and non-invasive detection of fetal mutations in the plasma of 9 pregnant women
  • the DNA of the above 8 plasma samples was extracted according to the operating procedure of the Tiangen DP327-02 Kit.
  • libraries were constructed from the extracted DNA according to the modified Illumina/Solexa standard library construction process, with the main band concentrated at about 200 bp.
  • adapters used for sequencing were ligated to the ends, each sample was labeled with a different tag sequence, and the fragments were then hybridized to the complementary adapters on the flow cell surface.
  • single-stranded primers are attached to the flow cell surface; one end of a DNA fragment is "fixed" on the chip by complementarity to a primer on the chip surface, and the other end (5' or 3') is complementary to another nearby primer and is also "fixed", forming a "bridge". After 30 rounds of amplification, each single molecule is amplified about 1000-fold into a monoclonal DNA cluster. Paired-end sequencing was then performed on the Illumina HiSeq2000, yielding DNA sequences of approximately 50 bp.
  • the library insert size was determined to be about 200 bp with the Agilent 2100 Bioanalyzer, and the library was accurately quantified by qPCR before sequencing.
  • sequencing: in this example, the DNA samples obtained from the above 10 plasma samples were processed according to the ClusterStation and HiSeq2000 (PE sequencing) instructions officially published by Illumina/Solexa, so that each sample yielded about 0.36 Gb of sequencing data. Samples were distinguished by their tag sequences.
  • alignment: using the alignment software SOAP2 (obtained from soap.genomics.org.cn), the sequenced DNA sequences were aligned with the human genome reference sequence NCBI Build 36 (hg18) in the NCBI database to obtain their locations on the genome.
  • a) Calculating the relative read count of the test sample: a reference unique read length of 50 bp was chosen and the reference unique reads were counted; the human genome reference sequence was divided into windows containing the same number of reference unique reads (840,000), with a mean window size of 1 Mb and a distance of S = 10 kb between adjacent windows. The actual number of reads $r_{i,j}$ falling in each window in step 2 above was counted, where the subscripts $i$ and $j$ denote the window number and the sample number, respectively; the GC content $GC_{i,j}$ of each window was calculated, and the relative read count $R_{i,j}$ was computed from $r_{i,j}$ and the mean read count $\bar{r}_j$ of the sample.
  • Corrected relative read count: $\hat{R}_{i,j} = a_i \times GC_{i,j} + b_i$
  • Segment filtering after merging windows: to further filter the segments obtained after merging windows, the mean of $Z$ over each segment is calculated and denoted $\bar{Z}$; if $\bar{Z}$ of a segment is less than -1.28 or greater than 1.28, the segment is a copy number variation. The results are shown in Table 1. 4) Visualization of the results: see Figure 2. Table 1. List of CNV results for each sample in the example
  • Reference DNA of the same sex as the test specimen (or male/female reference DNA) is used; Cy3 and Cy5 fluorescein are used to label the reference DNA and the test DNA, respectively, which are then hybridized to the probes.
  • If the ratio of the fluorescence intensity of the test DNA to that of the reference DNA is 1, the amount of test DNA can be taken as equal to the amount of reference DNA. A ratio different from 1 indicates a deletion or amplification in the test DNA.
  • The resolution of the various types of array CGH depends on the spacing and length of the probes on the microarray. Procedure: the cell culture medium remaining after G-banded chromosome examination is collected, and genomic DNA is extracted from the test specimen and the control specimen.
  • Amniotic fluid obtained by amniocentesis is centrifuged for 5 minutes (800 to 1000 rpm) and then inoculated in an inoculation hood. The supernatant is first aspirated and set aside for other tests; 0.5 ml of amniotic fluid and the pelleted amniotic cells are left in the centrifuge tube, resuspended into a cell suspension, and inoculated into three culture flasks containing culture medium.
  • The culture flasks are placed in a carbon dioxide incubator.
  • Adherent cells include epithelioid cells, fibroblast-like cells and amniotic fluid cells, the last being a cell type morphologically intermediate between epithelioid cells and fibroblasts. All three cell types form clones. If growth is good, 11 to 14 days after inoculation there are more than ten large clones on the bottom of the flask, visible to the naked eye as floc-like colonies. At this point slides can be prepared, i.e., the cells can be harvested. The day before harvest, the medium should be replaced with fresh medium to increase mitoses.
  • Cells are harvested on average 14 to 20 days after culture by adding colchicine to the culture flask.
  • Colchicine at 0.04 ng/ml arrests the cells in metaphase; after 5 to 15 hours of further culture, many mitotic nuclei can be seen under an inverted microscope, the cells appearing round, large and bright like a string of pearls, and connected to one another.
  • The amount of colchicine used may vary from laboratory to laboratory.
  • Slide preparation: centrifuge and remove the supernatant, leaving 0.5 ml to make a cell suspension, or remove all of the supernatant and add 0.5 ml of freshly prepared fixative. Pipette carefully with a fine glass pipette, draw up one drop and drop it onto a glass slide taken out of water, gently blow it apart, and air-dry the slide. Check chromosome spreading under the microscope before preparing further slides.
  • Dried slides can be stained directly with Giemsa.

Landscapes

  • Life Sciences & Earth Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Health & Medical Sciences (AREA)
  • Organic Chemistry (AREA)
  • Analytical Chemistry (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biophysics (AREA)
  • Biotechnology (AREA)
  • General Health & Medical Sciences (AREA)
  • Zoology (AREA)
  • Wood Science & Technology (AREA)
  • Theoretical Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Medical Informatics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Immunology (AREA)
  • Biochemistry (AREA)
  • Genetics & Genomics (AREA)
  • Microbiology (AREA)
  • Molecular Biology (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Pure & Applied Mathematics (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Mathematical Optimization (AREA)
  • Mathematical Analysis (AREA)
  • Probability & Statistics with Applications (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Operations Research (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

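The present invention discloses a method for detecting genetic variation, comprising the steps of: obtaining sequencing reads from a test sample; aligning the sequencing reads to a reference genome sequence; dividing the reference genome sequence into windows, counting the number of sequencing reads aligned to each window, and obtaining a statistic for each window based on that read count; and, for a segment of reference genome sequence, obtaining genetic variation sites based on how the statistics of all windows on the segment vary along it.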

Description

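Method for detecting genetic variation

Technical Field
The present invention relates to the field of genetic variation detection, in particular to the detection of copy number variations, such as microdeletions/microduplications, and of aneuploidy.

Background Art
Copy number variation (CNV) refers to submicroscopic mutations of DNA segments ranging in size from kb to Mb, manifested as an increase or decrease in copy number. The relationship between copy number variation and disease has been studied for a long time. For some de novo (germline) copy number variations (i.e., CNVs present in neither parent that arise from variation in the fetus itself), it has been suggested that the larger the segment, the more likely congenital abnormalities are to occur; for example, chromosomal aneuploidy disorders (such as T21 and T18) and chromosomal microdeletion/microduplication syndromes are recognized diseases associated with such copy number variations.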
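Human chromosomal microdeletion/microduplication syndromes are diseases with complex and variable phenotypes caused by the deletion or duplication of small segments of human chromosomes, i.e., by copy number variation of DNA segments. They have a relatively high incidence among perinatal infants and neonates and can lead to severe diseases and abnormalities, such as congenital heart disease or cardiac malformation, severe growth and developmental retardation, and facial or limb malformations. In addition, besides Down syndrome and fragile X syndrome, microdeletion syndromes are one of the main causes of intellectual disability [Knight SJL (ed): Genetics of Mental Retardation. Monogr Hum Genet. Basel, Karger, 2010, vol 18, pp 101-113 (DOI: 10.1159/000287600)]. In recent years, congenital heart disease, which ranks first in birth-defect incidence statistics, as well as intellectual disability, cerebral palsy and congenital deafness, which rank high among the cases seen in genetic counseling and diagnosis clinics, have all been linked to microdeletion syndromes. Common microdeletion syndromes include 22q11 microdeletion syndrome, cri-du-chat syndrome, Angelman syndrome and AZF deletion.
Although the incidence of each microdeletion syndrome is very low (the relatively common 22q11 microdeletion syndrome, cri-du-chat syndrome, Angelman syndrome and Miller-Dieker syndrome occur at rates of 1:4000 (live births), 1:50000, 1:10000 and 1:12000, respectively), the limitations of clinical detection technology mean that many patients with microdeletion syndromes are missed in prenatal screening and prenatal diagnosis. Even when typical clinical signs appear months or years after birth and the cause is sought retrospectively, the etiology often cannot be confirmed because of these technical limitations (https://decipher.sanger.ac.uk/syndromes). Because some types of microdeletion syndrome cannot be cured and affected children die within months or years after birth, these syndromes impose a heavy psychological and economic burden on society and families. According to incomplete statistics, there are already 15,000 patients with "happy puppet syndrome" (Angelman syndrome) worldwide, and the number of patients with other types of chromosomal microdeletion syndrome is increasing year by year. Therefore, testing clinically suspected patients, and parents with a relevant adverse pregnancy history, for chromosomal microdeletions/microduplications before pregnancy helps to provide genetic counseling and a basis for clinical decisions, and early prenatal diagnosis during pregnancy can effectively prevent the birth of affected children or provide a basis for their postnatal treatment [Bretelle F, et al. Prenatal and postnatal diagnosis of 22q11.2 deletion syndrome. Eur J Med Genet. 2010 Nov-Dec;53(6):367-70].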
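However, because the chromosomal variations in this class of diseases are so small, they cannot be detected by conventional clinical methods such as karyotype analysis (whose resolution is 10 Mb or more) [Malcolm S. Microdeletion and microduplication syndromes. Prenat Diagn. 1996 Dec;16(13):1213-9]. At present, prenatal diagnosis of microdeletion/microduplication syndromes mainly relies on molecular diagnosis of fetal amniotic fluid or other tissues obtained invasively. Current invasive molecular diagnostic methods mainly include high-resolution karyotyping, FISH (fluorescence in situ hybridization), array CGH (comparative genomic hybridization), MLPA (multiplex ligation-dependent probe amplification) and PCR. Among them, FISH is the gold standard for cytogenetic diagnosis and can effectively detect most chromosomal segment deletions. However, because invasive sampling requires surgery or cell culture, such methods are suitable as diagnostic tests but, in terms of time efficiency and resource consumption, not as a universal clinical screening method.
Some attempts have also been made at noninvasive screening for microdeletion/microduplication syndromes. For example, in a study of noninvasive detection of a fetal microdeletion syndrome published in November 2011, the researchers performed high-depth sequencing of maternal plasma during pregnancy, generating about 243 million short sequencing reads, and detected a fetal microdeletion of about 4 Mb from 12p11.22 to 12p12.1 [David Peters, et al. Noninvasive Prenatal Diagnosis of a Fetal Microdeletion Syndrome. N Engl J Med 2011; 365:1847-1848]. However, generating such a large amount of data is unsuitable for clinical use in terms of both resource consumption and time efficiency.
In summary, among current prenatal testing methods for chromosomal microdeletion/microduplication syndromes, there is still no feasible universal screening method. The art needs a new, reliable method for screening fetal copy number variation, both to identify known loci and to discover unknown ones.

Summary of the Invention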
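With the continuous development of high-throughput sequencing technology and the continuous decrease in sequencing cost, research on sequencing for prenatal screening has made high-throughput sequencing increasingly widely used for screening and analyzing genetic variations such as chromosomal copy number variation and aneuploidy, in particular fetal aneuploidy. To detect genetic variation, the present invention provides a method for screening genetic variation based on high-throughput sequencing technology. The method can be used to detect genetic variations such as copy number variation and aneuploidy, and features high throughput, high specificity and accurate localization. The method of the invention includes obtaining a test sample and extracting its DNA, performing high-throughput sequencing, and analyzing the data obtained to produce a test result.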
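The present invention provides a method for detecting genetic variation, comprising the following steps:
1) obtaining sequencing reads from a test sample; for example, the read length may be 25-100 nt and the number of reads may be at least 1 million;
2) aligning the sequencing reads to a reference genome sequence;
3) dividing the reference genome sequence into windows, counting the number of sequencing reads aligned to each window, and obtaining a statistic for each window based on that read count;
4) for a segment of reference genome sequence, based on how the statistics of all windows on the segment vary along it, obtaining the positions at which the statistics of the windows on either side change significantly; these positions are the positions of the genetic variation sites of the test sample on the reference genome sequence.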
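In one embodiment, the genetic variation site in the method of the invention is the midpoint between an inflection point at which the statistic changes from increasing to decreasing and the next such inflection point, and two genetic variation sites are separated by at least 50, at least 70, or at least 100 window lengths, preferably 100. The sites, inflection points and midpoints above refer to the chromosomal position of the window to which the statistic corresponds, and may be represented by any position of the window, such as its start, midpoint or end.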
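In a specific embodiment, the method of the invention further comprises the step of:
5) screening the genetic variation sites to obtain screened genetic variation sites.
For example, step 5) above is:
for the two sequence segments lying between each genetic variation site and the preceding and following genetic variation sites, computing the difference between the two populations of values formed by the statistics of the windows in the two segments, and removing the genetic variation site whose significance value (p-value) for this difference is the largest and greater than a preset threshold; and repeating this process until the significance values of all genetic variation sites are less than the preset threshold,
wherein the significance of the difference may, for example, be assessed by a runs test: the genetic variation site whose runs-test significance value is the largest and greater than the preset threshold is removed, and the process is repeated until the runs-test significance values of all genetic variation sites are less than the preset threshold.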
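In one embodiment, the preset threshold used in step 5) above can be obtained by the following steps:
a) substituting a control sample for the test sample and obtaining genetic variation sites according to the method of the invention;
b) for the two sequence segments lying between each genetic variation site and the preceding and following genetic variation sites, computing the difference between the two populations of values formed by the statistics of the windows they contain, and removing the genetic variation site whose difference is least significant;
c) repeating step b) until the number of remaining candidate breakpoints equals the expected value $N_c = L_c/T$, where $L_c$ is the length of the genome sequence and the theoretical limit precision $T$ is the fragment size that can theoretically be detected; when the mean window size is $W$, the window sliding length is $S$ and the number of windows in each population of the runs test is $N$, the theoretical limit precision is $T = W + S \times N$. The minimum of the significance values of all remaining candidate breakpoints is the significance threshold.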
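As a worked check of these formulas, take the parameters of the example described later (mean window size W = 1 Mb, window spacing S = 10 kb, theoretical limit precision T = 2 Mb). Solving T = W + S × N for N gives the number of windows in each runs-test population, which matches the 100-window spacing used for candidate breakpoints in the example. The chromosome length in the second line is a hypothetical round number chosen only to illustrate N_c, not a value from the example.

$$N = \frac{T - W}{S} = \frac{2\,\text{Mb} - 1\,\text{Mb}}{10\,\text{kb}} = \frac{1000\,\text{kb}}{10\,\text{kb}} = 100$$

$$N_c = \frac{L_c}{T} = \frac{100\,\text{Mb}}{2\,\text{Mb}} = 50 \qquad (\text{hypothetical } L_c = 100\,\text{Mb})$$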
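The present invention further provides a method for detecting genetic variation, comprising the steps of:
1) obtaining genetic variation sites on a segment of reference genome sequence according to the method of the invention;
2) performing confidence selection on the segments between the genetic variation sites.
In one embodiment of the invention,
the confidence selection of step 2) above is:
i) calculating the distribution probability of the statistic from the distribution pattern of the window statistics, and setting a threshold;
ii) comparing the mean of the window statistics in each segment between the screened genetic variation sites with the threshold, and determining from the result whether the segment between the sites is abnormal.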
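In another embodiment of the invention, the confidence selection of step 2) above is:
i) calculating the distribution probability of the statistic from the distribution pattern of the window statistics, and setting a first threshold and a second threshold;
ii) comparing the mean of the window statistics in each segment between the screened genetic variation sites with the first threshold and the second threshold:
if the mean window statistic of a segment is less than the first threshold, the segment is a deletion; if it is greater than the second threshold, the segment is a duplication,
wherein the first threshold is the value of the statistic at a cumulative probability less than or equal to 0.1, preferably less than or equal to 0.01, and most preferably 0.05, and/or the second threshold may be the value of the statistic at a cumulative probability greater than or equal to 0.9, preferably greater than or equal to 0.99, and most preferably 0.95.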
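The present invention further provides a computer-readable medium carrying a series of executable code capable of performing the genetic detection method of the invention.
The present invention further provides a method for detecting fetal genetic variation, comprising the following steps: obtaining a maternal sample containing fetal nucleic acid;
sequencing the maternal sample;
detecting genetic variation using the method of any one of claims 1-16.
In one embodiment of the invention, the maternal sample is maternal peripheral blood. Compared with current methods for detecting genetic variation, the invention has the following main advantages: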
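(1) Clinical feasibility: using only about 5 M of sequencing data, we can detect CNV segments of about 5 Mb, whereas the reported method used nearly 243 M; our method greatly reduces the cost and time of data generation.
(2) Scalability: besides increasing the amount of sequencing, precision can be increased by enlarging the control group, which reduces the demand on the amount of starting DNA.
(3) Greater stability and comprehensiveness: the reported article does not state its operational details explicitly, whereas the present invention addresses aspects such as data correction and optimization of the segmentation conditions.

Brief Description of the Drawings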
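Figure 1 is a brief flow chart of the genetic variation analysis of chromosomes according to one embodiment of the invention.
Figure 2A is the chromosome digital karyotype of S67.
Figure 2B is the chromosome digital karyotype of S10.
Figure 2C is the chromosome digital karyotype of S14.
Figure 2D is the chromosome digital karyotype of S18.
Figure 2E is the chromosome digital karyotype of S49.
Figure 2F is the chromosome digital karyotype of S55.
Figure 2G is the chromosome digital karyotype of S82.
Figure 2H is the chromosome digital karyotype of S103.

Detailed Description of the Embodiments
Description of the tables in the examples: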
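Table 1 lists the CNV results of each sample in the example;
Table 2 shows the aCGH and karyotyping results of each sample in the example;
Table 3 compares the results of the example with standard karyotyping results.
According to embodiments of the invention, the test sample is a sample containing nucleic acid. The type of nucleic acid is not particularly limited and may be deoxyribonucleic acid (DNA) or ribonucleic acid (RNA), preferably DNA. Those skilled in the art will understand that RNA can be converted by conventional means into DNA with the corresponding sequence for subsequent detection and analysis. Likewise, the nature of the test sample is not particularly limited: according to some embodiments, a genomic DNA sample or a portion of genomic DNA can be used as the test sample. The source of the test sample is also not particularly limited. According to an example of the invention, a sample from a pregnant woman can be used as the test sample, from which nucleic acid carrying fetal genetic information can be extracted, so that the genetic information and physiological state of the fetus can be detected and analyzed. Examples of pregnant-woman samples that can be used include, but are not limited to, maternal peripheral blood, maternal urine, fetal trophoblast cells shed into the maternal cervix, maternal cervical mucus, and fetal nucleated red blood cells. The inventors found that by extracting nucleic acid from such samples, genetic variation in the fetal genome can be analyzed effectively, enabling prenatal diagnosis or testing that does no harm to the fetus. Although the ability to perform noninvasive testing for fetal genetic variation is an advantage of the invention (for example, when the sample is maternal peripheral blood), the method is also applicable to invasive testing: for example, the sample may be fetal umbilical cord blood, the tissue may be placental or chorionic tissue, and the cells may be uncultured or cultured amniotic fluid cells or chorionic villus cells. In the present invention, the subject to be tested and the normal subjects belong to the same species. Moreover, the variation detection of the invention is not necessarily used for disease diagnosis or related purposes: because of polymorphism, the presence of some variations relative to the reference genome does not necessarily indicate disease risk or health status, and the method may be used purely for scientific research on genetic polymorphism.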
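In the present invention, a control sample is defined relative to the test sample. For example, in methods related to disease detection, a control sample is a normal sample. In one embodiment of the invention, the test sample is maternal peripheral blood and the corresponding control sample is peripheral blood from a normal mother carrying a normal fetus.
According to embodiments of the invention, the methods and equipment for extracting nucleic acid from the test sample are not particularly limited; a commercial nucleic acid extraction kit can be used.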
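In the method of the invention, the windows contain the same number of reference unique reads. Reference unique reads are chromosome fragments with unique sequences that can be located unambiguously at a single chromosomal position; the reference unique reads of a chromosome can be constructed from a published reference genome sequence such as hg18 or hg19. Obtaining reference unique reads generally involves cutting the reference genome into sequences of a fixed length, aligning these sequences back to the reference genome, and selecting those that align uniquely as reference unique reads. The fixed length depends on the read length produced by the sequencer; the average read length can serve as a guide. Different sequencers produce reads of different lengths, and read length may also differ from run to run, so the choice of this length involves some subjective, empirical judgment.
In one embodiment of the invention, the reference unique read length is chosen according to the actual read length of the sequencing results, for example 25-100 bp; for the Illumina/Solexa system, 50 bp may be chosen, in which case each window contains 800,000 to 900,000 reference unique reads. In the method of the invention, the windows may or may not overlap. In one embodiment, the distance between adjacent windows is 1 kb-100 kb, preferably 5 kb-20 kb, more preferably 10 kb. This distance can be adjusted according to the abundance of fetal DNA in the sample. The rationale is that each window corresponds to one statistic and one chromosomal position, so the window spacing determines the detection precision; however, the higher the precision, the higher the maternal background, and the harder it becomes to tell where a genetic variation originates.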
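The equal-count window scheme of the two paragraphs above can be sketched as follows. This is a minimal illustration, not the FCAPS implementation: it assumes the start coordinates of the reference unique reads for one chromosome are already available as a list, and the function name `equal_count_windows` is hypothetical. Each window starts `step_bp` base pairs after the previous one (S = 10 kb in the example) and extends just far enough to cover `reads_per_window` unique positions (840,000 in the example).

```python
from bisect import bisect_left
from typing import List, Tuple


def equal_count_windows(
    unique_positions: List[int], reads_per_window: int, step_bp: int
) -> List[Tuple[int, int]]:
    """Return half-open (start, end) windows, each holding exactly
    `reads_per_window` reference-unique positions (hypothetical helper).

    Window starts advance by `step_bp` base pairs, so windows may overlap
    and their lengths vary with the local density of unique positions."""
    positions = sorted(unique_positions)
    windows: List[Tuple[int, int]] = []
    if not positions:
        return windows
    start = positions[0]
    while True:
        first = bisect_left(positions, start)   # first unique position >= start
        last = first + reads_per_window - 1     # index of the window's last position
        if last >= len(positions):
            break  # not enough unique positions left for a full window
        windows.append((start, positions[last] + 1))
        start += step_bp
    return windows
```

Because window ends follow the density of unique positions, windows in poorly mappable regions come out longer, which keeps the number of mappable positions per window constant.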
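In the method of the invention, the statistic may be the read count itself, but is preferably a statistic that has undergone error correction (e.g., GC correction) and/or normalization, so that it follows a common statistical distribution, such as a normal or standard normal distribution, which facilitates subsequent statistical analysis. In one embodiment of the invention, normalization is performed relative to the mean read count of all windows. In one embodiment, normalization includes the computation of Z values described below. In one embodiment, the statistic is an approximately normally distributed statistic obtained by normalizing the number of reads aligned to a window. In one embodiment, the normalization is based on the mean number of reads aligned to all windows. In one embodiment, the statistic approximately follows a standard normal distribution.
In the present invention, sequencing reads are the sequence fragments output by the sequencer, i.e., reads, preferably about 25-100 nt long.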
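In the present invention, the DNA molecules can be obtained by conventional DNA extraction methods such as salting out, column chromatography or the SDS method, preferably by the magnetic bead method. In the magnetic bead method, blood, tissue or cells are treated with cell lysis buffer and proteinase K to release naked DNA molecules, which are reversibly affinity-adsorbed onto specific magnetic beads; after impurities such as proteins and lipids are removed with washing buffer, the DNA molecules are eluted from the beads with elution buffer. Such magnetic beads are well known in the art and commercially available, for example from Tiangen.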
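In the present invention, directly sequencing the DNA obtained from a sample and performing the subsequent steps is generally sufficient to achieve the object of the invention; the extracted DNA can be used in the subsequent steps without further processing. In some preferred embodiments, only fragments whose main electrophoretic band is concentrated at 50-700 bp, preferably 100-500 bp, more preferably 150-300 bp, and especially about 200 bp, are studied. In some more preferred embodiments, the DNA molecules can be fragmented so that the main electrophoretic band is concentrated at a certain size, for example 50-700 bp, preferably 100-500 bp, more preferably 150-300 bp, and especially around 200 bp, before the subsequent steps. Random fragmentation of the DNA molecules can use enzymatic digestion, nebulization, sonication, or the HydroShear method. Sonication is preferred, for example with the Covaris S-series (based on AFA technology: when acoustic energy emitted by the transducer passes through the DNA sample, dissolved gas forms bubbles; when the energy is removed, the bubbles collapse and generate forces that break the DNA molecules. By setting conditions such as energy intensity and time intervals, DNA molecules can be fragmented to a certain size range; for the specific principles and methods, see, for example, the Covaris S-series manual).
In the present invention, a breakpoint or candidate breakpoint is a potential or existing genetic variation site; by convention, the site is expressed as a position on the reference genome. In the present invention, the concepts of genetic variation site and breakpoint are interchangeable in certain contexts and differ only in wording; at different stages, either may be used to denote the coordinates on the reference genome of a potential or confirmed genetic variation.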
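In the present invention, sequencing reads can be obtained from the test sample by sequencing, which can be performed by any sequencing method, including but not limited to the dideoxy chain termination method; high-throughput sequencing methods are preferred, including but not limited to second-generation sequencing or single-molecule sequencing.
Second-generation sequencing platforms (Metzker ML. Sequencing technologies-the next generation. Nat Rev Genet. 2010 Jan;11(1):31-46) include but are not limited to the Illumina-Solexa (GA™, HiSeq2000™, etc.), ABI-SOLiD and Roche-454 (pyrosequencing) platforms; single-molecule sequencing platforms (technologies) include but are not limited to Helicos' True Single Molecule DNA sequencing, Pacific Biosciences' single molecule real-time sequencing (SMRT™), and Oxford Nanopore Technologies' nanopore sequencing (Rusk, Nicole (2009-04-01). Cheap Third-Generation Sequencing. Nature Methods 6 (4): 2446).
Sequencing may be single-end or paired-end, with read lengths of 50 bp, 90 bp or 100 bp. In one embodiment of the invention, the sequencing platform is Illumina/Solexa and the sequencing type is paired-end, yielding 100 bp DNA sequence reads with a paired positional relationship.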
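In one embodiment of the invention, the sequencing depth can be set according to the size of the fetal chromosomal variant segments to be detected: the greater the sequencing depth, the higher the detection sensitivity, i.e., the smaller the deletions and duplications that can be detected. The sequencing depth can range up to 30×, where 30× means a total data amount of 30 times the length of the human genome; for example, in one embodiment of the invention the sequencing depth is 0.1× (2.5 × 10^8 bp).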
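When the DNA molecules to be tested come from multiple samples, each sample can be given a different tag sequence so that the samples can be distinguished during sequencing (Micah Hamady, Jeffrey J Walker, J Kirk Harris et al. Error-correcting barcoded primers for pyrosequencing hundreds of samples in multiplex. Nature Methods, 2008, March, Vol.5 No.3), allowing multiple samples to be sequenced simultaneously. The tag sequences serve to distinguish different sequences without affecting other functions of the DNA molecules to which they are added. The tag length can be 4-12 bp. In one embodiment of the invention, the human genome reference sequence is the human genome reference sequence in the NCBI database. In one embodiment of the invention, the human genome sequence is the human genome reference sequence (hg18; NCBI Build 36) in the NCBI database.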
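Separating pooled reads by tag sequence (demultiplexing) can be sketched as a nearest-tag lookup. This is an illustration only: it assumes, hypothetically, that each tag occupies the first bases of the read and that all tags have the same length, whereas on Illumina instruments the index is typically obtained in a separate index read. The tag sequences, the function names and the one-mismatch tolerance are made up for the example.

```python
from typing import Dict, List, Optional, Tuple


def hamming(a: str, b: str) -> int:
    """Number of mismatching bases between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def assign_sample(read: str, tags: Dict[str, str], max_mismatch: int = 1) -> Optional[str]:
    """Return the sample whose tag best matches the start of `read`, or None.

    Reads matching no tag within `max_mismatch`, or matching two tags
    equally well, are left unassigned."""
    hits: List[Tuple[int, str]] = []
    for sample, tag in tags.items():
        if len(read) < len(tag):
            continue  # read too short to contain this tag
        d = hamming(read[: len(tag)], tag)
        if d <= max_mismatch:
            hits.append((d, sample))
    if not hits:
        return None
    hits.sort()
    if len(hits) > 1 and hits[0][0] == hits[1][0]:
        return None  # ambiguous: two tags equally close
    return hits[0][1]
```

Short tags (4-12 bp) leave little room for errors, so tag sets are usually designed with a minimum pairwise distance large enough that a single mismatch cannot make one tag look like another.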
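In the present invention, the alignment may be a perfect-match alignment or an alignment allowing one mismatched base. Sequence alignment can be performed with a sequence alignment program available to those skilled in the art, such as the Short Oligonucleotide Analysis Package (SOAP) or BWA (Burrows-Wheeler Aligner), which aligns the reads to the reference genome sequence to obtain their positions on the reference genome. Alignment can use the program's default parameters, or the parameters can be chosen by those skilled in the art as needed. In one embodiment of the invention, the alignment software used is SOAPaligner/soap2.
In the present invention, the software algorithm is a series of programs for detecting fetal copy number variation developed by BGI-Shenzhen, collectively called FCAPS. Using data generated by next-generation sequencing, it performs data correction, normalization and segmentation of the test sample against a control set, and estimates the extent and size of fetal copy number variations.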
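In some specific embodiments of the method of the invention, for step 1), obtaining sequencing reads from the test sample: plasma DNA is extracted from the test and control samples according to the Tiangen DP327-02 Kit manual, and libraries are then constructed following a modified Illumina/Solexa standard library construction workflow. For details of constructing whole-genome sequencing libraries, see the protocols provided by sequencer manufacturers such as Illumina, e.g., Illumina's Multiplexing Sample Preparation Guide (Part#1005361; Feb 2010) or Paired-End SamplePrep Guide (Part#1005063; Feb 2010), which are incorporated herein by reference. In this process, sequencing adapters are added to both ends of the DNA molecules, which are naturally concentrated at about 200 bp, and each sample is given a different tag sequence so that the data of multiple samples from a single sequencing run can be separated. Using second-generation Illumina/Solexa sequencing (other sequencing methods such as ABI/SOLiD can achieve the same or similar results), sequencing reads of a certain fragment size are obtained for each sample.
In some specific embodiments of the method of the invention, for step 2), alignment: the reads from step 1) are aligned with SOAP2 to the standard human genome reference sequence in the NCBI database to obtain the positions of the sequenced DNA reads on the genome. To avoid interference from repetitive sequences in CNV analysis, only reads that align uniquely to the human genome reference sequence are kept for subsequent analysis.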
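Keeping only uniquely aligned reads and counting them per window can be sketched as below. The input format is an assumption: each alignment on one chromosome is reduced to a hypothetical `(position, number_of_hits)` pair rather than parsed from actual SOAP2 output, and windows are half-open `[start, end)` intervals that may overlap.

```python
from bisect import bisect_left
from typing import List, Sequence, Tuple


def count_unique_reads(
    alignments: Sequence[Tuple[int, int]], windows: Sequence[Tuple[int, int]]
) -> List[int]:
    """Count uniquely aligned reads in each window of one chromosome.

    Only reads with exactly one alignment hit are kept. Because windows may
    overlap, a single read can be counted in several windows."""
    unique = sorted(pos for pos, hits in alignments if hits == 1)
    # reads in [start, end) = (#positions < end) - (#positions < start)
    return [bisect_left(unique, end) - bisect_left(unique, start) for start, end in windows]
```

Sorting once and using binary search keeps the counting cost proportional to the number of reads plus the number of windows times log(reads), which matters when millions of reads are binned.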
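In some specific embodiments of the method of the invention, step 3), dividing into windows and obtaining the window statistics, comprises: a) for the test sample and the control samples, opening windows of length w on the genome reference sequence, and calculating the GC content of each window and the relative number of reads falling in each window; b) correcting and normalizing the relative read counts.
In some specific embodiments of the method of the invention, GC correction of the test sample is performed based on a control sample set. Because a certain GC bias exists between and within sequencing batches, copy-number deviations appear in high-GC or low-GC regions of the genome; applying GC correction to the sequencing data based on a control sample set, to obtain a corrected relative read count for each window, removes this bias and improves the precision of copy number variation detection. The corrected relative read count of each window is then normalized: when fetal copy number variation is detected from the plasma of a pregnant mother, the fetal variation is difficult to bring out against the maternal DNA background, so normalization is used to reduce the maternal background noise and amplify the fetal copy number variation signal. In one embodiment of the invention, the GC correction comprises: a) substituting a control sample for the test sample, obtaining the reads aligned to each window according to the method of the invention, and calculating the relative read count of each window; b) obtaining the functional relationship between the GC content of the reads aligned to each window and the relative read count of that window; c) for each window, correcting the relative read count of the test sample in that window using the GC content of the test-sample reads aligned to the window and the above functional relationship, to obtain the corrected relative read count of the window.
In some specific embodiments of the method of the invention, step 3), dividing into windows and obtaining the window statistics, comprises: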
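a) Calculating the relative read counts of the test sample and the control samples: for the test sample and the control samples, windows of length $w$ are opened on the human genome reference sequence, and the number of reads $r_{i,j}$ from step 2) falling in each window is counted, where the subscripts $i$ and $j$ denote the window number and the sample number, respectively; the GC content $GC_{i,j}$ of each window is calculated, and the relative read count is computed as
$$R_{i,j} = \log_2 \frac{r_{i,j}}{\bar{r}_j}, \qquad \bar{r}_j = \frac{1}{n}\sum_{i=1}^{n} r_{i,j},$$
where $\bar{r}_j$ is the mean read count of sample $j$ over its $n$ windows.
b) Data correction and normalization:
① in a coordinate system with GC content on the horizontal axis and relative read count $R$ on the vertical axis, $R_{i,j}$ of the control samples is fitted linearly against $GC_{i,j}$ to obtain the slope $a_i$ and the intercept $b_i$;
② for each window of the test sample, the corrected relative read count is calculated as
$$\hat{R}_{i,j} = a_i \times GC_{i,j} + b_i;$$
③ for each window of the test sample, the statistic $Z_{i,j}$ is calculated as
$$Z_{i,j} = \frac{R_{i,j} - \hat{R}_{i,j} - \mathrm{mean}_j}{SD_j}, \qquad \mathrm{mean}_j = \frac{1}{n}\sum_{i=1}^{n}\left(R_{i,j} - \hat{R}_{i,j}\right),$$
where $SD_j$ is the standard deviation of $R_{i,j} - \hat{R}_{i,j}$ over the $n$ windows of sample $j$.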
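A minimal sketch of the relative read count, GC fit and Z statistic of step 3), written in plain Python. It assumes per-window read counts (all positive) and GC contents are already available, that a separate line (a_i, b_i) is fitted for each window i across the control samples (the reading suggested by the subscripts), and that SD_j is the sample standard deviation; all function names are hypothetical. At least two control samples with different GC values in every window are needed, otherwise the slope is undefined.

```python
import math
from typing import List, Sequence, Tuple


def relative_counts(counts: Sequence[float]) -> List[float]:
    """R_ij = log2(r_ij / mean read count of sample j) for every window i."""
    mean = sum(counts) / len(counts)
    return [math.log2(c / mean) for c in counts]


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares slope and intercept of y against x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def z_scores(test_counts: Sequence[float], test_gc: Sequence[float],
             control_counts: Sequence[Sequence[float]],
             control_gc: Sequence[Sequence[float]]) -> List[float]:
    """Per-window Z statistic of one test sample against a control set.

    control_counts[j][i] and control_gc[j][i] hold the read count and GC
    content of window i in control sample j (hypothetical layout)."""
    n = len(test_counts)
    control_r = [relative_counts(c) for c in control_counts]
    test_r = relative_counts(test_counts)
    residuals = []
    for i in range(n):
        # assumed: one line (a_i, b_i) per window, fitted across control samples
        a_i, b_i = fit_line([gc[i] for gc in control_gc], [r[i] for r in control_r])
        r_hat = a_i * test_gc[i] + b_i          # GC-corrected relative count
        residuals.append(test_r[i] - r_hat)     # R_ij - R_hat_ij
    mean = sum(residuals) / n
    sd = math.sqrt(sum((d - mean) ** 2 for d in residuals) / (n - 1))  # assumed sample SD
    return [(d - mean) / sd for d in residuals]
```

Fitting per window lets each window carry its own GC response; a single genome-wide fit would be a simpler alternative if only a few control samples are available.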
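In some specific embodiments of the method of the invention, the positions of the genetic variation sites of the test sample on the reference genome sequence in step 4) are obtained by the following steps:
① Initialization: for each window endpoint, if the trend of the statistic Z of the windows before and after the point changes, and the point is at least n windows (n is an integer from 10 to 500, preferably 50-300, e.g., 100) away from the previous point at which the trend of Z changes, the point is a candidate breakpoint. For example, the midpoint between an inflection point at which Z changes from increasing to decreasing and the next such inflection point is a candidate breakpoint, or the midpoint between an inflection point at which Z changes from decreasing to increasing and the next such inflection point is a candidate breakpoint $b_k$ ($k = 1, 2, \ldots, s$, where $s$ is an integer greater than 0);
② Optimal iteration: to study copy number variation or aneuploidy of a genome sequence, all sorted candidate breakpoints on that sequence are denoted $B = \{b_1, b_2, \ldots, b_s\}$. Each candidate breakpoint $b_k$ has two segments, one on each side: the region from the previous breakpoint to $b_k$ and the region from $b_k$ to the next breakpoint. The Z values of all windows in these two segments are tested (for example with a runs test, a nonparametric test that assesses the significance of the difference between two populations from how evenly their elements are distributed after pooling), and the resulting p-value $p_k$ is regarded as the "significance of $b_k$ as a breakpoint". The candidate breakpoint with the largest $p_k$ is removed, and this step is repeated until all p-values are less than the termination p-value $p_{\text{final}}$ of the genome sequence;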
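③ Obtaining the termination p-value: during testing, another control sample is used as the test sample in steps a) to c)① above. For a genome sequence c, all sorted candidate breakpoints are denoted $B_c = \{b_1, b_2, \ldots, b_s\}$; each candidate breakpoint $b_k$ has two windows (segments), one on each side, and the p-value $p_k$ obtained from a runs test on all Z values in these two windows is regarded as the "significance of $b_k$ as a breakpoint". The candidate breakpoint with the largest $p_k$ is removed and the windows on either side of it are merged, until the number of candidate breakpoints equals the expected value $N_c = L_c/T$, where $L_c$ is the length of genome sequence c and $T$ (the theoretical limit precision) is the fragment size that can theoretically be detected; when the window size is $W$, the window sliding length is $S$ and the number of windows in each population of the runs test is $N$, the theoretical limit precision is $T = W + S \times N$. Among the remaining candidate breakpoints, the minimum p-value is the termination p-value $p_{\text{final}}$ of the genome sequence.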
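The breakpoint search of step 4) (initialization, optimal iteration and termination p-value) can be sketched as below. Several choices are assumptions not fixed by the text: candidates are taken as midpoints between consecutive local peaks of Z (plateaus are not treated as peaks); the runs test is the Wald-Wolfowitz two-sample test with a one-sided normal approximation, in which few runs (well-separated groups) give a small p; ties keep input order; and all function names are hypothetical. Breakpoints are window indices that split the Z sequence into segments.

```python
import math
from typing import List, Optional, Sequence


def candidate_breakpoints(z: Sequence[float], min_gap: int) -> List[int]:
    """Step 1 (one reading): midpoints between consecutive local peaks of Z,
    kept only if at least `min_gap` windows after the last kept candidate."""
    peaks = [i for i in range(1, len(z) - 1) if z[i - 1] < z[i] > z[i + 1]]
    candidates: List[int] = []
    for left, right in zip(peaks, peaks[1:]):
        mid = (left + right) // 2
        if not candidates or mid - candidates[-1] >= min_gap:
            candidates.append(mid)
    return candidates


def runs_test_p(x: Sequence[float], y: Sequence[float]) -> float:
    """One-sided Wald-Wolfowitz runs test p-value (normal approximation)."""
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        return 1.0
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y], key=lambda t: t[0])
    labels = [group for _, group in pooled]
    runs = 1 + sum(a != b for a, b in zip(labels, labels[1:]))
    n = n1 + n2
    mean = 2 * n1 * n2 / n + 1
    var = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n * n * (n - 1))
    if var <= 0:
        return 1.0  # too few elements for the approximation
    zval = (runs - mean) / math.sqrt(var)
    return 0.5 * math.erfc(-zval / math.sqrt(2))  # P(Z <= zval)


def breakpoint_pvalues(z: Sequence[float], bps: Sequence[int]) -> List[float]:
    """p-value of each breakpoint: runs test between its left and right segment."""
    bounds = [0] + list(bps) + [len(z)]
    return [runs_test_p(z[bounds[k - 1]:bounds[k]], z[bounds[k]:bounds[k + 1]])
            for k in range(1, len(bounds) - 1)]


def prune(z: Sequence[float], bps: Sequence[int], p_final: Optional[float] = None,
          target: Optional[int] = None) -> List[int]:
    """Repeatedly drop the least significant breakpoint (largest p).

    Stops once every p < p_final (test sample, step 2) or once only `target`
    breakpoints remain (control sample, step 3, target = N_c = L_c / T)."""
    kept = list(bps)
    while kept and (target is None or len(kept) > target):
        ps = breakpoint_pvalues(z, kept)
        worst = max(range(len(ps)), key=ps.__getitem__)
        if p_final is not None and ps[worst] < p_final:
            break
        del kept[worst]
    return kept


def termination_p(control_z: Sequence[float], bps: Sequence[int], target: int) -> float:
    """p_final: the smallest p-value among breakpoints left after pruning a control."""
    kept = prune(control_z, bps, target=target)
    return min(breakpoint_pvalues(control_z, kept)) if kept else 1.0
```

In this sketch `prune(z, bps, p_final=...)` plays the role of step ② on a test sample, and `termination_p(control_z, bps, target=N_c)` the role of step ③ on a control sample.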
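In some specific embodiments of the method of the invention, confidence selection of the segments between the genetic variation sites is performed as follows: for each segment between genetic variation sites on the reference genome sequence, the mean of $Z_{i,j}$ in the segment is calculated and denoted $\bar{Z}$; if $\bar{Z}$ of the segment is less than -1.28, the segment is a deletion; if it is greater than 1.28, the segment is a duplication.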
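The confidence-selection rule can be expressed as a short filter. In this sketch (function name hypothetical), breakpoints are strictly increasing window indices between 0 and `len(z)`, the mean Z of each segment is compared with the thresholds -1.28 and 1.28 given in the text, and only abnormal segments are reported as half-open window ranges.

```python
from typing import List, Sequence, Tuple


def classify_segments(z: Sequence[float], bps: Sequence[int],
                      low: float = -1.28, high: float = 1.28) -> List[Tuple[int, int, str]]:
    """Label each segment between breakpoints by its mean Z value.

    Returns (start_window, end_window, call) for segments whose mean Z is
    below `low` (deletion) or above `high` (duplication)."""
    bounds = [0] + list(bps) + [len(z)]
    calls = []
    for start, end in zip(bounds, bounds[1:]):
        mean_z = sum(z[start:end]) / (end - start)
        if mean_z < low:
            calls.append((start, end, "deletion"))
        elif mean_z > high:
            calls.append((start, end, "duplication"))
    return calls
```

Working on segment means rather than single windows is what makes the rule robust: an isolated noisy window rarely moves the mean of a 100-window segment across the threshold.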
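In the present invention, the runs test is a nonparametric test that, after two populations are pooled, evaluates the significance (p-value) of the difference between them from how evenly their elements are distributed. See:
http://support.sas.com/kb/33/092.html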
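In the present invention, when a control sample is tested as the test sample, sequencing or experimental factors in practice cause differences across the genome in the number of reads aligned to different segments, so these differences are picked out during testing even though the segments on either side of such a breakpoint do not reach the level of a true variation. Because the candidate breakpoints cannot separate these differences clearly at the start of testing, a value N must be defined such that, when the number of breakpoints equals N, the test separates these differences well; the threshold obtained in this way then allows the test sample to be examined more precisely.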
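In the present invention, the Z thresholds are determined as follows: when the control samples are processed according to steps a) and b), the Z values of the windows follow a normal distribution, and -1.28 and 1.28 are approximately the quantiles of this standard normal distribution at cumulative probabilities 0.10 and 0.90, respectively. Those skilled in the art may, as needed, choose Z thresholds with larger or smaller absolute values, corresponding to more or less extreme cumulative probabilities of the normal distribution; however, -1.28 and 1.28 are the most preferred thresholds, established by the inventors through extensive experiments for the invention, and thresholds with larger absolute values increase the false-negative/false-positive rate of the results.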
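The correspondence between Z thresholds and cumulative probabilities can be checked with the Python standard library's `statistics.NormalDist` (Python 3.8+), assuming, as stated above, that the window Z values are standard normal. The check shows that ±1.28 sit at cumulative probabilities of about 0.10 and 0.90, while the 0.05 and 0.95 quantiles are about ±1.645; the helper names are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def z_threshold(cum_prob: float) -> float:
    """Z value at which the standard normal CDF equals cum_prob."""
    return STD_NORMAL.inv_cdf(cum_prob)


def cumulative_prob(z: float) -> float:
    """Cumulative probability of the standard normal at z."""
    return STD_NORMAL.cdf(z)


print(round(z_threshold(0.10), 2), round(z_threshold(0.90), 2))    # -1.28 1.28
print(round(z_threshold(0.05), 3), round(z_threshold(0.95), 3))    # -1.645 1.645
print(round(cumulative_prob(1.28), 3))                             # 0.9
```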
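In one application of the method of the invention, such as noninvasive fetal CNV screening of the applicable population, the method helps to provide genetic counseling and a basis for clinical decisions, and prenatal diagnosis can effectively prevent the birth of affected children. The invention is applicable to all pregnant women; the applicable population is given only to illustrate the invention and should not be taken to limit its scope. Embodiments of the invention are described in detail below with reference to the examples, but those skilled in the art will understand that the following examples only illustrate the invention and should not be regarded as limiting its scope.
Where specific conditions are not indicated in the examples, conventional conditions or the conditions recommended by the manufacturer were used. Reagents or instruments whose manufacturers are not indicated are conventional, commercially available products. Manufacturer catalog numbers of the reagents or kits are given in parentheses below. The sequencing adapters and tag sequences used were from Illumina's Multiplexing Sample Preparation Oligonucleotide Kit.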
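Example 1: Detection of a large fetal copy number variation in the plasma of 1 pregnant woman, and detection of fetal aneuploidy in the plasma of 9 pregnant women
1. DNA extraction:
DNA was extracted from the above 8 plasma samples (sample numbers are given in Table 1) according to the Tiangen DP327-02 Kit protocol, and libraries were constructed from the extracted DNA following a modified Illumina/Solexa standard library construction workflow: sequencing adapters were added to both ends of the DNA molecules, whose main band was concentrated at 200 bp, each sample was given a different tag sequence, and the libraries were then hybridized to the complementary adapters on the flowcell surface. A layer of single-stranded primers is attached to the flowcell surface; each single-stranded DNA fragment is "fixed" on the chip at one end by pairing with a complementary primer on the chip surface, and its other end (5′ or 3′) pairs with another nearby primer and is also "fixed", forming a "bridge". After 30 rounds of repeated amplification, each single molecule is amplified about 1000-fold into a monoclonal DNA cluster. Paired-end sequencing on an Illumina HiSeq 2000 then yielded DNA sequence reads of about 50 bp.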
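Specifically, about 10 ng of the DNA obtained from the above plasma samples was used to construct libraries with the modified Illumina/Solexa standard workflow, following the product instructions (the Illumina/Solexa standard library construction guide available at http://www.illumina.com/). The DNA library size and insert size were determined to be about 200 bp with a 2100 Bioanalyzer (Agilent), and the libraries were sequenced after accurate quantification by qPCR.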
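2. Sequencing: in this example, the DNA samples obtained from the above 10 plasma samples were processed according to the official Illumina/Solexa ClusterStation and HiSeq 2000 (PE sequencing) instructions, yielding about 0.36 G of data per sample; samples were distinguished by their tag sequences. Using the alignment software SOAP2 (obtained from soap.genomics.org.cn), the sequenced DNA reads were aligned without mismatches to the human genome reference sequence build 36 (hg18; NCBI Build 36) in the NCBI database to obtain the location of each read on the genome.
3. Data analysis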
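a) Calculating the relative read counts of the test sample: a reference unique read length of 50 bp was chosen and the reference unique reads were counted; the human genome reference sequence was divided into windows containing the same number of reference unique reads (840,000), with a mean window size of 1 Mb and a distance of S = 10 kb between adjacent windows. The actual number of reads $r_{i,j}$ falling in each window in step 2 above was counted, where the subscripts $i$ and $j$ denote the window number and the sample number, respectively; the GC content $GC_{i,j}$ of each window was calculated, and the relative read count was calculated as
$$R_{i,j} = \log_2 \frac{r_{i,j}}{\bar{r}_j}, \qquad \bar{r}_j = \frac{1}{n}\sum_{i=1}^{n} r_{i,j}.$$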
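b) Data correction and normalization:
① in a coordinate system with GC content on the horizontal axis and relative read count $R$ on the vertical axis, $R_{i,j}$ of the control samples was fitted linearly against $GC_{i,j}$ to obtain the slope $a_i$ and the intercept $b_i$;
② for each window of the test sample, the corrected relative read count was calculated as
$$\hat{R}_{i,j} = a_i \times GC_{i,j} + b_i;$$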
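③ for each window of the test sample, the normalized relative read count $Z_{i,j}$ was calculated as
$$Z_{i,j} = \frac{R_{i,j} - \hat{R}_{i,j} - \mathrm{mean}_j}{SD_j}, \qquad \mathrm{mean}_j = \frac{1}{n}\sum_{i=1}^{n}\left(R_{i,j} - \hat{R}_{i,j}\right),$$
where $SD_j$ is the standard deviation of $R_{i,j} - \hat{R}_{i,j}$ over the $n$ windows of sample $j$.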
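c) Merging windows
① Initialization: the start position of each window on the reference genome sequence is recorded as the position of its statistic Z, so that the Z values follow a trend along the chromosomal positions of the reference genome. The positions of the Z inflection points (the critical points at which Z turns from an increasing to a decreasing trend, or from a decreasing to an increasing trend) are located. For each chromosome, starting from the start of the first window, positions at least 100 windows apart are then selected in turn and recorded as candidate breakpoints $b_k$ ($k = 1, 2, \ldots, s$, where $s$ is an integer greater than 0);
② Optimal iteration: to analyze copy number variation or aneuploidy of any chromosome of the genome (this example studies only human chromosomes 1-22), all sorted candidate breakpoints of each chromosome are denoted $B = \{b_1, b_2, \ldots, b_s\}$. Each candidate breakpoint $b_k$ has two segments, one on each side: the region from the previous breakpoint to $b_k$ and the region from $b_k$ to the next breakpoint. A runs test is applied to all $Z_i$ in these two segments, and the resulting p-value $p_k$ is regarded as the "significance of $b_k$ as a breakpoint". The candidate breakpoint with the largest $p_k$ is removed, and this step is repeated until all p-values are less than the termination p-value $p_{\text{final}}$ of the chromosome;
③ Obtaining the termination p-value: during testing, a control sample is used as the test sample in steps a) to c)① above. For chromosome c, all sorted candidate breakpoints of chromosome c are denoted $B_c = \{b_1, b_2, \ldots, b_s\}$; each candidate breakpoint $b_k$ has two windows (segments), one on each side, and the p-value $p_k$ obtained from a runs test on all $Z_i$ in these two windows is regarded as the "significance of $b_k$ as a breakpoint". The least significant candidate breakpoint is removed repeatedly until the number of candidate breakpoints equals the expected value $N_c = L_c/T$, where $L_c$ is the chromosome length and the theoretical limit precision is $T = 2$ Mb. Among the remaining candidate breakpoints, the minimum p-value is the termination p-value $p_{\text{final}}$ of the chromosome, as shown in the table below;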
Relevant values used in the example
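d) Segment filtering after merging windows: to further filter the segments obtained after merging windows, the mean of $Z_i$ in each segment is calculated and denoted $\bar{Z}$; if $\bar{Z}$ of a segment is less than -1.28 or greater than 1.28, the segment is a copy number variation. The results are shown in Table 1.
4) Visualization of the results: see Figure 2.
Table 1. List of CNV results for each sample in the example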
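The CNV analysis results of the present invention are compared below with CGH microarray results; the comparison is shown in Table 2. The CGH microarray results were obtained with the Human Genome CGH Microarray Kit (Agilent Technologies Inc.) according to the supplier's protocol, briefly as follows:
Reference DNA of the same sex as the test specimen (or male/female reference DNA) is used as the reference; Cy3 and Cy5 fluorescein are used to label the reference DNA and the test DNA, respectively, which are then hybridized to the probes. If the ratio of the fluorescence intensity of the test DNA to that of the reference DNA is 1, the amount of test DNA can be taken as equal to that of the reference DNA; if the ratio is not equal to 1, the test DNA has a deletion or amplification. The resolution of the various types of array CGH depends on the spacing and length of the probes on the microarray. Procedure: the cell culture medium remaining after G-banded chromosome examination is collected, and genomic DNA is extracted from the test and control specimens. After purification, the test and reference samples are labeled with different fluorophores, mixed with Cot-1 DNA to block nonspecific hybridization, denatured, pre-annealed and hybridized to the microarray. Finally, the ratio of the fluorescence intensities of the two signals on the microarray reflects the copy number changes of the test specimen's genomic DNA relative to the reference genomic DNA at the corresponding sequences or genes.
Table 2. Comparison of the results of the example with CGH microarray results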
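The CNV analysis results of the present invention are compared below with standard karyotyping results; the comparison is shown in Table 3. The standard karyotyping procedure is as follows:
(1) Amniotic fluid obtained by amniocentesis is centrifuged for 5 minutes (800-1000 rpm) and then inoculated in an inoculation hood. The supernatant is first aspirated and sent for other tests; 0.5 ml of amniotic fluid and the pelleted amniotic cells are left in the centrifuge tube, the pelleted shed fetal cells and amniocytes are resuspended into a cell suspension, and the suspension is inoculated into three culture flasks containing culture medium.
(2) The culture flasks are placed in a carbon dioxide incubator.
(3) 5-7 days after inoculation, viable cells from the amniotic fluid attach to the flask wall and begin to grow; cell growth can be observed with an inverted microscope. Once the cells have attached, the medium can be replaced with 3-5 ml of fresh medium, and thereafter the medium is changed every 2-3 days. Adherent cells include epithelioid cells, fibroblast-like cells and amniotic fluid cells, the last being a cell type morphologically intermediate between epithelioid cells and fibroblasts; all three cell types form clones. If growth is good, 11-14 days after inoculation there may be more than ten large clones on the flask bottom, visible to the naked eye as floc-like colonies of large, round cells. At this point slides can be prepared, i.e., the cells can be harvested. The day before harvest, the medium should be replaced with fresh medium to increase mitoses.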
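(4) Harvest: cells are harvested on average 14-20 days after culture. Colchicine is added to the culture flask at 0.04 ng/ml to arrest the cells in metaphase, and culture continues for 5-15 hours; under an inverted microscope many mitotic nuclei can then be seen, the cells being round and large, bright like a string of pearls, and connected to one another. The amount of colchicine added may differ between laboratories.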
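(5) Digestion (trypsinization): the medium in the culture flask is poured into a centrifuge tube, and 0.5 ml of 0.02% EDTA-trypsin solution or 0.5 ml of 0.15% pronase is added to the flask. The cell clones on the flask bottom are gently pipetted with a long bent glass pipette, and when the cloned cells are seen floating under the inverted microscope they are drawn into the centrifuge tube. The flask is then rinsed with 0.5-1 ml of Hank's solution, and the cells that have not yet detached are pipetted with a long pipette until they come off completely and are poured into the centrifuge tube. After centrifugation for 5 minutes at 800-1000 rpm, the supernatant is removed and the cells are set aside.
(6) Hypotonic treatment: 4 ml of 0.075 M KCl at 37 °C is gently added to the centrifuge tube containing the cells, the pellet is gently loosened by flicking the tube bottom with a finger or with a pointed pipette, and the tube is placed in a 37 °C water bath for 16 minutes (each laboratory can adjust the hypotonic time according to its own experience). After centrifugation for 5 minutes and removal of the supernatant, freshly prepared fixative (methanol:glacial acetic acid = 3:1) is added gently dropwise along the tube wall, and the tube bottom is tapped gently with a finger to disperse the cells evenly. After 15 minutes of fixation the tube is centrifuged and the fixative replaced, followed by a second fixation of 30 minutes.
(7) Slide preparation: after centrifugation the supernatant is removed, leaving 0.5 ml to make a cell suspension, or the supernatant is removed completely and 0.5 ml of freshly prepared fixative is added. The suspension is pipetted carefully with a fine glass pipette, one drop is drawn up and dropped onto a glass slide taken out of water and gently blown apart, and the slide is air-dried. Chromosome spreading is checked under the microscope before further slides are prepared. Dried slides can be stained directly with Giemsa.
(8) Banding: if chromosome morphology is good, Giemsa banding (G-banding) can be performed. Slides are first baked at 65 °C for 1 hour or at 37 °C for 24 hours, placed in 0.25% trypsin solution at room temperature for 20-25 seconds, rinsed twice in physiological saline, stained in 2% Giemsa solution for 5-10 minutes, rinsed with running water and air-dried; the chromosomes can then be examined under the microscope for karyotype analysis.
Table 3. Comparison of the results of this example with standard karyotyping results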
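Although specific embodiments of the present invention have been described in detail, those skilled in the art will understand that various modifications and substitutions can be made to those details in light of all the teachings disclosed, and that all such changes fall within the scope of protection of the invention. The full scope of the invention is given by the appended claims and any equivalents thereof.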

Claims

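1. A method for detecting genetic variation, comprising the following steps:
1) obtaining sequencing reads from a test sample;
2) aligning the sequencing reads to a reference genome sequence;
3) dividing the reference genome sequence into windows, counting the number of sequencing reads aligned to each window, and obtaining a statistic for each window based on that read count;
4) for a segment of reference genome sequence, based on how the statistics of all windows on the segment vary along it, obtaining the positions at which the statistics of the windows on either side change significantly, these positions being the positions of the genetic variation sites of the test sample on the reference genome sequence.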
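2. The method of claim 1, further comprising the step of:
5) screening the genetic variation sites to obtain screened genetic variation sites.
3. The method of claim 1 or 2, wherein the read length is 25-100 nt, preferably 35-100 nt.
4. The method of claim 1 or 2, wherein the number of reads is at least 1 million.
5. The method of claim 1 or 2, wherein the windows contain the same number of reference unique reads.
6. The method of claim 1 or 2, wherein the windows overlap or do not overlap.
7. The method of claim 1 or 2, wherein the statistic is an approximately normally distributed statistic obtained by normalizing the number of reads aligned to the window.
8. The method of claim 7, wherein the normalization is based on the mean number of reads aligned to all windows.
9. The method of claim 1 or 2, wherein the genetic variation site is the midpoint between an inflection point at which the statistic changes from increasing to decreasing and the next such inflection point, and two genetic variation sites are separated by at least 50, at least 70, or at least 100 window lengths, preferably 100.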
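10. The method of claim 2, wherein step 5) is:
for the two sequence segments lying between each genetic variation site and the preceding and following genetic variation sites, computing the difference between the two populations of values formed by the statistics of the windows in the two segments, and removing the genetic variation site whose significance value for the difference is the largest and greater than a preset threshold; and repeating this process until the significance values of all genetic variation sites are less than the preset threshold.
11. The method of claim 10, wherein the significance of the difference is assessed by a runs test, the genetic variation site whose runs-test significance value is the largest and greater than the preset threshold is removed, and the process is repeated until the runs-test significance values of all genetic variation sites are less than the preset threshold.
12. The method of claim 10 or 11, wherein the preset threshold is obtained by the following steps:
a) substituting a control sample for the test sample and obtaining genetic variation sites according to the method of claim 1;
b) for the two sequence segments lying between each genetic variation site and the preceding and following genetic variation sites, computing the difference between the two populations of values formed by the statistics of the windows they contain, and removing the genetic variation site whose difference is least significant;
c) repeating step b) until the number of remaining candidate breakpoints equals the expected value $N_c = L_c/T$, where $L_c$ is the length of the genome sequence and the theoretical limit precision $T$ is the fragment size that can theoretically be detected; when the mean window size is $W$, the window sliding length is $S$ and the number of windows in each population of the runs test is $N$, the theoretical limit precision is $T = W + S \times N$; and the minimum of the significance values of all remaining candidate breakpoints is the significance threshold.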
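13. A method for detecting genetic variation, comprising the steps of:
1) obtaining genetic variation sites on a segment of reference genome sequence according to the method of any one of claims 1-10;
2) performing confidence selection on the segments between the genetic variation sites.
14. The method of claim 13, wherein step 2) is:
i) calculating the distribution probability of the statistic from the distribution pattern of the window statistics, and setting a threshold;
ii) comparing the mean of the window statistics in each segment between the screened genetic variation sites with the threshold, and determining from the result whether the segment between the sites is abnormal.
15. The method of claim 14, wherein step 2) is:
i) calculating the distribution probability of the statistic from the distribution pattern of the window statistics, and setting a first threshold and a second threshold;
ii) comparing the mean of the window statistics in each segment between the screened genetic variation sites with the first threshold and the second threshold:
if the mean window statistic of a segment is less than the first threshold, the segment is a deletion; if it is greater than the second threshold, the segment is a duplication.
16. The method of claim 15, wherein the first threshold is the value of the statistic at a cumulative probability of 0.05, and/or the second threshold is the value of the statistic at a cumulative probability of 0.95.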
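17. A computer-readable medium carrying a series of executable code capable of performing the method of any one of claims 1-16.
18. A method for detecting fetal genetic variation, comprising: obtaining a maternal sample containing fetal nucleic acid;
sequencing the maternal sample;
and detecting genetic variation using the method of any one of claims 1-16.
19. The method of claim 18, wherein the maternal sample is maternal peripheral blood.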
PCT/CN2011/002244 2011-12-31 2011-12-31 一种遗传变异检测方法 Ceased WO2013097062A1 (zh)

Priority Applications (9)

Application Number Priority Date Filing Date Title
PCT/CN2011/002244 WO2013097062A1 (zh) 2011-12-31 2011-12-31 一种遗传变异检测方法
ES11878559T ES2741966T3 (es) 2011-12-31 2011-12-31 Método para detectar una variación genética
DK11878559.1T DK2772549T3 (da) 2011-12-31 2011-12-31 Fremgangsmåde til detektering af genetisk variation
US14/369,615 US20140370504A1 (en) 2011-12-31 2011-12-31 Method for detecting genetic variation
JP2014546264A JP5993029B2 (ja) 2011-12-31 2011-12-31 遺伝子変異の検出方法
PL11878559T PL2772549T3 (pl) 2011-12-31 2011-12-31 Sposób wykrywania zmienności genetycznej
CN201180076137.XA CN104204220B (zh) 2011-12-31 2011-12-31 一种遗传变异检测方法
HUE11878559A HUE047193T2 (hu) 2011-12-31 2011-12-31 Módszer genetikai variáció kimutatására
EP11878559.1A EP2772549B8 (en) 2011-12-31 2011-12-31 Method for detecting genetic variation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2011/002244 WO2013097062A1 (zh) 2011-12-31 2011-12-31 一种遗传变异检测方法

Publications (1)

Publication Number Publication Date
WO2013097062A1 true WO2013097062A1 (zh) 2013-07-04

Family

ID=48696161

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2011/002244 Ceased WO2013097062A1 (zh) 2011-12-31 2011-12-31 一种遗传变异检测方法

Country Status (9)

Country Link
US (1) US20140370504A1 (zh)
EP (1) EP2772549B8 (zh)
JP (1) JP5993029B2 (zh)
CN (1) CN104204220B (zh)
DK (1) DK2772549T3 (zh)
ES (1) ES2741966T3 (zh)
HU (1) HUE047193T2 (zh)
PL (1) PL2772549T3 (zh)
WO (1) WO2013097062A1 (zh)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103993069A (zh) * 2014-03-21 2014-08-20 深圳华大基因科技服务有限公司 病毒整合位点捕获测序分析方法
WO2015184404A1 (en) * 2014-05-30 2015-12-03 Verinata Health, Inc. Detecting fetal sub-chromosomal aneuploidies and copy number variations
CN105986008A (zh) * 2015-01-27 2016-10-05 深圳华大基因科技有限公司 Cnv检测方法和装置
CN107312850A (zh) * 2017-07-19 2017-11-03 华东医药(杭州)基因科技有限公司 一种pcr无效扩增的检测方法
WO2018054254A1 (zh) * 2016-09-22 2018-03-29 上海亿康医学检验所有限公司 一种鉴定样本中肿瘤负荷的方法和系统
CN108410970A (zh) * 2018-03-12 2018-08-17 博奥生物集团有限公司 一种单细胞基因组拷贝数变异的检测方法及试剂盒
US10095831B2 (en) 2016-02-03 2018-10-09 Verinata Health, Inc. Using cell-free DNA fragment size to determine copy number variations
US10741269B2 (en) 2013-10-21 2020-08-11 Verinata Health, Inc. Method for improving the sensitivity of detection in determining copy number variations
US11072814B2 (en) 2014-12-12 2021-07-27 Verinata Health, Inc. Using cell-free DNA fragment size to determine copy number variations
US11342047B2 (en) 2017-04-21 2022-05-24 Illumina, Inc. Using cell-free DNA fragment size to detect tumor-associated variant
US12437838B2 (en) 2013-01-25 2025-10-07 Sequenom, Inc. Methods and processes for non-invasive analysis of cell-free fetal nucleic acid according to sequence read quantifications for chromosomes 13, 18, and 21

Families Citing this family (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
SG11201404079SA (en) * 2012-01-20 2014-10-30 Bgi Diagnosis Co Ltd Method and system for determining whether copy number variation exists in sample genome, and computer readable medium
CN104520437B (zh) 2013-07-17 2016-09-14 深圳华大基因股份有限公司 一种染色体非整倍性检测方法及装置
CN105734120B (zh) * 2014-12-11 2020-11-27 天津华大基因科技有限公司 检测性发育相关基因变异的方法和试剂盒
CN105354443A (zh) * 2015-12-14 2016-02-24 孔祥军 无创产前基因检测分析软件
CN107480470B (zh) * 2016-06-08 2020-08-11 广州华大基因医学检验所有限公司 基于贝叶斯与泊松分布检验的已知变异检出方法和装置
CN110462063B (zh) * 2017-05-23 2023-06-23 深圳华大生命科学研究院 一种基于测序数据的变异检测方法、装置和存储介质
CN109097457A (zh) * 2017-06-20 2018-12-28 深圳华大智造科技有限公司 确定核酸样本中预定位点突变类型的方法
CN112365927B (zh) * 2017-12-28 2023-08-25 安诺优达基因科技(北京)有限公司 Cnv检测装置
CN109086571B (zh) * 2018-08-03 2019-08-23 国家卫生健康委科学技术研究所 一种单基因病遗传变异智能解读及报告的方法和系统
CN109920485B (zh) * 2018-12-29 2023-10-31 浙江安诺优达生物科技有限公司 对测序序列进行变异模拟的方法及其应用
CN111139303B (zh) * 2020-01-03 2022-07-05 西北农林科技大学 一种山羊cadm2基因cnv标记辅助检测生长性状的方法及其应用
CN113436683B (zh) * 2020-03-23 2024-08-16 北京合生基因科技有限公司 筛选候选插入片段的方法和系统
CN111429966A (zh) * 2020-04-23 2020-07-17 长沙金域医学检验实验室有限公司 基于稳健线性回归的染色体拷贝数变异判别方法及装置
CN113299342B (zh) * 2021-06-17 2024-03-15 苏州贝康医疗器械有限公司 基于芯片数据的拷贝数变异检测方法及检测装置
EP4397773A4 (en) * 2021-08-30 2025-09-10 Guangzhou Burning Rock Dx Co Ltd METHOD FOR DETECTING VARIATION IN COPY NUMBER AND ITS APPLICATION
WO2025208288A1 (zh) * 2024-04-01 2025-10-09 深圳华大生命科学研究院 基因预测方法、装置、计算机设备及计算机可读存储介质

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101555528A (zh) * 2008-09-28 2009-10-14 南京市妇幼保健院 一种染色体22q11.2区微缺失、微重复的测定方法
WO2010001419A2 (en) * 2008-07-04 2010-01-07 Decode Genetics Ehf Copy number variations predictive of risk of schizophrenia
WO2010042716A1 (en) * 2008-10-08 2010-04-15 The Children's Hospital Of Philadelphia Genetic alterations associated with type i diabetes and methods of use thereof for diagnosis and treatment
WO2010054284A1 (en) * 2008-11-10 2010-05-14 Signature Genomic Laboratories, Llc Interactive genome browser
WO2010057132A1 (en) * 2008-11-14 2010-05-20 The Children's Hospital Of Philadelphia Genetic alterations associated with schizophrenia and methods of use thereof for the diagnosis and treatment of the same
WO2010075459A1 (en) * 2008-12-22 2010-07-01 Celula, Inc. Methods and genotyping panels for detecting alleles, genomes, and transcriptomes

Non-Patent Citations (8)

* Cited by examiner, † Cited by third party
Title
"Monogr Hum Genet", vol. 18, 2010, KARGER, article "Genetics of Mental Retardation", pages: 101 - 113
BRETELLE F ET AL.: "Prenatal and postnatal diagnosis of 22q11.2 deletion syndrome", EUR J MED GENET., vol. 53, no. 6, November 2010 (2010-11-01), pages 367 - 70
DAVID PETERS ET AL.: "Noninvasive Prenatal Diagnosis of a Fetal Microdeletion Syndrome", N ENGL J MED, vol. 365, 2011, pages 1847 - 1848
MALCOLM S.: "Microdeletion and microduplication syndromes", PRENAT DIAGN., vol. 16, no. 13, December 1996 (1996-12-01), pages 1213 - 9
METZKER ML.: "Sequencing technologies-the next generation", NAT REV GENET., vol. 11, no. 1, January 2010 (2010-01-01), pages 31 - 46
MICAH HAMADY; JEFFREY J WALKER; J KIRK HARRIS ET AL.: "Error-correcting barcoded primers for pyrosequencing hundreds of samples in multiplex", NATURE METHODS, vol. 5, no. 3, March 2008 (2008-03-01)
RUSK, NICOLE: "Cheap Third-Generation Sequencing", NATURE METHODS, vol. 6, no. 4, 1 April 2009 (2009-04-01), pages 2446
See also references of EP2772549A4

Cited By (20)

Publication number Priority date Publication date Assignee Title
US12437838B2 (en) 2013-01-25 2025-10-07 Sequenom, Inc. Methods and processes for non-invasive analysis of cell-free fetal nucleic acid according to sequence read quantifications for chromosomes 13, 18, and 21
US10741269B2 (en) 2013-10-21 2020-08-11 Verinata Health, Inc. Method for improving the sensitivity of detection in determining copy number variations
CN103993069A (zh) * 2014-03-21 2014-08-20 深圳华大基因科技服务有限公司 Capture-sequencing analysis method for viral integration sites
AU2015266665B2 (en) * 2014-05-30 2021-08-19 Verinata Health, Inc. Detecting fetal sub-chromosomal aneuploidies and copy number variations
WO2015184404A1 (en) * 2014-05-30 2015-12-03 Verinata Health, Inc. Detecting fetal sub-chromosomal aneuploidies and copy number variations
US12217827B2 (en) 2014-05-30 2025-02-04 Verinata Health, Inc. Detecting fetal sub-chromosomal aneuploidies
IL249095B2 (en) * 2014-05-30 2023-10-01 Verinata Health Inc Detection of subchromosomal aneuploidy in the fetus and variations in the number of copies
IL249095B1 (en) * 2014-05-30 2023-06-01 Verinata Health Inc Detection of subchromosomal aneuploidy in the fetus and variations in the number of copies
AU2015266665C1 (en) * 2014-05-30 2021-12-23 Verinata Health, Inc. Detecting fetal sub-chromosomal aneuploidies and copy number variations
US10318704B2 (en) 2014-05-30 2019-06-11 Verinata Health, Inc. Detecting fetal sub-chromosomal aneuploidies
EP3690061A1 (en) * 2014-05-30 2020-08-05 Verinata Health, Inc. Detecting, optionally fetal, sub-chromosomal aneuploidies and copy number variations
US11072814B2 (en) 2014-12-12 2021-07-27 Verinata Health, Inc. Using cell-free DNA fragment size to determine copy number variations
CN105986008A (zh) * 2015-01-27 2016-10-05 深圳华大基因科技有限公司 CNV detection method and device
US10095831B2 (en) 2016-02-03 2018-10-09 Verinata Health, Inc. Using cell-free DNA fragment size to determine copy number variations
US11430541B2 (en) 2016-02-03 2022-08-30 Verinata Health, Inc. Using cell-free DNA fragment size to determine copy number variations
WO2018054254A1 (zh) * 2016-09-22 2018-03-29 上海亿康医学检验所有限公司 Method and system for identifying tumor burden in a sample
US11342047B2 (en) 2017-04-21 2022-05-24 Illumina, Inc. Using cell-free DNA fragment size to detect tumor-associated variant
US12087401B2 (en) 2017-04-21 2024-09-10 Illumina, Inc. Using cell-free DNA fragment size to detect tumor-associated variant
CN107312850A (zh) * 2017-07-19 2017-11-03 华东医药(杭州)基因科技有限公司 Method for detecting ineffective PCR amplification
CN108410970A (zh) * 2018-03-12 2018-08-17 博奥生物集团有限公司 Method and kit for detecting copy number variations in a single-cell genome

Also Published As

Publication number Publication date
HUE047193T2 (hu) 2020-04-28
EP2772549B8 (en) 2019-09-11
CN104204220A (zh) 2014-12-10
PL2772549T3 (pl) 2019-12-31
EP2772549A1 (en) 2014-09-03
US20140370504A1 (en) 2014-12-18
DK2772549T3 (da) 2019-08-19
ES2741966T3 (es) 2020-02-12
EP2772549B1 (en) 2019-07-31
JP5993029B2 (ja) 2016-09-14
JP2015502749A (ja) 2015-01-29
CN104204220B (zh) 2017-06-06
EP2772549A4 (en) 2015-03-18

Similar Documents

Publication Publication Date Title
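WO2013097062A1 (zh) Genetic variation detection method
JP7506408B2 (ja) Single-molecule sequencing of plasma DNA
JP7321727B2 (ja) Non-invasive detection of fetal aneuploidy in egg-donor pregnancies
TWI874916B (zh) Methods and systems for tumor detection
CN108026572B (zh) Analysis of fragmentation patterns of cell-free DNA
CN103608818B (zh) Non-invasive prenatal ploidy calling device
WO2013149385A1 (zh) Copy number variation detection method and system
CN105648045B (zh) Method and device for determining the haplotype of a fetal target region
WO2013059967A1 (zh) Method for detecting chromosomal microdeletions and microduplications
CN105555970B (zh) Method and system for simultaneous haplotype analysis and chromosomal aneuploidy detection
CN104232777A (zh) Method and device for simultaneously determining fetal nucleic acid content and chromosomal aneuploidy
WO2015035555A1 (zh) Method, system and computer-readable medium for determining whether a fetus has a sex chromosome number abnormality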
US20180142300A1 (en) Universal haplotype-based noninvasive prenatal testing for single gene diseases
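CN111321210A (zh) Method for non-invasive prenatal detection of whether a fetus has a genetic disease
CN117925820B (zh) Method for preimplantation embryo variant detection
WO2014075228A1 (zh) Method, system and computer-readable medium for determining chromosome number abnormalities in a biological sample
CN105648044B (zh) Method and device for determining the haplotype of a fetal target region
HK40101694A (zh) Non-invasive prenatal testing and cancer detection using nucleic acid size ranges
HK40079331A (zh) Analysis of fragmentation patterns of cell-free DNA
CN117737227A (zh) Genetic testing kit and system for cfDNA-based screening of fetal ACH
WO2013173993A1 (zh) Method and system for identifying twin type
CN116716396A (zh) MicroRNAs as molecular markers for non-invasive prenatal screening of neural tube defects and their use
CN107988343A (zh) Non-invasive prenatal ploidy calling method
HK40031026B (zh) Non-invasive prenatal testing and cancer detection using nucleic acid size ranges
HK40006242A (zh) Methods and systems for tumor detection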

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application
Ref document number: 11878559
Country of ref document: EP
Kind code of ref document: A1

WWE WIPO information: entry into national phase
Ref document number: 2011878559
Country of ref document: EP

ENP Entry into the national phase
Ref document number: 2014546264
Country of ref document: JP
Kind code of ref document: A

WWE WIPO information: entry into national phase
Ref document number: 14369615
Country of ref document: US

NENP Non-entry into the national phase
Ref country code: DE