WO2021026828A1 - Method and device for determining the concentration of fetal nucleic acid in the blood of pregnant women - Google Patents
- Publication number
- WO2021026828A1 (application PCT/CN2019/100629)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- nucleic acid
- genotype
- information
- concentration
- sequencing
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
- G16B20/20—Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6876—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
- C12Q1/6883—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6809—Methods for determination or identification of nucleic acids involving differential detection
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q2600/00—Oligonucleotides characterized by their use
- C12Q2600/156—Polymorphic or mutational markers
Definitions
- the invention relates to the field of gene detection, in particular to a method and equipment for determining the concentration of fetal nucleic acid in the blood of pregnant women.
- NIPT noninvasive prenatal testing
- cfDNA cell-free DNA (in maternal plasma it carries fetal DNA information)
- Y chromosome depth calculation method: cfDNA sequencing reads from the pregnant woman herself cannot be aligned to the non-homologous region of the Y chromosome of the human reference genome, so reads in plasma that align uniquely to that region all derive from the cfDNA of a male fetus, and the fetal nucleic acid concentration is calculated on this principle
- SNP Single nucleotide polymorphism
- the fetal concentration calculated from the Y chromosome depth is used as the truth set, a model is fitted using very large sample data, and this model is then used to estimate the fetal concentration; 4) Methylation data-assisted calculation method, which is based on the differences in DNA methylation between different individuals and between different tissues of the same individual; 5) cfDNA fragment length calculation method.
- the method is based on the known fact that free fetal DNA fragments in the plasma of pregnant women average about 147-167 bp, while the free DNA of the pregnant woman herself is generally about 167-187 bp long.
- the fetal cfDNA concentration is estimated by calculating the proportion of cfDNA fragments in pregnant women.
- Nucleosome arrangement calculation method: because of differences in the degree of degradation, the fragment length distributions of maternal and fetal cfDNA differ, and this difference is used to estimate the concentration of fetal cfDNA.
- of these six types of methods, only 1), 3) and 6) can calculate fetal concentration from NIPT data alone.
- 1) is limited to male fetuses, and 3) and 6) can only be applied to NIPT data of higher depth.
- an object of the present invention is to provide a method and device for determining the concentration of fetal nucleic acid in the blood of pregnant women.
- the cfDNA fragment length calculation method requires accurate estimation of cfDNA fragment length, so it can only be used with paired-end (PE) sequencing;
- the nucleosome arrangement calculation method requires the use of reads within nucleosome units; it therefore places certain requirements on NIPT sequencing depth and cannot work with only the roughly 0.1x depth data that is more common in NIPT testing.
- 1) the Y chromosome depth calculation method can only calculate the fetal concentration of male fetuses, not female fetuses; 2) the sequencing-read depth distribution calculation method can only be used for samples with high fetal concentration and cannot be applied to samples with a fetal concentration within 5%.
- the present application provides a method and device for determining the concentration of fetal nucleic acid in the blood of pregnant women.
- the method or device can determine the concentration of fetal nucleic acid using only the NIPT data of the pregnant woman's plasma, without the assistance of other data.
- it can be applied to ultra-low depth (for example, about 0.1x) sequencing data, and places no requirement on the type of sequencing, whether paired-end or single-end; there are no special requirements for the sample type, and it is applicable to both male and female fetuses.
- the application provides a method for determining the concentration of fetal nucleic acid in the blood of a pregnant woman, including: (1) determining first genotype information based on a comparison of sequencing data with at least a part of a reference genome, the sequencing data being derived from a nucleic acid sample of the pregnant woman's blood; (2) using the linkage disequilibrium relationship to correct the first genotype information based on reference data, so as to obtain second genotype information; and (3) determining the fetal nucleic acid concentration based on the difference between the first genotype information and the second genotype information.
- This application provides a method for determining the concentration of fetal nucleic acid in the blood of pregnant women.
- the method obtains first genotype information by comparing sequencing data with a reference genome; these sequencing data are obtained by sequencing nucleic acid samples of pregnant women’s blood.
- the obtained sequencing data contains the nucleic acid information of the mother and the fetus. Since part of the nucleic acid information of the fetus comes from the father, the sequencing data also indirectly contains the nucleic acid information of the father.
- the linkage disequilibrium relationship is then used to correct the obtained first genotype information. Because the sequencing data is mixed with nucleic acid information from the father, and this information comes from a different individual than the maternal information, it will be corrected to a certain extent, yielding the corrected second genotype information. Then, by comparing the first genotype information with the second genotype information, the portion of the genotype information that was corrected is determined. The more genotype information is corrected, the higher the concentration of fetal cfDNA in the pregnant woman's plasma, so the fetal nucleic acid concentration can be determined from the relationship between the proportion of corrected genotypes and the fetal cfDNA concentration in the pregnant woman's plasma.
- the method for determining the concentration of fetal nucleic acid in the blood of pregnant women has many advantages: 1) the whole method uses only the NIPT data of the pregnant woman's plasma, without the assistance of other data; 2) because the linkage disequilibrium relationship is used to estimate changes in genome genotypes, the fetal nucleic acid concentration can be reflected by those changes even at low sequencing depth, as long as the data is a mixture of maternal and fetal data, so the method can be applied to ultra-low depth (for example, about 0.1x) sequencing data; 3) there is no requirement on the type of sequencing, and both paired-end and single-end sequencing can be used; 4) there is no special requirement on the sample type (applicable to both male and female fetuses, with no requirement on fetal concentration).
- the method provided in this application breaks through the limitations on the estimated data sequencing depth, data type, and fetal gender for the first time, is universal and does not require additional sampling and sequencing costs, and has extremely high application value in the NIPT field.
- the above-mentioned method for determining the concentration of fetal nucleic acid in the blood of pregnant women may further include the following technical features:
- the sequencing data is obtained by sequencing a nucleic acid sample of the pregnant woman's blood, and the sequencing depth may be 10X, 5X, 1X, 0.5X, 0.2X or 0.1X.
- nucleic acid samples of pregnant women's blood include fetal nucleic acid information and maternal nucleic acid information. Part of the fetal nucleic acid information comes from the father.
- after nucleic acid samples of pregnant women's blood are sequenced, even low-depth sequencing data, such as data at no more than 10X, no more than 5X, no more than 1X, or even 0.1X, can be analyzed by the method provided in this application to determine the concentration of fetal nucleic acid.
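To make the depth figures above concrete, the following is a minimal sketch of the usual mean-depth estimate (total sequenced bases divided by genome size). The read count, read length and genome size are illustrative assumptions and do not come from the patent.

```python
def mean_depth(num_reads: int, read_length_bp: int, genome_size_bp: int) -> float:
    """Average sequencing depth = total sequenced bases / genome size."""
    if genome_size_bp <= 0:
        raise ValueError("genome size must be positive")
    return num_reads * read_length_bp / genome_size_bp


# Illustrative (assumed) numbers: 6 million single-end 50 bp reads
# over a genome of roughly 3.1 billion bases.
depth = mean_depth(6_000_000, 50, 3_100_000_000)
print(f"{depth:.3f}x")  # about 0.1x, the ultra-low depth discussed above
```

Under these assumed numbers the sample falls at roughly the 0.1X level that the text describes as typical for NIPT.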
- the sequencing data is obtained through second-generation sequencing technology or third-generation sequencing technology.
- the second-generation sequencing technology is also called high-throughput sequencing technology. It can measure many sequences at one time. For example, the nucleic acid can be randomly broken by physical or chemical means into a very large number of small fragments, which may be about 250-300 bp long; these fragments are then enriched by building a library and sequenced in a sequencer. The sequencer has regions to which the fragments can attach, and each fragment has an independent attachment region, so that the DNA sequence information of all attached fragments can be detected at once.
- the second-generation sequencing technology can measure a large number of sequences at one time, but the fragments are limited to, for example, about 250-300bp, which is costly.
- second-generation sequencing technologies can be Roche/454's pyrosequencing method for sequencing, or Illumina's fluorescent sequencing detection, ABI/Solid's fluorescent sequencing detection, or MGI's DNB sequencing detection, etc.
- the third-generation sequencing technology can reach sequencing lengths of about 10 kb and does not rely on PCR amplification.
- PacBio's SMRT or Oxford Nanopore Technologies nanopore single-molecule sequencing technology can be used.
- the obtained sequencing data of the nucleic acid samples of pregnant women’s blood can be used and analyzed to determine the concentration of fetal nucleic acid according to the method provided in this application .
- the reference genome includes at least one strongly linked region in the human genome.
- a "strong linkage region" varies according to the size and structure of the studied population, and is generally defined as a region in which the probability of historical recombination between any pair of variant sites is less than 5%.
- the reference genome can contain one strong linkage region, two strong linkage regions, three strong linkage regions, and even more. Generally speaking, without considering the cost, the more strongly linked regions contained in the reference genome, the more accurate the concentration of fetal nucleic acid in the blood of pregnant women will be determined after comparison and calculation.
- the length of the strong linkage region is 5-10 Mb, for example, 10 Mb, 9 Mb, 8 Mb, 7 Mb, 6 Mb or 5 Mb.
- the length of the strong linkage regions can fluctuate by 10%-20%.
- the length of the strong linkage regions can be 10 Mb, 11 Mb or 12 Mb, 9 Mb or 8 Mb, etc. In this way, accurate correction of the information in the sequencing data that comes from the father can be achieved.
- the first genotype information is determined based on the number of supported sequencing reads.
- the first genotype information is determined based on the number of supporting sequencing reads. For example, if at a certain site 100 sequencing reads support base A, 8 sequencing reads support base G and 20 sequencing reads support base T, then the base at this site is determined to be A. In this way, the genotype information of each site can be obtained, and the required first genotype information can be determined by comparison with at least a part of the reference genome.
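The read-support rule in the example above can be sketched as follows. This is a minimal illustration, not the patent's implementation; the function name `call_base` and the tie-breaking behaviour (first base seen wins) are assumptions.

```python
def call_base(read_support: dict) -> str:
    """Return the base supported by the largest number of sequencing reads.

    read_support maps a base ("A", "C", "G", "T") to the number of reads
    supporting it at one site. Ties go to the first base listed (assumption).
    """
    if not read_support:
        raise ValueError("no reads cover this site")
    return max(read_support, key=read_support.get)


# The counts used in the text: 100 reads support A, 8 support G, 20 support T.
site = {"A": 100, "G": 8, "T": 20}
print(call_base(site))  # A
```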
- the first genotype information includes at least one of SNP and Indel.
- the first genotype information includes single point mutation (SNP) information and/or small insertion deletion (Indel) information, and correction of these information can reflect the nucleic acid information of the father, thereby realizing accurate determination of the fetal nucleic acid concentration.
- the reference data includes multiple mutation site information and mutation frequency information.
- the first genotype information can be corrected, so that some sequencing information from the father is corrected, and the fetal nucleic acid concentration is determined according to the correlation between the corrected information and the fetal nucleic acid concentration.
- the correction is performed through IMPUTE2.
- IMPUTE2 is actually a genotype completion and correction algorithm for sites with missing data or low accuracy.
- step (3) further includes: (3-1) determining the difference ratio between the first genotype information and the second genotype information; (3-2) determining the fetal nucleic acid concentration based on the difference ratio obtained in step (3-1) and a predetermined fitting formula, the fitting formula being determined based on multiple reference samples with known fetal nucleic acid concentrations.
- different formulas or models can be used for fitting, such as a linear regression model or other models that effectively integrate all information, such as random forest models or deep learning models, so that the fetal nucleic acid concentration can be correlated with the difference ratio.
- the nucleic acid concentration of the fetus in the blood of the pregnant woman can be determined by means of a predetermined fitting formula. That is, in the early stage, some samples with known fetal concentration can be used as the training set, and with the help of the method provided by the present invention, different formulas or models can be fitted to determine the fitting formula; in subsequent applications, the fetal cfDNA concentration of one or more samples can be predicted without the need for additional samples with known fetal concentration.
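As an illustration of the fitting step described above, here is a minimal sketch that fits a straight line (ordinary least squares) relating the difference ratio to a known fetal concentration and then predicts a new sample. The patent does not give the fitted formula; the linear form is one of the options it names, and the training values below are synthetic numbers invented purely for the example.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_fetal_fraction(diff_ratio, slope, intercept):
    """Apply the fitted formula to a new sample's difference ratio."""
    return slope * diff_ratio + intercept


# Synthetic training set (assumed values, not patent data):
# difference ratio of each reference sample and its known fetal concentration.
train_ratios = [0.02, 0.04, 0.06, 0.08]
train_fractions = [0.05, 0.10, 0.15, 0.20]

slope, intercept = fit_line(train_ratios, train_fractions)
estimate = predict_fetal_fraction(0.05, slope, intercept)
print(round(estimate, 3))
```

A real training set would use many more reference samples, and a random forest or other model could replace the straight line without changing the overall flow.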
- the number of reference samples is at least 100, for example, 100, 500, 1000, or 5000 or more.
- when a formula or model is used for fitting, the larger the number of reference samples, the more accurate the fitting formula, and the more accurate the fetal nucleic acid concentration in the blood of pregnant women determined with it.
- too many samples can also increase the calculation cost and the cost of the samples themselves.
- the number of these reference samples can be 5000-10000, for example, 5000, 6000, 7000, 8000, 9000 or 10000; no restriction is placed on the number of samples.
- the present application provides a device for determining the concentration of fetal nucleic acid in the blood of pregnant women.
- the device can determine the concentration of fetal nucleic acid in the blood of pregnant women using only the NIPT data of the pregnant woman's plasma, without the assistance of other data; it can be applied to ultra-low-depth sequencing data; and there is no special requirement on the sample type, so it is applicable to both male and female fetuses.
- the device includes: a comparison unit, which determines first genotype information based on a comparison of sequencing data with at least a part of a reference genome, the sequencing data being derived from a nucleic acid sample of the pregnant woman's blood; a correction unit, which is connected to the comparison unit and uses the linkage disequilibrium relationship to correct the first genotype information based on reference data to obtain second genotype information; and a calculation unit, which is connected to both the comparison unit and the correction unit and determines the fetal nucleic acid concentration based on the difference between the first genotype information and the second genotype information.
- the above-mentioned device for determining the concentration of fetal nucleic acid in the blood of a pregnant woman may further include the following technical features, which are mentioned or involved in the above method for determining the concentration of fetal nucleic acid in the blood of a pregnant woman.
- the functions performed by the features are similar to the above-mentioned method for determining the concentration of fetal nucleic acid in the blood of pregnant women, and will not be described in detail here.
- the sequencing data is obtained by sequencing a nucleic acid sample of the pregnant woman's blood, and the sequencing depth may be 10X, 5X, 1X, 0.5X, 0.2X or 0.1X. That is, the device provided by this application can use not only high-depth sequencing data but also low-depth or ultra-low-depth sequencing data to determine the fetal nucleic acid concentration in pregnant women's blood.
- the sequencing data is obtained through second-generation sequencing technology or third-generation sequencing technology.
- the reference genome in the device, includes at least one strongly linked region in the human genome.
- the length of the strong linkage region is 5 Mb-10 Mb, for example, 10 Mb, 9 Mb, 8 Mb, 7 Mb, 6 Mb or 5 Mb; no restriction is placed on the length of the strong linkage region.
- the first genotype information is determined based on the number of supported sequencing reads.
- the first genotype information includes at least one of SNP and Indel.
- the reference data includes multiple mutation site information and mutation frequency information.
- the correction is performed through IMPUTE2.
- the calculation unit further includes: a difference ratio calculation unit, which determines the difference ratio between the first genotype information and the second genotype information; and a fetal nucleic acid concentration calculation unit, which is connected to the difference ratio calculation unit and determines the fetal nucleic acid concentration based on the difference ratio obtained in the difference ratio calculation unit and a predetermined fitting formula, the fitting formula being determined based on multiple reference samples with known fetal nucleic acid concentrations.
- the number of reference samples may be 5000 to 10000, for example, 5000, 6000, 7000, 8000, 9000 or 10000. No limit is placed on the number of samples here.
- the present invention provides a computer device, including a memory, a processor, and a computer program stored in the memory and running on the processor.
- when the processor executes the program, it implements the method described in any embodiment of the first aspect of the present invention. Therefore, only the NIPT data of pregnant women's plasma is needed. With the help of linkage disequilibrium, the concentration of fetal nucleic acid can be quickly determined, and it can be applied to low-depth sequencing data, and there is no requirement for fetal concentration and sample type.
- the present invention provides a computer-readable storage medium on which a computer program is stored, and when the program is executed by a processor, the method according to any one of the embodiments of the first aspect of the present invention is implemented. Therefore, only the NIPT data of pregnant women's plasma is needed. With the help of linkage disequilibrium, the concentration of fetal nucleic acid can be quickly determined, and it can be applied to low-depth sequencing data, and there is no requirement for fetal concentration and sample type.
- Fig. 1 is a schematic structural diagram of a device for determining the concentration of fetal nucleic acid in the blood of a pregnant woman according to an embodiment of the present invention.
- Fig. 2 is a schematic structural diagram of a calculation unit in a device for determining the concentration of fetal nucleic acid in the blood of a pregnant woman according to an embodiment of the present invention.
- Fig. 3 is a schematic diagram of a method for determining the concentration of fetal nucleic acid through model prediction according to an embodiment of the present invention.
- Fig. 4 is a diagram of the fetal concentration prediction result of a 1000-sample test data set for a model obtained by using 10,000 samples as a training set according to an embodiment of the present invention.
- FIG. 5 is a diagram of the fetal concentration prediction result of a 1000-sample test data set for a model obtained by using 10,000 samples as a training set according to an embodiment of the present invention.
- "first genotype information" and "second genotype information" refer to information containing the genotype of each site. Herein, they refer respectively to the original genotype obtained from the sequencing data and the genotype after correction using linkage disequilibrium information.
- "linkage" is used to describe the relationship between two sites. If two or more sites are relatively close together, the probability that a crossover during meiosis separates the alleles at two sites on the same chromosome is relatively small, which means that the alleles at these sites are not independent when passed to the next generation (that is, they tend to be passed on together). This biological phenomenon is called linkage. A "strong linkage region" varies according to the size and structure of the studied population, and is generally defined as a region in which the probability of historical recombination between any pair of variant sites is less than 5%.
- Linkage disequilibrium refers to the situation where the probability of a specific genotype combination of two mutation sites being inherited at the same time is greater than the random probability. That is, as long as a certain genotype combination of two loci is not completely inherited independently, it means that the two loci are in linkage disequilibrium.
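The definition above can be made concrete with the standard population-genetics measures of linkage disequilibrium, the coefficient D and the squared correlation r². The sketch below is a general textbook calculation, not something specified in the patent, and the frequencies are illustrative assumptions.

```python
def linkage_disequilibrium(p_ab: float, p_a: float, p_b: float):
    """Return (D, r_squared) for two biallelic sites.

    p_ab: frequency of the haplotype carrying allele A at site 1 and B at site 2
    p_a:  frequency of allele A at site 1
    p_b:  frequency of allele B at site 2
    D = p_ab - p_a * p_b is zero when the two sites are inherited independently.
    """
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r_squared = d * d / denom if denom > 0 else 0.0
    return d, r_squared


# Alleles A and B always travel together: complete linkage disequilibrium.
d_linked, r2_linked = linkage_disequilibrium(p_ab=0.5, p_a=0.5, p_b=0.5)
# Haplotype frequency equals the product of allele frequencies: equilibrium.
d_indep, r2_indep = linkage_disequilibrium(p_ab=0.25, p_a=0.5, p_b=0.5)
print(d_linked, r2_linked, d_indep, r2_indep)
```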
- the present application provides a method for determining the concentration of fetal nucleic acid in the blood of a pregnant woman, including: (1) determining first genotype information based on a comparison of sequencing data with at least a part of a reference genome, the sequencing data being derived from a nucleic acid sample of the pregnant woman's blood; (2) correcting the first genotype information based on reference data using the linkage disequilibrium relationship, so as to obtain second genotype information; and (3) determining the fetal nucleic acid concentration based on the difference between the first genotype information and the second genotype information.
- the method provided in this application can be used to detect and determine the concentration of fetal DNA in the blood of pregnant women.
- when the linkage disequilibrium relationship is used for correction, existing methods or software can be used. For example, the correction can be carried out by means of imputation. Imputation is a method of genotype completion and correction for sites with missing data or low accuracy. Specifically, the linkage disequilibrium (LD) relationship between the analyzed site and nearby sites of higher accuracy is used to find the haplotype that best matches the analyzed site (using haplotype information from a reference population, or haplotype information shared among different individuals of the analyzed population), so as to infer the missing genotype at the analyzed site or correct a low-accuracy genotype.
- Linkage disequilibrium Linkage disequilibrium, LD
- the Imputation method is mainly used in Genomewide Association Study (GWAS) or population genetic analysis.
- LD information is used to amplify the number of data loci on the chip to maximize the genotype information related to a specific phenotype.
- use the reference population or the analyzed population's own haplotype information to correct the erroneously detected genotype sites caused by the low depth, thereby improving the accuracy of analysis.
- the present invention takes the principle used in imputation, namely correcting low-depth sites in the analyzed sample with haplotype information, applies it to maternal plasma data, and uses LD information to estimate the fetal concentration comprehensively across the whole genome (or at the chromosome level). IMPUTE2 and other genotype inference algorithms share the same premise when imputation is performed on a single sample, namely that the analyzed sample is diploid, so when genotypes contradict this assumption (that is, when more than two haplotypes are present), those sites are regarded as error sites and corrected.
- because the plasma of a pregnant woman actually contains three haplotypes, that is, two maternal haplotypes and one fetal haplotype inherited from the father, there is a certain probability that sites on the paternally inherited fetal haplotype will be regarded as error sites and corrected during imputation, and this probability of correction is in turn correlated with the concentration of fetal cfDNA in the plasma of the pregnant woman.
- the calculation of fetal concentration in non-invasive prenatal genetic testing can be completed by comparing the ratio of corrected sites before and after the imputation.
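The comparison of sites before and after imputation can be sketched as a simple ratio. The patent does not define the exact denominator; this sketch assumes the ratio is taken over sites present in both call sets, and the data structures and site values are hypothetical.

```python
def difference_ratio(first_genotypes: dict, second_genotypes: dict) -> float:
    """Fraction of shared sites whose genotype changed after correction.

    Both arguments map a site key, here (chromosome, position), to a genotype
    string. Only sites present in both call sets are compared (assumption).
    """
    shared = first_genotypes.keys() & second_genotypes.keys()
    if not shared:
        raise ValueError("no sites in common")
    changed = sum(1 for s in shared if first_genotypes[s] != second_genotypes[s])
    return changed / len(shared)


# Hypothetical calls before (first) and after (second) imputation.
first = {("chr1", 100): "AG", ("chr1", 200): "CC", ("chr1", 300): "TT", ("chr1", 400): "AG"}
second = {("chr1", 100): "AA", ("chr1", 200): "CC", ("chr1", 300): "TT", ("chr1", 400): "AG"}
ratio = difference_ratio(first, second)
print(ratio)  # 0.25: one of four shared sites was corrected
```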
- a gene region with a more significant signal can be selected as the comparison region.
- the gene regions with more significant signals can be characterized as regions with better coverage (the coverage of the genome by the sequencing data), a higher minor allele frequency (indicating that the site is more likely to be variable in the population) and a higher proportion of variant sites; selecting such regions helps extract feature information and reduce background noise, thereby improving the accuracy of fetal concentration estimation.
- when selecting a strong linkage region, the region can be determined by changing the size of the calculation window, for example changing the 5 Mb window to 10 Mb or to the entire chromosome, so as to improve accuracy by increasing the number of effective sites in each window.
- step (3) further includes: (3-1) determining the difference ratio between the first genotype information and the second genotype information; (3-2) determining the fetal nucleic acid concentration based on the difference ratio obtained in step (3-1) and a predetermined fitting formula, the fitting formula being determined based on multiple reference samples with known fetal nucleic acid concentrations.
- different formulas or models can be used for fitting, such as a linear regression model or other models that effectively integrate all information, such as random forest models or deep learning models, so that the fetal nucleic acid concentration can be correlated with the difference ratio.
- the nucleic acid concentration of the fetus in the blood of the pregnant woman can be determined by means of a predetermined fitting formula.
- the present application provides a device for determining the concentration of fetal nucleic acid in the blood of pregnant women.
- the device can determine the concentration of fetal nucleic acid in the blood of pregnant women using only the NIPT data of the pregnant woman's plasma, without the assistance of other data; it can be applied to ultra-low-depth sequencing data; and there is no special requirement on the sample type, so it is applicable to both male and female fetuses.
- the device includes: a comparison unit, which determines first genotype information based on a comparison of sequencing data with at least a part of a reference genome, the sequencing data being derived from a nucleic acid sample of the pregnant woman's blood; a correction unit, which is connected to the comparison unit and uses the linkage disequilibrium relationship to correct the first genotype information based on reference data, so as to obtain second genotype information; and a calculation unit, which is connected to both the comparison unit and the correction unit and determines the fetal nucleic acid concentration based on the difference between the first genotype information and the second genotype information.
- the calculation unit is shown in FIG. 2 and further includes: a difference ratio calculation unit, which determines the difference ratio between the first genotype information and the second genotype information; and a fetal nucleic acid concentration calculation unit, which is connected to the difference ratio calculation unit and determines the fetal nucleic acid concentration based on the difference ratio obtained in the difference ratio calculation unit and a predetermined fitting formula, the fitting formula being determined based on multiple reference samples with known fetal nucleic acid concentrations.
- Example 1 provides a method for calculating the concentration of fetal nucleic acid in the plasma of a pregnant woman using the cfDNA sequencing data of the pregnant woman's plasma as input data. The specific steps are as follows:
- after quality control, the raw off-machine data (fq format) of all samples used for model training and prediction are aligned to the human reference genome hg38 using the samse mode of BWA; Picard is used to remove duplicate reads from the alignment results and to calculate the duplication rate; the base quality score recalibration (BQSR) function of a variant detection tool such as GATK is used to complete local correction of the alignment results; the Depth of Coverage function of such a tool is used to calculate the depth distribution of each sample; and the population variant calling mode of such a tool is used to detect single-nucleotide variants (SNPs) and small insertions/deletions (indels).
- the whole genome is divided into 5 Mb windows.
- the variant site information and frequency information are used as the population reference data.
- this variant site and frequency information can come from existing databases (human population reference genome databases such as the 1000 Genomes database and the HapMap database), or can be calculated from the population information of the input data itself (that is, by directly calculating the genotype and corresponding frequency of each site in the samples to be analyzed).
- Imputation is a method of genotype completion and correction for sites with missing data or low accuracy.
- the linkage disequilibrium (LD) relationship between the analyzed site and nearby sites of higher accuracy is used to find the haplotype that best matches the analyzed site (using haplotype information from a reference population, or haplotype information among different individuals of the analyzed population itself), so as to infer the missing genotype at the analyzed site or correct a low-accuracy genotype.
- the imputation method is mainly used in genome-wide association studies (GWAS) or population genetic analysis.
- LD information is used to amplify the number of data loci on the chip to maximize the genotype information related to a specific phenotype.
- use the reference population or the analyzed population's own haplotype information to correct the erroneously detected genotype sites caused by the low depth, thereby improving the accuracy of analysis.
- the present invention takes the principle, used in imputation, of correcting low-depth sites in the analyzed sample with haplotype information and applies it to maternal plasma data, using LD information to estimate the fetal concentration across the whole genome (or at chromosome level). IMPUTE2 and other genotype inference algorithms all adopt the same premise when imputing a single sample, namely that the analyzed sample is diploid; when some sites carry genotypes that contradict this assumption (that is, more than two haplotypes are present), those sites are treated as erroneous and corrected.
- because the plasma of a pregnant woman actually contains three haplotypes, namely two maternal haplotypes and one fetal haplotype inherited from the father, there is a certain probability that sites on the paternally derived fetal haplotype will be treated as erroneous and corrected during imputation; this correction probability is in turn correlated with the concentration of fetal cfDNA in the pregnant woman's plasma.
- the imputation process for sequencing data is a process of using the information of linkage disequilibrium between different loci to correct and complete the genotype of the low-accuracy or missing loci nearby with the help of high-accuracy loci.
- imputation of maternal plasma cfDNA data with IMPUTE2 or other genotype inference algorithms assumes that all sequencing data originate from the same individual, that is, from only two haplotypes. Therefore, during imputation, the third haplotype formed by paternally derived fetal cfDNA in maternal plasma is treated as erroneous sites and corrected.
- maternal plasma cfDNA data from a large number (more than 10,000 recommended) of pregnancies with male fetuses are used as the training set; the fetal concentration calculated from Y chromosome depth is the truth set (Y value); and the proportion of corrected sites calculated in (3) is a covariate (X value).
- the average sequencing depth of the sample, the high-quality sequencing depth, and the repetition rate are added as the covariate to construct a linear regression model for fetal concentration prediction.
- $y_i$ is the male fetal concentration calculated from the Y chromosome depth of sample $i$
- $\{x_{i1}, \ldots, x_{in}\}$ are the proportions of imputation-corrected sites in each of the $n$ windows of sample $i$
- $p$ is the total number of samples in the training set.
- the ratio of imputation correction sites for each sample, average sequencing depth, high-quality sequencing depth, and repetition rate are covariates, and the fetal concentration is predicted using the prediction model obtained in (4).
- this method has been preliminarily tested on NIPT ultra-low-depth (~0.1x) single-end (SE) sequencing data.
- data from 10,000 male fetuses are used as the training set, and the fetal concentration estimated from Y chromosome depth is used as the truth set for fitting the linear regression model.
- the average sequencing depth, high-quality sequencing depth, and repetition rate of each sample are used as the covariates of the model to complete the construction of the prediction model.
- use this prediction model to independently estimate the concentration of 1,000 fetuses twice, and get the correlation between the estimated fetal concentration and the actual fetal concentration (the fetal concentration calculated by the Y chromosome depth) as follows:
- Fig. 4 is the fetal concentration prediction result of a 1000-sample test data set (test data set 1) independently performed by the model obtained with 10,000 samples as the training set.
- the correlation ($R^2$) between the fetal concentration calculated from Y chromosome depth (abscissa) and the fetal concentration calculated by the method of the present invention (ordinate) is 0.7318 (95% confidence interval: 0.7016 to 0.7593).
- Fig. 5 is the fetal concentration prediction result of a 1000-sample test data set (test data set 2) independently performed by the model obtained by using 10,000 samples as the training set.
- the correlation ($R^2$) between the fetal concentration calculated from Y chromosome depth (abscissa) and the fetal concentration calculated by the method of the present invention (ordinate) is 0.7423 (95% confidence interval: 0.7131 to 0.7689).
- the values in the appendix table are the standard output of the linear regression model in R; the estimate (coefficient) is the calculated coefficient of each input covariate, that is, a parameter of the model obtained from the training set.
- these parameters can be substituted directly into the linear model to predict the fetal cfDNA concentration of a new sample; the standard error is the error associated with the estimate; the t value and p value are the significance test results for the corresponding covariate; the last column, significance, grades the degree of significance according to the p value; in practice, only the more significant covariates (for example, p less than 0.05) may be selected for prediction.
- the terms "first" and "second" are used for descriptive purposes only and cannot be understood as indicating or implying relative importance or implicitly indicating the number of indicated technical features. Therefore, a feature defined with "first" or "second" may explicitly or implicitly include at least one such feature.
- a plurality of means at least two, such as two, three, etc., unless otherwise specifically defined.
Landscapes
- Life Sciences & Earth Sciences (AREA)
- Chemical & Material Sciences (AREA)
- Health & Medical Sciences (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Analytical Chemistry (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Organic Chemistry (AREA)
- Biophysics (AREA)
- Biotechnology (AREA)
- General Health & Medical Sciences (AREA)
- Wood Science & Technology (AREA)
- Genetics & Genomics (AREA)
- Zoology (AREA)
- Molecular Biology (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Theoretical Computer Science (AREA)
- Medical Informatics (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Immunology (AREA)
- Microbiology (AREA)
- Biochemistry (AREA)
- General Engineering & Computer Science (AREA)
- Pathology (AREA)
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
- Investigating Or Analysing Biological Materials (AREA)
Abstract
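A method and device for determining the concentration of fetal nucleic acid in the blood of a pregnant woman. The method comprises: (1) determining first genotype information based on alignment of sequencing data with at least a part of a reference genome, the sequencing data being derived from a nucleic acid sample of the pregnant woman's blood; (2) correcting the first genotype information using linkage disequilibrium relationships, based on reference data, so as to obtain second genotype information; and (3) determining the fetal nucleic acid concentration based on the difference between the first genotype information and the second genotype information.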
Description
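The present invention relates to the field of genetic testing, and in particular to a method and device for determining the concentration of fetal nucleic acid in the blood of a pregnant woman.
Since the discovery in 1997 that fetal cell-free DNA is present in maternal plasma, noninvasive prenatal testing (NIPT), which obtains fetal DNA information by extracting cell-free DNA (cfDNA) from maternal plasma, has advanced considerably. The fetal cfDNA concentration in maternal plasma cfDNA has been shown to increase with gestational age at blood draw, and it also varies between pregnant women. Accurate estimation of the fetal concentration in maternal plasma cfDNA not only helps improve the accuracy of NIPT, but also helps in studying its influence on various pregnancy complications and maternal phenotypes.
Several institutions have successively proposed ways to estimate fetal concentration from different data and by different methods. These methods fall into six categories:
1) Y chromosome depth method. cfDNA sequencing reads of maternal origin in plasma cannot be aligned to the non-homologous region of the Y chromosome of the human reference genome, so reads that align uniquely to that region all come from the cfDNA of a male fetus; the fetal nucleic acid concentration is calculated on this principle.
2) Single nucleotide polymorphism (SNP) method assisted by capture sequencing data. It uses sites in the genome at which the father and the mother are homozygous for different bases, combined with read depth information from maternal plasma, to calculate the fetal nucleic acid concentration.
3) Sequencing read depth distribution method. The genome is divided into windows of, for example, 50 kb; the total number of maternal plasma cfDNA reads and the proportion of short-fragment reads in each window are calculated and used as input data for a regression model, with the fetal concentration calculated from Y chromosome depth as the truth set. The model is fitted on a very large number of samples and then used to determine fetal concentration.
4) Methylation-data-assisted method. It is based on differences in methylation between DNA from different individuals and from different tissues of the same individual.
5) cfDNA fragment length method. It is known that fetal cell-free DNA fragments in maternal plasma average about 147 to 167 bp in length, whereas maternal cell-free DNA is generally distributed around 167 to 187 bp; the fetal cfDNA concentration is estimated by calculating the proportion of short cfDNA fragments in the pregnant woman.
6) Nucleosome positioning method. Because of differing degrees of degradation, the fragment length distributions of maternal and fetal cfDNA differ, and the size of this difference is used to estimate the fetal cfDNA concentration.
Of these six categories, only 1), 3) and 6) can calculate fetal concentration from NIPT data alone; of these, 1) is limited to male fetuses, while 3) and 6) can only be used with NIPT data of relatively high depth.
At present there is no method that can calculate the cfDNA concentration of male and female fetuses alike from ultra-low-depth NIPT data.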
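Summary of the Invention
The present invention aims to solve, at least to some extent, one of the technical problems in the related art. To this end, one object of the present invention is to provide a method and device for determining the concentration of fetal nucleic acid in the blood of a pregnant woman.
Through long-term research, the inventors found the following:
When existing methods determine fetal nucleic acid concentration in maternal blood, most require other types of data in addition to maternal plasma NIPT data. Specifically: 1) the SNP method assisted by capture sequencing data needs parental capture sequencing data or high-depth cfDNA sequencing data as an aid, and can calculate fetal concentration only once accurate genotypes of the father and mother (or at least the mother) have additionally been obtained; 2) the methylation-data-assisted method requires additional methylation data from the father and mother. The need for these different types of auxiliary data makes sampling more difficult (for example, a blood sample from the father must also be obtained) and increases the cost of analysis.
Methods that need only maternal plasma NIPT data all impose additional requirements on NIPT data type and sequencing depth. Specifically: 1) the cfDNA fragment length method must estimate cfDNA fragment length accurately and can therefore only use paired-end (PE) sequencing; 2) the nucleosome positioning method relies on depth differences of reads within nucleosome units and therefore requires a certain NIPT sequencing depth; it cannot work with only the approximately 0.1x data now common in NIPT.
The remaining two methods, which need no other data type as an aid and impose no special requirement on NIPT data, each have limits on their application and cannot cover all NIPT samples. Specifically: 1) the Y chromosome depth method can calculate fetal concentration only for male fetuses, not female fetuses; 2) the sequencing read depth distribution method can only be used for samples with relatively high fetal concentration and cannot be applied to samples with a fetal concentration below 5%.
Hence, how to calculate the cfDNA concentration of male and female fetuses alike from low-depth NIPT data still needs improvement. To this end, the present application provides a method and device for determining fetal nucleic acid concentration in maternal blood. The method or device can determine fetal nucleic acid concentration using only maternal plasma NIPT data, with no other auxiliary data. It can be applied to ultra-low-depth sequencing data (for example, about 0.1x), places no requirement on sequencing type (both paired-end and single-end sequencing can be used), and places no special requirement on sample type (both male and female fetuses are applicable).
Specifically, the present application provides the following technical solutions: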
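According to a first aspect of the present application, the present application provides a method for determining the concentration of fetal nucleic acid in the blood of a pregnant woman, comprising: (1) determining first genotype information based on alignment of sequencing data with at least a part of a reference genome, the sequencing data being derived from a nucleic acid sample of the pregnant woman's blood; (2) correcting the first genotype information using linkage disequilibrium relationships, based on reference data, so as to obtain second genotype information; and (3) determining the fetal nucleic acid concentration based on the difference between the first genotype information and the second genotype information.
The method obtains the first genotype information by aligning sequencing data with a reference genome. The sequencing data are obtained by sequencing a nucleic acid sample from the pregnant woman's blood and contain maternal nucleic acid information and fetal nucleic acid information; because part of the fetal nucleic acid information comes from the father, the sequencing data also indirectly contain paternal nucleic acid information. The first genotype information is then corrected using linkage disequilibrium relationships: the portion of nucleic acid information of paternal origin mixed into the sequencing data comes from a different individual than the maternal information and is therefore corrected to some extent, giving the corrected second genotype information. Comparing the first and second genotype information then identifies the genotype information that was corrected. The more genotype information is corrected, the higher the fetal cfDNA concentration in maternal plasma, so the fetal nucleic acid concentration can be determined from the relationship between the proportion of corrected genotypes and the fetal cfDNA concentration in maternal plasma.
The method provided by the present application has several advantages: 1) the whole method needs only maternal plasma NIPT data, with no other auxiliary data; 2) because linkage disequilibrium relationships are used to estimate genotype changes across the genome, the fetal nucleic acid concentration can be reflected by those changes even at low sequencing depth, as long as data from the two sources (mother and fetus) are mixed, so the method can be applied to ultra-low-depth sequencing data (for example, about 0.1x); 3) there is no requirement on sequencing type, and both paired-end and single-end sequencing can be used; 4) there is no special requirement on sample type (the method applies to both male and female fetuses and places no requirement on fetal concentration).
The method provided by the present application is the first to overcome the restrictions on sequencing depth, data type and fetal sex of the data used for estimation. It is broadly applicable, requires no additional sampling or sequencing cost, and is of very high value in the field of NIPT.
According to embodiments of the present application, the above method for determining fetal nucleic acid concentration in maternal blood may further include the following technical features: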
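In some embodiments of the present application, the sequencing data are obtained by sequencing the nucleic acid sample of the pregnant woman's blood, and the sequencing depth may be 10X, 5X, 1X, 0.5X, 0.2X or 0.1X. The nucleic acid sample of the pregnant woman's blood includes fetal nucleic acid information and maternal nucleic acid information, and part of the fetal nucleic acid information comes from the father. Therefore, even low-depth sequencing data from such a sample, for example no more than 10X, no more than 5X, no more than 1X, or even 0.1X, can be analyzed by the method provided by the present application to determine the fetal nucleic acid concentration.
In some embodiments of the present application, the sequencing data are obtained by second-generation or third-generation sequencing technology. Second-generation sequencing, also called high-throughput sequencing, can sequence many sequences at once. For example, nucleic acid can be randomly broken by physical or chemical means into countless small fragments, which may be about 250 to 300 bp; these fragments are enriched by library construction and then sequenced in a sequencer. The sequencer has regions to which the fragments can attach, each fragment having its own attachment region, so that the sequence information of all attached DNA can be read in one run. Second-generation sequencing can read a large number of sequences at once, but fragments are limited to, for example, about 250 to 300 bp, and the cost is relatively high. Commonly used second-generation technologies include pyrosequencing from Roche/454, fluorescence-based sequencing from Illumina, fluorescence-based sequencing from ABI/SOLiD, and DNB sequencing from MGI. Third-generation sequencing can reach read lengths of about 10 kb and does not depend on PCR amplification; for example, SMRT from PacBio or nanopore single-molecule sequencing from Oxford Nanopore Technologies can be used. Whether second-generation or third-generation, single-end or paired-end, the resulting sequencing data of the nucleic acid sample of the pregnant woman's blood can be analyzed by the method provided by the present application to determine the fetal nucleic acid concentration.
In some embodiments of the present application, the reference genome comprises at least one strong linkage region of the human genome. What counts as a "strong linkage region" varies with the size and structure of the population studied; it is generally defined as a region within which the probability of historical recombination between any pair of variant sites is less than 5%. The reference genome may contain one, two, three or even more strong linkage regions. Generally, cost aside, the more strong linkage regions the reference genome contains, the more accurate the fetal nucleic acid concentration finally determined after alignment and calculation.
When selecting strong linkage regions, all or some of the strong linkage regions of the population on the genome can be identified according to the size and structure of the population studied, and strong linkage regions of suitable size are then chosen as the reference genome according to their extent. Generally, the greater the number and proportion of variant sites in the selected strong linkage regions that are covered by maternal DNA and by fetal DNA, the more accurate the fetal nucleic acid concentration calculated with a reference genome containing those regions. In some embodiments of the present application, the length of the strong linkage region is 5 to 10 Mb, for example 10 Mb, 9 Mb, 8 Mb, 7 Mb, 6 Mb or 5 Mb. These lengths may further vary up or down by 10% to 20%; for example, a 10 Mb region may instead be 11 Mb or 12 Mb, or 9 Mb or 8 Mb. This allows accurate correction of the information in the sequencing data that comes from the father.
In some embodiments of the present application, the first genotype information is determined based on the number of supporting sequencing reads. For example, if at some site 100 reads support base A, 8 reads support base G and 20 reads support base T, the base at that site is determined to be A. In this way the genotype information of each site can be obtained, and the required first genotype information is determined through alignment with at least a part of the reference genome.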
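The read-support rule in the example above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the tool used in the patent (Example 1 uses samtools mpileup): the function name `call_base` and the tie-breaking rule are hypothetical, and the sketch calls a single most-supported base rather than a diploid genotype.

```python
from collections import Counter

VALID_BASES = {"A", "C", "G", "T"}

def call_base(read_bases):
    """Return the base supported by the most reads at one site.

    Returns None when no read carries a valid base. Ties are broken
    alphabetically, which is an arbitrary choice made for this sketch.
    """
    counts = Counter(b for b in read_bases if b in VALID_BASES)
    if not counts:
        return None
    # Sort key: highest count first, then alphabetical order for ties.
    return min(counts, key=lambda base: (-counts[base], base))

# The site from the text: 100 reads support A, 8 support G, 20 support T.
site_reads = ["A"] * 100 + ["G"] * 8 + ["T"] * 20
print(call_base(site_reads))
```

At the roughly 0.1x depth discussed in this document most sites are covered by at most one read, so in practice the call at a site often rests on a single read.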
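In some embodiments of the present application, the first genotype information includes at least one of SNPs and indels. The first genotype information contains single-nucleotide variant (SNP) information and/or small insertion/deletion (indel) information; correction of this information can reflect paternal nucleic acid information and so allow accurate determination of the fetal nucleic acid concentration.
In some embodiments of the present application, the reference data include information on multiple variant sites and variant frequency information. With such data as reference, the first genotype information can be corrected on the basis of linkage disequilibrium, so that some sequencing information of paternal origin is corrected; the fetal nucleic acid concentration is then determined from the association between the corrected information and the fetal nucleic acid concentration.
In some embodiments of the present application, the correction is performed with IMPUTE2. IMPUTE2 is an algorithm for genotype completion and correction at sites with missing data or low accuracy. Besides IMPUTE2, other imputation methods that use LD information to correct sites can also be tried, for example software such as BEAGLE and PHASE.
In some embodiments of the present application, step (3) further includes: (3-1) determining the difference ratio between the first genotype information and the second genotype information; (3-2) determining the fetal nucleic acid concentration based on the difference ratio obtained in step (3-1) and a predetermined fitting formula, the fitting formula being determined from multiple reference samples with known fetal nucleic acid concentrations. Given the relationship between multiple known fetal nucleic acid concentrations and the difference ratio between the first and second genotype information, different formulas or models can be fitted, for example a linear regression model, or other models that effectively integrate all the information, such as random forest or deep learning models, so that the fetal nucleic acid concentration is related to the difference ratio. Once the difference ratio between the first and second genotype information of a nucleic acid sample from a pregnant woman's blood has been determined, the fetal nucleic acid concentration in her blood can be determined with the predetermined fitting formula. In other words, a set of samples with known fetal concentration is first used as a training set to determine the fitting formula with the method of the present invention; in later use, the fetal cfDNA concentration of one or more samples can be predicted without any further samples of known fetal concentration.
In some embodiments of the present invention, "multiple" means at least 100, for example 100, 500, 1,000, 5,000 or more. When a formula or model is fitted, the more reference samples there are, the more accurate the fitting formula and the more accurate the fetal nucleic acid concentration it yields. Too many samples, however, may also increase the computational cost and the cost of the samples themselves. For a good balance of cost and fitting accuracy, the number of reference samples may be 5,000 to 10,000, for example 5,000, 6,000, 7,000, 8,000, 9,000 or 10,000; no limit is placed on the number of samples here.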
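According to a second aspect of the present application, the present application provides a device for determining the concentration of fetal nucleic acid in the blood of a pregnant woman. The device needs only maternal plasma NIPT data, with no other auxiliary data; it can be applied to ultra-low-depth sequencing data; and it has no special requirement on sample type, applying to both male and female fetuses. The device includes: an alignment unit, which determines first genotype information based on alignment of sequencing data with at least a part of a reference genome, the sequencing data being derived from a nucleic acid sample of the pregnant woman's blood; a correction unit, connected to the alignment unit, which corrects the first genotype information using linkage disequilibrium relationships, based on reference data, so as to obtain second genotype information; and a calculation unit, connected to both the alignment unit and the correction unit, which determines the fetal nucleic acid concentration based on the difference between the first genotype information and the second genotype information.
According to embodiments of the present application, the above device may further include the following technical features. These features have all been mentioned in connection with the above method, and each performs a function similar to that in the method, so they are not described in detail again here.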
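In some embodiments of the present application, in the device, the sequencing data are obtained by sequencing the nucleic acid sample of the pregnant woman's blood, and the sequencing depth may be 10X, 5X, 1X, 0.5X, 0.2X or 0.1X. That is, the device can determine the fetal nucleic acid concentration in maternal blood not only from high-depth or relatively high-depth sequencing data, but also from low-depth or ultra-low-depth sequencing data.
In some embodiments of the present application, in the device, the sequencing data are obtained by second-generation or third-generation sequencing technology.
In some embodiments of the present application, in the device, the reference genome comprises at least one strong linkage region of the human genome.
In some embodiments of the present application, in the device, the length of the strong linkage region is 5 Mb to 10 Mb, for example 10 Mb, 9 Mb, 8 Mb, 7 Mb, 6 Mb or 5 Mb; no limit is placed on the length of the strong linkage region here.
In some embodiments of the present application, in the device, the first genotype information is determined based on the number of supporting sequencing reads.
In some embodiments of the present application, in the device, the first genotype information includes at least one of SNPs and indels.
In some embodiments of the present application, in the device, the reference data include information on multiple variant sites and variant frequency information.
In some embodiments of the present application, in the device, the correction is performed with IMPUTE2.
In some embodiments of the present application, the calculation unit further includes: a difference ratio calculation unit, which determines the difference ratio between the first and second genotype information; and a fetal nucleic acid concentration calculation unit, connected to the difference ratio calculation unit, which determines the fetal nucleic acid concentration based on the difference ratio obtained in the difference ratio calculation unit and a predetermined fitting formula, the fitting formula being determined from multiple reference samples with known fetal nucleic acid concentrations.
In some embodiments of the present application, "multiple" may be 5,000 to 10,000, for example 5,000, 6,000, 7,000, 8,000, 9,000 or 10,000; no limit is placed on the number of samples here.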
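According to a third aspect of the present invention, the present invention provides a computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the program, the method of any embodiment of the first aspect of the present invention is implemented. The fetal nucleic acid concentration can thus be determined quickly using only maternal plasma NIPT data and linkage disequilibrium relationships; the approach can be applied to low-depth sequencing data and places no requirement on fetal concentration or sample type.
According to a fourth aspect of the present invention, the present invention provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the program implements the method of any embodiment of the first aspect of the present invention. The fetal nucleic acid concentration can thus be determined quickly using only maternal plasma NIPT data and linkage disequilibrium relationships; the approach can be applied to low-depth sequencing data and places no requirement on fetal concentration or sample type.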
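FIG. 1 is a schematic structural diagram of a device for determining fetal nucleic acid concentration in maternal blood according to an embodiment of the present invention.
FIG. 2 is a schematic structural diagram of the calculation unit of the device for determining fetal nucleic acid concentration in maternal blood according to an embodiment of the present invention.
FIG. 3 is a schematic diagram of the method of determining fetal nucleic acid concentration by model prediction according to an embodiment of the present invention.
FIG. 4 shows, according to an embodiment of the present invention, the fetal concentration predictions for a test data set of 1,000 samples made by the model obtained with 10,000 samples as the training set.
FIG. 5 shows, according to an embodiment of the present invention, the fetal concentration predictions for a test data set of 1,000 samples made by the model obtained with 10,000 samples as the training set.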
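Embodiments of the present invention are described in detail below, and examples of the embodiments are shown in the drawings, in which the same or similar reference numerals throughout denote the same or similar elements or elements with the same or similar functions. The embodiments described below with reference to the drawings are exemplary, are intended to explain the present invention, and should not be understood as limiting it.
To help those skilled in the art, certain terms of the present invention are explained below. These explanations serve only to aid understanding of the technical solutions of the present invention and should not be regarded as limiting its scope of protection.
Herein, the terms "first genotype information" and "second genotype information" refer to information containing the genotype of each site. They refer, respectively, to the raw genotypes obtained from the sequencing data and to the genotypes after correction using linkage disequilibrium information.
The term "linkage" describes the relationship between two sites. If two or more sites are close together, the probability that a crossover during meiosis separates the alleles at two sites on the same chromosome is relatively small; that is, the alleles at these sites are not passed to the next generation independently (they tend to be transmitted together). This biological phenomenon is usually called linkage. What counts as a "strong linkage region" varies with the size and structure of the population studied; it is generally defined as a region within which the probability of historical recombination between any pair of variant sites is less than 5%.
Linkage disequilibrium refers to the situation in which a particular combination of genotypes at two variant sites is inherited together with a probability greater than chance. That is, as long as some genotype combination at two sites is not inherited completely independently, the two sites are in linkage disequilibrium.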
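The verbal definition of linkage disequilibrium above corresponds to the standard population-genetics measures $D$ and $r^2$. They are not stated in the source and are given here only as a hedged illustration; the symbols $p_A$, $p_B$ (allele frequencies at two sites) and $p_{AB}$ (frequency of the haplotype carrying both alleles) are introduced for this sketch.

```latex
\begin{align}
D   &= p_{AB} - p_A\,p_B \\
r^2 &= \frac{D^2}{p_A\,(1 - p_A)\,p_B\,(1 - p_B)}
\end{align}
```

When the two sites are inherited independently, $p_{AB} = p_A p_B$ and so $D = 0$; any non-zero $D$ indicates linkage disequilibrium in the sense defined above, and $r^2$ close to 1 means the allele at one site predicts the allele at the other almost perfectly, which is the property imputation relies on.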
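According to one aspect of the present application, the present application provides a method for determining the concentration of fetal nucleic acid in the blood of a pregnant woman, comprising: (1) determining first genotype information based on alignment of sequencing data with at least a part of a reference genome, the sequencing data being derived from a nucleic acid sample of the pregnant woman's blood; (2) correcting the first genotype information using linkage disequilibrium relationships, based on reference data, so as to obtain second genotype information; and (3) determining the fetal nucleic acid concentration based on the difference between the first genotype information and the second genotype information. The method provided by the present application allows the fetal DNA concentration in maternal blood to be detected and measured.
Correction using linkage disequilibrium relationships can be carried out with existing methods or software, for example by imputation. Imputation is a method of genotype completion and correction for sites with missing data or low accuracy. Specifically, the linkage disequilibrium (LD) relationship between the analyzed site and nearby sites of higher accuracy is used to find the haplotype that best matches the analyzed site (using haplotype information from a reference population, or haplotype information among different individuals of the analyzed population itself), so as to infer a missing genotype at the analyzed site or to correct a low-accuracy genotype.
Imputation is mainly used in genome-wide association studies (GWAS) or population genetic analysis, where LD information is used to expand the number of sites in array data and so extract as much genotype information related to a particular phenotype as possible; or, for low-depth population sequencing data, haplotype information from a reference population or from the analyzed population itself is used to correct genotype sites miscalled because of low depth, improving the accuracy of the analysis.
The present invention takes the principle, used in imputation, of correcting low-depth sites in the analyzed sample with haplotype information and applies it to maternal plasma data, using LD information to estimate the fetal concentration across the whole genome (or at chromosome level). IMPUTE2 and other genotype inference algorithms all adopt the same premise when imputing a single sample, namely that the analyzed sample is diploid; when some sites carry genotypes that contradict this assumption (that is, more than two haplotypes are present), those sites are treated as erroneous and corrected. Because maternal plasma actually contains three haplotypes, namely two maternal haplotypes and one fetal haplotype inherited from the father, there is a certain probability that sites on the paternally derived fetal haplotype will be treated as erroneous and corrected during imputation, and this correction probability is in turn correlated with the fetal cfDNA concentration in maternal plasma.
Using the maternal haplotypes and the linkage disequilibrium information in the genome, and relying on the way imputation uses haplotype information, the fetal concentration in noninvasive prenatal genetic testing can be calculated by comparing the proportion of sites corrected between the genotypes before and after imputation.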
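When aligning the sequencing data with at least a part of the reference genome, genomic regions with relatively strong signal can be selected as the regions for alignment. Such regions are characterized by good coverage (how well the sequencing data cover the genome), a relatively high minor allele frequency in the population (indicating a relatively high probability that the site is variable in the population), and a relatively high proportion of variant sites. Selecting them further extracts characteristic information and reduces background noise, improving the accuracy of the fetal concentration estimate.
When selecting strong linkage regions, the size of the calculation window can be changed, for example from 5 Mb to 10 Mb or to a whole chromosome, to define the strong linkage region; accuracy is improved by increasing the number of informative sites in each window.
In some embodiments of the present application, step (3) further includes: (3-1) determining the difference ratio between the first genotype information and the second genotype information; (3-2) determining the fetal nucleic acid concentration based on the difference ratio obtained in step (3-1) and a predetermined fitting formula, the fitting formula being determined from multiple reference samples with known fetal nucleic acid concentrations. Given the relationship between multiple known fetal nucleic acid concentrations and the difference ratio between the first and second genotype information, different formulas or models can be fitted, for example a linear regression model, or other models that effectively integrate all the information, such as random forest or deep learning models, so that the fetal nucleic acid concentration is related to the difference ratio. Once the difference ratio between the first and second genotype information of a nucleic acid sample from a pregnant woman's blood has been determined, the fetal nucleic acid concentration in her blood can be determined with the predetermined fitting formula.
When selecting or determining the model, more and more complete maternal phenotype information can be added to the prediction model as covariates, so as to optimize the model and improve the accuracy of the estimate.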
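According to another aspect of the present application, the present application provides a device for determining the concentration of fetal nucleic acid in the blood of a pregnant woman. The device needs only maternal plasma NIPT data, with no other auxiliary data; it can be applied to ultra-low-depth sequencing data; and it has no special requirement on sample type, applying to both male and female fetuses. As shown in FIG. 1, the device includes: an alignment unit, which determines first genotype information based on alignment of sequencing data with at least a part of a reference genome, the sequencing data being derived from a nucleic acid sample of the pregnant woman's blood; a correction unit, connected to the alignment unit, which corrects the first genotype information using linkage disequilibrium relationships, based on reference data, so as to obtain second genotype information; and a calculation unit, connected to both the alignment unit and the correction unit, which determines the fetal nucleic acid concentration based on the difference between the first genotype information and the second genotype information.
In at least some embodiments, the calculation unit, as shown in FIG. 2, further includes: a difference ratio calculation unit, which determines the difference ratio between the first and second genotype information; and a fetal nucleic acid concentration calculation unit, connected to the difference ratio calculation unit, which determines the fetal nucleic acid concentration based on the difference ratio obtained in the difference ratio calculation unit and a predetermined fitting formula, the fitting formula being determined from multiple reference samples with known fetal nucleic acid concentrations.
The solutions of the present invention are explained below with reference to an example. Those skilled in the art will understand that the following example serves only to illustrate the present invention and should not be regarded as limiting its scope. Where specific techniques or conditions are not indicated in the example, the techniques or conditions described in the literature of the field, or the product instructions, are followed. Reagents or instruments whose manufacturer is not indicated are conventional products that are commercially available.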
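Example 1
Example 1 provides a method for calculating the fetal nucleic acid concentration in maternal plasma using maternal plasma cfDNA sequencing data as input. The specific steps are as follows:
(1) Preliminary data processing.
After quality control, the raw off-machine data (fq format) of all samples used for model training and prediction are aligned to the human reference genome hg38 using the samse mode of BWA. Picard is used to remove duplicate reads from the alignment results and to calculate the duplication rate. The base quality score recalibration (BQSR) function of a variant detection tool such as GATK is used to complete local correction of the alignment results, and the Depth of Coverage function of such a tool is used to calculate the depth distribution of each sample. The population variant calling mode of such a tool is used to detect single-nucleotide variants (SNPs) and small insertions/deletions (indels).
(2) Extraction of raw genotype information.
Taking the result of BWA alignment and Picard deduplication in (1) (BAM format) as input, the mpileup function of samtools outputs genotype results based on raw read depth (VCF format). That is, genotypes are inferred using the pileup function of samtools.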
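(3) Calculation of the proportion of imputation-corrected sites.
The whole genome is divided into 5 Mb windows. Within each window, for each analyzed sample, variant site information and frequency information are used as population reference data. This information can come from existing databases (human population reference genome databases such as the 1000 Genomes database and the HapMap database), or can be calculated from the population information of the input data itself (that is, by directly calculating the genotype and corresponding frequency of each site within the samples to be analyzed). IMPUTE2 or another genotype inference algorithm is used to complete and correct the genotypes (imputation), finally giving the genotype result of each sample (VCF format). The resulting genotypes are compared with the genotypes inferred in (2) from raw read depth information, and the proportion of sites whose genotypes are inconsistent between the two data sets is calculated; this is the proportion of imputation-corrected sites. The correction can follow the imputation principle described in Porcu, E.; Sanna, S.; Fuchsberger, C.; Fritsche, L.G. Genotype imputation in genome-wide association studies. Curr Protoc Hum Genet. 2013, Chapter 1, Unit 1.25.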
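The per-window comparison in step (3) can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes the raw and imputed genotypes have already been parsed from the two VCF files into dictionaries keyed by `(chromosome, position)`, that genotype strings are normalized so equal genotypes compare equal, and that only sites with a raw genotype are counted. The function name `corrected_ratio_per_window` and the data layout are hypothetical.

```python
from collections import defaultdict

WINDOW_SIZE = 5_000_000  # 5 Mb windows, as in step (3)

def corrected_ratio_per_window(raw, imputed, window_size=WINDOW_SIZE):
    """Fraction of sites per window whose genotype changed after imputation.

    raw, imputed: dict mapping (chrom, pos) -> genotype string, e.g. "0/1".
    Returns a dict mapping (chrom, window_index) -> changed fraction.
    """
    total = defaultdict(int)
    changed = defaultdict(int)
    for (chrom, pos), raw_gt in raw.items():
        imputed_gt = imputed.get((chrom, pos))
        if imputed_gt is None:
            continue  # site absent from the imputed call set: skip it
        key = (chrom, pos // window_size)
        total[key] += 1
        if imputed_gt != raw_gt:
            changed[key] += 1
    return {key: changed[key] / total[key] for key in total}

# Toy genotypes, invented for illustration only.
raw = {("chr1", 100): "0/0", ("chr1", 200): "0/1",
       ("chr1", 5_000_100): "1/1", ("chr1", 5_000_200): "0/0"}
imputed = {("chr1", 100): "0/0", ("chr1", 200): "0/0",
           ("chr1", 5_000_100): "1/1", ("chr1", 5_000_200): "0/0"}
print(corrected_ratio_per_window(raw, imputed))
```

Passing a different `window_size` (for example 10,000,000) corresponds to the window-size adjustment the description mentions as a way to increase the number of informative sites per window.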
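Imputation is a method of genotype completion and correction for sites with missing data or low accuracy. Specifically, the linkage disequilibrium (LD) relationship between the analyzed site and nearby sites of higher accuracy is used to find the haplotype that best matches the analyzed site (using haplotype information from a reference population, or haplotype information among different individuals of the analyzed population itself), so as to infer a missing genotype at the analyzed site or to correct a low-accuracy genotype.
Imputation is mainly used in genome-wide association studies (GWAS) or population genetic analysis, where LD information is used to expand the number of sites in array data and so extract as much genotype information related to a particular phenotype as possible; or, for low-depth population sequencing data, haplotype information from a reference population or from the analyzed population itself is used to correct genotype sites miscalled because of low depth, improving the accuracy of the analysis.
The present invention takes the principle, used in imputation, of correcting low-depth sites in the analyzed sample with haplotype information and applies it to maternal plasma data, using LD information to estimate the fetal concentration across the whole genome (or at chromosome level). IMPUTE2 and other genotype inference algorithms all adopt the same premise when imputing a single sample, namely that the analyzed sample is diploid; when some sites carry genotypes that contradict this assumption (that is, more than two haplotypes are present), those sites are treated as erroneous and corrected. Because maternal plasma actually contains three haplotypes, namely two maternal haplotypes and one fetal haplotype inherited from the father, there is a certain probability that sites on the paternally derived fetal haplotype will be treated as erroneous and corrected during imputation, and this correction probability is in turn correlated with the fetal cfDNA concentration in maternal plasma.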
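(4) Building the fetal concentration prediction model.
Imputation of sequencing data is the process of using linkage disequilibrium information between sites to correct and complete, with the help of high-accuracy sites, the genotypes of nearby linked sites that are of low accuracy or missing. Currently, imputation of maternal plasma cfDNA data with IMPUTE2 or other genotype inference algorithms presupposes that all sequencing data come from the same individual, that is, from only two haplotypes. Therefore, during imputation, the third haplotype formed by paternally derived fetal cfDNA in maternal plasma is treated as erroneous sites and corrected. At comparable sequencing depth (or after correction for sequencing depth), as fetal concentration rises, the probability of sampling paternally derived cfDNA that differs from the mother increases, and the proportion of corrected sites increases accordingly (as shown in FIG. 3).
As shown in FIG. 3, as the proportion of paternally derived cfDNA corresponding to different fetal concentrations increases, the probability that the raw genotype (the first genotype) is inferred from paternally derived cfDNA increases, and so does the probability that imputation corrects these paternal genotypes back to the maternal genotype. In other words, the proportion of sites at which the first and second genotypes differ rises, so the fetal cell-free DNA concentration can be inferred back from the change between the first and second genotypes. FIG. 3 shows that the proportion of imputation-corrected sites differs at different fetal concentrations, illustrating how the correlation between fetal cfDNA concentration and the proportion of changed genotypes is indirectly derived.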
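The qualitative argument above can be made concrete with a simple back-of-the-envelope model. This is only an illustrative sketch and is not given in the source: it assumes a site covered by a single read (typical at about 0.1x depth), a fetal fraction $f$, and three hypothetical quantities introduced here: $q$, the fraction of sites where the paternally inherited allele differs from the maternal genotype; $c$, the probability that imputation reverts such a site to the maternal genotype; and $e$, a background correction rate unrelated to fetal DNA (for example from sequencing errors).

```latex
\begin{align}
\Pr(\text{read from the paternally inherited fetal haplotype}) &= \frac{f}{2} \\
\Pr(\text{read from a maternal-type haplotype}) &= 1 - \frac{f}{2} \\
\mathbb{E}[\text{proportion of corrected sites}] &\approx e + c\,q\,\frac{f}{2}
\end{align}
```

Half of the fetal cfDNA carries the maternally inherited haplotype, which matches one of the mother's own haplotypes, so only a share $f/2$ of reads can reveal the third haplotype. Under these assumptions the expected proportion of corrected sites grows linearly with $f$, which is consistent with using a linear regression model to map the corrected proportion back to fetal concentration.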
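Based on the above theory, maternal plasma cfDNA data from a large number (10,000 or more recommended) of pregnancies with male fetuses are used as the training set; the fetal concentration calculated from Y chromosome depth is the truth set (Y value); the proportion of corrected sites calculated in (3) is a covariate (X value); and the average sequencing depth, high-quality sequencing depth and duplication rate of each sample are added as covariates to build a linear regression model for predicting fetal concentration.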
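The specific linear regression model formula is as follows:
where $y_i$ is the male fetal concentration estimated from Y chromosome depth for sample $i$; $\{x_{i1}, \ldots, x_{in}\}$ are the proportions of imputation-corrected sites in each of the $n$ windows of sample $i$; the remaining three covariates are the average sequencing depth of sample $i$, the sequencing depth of sample $i$ obtained from higher-quality aligned reads, and the duplication rate of sample $i$; and $p$ is the total number of samples in the training set.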
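A linear model consistent with the definitions above can be written as follows. This is a hedged reconstruction rather than the formula as published: $y_i$, $x_{ij}$, $n$ and $p$ follow the text, while the symbols $d_i$ (average sequencing depth), $h_i$ (high-quality sequencing depth), $r_i$ (duplication rate), the coefficients $\beta_j$ and $\gamma_k$, and the error term $\varepsilon_i$ are assumptions introduced for this sketch.

```latex
\begin{equation}
y_i = \beta_0 + \sum_{j=1}^{n} \beta_j\, x_{ij}
      + \gamma_1 d_i + \gamma_2 h_i + \gamma_3 r_i + \varepsilon_i,
\qquad i = 1, \ldots, p
\end{equation}
```

Fitting the model on the training set yields estimates of $\beta_0, \ldots, \beta_n$ and $\gamma_1, \gamma_2, \gamma_3$; substituting the covariates of a new sample then gives its predicted fetal concentration.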
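(5) Fetal concentration prediction.
For all maternal plasma cfDNA samples, the proportion of imputation-corrected sites, the average sequencing depth, the high-quality sequencing depth and the duplication rate of each sample are taken as covariates, and the prediction model obtained in (4) is used to predict the fetal concentration.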
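Steps (4) and (5), fitting on a training set and then predicting for new samples, can be sketched with ordinary least squares. This is a minimal pure-Python illustration under stated assumptions, not the patent's implementation (the appendix indicates the model was fitted in R): the function names are hypothetical, the example uses only two covariates for brevity, and the numbers are synthetic values generated from known coefficients.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting.

    Assumes the matrix is non-singular (a zero pivot would raise
    ZeroDivisionError).
    """
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_ols(rows, y):
    """Fit y = b0 + b1*x1 + ... by the normal equations; returns [b0, b1, ...]."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)

def predict(beta, row):
    """Predicted fetal concentration for one sample's covariates."""
    return beta[0] + sum(b * x for b, x in zip(beta[1:], row))

# Synthetic training data: (corrected-site ratio, mean depth) per sample.
train_rows = [[0.10, 0.08], [0.12, 0.11], [0.15, 0.09], [0.20, 0.12], [0.18, 0.10]]
train_y = [0.02 + 0.5 * a + 0.1 * b for a, b in train_rows]  # made-up truth
beta = fit_ols(train_rows, train_y)
print(predict(beta, [0.16, 0.10]))
```

In practice a statistics package (such as `lm` in R, as the appendix suggests) would be used instead, since it also reports the standard errors, t values and p values discussed alongside the appendix table.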
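The method has been preliminarily tested on NIPT ultra-low-depth (~0.1x) single-end (SE) sequencing data. Data from 10,000 male fetuses were used as the training set, with the fetal concentration estimated from Y chromosome depth as the truth set, to fit the linear regression model; the average sequencing depth, high-quality sequencing depth and duplication rate of each sample were included as covariates to complete the prediction model. The model was then used to estimate fetal concentration in two independent sets of 1,000 samples each. The correlation between the estimated fetal concentration and the actual fetal concentration (calculated from Y chromosome depth) is as follows: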
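FIG. 4 shows the fetal concentration predictions for an independent test data set of 1,000 samples (test data set 1) made by the model obtained with 10,000 samples as the training set. In test data set 1, the correlation ($R^2$) between the fetal concentration calculated from Y chromosome depth (abscissa) and the fetal concentration calculated by the method of the present invention (ordinate) is 0.7318 (95% confidence interval: 0.7016 to 0.7593).
FIG. 5 shows the fetal concentration predictions for an independent test data set of 1,000 samples (test data set 2) made by the model obtained with 10,000 samples as the training set. In test data set 2, the correlation ($R^2$) between the fetal concentration calculated from Y chromosome depth (abscissa) and the fetal concentration calculated by the method of the present invention (ordinate) is 0.7423 (95% confidence interval: 0.7131 to 0.7689).
For both test data sets, the Pearson test between the estimated fetal concentration and the fetal concentration obtained from the Y chromosome showed a significant correlation (p value less than $2.2 \times 10^{-16}$).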
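The agreement statistic reported above can be computed for any pair of estimates with a few lines of Python. This is a minimal sketch: the function name `pearson_r` is hypothetical, the sample values are invented for illustration, and the confidence intervals and p values reported in the text would need a statistics package and are not reproduced here.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Invented fetal concentrations: Y-chromosome-based vs. model-predicted.
y_chr_based = [0.08, 0.12, 0.10, 0.15, 0.20]
predicted = [0.09, 0.11, 0.12, 0.14, 0.18]
r = pearson_r(y_chr_based, predicted)
print(r, r ** 2)  # correlation coefficient and its square
```

For a simple two-variable comparison like this one, the squared Pearson coefficient equals the $R^2$ of a straight-line fit between the two sets of estimates.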
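The linear regression results obtained from the 10,000 training samples are given in the appendix. The values in the appendix table are the standard output of the linear regression model in R. The estimate (coefficient) is the calculated coefficient of each input covariate, that is, a parameter of the model obtained from the training set; these parameters can be substituted directly into the linear model to predict the fetal cfDNA concentration of a new sample. The standard error is the error associated with the estimate; the t value and p value are the significance test results for the corresponding covariate; and the last column, significance, grades the degree of significance according to the p value. In practice, only the more significant covariates (for example, p less than 0.05) may be selected for prediction.
In the description of the present invention, the terms "first" and "second" are used for descriptive purposes only and cannot be understood as indicating or implying relative importance or implicitly indicating the number of indicated technical features. Thus, a feature defined with "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "multiple" means at least two, for example two or three, unless otherwise specifically defined.
In this specification, a description referring to the terms "one embodiment", "some embodiments", "example", "specific example" or "some examples" means that a specific feature, structure, material or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present invention. Such expressions do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials or characteristics described may be combined in a suitable manner in any one or more embodiments or examples. In addition, where they do not contradict each other, those skilled in the art may combine the different embodiments or examples described in this specification and their features.
Although embodiments of the present invention have been shown and described above, it should be understood that they are exemplary and cannot be understood as limiting the present invention; those of ordinary skill in the art may change, modify, replace and vary the above embodiments within the scope of the present invention.
Appendix.
Linear model results obtained by training on 10,000 maternal samples (male fetuses):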
Claims (20)
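- A method for determining the concentration of fetal nucleic acid in the blood of a pregnant woman, characterized by comprising: (1) determining first genotype information based on alignment of sequencing data with at least a part of a reference genome, the sequencing data being derived from a nucleic acid sample of the pregnant woman's blood; (2) correcting the first genotype information using linkage disequilibrium relationships, based on reference data, so as to obtain second genotype information; and (3) determining the fetal nucleic acid concentration based on the difference between the first genotype information and the second genotype information.
- The method according to claim 1, characterized in that the sequencing data are obtained by sequencing the nucleic acid sample of the pregnant woman's blood.
- The method according to claim 1, characterized in that the reference genome comprises at least one strong linkage region of the human genome.
- The method according to claim 1, characterized in that the length of the strong linkage region is 5 Mb to 10 Mb.
- The method according to claim 1, characterized in that the first genotype information is determined based on the number of supporting sequencing reads.
- The method according to claim 1, characterized in that the first genotype information includes at least one of SNPs and indels.
- The method according to claim 1, characterized in that the reference data include information on multiple variant sites and variant frequency information.
- The method according to claim 1, characterized in that the correction is performed with IMPUTE2.
- The method according to claim 1, characterized in that step (3) further includes: (3-1) determining the difference ratio between the first genotype information and the second genotype information; (3-2) determining the fetal nucleic acid concentration based on the difference ratio obtained in step (3-1) and a predetermined fitting formula, the fitting formula being determined from multiple reference samples with known fetal nucleic acid concentrations.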
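- A device for determining the concentration of fetal nucleic acid in the blood of a pregnant woman, characterized by comprising: an alignment unit, which determines first genotype information based on alignment of sequencing data with at least a part of a reference genome, the sequencing data being derived from a nucleic acid sample of the pregnant woman's blood; a correction unit, connected to the alignment unit, which corrects the first genotype information using linkage disequilibrium relationships, based on reference data, so as to obtain second genotype information; and a calculation unit, connected to both the alignment unit and the correction unit, which determines the fetal nucleic acid concentration based on the difference between the first genotype information and the second genotype information.
- The device according to claim 10, characterized in that the sequencing data are obtained by sequencing the nucleic acid sample of the pregnant woman's blood.
- The device according to claim 10, characterized in that the reference genome comprises at least one strong linkage region of the human genome.
- The device according to claim 10, characterized in that the length of the strong linkage region is 5 Mb to 10 Mb.
- The device according to claim 10, characterized in that the first genotype information is determined based on the number of supporting sequencing reads.
- The device according to claim 10, characterized in that the first genotype information includes at least one of SNPs and indels.
- The device according to claim 10, characterized in that the reference data include information on multiple variant sites and variant frequency information.
- The device according to claim 10, characterized in that the correction is performed with IMPUTE2.
- The device according to claim 10, characterized in that the calculation unit further includes: a difference ratio calculation unit, which determines the difference ratio between the first genotype information and the second genotype information; and a fetal nucleic acid concentration calculation unit, connected to the difference ratio calculation unit, which determines the fetal nucleic acid concentration based on the difference ratio obtained in the difference ratio calculation unit and a predetermined fitting formula, the fitting formula being determined from multiple reference samples with known fetal nucleic acid concentrations.
- A computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that, when the processor executes the program, the method according to any one of claims 1 to 9 is implemented.
- A computer-readable storage medium on which a computer program is stored, characterized in that, when executed by a processor, the program implements the method according to any one of claims 1 to 9.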
Priority Applications (10)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| ES19941307T ES2942363T3 (es) | 2019-08-14 | 2019-08-14 | Método y dispositivo para determinar la concentración de ácido nucleico fetal en la sangre de una embarazada |
| PCT/CN2019/100629 WO2021026828A1 (zh) | 2019-08-14 | 2019-08-14 | 确定孕妇血液中胎儿核酸浓度的方法及设备 |
| HUE19941307A HUE061561T2 (hu) | 2019-08-14 | 2019-08-14 | Eljárás és berendezés magzati nukleinsav-koncentráció meghatározására várandós nõ vérében |
| DK19941307.1T DK3916105T3 (da) | 2019-08-14 | 2019-08-14 | Fremgangsmåde og indretning til bestemmelse af en føtal nukleinsyrekoncentration i blodet af en gravid kvinde |
| MYPI2021007517A MY205773A (en) | 2019-08-14 | 2019-08-14 | Method and device for determining fetal nucleic acid concentration in maternal plasma |
| EP19941307.1A EP3916105B1 (en) | 2019-08-14 | 2019-08-14 | Method and device for determining fetal nucleic acid concentration in blood of pregnant woman |
| CN201980094271.9A CN113874523B (zh) | 2019-08-14 | 2019-08-14 | 确定孕妇血液中胎儿核酸浓度的方法及设备 |
| PL19941307.1T PL3916105T3 (pl) | 2019-08-14 | 2019-08-14 | Metoda i urządzenie do oznaczania stężenia płodowego kwasu nukleinowego we krwi kobiety w ciąży |
| IL289007A IL289007A (en) | 2019-08-14 | 2021-12-14 | Method and device for determining the concentration of nucleic acids in maternal plasma |
| SA521431156A SA521431156B1 (ar) | 2019-08-14 | 2021-12-19 | طريقة وجهاز لتحديد تركيز الحمض النووي للجنين في بلازما دم الأم |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/CN2019/100629 WO2021026828A1 (zh) | 2019-08-14 | 2019-08-14 | 确定孕妇血液中胎儿核酸浓度的方法及设备 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2021026828A1 true WO2021026828A1 (zh) | 2021-02-18 |
Family
ID=74570317
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/CN2019/100629 Ceased WO2021026828A1 (zh) | 2019-08-14 | 2019-08-14 | 确定孕妇血液中胎儿核酸浓度的方法及设备 |
Country Status (10)
| Country | Link |
|---|---|
| EP (1) | EP3916105B1 (zh) |
| CN (1) | CN113874523B (zh) |
| DK (1) | DK3916105T3 (zh) |
| ES (1) | ES2942363T3 (zh) |
| HU (1) | HUE061561T2 (zh) |
| IL (1) | IL289007A (zh) |
| MY (1) | MY205773A (zh) |
| PL (1) | PL3916105T3 (zh) |
| SA (1) | SA521431156B1 (zh) |
| WO (1) | WO2021026828A1 (zh) |
Cited By (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN113889189A (zh) * | 2021-10-14 | 2022-01-04 | 武汉蓝沙医学检验实验室有限公司 | 以生父和母亲dna评估胎儿dna浓度的方法及应用 |
| CN114171116A (zh) * | 2021-10-14 | 2022-03-11 | 武汉蓝沙医学检验实验室有限公司 | 孕妇游离及本身dna评估胎儿dna浓度的方法及应用 |
| WO2024140881A1 (zh) * | 2022-12-30 | 2024-07-04 | 深圳市真迈生物科技有限公司 | 胎儿dna浓度的确定方法及装置 |
Citations (9)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20020137088A1 (en) * | 1989-11-13 | 2002-09-26 | Children's Medical Center Corporation | Non-invasive method for isolation and detection of fetal DNA |
| US20040146883A1 (en) * | 2003-01-28 | 2004-07-29 | Affymetrix, Inc. | Methods for prenatal diagnosis |
| CN102753703A (zh) * | 2010-04-23 | 2012-10-24 | 深圳华大基因科技有限公司 | 胎儿染色体非整倍性的检测方法 |
| CN104120181A (zh) * | 2011-06-29 | 2014-10-29 | 深圳华大基因医学有限公司 | 对染色体测序结果进行gc校正的方法及装置 |
| CN104232777A (zh) * | 2014-09-19 | 2014-12-24 | 天津华大基因科技有限公司 | 同时确定胎儿核酸含量和染色体非整倍性的方法及装置 |
| CN105189787A (zh) * | 2013-05-09 | 2015-12-23 | 豪夫迈·罗氏有限公司 | 使用hla标志物测定母体血液中的胎儿dna的分数的方法 |
| WO2016084079A1 (en) * | 2014-11-24 | 2016-06-02 | Shaare Zedek Medical Center | Fetal haplotype identification |
| CN107133491A (zh) * | 2017-03-08 | 2017-09-05 | 广州市达瑞生物技术股份有限公司 | 一种获取胎儿游离dna浓度的方法 |
| JP2017192383A (ja) * | 2016-04-19 | 2017-10-26 | 学校法人藤田学園 | 胎児成分の検出方法 |
Family Cites Families (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| EP3783110B1 (en) * | 2009-11-05 | 2022-11-23 | The Chinese University Of Hong Kong | Fetal genomic analysis from a maternal biological sample |
| CN109971846A (zh) * | 2018-11-29 | 2019-07-05 | 时代基因检测中心有限公司 | 使用双等位基因snp靶向下一代测序的非侵入性产前测定非整倍体的方法 |
-
2019
- 2019-08-14 PL PL19941307.1T patent/PL3916105T3/pl unknown
- 2019-08-14 WO PCT/CN2019/100629 patent/WO2021026828A1/zh not_active Ceased
- 2019-08-14 MY MYPI2021007517A patent/MY205773A/en unknown
- 2019-08-14 DK DK19941307.1T patent/DK3916105T3/da active
- 2019-08-14 EP EP19941307.1A patent/EP3916105B1/en active Active
- 2019-08-14 ES ES19941307T patent/ES2942363T3/es active Active
- 2019-08-14 HU HUE19941307A patent/HUE061561T2/hu unknown
- 2019-08-14 CN CN201980094271.9A patent/CN113874523B/zh active Active
-
2021
- 2021-12-14 IL IL289007A patent/IL289007A/en unknown
- 2021-12-19 SA SA521431156A patent/SA521431156B1/ar unknown
Non-Patent Citations (1)
| Title |
|---|
| See also references of EP3916105A4 * |
Also Published As
| Publication number | Publication date |
|---|---|
| ES2942363T3 (es) | 2023-05-31 |
| SA521431156B1 (ar) | 2023-12-03 |
| MY205773A (en) | 2024-11-12 |
| PL3916105T3 (pl) | 2023-06-26 |
| EP3916105A1 (en) | 2021-12-01 |
| EP3916105A4 (en) | 2022-04-06 |
| DK3916105T3 (da) | 2023-04-17 |
| EP3916105B1 (en) | 2023-01-25 |
| CN113874523A (zh) | 2021-12-31 |
| CN113874523B (zh) | 2024-04-30 |
| IL289007A (en) | 2022-02-01 |
| HUE061561T2 (hu) | 2023-07-28 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN102770558B (zh) | | Analysis of the fetal genome from a maternal biological sample |
| KR102018444B1 (ko) | | Method and apparatus for determining the fraction of cell-free nucleic acids in a biological sample, and use thereof |
| EP2851431B1 (en) | | Method, system and computer readable medium for determining base information in predetermined area of fetus genome |
| IL265769B2 (en) | | Estimation of gestational age using methylation and size profile of maternal plasma DNA |
| EP3564391B1 (en) | | Method, device and kit for detecting fetal genetic mutation |
| US20130338012A1 (en) | | Genetic risk factors of sick sinus syndrome |
| CN105648045B (zh) | | Method and device for determining the haplotype of a fetal target region |
| TW202328458A (zh) | | Mutation analysis of plasma DNA for cancer detection |
| TWI675918B (zh) | | Haplotype-based universal non-invasive prenatal testing for single-gene diseases |
| WO2021026828A1 (zh) | | Method and device for determining fetal nucleic acid concentration in the blood of a pregnant woman |
| TWI767888B (zh) | | Accurate quantification of fetal DNA content by shallow-depth sequencing of maternal plasma DNA |
| US20230383349A1 (en) | | Methods of assessing risk of developing a disease |
| GB2559437A (en) | | Prenatal screening and diagnostic system and method |
| Valsesia et al. | | Identification and validation of copy number variants using SNP genotyping arrays from a large clinical cohort |
| US11869630B2 (en) | | Screening system and method for determining a presence and an assessment score of cell-free DNA fragments |
| CN116052766A (zh) | | Method, system and electronic device for detecting chromosomal regions of homozygosity |
| HK40063849B (zh) | | Method and device for determining fetal nucleic acid concentration in the blood of a pregnant woman |
| HK40063849A (zh) | | Method and device for determining fetal nucleic acid concentration in the blood of a pregnant woman |
| WO2016112539A1 (zh) | | Method and device for determining fetal nucleic acid content |
| Berdnikova et al. | | Genotype imputation in human genomic studies |
| Du et al. | | Unique dual indexing PCR reduces chimeric contamination and improves mutation detection in cell-free DNA of pregnant women |
| JP2009069911A (ja) | | Gene association analysis device and gene association analysis program |
| WO2024242641A1 (en) | | Method for detection of samples with insufficient amount of fetal and circulating tumor DNA fragments for non-invasive genetic testing |
| WO2025096464A1 (en) | | Estimation of circulating tumor fraction using off-target reads of targeted-panel sequencing |
| WO2025154081A1 (en) | | Methods for non-invasive prenatal testing of expansion mutations |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 19941307; Country of ref document: EP; Kind code of ref document: A1 |
| | ENP | Entry into the national phase | Ref document number: 2019941307; Country of ref document: EP; Effective date: 20210813 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | WWG | WIPO information: grant in national office | Ref document number: 521431156; Country of ref document: SA |














