WO2020041946A1 - Method and device for detecting homologous sequences based on high-throughput sequencing - Google Patents
- Publication number
- WO2020041946A1 (PCT/CN2018/102546)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- sequence
- homologous
- throughput sequencing
- reads
- homologous sequence
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
Definitions
- the invention relates to the technical field of bioinformatics, in particular to a method and a device for detecting homologous sequences based on high-throughput sequencing.
- Homologous genes are two or more genes with sequence similarity greater than 80%. In high-throughput sequencing data, reads from homologous gene regions cannot be placed unambiguously during alignment and map to multiple locations. In most cases, such reads cannot accurately reflect the bases at the target position. Correct alignment of homologous genes is therefore difficult, and existing analysis pipelines cannot be used directly to call mutations from the raw sequencing data. For example, clinical practice needs a high-throughput method for identifying the RHD blood type. Because the RHD gene has a highly homologous counterpart, the RHCE gene (96% similarity), the raw sequencing data cannot be aligned correctly to the RHD gene. As a result, some genetic diseases that involve homologous genes cannot be detected accurately.
- The Rh blood group D antigen is the main erythrocyte antigen that causes severe hemolytic disease of the newborn; an individual is defined as Rh-positive or Rh-negative according to whether the D antigen is expressed on the erythrocyte membrane.
- The gene encoding the D antigen is the RHD gene.
- An Rh-positive individual has one RHD gene [RHD heterozygote, RHD(+)/RHD(-)] or two RHD genes [RHD homozygote, RHD(+)/RHD(+)]. Rh-negative individuals lack the RHD gene [RHD deletion homozygote, RHD(-)/RHD(-)], and some complex Rh-negative individuals carry gene fusions or lack some exons.
- RHD genotype or RHD zygosity has mainly been inferred from the phenotype of the Rh small factor, through indirect methods such as complex family surveys, or from the amount of RhD antigen.
- A direct measurement technique is the restriction fragment length polymorphism (RFLP) method. This method uses a pair of PCR primers to amplify the downstream and fusion Rh boxes simultaneously and then digests the products with restriction endonucleases. It can determine the three RHD zygosity types in a single experiment, but the method is complicated and time-consuming, and it targets the RHD-negative type peculiar to Caucasians.
- Hearing and speech disabilities rank first among all types of disabilities. About 1 to 3 of every 1,000 newborns each year are deaf, and 60% of these cases are related to genetic factors. There are 27.8 million hearing-impaired people in China, about 78 million carriers of deafness mutations, and about 4 million people with drug-sensitive mutations (the high-risk group). Nearly half of the new cases of childhood deafness in China each year are drug-induced. Drug-induced deafness is also maternally inherited within families. Therefore, detecting drug-induced deafness genes early, avoiding hearing damage from medication, and protecting family members and children from medication-induced deafness are urgent problems. The gene related to drug-induced deafness is CYP2D6.
- CYP2D6 has a homologous gene, CYP2D7, with 94% similarity, so existing analysis and alignment algorithms have difficulty determining whether sequencing reads originate from CYP2D6 or CYP2D7. A method is therefore needed to distinguish CYP2D6 reads from CYP2D7 reads so that mutations on CYP2D6 can be detected accurately.
- the invention provides a method and a device for detecting homologous sequences based on high-throughput sequencing, which can solve the problem of accurate positioning of the source of homologous sequences and achieve the purpose of accurately detecting mutations.
- an embodiment provides a method for detecting homologous sequences based on high-throughput sequencing, including:
- the number of reads aligned to the reference sequence from which the other homologous sequence has been removed is counted, and the homologous sequence information of the sample is determined according to the number of reads.
- the specific amplification product is a product obtained by specific amplification using multiple pairs of primers targeting multiple regions of a pair of homologous sequences.
- the aforementioned homologous sequence is a homologous gene.
- the aforementioned homologous genes are RHD and RHCE genes, or CYP2D6 and CYP2D7 genes.
- the specific amplification product is a product obtained by specific amplification using multiple pairs of primers targeting multiple exon regions of a homologous gene.
- sequence-specific site is a single nucleotide variation (SNV) site.
- the reference sequence from which the other homologous sequence has been removed is a reference sequence in which the other homologous sequence is completely replaced with N bases.
- determining the homologous sequence information of the sample according to the number of reads specifically means determining the mutation information of the homologous sequence of the sample according to the number of reads.
- determining the homologous sequence information of the sample according to the number of reads specifically means determining whether the homologous sequence is normal, deleted, or duplicated in the genome, according to the difference between the numbers of reads aligned to the two homologous sequences.
- the alignment result is corrected according to the CIGAR value to accurately distinguish the high-throughput sequencing result according to the sequence-specific site.
- the results of the alignment are corrected according to the CIGAR value.
- the method further includes: removing the linker sequences at both ends of the high-throughput sequencing result.
- the method further includes:
- an embodiment provides a device for detecting homologous sequences based on high-throughput sequencing, including:
- an obtaining unit, configured to obtain a high-throughput sequencing result of specific amplification products of a pair of homologous sequences of a sample, where the specific amplification products include at least one sequence-specific site for distinguishing the homologous sequences;
- a first alignment unit, configured to align the high-throughput sequencing results with a reference sequence and divide them into two groups according to the sequence-specific sites, each group belonging to one homologous sequence;
- a second alignment unit, configured to align each group of high-throughput sequencing results belonging to one homologous sequence with a reference sequence from which the other homologous sequence has been removed;
- a statistics unit, configured to count the number of reads aligned to the reference sequence from which the other homologous sequence has been removed, and to determine the homologous sequence information of the sample according to the number of reads.
- an embodiment provides a computer-readable storage medium including a program that can be executed by a processor to implement the method as in the first aspect.
- homologous sequences are distinguished by sequence-specific sites, and the high-throughput sequencing results of specific amplification products derived from the homologous sequences are divided into two groups. Each group is then aligned separately to a reference sequence from which the other homologous sequence has been removed, which gives the number of sequencing reads belonging to each of the homologous sequences, so that the source of each read is distinguished accurately and mutations can then be detected accurately.
- FIG. 1 is a flowchart of a method for detecting a homologous sequence based on high-throughput sequencing according to an embodiment of the present invention
- FIG. 2 is a structural block diagram of a device for detecting homologous sequences based on high-throughput sequencing according to an embodiment of the present invention
- FIG. 3 is a flowchart of RHD blood group identification and analysis according to an embodiment of the present invention.
- FIG. 4 is a diagram of a result of detecting primer uniformity in RHD blood group identification according to an embodiment of the present invention
- FIG. 5 is a statistical diagram of the number of sequencing reads that distinguish specific sites to different genes in RHD blood group identification according to an embodiment of the present invention
- FIG. 6 is a flowchart of CYP2D6 gene detection of drug-induced deafness according to an embodiment of the present invention
- FIG. 7 is a diagram showing a result of detecting primer homogeneity in CYP2D6 gene detection of drug-induced deafness according to an embodiment of the present invention.
- FIG. 8 is a statistical diagram of the number of sequencing reads that distinguish specific sites to different genes in the detection of CYP2D6 gene for drug-induced deafness in the embodiment of the present invention.
- the present invention provides a method for detecting homologous sequences based on high-throughput sequencing, which comprises using specific primers to amplify a pair (2) of homologous sequences, where the region amplified by the primers contains at least one sequence-specific site used to distinguish the pair of homologous sequences.
- the first alignment is used to find the possible location of the sequencing reads.
- the sequence-specific sites are used to distinguish the different homologous sequences and thereby separate the sequencing reads.
- the separated reads are then realigned to locate them precisely, and mutations are detected from the realignment result.
- an embodiment provides a method for detecting homologous sequences based on high-throughput sequencing, including:
- S101 Obtain a high-throughput sequencing result of a specific amplification product of a pair of homologous sequences of a sample.
- the specific amplification product includes at least one sequence-specific site for distinguishing the homologous sequences.
- the “sample” refers to a sample targeted by the detection method of the present invention, which may be a clinical sample, including a healthy person sample and a patient sample, such as blood and cerebrospinal fluid samples derived from a healthy person or patient. Nucleic acid (such as DNA) is extracted from these samples by techniques known in the art to obtain nucleic acid sequence fragments, target fragments are amplified with primers that specifically target the homologous sequences to obtain specific amplification products, and the products are sequenced to obtain high-throughput sequencing results.
- the sequencing platform is not limited, and may be any second-generation high-throughput sequencing platform, including but not limited to Illumina, Ion Torrent, BGISEQ or MGISEQ sequencing platform, and the like.
- the “homologous sequence” generally refers to a sequence having a sequence similarity greater than 80%, but this is only an exemplary way of defining a homologous sequence.
- the type of the homologous sequence is not limited, and may be a homologous gene sequence (for example, a sequence containing a readable coding frame), or a non-genetic type homologous sequence. Examples of typical but non-limiting homologous sequences are the RHD and RHCE genes, and the CYP2D6 and CYP2D7 genes.
- the “specific amplification product” is obtained by specifically amplifying a corresponding position of a homologous sequence by a specific amplification primer.
- the specific amplification product is the result of multiplex amplification, that is, the product obtained by specific amplification using multiple pairs of primers targeting multiple regions of a pair of homologous sequences.
- specific amplification is performed using multiple pairs of primers targeting multiple exon regions of the homologous gene to obtain a specific amplification product.
- the sequence-specific site refers to a site at which a pair of homologous sequences differ from each other at corresponding positions, and the sequence-specific site contains at least 1 bp of specific base, such as a single nucleotide variation (SNV) site.
- the sequence-specific site may also be a base insertion, deletion, or copy number variation.
- the high-throughput sequencing result refers to the raw sequencing data after certain preprocessing, for example, removing the adapter sequences at both ends of the sequencing reads, which can improve the alignment accuracy and the validity of the data.
- S102 Align the high-throughput sequencing results with a reference sequence, and divide the high-throughput sequencing results into two groups based on sequence-specific sites, each group belonging to a homologous sequence.
- the “reference sequence” generally refers to the genomic sequence of the species corresponding to the homologous sequence, such as the human reference genome sequence, and especially the human reference genome hg19.
- the base type at the sequence-specific sites can be used as a marker to distinguish high-throughput sequencing results.
- multiple (possibly thousands of) sequencing reads derived from a pair of homologous sequences are classified into each homologous sequence type.
- the alignment result is corrected based on the CIGAR value to accurately distinguish the high-throughput sequencing result according to the sequence-specific site.
- the CIGAR value correction in the present invention can improve the accuracy of sequencing reads distinguished into their respective groups, and is therefore a preferred embodiment.
- the purpose of removing the other homologous sequence from the reference sequence is to obtain the absolute position at which one homologous sequence (such as the RHD gene) aligns to the reference sequence, and to prevent reads from aligning to the other homologous sequence (such as the RHCE gene) and thereby receiving a wrong alignment.
- the other homologous sequence can be removed from the reference sequence by replacing the entire homologous sequence with N bases.
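The N-base replacement described above can be illustrated with a minimal Python sketch. The function name `mask_region`, the 0-based half-open coordinates, and the toy sequence are assumptions made for illustration; the source does not specify how the masking is implemented or which coordinates are masked.

```python
def mask_region(reference: str, start: int, end: int) -> str:
    """Replace reference[start:end] with N bases.

    Coordinates are 0-based and end-exclusive (an assumption of this sketch).
    The length of the reference is unchanged, so positions outside the
    masked region keep their absolute coordinates.
    """
    if not 0 <= start <= end <= len(reference):
        raise ValueError("region lies outside the reference sequence")
    return reference[:start] + "N" * (end - start) + reference[end:]


# Toy reference; a real reference would be a chromosome of hg19.
toy_reference = "ACGTACGTACGTACGT"
masked = mask_region(toy_reference, 4, 10)
print(masked)
```

Because the masked interval keeps its length, reads that align to the remaining homologous sequence still report the same absolute positions as in the unmasked reference.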
- in step S103, the alignment result is corrected according to the CIGAR value, which improves the alignment accuracy of sequencing reads and is therefore a preferred embodiment.
- considering the influence of abnormal sequencing results on subsequent statistical accuracy, the alignment in step S103 may be followed by any one or more of the following: (a) filtering out results in which the insert fragment length is greater than a preset value (e.g., 500) or in which the two ends of a read pair align to different chromosomes; (b) removing base sites whose sequencing quality value is lower than a preset value (e.g., 10). Performing the statistics in step S104 after this processing can improve the accuracy of the results.
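Filter (a) described for step S103 can be sketched as follows. The `AlignedPair` record and the `keep_pair` function are hypothetical names introduced for this example; only the two criteria (insert length above a preset value such as 500, or the two ends aligned to different chromosomes) come from the text.

```python
from dataclasses import dataclass


@dataclass
class AlignedPair:
    """Minimal, hypothetical summary of one aligned read pair."""
    chrom_r1: str      # chromosome of read 1
    chrom_r2: str      # chromosome of read 2
    insert_size: int   # signed insert length, as reported by the aligner


def keep_pair(pair: AlignedPair, max_insert: int = 500) -> bool:
    """Return True if the pair passes filter (a)."""
    if pair.chrom_r1 != pair.chrom_r2:
        return False  # the two ends align to different chromosomes
    return abs(pair.insert_size) <= max_insert


pairs = [
    AlignedPair("chr1", "chr1", 180),
    AlignedPair("chr1", "chr1", -820),
    AlignedPair("chr1", "chr22", 0),
]
kept = [p for p in pairs if keep_pair(p)]
print(len(kept))
```

The default of 500 mirrors the example value in the text; in practice the threshold would be chosen from the expected amplicon length.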
- S104 Count the number of reads of the reference sequence from which another homologous sequence has been removed, and determine the homologous sequence information of the sample according to the number of reads.
- the number of reads aligned to the reference sequence reflects how a homologous sequence is present in the genome, for example its dosage. Therefore, by counting the reads aligned to the reference sequence from which the other homologous sequence has been removed, the homologous sequence information of the sample can be determined, such as mutation information of the homologous sequence, or whether the homologous sequence is normal, deleted, or duplicated in the genome.
- an embodiment of the present invention provides a device for detecting homologous sequences based on high-throughput sequencing, including: an obtaining unit 201, used to obtain high-throughput sequencing results of specific amplification products of a pair of homologous sequences of a sample, the specific amplification products containing at least one sequence-specific site for distinguishing the homologous sequences; a first alignment unit 202, configured to align the high-throughput sequencing results with a reference sequence and divide them into two groups according to the sequence-specific sites, each group belonging to one homologous sequence; a second alignment unit 203, used to align each group of high-throughput sequencing results belonging to one homologous sequence with a reference sequence from which the other homologous sequence has been removed; and a statistics unit 204, used to count the number of reads aligned to the reference sequence from which the other homologous sequence has been removed.
- an embodiment of the present invention provides a computer-readable storage medium including a program, which can be executed by a processor to implement the method for detecting a homologous sequence based on the high-throughput sequencing of the present invention.
- the program may be stored in a computer-readable storage medium.
- the storage medium may include: a read-only memory, a random access memory, a magnetic disk, an optical disk, a hard disk, etc.
- the computer executes the program to realize the above functions.
- the program is stored in the memory of the device, and when the processor executes the program in the memory, all or part of the functions described above can be implemented.
- the program may also be stored in a storage medium such as a server, another computer, a magnetic disk, an optical disk, a flash disk, or a mobile hard disk, and saved by downloading or copying.
- 10 pairs of primers are designed for the 10 exons of the RHD gene and the RHCE gene. Each pair of primers amplifies one exon region of both the RHD gene and the RHCE gene, so 20 products are obtained by amplification. The 20 products are aligned and classified: the source of each sequencing read is distinguished accurately according to the specific sites, the reads are then aligned again, and the coverage depth of sequencing reads on all exons is calculated from the realignment results. The coverage of a given exon is compared between the RHD gene and the RHCE gene to determine whether that exon of the RHD gene is deleted or duplicated.
- This embodiment includes an experimental part and a biological information analysis part.
- the experimental part includes: designing specific primers for homologous genes and performing multiplex PCR amplification to complete the preparation of high-throughput sequencing libraries.
- the homologous gene regions obtained by primer amplification contain at least 1 bp differential sequences to distinguish homologous gene sequences.
- the biological information analysis part includes: aligning the sequences amplified by the primers; finding the possible positions of the sequences through the first alignment; using the distinguishing sites to separate the sequences of the different homologous genes; realigning the separated sequences; detecting mutations from the realignment results; and comparing the coverage depth of sequencing reads over a homologous region to determine whether that region contains deletions or duplications. Because mutation detection is performed on precisely located sequencing reads, genetic disease mutations can be detected accurately.
- the RHD blood group identification analysis process includes:
- the RHD gene and the RHCE gene each contain 10 exon regions, and 10 pairs of primers are designed to amplify the exon sequences corresponding to the RHD gene and the RHCE gene, respectively.
- Each exon sequence contains at least 1 bp of specific base, which is used to accurately distinguish the RHD gene from the RHCE gene.
- a second-generation sequencing library was obtained by PCR amplification, and sequencing reads were obtained by high-throughput sequencing for each of the 10 exons.
- the primers are processed according to the starting position of the primer, the starting position of the sequence, and the length of the primer to ensure that the primer sequence is removed most accurately, thereby retaining the most accurate true sequence information.
- Target region primer design: primers are designed for the RHD gene. 10 pairs of primers cover the 10 exon regions of the RHD gene, and each pair simultaneously amplifies the same homologous exon of RHD and RHCE. Each amplification product contains at least a 1 bp specific sequence, which is used after sequencing to distinguish the source of the read. The primer sequences are shown in Table 1.
- the specific primer pool 1 is obtained by mixing the above-mentioned primers in equal molar numbers.
- the PCR amplification enzyme uses the KAPA2G Fast Multiplex PCR Kit product (Cat. No. KK5801) from the American kapa company:
- the amplification system is shown in Table 3:
| Step | Temperature | Time |
|---|---|---|
| 1 | 98 °C | 2 min |
| 2 | 98 °C | 10 s |
| 3 | 62 °C | 2 min |
| 4 | 72 °C | 30 s |
| 5 | Repeat steps 2-4 | 15 cycles |
| 6 | 72 °C | 5 min |
- Agencourt AMPure XP magnetic beads (American Beckman Coulter Co., Ltd.) were added in 1 volume, and purified according to the instructions. After purification, the DNA was dissolved with 20 µl of distilled water.
- the PCR amplifying enzyme used KAPA2G Fast Multiplex PCR Kit product (Cat. No. KK5801) from Kapa Company, USA.
| Step | Temperature | Time |
|---|---|---|
| 1 | 98 °C | 2 min |
| 2 | 98 °C | 10 s |
| 3 | 62 °C | 2 min |
| 4 | 72 °C | 30 s |
| 5 | Repeat steps 2-4 | 15 cycles |
| 6 | 72 °C | 5 min |
- Agencourt AMPure XP magnetic beads (American Beckman Coulter Co., Ltd.) were added in 1 volume, and purified according to the instructions. After purification, the DNA was dissolved with 20 µl of distilled water.
- the BGISEQ-500 platform was used for sequencing, and the sequencing type was paired-end 50 bp.
- the target sequence was aligned to the human reference genome hg19 by the BWA-ALN algorithm.
- the bwa version was 0.7.15 and the samtools version was 0.1.18.
- the purpose is to convert the FLAG value in the second column from its numerical form to its letter form, which can be used to distinguish R1 from R2 and, in turn, to calculate the degree of primer enrichment for each target region.
- the sequence is considered to be the data of the target region.
- The CIGAR value is interpreted relative to the reference genome as follows:
- 3M1I46M: the first 3 bases of the read align to the reference genome, the 4th base is an extra (inserted) base, and alignment resumes from the 5th base, so the 4th base needs to be deleted;
- 3M1D47M: the first 3 bases of the read align to the reference genome, one base is missing at the 4th position, and alignment resumes from the 4th base of the read, so the letter D needs to be added at the 4th position;
- 48M2S: the first 48 bases of the read align to the reference genome and the last 2 bases do not, so the last 2 bases need to be deleted;
- 3S47M: the first 3 bases of the read do not align to the reference genome and alignment starts from the 4th base, so the first 3 bases need to be deleted.
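The four CIGAR cases above can be expressed as one small routine. This is a minimal sketch under stated assumptions: the function name `correct_by_cigar` is hypothetical, only the operations M, =, X, I, D, S and H are handled, and a deleted base is represented by the letter D as in the text.

```python
import re

# One CIGAR element: a length followed by an operation letter.
CIGAR_ELEMENT = re.compile(r"(\d+)([MIDNSHP=X])")


def correct_by_cigar(seq: str, cigar: str) -> str:
    """Return the read placed in reference coordinates.

    Matched bases are kept, inserted and soft-clipped bases are dropped,
    and each deleted reference base is filled with the letter "D".
    """
    corrected = []
    pos = 0  # current offset in the read sequence
    for length, op in CIGAR_ELEMENT.findall(cigar):
        n = int(length)
        if op in "M=X":            # aligned bases: keep them
            corrected.append(seq[pos:pos + n])
            pos += n
        elif op in "IS":           # insertion or soft clip: delete from the read
            pos += n
        elif op == "D":            # deletion: add a placeholder per missing base
            corrected.append("D" * n)
        elif op == "H":            # hard clip: bases are already absent
            continue
        else:
            raise ValueError(f"unsupported CIGAR operation: {op}")
    return "".join(corrected)


print(correct_by_cigar("ACGTTTTT", "3M1I4M"))
print(correct_by_cigar("ACGTTTTT", "3M1D5M"))
```

After this correction, the base at offset `k` of the returned string corresponds to reference position `POS + k`, which is what makes a lookup at a fixed genomic site straightforward.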
- for example, suppose a read covers position chr1:25599086. If the read base aligned to this position is G, the read is considered to belong to the RHD gene; if the base is C, the read is considered to belong to the RHCE gene.
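The site-based assignment can be sketched in a few lines of Python. The position chr1:25599086 and the G/C rule come from the text; the function name `classify_read`, the tuple layout of the site, and the assumption that the read sequence is already in reference coordinates (inserted and clipped bases removed, deleted bases filled) with a 1-based leftmost start position are illustrative choices.

```python
# Site from the text: G at chr1:25599086 indicates RHD, C indicates RHCE.
RHD_RHCE_SITE = ("chr1", 25599086, {"G": "RHD", "C": "RHCE"})


def classify_read(chrom, start, ref_coord_seq, site=RHD_RHCE_SITE):
    """Assign a read to a gene by its base at the sequence-specific site.

    `start` is the 1-based leftmost aligned position, and `ref_coord_seq`
    is the read sequence in reference coordinates. Returns the gene name,
    or None if the read does not cover the site or carries another base.
    """
    site_chrom, site_pos, base_to_gene = site
    offset = site_pos - start
    if chrom != site_chrom or not 0 <= offset < len(ref_coord_seq):
        return None
    return base_to_gene.get(ref_coord_seq[offset])


print(classify_read("chr1", 25599080, "AAAAAAGTTT"))
print(classify_read("chr1", 25599080, "AAAAAACTTT"))
```

Reads that return None would be left unassigned in this sketch; how the source handles such reads is not stated.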
- the quality character of each base is converted to its decimal ASCII value, and the quality value is obtained by subtracting 33. If the value is less than 10, the base is replaced with an *.
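The quality conversion just described (ASCII value minus 33, bases below 10 replaced with `*`) can be written as a short function. The name `mask_low_quality` and its parameter names are hypothetical; the offset of 33 and the threshold of 10 are taken from the text.

```python
def mask_low_quality(seq: str, qual: str, min_quality: int = 10, offset: int = 33) -> str:
    """Replace every base whose quality value is below `min_quality` with "*".

    The quality value of a base is the ASCII code of its quality character
    minus `offset` (33, as stated in the text).
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    return "".join(
        base if ord(q) - offset >= min_quality else "*"
        for base, q in zip(seq, qual)
    )


# "I" -> 40, "#" -> 2, "5" -> 20, "+" -> 10
print(mask_low_quality("ACGT", "I#5+"))
```

Masking rather than deleting the base keeps the read length unchanged, so positions along the read still line up with the reference.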
- the reads in each target region were counted, and the difference in read number between the two genes was compared (Figure 5).
- the results showed no significant difference in depth of coverage between the 10 exons of RHD and those of RHCE in RHD homozygous positive individual 1.
- the 10 exons of RHD were normal, so the genotype can be judged to be homozygous RHD(+)/RHD(+);
- in RHD heterozygous positive individual 2, RHD exon coverage is about half that of RHCE, that is, half of the RHD dose is missing relative to RHCE, so the genotype can be judged to be heterozygous RHD(+)/RHD(-);
- in RHD-negative individual 3, the RHD exons have almost no sequencing read coverage compared with RHCE, so it can be judged that all 10 exons of the RHD gene are deleted and that the deletion is homozygous, RHD(-)/RHD(-).
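The three outcomes above amount to comparing RHD coverage with RHCE coverage exon by exon. The sketch below turns that comparison into a call; the function name `call_rhd_zygosity` and every numeric cut-off in it are illustrative assumptions, since the source only describes the ratios qualitatively (about equal, about half, almost none) and gives no thresholds.

```python
from statistics import mean


def call_rhd_zygosity(rhd_counts, rhce_counts,
                      absent_below=0.1, het_range=(0.3, 0.7), hom_range=(0.8, 1.2)):
    """Call RHD zygosity from per-exon read counts.

    The mean RHD/RHCE ratio over exons is compared with illustrative
    cut-offs (assumptions of this sketch, not values from the source).
    """
    ratios = [d / ce for d, ce in zip(rhd_counts, rhce_counts) if ce > 0]
    if not ratios:
        return "indeterminate"
    ratio = mean(ratios)
    if ratio < absent_below:
        return "RHD(-)/RHD(-)"
    if het_range[0] <= ratio <= het_range[1]:
        return "RHD(+)/RHD(-)"
    if hom_range[0] <= ratio <= hom_range[1]:
        return "RHD(+)/RHD(+)"
    return "indeterminate"


rhce = [100] * 10  # toy read counts for the 10 RHCE exons
print(call_rhd_zygosity([100] * 10, rhce))
print(call_rhd_zygosity([50] * 10, rhce))
print(call_rhd_zygosity([1] * 10, rhce))
```

Averaging over all exons only covers whole-gene dosage; detecting deletion or duplication of individual exons, which the text also mentions, would require inspecting the per-exon ratios separately.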
- homologous genes are captured based on multiplex PCR, and the next-generation sequencing and information analysis methods are used to accurately perform dose analysis on a homologous gene region.
- the differences in homologous sequences can be used to determine whether genes are missing or duplicated.
- There are several SNP sites in the CYP2D6 gene related to drug-induced deafness, and the base at each SNP site is related to drug metabolism. However, CYP2D6 has a homologous gene, CYP2D7, with 94% similarity, and PCR-based amplification can hardly avoid amplifying CYP2D7 as well.
- Five pairs of specific primers were used to test two samples (sample 1 and sample 2) whose base information at the drug-related sites was known. The amplified products were sequenced, the sequencing results were analyzed, and the specific sites were used to distinguish CYP2D6 reads from CYP2D7 reads; the base information at the drug-metabolism-related sites of CYP2D6 was then detected accurately from the separated reads.
- the detection process is shown in Figure 6.
- Target region primer design: primers are designed for the CYP2D6 gene. 5 pairs of primers cover 5 drug-related SNP sites of the CYP2D6 gene (see Table 11), and each pair simultaneously amplifies the same site of CYP2D6 and CYP2D7. Each amplification product contains at least a 1 bp specific sequence (Table 12), which is used after sequencing to distinguish the source of the read.
- Table 12 The absolute positions and corresponding base types used to distinguish CYP2D6 and CYP2D7
- the PCR amplifying enzyme used KAPA2G Fast Multiplex PCR Kit product (Cat. No. KK5801) from Kapa Company, USA.
- the specific primer pool 2 is shown in Table 14:
- the specific primer pool 2 is composed of an equal number of moles of the above primers.
- the amplification system is shown in Table 15 below:
| Step | Temperature | Time |
|---|---|---|
| 1 | 98 °C | 2 min |
| 2 | 98 °C | 10 s |
| 3 | 62 °C | 2 min |
| 4 | 72 °C | 30 s |
| 5 | Repeat steps 2-4 | 15 cycles |
| 6 | 72 °C | 5 min |
- the PCR amplifying enzyme used KAPA2G Fast Multiplex PCR Kit product (Cat. No. KK5801) from Kapa Company, USA.
- the general primers are shown in Table 5.
- the amplification system is shown in Table 17 below:
| Step | Temperature | Time |
|---|---|---|
| 1 | 98 °C | 2 min |
| 2 | 98 °C | 10 s |
| 3 | 62 °C | 2 min |
| 4 | 72 °C | 30 s |
| 5 | Repeat steps 2-4 | 15 cycles |
| 6 | 72 °C | 5 min |
- Agencourt AMPure XP magnetic beads (American Beckman Coulter Co., Ltd.) were added in 1 volume, and purified according to the instructions. After purification, the DNA was dissolved with 20 µl of distilled water.
- the BGISEQ-500 platform was used for sequencing, and the sequencing type was paired-end 50 bp.
- the target sequence was aligned to the human reference genome hg19 by the BWA-ALN algorithm.
- the bwa version was 0.7.15 and the samtools version was 0.1.18.
- the sequence is considered to be the data of the target region.
- the CIGAR value is used to correct the result of the comparison and restore the reads to the original state.
- the quality character of each base is converted to its decimal ASCII value, and the quality value is obtained by subtracting 33. If the value is less than 10, the base is replaced with an *.
- the reads in each target region were counted, and the difference in read number between the two genes was compared (Figure 8).
- the results show that in one sample, the number of sequencing reads that are distinguished into CYP2D6 and CYP2D7 is approximately the same.
- Table 19 shows the site detection results for the two samples.
- the results show that the sources of the sequencing reads are first distinguished by the gene-specific sites, and the base information at the target positions is then obtained from the separated reads. In both samples, the base information of the target sites was identified correctly.
Landscapes
- Chemical & Material Sciences (AREA)
- Organic Chemistry (AREA)
- Life Sciences & Earth Sciences (AREA)
- Zoology (AREA)
- Wood Science & Technology (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Health & Medical Sciences (AREA)
- Engineering & Computer Science (AREA)
- Microbiology (AREA)
- Immunology (AREA)
- Physics & Mathematics (AREA)
- Molecular Biology (AREA)
- Biotechnology (AREA)
- Biophysics (AREA)
- Analytical Chemistry (AREA)
- Biochemistry (AREA)
- Bioinformatics & Cheminformatics (AREA)
- General Engineering & Computer Science (AREA)
- General Health & Medical Sciences (AREA)
- Genetics & Genomics (AREA)
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
Abstract
The invention relates to a method and a device for detecting homologous sequences based on high-throughput sequencing, the method comprising: obtaining high-throughput sequencing results of specific amplification products of a pair of homologous sequences of a sample, the specific amplification products containing at least one sequence-specific site used to distinguish the homologous sequences; aligning the high-throughput sequencing results with a reference sequence, and dividing the high-throughput sequencing results into two groups according to the sequence-specific sites, each group belonging to one homologous sequence; aligning each group of high-throughput sequencing results belonging to one homologous sequence with a reference sequence from which the other homologous sequence has been removed; and counting the number of reads aligned to the reference sequence from which the other homologous sequence has been removed, and determining homologous sequence information of the sample according to the number of reads. The invention can solve the problem of accurately locating the source of homologous sequences and makes it possible to detect mutations accurately.
Priority Applications (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/CN2018/102546 WO2020041946A1 (fr) | 2018-08-27 | 2018-08-27 | Method and device for detecting homologous sequences based on high-throughput sequencing |
| CN201880096241.7A CN112513292B (zh) | 2018-08-27 | 2018-08-27 | Method and device for detecting homologous sequences based on high-throughput sequencing |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/CN2018/102546 WO2020041946A1 (fr) | 2018-08-27 | 2018-08-27 | Method and device for detecting homologous sequences based on high-throughput sequencing |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2020041946A1 (fr) | 2020-03-05 |
Family
ID=69642718
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/CN2018/102546 Ceased WO2020041946A1 (fr) | 2018-08-27 | 2018-08-27 | Method and device for detecting homologous sequences based on high-throughput sequencing |
Country Status (2)
| Country | Link |
|---|---|
| CN (1) | CN112513292B (fr) |
| WO (1) | WO2020041946A1 (fr) |
Cited By (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
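| CN115851895A (zh) * | 2022-12-23 | 2023-03-28 | 中国医学科学院输血研究所 | Method and kit for determining full-length RHD and RHCE mRNA based on third-generation sequencing |
| CN120340601A (zh) * | 2025-06-20 | 2025-07-18 | 杭州华大序风科技有限公司 | Gene variant detection method and device, electronic device, and storage medium |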
Families Citing this family (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
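| CN114457169A (zh) * | 2022-03-07 | 2022-05-10 | 南京鼓楼医院 | RhD genotype detection method based on high-throughput sequencing |
| CN114944188B (zh) * | 2022-05-19 | 2026-02-06 | 广州微远基因科技有限公司 | Sample homology determination model, method for establishing it, and its application |
| CN116959580B (zh) * | 2023-08-30 | 2025-09-19 | 予果生物科技(北京)有限公司 | Alignment method based on targeted high-throughput sequencing sequences and its application |
| CN118703609A (zh) * | 2024-04-08 | 2024-09-27 | 上海荷谱诊断技术有限公司 | Method, kit and system for simultaneously determining CYP2D6 gene polymorphism and copy number |
| CN119920328B (zh) * | 2025-03-31 | 2025-09-26 | 中国科学院微生物研究所 | Virus species identification method, identification system, device and medium |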
Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN102965367A (zh) * | 2012-12-04 | 2013-03-13 | 中国农业科学院棉花研究所 | Method for obtaining candidate plant disease-resistance gene sequences |
| CN105112569A (zh) * | 2015-09-14 | 2015-12-02 | 中国医学科学院病原生物学研究所 | Metagenomics-based method for detecting and identifying viral infection |
| WO2016023962A1 (fr) * | 2014-08-13 | 2016-02-18 | Progenika Biopharma S.A. | Consensus-based allele detection |
Family Cites Families (12)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2003056030A2 (fr) * | 2001-11-08 | 2003-07-10 | The Johns Hopkins University | Methods and systems for nucleic acid sequencing |
| CN104531883B (zh) * | 2015-01-14 | 2018-02-02 | 北京圣谷同创科技发展有限公司 | Detection kit and detection method for PKD1 gene mutations |
| CN106367475B (zh) * | 2015-07-23 | 2019-08-30 | 上海生物信息技术研究中心 | MMR gene mutation detection kit |
| CN105567830A (zh) * | 2016-01-29 | 2016-05-11 | 江汉大学 | Method for detecting transgenic components in plants |
| WO2017156290A1 (fr) * | 2016-03-09 | 2017-09-14 | Baylor College Of Medicine | Novel algorithm for SMN1 and SMN2 copy number analysis using depth-of-coverage data from next-generation sequencing |
| CN107653299A (zh) * | 2016-07-23 | 2018-02-02 | 成都十洲科技有限公司 | Method for obtaining gene chip probe sequences based on high-throughput sequencing |
| CN107688727B (zh) * | 2016-08-05 | 2020-07-14 | 深圳华大基因股份有限公司 | Method and device for biological sequence clustering and transcript isoform identification in full-length transcriptomes |
| CN106372459B (zh) * | 2016-08-30 | 2019-03-15 | 天津诺禾致源生物信息科技有限公司 | Method and device for copy number variation detection based on amplicon next-generation sequencing |
| CN106282356B (zh) * | 2016-08-30 | 2019-11-26 | 天津诺禾医学检验所有限公司 | Method and device for point mutation detection based on amplicon next-generation sequencing |
| WO2018112249A1 (fr) * | 2016-12-15 | 2018-06-21 | Illumina, Inc. | Methods and systems for determining paralogs |
| CN108103204B (zh) * | 2017-12-15 | 2018-10-02 | 东莞博奥木华基因科技有限公司 | Rh blood group typing method and device based on multiplex PCR and next-generation sequencing |
| CN108424907B (zh) * | 2018-05-09 | 2021-10-15 | 北京大学 | High-throughput method for precise multi-site base mutation of DNA |
2018
- 2018-08-27: WO PCT/CN2018/102546, patent WO2020041946A1 (fr), not active (Ceased)
- 2018-08-27: CN CN201880096241.7A, patent CN112513292B (zh), Active
Patent Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN102965367A (zh) * | 2012-12-04 | 2013-03-13 | 中国农业科学院棉花研究所 | Method for obtaining candidate plant disease-resistance gene sequences |
| WO2016023962A1 (fr) * | 2014-08-13 | 2016-02-18 | Progenika Biopharma S.A. | Consensus-based allele detection |
| CN105112569A (zh) * | 2015-09-14 | 2015-12-02 | 中国医学科学院病原生物学研究所 | Metagenomics-based method for detecting and identifying viral infection |
Cited By (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
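| CN115851895A (zh) * | 2022-12-23 | 2023-03-28 | 中国医学科学院输血研究所 | Method and kit for determining full-length RHD and RHCE mRNA based on third-generation sequencing |
| CN115851895B (zh) * | 2022-12-23 | 2025-03-21 | 中国医学科学院输血研究所 | Method and kit for determining full-length RHD and RHCE mRNA based on third-generation sequencing |
| CN120340601A (zh) * | 2025-06-20 | 2025-07-18 | 杭州华大序风科技有限公司 | Gene variant detection method and device, electronic device, and storage medium |
| CN120340601B (zh) * | 2025-06-20 | 2025-09-26 | 杭州华大序风科技有限公司 | Gene variant detection method and device, electronic device, and storage medium |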
Also Published As
| Publication number | Publication date |
|---|---|
| CN112513292B (zh) | 2023-12-26 |
| CN112513292A (zh) | 2021-03-16 |
Similar Documents
| Publication | Title |
|---|---|
| WO2020041946A1 (fr) | Method and device for detecting homologous sequences based on high-throughput sequencing |
| JP7637139B2 (ja) | Systems and methods for automating RNA expression calls in a cancer prediction pipeline |
| Aune et al. | Expression of long non-coding RNAs in autoimmunity and linkage to enhancer function and autoimmune disease risk genetic variants |
| Sheng et al. | Multi-perspective quality control of Illumina RNA sequencing data analysis |
| CN106715711B (zh) | Method for determining probe sequences and method for detecting genomic structural variation |
| TWI636255B (zh) | Mutational analysis of plasma DNA for cancer detection |
| Guo et al. | Three-stage quality control strategies for DNA re-sequencing data |
| US12106825B2 (en) | Computational modeling of loss of function based on allelic frequency |
| WO2015149034A9 (fr) | Gene fusions and gene variants associated with cancer |
| WO2015042980A1 (fr) | Method, system and computer-readable medium for determining SNP information in a predefined chromosomal region |
| Fang et al. | DNA methylation entropy is associated with DNA sequence features and developmental epigenetic divergence |
| US10787708B2 (en) | Method of identifying a gene associated with a disease or pathological condition of the disease |
| Kubiritova et al. | On the critical evaluation and confirmation of germline sequence variants identified using massively parallel sequencing |
| WO2020047694A1 (fr) | Method and device for determining the genetic status of a new mutation in an embryo |
| CN111508561A (zh) | Method for detecting homologous sequences and tandem repeats in homologous sequences, computer-readable medium, and application |
| Yamamoto et al. | Functional landscape of genome-wide postzygotic somatic mutations between monozygotic twins |
| WO2024137407A1 (fr) | DNA methylation entropy methods and targets |
| CN116287309A (zh) | Molecular marker, primers, PCR method and application for identifying reproductive disorders in geese |
| JP2020520679A (ja) | Method for generating a frequency distribution of background alleles for sequence analysis data obtained from cell-free nucleic acid, and method for detecting mutations from cell-free nucleic acid using the same |
| Borràs et al. | The use of transcriptomics in clinical applications |
| Meng | Ethics statement |
| Lobon Garcia et al. | Somatic mutations detected in Parkinson disease could affect genes with a role in synaptic and neuronal processes |
| Devine et al. | Xuefang Zhao, Ryan L. Collins, Wan-Ping Lee, 5 Alexandra M. Weber, 6, 7 Yukyung Jun, 5 Qihui Zhu, 5 Ben Weisburd, 2 Yongqing Huang, 8 Peter A. Audano, 9 Harold Wang, Mark Walker, 2, 3 Chelsea Lowther, Jack Fu, Human Genome Structural Variation Consortium, Mark B. Gerstein, 10 |
| Vermeulen | Improving and estimating Y chromosome loss in blood and brain tissues using high-throughput sequencing |
| Culibrk | Copy number variation in metastatic cancer: methods and analysis of somatic copy number variation in advanced human cancers |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 18931705 Country of ref document: EP Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 18931705 Country of ref document: EP Kind code of ref document: A1 |