WO2026006622A2 - Full-complement polynucleotide sequencing using nanopores - Google Patents
Full-complement polynucleotide sequencing using nanopores
Info
- Publication number
- WO2026006622A2 (PCT/US2025/035519)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- complement
- polynucleotide
- nanopore
- construct
- duplex
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6869—Methods for sequencing
Definitions
- This application generally relates to sequencing polynucleotides using nanopores.
- a significant amount of academic and corporate time and energy has been invested into using nanopores to sequence polynucleotides.
- the dwell time has been measured for complexes of DNA with the Klenow fragment (KF) of DNA polymerase I atop a nanopore in an applied electric field.
- a current or flux-measuring sensor has been used in experiments involving DNA captured in an α-hemolysin nanopore.
- KF-DNA complexes have been distinguished on the basis of their properties when captured in an electric field atop an α-hemolysin nanopore.
- polynucleotide sequencing is performed using a single polymerase enzyme complex including a polymerase enzyme and a template nucleic acid attached proximal to a nanopore, and nucleotide analogs in solution.
- the nucleotide analogs include charge blockade labels that are attached to the polyphosphate portion of the nucleotide analog such that the charge blockade labels are cleaved when the nucleotide analog is incorporated into a polynucleotide that is being synthesized.
- the charge blockade label is detected by the nanopore to determine the presence and identity of the incorporated nucleotide and thereby determine the sequence of a template polynucleotide.
- constructs include a transmembrane protein nanopore subunit and a nucleic acid handling enzyme.
- a DNA template-polymerase complex is formed on the first side of an α-hemolysin nanopore, and includes a DNA duplex and a polymerase.
- the DNA template includes abasic reporter nucleotides that initially are positioned on the second side of the α-hemolysin nanopore. While the polymerase is used to add nucleotides to the duplex based on the sequence of the DNA template, the ionic current through the nanopore is measured (I_EBS, where EBS refers to the enzyme bound state). As these nucleotides are added, the abasic reporter nucleotides are drawn towards and subsequently through the α-hemolysin, which causes changes in I_EBS.
- compositions, systems, and methods may not necessarily be sufficiently robust, reproducible, or sensitive and may not have sufficiently high throughput for practical implementation, e.g., demanding commercial applications such as genome sequencing in clinical and other settings that demand cost effective and highly accurate operation. Accordingly, what is needed are improved compositions, systems, and methods for sequencing polynucleotides.
- Some examples herein provide a method of sequencing a polynucleotide using a nanopore including a first side, a second side, and an aperture extending through the first and second sides.
- the method may include (a) generating a construct including the polynucleotide and a complement of the polynucleotide.
- the method may include (b) disposing the construct through the aperture of the nanopore such that a 3' end of the construct is on the first side of the nanopore, and a 5' end of the construct is on the second side of the nanopore.
- the method may include (c) forming a first duplex with the construct on the first side of the nanopore, the duplex including a 3' end.
- the method may include (d) characterizing the polynucleotide using operations including (i), (ii), (iii), and (iv).
- Operation (i) may include extending the first duplex on the first side of the nanopore by adding to the 3' end of the first duplex a first nucleotide that is complementary to a nucleotide of the polynucleotide.
- Operation (ii) may include inhibiting, using the nanopore, translocation of the 3' end of the extended first duplex to the second side of the nanopore.
- Operation (iii) may include measuring a value of an electrical property of the 3' end of the extended first duplex and a single-stranded portion of the polynucleotide.
- Operation (iv) may include repeating operations (i)-(iii) for a first plurality of nucleotides that are complementary to nucleotides of the polynucleotide to generate a first plurality of measured values.
- Some examples herein provide a method of sequencing a polynucleotide using a nanopore including a first side, a second side, and an aperture extending through the first and second sides. The method may include (a) generating a construct including the sense polynucleotide and a complement of an anti-sense of the polynucleotide.
- the method may include (b) disposing the construct through the aperture of the nanopore such that a 3' end of the construct is on the first side of the nanopore, and a 5' end of the construct is on the second side of the nanopore.
- the method may include (c) forming a first duplex with the construct on the first side of the nanopore, the duplex including a 3' end.
- the method may include (d) characterizing the polynucleotide using operations including (i), (ii), (iii), and (iv).
- Operation (i) may include extending the first duplex on the first side of the nanopore by adding to the 3' end of the first duplex a first nucleotide that is complementary to a nucleotide of the polynucleotide. Operation (ii) may include inhibiting, using the nanopore, translocation of the 3' end of the extended first duplex to the second side of the nanopore. Operation (iii) may include measuring a value of an electrical property of the 3' end of the extended first duplex and a single-stranded portion of the polynucleotide. Operation (iv) may include repeating operations (i)-(iii) for a first plurality of nucleotides that are complementary to nucleotides of the polynucleotide to generate a first plurality of measured values.
- the methods also may include (e) characterizing the complement using operations including (v), (vi), (vii), and (viii).
- Operation (v) may include extending the first duplex or a second duplex on the first side of the nanopore by adding to the 3' end of the first or second duplex a second nucleotide that is complementary to a nucleotide of the complement.
- Operation (vi) may include inhibiting, using the nanopore, translocation of the 3' end of the extended first or second duplex to the second side of the nanopore.
- Operation (vii) may include measuring a value of an electrical property of the 3' end of the extended first or second duplex and a single-stranded portion of the polynucleotide.
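- Operation (viii) may include repeating operations (v)-(vii) for a second plurality of nucleotides that are complementary to nucleotides of the complement to generate a second plurality of measured values.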
- the method also may include (f) generating a first sequence of the polynucleotide using the first plurality of measured values and the second plurality of measured values. In some examples, the method further includes repeatedly performing operations (c) through (f) to obtain a plurality of additional sequences of the polynucleotide, and generating a consensus sequence using the first sequence and the plurality of additional sequences.
- the method further includes repeatedly performing operations (c) through (f) to obtain additional first and second pluralities of measured values, and generating a consensus set of measured values using the additional first and second pluralities of measured values.
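The consensus step described above can be illustrated with a minimal sketch. It assumes the repeated reads have already been aligned to the same length and uses a simple per-position majority vote; the function name `consensus_sequence` and the example reads are hypothetical, and the source does not specify how the consensus is actually computed.

```python
from collections import Counter


def consensus_sequence(reads):
    """Majority-vote consensus over reads assumed to be pre-aligned and equal in length."""
    if not reads:
        raise ValueError("at least one read is required")
    length = len(reads[0])
    if any(len(read) != length for read in reads):
        raise ValueError("reads must be pre-aligned to the same length")

    consensus = []
    for column in zip(*reads):
        # Pick the most frequent base at this position; ties go to the base seen first.
        base, _count = Counter(column).most_common(1)[0]
        consensus.append(base)
    return "".join(consensus)


# Hypothetical repeated reads of the same polynucleotide, each with one isolated error.
reads = ["ACGTAC", "ACGTTC", "ACCTAC"]
print(consensus_sequence(reads))
```

A real pipeline would need an alignment step before voting, and could just as well average the measured values themselves, as the second bullet suggests.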
- the polynucleotide and the complement of the polynucleotide hybridize to one another on the second side of the nanopore to form a third duplex.
- the third duplex inhibits formation of transient secondary structures on the second side of the nanopore.
- the construct includes a hairpin or linker coupling the polynucleotide to the complement of the polynucleotide.
- the method further includes synthesizing the complement of the polynucleotide using the hairpin as a primer.
- the polynucleotide includes epigenetic marks. In some examples, the complement of the polynucleotide lacks any epigenetic marks.
- the complement of the polynucleotide includes epigenetic marks. In some examples, the polynucleotide lacks any epigenetic marks.
- generating the first sequence of the polynucleotide includes using the complement of the polynucleotide as a reference.
- the method further includes heating the construct to a temperature that inhibits formation of secondary structures on the first side of the nanopore.
- the method further includes heating the construct to a temperature that inhibits formation of secondary structures on the second side of the nanopore.
- the method further includes, after operation (iii) and before repeating operation (i), inhibiting hybridization between the polynucleotide and the complement of the polynucleotide on the first side of the nanopore by moving only a portion of the polynucleotide from the second side of the nanopore to the first side of the nanopore. Additionally, or alternatively, in some examples, the method further includes, after operation (vii) and before repeating operation (v), inhibiting hybridization between the polynucleotide and the complement of the polynucleotide on the first side of the nanopore by moving only a portion of the complement from the second side of the nanopore to the first side of the nanopore.
- the polynucleotide includes native RNA, and the complement of the polynucleotide includes synthetic cDNA. In some other examples, the polynucleotide includes synthetic cDNA, and the complement of the polynucleotide includes native RNA.
- the construct further includes a direct repeat of the RNA, and a complement of the direct repeat of the RNA.
- the method further includes characterizing the direct repeat of the RNA to generate a third plurality of measured values; and characterizing the complement of the direct repeat of the RNA to generate a fourth plurality of measured values.
- the first sequence of the polynucleotide further is generated using the third plurality of measured values and the fourth plurality of measured values.
- the polynucleotide includes native DNA, and the complement of the polynucleotide includes synthetic DNA. In some other examples, the polynucleotide includes synthetic DNA, and the complement of the polynucleotide includes native DNA.
- the polynucleotide includes a sense strand, and optionally a complement of the sense strand.
- the construct further may include an antisense strand that is complementary to the sense strand, and a complement of the antisense strand.
- the method further includes characterizing the antisense strand to generate a third plurality of measured values; and characterizing the complement of the antisense strand to generate a fourth plurality of measured values.
- the first sequence of the polynucleotide further may be generated using the third plurality of measured values and the fourth plurality of measured values.
- operation (d) is performed before operation (e). In some other examples, operation (e) is performed before operation (d).
- Some examples herein provide a method of generating a signal using a nanopore including a first side, a second side, and an aperture extending through the first and second sides. The method may include disposing a construct through the aperture of the nanopore such that a 3' end of the construct is on the first side of the nanopore, and a 5' end of the construct is on the second side of the nanopore, wherein the construct includes a polynucleotide and a complement of the polynucleotide.
- the method may include forming a first duplex with the construct on the first side of the nanopore, the first duplex including a 3' end.
- the method may include forming a second duplex on the second side of the nanopore, the second duplex including the polynucleotide at least partially hybridized to the complement.
- the method may include extending the first duplex on the first side of the nanopore by adding a first nucleotide that is complementary to a nucleotide of the polynucleotide.
- the method may include measuring a value of an electrical property of the 3' end of the extended first duplex and a single-stranded portion of the polynucleotide while (i) inhibiting, using the nanopore, translocation of the 3' end of the extended first duplex to the second side of the nanopore; and (ii) inhibiting, using the second duplex, formation of transient secondary structures on the second side of the nanopore.
- the method may include coupling a native double-stranded polynucleotide including a sense strand and an antisense strand to first and second stem-loop adapters each including a cleavable moiety.
- the method may include cleaving the cleavable moiety of each of the first and second stem-loop adapters.
- the method may include dehybridizing the sense strand from the antisense strand.
- the method may include synthesizing a complement of the sense strand to form a first partial construct including the sense strand coupled to the complement of the sense strand via the first stem-loop adapter.
- the method may include synthesizing a complement of the antisense strand to form a second partial construct including the antisense strand coupled to the complement of the antisense strand via the second stem-loop adapter.
- the method may include coupling a first forked adapter to the first partial construct to form a first construct.
- the method may include coupling a second forked adapter to the second partial construct to form a second construct.
- the method may include coupling a native double-stranded polynucleotide including a sense and an antisense strand to a first forked adapter and a second forked adapter.
- the first forked adapter and the second forked adapter each have a complementary linking sequence to each other.
- the complementary linking sequences may have a blocker sequence hybridized to one or both of the complementary linking sequences to prevent adapter dimerization.
- the complementary linking strands of the first adapter and second adapter hybridize to each other.
- the sense strand may be extended to synthesize a reverse complement of the antisense strand.
- the antisense strand may be extended to synthesize a reverse complement of the sense strand.
- dehybridizing the sense strand from the antisense strand, synthesizing the complement of the sense strand, and synthesizing the complement of the antisense strand include using a first strand-displacing polymerase to synthesize the complement of the sense strand; and using a second strand-displacing polymerase to synthesize the complement of the antisense strand.
- cleaving the cleavable moiety generates a 1-base pair gap flanked by a 3' phosphate and a 5' phosphate, the method further including using a polynucleotide kinase (PNK) to remove the 3' phosphate from the gap, to leave a free 3' OH at the gap.
- the forked adapter includes a 3' steric lock.
- the first and second stem-loop adapters respectively include unique molecular identifiers (UMIs).
- (i) the sense strand includes one or more epigenetic marks, and the complement of the sense strand does not include any epigenetic marks, or (ii) the antisense strand includes one or more epigenetic marks, and the complement of the antisense strand does not include any epigenetic marks.
- the first forked adapter is coupled to the first partial construct using ligation.
- the second forked adapter is coupled to the second partial construct using ligation.
- the method may include coupling a native double-stranded polynucleotide including a sense strand and an antisense strand to first and second adapters.
- the first adapter may include a first stem-loop adapter.
- the second adapter may include a target sequence for in situ enzymatic loop formation and may lack any loop.
- the method may include coupling a forked adapter to the second adapter using the target sequence.
- the method may include dehybridizing the sense strand from the antisense strand.
- the first stem-loop adapter may couple the sense strand to the antisense strand after the dehybridizing.
- the method may include synthesizing a complement of the sense strand hybridized to the sense strand.
- the method may include synthesizing a complement of the antisense strand hybridized to the antisense strand.
- the method may include forming a second loop coupling the sense strand to the complement of the sense strand.
- the native double-stranded polynucleotide is coupled to the first adapter using a first transposase, and is coupled to the second adapter using a second transposase.
- dehybridizing the sense strand from the antisense strand, synthesizing the complement of the sense strand, and synthesizing the complement of the antisense strand include using a strand-displacing polymerase.
- the forked adapter includes a 3' steric lock.
- the first stem-loop adapter and/or the forked adapter respectively include unique molecular identifiers (UMIs).
- the sense strand includes one or more epigenetic marks, and the complement of the sense strand does not include any epigenetic marks, and the antisense strand includes one or more epigenetic marks, and the complement of the antisense strand does not include any epigenetic marks.
- the complement of the sense strand and the complement of the antisense strand are synthesized using only unmodified nucleotides. In some other examples, the complement of the sense strand and the complement of the antisense strand are synthesized using modified nucleotides.
- the construct may include a sense strand.
- the construct may include an antisense strand that is complementary to the sense strand, is not hybridized to the sense strand, and is coupled to the sense strand via an adapter.
- the construct may include a complement of the sense strand that is hybridized to the sense strand.
- the construct may include a complement of the antisense strand that is hybridized to the antisense strand.
- the construct may include a loop coupling the sense strand to the complement of the sense strand.
- the sense strand includes one or more epigenetic marks, and the complement of the sense strand does not include any epigenetic marks, and the antisense strand includes one or more epigenetic marks, and the complement of the antisense strand does not include any epigenetic marks.
- Some examples herein provide another method of making a construct.
- the method may include coupling a native single-stranded polynucleotide to first and second adapters.
- the first adapter may include a target sequence and the second adapter may include a primer.
- the method may include, using the primer, synthesizing a complement of the single-stranded polynucleotide and a complement of the target sequence.
- the method may include coupling the target sequence to the complement of the target sequence to form a stem-loop adapter.
- the native single-stranded polynucleotide is coupled to the first and second adapters using ligation.
- the forked adapter includes a 5' steric lock.
- the first and second adapters respectively include unique molecular identifiers (UMIs).
- the native single-stranded polynucleotide includes one or more epigenetic marks, and the complement of the single-stranded polynucleotide does not include any epigenetic marks.
- the complement of the native single-stranded polynucleotide is synthesized using only unmodified nucleotides.
- the second adapter includes a 3' steric lock.
- the polynucleotide includes RNA, and the complement of the polynucleotide includes cDNA. In some other examples, the polynucleotide includes cDNA, and the complement of the polynucleotide includes RNA.
- FIGS. 1A-1H schematically illustrate use of an example sequencing system, and example compositions and operations, for sequencing a full-complement polynucleotide using a nanopore.
- FIG. 2A schematically illustrates the formation of transient secondary structures on the second side of the nanopore when sequencing a polynucleotide using a nanopore.
- FIG. 2B schematically illustrates an example manner in which a full-complement polynucleotide may inhibit the formation of transient secondary structures when sequencing that full-complement polynucleotide using a nanopore.
- FIG. 3A schematically illustrates the formation of a secondary structure on the first side of the nanopore when sequencing a full-complement polynucleotide using the nanopore.
- FIG. 3B schematically illustrates an example manner in which the force applied and duration of the force applied to a full-complement polynucleotide may be adjusted to inhibit the formation of a secondary structure on the first side of the nanopore when sequencing the full-complement polynucleotide using the nanopore.
- FIGS. 4A-4B schematically illustrate use of the sequencing system of FIGS. 1A-1H to resequence the same full-complement polynucleotide.
- FIG. 5 illustrates a flow of operations in an example method for sequencing a full-complement polynucleotide.
- FIG. 6 schematically illustrates an example workflow for generating a full-complement polynucleotide for sequencing.
- FIG. 7 schematically illustrates an example manner in which a full-complement polynucleotide may be loaded into the sequencing system of FIGS. 1A-1H for sequencing.
- FIGS. 8A-8C illustrate example values of electrical characteristics that may be measured using the system of FIGS. 1A-1H.
- FIG. 10 schematically illustrates example circuitry that may be used in the system of FIGS. 1A-1H.
- FIG. 11A illustrates example current signals obtained from a polynucleotide and its complement using the system of FIGS. 1A-1H.
- FIG. 11B schematically illustrates alignment and joint decoding of the current signals of FIG. 11A.
- FIG. 12A schematically illustrates example polynucleotides using fragments of the PhiX genome, prepared for nanopore sequencing using the system of FIGS. 1A-1H.
- FIG. 12B schematically illustrates the polynucleotide of FIG. 12A locked to a nanopore.
- FIGS. 13A-13B schematically illustrate example polynucleotides and complements, and measured accuracy rates for separately and jointly decoding those sequences using the system of FIGS. 1A-1H.
- FIG. 14 schematically illustrates a workflow for preparing a Lambda polynucleotide for nanopore sequencing using the system of FIGS. 1A-1H.
- FIG. 15B schematically illustrates measured accuracy rates for separately and jointly decoding the sequences of FIG. 15A using the system of FIGS. 1A-1H.
- FIG. 15C schematically illustrates example signals, and errors, obtained from the sequences of FIG. 15A using the system of FIGS. 1A-1H.
- FIG. 16A illustrates example signals obtained using the system of FIGS. 1A-1H without use of a complement.
- FIG. 16B illustrates example signals obtained using the system of FIGS. 1A-1H with use of a complement.
- FIG. 17A illustrates a secondary structure prediction and example signals obtained using the system of FIGS. 1A-1H without use of an elevated temperature.
- FIG. 17B illustrates a secondary structure prediction and example signals obtained using the system of FIGS. 1A-1H with use of an elevated temperature.
- FIGS. 18A-18B illustrate example signals obtained using sequences with different epigenetic marks than one another.
- FIG. 19 schematically illustrates an example workflow for generating a full-complement polynucleotide for sequencing where top and bottom native strands, possibly including modified bases, are linked to unmodified synthetic reverse-complement copies of the top and bottom strands.
- FIGS. 21A-21B schematically illustrate an example workflow for generating a full-complement polynucleotide for native RNA sequencing that includes the native RNA strand, a direct-repeat RNA synthetic copy of the native RNA, and cDNA copies of both native and synthetic RNA strands.
- FIG. 22 illustrates example forked adapters with optional blockers.
- FIG. 23 schematically illustrates an example workflow for generating a forward-forward tandem sequencing polynucleotide.
- a construct that includes a target polynucleotide and its complement (together, a “full-complement polynucleotide”) is disposed through the aperture of a nanopore.
- a first duplex is formed with a portion of the construct, on a first side of the nanopore.
- a force then is applied that lodges the first duplex within the aperture of the nanopore, and an electrical measurement is made.
- the particular value of that measurement is based on the particular complementary bases that are located at the 3' end of the first duplex, and the particular sequence of bases that are in a single-stranded portion of the construct, within the aperture of the nanopore.
- the particular measured value provides information from which the sequence of the bases in the construct may be determined.
- a force then is applied that dislodges the first duplex from within the aperture of the nanopore so that the first duplex may be extended by a nucleotide, and the measurement repeated.
- the measurements for the target polynucleotide and its complement may be obtained using the same, first duplex, or the measurement for the complement may be obtained using a second duplex which is different from the first duplex.
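The lodge, measure, dislodge, and extend cycle described above can be sketched as a toy simulation. Everything numeric here is an assumption made for illustration: the `TOY_LEVEL` table, the `window` size, and the formula combining them are invented stand-ins for the real electrical response, which the source does not quantify. The sketch shows only that each measured value depends on both the base pair at the 3' end of the duplex and the single-stranded bases that follow it.

```python
# Hypothetical per-base contribution to the measured value (not real current levels).
TOY_LEVEL = {"A": 1.0, "C": 2.0, "G": 3.0, "T": 4.0}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def simulate_cycles(template, window=3):
    """Toy extend/lodge/measure/dislodge loop.

    `template` is listed in the order its bases are copied, so index i is the
    base paired during cycle i. Returns the bases added and one value per cycle.
    """
    added = []
    measurements = []
    for i, base in enumerate(template):
        # Extend the duplex by one complementary nucleotide.
        added.append(COMPLEMENT[base])
        # With the duplex lodged, the value depends on the new base pair and on
        # the next few single-stranded bases sitting in the aperture.
        single_stranded = template[i + 1 : i + 1 + window]
        value = TOY_LEVEL[base] + 0.1 * sum(TOY_LEVEL[b] for b in single_stranded)
        measurements.append(round(value, 2))
        # Dislodging before the next extension is implied by the next loop pass.
    return "".join(added), measurements


added, values = simulate_cycles("ACG")
print(added, values)
```

Because the same single-stranded base contributes to several consecutive values in this model, neighbouring measurements are correlated, which is one reason a decoder would consider windows of values rather than single points.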
- the repeated measurements provide further information from which the sequence of the bases in the construct may be determined. For example, measured values from the target polynucleotide and from its complement may be aligned and used together to obtain the sequence of the target nucleotide with significantly higher accuracy than if only the target nucleotide (or its complement) was sequenced.
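As a rough illustration of using the target and its complement together, the sketch below works at the level of already-called bases rather than raw measured values, which is a deliberate simplification of the joint decoding described above. It assumes both reads cover the same region, that each base carries a confidence score, and that the higher-confidence call wins on disagreement; the function names, the confidence scores, and the example reads are all hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def joint_call(target_read, target_conf, complement_read, complement_conf):
    """Combine a target read with its complement read, position by position."""
    # Bring the complement read (and its confidences) into the target's orientation.
    comp_as_target = reverse_complement(complement_read)
    comp_conf = list(reversed(complement_conf))
    if len(comp_as_target) != len(target_read):
        raise ValueError("reads must cover the same region")

    called = []
    for t_base, t_conf, c_base, c_conf in zip(
        target_read, target_conf, comp_as_target, comp_conf
    ):
        # Keep the target call unless the complement is more confident.
        called.append(t_base if t_conf >= c_conf else c_base)
    return "".join(called)


# Hypothetical reads: the target read has a low-confidence error at index 3.
target_read = "ACGATG"
target_conf = [0.9, 0.9, 0.9, 0.4, 0.9, 0.9]
complement_read = "CAACGT"
complement_conf = [0.9] * 6
print(joint_call(target_read, target_conf, complement_read, complement_conf))
```

The benefit comes from the two strands producing different signals for the same position, so an ambiguous measurement on one strand is often unambiguous on the other.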
- the present subject matter may be used to sequence full-complement polynucleotides - such as DNA or RNA constructs - using relatively few reagents and without the need for optical components that otherwise may add cost, weight, and complexity.
- the present subject matter may be used to identify modified bases, such as methylated bases, without the need to chemically or enzymatically modify the modified bases.
- the present subject matter is compatible with relatively long reads, e.g., of up to about 1,000 bases, or up to about 2,000 bases, or up to about 5,000 bases, or even up to 10,000 bases or more.
- the present subject matter overcomes the homopolymer problem traditionally associated with strand-based nanopore sequencing, for improved accuracy of sequencing areas that contain repeating nucleotides.
- the present subject matter provides for controllable translocation of polynucleotide constructs through a nanopore, thus inhibiting or preventing translocation events that are too fast to be detected and that may lead to deletion errors or other types of errors that detrimentally affect accuracy.
- the above terms are to be interpreted synonymously with the phrases “having at least” or “including at least.”
- the term “comprising” means that the process includes at least the recited steps, but may include additional steps.
- the term “comprising” means that the compound, composition, or system includes at least the recited features or components, but may also include additional features or components.
- the terms “substantially,” “approximately,” and “about” used throughout this specification are used to describe and account for small fluctuations, such as due to variations in processing. For example, they may refer to less than or equal to ±10%, such as less than or equal to ±5%, such as less than or equal to ±2%, such as less than or equal to ±1%, such as less than or equal to ±0.5%, such as less than or equal to ±0.2%, such as less than or equal to ±0.1%, such as less than or equal to ±0.05%.
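The tolerance reading of “about” can be expressed as a one-line check. This is only an illustrative sketch: the helper name `is_about` and the choice of a fractional tolerance relative to the nominal value are assumptions, not language from the specification.

```python
def is_about(value, nominal, tolerance=0.10):
    """Return True if `value` lies within ±tolerance (as a fraction) of `nominal`."""
    return abs(value - nominal) <= tolerance * abs(nominal)


# A read of 1,050 bases is "about 1,000 bases" at ±10% but not at ±1%.
print(is_about(1050, 1000))
print(is_about(1050, 1000, tolerance=0.01))
```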
- nucleotide is intended to mean a molecule that includes a sugar and at least one phosphate group, and in some examples also includes a nucleobase.
- a nucleotide that lacks a nucleobase may be referred to as “abasic.”
- Nucleotides include deoxyribonucleotides, modified deoxyribonucleotides, ribonucleotides, modified ribonucleotides, peptide nucleotides, modified peptide nucleotides, modified phosphate sugar backbone nucleotides, and mixtures thereof.
- examples of nucleotides include adenosine monophosphate (AMP), adenosine diphosphate (ADP), adenosine triphosphate (ATP), thymidine monophosphate (TMP), thymidine diphosphate (TDP), thymidine triphosphate (TTP), cytidine monophosphate (CMP), cytidine diphosphate (CDP), cytidine triphosphate (CTP), guanosine monophosphate (GMP), guanosine diphosphate (GDP), guanosine triphosphate (GTP), uridine monophosphate (UMP), uridine diphosphate (UDP), uridine triphosphate (UTP), deoxyadenosine monophosphate (dAMP), deoxyadenosine diphosphate (dADP), deoxyadenosine triphosphate (dATP), deoxythymidine monophosphate (dTMP), deoxythymidine diphosphate (dTDP), deoxy
- nucleotide also is intended to encompass any nucleotide analogue which is a type of nucleotide that includes a modified nucleobase, sugar, backbone, and/or phosphate moiety compared to naturally occurring nucleotides.
- Nucleotide analogues also may be referred to as “modified nucleic acids.”
- Example modified nucleobases include inosine, xanthine, hypoxanthine, isocytosine, isoguanine, 2-aminopurine, 5-methylcytosine, 5-hydroxymethyl cytosine, 2-aminoadenine, 6-methyl adenine, 6-methyl guanine, 2-propyl guanine, 2-propyl adenine, 2-thiouracil, 2-thiothymine, 2-thiocytosine, 15-halouracil, 15-halocytosine, 5-propynyl uracil, 5-propynyl cytosine, 6-azo uracil, 6-azo cytosine, 6-azo thymine, 5-uracil, 4-thiouracil, 8-halo adenine or guanine, 8-amino adenine or guanine, 8-thiol adenine or
- certain nucleotide analogues cannot become incorporated into a polynucleotide, for example, adenosine 5'-phosphosulfate.
- Nucleotides may include any suitable number of phosphates, e.g., three, four, five, six, or more than six phosphates.
- Nucleotide analogues also include locked nucleic acids (LNA), peptide nucleic acids (PNA), and 5-hydroxylbutynl-2'-deoxyuridine (“super T”).
- polynucleotide refers to a molecule that includes a sequence of nucleotides that are bonded to one another.
- a polynucleotide is one nonlimiting example of a polymer.
- examples of polynucleotides include deoxyribonucleic acid (DNA), ribonucleic acid (RNA), and analogues thereof such as locked nucleic acids (LNA) and peptide nucleic acids (PNA).
- a polynucleotide may be a single stranded sequence of nucleotides, such as RNA or single stranded DNA, a double stranded sequence of nucleotides, such as double stranded DNA, or may include a mixture of single stranded and double stranded sequences of nucleotides.
- Double stranded DNA includes genomic DNA, and PCR and amplification products. Single stranded DNA (ssDNA) can be converted to dsDNA and vice-versa.
- Polynucleotides may include non-naturally occurring DNA, such as enantiomeric DNA, LNA, or PNA.
- nucleotides in a polynucleotide may be known or unknown.
- Nonlimiting examples of polynucleotides include a probe, primer, expressed sequence tag (EST) or serial analysis of gene expression (SAGE) tag, genomic DNA, genomic DNA fragment, exon, intron, messenger RNA (mRNA), transfer RNA, ribosomal RNA, ribozyme, cDNA, recombinant polynucleotide, synthetic polynucleotide, branched polynucleotide, plasmid, vector, isolated DNA of any sequence, isolated RNA of any sequence, nucleic acid probe, primer or amplified copy of any of the foregoing.
- a polynucleotide that is coupled to its complement may be referred to as a “full-complement polynucleotide.”
- the polynucleotide may be coupled to its complement in any suitable manner, for example via an oligonucleotide that is disposed between, and coupled to each of, the polynucleotide and its complement.
- a molecule that contains the full-complement polynucleotide, as well as any structure coupling the polynucleotide to its complement may be referred to herein as a “construct.”
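As a purely illustrative sketch, and not part of the disclosed methods, a construct of this kind can be modeled in Python as a target sequence, a placeholder for the coupling structure, and the reverse complement of the target. The function names `reverse_complement` and `build_construct`, the 5'-to-3' string orientation, and the use of `X` characters to stand in for the coupling structure are all assumptions made for this example.

```python
# Hypothetical model of a "construct": target + coupling structure + complement.
# Sequences are plain strings written 5'->3'; only A, C, G, T are handled.
_PAIR = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(_PAIR)[::-1]


def build_construct(target: str, linker: str = "XXXX") -> str:
    """Join a target to its complement through a placeholder linker.

    The linker string is an assumed stand-in for whatever structure
    couples the polynucleotide to its complement.
    """
    return target.upper() + linker + reverse_complement(target)


construct = build_construct("GATTACA")
print(construct)
```

Because the second half of the string is the reverse complement of the first half, such a molecule could in principle fold back on itself, which is consistent with the hairpin behavior described elsewhere herein.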
- a “polymerase” is intended to mean an enzyme having an active site that assembles polynucleotides by polymerizing nucleotides into polynucleotides.
- a polymerase can bind a primer and a single stranded target polynucleotide, and can sequentially add nucleotides to the growing primer to form a “complementary copy” polynucleotide having a sequence that is complementary to that of the target polynucleotide.
- DNA polymerases may bind to the target polynucleotide and then move down the target polynucleotide sequentially adding nucleotides to the free hydroxyl group at the 3' end of a growing polynucleotide strand.
- DNA polymerases may synthesize complementary DNA molecules from DNA templates.
- RNA polymerases may synthesize RNA molecules from DNA templates (transcription).
- Other RNA polymerases, such as reverse transcriptases may synthesize cDNA molecules from RNA templates.
- Still other RNA polymerases may synthesize RNA molecules from RNA templates, such as RdRP.
- Polymerases may use a short RNA or DNA strand (primer), to begin strand growth.
- Example DNA polymerases include Bst DNA polymerase, 9° Nm DNA polymerase, Phi29 DNA polymerase, DNA polymerase I (E. coli), DNA polymerase I, Large (Klenow) Fragment, Klenow fragment (3'-5' exo-), T4 DNA polymerase, T7 DNA polymerase, Deep VentR™ (exo-) DNA polymerase, Deep VentR™ DNA polymerase, DyNAzyme™ EXT DNA, DyNAzyme™ II Hot Start DNA Polymerase, Phusion™ High-Fidelity DNA Polymerase, Therminator™ DNA Polymerase, Therminator™ II DNA Polymerase, VentR® DNA Polymerase, VentR® (exo-) DNA Polymerase, RepliPHI™ Phi29 DNA Polymerase, rBst DNA Polymerase, rBst DNA Polymerase, Large Fragment (IsoTherm™ DNA Polymerase), MasterAmp™ AmpliTherm™,
- the polymerase is selected from a group consisting of Bst, Bsu, and Phi29.
- Some polymerases have an activity that degrades the strand behind them (3' exonuclease activity).
- Some useful polymerases have been modified, either by mutation or otherwise, to reduce or eliminate 3' and/or 5' exonuclease activity.
- Example RNA polymerases include RdRps (RNA dependent, RNA polymerases) that catalyze the synthesis of the RNA strand complementary to a given RNA template.
- Example RdRps include polioviral 3Dpol, vesicular stomatitis virus L, and hepatitis C virus NS5B protein.
- Example reverse transcriptases include, without limitation, reverse transcriptases derived from Avian Myeloblastosis Virus (AMV), Moloney Murine Leukemia Virus (MMLV) and/or the Human Immunodeficiency Virus (HIV), telomerase reverse transcriptases such as hTERT, SuperScript™ III, SuperScript™ IV Reverse Transcriptase, and ProtoScript® II Reverse Transcriptase.
- transposase is intended to mean an enzyme that, under certain conditions, is capable of coupling an oligonucleotide to a double-stranded polynucleotide.
- the oligonucleotide includes at least a mosaic end (ME) sequence, which also may be referred to as a transposition end (TE).
- a “transposome” or “transposition system” is intended to refer to a transposase that is coupled to a respective oligonucleotide including at least an ME sequence.
- The combination of a transposase and transposon end may be referred to as a “transposome.”
- a transposome may be activated, under certain conditions, to cut a double-stranded polynucleotide and to couple the oligonucleotide to the cut end.
- the transposome and the double-stranded polynucleotide may form a “transposition complex” wherein the transposome inserts the oligonucleotide into the double-stranded polynucleotide.
- a transposome may perform a process that may be referred to as “tagmentation” or “transposition” that results in fragmentation of the target polynucleotide and ligation of adapters to the 5' end of both strands of double-stranded DNA fragments, or to the 5' and 3' ends, e.g., in a manner such as described in U.S. 2010/0120098 or in WO 2010/048605, the entire contents of each of which are incorporated by reference herein.
- transposases may include integrases from retrotransposons or retroviruses.
- transposition systems include, but are not limited to, those formed by a hyperactive Tn5 transposase and a Tn5-type transposon end or by a MuA transposase and a Mu transposon end including R1 and R2 end sequences; see, e.g., the following references, the entire contents of each of which are incorporated by reference herein: Goryshin et al., “Tn5 in vitro transposition,” J. Biol. Chem.
- Transposases may be mutated to modulate their activity and/or the ME sequence may be changed to modulate the transposome’s activity in a manner such as described in Reznikoff, “Tn5 as a model for understanding DNA transposition,” Mol. Microbiol. 47(5): 1199-1206 (2003), the entire contents of which are incorporated by reference herein.
- transposases and other suitable transposition systems include Staphylococcus aureus Tn552 (see, e.g., Colegio et al., “In vitro transposition system for efficient generation of random mutants of Campylobacter jejuni,” J Bacteriol. 183: 2384-2388 (2001) and Kirby et al., “Cryptic plasmids of Mycobacterium avium: Tn552 to the rescue,” Mol Microbiol., 43(1): 173-186 (2002)); Ty1 (Devine et al., “Efficient integration of artificial transposons into plasmid targets in vitro: a useful tool for DNA mapping, sequencing and genetic analysis,” Nucleic Acids Res.
- transposomes may include transposase monomers.
- a single unit (monomeric) Tn3 transposase may bind two target sequences simultaneously and change conformation to form the transposome, e.g., in a manner such as described in Nicolas et al., “Unlocking Tn3-family transposase activity in vitro unveils an asymetric pathway for transposome assembly,” PNAS 114(5): E669-E678 (2017), the entire contents of which are incorporated by reference herein.
- Some transposomes may include transposase dimers.
- Tn5 transposases may dimerize in a manner such as described in Naumann et al., “Trans catalysis in Tn5 transposition,” PNAS 97(16): 8944-8949 (2000), the entire contents of which are incorporated by reference herein.
- Some transposomes may include transposase tetramers.
- Mu transposases may form tetramers in a manner such as described in Harshey, “Transposable phage Mu,” Microbiol Spectr.
- variants and derivatives refer to a polypeptide that includes an amino acid sequence of a polypeptide or a fragment of a polypeptide, which has been altered by the introduction of amino acid residue substitutions, deletions, or additions.
- a variant or a derivative of a polypeptide can be a fusion protein which contains part of the amino acid sequence of a polypeptide.
- variant or derivative as used herein also refers to a polypeptide or a fragment of a polypeptide, which has been chemically modified, e.g., by the covalent attachment of any type of molecule to the polypeptide.
- a polypeptide or a fragment of a polypeptide can be chemically modified, e.g., by glycosylation, acetylation, pegylation, phosphorylation, methylation, nitrosylation, amidation, derivatization by known protecting/blocking groups, proteolytic cleavage, linkage to a cellular ligand or other protein, etc.
- the variants or derivatives are modified in a manner that is different from naturally occurring or starting peptide or polypeptides, either in the type or location of the molecules attached. Variants or derivatives further include deletion of one or more chemical groups which are naturally present on the peptide or polypeptide.
- a variant or a derivative of a polypeptide or a fragment of a polypeptide can be chemically modified using techniques known to those of skill in the art, including, but not limited to, specific chemical cleavage, acetylation, formylation, metabolic synthesis of tunicamycin, etc. Further, a variant or a derivative of a polypeptide or a fragment of a polypeptide can contain one or more non-classical amino acids.
- a polypeptide variant or derivative may possess a similar or identical function as a polypeptide, or a fragment of a polypeptide described herein.
- a polypeptide variant or derivative may possess an additional or different function compared with a polypeptide or a fragment of a polypeptide described herein.
- the terms “unique molecular identifier” and “UMI” are intended to mean an oligonucleotide that may be coupled to a polynucleotide and via which the polynucleotide may be identified.
- a set of different UMIs may be coupled to a plurality of different polynucleotides, and each of those polynucleotides may be identified using the particular UMI coupled to that polynucleotide.
- A UMI also may be referred to as a “barcode.”
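The way a set of UMIs lets individual polynucleotides be identified can be sketched with a short Python example. This is an illustrative assumption only: it supposes that each read is a string whose first `umi_length` bases are the UMI, and the function name `group_by_umi` and the fixed length of 8 are hypothetical choices rather than anything specified herein.

```python
from collections import defaultdict


def group_by_umi(reads, umi_length=8):
    """Group read sequences by the UMI assumed to occupy their first bases.

    Returns a dict mapping each UMI to the list of insert sequences
    (the remainder of each read) that carried that UMI.
    """
    groups = defaultdict(list)
    for read in reads:
        umi, insert = read[:umi_length], read[umi_length:]
        groups[umi].append(insert)
    return dict(groups)


# Two reads share one UMI; a third read carries a different UMI.
reads = ["AAAACCCCGATTACA", "AAAACCCCGATTACA", "GGGGTTTTCATCAT"]
groups = group_by_umi(reads)
for umi, inserts in groups.items():
    print(umi, inserts)
```

Reads that share a UMI are treated as originating from the same polynucleotide, which is the identification role the definition above describes.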
- a target polynucleotide may include an “amplification adapter” or, more simply, an “adapter,” that hybridizes to (has a sequence that is complementary to) a primer, and may be amplified so as to generate a complementary copy polynucleotide by adding nucleotides to the free 3' OH group of the primer.
- the term “adapter” may be intended to refer to an oligonucleotide that is coupled to a polynucleotide, e.g., is ligated or tagmented to a polynucleotide.
- the oligonucleotide may be synthetic, and the polynucleotide to which it is coupled may be either native (that is, obtained from an organism) or synthetic (e.g., is generated by synthesizing a complementary copy of a native polynucleotide).
- Adapters may be used for a variety of purposes, such as will be apparent from the following disclosure.
- the term “plurality” is intended to mean a population of two or more different members. Pluralities may range in size from small, medium, large, to very large. The size of small plurality may range, for example, from a few members to tens of members. Medium sized pluralities may range, for example, from tens of members to about 100 members or hundreds of members. Large pluralities may range, for example, from about hundreds of members to about 1000 members, to thousands of members and up to tens of thousands of members. Very large pluralities may range, for example, from tens of thousands of members to about hundreds of thousands, a million, millions, tens of millions and up to or greater than hundreds of millions of members.
- a plurality may range in size from two to well over one hundred million members as well as all sizes, as measured by the number of members, in between and greater than the above example ranges.
- Example polynucleotide pluralities include, for example, populations of about 1×10^5 or more, 5×10^5 or more, or 1×10^6 or more different polynucleotides. Accordingly, the definition of the term is intended to include all integer values greater than two.
- An upper limit of a plurality may be set, for example, by the theoretical diversity of polynucleotide sequences in a sample.
- double-stranded when used in reference to a polynucleotide, is intended to mean that all or substantially all of the nucleotides in the polynucleotide are hydrogen bonded to respective nucleotides in a complementary polynucleotide.
- a double-stranded polynucleotide also may be referred to as a “duplex.”
- single-stranded when used in reference to a polynucleotide, means that essentially none of the nucleotides in the polynucleotide are hydrogen bonded to a respective nucleotide in a complementary polynucleotide.
- a target polynucleotide may include one or more adapters, including an amplification adapter that functions as a primer binding site, that flank(s) a target polynucleotide sequence that is to be analyzed.
- target polynucleotides may have different sequences than one another but may have first and second adapters that are the same as one another.
- the two adapters that may flank a particular target polynucleotide sequence may have the same sequence as one another, or complementary sequences to one another, or the two adapters may have different sequences.
- species in a plurality of target polynucleotides may include regions of known sequence that flank regions of unknown sequence that are to be evaluated by, for example, sequencing (e.g., SBS).
- target polynucleotides carry an amplification adapter at a single end, and such adapter may be located at either the 3' end or the 5' end of the target polynucleotide.
- Target polynucleotides may be used without any adapter, in which case a primer binding sequence may come directly from a sequence found in the target polynucleotide.
- The terms “polynucleotide” and “oligonucleotide” are used interchangeably herein. The different terms are not intended to denote any particular difference in size, sequence, or other property unless specifically indicated otherwise. For clarity of description, the terms may be used to distinguish one species of polynucleotide from another when describing a particular method or composition that includes several polynucleotide species.
- substrate refers to a material used as a support for compositions described herein.
- Example substrate materials may include glass, silica, plastic, quartz, metal, metal oxide, organo-silicate (e.g., polyhedral organic silsesquioxanes (POSS)), polyacrylates, tantalum oxide, complementary metal oxide semiconductor (CMOS), or combinations thereof.
- An example of POSS can be that described in Kehagias et al., Microelectronic Engineering 86 (2009), pp. 776-778, which is incorporated by reference in its entirety.
- substrates used in the present application include silica-based substrates, such as glass, fused silica, or other silica-containing material.
- silica-based substrates can include silicon, silicon dioxide, silicon nitride, or silicon hydride.
- substrates used in the present application include plastic materials or components such as polyethylene, polystyrene, poly(vinyl chloride), polypropylene, nylons, polyesters, polycarbonates, and poly(methyl methacrylate).
- Example plastics materials include poly(methyl methacrylate), polystyrene, and cyclic olefin polymer substrates.
- the substrate is or includes a silica-based material or plastic material or a combination thereof.
- the substrate has at least one surface including glass or a silicon-based polymer.
- the substrates can include a metal.
- the metal is gold.
- the substrate has at least one surface including a metal oxide.
- the surface includes a tantalum oxide or tin oxide.
- Acrylamides, enones, or acrylates may also be utilized as a substrate material or component.
- Other substrate materials can include, but are not limited to gallium arsenide, indium phosphide, aluminum, ceramics, polyimide, quartz, resins, polymers and copolymers.
- the substrate and/or the substrate surface can be, or include, quartz.
- the substrate and/or the substrate surface can be, or include, semiconductor, such as GaAs or ITO.
- Substrates can include a single material or a plurality of different materials. Substrates can be composites or laminates.
- the substrate includes an organo-silicate material.
- Substrates can be flat, round, spherical, rod-shaped, or any other suitable shape. Substrates may be rigid or flexible. In some examples, a substrate is a bead or a flow cell.
- Substrates can be non-patterned, textured, or patterned on one or more surfaces of the substrate.
- the substrate is patterned.
- Such patterns may include posts, pads, wells, ridges, channels, or other three-dimensional concave or convex structures. Patterns may be regular or irregular across the surface of the substrate. Patterns can be formed, for example, by nanoimprint lithography or by use of metal pads that form features on non-metallic surfaces, for example.
- a substrate described herein forms at least part of a flow cell or is located in or coupled to a flow cell.
- Flow cells may include a flow chamber that is divided into a plurality of lanes or a plurality of sectors.
- Example flow cells and substrates for manufacture of flow cells that can be used in methods and compositions set forth herein include, but are not limited to, those commercially available from Illumina, Inc. (San Diego, CA).
- An “electrode” is intended to mean a solid structure that conducts electricity. Electrodes may include any suitable electrically conductive material, such as gold, palladium, or platinum, or combinations thereof. In some examples, an electrode may be disposed on a substrate. In some examples, an electrode may define a substrate.
- As used herein, the term “nanopore” is intended to mean a structure that includes an aperture that permits molecules to cross therethrough from a first side of the nanopore to a second side of the nanopore, in which a portion of the aperture of a nanopore has a width of 100 nm or less, e.g., 10 nm or less, or 2 nm or less.
- the aperture extends through the first and second sides of the nanopore.
- Molecules that can cross through an aperture of a nanopore can include, for example, ions or water-soluble molecules such as amino acids or nucleotides.
- the nanopore can be disposed within a barrier, or can be provided through a substrate.
- a portion of the aperture can be narrower than one or both of the first and second sides of the nanopore, in which case that portion of the aperture can be referred to as a “constriction.”
- the aperture of a nanopore, or the constriction of a nanopore (if present), or both can be greater than 0.1 nm, 0.5 nm, 1 nm, 10 nm or more.
- a nanopore can include multiple constrictions, e.g., at least two, or three, or four, or five, or more than five constrictions.
- Nanopores include biological nanopores, solid-state nanopores, or biological and solid-state hybrid nanopores.
- Biological nanopores include, for example, polypeptide nanopores and polynucleotide nanopores.
- a “polypeptide nanopore” is intended to mean a nanopore that is made from one or more polypeptides.
- the one or more polypeptides can include a monomer, a homopolymer or a heteropolymer.
- Structures of polypeptide nanopores include, for example, an α-helix bundle nanopore and a β-barrel nanopore as well as all others well known in the art.
- Example polypeptide nanopores include α-hemolysin, Mycobacterium smegmatis porin A (MspA), gramicidin A, maltoporin, OmpF, OmpC, PhoE, Tsx, F-pilus, SP1, mitochondrial porin (VDAC), Tom40, outer membrane phospholipase A, CsgG, aerolysin, and Neisseria autotransporter lipoprotein (NalP).
- Mycobacterium smegmatis porin A (MspA) is a membrane porin produced by Mycobacteria, allowing hydrophilic molecules to enter the bacterium.
- MspA forms a tightly interconnected octamer and transmembrane beta-barrel that resembles a goblet and includes a central constriction.
- For further details regarding α-hemolysin, see U.S. 6,015,714, the entire contents of which are incorporated by reference herein.
- For further details regarding SP1, see Wang et al., Chem. Commun., 49: 1741-1743 (2013), the entire contents of which are incorporated by reference herein.
- MspA see Butler et al., “Single-molecule DNA detection with an engineered MspA protein nanopore,” Proc. Natl. Acad. Sci.
- nanopore DNA sequencing with MspA Proc. Natl. Acad. Sci. USA, 107: 16060-16065 (2010), the entire contents of both of which are incorporated by reference herein.
- Other nanopores include, for example, the MspA homolog from Nocardia farcinica, and lysenin.
- For further details regarding lysenin, see PCT Publication No. WO 2013/153359, the entire contents of which are incorporated by reference herein.
- a “polynucleotide nanopore” is intended to mean a nanopore that is made from one or more nucleic acid polymers.
- a polynucleotide nanopore can include, for example, a polynucleotide origami.
- a “solid-state nanopore” is intended to mean a nanopore that is made from one or more materials that are not of biological origin.
- a solid-state nanopore can be made of inorganic or organic materials.
- Solid-state nanopores include, for example, silicon nitride (SiN), silicon dioxide (SiO2), silicon carbide (SiC), hafnium oxide (HfO2), molybdenum disulfide (MoS2), hexagonal boron nitride (h-BN), or graphene.
- a solid-state nanopore may comprise an aperture formed within a solid-state membrane, e.g., a membrane including any such material(s).
- a “biological and solid-state hybrid nanopore” is intended to mean a hybrid nanopore that is made from materials of both biological and non-biological origins. Materials of biological origin are defined above and include, for example, polypeptides and polynucleotides.
- a biological and solid-state hybrid nanopore includes, for example, a polypeptide-solid-state hybrid nanopore and a polynucleotide-solid-state nanopore.
- a “barrier” is intended to mean a structure that normally inhibits passage of molecules from one side of the barrier to the other side of the barrier.
- the molecules for which passage is inhibited can include, for example, ions or water soluble molecules such as nucleotides and amino acids.
- the aperture of the nanopore may permit passage of molecules from one side of the barrier to the other side of the barrier.
- Barriers include membranes of biological origin, such as lipid bilayers, and non-biological barriers such as solid-state membranes or substrates.
- “of biological origin” refers to material derived from or isolated from a biological environment such as an organism or cell, or a synthetically manufactured version of a biologically available structure.
- solid-state refers to material that is not of biological origin.
- methylated base refers to a base that includes a methyl group (-CH3 or -Me) or a derivatized methyl group.
- methylcytosine or mC refers to cytosine in DNA (namely, 2'-deoxy cytosine) that includes a methyl group, or is a derivative of methylcytosine.
- methyladenine or mA refers to adenine in DNA that includes a methyl group, or is a derivative of methyladenine.
- a nonlimiting example of a derivatized methyl group is an oxidized methyl group.
- a nonlimiting example of an oxidized methyl group is hydroxymethyl (-CH2OH).
- An mC derivative having a hydroxymethyl group may be referred to as hydroxymethylcytosine or hmC.
- Another nonlimiting example of an oxidized methyl group is formyl group (-CHO).
- An mC derivative having a formyl group may be referred to as formylcytosine or fC.
- Another nonlimiting example of an oxidized methyl group is carboxyl (-COOH).
- An mC derivative including a carboxyl group may be referred to as carboxycytosine or caC.
- the methyl group of methylcytosine may be located at the 5 position of the cytosine, in which case the mC may be referred to as 5mC.
- the oxidized methyl group may be located at the 5 position of the cytosine, in which case the hmC may be referred to as 5hmC, the fC may be referred to as 5fC, or the caC may be referred to as 5caC.
- the methyl group of methyladenine may be located at the 6 position of the adenine, in which case the mA may be referred to as 6mA.
- base calling accuracy may be improved using complementary information, such as from both a target polynucleotide and its complement which are coupled together in the present constructs, and jointly decoded in the present operations and systems. Improved accuracy may be obtained because the errors that occur in either of the polynucleotide or its complement do not always occur on both at the same position. This is generally the case whether there is a random error or a systematic error.
- the present constructs may form stable hairpins on the second side of the nanopore which inhibit transient secondary structures from forming that may otherwise add noise to the measurement, and accordingly provide for still more accurate base calling.
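The reason complementary information helps can be sketched in a few lines of Python. This is a deliberately simplified illustration under stated assumptions, not the joint decoding of the present operations and systems: it assumes the read of the target and the read of its complement are already aligned and of equal length, that each base call carries a numeric confidence score, and that the more confident call wins at each position. The names `joint_call`, `target_quals`, and `comp_quals` are hypothetical.

```python
_PAIR = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(_PAIR)[::-1]


def joint_call(target_calls, target_quals, comp_calls, comp_quals):
    """Merge base calls from a target read and a read of its complement.

    Assumes both reads are pre-aligned and equal in length (a
    simplification). The complement read is mapped back onto the
    target's coordinates, then the higher-confidence call is kept.
    """
    comp_as_target = reverse_complement(comp_calls)
    comp_quals_rev = comp_quals[::-1]  # keep scores in step with the bases
    merged = []
    for t, tq, c, cq in zip(target_calls, target_quals,
                            comp_as_target, comp_quals_rev):
        merged.append(t if tq >= cq else c)
    return "".join(merged)


# The target read has a low-confidence error at index 2 ("G" instead of "T");
# the complement read is correct there, so the merged call recovers the base.
result = joint_call("GAGTACA", [30, 30, 5, 30, 30, 30, 30],
                    "TGTAATC", [30] * 7)
print(result)
```

The example works because the error on one strand is not mirrored at the same position on the other strand, which is the property the passage above relies on.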
- FIGS. 1A-1H schematically illustrate use of an example sequencing system, and example compositions and operations, for sequencing a full-complement polynucleotide using a nanopore.
- sequencing system 100 may include nanopore 110, construct 150, polynucleotide 140 that is hybridized to first portion 155 of construct 150 to form first duplex 154, and circuitry 160.
- Construct 150 may include target polynucleotide 158 that it is desired to sequence (e.g., that may be unknown), and complement 159 of target polynucleotide 158 (which also may have an unknown sequence).
- polynucleotide 158 is, consists essentially of, or includes DNA
- complement 159 is, consists essentially of, or includes DNA
- target polynucleotide 158 is, consists essentially of, or includes RNA
- complement 159 is, consists essentially of, or includes cDNA.
- construct 150 optionally may include an additional polynucleotide and a complement thereof, where the additional polynucleotide may be related to polynucleotide 158.
- construct 150 may include both the sense strand and the antisense strand of a native polynucleotide (either or both of which may be considered a target polynucleotide 158), and may include complements of each of the sense strand and the antisense strand (either or both of which may be considered complement 159).
- construct 150 also includes structure 157 coupling target polynucleotide 158 to its complement 159.
- structure 157 includes any suitable number of nucleotides, collectively referred to and illustrated as “X.”
- structure 157 may include one nucleotide, or two or more nucleotides, e.g., three, four, five, six, seven, eight, nine, ten, or more than ten nucleotides.
- structure 157 may include a polymer that is made only partially of nucleotides, or which does not include nucleotides at all.
- structure 157 may include a polymer including or made up of any suitable number of Spacer 18 (Sp 18) molecules.
- Spacer 18 is an 18-atom hexa-ethyleneglycol spacer which is commercially available, e.g., from Integrated DNA Technologies (IDT, Coralville, Iowa).
- Nanopore 110 may be disposed within barrier 101 and may include first side 111, second side 112, and aperture 113 extending through the first and second sides.
- nanopore 110 may include constriction 114 within aperture 113.
- Aperture 113 of nanopore 110 may provide a pathway for fluid 120 and/or fluid 120’ to flow through barrier 101.
- Nanopore 110 may include a solid-state nanopore, a biological nanopore (e.g., MspA such as illustrated in FIG. 1A), or a biological and solid-state hybrid nanopore. In the nonlimiting example illustrated in FIG.
- nanopore 110 may be oriented so that first side 111 of the nanopore includes the majority of aperture 113, such that 3' end 153 of first duplex 154 may fit relatively deeply within aperture 113 so as to be relatively inaccessible to fluid 120 and thus may not be acted upon by any polymerase in the fluid in a manner such as will be described with reference to FIG. 1B.
- Barrier 101 may have any suitable structure that normally inhibits passage of molecules from one side of the barrier to the other side of the barrier, e.g., that normally inhibits contact between fluid 120 and fluid 120’.
- barrier 101 may include first layer 107 and second layer 108, one or both of which inhibit the flow of molecules across that layer.
- barrier 101 may include a lipid bilayer including lipid layers 107 and 108.
- barrier 101 may include any suitable structure(s), any suitable material(s), and any suitable number of layers.
- barrier 101 may include a solid-state barrier, which may include a single layer or multiple layers. Nonlimiting examples of materials that may be used in barriers are provided elsewhere herein.
- barriers, nanopores, and nanopore sequencing methods are described elsewhere herein, as well as in the following references, the entire contents of each of which are incorporated by reference herein: US 9,708,655; WO2023/049682; WO2023/187104;
- Construct 150 may include, for example, DNA and/or RNA. Construct 150 may be disposed through aperture 113 of nanopore 110 such that a first portion 155 of the construct 150 is located, optionally entirely, on first side 111 of the nanopore. The 3' end of construct 150 may be located on the first side 111 of nanopore 110, and the 5' end of construct 150 may be located on the second side 112 of nanopore 110. Optionally, the 3' end of construct 150 may be coupled to, or may include, a first steric lock 151. First steric lock 151 is sufficiently large as not to be able to pass through nanopore 110 or feature thereof (e.g., through constriction 114), thus retaining that end on the first side of the nanopore.
- the 5' end of construct 150 may be located on second side 112 of nanopore 110 and may be coupled to, or may include a second steric lock 152.
- Second steric lock 152 (such as an oligonucleotide hybridized to construct 150) is sufficiently large as not to be able to pass through constriction 114, thus retaining the second end of construct 150 on the second side of the nanopore.
- construct 150 may remain associated with nanopore 110 during this operating mode.
- Polynucleotide 140 may include, for example, DNA and/or RNA. Polynucleotide 140 may include, for example, a primer hybridized to the first portion of construct 150 to form first duplex 154.
- the first duplex 154 between polynucleotide 140 and the first portion of construct 150 may be located, optionally entirely, on first side 111 of the nanopore, and may include 3' end 153.
- 3' end 153 includes the base pair GC, and it is desired to identify the C as being at this location of the sequence of construct 150, in addition to the sequence of other nucleotides in construct 150.
- the G nucleotide 121 was incorporated into polynucleotide 140 in a prior step based on the sequence of construct 150 using a polymerase, in a manner such as described in greater detail below with reference to FIG. 1C.
- the G nucleotide 121 is part of a primer.
- Single-stranded second portion 156 of construct 150 e.g., bases A and T
- Bases of polynucleotide 140 and construct 150 that are not specifically illustrated or labeled should be understood to be present, but omitted for simplicity of illustration.
- circuitry 160 may apply a first force (F1) disposing the 3' end 153 of first duplex 154 within aperture 113.
- Nanopore 110 inhibits translocation of 3' end 153 of first duplex 154 to the second side of the nanopore while the first force is applied.
- circuitry 160 applies a first force F1 (such as a first voltage across nanopore 110) that moves first duplex 154 towards the second side 112 of nanopore 110, while constriction 114 or other feature of nanopore 110 inhibits the passage of 3' end 153 of the first duplex to the second side of the nanopore.
- F1 first force
- the nucleotide analogue(s) may include one or more locked nucleic acids (LNA) or may include one or more 2'-methoxy (2'-OMe) nucleotides, or may include one or more 2'-fluorinated (2'-F) nucleotides, or may include one or more peptide nucleic acids (PNA).
- LNA locked nucleic acids
- PNA peptide nucleic acids
- Such analogue(s) may be included, for example, in polynucleotide 140.
- such analogue(s) may be added to the primer at the time it is synthesized, or potentially triphosphate nucleotides with these modifications could be incorporated into strand 140 by the polymerase.
- Such analogue(s) may increase the Tm (melting temperature) of first duplex 154.
- addition of LNA monomers may help to increase the Tm of first duplex 154 and may be used to fine-tune the Tm of first duplex 154.
- 2'-OMe may increase the Tm of RNA:RNA duplexes and may result in only small changes in RNA:DNA stability.
- 2' Fluoro bases may have a fluorine modified ribose which increases binding affinity (Tm) and also may confer some relative nuclease resistance when compared to native RNA.
- first duplex 154 may alter the rate at which the salt in fluid 120 moves through aperture 113 and into fluid 120’, and thus may alter the electrical current, ionic current, electrical resistance, or electrical voltage drop across nanopore 110 in such a manner as to be detected by circuitry 160.
- first duplex 154 may include one or more nucleotide analogue(s) that alter the value of the electrical property of the nanopore relative to a natural nucleotide.
- polynucleotide 140 may include nucleotide analogue(s) such as a 2' modification, or a base modification. Such analogue(s) thus may be identified using circuitry 160.
- the modified base is methylated in target polynucleotide 158, but is not methylated in complement 159, which can aid in detecting the methylated base in a manner such as described in greater detail below with reference to FIGS. 18A-18B, 19, 20, and 21A-21B.
- modified bases in the complement 159 e.g., such as methylation from genomic DNA
- unmodified bases in the target polynucleotide 158 which can aid in detecting the methylated base in a similar manner as described in greater detail below with reference to FIGS. 18A-18B, 19, 20, and 21A-21B.
- the value that circuitry 160 measures during the operation illustrated in FIG. 1B may be based on any suitable number of nucleotides in first portion 155 (which portion is hybridized to polynucleotide 140) and in second portion 156 (which portion is single-stranded and not hybridized to polynucleotide 140).
- the measured value may be at least based on M nucleotides of the single-stranded portion and D pairs of hybridized nucleotides of the duplex, wherein M is greater than or equal to one, and wherein D is greater than or equal to one.
- M and D may have any suitable values.
- M may be greater than or equal to two, or M may be greater than or equal to three, or M may be greater than or equal to four.
- M may be about one.
- M may be about two.
- M may be about three.
- M may be about four.
- M may be about five.
- D may be greater than or equal to two, or D may be greater than or equal to three, or D may be greater than or equal to four.
- D may be about one.
- D may be about two.
- D may be about three.
- D may be about four.
- D may be about five.
- the value measured by circuitry 160 may include an electrical current, ionic current, electrical resistance, or electrical voltage drop that is based on the M nucleotides and D pairs of hybridized nucleotides.
- the numbers are used to represent bases which may affect the measured value under a given force (here, first force F1).
- For an MspA nanopore such as illustrated, it may be expected that bases 4, 5, 6, and 4' primarily may affect the measured value, although bases adjacent to these may also influence the measured value in a manner such as described elsewhere herein.
- bases 3 and 3' may affect the measured value.
- base 7 may affect the measured value.
- base 8 may affect the measured value.
- bases 3, 3', 4, 4', 5, 6, 7, and 8 affect the measured value. In another nonlimiting example, bases 3, 3', 4, 4', 5, 6, and 7 affect the measured value. In another nonlimiting example, bases 4, 4', 5, 6, 7, and 8 affect the measured value. In another nonlimiting example, bases 3, 3', 4, 4', 5, and 6 affect the measured value. In another nonlimiting example, bases 4, 4', 5, 6, and 7 affect the measured value. In another nonlimiting example, bases 4, 4', 5, and 6 affect the measured value.
- first duplex 154 may be disposed at a different location relative to nanopore 110, e.g., may be disposed deeper within aperture 113 if F1’ is greater than F1, or more shallowly within aperture 113 if F1’ is less than F1 in the nonlimiting example shown in FIG. 1F.
- the ionic current through aperture 113 may be affected differently by the bases within the first duplex 154 and the single-stranded portion 156 of construct 150, for different forces. In a manner such as described further below with reference to FIGS. 8A-8C, 9, and 10, the values of measurements made under different forces may be compared so as to even further enhance the accuracy with which nucleotides are identified.
- the value measured by circuitry 160 is affected by an epigenetic mark within construct 150 such as, but not limited to, a methylated base.
- FIG. 1G in which the numbers again are used to represent bases which may affect the measured value, it may be expected that if base 5* is methylated (or otherwise modified), such group may affect the measured value differently than would an unmethylated (or otherwise unmodified) base, and that therefore the presence of the methylated (or otherwise modified) base may be identified via its effect on the measured value.
- nucleotides in the target polynucleotide 158 may be methylated while the nucleotides in its complement 159 are not methylated (e.g., because the target polynucleotide 158 was obtained from a living organism while the complement 159 is synthetic), and the non-methylated nucleotides in the complement may be used as an internal “reference” to facilitate identifying the methylated nucleotides in the target polynucleotide.
- details of how circuitry 160 may be used to identify nucleotides using the measured values are provided below, for example with reference to FIGS. 8A-8C, 9, and 10.
- fluid 120 may be in contact with the first side 111 of nanopore 110 and may include a plurality of nucleotides 121, 122, 123, 124, e.g., G, T, A, and C, respectively.
- Each of the nucleotides 121, 122, 123, 124 in fluid 120 optionally may be modified in a manner such as described in greater detail below, e.g., may include a nucleotide analogue.
- Fluid 120 further may include a plurality of polymerases 105 that may be used to add nucleotides to polynucleotide 140 using the sequence of construct 150 in a manner such as described with reference to FIG. 1C.
- aperture 113 of nanopore 110 may inhibit the addition of a nucleotide to 3' end 153 of first duplex 154.
- polymerases 105 may be sterically hindered from binding to 3' end 153 while the 3' end is located within aperture 113.
- circuitry 160 may be configured so as to switch system 100 to an operational mode in which the 3' end of first duplex 154 may be extended by adding a nucleotide.
- a nucleotide addition operation may be performed, for example, after forming first duplex 154 and before applying the first force F1, e.g., to add nucleotide 121 prior to the particular time illustrated in FIG. 1B (for example, prior to the particular time illustrated in FIG. 1A).
- FIG. 1C schematically illustrates an example mode for adding a nucleotide to first duplex 154.
- circuitry 160 may be configured to apply a second force F2 (such as a second voltage across nanopore 110 in a direction opposite that of the first voltage) that moves 3' end 153 of first duplex 154 out of aperture 113 such that polymerase 105 may contact the 3' end of the first duplex and may add a nucleotide thereto, e.g., T 122, based on the next nucleotide (e.g., A) in the sequence of construct 150.
- circuitry 160 instead may be configured so as to release the first force F1, following which release the 3' end 153 of first duplex 154 may naturally diffuse out of aperture 113 such that polymerase 105 may contact the 3' end of the first duplex and may add a first nucleotide thereto, without the need to use the circuitry to actively apply a force causing such motion and making the 3' end of the first duplex available to the fluid 120.
- system 100 may be configured to continue adding nucleotides to extend first duplex 154 along all or a portion of the length of construct 150, e.g., to obtain a duplex such as illustrated in FIG. 1H.
- nucleotides are added to first duplex 154 based on the sequence of construct 150. More specifically, in the nonlimiting example described with reference to FIGS.
- nucleotides first are added to the first duplex which are complementary to target polynucleotide 158; then nucleotide(s) (collectively denoted X') are added to the first duplex which are complementary to structure 157 (collectively denoted X); then nucleotides are added to the first duplex which are complementary to complement 159.
- the particular order in which respectively complementary nucleotides are added to duplex 154 based on the sequence of construct 150 may vary. Illustratively, in examples in which the locations of polynucleotide 158 and complement 159 are swapped relative to those described with reference to FIGS.
- nucleotides first are added to the first duplex which are complementary to complement 159; then nucleotides are added to the first duplex which are complementary to target polynucleotide 158.
- a single primer and a single duplex may be used.
- nucleotide(s) collectively denoted X' in FIG. 1H are added to the first duplex which are complementary to structure 157 (collectively denoted X).
- structure 157 does not include nucleotides (e.g., includes Sp18s)
- the duplex may not necessarily be extended or sequenced through structure 157. However, it would be evident in the measurements that this point has been reached, since the Sp18s will give currents higher than the nucleotides and the natural extension/addition of nucleotides would stop.
- a primer which is complementary to a first portion of complement 159 then may be added to form a second duplex which is extended in a similar manner as described for polynucleotide 158; or in examples in which the locations of polynucleotide 158 and 159 are swapped relative to that in FIGS. 1H, a primer which is complementary to a first portion of polynucleotide 158 then may be added to form a second duplex which is extended in a similar manner as for complement 159.
- circuitry 160 may be configured to generate a first plurality of measured values for polynucleotide 158, and a second plurality of measured values for complement 159, and to generate a sequence of polynucleotide 158 using the first plurality of measured values and the second plurality of measured values.
- Nonlimiting examples of operations for generating the sequence of polynucleotide 158 will be described in greater detail with regards to FIGS. 8A-8C and 9.
- any suitable type of polymerase 105 may be used to synthesize any suitable type of polynucleotide 140 based on any suitable type and arrangement of construct 150.
- a DNA polymerase 105 may be used to copy a DNA construct 150 to form a DNA polynucleotide 140.
- an RNA polymerase 105 may be used to copy a DNA construct 150 to form an RNA polynucleotide 140.
- an RdRP (RNA dependent RNA polymerase) 105 may be used to copy a construct 150 including an RNA polynucleotide to form an RNA polynucleotide 140.
- a reverse transcriptase 105 may be used to copy a construct 150 including an RNA polynucleotide to form a DNA polynucleotide 140.
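The four template/product pairings listed above can be summarized as a small lookup table. The following is only an illustrative sketch; the names `POLYMERASE_FOR` and `choose_polymerase` are hypothetical and do not appear in the application.

```python
# Lookup of polymerase class by (template type, product type), mirroring the
# four nonlimiting pairings listed above. Names here are hypothetical.
POLYMERASE_FOR = {
    ("DNA", "DNA"): "DNA polymerase",
    ("DNA", "RNA"): "RNA polymerase",
    ("RNA", "RNA"): "RNA-dependent RNA polymerase (RdRP)",
    ("RNA", "DNA"): "reverse transcriptase",
}


def choose_polymerase(template: str, product: str) -> str:
    """Return the polymerase class for a given template/product pairing."""
    key = (template.upper(), product.upper())
    if key not in POLYMERASE_FOR:
        raise ValueError(f"unsupported pairing: {key}")
    return POLYMERASE_FOR[key]
```

For instance, `choose_polymerase("RNA", "DNA")` returns the reverse transcriptase entry.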
- Any DNA modifications may occur on the base or the sugar, including the 3’ end.
- Any RNA modifications may occur on the base or the sugar, including the 2’ end.
- modifications to the 2' end include 2'-O-methoxy-ethyl, 2'-OMe, 2'-F and locked nucleic acid (LNA).
- Circuitry 160 may be configured so as to repeatedly switch system 100 between a nucleotide addition mode (FIG. 1C), and a measurement mode (FIG. 1B and FIG. 1D). For example, after performing the measurement described with reference to FIG. 1B, another nucleotide may be added in a manner such as described with reference to FIG. 1C, and another measurement performed. During the measurement mode, the aperture or other feature of nanopore 110 may inhibit polymerase 105 from adding another nucleotide. As such, each cycle of nucleotide measurement may be controlled so as to have any desired duration, e.g., may be electronically controlled using circuitry 160, so as to provide an appropriate signal-to-noise ratio (SNR) for the type of measurement being performed.
- SNR signal-to-noise ratio
- circuitry 160 may be configured to adjust the length of the measurement mode so as to obtain a sufficiently high SNR (e.g., a SNR exceeding a predefined threshold), even though throughput (number of bases per unit time) may be lower. In other applications, circuitry 160 may be configured to adjust the length of the measurement mode so as to obtain a sufficiently high throughput (e.g., a throughput exceeding a predefined threshold), even though the SNR may be lower.
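The trade-off between SNR and throughput described above can be illustrated numerically. The sketch below assumes, purely for illustration, that noise is uncorrelated between samples so that averaging N samples improves SNR by a factor of sqrt(N); the application does not specify a noise model, and the function names, sample rate, and addition time are hypothetical.

```python
import math


def min_measure_time(snr_single: float, snr_target: float, sample_rate_hz: float) -> float:
    """Shortest measurement window (seconds) whose averaged SNR reaches snr_target.

    Assumption: averaging N uncorrelated samples multiplies SNR by sqrt(N).
    """
    n_samples = max(1, math.ceil((snr_target / snr_single) ** 2))
    return n_samples / sample_rate_hz


def throughput(measure_time_s: float, addition_time_s: float) -> float:
    """Bases per second when each cycle is one addition step plus one measurement."""
    return 1.0 / (measure_time_s + addition_time_s)
```

Under these assumptions, raising the target SNR lengthens the measurement window quadratically and lowers the number of bases read per unit time, which is the trade-off the circuitry may be configured to balance.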
- Circuitry 160 further may be configured to perform repeated cycles of nucleotide addition and measurement any suitable number of times, so as to sequence both the target polynucleotide 158 and its complement 159 in the same molecule (that is, in construct 150). For example, as illustrated in FIG. 1D, after applying the second force F2 during which a nucleotide is added, circuitry 160 again may apply the first force F1.
- the first force F1 may, in some examples, remove polymerase 105 from contact with 3' end 153 of the first duplex 154, and may move the now-extended 3' end 153 of first duplex 154 into the aperture 113 of nanopore 110, where constriction 114 or other feature of nanopore 110 inhibits translocation of the 3' end to second side 112 of the nanopore in a similar manner as described with reference to FIG. 1B.
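The alternation between the nucleotide addition mode and the measurement mode can be sketched as a simple loop. This is a toy model under stated assumptions: exactly one templated DNA nucleotide is added per addition step, and the `measure` callable is a hypothetical stand-in for the value that circuitry 160 would record. None of these names come from the application.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def run_cycles(template_3to5, primer_len, measure, n_cycles):
    """Alternate nucleotide addition and measurement.

    template_3to5: construct bases in the order in which they are copied.
    primer_len:    number of template bases already paired by the primer.
    measure:       callable(duplex_length) -> value (hypothetical stand-in).
    """
    strand = [COMPLEMENT[b] for b in template_3to5[:primer_len]]
    values = []
    for _ in range(n_cycles):
        if len(strand) >= len(template_3to5):
            break  # template fully copied
        # Addition mode (second force applied, or first force released):
        # one nucleotide complementary to the next template base is added.
        strand.append(COMPLEMENT[template_3to5[len(strand)]])
        # Measurement mode (first force applied): the 3' end sits in the
        # aperture, so no further addition occurs while the value is read.
        values.append(measure(len(strand)))
    return "".join(strand), values
```

With the template bases C, A, T, G, C and a primer ending in G, two cycles add T and then A, matching the GC, TA, AT progression used in the examples of this description.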
- Although constriction 114 occupies only a portion of the length of nanopore 110, it may be the most sensitive region for base discrimination, because the constriction is where the largest voltage drop occurs (as it presents the largest resistance between electrodes 102 and 103).
- Typical nanopore constrictions are longer than a single DNA nucleotide of a single-stranded polynucleotide, and therefore a current signal that a nanopore can generate may be dependent on more than one nucleotide, and typically 3, 4, 5, or even 6 or more nucleotides. These nucleotides form what may be termed a “K-mer.” The number of possible K-mers for 4 bases of DNA is 4^K.
- Some previously known strand sequencing methods operate by translocating a single strand of DNA through the constriction of a nanopore (i.e., the “sensing zone”), such as MspA or CsgG, and attempting to associate the currents created by each K-mer with the sequence of the K-mer.
- two currents that correspond to unique K-mers may appear to be the same, or may be similar enough that they cannot be distinguished from one another within the bounds of a given experimental setup.
- the larger the K-mer the more likely that such “degenerate” cases will occur.
- the α-hemolysin nanopore presents challenges for strand sequencing because the K-mer is large, about 10 nucleotides, so there are about 4^10 possible signals - far too many to be able to realistically deconvolve.
- the K-mers are somewhat shorter, about 6 nucleotides, other methods have sometimes been used to try to distinguish the K-mers.
- This so-called single-base ratcheting mitigates the degeneracy problem because a given K-mer can transition to one of only four possibilities, because the base that leaves the sensing zone is replaced by A, C, G, or T, while the other bases remain unchanged within the sensing zone.
- the new 4-mer can be only CGTT, CGTA, CGTC, or CGTG, whereas there are actually 4^4 (256) K-mers that can be formed from four nucleotide types.
- the list of possibilities for the new K-mer is reduced from 256 to only 4, reducing the potential for degenerate signals.
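The K-mer counting and the single-base ratcheting constraint described above can be checked with a few lines of code. This is a minimal sketch; the function names are hypothetical, and the starting 4-mer `ACGT` is an assumed example (any 4-mer ending in CGT would give the same four successors).

```python
BASES = "ACGT"


def kmer_count(k: int) -> int:
    """Number of distinct K-mers over the four DNA bases (4^K)."""
    return len(BASES) ** k


def ratchet_successors(kmer: str) -> list:
    """K-mers reachable by one single-base step.

    The first base leaves the sensing zone and one new base enters at the end,
    while the remaining bases stay unchanged.
    """
    return [kmer[1:] + b for b in BASES]
```

Here `kmer_count(4)` gives 256 and `kmer_count(10)` gives 1048576, while `ratchet_successors("ACGT")` yields only the four candidates CGTA, CGTC, CGTG, and CGTT.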
- translocation events may occur extremely rapidly and may be too short to be detectable using strand sequencing. As a result, deletion errors may occur. Alternatively, some translocation events may be too slow. As a result, insertion errors may occur, rendering the detected homopolymer length greater than what was present in the physical construct 150.
- the present subject matter mitigates any and all of such issues associated with strand sequencing, and indeed is believed to provide greatly enhanced accuracy, controllability, and repeatability as compared to strand sequencing. Furthermore, the use of the present constructs - which include both the target polynucleotide and its complement in the same molecule, in the same nanopore - provides still further enhancements and benefits such as described herein.
- the circuitry 160 measures a signal that is based on a combination of the first or second duplex and a single-stranded portion of construct 150, as well as the particular set of measurement conditions that are used. Moreover, circuitry 160 generates a first plurality of such values for target polynucleotide 158 within construct 150, and a second plurality of such values for complement 159 within the same construct 150.
- the information which is measured therefore is far richer than for an exclusively single-stranded polynucleotide under a single set of measurement conditions, e.g., as is the case in strand sequencing, and also is far richer than for measuring target polynucleotide 158 alone.
- the homopolymer issue is solved. For example, each time a new base is added to the 3' end of the first or second duplex (e.g., such as described with reference to FIG. 1C), a discernible signal then is generated (e.g., such as described with reference to FIGS.
- nucleotide stretch of construct 150 may yield different measured values of the electrical property, because each of such nucleotides may have different proximities to other, different nucleotides outside of the homopolymer stretch. Accordingly, even nucleotides of the same type, within a homopolymer, may be individually identified using the values that circuitry 160 measures.
- the ability to re-sequence the same construct as many times as needed, as described in reference to FIGS. 4A-B, can further enable accurate homopolymer length determination. This may require, as a non-limiting example, using foreknowledge of the rates for nucleotide incorporation in combination with a minimum (possibly pre-determined) number of resequencing rounds of the same molecule.
- the correct length of the homopolymer for the regions of unidentified transitions may be inferred based on matching the detected distribution of homopolymer transit times to a statistical model of the time spent in that region for each round of re-sequencing construct 150.
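One possible way to match transit times to a statistical model is sketched below. It assumes, purely for illustration, that each nucleotide incorporation takes an exponentially distributed time with a known rate, so that the time to traverse a homopolymer of length n follows an Erlang (gamma) distribution. The application does not specify this model, and the function names are hypothetical.

```python
import math


def erlang_loglik(t: float, n: int, rate: float) -> float:
    """Log-likelihood of transit time t for n incorporations at the given rate."""
    return n * math.log(rate) + (n - 1) * math.log(t) - rate * t - math.lgamma(n)


def infer_homopolymer_length(transit_times, rate, max_len=20):
    """Pick the length that maximizes the summed log-likelihood over all rounds."""
    best_n, best_ll = None, -math.inf
    for n in range(1, max_len + 1):
        ll = sum(erlang_loglik(t, n, rate) for t in transit_times)
        if ll > best_ll:
            best_n, best_ll = n, ll
    return best_n
```

Each element of `transit_times` would correspond to one round of re-sequencing the same construct; more rounds sharpen the likelihood and make the inferred length more reliable under the assumed model.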
- the 3' end 153 (including the 3' end of polynucleotide 140) may be sequestered within aperture 113 or otherwise inaccessible for addition of another nucleotide until circuitry 160 applies an opposing force that frees 3' end 153 from the aperture 113 and makes the 3' end accessible to fluid, polymerase, and nucleotides for use in adding another nucleotide to 3' end 153.
- circuitry 160 may electronically control the duration of the measurement operation to achieve a desired SNR (e.g., a SNR exceeding a predefined threshold) while inhibiting polymerase 105 from adding another nucleotide.
- Translocation of construct 150 through nanopore 110 also is controlled electronically using circuitry 160, and accordingly is less subject to the variable kinetics of a translocation enzyme than is the case with strand sequencing, in which very fast translocation events may go undetected and lead to deletion or other types of errors that affect accuracy.
- FIGS. 8A-8C illustrate example values of electrical characteristics that may be measured using the system of FIGS. 1A-1H. More specifically, FIG. 8A illustrates an example sequence of values that may be measured under a first set of measurement conditions as base pairs G (153) C (in 150 and paired to 153) (referred to herein as “GC”), T (153) A (in 150 and paired to 153) (referred to herein as “TA”), and A (153) T (in 150 and paired to 153) (referred to herein as “AT”) respectively become located at the 3' end 153 of duplex 154, and sequences of nucleotides A-T, T-G, and G-C respectively become located in the single-stranded second portion 156 of construct 150, in a manner such as respectively described with reference to FIGS.
- a first value is measured for the combination GC and A-T (FIG. 1B)
- a second value is measured for the combination TA and T-G (FIG. 1D)
- a third value is measured for the combination AT and G-C (not specifically illustrated).
- the particular types of nucleotides at the 3' end 153 of duplex 154, and in the sequences of nucleotides located in the single-stranded second portion 156 of construct 150 may affect the ionic current through nanopore 110 and therefore may affect the measured values.
- FIG. 8B illustrates an example sequence of values that may be measured, again under the first set of measurement conditions, as base pairs GC, TA, and AT respectively become located at the 3' end 153 of duplex 154, and sequences of nucleotides A-T, T-G, and G-C* respectively become located in the single-stranded second portion 156 of construct 150, in a manner such as respectively described with reference to FIGS. 1B and 1D, but in which C* is a nucleotide analogue, e.g., a methylated cytosine.
- a first value is measured for the combination GC and A-T (FIG. 1B)
- a second value is measured for the combination TA and T-G (FIG.
- the first value in FIG. 8B may be similar to the first value in FIG. 8A, for example because C* is separated from the 3' end of the duplex by three bases, and therefore may not significantly affect the ionic current through nanopore 110.
- the second value in FIG. 8B may differ somewhat from the second value in FIG. 8A, for example because C* may somewhat affect the ionic current through nanopore 110, relative to how unmethylated C may affect such current, even though it is separated from the 3' end of the duplex by two bases.
- the third value in FIG. 8B may differ significantly from the third value in FIG. 8A, for example because C* may significantly affect the ionic current through nanopore 110, relative to how unmethylated C may affect such current, because it is directly adjacent to the 3' end of the duplex.
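The comparison between the values of FIG. 8A (unmodified) and FIG. 8B (containing C*) can be sketched as a step-by-step difference against a reference trace. The function name, the threshold, and the numbers in the usage note are hypothetical and in arbitrary units; they are not taken from the figures.

```python
def flag_modified_steps(reference, observed, threshold):
    """Return indices of measurement steps that deviate from the reference.

    reference: values measured for an unmodified sequence.
    observed:  values measured for the sequence under test, step for step.
    threshold: minimum absolute difference treated as significant.
    """
    if len(reference) != len(observed):
        raise ValueError("traces must have the same number of steps")
    return [
        i
        for i, (ref, obs) in enumerate(zip(reference, observed))
        if abs(obs - ref) > threshold
    ]
```

With made-up traces `[50.0, 62.0, 41.0]` and `[50.2, 63.5, 47.0]`, a threshold of 1.0 flags the second and third steps, and a threshold of 3.0 flags only the third, mirroring the pattern in which the deviation grows as the modified base approaches the 3' end of the duplex.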
- Different sets of measurement conditions also may affect the values which are measured for different combinations of nucleotides at the 3' end 153 of the duplex 154 and in the single-stranded second portion 156 of construct 150.
- applying a different force using circuitry 160 may move the 3' end 153 of the duplex 154, and the second portion 156 of construct 150, to a different location relative to nanopore 110 at which nucleotides in the duplex 154 and construct 150 may affect the ionic current differently than they do at another location (under another force).
- Changes in the measurement conditions may linearly or nonlinearly affect the measured values, and indeed may change the measured values in different directions for different combinations of nucleotides at the 3' end 153 of the duplex 154 and within the second portion 156 of construct 150.
- FIG. 8C illustrates an example sequence of values that may be measured, under a second set of measurement conditions that differs from the first set of measurement conditions, as base pairs GC, TA, and AT respectively become located at the 3' end 153 of duplex 154, and sequences of nucleotides A-T, T-G, and G-C respectively become located in the single-stranded second portion 156 of construct 150, in a manner such as respectively described with reference to FIGS. 1B and 1D.
- a first value is measured for the combination GC and A-T (FIG. 1B)
- a second value is measured for the combination TA and T-G (FIG.
- the first value in FIG. 8C may be different from the first value in FIG. 8A
- the second value in FIG. 8C may be different from the second value in FIG. 8A
- the third value in FIG. 8C may be different from the third value in FIG. 8A, even though the sequences are the same, for example because the second set of measurement conditions affects the ionic current through nanopore 110 in a manner that is different, for each combination of nucleotides, than does the first set of measurement conditions.
- the second set of measurement conditions decreases the first value, increases the second value, and increases the third value, relative to those values under the first set of measurement conditions.
- construct 150 may be sequenced multiple times under different sets of measurement conditions, and/or may be sequenced using multiple sets of conditions during a single round of sequencing construct 150, and the sequences compared to one another to further improve the accuracy of the sequence.
- measurements from target polynucleotide 158 and its complement 159 within construct 150 may be aligned to one another and bioinformatically combined to obtain significantly better sequencing accuracy for target polynucleotide 158 than may be obtained by sequencing the target polynucleotide (or its complement) alone.
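The idea of combining measurements from the target and its complement can be sketched as follows. This is a deliberately simplified illustration: it assumes DNA, reads that are already aligned and of equal length, and a per-base confidence score for each call. The application does not specify the combination method, and all names are hypothetical.

```python
COMP = {"A": "T", "T": "A", "G": "C", "C": "G"}


def combine_reads(target_calls, complement_calls):
    """Merge base calls from a target strand and its complement.

    Each input is a list of (base, confidence) tuples in that strand's own
    5'-to-3' read order. The complement calls are reverse-complemented into
    target coordinates, then the more confident call wins at each position.
    """
    comp_as_target = [(COMP[b], q) for b, q in reversed(complement_calls)]
    if len(comp_as_target) != len(target_calls):
        raise ValueError("reads must already be aligned to equal length")
    consensus = []
    for (b1, q1), (b2, q2) in zip(target_calls, comp_as_target):
        consensus.append(b1 if q1 >= q2 else b2)
    return "".join(consensus)
```

A low-confidence miscall on one strand can thus be corrected by a confident call at the matching position of the other strand, which is one way in which reading both strands of the same molecule may improve accuracy.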
- nucleotides at the 3' end 153 of duplex 154, and in the sequences of nucleotides located in the single-stranded second portion 156 of construct 150 may affect the ionic current through nanopore 110 and therefore may affect the measured values.
- paired nucleotides that are within duplex 154 and spaced apart from the 3' end 153 of duplex 154 may affect the ionic current through nanopore 110 and therefore may affect the measured values.
- signal contributions from different portions of the duplex 154 and/or from additional unpaired bases in the sequence of construct 150 may be used to distinguish different nucleotides in a homopolymer sequence from one another.
- the greater the values of M and/or D, the longer the sequence or “word” that may be read at a given time, and the longer the homopolymer stretches that may be reliably read, because the more unpaired nucleotides and/or duplex base pairs may contribute to signals by which the nucleotides of the homopolymer may be distinguished from one another.
- the sequence of measurements may be repeated under a different set of measurement conditions, to obtain a different set of values that may distinguish the nucleotides in the homopolymer sequence from one another in a different way.
- circuitry 160 may measure such fluctuations or noise and use the measured values of those fluctuations or noise to identify nucleotides.
- measurements such as described with reference to FIGS. 8A-8C may be used as input to an algorithm that correlates measured values - each of which may have any desired level of accuracy and may be obtained under any suitable set of measurement conditions - with different combinations of nucleotides respectively within target polynucleotide 158 and its complement 159 within construct 150.
- the algorithm may be stored in non-volatile computer-readable memory in operable communication with circuitry 160, and circuitry 160 repeatedly may use the algorithm to identify individual nucleotides in the sequence of target polynucleotide 158, based on the values respectively measured from target polynucleotide 158 and complement 159 (regardless of the particular order in which those measurements are obtained).
- FIGS. 8A-8C may be used to generate a data structure, such as an N-dimensional “read map,” that correlates measured values - each of which may have any desired level of accuracy and may be obtained under any suitable set of measurement conditions - with different combinations of sequences within duplex 154 and construct 150.
- FIG. 8D illustrates an example N-dimensional read map of electrical characteristics that may be used to identify nucleotides using the system of FIGS. 1A-1H.
- read map 800 includes a first axis corresponding to a first type of measured value, a second axis corresponding to a second type of measured value, and a third axis corresponding to a given measurement condition that may be varied between different measurements.
- the measurement condition may be or include application of a first force F1 such as described with reference to FIGS. 1B, 1D, and 1E, and the measurement condition may be varied (changed along the respective axis in FIG. 8D) by applying a modified first force F1’, such as described with reference to FIG. 1F, that positions the 3' end 153 of the duplex 154 differently relative to nanopore 110, thus causing a change in one or more measured values.
- N-dimensional read map 800 illustrated in FIG. 8D may be generated using a calibration procedure in which a sufficient number of different polynucleotides, the sequences of which are known a priori, are sequenced using measurement operations and nucleotide addition operations in a manner such as described with reference to FIGS. 1A-1D and 1H.
- Such a calibration procedure may be performed on a per-system basis, or may be performed in such a manner that the points in read map 800 apply approximately equally to each of a plurality of systems, such that each system need not be individually calibrated to generate its own read map.
- circuitry 160 may measure the values of one or more electrical properties of the 3' end 153 of the duplex 154 and the second portion 156 of construct 150, such as an electrical current, an ionic current, an electrical resistance, or an electrical voltage drop across nanopore 110, between nucleotide addition steps. Circuitry 160 may store the measured values and the measurement condition in a non-volatile computer readable medium (e.g., memory), and this set of information may be considered to populate a first plane of points in read map 800.
- circuitry 160 may measure the values of one or more electrical properties of the 3' end 153 of the duplex 154 and the second portion 156 of construct 150, between nucleotide addition steps. Circuitry 160 may store the measured values and the measurement condition in the computer readable medium (e.g., memory), and this set of information may be considered to populate a second plane of points in read map 800. Such operations of measuring and storing measurement values and the respective measurement condition may be repeated any suitable number of times to provide read map 800 with the desired number of dimensions.
- For each of the points that are stored in read map 800, circuitry 160 also may store the identities and respective locations in the sequence of the nucleotide(s) which are known to have contributed to that signal because the sequences of the polynucleotides are known a priori. For example, for each of the points stored in read map 800, circuitry 160 may store the identities and locations in the sequence at least of nucleotides 4, 4’, 5, and 6 illustrated in FIG. 1E. Optionally, circuitry 160 also may store the identities of nucleotides 3 and 3’. As a further option, circuitry 160 also may store the identities and locations in the sequence of nucleotides 2 and 2’.
- circuitry 160 also may store the identities and locations in the sequence of nucleotides 1 and 1’. Additionally, or alternatively, circuitry 160 also may store the identity and location in the sequence of nucleotide 7. As a further option, circuitry 160 also may store the identity and location in the sequence of nucleotide 8. Nonlimiting examples of the numbers and locations of nucleotides that may contribute to the measured signal, and thus may be stored within read map 800, are described with reference to FIGS. 1E-1F.
- circuitry 160 may store the nucleotides and their locations in a format such as G C A T, where it is defined by convention that the first listed nucleotide corresponds to nucleotide 4’ which is located at the 3' end 153 of duplex 154, the second listed nucleotide corresponds to nucleotide 4 which is the base pair of nucleotide 4’, the third listed nucleotide corresponds to nucleotide 5 which is unpaired and adjacent to the base pair, and the fourth listed nucleotide corresponds to nucleotide 6 which is unpaired and adjacent to nucleotide 5.
- circuitry 160 may store the nucleotides and their locations in a format such as T A T G, using the same convention.
- Any other suitable format may be used, and any suitable number of nucleotides may be included and their identities and locations indicated in any suitable manner.
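- As a purely illustrative, nonlimiting sketch, one calibration point of the read map could be represented as a small record that pairs a measurement condition and a measured value with the nucleotide context, using the convention described above (first listed nucleotide is 4’, then 4, 5, and 6). The class name `ReadMapPoint`, the condition label `"F1"`, and the numeric value are hypothetical placeholders and are not taken from the specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadMapPoint:
    """One calibration point: a value measured under a known condition for a known context."""
    condition: str  # hypothetical label for the measurement condition, e.g. "F1"
    value: float    # measured value in arbitrary units (placeholder)
    context: str    # nucleotides listed in the order 4', 4, 5, 6, e.g. "G C A T"

    def nucleotides(self):
        """Map each listed nucleotide to its location label, per the stated convention."""
        labels = ("4'", "4", "5", "6")
        return dict(zip(labels, self.context.split()))


point = ReadMapPoint(condition="F1", value=42.0, context="G C A T")
print(point.nucleotides())
```

- A modified nucleotide could be indicated in the same field with any suitable token (for example `C*`), since the context is split on whitespace rather than by character.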
- methylated bases may be included in the polynucleotides at locations that are known a priori, and the calibration procedure performed. In a manner such as described with reference to FIGS. 1G and 8C, one or more of the values that are measured from sequences including such modified nucleotides may be different from the values that are measured from sequences including unmodified nucleotides.
- the identity of the nucleotide which circuitry 160 stores may include a suitable indication of whether and how the nucleotide is modified.
- point 813 in read map 800 may be defined by the measured value of electrical current through nanopore 810 at a first set of measurement conditions (e.g., first force F1) for a sequence including AT at the 3' end, G at location 5, and methylcytosine (C*) at location 6. It may be seen that point 813 may have a different location in read map 800 than may another point for which the cytosine is unmodified, as well as may yet another point for which the cytosine includes a different type of modification.
- read map 800 may include any suitable number of dimensions, e.g., axes corresponding to any suitable number and types of measured values, standard deviations of measured values, measurement conditions (such as temperature, fluid composition such as salt concentration and/or pH, location in flow cell), types of (e.g., variants of) nanopores, types of (e.g., variants of) polymerases, nucleotide modifications, and the like.
- Read map 800 (or other suitable data structure such as described elsewhere herein) may be stored in non-volatile computer-readable memory in operable communication with circuitry 160, and circuitry 160 may use the read map in any suitable manner to identify individual nucleotides in the sequence of construct 150, e.g., as nucleotides are added to the 3' end 153 of duplex 154.
- circuit 160 may measure the values of one or more electrical properties of the 3' end 153 of duplex 154 and the single-stranded second portion 156 of construct 150 under a given set of measurement conditions.
- circuit 160 may be programmed to use sets of measurement conditions for which read map 800 contains points, so as to facilitate comparison of values that are measured for unknown sequences to values that were previously measured for sequences that were known a priori.
- circuit 160 may locate the set (e.g., plane) of points in the read map corresponding to that set of measurement conditions, may compare the measured value(s) to the corresponding set of values in the read map, and based upon such comparison may select the value or combination of values in that set that is/are closest to the measured value. Then, for the selected value(s), circuit 160 may retrieve the identities and locations of the nucleotides that generated the value(s) during the calibration step.
- circuit 160 may determine that the measured value(s) for the unknown construct 150 sequence is closest in magnitude to the value(s) for point 811 in read map 800. Based upon such comparison, circuit 160 may determine that the unknown construct 150 sequence includes the base pair GC at the 3' end 153 of the duplex 154 (locations 4’ and 4, respectively), A at location 5, and T at location 6. Such a determination may be referred to as a “base call.”
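- The look-up described above can be sketched as a nearest-value search within the plane of points for one measurement condition. This is a minimal sketch under stated assumptions: the dictionary layout, the function name `base_call`, and all numeric values are hypothetical, and absolute difference is only one of many possible closeness measures.

```python
def base_call(read_map, condition, measured):
    """Return the nucleotide context whose calibrated value is closest to `measured`."""
    plane = read_map[condition]  # points calibrated under this measurement condition
    return min(plane, key=lambda context: abs(plane[context] - measured))


# Hypothetical calibration values (arbitrary units); contexts are listed as 4', 4, 5, 6.
READ_MAP = {
    "F1": {"GCAT": 41.0, "TATG": 55.5, "ATGC": 63.2},
}

call = base_call(READ_MAP, "F1", 42.3)
print(call)
```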
- circuitry 160 is not limited to using a single point in read map 800 to make a base call, although that is an option. Instead, circuitry 160 may use multiple points within read map 800 in order to significantly enhance the accuracy of the base call. For example, as may be understood from comparing FIGS. 1B and 1D, the nucleotide that is added to the 3' end 153 of duplex 154 shifts the single-stranded second portion 156 of construct 150 upward by a single nucleotide. As such, the sequence of single-stranded second portion 156 in FIG. 1B and the sequence of single-stranded second portion 156 in FIG.
- Circuitry 160 may compare the base call from the measurement made at the time of FIG. 1B to the base call from the measurement made at the time of FIG. 1D, to confirm whether the same nucleotide(s) are present but are shifted in location by a single nucleotide, as should be the case if (i) a single nucleotide was added in the operation illustrated in FIG. 1C and (ii) the base calls made for FIGS. 1B and 1D were both correct.
- circuitry 160 may proceed with the sequencing process. On the other hand, if circuitry 160 determines that these base calls are inconsistent with one another, then the circuitry may flag this portion of the sequence as containing an error, may attempt to make the base call again, or even to resequence the construct in a manner such as described elsewhere herein. Note that for a sufficiently long homopolymer stretch, there may not necessarily be any change in the signals. In this case, other information provided by the present systems and methods still may be used to confirm the individual addition of nucleotides in a manner such as described elsewhere herein.
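- The single-nucleotide shift check described above can be sketched as follows, assuming each base call is a four-letter string in the order 4’, 4, 5, 6. After one nucleotide is added, the nucleotides previously at locations 5 and 6 should appear at locations 4 and 5. The function name is hypothetical, and, as noted above, this check alone cannot confirm an addition within a sufficiently long homopolymer stretch.

```python
def consistent_after_addition(prev_call, next_call):
    """Check that two successive base calls agree once shifted by one nucleotide.

    Each call lists nucleotides in the order 4', 4, 5, 6, so index 1 is
    location 4, index 2 is location 5, and index 3 is location 6.
    """
    return prev_call[2] == next_call[1] and prev_call[3] == next_call[2]


# "GCAT" followed by "TATG": A moves from location 5 to 4, T from location 6 to 5.
print(consistent_after_addition("GCAT", "TATG"))
# An inconsistent pair could be flagged as a possible error.
print(consistent_after_addition("GCAT", "GCAT"))
```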
- Circuitry 160 also, or alternatively, may use multiple measurements in order to locate the closest point within read map 800. For example, circuitry 160 may make another base call for the same sequence, but using a second, different set of measurement conditions for which read map 800 contains points, so as to facilitate comparison of values that are measured for unknown sequences to values that were previously measured for sequences that were known a priori. Circuitry 160 may impose the second set of measurement conditions shortly after imposing a first set of measurement conditions, e.g., may apply first force F1 and then may apply modified first force F1’ before adding another nucleotide. Alternatively, circuitry 160 may impose the second set of measurement conditions while resequencing construct 150 in a manner such as described elsewhere herein.
- Circuitry 160 may compare the base call from the measurement made with the first set of measurement conditions to the base call from the measurement made with the second set of measurement conditions, to confirm whether the same nucleotide(s) are present at the same locations in both base calls, which should be the case if both of the base calls were correct. For example, base calls made using points 811 and 811’ in read map 800 will be consistent, and a base call made using point 812 and a base call made using point 812’ will be consistent. If, based upon such a comparison, circuitry 160 determines that the base calls for different measurement conditions are consistent with one another (e.g., contain the same sequences), then circuitry 160 may proceed with the sequencing process.
- circuitry 160 may flag this portion of the sequence as containing an error, may attempt to make one or both of the base calls again, or even to resequence the construct in a manner such as described elsewhere herein.
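- As one nonlimiting sketch of using multiple measurements to locate the closest point, values measured under two conditions can be treated as coordinates and compared to the calibrated coordinates of each context. The Euclidean distance, the condition labels, and all numbers below are assumptions chosen for illustration only.

```python
import math


def joint_base_call(read_map, measured):
    """Pick the context closest to the measurement across all supplied conditions.

    read_map: context -> {condition: calibrated value}
    measured: {condition: measured value}
    """
    def distance(context):
        return math.sqrt(sum((read_map[context][c] - v) ** 2
                             for c, v in measured.items()))
    return min(read_map, key=distance)


# Hypothetical calibration: the two contexts are nearly indistinguishable under F1
# but well separated under the modified force F1'.
READ_MAP = {
    "GCAT": {"F1": 41.0, "F1'": 30.0},
    "TATG": {"F1": 42.0, "F1'": 50.0},
}

single = joint_base_call(READ_MAP, {"F1": 41.4})
joint = joint_base_call(READ_MAP, {"F1": 41.4, "F1'": 48.0})
print(single, joint)
```

- In this toy case the call made from the first condition alone differs from the joint call, which is the kind of inconsistency that may be flagged or resolved with further measurements.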
- Although read map 800 is illustrated graphically for purposes of discussion, it should be understood that the points in the read map suitably may be stored in any suitable format, within a non-volatile computer-readable medium, that correlates a given measurement condition with one or more values that were measured under that condition and combinations of duplex and single-stranded sequences that were used to generate those values.
- a look-up table (LUT) is a nonlimiting example of a format that may be used to store correlations between measured values and known combinations of duplex and single-stranded sequences that were used to generate those values.
- any suitable data structure may be used to store correlations between measured values and known combinations of duplex and single-stranded sequences that were used to generate those values.
- the data structure may be generated by, and appropriately stored for use by, a machine learning algorithm.
- the data structure may be generated by training a machine learning algorithm to recognize values that are obtained, under each respective given set of measurement conditions, and are known a priori to correspond to respective combinations of duplex and single-stranded sequences, and may be stored in any suitable format within a non-volatile computer-readable medium.
- the data structure subsequently may be used by the trained machine learning algorithm, implemented by circuitry 160, to generate an output that identifies nucleotides in the sequence of construct 150, based upon an input of values that are measured in between nucleotide addition steps such as described with reference to FIGS. 1A-1H.
- circuitry 160 may be used to train the machine learning algorithm, or different circuitry may be used to train the machine learning algorithm that then is implemented by circuitry 160.
- the data structure may be generated by, and appropriately stored for use by, a neural network, such as a deep learning algorithm.
- the data structure may be generated by training a neural network (e.g., deep learning algorithm) to recognize values that are obtained, under each respective set of measurement conditions, and are known a priori to correspond to respective combinations of duplex and single-stranded sequences, and may be stored in any suitable format within a non-volatile computer-readable medium.
- the data structure may include neurons of the neural network (e.g., deep learning algorithm).
- the data structure subsequently may be used by the trained neural network (e.g., deep learning algorithm), implemented by circuitry 160, to generate an output that identifies nucleotides in the sequence of construct 150, based upon an input of values that are measured in between nucleotide addition steps such as described with reference to FIGS. 1A-1H.
- circuitry 160 may be used to train the neural network (e.g., deep learning algorithm), or different circuitry may be used to train the neural network (e.g., deep learning algorithm) that then is implemented by circuitry 160.
- FIG. 9 illustrates an example flow of operations in a method 900 for generating a sequence of a target polynucleotide using a first plurality of measured values from the polynucleotide and a second plurality of measured values from its complement.
- Method 900 includes aligning a first plurality of measured values, from template polynucleotide 158 in construct 150, to a second plurality of measured values, from complement 159 in that construct 150 (operation 910).
- FIG. 11A illustrates example current signals obtained from a polynucleotide and its complement using the system of FIGS. 1A-1H. More specifically, FIG.
- FIG. 11A illustrates an example first plurality of measured values 1111 from template polynucleotide 158 (“Forward (F) reads”), and an example second plurality of measured values 1121 from its complement 159 (“Reverse complement (RC) reads”).
- the first and second pluralities of measured values 1111, 1121 may both be obtained using a first set of measurement conditions (illustratively, an 80 mV bias applied during measurement).
- a first set of measurement conditions illustrated in the nonlimiting example shown in FIG.
- an additional plurality of measured values 1112 from template polynucleotide 158 and an additional plurality of measured values 1122 from complement 159 are obtained using a second set of measurement conditions (illustratively, a 60 mV bias applied during measurement); and an additional plurality of measured values 1113 from template polynucleotide 158 and an additional plurality of measured values 1123 from complement 159 are obtained using a third set of measurement conditions (illustratively, a 40 mV bias applied during measurement).
- Circuit 160 may align the first and second sets of measurements (e.g., currents) in any suitable manner.
- one or more barcodes e.g., one or more oligonucleotides of known sequence
- Circuit 160 may be configured to detect the presence and location(s) of the barcode(s) within construct 150 based on the known sequence of nucleotides within the barcode(s), because the measured values of the construct 150 contain values that correspond to the known sequence of nucleotides within the barcode(s).
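- One way such barcode detection might be sketched is a sliding-window search for the stretch of measured values that best matches the values expected for the known barcode sequence. The least-squares criterion, the function name `find_barcode`, and all values are assumptions for illustration; the expected barcode levels are presumed to come from a previously calibrated read map.

```python
def find_barcode(values, barcode_levels):
    """Return the start index where `barcode_levels` best matches `values` (least squares)."""
    n = len(barcode_levels)
    best_pos, best_err = None, float("inf")
    for i in range(len(values) - n + 1):
        err = sum((values[i + j] - barcode_levels[j]) ** 2 for j in range(n))
        if err < best_err:
            best_pos, best_err = i, err
    return best_pos


# Hypothetical measured values for the template and complement reads.
forward = [10.0, 12.0, 30.0, 45.0, 20.0, 11.0]
reverse = [30.5, 44.0, 21.0, 9.0, 14.0]
barcode = [30.0, 44.0, 21.0]  # expected levels for the known barcode sequence

# The difference in barcode positions gives an offset for aligning the two reads.
offset = find_barcode(forward, barcode) - find_barcode(reverse, barcode)
print(offset)
```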
- FIG. 11B schematically illustrates that the sets of measurements
- Method 900 includes using a joint decoding algorithm to obtain a joint sequence of the template polynucleotide and its complement based on the aligned values (operation 920).
- circuit 160 may be configured to use a joint hidden Markov model (HMM) decoder implementing a Viterbi algorithm to find the most likely joint sequence of polynucleotides along both the target polynucleotide 158 and its complement 159, taking as input the aligned values and providing as output the consensus sequence.
- the HMM may include a mapping of measurements (e.g., ionic currents) of all 4096 possible polynucleotide hexamer states, including AAAAAA, TAAAAA, .
- each of the possible hexamer “states” may be associated with a current or set of currents. Since only one hexamer sequence of DNA can be in the constriction at any single point in time, the measurements may be associated with the k-mer in the constriction. Later, algorithmically, the “forward” and “reverse complement” strand information may be combined for decoding.
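- The hexamer state space can be enumerated directly, which confirms the count of 4096 states. In the sketch below, the names `HEXAMERS`, `STATE_INDEX`, and `successors` are hypothetical, and the transition rule (drop one nucleotide at one end, gain one at the other) is an assumption about how adjacent states relate when the strand advances by a single nucleotide.

```python
from itertools import product

# All 4**6 hexamer states over the alphabet A, C, G, T.
HEXAMERS = ["".join(p) for p in product("ACGT", repeat=6)]
STATE_INDEX = {hexamer: i for i, hexamer in enumerate(HEXAMERS)}


def successors(hexamer):
    """Hexamers reachable after the strand advances by one nucleotide (assumed direction)."""
    return [hexamer[1:] + base for base in "ACGT"]


print(len(HEXAMERS))
print(successors("AATTCC"))
```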
- FIG. 11B also schematically illustrates joint decoding of the current signals of FIG. 11A.
- the aligned measurements made using all of the different measurement conditions were input to the HMM decoder for joint decoding, and the joint sequence for both the template polynucleotide 158 and complement 159 is output.
- the “forward strand” currents and the “reverse complement” currents are first aligned to each other, for example using the barcode sequences for each separate segment as references.
- a Hidden Markov Model (HMM) using the Viterbi algorithm is used to decode the joint signal.
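- A toy version of such joint decoding is sketched below. It is not the decoder of the present disclosure: it uses dimers rather than hexamers, two features per state (one "forward" level and one level for the direct complement) rather than six, unit-variance Gaussian emissions, and uniform transitions, all of which are assumptions made so the example stays small. The model levels are arbitrary placeholders.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def viterbi_kmers(observations, model, k):
    """Most likely k-mer path for aligned feature vectors; returns the decoded sequence."""
    states = ["".join(p) for p in product("ACGT", repeat=k)]

    def log_emit(state, obs):
        # Unit-variance Gaussian log-likelihood, up to a constant (an assumption).
        return -0.5 * sum((o - m) ** 2 for o, m in zip(obs, model[state]))

    score = {s: log_emit(s, observations[0]) for s in states}
    backpointers = []
    for obs in observations[1:]:
        new_score, pointers = {}, {}
        for s in states:
            # Allowed predecessors overlap s by k-1 nucleotides; transitions are uniform.
            preds = [base + s[:-1] for base in "ACGT"]
            best = max(preds, key=lambda p: score[p])
            new_score[s] = score[best] + log_emit(s, obs)
            pointers[s] = best
        score = new_score
        backpointers.append(pointers)

    # Trace back from the best final state.
    state = max(states, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path[0] + "".join(s[-1] for s in path[1:])


# Toy model: each dimer gets a placeholder level; features are (forward, complement).
K = 2
STATES = ["".join(p) for p in product("ACGT", repeat=K)]
LEVEL = {s: float(i) for i, s in enumerate(STATES)}
MODEL = {s: (LEVEL[s], LEVEL[s.translate(COMPLEMENT)]) for s in STATES}

truth = ["AC", "CG", "GT"]
observations = [(MODEL[s][0] + 0.1, MODEL[s][1] - 0.1) for s in truth]
decoded = viterbi_kmers(observations, MODEL, K)
print(decoded)
```

- Because each observation carries both the forward and the complement feature, a state must explain both strands at once, which is the sense in which the decoding is joint.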
- decoding with the HMM is performed using 6 features, each of which maps to a unique hexamer (e.g., AAAAAA, TAAAAA, and so on).
- the 6 features comprise: 3 currents for the “forward strand”, corresponding to the 40 mV, 60 mV and 80 mV reads, and 3 currents for the “reverse complement strand” also corresponding to 40 mV, 60 mV and 80 mV reads.
- the features were obtained from a read map consisting of 4^6 = 4096 states (all hexamer states).
- the 6 features, written as a list or vector corresponding to decoding the 6-mer AATTCC, would look like: {Feature 1: 40 mV read for AATTCC, Feature 2: 60 mV read for AATTCC, Feature 3: 80 mV read for AATTCC, Feature 4: 40 mV read for TTAAGG, Feature 5: 60 mV read for TTAAGG, Feature 6: 80 mV read for TTAAGG}.
- features 4-6 contain the direct complement sequence of AATTCC, and can be found by looking it up in the read map.
- the 6 features, written as a list or vector corresponding to decoding the 6-mer TTAAGG, would look like: {Feature 1: 40 mV read for TTAAGG, Feature 2: 60 mV read for TTAAGG, Feature 3: 80 mV read for TTAAGG, Feature 4: 40 mV read for AATTCC, Feature 5: 60 mV read for AATTCC, Feature 6: 80 mV read for AATTCC}.
- features 4-6 contain the direct complement sequence of TTAAGG, and can be found by looking it up in the read map.
- the 6 features, written as a list or vector corresponding to decoding the 6-mer ATCGAT, would look like: {Feature 1: 40 mV read for ATCGAT, Feature 2: 60 mV read for ATCGAT, Feature 3: 80 mV read for ATCGAT, Feature 4: 40 mV read for TAGCTA, Feature 5: 60 mV read for TAGCTA, Feature 6: 80 mV read for TAGCTA}.
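- The six-feature vectors listed above can be assembled mechanically from a read map keyed by hexamer and bias voltage. In the sketch below, the read map layout, the function names, and every numeric value are hypothetical placeholders; only the feature ordering (40, 60, and 80 mV for the hexamer, then 40, 60, and 80 mV for its direct complement) follows the text.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def direct_complement(kmer):
    """Base-by-base complement, without reversing the order."""
    return kmer.translate(COMPLEMENT)


def feature_vector(kmer, read_map, biases=(40, 60, 80)):
    """Features 1-3: reads for `kmer`; features 4-6: reads for its direct complement."""
    comp = direct_complement(kmer)
    return ([read_map[(kmer, b)] for b in biases]
            + [read_map[(comp, b)] for b in biases])


# Placeholder values (arbitrary units), keyed by (hexamer, bias in mV).
TOY_READ_MAP = {
    ("AATTCC", 40): 21.0, ("AATTCC", 60): 33.5, ("AATTCC", 80): 47.2,
    ("TTAAGG", 40): 18.4, ("TTAAGG", 60): 29.9, ("TTAAGG", 80): 41.7,
}

features = feature_vector("AATTCC", TOY_READ_MAP)
print(features)
```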
- any suitable data structure may be used to identify and/or store correlations between measured values and known combinations of nucleotides (e.g., duplex and single-stranded sequences) that were used to generate those values.
- the data structure may be generated by, and appropriately stored for use by, a machine learning algorithm.
- the data structure may be generated by training a machine learning algorithm to recognize values that are obtained, under each respective given set of measurement conditions, and are known a priori to correspond to respective combinations of duplex and single-stranded sequences, and may be stored in any suitable format within a non-volatile computer-readable medium.
- the data structure subsequently may be used by the trained machine learning algorithm, implemented by circuitry 160, to generate an output that identifies nucleotides in the sequence of construct 150, based upon an input of values that are measured in between nucleotide addition steps such as described with reference to FIGS. 1A-1H.
- circuitry 160 may be used to train the machine learning algorithm, or different circuitry may be used to train the machine learning algorithm that then is implemented by circuitry 160.
- the data structure may be generated by, and appropriately stored for use by, a neural network, such as a deep learning algorithm.
- the data structure may be generated by training a neural network (e.g., deep learning algorithm) to recognize values that are obtained, under each respective set of measurement conditions, and are known a priori to correspond to respective combinations of duplex and single-stranded sequences, and may be stored in any suitable format within a non-volatile computer-readable medium.
- the data structure may include neurons or weights of the neural network (e.g., deep learning algorithm).
- the data structure subsequently may be used by the trained neural network (e.g., deep learning algorithm), implemented by circuitry 160, to generate an output that identifies nucleotides in the sequence of construct 150, based upon an input of values that are measured in between nucleotide addition steps such as described with reference to FIGS. 1A-1H.
- circuitry 160 may be used to train the neural network (e.g., deep learning algorithm), or different circuitry may be used to train the neural network (e.g., deep learning algorithm) that then is implemented by circuitry 160.
- the training data preparation for the present systems and methods may use the following operations: 1) perform some initial basecalling of the raw signal, or level calls from some sequencing data from reference genomes, or a library of DNA fragments of known composition; 2) align the raw signals (or basecalled signals) to the reference genomes, or fragments of the known library; and 3) refine the initial basecalls and/or alignments based on knowledge of the true sequences, and fine-tune the mapping/association between signals and bases.
- FIG. 10 illustrates an example circuit 160 for sequencing a construct 150 in a manner such as provided herein.
- Circuit 160 may include processor 1040 and at least one computer-readable medium 1030.
- the computer-readable medium 1030 may store a first plurality of measured values 1031 of an electrical property of a 3' end of a first duplex and a single-stranded portion of a polynucleotide 158 within a construct, within an aperture of a nanopore.
- the computer-readable medium 1030 also may store a second plurality of measured values 1031’ of an electrical property of a 3' end of a duplex (e.g., the first duplex or a second duplex) and a single-stranded portion of a complement 159 of that polynucleotide 158 within that construct 150, within the aperture of the nanopore.
- the computer-readable medium 1030 may store a data structure 1032 (e.g., joint decoding model or trained neurons), wherein the data structure correlates different measured values with different combinations of nucleotides within a duplex 154 and a single-stranded portion of polynucleotide 158 or complement 159 within construct 150, within an aperture 113 of a nanopore 110.
- Data structure 1032 further may identify a respective measurement condition under which the measured values were obtained, e.g., the magnitude of a bias voltage used to impose first force F1 or modified first force F1’ during the measurement.
- data structure 1032 further may identify the combination of nucleotides (e.g., at least at locations 4 and 4' at the 3' end 153 of the duplex 154 and at locations 5 and 6 illustrated in FIG. 1E) that provided each of the measured values.
- the computer-readable medium 1030 also may store instructions for causing processor
- the instructions may be for causing processor 1040 to use the first plurality of measured values and the second plurality of measured values (such as with a joint decoding algorithm or a neural network of trained neurons), to determine the sequence of nucleotides in the sequence of the polynucleotide; and to output a representation of the determined sequence of nucleotides.
- the instructions may be provided within sequencing module 1033.
- Sequencing module 1033 may include nucleotide addition module 1034 configured to cause processor 1040 to add a nucleotide to the 3' end 153 of a first or second duplex in a manner such as described with reference to FIGS. 1B and 1D.
- circuit 160 may be operably coupled to electrodes 102 and 103.
- Nucleotide addition module 1034 may cause processor 1040 to apply second force F2 by applying a suitable voltage bias across electrodes 102 and 103.
- Sequencing module 1033 may include measurement module 1035 configured to cause processor 1040 to measure, in between nucleotide addition operations, values of an electrical property of a 3' end 153 of a first or second duplex including a portion of the construct (e.g., polynucleotide or complement) within an aperture of a nanopore in a manner such as described with reference to FIGS. 1B, 1D, and 1H, and to store the measured values within computer-readable medium 1030.
- measurement module 1035 may cause processor 1040 to apply first force F1 or modified first force F1’ by applying a suitable voltage bias across electrodes 102 and 103, and to measure the value of the electrical property while applying such force.
- Circuitry 160 may include one or more sensor(s) 1010, each configured to measure the value of one or more electrical properties.
- Each sensor 1010 may be configured to measure, for example, an electrical voltage drop across nanopore 110, an electrical current through nanopore 110, or an electrical resistance across nanopore 110, or an intensity of light from a dye the emission from which changes responsive to the amount of ionic current through nanopore 110.
- Measured values 1031 further may identify a measurement condition under which the measured values were obtained, e.g., the magnitude of a bias voltage used to impose first force F1 or modified first force F1’ during the measurement.
- CMOS-based detectors may include field effect transistors (FETs), e.g., metal oxide semiconductor field effect transistors (MOSFETs).
- CMOS-SPAD single-photon avalanche diode
- FLIM fluorescence lifetime imaging
- circuitry 160 includes amplifier(s) 1020 respectively configured to amplify the electrical signal(s) from respective sensor(s) 1010 prior to storage of that signal at 1031; as a further option, an amplifier 1020 may be included within a respective sensor 1010.
- Sequencing module 1033 also may include nucleotide identification module 1036 configured to cause processor 1040 to identify nucleotides by comparing the first plurality of measured values 1031 and the second plurality of measured values 1031’ to values within data structure 1032 and/or using a joint decoding algorithm or deep learning algorithm, e.g., in a manner such as described with reference to FIGS. 8A-8C and 9.
- nucleotide identification module 1036 may cause processor 1040 to use measured values from 1031 and 1031’ (illustratively, in an order - or inverted order - corresponding to the temporal order in which they were obtained), and to use at least a portion of data structure 1032 that was obtained under the same operating condition as were the measured values.
- nucleotide identification module 1036 may cause processor 1040 to compare the identification of a measurement condition stored within measured values 1031 or measured values 1031’ to the identification of a measurement condition stored within data structure 1032, and to ignore any values within data structure 1032 that do not match the measurement condition of the measured values 1031 or 1031’.
- nucleotide identification module 1036 may cause processor 1040 to perform an operation comparing the measured values to values within the data structure, for example by taking differences between the measured values and values within the data structure, by taking ratios between the measured values and values within the data structure, by performing statistical comparisons such as the T-test, or the like.
- Nucleotide identification module 1036 may cause processor 1040 to select the value within data structure 1032 that, based on the comparison, is most similar to the measured value, and to select from data structure 1032 the combination of nucleotides (e.g., at least at locations 4 and 4' at the 3' end of the duplex and at locations 5 and 6 illustrated in FIG. 1E) that previously provided that measured value.
- nucleotide identification module 1036 may implement a deep learning algorithm, joint decoding model, or other suitable algorithm for selecting such a combination of nucleotides.
- Nucleotide identification module 1036 may cause processor 1040 to use the selected combinations of nucleotides to construct an electronic sequence of nucleotides that corresponds to the physical sequence of nucleotides in construct 150 to which measured values 1031 and 1031’ correspond. For example, using the labels illustrated in FIG. 1E, each measured value 1031 and 1031’ includes contributions from the base pair 4, 4', as well as the unpaired nucleotides 5 and 6. Accordingly, in some examples, nucleotide identification module 1036 may cause processor 1040 to include the nucleotides at locations 4, 5, and 6 within the electronic sequence of nucleotides that corresponds to the physical sequence of nucleotides in construct 150.
- nucleotides may already have been included in the electronic sequence, because at least some of such nucleotides also would have contributed to the value measured in the immediately prior measurement step - that is, before the addition of a single nucleotide - but at a location that is shifted by a single nucleotide.
- the electronic sequence already may include the nucleotides that now are at locations 4 and 5, but were at locations 5 and 6 in the previous measurement step.
- the nucleotide that is now at location 6 may be added to the electronic sequence; for example, having been at location 7 in the previous measurement step it may not have sufficiently contributed to the value in the previous measurement step in such a manner as to be identifiable during that step, but now is identifiable.
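- The incremental construction of the electronic sequence can be sketched as follows, assuming each measurement step yields a three-letter call for locations 4, 5, and 6. The first call seeds the sequence; each later call should overlap the last two nucleotides already recorded and contributes only the nucleotide newly at location 6. The function name and the choice to raise an exception on a mismatch are illustrative assumptions.

```python
def extend_sequence(sequence, call):
    """Append the nucleotide newly at location 6 after checking the two-nucleotide overlap.

    call: nucleotides at locations 4, 5, and 6 for the current measurement step.
    """
    if not sequence:
        return call
    if sequence[-2:] != call[:2]:
        raise ValueError("base call inconsistent with previous measurement step")
    return sequence + call[2]


sequence = ""
for call in ["ACG", "CGT", "GTT"]:
    sequence = extend_sequence(sequence, call)
print(sequence)
```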
- Nucleotide identification module 1036 may cause processor 1040 to output the electronic sequence of nucleotides, e.g., by saving the sequence to memory 1030, electronically transmitting the sequence to another device or system (not specifically illustrated), displaying the sequence or a portion thereof on a display screen (not specifically illustrated) that is operably coupled to circuit 160, or the like.
- Nucleotide identification module 1036 optionally may be configured to cause processor 1040 to identify nucleotides, or confirm the identity of nucleotides, using measurements of multiple types of values. For example, in a manner such as described with reference to FIGS. 8A-8C and 9, for any given combination of nucleotides, data structure 1032 optionally may include different values of a particular type that respectively correspond to different measurement conditions. Additionally, or alternatively, for any given combination of nucleotides, data structure 1032 optionally may include different types of measured values. Nucleotide identification module 1036 may cause processor 1040 to use any combination of values obtained using different types of measurements and/or different measurement conditions to identify nucleotides, or confirm the identity of nucleotides. As one illustrative example, nucleotide identification module 1036 may cause processor 1040 to use values from two different types of measurements to identify a point in data structure 1032 corresponding to a particular combination of duplex base pairs and unpaired nucleotides.
- nucleotide identification module 1036 additionally, or alternatively, may be configured to cause processor 1040 to confirm the accuracy of an identification using at least the immediately previous or next measurement step, if not even earlier and/or even later measurement step(s). For example, in a manner such as described with reference to FIGS. 8A-8C, if for a given measurement step processor 1040 identifies nucleotides A, C, and G at locations 4, 5 and 6, and if such identification is accurate, then for the next measurement step (after addition of a single nucleotide using nucleotide addition module 1034) the processor should identify nucleotides C and G at locations 4 and 5.
- nucleotide identification module 1036 may cause processor 1040 to compare the nucleotides identified for a given measurement step to the nucleotides identified for at least one other (e.g., earlier or later) measurement step, and to indicate (e.g., flag) an error based on any differences between identified nucleotides that should have been the same as each other.
- nucleotide identification module 1036 may cause processor 1040 to take one or more remedial actions. For example, nucleotide identification module 1036 may cause processor 1040 to disregard one or more nucleotide identifications that are in error, e.g., by replacing the erroneous identification with an identification that is known to be correct from other measurement steps that are consistent with each other (e.g., because each given nucleotide may contribute to three or more sequential measurement values); or by indicating the nucleotide as not being identified (e.g., using a nonce character such as “Z” to indicate that the nucleotide’s identity is unknown).
- nucleotide identification module 1036 may cause processor 1040 to attempt to identify the nucleotide again using stored measured values 1031, 1031’ and data structure 1032.
- nucleotide identification module 1036 may cause processor 1040 to attempt to identify the nucleotide again by obtaining a new measured value 1031 or 1031’ using a different measurement condition (e.g., a modified first force F1’) and data structure 1032.
- nucleotide identification module 1036 may cause processor 1040 to attempt to identify the nucleotide again by obtaining a new measured value 1031 or 1031’ using a different measurement type and data structure 1032.
- nucleotide identification module 1036 may cause processor 1040 to resequence the construct 150 in a manner such as described elsewhere herein.
- nucleotide identification module 1036 may cause processor 1040 to indicate in the electronic sequence that the nucleotide was not identified (e.g., using a nonce character such as “Z” to indicate that the nucleotide’s identity is unknown). In some examples, nucleotide identification module 1036 may cause processor 1040 to select from among these or other operations based upon the apparent nature of the error and the options available at the time the error is identified.
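- The cross-checking and remedial actions described above may be sketched as follows. This is a minimal illustration only, assuming that each measurement step yields a short window of identified nucleotides and that consecutive windows are offset by exactly one nucleotide; the function name `reconcile_windows` and the voting rule are hypothetical and are not taken from this disclosure.

```python
from collections import Counter

NONCE = "Z"  # placeholder for a nucleotide whose identity is unknown


def reconcile_windows(windows):
    """Merge per-step identifications into one sequence.

    windows[i] holds the nucleotides identified at measurement step i.
    Assumption: each step shifts the window by one nucleotide, so the
    last k-1 entries of one window should equal the first k-1 entries
    of the next. Returns (sequence, flagged_positions).
    """
    if not windows:
        return "", []
    k = len(windows[0])
    votes = [Counter() for _ in range(len(windows) + k - 1)]
    for step, window in enumerate(windows):
        for offset, base in enumerate(window):
            votes[step + offset][base] += 1

    sequence, flagged = [], []
    for position, counter in enumerate(votes):
        ranked = counter.most_common()
        if len(ranked) > 1:
            flagged.append(position)  # steps disagree: flag an error
        top_base, top_count = ranked[0]
        tied = len(ranked) > 1 and ranked[1][1] == top_count
        # Replace with the majority call, or mark as unknown on a tie.
        sequence.append(NONCE if tied else top_base)
    return "".join(sequence), flagged


if __name__ == "__main__":
    steps = [("A", "C", "G"), ("C", "T", "T"), ("G", "T", "A")]
    print(reconcile_windows(steps))
```

- In this sketch a disagreement is resolved by majority when one exists, which corresponds to replacing an erroneous identification with one supported by other, mutually consistent measurement steps; a tie is reported with the nonce character instead.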
- data structure 1032 may be generated by training any suitable machine learning algorithm, such as a neural network (e.g., deep learning algorithm) using measured values, the combinations of nucleotides which are known a priori to correspond to those measured values, and the measurement conditions under which those measured values were obtained.
- data structure 1032 may have a construction that is readily usable by the trained machine learning algorithm, e.g., trained neural network, such as a trained deep learning algorithm (nucleotide identification module 1036, implemented by processor 1040) to identify combinations of nucleotides using measured values, but such construction may not necessarily be usable by any other software, module, or algorithm to determine correlations between measured values and combinations of nucleotides.
- machine learning algorithms are supervised, semi-supervised, unsupervised, and reinforcement algorithms.
- Neural network algorithms are a subset of machine learning algorithms and may include deep learning algorithms, convolutional neural networks, recurrent neural networks, generative adversarial networks, and recursive neural networks.
- the particular construction of data structure 1032 may include, for example, a vector space, graph space, neurons of a neural network, or the like.
- data structure 1032 may be implemented using any suitable data structure that may be queried using nucleotide identification module 1036, such as a look-up table (LUT), matrix, flat-file database structure, SQL database structure, or the like.
- Nucleotide identification module 1036 may cause processor 1040 to suitably identify points in the data structure 1032 with measured values, for known combinations of nucleotides, that correspond to the measured values for unknown combinations of nucleotides.
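- As a simple illustration of querying a look-up table form of such a data structure, the sketch below finds the stored point closest to a pair of measured values. The table contents, the name `identify`, the Euclidean distance metric, and the `max_distance` threshold are all assumptions made for the example and are not part of this disclosure.

```python
import math

# Hypothetical reference table: combination of nucleotides -> values
# measured under two measurement conditions. Numbers are invented.
REFERENCE = {
    "ACG": (61.2, 48.0),
    "CGT": (55.7, 51.3),
    "GTA": (70.1, 44.9),
}


def identify(measured, table=REFERENCE, max_distance=2.0):
    """Return the combination whose stored values lie closest to
    `measured`, or None when no stored point is within `max_distance`."""
    best_combo, best_dist = None, math.inf
    for combo, stored in table.items():
        dist = math.dist(measured, stored)  # Euclidean distance
        if dist < best_dist:
            best_combo, best_dist = combo, dist
    return best_combo if best_dist <= max_distance else None


if __name__ == "__main__":
    print(identify((55.9, 51.0)))
    print(identify((90.0, 10.0)))
```

- Returning `None` for a distant query is one way such a module could signal that a nucleotide combination was not identified, so that a remedial action may be chosen.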
- circuitry 160 may be implemented using any suitable combination of digital electronic circuitry, integrated circuitry, application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), central processing units (CPUs), graphical processing units (GPUs), computer hardware, firmware, software, and/or combinations thereof.
- circuitry 160 may be implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which can be special or general purpose, coupled to receive data and instructions from, and to transmit data and instructions to, a storage system, at least one input device, and at least one output device.
- the programmable system or computing system can include clients and servers.
- a client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
- These computer programs can include machine instructions for a programmable processor, and/or can be implemented in a high-level procedural language, an object-oriented programming language, a functional programming language, a logical programming language, and/or in assembly/machine language.
- the terms “memory” and “computer-readable medium” refer to any computer program product, apparatus and/or device, such as magnetic discs, optical disks, solid-state storage devices, memory, and Programmable Logic Devices (PLDs), used to provide machine instructions and/or data to a programmable data processor, including a machine-readable medium that receives machine instructions as a computer-readable signal.
- the term “computer-readable signal” refers to any signal used to provide machine instructions and/or data to a programmable data processor.
- the computer-readable medium can store such machine instructions non-transitorily, such as would a non-transient solid-state memory or a magnetic hard drive or any equivalent storage medium.
- the computer-readable medium can alternatively or additionally store such machine instructions in a transient manner, such as for example as would a processor cache or other random-access memory associated with one or more physical processor cores.
- a module or processor includes but is not limited to a unit of code that performs a software operation, and can be implemented for example as a subroutine unit of code, or as a software function unit of code, or as an object (as in an object-oriented paradigm), or as an applet, or in a computer script language, or as another type of computer code.
- the software components and/or functionality can be located on a single computer or distributed across multiple computers and/or the cloud, depending upon the situation at hand.
- a bus (not specifically illustrated) can serve as the information highway interconnecting the other illustrated components of the hardware.
- the system bus can also include at least one communication port (such as a network interface) to allow for communication with external devices either physically connected to the computing system or available externally through a wired or wireless network.
- Processor 1040 may be implemented using a CPU (central processing unit) (e.g., one or more computer processors/data processors at a given computer or at multiple computers) that can perform calculations and logic operations required to execute a program.
- Memory 1030 may include a non-transitory processor-readable storage medium, such as read only memory (ROM) and/or random access memory (RAM) in communication with processor 1040 and can include one or more programming instructions for the operations provided herein, e.g., sequencing module 1033 and its components, and may store measured values 1031 and data structure 1032.
- memory 1030 may include a magnetic disk, optical disk, recordable memory device, flash memory, or other physical storage medium.
- circuitry 160 may include or may be implemented on a computing device having a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) or OLED (organic light emitting diode) or plasma monitor) for displaying information obtained to the user and an input device such as a keyboard and/or a pointing device (e.g., a mouse or a trackball) and/or a touchscreen by which the user can provide input to the computer.
- sequencing module 1033 further may be configured to cause processor 1040 to perform additional operations such as will be described with reference to FIGS. 3A-3B, 4A-4B, 5, and 7. Such operations may be implemented using additional modules not specifically illustrated in FIG. 10, or may be implemented using suitable modifications to nucleotide addition module 1034, measurement module 1035, and/or nucleotide identification module 1036.
- nucleotide identification module 1036 may include the software/hardware that performs the “basecalling”, and may include modules configured to perform operations including data extraction (e.g., signal level calling), alignment (e.g., accounting for offsets), decoding (e.g., converting called signal levels to base calls), and consensus (e.g., combining reads from sequencing and/or resequencing).
- Method 500 may include generating a construct including a polynucleotide and a complement of the polynucleotide (operation 510).
- construct 150 may include template polynucleotide 158 and its complement 159 coupled thereto, e.g., via structure 157.
- the template polynucleotide 158 may be located 3' of complement 159, or complement 159 may be located 3' of template polynucleotide 158.
- Nonlimiting examples of methods for generating construct 150 (that is, for performing operation 510) are described in greater detail with reference to FIG. 6.
- Method 500 may include disposing the construct through the aperture of the nanopore such that a 3' end of the construct is on the first side of the nanopore, and a 5' end of the construct is on the second side of the nanopore (operation 520).
- Example operations for disposing construct 150 through the aperture of nanopore 110 in such a manner are provided below with reference to FIG. 7.
- Method 500 may include forming a first duplex with the construct on the first side of the nanopore, the first duplex including a 3' end (operation 530).
- Such a first duplex may be formed, for example, by hybridizing the first portion 155 of construct 150 to a first primer (polynucleotide 140 or a portion thereof) on the first side of the nanopore, e.g., in a manner such as described below with reference to FIG. 7.
- polynucleotide 140 may be or include a portion of construct 150, e.g., in a manner such as described below with reference to FIG. 6.
- Method 500 may include characterizing the polynucleotide to generate a first plurality of measured values (operation 540).
- operation 540 may include (i) extending the first duplex on the first side of the nanopore by adding a nucleotide to the 3' end of the first duplex.
- circuitry 160 may apply second force F2, responsive to which the 3' end of the first duplex is located out of the aperture of the nanopore such that a polymerase may act upon the 3' end of the first duplex to add a nucleotide thereto, e.g., in a manner such as described with reference to FIG. 1C.
- Operation 540 of method 500 may include (ii) inhibiting, using the nanopore, translocation of the 3' end of the extended first duplex to the second side of the nanopore.
- constriction 114 or other feature of nanopore 110 may inhibit passage of 3' end 153 of duplex 154 (which in some examples is a first duplex), from the first side 111 of nanopore 110 to the second side 112 of nanopore 110 while first force F1 is applied in a manner such as described with reference to FIGS. 1B and 1D.
- Operation 540 of method 500 also may include (iii) measuring a value of an electrical property of the 3' end of the first duplex and a single-stranded portion of the polynucleotide.
- circuitry 160 may measure such a value of the electrical property while the first force F1 is applied in a manner such as described with reference to FIGS. 1B and 1D.
- Operation 540 of method 500 also may include (iv) repeating operations (i)-(iii) for a first plurality of nucleotides that are complementary to nucleotides of the polynucleotide to generate the first plurality of measured values.
- Method 500 also may include characterizing the complement to generate a second plurality of measured values (operation 550).
- operation 550 may include (v) extending the first duplex or a second duplex on the first side of the nanopore by adding a nucleotide to the 3' end of the first or second duplex.
- the duplex used in operation 550 may be a continuation of the first duplex 140 formed by extending nucleotides through the first portion of the construct 156.
- the duplex used in operation 550 may be a newly formed duplex, similar to duplex 140, which may hybridize to form a second duplex which is specific to a corresponding sequence (e.g., adapter or barcode) in complement 159 of the construct 150.
- circuitry 160 may apply second force F2, responsive to which the 3' end of the first or second duplex is located out of the aperture of the nanopore such that a polymerase may act upon the 3' end of the first or second duplex to add a nucleotide thereto, e.g., in a manner such as described with reference to FIG. 1C.
- Operation 550 of method 500 may include (vi) inhibiting, using the nanopore, translocation of the 3' end of the extended first or second duplex to the second side of the nanopore.
- constriction 114 or other feature of nanopore 110 may inhibit passage of 3' end 153 of a first duplex 154, from the first side 111 of nanopore 110 to the second side 112 of nanopore 110 while first force F1 is applied in a manner such as described with reference to FIGS. 1B and 1D.
- constriction 114 or other feature of nanopore 110 may inhibit passage of 3' end 153 of a second duplex, which may be formed by hybridizing a second primer to complement 159, from the first side 111 of nanopore 110 to the second side 112 of nanopore 110 while first force F1 is applied in a manner similar to that described with reference to FIGS. 1B and 1D.
- Operation 550 of method 500 also may include (vii) measuring a value of an electrical property of the 3' end of the first or second duplex and a single-stranded portion of the complement.
- circuitry 160 may measure such a value of the electrical property while the first force F1 is applied in a manner such as described with reference to FIGS. 1B and 1D.
- Operation 550 of method 500 also may include (viii) repeating operations (v)-(vii) for a second plurality of nucleotides that are complementary to nucleotides of the complement to generate the second plurality of measured values.
- Method 500 may include generating a first sequence of the polynucleotide using the first plurality of measured values (obtained in operation 540) and the second plurality of measured values (obtained in operation 550) (operation 560).
- circuitry 160 may identify a sequence of nucleotides in the construct 150 in a manner such as described with reference to FIGS. 8A-8C, 9, and 10.
- operations 540-560 may be repeated any suitable number of times, e.g., so as to substantially sequence construct 150.
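- One simplified way to picture operation 560 is to combine a base call obtained from the polynucleotide with a base call obtained from its complement. The sketch below is an assumption-laden illustration rather than the joint decoding of this disclosure: it works on already-decoded reads instead of measured values, assumes both reads are written 5' to 3' and cover the same nucleotides, and uses hypothetical names (`reverse_complement`, `combine_reads`).

```python
_PAIRS = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement; non-ACGT characters pass through."""
    return seq.translate(_PAIRS)[::-1]


def combine_reads(template_read, complement_read, unknown="Z"):
    """Combine a read of the template with a read of its complement.

    Agreeing positions are kept, a position unknown in one read is taken
    from the other, and conflicting positions are marked as unknown.
    """
    expected = reverse_complement(complement_read)
    if len(expected) != len(template_read):
        raise ValueError("reads must cover the same number of nucleotides")

    def merge(t, e):
        if t == e:
            return t
        if t == unknown:
            return e
        if e == unknown:
            return t
        return unknown  # the two strands disagree

    return "".join(merge(t, e) for t, e in zip(template_read, expected))


if __name__ == "__main__":
    print(combine_reads("AZGTTG", "CAACGT"))
```

- Because the complement carries the same information as the polynucleotide, a position that is uncertain in one read may be recoverable from the other, which is the intuition the example is meant to convey.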
- FIG. 2A schematically illustrates the formation of transient secondary structures on the second side of the nanopore when sequencing a polynucleotide using a nanopore.
- polynucleotide 250 may be disposed through nanopore 110 and hybridized to polynucleotide 140 on the first side of the nanopore in a manner similar to that described with reference to FIGS. 1A-1H, except that polynucleotide 250 is not coupled to its full complement within construct 150, and thus is only partially stable.
- polynucleotide 250 may spontaneously transition between an “unfolded” (open or partially open) and a “folded” (closed) state.
- polynucleotide 250 spontaneously forms a secondary structure 251 in which a first portion of polynucleotide 250 hybridizes to a second portion of polynucleotide 250 on the second side of the nanopore, e.g., to form a hairpin.
- secondary structure 251 may apply a downward force on polynucleotide 250 which measurably changes the measurement (e.g., ionic current) through nanopore 110 in a manner similar to that described with reference to force F1’ of FIG. 1F.
- secondary structure 251 may spontaneously open (partially or fully), which reduces or removes the downward force which secondary structure 251 had applied to polynucleotide 250, again measurably changing the ionic current through nanopore 110.
- secondary structure 251 is bistable, in that it repeatedly fluctuates between the unfolded and folded states.
- the fluctuation rate may depend on the length (and strength) of the secondary structure 251 (e.g., hairpin), and may vary widely from hundreds of times per second to fewer than once per second. It may be unknown during a given sequencing measurement whether a polynucleotide is in the folded state (e.g., FIG. 2A, right) or unfolded state (e.g., FIG. 2A, left), or even some combination of folded states and unfolded states, for example in the case of fluctuations that are sufficiently fast relative to the measurement time.
- the nature (e.g., location, length, and/or strength) of the secondary structures 251 may change in ways that may be difficult to predict. These fluctuations thus may cause a sequence context-dependent uncertainty in the measured currents, and ultimately may give rise to base calling errors.
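- The effect of fast fluctuations on a measured value can be illustrated with a deliberately simple two-state model. The sketch assumes, purely for illustration, that a measurement averages over many folding events, so the apparent current is the occupancy-weighted mean of the folded and unfolded levels; the function names and the numeric values are hypothetical and are not taken from this disclosure.

```python
def apparent_current(i_folded, i_unfolded, fraction_folded):
    """Occupancy-weighted mean current for a two-state (folded/unfolded)
    polynucleotide, assuming fluctuations much faster than the
    measurement time."""
    if not 0.0 <= fraction_folded <= 1.0:
        raise ValueError("fraction_folded must be between 0 and 1")
    return fraction_folded * i_folded + (1.0 - fraction_folded) * i_unfolded


def could_be_confused(i_folded, i_unfolded, neighbor_level):
    """True if some unknown folded fraction could make the apparent
    current equal to the level expected for a different sequence."""
    low, high = sorted((i_folded, i_unfolded))
    return low <= neighbor_level <= high


if __name__ == "__main__":
    # Invented levels in arbitrary units.
    print(apparent_current(40.0, 50.0, 0.25))
    print(could_be_confused(40.0, 50.0, 46.0))
```

- Under this assumption, an unknown folded fraction spreads the apparent value over the whole interval between the two levels, which is one way to see how such fluctuations may lead to base calling errors.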
- the present constructs 150 may inhibit formation of transient secondary structures 251 on the second side of the nanopore 110.
- the present constructs 150 may form a secondary structure (such as a hairpin) which is sufficiently stable as to inhibit spontaneous fluctuations between folded and unfolded states such as described with reference to FIG. 2A.
- FIG. 2B schematically illustrates an example manner in which a full-complement polynucleotide may inhibit the formation of transient secondary structures when sequencing that full-complement polynucleotide using a nanopore.
- FIG. 2B illustrates nonlimiting examples of 3' first steric lock 151, polynucleotide 140, and 5' second steric lock 152 which will be described in greater detail below with reference to FIGS. 6-7.
- polynucleotide 158 and its complement 159 may hybridize to one another on the second side of nanopore 110 to form a relatively stable secondary structure 252 of sufficient length and strength to inhibit the spontaneous folding and unfolding of other secondary structures on the second side of nanopore 110.
- the secondary structure 252 is or includes a third duplex of any suitable length, e.g., a length of at least about 10 base pairs, or at least about 20 base pairs, or at least about 30 base pairs, or at least about 40 base pairs, or at least about 50 base pairs, or more.
- Secondary structure 252 may apply a downward force that shifts the measurement (e.g., ionic current) through the nanopore 110, e.g., in a manner similar to that described with reference to FIG. 2A.
- the nature (e.g., location, length, and/or strength) of the secondary structure 252 may change systematically and relatively slowly.
- the downward force may be relatively stable, for example because secondary structure 252 may not spontaneously unfold during the measurement, especially as compared to secondary structure 251.
- the measurements of the present constructs 150 will be significantly more stable than the fluctuating measurements (e.g., 261, 262, 263) described with reference to FIG. 2A for template polynucleotides which are not coupled to their complement.
- the present constructs 150 may be expected to significantly improve the accuracy of nanopore sequencing not only by providing the ability to jointly decode measurements made on both a template polynucleotide and its complement, but by significantly improving the quality of the measurements themselves.
- FIG. 3A schematically illustrates the formation of a secondary structure on the first side of the nanopore when sequencing a full-complement polynucleotide using the nanopore.
- FIG. 3A illustrates nonlimiting examples of 3' first steric lock 151, polynucleotide 140, and 5' second steric lock 152 which will be described in greater detail below with reference to FIGS. 6-7.
- template polynucleotide 158 and its complement 159 may hybridize to one another on the first side of nanopore 110 to form a relatively stable secondary structure 351 which in some regards is similar to secondary structure 252 described with reference to FIG. 2B.
- secondary structure 351 similarly may be of sufficient length and strength to inhibit the spontaneous folding and unfolding of other secondary structures on the first side of nanopore 110.
- the formation of secondary structure 351 may reduce the ability of polymerase 105 (not specifically shown) to access the 3' end of duplex 154, and thus may cause the extension of the duplex to slow down and potentially stop as polynucleotide 140 is extended towards secondary structure 351.
- hybridization between polynucleotide 158 and complement 159 may be inhibited in any suitable manner.
- FIG. 3B schematically illustrates an example manner in which the force and timing of the force applied to a full-complement polynucleotide may be adjusted to inhibit the formation of a secondary structure on the first side of the nanopore while still allowing enough time for polymerase to bind and extend the 3’ end of the duplex when sequencing the full-complement polynucleotide using the nanopore.
- in examples in which polynucleotide 158 is located 3' of complement 159, only a portion of polynucleotide 158 may be moved from the second side of nanopore 110 to the first side of nanopore 110, so as to inhibit hybridization between polynucleotide 158 and complement 159.
- Such movement may be performed, for example, after operation (iii) and before repeating operation (i) described above with reference to FIG. 5.
- in examples in which polynucleotide 158 is located 5' of complement 159, only a portion of complement 159 may be moved from the second side of nanopore 110 to the first side of nanopore 110, so as to inhibit hybridization between polynucleotide 158 and complement 159.
- Such movement may be performed, for example, after operation (vii) and before repeating operation (v) described above with reference to FIG. 5.
- all or a portion of complement 159 may be moved from the second side of nanopore 110 to the first side of nanopore 110, but the polymerase binding rate is sufficiently fast to access the 3' end of duplex 154 before any secondary structure may form.
- Such movement may be performed, for example, after operation (vii) and before repeating operation (v) described above with reference to FIG. 5.
- in examples in which polynucleotide 158 is located 5' of complement 159, all or a portion of polynucleotide 158 and all or a portion of complement 159 may be repeatedly moved from the first side of nanopore 110 to the second side of nanopore 110 and back in such a manner as to disrupt secondary structure formation while retaining access to the 3' end of duplex 154 before any secondary structure may form. Such movement may be performed, for example, after operation (vii) and before repeating operation (v) described above with reference to FIG. 5.
- secondary structure 351 may not be able to form on the first side of the nanopore because polynucleotide 158 and complement 159 are not both located on that side of the nanopore at the same time as one another, or because the time for the stable secondary structure formation exceeds the time for polymerase binding to the 3’ end of duplex 154.
- secondary structure 352 may form on the second side of the nanopore, which may be beneficial for reasons such as described with reference to FIGS. 2A-2B. Movement of only a portion of template polynucleotide 158, or of only a portion of complement 159, to the first side of the nanopore may be achieved by suitably selecting F2 such as described with reference to FIG.
- the voltage used to provide F2 may be reduced in time, in amplitude, or in both time and amplitude, relative to a voltage that otherwise may be used to move both template polynucleotide 158 and complement 159 to the first side of nanopore 110. Movement of only a portion or all of template polynucleotide 158, or of only a portion or all of complement 159 to the first side of the nanopore for a sufficiently short time as to inhibit secondary structure formation may be achieved by suitably selecting F2 such as described with reference to FIG. 1C and F1 as described with reference to FIG. 1D.
- the voltage used to provide F2 may be reduced in time, in amplitude, or in both time and amplitude, relative to a voltage that otherwise may be used to move both template polynucleotide 158 and complement 159 to the first side of nanopore 110, and F1 may be reduced in time, in amplitude, or in both time and amplitude, relative to a voltage that otherwise may be used to move both template polynucleotide 158 and complement 159 to the second side of nanopore 110.
- Hybridization between polynucleotide 158 and complement 159 may be inhibited in any other manner.
- the present systems and methods further may include heating the construct to a temperature that inhibits formation of secondary structures on the first side of the nanopore, and/or to a temperature that inhibits formation of secondary structures on the second side of the nanopore.
- an elevated temperature that inhibits formation of secondary structure 351 may also inhibit formation of secondary structure 252 and/or secondary structure 352.
- although secondary structures 252 and 352 may be beneficial for inhibiting spontaneous transitions between folded and unfolded states such as described with reference to FIGS. 2A-2B, the elevated temperature that inhibits formation of secondary structure 252 and/or secondary structure 352 also may inhibit formation of transient secondary structures 251, thus reducing or obviating the need for more stable secondary structures 252 and 352.
- FIGS. 4A-4B schematically illustrate use of the sequencing system of FIGS. 1A-1H to re-sequence the same construct 150, e.g., to generate a consensus read.
- system 100 is shown at a time at which sequencing of construct 150 is substantially complete; alternatively, the sequencing of construct 150 may be partially completed and nucleotide identification module 1036 has caused processor 1040 to take the remedial action of resequencing construct 150.
- with circuitry 160 applying the first force F1 as shown in FIG. 1H, construct 150 remains hybridized to the (now-extended) polynucleotide 140 in a manner such as described elsewhere herein.
- circuitry 160 may be configured to apply a sufficiently high voltage F4 to dissociate duplex 154, that is, to dehybridize extended polynucleotide 140 from construct 150 in a manner such as illustrated in FIG. 4A.
- Construct 150 optionally may be re-sequenced, e.g., using operations that include hybridizing a new (shorter) polynucleotide 140’, such as a primer, to construct 150 in a manner such as illustrated in FIG. 4B.
- primers 140’ may be included in fluid 120.
- the sequence of nucleotide addition and measurement operations provided herein then may be used to partially or fully sequence construct 150 again.
- the new duplex formed on the first side of the nanopore may include the same first portion 155 of construct 150 as described with reference to FIG. 1A, and may have another 3' end 153 that includes the end of the new polynucleotide 140’.
- operations such as described with reference to FIGS. 1A-1H, 5, and 4A-4B may be repeated any desired number of times to sequence construct 150 over and over, e.g., until a desired level of accuracy is achieved to provide desired confidence in the sequence. For example, accuracy may be improved by combining multiple reads of the same polynucleotide, because errors that are random can be “averaged out” through the use of consensus algorithms known in the art.
- operations 540-560 described with reference to FIG. 5 may be repeatedly performed to obtain a plurality of additional sequences of the polynucleotide, and a consensus sequence may be generated using the first sequence and the plurality of additional sequences.
- operations 540-560 may be repeated to obtain additional first and second pluralities of measured values, and a consensus set of measured values may be generated using the additional first and second pluralities of measured values. A consensus sequence then may be generated using the consensus set of measured values.
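- The two consensus routes described above (consensus of sequences, or consensus of measured values followed by decoding) may be sketched as follows. The example assumes the reads have already been aligned to equal length and uses a simple per-position majority vote and a per-step mean; these choices, and the names `consensus_sequence` and `consensus_values`, are illustrative assumptions rather than the consensus algorithms of this disclosure.

```python
from collections import Counter
from statistics import fmean


def consensus_sequence(reads, unknown="Z"):
    """Per-position majority vote over pre-aligned, equal-length reads.
    Unknown calls do not vote; ties are reported as unknown."""
    if not reads:
        return ""
    length = len(reads[0])
    if any(len(read) != length for read in reads):
        raise ValueError("reads must be aligned to the same length")
    called = []
    for column in zip(*reads):
        ranked = Counter(b for b in column if b != unknown).most_common(2)
        if not ranked or (len(ranked) == 2 and ranked[0][1] == ranked[1][1]):
            called.append(unknown)
        else:
            called.append(ranked[0][0])
    return "".join(called)


def consensus_values(runs):
    """Average the measured value at each step across repeated runs."""
    return [fmean(step_values) for step_values in zip(*runs)]


if __name__ == "__main__":
    print(consensus_sequence(["ACGT", "ACGA", "ACGT"]))
    print(consensus_values([[60.0, 50.0], [62.0, 52.0]]))
```

- Averaging measured values before decoding keeps information that is lost once each read has been reduced to base calls, which is one reason the second route may be preferred in some examples.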
- FIG. 6 schematically illustrates an example workflow (method) 600 for generating a full-complement polynucleotide for sequencing.
- Workflow 600 illustrated in FIG. 6 may include coupling native double-stranded polynucleotide 612 including a sense strand and an antisense strand to first and second stem-loop adapters 611 each including a cleavable moiety 660.
- a double-stranded polynucleotide 612 (e.g., dsDNA) is fragmented to the desired length, and is subjected to end repair, 5' phosphorylation, and A-tailing as in commercially available sequencing-by-synthesis (SBS) ligation library prep kits (e.g., TruSeq, Illumina, Inc.).
- stem-loop adapters 611 are coupled (e.g., ligated) to both ends of the processed double-stranded fragments 612 to generate a symmetrical “dumbbell” product including a sense strand 613 and an antisense strand 614.
- adapters 611 include a 3’ dT overhang, a 5' phosphate, and cleavable moiety 660, such as a deoxyuridine (dU) or other cleavable residue, in the 3' portion of the stem.
- Workflow 600 illustrated in FIG. 6 may include cleaving the cleavable moiety 660 of each of the first and second loop adapters 611. For example, at operation 620 illustrated in FIG.
- USER enzyme (uracil glycosylase + endonuclease VIII) or other cleavage reagent is used to cleave the dU (or other cleavable) residue in the adapter, generating a 1-base pair gap flanked by a 3' phosphate and a 5' phosphate (these phosphates not specifically shown in FIG. 6).
- polynucleotide kinase (PNK) may be used to remove the 3' phosphate from the gap, to leave a free 3' OH (not specifically illustrated) at the gap.
- Workflow 600 illustrated in FIG. 6 may include dehybridizing the sense strand 613 from the anti-sense strand 614.
- the sense strand 613 and antisense strand 614 are separated, facilitated by the gaps in the adapters.
- the separation of sense strand 613 from antisense strand 614 may be accomplished using heat, or by using a strand-displacing polymerase as polymerase 631.
- Workflow 600 illustrated in FIG. 6 may include synthesizing a complement of the sense strand 613 to form a first partial construct 671 that includes the sense strand 613 coupled to the complement of the sense strand 613’ via a first loop adapter 611. Additionally, workflow 600 illustrated in FIG. 6 may include synthesizing a complement of the anti-sense strand 614 to form a second partial construct 671 that includes the antisense strand 614 coupled to the complement of the antisense strand 614’ via a second loop adapter. For example, as illustrated in FIG. 6, polymerases 631 are used to synthesize complement 613’ to sense strand 613, and to synthesize complement 614’ to antisense strand 614.
- the complements 613’, 614’ are synthesized using the stem-loop adapter 611 (hairpin) as a primer.
- sense strand 613 and antisense strand 614 are naturally occurring, they may carry epigenetic marks (such as methylated bases).
- sense complement 613’ and antisense complement 614’ are synthetic, and thus may not necessarily carry epigenetic marks.
- the complement 613’ of the sense strand and the complement 614’ of the antisense strand may be synthesized using only unmodified nucleotides.
- the complement 613’ of the sense strand and the complement 614’ of the antisense strand may be synthesized using modified nucleotides.
- Nonlimiting examples of the manner in which complements 613’ and 614’ respectively may be used to identify epigenetic marks on sense strand 613 and antisense strand 614, with or without the use of modified nucleotides in complements 613’ and 614’, are described further below with reference to FIGS. 18A-18B.
- complements 613’ and 614’ may be used as an internal reference to provide for, or improve, detection of the epigenetic marks on sense strand 613 and antisense strand 614.
- the sequencing polynucleotide may have a sense strand followed by a reverse complement of the anti-sense strand. This essentially produces in tandem, two sense strand sequences on the same sequencing polynucleotides.
- Methods for producing a sense-sense sequencing polynucleotide have been disclosed in US application number 63/665,758, filed on June 28, 2024, the contents of which are incorporated in their entirety herein. Briefly, paired forked adapters with complementary linking sequences are added to the ends of double stranded nucleic acid. Non-limiting embodiments of the adapters (2300) are shown in FIG. 22 where 2301 and 2301 ’ represent the complementary linking sequences.
- the adapter may have blockers bound to the complementary linking sequences (shown as 2302 and 2302’) to prevent adapter dimer formation.
- each adapter has an optional biotin (2303) on the non-linking strand (2304 and 2305).
- the biotin may be used to attach the adapters to a surface.
- FIG. 23 illustrates how the sense-sense sequencing polynucleotide is generated. Briefly, the adapters are attached to a double stranded polynucleotide having a sense and an antisense strand. The adapters may be attached to a surface and the double stranded polynucleotide is denatured, along with the blocker, if present.
- Workflow 600 illustrated in FIG. 6 may include coupling a first forked adapter to the first partial construct 671 to form a first construct 150. Additionally, workflow 600 illustrated in FIG. 6 may include coupling a second forked adapter to the second partial construct 672 to form a second construct (not specifically illustrated in FIG. 6).
- the respective 3' ends of the complement strands 613’ and 614’ may be A-tailed using a DNA polymerase; however, note that this operation instead can be performed during operation 630 by using an A-tailing polymerase to generate complement strands 613’ and 614’.
- Forked adapter 641 also may include a 5' phosphate to ligate to the sense strand complement 613’ or to antisense strand complement 614’.
- Forked adapter 641 also may include complementary regions 644, 644’ which hybridize to one another so as to facilitate ligation of the forked adapter to 613, 613’ or to 614, 614’ to form construct 150 as illustrated in FIG. 6.
- Forked adapter 641 also may include 5' phosphate 645 (or other negatively charged modification, such as a branched polymer or polyphosphate) which may be used to drive capture of construct 150 in nanopore 110 in a manner as will now be described with reference to FIG. 7.
- first steric lock 151 may include a functional group (such as neutravidin or streptavidin) that will bind to the functionalized (e.g., biotinylated) 3' end of construct 150 and remain bound there until a sufficiently strong force is applied.
- first steric lock 151 may include LNA or PNA that hybridizes to construct 150 and is of sufficient length to remain hybridized to construct 150 until a sufficiently strong force is applied.
- construct 150 may be brought into contact with first side 111 of nanopore 110.
- the 5' end of construct 150 may include a negative charge (e.g., a 5' phosphate or other negatively charged moiety), and this negative charge may be attracted into aperture 113 of nanopore 110 through application of a suitable bias by circuitry 160.
- circuitry 160 applies a force sufficient to cause dehybridization of the polynucleotide 158 and complement 159 within construct 150, allowing the 5' end of construct 150 to translocate through nanopore 110.
- first steric lock 151 may inhibit passage of the 3'-end of construct 150 to the second side of the nanopore through the aperture in a manner such as described with reference to FIGS. 1A-1H.
- Second steric lock 152 then may be coupled to the 5' end of construct 150 in a manner such as illustrated in operation C of FIG. 7.
- second steric lock 152 may include LNA or PNA that hybridizes to construct 150 and is of sufficient length to remain hybridized to construct 150 until a sufficiently strong force is applied, or may include a “self-hybridization” step to form a strong hairpin structure on the second side of the pore with proximal nucleotides uncovered by the dehybridization step B.
- a first primer (polynucleotide 140) may be hybridized to construct 150 to form a first duplex which may be used to sequence at least a portion of construct 150 in a manner such as described elsewhere herein.
- polymerase 105 may add nucleotide 121 to polynucleotide 140 based on the sequence of construct 150. Operations for identifying the nucleotide, and for adding additional nucleotides, are provided elsewhere herein.
- one portion of the construct may be synthesized to include reference nucleotides (e.g., unmodified nucleotides or modified nucleotides) and another portion of the construct may be native, e.g., may be of unknown composition and thus may contain possible epigenetic marks (e.g., modified nucleotides) which may be identified using the reference nucleotides in the synthesized portion.
- reference nucleotides e.g., unmodified nucleotides or modified nucleotides
- the present constructs may improve the ability to detect epigenetic marks (such as methylation), without necessarily chemically altering the epigenetic marks such as commonly done (e.g., using bisulfite).
- the target polynucleotide may be obtained from a living organism, and thus may contain naturally occurring (native) modified nucleotides, while the complement is synthesized from known nucleotides (reference nucleotides) based on the sequence of the target polynucleotide.
- both the native target polynucleotide and its synthetic complement are coupled together in the same molecule, and sequenced using the same nanopore and circuitry in the same run as one another.
- the synthetic complement may be used as an internal “reference” against the native target polynucleotide.
- signals from the native polynucleotide and the reference, complement polynucleotide may be used together to identify epigenetic marks in the native polynucleotide.
- the present constructs may be used in several ways to detect epigenetic marks using signals from native polynucleotides and from their synthesized reference polynucleotides.
- FIGS. 18A-18B illustrate example signals obtained using sequences with different epigenetic marks than one another. More specifically, in FIGS. 18A-18B, the polynucleotide denoted T1 includes the sequence TTTTTTACATTTTTTAC*ATTTTTT, and the polynucleotide denoted T2 includes the sequence TTTTTTAC*ATTTTTTACATTTTTT, where C* denotes methyl cytosine.
- T1 and T2 have the same sequence as one another, but differ in their epigenetic marks.
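- The relationship between T1 and T2 can be checked directly from the annotated sequences. The following is a minimal sketch, assuming the `*` annotation always follows the base it modifies; the helper name `parse_marked` is hypothetical and not part of the disclosure.

```python
def parse_marked(seq, mark="*"):
    """Split an annotated sequence into (bases, set of marked base indices).

    The mark character follows the base it modifies, so "AC*A" means the
    C at index 1 carries the mark.
    """
    bases = []
    marked = set()
    for ch in seq:
        if ch == mark:
            if not bases:
                raise ValueError("mark with no preceding base")
            marked.add(len(bases) - 1)
        else:
            bases.append(ch)
    return "".join(bases), marked


T1 = "TTTTTTACATTTTTTAC*ATTTTTT"
T2 = "TTTTTTAC*ATTTTTTACATTTTTT"

bases1, marks1 = parse_marked(T1)
bases2, marks2 = parse_marked(T2)

same_sequence = bases1 == bases2                 # identical underlying bases
differing_positions = sorted(marks1 ^ marks2)    # marked in exactly one strand
print(same_sequence, differing_positions)        # True [7, 16]
```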
- signals from T1 and T2 may be used together to identify the location, and potentially also the type(s), of epigenetic marks on either strand.
- T1 has the sequence ACA.
- the unmethylated sequence of T1 here may be used as a reference for the corresponding methylated sequence of T2.
- the signals obtained along T2 and T1 are generally similar to one another because the sequences are the same.
- the signals for T2 and T1 differ from one another in FIG. 18A in the region where T2 is methylated and T1 is not, namely where T2 includes AC*A and T1 includes ACA.
- T2 has the sequence ACA.
- the unmethylated sequence of T2 here may be used as a reference for the corresponding methylated sequence of T1.
- the signals obtained along T1 and T2 are generally similar to one another because the sequences are the same.
- the signals for T1 and T2 differ from one another in FIG. 18B in the region where T1 is methylated and T2 is not, namely where T1 includes AC*A and T2 includes ACA.
- T1 and T2 are known a priori to have the same sequence as one another - other than for potential epigenetic marks - the difference in signals between T1 and T2 in a given region therefore means that T1 and T2 differ solely by epigenetic mark(s) in that region. More generally, if first and second sequences are directly related to one another (e.g., are the same as, or complementary to, one another), and the first sequence is a priori known at a given location, then any difference in signals may be used as a reference to determine that the second sequence differs from the first sequence in epigenetic mark(s) at this location. This is the case in FIG.
- T1 is known to lack methylation and T2 includes methylation
- T2 is known to lack methylation and T1 includes methylation.
- T1 contains any epigenetic marks
- T2 has the same sequence as T1
- T2 does not contain epigenetic marks
- the polynucleotide being used as the reference may itself include epigenetic mark(s), e.g., may include one or more modified base(s) (such as one or more methylated nucleotide(s)), so long as those modifications are known a priori.
- it may be a priori known that a first polynucleotide is methylated (or contains other epigenetic mark) at a given location, and it may be unknown whether a second polynucleotide is methylated (or contains other epigenetic mark) at that location.
- the first polynucleotide may be used as a reference to determine whether the second polynucleotide also is methylated at that location, in a similar manner as described above. Indeed, just as above, this is the case in FIG. 18A where T1 lacks methylation and T2 is known to include methylation; and also is the case in FIG. 18B where T2 lacks methylation and T1 is known to include methylation. Note that to determine whether T1 is methylated in the region shown in FIG.
- T1 contains any epigenetic marks; instead, it need only be known that T2 has the same sequence as T1, that T2 does contain an epigenetic mark, and that there is a signal difference between T1 (unknown) and T2 (reference).
- T2 is methylated in the region shown in FIG. 18B, it need not be known a priori whether or where T2 contains any epigenetic marks, only that T1 has the same sequence as T2, that T1 does contain an epigenetic mark, and that there is a signal difference between T1 (reference) and T2 (unknown).
- polynucleotide 158 of construct 150 is a native polynucleotide, e.g., obtained from an organism
- complement 159 is a synthetic polynucleotide, e.g., synthesized using polynucleotide 158 as a template.
- epigenetic marks such as base modifications, illustratively methylation
- sequence of complement 159 is a priori known to be exactly complementary to that of polynucleotide 158, and thus may be considered to be the “same” as that of polynucleotide 158 for purposes of the present disclosure.
- complement 159 contains no modified bases, and complement 159 may be used as a reference to identify any modified bases within polynucleotide 158.
- complement 159 may be used as a reference to identify any unmodified bases of that type within polynucleotide 158.
- complement 159 may be synthesized using a known composition or mixture of unmodified and modified nucleotides, which may or may not be the same type of modification present in polynucleotide 158.
- One possible application for using a mixture of modified and unmodified nucleotides to synthesize complement 159 is for creating test data for a more robust, or alternatively more sensitive, base caller using neural networks or otherwise.
- using a mixture of modified and natural nucleotides may help to remove biases from training data, for example that otherwise may arise from the presence of specific epigenetic marks associated with specific sequence contexts in specific organisms.
- complements 159 which are synthesized using a pre-defined set of known modifications may be used to train a more sensitive base calling algorithm that has sensitivity to more than one kind of epigenetic modification, or potentially other modifications to the nucleotides that may be a result of, for example, DNA damage, DNA damage repair, or otherwise.
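- As an illustration of synthesizing a complement from a known mixture of modified and unmodified nucleotides, the sketch below simulates a labeled complement in which each C is incorporated as a modified C with a chosen probability. The function name, the choice of C as the modified base, and the `modified_fraction` and `seed` parameters are assumptions for illustration only; the disclosure does not specify them.

```python
import random

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def synthesize_complement(template, modified_fraction=0.5, seed=0):
    """Return (complement_sequence, modified_positions).

    The complement is written 5'->3' (reverse complement of the template).
    Each C in the complement is recorded as "modified" with probability
    modified_fraction, so the labels are known a priori.
    """
    rng = random.Random(seed)  # seeded so the labels are reproducible
    bases = []
    modified_positions = []
    for base in reversed(template):
        comp = COMPLEMENT[base]
        bases.append(comp)
        if comp == "C" and rng.random() < modified_fraction:
            modified_positions.append(len(bases) - 1)
    return "".join(bases), modified_positions


sequence, labels = synthesize_complement("GGATGCG", modified_fraction=0.5, seed=1)
```

- Because the positions of the modified nucleotides are recorded at synthesis time, the resulting pairs of sequence and labels could serve as ground truth when assembling data for a modification-aware base caller.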
- Differences between (i) a first plurality of measured values corresponding to the signal from a polynucleotide with unknown epigenetic marks e.g., native polynucleotide
- a second plurality of measured values corresponding to the signal from reference polynucleotide with no epigenetic marks or with known epigenetic marks e.g., a complement synthesized using the native polynucleotide as a template, using natural nucleotides, or modified nucleotides, or a mixture of natural nucleotides and modified nucleotides
- known epigenetic marks e.g., a complement synthesized using the native polynucleotide as a template, using natural nucleotides, or modified nucleotides, or a mixture of natural nucleotides and modified nucleotides
- the first plurality of measured values may be subtracted from the second plurality of measured values to obtain a plurality of dissimilarity values.
- the second plurality of measured values may be subtracted from the first plurality of measured values to obtain a plurality of dissimilarity values.
- the first plurality of measured values may be divided by (e.g., normalized by) the second plurality of measured values to obtain a plurality of dissimilarity values.
- the second plurality of measured values may be divided by (e.g., normalized by) the first plurality of measured values to obtain a plurality of dissimilarity values.
- the first and second pluralities of measured values may be input to a machine learning algorithm (such as a neural network) to obtain a plurality of dissimilarity values.
- a machine learning algorithm such as a neural network
- more than one of such calculations are performed to obtain more than one plurality of dissimilarity values, and the pluralities of dissimilarity values may be compared to one another to provide still further accuracy in identifying epigenetic marks.
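- The subtraction and division calculations described above, and the comparison of more than one plurality of dissimilarity values, can be sketched as follows. The measured values, the cutoffs, and the function names are hypothetical; the two pluralities are assumed to be already aligned and of equal length.

```python
def difference_dissimilarity(first, second):
    """Element-wise first - second; 0 means no difference."""
    if len(first) != len(second):
        raise ValueError("pluralities must be aligned to equal length")
    return [a - b for a, b in zip(first, second)]


def ratio_dissimilarity(first, second):
    """Element-wise first / second; 1 means no difference."""
    if len(first) != len(second):
        raise ValueError("pluralities must be aligned to equal length")
    return [a / b for a, b in zip(first, second)]


# Hypothetical aligned measured values (arbitrary units).
native = [50.0, 62.0, 48.0, 71.0]      # first plurality (unknown marks)
reference = [50.0, 60.0, 40.0, 70.0]   # second plurality (reference)

diff = difference_dissimilarity(native, reference)
ratio = ratio_dissimilarity(native, reference)

# Flag a position only when both calculations agree (assumed cutoffs).
agreed = [
    i for i, (d, r) in enumerate(zip(diff, ratio))
    if abs(d) > 5.0 and abs(r - 1.0) > 0.1
]
```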
- first and second pluralities of measured values throughout the present application may be aligned to one another before calculating the dissimilarity values.
- the second plurality of measured values may be suitably computationally processed (e.g., inverted) before calculating the dissimilarity values.
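- The disclosure does not name a particular alignment method. As one possibility, the sketch below uses dynamic time warping (DTW), a standard technique for aligning two traces that advance at different rates; it is a swapped-in example, not the method of the disclosure, and `dtw_path` is a hypothetical name. Both inputs are assumed to be non-empty.

```python
def dtw_path(a, b):
    """Return the DTW alignment path between traces a and b.

    The path is a list of (index_in_a, index_in_b) pairs from the start
    of both traces to the end of both traces.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Trace back from the end to recover the matched index pairs.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return path


# Dissimilarity values computed over the aligned pairs.
native = [1.0, 1.0, 5.0, 9.0]
reference = [1.0, 5.0, 5.0, 9.0]
path = dtw_path(native, reference)
aligned_diff = [native[i] - reference[j] for i, j in path]
```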
- the complement signal may be run through base calling to generate the complement sequence, and then the reverse complement of the called sequence may be run through a “reverse base-calling” model to generate an expected trace from the sequence for comparison to the native trace; this also may be performed using a joint decoder.
- each native strand will have an exact copy in the synthetic reverse complement copy of the opposite native strand (e.g., original top strand 1913 in FIG. 19 will be identical in sequence to synthetic reverse complement bottom strand 1914’).
- the traces from these copies would then only need to be aligned for the comparison calculation. That is, in some examples “direct repeats” in the same construct may be compared to one another, such as for FIG. 19 constructs where “identical bases” are paired.
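- The "direct repeat" relationship follows from taking the reverse complement twice. A minimal sketch with a hypothetical six-base top strand:

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(_COMPLEMENT)[::-1]


top = "AACGTC"                                   # stands in for native top strand 1913
bottom = reverse_complement(top)                 # native bottom strand 1914
copy_of_bottom = reverse_complement(bottom)      # synthetic complement 1914'

# The synthetic copy of the bottom strand repeats the top strand exactly,
# so their traces need only be aligned, not inverted, before comparison.
is_direct_repeat = copy_of_bottom == top
```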
- UMIs may be used to “align” the sense strand (e.g., 613) on one template to the “complement of the antisense” (614’) strand on the second template as in FIG. 6.
- such values may correspond to differences in epigenetic marks along the lengths of the unknown and reference polynucleotides being compared (e.g., polynucleotide 158 and complement 159). Where the dissimilarity value is relatively small for a location corresponding to a given nucleotide in the sequences of the polynucleotides being compared, there is likely no difference in epigenetic mark at that nucleotide.
- the magnitude of the dissimilarity value for a given nucleotide may be compared to a threshold. If the magnitude exceeds the threshold, then the unknown nucleotide may be flagged as having an epigenetic mark. If the magnitude is less than the threshold, then the unknown nucleotide may lack an epigenetic mark.
- the reference polynucleotide lacks any epigenetic marks, and therefore the magnitudes of the dissimilarity values may be expected to correspond solely to epigenetic marks on the unknown polynucleotide.
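- The magnitude comparison can be sketched as below, assuming the reference lacks epigenetic marks. The dissimilarity values and the threshold are hypothetical.

```python
def flag_marks(dissimilarity, threshold):
    """Return indices whose dissimilarity magnitude exceeds the threshold."""
    return [i for i, d in enumerate(dissimilarity) if abs(d) > threshold]


dissimilarity = [0.2, -0.1, 6.5, 0.4, -7.1]
flagged = flag_marks(dissimilarity, threshold=3.0)
print(flagged)  # [2, 4]
```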
- the amplitude of the dissimilarity value for a given nucleotide may be compared to both positive and negative thresholds. If the amplitude exceeds the positive threshold, then the unknown nucleotide may be flagged as having an epigenetic mark that the reference nucleotide lacks, and if the amplitude falls below the negative threshold (i.e., is more negative than that threshold), then the reference nucleotide may be flagged as having an epigenetic mark that the unknown nucleotide lacks.
- both the unknown nucleotide and the reference nucleotide lack an epigenetic mark.
- the amplitudes of the dissimilarity values may be expected to correspond to differences between epigenetic marks on the unknown polynucleotide and epigenetic marks on the reference polynucleotide.
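- When the reference may itself carry known marks, the signed comparison can be sketched as below. The sign convention (dissimilarity = unknown minus reference, with a mark raising the measured value) and the threshold values are assumptions for illustration.

```python
def classify_position(d, positive_threshold, negative_threshold):
    """Classify one signed dissimilarity value (unknown minus reference)."""
    if d > positive_threshold:
        return "unknown_marked"      # mark on unknown that reference lacks
    if d < negative_threshold:
        return "reference_marked"    # mark on reference that unknown lacks
    return "no_difference"


values = [0.3, 5.8, -6.2, -0.4]
calls = [classify_position(d, 3.0, -3.0) for d in values]
```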
- the dissimilarity values may be input to a machine learning algorithm (e.g., neural network) which is trained to identify epigenetic marks on the unknown polynucleotide and/or reference polynucleotide using the difference values.
- the plurality of current measurements may be fed into a classifier algorithm (e.g. neural network, Random Forest classifier, Bayesian classifier, Support Vector Machine or any other suitable type of classification algorithm), in order to identify the presence/absence or type of epigenetic mark.
- a classifier algorithm e.g. neural network, Random Forest classifier, Bayesian classifier, Support Vector Machine or any other suitable type of classification algorithm
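- As a small, dependency-free instance of the "Bayesian classifier" option, the sketch below fits a Gaussian naive Bayes model to windows of current measurements. The training values, labels, and function names are hypothetical, and a production classifier would be trained on far more data.

```python
import math
from statistics import mean, pvariance


def fit_gaussian_nb(features, labels):
    """Fit per-class priors, feature means, and feature variances."""
    model = {}
    for label in set(labels):
        rows = [f for f, l in zip(features, labels) if l == label]
        cols = list(zip(*rows))
        model[label] = {
            "prior": len(rows) / len(labels),
            "means": [mean(c) for c in cols],
            # Small constant avoids division by zero for constant features.
            "vars": [pvariance(c) + 1e-6 for c in cols],
        }
    return model


def predict(model, x):
    """Return the label with the highest log posterior for sample x."""
    def log_posterior(params):
        score = math.log(params["prior"])
        for v, m, var in zip(x, params["means"], params["vars"]):
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        return score
    return max(model, key=lambda label: log_posterior(model[label]))


# Hypothetical two-sample current windows (arbitrary units).
train_x = [[60.1, 60.4], [59.8, 60.0], [60.3, 59.9],
           [64.9, 65.2], [65.3, 64.8], [64.7, 65.1]]
train_y = ["unmodified"] * 3 + ["methylated"] * 3

model = fit_gaussian_nb(train_x, train_y)
call = predict(model, [65.0, 65.0])
```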
- While comparing signals from first and second polynucleotides that are a priori known to have the same sequence as one another is one way to identify epigenetic marks
- another way to detect epigenetic marks may use a sequence decoder (e.g., base caller) that is trained to recognize epigenetic marks, such as modified nucleotides.
- the sequence decoder may be trained to identify any suitable number of modified nucleotides, e.g., one, two, three, or more than three modified nucleotides, in addition to unmodified (natural) nucleotides.
- a first base caller may be run on the unknown polynucleotide, and a second base caller separately run on the reference polynucleotide.
- the reference (e.g., synthesized) polynucleotide may be used as a control, and the unknown (e.g., native) polynucleotide may be used as an “experiment.”
- Statistical confidence of the epigenetic identification may be assigned based on differences in the confidence of the separate base calls between the reference and unknown polynucleotides. Additionally, or alternatively, statistical confidence of the epigenetic mark identification may be assigned based on differences in the confidence of separate base calls made using a model trained to recognize epigenetic marks and using a model trained not to recognize epigenetic marks. For example, a base caller trained to recognize epigenetic marks may be separately run on both the reference and unknown polynucleotides.
- a base caller which is trained to be unaware of epigenetic marks may be separately run on both the reference and unknown polynucleotides.
- a “joint decoder” such as described above may be used that is trained to be aware of epigenetic marks (e.g., modified nucleotides) as well as for base calling.
- the signal from a reference complement polynucleotide that does not include any modified bases may be used to “predict” what the signal should be from the native (unknown) polynucleotide from which the complement was synthesized. Because the reference does not contain any modified bases, the prediction similarly will not contain any modified bases. As such, at locations at which the native polynucleotide does contain modified bases, the predicted signal will not match the measured signal. Such a prediction may be generated, for example, using a read map such as described with reference to FIGS. 8A-8D.
- such a prediction may be generated using a context-specific simulator such as described in Li et al., “DeepSimulator: a deep simulator for Nanopore sequencing,” Bioinformatics 34(17): 2899-2908 (2018), the entire contents of which are incorporated by reference herein.
- a context-specific simulator such as described in Li et al., “DeepSimulator: a deep simulator for Nanopore sequencing,” Bioinformatics 34(17): 2899-2908 (2018), the entire contents of which are incorporated by reference herein.
- the procedure may involve obtaining measured values for the reference polynucleotide and the corresponding native polynucleotide; performing base calling for the reference polynucleotide; predicting the sequence (excluding epigenetic marks) for the native polynucleotide; inputting the predicted sequence into the read map or context-specific simulator to obtain predicted measured values for the native polynucleotide; calculating the dissimilarity values between the predicted measured values and the actual measured values for the native polynucleotide; and identifying epigenetic marks in the native polynucleotide using the dissimilarity values.
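- The procedure above can be sketched end to end with a toy stand-in for the read map. The one-base lookup table `TOY_READ_MAP`, its values, the threshold, and all function names are hypothetical; an actual read map or context-specific simulator would model multi-base context.

```python
# Hypothetical one-base "read map": expected measured value per base.
TOY_READ_MAP = {"A": 50.0, "C": 60.0, "G": 70.0, "T": 80.0}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def predict_native_sequence(reference_calls):
    """Native sequence (no marks) is the reverse complement of the reference."""
    return reference_calls.translate(_COMPLEMENT)[::-1]


def predict_signal(sequence, read_map=TOY_READ_MAP):
    """Predicted measured values for an unmodified sequence."""
    return [read_map[base] for base in sequence]


def find_marks(native_signal, reference_calls, threshold):
    """Return (index, base) pairs where measured and predicted values diverge."""
    predicted_seq = predict_native_sequence(reference_calls)
    predicted = predict_signal(predicted_seq)
    if len(predicted) != len(native_signal):
        raise ValueError("native signal must be aligned to the predicted signal")
    dissimilarity = [m - p for m, p in zip(native_signal, predicted)]
    return [
        (i, predicted_seq[i])
        for i, d in enumerate(dissimilarity)
        if abs(d) > threshold
    ]


reference_calls = "TGTA"                   # base calls from the synthetic complement
native_signal = [80.2, 49.7, 66.0, 50.1]   # measured values for the native strand
marks = find_marks(native_signal, reference_calls, threshold=3.0)
print(marks)  # [(2, 'C')]
```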
- a joint-decoder may be used that is nucleotide modification-aware.
- constructs for use in sequencing polynucleotides may be made in any suitable manner.
- the workflow described with reference to FIG. 6 optionally may be modified to include unique molecular identifiers (UMIs) 650.
- UMIs 650 are unique barcodes that can be included in stem-loop adapters 611 such that reads respectively coming from the sense strand 613 and antisense strand 614 of the same fragment of native duplexed DNA 612 may be identified algorithmically after sequencing.
- each construct 150 may include a first UMI 650 at the stem-loop adapter 611 and a second, different UMI 650’ at the end which came from a different stem-loop adapter and was added in a manner such as shown in operations 610-630.
- Epigenetic marks, such as modified nucleotides, in the sense strand 613 and antisense strand 614 obtained from duplexed native polynucleotide 612 may be identified in a manner such as described elsewhere herein, for example after aligning the paired constructs together using the UMIs and generating and analyzing dissimilarity values of the polynucleotides therein, and/or by basecalling with a modification-aware algorithm. Note that because sense strand 613 is sequenced using complement 613’ as a reference, and because antisense strand 614 is sequenced using complement 614’ as a reference, any difference between epigenetic marks on sense strand 613 and epigenetic marks on antisense strand 614 also may be identified.
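- Pairing constructs from the same native fragment by their UMIs can be sketched as below. It assumes that the two constructs derived from one fragment carry the same two UMIs in swapped positions (loop versus end), which is an interpretation of the workflow rather than a stated requirement; the read identifiers and UMI sequences are hypothetical.

```python
from collections import defaultdict


def pair_by_umis(reads):
    """Group reads that share the same unordered pair of UMIs.

    reads: iterable of (read_id, loop_umi, end_umi) tuples.
    Returns a list of sorted [read_id, read_id] pairs.
    """
    groups = defaultdict(list)
    for read_id, loop_umi, end_umi in reads:
        # frozenset ignores which UMI sits at the loop and which at the end.
        groups[frozenset((loop_umi, end_umi))].append(read_id)
    return [sorted(ids) for ids in groups.values() if len(ids) == 2]


reads = [
    ("read1", "AACG", "TTGC"),   # construct containing sense strand 613
    ("read2", "TTGC", "AACG"),   # construct containing antisense strand 614
    ("read3", "GGAT", "CCTA"),   # partner not observed
]
pairs = pair_by_umis(reads)
print(pairs)  # [['read1', 'read2']]
```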
- While the present constructs may be particularly useful for identifying epigenetic marks in a manner such as described with reference to FIGS. 18A-18B, their use is not so limited.
- constructs may be used in any suitable method for sequencing a polynucleotide, with or without also identifying epigenetic marks in the polynucleotide.
- FIG. 19 schematically illustrates an example workflow 1900 for generating a full-complement polynucleotide for sequencing where top and bottom native strands, possibly including modified bases, are linked to unmodified synthetic reverse-complement copies of the top and bottom strands.
- Workflow 1900 may include coupling a native double-stranded polynucleotide 1901 that includes a sense strand 1913 and an antisense strand 1914 to first and second adapters 1911, 1912.
- the native double-stranded polynucleotide 1901 may be coupled to the first adapter 1911 using a first transposome which includes first transposase 1981 and first adapter 1911, and may be coupled to the second adapter 1912 using a second transposome which includes second transposase 1982 and second adapter 1912.
- the first and second transposomes may fragment the double-stranded polynucleotide while coupling the first and second adapters 1911, 1912 thereto (sometimes referred to as “tagmentation”).
- transposases 1981, 1982 are Tn5 transposases, although other suitable transposases may be used.
- the first adapter 1911 includes a first stem-loop adapter, including first ME sequence 1911’ and its complement ME’ sequence 1911” that hybridize to form a binding site for transposase 1981.
- Adapter 1911 also includes a DNA loop 1911’” linking the 5’ end of ME sequence 1911’ to the 3’ end of 1911”.
- loop 1911’” includes a modified base with a branching connection to linker 1971 that anchors the transposome to a solid substrate 1972 (e.g., a magnetic bead) and includes a cleavable moiety 1973.
- Second adapter 1912 may include a transfer strand 1974 containing the ME sequence 1912’ at the 3’ end and a subsequence 1912” that is a target of a telomerase that can be used to form a second stem-loop adapter in a manner such as will be described with reference to operation 1960.
- Adapter 1912 may also include a distinct nontransfer strand 1975 containing ME’ sequence 1912’” that hybridizes to the ME sequence 1912’ of transfer strand 1974 for the binding site for transposase 1982.
- Nontransfer strand 1975 may also include a linker 1976 connecting the 3’ end of the ME’ sequence to a solid substrate 1977 (e.g., magnetic bead).
- the first stem-loop adapter 1911 and the forked adapter 1912 respectively may include unique molecular identifiers (UMIs) in a similar manner as described with reference to FIG. 6.
- UMIs unique molecular identifiers
- first adapter 1911 may be covalently coupled to antisense strand 1914, e.g., via subsequence 1911’, and may not be covalently coupled to sense strand 1913.
- second adapter 1912 may be covalently coupled to sense strand 1913, e.g., via subsequence 1912’, and may not be covalently coupled to antisense strand 1914.
- first adapter 1911 instead may be covalently coupled to sense strand 1913 and may not be covalently coupled to antisense strand 1914
- second adapter 1912 instead may be covalently coupled to antisense strand 1914 and may not be covalently coupled to sense strand 1913
- First adapter 1911 may also be coupled to both sense strand 1913 and antisense strand 1914 to form a non-sequenceable “B-B” product; similarly, adapter 1912 may be coupled to both sense strand 1913 and antisense strand 1914 to form a non-looped “A-A” product.
- workflow 1900 may include coupling a forked adapter to the second adapter 1912 using the target sequence.
- the transposases 1981, 1982 may be removed, e.g. by mild heating and chelation of the transposase catalytic divalent metal ion by EDTA. As it is removed, transposase 1981 releases first stem 1911’ which then hybridizes to second stem 1911” to form a stem-loop structure which is covalently coupled to antisense strand 1914 and is not covalently coupled to sense strand 1913.
- forked adapter 1941 may be hybridized to second adapter 1912 via hybridization between subsequence 1942 of forked adapter 1941 and subsequence 1912” of second adapter 1912. In the nonlimiting example illustrated in FIG.
- forked adapter 1912 optionally includes a 3' steric lock. As illustrated in operation 1930 of FIG. 19, gap-fill and ligation may be used to covalently couple first adapter 1911 to sense strand 1913 and may be used to covalently couple second adapter 1912 to antisense strand 1914.
- Method 1900 may include dehybridizing the sense strand 1913 from the antisense strand 1914, wherein the first stem-loop adapter 1911 couples the sense strand to the antisense strand after the dehybridizing.
- the sense strand 1913 and antisense strand 1914 are dehybridized.
- the dehybridization of sense strand 1913 from antisense strand 1914 may be accomplished using heat, or by using a strand-displacing polymerase as polymerase 1931.
- Sense strand 1913 and antisense strand 1914 may remain covalently coupled to one another by first adapter 1911, in which subsequences 1911’ and 1911” are dehybridized from one another.
- Method 1900 may include synthesizing a complement 1913’ of the sense strand 1913 hybridized to the sense strand.
- Method 1900 also may include synthesizing a complement 1914’ of the antisense strand 1914 hybridized to the antisense strand.
- polymerase 1931 is used to synthesize complement 1913’ to sense strand 1913, and to synthesize complement 1914’ to antisense strand 1914.
- the complements 1913’, 1914’ are synthesized using subsequence 1943 of forked adapter 1941 as a primer. Note that because sense strand 1913 and antisense strand 1914 are naturally occurring, they may carry epigenetic marks (such as methylated bases).
- sense complement 1913’ and antisense complement 1914’ are synthetic, and thus may not necessarily carry epigenetic marks.
- the complement 1913’ of the sense strand and the complement 1914’ of the antisense strand may be synthesized using only unmodified nucleotides.
- the complement 1913’ of the sense strand and the complement 1914’ of the antisense strand may be synthesized using modified nucleotides.
- Nonlimiting examples of the manner in which complements 1913’ and 1914’ respectively may be used to identify epigenetic marks on sense strand 1913 and antisense strand 1914, with or without the use of modified nucleotides in complements 1913’ and 1914’, are described further above with reference to FIGS. 18A-18B.
- Method 1900 may include forming a second loop coupling the sense strand to the complement of the sense strand.
- a first strand that includes sense strand 1913 and antisense strand 1914, covalently coupled to one another by first adapter 1911, may be hybridized to a newly synthesized second strand that includes the complement 1913’ of the sense strand and the complement 1914’ of the antisense strand, covalently coupled to one another by the complement 1915 of first adapter 1911 (which complement is also synthesized during operation 1940).
- the first strand and second strand may be hybridized to one another, but may not yet be covalently coupled to one another.
- an enzyme may be used to couple subsequence 1912” of second adapter 1912 to complement subsequence 1916 within the second strand (which complement is also synthesized during operation 1940).
- a telomerase 1990 may be used to perform such coupling.
- telomerase 1990 may be protelomerase TelN of bacteriophage N15, which has cleaving-joining activity in a manner such as described in Deneke et al., “The Protelomerase of Temperate Escherichia Coli Phage N15 Has Cleaving-Joining Activity,” Proceedings of the National Academy of Sciences of the United States of America 97(14): 7721-26 (2000), the entire contents of which are incorporated by reference herein.
- subsequence 1912” may include a TelN target
- complement subsequence 1916 may include the complement of the TelN target.
- construct 150 includes sense strand 1913, an antisense strand 1914 that is complementary to the sense strand, is not hybridized to the sense strand, and is coupled to the sense strand via an adapter 1911.
- Construct 150 also includes a complement 1913’ of the sense strand that is hybridized to the sense strand 1913, and a complement 1914’ of the antisense strand that is hybridized to the antisense strand. As illustrated in FIG. 19, complement 1913’ may be coupled to complement 1914’ via an adapter 1915.
- Construct 150 also may include a loop 1991 coupling the sense strand 1913 to the complement 1913’ of the sense strand.
- sense strand 1913 may include one or more epigenetic marks, the complement 1913’ of the sense strand may not comprise any epigenetic marks.
- antisense strand 1914 may include one or more epigenetic marks, and the complement 1914’ of the antisense strand may not comprise any epigenetic marks.
- the complement 1913’ of the sense strand and the complement 1914’ of the antisense strand may be synthesized using only unmodified nucleotides.
- the complement 1913’ of the sense strand and the complement 1914’ of the antisense strand may be synthesized using modified nucleotides.
- Such examples may be useful, for example, to identify the absence of modified nucleotides in the sense strand 1913 and complement strand 1914, or to train a base caller to be modification aware or modification unaware in a manner such as described with reference to FIGS. 18A-18B.
- 1A-1H further may include characterizing the antisense strand 1914 to generate a third plurality of measured values; and characterizing the complement 1914’ of the antisense strand to generate a fourth plurality of measured values.
- the first sequence of the polynucleotide 158, 1913 further may be generated using the third plurality of measured values and the fourth plurality of measured values, e.g., in a manner such as described with reference to FIGS. 8A-8D, 9, 10, and 11A-11B.
- FIG. 19 describes a method to create a full complement construct that contains both forward and reverse complement copies of a native double-stranded polynucleotide (e.g., native dsDNA) as well as a full set of synthesized complements.
- Construct 150 described with reference to FIG. 19 may be referred to as a “quadruplex” construct.
- the quadruplex construct may be made using an A/B tagmentation approach, in which the native double-stranded polynucleotide is first tagmented with “A” type transposomes and “B” type transposomes.
- the “A” type transposomes include second transposases 1982 and sequences that are adapted to become the “ends” of construct 150 (i.e., the 5' and 3’ ends), as well as the approximate midposition of the final construct 150.
- the “B” type transposomes include first transposases 1981 and a segment which will be present at approximately 25% and 75% of the way through the final construct 150.
- the quadruplex construct 150 illustrated in FIG. 19 includes both forward (sense, 1913) and reverse complement (antisense, 1914) strands with epigenetic marks (e.g., modified nucleotides), and also a fully synthesized set of complementary polynucleotides.
- signals from the modified (unknown) strands 1913, 1914 and reference strands 1913’, 1914’ may be directly compared using exactly the same molecule and system, in a single run.
- This configuration also provides the benefit of having both the forward and reverse complement with modified nucleotides (1913, 1914) and without modified nucleotides (1913’, 1914’).
- both a forward and reverse complement strand may be sequenced without ever losing a “full complement” on the trans side (second side 112 of nanopore 110).
- a forward and reverse complement strand e.g., “original top” sense strand 1913 and “original bottom” antisense strand 1914
- the “original top” and “synthetic RC top” remain hybridized to one another, creating a hairpin structure that stabilizes the signal in a manner such as described with reference to FIGS. 2B and 3A-3B.
- Upon sequencing through the “original top,” at some point it may become even more favorable for the synthetic RC top to pair with the synthetic RC bottom.
- quadruplex constructs 150 such as described with reference to FIG. 19 may be expected to inhibit or reduce split levels for sequencing through at least one full forward and reverse complement pair.
- RNA may include epigenetic marks.
- RNA forms secondary structures. This can make nanopore sequencing challenging, for example, by generating “split levels” similarly as described with reference to FIG. 2A.
- a “full complement” construct including RNA and cDNA may be made that may be sequenced in a manner such as described with reference to FIGS. 1A-1H, and that inhibits formation of RNA secondary structures that otherwise may make it difficult to sequence the RNA.
- method 2000 is not limited for use with RNA, and may be adapted for use in generating a full complement
- FIG. 20 schematically illustrates an example workflow 2000 for generating a full-complement polynucleotide for native RNA sequencing that may include modified bases in a native RNA strand and unmodified bases in a synthetic cDNA strand.
- Workflow 2000 may include coupling a native single-stranded polynucleotide 2001 to 5’ and 3’ adapters.
- Polynucleotide 2001 could be a full-length mRNA that may include a 5’ cap and 3’ poly-A tail as shown in FIG. 20.
- polynucleotide 2001 could be a fragmented mRNA lacking a 5’ cap and poly-A tail, a noncoding RNA (e.g., transfer RNA, ribosomal RNA, long non-coding RNA), or
- a single-stranded DNA to which a poly-A tail or other 3’ primer binding sequence has been added, for example by enzymatic synthesis with a terminal transferase in a manner such as described in Tang et al., “mRNA-seq whole-transcriptome analysis of a single cell,” Nature Methods 6(5): 377-382 (2009), the entire contents of which are incorporated by reference herein, or poly-A polymerase in a manner such as described in Cao et al., “Identification of the gene for an Escherichia Coli poly(A) polymerase,” Proceedings of the National Academy of Sciences of the United States of America 89(21): 10380-10384 (1992), the entire contents of which are incorporated by reference herein.
- a first adapter 2012 may be covalently coupled to the 5’ end of RNA 2001, and may include a subsequence for directing enzymatic loop formation (e.g. bacteriophage N15 protelomerase TelN discussed above in reference to FIG. 19).
- Nonlimiting examples that may be used to couple the 5' end of mRNA 2001 to 5’ adapter 2012 include (i) capture of 2012 by a template-switching reverse transcription protocol in a manner such as described in Zhu et al., “Reverse transcriptase template switching: A SMART approach for full-length cDNA Library Construction,” BioTechniques 30(4): 892-897 (2001), the entire contents of which are incorporated by reference herein, followed by ligation of the 3’ end of adapter 2012 to the 5’ end of RNA 2001 using a DNA-splinted RNA ligase (e.g., Chlorella virus RNA ligase such as described in Jin et al., “Sensitive and specific miRNA detection method using splintR ligase,” Nucleic Acids Research 44(13): e116 (2016), the entire contents of which are incorporated by reference herein).
- this may require removal of the 5’ cap structure, for example using Schizosaccharomyces pombe Edc1-fused Dcp1-Dcp2 decapping enzyme in a manner such as described in Paquette et al., “Application of a Schizosaccharomyces pombe Edc1-fused Dcp1-Dcp2 decapping enzyme for transcription start site mapping,” RNA 24(2): 251-257 (2016), the entire contents of which are incorporated by reference herein; or (ii) non-splinted enzymatic ligation using T4 RNA ligase I in a manner such as described in Romaniuk et al., “Joining of RNA molecules with RNA ligase,” Methods in Enzymology 100: 52-59 (1983), the entire contents of which are incorporated by reference herein; or (iii) chemoenzymatic ligation between e.g.
- 3' ligation handle may be coupled to the 3' end of primer 2011 via which the 3' end of the mRNA 2001 may be coupled to 3' lock 151.
- Nonlimiting examples that may be used to couple the 3' end of primer 2011 to 3' lock 151 include: nonsplinted enzymatic ligation using T4 RNA ligase I; 3’ addition of an azide-containing terminator ATP to RNA 2001 using poly-A polymerase followed by ‘click’ coupling to a 3’ lock adapter 151 containing a 5’ terminal alkyne; or chemical coupling of the 2’-3’ terminal diol of 2001 to a 3’ lock adapter containing a 5’ boronic acid in a manner such as described in Lelievre-Buttner et al., “Boronic acid assisted self-assembly of functional RNAs,” Chemistry 29(35): e202300196 (2023), the entire contents of which are incorporated by reference herein.
- the first and second adapters respectively may include unique molecular identifiers (UMIs), e.g., in a manner such as described with reference to FIG. 6.
- Method 2000 also may include using the primer to synthesize a complement of the single-stranded polynucleotide and a complement of the target sequence.
- a construct including primer 2011” coupled to 5' lock 152 may be coupled to mRNA 2001 via hybridization between primer 2011 and primer 2011”.
- a suitable polymerase e.g., reverse transcriptase (RT)
- the mRNA 2001 including epigenetic marks may be hybridized to, but not covalently coupled to, cDNA 2001’ which in some examples was synthesized using natural nucleotides and thus lacks epigenetic marks, or which was synthesized using modified nucleotides.
- polymerase 2031 also synthesizes a complement of target sequence 2012”, which during operation 2030 is hybridized to, but not covalently coupled to, target sequence 2012.
- Method 2000 also may include coupling the target sequence 2012 to the complement of the target sequence 2012” to form a stem-loop adapter.
- a telomerase 2090 may be used to perform such coupling.
- a nonlimiting example of such a telomerase is protelomerase TelN of bacteriophage N15.
- target sequence 2012 may include a TelN target
- complement target sequence 2012” may include the complement of the TelN target.
- construct 150 includes native mRNA strand 2001 and cDNA strand 2001’ that is complementary to the mRNA strand, is hybridized to the mRNA strand, and is coupled to the mRNA strand via loop 2091 formed by telomerase 2090.
- construct 150 described with reference to FIG. 20 may be useful for identifying epigenetic marks, construct 150 also or alternatively may be used simply to sequence the polynucleotide 2001.
- polynucleotide 158 of construct 150 may correspond to mRNA strand 2001 described with reference to FIG. 20, and complement 159 of construct 150 may correspond to cDNA strand 2001’ described with reference to FIG. 20.
- Sequencing may be performed for the RNA only, the cDNA only, or both the mRNA and cDNA.
- a reverse transcriptase may be used for the polymerase 105 in the operations described in FIGS. 1A-1H instead of DNA polymerase.
- a DNA polymerase may be used for the polymerase 105 in the operations described with reference to FIGS. 1A-1H.
- polymerase 105 in fluid 120 on the first side of nanopore 110 includes a mixture of reverse transcriptase and DNA polymerase.
- sequencing may proceed first for the RNA 2001 using the reverse transcriptase as polymerase 105; and then sequencing may proceed for the cDNA using the DNA polymerase as polymerase 105.
- the target sequence 2012 may further include a primer binding sequence which is specific to a DNA primer to allow the DNA polymerase to begin sequencing after the RNA sequencing is complete.
- sequencing of both RNA and cDNA may be carried out using the same reverse transcriptase as polymerase 105.
- Example enzymes that are capable of extending from both RNA and cDNA include Avian Myeloblastosis Virus reverse transcriptase and Moloney Murine Leukemia Virus reverse transcriptase.
- the cDNA portion also may be used to help detect epigenetic modifications to the RNA in a manner similar to that described elsewhere herein.
- the cDNA may be sequenced, and used to predict the RNA sequence.
- the signal for that RNA sequence then may be predicted (e.g., using a read map or context-specific simulator).
- the values of the predicted signal then may be compared to the actual measured values from the RNA, to calculate dissimilarity values between the predicted measured values and the actual measured values for the native RNA.
- Epigenetic marks in the native RNA then may be identified using the dissimilarity values.
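The comparison described above can be sketched in a few lines. This is only an illustration under stated assumptions: the predicted and measured signals are taken to be already aligned position by position, the dissimilarity is taken to be a simple absolute difference, and the function name `flag_candidate_marks` and the `threshold` parameter are hypothetical rather than drawn from the source.

```python
from typing import List, Tuple


def flag_candidate_marks(
    predicted: List[float],
    measured: List[float],
    threshold: float,
) -> Tuple[List[float], List[int]]:
    """Compare predicted and measured signal levels position by position.

    Assumption: both lists are already aligned, one value per position.
    Returns the per-position dissimilarity values and the indices whose
    dissimilarity exceeds the (hypothetical) threshold.
    """
    if len(predicted) != len(measured):
        raise ValueError("predicted and measured signals must be aligned")
    # Absolute difference is one possible dissimilarity; others could be used.
    dissimilarity = [abs(m - p) for p, m in zip(predicted, measured)]
    flagged = [i for i, d in enumerate(dissimilarity) if d > threshold]
    return dissimilarity, flagged


if __name__ == "__main__":
    # Signal predicted from the cDNA-derived sequence (illustrative numbers).
    predicted_signal = [1.0, 2.0, 3.0, 4.0]
    # Signal measured from the native RNA (illustrative numbers).
    measured_signal = [1.1, 2.0, 4.5, 4.0]
    values, candidates = flag_candidate_marks(predicted_signal, measured_signal, 1.0)
    print(candidates)
```

Positions returned in `candidates` are only candidates for epigenetic marks; in practice the threshold would need to be chosen against the noise level of the measurement.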
- a joint decoder may be used that is nucleotide modification-aware. Another benefit to accuracy is that the initial base calling may be much easier for cDNA 2001’, which may be synthesized using only natural nucleotides and thus may be base called using a four-base read map, than for native RNA 2001, which has a relatively large number of possible epigenetic marks and therefore a more complicated read map.
- RNA construct may be made in a manner which will now be described with reference to FIGS. 21A-21B, and used in a manner similar to that described with reference to FIG. 19.
- FIGS. 21A-21B schematically illustrate an example workflow for generating a full-complement polynucleotide for native RNA sequencing that includes the native RNA strand, a direct-repeat RNA synthetic copy of the native RNA, and cDNA copies of both native and synthetic RNA strands.
- FIGS. 21A-21B schematically illustrate another example workflow (method) 2100 for generating a full-complement polynucleotide for sequencing that may include modified bases in a native strand and unmodified bases in a synthetic strand.
- Workflow 2100 may include coupling a native single-stranded polynucleotide 2101 to a first adapter.
- polynucleotide 2101 is a full-length native mRNA including 5’ cap and poly-A tail, although it should be noted that workflow 2100 could be adapted to fragmented mRNA or noncoding RNA with methods outlined above for method 2000.
- 3' adapter 2111’ may be coupled to 3' primer 2111.
- 3' adapter 2111’ may include first and second strands that are hybridized to each other.
- the first strand includes a landing pad and lox site [LOX-1], and the second strand includes complements of the landing pad and lox site [LOX-1 ’] as well as an extension block between the landing pad and lox site.
- Nonlimiting examples of an extension block include the Sp18 linker or 3-5 sequential synthetic abasic sites (dSpacer).
- the complement of the landing pad includes a 3' OH group.
- the 5' end of the first strand of 3' adapter 2111’ may be covalently coupled to the 3' end of poly-A tail 2111.
- Nonlimiting examples that may be used to couple the 3' end of primer 2111 to 3' adapter 2111’ include enzymatic ligation with T4 RNA ligase I, nontemplated extension using an azide-containing terminator NTP followed by chemical ligation, and terminal diol-boronic acid coupling similar to the methods discussed in reference to method 2000.
- the polyA tail 2111 may either be present in the native RNA (in the case of mRNA) or may be added by non-templated extension (e.g., for fragmented / noncoding RNAs).
- a reverse transcriptase (not specifically illustrated) generates cDNA 2101’ based on the sequence of RNA 2101, starting from the 3' OH group of the landing pad complement (second strand of 3' adapter 2111’).
- the cDNA 2101’ is complementary to RNA 2101, and hybridized to the RNA to form a duplex.
- Workflow 2100 may include coupling a second adapter to the duplex.
- a “template switch oligo” hybridizes via its 3' rC residues to nontemplated 3' dC residues added to the cDNA during reverse transcription, and is then used as a template by reverse transcriptase to synthesize a complement strand, covalently linked to the 3' end of cDNA 2101’, to form double stranded adapter 2112.
- the TSO may include a lox site [LOX-2]. Lox sites [LOX-1] and [LOX-2] may include inverted repeat mutations (e.g., the pair lox66 and lox71) to improve Cre-lox recombination efficiency by preventing backreaction in a manner such as described in Albert et al., “Site-specific integration of DNA into wild-type and mutant Lox sites placed in the plant genome,” The Plant Journal: For Cell and Molecular Biology 7(4): 649-659 (1995), the entire contents of which are incorporated by reference herein.
- the terminal 3’ OH of the TSO will remain unligated (not coupled to the 5’ end of RNA 2101), in contrast to workflow 2000.
- the TSO usually refers only to the top strand shown here; the bottom strand is copied by the polymerase during template switching.
- the first and second adapters respectively may include unique molecular identifiers (UMIs), e.g., in a manner such as described with reference to FIG.
- a circular construct may be formed using the RNA-cDNA complex to which the first and second adapters are coupled.
- site-specific recombination e.g., Cre-mediated recombination
- adapters 2111’ and 2112 may include lox sequences that the Cre recombinase specifically targets.
- the adapters instead may include sequences that such recombinase specifically targets.
- a polymerase 2131 may be used to generate an RNA complement 2101” of the cDNA starting from the 3'-OH of the strand of the second adapter which is not covalently coupled to the cDNA.
- the polymerase 2131 may include a T7 RNA polymerase. As the polymerase synthesizes the RNA complement of the cDNA, the native RNA 2101 dehybridizes from the cDNA. When the polymerase 2131 reaches the extension block of 3' adapter 2111’, the polymerase stops synthesizing the RNA complement.
- the partial construct including the native RNA 2101 coupled to the RNA complement 2101” of the cDNA is decoupled from circular cDNA.
- a 5' lock 152 and a 3' lock 151 may be coupled to the partial construct.
- the construct 150 formed using operations 2110 through 2180 is used for full-complement sequencing in a manner such as described with reference to FIG. 20.
- the construct 150 generated in workflow 2000 included both RNA and cDNA
- the construct 150 generated in workflow 2100 includes both native RNA 2101 and direct repeat RNA 2101”, the latter of which is generated by making a complement of the cDNA.
- RNA and direct repeat RNA both may readily be sequenced using reverse transcriptase as polymerase 105.
- the product of operation 2170 is further processed to form a quadruplex construct.
- the circular cDNA is removed similarly as described with reference to FIG. 21A to obtain the partial construct shown at operation 2210 in FIG. 21B.
- the 5' and 3' ends of the partial construct then are coupled to respective adapters.
- the first adapter may include an enzymatic loop-forming sequence such as the TelN target sequence discussed above in reference to method 2000.
- a 5' ligation handle may be coupled to the 5' end of the mRNA 2101 via which target sequence 2212 may be coupled to the mRNA.
- Nonlimiting examples that may be used to couple the 5' end of the mRNA 2001 to the target sequence are described with reference to FIG. 20.
- a 3' ligation handle may be coupled to the 3' end of poly-A tail 2111 via which the 3' end of the mRNA 2101 may be coupled to 3' lock 151.
- Nonlimiting examples that may be used to couple the 3' end of poly-A tail 2111 to 3' lock 151 are described with reference to FIG. 20.
- the first and second adapters respectively may include unique molecular identifiers (UMIs), e.g., in a manner such as described with reference to FIG. 6.
- Method 2200 also may include using the primer to synthesize a complement of the single-stranded RNA 2101, a complement of the direct repeat RNA 2101”, and a complement of the target sequence.
- a construct including primer 2211” coupled to 5' lock 152 may be coupled to mRNA 2101 via hybridization between the primer region of adapter 152 and a complementary landing pad site in the 3’ cis-lock adapter 151.
- the particular location at which the primer hybridizes to the construct is not critical; for example, the primer instead may hybridize to a sequence of cis-lock adapter 141.
- a suitable polymerase 2231 e.g., reverse transcriptase (RT)
- the mRNA 2101 including epigenetic marks may be hybridized to, but not covalently coupled to, cDNA 2201’ which in some examples was synthesized using natural nucleotides and thus lacks epigenetic marks, or which was synthesized using modified nucleotides.
- Polymerase 2231 also generates a cDNA reference polynucleotide 2201” using the direct repeat RNA 2101” as a template. Similarly as described with reference to FIG. 20, polymerase 2231 also synthesizes a complement of the target sequence within 5' adapter 2212.
- Method 2200 also may include coupling the target sequence to the complement of the target sequence of the 5' adapter to form a stem-loop adapter.
- a telomerase 2290 may be used to perform such coupling.
- a nonlimiting example of such a telomerase is protelomerase TelN of bacteriophage N15.
- the target sequence of adapter 2212 may include a TelN target
- the complement target sequence may include the complement of the TelN target.
- construct 150 includes native RNA 2101, synthetic cDNA 2201’ that is complementary to the native RNA, is not hybridized to the native RNA, and is coupled to the native RNA via an adapter 2291.
- Construct 150 also includes a synthetic RNA direct repeat 2101” that has the same sequence as the native RNA 2101 and is covalently coupled to the native RNA, and a cDNA complement 2201” of the RNA direct repeat that is hybridized to the RNA direct repeat.
- native RNA 2101 may include one or more epigenetic marks, while the synthetic RNA direct repeat 2101” and synthetic cDNA 2201’ and 2201” may not comprise any epigenetic marks.
- the RNA direct repeat 2101” and cDNA 2201’ and 2201” may be synthesized using only unmodified nucleotides.
- the RNA direct repeat 2101” and/or cDNA 2201’ and 2201” may be synthesized using modified nucleotides.
- construct 150 described with reference to FIGS. 21A-21B may be useful for identifying epigenetic marks
- construct 150 also or alternatively may be used simply to sequence the polynucleotide 1901, e.g., in a manner such as described with reference to FIGS. 1A-1H, potentially using a mixture of RNA reverse transcriptase and DNA polymerase as polymerase 105.
- the reference strand may be synthesized using one homogeneous choice of modified nucleotide which may be targeted specifically to hard-to-sequence (or more error-prone) regions.
- modified nucleotide such as super-G
- the modified nucleotides within the synthetic strand may help to break the degenerate transitions and improve overall base calling accuracy by providing complementary information.
- synthesizing the reference strand may help to discriminate between hard- to-distinguish genomic regions.
- homopolymer regions may be difficult to resolve.
- Having a mixture of modified and unmodified transitions may help to resolve specific sequence types.
- a mixture of modified C and unmodified C in the synthetic strand may aid in distinguishing “steps” through the homopolymer.
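A toy model may make the homopolymer point concrete. The sketch below assumes, purely for illustration, that the signal level depends only on the k-mer currently read, so two consecutive identical k-mers give no visible "step". The symbol `M` for a modified C, the window size `k = 3`, and the function name `distinguishable_steps` are hypothetical and are not taken from the source.

```python
def distinguishable_steps(strand: str, k: int = 3) -> int:
    """Count single-base advances that change the k-mer being read.

    Toy assumption: the signal level is fully determined by the k-mer,
    so an advance is only visible when the next k-mer differs.
    """
    kmers = [strand[i:i + k] for i in range(len(strand) - k + 1)]
    return sum(1 for a, b in zip(kmers, kmers[1:]) if a != b)


if __name__ == "__main__":
    unmodified = "CCCCCC"   # homopolymer synthesized with unmodified C only
    mixed = "CMCCMC"        # same run with a mixture of C and modified C ("M")
    print(distinguishable_steps(unmodified))  # every k-mer is "CCC"
    print(distinguishable_steps(mixed))       # k-mers: CMC, MCC, CCM, CMC
```

Under this simplified model the unmodified run produces no distinguishable steps, while the mixed run produces one at every advance, which is the intuition behind using a mixture in the synthetic strand.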
- FIG. 12A schematically illustrates example polynucleotides using fragments of the PhiX genome, prepared for nanopore sequencing using the system of FIGS. 1A-1H.
- DNA ultramers were obtained from Integrated DNA Technologies, Inc. (IDT, Coralville, Iowa) and underwent a round of annealing with several DNA fragments prior to capture and sequencing on the device. More specifically, the ultramer DNA oligos were subjected to a round of annealing with three other DNA fragments prior to capture and sequencing on the system of FIGS. 1A-1H.
- the “forward” (polynucleotide 158) templates were annealed to: 1) a 3’ LNA lock hairpin adapter
- FIG. 12B schematically illustrates the polynucleotide of FIG. 12A locked to a nanopore.
- the annealed HPA lock remains bound to the template polynucleotide or complement, the 5' hairpin self-locks to form a hairpin preventing the polynucleotide or complement from leaving the nanopore, and the complementary strands for inhibiting DNA extension in bulk (e.g., the complement DNA, the ddC terminated 5' self-lock block, and the ddC or polyT primer with biotin) are stripped off, leaving ssDNA and an extension primer.
- FIGS. 13A-13B schematically illustrate example polynucleotides and complements, and measured accuracy rates for separately and jointly decoding those sequences using the system of FIGS. 1A-1H.
- polynucleotide 158 and complement 159 had the 100-mer PhiX genomic sequences (PhiX test templates) respectively shown in FIG. 13A.
- the measurements from polynucleotide 158 and the measurements from complement 159 were jointly input into a Viterbi HMM algorithm in a manner such as described with reference to FIGS. 8A-8C, 9, and 10.
- the accuracy of the jointly obtained sequence ranged from about 85.9% to about 92.2%, with an average accuracy of about 87.8% as illustrated in FIG.
- the accuracy of the measurements taken using polynucleotide 158 alone, also input to the Viterbi HMM algorithm ranged from about 58.0% to about 79.0%, with an average accuracy of 69.0% as illustrated in FIG. 13B.
- the accuracy of the measurements taken using complement 159 alone, also input to the Viterbi HMM algorithm ranged from about 67.0% to about 81.0%, with an average accuracy of 74.2% as illustrated in FIG. 13B.
- the present methods may be used to significantly improve accuracy of sequencing by jointly using measurements from both a target polynucleotide and its complement (e.g., by at least 5%, or by at least 8%, or by at least 9%, or more).
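The idea of jointly decoding both strands can be illustrated with a deliberately small Viterbi sketch. It is not the decoder of FIGS. 8A-8C, 9, and 10; it assumes one hidden base per position, one aligned measurement per position from each strand, a hypothetical Gaussian level model (`LEVEL`, `SIGMA`), and uniform transitions unless others are supplied. All names and numbers are illustrative.

```python
import math
from typing import Dict, List, Optional

BASES = "ACGT"
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
LEVEL = {"A": 1.0, "C": 2.0, "G": 3.0, "T": 4.0}  # hypothetical mean signal per base
SIGMA = 0.5                                        # hypothetical noise width


def log_gauss(x: float, mean: float, sigma: float = SIGMA) -> float:
    """Log-density of a Gaussian emission."""
    return -0.5 * ((x - mean) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))


def joint_viterbi(
    forward: List[float],
    complement: List[float],
    log_trans: Optional[Dict[str, Dict[str, float]]] = None,
) -> str:
    """Most likely base sequence given measurements from both strands.

    Assumption: complement[i] was measured from the base paired with
    position i, so both lists are already aligned to the target strand.
    """
    n = len(forward)
    if n != len(complement):
        raise ValueError("measurements must be aligned")
    if n == 0:
        return ""
    if log_trans is None:
        log_trans = {a: {b: math.log(0.25) for b in BASES} for a in BASES}

    def emit(i: int, base: str) -> float:
        # Joint emission: the two strands contribute independent evidence.
        return (log_gauss(forward[i], LEVEL[base])
                + log_gauss(complement[i], LEVEL[COMPLEMENT[base]]))

    score = {b: math.log(0.25) + emit(0, b) for b in BASES}
    back = []
    for i in range(1, n):
        new_score, pointers = {}, {}
        for b in BASES:
            prev = max(BASES, key=lambda a: score[a] + log_trans[a][b])
            new_score[b] = score[prev] + log_trans[prev][b] + emit(i, b)
            pointers[b] = prev
        score = new_score
        back.append(pointers)

    # Trace back from the best final state.
    path = [max(BASES, key=lambda b: score[b])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return "".join(reversed(path))


if __name__ == "__main__":
    # True sequence ACGT; the forward reading at position 1 is noisy (2.9 looks like G).
    forward_signal = [1.0, 2.9, 3.0, 4.0]
    complement_signal = [4.0, 3.0, 2.0, 1.0]
    print(joint_viterbi(forward_signal, complement_signal))
```

In this toy case the forward measurement alone would favor G at the second position, but the complement measurement is consistent only with C, so the joint score recovers the correct base.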
- FIG. 14 schematically illustrates a workflow for preparing a Lambda polynucleotide for nanopore sequencing using the system of FIGS. 1A-1H.
- 1 kb templates underwent PCR from Lambda phage using a 3' primer with dU insertions and a 5' primer with a 5' LNA self-lock and a SP18 in the hairpin loop such that the PCR complement held open the hairpin and a self-lock blocking oligo need not be used as described with reference to FIGS. 12A-12B.
- the PCR construct was digested with USER enzyme which cleaved the dU bases in the complement strand near the 3' end, leaving nicked fragments that dissociated at room temperature in order to liberate the hybridization region at the 3' end for the HPA and primer. Finally, the construct underwent a SPRI purification, and then annealed on the HPA lock and the biotin tagged primer block.
- FIG. 15A schematically illustrates additional example polynucleotides and complements. More specifically, in this example, polynucleotide 158 and complement 159 had the 1 kb Lambda DNA sequences (Lambda templates) respectively shown in FIG. 15A.
- the measurements from polynucleotide 158 and the measurements from complement 159 were either separately or jointly input into the Viterbi HMM algorithm in a manner such as described with reference to FIGS. 8A-8E, 9, and 10.
- the accuracy of the measurements taken using polynucleotide 158 alone, input to the Viterbi HMM algorithm was about 69.6%.
- FIG. 15A the accuracy of the measurements taken using polynucleotide 158 alone, input to the Viterbi HMM algorithm
- FIG. 15A the accuracy of the measurements taken using complement 159 alone, also input to the Viterbi HMM algorithm, was about 68.75%.
- FIG. 15B schematically illustrates measured accuracy rates for separately and jointly decoding the sequences of FIG. 15A using the system of FIGS. 1A-1H.
- the accuracy of the jointly obtained sequence was about 85.0%.
- jointly using the measurements from polynucleotide 158 and the measurements from complement 159 in the Viterbi HMM algorithm provided an improvement of about 15.4% over those from the polynucleotide 158 alone and the complement alone, as illustrated in FIG. 15B.
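The reported Lambda accuracies can be compared with a short calculation. The sketch below only restates the figures given above (about 85.0% jointly, about 69.6% for polynucleotide 158 alone, about 68.75% for complement 159 alone) and treats the improvement as a difference in percentage points, which is an interpretation rather than something the source states explicitly. The helper name `gain` is hypothetical.

```python
def gain(joint_accuracy: float, single_accuracy: float) -> float:
    """Difference between joint and single-strand accuracy, in percentage points."""
    return round(joint_accuracy - single_accuracy, 2)


if __name__ == "__main__":
    lambda_joint = 85.0        # joint decoding, as reported
    lambda_forward = 69.6      # polynucleotide 158 alone, as reported
    lambda_complement = 68.75  # complement 159 alone, as reported

    print(gain(lambda_joint, lambda_forward))
    print(gain(lambda_joint, lambda_complement))
```

Read this way, the stated figure of about 15.4 matches the gain over polynucleotide 158 alone, and the gain over the complement alone comes out slightly larger.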
- FIG. 15C schematically illustrates example signals, and errors, obtained from the sequences of FIG. 15A using the system of FIGS. 1A-1H. It may be understood from FIG. 15C that most of the errors that occur for one strand at a given position do not occur in both strands at the same position, and thus may be substantially eliminated through joint analysis.
- FIG. 16A illustrates example signals obtained using the system of FIGS. 1A-1H without use of a complement.
- the signals corresponding to different nucleotides which are added to duplex 154 fluctuate significantly over time. Without wishing to be bound by any theory, it is believed that the fluctuations may result from spontaneous transitions between folded and unfolded states of transient secondary structures in a manner such as described with reference to FIG. 2A.
- FIG. 16B illustrates example signals obtained using the system of FIGS. 1A-1H with use of a complement. As may be seen in FIG. 16B, the signals corresponding to different nucleotides which are added to duplex 154 fluctuate significantly less over time than in FIG. 16A. Without wishing to be bound by any theory, it is believed that the reduction in fluctuations relative to those in FIG. 16A may result from the complement inhibiting spontaneous transitions between folded and unfolded states of transient secondary structures in a manner such as described with reference to FIG. 2B.
- the target polynucleotide is from a PhiX sequence.
- the construct’s full sequence is: /5Phos/GGGCGGAGCTGGCgataGCCAGCTCCGCCCTTTTTTTTTTTTTTTTGTACAGATA GCTGATACGATGCGATACGCGACATGTCGCTGATGCACTCGGATACGAGTGCATCA GCGACATGTCGCGTATCGCATCGTATCAGCTATCTGTACTGTGTGAGTGTGTGATGT
- the construct thus included a 3' lock, primer, and barcode sequence, a duplexed region formed between "Sequence a" and "Sequence b" (where Sequence a is the PhiX insert and Sequence b is Sequence a’s reverse complement), and a 5' lock.
- Loop is a standard GATA sequence. This sequence was custom-produced from a commercially available source, and annealed to form the full complement template for tethering as depicted in FIG. 2B.
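The hairpin structure of the construct can be checked directly from its sequence. The sketch below tests whether the bases on either side of a loop are reverse complements of one another. The two segments are copied from the construct sequence given above; where Sequence a ends and Sequence b begins is inferred here from the GATA loop and is an assumption, as are the function names.

```python
_COMPLEMENT_TABLE = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT_TABLE)[::-1]


def is_hairpin(seq: str, stem_len: int, loop_len: int) -> bool:
    """True if the first stem_len bases pair with the stem_len bases after the loop."""
    stem_5prime = seq[:stem_len]
    stem_3prime = seq[stem_len + loop_len: 2 * stem_len + loop_len]
    return stem_3prime.upper() == reverse_complement(stem_5prime).upper()


# First 30 bases of the construct: 13-base stem, "gata" loop, 13-base stem.
LOCK_HAIRPIN = "GGGCGGAGCTGGCgataGCCAGCTCCGCCC"

# Duplexed region as read from the construct (boundaries inferred, see lead-in).
SEQUENCE_A = "GTACAGATAGCTGATACGATGCGATACGCGACATGTCGCTGATGCACTCG"
SEQUENCE_B = "CGAGTGCATCAGCGACATGTCGCGTATCGCATCGTATCAGCTATCTGTAC"

if __name__ == "__main__":
    print(is_hairpin(LOCK_HAIRPIN, stem_len=13, loop_len=4))
    print(reverse_complement(SEQUENCE_A) == SEQUENCE_B)
    print(is_hairpin(SEQUENCE_A + "GATA" + SEQUENCE_B, stem_len=50, loop_len=4))
```

The same check could be applied to any candidate construct before ordering it, to confirm that the intended stems pair as designed.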
- the target polynucleotide is the PhiX sequence:
- the presence of a complement on the second side of the nanopore may inhibit the spontaneous transitions between folded and unfolded states of transient secondary structures, and/or otherwise may reduce fluctuations in signal level.
- FIG. 17A illustrates a secondary structure prediction and example signals obtained using the system of FIGS. 1A-1H without use of an elevated temperature (here, 25 °C).
- the construct sequence is as follows: /5Phos/GGGCGGAGCTGGCgataGCCAGCTCCGCCCTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTGTGTG ATGTGTGATGTGAAACTAACAGTCTGTGAGCCGTGGTGTGTAATGTGTAGGTAAATG TAAACGTAAAGTTAAATTTAAACTTAAAGCTAAATCTAAACCTAAAGATATGTGTGA TGTGTGGCCTCGGCCGGTGCCTGTGCCGTTGCCTTTGCCGCTGCCTCTGCCGGCGCCT GCGCCGTGTGTAATGTGTACTAAGGATAATGATAACGATAAGTATAATTATAACTAT AAGCATAATCATAACCATAATGTGTGATGTGTCCTTCCTTCCGCCTCCTCCGGGCCCT GGCCCGTGCCCTTGCCCGGTCTGTGTAATGTGTAAAAG
- FIG. 17B illustrates a secondary structure prediction and example signals obtained using the system of FIGS. 1A-1H with use of an elevated temperature (here, 37 °C). The same construct was used as described with reference to FIG. 17A. As may be seen in FIG. 17B, the signals corresponding to different nucleotides which are added to duplex 154 fluctuate significantly less over time than in FIG. 17A. Without wishing to be bound by any theory, it is believed that the reduction in fluctuations relative to those in FIG. 17A may result from the elevated temperature inhibiting spontaneous transitions between folded and unfolded states of transient secondary structures in a manner such as described with reference to FIG. 2B. The DNA template secondary structure is shown on the left side of FIGS.
- It may be understood from FIGS. 17A-17B that the presence of a complement on the second side of the nanopore may inhibit the spontaneous transitions between folded and unfolded states of transient secondary structures, and/or otherwise may reduce fluctuations in signal level.
Landscapes
- Life Sciences & Earth Sciences (AREA)
- Chemical & Material Sciences (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Organic Chemistry (AREA)
- Zoology (AREA)
- Wood Science & Technology (AREA)
- Health & Medical Sciences (AREA)
- Engineering & Computer Science (AREA)
- Microbiology (AREA)
- Immunology (AREA)
- Biotechnology (AREA)
- Molecular Biology (AREA)
- Biophysics (AREA)
- Analytical Chemistry (AREA)
- Physics & Mathematics (AREA)
- Biochemistry (AREA)
- Bioinformatics & Cheminformatics (AREA)
- General Engineering & Computer Science (AREA)
- General Health & Medical Sciences (AREA)
- Genetics & Genomics (AREA)
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
Abstract
A method of sequencing may include generating a construct including the polynucleotide and a complement of the polynucleotide. The construct may be disposed through the aperture of a nanopore. A first plurality of measured values may be generated corresponding to the sequence of the polynucleotide. A second plurality of measured values may be generated corresponding to the sequence of the complement. A sequence of the polynucleotide may be generated using the first plurality of measured values and the second plurality of measured values.
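One way to picture how two sets of measurements might be combined is the minimal Python sketch below. It is not the method of the application: it assumes each plurality of measured values has already been converted to per-base calls with a confidence score, that both reads cover the same positions, and that the higher-confidence call wins at each position. The function `combine_calls` and the example data are hypothetical.

```python
# Watson-Crick partner of each DNA base.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def combine_calls(template_calls, complement_calls):
    """Merge per-base calls from a strand and its complement.

    Both arguments are lists of (base, confidence) tuples, each read 5'->3'
    along its own strand. This data layout is an assumption for illustration.
    """
    # Reverse and complement the second read so it shares template coordinates.
    mapped = [(COMPLEMENT[base], conf) for base, conf in reversed(complement_calls)]
    if len(mapped) != len(template_calls):
        raise ValueError("reads must cover the same number of positions")
    # At each position keep whichever call has the higher confidence.
    return "".join(
        t_base if t_conf >= c_conf else c_base
        for (t_base, t_conf), (c_base, c_conf) in zip(template_calls, mapped)
    )


# Hypothetical calls: the middle template base is a low-confidence miscall.
template_calls = [("A", 0.9), ("T", 0.4), ("G", 0.8)]
# Hypothetical calls for the complement strand of ACG, read 5'->3' (CGT).
complement_calls = [("C", 0.7), ("G", 0.9), ("T", 0.6)]

consensus = combine_calls(template_calls, complement_calls)
print(consensus)
```

Here the low-confidence `T` in the template read is overruled by the complement read, so the combined sequence is `ACG`. A practical pipeline would also need alignment between the two reads, which this sketch sidesteps by requiring equal lengths.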
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US202463666044P | 2024-06-28 | 2024-06-28 | |
| US63/666,044 | 2024-06-28 | | |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| WO2026006622A2 (fr) | 2026-01-02 |
| WO2026006622A3 (fr) | 2026-02-12 |
Family
ID=96703494
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/US2025/035519 Pending WO2026006622A2 (fr) | 2024-06-28 | 2025-06-26 | Full-complement polynucleotide sequencing using nanopores |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2026006622A2 (fr) |
Citations (13)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO1995023875A1 (fr) | 1994-03-02 | 1995-09-08 | The Johns Hopkins University | In vitro transposition of artificial transposons |
| US6015714A (en) | 1995-03-17 | 2000-01-18 | The United States Of America As Represented By The Secretary Of Commerce | Characterization of individual polymer molecules based on monomer-interface interactions |
| WO2010048605A1 (fr) | 2008-10-24 | 2010-04-29 | Epicentre Technologies Corporation | Transposon end compositions and methods for modifying nucleic acids |
| US20100120098A1 (en) | 2008-10-24 | 2010-05-13 | Epicentre Technologies Corporation | Transposon end compositions and methods for modifying nucleic acids |
| WO2013153359A1 (fr) | 2012-04-10 | 2013-10-17 | Oxford Nanopore Technologies Limited | Mutant lysenin pores |
| US9708655B2 (en) | 2014-06-03 | 2017-07-18 | Illumina, Inc. | Compositions, systems, and methods for detecting events using tethers anchored to or adjacent to nanopores |
| WO2023049682A1 (fr) | 2021-09-22 | 2023-03-30 | Illumina, Inc. | Sequencing polynucleotides using nanopores |
| WO2023187112A1 (fr) | 2022-03-31 | 2023-10-05 | Illumina Cambridge Limited | Amphiphilic block copolymers cross-linked with different cross-linkers consecutively forming a cross-linking hierarchy |
| WO2023187111A1 (fr) | 2022-03-31 | 2023-10-05 | Illumina Cambridge Limited | Barriers including biological nanopores for DNA sequencing, the barriers being made of co-polymers with end and/or middle groups, and methods of making the same |
| WO2023187104A1 (fr) | 2022-03-31 | 2023-10-05 | Illumina Cambridge Limited | Nanopore devices including barriers using diblock or triblock copolymers, and methods of making the same |
| WO2023187110A1 (fr) | 2022-03-31 | 2023-10-05 | Illumina Cambridge Limited | Amphiphilic polymers for use in barriers and preparation thereof, barriers including nanopores and preparation thereof |
| WO2023187106A1 (fr) | 2022-03-31 | 2023-10-05 | Illumina Cambridge Limited | Barriers including cross-linked amphiphilic molecules, and methods of making the same |
| WO2023187081A1 (fr) | 2022-03-31 | 2023-10-05 | Illumina Cambridge Limited | Methods for inserting nanopores into polymeric membranes using chaotropic solvents |
Family Cites Families (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| EP4230747A3 (fr) * | 2008-03-28 | 2023-11-15 | Pacific Biosciences Of California, Inc. | Compositions and methods for nucleic acid sequencing |
| AU2015202111B9 (en) * | 2008-03-28 | 2017-07-27 | Pacific Biosciences Of California, Inc. | Compositions and methods for nucleic acid sequencing |
| WO2015150786A1 (fr) * | 2014-04-04 | 2015-10-08 | Oxford Nanopore Technologies Limited | Method for characterising a double-stranded nucleic acid using a nanopore and anchor molecules at both ends of said nucleic acid |
| EP3464612B8 (fr) * | 2016-05-31 | 2024-12-11 | Switchback Systems, Inc. | Two-color nanopore sequencing |
| KR20230091116A (ko) * | 2020-10-21 | 2023-06-22 | Illumina, Inc. | Sequencing templates comprising multiple inserts, and compositions and methods for improving sequencing throughput |
2025
- 2025-06-26 WO PCT/US2025/035519 patent/WO2026006622A2/fr active Pending
Also Published As
| Publication number | Publication date |
|---|---|
| WO2026006622A3 (fr) | 2026-02-12 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US20230090867A1 (en) | | Sequencing polynucleotides using nanopores |
| AU2022231786B2 (en) | | Compositions and methods for polynucleotide sequencing |
| CN115916994A (zh) | | Detection of methylcytosine and its derivatives using S-adenosyl-L-methionine analogs (xSAM) |
| US20240124921A1 (en) | | Detection of analytes using targeted epigenetic assays, proximity-induced tagmentation, strand invasion, restriction, or ligation |
| WO2026006622A2 (fr) | | Full-complement polynucleotide sequencing using nanopores |
| US20240287504A1 (en) | | Genomic library preparation and targeted epigenetic assays using cas-grna ribonucleoproteins |
| HK40126736A (en) | | Detection of analytes using targeted epigenetic assays, proximity-induced tagmentation, strand invasion, restriction, or ligation |
| CN117881796A (zh) | | Detection of analytes using targeted epigenetic assays, proximity-induced tagmentation, strand invasion, restriction, or ligation |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 25753339; Country of ref document: EP; Kind code of ref document: A2 |