WO2024249634A2 - Error-prone DNAPs for orthogonal DNA replication - Google Patents
Error-prone DNAPs for orthogonal DNA replication
- Publication number
- WO2024249634A2 (PCT/US2024/031672)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- mutation
- sequences
- trpb
- selection
- mutations
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N9/00—Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
- C12N9/10—Transferases (2.)
- C12N9/12—Transferases (2.) transferring phosphorus containing groups, e.g. kinases (2.7)
- C12N9/1241—Nucleotidyltransferases (2.7.7)
- C12N9/1252—DNA-directed DNA polymerase (2.7.7.7), i.e. DNA replicase
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N15/00—Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
- C12N15/09—Recombinant DNA-technology
- C12N15/10—Processes for the isolation, preparation or purification of DNA or RNA
- C12N15/102—Mutagenizing nucleic acids
Definitions
- OrthoRep cells have an additional DNA replication system comprising an orthogonal DNA polymerase (DNAP)/plasmid pair wherein the orthogonal DNAP (TP-DNAP1) durably replicates the orthogonal plasmid (p1) but not the host genome; likewise, host DNAPs replicate the host genome but not p1 (Fig.1A).
- DNAP orthogonal DNA polymerase
- TP-DNAP1 the orthogonal DNAP
- These DNAPs are ideally suited for OrthoRep because of the superior mutation rates and mutation spectrum they achieve. They allow OrthoRep-driven biomolecular evolution experiments to proceed much faster than before. Described herein are orthogonal DNAPs that durably replicate p1 at mutation rates exceeding 10^-4 substitutions per base (s.p.b.) and that generate a remarkable level of sequence divergence.
- the error-prone DNA polymerase comprises an amino acid sequence having at least 90% identity with SEQ ID NO: 1 and at least three amino acid substitutions relative to SEQ ID NO: 1, wherein the at least three amino acid substitutions are at positions selected from E266, N282, I327, N449, L474, E488, Q598, K635, P680, F702, N713, K753, I761, I777, T828, I863, L900, and F965, and wherein the DNAP has a mutation rate greater than 10^-6 substitutions per base.
- the DNAP has a mutation rate of at least 10^-4 substitutions per base.
- the measured rate of mutation for any individual mutation type (i.e., A/T/G/C → A/T/G/C) is above 10^-7 substitutions per base.
- the measured rate of transversion mutations is greater than 4.89 × 10^-7.
- the measured rate of transversion mutations is greater than 6 × 10^-6.
- the measured rate of transversion mutations is greater than 3 × 10^-5.
- the measured rate of transversion mutations is 1.89 × 10^-6 to 3.18 × 10^-5.
- the measured rate of transition mutations is greater than 1.32 × 10^-5.
- the measured rate of transition mutations is 1.41 × 10^-5 to 1.38 × 10^-4.
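The transition and transversion rates recited here group the twelve possible single-nucleotide substitution types by base chemistry. The following is a minimal Python sketch of that grouping. The function names and the example rate table are illustrative assumptions, and summing per-type rates within a class is only one possible convention; the text does not state how the class-level rates were aggregated.

```python
# Transitions swap bases within a chemical class (purine <-> purine or
# pyrimidine <-> pyrimidine); transversions swap across the two classes.
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def mutation_class(ref: str, alt: str) -> str:
    """Return 'transition' or 'transversion' for a ref -> alt substitution."""
    ref, alt = ref.upper(), alt.upper()
    for base in (ref, alt):
        if base not in PURINES | PYRIMIDINES:
            raise ValueError(f"not a DNA base: {base!r}")
    if ref == alt:
        raise ValueError("ref and alt must differ")
    same_class = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_class else "transversion"


def rates_by_class(rates: dict) -> dict:
    """Sum per-type rates, keyed by (ref, alt) pairs, within each class."""
    totals = {"transition": 0.0, "transversion": 0.0}
    for (ref, alt), rate in rates.items():
        totals[mutation_class(ref, alt)] += rate
    return totals


# Hypothetical per-type rates in substitutions per base, for illustration only.
example_rates = {("A", "G"): 2e-5, ("C", "T"): 3e-5, ("A", "T"): 1e-6}
print(rates_by_class(example_rates))
```

Of the twelve substitution types, four are transitions and eight are transversions, which is why a polymerase that rarely makes transversions leaves much of sequence space unreachable.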
- the DNAP comprises one or more of the amino acid substitutions shown in Table 1.
- the at least three substitutions comprise (a) P680T; (b) I777K, I777T, or I777S; and (c) L900S.
- the at least three substitutions comprise K635R, K753R, and F965Y.
- the at least three substitutions comprise L474S and E488G.
- the DNAP comprises an amino acid sequence having at least 95% identity with an amino acid sequence selected from SEQ ID NOs: 3-18, and wherein the at least three substitutions comprise P680T.
- the DNAP consists of an amino acid sequence selected from SEQ ID NOs: 3-18.
- the DNAP comprises an amino acid sequence that varies from the sequences described herein due to truncation, insertions, deletions, and/or N- or C-terminal tags.
- SEQ ID NO: 1 can be employed to identify substitutions corresponding to the positions identified as E266, N282, I327, N449, L474, E488, Q598, K635, P680, F702, N713, K753, I761, I777, T828, I863, L900, and F965 of SEQ ID NO: 1.
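As an illustration of the position criterion recited above, the following Python sketch parses substitution labels such as P680T and reports which of them fall at the listed positions of SEQ ID NO: 1. The helper names are hypothetical. The sketch checks only the position list; it does not test the sequence identity or mutation rate limitations, and it assumes the labels already use SEQ ID NO: 1 numbering.

```python
import re

# Wild-type residue at each recited position (SEQ ID NO: 1 numbering).
CLAIMED_POSITIONS = {
    266: "E", 282: "N", 327: "I", 449: "N", 474: "L", 488: "E",
    598: "Q", 635: "K", 680: "P", 702: "F", 713: "N", 753: "K",
    761: "I", 777: "I", 828: "T", 863: "I", 900: "L", 965: "F",
}

_SUBSTITUTION = re.compile(r"([A-Z])(\d+)([A-Z])")


def parse_substitution(label: str):
    """Split a label like 'P680T' into (wild_type, position, new_residue)."""
    match = _SUBSTITUTION.fullmatch(label.strip())
    if match is None:
        raise ValueError(f"unrecognized substitution label: {label!r}")
    return match.group(1), int(match.group(2)), match.group(3)


def claimed_substitutions(labels):
    """Return the labels that change the expected residue at a recited position."""
    hits = []
    for label in labels:
        wild_type, position, new_residue = parse_substitution(label)
        if CLAIMED_POSITIONS.get(position) == wild_type and new_residue != wild_type:
            hits.append(label)
    return hits


variant = ["P680T", "I777K", "L900S"]
print(len(claimed_substitutions(variant)) >= 3)
```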
- nucleic acid molecule encoding the DNAP described herein.
- the nucleic acid molecule further comprises a promoter.
- the promoter sequence is selected from pSAC6, pPSP2, and pSLD3.
- yeast host cell comprising a p1 plasmid and DNAP as described herein, and one or more p2 components for orthogonal replication of the p1 plasmid.
- the method comprises subjecting a yeast host cell containing a p1 plasmid encoding the protein, and a DNAP as described herein, to error prone orthogonal replication.
- the method further comprises selecting yeast cells expressing the protein having the desired characteristic.
- the kit comprises reagents for integration of a gene of interest onto a p1 plasmid, and a DNAP as described herein.
- the kit optionally further comprises one or more reagents or devices for transforming a yeast cell therewith.
- the kit further comprises a p1 plasmid packaged together with a yeast host cell comprising one or more p2 components for orthogonal replication of the p1 plasmid.
- the yeast host cell is packaged together with one or more reagents or devices for culturing and/or transforming the yeast host cell.
- A DNA polymerase (TP-DNAP1) that exclusively replicates a specific cytoplasmically localized plasmid via protein primed replication at a high error rate enables in vivo targeted mutagenesis without mutagenizing genomic DNA.
- Schematic for a directed evolution approach to engineer TP-DNAP1’s mutation rates and mutation spectrum, incorporating both a direct selection for rare transversion mutations and high accuracy mutation rate measurement using a mutation accumulation and high throughput sequencing (HTS) assay.
- HTS high throughput sequencing
- TrpB Thermotoga maritima
- TrpB was integrated onto the p1 plasmid in a yeast strain lacking the native yeast tryptophan synthase gene (TRP5). 96 independent cultures of the resulting strain were passaged mostly under selective pressure for Trp production using exogenously supplied indole over ~540 generations. DNA from fifteen timepoints throughout the evolution campaign was harvested and sequenced using HTS.
- Selection pressure for TrpB function is applied by lowering or eliminating exogenously supplied Trp and lowering exogenously supplied indole over time. The schedule of selection pressure imposed throughout extensive evolution is plotted.
- FIGS.3A-3C Revealed effects of selective constraints.
- First shell, α-β interface, and β-β interface residues are designated as such if they are within a set distance of the substrate and cofactor, the α-subunit, or the β-subunit, respectively, in the αββα heterotetramer holoenzyme.
- Alignment to Pyrococcus furiosus TrpB crystal structures (PDB codes 5E0K and 5DW3) was used to determine distances from substrate and cofactor, α-subunit, and β-subunit.
- Mean solvent accessible surface area (SASA) was used to categorize all remaining residues as either surface (SASA ≥ 0.2) or buried (SASA < 0.2).
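The residue categories described here can be expressed as a small decision rule. The Python sketch below is one possible reading under several assumptions: the distance cutoff is a required argument because its value is not given in this text, the order of precedence among the three distance-based categories is assumed, and a residue at exactly the SASA threshold of 0.2 is treated as surface. The function and parameter names are hypothetical.

```python
def classify_residue(
    dist_ligand: float,
    dist_alpha: float,
    dist_beta: float,
    mean_sasa: float,
    distance_cutoff: float,
    sasa_cutoff: float = 0.2,
) -> str:
    """Assign one structural category to a TrpB residue.

    dist_ligand: distance to the substrate or cofactor
    dist_alpha:  distance to the alpha-subunit
    dist_beta:   distance to the partner beta-subunit
    All distances share the units of distance_cutoff.
    """
    # Distance-based categories are checked first (assumed precedence).
    if dist_ligand <= distance_cutoff:
        return "first shell"
    if dist_alpha <= distance_cutoff:
        return "alpha-beta interface"
    if dist_beta <= distance_cutoff:
        return "beta-beta interface"
    # All remaining residues are split by solvent accessibility.
    return "surface" if mean_sasa >= sasa_cutoff else "buried"


# Hypothetical residue far from every partner, with low solvent exposure.
print(classify_residue(20.0, 18.0, 25.0, 0.05, distance_cutoff=5.0))
```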
- Trp5 TrpA-TrpB holoenzyme ortholog from Saccharomyces cerevisiae
- Trp5-ΔN N-terminally truncated Trp5 homologous to TrpB
- 3E Violin plots of isoelectric points for all OrthoRep-evolved and simulated TrpB sequences, split by timepoint. Points and black bars denote the means and interquartile range for all sequences within each timepoint.
- FIGS.4A-4C Lineage barcodes reveal covarying residues.
- FIGS.5A-5G Pooled measurement and TranceptEVE prediction of TrpB variant fitness.
- 5A Schematic of pooled TrpB fitness assay using HTS.
- 5B Spot plating growth assay of control sequences included in the pooled fitness assay.
- 5C Hexbin plot of replicate concordance among pairs of replicates under growth conditions with Trp (no selection), without Trp and with 400 µM indole (weak selection), or without Trp and with 25 µM indole (strong selection) for highly functional sequences (enrichment score > -5).
- 5E-5F Hexbin plot of TranceptEVE score vs. measured mean enrichment score with strong selection for either all enrichment scores (5E) or enrichment scores for highly active sequences (5F). The percentages of all sequences that fall in the upper or lower quartile of score predictions and are classified as either high or low function (enrichment score greater than or less than -5, respectively) are shown in the respective sections of the plot in (5E).
- 5G Hexbin plot of TranceptEVE score vs. number of nonsynonymous substitutions for all sequences with a measured strong selection mean enrichment score. r, Pearson correlation.
- FIGS.6A-6D Validation of mutation accumulation and comparison of p1 maintenance by legacy TP-DNAP1 variants.
- Yeast strains encoding p1-leu2*-URA3 were transformed with plasmids encoding wild-type (wt) TP-DNAP1, TP-DNAP1-4-2, or TP-DNAP1-KS and passaged for 130 generations under selection for URA3 to allow for accumulation of mutations in leu2*.
- DNA was isolated from these samples at four timepoints throughout the experiment. Gel electrophoresis of these DNA samples (6A) revealed the resurgence of a DNA band corresponding in length to wt p1 (~9 kb) in the TP-DNAP1-4-2 sample, but not in the others.
- TP-DNAP1-KS maintained both a consistent mutation rate (6B, 6C) and monotonic diversification throughout the experiment (6D).
- Two replicates were isolated for the first timepoint for gel electrophoresis, but only one of these was isolated for subsequent timepoints and used for mutation accumulation. Mutations per base for TP-DNAP1 wt are shown rescaled in (6B) to highlight the poor linear fit. n.d., not detected.
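The reference to a linear fit suggests that a mutation rate can be read off as the slope of accumulated mutations per base against generations. The Python sketch below illustrates that idea with an ordinary least-squares fit. It is an assumption about the general approach, not the analysis code used for these figures, and the data points are invented for illustration.

```python
def fit_mutation_rate(generations, mutations_per_base):
    """Least-squares line through (generation, mutations per base) points.

    Returns (slope, intercept, r_squared). The slope estimates substitutions
    per base per generation; a low r_squared flags a poor linear fit.
    """
    n = len(generations)
    if n != len(mutations_per_base) or n < 2:
        raise ValueError("need two or more paired observations")
    mean_x = sum(generations) / n
    mean_y = sum(mutations_per_base) / n
    sxx = sum((x - mean_x) ** 2 for x in generations)
    if sxx == 0:
        raise ValueError("generations must not all be equal")
    sxy = sum(
        (x - mean_x) * (y - mean_y)
        for x, y in zip(generations, mutations_per_base)
    )
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum(
        (y - (intercept + slope * x)) ** 2
        for x, y in zip(generations, mutations_per_base)
    )
    ss_tot = sum((y - mean_y) ** 2 for y in mutations_per_base)
    r_squared = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return slope, intercept, r_squared


# Hypothetical timepoints (generations) and mutations per base.
slope, intercept, r_squared = fit_mutation_rate(
    [0, 30, 60, 130], [0.0, 3e-4, 6e-4, 1.3e-3]
)
print(f"rate ~ {slope:.2e} per base per generation, R^2 = {r_squared:.3f}")
```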
- FIGS.7A-7B Mutation spectrum of legacy TP-DNAP1s.
- FIG.8 Amino acid accessibility by mutation type. Accessibility of any codon for each amino acid from each of the 64 possible codons by one (left column), two (middle column), or three (last column) nucleotide mutations.
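The accessibility analysis described for FIG.8 can be approximated by enumerating codons at a given Hamming distance from a starting codon. The Python sketch below uses the standard genetic code and optionally restricts changes to transitions. The function names are illustrative and this is not the code used to generate the figure.

```python
from itertools import product

# Standard genetic code in TCAG order; '*' marks stop codons.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def accessible_amino_acids(codon, n_mutations, transitions_only=False):
    """Amino acids (or '*') encoded by codons exactly n_mutations away."""
    codon = codon.upper()
    if codon not in CODON_TABLE:
        raise ValueError(f"not a DNA codon: {codon!r}")
    reachable = set()
    for target, amino_acid in CODON_TABLE.items():
        changes = [(a, b) for a, b in zip(codon, target) if a != b]
        if len(changes) != n_mutations:
            continue
        if transitions_only and not all(c in TRANSITIONS for c in changes):
            continue
        reachable.add(amino_acid)
    return reachable


print(sorted(accessible_amino_acids("ATG", 1)))
print(sorted(accessible_amino_acids("ATG", 1, transitions_only=True)))
```

For the methionine codon ATG, a single mutation of any type reaches six amino acids, while a single transition reaches only three, which illustrates why access to transversions broadens the amino acid changes available in one step.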
- FIGS.9A-9C Validation of transversion-specific selection and epPCR 1.
- FIG.10 Polymerase replacement transformation.
- a strain encoding a “landing pad” DNA sequence that includes the wild type (wt) TP-DNAP1 and an I-SceI cut site at the CAN1 locus was co-transformed with both a TP-DNAP1 variant (typically in library format) and a transient I-SceI endonuclease expression cassette. Integration of NatMX was selected for using nourseothricin. Retention of the landing pad was prevented by counterselection of CAN1 using L-canavanine.
- FIG.11 Flowchart of the programmatic steps performed by the mutation analysis for parallel laboratory evolution (Maple) pipeline.
- Maple is an end-to-end data processing pipeline that takes as input a sequencing dataset and a minimal set of additional user inputs and carries out the necessary steps to produce commonly desired visualizations as well as the data that supports those visualizations. This includes generating high accuracy consensus sequences from multiple reads of a sequence via rolling circle amplification (RCA) or unique molecular identifiers (UMI), alignment-based demultiplexing to separate and label sequences derived from different samples, and mutation analysis to generate human-readable .csv outputs that are further analyzed and visualized by Maple or can be viewed and analyzed by the user.
- RCA rolling circle amplification
- UMI unique molecular identifiers
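To make the consensus step concrete, the following Python sketch groups reads by UMI and takes a per-position majority vote. It is a deliberately simplified stand-in and not Maple's implementation: it assumes every read in a group is already aligned to the same length, and all names are hypothetical.

```python
from collections import Counter, defaultdict


def consensus(reads):
    """Majority base at each position across equal-length, pre-aligned reads."""
    if not reads:
        raise ValueError("no reads supplied")
    if len({len(read) for read in reads}) != 1:
        raise ValueError("reads must be aligned to a common length")
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*reads)
    )


def consensus_by_umi(tagged_reads):
    """Map each UMI to the consensus of the reads that carry it.

    tagged_reads is an iterable of (umi, read_sequence) pairs.
    """
    groups = defaultdict(list)
    for umi, read in tagged_reads:
        groups[umi].append(read)
    return {umi: consensus(reads) for umi, reads in groups.items()}


# Hypothetical reads; the third read of UMI 'AAC' carries a sequencing error.
tagged = [("AAC", "ACGT"), ("AAC", "ACGT"), ("AAC", "ACTT"), ("GGT", "TTTT")]
print(consensus_by_umi(tagged))
```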
- FIGS.12A-12C Identification of TP-DNAP1-TKS (epPCR 1).
- FIGS.13A-13C Identification of Trixy and SgtKis (epPCR 2).
- 13B-C Heatmap representation of individual mutation rates for Trixy (13B) and SgtKis (13C) as measured in HTS dataset 3.
- FIGS.14A-14D Mutation accumulation and HTS analysis of TP-DNAP1 variants from manual recombination.
- FIG.16 Relationship between p1 length and mutation rate. Mutation rate measurements for a subset of TP-DNAP1 variants and the length of recombinant p1 used to generate the indicated mutation accumulation dataset are shown. TP-DNAP1-Trixy, whose mutation rates show the most obvious relationship with p1 length, is highlighted with diagonal hatching.
- FIG.17 The effect of mutations to residue 777 on transition mutation rates. Mutation rate measurements from HTS dataset 6 for all TP-DNAP1 variants assayed for all four transition mutations, highlighting variants 3B and 3C, which differ from Trixy by only a single mutation (K777T and K777S, respectively), and BB-3B and BB-3C, which differ from BadBoy3 by only a single mutation (K777T and K777S, respectively).
- FIGS.18A-18B Control of p1 copy number via TP-DNAP1 expression level.
- TP-DNAP1 variants were placed under control of one of three promoter sequences of varying strength, as determined by median protein abundance data (18A), and p1 copy number in the resulting strains was determined by qPCR (18B). Points represent data for one technical replicate; thick horizontal lines represent the average measurement for one biological replicate.
- FIGS.19A-19C The effect of selection on mutation rate. (19A) Illustration of the p1 subjected to mutation accumulation in HTS dataset 6. Cells were passaged in +uracil/-leucine synthetic complete media, enforcing functional selection for LEU2, but not ura3 or mScarlet-I (mScar).
- FIGS.20A-20D A rolling circle amplification (RCA) method for high accuracy long read nanopore sequencing.
- FIGS.21A-21E Diversity of evolved TrpB variants.
- 21A 2-dimensional representation of all unique genotypes identified, using PaCMAP for dimensionality reduction, with the timepoint at which each genotype was identified represented by shading intensity.
- 21B and 21C 2D representations as in 21E, with shading intensity used to indicate the most frequently observed combinations of mutations among the six most frequently mutated positions in the final timepoint.
- 21D Heatmap of all mutations to the 20 most commonly mutated positions throughout the entire experiment.
- FIGS.22A-22E Comparison of substitution distributions for evolved TrpB sequences and projected distributions.
- 22A-22C Violin plots of total nucleotide substitutions (22A), synonymous substitutions (22B), or nonsynonymous substitutions (22C) per sequence in each timepoint of TrpB evolution (darker shapes) compared to projected distributions corresponding to sequences generated by a computational bulk mutation process using mutation rates and preferences of BadBoy2 (lighter shapes), with the extent of mutagenesis determined by the estimated number of generations.
- FIG.23 Nonsynonymous mutation distributions for simulated and evolved sequences. Violin plots of nonsynonymous mutations per sequence in each timepoint of TrpB evolution compared to that of sequences generated by a computational mutation simulation process using mutation rates and preferences of BadBoy2, with the extent of mutagenesis determined by generating an equivalent number of synonymous mutations as that observed in evolved sequences. Points and black bars denote the means and interquartile range for all sequences within each timepoint.
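A neutral mutation simulation of the kind referenced in these legends can be sketched as repeated rounds of random substitution with no selection applied. The Python example below is a minimal illustration under stated assumptions: rates are treated as per-base, per-generation probabilities, the rate table is hypothetical rather than the measured BadBoy2 spectrum, and the function name is invented.

```python
import random


def simulate_accumulation(sequence, rates, generations, seed=0):
    """Apply random substitutions to a DNA sequence without selection.

    rates maps (ref, alt) pairs to a per-base, per-generation probability;
    substitution types missing from the table never occur.
    """
    rng = random.Random(seed)
    bases = list(sequence.upper())
    for _ in range(generations):
        for index, ref in enumerate(bases):
            draw = rng.random()
            cumulative = 0.0
            # Walk the possible alternative bases until the draw is covered.
            for alt in "ACGT":
                if alt == ref:
                    continue
                cumulative += rates.get((ref, alt), 0.0)
                if draw < cumulative:
                    bases[index] = alt
                    break
    return "".join(bases)


# Hypothetical rate table, for illustration only.
rates = {("A", "G"): 2e-5, ("G", "A"): 2e-5, ("C", "T"): 3e-5, ("T", "C"): 3e-5}
parent = "ATGGCTAAAGGTCTG" * 20
evolved = simulate_accumulation(parent, rates, generations=540, seed=1)
print(sum(a != b for a, b in zip(parent, evolved)), "substitutions")
```

Comparing many such simulated sequences with the evolved ones is what allows the effect of selection to be separated from the polymerase's own mutational preferences.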
- FIG.24 Effect of long-term mutagenesis on hydrophobicity. Violin plots of Kyte-Doolittle hydrophobicity index in each timepoint of TrpB evolution for both evolved and simulated sequences. Points and black bars denote the means and interquartile range for all sequences within each timepoint. Hydrophobicity indices of the wild type TrpB, the TrpA-TrpB holoenzyme ortholog from Saccharomyces cerevisiae (Trp5), and an N-terminally truncated Trp5 (Trp5-ΔN), homologous to TrpB, are shown for comparison.
- FIG.25
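A Kyte-Doolittle hydrophobicity index for a whole protein is commonly computed as the mean hydropathy value over all residues. The Python sketch below takes that approach; whether the figure used exactly this averaging is an assumption, and the example sequence is arbitrary.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(protein: str) -> float:
    """Average Kyte-Doolittle value over a protein sequence."""
    protein = protein.upper()
    if not protein:
        raise ValueError("empty sequence")
    unknown = set(protein) - set(KYTE_DOOLITTLE)
    if unknown:
        raise ValueError(f"non-standard residues: {sorted(unknown)}")
    return sum(KYTE_DOOLITTLE[residue] for residue in protein) / len(protein)


# Arbitrary example peptide, not a sequence from this disclosure.
print(round(mean_hydropathy("MKTAYIAKQR"), 3))
```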
- parental sequence refers to an initial sequence that is subjected to mutagenesis and selection.
- the parental sequence refers to the sequence of the gene of interest provided on a p1 integration plasmid or the protein it encodes that is to be artificially evolved to have one or more desired characteristics.
- although one or more sequences on the p1 integration plasmid that are provided for effecting orthogonal replication, surface display, selection, and/or detection may also be artificially evolved by way of being integrated on the p1 expression plasmid, such a sequence is not considered part of the parental sequence unless mutations in the sequence caused by OrthoRep will be specifically selected over its original starting sequence.
- a “p1 plasmid” refers to a plasmid capable of orthogonal replication in yeast cells. P1 plasmids comprise recognition elements, which minimally include p1-specific terminal proteins (TPs) and terminal inverted repeats, that are needed for replication of a gene of interest by a TP-DNAP1.
- TPs p1-specific terminal proteins
- the terms “p1” and “P1” may be used interchangeably.
- a “p1 integration plasmid” refers to a circular or linear plasmid that is used to insert a gene of interest into a p1 plasmid of a yeast cell by homologous recombination after transducing the yeast cell therewith.
- a “p1 expression plasmid” refers to the p1 plasmids of a yeast cell that have been modified to express a given parental sequence and copies thereof resulting from one or more OrthoRep rounds.
- “p2 components” refers to the components encoded on naturally occurring p2 plasmids and derivatives thereof that are needed for orthogonal replication of p1 plasmids.
- p2 components need not be encoded on a p2 plasmid, but may instead be encoded in the yeast host cell’s nuclear DNA or in another plasmid (including p1 expression plasmids) found in the yeast host cell.
- the terms “p2” and “P2” may be used interchangeably.
- a “desired characteristic” refers to a structure or function that one desires a given protein to obtain that it does not already possess.
- Such desired characteristics include: affinity; selectivity; agonism; antagonism; inhibition; irreversible binding; enhancement; a different affinity, avidity, and/or specificity for a target the protein is already capable of binding; an ability to bind a new target; an ability to catalyze a given reaction it is already capable of catalyzing but with a different efficiency and/or under different reaction conditions; an ability to catalyze a new reaction that gives a new product or the same reaction product it already produces but by way of a different synthetic pathway; a change in its resistance or susceptibility to a given condition, e.g., heat, moisture, a given pH, a given chemical or other biomolecule (e.g., protease), degradation, agglutination; a change in a structural domain, a structural motif, a protein fold, and/or supersecondary structure; and the like.
- an “affinity reagent” refers to a compound (e.g., an antibody or fragment thereof, a receptor, an enzyme, etc.) that specifically binds a given target (e.g., a compound or composition, a protein, a nucleic acid molecule, etc.), or vice versa.
- an affinity reagent may be an enzyme that binds with a protein substrate or the affinity reagent may be the protein substrate that binds with the enzyme.
- sequence identity refers to the percentage of nucleotides or amino acid residues that are the same between sequences, when compared and optimally aligned for maximum correspondence over a given comparison window, as measured by visual inspection or by a sequence comparison algorithm in the art, such as the BLAST algorithm, which is described in Altschul et al., (1990) J Mol Biol 215:403-410.
- Software for performing BLAST (e.g., BLASTP and BLASTN) analyses is publicly available through the National Center for Biotechnology Information (ncbi.nlm.nih.gov).
- the comparison window can exist over a given portion, e.g., a functional domain, or an arbitrary selection of a given number of contiguous nucleotides or amino acid residues of one or both sequences.
- the comparison window can exist over the full length of the sequences being compared.
- a given comparison window e.g., over 80% of the given sequence
- the recited sequence identity is over 100% of the given sequence.
- the percentages are determined using BLASTP 2.8.0+, scoring matrix BLOSUM62, and the default parameters available at blast.ncbi.nlm.nih.gov/Blast.cgi.
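Once two sequences have been aligned, percent identity over a comparison window reduces to counting matching columns. The Python sketch below shows that final step only. It assumes the alignment has already been produced by an external tool, counts gapped columns in the denominator, and is not a substitute for the BLASTP procedure named above; the function names are illustrative.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of alignment columns holding the same residue in both rows."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Drop columns that are gaps in both rows; they carry no information.
    columns = [
        (a, b) for a, b in zip(aligned_a, aligned_b)
        if not (a == gap and b == gap)
    ]
    if not columns:
        raise ValueError("alignment has no informative columns")
    matches = sum(1 for a, b in columns if a == b and a != gap)
    return 100.0 * matches / len(columns)


def meets_identity(aligned_a: str, aligned_b: str, threshold: float) -> bool:
    """True if percent identity is at or above the threshold (e.g. 90.0)."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Toy alignment with one gapped column.
print(percent_identity("AC-E", "ACDE"))
```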
- Optimal alignment of sequences for comparison can be conducted, e.g., by the local homology algorithm of Smith & Waterman, Adv Appl Math 2:482 (1981), by the homology alignment algorithm of Needleman & Wunsch, J Mol Biol 48:443 (1970), by the search for similarity method of Pearson & Lipman, PNAS USA 85:2444 (1988), by computerized implementations of these algorithms (GAP, BESTFIT, FASTA, and TFASTA in the Wisconsin Genetics Software Package, Genetics Computer Group, 575 Science Dr., Madison, WI), or by visual inspection.
- As used herein, the terms “protein”, “polypeptide” and “peptide” are used interchangeably to refer to two or more amino acids linked together. Groups or strings of amino acid abbreviations are used to represent peptides. Except when specifically indicated, peptides are indicated with the N-terminus on the left and the sequence is written from the N-terminus to the C-terminus.
- Polypeptides may be made using methods known in the art including chemical synthesis, biosynthesis or in vitro synthesis using recombinant DNA methods, and solid phase synthesis. See, e.g., Kelly & Winkler (1990) Genetic Engineering Principles and Methods, vol.12, J. K.
- Polypeptides may be purified using protein purification techniques known in the art such as reverse phase high-performance liquid chromatography (HPLC), ion-exchange or immunoaffinity chromatography, filtration or size exclusion, or electrophoresis.
- HPLC high-performance liquid chromatography
- antibody refers to naturally occurring and synthetic immunoglobulin molecules and immunologically active portions thereof (i.e., molecules that contain an antigen binding site that specifically binds the molecule against which the antibody is directed, such as minibodies and nanobodies).
- antibody encompasses not only whole antibody molecules, but also antibody multimers and antibody fragments as well as variants (including derivatives) of antibodies, antibody multimers and antibody fragments.
- molecules which are described by the term “antibody” herein include: single chain Fvs (scFvs), Fab fragments, Fab’ fragments, F(ab’)2, disulfide linked Fvs (sdFvs), Fvs, and fragments comprising or alternatively consisting of, either a VL or a VH domain.
- a compound (e.g., a receptor or antibody) “specifically binds” a given target (e.g., a ligand or epitope) if it reacts or associates more frequently, more rapidly, with greater duration, and/or with greater binding affinity with the given target than it does with a given alternative, and/or indiscriminate binding that gives rise to nonspecific binding and/or background binding.
- “nonspecific binding” and “background binding” refer to an interaction that is not dependent on the presence of a specific structure (e.g., a given epitope).
- an “epitope” is the part of a molecule that is recognized by an antibody. Epitopes may be linear epitopes or three-dimensional epitopes. As used herein, the terms “linear epitope” and “sequential epitope” are used interchangeably to refer to a primary structure of an antigen, e.g., a linear sequence of consecutive amino acid residues, that is recognized by an antibody.
- binding affinity refers to the propensity of a compound to associate with (or alternatively dissociate from) a given target and may be expressed in terms of its dissociation constant, Kd.
- the antibodies have a Kd of 10^-5 or less, 10^-6 or less, preferably 10^-7 or less, more preferably 10^-8 or less, even more preferably 10^-9 or less, and most preferably 10^-10 or less, to their given target.
- Binding affinity can be determined using methods in the art, such as equilibrium dialysis, equilibrium binding, gel filtration, immunoassays, surface plasmon resonance, and spectroscopy using experimental conditions that exemplify the conditions under which the compound and the given target may come into contact and/or interact. Dissociation constants may be used to determine the binding affinity of a compound for a given target relative to a specified alternative.
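For intuition on what a dissociation constant implies, the Python sketch below computes the equilibrium fraction of target occupied under a simple 1:1 binding model with the binding partner in excess. The model and the molar units in the example are assumptions for illustration; the text above does not state units or a binding model.

```python
def fraction_bound(free_ligand: float, kd: float) -> float:
    """Equilibrium occupancy for 1:1 binding: [L] / (Kd + [L]).

    free_ligand and kd must be expressed in the same concentration units.
    """
    if free_ligand < 0:
        raise ValueError("concentration cannot be negative")
    if kd <= 0:
        raise ValueError("Kd must be positive")
    return free_ligand / (kd + free_ligand)


# At a fixed 100 nM ligand (assumed molar units), a lower Kd gives higher occupancy.
for kd in (1e-5, 1e-7, 1e-9):
    print(f"Kd = {kd:.0e}: {fraction_bound(1e-7, kd):.3f} bound")
```

When the free ligand concentration equals Kd, exactly half of the target is bound, which is why a smaller Kd corresponds to tighter binding.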
- Nucleotide sequence refers to a heteropolymer of deoxyribonucleotides, ribonucleotides, or peptide-nucleic acid sequences that may be assembled from smaller fragments, isolated from larger fragments, or chemically synthesized de novo or partially synthesized by combining shorter oligonucleotide linkers, or from a series of oligonucleotides, to provide a sequence which is capable of expressing the encoded protein.
- DNAPs DNA Polymerases
- the nucleic acid molecule further comprises a promoter. Promoters can be selected in accordance with desired TP-DNAP1 expression levels to modify p1 copy number, as shown in Fig.18.
- the promoter sequence is selected from pSAC6, pPSP2, and pSLD3.
- yeast host cell comprising a p1 plasmid and DNAP as described herein, and one or more p2 components.
- the p2 components can include RNA polymerases and other transcriptional machinery for expression of genes encoded on p1 and p2, capping enzymes and other machinery for translation of genes encoded on p1 and p2, and replication machinery for the replication of p1 and p2.
- P2 components can be interpreted as accessories to the orthogonal replication of the p1 plasmid by DNAPs described herein.
- the method comprises subjecting a yeast host cell containing a p1 plasmid encoding the protein, and a DNAP as described herein, to error prone orthogonal replication.
- the method further comprises selecting yeast cells expressing the protein having the desired characteristic.
- different TP-DNAP1 expression levels are employed to modify p1 copy number, as illustrated in Fig.18, which demonstrates that altering the promoter driving the expression of TP-DNAP1 can influence p1 copy number.
- the engineered protein is an enzyme.
- the engineered protein is an antibody, binding portion thereof, or other protein capable of selectively binding a target.
- the engineered protein is a biosensor, capable of sensing small molecules or macromolecules and transducing a response.
- the engineered protein is a gene editor or gene therapy agent.
- the engineered protein may be multiple proteins or enzymes comprising a complex or metabolic pathway.
- Kits
- Further described is a kit for use in implementing methods of the disclosure.
- the kit comprises reagents for integration of a gene of interest onto a p1 plasmid, and a DNAP as described herein.
- the gene of interest typically encodes a protein to be engineered using the methods described herein.
- the kit optionally further comprises one or more reagents or devices for transforming a yeast cell therewith.
- the kit further comprises a p1 plasmid packaged together with a yeast host cell comprising one or more p2 components for orthogonal replication of the p1 plasmid.
- the yeast host cell is packaged together with one or more reagents or devices for culturing and/or transforming the yeast host cell.
- Orthogonal DNA replication is a genetic architecture for continuously hypermutating user-defined genes in vivo, but the mutation rates of legacy systems are too low to condense long gene evolutionary trajectories onto laboratory timescales when strong directional selection is absent.
- TrpB maladapted gene
- TrpB diverged extensively such that the median distance separating pairs of evolved sequences reached 35 amino acids (~9%), with thousands of unique pairs separated by >60 amino acids (~15%). For comparison, the median distance between mouse and human orthologous genes is 11%.
- the high fitness of extensively diverged TrpB variants was not predictable by advanced machine learning models trained on natural variation. Analyzing the rich collection of resultant TrpB sequences, referenced against a precise null model of sequence change simulated from detailed measurements on the mutation rates and preferences of our new OrthoRep systems, revealed both known and unexpected factors influencing TrpB’s function and evolution at high resolution.
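The median pairwise distance quoted here can be computed by comparing every pair of evolved sequences position by position. The Python sketch below assumes equal-length protein sequences with substitutions only (no insertions or deletions) and uses toy sequences; the function names are illustrative.

```python
from itertools import combinations
from statistics import median


def hamming(seq_a: str, seq_b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have equal length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


def median_pairwise_distance(sequences):
    """Median Hamming distance over all unordered pairs of sequences."""
    distances = [hamming(a, b) for a, b in combinations(sequences, 2)]
    if not distances:
        raise ValueError("need two or more sequences")
    return median(distances)


# Toy sequences for illustration only.
toy = ["MKTA", "MKTV", "MRSV"]
distance = median_pairwise_distance(toy)
print(distance, f"({100 * distance / len(toy[0]):.0f}% of length)")
```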
- DNA library construction
- Amplicons for TP-DNAP1 libraries were generated using error prone PCR with GeneMorph II (Agilent) according to manufacturer instructions, aiming for ~3-5 nucleotide substitutions per sequence. Amplicons for all other libraries were generated with Q5 Hot Start High-Fidelity DNA Polymerase (NEB). For epPCR 1, the resulting PCR product was assembled into plasmids using Gibson assembly in 20 µL reaction volumes. For all other libraries, resulting PCR products were assembled into plasmids with Golden Gate assembly with T4 DNA ligase and BsaI-HF v2 or PaqCI (all NEB) in a 40 µL reaction volume.
- Gibson reactions were run at 50 °C for 1 hour. Golden Gate reactions were run isothermally at 37 °C for 1 hour and heat inactivated at 65 °C for 10 minutes. Reactions were purified with AMPure XP beads (Beckman), typically with a 0.9:1 bead:sample ratio according to manufacturer instructions. Libraries were transformed into high-competency electrocompetent E. coli TOP10 cells (ThermoFisher).
- Yeast strains, media, transformations, and DNA extraction
- All yeast strains used in this study and their provenance are listed in Table 6.
- Yeast were grown in liquid or on plates at 30 °C in synthetic complete (SC) growth medium (20 g/L dextrose, 6.7 g/L yeast nitrogen base w/ ammonium sulfate w/o amino acids (US Biological), appropriate nutrient drop-out mix (US Biological), as directed) or MSG SC growth medium (20 g/L dextrose, 1.72 g/L yeast nitrogen base w/o ammonium sulfate w/o amino acids (US Biological), appropriate nutrient drop-out mix (US Biological), as directed, 1 g/L L-Glutamic acid monosodium salt hydrate (ThermoFisher)) minus nutrients (referred to as -X where X is either the single letter amino acid code for an amino acid nutrient, or U for uracil) required for appropriate auxotrophy selection(s).
- Yeast transformations, including p1 integrations and polymerase replacement integrations, were performed as previously described (38). For all integration transformations, plasmid DNA was linearized prior to transformation using either ScaI-HF or EcoRI-HF (both NEB) for p1 or genomic integrations, respectively. Due to its repetitive nature, deletion of FLO1 was performed by a URA3 knock-in knock-out method (70) (see Table 4, pFLO1-KO).
- The wt TP-DNAP1 was integrated at the CAN1 locus using pGR475.
- A sequence encoding a non-functional partial LEU2 sequence lacking the N terminus was then integrated over the wt p1 using pGR420 to generate the landing pad p1.
- The wt p1, which encodes the TP-DNAP1, was then cured out via 3-4 1:1000 passages.
- The p1 plasmid(s) encoding the desired sequence were then generated via integration using cassette(s) that include a LEU2 sequence lacking the C terminus (e.g. pGR438).
- The polymerase replacement integration transformation was performed by first digesting 0.5-2 µg of the polymerase replacement plasmid or library with EcoRI-HF in a 25 µL reaction per 1x transformation, followed by directly transforming this digestion reaction into a yeast strain encoding the CAN1-WT-TP-DNAP1 landing pad (all polymerase libraries were transformed into OR-Y488). Library transformations were carried out at 20-40x scale.
- Transformed yeast were plated onto solid MSG SC -LR or -MCR media w/ 100 mg/L nourseothricin (for positive selection of integration) and 200 mg/L l-canavanine (a toxic l-arginine analog for counterselection of cells that fail to perform polymerase replacement and remove the arginine permease CAN1).
- Leu or Met/Cys dropout was used to maintain selection for p1-encoded LEU2 or MET15, respectively, while Arg dropout was used to improve l-canavanine selection.
- Extraction of genomic DNA (gDNA) and p1/p2 plasmids was performed as previously described for 1.5 mL yeast culture volumes (38).
- The resulting pellet was resuspended in 250 µL Zymolyase solution (0.9 M D-Sorbitol (Sigma Aldrich), 0.1 M Ethylenediaminetetraacetic acid (EDTA, Sigma Aldrich), 10 U/mL Zymolyase (US Biological)) and incubated with shaking (37 °C, 200 RPM).
- The 96-well block was then centrifuged (2500 × g, 5 min), supernatant was discarded, and pellets were resuspended in 280.5 µL proteinase K solution (250 µL TE (50 mM Tris-HCl (pH 7.5), 20 mM EDTA), 25 µL 10% sodium dodecyl sulfate (SDS, Sigma Aldrich), 5.5 µL proteinase K stock solution (10 mg/mL proteinase K (ThermoFisher))). The 96-well block was then incubated at 65 °C for 30 min, combined with 75 µL 5 M potassium acetate (ThermoFisher), and incubated on ice for 30 min.
- The 96-well block was centrifuged at 12,000 × g for 10 min, the resulting supernatant was combined and mixed with 2 volumes buffer PB (5 M guanidine hydrochloride (ThermoFisher), 30% isopropanol, 70% water), and this mixture was applied to a 96-well DNA-binding plate (Epoch Life Science) on a vacuum manifold. Flow-through was discarded, columns were washed with PE buffer (10 mM Tris-HCl (ThermoFisher), 80% ethanol, 20% water, pH 7.5), centrifuged and dried, and 60 µL water was applied to columns for elution by centrifugation (2500 × g, 5 min).
- High throughput sequencing [0089] All high throughput sequencing datasets are listed in Table 7, along with the method used to construct them. All PCRs for high throughput sequencing were performed with Platinum SuperFi II DNA Polymerase (ThermoFisher). For short read paired end sequencing, both low yield (AmpliconEZ, Azenta) and high yield (HiSeq paired end 150, Novogene) were performed directly on PCR products generated in either one or two rounds of PCR using primers that each included an adapter sequence, a 6- or 7-nucleotide barcode, or both. [0090] For in-house long read high throughput sequencing, we used the Oxford Nanopore Technologies nanopore sequencing platform.
- Plasmid pGR554 (Table 4) was designed for this purpose and contains CcdB and sfGFP, which both are replaced with the insert during Golden Gate assembly, as well as NotI and SbfI sites, strategically placed to enable separation of the desired library insert from the backbone prior to sequencing.
- CcdB and sfGFP provided counterselection and visualization of colonies resulting from undigested vector. (Cloning into any E. coli vector will suffice however, so long as unique library members are associated with a unique relatively short (20-50 bp) sequence.) Resulting colonies each contained many copies of a unique plasmid species encoding a UMI-tagged library member.
- The resulting library was downsampled by only harvesting ~20-fold fewer colonies than the expected number of reads, amounting to 100-200 thousand colonies for a standard MinION flow cell. Plasmid DNA from this library was then miniprepped, digested (for example with NotI-HF or NotI-HF/SbfI-HF (both NEB) if using plasmid pGR554), and gel extracted prior to sequencing.
- The intermediate UMI libraries were generated at a library size >100-fold larger than the desired final library size to minimize the chance that distinct library members would be tagged with identical UMIs.
- The second method used for generating libraries for high accuracy nanopore sequencing was adapted from methods described in Volden et al. (72), Oliynyk and Church (73), and Zhang and Tanner (74). It involved circularization of the target sequence and use of this circularized product as template for rolling circle amplification (RCA) using strand-displacing DNA polymerases (Fig.20).
- UMI-tagged PCR products were generated as described above, albeit with complementary Type IIS cut sites (BsaI or PaqCI) on the forward and reverse primers used during PCR amplification such that the two ends of the amplicon ligated to each other during a Golden Gate assembly reaction, forming a circular product.
- A large amount of amplicon was used in the circularization reaction, typically 1-2 pmol in 100 µL Golden Gate assembly reactions.
- Oliynyk and Church (73) provide a useful discussion of relevant considerations for such circularization reactions.
- An isothermal Golden Gate assembly reaction was then performed to circularize the amplicon library.
- The reaction was then AMPure bead purified with a 0.7:1 bead:sample ratio, eluting in a maximum of 10 µL of water, and the resulting circularized library was used in a rolling circle amplification reaction with the following components combined on ice:
  - 4 µL 10X NEB buffer 4 (NEB)
  - 4.8 µL 10 mM dNTPs
  - 2.64 µL 5 U/µL Bsu DNA Polymerase, Large Fragment (NEB)
  - 1.6 µL 10 µg/µL T4 gene 32 protein (NEB)
  - 0.2-1 µg circularized DNA library
  - RCA primers, 2 µM each (must bind internal to first primer set, e.g. primer pair 7)
  - Water to 40 µL
- [0095] This reaction was incubated at 37 °C for 3 hours. SDS-containing loading dye was added directly to the reaction, and the entire sample was run on an agarose gel. Bands corresponding to 3x-6x concatemers were gel extracted, and purified DNA was used for nanopore sequencing. Unlike RCA using Phi29, this method enabled size selection of specific repeat numbers and did not require a ‘debranching’ step, but it required large amounts of input DNA and in our experience suffered from high sensitivity to DNA contamination. Use of Phi29 with random hexamers, followed by debranching, is therefore a reasonable alternative.
- Nanopore library preparation and sequencing were performed using the most up-to-date ligation sequencing kit (e.g. LSK-114) and flow cell (e.g. R10.4.1), following manufacturer instructions, with two exceptions. First, 1/2 volumes (but unaltered DNA input) were used for end prep and ligation reactions, and second, FFPE Repair Mix was not used during end prep reactions. [0097] All relevant high throughput sequencing datasets are made available on the NCBI Sequence Read Archive (SRA), accession number PRJNA1050257.
- High accuracy HTS was used to first obtain $c_{j,t}$, the total counts of mutation $j$ (e.g., A to T) among all sequences in passage $t$.
- This count is normalized to obtain $n_{j,t}$, the expected count for an idealized sequence with a 1:1:1:1 A:T:G:C ratio.
- $n_{j,t}$ is calculated as [0102]
- The total normalized count of all substitution mutations at each timepoint is then calculated as the sum of $n_{j,t}$ over all twelve substitution types.
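The normalization formula itself is not reproduced in this text, so the sketch below should be read as one plausible interpretation rather than the source's equation. It ASSUMES that each raw count is rescaled by 0.25 divided by the fraction of the reference sequence made up of the mutation's wild-type base, which yields the count expected for a 1:1:1:1 base composition. All names are hypothetical.

```python
from collections import Counter

BASES = "ATGC"
# The twelve substitution types, as (wild-type base, mutant base) pairs.
SUBSTITUTIONS = [(w, m) for w in BASES for m in BASES if w != m]


def normalize_counts(counts: dict, reference: str) -> dict:
    """Rescale raw substitution counts to an idealized 1:1:1:1 composition.

    ASSUMPTION (not from the source): n_j = c_j * 0.25 / f_w, where f_w is
    the fraction of the reference made up of the wild-type base w of j.
    """
    composition = Counter(reference)
    normalized = {}
    for wt, mut in SUBSTITUTIONS:
        fraction = composition[wt] / len(reference)
        if fraction == 0:
            raise ValueError(f"reference contains no {wt}")
        normalized[(wt, mut)] = counts.get((wt, mut), 0) * 0.25 / fraction
    return normalized


def total_normalized(normalized: dict) -> float:
    """Sum of normalized counts over all twelve substitution types."""
    return sum(normalized.values())


# Toy reference that is 50% A, 25% T, 12.5% G, 12.5% C (hypothetical).
example = normalize_counts({("A", "T"): 10, ("G", "C"): 5}, "AAAATTGC")
print(example[("A", "T")], example[("G", "C")], total_normalized(example))
```

Under this assumption, substitutions away from an over-represented base are scaled down and those away from a rare base are scaled up, so that spectra measured on sequences of different composition become comparable.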
- TP-DNAP1 library selection and screening [0106] Error prone TP-DNAP1 libraries were cloned in E. coli. Mutagenesis was validated by Sanger sequencing of 8-12 individual clones. See below for a summary of these libraries: [0107] Following TP-DNAP1 replacement library construction, resulting purified plasmid DNA was used for a polymerase replacement integration transformation. Prior to transformation, OR-Y488 was grown up in SC -L + 1 mg/L 5-fluoroorotic acid (US Biological) for counterselection against cells that had reverted the inactivating mutation in ura3* by chance.
- Colonies were either picked into liquid media for mutation rate characterization by mutation accumulation or were harvested in bulk, miniprepped for gDNA isolation, and used as template for a non-mutagenic PCR to generate amplicons for Golden Gate assembly into pGR554, which was then retransformed into OR-Y488 to repeat the selection and perform mutation rate screening.
- The fluctuation test was performed as follows. Following transformation of the epPCR 1 TP-DNAP1 library into OR-Y488 and selection for ura3* reversion on solid media, individual colonies were picked from this plate, inoculated into 500 µL SC -LU media in a 96-well block, and grown to saturation.
- Cultures were then passaged 1:10,000 into 200 µL SC -LU, 12 replicates per individual colony, and grown to saturation. Cultures were centrifuged, washed with 0.9% NaCl, and pellets were resuspended in 35 µL 0.9% NaCl. 10 µL of each resuspension was then plated onto SC -LUW plates. A subset of cultures was titered and plated on SC -LU plates to estimate population size. Plated cells were allowed to grow for 4 days, and revertants on each spot were counted.
- Counts were used to estimate the m value using the FALCOR online web tool (lianglab.brocku.ca/FALCOR/) and the Ma-Sandri-Sarkar Maximum Likelihood Estimator. Mutation frequency was calculated from this m value as previously described (42), using a target size of 1 (only one mutation is capable of restoring Trp5 activity). Copy number was not considered and therefore per base substitution rate was not calculated using this method. [0109] Copy number measurement [0110] Quantitative PCR (qPCR) was used to determine p1 copy number. gDNA was isolated from samples according to the 1.5 mL DNA extraction protocol referred to above.
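For readers who want to see what an m-value estimate involves, here is a minimal sketch of a Ma-Sandri-Sarkar style maximum likelihood fit. It is not the FALCOR implementation: it uses the standard Luria-Delbrück recursion for the probability of observing r revertants and a simple golden-section search, which assumes the log-likelihood has a single peak in m. The example counts are hypothetical.

```python
import math


def ld_pmf(m: float, r_max: int) -> list[float]:
    """Luria-Delbruck probabilities p_0..p_r_max for an expected number of
    mutation events m, via p_0 = exp(-m) and
    p_r = (m / r) * sum_{i<r} p_i / (r - i + 1)."""
    p = [math.exp(-m)]
    for r in range(1, r_max + 1):
        p.append((m / r) * sum(p[i] / (r - i + 1) for i in range(r)))
    return p


def log_likelihood(m: float, counts: list[int]) -> float:
    """Log-likelihood of the observed revertant counts given m."""
    p = ld_pmf(m, max(counts))
    return sum(math.log(p[c]) for c in counts)


def mss_mle(counts: list[int], lo: float = 1e-3, hi: float = 50.0,
            tol: float = 1e-6) -> float:
    """Golden-section search for the m that maximizes the likelihood."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while b - a > tol:
        if log_likelihood(c, counts) > log_likelihood(d, counts):
            b = d
        else:
            a = c
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
    return (a + b) / 2


# Hypothetical revertant counts from 12 replicate cultures of one colony.
counts = [0, 1, 0, 2, 5, 0, 1, 3, 0, 1, 12, 2]
m_hat = mss_mle(counts)
print(round(m_hat, 3))
```

Converting m into a rate requires the population size estimated from the titered cultures and, if only part of each culture is plated, a plating-efficiency correction; neither step is shown here, and the exact conversion used in reference (42) is not restated in this text.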
- qPCR reactions were performed in 10 µL volumes using the Sybr Powerup Master Mix (ThermoFisher), with 1-5 ng of gDNA template. Reactions were run at ‘standard speed’ on a ThermoFisher Quantstudio 6. [0111] The reactions measured the amplification of genomic GAL1 and p1-encoded LEU2 using primer pairs 16 and 17, respectively (Table 5), yielding cycle threshold (Ct) values for each sample. A standard curve was generated correlating Ct with DNA quantity using reactions containing known quantities of plasmids encoding GAL1 and LEU2. Standard curves were used to validate the assumption that the two primer pairs had the same amplification efficiency.
- Relative copy numbers of GAL1 and LEU2 in each sample were determined from the standard curves, and the per-cell p1 copy number was obtained by dividing the LEU2 relative copy number by the GAL1 relative copy number for each sample.
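The copy number calculation can be sketched as follows. This is an illustrative outline, assuming a standard log-linear qPCR curve (Ct as a linear function of log10 template quantity) and one genomic GAL1 copy per cell; the function names and all numeric values are hypothetical.

```python
import math


def fit_standard_curve(quantities, cts):
    """Least-squares fit of Ct = slope * log10(quantity) + intercept."""
    xs = [math.log10(q) for q in quantities]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(cts) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cts))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x


def quantity_from_ct(ct, slope, intercept):
    """Invert the standard curve to recover template quantity."""
    return 10 ** ((ct - intercept) / slope)


def amplification_efficiency(slope):
    """E = 10^(-1/slope) - 1; a value of 1.0 means doubling every cycle."""
    return 10 ** (-1 / slope) - 1


def p1_copy_number(ct_leu2, ct_gal1, leu2_curve, gal1_curve):
    """Per-cell p1 copy number: LEU2 quantity divided by GAL1 quantity."""
    return (quantity_from_ct(ct_leu2, *leu2_curve)
            / quantity_from_ct(ct_gal1, *gal1_curve))


# Hypothetical standards: a 10-fold dilution series with ideal doubling.
standards = [1e3, 1e4, 1e5, 1e6]
ideal_slope = -1 / math.log10(2)
standard_cts = [40 + ideal_slope * math.log10(q) for q in standards]
curve = fit_standard_curve(standards, standard_cts)
print(round(amplification_efficiency(curve[0]), 3))
```

Comparing `amplification_efficiency` for the two primer pairs is one way to check the equal-efficiency assumption mentioned above before taking the LEU2/GAL1 ratio.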
- TrpB evolution Plasmid pGR595 (TP-DNAP1 BadBoy2) was first transformed into yeast strain OR-Y484 according to the polymerase replacement procedure described above. The resulting strain (OR-Y538) was then transformed with plasmid pGR438 (TrpB with lineage barcodes), following the p1 integration procedure described above, plating on SC -L media.
- ~400 resulting colonies were harvested together and passaged into 512 µL SC -L media in all wells of a 96-well block, grown to saturation, then again passaged 1:1024 into SC -L media.
- DNA extracted from these resulting cultures served as passage/timepoint 0, which we approximate to be ~50 generations from TrpB p1 integration.
- These cultures were also passaged 1:1024 (0.5 µL into 512 µL) for all passages in the experiment, using the growth media and timepoints described in Table 2.
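The link between dilution factor and generations is a base-2 logarithm: a culture diluted 1:1024 must double ten times to return to saturation. The short sketch below makes this arithmetic explicit; the function names are illustrative, and it assumes every passage regrows fully to the same density.

```python
import math


def generations_per_passage(dilution_factor: float) -> float:
    """Doublings needed to regrow to saturation after a 1:dilution passage."""
    return math.log2(dilution_factor)


def cumulative_generations(n_passages: int, dilution_factor: float = 1024,
                           start: float = 0.0) -> float:
    """Generations elapsed after n identical passages, from a starting count."""
    return start + n_passages * generations_per_passage(dilution_factor)


print(generations_per_passage(1024))   # 10.0
print(cumulative_generations(54))      # 540.0
```

By the same arithmetic, the 1:10,000 passage used in the fluctuation test corresponds to about 13.3 generations.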
- DNA extraction for timepoints was performed by combining all 96 saturated cultures for a specific timepoint and extracting DNA from the pooled cultures according to the 1.5 mL DNA extraction protocol referred to above.
- Selection pressures shown in Table 2 and Fig.2B for visualization purposes were derived from the concentrations of Trp and indole in the growth media used for each passage. These two components are inversely related to the selection pressure supplied by each component, scaled to range from 0 to 0.5 such that the maximum concentration used for both Trp and indole would yield a selection pressure of 0, media without Trp and with the maximum concentration of indole would yield a selection pressure of 0.5, and media without either component would yield a selection pressure of 1.
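The scaling described above can be written as a sum of two terms, each running from 0 (component at its maximum concentration) to 0.5 (component absent). The sketch below ASSUMES the relationship is linear in concentration, which the text does not state, and the maximum concentrations passed in are placeholders rather than values from Table 2.

```python
def selection_pressure(trp: float, indole: float,
                       trp_max: float, indole_max: float) -> float:
    """Selection pressure in [0, 1] from Trp and indole concentrations.

    ASSUMPTION: each component contributes 0.5 * (1 - conc / max_conc),
    i.e. a linear inverse relationship; the source only says 'inversely
    related' and fixes the three reference points tested below.
    """
    if not (0 <= trp <= trp_max and 0 <= indole <= indole_max):
        raise ValueError("concentrations must lie between 0 and the maximum")
    trp_term = 0.5 * (1 - trp / trp_max)
    indole_term = 0.5 * (1 - indole / indole_max)
    return trp_term + indole_term


# Placeholder maxima (hypothetical units and values).
TRP_MAX, INDOLE_MAX = 100.0, 400.0
print(selection_pressure(TRP_MAX, INDOLE_MAX, TRP_MAX, INDOLE_MAX))  # 0.0
print(selection_pressure(0.0, INDOLE_MAX, TRP_MAX, INDOLE_MAX))      # 0.5
print(selection_pressure(0.0, 0.0, TRP_MAX, INDOLE_MAX))             # 1.0
```

A logarithmic scaling would satisfy the same three reference points only with an additional convention for zero concentration, so the linear form is used here purely for simplicity.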
- The three selection periods ‘no selection’, ‘mostly positive’ and ‘mostly purifying’ were characterized as such based on the selection pressure used during that period. The initial ‘no selection’ period only included the minimum selection pressure (0).
- Yeast cultures from the passages corresponding to generations 340 and 510 were inoculated into SC -L media from glycerol stock and grown to saturation. Saturated cultures were then combined, and multiple serial dilutions of both cultures were plated onto SC -L media. Plates derived from generations 340 and 510 with ~3700 and ~1700 colonies, respectively, were harvested.
- This library was also passaged 1:100 into 50 mL of either SC -L (nonselective), SC -LW + 400 µM indole (weakly selective), or SC -LW + 25 µM indole (strongly selective) and grown to saturation. This passaging and growth was repeated for each of the three growth conditions five times for a total of six passages. Of these, DNA was extracted from passages 1, 2, 5, and 6 to serve as additional timepoints. DNA from all timepoints was used as template for PCR amplification of only the UMI and barcode regions for high throughput sequencing. [0118] Enrichment scores were calculated following the procedure described in Rubin et al.
- TrpB fitness prediction with TranceptEVE (66) [0122] A multiple sequence alignment (MSA) of natural TrpB subunits was created using 5 iterations of jackhmmer (76) to query the UniRef100 database with a bitscore of 0.9. Columns with more than 20% gaps were ignored, and we used a theta parameter of 0.8 to downweight sequences with more than 80% sequence homology, as described in Hopf et al. (77).
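The two MSA preprocessing steps named above (dropping gappy columns and downweighting near-duplicate sequences) can be sketched as below. This is a simplified illustration and not the TranceptEVE or Hopf et al. code: identity is computed over all retained columns with gaps treated as ordinary characters, and each sequence receives weight 1 divided by the number of sequences (itself included) that are more than `theta` identical to it.

```python
def filter_gappy_columns(msa: list[str], max_gap_fraction: float = 0.2) -> list[str]:
    """Drop alignment columns in which more than max_gap_fraction are gaps."""
    n = len(msa)
    keep = [j for j in range(len(msa[0]))
            if sum(seq[j] == "-" for seq in msa) / n <= max_gap_fraction]
    return ["".join(seq[j] for j in keep) for seq in msa]


def identity(a: str, b: str) -> float:
    """Fraction of aligned positions at which two sequences are identical."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def sequence_weights(msa: list[str], theta: float = 0.8) -> list[float]:
    """Weight of each sequence: 1 / (number of sequences with identity > theta)."""
    return [1.0 / sum(identity(a, b) > theta for b in msa) for a in msa]


# Toy alignments (hypothetical sequences).
gappy = ["AC-E", "A--E", "ACDE", "ACDE", "ACDE"]
print(filter_gappy_columns(gappy))

toy = ["ACDEFGHIKL", "ACDEFGHIKM", "ACDEYYYYYY"]
weights = sequence_weights(toy)
print(weights, sum(weights))
```

The sum of the weights is the effective number of sequences; in the toy alignment the two near-identical sequences share one unit of weight between them.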
- A corresponding simulated genotype was generated by starting from the wild type TrpB sequence used in the evolution experiment and stochastically sampling nucleotide mutations with probabilities determined by the mutation rates of the same polymerase used for TrpB evolution (BadBoy2) until the same number of synonymous mutations as the evolved genotype was reached. Additional information, such as the timepoint from which the real sequence was identified and the count of the genotype, was also replicated for the corresponding simulated sequence. To account for some minor strand-dependent mutational biases, mutation rates and spectrum calculated for a sequence in the same position and orientation (relative to the LEU2 gene) as TrpB were used to generate the simulated sequences.
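A minimal sketch of this kind of null-model simulation follows. It is a simplified reading of the procedure, not the authors' code: substitutions are drawn with probability proportional to a per-type rate, and each event is classed as synonymous if it leaves the amino acid of its codon unchanged at the moment it occurs. The rate table, the toy coding sequence, and the `max_events` guard are all hypothetical.

```python
import random

# Standard genetic code, codons enumerated in TCAG order.
_B = "TCAG"
_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: _AA[16 * i + 4 * j + k]
               for i, a in enumerate(_B)
               for j, b in enumerate(_B)
               for k, c in enumerate(_B)}

# Hypothetical uniform spectrum; measured polymerase rates would go here.
UNIFORM_RATES = {(w, m): 1.0 for w in "ACGT" for m in "ACGT" if w != m}


def simulate_genotype(wt: str, n_synonymous: int, rates: dict,
                      rng: random.Random, max_events: int = 100_000):
    """Mutate wt until n_synonymous synonymous events have accumulated.

    Returns the simulated sequence and the total number of events sampled.
    """
    if len(wt) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    seq = list(wt)
    synonymous = events = 0
    while synonymous < n_synonymous:
        if events >= max_events:
            raise RuntimeError("synonymous target not reached")
        # Every possible single-base change, weighted by its substitution rate.
        options = [(i, new) for i, old in enumerate(seq)
                   for new in "ACGT" if new != old]
        weights = [rates[(seq[i], new)] for i, new in options]
        i, new = rng.choices(options, weights=weights, k=1)[0]
        start = i - i % 3
        before = CODON_TABLE["".join(seq[start:start + 3])]
        seq[i] = new
        after = CODON_TABLE["".join(seq[start:start + 3])]
        synonymous += before == after
        events += 1
    return "".join(seq), events


simulated, n_events = simulate_genotype("ATGGCTGCAGGTCTG", 3,
                                        UNIFORM_RATES, random.Random(0))
print(simulated, n_events)
```

Matching on synonymous mutations ties each simulated sequence to the amount of mutational opportunity its evolved counterpart experienced, so that differences in nonsynonymous content can be attributed to selection rather than to elapsed replication.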
- TP-DNAP1-4-2 As the error-prone orthogonal DNAP. Besides its suboptimal error rate of 10⁻⁵ s.p.b., TP-DNAP1-4-2 also has low replicative activity (Fig.6) and exhibits a heavily transition-biased mutation spectrum (Fig.7), suppressing the impact of point mutations on amino acid sequence during protein evolution (Fig.8).
- We carried out a directed evolution campaign on TP-DNAP1 to increase OrthoRep’s overall error rate, transversion rate, and activity.
- A selection strain, OR-Y488, was engineered to contain a p1 plasmid (p1-ura3*-trp5*) encoding two auxotrophic marker genes, ura3 and trp5, each specifically disabled via an active site missense mutation whose sole option for functional reversion is a transversion (Fig.1B and Fig.9).
- TP-DNAP1s with the highest transversion rates should restore URA3 and TRP5 most frequently, resulting in their enrichment from genomically-integrated TP-DNAP1 libraries (see Fig.10) when OR-Y488 is grown in the absence of exogenous uracil or tryptophan.
- A region of p1 not under selection is sequenced at two or more timepoints using unique molecular identifiers (UMIs) for error correction (53, 54), and the rate of change in the number of mutations per position is calculated individually for all types of mutations to fully describe the overall mutation rate and mutation preferences of the TP-DNAP1.
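The rate calculation described above amounts to fitting a slope, per substitution type, to mutations per base as a function of generations. The sketch below is one hedged way to express that; it is not the Maple pipeline, and the input format, names, and numbers are hypothetical.

```python
def slope(xs: list[float], ys: list[float]) -> float:
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
            / sum((x - mean_x) ** 2 for x in xs))


def per_base_rates(generations: list[float], freq_by_type: dict) -> dict:
    """Substitutions per base per generation for each substitution type.

    freq_by_type maps a substitution label to the mean number of such
    mutations per base per sequence at each sequenced timepoint.
    """
    return {j: slope(generations, freqs) for j, freqs in freq_by_type.items()}


# Hypothetical two-timepoint measurement, 100 generations apart.
rates = per_base_rates([0, 100], {"A>T": [0.0, 0.001], "G>A": [0.001, 0.011]})
overall = sum(rates.values())
print(rates, overall)
```

With exactly two timepoints the slope reduces to the difference in mutation frequency divided by the number of generations elapsed; additional timepoints make the estimate less sensitive to noise at any single one.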
- This pipeline, Mutation Analysis for Parallel Laboratory Evolution or Maple (Fig.11), performs consensus sequence generation, demultiplexing, mutation identification, mutation rate analysis, and many other operations to generate a collection of visualizations and data tables that accelerate analysis of mutation-rich sequencing datasets while minimizing user input.
- TP-DNAP1 variants BadBoy1, BadBoy2, and BadBoy3 quickly degrades the function of genes when purifying selection is absent. Furthermore, an extremely error-prone 1.7 × 10⁻⁴ s.p.b. TP-DNAP1 variant we isolated (BB-5k) did not durably maintain p1 in two of four biological replicates under selection over ~120 generations of mutation accumulation, possibly because BB-5k exerted an excessive mutational load on the selection marker used to maintain p1, leading to mutational meltdown.
- The TP-DNAP1s obtained here complete a set of OrthoRep systems evenly spanning a range of ~5 orders of magnitude, from ~10⁻⁹ s.p.b., similar to the mutation rate of modern cellular genomes, up to ~10⁻⁴ s.p.b., which is likely in the regime where the error thresholds of individual genes reside, above which gene mutational meltdown occurs, and near which maximal gene adaptation rates can be reached (55,56).
- The overall ecosystem of OrthoRep systems should not only allow for the rapid continuous evolution of chosen genes in vivo, but also investigations on the detailed role of mutation rates and error thresholds in molecular evolution, for which theory is abundant but experiment is sparse.
- T. maritima TrpB is maladapted for this standalone reaction, since it normally functions in complex with TrpA (57,58). Therefore, cells grown in the absence of Trp need to evolve improved TrpB activity to propagate, allowing TrpB to serve as the subject of an extended evolution experiment that included a range of selection pressures (Fig.2B and Table 2). [0135] Table 2: TrpB selection schedule *reported generations correspond to the culture after it has reached saturation [0136] We designed our evolution experiment to prioritize sequence divergence and diversity to maximize the amount of evolutionary information that could later be extracted.
- The evolution experiment was therefore run for ~540 generations (~3 months) at a scale of 96 independent replicate 500 µL cultures. 1:1024 (10 generation) transfers into fresh growth medium were made every one or two days, depending on cell density, following a passaging schedule that included all types of selection pressures (Fig.2B): an initial period of evolution without selection (‘no selection’), then a period of adaptation when strong selection pressure was applied and improvements in TrpB’s function were observed (‘mostly positive selection’), followed by a long period characterized by mostly purifying selection where adapted TrpBs were pressured to maintain the fitness they evolved (‘mostly purifying selection’).
- The ‘mostly purifying’ period also included some brief episodes of relaxed or removed selection pressure, which we introduced with the intention of promoting sequence divergence.
- Table 3: Summary mutation statistics for TrpB evolution [0139] Table 3 (cont’d) [0140] General structural and functional constraints [0141] ~500,000 sequences of TrpB with an average of 13.1 amino acid replacements each were captured over the evolution experiment, and over 90% of those sequences were unique (Table 3). With such a diverse evolutionary dataset, patterns of conservation should contain structural and functional constraints defining TrpB. To test this notion, we used an AlphaFold structure (33) and knowledge from previous studies on TrpB (60, 61) to first categorize each residue in TrpB according to its general structural or functional role, as outlined in Fig.3A. We then asked whether different categories showed different levels of conservation.
- The simulated dataset serves as the null model where patterns in evolved TrpB sequences are simply a reflection of the mutation preferences of BadBoy2 and codon usage of wt TrpB.
- The nonsynonymous differences between the real dataset and the simulated dataset contain the influence of selective forces. An excess of nonsynonymous changes in the real dataset compared to the simulated dataset is therefore an indication of positive selection, while the opposite signifies purifying selection.
- The real dataset has a paucity of nonsynonymous mutations per sequence in the active site region and buried residues and an excess of nonsynonymous mutations per sequence in the COMM domain.
- Generation 70 is during the first phase of positive selection for TrpB’s operation as a standalone enzyme capable of generating tryptophan (Fig.2B), so this timepoint is most likely to reveal signatures of adaptation.
- Although purifying selection constrained all regions of TrpB, some were clearly more constrained than others. For example, the active site region had almost no mutations (maximally 1 or 2 but mostly 0) and deviated from the simulated mutant distribution more than all other regions. Buried residues also had substantially fewer nonsynonymous mutations than the simulated dataset. In contrast, the effect of purifying selection was less pronounced on surface residues, reflecting the relative tolerance of protein surfaces to mutation. Surprisingly, this also applied to the newly solvent-exposed α-β interface region. In the absence of the α-subunit, this region should be more solvent-exposed than in TrpB’s native context.
- TrpB’s pI evolved to be comfortably below the typical yeast cytosolic pH of 6.8 to 7.2 (62), which is consistent with the notion that intracellular proteins (49, 63) prefer to be negatively charged to minimize large-scale clustering with RNAs and other proteins.
- TrpB function itself, for example by increasing the diffusivity of the enzyme (64).
- Another mechanism by which a preference for negative charge in TrpB could have been adaptive is by lessening its perturbation on other entities in the cell, for example by preventing spurious association or aggregation that would disturb the function of the proteome (49). Our data do not exclude either mechanism but suggest that the latter mechanism is present. In generations 0-50 of the evolution experiment, TrpB was not under selection for function as excess Trp was supplied to the growth media.
- Since T. maritima TrpB was from a thermophile but needed to evolve standalone activity in a mesophile, we looked for statistical evidence of thermoadaptation in our evolved sequences.
- Haney et al. studied the patterns of amino acid replacements between natural orthologous proteins in mesophilic versus thermophilic organisms and found 17 amino acid replacements that distinguished the mesophilic variants from the thermophilic variants at homologous positions in multiple sequence alignments with high confidence (13).
- Examining the frequency of these 17 amino acid replacements among all mutations in our evolution experiment’s outcomes, we found that replacements in the mesophilic direction were enriched (Fig.25). As before, this illustrates the ability of extensive gene evolution to reveal selective forces through the evolutionary information embedded into the resulting diversity.
- TrpB-003-A highly functional
- TrpTriple nearly nonfunctional
- Trp production by members of this library using three distinct growth conditions: Trp-supplemented media (no selection), media lacking Trp with a high concentration of indole (400 µM, weak selection), and media lacking Trp with a low concentration of indole (25 µM, strong selection).
- TrpBs a state-of-the-art ML model
- TranceptEVE which ensembles an autoregressive LLM (Tranception) trained across protein families with a variational autoencoder (EVE) trained on a specific family of proteins (in this case, TrpBs)
- TrpBs a specific family of proteins
- TrpB for standalone function, rarely found in nature, combined with the extensiveness of sequence divergence from wt TrpB have brought our evolved sequences into regions of the fitness landscape that are out-of-distribution of natural sequences. It has been shown that large ML models can generate highly functional artificial sequences that are more dissimilar to natural sequences than our TrpBs are to wt TrpB; it has also been shown that these models can nominate artificial sequences containing evolutionarily plausible mutations that improve non-natural functions (68). Yet these ML successes do not preclude the possibility that ML-generated sequences nonetheless miss important regions of underlying fitness landscapes.
- TP-DNAP1 Directed evolution of TP-DNAP1
- TP-DNAP1-KS is a relative of TP-DNAP1-4-2 with a mutation rate near 10⁻⁵ s.p.b. and greater activity than TP-DNAP1-4-2 (42) (Fig.6).
- TP-DNAP1-TKS, with the mutation P680T, had the highest per base mutation rate (Fig.12B-12C).
- Round 2 selection applied to an epPCR library generated from TP-DNAP1-TKS resulted in the enrichment of several clones whose full mutation rates and spectra were then determined (Fig.13).
- TP-DNAP1-SgtKis contained 5 nonsynonymous mutations in addition to those in the parent TP-DNAP1-TKS and demonstrated an altered mutation spectrum favoring transversions but only a minimal apparent increase in overall mutation rate.
- TP-DNAP1-Trixy contained three nonsynonymous mutations and had the highest overall mutation rate measured, but only marginal changes to the mutation spectrum. [0158] Since TP-DNAP1-SgtKis had an increased transversion rate and TP-DNAP1-Trixy had an increased overall substitution rate, we reasoned that their combination could yield orthogonal DNAPs with both high overall and transversion rates. We cloned seven new TP-DNAP1 variants where a subset of mutations from TP-DNAP1-SgtKis were added to TP-DNAP1-Trixy and obtained their fully described mutation rates (SgtKis / Trixy recombination round, Fig.14).
- TP-DNAP1 variants that included the mutations L474S and E488G from TP-DNAP1-SgtKis exhibited a dramatic elevation in their overall mutation rate, in each case bringing the per base rate to ~10⁻⁴ s.p.b. (Fig.1C, Fig.14C), 1-million-fold higher than the yeast genomic mutation rate (42, 56). Furthermore, the broad mutation spectrum of TP-DNAP1-SgtKis was preserved in these variants (Fig.1D-E, Fig.14D), which we named BadBoy1, BadBoy2, and BadBoy3 (Table 1) to recognize their poor fidelity.
- TP-DNAP1 variants resulting from combining mutations enriched from an epPCR library derived from TP-DNAP1-Trixy (epPCR 3 round, Fig.15) with those in BadBoy3 (BadBoy3 + epPCR 3 round, Table 1).
- One of the resulting TP-DNAP1s, named BB-5k, exhibited a further increase in mutation rate, to 1.7 × 10⁻⁴ s.p.b. (Fig.1D), representing ~1 mutation every time a 5 kb recombinant p1 plasmid is replicated.
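The "~1 mutation per 5 kb replication" figure follows directly from multiplying the per-base rate by the plasmid length. The sketch below checks that arithmetic and, as an added illustration not stated in the source, also gives the probability that a single replication introduces at least one substitution, assuming each base mutates independently.

```python
def expected_mutations(rate_per_base: float, length_bp: int) -> float:
    """Expected substitutions per replication of a sequence of length_bp."""
    return rate_per_base * length_bp


def prob_at_least_one(rate_per_base: float, length_bp: int) -> float:
    """Probability of one or more substitutions, assuming independent bases."""
    return 1 - (1 - rate_per_base) ** length_bp


BB5K_RATE = 1.7e-4   # s.p.b., as reported for BB-5k
P1_LENGTH = 5000     # 5 kb recombinant p1 plasmid

print(expected_mutations(BB5K_RATE, P1_LENGTH))          # about 0.85
print(round(prob_at_least_one(BB5K_RATE, P1_LENGTH), 3))
```

An expectation of about 0.85 substitutions per replication is consistent with the rounded "~1 mutation" description and with the name BB-5k.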
- BB-5k did not durably maintain p1-ura3*-trp5* in two of four biological replicates over the ~120 generations of mutation accumulation tested, possibly because it exerts an excessive mutational load on the LEU2 marker used to maintain p1-ura3*-trp5*.
- The other DNAP of potential value, BB-Tv, has a relatively low mutation rate (1.6 × 10⁻⁵ s.p.b.), like that of our previous OrthoRep systems.
- BB-Tv demonstrated a near-ideal mutation spectrum, with transversions accounting for 43% of all mutations, up from only 2.5% for TP-DNAP1-KS (Table 1, Fig.1D).
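The transversion share of a mutation spectrum can be computed by classifying each of the twelve substitution types as a transition (purine to purine or pyrimidine to pyrimidine) or a transversion (purine to pyrimidine or the reverse). A short sketch follows; the function names are illustrative and the uniform spectrum is only a reference point, not a measured one.

```python
PURINES = frozenset("AG")


def is_transversion(wt: str, mut: str) -> bool:
    """True when the substitution swaps a purine for a pyrimidine or vice versa."""
    return (wt in PURINES) != (mut in PURINES)


def transversion_fraction(rates: dict) -> float:
    """Share of the total substitution rate contributed by transversions."""
    total = sum(rates.values())
    tv = sum(r for (wt, mut), r in rates.items() if is_transversion(wt, mut))
    return tv / total


# Reference spectrum in which all twelve substitution types are equally likely.
uniform = {(w, m): 1.0 for w in "ACGT" for m in "ACGT" if w != m}
print(round(transversion_fraction(uniform), 3))  # 0.667
```

Eight of the twelve substitution types are transversions, so a spectrum with no type preference would give a transversion share of two thirds; measured per-type rates can be passed to `transversion_fraction` in the same dictionary format.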
- BB-Tv should therefore be useful in continuous evolution experiments involving larger targets that have lower error thresholds.
- Detailed characterization of mutation accumulation across our several TP-DNAP1 variants revealed interesting trends. For example, we found that the mutation rate of TP-DNAP1-Trixy was correlated with p1 length while TP-DNAP1-SgtKis did not exhibit this relationship, suggesting an interplay between mutation rate and the number of bases replicated that is dependent on mutation mechanism (Fig.16). For BadBoy1, BadBoy2, and BadBoy3, mutation rates were largely independent of p1 length (Fig.16).
- Table 4: Plasmids used in Rix et al. 2023 [0163] Table 4 (cont’d) [0164] Table 5: Primer pairs used in Rix et al. 2023 [0165] Table 5 (cont’d) *X denotes a nucleotide with a known sequence used for multiplexing (barcode); N, Y, and R denote mixed nucleotides within a region used as a UMI [0166] Table 6: Yeast strains used in Rix et al. 2023 [0167] Table 6 (cont’d) [0168] Table 7: High throughput sequencing datasets generated in Rix et al. 2023 [0169] Table 7 (cont’d) [0170] Discussion [0171] In this work, we have engineered OrthoRep’s mutation rate to reach >10⁻⁴ s.p.b. while also reducing OrthoRep’s bias against transversion mutations to maximize the exploration of sequence space.
- TrpB sequences swiftly diffused into new regions of sequence space; under strong positive selection, new functional adaptations in TrpB rapidly emerged; and during mostly purifying selection, TrpB sequences quickly sampled the space bounded by the constraints of structure and function, thereby revealing those constraints.
- TrpB At the end of our ⁇ 540 generation evolution experiment on TrpB, we obtained thousands of unique sequences, pairs of which were separated by an average ⁇ 35 amino acids ( ⁇ 9% divergence) including many >60 amino acids apart (>15% divergence).
- The amount of evolutionary information recorded into such extensive diversity allowed us to infer both known and unknown mechanisms shaping TrpB’s evolution, including the focusing of adaptation on the COMM domain, the importance of certain allosterically linked positions on the function of TrpB, and the reduction of TrpB’s isoelectric point to yield negatively charged variants even when selection for TrpB’s enzymatic function was absent.
- The evolutionary information from the experiment also revealed structural and functional constraints acting on TrpB, including conservation of positions near the active site and in buried regions.
- Although TrpB was used to demonstrate the new capabilities of OrthoRep in both the evolutionary improvement of a gene’s function and the extensive evolutionary recording of selective forces into sequence diversity, our experiments should easily extend beyond TrpB. They should also be capable of scaling beyond 96 replicate lines to support the generation and maintenance of greater diversity and/or comparative evolution across different selection schedules.
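
The two summary statistics quoted in the discussion above, the share of mutations that are transversions and the average pairwise amino-acid divergence between evolved sequences, can be illustrated with a minimal Python sketch. This is not the analysis pipeline used in the work described here: the function names, the input formats (a list of `(ref, alt)` base pairs and a list of pre-aligned, equal-length protein strings), and the toy data are all assumptions made for illustration.

```python
from itertools import combinations

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transversion(ref: str, alt: str) -> bool:
    """Return True if ref>alt swaps a purine for a pyrimidine or vice versa."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt or not {ref, alt} <= PURINES | PYRIMIDINES:
        raise ValueError(f"not a valid substitution: {ref}>{alt}")
    return (ref in PURINES) != (alt in PURINES)


def transversion_fraction(substitutions):
    """Fraction of (ref, alt) substitutions that are transversions."""
    subs = list(substitutions)
    if not subs:
        raise ValueError("no substitutions given")
    return sum(is_transversion(r, a) for r, a in subs) / len(subs)


def mean_pairwise_divergence(seqs):
    """Mean differing positions, and mean fraction differing, over all pairs.

    Assumes the sequences are already aligned and of equal length
    (a hypothetical simplification; real data may contain indels).
    """
    seqs = list(seqs)
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if length == 0 or any(len(s) != length for s in seqs):
        raise ValueError("sequences must be non-empty and of equal length")
    dists = [
        sum(a != b for a, b in zip(x, y))  # Hamming distance of one pair
        for x, y in combinations(seqs, 2)
    ]
    mean_dist = sum(dists) / len(dists)
    return mean_dist, mean_dist / length


# Toy inputs, invented purely for illustration.
toy_subs = [("A", "G"), ("C", "T"), ("A", "C"), ("G", "T")]
toy_seqs = ["MKTAY", "MKTAV", "MRTAV"]
tv_frac = transversion_fraction(toy_subs)
mean_dist, mean_frac = mean_pairwise_divergence(toy_seqs)
```

Taking all pairs is quadratic in the number of sequences, so for thousands of unique sequences one would likely sample pairs or vectorize the comparison; the exhaustive version is kept here only because it is the easiest to read.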
Landscapes
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Chemical & Material Sciences (AREA)
- Genetics & Genomics (AREA)
- Engineering & Computer Science (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Organic Chemistry (AREA)
- Zoology (AREA)
- Wood Science & Technology (AREA)
- Biotechnology (AREA)
- Molecular Biology (AREA)
- General Engineering & Computer Science (AREA)
- Biomedical Technology (AREA)
- Microbiology (AREA)
- Biochemistry (AREA)
- General Health & Medical Sciences (AREA)
- Crystallography & Structural Chemistry (AREA)
- Plant Pathology (AREA)
- Physics & Mathematics (AREA)
- Biophysics (AREA)
- Medicinal Chemistry (AREA)
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
Abstract
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| EP24816418.8A EP4720270A2 (fr) | 2023-05-30 | 2024-05-30 | Error-prone DNAPs for orthogonal DNA replication |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US202363504905P | 2023-05-30 | 2023-05-30 | |
| US63/504,905 | 2023-05-30 | | |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| WO2024249634A2 (fr) | 2024-12-05 |
| WO2024249634A3 (fr) | 2025-04-17 |
Family
ID=93658783
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/US2024/031672 Ceased WO2024249634A2 (fr) | Error-prone DNAPs for orthogonal DNA replication | 2023-05-30 | 2024-05-30 |
Country Status (2)
| Country | Link |
|---|---|
| EP (1) | EP4720270A2 (fr) |
| WO (1) | WO2024249634A2 (fr) |
Family Cites Families (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20230242902A1 (en) * | 2017-10-20 | 2023-08-03 | The Regents Of The University Of California | A highly error-prone orthogonal dna replication system for targeted continuous evolution in vivo |
2024
- 2024-05-30 WO PCT/US2024/031672 patent/WO2024249634A2/fr not_active Ceased
- 2024-05-30 EP EP24816418.8A patent/EP4720270A2/fr active Pending
Also Published As
| Publication number | Publication date |
|---|---|
| WO2024249634A3 (fr) | 2025-04-17 |
| EP4720270A2 (fr) | 2026-04-08 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| Paoli et al. | Biosynthetic potential of the global ocean microbiome | |
| Robinson et al. | A roadmap for metagenomic enzyme discovery | |
| Engqvist et al. | Applications of protein engineering and directed evolution in plant research | |
| Rix et al. | Continuous evolution of user-defined genes at 1 million times the genomic mutation rate | |
| Wanamaker et al. | CrY2H-seq: a massively multiplexed assay for deep-coverage interactome mapping | |
| US20210256394A1 (en) | Methods and systems for the optimization of a biosynthetic pathway | |
| Younger et al. | Engineering modular biosensors to confer metabolite-responsive regulation of transcription | |
| CA3029254A1 (fr) | Methods for generating barcoded combinatorial libraries | |
| WO2012142591A2 (fr) | Compositions, methods and uses for multiplex protein sequence activity relationship mapping | |
| CN115349128A (zh) | Metagenomic library and natural product discovery platform | |
| Belkhelfa et al. | Continuous culture adaptation of Methylobacterium extorquens AM1 and TK 0001 to very high methanol concentrations | |
| Mate et al. | The pocket manual of directed evolution: tips and tricks | |
| Qiao et al. | Systematic characterization of hypothetical proteins in Synechocystis sp. PCC 6803 reveals proteins functionally relevant to stress responses | |
| McClure et al. | Species-specific transcriptomic network inference of interspecies interactions | |
| de Pins et al. | A systematic exploration of bacterial form I rubisco maximal carboxylation rates | |
| Collins et al. | Substrate-specific effects of natural genetic variation on proteasome activity | |
| Pinto et al. | Construction of a chassis for hydrogen production: physiological and molecular characterization of a Synechocystis sp. PCC 6803 mutant lacking a functional bidirectional hydrogenase | |
| WO2024249634A2 (fr) | Error-prone DNAPs for orthogonal DNA replication | |
| US20190376067A1 (en) | Compositions, methods and uses for multiplexed trackable genomically-engineered polypeptides | |
| WO2024030344A1 (fr) | Genetic algorithm and iModulon-based optimization of media formulation for quality, titer, strain, and process improvement of biologics | |
| Stikeleather et al. | Translation Accuracy in E. coli | |
| Jutur et al. | Marine microalgae: exploring the systems through an omics approach for biofuel production | |
| Rix | Engineering in vivo hypermutation and selection systems for observing molecular evolution at scale | |
| Hoffmann et al. | The fitness landscape of a Form II rubisco in a photosynthetic bacterium guides engineering of oxygen tolerance | |
| REPRODUCIBLE | ESTABLISHING AFFINITY PURIFICATION-MASS SPECTROMETRY PROCEDURES PERMITTING REPRODUCIBLE AND HIGH-CONFIDENCE IDENTIFICATIONS OF IN VIVO PROTEIN INTERACTIONS 2 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | WWE | Wipo information: entry into national phase | Ref document number: 2024816418 Country of ref document: EP |
| | ENP | Entry into the national phase | Ref document number: 2024816418 Country of ref document: EP Effective date: 20260102 |
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 24816418 Country of ref document: EP Kind code of ref document: A2 |
| | WWP | Wipo information: published in national office | Ref document number: 2024816418 Country of ref document: EP |