WO2019118949A1 - Systems and methods for predicting repair outcomes in genetic engineering - Google Patents


Info

Publication number
WO2019118949A1
Authority
WO
WIPO (PCT)
Prior art keywords
repair
nucleotide sequence
deletion
computational model
cas
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/US2018/065886
Other languages
French (fr)
Inventor
Max Walt SHEN
Jonathan Yee-Ting HSU
Mandana ARBAB
David K. Gifford
David R. Liu
Richard Irving SHERWOOD
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Brigham and Womens Hospital Inc
Massachusetts Institute of Technology
Broad Institute Inc
Harvard University
Original Assignee
Brigham and Womens Hospital Inc
Massachusetts Institute of Technology
Broad Institute Inc
Harvard University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Brigham and Womens Hospital Inc, Massachusetts Institute of Technology, Broad Institute Inc, Harvard University
Priority to US16/772,747 (patent US12406749B2)
Priority to EP18887576.9A (patent EP3724214A4)
Publication of WO2019118949A1
Anticipated expiration (legal status)
Ceased (current legal status)


Classifications

    • C12N15/1089 Design, preparation, screening or analysis of libraries using computer algorithms
    • G16B20/30 Detection of binding sites or motifs
    • A61K31/7088 Compounds having three or more nucleosides or nucleotides
    • A61K38/465 Hydrolases (3) acting on ester bonds (3.1), e.g. lipases, ribonucleases
    • C12N15/10 Processes for the isolation, preparation or purification of DNA or RNA
    • C12N15/102 Mutagenizing nucleic acids
    • C12N15/11 DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
    • C12N15/113 Non-coding nucleic acids modulating the expression of genes, e.g. antisense oligonucleotides; Antisense DNA or RNA; Triplex-forming oligonucleotides; Catalytic nucleic acids, e.g. ribozymes; Nucleic acids used in co-suppression or gene silencing
    • C12N9/14 Hydrolases (3)
    • C12N9/22 Ribonucleases [RNase]; Deoxyribonucleases [DNase]
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B40/20 Supervised data analysis
    • C12N2310/20 Type of nucleic acid involving clustered regularly interspaced short palindromic repeats [CRISPR]
    • C12N2320/11 Applications; Uses in screening processes for the determination of target sites, i.e. of active nucleic acids
    • C12N2800/80 Vectors containing sites for inducing double-stranded breaks, e.g. meganuclease restriction sites

Definitions

  • Template-free repair pathways, including non-homologous end-joining (NHEJ) and microhomology-mediated end-joining (MMEJ), are major pathways involved in the repair of Cas9-mediated double-strand breaks; they can produce highly heterogeneous repair outcomes, generating hundreds of genotypes following DNA cleavage at a single site. While end-joining repair of Cas9-mediated double-strand DNA breaks has been harnessed to facilitate knock-in of DNA templates [18-21] or deletion of the intervening sequence between two cleavage sites [22], NHEJ and MMEJ are not generally considered useful for precision genome editing applications.
  • DNA double-strand break repair i.e., template-free, non-homology-dependent repair
  • template-free repair processes e.g., MMEJ and NHEJ
  • DNA double-strand break repair processes e.g., MMEJ and NHEJ
  • Template-free DNA double-strand break repair processes (e.g., MMEJ and NHEJ) are therefore generally not viewed as feasible solutions to precision repair applications, such as restoring the function of a defective gene with a gain-of-function genetic change. Accordingly, methods and solutions enabling the judicious application of template-free genome editing systems, including CRISPR/Cas, TALEN, or Zinc-Finger genome editing systems, would significantly advance the field of genome editing.
  • the present inventors have unexpectedly found through computational analyses that template-free DNA/genome editing systems, e.g., CRISPR/Cas9, Cas-based, Cpf1-based, or other DSB (double-strand break)-based genome editing systems, produce a predictable set of repair genotypes, thereby enabling the use of such editing systems for applications involving or requiring precise manipulation of DNA, e.g., the correction of a disease-causing genetic mutation or the modification of a wildtype sequence to confer a genetic advantage.
  • This finding is contrary to the accepted view that DNA double-strand break repair (i.e., template-free, non-homology-dependent repair) following cleavage by genome editing systems produces stochastic and heterogeneous repair products and is therefore impractical for applications beyond gene disruption.
  • the specification describes and discloses in various aspects and embodiments computational-based methods and systems for practically harnessing the innate efficiencies of template-free DNA repair systems for carrying out precise DNA and/or genomic editing without the reliance upon homology-based repair.
  • the specification provides in one aspect a method of introducing a desired genetic change in a nucleotide sequence using a double-strand break (DSB)-inducing genome editing system, the method comprising: identifying one or more available cut sites in a nucleotide sequence; analyzing the nucleotide sequence and available cut sites with a computational model to identify the optimal cut site for introducing the desired genetic change into the nucleotide sequence; and contacting the nucleotide sequence with a DSB-inducing genome editing system, thereby introducing the desired genetic change in the nucleotide sequence at the cut site.
  • DSB double-strand break
  • the specification provides a method of treating a genetic disease by correcting a disease-causing mutation using a double-strand break (DSB)-inducing genome editing system, the method comprising: identifying one or more available cut sites in a nucleotide sequence comprising a disease-causing mutation; analyzing the nucleotide sequence and available cut sites with a computational model to identify the optimal cut site for correcting the disease-causing mutation in the nucleotide sequence; and contacting the nucleotide sequence with a DSB-inducing genome editing system, thereby correcting the disease-causing mutation and treating the disease.
  • the specification provides a method of altering a genetic trait by introducing a genetic change in a nucleotide sequence using a double-strand break (DSB)-inducing genome editing system, the method comprising: identifying one or more available cut sites in a nucleotide sequence; analyzing the nucleotide sequence and available cut sites with a computational model to identify the optimal cut site for introducing the genetic change into the nucleotide sequence; and contacting the nucleotide sequence with a DSB-inducing genome editing system, thereby introducing the desired genetic change in the nucleotide sequence at the cut site and consequently altering the associated genetic trait.
  • the specification provides a method of selecting a guide RNA for use in a Cas-genome editing system capable of introducing a genetic change into a nucleotide sequence of a target genomic location, the method comprising: identifying in a nucleotide sequence of a target genomic location one or more available cut sites for a Cas-based genome editing system; and analyzing the nucleotide sequence and cut site with a computational model to identify a guide RNA capable of introducing the genetic change into the nucleotide sequence of the target genomic location.
  • the specification provides a method of introducing a genetic change in the genome of a cell with a Cas-based genome editing system comprising: selecting a guide RNA for use in the Cas-based genome editing system in accordance with the method of the above aspect; and contacting the genome of the cell with the guide RNA and the Cas-based genome editing system, thereby introducing the genetic change.
  • the cut sites available in the nucleotide sequence are a function of the particular DSB-inducing genome editing system in use, e.g., a Cas-based genome editing system.
  • the nucleotide sequence is a genome of a cell.
  • the method for introducing the desired genetic change is done in vivo within a cell or an organism (e.g., a mammal), or ex vivo within a cell isolated or separated from an organism (e.g., an isolated mammalian cancer cell), or in vitro on an isolated nucleotide sequence outside the context of a cell.
  • the DSB-inducing genome editing system can be a Cas-based genome editing system, e.g., a type II Cas-based genome editing system.
  • the DSB-inducing genome editing system can be a TALEN-based editing system or a Zinc-Finger-based genome editing system.
  • the DSB-inducing genome editing system can be any such endonuclease-based system which catalyzes the formation of a double-strand break at a specific one or more cut sites.
  • the method can further comprise selecting a cognate guide RNA capable of directing a double-strand break at the optimal cut site by the Cas-based genome editing system.
  • the guide RNA is selected from the group consisting of the guide RNA sequences listed in any of Tables 1-6. In various embodiments, the guide RNA can be known or can be newly designed.
  • the double-strand break (DSB)-inducing genome editing system is capable of editing the genome without homology-directed repair.
  • the double-strand break (DSB)-inducing genome editing system comprises a type I Cas RNA-guided endonuclease, or a variant or orthologue thereof.
  • the double-strand break (DSB)-inducing genome editing system comprises a type II Cas RNA-guided endonuclease, or a functional variant or orthologue thereof.
  • the double-strand break (DSB)-inducing genome editing system may comprise a Cas9 RNA-guided endonuclease, or a variant or orthologue thereof in certain embodiments.
  • the double-strand break (DSB)-inducing genome editing system can comprise a Cpf1 RNA-guided endonuclease, or a variant or orthologue thereof.
  • the double-strand break (DSB)-inducing genome editing system can comprise a Streptococcus pyogenes Cas9 (SpCas9), Staphylococcus aureus Cas9 (SaCas9), Francisella novicida Cas9 (FnCas9), or a functional variant or orthologue thereof.
  • SpCas9 Streptococcus pyogenes Cas9
  • SaCas9 Staphylococcus aureus Cas9
  • FnCas9 Francisella novicida Cas9
  • the desired genetic change to be introduced into the nucleotide sequence is a correction of a genetic mutation.
  • the genetic mutation is a single-nucleotide polymorphism, a deletion mutation, an insertion mutation, or a microduplication error.
  • the genetic change can comprise a 2-60-bp deletion or a 1-bp insertion.
  • the genetic change in other embodiments can comprise a deletion of 2-20, 4-40, 8-80, 16-160, 32-320, or 64-640 nucleotides, or of up to 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, or 10,000 or more nucleotides.
  • the deletion can restore the function of a defective gene, e.g., through a gain-of-function frameshift genetic change.
  • the desired genetic change is a desired modification to a wildtype gene that confers and/or alters one or more traits, e.g., conferring increased resistance to a pathogen or altering a monogenic trait (e.g., eye color) or polygenic trait (e.g., height or weight).
  • a monogenic trait e.g., eye color
  • polygenic trait e.g., height or weight
  • the disease can be a monogenic disease.
  • monogenic diseases can include, for example, sickle cell disease, cystic fibrosis, polycystic kidney disease, Tay-Sachs disease, achondroplasia, beta-thalassemia, Hurler syndrome, severe combined immunodeficiency, hemophilia, glycogen storage disease Ia, and Duchenne muscular dystrophy.
  • the step of identifying the available cut sites can involve identifying one or more PAM sequences in the case of a Cas-based genome editing system.
  • the computational model used to analyze the nucleotide sequence is a deep learning computational model, or a neural network model having one or more hidden layers.
  • the computational model is trained with experimental data to predict the probability distribution of indel lengths for any given nucleotide sequence and cut site.
  • the computational model is trained with experimental data to predict the probability distribution of genotype frequencies for any given nucleotide sequence and cut site.
  • the computational model comprises one or more training modules for evaluating experimental data.
  • the computational model can comprise: a first training module for computing a microhomology score matrix; a second training module for computing a microhomology-independent score matrix; and a third training module for computing a probability distribution over 1-bp insertions, wherein, once trained with experimental data, the computational model computes a probability distribution over indel genotypes and a probability distribution over indel lengths for any given input nucleotide sequence and cut site.
  • the computational model predicts genomic repair outcomes for any given input nucleotide sequence and cut site.
  • the genomic repair outcomes can comprise microhomology deletions, microhomology-less deletions, and/or 1-bp insertions.
  • the computational model can comprise one or more modules, each comprising one or more input features selected from the group consisting of: a target site nucleotide sequence; a cut site; a PAM sequence; microhomology lengths relative to a cut site; % GC content at a cut site; microhomology deletion lengths at a cut site; and the type of DSB-inducing genome editing system.
  • the nucleotide sequence analyzed by the computational model is about 25-100, 50-200, 100-400, 200-800, 400-1600, 800-3200, or 1600-6400 nucleotides in length, or even up to 7K, 8K, 9K, 10K, 11K, 12K, 13K, 14K, 15K, 16K, 17K, 18K, 19K, or 20K nucleotides or more in length.
  • the specification relates to guide RNAs which are identified by various methods described herein.
  • the guide RNAs can be any of those presented in Tables 1-6, the contents of which form part of this specification.
  • the guide RNAs can be purely ribonucleic acid molecules.
  • the guide RNAs can comprise one or more naturally occurring or non-naturally occurring modifications.
  • the modifications can include, but are not limited to, nucleoside analogs, chemically modified bases, intercalated bases, modified sugars, and modified phosphate group linkers.
  • the guide RNAs can comprise one or more phosphorothioate and/or 5’-N-phosphoramidite linkages.
  • the specification discloses vectors comprising one or more nucleotide sequences disclosed herein, e.g., vectors encoding one or more guide RNAs, one or more target nucleotide sequences which are being edited, or a combination thereof.
  • the vectors may comprise naturally occurring sequences, or non-naturally occurring sequences, or a combination thereof.
  • the specification discloses host cells comprising the herein disclosed vectors encoding one or more nucleotide sequences embodied herein, e.g., one or more guide RNAs, one or more target nucleotide sequences which are being edited, or a combination thereof.
  • the specification discloses a Cas-based genome editing system comprising a Cas protein (or homolog, variant, or orthologue thereof) complexed with at least one guide RNA.
  • the guide RNA can be any of those disclosed in Tables 1-6, or a functional variant thereof.
  • the specification provides a Cas-based genome editing system comprising an expression vector having at least one expressible nucleotide sequence encoding a Cas protein (or homolog, variant, or orthologue thereof) and at least one other expressible nucleotide sequence encoding a guide RNA, wherein the guide RNA can be identified by the methods disclosed herein for selecting a guide RNA.
  • the specification provides a library for training a computational model for selecting a guide RNA sequence for use with a Cas-based genome editing system capable of introducing a genetic change into a genome without homology-directed repair, wherein the library comprises a plurality of vectors each comprising a first nucleotide sequence of a target genomic location having a cut site and a second nucleotide sequence encoding a cognate guide RNA capable of directing a Cas-based genome editing system to carry out a double-strand break at the cut site of the first nucleotide sequence.
  • the specification provides a library and its use for training a computational model for selecting an optimized cut site for use with a DSB-based genome editing system (e.g., a Cas-based system, a TALEN-based system, or a Zinc-Finger-based system) that is capable of introducing a desired genetic change into a nucleotide sequence (e.g., a genome) at the selected cut site without homology-directed repair, wherein the library comprises a plurality of vectors each comprising a nucleotide sequence having a cut site and, optionally, a second nucleotide sequence encoding a cognate guide RNA (in embodiments involving a Cas-based genome editing system).
  • a DSB-based genome editing system e.g., Cas-based system, TALEN-based system, or a Zinc-Finger-based system
  • the specification discloses a computational model.
  • the computational model can predict and/or compute an optimized or preferred cut site for a DSB-based genome editing system for introducing a genetic change into a nucleotide sequence.
  • the repair does not require homology-based repair mechanisms.
  • the computational model can predict and/or compute an optimized or preferred cut site for a Cas-based genome editing system for introducing a genetic change into a nucleotide sequence.
  • the repair does not require homology-based repair mechanisms.
  • the computational model provides for the selection of an optimized or preferred guide RNA for use with a Cas-based genome editing system for introducing a genetic change in a genome.
  • the repair does not require homology-based repair mechanisms.
  • the computational model is a neural network model having one or more hidden layers.
  • the computational model is a deep learning computational model.
  • the DSB-based genome editing system, e.g., a Cas-based genome editing system
  • the computational model is trained with experimental data to predict the probability distribution of indel lengths for any given nucleotide sequence and cut site. In other embodiments, the computational model is trained with experimental data to predict the probability distribution of genotype frequencies for any given nucleotide sequence and cut site.
  • the computational model comprises one or more training modules for evaluating experimental data.
  • the computational model comprises: a first training module (305) for computing a microhomology score matrix; a second training module (310) for computing a microhomology-independent score matrix; and a third training module (315) for computing a probability distribution over 1-bp insertions, wherein, once trained with experimental data, the computational model computes a probability distribution over indel genotypes and a probability distribution over indel lengths for any given input nucleotide sequence and cut site.
  • the computational model predicts genomic repair outcomes for any given input nucleotide sequence (i.e., context sequence) and cut site.
  • the genomic repair outcomes comprise microhomology deletions, microhomology-less deletions, and 1-bp insertions.
  • each of the one or more modules comprises one or more input features selected from the group consisting of: a target site nucleotide sequence; a cut site; a PAM sequence; microhomology lengths relative to a cut site; % GC content at a cut site; and microhomology deletion lengths at a cut site.
  • the nucleotide sequence analyzed by the computational model is about 25-100, 50-200, 100-400, 200-800, 400-1600, 800-3200, or 1600-6400 nucleotides in length, or more.
  • the specification discloses a method for training a computational model, comprising: (i) preparing a library comprising a plurality of nucleic acid molecules each encoding a nucleotide target sequence and a cognate guide RNA, wherein each nucleotide target sequence comprises a cut site; (ii) introducing the library into a plurality of host cells; (iii) contacting the library in the host cells with a Cas-based genome editing system to produce a plurality of genomic repair products; (iv) determining the sequences of the genomic repair products; and (v) training the computational model with input data that comprises at least the sequences of the nucleotide target sequence and/or the genomic repair products and the cut sites.
  • the specification discloses a method for training a computational model, comprising: (i) preparing a library comprising a plurality of nucleic acid molecules each encoding a nucleotide target sequence and a cut site; (ii) introducing the library into a plurality of host cells; (iii) contacting the library in the host cells with a DSB-based genome editing system to produce a plurality of genomic repair products; (iv) determining the sequences of the genomic repair products; and (v) training the computational model with input data that comprises at least the sequences of the nucleotide target sequence and/or the genomic repair products and the cut sites.
  • the trained computational models disclosed herein are capable of computing a probability distribution of indel lengths for any given input nucleotide sequence and input cut site, and/or a probability distribution of genotype frequencies for any given input nucleotide sequence and input cut site.
  • in embodiments relating to Cas-based genome editing systems, the trained computational model is capable of selecting a guide RNA for use with a Cas-based genome editing system for introducing a genetic change into a genome.
  • the computational model provides a means to produce precision genetic change with a DSB-based genomic editing system.
  • the genetic changes can include microhomology deletion, microhomology-less deletion, and 1-bp insertion.
  • the genetic change corrects a disease-causing mutation.
  • the genetic change modifies a wildtype sequence, which may confer a change in a genetic trait (e.g., a monogenic or polygenic trait).
  • the diseases whose causative mutations can be corrected using the computational model with a DSB-based genome editing system include, but are not limited to, sickle cell disease, cystic fibrosis, polycystic kidney disease, Tay-Sachs disease, achondroplasia, beta-thalassemia, Hurler syndrome, severe combined immunodeficiency, hemophilia, glycogen storage disease Ia, and Duchenne muscular dystrophy.
  • the disclosure provides a method for selecting one or more guide RNAs (gRNAs) from a plurality of gRNAs for CRISPR, comprising acts of: for at least one gRNA of the plurality of gRNAs, using a local DNA sequence and a cut site targeted by the at least one gRNA to predict a frequency of one or more repair genotypes resulting from template-free repair following application of CRISPR with the at least one gRNA; and
  • the one or more repair genotypes correspond to one or more healthy alleles of a gene related to a disease.
  • the predicted frequency of the one or more repair genotypes is at least about 30%, or at least about 40%, or at least about 50%, or more.
  • the step of predicting the frequency of the one or more repair genotypes comprises: for each deletion length of a plurality of deletion lengths, aligning subsequences of that deletion length on the 5’ and 3’ sides of the cut site to identify one or more longest microhomologies; featurizing the identified microhomologies; applying a machine learning model to compute a frequency distribution over the plurality of deletion lengths; and using the frequency distribution over the plurality of deletion lengths to determine the frequency of the one or more repair genotypes.
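The alignment and featurization steps above can be illustrated with a minimal Python sketch. It assumes the input is a single top-strand sequence with the cut site given as a 0-based index between two bases, and it reports every maximal run of matching bases (not only the longest) together with two features, microhomology length and GC fraction. The names `Microhomology` and `find_microhomologies` are hypothetical and are not taken from the disclosure.

```python
from dataclasses import dataclass


@dataclass
class Microhomology:
    del_len: int      # deletion length L (bp)
    mh_len: int       # length of the run of matching bases
    first_delta: int  # first delta-position (shift) of the deletion window in the run
    gc_frac: float    # fraction of G/C bases within the microhomology
    product: str      # repaired sequence produced by this deletion


def find_microhomologies(seq: str, cut: int, max_del: int = 60) -> list[Microhomology]:
    """Find microhomologies for every deletion length up to max_del.

    For deletion length L, the L bases 5' of the cut are aligned base-by-base
    against the L bases 3' of the cut. A maximal run of matching bases is a
    microhomology: every deletion window shifted across that run yields the
    same repaired sequence.
    """
    found: list[Microhomology] = []
    for L in range(1, max_del + 1):
        if cut - L < 0 or cut + L > len(seq):
            break  # longer deletions would run off the sequence
        left, right = seq[cut - L:cut], seq[cut:cut + L]
        i = 0
        while i < L:
            if left[i] != right[i]:
                i += 1
                continue
            j = i
            while j < L and left[j] == right[j]:
                j += 1  # extend the run of matching bases
            mh = left[i:j]
            start = cut - L + i  # leftmost deletion window sharing this product
            found.append(Microhomology(
                del_len=L,
                mh_len=j - i,
                first_delta=i,
                gc_frac=sum(base in "GC" for base in mh) / len(mh),
                product=seq[:start] + seq[start + L:],
            ))
            i = j
    return found
```

For example, `find_microhomologies("TTACGTACGTT", 6)` finds a 4-bp microhomology (ACGT) for the 4-bp deletion, so the five deletion windows that differ only in their delta-position collapse to the single product TTACGTT.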
  • the plurality of gRNAs comprise gRNAs for CRISPR/Cas9, and the application of CRISPR comprises application of CRISPR/Cas9.
  • the system comprises: at least one processor; and at least one computer-readable storage medium having encoded thereon instructions which, when executed, cause the at least one processor to perform a computational method disclosed herein.
  • a method for editing a nucleotide sequence using a DSB-based genomic editing system that introduces a genetic change at a cut site in the nucleotide sequence, wherein the cut site location is informed by a computational model that computes a frequency distribution over a plurality of deletion lengths and/or a frequency distribution of one or more repaired genotypes over the deletion lengths.
  • FIG. 1 shows an illustrative DNA segment 100, in accordance with some embodiments.
  • FIGs. 2A-D show an illustrative matching of 3’ ends of top and bottom strands of a DNA segment at a cut site and an illustrative repair product, in accordance with some embodiments.
  • FIG. 3A shows an illustrative machine learning model 300, in accordance with some embodiments.
  • FIG. 3B shows an illustrative process 350 for building one or more machine learning models for predicting frequencies of deletion genotypes and/or deletion lengths, in accordance with some embodiments.
  • FIG. 4A shows an illustrative neural network 400A for computing microhomology (MH) scores, in accordance with some embodiments.
  • FIG. 4B shows an illustrative neural network 400B for computing MH-less scores, in accordance with some embodiments.
  • FIG. 4C shows an illustrative process 400C for training two neural networks jointly, in accordance with some embodiments.
  • FIG. 4D shows an illustrative implementation of the insertion module 315 shown in FIG. 3A, in accordance with some embodiments.
  • FIG. 5 shows an illustrative process 500 for processing data collected from CRISPR/Cas9 experiments, in accordance with some embodiments.
  • FIG. 6 shows an illustrative process 600 for using a machine learning model to predict frequencies of indel genotypes and/or indel lengths, in accordance with some embodiments.
  • FIG. 7 shows illustrative examples of a blunt-end cut and a staggered cut, in accordance with some embodiments.
  • FIG. 8A shows an illustrative plot 800A of predicted repair genotypes, in accordance with some embodiments.
  • FIG. 8B shows another illustrative plot 800B of predicted repair genotypes, in accordance with some embodiments.
  • FIG. 8C shows another illustrative plot 800C of predicted repair genotypes, in accordance with some embodiments.
  • FIG. 8D shows a microhomology identified in the example of FIG. 8C, in accordance with some embodiments.
  • FIG. 9 shows another illustrative neural network 900 for computing a frequency distribution over deletion lengths, in accordance with some embodiments.
  • FIG. 10 shows, schematically, an illustrative computer 1000 on which any aspect of the present disclosure may be implemented.
  • FIGs. 11A-11C show a high-throughput assessment of Cas9-mediated DNA repair products.
  • FIG. 11A A genome-integrated screening library approach for monitoring Cas9 editing products at thousands of target sequences.
  • FIG. 11B Frequency of Cas9-mediated repair products by class from 1,996 Lib-A target sequences in mouse embryonic stem cells (mESCs).
  • FIG. 11C Distribution of Cas9-mediated repair products by class in 88 VO target sequences in K562 cells.
  • FIGs. 12A-12D show modeling of Cas9-mediated indels by inDelphi.
  • FIG. 12A Schematic of computational flow for inDelphi modeling.
  • inDelphi separates Cas9-mediated editing products by indel type and uses machine learning tools trained on experimental Lib-A editing products to predict relative frequencies of editing products for any target site.
  • Major editing products include 1- to 60-bp MH deletions, 1- to 60-bp MH-less deletions, and 1-bp insertions.
  • FIG. 12B Mechanism depicting microhomology-mediated end-joining repair, which yields distinct repair outcomes that reflect which microhomologous bases are used during repair.
  • FIGs. 13A-13L show that Cas9-mediated editing outcomes are accurately predicted by inDelphi.
  • FIG. 13D Comparison of inDelphi and
  • FIG. 13F shows smoothed predicted distribution of the highest frequency indel among major editing outcomes (+1 to -60 indels) for SpCas9 gRNAs targeting the human genome.
  • FIGs. 14A-14F show high-precision, template-free Cas9 nuclease-mediated deletion and insertion.
  • FIG. 14A Schematic of deletion repair at a designed Lib-B target sequence with a 9-bp microduplication and strong sequence microhomology.
  • FIG. 14B Observed frequency of microduplication collapse among all edited products at 56 Lib-B target sequences designed with 7- to 25-bp microduplications.
  • the error band represents the 95% C.I. around the regression estimate with 1,000-fold bootstrapping.
  • FIG. 14D Comparison of the observed 1-bp insertion frequency at 205 Lib-B designed sequences as in (FIG. 14C) with varying positions -4 and -3.
  • the box denotes the 25th, 50th, and 75th percentiles, whiskers show 1.5 times the interquartile range, and outliers are depicted as fliers.
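The box-plot convention in the legend above can be made concrete with a short sketch. It assumes linear-interpolation quartiles and matplotlib-style whiskers that stop at the most extreme data point within 1.5 times the interquartile range of the box (the word “fliers” matches matplotlib’s term for outliers); neither choice is stated in the disclosure, and the function name `box_stats` is hypothetical.

```python
import statistics


def box_stats(values, whis=1.5):
    """Quartiles, whisker ends, and fliers for a list of numbers."""
    # Linear-interpolation quartiles (25th, 50th, 75th percentiles).
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - whis * iqr, q3 + whis * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),   # most extreme point inside the lower fence
        "whisker_high": max(inside),  # most extreme point inside the upper fence
        "fliers": [v for v in values if v < low_fence or v > high_fence],
    }
```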
  • FIG. 14E Comparison of predicted precision scores to observed precision scores for microhomology deletions in 86 VO target sites in HEK293 cells.
  • FIG. 14F Distribution of the predicted frequency of the most frequent deletion and insertion outcomes among major editing outcomes (1-bp insertions, 1- to 60-bp MH deletions, and 1- to 60-bp MH-less deletions) at 1,063,802 Cas9 gRNAs targeting human exons and introns.
  • FIGs. 15A-15F show precise template-free Cas9-mediated editing of pathogenic alleles to wild-type genotypes.
  • FIG. 15A Using Cas9 nuclease to correct a pathogenic LDLR allele to wild-type.
  • FIG. 15B Comparison among ClinVar/HGMD pathogenic alleles of observed and predicted frequencies of repair to wild-type alleles, accompanied by a histogram of observed frequencies.
  • Major editing products include 1-bp insertions, 1- to 60-bp MH deletions, and 1- to 60-bp MH-less deletions.
  • FIG. 15C Comparison of observed and predicted frequencies of frameshift repair to the wild-type frame among ClinVar/HGMD pathogenic alleles, accompanied by a histogram of observed frequencies.
  • Major editing products are defined as in (FIG. 15B).
  • FIG. 15D Histograms of observed frequencies of repair to the wild-type genotype for wild-type mESCs and Prkdc-/-Lig4-/- mESCs at Lib-B pathogenic microduplication alleles with predicted repair frequency >50% among all major editing products, defined as in (FIG. 15B). Dashed lines indicate sample means, which differ significantly.
  • FIG. 15E Flow cytometry contour plots showing GFP fluorescence and LDL-Dylight550 uptake in mESCs containing the LDLRdup1662_1669dupGCTGGTGA-P2A-GFP allele (LDLRdup-P2A-GFP) and treated with SpCas9 and gRNA when denoted.
  • FIG. 15F Fluorescence microscopy of mESCs containing the LDLRdup1662_1669dupGCTGGTGA-P2A-GFP allele treated with SpCas9 and gRNA, or untreated.
  • FIGs. 16A-16F show design and cloning of a high-throughput library to assess CRISPR-Cas9-mediated editing products.
  • FIG. 16A From left to right, distributions of predicted Cas9 on-target efficiency (Azimuth score), number of nucleotides participating in microhomology in 3- to 30-bp deletions, GC content, and estimated precision of deletion outcomes derived from 169,279 potential SpCas9 gRNA target sites in the human genome, with quintiles marked as used to design Lib-A.
  • FIG. 16B Schematic of the cloning process used to clone Lib-A and Lib-B.
  • the cloning process involves ordering a library of oligonucleotides pairing a gRNA protospacer with its 55-bp target sequence, centered on an NGG PAM.
  • the library undergoes an intermediate Gibson Assembly circularization step, restriction enzyme linearization, and Gibson Assembly into a plasmid backbone containing a U6-promoter to facilitate gRNA expression, a hygromycin resistance cassette, and flanking Tol2 transposon sites to facilitate integration into the genome.
  • Outliers are depicted as diamonds. 1-bp insertion frequency adjustment was performed at each target site by proportionally scaling the 1-bp insertion frequencies to be equal between the two cell types.
  • FIG. 16E shows Pearson’s r of genotype frequencies comparing lib-A in mESCs and U2OS cells with end
  • FIGs. 17A-17D show that high-throughput CRISPR-Cas9 editing outcome screening yields replicate-consistent data that is concordant with the repair spectrum at endogenous human genomic loci.
  • FIG. 17D Frequencies of deletions occurring beyond the Cas9 cutsite by distance as measured by the number of bases between the deleted base nearest to the cutsite and the two bases immediately surrounding the cutsite. Cutsite and distances are oriented with the NGG PAM on the positive side.
  • FIG. 18A Diagram of all unique alignment outcomes at an example 7-bp deletion accompanied with a table of their MH-less end-joining type, MH length, deletion length, and delta-position.
  • FIG. 18B Plot of function learned by the neural network modeling MH deletions (MH-NN) mapping MH length and % GC to a numeric score (psi).
  • FIG. 18C Plot of function learned by the neural network modeling MH-independent deletions (MHless-NN) mapping deletion length to a numeric score (psi).
  • FIG. 18D Histogram of MHless-NN phi scores by deletion length, normalized to sum to 1.
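A minimal sketch of how per-genotype scores could be turned into a frequency distribution over deletion lengths. It assumes each score is exponentiated into a non-negative weight, that the weights of all microhomology genotypes of a given length are added to the MH-less weight for that length, and that the totals are normalized over 1- to 60-bp deletions. The two score functions are toy stand-ins for MH-NN and MHless-NN, not the trained networks, and all names are hypothetical.

```python
import math
from typing import Callable, Iterable


def deletion_length_distribution(
    mh_features: Iterable[tuple[int, int, float]],  # (del_len, mh_len, gc_frac) per MH genotype
    mh_score: Callable[[int, float], float],        # toy stand-in for MH-NN
    mhless_score: Callable[[int], float],           # toy stand-in for MHless-NN
    max_del: int = 60,
) -> dict[int, float]:
    """Normalize exponentiated scores into a distribution over 1..max_del bp deletions."""
    # Every deletion length receives an MH-independent weight.
    weights = {L: math.exp(mhless_score(L)) for L in range(1, max_del + 1)}
    # Each microhomology genotype adds weight to its own deletion length.
    for del_len, mh_len, gc_frac in mh_features:
        if 1 <= del_len <= max_del:
            weights[del_len] += math.exp(mh_score(mh_len, gc_frac))
    total = sum(weights.values())
    return {L: w / total for L, w in weights.items()}
```

With toy scores that grow with microhomology length and GC fraction, a deletion length carrying a long microhomology receives most of the probability mass.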
  • FIG. 18E Observed frequency of 1-bp insertion genotypes in 1,981 Lib-A target sequences with varying -4 nucleotides.
  • In microhomology-less deletions, one 3’-overhang is ligated to the dsDNA backbone and the opposing strand is removed entirely, giving rise to a unilateral deletion with loss of bases on one side of the cutsite only. DNA polymerase and ligase activity bridge the ssDNA gap to create a contiguous dsDNA strand. Microhomology-independent mutations occur as a combined result of exonuclease, polymerase, and ligase activity that results in the joining of modified ends at the double-strand break cutsite, giving rise to microhomology-less deletions, insertions, and mixtures thereof.
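The unilateral deletion described above can be written out directly. This sketch assumes a top-strand sequence with a 0-based cut index and enumerates only the two one-sided products for a given deletion length; the function name is hypothetical.

```python
def unilateral_deletions(seq: str, cut: int, del_len: int) -> dict[str, str]:
    """Products of an MH-less deletion that removes bases from one side of the cut only."""
    if not (0 < del_len <= cut and cut + del_len <= len(seq)):
        raise ValueError("deletion must fit on both sides of the cut site")
    return {
        "left": seq[:cut - del_len] + seq[cut:],   # bases lost 5' of the cut only
        "right": seq[:cut] + seq[cut + del_len:],  # bases lost 3' of the cut only
    }
```

For instance, `unilateral_deletions("AAAACCCC", 4, 2)` returns `{"left": "AACCCC", "right": "AAAACC"}`.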
  • FIGs. 19A-19F show performance of inDelphi at predicting Cas9-mediated indel length and repair genotypes.
  • FIG. 19A Box and swarm plot of the Pearson correlation at 189 held-out Lib-A target sequences comparing inDelphi predictions with observed mESC Lib-A genotype product frequencies.
  • FIG. 19B Box and swarm plot of the Pearson correlation at 189 held-out Lib-A target sequences comparing inDelphi predictions with observed mESC Lib-A indel length frequencies for 1-bp insertions to 60-bp deletions. Box plot as in (FIG. 19A).
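The per-target correlations summarized in these box and swarm plots use the standard Pearson formula, computed once per held-out target between predicted and observed frequencies. The sketch below is plain Python for clarity; the function name is hypothetical, and in practice a library routine such as `scipy.stats.pearsonr` would typically be used.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences of numbers."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))  # co-deviation
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)
```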
  • FIG. 19C Distribution of predicted frameshift frequencies among 1- to 60-bp deletions for SpCas9 gRNAs targeting exons, shuffled exons, and introns in the human genome. Dashed lines indicate means. *** P < 10^-100.
  • FIG. 19D Pie chart depicting the output of inDelphi for specific outcome classes.
  • the error band represents the 95% confidence intervals around the regression estimate with 1,000-fold bootstrapping.
  • FIGs. 20A-20K show that target sequences with extremely high or low microhomology phi scores skew toward a single predictable Cas9-mediated edited product.
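Predicted frameshift frequency follows from the deletion-length distribution: a deletion whose length is not a multiple of 3 shifts the reading frame. A minimal sketch, assuming the input maps deletion length in bp to predicted frequency; the function name is hypothetical.

```python
def frameshift_fraction(del_len_freqs: dict[int, float], max_del: int = 60) -> float:
    """Fraction of 1- to max_del-bp deletions whose length is not a multiple of 3."""
    in_range = {L: f for L, f in del_len_freqs.items() if 1 <= L <= max_del}
    total = sum(in_range.values())
    if total == 0:
        raise ValueError("no deletions in range")
    # Lengths 1, 2, 4, 5, 7, ... change the reading frame; 3, 6, 9, ... do not.
    return sum(f for L, f in in_range.items() if L % 3 != 0) / total
```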
  • the error band represents the 95% C.I. around the regression estimate with 1,000-fold bootstrapping.
  • FIGs. 21A-21D show the precise repair of pathogenic microduplications.
  • FIG. 21A Observed frequencies of repair to wild-type genotype at 194 ClinVar pathogenic alleles vs.
  • FIG. 21B Observed frequencies of repair to wild-type frame at 140 ClinVar pathogenic alleles vs. predicted frequencies in Lib-B in human HEK293T cells.
  • FIG. 21C Observed frequencies of repair to wild-type genotype at 49 ClinVar pathogenic alleles vs. predicted frequencies in Lib-B in human U2OS cells.
  • FIG. 21D Observed frequencies of repair to wild-type frame at 37 ClinVar pathogenic alleles vs. predicted frequencies in Lib-B in human U2OS cells.
  • FIGs. 22A-22E show altered distribution of Cas9-mediated genotypic products in Prkdc-/-Lig4-/- mESCs as compared to wild-type mESCs.
  • FIG. 22A Distribution of Cas9-mediated genotypic products by repair outcome class in Prkdc-/-Lig4-/- mESC for 1,446 target sequences.
  • FIG. 22B Comparison of observed mean frequency of deletion products contributed by microhomology-less unilateral joining and medial joining deletions among all deletions comparing 1,995 Lib-A target sequences in wildtype mESC to 1,850 Lib-A target sequences in Prkdc-/-Lig4-/- mESC.
  • FIG. 22C Comparison of observed frequency of deletion products contributed by microhomology-less unilateral joining and medial joining deletions among all deletions, between 1,995 Lib-A target sequences in wildtype mESC to 1,850 Lib-A target sequences in Prkdc-/-Lig4-/- mESC. * and ** as in (FIG. 22B).
  • the box denotes the 25th, 50th and 75th percentiles, whiskers show 1.5 times the interquartile range, and outliers are depicted as fliers.
  • FIG. 22D Observed mean frequency of 1-bp insertion genotypes at 1,055 target sequences with varying -4 nucleotides in Lib-A in Prkdc-/-Lig4-/- mESCs. The error bars show the 95% C.I. on the sample mean with 1,000-fold bootstrapping.
  • FIG. 22E Observed frequency of 1-bp insertion genotypes at 1,055 target sequences with varying -4 nucleotides in Lib-A in Prkdc-/-Lig4-/- mESCs. Box plot as in
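A 95% C.I. on a sample mean with 1,000-fold bootstrapping can be sketched with the percentile method. The resampling details (percentile interval, fixed seed) are assumptions for illustration, since the legend does not specify them, and the function name is hypothetical.

```python
import random
import statistics


def bootstrap_mean_ci(values, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    if not values:
        raise ValueError("values must be non-empty")
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    n = len(values)
    # Resample with replacement n_boot times and record each resample's mean.
    means = sorted(statistics.fmean(rng.choices(values, k=n)) for _ in range(n_boot))
    k = round(alpha / 2 * n_boot)  # means trimmed from each tail (25 of 1,000)
    return means[k], means[n_boot - k - 1]
```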
  • FIGs. 23A-23H show that template-free Cas9-nuclease editing of human cells containing pathogenic LDLR microduplication alleles restores LDL uptake.
  • FIG. 23A Flow cytometric contour plots showing GFP fluorescence and LDL-Dylight550 uptake in HCT116 cells containing the denoted LDLR alleles and treated with SaCas9 and gRNA when denoted.
  • FIG. 23B Fluorescence microscopy of HCT116 cells containing the denoted LDLR alleles and treated with SaCas9 and gRNA when denoted. GFP fluorescence is shown in green, LDL-Dylight550 uptake in red, and Hoechst staining nuclei in blue.
  • FIG. 23C Fluorescence microscopy of U2OS cells containing the denoted LDLR alleles and treated with SaCas9 and gRNA when denoted. GFP fluorescence is shown in green, LDL-Dylight550 uptake in red, and Hoechst staining nuclei in blue.
  • FIG. 23D Flow cytometry gating strategy used for mESC + LDLRdup-P2A-GFP untreated.
  • FIG. 23E Flow cytometry gating strategy used for mESC + LDLRdup-P2A-GFP + SpCas9 + gRNA.
  • the box denotes the 25th, 50th and 75th percentiles, whiskers show 1.5 times the interquartile range, and outliers are depicted as diamonds.
  • *P 1.6 x 10-4, two-sided Welch’s t-test. For detailed statistics, see Methods. In the table, the most frequent 1-bp insertion genotype predicted by inDelphi that does not correspond to the wild-type genotype is indicated by an asterisk.
  • FIG. 23H shows mESC-trained inDelphi genotype prediction accuracy at 40 library sites.
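Welch’s t-test compares two means without assuming equal variances. The sketch below computes only the t statistic and the Welch-Satterthwaite degrees of freedom; the two-sided p-value would then come from a t distribution, for example via `scipy.stats.ttest_ind(a, b, equal_var=False)`. The function name is hypothetical.

```python
import math
import statistics


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a) / na, statistics.variance(b) / nb  # squared standard errors
    t = (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df
```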
  • FIG. 24A is a schematic depicting an exemplary method of using a trained computational model (e.g., “inDelphi”) in conjunction with a Cas-based genome editing system to edit a nucleotide sequence (e.g., a genome) to achieve a desired genetic outcome (e.g., a correction to a disease-causing mutation to treat a disease, or modification of a wild-type gene to confer an improved trait or phenotype).
  • the trained computational model computes the probability distribution of indel lengths and the probability distribution of genotype frequencies, enabling the user to select the optimal input (e.g., cut site) for conducting editing by a Cas-based genome editing system to achieve the highest frequency of desired genetic output.
  • the computational method may be used to predict, for a given local sequence context, template-free repair genotypes and frequencies of occurrence thereof.
  • FIG. 24B is a schematic depicting an exemplary method of using a trained computational model (e.g., “inDelphi”) in conjunction with a double-strand break (DSB)-inducing genome editing system to edit a nucleotide sequence (e.g., a genome) to achieve a desired genetic outcome (e.g., a correction to a disease-causing mutation to treat a disease, or modification of a wild-type gene to confer an improved trait or phenotype).
  • For any given set of inputs (a context sequence and a selected cut site), the trained computational model computes the probability distribution of indel lengths and the probability distribution of genotype frequencies, enabling the user to select the optimal input (e.g., cut site) for conducting editing by a DSB-inducing genome editing system to achieve the highest frequency of desired genetic output.
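The selection step can be sketched as ranking candidate cut sites by the predicted frequency of the desired repair genotype. The predictor is a hypothetical stand-in for a trained model such as inDelphi, the candidate names and cut positions are illustrative, and the 0.3 default threshold simply mirrors the “at least about 30%” embodiment.

```python
from typing import Callable, Mapping


def select_guides(
    candidates: Mapping[str, int],                       # guide name -> cut index (illustrative)
    seq: str,
    desired: str,                                        # desired repaired sequence
    predict: Callable[[str, int], Mapping[str, float]],  # hypothetical model: genotype -> frequency
    min_freq: float = 0.3,
) -> list[tuple[str, float]]:
    """Return guides whose predicted desired-genotype frequency meets min_freq, best first."""
    scored = []
    for name, cut in candidates.items():
        freq = predict(seq, cut).get(desired, 0.0)  # unseen genotype counts as 0
        if freq >= min_freq:
            scored.append((name, freq))
    return sorted(scored, key=lambda item: item[1], reverse=True)
```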
  • FIGs. 25A-25D provide a characterization of lib-B data including pathogenic
  • FIG. 25B shows the frequencies of repair to wild-type genotype at 567 ClinVar pathogenic alleles versus predicted frequencies in lib-B in human U2OS cells with Pearson’s r.
  • FIG. 25C shows the frequencies of repair to wild-type frame at 437 ClinVar pathogenic alleles versus predicted frequencies in lib-B in human U2OS cells with Pearson’s r.
  • FIGs. 26A-26G show the altered distributions of Cas9-mediated genotypic products in Prkdc-/-Lig4-/- mESCs and mESCs treated with DPKi3, NU7026, and MLN4924 compared to wild-type mESCs.
  • FIG. 26C shows frequency of 1-bp insertions at 1,055 target sites in lib-A in Prkdc-/-Lig4-/- mESCs.
  • a reference to “an agent” includes a single agent and a plurality of such agents.
  • Cas9 or “Cas9 nuclease” refers to an RNA-guided nuclease comprising a Cas9 protein, or a fragment thereof (e.g., a protein comprising an active or inactive DNA cleavage domain of Cas9, and/or the gRNA binding domain of Cas9).
  • a Cas9 nuclease is also sometimes referred to as a Csn1 nuclease or a CRISPR (clustered regularly interspaced short palindromic repeat)-associated nuclease.
  • CRISPR is an adaptive immune system that provides protection against mobile genetic elements (viruses, transposable elements and conjugative plasmids).
  • CRISPR clusters contain spacers, sequences complementary to antecedent mobile elements, and target invading nucleic acids. CRISPR clusters are transcribed and processed into CRISPR RNA (crRNA). In type II CRISPR systems, correct processing of pre-crRNA requires a trans-encoded small RNA (tracrRNA), endogenous ribonuclease 3 (rnc), and a Cas9 protein. The tracrRNA serves as a guide for ribonuclease 3-aided processing of pre-crRNA. Subsequently, Cas9/crRNA/tracrRNA endonucleolytically cleaves linear or circular dsDNA target
  • Cas9 recognizes a short motif in the CRISPR repeat sequences (the PAM or protospacer adjacent motif) to help distinguish self versus non-self.
  • Cas9 nuclease sequences and structures are well known to those of skill in the art (see, e.g., “Complete genome sequence of an M1 strain of Streptococcus pyogenes.” Ferretti J.J., McShan W.M., Ajdic D.J., Savic D.J., Savic G., Lyon K., Primeaux C., Sezate S., Suvorov A.N., Kenton S., Lai H.S., Lin S.P., Qian Y., Jia H.G., Najar F.Z., Ren Q., Zhu H., Song L., White J., Yuan X., Clifton S.W., Roe B.A., McLaughlin R.E., Proc.
  • Cas9 orthologs have been described in various species, including, but not limited to, S. pyogenes and S. thermophilus. Additional suitable Cas9 nucleases and sequences will be apparent to those of skill in the art based on this disclosure, and such Cas9 nucleases and sequences include Cas9 sequences from the organisms and loci disclosed in Chylinski, Rhun, and Charpentier, “The tracrRNA and Cas9 families of type II CRISPR-Cas immunity systems” (2013) RNA Biology 10:5, 726-737; the entire contents of which are incorporated herein by reference.
  • a Cas9 nuclease has an inactive (e.g., an inactivated) DNA cleavage domain.
  • Cas9 refers to Cas9 from: Corynebacterium ulcerans (NCBI Refs: NC_015683.1, NC_017317.1); Corynebacterium diphtheriae (NCBI Refs: NC_016782.1, NC_016786.1); Spiroplasma syrphidicola (NCBI Ref: NC_021284.1); Prevotella intermedia (NCBI Ref: NC_017861.1); Spiroplasma taiwanense; Streptococcus iniae (NCBI Ref: NC_021314.1); Belliella baltica (NCBI Ref: NC_018010.1); Psychroflexus torquis (NCBI Ref: NC_018721.1); Streptococcus thermophilus (NCBI Ref: YP_820832.1); Listeria innocua (NCBI Ref: NP_472073.1); Campylobacter jejuni (NCBI Ref: YP_002344900.1); NCBI Ref: YP_002342100.1.
  • Cas-based genome editing system refers to a system comprising any naturally occurring or variant Cas endonuclease (e.g., Cas9), or functional variant, homolog, or orthologue thereof, and a cognate guide RNA.
  • the term “Cas-based genome editing system” may also refer to an expression vector having at least one expressible nucleotide sequence encoding a Cas protein (or homolog, variant, or orthologue thereof) and at least one other expressible nucleotide sequence encoding a guide RNA.
  • DSB-based genome editing system refers to a system comprising any naturally occurring or variant endonuclease which catalyzes the formation of a double strand break at a cut site (e.g., Cas9, Cpf1, TALEN, or Zinc Finger), or functional variant, homolog, or orthologue thereof, and a cognate guide RNA if required (e.g., TALENs and Zinc Fingers do not require a guide RNA for targeting to a cut site).
  • DSB-based genome editing system may also refer to an expression vector having at least one expressible nucleotide sequence encoding a DSB endonuclease protein (or homolog, variant, or orthologue thereof) and at least one other expressible nucleotide sequence encoding a guide RNA, if required (e.g., as required for Cas9 or Cpf1).
  • an effective amount refers to an amount of a biologically active agent that is sufficient to elicit a desired biological response. For example, in some embodiments, an effective amount of a nuclease may refer to the amount of the nuclease that is sufficient to induce cleavage of a target site specifically bound and cleaved by the nuclease.
  • an effective amount of a recombinase may refer to the amount of the recombinase that is sufficient to induce recombination at a target site specifically bound and recombined by the recombinase.
  • an agent e.g., a nuclease, a recombinase, a hybrid protein, a fusion protein, a protein dimer, a complex of a protein (or protein dimer) and a polynucleotide, or a polynucleotide
  • linker refers to a chemical group or a molecule linking two molecules or moieties, e.g., a binding domain and a cleavage domain of a nuclease.
  • a linker joins a gRNA binding domain of an RNA-programmable nuclease and the catalytic domain of a recombinase.
  • a linker joins a dCas9 and a recombinase.
  • the linker is positioned between, or flanked by, two groups, molecules, or other moieties and connected to each one via a covalent bond, thus connecting the two.
  • the linker is an amino acid or a plurality of amino acids (e.g., a peptide or protein).
  • the linker is an organic molecule, group, polymer, or chemical moiety.
  • the linker is 5-100 amino acids in length, for example, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 30-35, 35-40, 40-45, 45-50, 50-60, 60-70, 70-80, 80-90, 90-100, 100-150, or 150-200 amino acids in length.
  • Longer or shorter linkers are also contemplated.
  • mutation refers to a substitution of a residue within a sequence, e.g., a nucleic acid or amino acid sequence, with another residue, or a deletion or insertion of one or more residues within a sequence. Mutations are typically described herein by identifying the original residue followed by the position of the residue within the sequence and by the identity of the newly substituted residue. Various methods for making the amino acid substitutions (mutations) provided herein are well known in the art, and are provided by, for example, Green and Sambrook, Molecular Cloning: A Laboratory Manual (4th ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. (2012)). Mutations can include a variety of categories, such as single base polymorphisms, microduplication regions, indels, and inversions; this list is not meant to be limiting in any way.
  • nucleic acid and “nucleic acid molecule,” as used herein, refer to a compound comprising a nucleobase and an acidic moiety, e.g., a nucleoside, a nucleotide, or a polymer of nucleotides.
  • polymeric nucleic acids (e.g., nucleic acid molecules comprising three or more nucleotides) are linear molecules, in which adjacent nucleotides are linked to each other via a phosphodiester linkage.
  • “nucleic acid” refers to individual nucleic acid residues (e.g., nucleotides and/or nucleosides).
  • nucleic acid refers to an oligonucleotide chain comprising three or more individual nucleotide residues.
  • oligonucleotide and polynucleotide can be used interchangeably to refer to a polymer of nucleotides (e.g., a string of at least three nucleotides).
  • nucleic acid encompasses RNA as well as single and/or double- stranded DNA.
  • Nucleic acids may be naturally occurring, for example, in the context of a genome, a transcript, an mRNA, tRNA, rRNA, siRNA, snRNA, a plasmid, cosmid, chromosome, chromatid, or other naturally occurring nucleic acid molecule.
  • a nucleic acid molecule may be a non-naturally occurring molecule, e.g., a recombinant DNA or RNA, an artificial chromosome, an engineered genome, or fragment thereof, or a synthetic DNA, RNA, DNA/RNA hybrid, or including non-naturally occurring nucleotides or nucleosides.
  • nucleic acid, “DNA,” “RNA,” and/or similar terms include nucleic acid analogs, e.g., analogs having other than a phosphodiester backbone.
  • Nucleic acids can be purified from natural sources, produced using recombinant expression systems and optionally purified, chemically synthesized, etc. Where appropriate, e.g., in the case of chemically synthesized molecules, nucleic acids can comprise nucleoside analogs such as analogs having chemically modified bases or sugars, and backbone modifications. A nucleic acid sequence is presented in the 5' to 3' direction unless otherwise indicated.
  • a nucleic acid is or comprises natural nucleosides (e.g.
  • nucleoside analogs e.g., 2-aminoadenosine, 2-thiothymidine, inosine, pyrrolo-pyrimidine, 3-methyl adenosine, 5-methylcytidine, 2-aminoadenosine, C5-bromouridine, C5-fluorouridine, C5-iodouridine, C5-propynyl-uridine, C5-propynyl-cytidine, C5-methylcytidine, 2-aminoadenosine, 7-deazaadenosine, 7-deazaguanosine, 8-oxoadenosine, 8-oxoguanosine, O(6)-methylguanine, and 2-thiocy
  • protein refers to a polymer of amino acid residues linked together by peptide (amide) bonds.
  • the terms refer to a protein, peptide, or polypeptide of any size, structure, or function. Typically, a protein, peptide, or polypeptide will be at least three amino acids long.
  • a protein, peptide, or polypeptide may refer to an individual protein or a collection of proteins.
  • One or more of the amino acids in a protein, peptide, or polypeptide may be modified, for example, by the addition of a chemical entity such as a carbohydrate group, a hydroxyl group, a phosphate group, a farnesyl group, an isofarnesyl group, a fatty acid group, a linker for conjugation, functionalization, or other modification, etc.
  • a protein, peptide, or polypeptide may also be a single molecule or may be a multi-molecular complex.
  • a protein, peptide, or polypeptide may be just a fragment of a naturally occurring protein or peptide.
  • a protein, peptide, or polypeptide may be naturally occurring, recombinant, or synthetic, or any combination thereof.
  • fusion protein refers to a hybrid polypeptide which comprises protein domains from at least two different proteins.
  • One protein may be located at the amino-terminal (N-terminal) portion of the fusion protein or at the carboxy-terminal (C-terminal) portion, thus forming an “amino-terminal fusion protein” or a “carboxy-terminal fusion protein,” respectively.
  • a protein may comprise different domains, for example, a nucleic acid binding domain (e.g., the gRNA binding domain of Cas9 that directs the binding of the protein to a target site) and a nucleic acid cleavage domain or a catalytic domain of a recombinase.
  • a protein comprises a proteinaceous part, e.g., an amino acid sequence constituting a nucleic acid binding domain, and an organic compound, e.g., a compound that can act as a nucleic acid cleavage agent.
  • a protein is in a complex with, or is in association with, a nucleic acid, e.g., RNA.
  • Any of the proteins provided herein may be produced by any method known in the art.
  • the proteins provided herein may be produced via recombinant protein expression and purification, which is especially suited for fusion proteins comprising a peptide linker.
  • “RNA-programmable nuclease” and “RNA-guided nuclease” are used interchangeably herein and refer to a nuclease that forms a complex with (e.g., binds or associates with) one or more RNAs that are not a target for cleavage.
  • an RNA-programmable nuclease, when in a complex with an RNA, may be referred to as a nuclease:RNA complex.
  • the bound RNA(s) is referred to as a guide RNA (gRNA).
  • gRNAs can exist as a complex of two or more RNAs, or as a single RNA molecule.
  • gRNAs that exist as a single RNA molecule may be referred to as single-guide RNAs (sgRNAs), though “gRNA” is used interchangeably to refer to guide RNAs that exist as either single molecules or as a complex of two or more molecules.
  • gRNAs that exist as single RNA species comprise two domains: (1) a domain that shares homology to a target nucleic acid (and directs binding of a Cas9 complex to the target); and (2) a domain that binds a Cas9 protein.
  • domain (2) corresponds to a sequence known as a tracrRNA, and comprises a stem-loop structure.
  • domain (2) is homologous to a tracrRNA as depicted in Figure 1E of Jinek et al., Science 337:816-821 (2012), the entire contents of which are incorporated herein by reference.
  • gRNAs e.g., those including domain 2
  • a gRNA comprises two or more of domains (1) and (2), and may be referred to as an “extended gRNA.”
  • an extended gRNA will, e.g., bind two or more Cas9 proteins and bind a target nucleic acid at two or more distinct regions, as described herein.
  • the gRNA comprises a nucleotide sequence that complements a target site, which mediates binding of the nuclease/RNA complex to said target site, providing the sequence specificity of the nuclease:RNA complex.
  • the RNA-programmable nuclease is the (CRISPR-associated system) Cas9 endonuclease, for example, Cas9 (Csn1) from Streptococcus pyogenes (see, e.g., “Complete genome sequence of an M1 strain of Streptococcus pyogenes.” Ferretti J.J., McShan W.M., Ajdic D.J., Savic D.J., Savic G., Lyon K., Primeaux C., Sezate S., Suvorov A.N., Kenton S., Lai H.S., Lin S.P., Qian Y., Jia H.G., Najar F.Z., Ren Q., Zhu H., Song L., White J., Yuan X., Clifton S.W., Roe B.A., McLaughlin R.E., Proc.
  • Cas9 endonuclease for example Cas
  • RNA-programmable nucleases e.g., Cas9
  • Cas9 RNA:DNA hybridization to target DNA cleavage sites
  • these proteins are able to be targeted, in principle, to any sequence specified by the guide RNA.
  • Methods of using RNA-programmable nucleases, such as Cas9, for site-specific cleavage (e.g., to modify a genome) are known in the art (see e.g., Cong, L. et al. Multiplex genome engineering using CRISPR/Cas systems. Science 339, 819-823 (2013); Mali, P. et al. RNA-guided human genome engineering via Cas9. Science 339, 823-826 (2013);
  • the term “subject,” as used herein, refers to an individual organism, for example, an individual mammal.
  • the subject is a human.
  • the subject is a non-human mammal.
  • the subject is a non-human primate.
  • the subject is a rodent.
  • the subject is a sheep, a goat, a cow, a cat, or a dog.
  • the subject is a vertebrate, an amphibian, a reptile, a fish, an insect, a fly, or a nematode.
  • the subject is a research animal.
  • the subject is genetically engineered, e.g., a genetically engineered non-human subject. The subject may be of either sex and at any stage of
  • treatment refers to a clinical intervention aimed to reverse, alleviate, delay the onset of, or inhibit the progress of a disease or disorder, or one or more symptoms thereof, as described herein.
  • the terms “treatment,” “treat,” and “treating” refer to a clinical intervention aimed to reverse, alleviate, delay the onset of, or inhibit the progress of a disease or disorder, or one or more symptoms thereof, as described herein.
  • treatment may be administered after one or more symptoms have developed and/or after a disease has been diagnosed. In other embodiments, treatment may be administered in the absence of symptoms, e.g., to prevent or delay onset of a symptom or inhibit onset or progression of a disease.
  • treatment may be administered to a susceptible individual prior to the onset of symptoms (e.g., in light of a history of symptoms and/or in light of genetic or other susceptibility factors). Treatment may also be continued after symptoms have resolved, for example, to prevent or delay their recurrence.
  • major research efforts focus on improving the efficiency and specificity of genome editing systems, such as CRISPR/Cas9, other Cas-based, TALEN-based, and Zinc Finger-based genome editing systems.
  • efficiency may be improved by predicting optimal Cas9 guide RNA (gRNA) sequences, while specificity may be improved by modeling factors leading to off-target cutting, and by manipulating Cas9 enzymes.
  • Variant Cas9 enzymes and fusion proteins may be developed to alter the protospacer adjacent motif (PAM) sequences acted on by Cas9, and to produce base-editing Cas9 constructs with high efficiency and specificity.
  • PAM protospacer adjacent motif
  • Cpf1, also known as Cas12a
  • nucleotide insertions and/or deletions resulting from template-free repair mechanisms are commonly thought to be random and therefore only suitable for gene knock-out applications.
  • a template-based repair mechanism such as HDR is typically used.
  • CRISPR/Cas with HDR allows arbitrarily designed DNA sequences to be incorporated at precise genomic locations.
  • this technique suffers from low efficiency: HDR occurs rarely in typical biological conditions (e.g., at around 10% frequency) because cells only permit HDR to occur after sister chromatids are synthesized in S phase but before M phase, when mitosis splits the sister chromatids into daughter cells.
  • the fraction of time spent in S-G2-M phases of a cell cycle is low.
  • HDR occurs infrequently, and therefore a desired DNA sequence will be incorporated into only a small percentage of cells.
  • in post-mitotic cells, the HDR repair pathway is no longer used, further limiting HDR’s utility for genetic engineering.
  • NHEJ is capable of occurring during any phase of a cell cycle and in post-mitotic cells.
  • NHEJ has been perceived as a random process that produces a large variety of repair genotypes with insertions and/or deletions, and has been used mainly to knock out genes. In short, NHEJ is efficient but unpredictable.
  • the present inventors have unexpectedly found through computational analyses that template-free DNA/genome editing systems, e.g., CRISPR/Cas9, Cas-based, Cpf1-based, or other DSB (double-strand break)-based genome editing systems, produce a predictable set of repair genotypes, thereby enabling the use of such editing systems for applications involving or requiring precise manipulation of DNA, e.g., the correction of a disease-causing genetic mutation or modifying a wildtype sequence to confer a genetic advantage.
  • This finding is contrary to the accepted view that DNA double-strand break repair (i.e., template-free, non-homology-dependent repair) following cleavage by genome editing systems produces stochastic and heterogeneous repair products and is therefore impractical for applications beyond gene disruption.
  • the specification describes and discloses in various aspects and embodiments computational-based methods and systems for practically harnessing the innate efficiencies of template-free DNA repair systems for carrying out precise DNA and/or genomic editing without the reliance upon homology-based repair.
  • techniques are provided for predicting genotypes of CRISPR/Cas editing outcomes. For instance, a high-throughput approach may be used for monitoring CRISPR/Cas cutting outcomes, and/or a computer-implemented method may be used to predict genotypic repair outcomes for NHEJ and/or MMEJ. The inventors have recognized and appreciated that accurate prediction of repair genotypes may allow development of CRISPR/Cas gene knock-in or gain-of-function applications based on one or more template-free repair mechanisms. This approach may simplify a genome editing process by reducing or eliminating a need to introduce exogenous DNA into a cell as a template.
  • template-free repair mechanisms for gene knock-in may provide improved efficiency.
  • the inventors have recognized and appreciated that NHEJ and MMEJ may account for a large portion of CRISPR/Cas repair products. While template-free repair mechanisms may not always produce desired repair genotypes with sufficiently high frequencies, one or more desired repair genotypes may occur with sufficiently high frequencies in some specific local sequence contexts. For such a local sequence context, template-free repair mechanisms may outperform HDR with respect to simplicity and efficiency.
  • one or more of the techniques provided herein may be used to predict, for a given local sequence context, template-free repair genotypes and frequencies of occurrence thereof, which may facilitate designs of gene knock-in or gain-of-function applications.
  • the inventors have recognized and appreciated that some disease-causing alleles, when cut at a selected location by CRISPR/Cas, may exhibit one or just a few repair outcomes that occur at a high frequency and transform the disease-causing allele into one or more healthy alleles.
  • Disease-causing alleles may occur in genomic sequences that code for proteins or regulatory RNAs, or genomic sequences that regulate transcription or other genomic functions.
  • one or more of the techniques provided herein may be used to predict, for a given local sequence context, template-free repair genotypes and frequencies of occurrence thereof, which may be used to select one or more desirable guide RNAs when HDR is employed to edit DNA. Since HDR does not occur 100% of the time, the template-free repair genotypes predicted by this method will be a natural byproduct at sites where HDR failed to occur. The one or more techniques provided herein allow these failed-HDR byproducts to be predicted, and one or more guide RNAs to be chosen that will produce the most desirable byproducts when HDR fails.
  • a disease-causing allele may be targeted for HDR repair, but if HDR does not occur at a specific site, the template-free repair products can be chosen to transform the disease-causing allele into one or more healthy alleles or to have no deleterious effects. Deleterious effects could result from template-free repair that changes a weakly functional allele into a non-functional allele or into a dominant allele that negatively impacts health.
  • guide RNA selection consists of considering all guide RNAs that are compatible with HDR repair of a disease-causing allele, and for each guide RNA using one or more of the techniques provided herein to predict its template-free repair genotypes.
  • One or more guide RNAs are then selected for use with the HDR template that have the template-free repair genotypes that are most advantageous for health. Alternatively, in some embodiments, one or more guide RNAs are then selected for use with the HDR template that have the template-free repair genotypes that are most likely to disrupt gene function.
  • the techniques disclosed herein may be implemented in any of numerous ways, as the disclosed techniques are not limited to any particular manner of implementation. Examples of details of implementation are provided solely for illustrative purposes. For instance, while examples are given where CRISPR/Cas9 is used to perform genome editing, it should be appreciated that aspects of the present application are not so limited. In some embodiments, another genome editing technique, such as CRISPR/Cpf1, may be used.
  • the disclosed techniques may be used individually or in any suitable combination, as aspects of the present disclosure are not limited to the use of any particular technique or combination of techniques.
  • FIG. 1 shows an illustrative DNA segment 100, in accordance with some embodiments.
  • the DNA segment 100 may be exon 43 of a dystrophin gene.
  • Some cases of Duchenne muscular dystrophy are caused by mutations in this exon.
  • Therapeutic solutions showing success in clinical trials use antisense oligonucleotides to cause this exon to be skipped during pre-mRNA splicing, thereby restoring normal dystrophin function.
  • CRISPR/Cas9 or another suitable technique for cutting a DNA sequence, such as CRISPR/Cpf1
  • CRISPR/Cas9 may be used to disrupt a donor splice site motif of dystrophin exon 43, and one or more template-free repair mechanisms may restore normal dystrophin function.
  • the specification discloses a computational model.
  • the computational model can predict and/or compute an optimized or preferred cut site for a DSB-based genome editing system for introducing a genetic change into a nucleotide sequence.
  • the repair does not require homology-based repair mechanisms.
  • the computational model can predict and/or compute an optimized or preferred cut site for a Cas-based genome editing system for introducing a genetic change into a nucleotide sequence.
  • the repair does not require homology-based repair mechanisms.
  • the computational model provides for the selection of an optimized or preferred guide RNA for use with a Cas-based genome editing system for introducing a genetic change in a genome.
  • the repair does not require homology-based repair mechanisms.
  • the computational model is a neural network model having one or more hidden layers.
  • the computational model is a deep learning computational model.
  • DSB-based genome editing system e.g., a Cas-based genome editing system
  • the computational model is trained with experimental data to predict the probability distribution of indel lengths for any given nucleotide sequence and cut site. In other embodiments, the computational model is trained with experimental data to predict the probability distribution of genotype frequencies for any given nucleotide sequence and cut site.
  • the computational model comprises one or more training modules for evaluating experimental data.
  • the computational model comprises: a first training module (305) for computing a microhomology score matrix; a second training module (310) for computing a microhomology-independent score matrix; and a third training module (315) for computing a probability distribution over 1-bp insertions, wherein, once trained with experimental data, the computational model computes a probability distribution over indel genotypes and a probability distribution over indel lengths for any given input nucleotide sequence and cut site.
  • the computational model predicts genomic repair outcomes for any given input nucleotide sequence (i.e., context sequence) and cut site.
  • the genomic repair outcomes comprise microhomology deletions, microhomology-less deletions, and 1-bp insertions.
  • each of the one or more modules comprises one or more input features selected from the group consisting of: a target site nucleotide sequence; a cut site; a PAM sequence; microhomology lengths relative to a cut site; % GC content at a cut site; and microhomology deletion lengths at a cut site.
  • the nucleotide sequence analyzed by the computational model is about 25-100, 50-200, 100-400, 200-800, 400-1600, 800-3200, or 1600-6400 nucleotides long, or longer.
  • the computational model concerns predicting genetic repair outcomes at double-strand breaks induced by any DSB-based genome editing system (e.g., CRISPR/Cas9, Cas-based, Cpf1-based, or others).
  • FIG. 1 depicts the anatomy of a double-strand break. In the example shown in FIG. 1, the DNA segment 100 includes a top strand 105A and a bottom strand 105B. These two strands are complementary and therefore encode the same information.
  • CRISPR/Cas9 may be used to create a double-strand cut at a selected donor splice site motif, which may be a specific sequence of 6-10 nucleotides.
  • an NGG PAM may be used, as underlined and shown at 115, so that a cut site 110 would occur within the selected donor splice site motif.
  • Any suitable algorithm may be used to detect presence or absence of the splice site motif in repair products, thereby verifying if the splice site motif has been successfully eliminated.
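The cut-site placement described above can be sketched in Python. This is a minimal sketch, assuming SpCas9 with a 20-nt protospacer, which cleaves 3 bp 5’ of an NGG PAM; only top-strand PAMs are scanned, and the function names and the 6-nt motif "GTAAGT" used in the usage example are hypothetical illustrations, not sequences from the source. A cut at position c lies between seq[c - 1] and seq[c].

```python
def cas9_cut_sites(seq: str, protospacer_len: int = 20) -> list[int]:
    """Cut positions for every top-strand NGG PAM with a full protospacer 5' of it.

    A returned value c means the cut lies between seq[c - 1] and seq[c].
    """
    sites = []
    for p in range(protospacer_len, len(seq) - 2):
        if seq[p + 1:p + 3] == "GG":  # N at p, GG at p + 1 and p + 2
            sites.append(p - 3)       # SpCas9 cuts 3 bp 5' of the PAM
    return sites


def cuts_within_motif(seq: str, motif: str) -> list[int]:
    """Cut sites that fall strictly inside some occurrence of `motif`."""
    sites = cas9_cut_sites(seq)
    hits = set()
    start = seq.find(motif)
    while start != -1:
        end = start + len(motif)
        hits.update(c for c in sites if start < c < end)
        start = seq.find(motif, start + 1)
    return sorted(hits)


def motif_eliminated(product: str, motif: str) -> bool:
    """True if a repair product no longer contains the motif."""
    return motif not in product
```

For example, in "A" * 15 + "GTAAGT" + "TGG" the only PAM yields a cut at position 18, which lies inside the motif, so that guide would be a candidate for disrupting it.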
  • FIGs. 2A-D show an illustrative matching of 3’ ends of top and bottom strands of a DNA segment at a cut site and an illustrative repair product, in accordance with some embodiments.
  • the strands may be the illustrative top strand 105A and the illustrative bottom strand 105B of FIG. 1.
  • the cut site may be the illustrative cut site 110 of FIG. 1. (To avoid clutter, the surrounding sequence context is omitted in FIGs. 2B-D.)
  • a segment of double-stranded DNA may be represented such that the top strand runs 5’ on the left to 3’ on the right.
  • nucleotides and their complementary base-paired nucleotides that lie between the 5’ end of the top strand and the cut site may be said to be located at the 5’ side of the cut site.
  • nucleotides and their complementary base-paired nucleotides that lie between the cut site and the 3’ end of the top strand may be said to be located at the 3’ side of the cut site.
  • a deletion length of 5 base pairs is considered, for example, as a result of 5’ end resection, where the top strand 105A has an overhang 200A of length 5 at the 5’ side of the cut site 110, and the bottom strand 105B has an overhang 200B of length 5 at the 3’ side of the cut site 110.
  • in the example of FIG. 2B, there is no match between the overhangs 200A and 200B in the first three bases, but there is a match in each of the last two bases.
  • a microhomology 205 is present, with a 2 base pair match.
  • FIG. 2C shows an illustrative result of flap removal, where the three mismatched bases in the overhang 200B are removed.
  • some or all nucleotides on the 3’ side of the microhomology on the top strand, and/or some or all nucleotides on the 3’ side of the microhomology on the bottom strand may be resected.
  • nucleotides to the right of the microhomology on the top strand may be resected, and nucleotides to the left of the microhomology on the bottom strand may be resected.
  • FIG. 2D shows an illustrative repair product resulting from polymerase fill-in and ligation, where three matching bases are added to the overhang 200B.
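The net effect of the annealing, flap removal, and fill-in steps of FIGs. 2A-D on the top-strand sequence can be sketched as follows. This assumes the 5-bp windows ACAAG and GGTAG from the left[5]/right[5] example given later in this description, padded with hypothetical flanking bases TT and CC; the function name is hypothetical, and strand-level resection is not modeled, only the resulting sequence in which a single copy of the microhomology is kept.

```python
def mmej_product(seq: str, cut: int, del_len: int, mh_start: int, mh_end: int) -> str:
    """Top-strand repair product of a microhomology-mediated deletion (sketch).

    The cut lies between seq[cut - 1] and seq[cut]. The windows flanking the cut
    are left = seq[cut - del_len:cut] and right = seq[cut:cut + del_len], and
    [mh_start, mh_end) is a run of offsets at which they agree. Keeping one copy
    of the microhomology is equivalent to deleting del_len nucleotides that start
    at the left-hand copy and end just before the right-hand copy.
    """
    left_start = cut - del_len
    left_mh = seq[left_start + mh_start:left_start + mh_end]
    right_mh = seq[cut + mh_start:cut + mh_end]
    if not left_mh or left_mh != right_mh:
        raise ValueError("the given offsets do not form a microhomology")
    return seq[:left_start + mh_start] + seq[cut + mh_start:]
```

With seq = "TTACAAGGGTAGCC" and the cut after index 6, the 2-bp microhomology AG at offsets 3-5 gives "TTACAAGCC", a 5-bp deletion, consistent with the two matching bases described for FIG. 2B.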
  • FIG. 3A shows an illustrative machine learning model 300, in accordance with some embodiments.
  • the machine learning model 300 may be trained using experimental data to compute, given an input DNA sequence seq and a cut site location, a probability distribution over any suitable set of deletion and/or insertion genotypes, and/or a probability distribution over any suitable set of deletion and/or insertion lengths. For instance, in some embodiments, 1 base pair insertions and 1-60 base pair deletions may be considered. (These repair outcomes may also be referred to herein as +1 to -60 indels.) The inventors have observed empirically that indels outside of this range occur infrequently. However, it should be appreciated that aspects of the present disclosure are not limited to any particular set of repair outcomes.
  • only insertions e.g., 1-2 base pair insertions
  • only deletions e.g., 1-28 base pair deletions
  • the inventors have recognized and appreciated that accurate predictions of repair outcomes may be facilitated by separating the repair outcomes into three classes: microhomology (MH) deletions, microhomology-less (MH-less) deletions, and insertions.
  • the inventors have further recognized and appreciated that different machine learning techniques may be applied to the different classes of repair outcomes.
  • the machine learning model 300 includes three modules: the MH deletion module 305, the MH-less deletion module 310, and the insertion module 315. As discussed below, these modules may compute scores for various indel genotypes and/or indel lengths, which may in turn be used to compute a probability distribution over indel genotypes and/or a probability distribution over indel lengths.
  • one or more modules may be trained jointly.
  • a module may be dependent upon one or more other modules.
  • an input feature used in the insertion module 315 may be derived based on outputs of the MH deletion module 305 and/or the MH-less deletion module 310.
  • MH deletions may include deletions that are derivable analytically by simulating MMEJ. For instance, all microhomologies may be identified for deletion lengths of interest (e.g., deletion lengths 1-60).
  • a genotypic outcome may be derived for each such microhomology by simulating polymerase fill-in, for example, as discussed in connection with FIGs. 2A-2D. (The inventors have recognized and appreciated that there is a one-to-one correspondence between the microhomologies and the genotypic outcomes.) A deletion that is derivable in this manner may be classified as a MH deletion, whereas a deletion that is not derivable in this manner may be classified as a MH-less deletion.
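Because two deletions of equal length give the same product exactly when every position between their start offsets matches, whether a given deletion is MMEJ-derivable can be checked locally. The sketch below is a hypothetical helper, assuming the deletion spans the cut site and enough flanking sequence is available; it is one way to apply the MH/MH-less split, not the classification procedure of the source.

```python
def is_mh_deletion(seq: str, cut: int, del_start: int, del_len: int) -> bool:
    """True if deleting seq[del_start:del_start + del_len] gives an MMEJ-derivable genotype.

    Writing del_start = cut - del_len + k with 0 <= k <= del_len, offset k
    reproduces a microhomology genotype iff it touches a matching position:
    left[k] == right[k] (k < del_len) or left[k - 1] == right[k - 1] (k > 0).
    """
    k = del_start - (cut - del_len)
    if not 0 <= k <= del_len:
        raise ValueError("deletion must span the cut site")
    left = seq[cut - del_len:cut]
    right = seq[cut:cut + del_len]
    if len(left) != del_len or len(right) != del_len:
        raise ValueError("not enough sequence context around the cut")
    match_at_k = k < del_len and left[k] == right[k]
    match_before_k = k > 0 and left[k - 1] == right[k - 1]
    return match_at_k or match_before_k
```

In "TTACAAGGGTAGCC" with the cut after index 6, deleting GGTAG (start 7) or AGGGT (start 5) gives the microhomology product TTACAAGCC, while deleting ACAAG (start 2) is a MH-less deletion.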
  • an input DNA sequence seq may be represented as a vector with integer indices, where each element of the vector is a nucleotide from the set {A, C, G, T}, the cut site is between seq[-1] and seq[0], and seq is oriented 5’ on the left to 3’ on the right.
  • a subsequence seq[i:j], i < j, may be a vector of length j - i, including elements seq[i] to seq[j - 1].
  • L of interest e.g., L between 1 and 60
  • left[L] may be used to denote seq[-L:0]
  • right[L] may be used to denote seq[0:L].
  • left[5] may be ACAAG
  • right[5] may be GGTAG.
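Python’s own slicing cannot express the cut-relative notation directly (for example, "ACGT"[-2:0] is the empty string), so a small wrapper that shifts indices by the cut position is one way to implement left[L] and right[L]. The class name and the flanking bases TT and CC around the ACAAG/GGTAG example are hypothetical.

```python
class CutContext:
    """A DNA sequence indexed relative to a cut site (hypothetical helper).

    Position -1 is the nucleotide immediately 5' of the cut and position 0 the
    nucleotide immediately 3' of it, matching the seq[-1] | seq[0] convention.
    """

    def __init__(self, seq: str, cut: int):
        if not 0 < cut < len(seq):
            raise ValueError("cut must fall strictly inside seq")
        self.seq = seq
        self.cut = cut  # the cut lies between self.seq[cut - 1] and self.seq[cut]

    def sub(self, i: int, j: int) -> str:
        """Cut-relative subsequence seq[i:j] (i < j), of length j - i."""
        if not i < j:
            raise ValueError("require i < j")
        lo, hi = self.cut + i, self.cut + j
        if lo < 0 or hi > len(self.seq):
            raise IndexError("window extends past the available sequence context")
        return self.seq[lo:hi]

    def left(self, L: int) -> str:
        """left[L] = seq[-L:0], the L nucleotides 5' of the cut."""
        return self.sub(-L, 0)

    def right(self, L: int) -> str:
        """right[L] = seq[0:L], the L nucleotides 3' of the cut."""
        return self.sub(0, L)
```

With CutContext("TTACAAGGGTAGCC", 7), left(5) returns "ACAAG" and right(5) returns "GGTAG", reproducing the example above.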
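  • a microhomology may be a maximal run match[L][i:j] of positions at which left[L] and right[L] agree, i.e., positions i through j - 1 all match, while the positions immediately flanking the run (i - 1 and j, where they exist) do not.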
  • let G[n] denote the genotypic outcome corresponding to the microhomology n
  • let ML[n] denote the microhomology length of the microhomology n
  • let C[n] denote the GC fraction of the microhomology n
  • let DL[n] denote the deletion length of the microhomology n.
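The microhomology definition above, together with the features G[n], ML[n], C[n], and DL[n], can be sketched as a single enumeration. This is a minimal sketch under stated assumptions: the GC fraction is taken as the number of G or C nucleotides divided by the microhomology length, the genotype keeps one copy of the microhomology, and the names Microhomology and find_microhomologies are hypothetical. Collecting mh_len, gc_frac, and del_len across all N entries gives length-N feature vectors of the kind used to train the MH deletion module.

```python
from dataclasses import dataclass


@dataclass
class Microhomology:
    del_len: int    # DL[n]
    start: int      # i: first matching offset within left[L] / right[L]
    end: int        # j: one past the last matching offset
    mh_len: int     # ML[n] = j - i
    gc_frac: float  # C[n]: fraction of G/C in the microhomology
    genotype: str   # G[n]: repaired sequence for this microhomology


def find_microhomologies(seq: str, cut: int, max_del: int = 60) -> list[Microhomology]:
    """Enumerate maximal runs of matching positions for deletion lengths 1..max_del."""
    found = []
    for L in range(1, max_del + 1):
        if cut - L < 0 or cut + L > len(seq):
            break  # not enough context for this or any longer deletion
        left, right = seq[cut - L:cut], seq[cut:cut + L]
        match = [a == b for a, b in zip(left, right)]
        k = 0
        while k < L:
            if not match[k]:
                k += 1
                continue
            i = k
            while k < L and match[k]:
                k += 1
            j = k  # match[i:j] is a maximal run of matches
            mh = left[i:j]
            found.append(Microhomology(
                del_len=L, start=i, end=j, mh_len=j - i,
                gc_frac=sum(base in "GC" for base in mh) / (j - i),
                genotype=seq[:cut - L + i] + seq[cut + i:],
            ))
    return found
```

For "TTACAAGGGTAGCC" with the cut after index 6 and max_del=5, this finds microhomologies at (L, i, j) = (1, 0, 1), (2, 1, 2), and (5, 3, 5); the last is the 2-bp AG microhomology with GC fraction 0.5, and the first is a full microhomology.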
  • FIG. 3B shows an illustrative process 350 for building one or more machine learning models for predicting frequencies of deletion genotypes and/or deletion lengths, in accordance with some embodiments.
  • the process 350 may be used to build the illustrative MH deletion module 305 and/or the illustrative MH-less deletion module 310 in the example of FIG. 3A.
  • These modules may be used to compute, given an input DNA sequence seq and a cut site location, a probability distribution over any suitable set of deletion genotypes and/or a probability distribution over any suitable set of deletion lengths.
  • a probability distribution over deletion lengths from 1-60 may be computed.
  • an upper limit of deletion lengths may be determined based on availability of training data and/or any other one or more suitable considerations.
  • act 355 of the process 350 may include, for each deletion length L of interest (e.g., each deletion length between 1-60), aligning subsequences of length L on the 5’ and 3’ sides of a cut site in an input DNA sequence to identify one or more microhomologies, as discussed in connection with FIG. 3A. This may be performed for an input DNA sequence and a cut site for which repair genotype data from a CRISPR/Cas9 experiment is available.
  • one or more microhomologies identified at act 355 may be featurized. Any suitable one or more features may be used, as aspects of the present disclosure are not so limited. As one example, the inventors have recognized and appreciated that energetic stability of a microhomology may increase proportionately with a length of the microhomology. Accordingly, in some embodiments, a microhomology length j - i may be used as a feature for a microhomology match[L][i:j].
  • thermodynamic stability of a microhomology may depend on specific base pairings, and G-C pairings have three hydrogen bonds and therefore have higher thermodynamic stability than A-T pairings, which have two hydrogen bonds. Accordingly, in some embodiments, a GC fraction (i.e., the fraction of nucleotides in the microhomology that are G or C) may be used as a feature for a microhomology match[L][i:j].
  • a length N vector may be constructed for each feature (e.g., microhomology length, GC fraction, etc.), where N is the number of microhomologies identified at act 355 for a set of deletion lengths of interest (e.g., 1-60), as discussed in connection with FIG. 3A.
  • the inventors have recognized and appreciated that there is a one-to-one correspondence between microhomologies and genotypic outcomes that are classified as MH deletions. Therefore, feature vectors for microhomologies may be viewed as feature vectors for MH deletions.
  • acts 355 and 360 may be repeated for different input DNA sequences and/or cut sites for which repair genotype data from CRISPR/Cas9 experiments is available.
  • aspects of the present disclosure are not limited to any particular featurization technique.
  • two features may be used, such as microhomology length and GC fraction.
  • one feature may be used (e.g., microhomology length, GC fraction, or some other suitable feature), or more than two features may be used (e.g., three, four, five, etc.).
  • Examples of features that may be used for a microhomology match[L][i:j] within a deletion of length L include, but are not limited to, a position of the microhomology within the deletion, and a ratio between the length of the microhomology (i.e., j - i) and the deletion length L.
  • DNase deoxyribonuclease hypersensitivity
  • open vs. closed chromatin may be used as a feature. Any one or more of these features, and/or other features, may be used in addition to, or instead of, microhomology length and GC fraction.
  • explicit featurization may be reduced or eliminated by automatically learning data representations (e.g., using one or more deep learning techniques).
  • one or more machine learning models may be trained at act 365 to compute one or more target probability distributions.
  • a neural network model may be built for the illustrative MH deletion module 305 in the example of FIG. 3A.
  • This model may take as input a length N vector for each of one or more features, as constructed at act 360, and output a length N vector of MH scores, where N is the number of microhomologies identified at act 355 for a set of deletion lengths of interest (e.g., 1-60).
  • a neural network model may be built for the illustrative MH-less deletion module 310 in the example of FIG. 3A. This model may take as input a vector for each of one or more features, and output a vector of MH-less scores.
  • FIG. 4A shows an illustrative neural network 400A for computing MH scores, in accordance with some embodiments.
  • the neural network 400A may be used in the illustrative MH deletion module 305 in the example of FIG. 3A, and may be trained at act 365 of the illustrative process 350 shown in FIG. 3B.
  • the neural network 400A may have one input node for each microhomology feature being used. For instance, in the example shown in FIG. 4A, there are two input nodes, which are associated with microhomology length and GC fraction, respectively. Each input node may receive a length N vector, where N is the number of microhomologies identified for a set of deletion lengths of interest (e.g., 1-60), for example, as discussed in connection with act 355 in the example of FIG. 3B.
  • the neural network 400A may include one or more hidden layers, each having one or more nodes.
  • aspects of the present disclosure are not limited to the use of any particular number of hidden layers or any particular number of nodes in a hidden layer.
  • different hidden layers may have different numbers of nodes.
  • the neural network 400A may be fully connected. (To avoid clutter, the connections are not illustrated in FIG. 4A.) However, that is not required.
  • a dropout technique may be used, where a parameter p may be selected, and during training each node’s value is independently set to 0 with probability p. This may result in a neural network that is not fully connected.
  • a leaky rectified linear unit (ReLU) nonlinearity σ may be used in the neural network 400A.
  • ReLU leaky rectified linear unit
  • the neural network 400A may be parameterized by w[h] and b[h] for each hidden layer h.
  • these parameters may be initialized randomly, for example, from a spherical Gaussian distribution with some suitable center (e.g., 0) and some suitable variance (e.g., 0.1). These parameters may then be trained using repair genotype data collected from CRISPR/Cas9 experiments, for instance, as discussed below.
  • the neural network 400A may have one output node, producing a length N vector y_MH of scores, where N is the number of microhomologies identified for the set of deletion lengths of interest (e.g., 1-60). Thus, there may be one score for each identified microhomology.
  • the neural network 400A may operate independently for each microhomology, taking as input the length of that microhomology (from the first input node) and the GC fraction of that microhomology (from the second input node), transforming those two values into 16 values (at the first hidden layer), then transforming those 16 values into 16 other values (at the second hidden layer), and finally outputting a single value (at the output node).
  • parameters for the first hidden layer, w[1][i] and b[1][i], are vectors of length 2 for each node i from 1 to 16
  • parameters for the second hidden layer, w[2][i] and b[2][i], are vectors of length 16 for each node i from 1 to 16
  • parameters for the output layer, w[3][1] and b[3][1], are also vectors of length 16.
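A forward pass of a network shaped like the one described for FIG. 4A (two input features, two hidden layers of 16 nodes, one output node, leaky ReLU, Gaussian initialization with mean 0 and variance 0.1, optional dropout) can be sketched in plain Python. The leaky slope of 0.01, the use of one scalar bias per node (the common convention), and the class name MHScoreNet are assumptions; training is not shown.

```python
import math
import random


def leaky_relu(x: float, slope: float = 0.01) -> float:
    """Identity for positive inputs, a small linear slope otherwise."""
    return x if x > 0 else slope * x


class MHScoreNet:
    """Hypothetical sketch of a 2 -> 16 -> 16 -> 1 MH scoring network."""

    def __init__(self, sizes=(2, 16, 16, 1), variance=0.1, seed=0):
        rng = random.Random(seed)
        std = math.sqrt(variance)  # spherical Gaussian: mean 0, variance 0.1
        # weights[h][i]: incoming weight vector of node i in layer h + 1
        self.weights = [
            [[rng.gauss(0.0, std) for _ in range(n_in)] for _ in range(n_out)]
            for n_in, n_out in zip(sizes[:-1], sizes[1:])
        ]
        # biases[h][i]: one scalar bias per node (an assumed convention)
        self.biases = [[rng.gauss(0.0, std) for _ in range(n_out)] for n_out in sizes[1:]]

    def score_one(self, features, dropout_p=0.0, rng=None):
        """Raw score for one microhomology, e.g. features = (length, GC fraction)."""
        if dropout_p > 0.0 and rng is None:
            rng = random.Random()
        x = [float(f) for f in features]
        last = len(self.weights) - 1
        for h, (W, b) in enumerate(zip(self.weights, self.biases)):
            z = [sum(w * xi for w, xi in zip(row, x)) + bias for row, bias in zip(W, b)]
            if h == last:
                return z[0]  # linear output node
            x = [leaky_relu(v) for v in z]
            if dropout_p > 0.0:
                # training-time dropout: zero each hidden value with probability p
                x = [0.0 if rng.random() < dropout_p else v for v in x]
        raise ValueError("network has no layers")

    def score(self, mh_lengths, gc_fracs):
        """Length-N vector y_MH, computed independently for each microhomology."""
        return [self.score_one((ml, gc)) for ml, gc in zip(mh_lengths, gc_fracs)]
```

A framework with automatic differentiation would likely be used to train these parameters; the plain-Python version only makes the layer shapes and the per-microhomology independence explicit.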
  • the vector y_MH of raw scores may be converted into a vector φ_MH of MH scores.
  • an exponential linear model may be used to convert the raw scores into the MH scores. For instance, the following formula may be used:
  • n is an index for a microhomology (and thus a number between 1 and N), and DL[n] is the deletion length of the microhomology n.
  • 0.25 may be a hyperparameter value chosen to improve training speed by appropriate scaling.
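The conversion formula itself is not reproduced in this section. Assuming the exponential linear form φ_MH[n] = exp(y_MH[n] - 0.25 · DL[n]), which uses the deletion length DL[n] and the 0.25 hyperparameter named above, the conversion reduces to one line; treat this form as an assumption rather than the formula of the source. The function name is hypothetical.

```python
import math


def mh_scores_from_raw(raw_scores, del_lens, beta=0.25):
    """Assumed exponential-linear conversion: phi_MH[n] = exp(y_MH[n] - beta * DL[n]).

    beta plays the role of the 0.25 hyperparameter; longer deletions are
    down-weighted for equal raw scores.
    """
    if len(raw_scores) != len(del_lens):
        raise ValueError("one deletion length per raw score is required")
    return [math.exp(y - beta * dl) for y, dl in zip(raw_scores, del_lens)]
```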
  • the vector y_MH of raw scores may be used directly as MH scores.
  • FIG. 4B shows an illustrative neural network 400B for computing MH-less scores, in accordance with some embodiments.
  • the neural network 400B may be used in the illustrative MH-less deletion module 310 in the example of FIG. 3A, and may be trained at act 365 of the illustrative process 350 shown in FIG. 3B.
  • deletion length may be modeled explicitly as an input to the neural network 400B.
  • an input node of the neural network 400B may receive a deletion length vector, [1, 2, ..., 60].
  • the neural network 400B may include one or more hidden layers, each having one or more nodes.
  • the neural network 400B has two hidden layers that are constructed similarly to those of the illustrative neural network 400A in the example of FIG. 4A.
  • aspects of the present disclosure are not limited to the use of a similar construction between the neural network 400A and the neural network 400B.
  • the neural network 400B may have an output node producing a vector y_MH-less of scores. There may be one score for each deletion length L of interest. Thus, in an example where the set of deletion lengths of interest is 1-60, the length of the vector y_MH-less may be 60.
  • an exponential linear model may be used to convert the vector y_MH-less into a vector φ_MH-less of MH-less scores. For instance, the following formula may be used:
  • L is a deletion length of interest.
  • aspects of the present disclosure are not limited to the use of any particular hyperparameter value for exponential conversion, or any conversion at all.
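The MH-less path can be sketched in a similar way. The network receives only the deletion length L, is evaluated once for each L in [1, 2, ..., 60], and the raw outputs are converted with an assumed form φ_MH-less[L] = exp(y_MH-less[L] - 0.25 · L), since the formula itself is not preserved here. Sigmoid hidden units, a linear output, one scalar bias per node, and zero-centered Gaussian initialization with variance 0.1 are choices assumed for this sketch; all names are illustrative.

```python
import math
import random

def sigmoid(v):
    """Numerically stable logistic function (assumed hidden-layer activation)."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    z = math.exp(v)
    return z / (1.0 + z)

def init_mhless_network(seed=0, sd=math.sqrt(0.1)):
    """Hypothetical 1 -> 16 -> 16 -> 1 layout; only the deletion length is an input."""
    rng = random.Random(seed)
    shapes = [(1, 16), (16, 16), (16, 1)]
    return [([[rng.gauss(0.0, sd) for _ in range(n_in)] for _ in range(n_out)],
             [rng.gauss(0.0, sd) for _ in range(n_out)]) for n_in, n_out in shapes]

def forward(params, inputs):
    """Dense layers with sigmoid hidden units and a linear output node."""
    activations = list(inputs)
    for depth, (weights, biases) in enumerate(params):
        pre = [sum(w * a for w, a in zip(row, activations)) + b
               for row, b in zip(weights, biases)]
        activations = pre if depth == len(params) - 1 else [sigmoid(p) for p in pre]
    return activations[0]

def mhless_scores(params, max_len=60, scale=0.25):
    """Return phi_MH-less[L] = exp(y_MH-less[L] - scale * L) for L = 1..max_len."""
    lengths = range(1, max_len + 1)   # the deletion length vector [1, 2, ..., max_len]
    raw = [forward(params, [float(L)]) for L in lengths]
    return [math.exp(y - scale * L) for y, L in zip(raw, lengths)]

phi_mhless = mhless_scores(init_mhless_network(seed=0))
```

Setting `max_len=28` gives the smaller 1-28 deletion set mentioned later for data-limited settings.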
  • FIG. 4C shows an illustrative process 400C for training two neural networks jointly, in accordance with some embodiments.
  • the process 400C may be used to jointly train the illustrative neural networks 400A and 400B of FIGs. 4A-4B.
  • a microhomology n is said to be full if the length of the microhomology n is the same as the deletion length associated with the microhomology n.
  • a frequency may be predicted as follows, out of all MH deletion genotypes.
  • DL[m] denotes the deletion length of the microhomology m
  • for a full microhomology, only a single deletion genotype is possible for the entire deletion length.
  • the single genotype may be generated via different pathways, such as MMEJ and MH-less end-joining. Therefore, full microhomologies may be modeled as receiving contributions from both MH-dependent and MH-less mechanisms.
  • for a full microhomology n, a frequency may be predicted as follows, out of all MH deletion genotypes.
  • V_MHG is a probability distribution over all microhomologies identified for the set of deletion lengths of interest, and hence also a probability distribution over all MH deletions.
  • a frequency may be predicted as follows for the set of all deletions having the deletion length L, out of all deletions, taking into account contributions from MH-dependent and MH-less mechanisms.
  • DL[m] denotes the deletion length of the microhomology m
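The frequency formulas referenced above are missing from this text. One plausible reading, offered only as an assumption, is that each MH deletion genotype is weighted by its MH score, a full microhomology additionally receives the MH-less score for its deletion length, and V_DL[L] pools every MH score at length L with the MH-less score at L before normalizing. The sketch below implements that reading with hypothetical names and 1-based deletion lengths.

```python
def mh_genotype_distribution(phi_mh, phi_mhless, del_lens, is_full):
    """V_MHG over all MH deletion genotypes (assumed form).

    phi_mhless[L - 1] holds the MH-less score for deletion length L. A full
    microhomology is credited with both its MH score and that MH-less score.
    """
    weights = [phi + phi_mhless[dl - 1] if full else phi
               for phi, dl, full in zip(phi_mh, del_lens, is_full)]
    total = sum(weights)
    return [w / total for w in weights]

def deletion_length_distribution(phi_mh, phi_mhless, del_lens):
    """V_DL[L - 1]: all MH scores at length L plus the MH-less score at L, normalized."""
    unnormalized = list(phi_mhless)
    for phi, dl in zip(phi_mh, del_lens):
        unnormalized[dl - 1] += phi
    total = sum(unnormalized)
    return [u / total for u in unnormalized]
```

Under this reading the MH-less score at a given length is counted once in V_DL, whether or not a full microhomology exists at that length.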
  • the parameters w[h] and b[h] for each hidden layer h of the neural networks 400A and 400B may be trained using a gradient descent method with L2-loss:
  • V*_MHG is an observed probability distribution on MH deletion genotypes
  • V*_DL is an observed probability distribution on deletion lengths (e.g., based on repair genotype data collected from CRISPR/Cas9 experiments).
  • multiple instantiations of the neural networks 400A and 400B may be trained with different loss functions. For instance, in addition to, or instead of L2-loss, a squared Pearson correlation function may be used.
  • Loss = -(pearsonr(V_MHG, V*_MHG))^2 - (pearsonr(V_DL, V*_DL))^2
  • the function pearsonr(x, y) may be defined as follows for length-N vectors x and y, where x̄ and ȳ denote the averages of x and y, respectively.
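The L2-loss formula and the pearsonr definition did not survive extraction. The sketch below assumes the L2 loss is a sum of squared errors over both predicted distributions, and implements the squared-Pearson alternative with the standard Pearson correlation coefficient, r = Σ(x_i - x̄)(y_i - ȳ) / sqrt(Σ(x_i - x̄)² · Σ(y_i - ȳ)²). Gradients are not computed here; in practice an automatic-differentiation library would supply them. Names are illustrative.

```python
import math

def l2_loss(pred_mhg, obs_mhg, pred_dl, obs_dl):
    """Assumed L2 loss: squared errors summed over both distributions."""
    return (sum((p - o) ** 2 for p, o in zip(pred_mhg, obs_mhg))
            + sum((p - o) ** 2 for p, o in zip(pred_dl, obs_dl)))

def pearsonr(x, y):
    """Standard Pearson correlation of two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def pearson_loss(pred_mhg, obs_mhg, pred_dl, obs_dl):
    """Negative squared correlations: minimizing the loss maximizes agreement."""
    return -pearsonr(pred_mhg, obs_mhg) ** 2 - pearsonr(pred_dl, obs_dl) ** 2
```

The Pearson loss ignores overall scale and offset, which is one reason to train separate model instantiations under each loss, as described above.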
  • although neural networks are used in the examples shown in FIGs. 4A-4C, it should be appreciated that aspects of the present disclosure are not so limited. For instance, in some embodiments, one or more other types of machine learning techniques, such as linear regression, non-linear regression, random-forest regression, etc., may be used additionally or alternatively.
  • one or more neural networks that are different from the neural networks 400A and 400B may be used additionally or alternatively.
  • batch normalization may be performed at one or more hidden layers.
  • a frequency may be predicted as follows for the corresponding MH deletion genotype, out of all MH deletion genotypes.
  • the neural network 400A may be trained independently.
  • one or more other probability distributions may be predicted in addition to, or instead of, V_MHG and V_DL.
  • a frequency may be predicted as follows for the corresponding MH deletion genotype, out of all deletion genotypes (both MH and MH-less).
  • a frequency may be predicted as follows, out of all deletion genotypes (both MH and MH-less).
  • a frequency may be predicted as follows, out of all deletion genotypes (both MH and MH-less).
  • DL [n] denotes the deletion length of the microhomology n.
  • a frequency may be predicted as follows for the set of MH-less deletions having the deletion length L, out of all MH-less deletion genotypes.
  • a frequency may be predicted as follows for the set of MH-less deletions having the deletion length L, out of all deletion genotypes (both MH and MH-less).
  • Any one or more of the above predicted probability distributions may be used to train the neural networks 400A and 400B, with some suitable loss function.
  • FIG. 4D shows an illustrative implementation of the insertion module 315 shown in FIG. 3A, in accordance with some embodiments.
  • the insertion module 315 includes two models.
  • an insertion rate model 405 may be constructed to predict, given an input DNA sequence and a cut site, a frequency of 1 base pair insertions out of all +1 to -60 indels (i.e., 1 base pair insertions and 1-60 base pair deletions).
  • an insertion base pair model 410 may be constructed to predict frequencies of 1 base pair insertion genotypes (i.e., A, C, G, and T), again out of all +1 to -60 indels (i.e., 1 base pair insertions and 1-60 base pair deletions).
  • aspects of the present disclosure are not limited to any particular set of indels.
  • a smaller set of indels (e.g., 1 base pair insertions and 1-28 base pair deletions) may be considered, for instance, when less training data is available.
  • the insertion rate model 405 may have one or more input features, which may be encoded as an M-dimensional vector of values for some suitable M.
  • the insertion rate model 405 may have at least one output value.
  • a set of training data for the insertion rate model 405 may include a plurality of M-dimensional training vectors and respective output values. Given an M-dimensional query vector, a k-nearest neighbor (k-NN) algorithm with weighting by inverse distance may be used to compute a predicted output value for the query vector.
  • x is the query vector
  • d is a distance function for the M-dimensional vector space
  • x[1], ..., x[5] are the five closest training vectors
  • y[1], ..., y[5] are their respective output values
  • y is the predicted output value for the query vector x.
  • aspects of the present disclosure are not limited to the use of any particular k, or to the use of any k-NN algorithm.
  • any one or more of the following techniques, and/or Bayesian variants thereof may be used in addition to, or instead of k-NN: gradient-boosted regression, linear regression, nonlinear regression, multilayer perceptron, deep neural network, etc.
  • any suitable distance metric d may be used, such as Euclidean distance.
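The k-NN formula above was lost in extraction. A common inverse-distance-weighted form, assumed here, is y = Σ_i y[i] / d(x, x[i]) divided by Σ_i 1 / d(x, x[i]), summed over the k = 5 closest training vectors. The sketch uses Euclidean distance and returns the stored output directly when the query coincides with a training vector, a divide-by-zero guard added for this sketch rather than taken from the source.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def knn_predict(query, train_x, train_y, k=5, dist=euclidean):
    """Inverse-distance-weighted average of the k nearest training outputs."""
    ranked = sorted(zip(train_x, train_y), key=lambda pair: dist(query, pair[0]))
    weights = []
    for x_i, y_i in ranked[:k]:
        d = dist(query, x_i)
        if d == 0.0:
            return y_i          # exact match: avoid dividing by zero
        weights.append((1.0 / d, y_i))
    return sum(w * y for w, y in weights) / sum(w for w, _ in weights)
```

Sorting all training vectors is fine for small datasets; a spatial index would be the usual choice at larger scale.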
  • the insertion rate model 405 may have three input features: overall deletion score, precision score, and one or more cut site nucleotides. The overall deletion score may be computed based on outputs of the MH deletion module 305 and the MH-less deletion module 310 in the example of FIG. 3A, for instance, as follows.
  • log(0) may be used as the overall deletion score.
  • the precision score may be indicative of an amount of entropy in predicted frequencies of a suitable set of deletion lengths.
  • the inventors have recognized and appreciated that it may be desirable to calculate precision based on a large set of deletion lengths, but in some instances a smaller set (e.g., 1-28) may be used due to one or more constraints associated with available data.
  • a frequency may be predicted as follows for the set of all deletions having the deletion length L, out of all deletions, taking into account contributions from MH-dependent and MH-less mechanisms.
  • DL[m] denotes the deletion length of the microhomology m
  • the precision score may be computed as follows.
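The precision formula is not preserved here. Since the score is described as reflecting the entropy of the predicted deletion-length frequencies, one natural form, assumed for this sketch, is 1 minus the Shannon entropy normalized by its maximum, log(number of lengths): a distribution concentrated on a single length scores 1 and a uniform one scores 0. The function name is hypothetical.

```python
import math

def precision_score(dl_freqs):
    """Assumed form: 1 - (Shannon entropy / maximum possible entropy)."""
    if len(dl_freqs) < 2:
        raise ValueError("need at least two deletion lengths")
    total = sum(dl_freqs)
    probs = [f / total for f in dl_freqs]
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return 1.0 - entropy / math.log(len(probs))
```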
  • the one or more cut site nucleotides may include nucleotides on either side of the cut site (i.e., seq[-1] and seq[0]).
  • the cut site nucleotides are G and G, which are the third and fourth nucleotides to the left of the PAM sequence 115.
  • aspects of the present disclosure are not limited to the use of two cut site nucleotides as input features to the insertion rate model 405.
  • for instance, one cut site nucleotide (e.g., seq[-1], which may be the fourth nucleotide to the left of the PAM sequence) may be used.
  • as another example, more than two cut site nucleotides (e.g., seq[-2], seq[-1], and seq[0], which may be the fifth, fourth, and third nucleotides to the left of the PAM sequence, respectively) may be used.
  • one or more input features to the insertion rate model 405 may be encoded in some suitable manner.
  • the one or more cut site nucleotides may be one-hot encoded, for example, as follows.
  • encoded input features may be concatenated to form an input vector.
  • an input vector may have a length of 10: four for each of the two cut site nucleotides, one for the precision score, and one for the overall deletion score.
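To illustrate the 10-value input vector just described, the sketch below one-hot encodes the two bases flanking the cut and appends the precision and overall deletion scores. The A, C, G, T ordering of the one-hot positions and the 0-based `cut` convention (the cut falls between `seq[cut - 1]` and `seq[cut]`) are assumptions, and the function names are hypothetical.

```python
NUCLEOTIDES = "ACGT"

def one_hot(base):
    """Encode a nucleotide as a length-4 indicator vector in A, C, G, T order."""
    base = base.upper()
    if base not in NUCLEOTIDES:
        raise ValueError(f"unexpected nucleotide: {base!r}")
    return [1.0 if base == n else 0.0 for n in NUCLEOTIDES]

def insertion_features(seq, cut, precision, deletion_score):
    """Concatenate one-hot seq[-1], one-hot seq[0], precision, and deletion score.

    In the notation above, seq[-1] and seq[0] are the bases immediately left and
    right of the cut, i.e. seq[cut - 1] and seq[cut] in ordinary 0-based indexing.
    """
    return one_hot(seq[cut - 1]) + one_hot(seq[cut]) + [precision, deletion_score]
```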
  • training data for a certain input DNA sequence may be organized into a matrix X.
  • each column in the matrix (X[:, j]) may be normalized to mean 0 and variance 1, as follows.
  • values in a query vector may be normalized in a like fashion. For instance, a j-th value in a query vector x may be normalized as follows.
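Column-wise standardization of the training matrix, with the same training statistics reused for a query vector, might be sketched as follows. The normalization formulas themselves are missing, so population variance (dividing by the number of rows) is assumed, and a constant column, such as a one-hot position that never varies in a small dataset, is given a divisor of 1 so that it is only centered; both are choices made for this sketch.

```python
import math

def column_stats(matrix):
    """Per-column mean and standard deviation of a row-major training matrix."""
    n_rows = len(matrix)
    columns = list(zip(*matrix))
    means = [sum(col) / n_rows for col in columns]
    stds = []
    for mean, col in zip(means, columns):
        var = sum((v - mean) ** 2 for v in col) / n_rows
        stds.append(math.sqrt(var) or 1.0)   # constant column: center only
    return means, stds

def normalize_rows(matrix, means, stds):
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in matrix]

def normalize_query(query, means, stds):
    """Normalize a query vector with the training statistics, not its own."""
    return [(v - m) / s for v, m, s in zip(query, means, stds)]
```

Reusing the training statistics keeps query vectors on the same scale as the rows the k-NN distances were computed against.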
  • an output value may be computed for each row in the training matrix X.
  • an output value F[i], for a row i corresponding to a possible cut site, may be a frequency of observed 1 base pair insertions, relative to all observed +1 to -60 indels, at that cut site.
  • more than one cut site nucleotide may be considered.
  • a frequency of 1 base pair insertion genotype A, out of all +1 to -60 indels, may be predicted as follows. Frequencies for the other three insertion genotypes may be predicted similarly.
  • a frequency of 1 base pair insertions, out of all +1 to -60 indels may be predicted as follows.
  • a frequency may be predicted as follows for the set of all deletions having the deletion length L, out of all +1 to -60 indels (i.e., 1 base pair insertions and 1-60 base pair deletions), taking into account contributions from MH-dependent and MH-less mechanisms.
  • a frequency may be predicted as follows for the set of MH-less deletions having the deletion length L, out of all +1 to -60 indels (i.e., 1 base pair insertions and 1-60 base pair deletions). V DL+ins [L]
  • a frequency may be predicted as follows for the corresponding MH deletion genotype, out of all +1 to -60 indels (i.e., 1 base pair insertions and 1-60 base pair deletions).
  • a frequency may be predicted as follows, out of all +1 to -60 indels (i.e., 1 base pair insertions and 1-60 base pair deletions).
  • a frequency may be predicted as follows, out of all +1 to -60 indels (i.e., 1 base pair insertions and 1-60 base pair deletions).
  • DL[n] denotes the deletion length of the microhomology n.
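The formulas that re-express the deletion predictions out of all +1 to -60 indels are missing here. A simple reading, assumed for this sketch, is that deletions share whatever probability mass the insertion rate model leaves over, and each 1 base pair insertion genotype receives the insertion rate times its predicted share among insertions. Names are hypothetical.

```python
def indel_distribution(v_dl, ins_rate, base_probs):
    """Express deletion-length and 1 bp insertion frequencies out of all +1 to -60 indels.

    v_dl[L - 1]: predicted frequency of deletion length L among deletions only.
    ins_rate:    predicted frequency of 1 bp insertions among all +1 to -60 indels.
    base_probs:  predicted share of each inserted base among 1 bp insertions.
    """
    if not 0.0 <= ins_rate <= 1.0:
        raise ValueError("insertion rate must lie in [0, 1]")
    deletions = [(1.0 - ins_rate) * f for f in v_dl]
    insertions = {base: ins_rate * p for base, p in base_probs.items()}
    return deletions, insertions
```

If the inputs each sum to 1, the combined deletion and insertion frequencies also sum to 1.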
  • FIG. 5 shows an illustrative process 500 for processing data collected from CRISPR/Cas9 experiments, in accordance with some embodiments.
  • the process 500 may be performed for each input DNA sequence and CRISPR/Cas9 cut site, and a resulting dataset may be used to train the illustrative computational models described in connection with FIGs. 4A-4D.
  • repair genotypes observed from CRISPR/Cas9 experiments may be aligned with an original DNA sequence. Any suitable technique may be used to observe the repair genotypes, such as Illumina DNA sequencing. Any suitable alignment algorithm may be used for alignment, such as a Needleman-Wunsch algorithm with some suitable scoring parameters (e.g., +1 for match, -2 for mismatch, -4 for gap open, and -1 for gap extend, or +1 for match, -1 for mismatch, -5 for gap open, and 0 for gap extend).
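The alignment step can be illustrated with the affine-gap extension of Needleman-Wunsch (Gotoh's three-matrix recurrence), one standard way to honor separate gap-open and gap-extend penalties. The source does not say how a gap of length k is scored; this sketch assumes gap_open + (k - 1) · gap_extend, defaults to the first parameter set above, and returns only the optimal global score, without a traceback.

```python
NEG_INF = float("-inf")

def global_alignment_score(a, b, match=1, mismatch=-2, gap_open=-4, gap_extend=-1):
    """Optimal global alignment score with affine gaps (Gotoh recurrence).

    A gap of length k scores gap_open + (k - 1) * gap_extend (assumed convention).
    """
    n, m = len(a), len(b)
    # M: a[i-1] paired with b[j-1]; X: a[i-1] against a gap; Y: b[j-1] against a gap.
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(1, n + 1):
        X[i][0] = gap_open + (i - 1) * gap_extend
    for j in range(1, m + 1):
        Y[0][j] = gap_open + (j - 1) * gap_extend
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            M[i][j] = s + max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            X[i][j] = max(M[i - 1][j] + gap_open, X[i - 1][j] + gap_extend,
                          Y[i - 1][j] + gap_open)
            Y[i][j] = max(M[i][j - 1] + gap_open, Y[i][j - 1] + gap_extend,
                          X[i][j - 1] + gap_open)
    return max(M[n][m], X[n][m], Y[n][m])
```

For example, aligning a read carrying a single 1 bp deletion, ACGACGT, against ACGTACGT scores 7 matches minus one gap opening, i.e. 7 - 4 = 3.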
  • one or more filter criteria may be applied to alignment reads from act 505.
  • only deletions that include at least one base directly 5’ or 3’ of the CRISPR/Cas9 cut site may be considered. This may filter out deletions that are unlikely to have resulted from CRISPR/Cas9.
  • frequencies of indels of interest may be normalized into a probability distribution.
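A minimal sketch of the filter and normalization acts is shown below, assuming deletions have already been extracted from the alignments as half-open intervals [start, end) in 0-based coordinates and that the cut falls between positions cut - 1 and cut. It tallies deletion lengths only (insertions could be counted the same way), and the names are hypothetical.

```python
from collections import Counter

def touches_cut_site(del_start, del_end, cut):
    """True if deletion [del_start, del_end) removes the base just 5' or 3' of the cut."""
    return del_start <= cut - 1 < del_end or del_start <= cut < del_end

def deletion_length_frequencies(deletions, cut, max_len=60):
    """Keep cut-adjacent deletions of length 1..max_len and normalize their counts."""
    counts = Counter()
    for start, end in deletions:
        length = end - start
        if 1 <= length <= max_len and touches_cut_site(start, end, cut):
            counts[length] += 1
    total = sum(counts.values())
    return {length: c / total for length, c in sorted(counts.items())} if total else {}
```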
  • FIG. 6 shows an illustrative process 600 for using a machine learning model to predict frequencies of indel genotypes and/or indel lengths, in accordance with some embodiments.
  • Acts 605 and 610 may be similar to, respectively, acts 355 and 360 of the illustrative process 350 of FIG. 3B, except that acts 605 and 610 may be performed for an input DNA sequence seq and a cut site location for which repair genotype data from a CRISPR/Cas9 experiment may not be available.
  • one or more machine learning models such as the machine learning models trained at act 365 of the illustrative process 350 of FIG. 3B, may be applied to an output of act 610 to compute a frequency distribution over deletion lengths of interest.
  • FIG. 7 shows illustrative examples of a blunt-end cut and a staggered cut, in accordance with some embodiments.
  • FIG. 8A shows an illustrative plot 800A of predicted repair genotypes, in accordance with some embodiments.
  • the plot 800A may be generated by applying one or more of the illustrative techniques described in connection with FIGs. 2A-2D, 3A-3B, 4A-4D, 5-6 to the example shown in FIG. 1.
  • Each vertical bar may correspond to a deletion length, and a height of the bar may correspond to a predicted frequency of that deletion length.
  • the lighter color may indicate repair genotypes that successfully eliminate the donor splice site motif, whereas the darker color may indicate failure.
  • about 90% of repair products in the 3-26 base pair deletion class are predicted to be successful for the illustrative local sequence context and cut site shown in FIG. 1.
  • the 3-26 base pair deletion class may occur as frequently as 50%, for example, when assaying selected sequences (e.g., patient genotypes underlying certain diseases) integrated into the genome of mouse embryonic stem cells, with a 14-day exposure to CRISPR/Cas9.
  • thus, a genetic editing approach using CRISPR-Cas9 may be provided that achieves a desired result at a rate of about 45% (i.e., about 90% success among the roughly 50% of repair products in the 3-26 base pair deletion class).
  • genetic editing using HDR may achieve a success rate of 10% or lower, and may require a more complex
  • FIG. 8B shows another illustrative plot 800B of predicted repair genotypes, in accordance with some embodiments.
  • the plot 800B may be generated by applying one or more of the illustrative techniques described in connection with FIGs. 2A-2D, 3A-3B, 4A-4D, 5-6 to an illustrative DNA sequence 805B, which may be associated with spinal muscular atrophy (SMA).
  • a specific single nucleotide polymorphism (SNP) in exon 7 of the SMN2 gene may induce skipping of exon 7, erroneously including exon 8 instead.
  • Exon 8 includes a protein degradation signal (namely, EMLA-STOP, as shown in FIG. 8B), which causes degradation of the SMN2 gene product, thereby inducing spinal muscular atrophy.
  • in this example, a disease genotype must encode precisely EMLA-STOP; nearly any other genotype is considered healthy.
  • each vertical bar corresponds to a deletion length
  • a height of the bar corresponds to a predicted frequency of that deletion length.
  • the lighter color may indicate repair genotypes that successfully disrupt the EMLA-STOP signal, whereas the darker color may indicate failure.
  • over 90% of repair products in the 3-26 base pair deletion class are predicted to be healthy.
  • FIG. 8C shows another illustrative plot 800C of predicted repair genotypes, in accordance with some embodiments.
  • the plot 800C may be generated by applying one or more of the illustrative techniques described in connection with FIGs. 2A-2D, 3A-3B, 4A-4D, 5-6 to an illustrative DNA sequence associated with breast-ovarian cancer.
  • a clinically observed patient genotype includes an abnormal duplication of 14 base pairs that the wild-type sequence from a normal/healthy individual lacks.
  • FIG. 8D shows a microhomology identified in the example of FIG. 8C.
  • as discussed above, the inventors have recognized and appreciated at least two tasks of interest: predicting frequencies of deletion lengths, as well as predicting frequencies of repair genotypes.
  • a single machine learning model may be provided that performs both tasks.
  • repair genotypes corresponding to a deletion of length L may be labeled as follows: for every integer K ranging from 0 to L, a K-genotype associated with deletion length L may be obtained by concatenating left[L][-inf:K] with right[L][K:+inf].
  • a vector COLLECTION of length Q, where each element is a tuple (K, L), may be constructed by enumerating each K-genotype for each deletion length L of interest and removing tuples that have the same repair genotype, e.g., (k', L) and (k, L) such that left[L][-inf:k'] concatenated with right[L][k':+inf] is equivalent to left[L][-inf:k] concatenated with right[L][k:+inf].
  • a training data set may be constructed using observational data by constructing a vector X of length Q where X sums to 1 and X ⁇ c( ⁇ represents an observed frequency of a repair genotype generated by
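The arrays left[L] and right[L] are not defined in this section. Assuming left[L][j] corresponds to seq[cut - L + j] and right[L][j] to seq[cut + j], the K-genotype deletes seq[cut - L + K : cut + K], and COLLECTION can be sketched as below. Keeping the first (smallest) K among tuples that yield the same genotype is a choice made for this illustration; the text only says that duplicates are removed. All names are hypothetical.

```python
def k_genotype(seq, cut, L, K):
    """Repair product left[L][-inf:K] + right[L][K:+inf] under the assumed indexing.

    With left[L][j] == seq[cut - L + j] and right[L][j] == seq[cut + j],
    this deletes seq[cut - L + K : cut + K].
    """
    start, end = cut - L + K, cut + K
    if start < 0 or end > len(seq):
        raise ValueError("sequence too short for this deletion")
    return seq[:start] + seq[end:]

def build_collection(seq, cut, deletion_lengths):
    """List (K, L) tuples, dropping any whose genotype repeats an earlier K for the same L."""
    collection = []
    for L in deletion_lengths:
        seen = set()
        for K in range(L + 1):
            genotype = k_genotype(seq, cut, L, K)
            if genotype not in seen:
                seen.add(genotype)
                collection.append((K, L))
    return collection
```

In a run of identical bases such as AAAA, every K for a given L produces the same product, so only one tuple per deletion length survives.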
  • the vector COLLECTION may be featurized. This may be performed for a given tuple (k, l) by determining whether there is an index i such that match[l][i:k] is a microhomology. If no such i exists, then the tuple (k, l) may be considered to not partake in microhomology.
  • the inventors have recognized and appreciated that frequencies of repair products may be influenced by certain features of microhomologies such as microhomology length, fraction of G- C pairings, and/or deletion length.
  • the inventors have also recognized and appreciated that some default values may be useful for repair genotypes that are considered to not partake in microhomology.
  • the inventors have recognized and appreciated that energetic stability of a microhomology may increase proportionately with a length of the microhomology. Accordingly, in some embodiments, the microhomology length k - i may be used for a tuple (k, l), and a default value of 0 may be used if (k, l) does not partake in microhomology.
  • the thermodynamic stability of a microhomology may also depend on specific base pairings: G-C pairings have three hydrogen bonds and therefore have higher thermodynamic stability than A-T pairings, which have two hydrogen bonds.
  • a GC fraction, as shown below, may be used as a feature for (k, l), where indicator(boolean) equals 1 if boolean is true, and 0 otherwise.
  • a default value of -1 may be used if (k, l) does not partake in microhomology.
  • a feature for deletion length may be considered, represented as l for the tuple (k, l).
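The feature definitions and defaults above can be collected into one small function. In this sketch the microhomology sequence for (k, l) is passed in directly (locating it via match[l][i:k] is left out), and the two Boolean inputs of neural network 900 are read as flags for K = 0 and K = l; that reading, and every name here, is an assumption.

```python
def gc_fraction(mh):
    """Share of G or C bases in a microhomology (sum of indicators over its length)."""
    return sum(1 for base in mh.upper() if base in "GC") / len(mh)

def featurize(k, l, microhomology=None):
    """Five features per (k, l): MH length, GC fraction, K = 0 flag, K = l flag, l.

    `microhomology` is the matched sequence for (k, l), or None if the tuple does
    not partake in microhomology, in which case the defaults 0 and -1 are used.
    """
    if microhomology:
        mh_len, gc = len(microhomology), gc_fraction(microhomology)
    else:
        mh_len, gc = 0, -1.0
    return [float(mh_len), gc, float(k == 0), float(k == l), float(l)]
```

The default GC value of -1 sits outside the valid range [0, 1], so the network can distinguish "no microhomology" from an all-A/T microhomology.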
  • FIG. 9 shows another illustrative neural network 900 for computing a frequency distribution over deletion lengths, in accordance with some embodiments.
  • the neural network 900 may be parameterized by w[h ] and b[h] for each hidden layer h. In some embodiments, these parameters may be initialized randomly, for example, from a spherical Gaussian distribution with some suitable center (e.g., 0) and some suitable variance (e.g., 0.1). These parameters may then be trained using repair genotype data collected from CRISPR/Cas9 experiments.
  • the neural network 900 may operate independently for each microhomology, taking as input the length of that microhomology (from the first input node), the GC fraction of that microhomology (from the second input node), Boolean features for 0-genotypes and L-genotypes (from the third and fourth input nodes), and the length of the deletion (from the fifth input node), transforming those five values into 16 values (at the first hidden layer), then transforming those 16 values into 16 other values (at the second hidden layer), and finally outputting a single value (at the output node).
  • parameters for the first hidden layer, w[1][i] and b[1][i], are vectors of length 5 for each node i from 1 to 16
  • parameters for the second hidden layer, w[2][i] and b[2][i], are vectors of length 16 for each node i from 1 to 16
  • parameters for the output layer, w[3][1] and b[3][1], are also vectors of length 16.
  • the neural network 900 may be applied independently (e.g., as discussed above) to each featurized (k, l) in COLLECTION to produce a vector of Q microhomology scores called Z.
  • Z may be normalized into a probability distribution over all unique repair genotypes of interest within all deletion lengths of interest (e.g., deletion lengths between 3 and 26).
  • the inventors have recognized and appreciated (e.g., from experimental data) that frequency may decrease exponentially with deletion length.
  • an exponential linear model may be used to normalize the vector of repair genotype scores. For example, the following formula may be used:
  • a probability distribution Y over all unique repair genotypes of interest within all deletion lengths of interest may be converted to a probability distribution Y' over all deletion lengths.
  • the following formula may be used for this:
  • the parameter beta may be initialized to -1. These parameters may then be trained using repair genotype data collected from CRISPR/Cas9 experiments.
  • the parameters w[h] and b[h] for each hidden layer h, and the parameter beta, may be trained using a gradient descent method with L2-loss on Y':
  • predY is a predicted probability distribution on deletion lengths (e.g., as computed by the neural network 900 using current parameter values)
  • obsY is an observed probability distribution on deletion lengths (e.g., based on repair genotype data collected from CRISPR/Cas9 experiments).
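The normalization and conversion formulas above are missing. Assuming the exponential linear model sets Y[q] proportional to exp(Z[q] + beta · l_q), with beta initialized to -1 so that longer deletions are down-weighted, and that Y'[L] sums Y over the genotypes of deletion length L, the forward computation and the L2 loss on Y' could be sketched as follows. Names are hypothetical, and no gradients are computed.

```python
import math
from collections import defaultdict

def genotype_distribution(scores, lengths, beta=-1.0):
    """Y[q] proportional to exp(Z[q] + beta * l_q), normalized over all Q genotypes."""
    logits = [z + beta * l for z, l in zip(scores, lengths)]
    top = max(logits)                     # subtract the max for numerical stability
    weights = [math.exp(v - top) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]

def length_distribution(y, lengths):
    """Y'[L]: total probability of all genotypes with deletion length L."""
    out = defaultdict(float)
    for p, l in zip(y, lengths):
        out[l] += p
    return dict(sorted(out.items()))

def l2_loss(pred, obs):
    """Squared error between predicted (predY) and observed (obsY) length distributions."""
    keys = set(pred) | set(obs)
    return sum((pred.get(k, 0.0) - obs.get(k, 0.0)) ** 2 for k in keys)
```

Subtracting the largest logit before exponentiating leaves the normalized result unchanged while avoiding overflow.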
  • one or more of the techniques described herein may be used to identify therapeutic guide RNAs that are expected to produce a therapeutic outcome when used in combination with a genomic editing system without an HDR template. For instance, one or more of the techniques described herein may be used to identify a therapeutic guide RNA that is expected to result in a substantial fraction of genotypic
  • a therapeutic guide RNA may be used singly, or in combination with other therapeutic guide RNAs.
  • An action of the therapeutic guide RNA may be independent of, or dependent on, one or more genomic consequences of the other therapeutic guide RNAs.
  • FIG. 10 shows, schematically, an illustrative computer 1000 on which any aspect of the present disclosure may be implemented.
  • the computer 1000 includes a processing unit 1001 having one or more processors and a non-transitory computer-readable storage medium 1002 that may include, for example, volatile and/or non-volatile memory.
  • the memory 1002 may store one or more instructions to program the processing unit 1001 to perform any of the functions described herein.
  • the computer 1000 may also include other types of non-transitory computer-readable media, such as storage 1005 (e.g., one or more disk drives) in addition to the system memory 1002.
  • the storage 1005 may also store one or more application programs and/or external components used by application programs (e.g., software libraries), which may be loaded into the memory 1002.
  • the computer 1000 may have one or more input devices and/or output devices, such as devices 1006 and 1007 illustrated in FIG. 10. These devices can be used, among other things, to present a user interface. Examples of output devices that can be used to provide a user interface include printers or display screens for visual presentation of output and speakers or other sound generating devices for audible presentation of output. Examples of input devices that can be used for a user interface include keyboards and pointing devices, such as mice, touch pads, and digitizing tablets. As another example, the input devices 1007 may include a microphone for capturing audio signals, and the output devices 1006 may include a display screen for visually rendering, and/or a speaker for audibly rendering, recognized text.
  • the computer 1000 may also comprise one or more network interfaces (e.g., the network interface 1010) to enable communication via various networks (e.g., the network 1020).
  • examples of such networks include a local area network or a wide area network, such as an enterprise network or the Internet.
  • Such networks may be based on any suitable technology and may operate according to any suitable protocol and may include wireless networks, wired networks or fiber optic networks.
  • the above-described embodiments of the present disclosure can be implemented in any of numerous ways.
  • the embodiments may be implemented using hardware, software or a combination thereof.
  • the software code can be executed on any suitable processor or collection of processors, whether provided in a single computer or distributed among multiple computers.
  • the various methods or processes outlined herein may be coded as software that is executable on one or more processors that employ any one of a variety of operating systems or platforms. Additionally, such software may be written using any of a number of suitable programming languages and/or programming or scripting tools, and also may be compiled as executable machine language code or intermediate code that is executed on a framework or virtual machine.
  • the concepts disclosed herein may be embodied as a non-transitory computer-readable medium (or multiple computer-readable media) (e.g., a computer memory, one or more floppy discs, compact discs, optical discs, magnetic tapes, flash memories, circuit configurations in Field Programmable Gate Arrays or other semiconductor devices, or other non-transitory, tangible computer storage medium) encoded with one or more programs that, when executed on one or more computers or other processors, perform methods that implement the various embodiments of the present disclosure discussed above.
  • the computer-readable medium or media can be transportable, such that the program or programs stored thereon can be loaded onto one or more different computers or other processors to implement various aspects of the present disclosure as discussed above.
  • the terms “program” and “software” are used herein to refer to any type of computer code or set of computer-executable instructions that can be employed to program a computer or other processor to implement various aspects of the present disclosure as discussed above. Additionally, it should be appreciated that according to one aspect of this embodiment, one or more computer programs that when executed perform methods of the present disclosure need not reside on a single computer or processor, but may be distributed in a modular fashion amongst a number of different computers or processors to implement various aspects of the present disclosure.
  • Computer-executable instructions may be in many forms, such as program modules, executed by one or more computers or other devices.
  • program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types.
  • functionality of the program modules may be combined or distributed as desired in various embodiments.
  • data structures may be stored in computer-readable media in any suitable form.
  • data structures may be shown to have fields that are related through location in the data structure. Such relationships may likewise be achieved by assigning storage for the fields with locations in a computer-readable medium that conveys relationship between the fields.
  • any suitable mechanism may be used to establish a relationship between information in fields of a data structure, including through the use of pointers, tags or other mechanisms that establish relationship between data elements.
  • a computational model described herein is trained with experimental data as outlined in Example 1.
  • the method outlined in Example 1 for training a computational model with experimental data is meant to be non-limiting.
  • the specification discloses a method for training a computational model described herein, comprising: (i) preparing a library comprising a plurality of nucleic acid molecules each encoding a nucleotide target sequence and a cognate guide RNA, wherein each nucleotide target sequence comprises a cut site; (ii) introducing the library into a plurality of host cells; (iii) contacting the library in the host cells with a Cas-based genome editing system to produce a plurality of genomic repair products; (iv) determining the sequences of the genomic repair products; and (v) training the computational model with input data that comprises at least the sequences of the nucleotide target sequence and/or the genomic repair products and the cut sites.
  • the specification discloses a method for training a computational model, comprising: (i) preparing a library comprising a plurality of nucleic acid molecules each encoding a nucleotide target sequence and a cut site; (ii) introducing the library into a plurality of host cells; (iii) contacting the library in the host cells with a DSB-based genome editing system to produce a plurality of genomic repair products; (iv) determining the sequences of the genomic repair products; and (v) training the computational model with input data that comprises at least the sequences of the nucleotide target sequence and/or the genomic repair products and the cut sites.
  • methods for preparing nucleic acid libraries, vectors, host cells, and sequencing methods are well known in the art. The instant description is not meant to be limiting in any way as to the construction and configuration of the libraries described herein for training the computational model.
  • the specification provides in one aspect a method of introducing a desired genetic change in a nucleotide sequence using a double-strand break (DSB)-inducing genome editing system, the method comprising: identifying one or more available cut sites in a nucleotide sequence; analyzing the nucleotide sequence and available cut sites with a computational model to identify the optimal cut site for introducing the desired genetic change into the nucleotide sequence; and contacting the nucleotide sequence with a DSB-inducing genome editing system, thereby introducing the desired genetic change in the nucleotide sequence at the cut site.
  • a cut site can be at any position in a nucleotide sequence and its position is not particularly limiting.
  • the nucleotide sequence into which a genetic change is desired is not intended to have any limitations as to sequence, source, or length.
  • the nucleotide sequence may comprise one or more mutations, which can include one or more disease-causing mutations.
  • the specification provides a method of treating a genetic disease by correcting a disease-causing mutation using a double-strand brake (DSB)-inducing genome editing system, the method comprising: identifying one or more available cut sites in a nucleotide sequence comprising a disease-causing mutation; analyzing the nucleotide sequence and available cut sites with a computational model to identify the optimal cut site for correcting the disease-causing mutation in the nucleotide sequence; and contacting the nucleotide sequence with a DSB-inducing genome editing system, thereby correcting the disease-causing mutation and treating the disease.
  • the specification provides a method of altering a genetic trait by introducing a genetic change in a nucleotide sequence using a double-strand break (DSB)-inducing genome editing system, the method comprising: identifying one or more available cut sites in a nucleotide sequence; analyzing the nucleotide sequence and available cut sites with a computational model to identify the optimal cut site for introducing the genetic change into the nucleotide sequence; and contacting the nucleotide sequence with a DSB-inducing genome editing system, thereby introducing the desired genetic change in the nucleotide sequence at the cut site and consequently altering the associated genetic trait.
  • the specification provides a method of selecting a guide RNA for use in a Cas-based genome editing system capable of introducing a genetic change into a nucleotide sequence of a target genomic location, the method comprising: identifying in a nucleotide sequence of a target genomic location one or more available cut sites for a Cas-based genome editing system; and analyzing the nucleotide sequence and cut site with a computational model to identify a guide RNA capable of introducing the genetic change into the nucleotide sequence of the target genomic location.
  • the specification provides a method of introducing a genetic change in the genome of a cell with a Cas-based genome editing system comprising: selecting a guide RNA for use in the Cas-based genome editing system in accordance with the method of the above aspect; and contacting the genome of the cell with the guide RNA and the Cas-based genome editing system, thereby introducing the genetic change.
  • the cut sites available in the nucleotide sequence are a function of the particular DSB-inducing genome editing system in use, e.g., a Cas-based genome editing system.
  • the nucleotide sequence is a genome of a cell.
  • the method for introducing the desired genetic change is done in vivo within a cell or an organism (e.g., a mammal), or ex vivo within a cell isolated or separated from an organism (e.g., an isolated mammalian cancer cell), or in vitro on an isolated nucleotide sequence outside the context of a cell.
  • the DSB-inducing genome editing system can be a Cas-based genome editing system, e.g., a type II Cas-based genome editing system.
  • the DSB-inducing genome editing system can be a TALEN-based editing system or a zinc-finger-based genome editing system.
  • the DSB-inducing genome editing system can be any such endonuclease-based system that catalyzes the formation of a double-strand break at one or more specific cut sites.
  • the method can further comprise selecting a cognate guide RNA capable of directing a double-strand break at the optimal cut site by the Cas-based genome editing system.
  • the guide RNA is selected from the group consisting of the guide RNA sequences listed in any of Tables 1-6. In various embodiments, the guide RNA can be known or can be newly designed.
  • the double-strand break (DSB)-inducing genome editing system is capable of editing the genome without homology-directed repair.
  • the double-strand break (DSB)-inducing genome editing system comprises a type I Cas RNA-guided endonuclease, or a variant or orthologue thereof.
  • the double-strand break (DSB)-inducing genome editing system comprises a type II Cas RNA-guided endonuclease, or a functional variant or orthologue thereof.
  • the double-strand break (DSB)-inducing genome editing system may comprise a Cas9 RNA-guided endonuclease, or a variant or orthologue thereof in certain embodiments.
  • the double-strand break (DSB)-inducing genome editing system can comprise a Cpf1 RNA-guided endonuclease, or a variant or orthologue thereof.
  • the double-strand break (DSB)-inducing genome editing system can comprise a Streptococcus pyogenes Cas9 (SpCas9), Staphylococcus aureus Cas9 (SaCas9), Francisella novicida Cas9 (FnCas9), or a functional variant or orthologue thereof.
  • the desired genetic change to be introduced into the nucleotide sequence, e.g., a genome, is a correction of a genetic mutation.
  • the genetic mutation is a single-nucleotide polymorphism, a deletion mutation, an insertion mutation, or a microduplication error.
  • the genetic change can comprise a 2- to 60-bp deletion or a 1-bp insertion.
  • the genetic change in other embodiments can comprise a deletion of between 2-20, 4-40, 8-80, 16-160, 32-320, or 64-640 nucleotides, or of up to 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, or 10,000 or more nucleotides.
  • the deletion can restore the function of a defective gene, e.g., a gain-of-function frameshift genetic change.
  • the desired genetic change is a desired modification to a wildtype gene that confers and/or alters one or more traits, e.g., conferring increased resistance to a pathogen or altering a monogenic trait (e.g., eye color) or polygenic trait (e.g., height or weight).
  • the disease can be a monogenic disease.
  • monogenic diseases can include, for example, sickle cell disease, cystic fibrosis, polycystic kidney disease, Tay-Sachs disease, achondroplasia, beta-thalassemia, Hurler syndrome, severe combined immunodeficiency, hemophilia, glycogen storage disease Ia, and Duchenne muscular dystrophy.
  • the step of identifying the available cut sites can involve identifying one or more PAM sequences in the case of a Cas-based genome editing system.
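  • As a minimal sketch of this step, assuming SpCas9 with an NGG PAM and a cut 3 nt upstream of the PAM (between positions -4 and -3 in the FIG. 12A numbering), the following Python scans both strands of a sequence for PAMs and reports candidate cut sites. The function names and the 20-nt protospacer length are illustrative assumptions, not the specification's implementation.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase ACGT string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_spcas9_cut_sites(seq: str, protospacer_len: int = 20):
    """Return sorted (cut_index, strand, protospacer) tuples for each NGG PAM.

    cut_index is the 0-based boundary in top-strand coordinates at which
    SpCas9 is assumed to cleave (3 nt upstream of the PAM). Hypothetical helper.
    """
    seq = seq.upper()
    sites = []
    # Top strand: 5'-protospacer-NGG-3'. The lookahead also finds overlapping PAMs.
    for m in re.finditer(r"(?=[ACGT]GG)", seq):
        p = m.start()  # index of the N in NGG (position 0)
        if p >= protospacer_len:
            sites.append((p - 3, "+", seq[p - protospacer_len:p]))
    # Bottom strand: seen on the top strand as CCN followed by the protospacer's
    # reverse complement; the cut lies 3 nt past the CCN.
    for m in re.finditer(r"(?=CC[ACGT])", seq):
        q = m.start()
        end = q + 3 + protospacer_len
        if end <= len(seq):
            sites.append((q + 6, "-", revcomp(seq[q + 3:end])))
    return sorted(sites)
```

For example, `find_spcas9_cut_sites("A" * 20 + "TGG")` reports a single top-strand site whose cut index is 17, i.e., between the -4 and -3 nucleotides relative to the TGG PAM.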
  • the computational model used to analyze the nucleotide sequence is a deep learning computational model, or a neural network model having one or more hidden layers.
  • the computational model is trained with experimental data to predict the probability distribution of indel lengths for any given nucleotide sequence and cut site.
  • the computational model is trained with experimental data to predict the probability distribution of genotype frequencies for any given nucleotide sequence and cut site.
  • the computational model comprises one or more training modules for evaluating experimental data.
  • the computational model can comprise: a first training module for computing a microhomology score matrix; a second training module for computing a microhomology-independent score matrix; and a third training module for computing a probability distribution over 1-bp insertions, wherein once trained with experimental data the computational model computes a probability distribution over indel genotypes and a probability distribution over indel lengths for any given input nucleotide sequence and cut site.
  • the computational model predicts genomic repair outcomes for any given input nucleotide sequence and cut site.
  • the genomic repair outcomes can comprise microhomology deletions, microhomology-less deletions, and/or 1-bp insertions.
  • the computational model can comprise one or more modules, each comprising one or more input features selected from the group consisting of: a target site nucleotide sequence; a cut site; a PAM sequence; microhomology lengths relative to a cut site; % GC content at a cut site; microhomology deletion lengths at a cut site; and type of DSB-inducing genome editing system.
  • the nucleotide sequence analyzed by the computational model is between about 25-100 nucleotides, 50-200 nucleotides, 100-400 nucleotides, 200-800 nucleotides, 400-1600 nucleotides, 800-3200 nucleotides, or 1600-6400 nucleotides, or even up to 7K, 8K, 9K, 10K, 11K, 12K, 13K, 14K, 15K, 16K, 17K, 18K, 19K, or 20K nucleotides or more in length.
  • the specification relates to guide RNAs which are identified by various methods described herein.
  • the guide RNAs can be any of those presented in Tables 1-6, the contents of which form part of this specification.
  • the guide RNAs can be purely ribonucleic acid molecules.
  • the RNA guides can comprise one or more naturally-occurring or non-naturally occurring modifications.
  • the modifications can include, but are not limited to, nucleoside analogs, chemically modified bases, intercalated bases, modified sugars, and modified phosphate group linkers.
  • the guide RNAs can comprise one or more phosphorothioate and/or 5’-N-phosphoramidite linkages.
  • the specification discloses vectors comprising one or more nucleotide sequences disclosed herein, e.g., vectors encoding one or more guide RNAs, one or more target nucleotide sequences which are being edited, or a combination thereof.
  • the vectors may comprise naturally occurring sequences, or non-naturally occurring sequences, or a combination thereof.
  • the specification discloses host cells comprising the herein disclosed vectors encoding one or more nucleotide sequences embodied herein, e.g., one or more guide RNAs, one or more target nucleotide sequences which are being edited, or a combination thereof.
  • the specification discloses a Cas-based genome editing system comprising a Cas protein (or homolog, variant, or orthologue thereof) complexed with at least one guide RNA.
  • the guide RNA can be any of those disclosed in Tables 1-6, or a functional variant thereof.
  • the specification provides a Cas-based genome editing system comprising an expression vector having at least one expressible nucleotide sequence encoding a Cas protein (or homolog, variant, or orthologue thereof) and at least one other expressible nucleotide sequence encoding a guide RNA, wherein the guide RNA can be identified by the methods disclosed herein for selecting a guide RNA.
  • the specification provides a library for training a computational model for selecting a guide RNA sequence for use with a Cas-based genome editing system capable of introducing a genetic change into a genome without homology-directed repair, wherein the library comprises a plurality of vectors each comprising a first nucleotide sequence of a target genomic location having a cut site and a second nucleotide sequence encoding a cognate guide RNA capable of directing a Cas-based genome editing system to carry out a double-strand break at the cut site of the first nucleotide sequence.
  • the specification provides a library and its use for training a computational model for selecting an optimized cut site for use with a DSB-based genome editing system (e.g., a Cas-based system, a TALEN-based system, or a zinc-finger-based system) that is capable of introducing a desired genetic change into a nucleotide sequence (e.g., a genome) at the selected cut site without homology-directed repair, wherein the library comprises a plurality of vectors each comprising a nucleotide sequence having a cut site, and optionally a second nucleotide sequence encoding a cognate guide RNA (in embodiments involving a Cas-based genome editing system).
  • the concepts disclosed herein may be embodied as a method, of which an example has been provided.
  • the acts performed as part of the method may be ordered in any suitable way. Accordingly, embodiments may be constructed in which acts are performed in an order different than illustrated, which may include performing some acts simultaneously, even though shown as sequential acts in illustrative embodiments.
  • DNA double-strand break repair following cleavage by Cas9 is generally considered stochastic, heterogeneous, and impractical for applications beyond gene disruption.
  • template-free Cas9 nuclease-mediated DNA repair is predictable in human and mouse cells and is capable of precise repair to a predicted genotype in certain sequence contexts, enabling correction of human disease-associated mutations.
  • a genomically integrated library of guide RNAs (gRNAs) was constructed, each gRNA paired with its corresponding DNA target sequence, and a machine learning model, inDelphi, was trained on the end-joining repair products of 1,095 sequences cleaved by Cas9 nuclease in mammalian cells.
  • the inDelphi model predicts that 26% of all Streptococcus pyogenes Cas9 (SpCas9) gRNAs targeting the human genome result in outcomes in which a single predictable product accounts for >30% of all edited products, while 5% of gRNAs are “high-precision guides” that result in repair outcomes in which one product accounts for >50% of all edited products.
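  • The precision categories above can be expressed as a simple rule on a predicted genotype distribution. This hypothetical helper assumes the thresholds stated in the text (>30% for a precision gRNA, >50% for a high-precision gRNA) and returns the stricter label; the "imprecise" label for all other gRNAs is an assumption introduced for illustration.

```python
def classify_grna_precision(genotype_freqs: dict) -> str:
    """Label a gRNA by the share of its single most frequent predicted genotype.

    genotype_freqs maps genotype -> predicted frequency among edited products
    (values need not be normalized). Hypothetical helper.
    """
    total = sum(genotype_freqs.values())
    if total <= 0:
        raise ValueError("frequencies must sum to a positive number")
    top_share = max(genotype_freqs.values()) / total
    if top_share > 0.5:
        return "high-precision"  # also satisfies the >30% precision threshold
    if top_share > 0.3:
        return "precision"
    return "imprecise"
```

Because every high-precision gRNA also clears the 30% threshold, the 5% of high-precision gRNAs are a subset of the 26% of precision gRNAs.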
  • this study developed a high-throughput Streptococcus pyogenes Cas9 (SpCas9)-mediated repair outcome assay to characterize end-joining repair products at Cas9-induced double-strand breaks using 1,872 target sites based on sequence characteristics of the human genome.
  • the study used the resulting rich set of repair product data to train the herein disclosed machine-learning algorithm (i.e., inDelphi), which accurately predicts the frequencies of the substantial majority of template-free Cas9-induced insertion and deletion events at single-base resolution (which is further described in M. Shen et al., “Predictable and precise template-free CRISPR editing of pathogenic variants,” Nature, vol. 563, November 29, 2018, pp.
  • this study further uses inDelphi to design 14 gRNAs for high-precision template-free editing yielding predictable 1-bp insertion genotypes in endogenous human disease-relevant loci, and experimentally confirmed highly precise editing (median 61% among edited products) in two human cell lines.
  • inDelphi was used to reveal human pathogenic alleles that are candidates for efficient and precise template-free gain-of-function genotypic correction and achieved template-free correction of 183 pathogenic human microduplication alleles to the wild-type genotype in >50% of all editing products.
  • a genome-integrated gRNA and target library screen was designed in which many unique gRNAs are paired with corresponding 55-bp target sequences containing a single canonical “NGG” SpCas9 protospacer-adjacent motif (PAM) that directs cleavage to the center of each target sequence (FIG. 11A).
  • PAM: SpCas9 protospacer-adjacent motif
  • 1,872 target sequences were computationally designed that collectively span the human genome’s distributions of % GC, number of nucleotides participating in microhomology, predicted Cas9 on-target cutting efficiency 4 , and estimated precision of deletion products 24 (FIGs.
  • Lib-A was stably integrated into the genomes of mouse embryonic stem cells (mESCs). Next, these cells were targeted with a Tol2 transposon-based SpCas9 expression plasmid containing a blasticidin expression cassette, and cells with stable Cas9 expression were selected. Sufficient numbers of cells were maintained throughout the experiment to ensure >2,000-fold coverage of the library. After one week, genomic DNA was collected from these cells in three independent replicate experiments, and paired-end high-throughput DNA sequencing (HTS) was performed using primers flanking the gRNA and the target site to reveal the spectrum of repair products at each target site. Using a sequence alignment procedure, the resulting
  • Lib-A included the 55-bp sequences surrounding 90 endogenous genomic loci for which the products of Cas9-mediated repair were previously characterized by HTS 24 .
  • Previously reported repair products from this endogenous dataset (VO) in three human cell lines (HCT116, K562, and HEK293) reveal that 94% of endogenous Cas9-mediated deletions are 30 bp or shorter (FIGs. 16A-16C), suggesting that the Lib-A analysis method is capable of assessing the vast majority of Cas9-mediated editing products.
  • end-joining repair of Cas9-mediated double-strand breaks primarily causes deletions (73-87% of all products) and insertions (13-25% of all products) (FIGs. 11B, 11C, FIGs. 17A-17D).
  • Rarer Cas9-mediated repair products were also detected, such as combination insertion/deletions (0.5-2% of all products) and deletions and insertions distal to the cut site (3-5% of all products), which occur more often on the PAM-distal side of the double-strand break (FIGs. 17A-17D).
  • the majority of products are deletions containing
  • see FIGs. 11B, 11C, and 17A-17D for a definition of microhomology-containing deletions.
  • MH-less deletions: microhomology-less deletions
  • 1-bp insertions (FIG. 12A)
  • These three repair classes are defined as constituting all major editing outcomes; together they comprise 80-95% of all observed editing products (FIGs. 11B, 11C).
  • a deep neural network was designed to predict MH deletions as one module of inDelphi. This module simulates MH deletions using the MMEJ repair mechanism, where 5’-to-3’ end resection at a double-strand break reveals two 3’ ssDNA overhangs that can anneal through sequence microhomology.
  • inDelphi models MH deletions as a competition between different MH-mediated hybridization possibilities. Using the input features of MH length, MH %GC, and deletion length, inDelphi outputs a score (phi) reflecting the predicted strength of each microhomology (FIG. 12A). From training data, inDelphi learned that strong microhomologies tend to be long and have high GC content (FIGs. 18A-18H).
  • inDelphi also contains a second neural network module that predicts the distribution of MH-less deletion lengths using the minimum required resection length as the only input feature (FIG. 12A). Because there are many MH-less genotypes for each deletion length with frequencies that do not fit a simple pattern, inDelphi predicts the frequencies of deletion lengths but not of genotypic outcomes for MH-less deletions. This module learned from training data that the frequency of MH-less deletions decays rapidly with increasing length (FIGs. 18A-18H). It is hypothesized that MH-less deletions arise primarily from the activity of the classical and alternative NHEJ pathways 27 . The two neural networks were jointly trained using observed distributions of deletion genotypes from 1,095 Lib-A target sites (FIG. 12A).
  • the inDelphi model contains a third module to predict 1-bp insertions (FIG. 12A).
  • insertions represent a major class of DNA repair at Cas9-mediated double-strand breaks (13-25% of all products, FIGs. 11B, 11C, FIGs. 17A-17D).
  • 1-bp insertions are the dominant insertion type (9-21% of all products, FIGs. 11B, 11C, FIGs. 17A-17D).
  • 1-bp insertions predominantly comprise duplications of the -4 nucleotide (counting the NGG PAM as nucleotides 0-2, FIG. 12A), with higher precision when the -4 nucleotide is an A or T and lower precision when it is a C or G (FIG. 12C). While 1-bp insertions occurred in 9% of products on average in Lib-A, this frequency varies significantly with the nucleotide at position -4, falling to less than 4% on average when the -4 nucleotide is G (FIG. 12D, P ⁇ 10 V ).
  • inDelphi models insertions and deletions as competitive processes in which the total deletion phi score (overall microhomology strength) and predicted deletion precision influence the relative frequency of 1-bp insertions, and the local sequence context influences the relative frequency and genotypic outcomes of 1-bp insertions (FIG. 12A).
  • inDelphi integrates these factors into predictions of 1-bp insertion genotype frequencies using a k-nearest neighbor approach.
  • when the MH length feature was removed from the inDelphi MH deletion module, inDelphi’s performance in predicting genotype frequency fell to that of a model with random weights.
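  • A minimal sketch of the dominant 1-bp insertion genotype described above, assuming the N of the NGG PAM sits at 0-based index pam_index and the cut falls between positions -4 and -3. The function name is hypothetical, and the sketch predicts only which genotype arises, not how often.

```python
def predicted_1bp_insertion(seq: str, pam_index: int) -> str:
    """Genotype produced by duplicating the -4 nucleotide at an SpCas9 cut.

    pam_index: 0-based index of the N in NGG (position 0 in FIG. 12A numbering).
    The cut is assumed at boundary pam_index - 3, i.e. between -4 and -3.
    """
    if pam_index < 4 or seq[pam_index + 1:pam_index + 3].upper() != "GG":
        raise ValueError("expected an NGG PAM with at least 4 nt upstream")
    cut = pam_index - 3
    minus4 = seq[pam_index - 4]  # nucleotide immediately 5' of the cut
    return seq[:cut] + minus4 + seq[cut:]
```

For example, with `seq = "ACGTACGTAGG"` and the AGG PAM at index 8, the -4 nucleotide is A and the predicted product is `"ACGTAACGTAGG"`.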
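  • The k-nearest-neighbor step can be sketched as follows. This is an illustrative stand-in, not inDelphi’s code: the feature vector (here, a total deletion phi score and a predicted deletion precision), Euclidean distance, and an unweighted average of neighbor targets are all assumptions.

```python
import math


def knn_predict(train, query, k=3):
    """Average target value of the k training examples nearest to `query`.

    train: list of (feature_vector, observed_1bp_insertion_fraction) pairs,
           e.g. ((total_phi, deletion_precision), 0.12). Hypothetical features.
    query: feature vector of the same length for a new target site.
    """
    if not train:
        raise ValueError("training set is empty")
    ranked = sorted(train, key=lambda ex: math.dist(ex[0], query))
    nearest = ranked[:k]
    return sum(target for _, target in nearest) / len(nearest)
```

A nonparametric neighbor average suits this step because it requires no assumed functional form relating microhomology strength to insertion frequency; a fuller sketch would also condition on local sequence context such as the -4 nucleotide.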
  • a second control in which the deep neural networks were replaced with linear models showed 10-24% reduced performance on the genotype frequency and indel length prediction tasks. Together, these controls indicate that inDelphi’s computational structure is important for its accuracy.
  • inDelphi facilitates Cas9-mediated gene knockout approaches by allowing a priori selection of gRNAs that induce high or low knockout frequencies.
  • an online tool is provided to predict frameshift frequencies for any SpCas9 gRNA targeting the coding human and mouse genome (crisprindelphi.design). It is noted that human exons have a significant tendency (p < 10^-100, FIGs. 19A-19D) to favor frame-preserving deletion repair compared to shuffled exon sequences or non-coding human DNA.
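  • Given a predicted distribution over indel lengths, a frameshift frequency follows directly, because any indel whose length is not a multiple of 3 shifts the reading frame. The helper below is a hypothetical sketch of that arithmetic, not the online tool’s implementation.

```python
def frameshift_frequency(indel_length_freqs: dict) -> float:
    """Fraction of predicted edited products that shift the reading frame.

    indel_length_freqs maps a signed indel length (negative for deletions,
    positive for insertions, e.g. -3 or +1) to its predicted frequency.
    """
    total = sum(indel_length_freqs.values())
    if total <= 0:
        raise ValueError("frequencies must sum to a positive number")
    # Python's % returns a non-negative result, so -3 % 3 == 0 and -1 % 3 == 2.
    shifted = sum(f for length, f in indel_length_freqs.items() if length % 3 != 0)
    return shifted / total
```

For example, `frameshift_frequency({-1: 2, 1: 3, -3: 5})` returns 0.5: the 1-bp deletion and 1-bp insertion shift the frame, while the 3-bp deletion preserves it.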
  • inDelphi provides accurate single-base resolution predictions for the relative frequencies of most Cas9 nuclease-mediated end-joining repair outcomes, including frameshifts.
  • Microduplications, in which a stretch of DNA is repeated in tandem, contain stretches of exact microhomology and are thus predicted by inDelphi to collapse precisely through deletion upon MMEJ repair (FIG. 14A).
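  • The collapse described above can be illustrated for an exact tandem repeat: annealing across the full k-bp microhomology and deleting k bp leaves a single copy of the repeated unit. This sketch assumes the duplicated unit starts at a known 0-based index and that the Cas9 cut falls inside the duplicated stretch; the function name is hypothetical and the sketch does not model frequency.

```python
def collapse_microduplication(seq: str, start: int, unit_len: int) -> str:
    """Product expected when a tandem duplication seq[start:start+2k] collapses.

    Requires seq[start:start+k] == seq[start+k:start+2k]; one k-bp copy is deleted.
    """
    k = unit_len
    first = seq[start:start + k]
    second = seq[start + k:start + 2 * k]
    if k <= 0 or len(second) < k or first != second:
        raise ValueError("no tandem duplication of this length at start")
    return seq[:start + k] + seq[start + 2 * k:]
```

For example, `collapse_microduplication("GGACGTACGTTT", 2, 4)` removes one copy of the ACGT repeat and returns `"GGACGTTT"`.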
  • a second high-throughput Cas9 substrate library (Lib-B - see Table 5) was designed and constructed that contains three families of target sequences with microduplications of each length from 7-25 bp.
  • Cas9-mediated double-strand break repair products were analyzed in Lib-B in mESCs and in human U2OS and HEK293T cells using the same procedure as for Lib-A evaluation. Highly precise repair was consistently observed in which 40-80% of all repair events correspond to a single repair genotype (FIG. 14B), substantially higher than the 21% median frequency of the most abundant deletion genotype in 90 VO sites that were not pre-selected for microhomology. The fraction of microduplication repair to a single collapsed product as compared to other outcomes increased with
  • three target sequence frameworks with low total phi scores were included in Lib-B (FIGs. 20A-20E), containing randomization at the four positions surrounding the Cas9 cleavage site (positions -5 to -2 with respect to the PAM at positions 0-2; see FIG. 12A).
  • 1-bp insertions comprise a median of 29% of all repair products, which is significantly higher than in VO sites (FIGs. 20A-20E). Moreover, a median of 61% of all products are 1-bp insertions at sites with TG at the -4 and -3 positions (FIG. 14D), revealing that precise 1-bp insertion can be obtained through Cas9-mediated end-joining at specific, predictable sequence contexts.
  • inDelphi was used to predict gRNAs that lead to such precise outcomes.
  • inDelphi was then used to discover SpCas9 gRNAs that support precise end joining repair in the human genome. It was found that substantial fractions of all genome-targeting SpCas9 gRNAs are predicted to produce relatively precise outcomes (Table 2).
  • inDelphi-classified high-precision gRNAs were used to identify new targets for therapeutic genome editing.
  • inDelphi was tasked with identifying pathogenic alleles that are suitable for template-free Cas9-mediated editing to effect precise gain-of-function repair of the pathogenic genotype.
  • Two genetic disease allele categories that have not been previously identified as targets for Cas9-mediated repair are predicted by inDelphi to be candidates for high-precision repair.
  • the first category is a selected subset of pathogenic frameshifts in which, because of high-precision repair, inDelphi predicts that 50-90% of Cas9-mediated deletion products will correct the reading frame compared to the average frequency of 34% among all disease-associated frameshift mutations.
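  • The reading-frame arithmetic behind this category can be sketched as follows: a deletion of d bp restores the frame of an allele whose net length change relative to wild type is s bp when (s - d) is a multiple of 3. The helper and its inputs (for example, predicted deletion-length frequencies from a model such as inDelphi) are hypothetical.

```python
def frame_correcting_fraction(pathogenic_shift: int, deletion_len_freqs: dict) -> float:
    """Share of predicted deletion products that restore the reading frame.

    pathogenic_shift: net length change of the mutant allele relative to wild
        type (e.g. +1 for a 1-bp insertion, -2 for a 2-bp deletion).
    deletion_len_freqs: predicted frequency for each deletion length in bp.
    """
    total = sum(deletion_len_freqs.values())
    if total <= 0:
        raise ValueError("frequencies must sum to a positive number")
    restoring = sum(f for d, f in deletion_len_freqs.items()
                    if (pathogenic_shift - d) % 3 == 0)
    return restoring / total
```

For example, for a pathogenic 1-bp insertion, 1-bp and 4-bp deletions restore the frame while a 2-bp deletion does not, so `frame_correcting_fraction(1, {1: 6, 2: 2, 4: 2})` returns 0.8.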
  • the second category is pathogenic microduplication alleles in which a short sequence duplication leads to a frameshift or loss-of-function protein sequence changes (FIG. 15A).
  • the frequency of MH-dependent deletion repair is substantially increased (58% to 72%) in Prkdc-/- Lig4-/- mESCs, enabling a subset of pathogenic alleles to be repaired to wild-type with strikingly high precision.
  • in wild-type mESCs, 183 pathogenic alleles are repaired to wild-type in >50% of all edited products, and 11 pathogenic alleles are repaired to wild-type in >70% of all edited products.
  • in Prkdc-/- Lig4-/- mESCs, 286 pathogenic alleles are repaired to wild-type in >50% of all edited products, and 153 pathogenic alleles are repaired to wild-type in >70% of products (FIG. 15D, Table 6).
  • Five pathogenic LDLR microduplication alleles associated with hypercholesterolemia 36 were separately introduced within a full-length LDLR coding sequence upstream of a P2A-GFP cassette into the genome of human and mouse cells, such that Cas9-mediated repair to the wild-type LDLR sequence should induce phenotypic gain of LDL uptake and restore the reading frame of GFP. Cas9 and a gRNA that is specific to each pathogenic allele and does not target the wild-type repaired sequence were then delivered.
  • HTS confirms efficient genotypic repair to wild-type of these five LDLR microduplication alleles in human and mouse cells, as well as of three other pathogenic microduplication alleles in the GAA, GLB1, and PORCN genes introduced to cells using the same method (Table 1, Table 3).
  • Table 1 Repair of microduplication pathogenic alleles through template-free Cas9-nuclease treatment.
  • Table 2 Frequency of gRNAs in the human genome with denoted Cas9-mediated outcome precision.
  • genotype comprises XX% of all major editing products
  • Precision gRNA is a ⁇ : ⁇ !: ⁇ >: ⁇ a 1-bp insertion product
  • Table 3 Repair of eight pathogenic microduplication alleles in individual cellular experiments.
  • Table 4 Lib-A sequences (presented below between the end of this specification and Table 5).
  • Table 5 Lib-B sequences (presented below between Table 4 and Table 6).
  • Table 6 inDelphi predictions and observed results.
  • Table 6 comprises Table 6A: inDelphi predictions and observed results for Lib-B, showing all sequences with replicate-consistent mESC results; Table 6B (continued from 6A); Table 6C (continued from 6B); Table 6D (continued from 6C); and Table 6E (continued from 6D) (presented below between Table 5 and the claims).
  • Table 7 Frequency of gRNAs in the human genome with denoted Cas9-mediated outcome precision
  • the Cas9-mediated end-joining repair products of thousands of target DNA loci integrated into mammalian cells were used to train a machine learning model, inDelphi, that accurately predicts the spectrum of genotypic products resulting from double-strand break repair at a target DNA site of interest.
  • the ability to predict Cas9-mediated products enables new precision genome editing research applications and facilitates existing applications.
  • the inDelphi model identifies target loci in which a substantial fraction of all repair products consists of a single genotype.
  • the findings suggest that 26% of SpCas9 gRNAs targeting the human genome are precision gRNAs, yielding a single genotypic outcome in >30% of all major repair products, and 5% are high-precision gRNAs in which >50% of all major repair products are of a single genotype.
  • precision and high-precision gRNAs enable uses of Cas9 nuclease in which the major genotypic products can be predicted a priori.
  • inDelphi will also be able to accurately predict repair genotypes from other double-strand break creation methods, including other Cas9 homologs, Cpf1, transcription activator-like effector nucleases (TALENs), and zinc-finger nucleases (ZFNs) 3741 43 .
  • This work establishes that the prediction and judicious application of template-free Cas9 nuclease-mediated genome editing offers new capabilities for the study and potential treatment of genetic diseases.
  • the goal of the machine learning algorithm, inDelphi, is to accurately predict the identities and relative frequencies of non-wild-type genotypic outcomes produced following a CRISPR/Cas9-mediated DNA double-strand break.
  • parameters were developed to classify three distinct categories of genotypic outcomes, microhomology deletions, microhomology-less deletions, and insertions, informed by the biochemical mechanisms underlying the DNA repair pathways that typically give rise to them.
  • Double-strand breaks are thought to be repaired via four major pathways: classical non-homologous end-joining (c-NHEJ), alternative NHEJ (alt-NHEJ), microhomology-mediated end-joining (MMEJ), and homology-directed repair (HDR) 1.
  • Abbreviations: c-NHEJ, classical non-homologous end-joining; alt-NHEJ, alternative NHEJ; MMEJ, microhomology-mediated end-joining; HDR, homology-directed repair.
  • although a CRISPR/Cas9 DNA double-strand break may lead to HDR repair via endogenous homology templates that exist in trans 45, HDR-characteristic outcomes are not explicitly modeled by the algorithm.
  • genotypic outcomes were separated into three classes: microhomology deletions (MH deletions), microhomology-less deletions (MH-less deletions), and single-base insertions (1-bp insertions) (FIG. 12A).
  • Abbreviations: MH deletions, microhomology deletions; MH-less deletions, microhomology-less deletions; 1-bp insertions, single-base insertions.
  • MH deletions are predicted from MH length, MH GC content, and deletion length
  • microhomologous basepairing of single-stranded DNA (ssDNA) sequences occurs across the border of the double strand breakpoint 46, 47 .
  • ssDNA: single-stranded DNA
  • the 5’-overhangs not participating in the microhomology are removed up until the paired microhomology region, and the unpaired ssDNA sequences are extended by DNA polymerase using the opposing strand as a template (FIG. 12B, FIGs. 18A-18H).
  • inDelphi calculates the set of all MH deletions available given a specific sequence context and cleavage site.
  • the 3’-overhang downstream of the cut site is overlaid under the upstream 3’-overhang, and it is determined whether there is any microhomologous base pairing.
  • Given the 4-bp deletion length:
  • the set of MH deletions thus includes all 1-bp to 60-bp deletions that can be derived from the steps above that simulate the MMEJ mechanism.
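  • The enumeration described above can be sketched in Python. This is a simplified illustration, not inDelphi’s released code: for each deletion length d it compares the d bases left of the cut with the d bases right of it, treats each run of matching positions as a microhomology, and records the features named in the text (MH length, MH GC content, deletion length). The function name and output format are hypothetical, and scoring by the MH neural network is not reproduced.

```python
def mh_deletions(seq: str, cut: int, max_del_len: int = 60):
    """Enumerate candidate microhomology (MH) deletions spanning a cut site.

    cut is the 0-based boundary at which the double-strand break occurs.
    Returns one dict per (deletion length, MH run) with its features and product.
    """
    seq = seq.upper()
    results = []
    for d in range(1, max_del_len + 1):
        if cut - d < 0 or cut + d > len(seq):
            break  # flanks no longer fit inside the sequence
        left, right = seq[cut - d:cut], seq[cut:cut + d]
        j = 0
        while j < d:
            if left[j] != right[j]:
                j += 1
                continue
            start = j
            while j < d and left[j] == right[j]:
                j += 1  # extend the run of matching bases (the microhomology)
            mh = left[start:j]
            # Deleting seq[cut-d+j : cut+j] keeps exactly one copy of the MH.
            product = seq[:cut - d + j] + seq[cut + j:]
            gc = sum(base in "GC" for base in mh) / len(mh)
            results.append({"del_len": d, "mh_len": len(mh),
                            "mh_gc": gc, "product": product})
    return results
```

For example, in `"TTACGACGTT"` cut at index 5, the only MH deletion found is the 3-bp deletion supported by the ACG microhomology, giving the product `"TTACGTT"`. Deletions with no matching run fall outside this sketch and would be handled as MH-less deletions.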
  • MMEJ efficiency has been reported to depend on the thermodynamic favorability and stability of a candidate microhomology 46, 47 .
  • inDelphi calculates the MH length, MH GC content, and resulting deletion length for each possible MH deletion. These features are input into a machine learning module, the microhomology neural network (MH-NN), which learns the relationship between these features and the frequency of an MH deletion outcome in a training CRISPR/Cas9 genotypic outcome dataset. Although it was predicted, and empirically found, that favored MH deletions have long MH lengths relative to total deletion length and high MH GC content, no explicit direction or comparative weighting of these parameters is provided at the outset.
  • inDelphi then outputs a phi-score for any MH deletion genotype (whether it was in the training data or not) that represents the favorability of that outcome as predicted by MH-NN. It is important to emphasize that the phi-score of a particular MH deletion does not itself represent the likelihood of that MH deletion occurring in the context of all MH deletions at a given site. Some CRISPR/Cas9 target sites may have many possible favorable MH deletion outcomes while other sites have few, and thus phi-score must be normalized for a given target site to generate the fractional likelihood of that genotypic outcome at that site. Total unnormalized MH deletion phi-score is one factor that is further used to predict the relative frequency of the different repair classes: MH deletions, MH-less deletions, and insertions.
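  • The per-site normalization can be sketched as follows, assuming phi-scores have already been produced by MH-NN for every candidate MH deletion at one target site; the function name is hypothetical.

```python
def normalize_phi(phi_scores: dict):
    """Convert per-genotype phi-scores at one site into fractional likelihoods.

    Returns (fractions, total_phi). total_phi is the unnormalized sum, which the
    text uses as one input when predicting the relative frequency of repair
    classes (MH deletions, MH-less deletions, and insertions).
    """
    total_phi = sum(phi_scores.values())
    if total_phi <= 0:
        raise ValueError("phi-scores must sum to a positive number")
    fractions = {genotype: phi / total_phi for genotype, phi in phi_scores.items()}
    return fractions, total_phi
```

For example, two MH deletions with phi-scores 3.0 and 1.0 receive fractional likelihoods of 0.75 and 0.25 among MH deletions at that site, while the total phi of 4.0 is retained for the class-level prediction.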
  • MH-less deletions are defined as all possible deletions that are not accounted for by the workflow described above for MH deletions. Mechanistically, the data analysis suggests that MH-less deletions are associated with repair genotypes produced by c-NHEJ and
  • c-NHEJ-associated proteins rapidly bind the DNA strands flanking the double-strand DNA breakpoint and recruit ligases, exonucleases, and polymerases to process and re-anneal the breakpoint in the absence of 5’-end resection (FIGs. 18A-18H) 26, 35 .
  • c-NHEJ repair can be error-free; however, in the context of Cas9-mediated cutting, faithful repair regenerates the target site and leads to repeated cutting, thereby increasing the eventual likelihood of mutagenic repair.
  • Erroneous c-NHEJ repair products are mainly thought to consist of small insertions or deletions or combinations thereof that most frequently occur in the direct vicinity of the DNA break point 35, 48, 49 .
  • The resulting deletions, referred to as medial end-joining MH-less deletions, often have lost bases both upstream and downstream of the cleavage site.
  • Microhomology-mediated alt-NHEJ is a distinct pathway that produces MH-less deletion products.
  • This form of alt-NHEJ repair occurs following 5’-end resection and is mediated by microhomology in the sequence surrounding the double-strand breakpoint 1.
  • Microhomologous base pairing stabilizes the 3’-ssDNA overhangs following 5’-end resection, as in MMEJ, allowing DNA ligases to join the break across one of the strands of this temporarily configured complex.
  • The opposing unannealed flap is then removed, and newly synthesized DNA templated off the remaining strand is annealed to repair the lesion (FIGs. 18A-18H).
  • Although alt-NHEJ uses microhomology, the repair products it produces do not follow the predictable genotypic patterns induced by MMEJ and are thus grouped into MH-less deletion genotypes.
  • MH deletions are a direct merger of both annealed strands, in which the outcome genotype switches from top to bottom strand at the exact end-point of a microhomology.
  • Although alt-NHEJ employs microhomology in its repair mechanism, the deletion outcomes it generates comprise bases derived exclusively from either the top or the bottom strand. Mechanistically, this occurs because ligation of a 3’-overhang to its downstream ligation partner results in removal of the entire opposing ssDNA overhang up to the point of ligation.
  • This process prevents any deletion from occurring in the 3’-overhang strand that is first attached to the DNA backbone, while inducing loss of an indeterminate length of sequence on the opposing strand.
  • The resulting deletion genotypes, referred to as unilateral end-joining MH-less deletions, do not retain information on the exact microhomology that caused them, and are thus also classed as MH-less.
  • inDelphi detects MH-less deletions in training data as the set of all deletions that are not MH deletions and parameterizes them solely by the length of the resulting deletion. This rests on the simple assumption, supported by empirical observation, that c-NHEJ and alt-NHEJ processes are most likely to produce short deletions. As with MH deletions, this assumption is not explicitly coded into the inDelphi MH-less deletion prediction module; instead, it is “learned” by a neural network called MHless-NN.
  • MHless-NN optimizes a phi-score for a given MH-less deletion length, grounded in the frequency of MH-less deletion outcomes of that length observed in the training data. MHless-NN was observed to learn a phi-score that decays near-exponentially with increasing deletion length and reflects the sum total frequency of all MH-less deletion genotypes of that length. The total unnormalized MH-less deletion phi-score for a given target and cut site is also used to inform the relative frequency of different repair classes.
  • inDelphi is trained on 1-bp insertion frequencies and identities at each training site, using as features the identities of the -3, -4, and -5 bases upstream of the NGG PAM sequence (when the training set is sufficiently large; the -4 base alone is used when training data are limited). The precision score of the deletion length distribution and the total deletion phi-score at that site are also added as features. These features are combined in a k-nearest neighbor algorithm that predicts the relative frequencies and identities of 1-bp insertion products at any target site.
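A minimal sketch of how such a k-nearest neighbor predictor for the 1-bp insertion frequency could be wired up. The distance metric (Euclidean), the unweighted mean over the k neighbors, the feature layout, and all helper names are assumptions for illustration; the description does not specify them. For SpCas9, which cuts 3 bp upstream of the PAM, the -4 base is the one immediately 5' of the cut.

```python
import math

BASES = "ACGT"


def one_hot(base):
    """One-hot encode a single nucleotide as a 4-element list (order A, C, G, T)."""
    return [1.0 if base == b else 0.0 for b in BASES]


def insertion_features(bases, precision, total_del_phi):
    """Concatenate one-hot nucleotides (e.g. the -4 base, or -5/-4/-3) with two scalars."""
    vec = []
    for base in bases:
        vec.extend(one_hot(base))
    return vec + [precision, total_del_phi]


def fit_zscore(rows):
    """Per-column mean and standard deviation; constant columns get sd 1 to avoid /0."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
           for c, m in zip(cols, means)]
    return means, sds


def apply_zscore(row, means, sds):
    """Scale one feature row with statistics fit on the training set."""
    return [(v - m) / s for v, m, s in zip(row, means, sds)]


def knn_predict(train_x, train_y, query, k):
    """Average the targets of the k training rows closest to `query` (Euclidean)."""
    ranked = sorted(zip(train_x, train_y), key=lambda xy: math.dist(xy[0], query))
    nearest = ranked[:k]
    return sum(y for _, y in nearest) / len(nearest)
```

In this sketch the z-score statistics are fit on the training rows and then applied to each query row, matching the zero-mean, unit-variance normalization described for inDelphi; the description notes that this step was skipped for the small VO dataset.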
  • While it is plausible that the competition and collaboration among outcome classes modeled by inDelphi reflect interactions among components of distinct DNA repair pathways, the classes of outcomes considered by inDelphi do not necessarily arise from the distinct DNA repair pathways described above. inDelphi is trained on repair outcomes only and cannot distinguish the origin of genotypes that may arise through both MH-mediated and MH-less mechanisms, and it is conceivable that some repair products result from more than one repair pathway. In addition, although NHEJ is generally assumed to dominate double-strand break repair of environmentally induced damage 35, it was found in the context of Cas9 cutting that MH deletion genotypes are more common than MH-less deletions and insertions.
  • Prkdc−/−Lig4−/− mutants have distinct and predictable DNA repair product distributions.
  • Prkdc−/−Lig4−/− mESCs are impaired in unilateral deletions, in which only bases from one side of the cut site are removed, but not in medial MH-less deletion outcomes, which lose bases on both sides of the breakpoint.
  • (FIGs. 22A-22E) Microhomology-mediated alt-NHEJ, which was hypothesized to give rise to unilateral MH-less deletions, proceeds through a mechanism in which DNA repair
  • Specified pools of 2000 oligos were synthesized by Twist Bioscience and amplified with NEBNext polymerase (New England Biolabs) using primers OligoLib_Fw and OligoLib_Rv (see below), to extend the sequences with overhangs complementary to the donor template used for circular assembly.
  • qPCR was first performed by addition of SybrGreen Dye (Thermo Fisher) to determine the number of cycles required to complete the exponential phase of amplification. The PCR reaction was run for half of the determined number of cycles at this stage. Extension time for all PCR reactions was extended to 1 minute per cycle to prevent skewing towards GC-rich sequences.
  • the 246-bp fragment was purified using a PCR purification kit (Qiagen).
  • the donor template for circular assembly was amplified with NEBNext polymerase (New England Biolabs) for 20 cycles from an SpCas9 sgRNA expression plasmid (Addgene 71485) 34 using primers CircDonor_Fw and CircDonor_Rv (see below) to amplify the sgRNA hairpin and terminator, and extended further with a linker region meant to separate the gRNA expression cassette from the target sequence in the final library.
  • the 146-bp amplicon was gel-purified (Qiagen) from a 2.5% agarose gel.
  • The amplified synthetic library and donor templates were ligated by Gibson Assembly (New England Biolabs) in a 1:3 molar ratio for 1 hour at 50°C, and unligated fragments were digested with Plasmid-Safe ATP-Dependent DNase (Lucigen) for 1 hour at 37°C.
  • Assembled circularized sequences were purified using a PCR purification kit (Qiagen), linearized by digestion with SspI for >3 hours at 37°C, and the 237-bp product was gel purified (Qiagen) from a 2.5% agarose gel.
  • The linearized fragment was further amplified with NEBNext polymerase (New England Biolabs) using primers PlasmidIns_Fw and PlasmidIns_Rv (see below) for the addition of overhangs complementary to the 5’- and 3’-regions of a Tol2-transposon containing gRNA expression plasmid (Addgene 71485) 34 previously digested with BbsI and XbaI (New England Biolabs), to facilitate gRNA expression and integration of the library into the genome of mammalian cells.
  • qPCR was performed by addition of SybrGreen Dye (Thermo Fisher) to determine the number of cycles required to complete the exponential phase of amplification, and the PCR reaction was then run for the determined number of cycles.
  • the 375-bp amplicon was gel-purified (Qiagen) from a 2.5% agarose gel.
  • The 375-bp amplicon and double-digested Tol2-transposon containing gRNA expression plasmid were ligated by Gibson Assembly (New England Biolabs) in a 3:1 ratio for 1 hour at 50°C. Assembled plasmids were purified by isopropanol precipitation with GlycoBlue
  • NEB 10-beta electrocompetent cells (New England Biolabs). Following recovery, a small dilution series was plated to assess transformation efficiency and the remainder was grown in liquid culture in DRM medium overnight at 37°C. A detailed step-by-step library cloning protocol is provided below.
  • The plasmid library was isolated by Midiprep plasmid purification (Qiagen). Library integrity was verified by restriction digest with SapI (New England Biolabs) for 1 hour at 37°C, and sequence diversity was validated by high-throughput sequencing (HTS) as described below.
  • a base plasmid was constructed starting from a Tol2-transposon containing plasmid (Addgene 71485) 34 .
  • the sequence between Tol2 sites was replaced with a CAGGS promoter, multi-cloning site, P2A peptide sequence followed by eGFP sequence, and Puromycin resistance cassette to produce p2T-CAG-MCS-P2A-GFP-PuroR.
  • the full sequence of this plasmid is appended in the Sequences section below, and this plasmid has been submitted to Addgene.
  • To generate p2T-CAGGS-LDLRwt-P2A-GFP-PuroR, LDLR (NCBI Gene ID #3949, transcript variant 1 CDS) was PCR amplified from a base plasmid ordered from the Harvard PlasmID resource core and cloned between the BamHI and NheI sites of the base plasmid.
  • LDLR:c.668_681dupAGGACAAATCTGAC (LDLRdup254/255) (SEQ ID NO: 14)
  • LDLR:c.669_680dupGGACAAATCTGA (LDLRdup258) (SEQ ID NO: 15)
  • LDLR:c.672_683dupCAAATCTGACGA (LDLRdup261) (SEQ ID NO: 16)
  • LDLR:c.1662_1669dupGCTGGTGA (LDLRdup264)
  • transcript variant C CDS was PCR amplified from HCT116 cDNA and cloned between the BamHI and NheI sites of the base plasmid.
  • PORCN:c.1059_1071dupCCTGGCTTTTATC (SEQ ID NO: 17) (PORCNdup20) was generated through In-Fusion cloning.
  • GLB1 (NCBI Gene ID #2720, transcript variant 1 CDS) was PCR amplified from HCT116 cDNA and cloned between the BamHI and NheI sites of the base plasmid.
  • GLB1:c.1456_1466dupGGTGCATATAT (SEQ ID NO: 18) (GLB1dup84) was generated through In-Fusion cloning.
  • GAA (NCBI Gene ID #2548, transcript variant 1 CDS) was PCR amplified from a base plasmid ordered from the Harvard PlasmID resource core and cloned between the BamHI and NheI sites of the base plasmid.
  • SpCas9 guide RNAs were cloned as a pool into a Tol2-transposon containing gRNA expression plasmid (Addgene 71485) 34 using BbsI plasmid digest and Gibson Assembly (NEB).
  • SaCas9 guide RNAs were cloned into a similar Tol2-transposon containing SaCas9 gRNA expression plasmid (p2T-U6-sgsaCas2xBbsI-HygR, which has been submitted to Addgene) using BbsI plasmid digest and Gibson Assembly.
  • Protospacer sequences used are listed below, using internal nomenclature that matches the duplication alleles.
  • LDLR gRNAs:
  • sgsaLDLRdup252 GCTGCGAAGATGGCTCGGAGGC (SEQ ID NO: 20)
  • sgsaLDLRdup254 GTGCAAGGACAAATCTGACAGG (SEQ ID NO: 21)
  • sgsaLDLRdup255 GTTCCTCGTCAGATTTGTCCTG (SEQ ID NO: 22)
  • sgsaLDLRdup258 GACTGCAAGGACAAATCTGAGG (SEQ ID NO: 23)
  • sgsaLDLRdup261 GTTTTCCTCGTCAGATTTGTCG (SEQ ID NO: 24)
  • sgspLDLRdup264 GACATCTACTCGCTGGTGAGC (SEQ ID NO: 25)
  • sgspPORCNdup20 GCTGTCCCTGGCTTTTATCCC (SEQ ID NO: 26)
  • sgspGLB1dup84 GTGTGAACTATGGTGCATATA (SEQ ID NO: 27)
  • sgsaGAAdup327 GCAGCTGCAGAAGGTGACTGCA (SEQ ID NO: 28)
  • sgspGAAdup328 GCTGCAGAAGGTGACTGCAGA (SEQ ID NO: 29)
  • HEK293T, HCT116, and U2OS cells were purchased from ATCC and cultured as recommended by ATCC.
  • For Tol2 transposon plasmid integration, cells were transfected with Lipofectamine 3000 (Thermo Fisher) following standard protocols, using equimolar amounts of Tol2 transposase plasmid 25 (a gift from Koichi Kawakami) and transposon-containing plasmid.
  • 6-well plates with >10⁶ initial cells were used for library applications.
  • PCR1 was performed using the primers specified below.
  • PCR2 was performed to add full-length Illumina sequencing adapters using the NEBNext Index Primer Sets 1 and 2 (NEB) or internally ordered primers with equivalent sequences. All PCRs were performed using NEBNext polymerase (New England Biolabs). Extension time for all PCR reactions was extended to 1 min per cycle to prevent skewing towards GC-rich sequences.
  • the pooled samples were sequenced using NextSeq (Illumina) at the Harvard Medical School Biopolymers Facility, the MIT BioMicro Center, or the Broad Institute Sequencing Facility.
  • Cycle number is half the number of cycles needed to reach the signal amplification plateau in the qPCR in step 1, reduced by 1 cycle to scale for DNA input.
  • Cycle number is the number of cycles needed to reach the signal amplification plateau in the qPCR in step 5, reduced by 4 cycles to scale for increased DNA input.
  • Electroporation competent cells give a higher transformation efficiency than chemically competent cells.
  • NEB 10-beta electrocompetent cells were used; however, other strains can be substituted and transformed according to the manufacturer’s instructions.
  • DRM was used as recovery and culture medium to enhance yield. If a less rich medium such as LB is substituted, it is recommended to scale up the culture volume to obtain similar plasmid DNA quantities.
  • Antibiotic-free recovery time should be limited to 15 minutes to prevent shedding of transformed plasmids from replicating bacteria.
  • Each sequenced pair of gRNA fragment and target was associated with a set of designed sequence contexts G by (i) finding the designed sequence contexts for all gRNAs whose beginning section perfectly matches the gRNA fragment (read 1 in general does not fully sequence the gRNA), and (ii) using locality-sensitive hashing (LSH) with 7-mers on the sequenced target to search for similar designed targets.
  • An LSH score on 7-mers between a reference and a sequenced context reflects the number of 7-mers shared between the two. If the best reference candidate found through LSH scores more than 5 higher than the best LSH score among the reference candidates obtained from the gRNA fragment, the LSH candidate is also added to G.
  • LSH was used because of extensive (~33% rate) PCR recombination between read 1 and read 2, which appears in sequenced data as mismatched read 1 and read 2 pairs.
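A simplified sketch of the candidate-set rule, with exact shared 7-mer counting standing in for LSH (a real LSH index hashes k-mers so that not every reference has to be scored). The margin of 5 comes from the description; the function names and the dictionary of reference targets are hypothetical.

```python
def kmer_set(seq, k=7):
    """Distinct k-mers in `seq`."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmer_score(a, b, k=7):
    """Number of distinct k-mers shared by two sequences (exact, not hashed)."""
    return len(kmer_set(a, k) & kmer_set(b, k))


def candidate_set(target, grna_hits, references, k=7, margin=5):
    """Build G: gRNA-fragment matches, plus the best k-mer match if it wins by > margin.

    target: sequenced target read; grna_hits: reference ids matched via the gRNA
    fragment; references: {reference id: designed target sequence}.
    """
    G = set(grna_hits)
    best_grna = max((shared_kmer_score(target, references[r], k) for r in G), default=0)
    best_ref = max(references, key=lambda r: shared_kmer_score(target, references[r], k))
    if shared_kmer_score(target, references[best_ref], k) > best_grna + margin:
        G.add(best_ref)
    return G
```

A read whose gRNA fragment points to one design but whose target half matches another (the PCR recombination case) ends up with both designs in G; the subsequent alignment step then decides between them.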
  • the sequenced target was aligned to each candidate in G and the alignment with the highest number of matches is kept.
  • Sequence alignment was performed using the Needleman-Wunsch algorithm with the parameters +1 match, -1 mismatch, -5 gap open, and 0 gap extend. For library data, starting gaps cost 0. For all other data, starting and ending gaps cost 0. For VO data, sequence alignments were derived from SAM files from SRA.
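For reference, a score-only sketch of Needleman-Wunsch with affine gaps (Gotoh's three-matrix recurrences) using the stated parameters. With a gap-extend cost of 0, a gap of any length costs 5, which favors a single contiguous gap over several shorter ones. Traceback is omitted, the function name is hypothetical, and applying the free-gap flags to both sequences is an assumption.

```python
NEG_INF = float("-inf")


def nw_score(a, b, match=1, mismatch=-1, gap_open=-5, gap_extend=0,
             free_start=False, free_end=False):
    """Global alignment score with affine gaps (Gotoh recurrences).

    A gap of length L costs gap_open + (L - 1) * gap_extend. free_start / free_end
    make leading / trailing gaps cost 0 (semi-global alignment).
    """
    n, m = len(a), len(b)
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]  # a[i-1] aligned to b[j-1]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]  # a[i-1] aligned to a gap
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]  # b[j-1] aligned to a gap
    M[0][0] = 0
    for i in range(1, n + 1):
        X[i][0] = 0 if free_start else gap_open + (i - 1) * gap_extend
    for j in range(1, m + 1):
        Y[0][j] = 0 if free_start else gap_open + (j - 1) * gap_extend
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            M[i][j] = s + max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            X[i][j] = max(M[i - 1][j] + gap_open, X[i - 1][j] + gap_extend,
                          Y[i - 1][j] + gap_open)
            Y[i][j] = max(M[i][j - 1] + gap_open, Y[i][j - 1] + gap_extend,
                          X[i][j - 1] + gap_open)

    def best(i, j):
        return max(M[i][j], X[i][j], Y[i][j])

    if free_end:
        # Trailing gaps are free: the alignment may end anywhere in the last row/column.
        return max(max(best(n, j) for j in range(m + 1)),
                   max(best(i, m) for i in range(n + 1)))
    return best(n, m)
```

For example, aligning "AAAACCCCCCGGGG" to "AAAAGGGG" scores 8 matches minus one 6-bp gap, i.e. 3, the same as a 1-bp gap would cost.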
  • CRISPR-associated DNA repair events were defined as any alignment with deletions or insertions occurring within a 4 bp window centered at the expected cut site and any alignment with both deletions and insertions (combination indel) occurring within a 10 bp window centered at the expected cut site. All CRISPR-associated DNA repair events observed in control data had their frequencies subtracted from treatment data, to a minimum of 0.
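The background subtraction in the last sentence amounts to a per-event floor at zero. A minimal sketch, with hypothetical event keys and function name:

```python
def subtract_control(treatment, control):
    """Subtract control frequencies from treatment frequencies, flooring each event at 0.

    Events seen only in the control are dropped, since they would be floored to 0 anyway.
    """
    return {event: max(0.0, freq - control.get(event, 0.0))
            for event, freq in treatment.items()}
```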
  • Homopolymer: The entire read is a homopolymer of a single nucleotide. Not considered a CRISPR repair product.
  • Has N: The read contains at least one N. Discarded as noise; not considered a CRISPR repair product.
  • PCR Recombination: The read contains the recombination alignment signature: (1) a long indel (10 bp+), followed by a chance overlap, followed by a long indel (10 bp+) of the opposite type, e.g., insertion-random match-deletion or deletion-random match-insertion; or (2) the same pattern in which one of the two indels is 30 bp+, in which case the other can be arbitrarily short. If either criterion is met, and the chance overlap is of length 5 or less, or of any length with less than an 80% match rate, the read satisfies the recombination signature.
  • Poor-Matches: The 55-bp designed sequence context has less than 5 bp of representation (which could arise from 50 bp+ deletions or severe recombination) or less than an 80% match rate. Not considered a CRISPR repair product.
  • Cutsite-Not-Sequenced: The read does not contain the expected cleavage site.
  • Wildtype: No indels in the entire alignment. Not considered a CRISPR repair product.
  • Deletion - Not at cut: A single deletion occurring within a 2 bp window around the cleavage site, but not immediately at the cleavage site. Considered a CRISPR repair product.
  • Deletion: A single deletion occurring immediately at the cleavage site. Considered a CRISPR repair product.
  • Insertion: An alignment with only a single insertion event. Subdivided into:
  • Insertion - Not CRISPR: A single insertion occurring outside a 10 bp window around the cleavage site. Not considered a CRISPR repair product.
  • Insertion - Not at cut: A single insertion occurring within a 2 bp window around the cleavage site, but not immediately at the cleavage site. Considered a CRISPR repair product.
  • Insertion: A single insertion occurring immediately at the cleavage site. Considered a CRISPR repair product.
  • Combination indel: An alignment with multiple indels in which all non-gap regions have at least an 80% match rate. Subdivided into:
  • Combination Indel: All indels are within a 10 bp window around the cleavage site.
  • Combination Indel: At least two indels, but not all, are within a 10 bp window around the cleavage site. Considered a rarer secondary CRISPR repair product; ignored.
  • Combination Indel - Not CRISPR: No indels are within a 10 bp window around the cleavage site. Not considered a CRISPR repair product.
  • deletion and insertion events are defined to occur at a single location between bases.
  • events occurring up to 5 bp away from the cleavage site are defined as events where there are five or fewer matched/mismatched alignment positions between the event and the cleavage site, irrespective of the number of gap dashes in the alignment.
  • A total of 4,935 unique variants were selected from ClinVar submissions in which the functional consequence is described as a complete insertion, deletion, or duplication and the reference or alternate allele is at most 30 nucleotides long. Variants were included if at least one submitting lab designated the clinical significance as ‘pathogenic’ or ‘likely pathogenic’ and no submitting lab had designated the variant as ‘benign’ or ‘likely benign’, including variants with all disease associations. More complex indels and somatic variants were included. A total of 18,083 unique insertion variants between 2 and 30 nucleotides in length were selected from HGMD. Variants with any disease association were included if they carried the HGMD classification ‘DM’ (disease-causing mutation).
  • SpCas9 gRNAs and their cleavage sites were enumerated for each disease allele.
  • genotype frequency and indel length distributions were predicted for each tuple of disease variant and unique cleavage site.
  • the single best gRNA was identified as the gRNA inducing the highest predicted frequency of repair to wildtype genotype, and if this was impossible (due to, for example, a disease allele with 2+ bp deletion), then the single best gRNA was identified as the gRNA inducing the highest predicted frameshift repair rate. 1327 sequence contexts were designed in this manner for Lib-B.
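The gRNA selection rule can be sketched as below. The `predictions` schema and the reading of "impossible" as "no candidate gRNA has a nonzero predicted wildtype frequency" are assumptions for illustration; the function name is hypothetical.

```python
def pick_best_grna(predictions):
    """Choose a gRNA for one disease allele.

    predictions: {gRNA id: {"wildtype": freq, "frameshift": freq}} (hypothetical schema).
    Prefer the highest predicted repair-to-wildtype frequency; if no gRNA can yield
    the wildtype genotype, fall back to the highest predicted frameshift repair rate.
    """
    if any(p["wildtype"] > 0 for p in predictions.values()):
        return max(predictions, key=lambda g: predictions[g]["wildtype"])
    return max(predictions, key=lambda g: predictions[g]["frameshift"])
```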
  • deletion events can be predicted at single-base resolution.
  • the tuple (deletion length, delta-position)
  • a delta-position associated with a deletion length N is an integer between 0 and N inclusive (FIGs. 19A-19D).
  • A delta-position describes the starting position of the deletion gap in the read with respect to the reference sequence, relative to the cleavage site.
  • a delta-position of 0 corresponds to a deletion gap at seq[C-N+0 : C+0], and generally with a delta-position of D, the deletion gap occurs at seq[C-N+D : C+D].
  • Microhomologies can be described with multiple delta-positions. To uniquely identify microhomology-based deletion genotypes, the single maximum delta-position in the redundant set is used. Microhomology-less deletion genotypes are associated with only a single delta position and deletion length tuple; this was used as its unique identifier.
  • delta-positions can be motivated by the example workflow shown above on MH deletions describing how each microhomology is associated with a deletion genotype.
  • The delta-position is the number of bases included on the top strand before “jumping down” to the bottom strand.
  • MH-less medial end-joining products correspond to all MH-less genotypes with delta-position between 1 and N-1, where N is the deletion length.
  • MH-less unilateral end-joining products correspond to MH-less genotypes with delta-position 0 or N. Note that a deletion genotype with delta-position N is not necessarily a microhomology-less unilateral end-joining product, since it may contain microhomology (delta-positions N-j, N-j+1, ..., N may all correspond to the same MH deletion).
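To make delta-positions concrete, here is a minimal sketch that enumerates all N+1 delta-positions for one deletion length, collapses those that produce identical sequences (a microhomology of length m makes m+1 consecutive delta-positions yield the same product), keeps the maximum delta-position as the identifier, and labels the remaining genotypes as unilateral or medial MH-less products. The function names and string labels are hypothetical; the cut site is an index between bases of a plain string.

```python
from collections import defaultdict


def deletion_product(seq, cut, n, delta):
    """Apply an n-bp deletion whose gap is seq[cut - n + delta : cut + delta]."""
    if not 0 <= delta <= n:
        raise ValueError("delta-position must lie in [0, n]")
    if cut - n < 0 or cut + n > len(seq):
        raise ValueError("deletion window runs past the sequence ends")
    return seq[:cut - n + delta] + seq[cut + delta:]


def deletion_genotypes(seq, cut, n):
    """Return {product: (max delta-position, MH length, label)} for n-bp deletions."""
    groups = defaultdict(list)
    for delta in range(n + 1):
        groups[deletion_product(seq, cut, n, delta)].append(delta)
    result = {}
    for product, deltas in groups.items():
        top = max(deltas)  # unique identifier: the maximum delta-position
        if len(deltas) > 1:
            label = "MH"  # several delta-positions collapse onto one genotype
        elif top in (0, n):
            label = "MH-less unilateral"
        else:
            label = "MH-less medial"
        result[product] = (top, len(deltas) - 1, label)
    return result
```

For example, in "GGGATCGATGCGGG" cut after index 7, the 4-bp deletions at delta-positions 0, 1, and 2 all yield "GGGATGCGGG" because both flanks begin with "AT", so they form one MH genotype with identifier 2; delta-position 3 is a medial and delta-position 4 a unilateral MH-less genotype.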
  • inDelphi receives as input a sequence context and a cleavage site location, and outputs two objects: a frequency distribution on deletion genotypes, and a frequency distribution on deletion lengths.
  • inDelphi trains two neural networks: MH-NN and MHless-NN.
  • MH-NN receives as input a microhomology described by two features: microhomology length and microhomology GC content.
  • MH-NN outputs a number (psi).
  • MHless-NN receives as input the deletion length.
  • MHless-NN outputs a number (psi).
  • The architecture of both the MH-NN and MHless-NN networks is input-dimension -> 16 -> 16 -> 1, i.e., two hidden layers in which all nodes are fully connected. Sigmoidal activations are used in all layers except the output layer. All neural network parameters are initialized with Gaussian noise centered around 0.
  • inDelphi Deletion Modeling: Making predictions
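A dependency-free sketch of the network shape described above (input -> 16 -> 16 -> 1, sigmoid hidden layers, linear output), written in plain Python rather than with the autograd library the authors used, so it shows only the forward pass that produces psi. Treating the 0.10 initial weight scaling factor mentioned later in this section as a multiplier on standard Gaussian noise, and including bias terms, are assumptions; the function names are hypothetical.

```python
import math
import random


def init_mlp(input_dim, hidden=(16, 16), scale=0.10, seed=0):
    """Fully connected input -> 16 -> 16 -> 1 network with scale * N(0, 1) parameters."""
    rng = random.Random(seed)
    sizes = [input_dim, *hidden, 1]
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        weights = [[scale * rng.gauss(0.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
        biases = [scale * rng.gauss(0.0, 1.0) for _ in range(n_out)]
        layers.append((weights, biases))
    return layers


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def forward(layers, x):
    """Sigmoid on hidden layers, identity on the output layer; returns the scalar psi."""
    h = list(x)
    for idx, (weights, biases) in enumerate(layers):
        z = [sum(w * v for w, v in zip(row, h)) + b for row, b in zip(weights, biases)]
        h = z if idx == len(layers) - 1 else [sigmoid(v) for v in z]
    return h[0]
```

Under this sketch, MH-NN would be `init_mlp(2)` (MH length, MH GC content) and MHless-NN `init_mlp(1)` (deletion length).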
  • Given a sequence context and cleavage site, inDelphi enumerates all unique deletion genotypes, each as a tuple of its deletion length and its delta-position, for deletion lengths from 1 bp to 60 bp. For each microhomology enumerated, an MH-phi score is obtained using MH-NN. In addition, for each deletion length from 1 bp to 60 bp, an MHindep-phi score is obtained using MHless-NN.
  • inDelphi combines all MH-phi and MHindep-phi scores for a particular sequence context into two objects - a frequency distribution on deletion genotypes, and a frequency distribution on deletion lengths - which are both compared to observations for training.
  • the model is designed to output two separate objects because both are of biological interest, and separate but intertwined modeling approaches are useful for generating both.
  • inDelphi jointly learns about microhomology-based deletion repair and microhomology-less deletion repair.
  • inDelphi assigns a score to each microhomology. Score assignment considers the concept of a “full” microhomology and treats full and non-full MHs differently.
  • A microhomology is “full” if the length of the microhomology is equal to its deletion length.
  • The biological significance of full microhomologies is that only a single deletion genotype is possible for the entire deletion length, whereas in general a single deletion length is consistent with multiple genotypes.
  • This single genotype can be generated not only through the MH-dependent MMEJ mechanism but also through MH-less end-joining, for example as mediated by Lig4. Therefore, full microhomologies were modeled as receiving contributions from both MH-containing and MH-less mechanisms by scoring full microhomologies as MH-phi[i] + MHindep-phi[j] for deletion length j and microhomology index i.
  • Microhomologies that are not “full” are assigned a score of MH-phi[i] for MH index i.
  • Scores for all deletion genotypes assigned this way are normalized to sum to 1 to produce a predicted frequency distribution on deletion genotypes.
  • inDelphi assigns a score for each deletion length. Score assignment integrates contributions from both MH-dependent and MH-independent mechanisms via the following procedure: For each deletion length j, its score is assigned as MHindep-phi[j] plus the sum of MH-phi for each microhomology with that deletion length. Scores for all deletion lengths are normalized to sum to 1 to produce a frequency distribution.
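The two scoring rules above can be written out as follows. The tuple layout for microhomologies and the function name are hypothetical. Following the description, the genotype distribution here covers only the microhomology-scored genotypes, while every deletion length, including lengths with no microhomology, receives its MHindep-phi score in the length distribution.

```python
def combine_scores(mh_entries, mhless_phi):
    """Turn phi scores into predicted genotype and deletion-length distributions.

    mh_entries: list of (del_len, mh_len, mh_phi), one per microhomology.
    mhless_phi: {del_len: MHindep-phi} from MHless-NN.
    Returns (genotype_freqs aligned with mh_entries, {del_len: frequency}).
    """
    # Genotype scores: a "full" MH (mh_len == del_len) also receives the MH-less score.
    geno = [phi + (mhless_phi.get(j, 0.0) if mh_len == j else 0.0)
            for j, mh_len, phi in mh_entries]
    geno_total = sum(geno)
    genotype_freqs = [s / geno_total for s in geno]

    # Length scores: MHindep-phi[j] plus the MH-phi of every microhomology of length j.
    length_scores = dict(mhless_phi)
    for j, _, phi in mh_entries:
        length_scores[j] = length_scores.get(j, 0.0) + phi
    length_total = sum(length_scores.values())
    length_freqs = {j: s / length_total for j, s in length_scores.items()}
    return genotype_freqs, length_freqs
```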
  • For each sequence context, inDelphi produces both a predicted frequency distribution on deletion genotypes and a predicted frequency distribution on deletion lengths, and trains its parameters by minimizing the negative of the sum of the squared Pearson correlations between each predicted object and its observed counterpart.
  • deletion genotype frequency distributions are formed from observations for deletion lengths 1-60, and deletion length frequency distributions are formed from observations for deletion lengths 1-28.
  • Both neural networks are trained simultaneously on both tasks.
  • inDelphi is trained with stochastic gradient descent with batched training sets.
  • inDelphi is implemented in Python using the autograd library. A batch size of 200, an initial weight scaling factor of 0.10, an initial step size of 0.10, and an exponential decaying factor for the step size of 0.999 per step were used.
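A sketch of the per-context training objective and the step-size schedule, using the stated values (initial step size 0.10, decay factor 0.999 per step). Gradients, which the authors obtained with autograd, are omitted, as is how losses are aggregated over a batch of 200 contexts, which the description does not specify. Function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def context_loss(pred_geno, obs_geno, pred_len, obs_len):
    """Negative sum of squared Pearson correlations for the two predicted distributions."""
    return -(pearson(pred_geno, obs_geno) ** 2 + pearson(pred_len, obs_len) ** 2)


def step_size(step, initial=0.10, decay=0.999):
    """Exponentially decaying step size: initial * decay ** step."""
    return initial * decay ** step
```

Because the correlations are squared, the best attainable loss per context is -2.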
  • inDelphi Deletion Modeling Summary and Revisiting Assumptions
  • inDelphi trains MH-NN, which uses as input (microhomology length, microhomology GC content) to output a psi score which is translated into a phi score using deletion length.
  • This phi score represents the “strength” of the microhomology corresponding to a particular MH deletion genotype.
  • inDelphi also trains MHless-NN, which uses (deletion length) as input to directly output a phi score representing the “total strength” of all MH-independent activity for a particular deletion length. While the model assumes that microhomology and microhomology-less repair can overlap in their contributions to a single repair genotype, this assumption is made conservatively: their contributions are assumed to overlap only when there is no alternative.
  • this single genotype’s score is equal to MHindep-phi[j].
  • multiple MH-less genotypes are possible, in which case the total score of all of the MH-less genotypes is equal to MHindep-phi[j].
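The psi-to-phi translation mentioned above is not spelled out in this section. One plausible form, consistent with the exponential penalty on deletion length noted later in this section, is phi = exp(psi - beta * L) for deletion length L. Both this functional form and the coefficient `beta` are assumptions, and the function name is hypothetical.

```python
import math


def mh_phi(psi, del_len, beta):
    """Translate an MH-NN psi score into a phi score with an exponential length penalty.

    beta is a hypothetical penalty coefficient; it is not given in the description.
    """
    return math.exp(psi - beta * del_len)
```

With any positive `beta`, the same microhomology strength psi yields a smaller phi for a longer deletion.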
  • inDelphi predicts insertions from a sequence context and cleavage site by using the precision score of the predicted deletion length distribution and total deletion phi (from all MH-phi and MHindep-phi).
  • inDelphi also uses one-hot-encoded binary vectors encoding nucleotides -4 and -3. In a training set, these features are collected and normalized to zero mean and unit variance, and the fraction of 1-bp insertions over the summed counts of 1-bp insertions and all deletions is tabulated as the prediction goal.
  • The above procedure is used to predict the frequency of 1-bp insertions out of 1-bp insertions and all deletions for a particular sequence context. Once this frequency is predicted, it is used to make frequency predictions for each of the 4 possible insertion genotypes, which are predicted by deriving from the training set the average insertion frequency for each base given its local sequence context.
  • When the training set is small, only the -4 nucleotide is used.
  • When the training set is large, nucleotides -5, -4, and -3 are used.
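A minimal sketch of the insertion genotype step: estimate, per local context, the average fraction of 1-bp insertions that insert each base, then scale the predicted 1-bp insertion frequency by those fractions. Averaging per-site fractions (rather than pooling raw counts), the input layout, and the function names are assumptions; the context key can be the -4 base alone or a (-5, -4, -3) tuple.

```python
from collections import defaultdict


def fit_insertion_genotypes(training_sites):
    """Average, per context, the fraction of 1-bp insertions that insert each base.

    training_sites: list of (context, {inserted_base: count}).
    Sites with no observed 1-bp insertions are skipped.
    Returns {context: {inserted_base: mean fraction}}.
    """
    sums = defaultdict(lambda: defaultdict(float))
    n_sites = defaultdict(int)
    for context, counts in training_sites:
        total = sum(counts.values())
        if total == 0:
            continue
        n_sites[context] += 1
        for base in "ACGT":
            sums[context][base] += counts.get(base, 0) / total
    return {c: {b: sums[c][b] / n_sites[c] for b in "ACGT"} for c in n_sites}


def predict_insertion_genotypes(insertion_freq, context, table):
    """Split a predicted 1-bp insertion frequency across the four inserted bases."""
    return {b: insertion_freq * p for b, p in table[context].items()}
```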
  • inDelphi predicts MH-less deletions to the resolution of deletion length. That is, inDelphi predicts a single frequency corresponding to the sum total frequency of all unique MH-less deletion genotypes possible for a particular deletion length. This modeling choice was made because genotype frequency replicability among MH-less deletions is substantially lower than among MH deletions.
  • Measuring performance on the task of indel length frequency considers MH deletions, MH-less deletions, and 1-bp insertions (90% of all outcomes).
  • If end-users desire, they can extend inDelphi predictions to frequency predictions for specific MH-less deletion genotypes by noting that MH-less deletions are distributed uniformly between 0 delta-position genotypes, medial genotypes, and N delta-position genotypes.
  • inDelphi was compared to a baseline model with the same model structure but with the deep neural networks replaced by linear models. The comparison was done using Lib-A mESC data. While inDelphi achieves a mean held-out Pearson correlation of 0.851 on deletion genotype frequency prediction and 0.837 on deletion length frequency prediction, the linear baseline model achieves a mean held-out Pearson correlation of 0.816 on deletion genotype frequency prediction and 0.796 on deletion length frequency prediction. When including the third model component for 1-bp insertion modeling and testing on genotype frequency prediction for 1-bp insertions and all deletions, inDelphi achieves a median held-out Pearson correlation of 0.937, and 0.910 on the task of indel length frequency prediction. The linear baseline model achieves median held-out Pearson correlations of 0.919 and 0.900 on the two tasks, respectively.
  • the deep neural network version of MH-NN learns that microhomology length is more important than % GC (FIGs. 18A-18H).
  • the linear version learns the same concept, with a weight of 1.1585 for MH length and 0.332 for % GC.
  • Microhomology length is an important feature for MH-NN (FIGs. 18A-18H).
  • a model was trained that uses only % GC as input to MH-NN while keeping the rest of the model structure identical.
  • This baseline model at convergence achieves a mean Pearson correlation of 0.59 on the task of predicting deletion genotype frequencies, and a mean Pearson correlation of 0.58 on the task of predicting deletion length frequencies.
  • a model at iteration 0 with randomly initialized weights achieves mean Pearson correlations of 0.55 and 0.54 on the two respective tasks on held-out data.
  • This basal Pearson correlation is relatively high due to the model structure, in particular, the exponential penalty on deletion length. In sum, removing MH length as a feature severely impacts model performance, restricting it to predictive performance not appreciably better than random chance.
  • inDelphi training and testing on data from varying cell-types
  • The deletion component was first trained on a subset of Lib-A mESC data. Then, k-fold cross-validation was applied on D, where D is iteratively split into training and test datasets.
  • the training set is used to train the insertion frequency model (k-nearest neighbors) and insertion genotype model (matrix of observed probabilities of each inserted base given local sequence context, which is just the -4 nucleotide when the training dataset is small, and -5, -4 and -3 nucleotides when the training dataset is large).
  • the deletion component of inDelphi was trained on a subset of the Lib-A mESC data.
  • all VO sequence contexts (about 100) were randomly split into training and test datasets 100 times.
  • The training set was used for k-nearest neighbor modeling of 1-bp insertion frequencies. Feature normalization to zero mean and unit variance was not performed.
  • The average frequency of each 1-bp insertion genotype was derived from the training set as well.
  • The median test-time Pearson correlation was used for plotting in FIGs. 13A-13D. Due to the small size of the training set, only the -4 nucleotide was used for modeling both the insertion frequency and the insertion genotype frequencies.
  • inDelphi testing on library data
  • inDelphi was trained on data from 946 Lib-A sequence contexts and tested on 168 held- out Lib-A sequence contexts. Nucleotide -4 was used for insertion rate modeling, all other modeling choices were standard as described above. On held-out data, this version of inDelphi achieved a median Pearson correlation of 0.84 on predicting indel genotype frequencies, and 0.80 on predicting indel length frequencies.
  • inDelphi for general use
  • For general use on arbitrary cell types, a version of inDelphi was trained using additional data from diverse cell types. Deletion modeling was trained using data from 2,464 sequence contexts from high-replicability Lib-A and Lib-B data (including clinical variants and microduplications, fourbp, and longdup) in mESCs, and data from VO sequence contexts in HEK293 and K562 cells. Insertion frequency modeling is implemented as above. Insertion genotype modeling uses nucleotides -5, -4, and -3.
  • The insertion frequency model and insertion genotype model are trained on VO endogenous data in K562 and HEK293T cells, Lib-A data in mESCs, and Lib-B data (including clinical variants and microduplications, fourbp, and longdup) in mESCs and U2OS cells.
  • the training dataset has rich and uniform representation across all quintiles of several major axes of variation including GC content, precision, and number of bases participating in microhomology as measured empirically in the human genome.
  • This design strategy enables inDelphi to generalize well to arbitrary sequence contexts from the human genome.
  • training data further include data in the outlier range of statistics of interest, including extremely high and low precision repair distributions, and extremely weak and strong microhomology (minimal microhomology and extensive microduplication microhomology sequences).
  • the availability of such sequences in the training data enables inDelphi to generalize well to sequence contexts of clinical interest and sequence contexts supporting unusually high frequencies of precision repair.
  • Because its training data include such sequences, inDelphi is well prepared to predict repair outcomes accurately at other clinical microduplications.
  • HCT116 human colon cancer cell line experiences a markedly higher frequency of single base insertions compared to all other cell lines that were studied, possibly due to the MLH1 deficiency of this cell line leading to impaired DNA mismatch repair. For this reason, HCT116 data was excluded from the training dataset. For best results, it is suggested that end-users keep in mind that repair class frequencies can be cell type-dependent, and this issue has not been well-characterized thus far.
  • inDelphi's main error tendency is to overestimate rather than underestimate the precision of repair (FIGs. 14A-14F, FIGs. 15A-15F).
  • This tendency can be explained by noting that inDelphi considers only sequence microhomology, whereas in biological experimental settings it is plausible that even sequence contexts with very strong microhomology fail to yield precise outcomes because of noise factors that inDelphi does not consider.
  • End-users should take this tendency into account when using inDelphi predictions for further experiments.
  • end-users should recognize that observed repair outcomes may have empirical precision under this threshold.
  • Sequence contexts were designed by empirically determining the distribution of four statistics in sequence contexts from the human genome: GC content, the total number of bases participating in microhomology for 3-27-bp deletions, the Azimuth predicted on-target efficiency score, and the statistical entropy of the 3-27-bp deletion length distribution predicted by a previous version of inDelphi. For each statistic, empirical quintiles were derived by calculating the statistic in a large number of sequence contexts from the human genome. For the library, sequence contexts were designed by randomly generating DNA and categorizing it into each combination of quintiles across the four statistics.
  • 228 “fourbp” sequence contexts were designed from 3 contexts with random sequences (with total phi scores lower on average than VO sequence contexts) by varying positions -5 to -2; for each of the 3 “low-microhomology” contexts, 76 four-base sequences were randomly designed while ensuring representation of all possible 2-bp microhomology patterns, including no microhomology, one base of microhomology at either position, and full two bases of microhomology.
  • Nucleotides from positions -7 to 0 were one-hot-encoded and used in ridge regression to predict the observed frequency of 1-bp insertions out of all Cas9 editing events in 1996 sequence contexts from Lib-A mESC data.
  • the data were split into training and testing sets (80/20 split) 10,000 times to calculate a bootstrapped estimate of linear regression weights and test-set predictive Pearson correlation.
  • the median test-set Pearson correlation was found to be 0.62.
  • Any features whose bootstrapped weight range included 0 were excluded (probability that the weight is zero > 1e-4).
  • The average bootstrapped weight estimate was used as the “logo height” for all remaining features. Each feature is independent; vertical stacking of features follows the conventional display of DNA sequence motifs.
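The insertion component described above pairs a k-nearest-neighbor regressor for the 1-bp insertion frequency with a table of average inserted-base frequencies conditioned on local sequence context. The following Python is a minimal sketch, not the inDelphi code: it assumes each training context is already summarized as a numeric feature vector (the choice of features, the value of `k`, and Euclidean distance are assumptions), and it conditions the genotype table only on the -4 nucleotide, as in the small-training-set setting. As in the text, features are not normalized.

```python
import math
from collections import Counter, defaultdict

BASES = "ACGT"

def knn_insertion_frequency(train_features, train_freqs, query, k=5):
    """Predict a 1-bp insertion frequency as the mean over the k nearest
    training contexts (Euclidean distance on unnormalized features)."""
    ranked = sorted(
        (math.dist(feat, query), freq)
        for feat, freq in zip(train_features, train_freqs)
    )
    nearest = ranked[:k]
    return sum(freq for _, freq in nearest) / len(nearest)

def fit_insertion_genotype_model(train_contexts):
    """Average, per -4 nucleotide, each context's distribution over the inserted base.

    train_contexts: iterable of (minus4_base, {inserted_base: read_count}).
    Returns {minus4_base: {base: average frequency}}.
    """
    sums = defaultdict(lambda: dict.fromkeys(BASES, 0.0))
    n_contexts = Counter()
    for minus4, counts in train_contexts:
        total = sum(counts.values())
        if total == 0:
            continue  # skip contexts with no observed 1-bp insertions
        for b in BASES:
            sums[minus4][b] += counts.get(b, 0) / total
        n_contexts[minus4] += 1
    return {
        m4: {b: s[b] / n_contexts[m4] for b in BASES}
        for m4, s in sums.items()
    }
```

For example, `fit_insertion_genotype_model([("T", {"A": 6, "T": 2}), ("T", {"T": 4})])` averages the two per-context distributions, giving P(A | -4 = T) = 0.375 and P(T | -4 = T) = 0.625.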
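The motif analysis above (one-hot encoding of positions -7 to 0, ridge regression, repeated random 80/20 splits, and exclusion of features whose bootstrapped weight range includes zero) can be sketched with NumPy as follows. The ridge penalty `alpha`, the unpenalized intercept, and treating the min-max range of the resampled weights as the "bootstrapped weight range" are assumptions; with 10,000 splits, a range that includes zero corresponds roughly to the > 1e-4 probability criterion in the text.

```python
import numpy as np

BASES = "ACGT"

def one_hot(seqs):
    """One-hot encode equal-length DNA sequences (e.g. positions -7..0) as an (n, 4*L) matrix."""
    length = len(seqs[0])
    X = np.zeros((len(seqs), 4 * length))
    for i, s in enumerate(seqs):
        for j, base in enumerate(s):
            X[i, 4 * j + BASES.index(base)] = 1.0
    return X

def ridge_fit(X, y, alpha=1.0):
    """Closed-form ridge regression with an unpenalized intercept; returns (intercept, weights)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    return y_mean - x_mean @ w, w

def bootstrap_logo(seqs, y, n_splits=10_000, alpha=1.0, seed=0):
    """Repeat random 80/20 splits; return logo heights and median test-set Pearson r.

    Logo height = mean resampled weight, set to 0 when the min-max range of
    the resampled weights includes 0.
    """
    rng = np.random.default_rng(seed)
    X, y = one_hot(seqs), np.asarray(y, dtype=float)
    n_train = int(0.8 * len(y))
    weights, test_r = [], []
    for _ in range(n_splits):
        idx = rng.permutation(len(y))
        train, test = idx[:n_train], idx[n_train:]
        b, w = ridge_fit(X[train], y[train], alpha)
        pred = b + X[test] @ w
        test_r.append(np.corrcoef(pred, y[test])[0, 1])
        weights.append(w)
    W = np.array(weights)
    includes_zero = (W.min(axis=0) <= 0) & (W.max(axis=0) >= 0)
    logo = np.where(includes_zero, 0.0, W.mean(axis=0))
    return logo, float(np.median(test_r))
```

Calling `bootstrap_logo(seqs, freqs)` with the default 10,000 splits mirrors the procedure described; smaller `n_splits` values are useful for quick checks.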
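The quintile-based library design above can be sketched as rejection sampling into a quintile grid: compute quintile cut points for each statistic over genomic contexts, then generate random DNA until each grid cell holds the desired number of sequences. This is a hypothetical sketch, not the procedure used to build the library; only GC content is implemented here, and the other three statistics (microhomology base count, Azimuth score, predicted deletion-length entropy) would be supplied as additional functions in `stat_fns`.

```python
import random
import statistics
from bisect import bisect_right
from collections import defaultdict

def gc_content(seq):
    """Fraction of G and C bases in seq."""
    return (seq.count("G") + seq.count("C")) / len(seq)

def quintile_edges(values):
    """Four cut points dividing an empirical distribution into quintiles."""
    return statistics.quantiles(values, n=5)

def quintile_of(value, edges):
    """Quintile index (0-4) of value given the four cut points."""
    return bisect_right(edges, value)

def fill_quintile_cells(genome_seqs, stat_fns, length, per_cell, seed=0, max_tries=20_000):
    """Rejection-sample random DNA into every combination of quintiles.

    genome_seqs: genomic sequence contexts used to derive empirical quintiles.
    stat_fns: one function per statistic (e.g. gc_content).
    Returns ({cell: [sequences]}, per-statistic quintile edges).
    """
    rng = random.Random(seed)
    edges = [quintile_edges([f(s) for s in genome_seqs]) for f in stat_fns]
    n_cells = 5 ** len(stat_fns)
    cells = defaultdict(list)
    for _ in range(max_tries):
        cand = "".join(rng.choice("ACGT") for _ in range(length))
        cell = tuple(quintile_of(f(cand), e) for f, e in zip(stat_fns, edges))
        if len(cells[cell]) < per_cell:
            cells[cell].append(cand)
        if len(cells) == n_cells and all(len(v) >= per_cell for v in cells.values()):
            break  # every cell of the grid is full
    return dict(cells), edges
```

Uniform random DNA rarely lands in extreme quintiles, so some cells may remain underfilled within `max_tries`; biasing the base composition of candidates would be one way to fill them.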

Abstract

The specification provides methods for introducing a desired genetic change in a nucleotide sequence using a double-strand break (DSB)-inducing genome editing system, the method comprising: identifying one or more available cut sites in a nucleotide sequence; analyzing the nucleotide sequence and available cut sites with a computational model to identify the optimal cut site for introducing the desired genetic change into the nucleotide sequence; and contacting the nucleotide sequence with a DSB-inducing genome editing system, thereby introducing the desired genetic change in the nucleotide sequence at the cut site.

Description

SYSTEMS AND METHODS FOR PREDICTING
REPAIR OUTCOMES IN GENETIC ENGINEERING
FEDERALLY SPONSORED RESEARCH
This invention was made with Government support under Grant No. R01 HG008754 awarded by the National Institutes of Health. The Government has certain rights in the invention.
RELATED APPLICATION
This application claims priority to U.S. Provisional Patent Application Serial No. 62/599,623, filed on December 15, 2017, titled “SYSTEMS AND METHODS FOR PREDICTING REPAIR OUTCOMES IN GENETIC ENGINEERING,” and to U.S. Provisional Patent Application Serial No. 62/669,771, filed May 10, 2018, each of which is hereby incorporated by reference in its entirety.
BACKGROUND OF THE INVENTION
CRISPR (clustered regularly interspaced short palindromic repeats)-Cas9 has revolutionized genome editing, providing powerful research tools and promising agents for the potential treatment of genetic diseases [1-3]. The DNA-targeting capabilities of Cas9 have been improved by the development of gRNA design principles [4], modeling of factors leading to off-target DNA cleavage, enhancement of Cas9 sequence fidelity by modifications to the nuclease [5,6] and gRNA [7], and the evolution or engineering of Cas9 variants with alternative PAM sequences [8-10]. Similarly, control over the product distribution of genome editing has been advanced by the development of base editing to achieve precise and efficient single-nucleotide mutations [7,11,12], and the improvement of template-directed homology-directed repair (HDR) of double-strand breaks [13-15].
Non-template-directed repair processes, including non-homologous end-joining (NHEJ) and microhomology-mediated end-joining (MMEJ), are major pathways for repairing Cas9-mediated double-strand breaks and can produce highly heterogeneous outcomes, generating hundreds of genotypes following DNA cleavage at a single site. While end-joining repair of Cas9-mediated double-stranded DNA breaks has been harnessed to facilitate knock-in of DNA templates [18-21] or deletion of intervening sequence between two cleavage sites [22], NHEJ and MMEJ are not generally considered useful for precision genome editing applications. Recent work has found that the heterogeneous distribution of Cas9-mediated editing products at a given target site is reproducible and dependent on local sequence context [20,21], but no general methods have been described to predict genotypic products following Cas9-induced double-stranded DNA breaks.
The generally accepted view is that DNA double-strand break repair (i.e., template-free, non-homology-dependent repair) following cleavage by genome editing systems produces stochastic and heterogeneous repair products and is therefore impractical for applications beyond gene disruption. Further, template-free repair processes (e.g., MMEJ and NHEJ) following DNA double-strand breaks, despite being more efficient than homology-based repair, are generally not viewed as feasible solutions for precision repair applications, such as restoring the function of a defective gene with a gain-of-function genetic change. Accordingly, methods and solutions enabling the judicious application of template-free genome editing systems, including CRISPR/Cas, TALEN, or Zinc-Finger genome editing systems, would significantly advance the field of genome editing.
SUMMARY OF THE INVENTION
The present inventors have unexpectedly found through computational analyses that template-free DNA/genome editing systems, e.g., CRISPR/Cas9, Cas-based, Cpf1-based, or other DSB (double-strand break)-based genome editing systems, produce a predictable set of repair genotypes, thereby enabling the use of such editing systems for applications involving or requiring precise manipulation of DNA, e.g., the correction of a disease-causing genetic mutation or modifying a wildtype sequence to confer a genetic advantage. This finding is contrary to the accepted view that DNA double-strand break repair (i.e., template-free, non-homology-dependent repair) following cleavage by genome editing systems produces stochastic and heterogeneous repair products and is therefore impractical for applications beyond gene disruption. Thus, the specification describes and discloses in various aspects and embodiments computational methods and systems for practically harnessing the innate efficiencies of template-free DNA repair systems to carry out precise DNA and/or genomic editing without reliance on homology-based repair.
Accordingly, the specification provides in one aspect a method of introducing a desired genetic change in a nucleotide sequence using a double-strand break (DSB)-inducing genome editing system, the method comprising: identifying one or more available cut sites in a nucleotide sequence; analyzing the nucleotide sequence and available cut sites with a computational model to identify the optimal cut site for introducing the desired genetic change into the nucleotide sequence; and contacting the nucleotide sequence with a DSB-inducing genome editing system, thereby introducing the desired genetic change in the nucleotide sequence at the cut site.
In another aspect, the specification provides a method of treating a genetic disease by correcting a disease-causing mutation using a double-strand break (DSB)-inducing genome editing system, the method comprising: identifying one or more available cut sites in a nucleotide sequence comprising a disease-causing mutation; analyzing the nucleotide sequence and available cut sites with a computational model to identify the optimal cut site for correcting the disease-causing mutation in the nucleotide sequence; and contacting the nucleotide sequence with a DSB-inducing genome editing system, thereby correcting the disease-causing mutation and treating the disease.
In yet another aspect, the specification provides a method of altering a genetic trait by introducing a genetic change in a nucleotide sequence using a double-strand break (DSB)-inducing genome editing system, the method comprising: identifying one or more available cut sites in a nucleotide sequence; analyzing the nucleotide sequence and available cut sites with a computational model to identify the optimal cut site for introducing the genetic change into the nucleotide sequence; and contacting the nucleotide sequence with a DSB-inducing genome editing system, thereby introducing the desired genetic change in the nucleotide sequence at the cut site and consequently altering the associated genetic trait.
In another aspect, the specification provides a method of selecting a guide RNA for use in a Cas-based genome editing system capable of introducing a genetic change into a nucleotide sequence of a target genomic location, the method comprising: identifying in a nucleotide sequence of a target genomic location one or more available cut sites for a Cas-based genome editing system; and analyzing the nucleotide sequence and cut site with a computational model to identify a guide RNA capable of introducing the genetic change into the nucleotide sequence of the target genomic location.
In still another aspect, the specification provides a method of introducing a genetic change in the genome of a cell with a Cas-based genome editing system comprising: selecting a guide RNA for use in the Cas-based genome editing system in accordance with the method of the above aspect; and contacting the genome of the cell with the guide RNA and the Cas-based genome editing system, thereby introducing the genetic change.
In various embodiments, the cut sites available in the nucleotide sequence are a function of the particular DSB-inducing genome editing system in use, e.g., a Cas-based genome editing system.
In certain embodiments, the nucleotide sequence is a genome of a cell.
In certain other embodiments, the method for introducing the desired genetic change is done in vivo within a cell or an organism (e.g., a mammal), or ex vivo within a cell isolated or separated from an organism (e.g., an isolated mammalian cancer cell), or in vitro on an isolated nucleotide sequence outside the context of a cell.
In various embodiments, the DSB-inducing genome editing system can be a Cas-based genome editing system, e.g., a type II Cas-based genome editing system. In other embodiments, the DSB-inducing genome editing system can be a TALEN-based editing system or a Zinc-Finger-based genome editing system. In still other embodiments, the DSB-inducing genome editing system can be any endonuclease-based system that catalyzes the formation of a double-strand break at one or more specific cut sites.
In embodiments involving a Cas-based genome editing system, the method can further comprise selecting a cognate guide RNA capable of directing a double-strand break at the optimal cut site by the Cas-based genome editing system.
In certain embodiments, the guide RNA is selected from the group consisting of the guide RNA sequences listed in any of Tables 1-6. In various embodiments, the guide RNA can be known or can be newly designed.
In various embodiments, the double-strand break (DSB)-inducing genome editing system is capable of editing the genome without homology-directed repair.
In other embodiments, the double-strand break (DSB)-inducing genome editing system comprises a type I Cas RNA-guided endonuclease, or a variant or orthologue thereof.
In still other embodiments, the double-strand break (DSB)-inducing genome editing system comprises a type II Cas RNA-guided endonuclease, or a functional variant or orthologue thereof. In certain embodiments, the double-strand break (DSB)-inducing genome editing system may comprise a Cas9 RNA-guided endonuclease, or a variant or orthologue thereof.
In still other embodiments, the double-strand break (DSB)-inducing genome editing system can comprise a Cpf1 RNA-guided endonuclease, or a variant or orthologue thereof.
In yet further embodiments, the double-strand break (DSB)-inducing genome editing system can comprise a Streptococcus pyogenes Cas9 (SpCas9), Staphylococcus aureus Cas9 (SaCas9), Francisella novicida Cas9 (FnCas9), or a functional variant or orthologue thereof.
In various embodiments, the desired genetic change to be introduced into the nucleotide sequence, e.g., a genome, is a correction of a genetic mutation. In embodiments, the genetic mutation is a single-nucleotide polymorphism, a deletion mutation, an insertion mutation, or a microduplication error.
In still other embodiments, the genetic change can comprise a 2-60-bp deletion or a 1-bp insertion.
In other embodiments, the genetic change can comprise a deletion of 2-20, 4-40, 8-80, 16-160, 32-320, or 64-640 nucleotides, or of up to 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, or 10,000 or more nucleotides. Preferably, the deletion can restore the function of a defective gene, e.g., via a gain-of-function frameshift genetic change.
In other embodiments, the desired genetic change is a desired modification to a wildtype gene that confers and/or alters one or more traits, e.g., conferring increased resistance to a pathogen or altering a monogenic trait (e.g., eye color) or polygenic trait (e.g., height or weight).
In embodiments involving correcting a disease-causing mutation, the disease can be a monogenic disease. Such monogenic diseases can include, for example, sickle cell disease, cystic fibrosis, polycystic kidney disease, Tay-Sachs disease, achondroplasia, beta-thalassemia, Hurler syndrome, severe combined immunodeficiency, hemophilia, glycogen storage disease Ia, and Duchenne muscular dystrophy.
In any of the above aspects and embodiments, the step of identifying the available cut sites can involve identifying one or more PAM sequences in the case of a Cas-based genome editing system.
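For SpCas9, which recognizes an NGG PAM and cuts 3 bp upstream (5') of it, identifying the available cut sites reduces to a pattern scan of both strands. The following is a minimal sketch under that assumption; the NGG pattern and 3-bp offset are specific to SpCas9, other Cas variants would need different PAM patterns and offsets, and the function name is hypothetical.

```python
import re

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def spcas9_cut_sites(seq, protospacer_len=20):
    """Return sorted (cut_index, strand, protospacer) tuples for every NGG PAM.

    cut_index is the forward-strand position between seq[cut_index - 1] and
    seq[cut_index]; SpCas9 cuts 3 bp 5' of the PAM on the PAM-containing strand.
    """
    seq = seq.upper()
    sites = []
    # Forward strand: protospacer followed by NGG. The zero-width lookahead
    # lets overlapping PAMs (e.g. in GGG runs) all be found.
    for m in re.finditer(r"(?=[ACGT]GG)", seq):
        pam = m.start()
        if pam >= protospacer_len:
            sites.append((pam - 3, "+", seq[pam - protospacer_len:pam]))
    # Reverse strand: a forward-strand CCN is an NGG on the reverse strand,
    # and the protospacer lies to its right in forward coordinates.
    for m in re.finditer(r"(?=CC[ACGT])", seq):
        start = m.start() + 3  # first forward-strand base of the protospacer
        if start + protospacer_len <= len(seq):
            sites.append((start + 3, "-", revcomp(seq[start:start + protospacer_len])))
    return sorted(sites)
```

For example, `spcas9_cut_sites("A" * 20 + "TGG" + "A" * 5)` finds a single forward-strand site whose cut falls between positions 16 and 17 (cut index 17).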
In various embodiments of the above methods, the computational model used to analyze the nucleotide sequence is a deep learning computational model or a neural network model having one or more hidden layers. In various embodiments, the computational model is trained with experimental data to predict the probability distribution of indel lengths for any given nucleotide sequence and cut site. In still other embodiments, the computational model is trained with experimental data to predict the probability distribution of genotype frequencies for any given nucleotide sequence and cut site.
In various embodiments, the computational model comprises one or more training modules for evaluating experimental data.
In various embodiments, the computational model can comprise: a first training module for computing a microhomology score matrix; a second training module for computing a microhomology-independent score matrix; and a third training module for computing a probability distribution over 1-bp insertions, wherein, once trained with experimental data, the computational model computes a probability distribution over indel genotypes and a probability distribution over indel lengths for any given input nucleotide sequence and cut site.
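The three-module structure above ends in a combination step that turns module outputs into a genotype distribution and a length distribution. A hypothetical sketch of only that final step, assuming the microhomology and microhomology-independent modules each emit an unnormalized log-score per deletion genotype, and the third module emits a 1-bp insertion fraction plus per-base probabilities. The exponentiate-then-normalize scheme, names, and key format are assumptions for illustration, not the claimed implementation.

```python
import math
from collections import defaultdict

def combine_repair_outcomes(mh_scores, mhless_scores, ins_fraction, ins_base_probs):
    """Combine hypothetical module outputs into genotype and indel-length distributions.

    mh_scores, mhless_scores: {(indel_len, label): log-score}, with indel_len < 0
        for deletions (e.g. -3 for a 3-bp deletion).
    ins_fraction: predicted fraction of edited outcomes that are 1-bp insertions.
    ins_base_probs: {base: probability} for the inserted base (sums to 1).
    """
    # Exponentiate scores from both deletion modules into one pool of weights.
    weights = defaultdict(float)
    for scores in (mh_scores, mhless_scores):
        for key, log_score in scores.items():
            weights[key] += math.exp(log_score)
    total = sum(weights.values())

    # Deletions share the probability mass not taken by 1-bp insertions.
    genotypes = {key: (1.0 - ins_fraction) * w / total for key, w in weights.items()}
    for base, p in ins_base_probs.items():
        genotypes[(1, base)] = ins_fraction * p

    # Marginalize genotypes into a distribution over indel lengths.
    lengths = defaultdict(float)
    for (indel_len, _label), p in genotypes.items():
        lengths[indel_len] += p
    return genotypes, dict(lengths)
```

Because insertion genotypes are keyed with length +1 and deletions with negative lengths, the same marginalization yields the indel-length distribution for both outcome classes.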
In other embodiments, the computational model predicts genomic repair outcomes for any given input nucleotide sequence and cut site.
In various embodiments, the genomic repair outcomes can comprise microhomology deletions, microhomology-less deletions, and/or 1-bp insertions.
In still other embodiments, the computational model can comprise one or more modules each comprising one or more input features selected from the group consisting of: a target site nucleotide sequence; a cut site; a PAM sequence; microhomology lengths relative to a cut site; % GC content at a cut site; microhomology deletion lengths at a cut site; and the type of DSB-inducing genome editing system.
In various embodiments, the nucleotide sequence analyzed by the computational model is about 25-100, 50-200, 100-400, 200-800, 400-1600, 800-3200, or 1600-6400 nucleotides, or even up to 7K, 8K, 9K, 10K, 11K, 12K, 13K, 14K, 15K, 16K, 17K, 18K, 19K, 20K nucleotides, or more in length.
In another aspect, the specification relates to guide RNAs that are identified by the various methods described herein. In certain embodiments, the guide RNAs can be any of those presented in Tables 1-6, the contents of which form part of this specification. According to various embodiments, the guide RNAs can be purely ribonucleic acid molecules. However, in other embodiments, the guide RNAs can comprise one or more naturally occurring or non-naturally occurring modifications. In various embodiments, the modifications can include, but are not limited to, nucleoside analogs, chemically modified bases, intercalated bases, modified sugars, and modified phosphate group linkers. In certain embodiments, the guide RNAs can comprise one or more phosphorothioate and/or 5’-N-phosphoramidite linkages.
In still other aspects, the specification discloses vectors comprising one or more nucleotide sequences disclosed herein, e.g., vectors encoding one or more guide RNAs, one or more target nucleotide sequences which are being edited, or a combination thereof. The vectors may comprise naturally occurring sequences, or non-naturally occurring sequences, or a combination thereof.
In still other aspects, the specification discloses host cells comprising the herein disclosed vectors encoding one or more nucleotide sequences embodied herein, e.g., one or more guide RNAs, one or more target nucleotide sequences which are being edited, or a combination thereof.
In other aspects, the specification discloses a Cas-based genome editing system comprising a Cas protein (or homolog, variant, or orthologue thereof) complexed with at least one guide RNA. In certain embodiments, the guide RNA can be any of those disclosed in Tables 1-6, or a functional variant thereof.
In still other aspects, the specification provides a Cas-based genome editing system comprising an expression vector having at least one expressible nucleotide sequence encoding a Cas protein (or homolog, variant, or orthologue thereof) and at least one other expressible nucleotide sequence encoding a guide RNA, wherein the guide RNA can be identified by the methods disclosed herein for selecting a guide RNA.
In still a further aspect, the specification provides a library for training a computational model for selecting a guide RNA sequence for use with a Cas-based genome editing system capable of introducing a genetic change into a genome without homology-directed repair, wherein the library comprises a plurality of vectors each comprising a first nucleotide sequence of a target genomic location having a cut site and a second nucleotide sequence encoding a cognate guide RNA capable of directing a Cas-based genome editing system to carry out a double-strand break at the cut site of the first nucleotide sequence.
In another aspect, the specification provides a library and its use for training a computational model for selecting an optimized cut site for use with a DSB-based genome editing system (e.g., a Cas-based system, a TALEN-based system, or a Zinc-Finger-based system) that is capable of introducing a desired genetic change into a nucleotide sequence (e.g., a genome) at the selected cut site without homology-directed repair, wherein the library comprises a plurality of vectors each comprising a nucleotide sequence having a cut site, and optionally a second nucleotide sequence encoding a cognate guide RNA (in embodiments involving a Cas-based genome editing system).
In a still further aspect, the specification discloses a computational model.
In certain embodiments, the computational model can predict and/or compute an optimized or preferred cut site for a DSB-based genome editing system for introducing a genetic change into a nucleotide sequence. In preferred embodiments, the repair does not require homology-based repair mechanisms.
In certain other embodiments, the computational model can predict and/or compute an optimized or preferred cut site for a Cas-based genome editing system for introducing a genetic change into a nucleotide sequence. In preferred embodiments, the repair does not require homology-based repair mechanisms.
In still other embodiments, the computational model provides for the selection of an optimized or preferred guide RNA for use with a Cas-based genome editing system for introducing a genetic change in a genome. In preferred embodiments, the repair does not require homology-based repair mechanisms.
In various embodiments, the computational model is a neural network model having one or more hidden layers.
In other embodiments, the computational model is a deep learning computational model. In various embodiments, the DSB-based genome editing system (e.g., a Cas-based genome editing system) edits the genome without relying on homology-based repair.
In various embodiments, the computational model is trained with experimental data to predict the probability distribution of indel lengths for any given nucleotide sequence and cut site. In other embodiments, the computational model is trained with experimental data to predict the probability distribution of genotype frequencies for any given nucleotide sequence and cut site.
In embodiments, the computational model comprises one or more training modules for evaluating experimental data.
In an embodiment, the computational model comprises: a first training module (305) for computing a microhomology score matrix; a second training module (310) for computing a microhomology-independent score matrix; and a third training module (315) for computing a probability distribution over 1-bp insertions, wherein, once trained with experimental data, the computational model computes a probability distribution over indel genotypes and a probability distribution over indel lengths for any given input nucleotide sequence and cut site.
In certain embodiments, the computational model predicts genomic repair outcomes for any given input nucleotide sequence (i.e., context sequence) and cut site.
In certain embodiments, the genomic repair outcomes comprise microhomology deletions, microhomology-less deletions, and 1-bp insertions.
In various embodiments, the one or more modules each comprise one or more input features selected from the group consisting of: a target site nucleotide sequence; a cut site; a PAM sequence; microhomology lengths relative to a cut site; % GC content at a cut site; and microhomology deletion lengths at a cut site.
In certain embodiments, the nucleotide sequence analyzed by the computational model is about 25-100, 50-200, 100-400, 200-800, 400-1600, 800-3200, or 1600-6400 nucleotides, or more in length.
In yet another aspect, the specification discloses a method for training a computational model, comprising: (i) preparing a library comprising a plurality of nucleic acid molecules each encoding a nucleotide target sequence and a cognate guide RNA, wherein each nucleotide target sequence comprises a cut site; (ii) introducing the library into a plurality of host cells; (iii) contacting the library in the host cells with a Cas-based genome editing system to produce a plurality of genomic repair products; (iv) determining the sequences of the genomic repair products; and (v) training the computational model with input data that comprises at least the sequences of the nucleotide target sequence and/or the genomic repair products and the cut sites.
In still another aspect, the specification discloses a method for training a computational model, comprising: (i) preparing a library comprising a plurality of nucleic acid molecules each encoding a nucleotide target sequence and a cut site; (ii) introducing the library into a plurality of host cells; (iii) contacting the library in the host cells with a DSB-based genome editing system to produce a plurality of genomic repair products; (iv) determining the sequences of the genomic repair products; and (v) training the computational model with input data that comprises at least the sequences of the nucleotide target sequence and/or the genomic repair products and the cut sites.
In certain embodiments, the trained computational models disclosed herein are capable of computing a probability distribution of indel lengths for any given input nucleotide sequence and input cut site, and/or a probability distribution of genotype frequencies for any given input nucleotide sequence and input cut site.
In embodiments relating to Cas-based genomic editing systems, the trained computational model is capable of selecting a guide RNA for use with a Cas-based genome editing system for introducing a genetic change into a genome.
The computational model provides a means to produce precise genetic changes with a DSB-based genomic editing system. The genetic changes can include microhomology deletions, microhomology-less deletions, and 1-bp insertions. In certain embodiments, the genetic change corrects a disease-causing mutation. In other embodiments, the genetic change modifies a wildtype sequence, which may confer a change in a genetic trait (e.g., a monogenic or polygenic trait). Diseases whose causative mutations can be corrected using the computational model with a DSB-based genomic editing system include, but are not limited to, sickle cell disease, cystic fibrosis, polycystic kidney disease, Tay-Sachs disease, achondroplasia, beta-thalassemia, Hurler syndrome, severe combined immunodeficiency, hemophilia, glycogen storage disease Ia, or Duchenne muscular dystrophy.
In another aspect, the disclosure provides a method for selecting one or more guide RNAs (gRNAs) from a plurality of gRNAs for CRISPR, comprising acts of: for at least one gRNA of the plurality of gRNAs, using a local DNA sequence and a cut site targeted by the at least one gRNA to predict a frequency of one or more repair genotypes resulting from template-free repair following application of CRISPR with the at least one gRNA; and determining whether to select the at least one gRNA based at least in part on the predicted frequency of the one or more repair genotypes.
In embodiments, the one or more repair genotypes correspond to one or more healthy alleles of a gene related to a disease. In other embodiments, the predicted frequency of the one or more repair genotypes is at least about 30%, or at least about 40%, or at least about 50%, or more.
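The selection act above can be sketched as filtering and ranking candidate gRNAs by the predicted combined frequency of the desired repair genotypes (e.g., healthy alleles). In this minimal sketch the predictor is passed in as a function, `predict_genotype_freqs` is a hypothetical stand-in for any trained repair-outcome model, and the default 30% threshold mirrors the lowest value in the embodiment above.

```python
from typing import Callable, Dict, List, Set, Tuple

def select_grnas(
    candidates: Dict[str, int],
    desired_genotypes: Set[str],
    predict_genotype_freqs: Callable[[str, int], Dict[str, float]],
    local_seq: str,
    min_freq: float = 0.30,
) -> List[Tuple[str, float]]:
    """Keep gRNAs whose predicted frequency of the desired repair genotypes
    reaches min_freq, ranked from highest to lowest predicted frequency.

    candidates maps a gRNA identifier to the cut site it targets in local_seq.
    predict_genotype_freqs returns {genotype: predicted frequency} for a
    sequence and cut site.
    """
    selected = []
    for grna, cut_site in candidates.items():
        freqs = predict_genotype_freqs(local_seq, cut_site)
        desired = sum(freqs.get(g, 0.0) for g in desired_genotypes)
        if desired >= min_freq:
            selected.append((grna, desired))
    return sorted(selected, key=lambda item: item[1], reverse=True)
```

Keeping the predictor as a parameter separates the selection rule from the model, so the same rule applies whichever trained model supplies the genotype frequencies.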
In certain embodiments, the step of predicting the frequency of the one or more repair genotypes comprises: for each deletion length of a plurality of deletion lengths, aligning subsequences of that deletion length on the 5’ and 3’ sides of the cut site to identify one or more longest microhomologies; featurizing the identified microhomologies; applying a machine learning model to compute a frequency distribution over the plurality of deletion lengths; and using the frequency distribution over the plurality of deletion lengths to determine the frequency of the one or more repair genotypes.
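The alignment step can be illustrated with a simplified sketch. Consider a deletion of length L at cut position c, where c is the 0-based index of the first base 3' of the break. Deleting seq[c - L + d : c + d] at consecutive shifts d gives the same product whenever seq[c - L + d] equals seq[c + d]. Each maximal run of matches between the L bases 5' of the cut and the L bases 3' of the cut therefore marks one microhomology. For each run, the code records the MH length and GC fraction (features of the kind plotted in FIG. 18B) and the resulting repaired sequence. This is an illustrative reconstruction, not the inDelphi source code. The `find_microhomologies` name and the `max_del_len=60` default (matching the 1- to 60-bp deletion range) are assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Microhomology:
    del_len: int      # deletion length (bp)
    mh_len: int       # number of microhomologous bases
    mh_seq: str       # the microhomologous bases
    gc_frac: float    # GC fraction of the microhomology (one possible feature)
    delta_start: int  # first shift d that yields this repair product
    delta_end: int    # last shift d that yields this repair product
    product: str      # repaired sequence after the deletion


def find_microhomologies(seq: str, cut: int, max_del_len: int = 60) -> List[Microhomology]:
    """For each deletion length L, align the L bases 5' of the cut with the
    L bases 3' of the cut; each maximal run of matches is one microhomology."""
    seq = seq.upper()
    results: List[Microhomology] = []
    for del_len in range(1, max_del_len + 1):
        if cut - del_len < 0 or cut + del_len > len(seq):
            break  # flanking sequence too short for this and longer deletions
        left = seq[cut - del_len:cut]
        right = seq[cut:cut + del_len]
        k = 0
        while k < del_len:
            if left[k] != right[k]:
                k += 1
                continue
            start = k
            while k < del_len and left[k] == right[k]:
                k += 1
            mh = left[start:k]
            # Deleting seq[cut - del_len + d : cut + d] gives the same product
            # for every shift d in start..k, so one product represents the run.
            product = seq[:cut - del_len + start] + seq[cut + start:]
            gc_frac = sum(base in "GC" for base in mh) / len(mh)
            results.append(Microhomology(del_len, len(mh), mh, gc_frac, start, k, product))
    return results
```

For `find_microhomologies("TTACGTACGTTT", 6, max_del_len=6)`, the longest microhomology is the 4-bp "ACGT" at a 4-bp deletion. Its product "TTACGTTT" corresponds to collapse of the ACGTACGT microduplication. In the disclosed method, features such as these would then be scored by a machine learning model.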
In certain embodiments, the plurality of gRNAs comprises gRNAs for CRISPR/Cas9, and the application of CRISPR comprises application of CRISPR/Cas9.
In yet another aspect, the disclosure provides a system comprising: at least one processor; and at least one computer-readable storage medium having encoded thereon instructions which, when executed, cause the at least one processor to perform a computational method disclosed herein.
In a further aspect, the disclosure provides a method for editing a nucleotide sequence using a DSB-based genomic editing system that introduces a genetic change at a cut site in the nucleotide sequence, wherein the cut site location is informed by a computational model that computes a frequency distribution over a plurality of deletion lengths and/or a frequency distribution of one or more repaired genotypes over the deletion lengths.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 shows an illustrative DNA segment 100, in accordance with some embodiments.
FIGs. 2A-2D show an illustrative matching of 3’ ends of top and bottom strands of a DNA segment at a cut site and an illustrative repair product, in accordance with some embodiments.
FIG. 3A shows an illustrative machine learning model 300, in accordance with some embodiments.
FIG. 3B shows an illustrative process 350 for building one or more machine learning models for predicting frequencies of deletion genotypes and/or deletion lengths, in accordance with some embodiments.
FIG. 4A shows an illustrative neural network 400A for computing microhomology (MH) scores, in accordance with some embodiments.
FIG. 4B shows an illustrative neural network 400B for computing MH-less scores, in accordance with some embodiments.
FIG. 4C shows an illustrative process 400C for training two neural networks jointly, in accordance with some embodiments.
FIG. 4D shows an illustrative implementation of the insertion module 315 shown in FIG. 3A, in accordance with some embodiments.
FIG. 5 shows an illustrative process 500 for processing data collected from CRISPR/Cas9 experiments, in accordance with some embodiments.
FIG. 6 shows an illustrative process 600 for using a machine learning model to predict frequencies of indel genotypes and/or indel lengths, in accordance with some embodiments.
FIG. 7 shows illustrative examples of a blunt-end cut and a staggered cut, in accordance with some embodiments.
FIG. 8A shows an illustrative plot 800A of predicted repair genotypes, in accordance with some embodiments.
FIG. 8B shows another illustrative plot 800B of predicted repair genotypes, in accordance with some embodiments.
FIG. 8C shows another illustrative plot 800C of predicted repair genotypes, in accordance with some embodiments.
FIG. 8D shows a microhomology identified in the example of FIG. 8C, in accordance with some embodiments.
FIG. 9 shows another illustrative neural network 900 for computing a frequency distribution over deletion lengths, in accordance with some embodiments.
FIG. 10 shows, schematically, an illustrative computer 1000 on which any aspect of the present disclosure may be implemented.
FIGs. 11A-11C show a high-throughput assessment of Cas9-mediated DNA repair products. FIG. 11A, A genome-integrated screening library approach for monitoring Cas9 editing products at thousands of target sequences. FIG. 11B, Frequency of Cas9-mediated repair products by class from 1,996 Lib-A target sequences in mouse embryonic stem cells (mESCs). FIG. 11C, Distribution of Cas9-mediated repair products by class in 88 VO target sequences in K562 cells.
FIGs. 12A-12D show modeling of Cas9-mediated indels by inDelphi. FIG. 12A, Schematic of computational flow for inDelphi modeling. inDelphi separates Cas9-mediated editing products by indel type and uses machine learning tools trained on experimental Lib-A editing products to predict relative frequencies of editing products for any target site. Major editing products include 1- to 60-bp MH deletions, 1- to 60-bp MH-less deletions, and 1-bp insertions. FIG. 12B, Mechanism depicting microhomology-mediated end-joining repair, which yields distinct repair outcomes that reflect which microhomologous bases are used during repair. FIG. 12C, Observed mean frequency of 1-bp insertion genotypes among 1,981 Lib-A target sequences with varying -4 nucleotides. Error bars show the 95% C.I. on the sample mean with 1,000-fold bootstrapping. Data distributions are shown in FIGs. 18A-18H. FIG. 12D, Comparison of observed 1-bp insertion frequencies among all Cas9-edited products from 1,996 Lib-A target sequences. The box denotes the 25th, 50th, and 75th percentiles, whiskers show 1.5 times the interquartile range, and outliers are depicted as fliers. *P = 5.4 × 10^-36; **P = 8.6 × 10^-70, two-sided two-sample T-test, test statistic = -13.0 and -18.4, degrees of freedom = 111 and 1,994; Hedges’ g effect size = 0.94 and 0.85, for * and ** respectively. FIG. 12E, Motif representation of base identities that impact the frequency of 1-bp insertions in Lib-A data. Only bases with non-zero linear regression weights in 10,000-fold iterative cross-validation are shown. Median held-out Pearson correlation 0.62, total N = 1996.
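The figure legends report 95% confidence intervals from 1,000-fold bootstrapping and Hedges’ g effect sizes. The sketch below shows the standard percentile bootstrap for a sample mean and Hedges’ g, which is Cohen's d with a pooled standard deviation multiplied by the small-sample correction J = 1 - 3/(4(n1 + n2) - 9). It is a generic textbook computation offered for orientation, not the analysis code used to generate the figures. The fixed random seed is an arbitrary choice made for reproducibility.

```python
import math
import random
import statistics
from typing import Sequence, Tuple


def bootstrap_mean_ci(values: Sequence[float], n_boot: int = 1000,
                      alpha: float = 0.05, seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap confidence interval for the sample mean."""
    rng = random.Random(seed)
    n = len(values)
    boot_means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_boot)
    )
    lo_idx = int(n_boot * alpha / 2)  # 25 for n_boot=1000, alpha=0.05
    hi_idx = n_boot - 1 - lo_idx      # 974 for the same settings
    return boot_means[lo_idx], boot_means[hi_idx]


def hedges_g(a: Sequence[float], b: Sequence[float]) -> float:
    """Standardized mean difference (pooled SD) with small-sample correction J."""
    n1, n2 = len(a), len(b)
    pooled_var = ((n1 - 1) * statistics.variance(a)
                  + (n2 - 1) * statistics.variance(b)) / (n1 + n2 - 2)
    d = (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(pooled_var)
    j = 1 - 3 / (4 * (n1 + n2) - 9)
    return j * d
```

The legends report effect sizes as magnitudes. The sign returned by `hedges_g` only indicates which group has the larger mean.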
FIGs. 13A-13F show that Cas9-mediated editing outcomes are accurately predicted by inDelphi. FIG. 13A, Histogram of the observed fraction of Cas9-mediated editing products whose indel length is included in inDelphi predictions in endogenous VO target sequences in HEK293 (N=86), HCT116 (N=91), and K562 (N=82) cells. FIG. 13B, Distribution of Pearson correlation values comparing inDelphi predictions to observed product frequencies in VO sequence contexts in HEK293T (N=86), HCT116 (N=91), and K562 (N=82) cells. The box denotes the 25th, 50th, and 75th percentiles, and whiskers show 1.5 times the interquartile range. FIG. 13C, Distribution of Pearson correlation values comparing inDelphi predictions to observed indel length frequencies in VO sequence contexts in HEK293 (N=86), HCT116 (N=91), and K562 (N=82) cells. Box plot as in (FIG. 13B). FIG. 13D, Comparison of inDelphi and Microhomology-Predictor frameshift predictions to observed frameshift frequencies among 86 VO target sequences in HEK293 cells. The error band represents the 95% C.I. around the regression estimate with 1,000-fold bootstrapping. FIG. 13E shows 1-bp insertion frequencies among edited outcomes in U2OS and HEK293T cells (n = 27 and 26 observations, baseline n = 1,958 and 89 target sites, P = 4.2 × 10^-8 and 8.1 × 10^-12, respectively), two-sided Welch’s t-test. FIG. 13F shows the smoothed predicted distribution of the highest-frequency indel among major editing outcomes (+1 to -60 indels) for SpCas9 gRNAs targeting the human genome.
FIGs. 14A-14F show high-precision, template-free Cas9 nuclease-mediated deletion and insertion. FIG. 14A, Schematic of deletion repair at a designed Lib-B target sequence with a 9-bp microduplication and strong sequence microhomology. FIG. 14B, Observed frequency of microduplication collapse among all edited products at 56 Lib-B target sequences designed with 7- to 25-bp microduplications. The error band represents the 95% C.I. around the regression estimate with 1,000-fold bootstrapping. FIG. 14C, Observed frequencies of 1-bp insertions among 205 sequence contexts designed to vary base identity at positions -5 to -2 (relative to the PAM at positions 0-2) in three surrounding low-microhomology sequence contexts. The X-axis is sorted by median 1-bp insertion frequency; see FIGs. 20A-20E for the complete axis. FIG. 14D, Comparison of the observed 1-bp insertion frequency at 205 Lib-B designed sequences as in (FIG. 14C) with varying positions -4 and -3. The box denotes the 25th, 50th, and 75th percentiles, whiskers show 1.5 times the interquartile range, and outliers are depicted as fliers. *P = 0.03; **P = 2.98 × 10^-7, two-sided two-sample T-test, test statistic = -2.2 and -6.5, degrees of freedom = 185 and 32, Hedges’ g effect size = 0.58 and 2.3, for * and ** respectively. FIG. 14E, Comparison of predicted precision scores to observed precision scores for microhomology deletions in 86 VO target sites in HEK293 cells. FIG. 14F, Distribution of the predicted frequency of the most frequent deletion and insertion outcomes among major editing outcomes (1-bp insertions, 1- to 60-bp MH deletions, and 1- to 60-bp MH-less deletions) at 1,063,802 Cas9 gRNAs targeting human exons and introns.
FIGs. 15A-15F show precise template-free Cas9-mediated editing of pathogenic alleles to wild-type genotypes. FIG. 15A, Using Cas9 nuclease to correct a pathogenic LDLR allele to wild-type. FIG. 15B, Comparison among ClinVar/HGMD pathogenic alleles of observed and predicted frequencies of repair to wild-type alleles, accompanied by a histogram of observed frequencies. Major editing products include 1-bp insertions, 1- to 60-bp MH deletions, and 1- to 60-bp MH-less deletions. FIG. 15C, Comparison of observed and predicted frequencies of frameshift repair to the wild-type frame among ClinVar/HGMD pathogenic alleles, accompanied by a histogram of observed frequencies. Major editing products are defined as in (FIG. 15B).
FIG. 15D, Histograms of observed frequencies of repair to the wild-type genotype for wild-type mESCs and Prkdc-/-Lig4-/- mESCs at Lib-B pathogenic microduplication alleles with predicted repair frequency >50% among all major editing products, defined as in (FIG. 15B). Dashed lines indicate sample means, which differ significantly. P = 7.8 × 10^-12; two-sided two-sample T-test, test statistic = -6.9, degrees of freedom = 1,297, Hedges’ g effect size = 0.39. FIG. 15E, Flow cytometry contour plots showing GFP fluorescence and LDL-Dylight550 uptake in mESCs containing the LDLRdup1662_1669dupGCTGGTGA-P2A-GFP allele (LDLRdup-P2A-GFP) and treated with SpCas9 and gRNA when denoted. FIG. 15F, Fluorescence microscopy of mESCs containing the LDLRdup1662_1669dupGCTGGTGA-P2A-GFP allele treated with SpCas9 and gRNA, or untreated.
FIGs. 16A-16F show design and cloning of a high-throughput library to assess CRISPR-Cas9-mediated editing products. FIG. 16A, From left to right, distributions of predicted Cas9 on-target efficiency (Azimuth score), number of nucleotides participating in microhomology in 3- to 30-bp deletions, GC content, and estimated precision of deletion outcomes derived from 169,279 potential SpCas9 gRNA target sites in the human genome, with quintiles marked as used to design Lib-A. FIG. 16B, Schematic of the process used to clone Lib-A and Lib-B. The cloning process involves ordering a library of oligonucleotides pairing a gRNA protospacer with its 55-bp target sequence, centered on an NGG PAM. To insert the gRNA hairpin between the gRNA protospacer and the target site, the library undergoes an intermediate Gibson Assembly circularization step, restriction enzyme linearization, and Gibson Assembly into a plasmid backbone containing a U6 promoter to facilitate gRNA expression, a hygromycin resistance cassette, and flanking Tol2 transposon sites to facilitate integration into the genome. FIG. 16C, Analysis of the cumulative percentage of all CRISPR-Cas9-mediated deletions from VO target sequences in HEK293 (N=89), HCT116 (N=92), and K562 (N=86) cells that delete up to the reported number of nucleotides (X-axis). 94% of deletions are 30 bp or shorter. FIG. 16D shows the number of unique high-confidence editing outcomes called by simulating data subsampling in lib-A data (n = 2,000 target sites) in mESCs (combined data from n = 3 independent biological replicates) and U2OS cells (combined data from n = 2 independent biological replicates). For ‘all’, the original non-subsampled data are presented. Each box depicts data for 2,000 target sites. Outliers are not depicted. FIG. 16E shows Pearson’s r of genotype frequencies comparing lib-A in mESCs and U2OS cells with endogenous data in HEK293 (n = 87 target sites), HCT116 (n = 88), and K562 (n = 86) cells.
Outliers are depicted as diamonds. 1-bp insertion frequencies were adjusted at each target site by proportional scaling so that they are equal between the two cell types. FIG. 16F shows Pearson’s r of genotype frequencies at lib-A target sites, comparing two independent biological replicate experiments in mESCs (n = 1,861 target sites, median r = 0.89) and U2OS cells (n = 1,921, median r = 0.77). Outliers are depicted as diamonds. Box plots denote the 25th, 50th and 75th percentiles and whiskers show 1.5 times the interquartile range.
FIGs. 17A-17D show that high-throughput CRISPR-Cas9 editing outcome screening yields replicate-consistent data that are concordant with the repair spectrum at endogenous human genomic loci. FIG. 17A, Box and swarm plot of the Pearson correlation of the genotypic product frequency spectra at VO target sequences comparing Lib-A in mESCs with endogenous data in HEK293 (N=87), HCT116 (N=88), and K562 (N=86). Each dot represents a target sequence; the box denotes the 25th, 50th and 75th percentiles, and whiskers show 1.5 times the interquartile range. FIG. 17B, Pearson correlation of the genotypic product frequency spectra at 1,861 Lib-A target sequences comparing two biological replicate experiments in mESCs. Median r = 0.89.
The box denotes the 25th, 50th and 75th percentiles, whiskers show 1.5 times the interquartile range, and outliers are depicted as fliers. FIG. 17C, Distribution of Cas9-mediated genotypic products by repair category in endogenous data at VO target sequences in K562 (N=88), HCT116 (N=92), and HEK293 (N=89). FIG. 17D, Frequencies of deletions occurring beyond the Cas9 cutsite by distance, as measured by the number of bases between the deleted base nearest to the cutsite and the two bases immediately surrounding the cutsite. Cutsite and distances are oriented with the NGG PAM on the positive side. *P < 1 × 10^-5 for the Pearson correlation between a specific deletion frequency distribution and Cas9 editing rates across target sequences (VO-HEK293T N = 96, Lib-A mESC N = 2000). Box plot as in (FIG. 17A), with outliers beyond whiskers not depicted.
FIGs. 18A-18I show sequence features correlated with higher and lower inDelphi phi scores. FIG. 18A, Diagram of all unique alignment outcomes at an example 7-bp deletion, accompanied by a table of their MH-less end-joining type, MH length, deletion length, and delta-position. FIG. 18B, Plot of the function learned by the neural network modeling MH deletions (MH-NN), mapping MH length and % GC to a numeric score (psi). FIG. 18C, Plot of the function learned by the neural network modeling MH-independent deletions (MHless-NN), mapping deletion length to a numeric score (psi). FIG. 18D, Histogram of MHless-NN phi scores by deletion length, normalized to sum to 1. FIG. 18E, Observed frequency of 1-bp insertion genotypes in 1,981 Lib-A target sequences with varying -4 nucleotides. The box denotes the 25th, 50th and 75th percentiles, whiskers show 1.5 times the interquartile range, and outliers are depicted as fliers. FIG. 18F, Plot showing 1-bp insertion frequency in 1,996 Lib-A target sequences compared to their total phi score. Pearson correlation = -0.084 (P = 1.7 × 10^-4). FIG. 18G, Relationship between 1-bp insertion frequency in 1,996 Lib-A target sequences and the predicted deletion length precision score. Pearson correlation = 0.069 (P = 2.1 × 10^-3). FIG. 18H, Diagram of hypothesized repair mechanisms that give rise to the outcome categories used by inDelphi, based on known mechanisms of the MMEJ, microhomology-mediated alt-NHEJ, and c-NHEJ repair pathways. Microhomology-mediated repair begins with 5’-end resection, allowing overlap of 3’-overhangs. Microhomologous basepairing of the 3’-overhangs temporarily stabilizes the ssDNA ends. In microhomology deletion, non-paired 3’-overhangs are removed, and polymerase and ligase fill in and connect the gaps to reconstitute a dsDNA strand. In microhomology-less deletions, one 3’-overhang is ligated to the dsDNA backbone and the opposing strand is removed entirely, giving rise to a unilateral deletion with loss of bases on one side of the cutsite only. DNA polymerase and ligation bridge the ssDNA to create a contiguous dsDNA strand. Microhomology-independent mutations occur as a combined result of exonuclease, polymerase, and ligase activity that results in the joining of modified ends at the double-strand break cutsite, giving rise to microhomology-less deletions, insertions, and mixtures thereof. FIG. 18I shows the categories of Cas9-mediated genotypic outcomes in data from U2OS cells (n = 1,958 lib-A target sites), which can be compared to the categories of Cas9-mediated genotypic outcomes shown in FIG. 17C with regard to data from endogenous contexts at VO target sites in K562 (n = 88 target sites), HCT116 (n = 92), and HEK293 (n = 89) cells.
FIGs. 19A-19F show performance of inDelphi at predicting Cas9-mediated indel length and repair genotypes. FIG. 19A, Box and swarm plot of the Pearson correlation at 189 held-out Lib-A target sequences comparing inDelphi predictions with observed mESC Lib-A genotype product frequencies. The box denotes the 25th, 50th and 75th percentiles, and whiskers show 1.5 times the interquartile range. FIG. 19B, Box and swarm plot of the Pearson correlation at 189 held-out Lib-A target sequences comparing inDelphi predictions with observed mESC Lib-A indel length frequencies for 1-bp insertions to 60-bp deletions. Box plot as in (FIG. 19A). FIG. 19C, Distribution of predicted frameshift frequencies among 1- to 60-bp deletions for SpCas9 gRNAs targeting exons, shuffled exons, and introns in the human genome. Dashed lines indicate means. ***P < 10^-100. FIG. 19D, Pie chart depicting the output of inDelphi for specific outcome classes. MH deletions (58% of all products) and single-base insertions (9% of all products) are predicted at single-base resolution, and deletion length is predicted for MH-less deletions (25% of all products). FIGs. 19E and 19F show a comparison of two methods for frameshift predictions to observed values with Pearson’s r in HCT116 cells (FIG. 19E, n = 91 target sites) and K562 cells (FIG. 19F, n = 82 target sites). The error band represents the 95% confidence intervals around the regression estimate with 1,000-fold bootstrapping.
FIGs. 20A-20K show that target sequences with extremely high or low microhomology phi scores skew toward a single predictable Cas9-mediated edited product. FIG. 20A, Scatter plot of the frequency of microduplication repair in Lib-B target sequences with designed 7- to 25-bp regions of microduplication as a function of microduplication length in human U2OS (N = 32) and HEK293T cells (N = 39). The error band represents the 95% C.I. around the regression estimate with 1,000-fold bootstrapping. FIG. 20B, Box plots displaying total deletion phi score, total precision scores, and 1-bp insertion frequencies for (blue) 312 Lib-B sequences in the low-microhomology cohort with four randomized bases flanking the cutsite (fourbp), (green) 89 VO sequences (VO), and (red) 71 Lib-B sequences in the high-microhomology cohort with microduplications ranging from 7-25 bp (longdup). Each box displays the median and the first and third quartiles. Whiskers are at 1.5 times the interquartile range (IQR). Either a swarm plot or outlier fliers are depicted for each box plot. *P = 6.1 × 10^-9; two-sided two-sample T-test, test statistic = -5.94, degrees of freedom = 399, Hedges’ g effect size = 0.49. FIG. 20C, Scatterplot of 1-bp insertion frequency among all non-wild-type products when varying four bases surrounding the cutsite (positions -5 to -2 counted from the NGG-PAM at positions 0-2) with all x-tick labels depicted, contained within three target sequences (red, blue, green) from the low-microhomology cohort of Lib-B in mESCs (N=205). FIG. 20D, Distribution of the total frequency of all non-wild-type Cas9 editing products in the subset of target sequences from the low-microhomology cohort containing four randomized bases flanking the cutsite (fourbp) with >50% overall frequencies of 1-bp insertion (N = 50), VO sequences (N = 89), and the high-microhomology cohort with microduplications ranging from 7-25 bp (longdup) in Lib-B editing in mESCs (N = 56). FIG. 20E, Scatterplot displaying 1-bp insertion frequencies and Cas9 editing rate in 205 “fourbp” contexts with Pearson correlation of -0.35 (P = 3.3 × 10^-7). FIG. 20F shows the frequency of 1-bp insertions in mESCs (n = 1,981 lib-A target sites) and U2OS cells (n = 1,918) with varying -4 nucleotides. FIGs. 20G and 20H show plots of 1-bp insertion frequency in mESCs (n = 1,996 lib-A target sites) and U2OS cells (n = 1,966) compared to their total phi score (FIG. 20G) and predicted deletion length precision score (FIG. 20H) with Pearson’s r. FIG. 20I shows a comparison of 1-bp insertion frequencies among all edited products from 1,966 lib-A target sites in U2OS cells (combined data from n = 2 independent biological replicates). FIG. 20J shows nucleotides and their effect on the frequency of 1-bp insertions in U2OS cells. Only bases with non-zero linear regression weights in 10,000-fold iterative cross-validation are shown. Total n = 1,966 lib-A target sites. FIG. 20K shows the insertion frequency in mESCs (n = 205) and U2OS cells (n = 217) when varying four bases by the cleavage site (positions -5 to -2 counted from the NGG-PAM at positions 0-2) contained within three target sites designed with weak microhomology.
FIGs. 21A-21D show the precise repair of pathogenic microduplications. FIG. 21A, Observed frequencies of repair to wild-type genotype at 194 ClinVar pathogenic alleles vs. predicted frequencies in Lib-B in human HEK293T cells. FIG. 21B, Observed frequencies of repair to wild-type frame at 140 ClinVar pathogenic alleles vs. predicted frequencies in Lib-B in human HEK293T cells. FIG. 21C, Observed frequencies of repair to wild-type genotype at 49 ClinVar pathogenic alleles vs. predicted frequencies in Lib-B in human U2OS cells. FIG. 21D, Observed frequencies of repair to wild-type frame at 37 ClinVar pathogenic alleles vs. predicted frequencies in Lib-B in human U2OS cells.
FIGs. 22A-22E show altered distribution of Cas9-mediated genotypic products in Prkdc-/-Lig4-/- mESCs as compared to wild-type mESCs. FIG. 22A, Distribution of Cas9-mediated genotypic products by repair outcome class in Prkdc-/-Lig4-/- mESCs for 1,446 target sequences. FIG. 22B, Comparison of observed mean frequency of deletion products contributed by microhomology-less unilateral joining and medial joining deletions among all deletions, comparing 1,995 Lib-A target sequences in wild-type mESCs to 1,850 Lib-A target sequences in Prkdc-/-Lig4-/- mESCs. *P < 10^-66; two-sided two-sample T-test, test statistic > 17.7, degrees of freedom = 3,843; Hedges’ g effect size > 0.58. FIG. 22C, Comparison of observed frequency of deletion products contributed by microhomology-less unilateral joining and medial joining deletions among all deletions, between 1,995 Lib-A target sequences in wild-type mESCs and 1,850 Lib-A target sequences in Prkdc-/-Lig4-/- mESCs. * and ** as in (FIG. 22B). The box denotes the 25th, 50th and 75th percentiles, whiskers show 1.5 times the interquartile range, and outliers are depicted as fliers. FIG. 22D, Observed mean frequency of 1-bp insertion genotypes at 1,055 target sequences with varying -4 nucleotides in Lib-A in Prkdc-/-Lig4-/- mESCs. The error bars show the 95% C.I. on the sample mean with 1,000-fold bootstrapping. FIG. 22E, Observed frequency of 1-bp insertion genotypes at 1,055 target sequences with varying -4 nucleotides in Lib-A in Prkdc-/-Lig4-/- mESCs. Box plot as in (b).
FIGs. 23A-23H show that template-free Cas9-nuclease editing of human cells containing pathogenic LDLR microduplication alleles restores LDL uptake. FIG. 23A, Flow cytometric contour plots showing GFP fluorescence and LDL-Dylight550 uptake in HCT116 cells containing the denoted LDLR alleles and treated with SaCas9 and gRNA when denoted. FIG. 23B, Fluorescence microscopy of HCT116 cells containing the denoted LDLR alleles and treated with SaCas9 and gRNA when denoted. GFP fluorescence is shown in green, LDL-Dylight550 uptake in red, and Hoechst staining nuclei in blue. FIG. 23C, Fluorescence microscopy of U2OS cells containing the denoted LDLR alleles and treated with SaCas9 and gRNA when denoted. GFP fluorescence is shown in green, LDL-Dylight550 uptake in red, and Hoechst staining nuclei in blue. FIG. 23D, Flow cytometry gating strategy used for mESC + LDLRdup-P2A-GFP untreated. FIG. 23E, Flow cytometry gating strategy used for mESC + LDLRdup-P2A-GFP + SpCas9 + gRNA. FIGs. 23F and 23G show the results of 12 pathogenic 1-bp deletion alleles selected by inDelphi for high 1-bp insertion frequency (combined data from n = 2 independent biological replicates) compared to lib-A (FIG. 23F) and presented in a table (FIG. 23G). The box denotes the 25th, 50th and 75th percentiles, whiskers show 1.5 times the interquartile range, and outliers are depicted as diamonds. *P = 1.6 × 10^-4, two-sided Welch’s t-test. For detailed statistics, see Methods. In the table, the most frequent 1-bp insertion genotype predicted by inDelphi that does not correspond to the wild-type genotype is indicated by an asterisk. In fluorescence microscopy plots, GFP fluorescence is shown in green, LDL-Dylight550 uptake in red, and Hoechst staining nuclei in blue. FIG. 23H shows mESC-trained inDelphi genotype prediction accuracy at 40 library sites.
FIG. 24A is a schematic depicting an exemplary method of using a trained computational model (e.g., “inDelphi”) in conjunction with a Cas-based genome editing system to edit a nucleotide sequence (e.g., a genome) to achieve a desired genetic outcome (e.g., a correction to a disease-causing mutation to treat a disease, or modification of a wild-type gene to confer an improved trait or phenotype). For any given set of inputs (a context sequence and a selected cut site), the trained computational model computes the probability distribution of indel lengths and the probability distribution of genotype frequencies, enabling the user to select the optimal input (e.g., cut site) for editing by a Cas-based genome editing system to achieve the highest frequency of the desired genetic outcome. The computational method may be used to predict, for a given local sequence context, template-free repair genotypes and their frequencies of occurrence.
FIG. 24B is a schematic depicting an exemplary method of using a trained computational model (e.g., “inDelphi”) in conjunction with a double-strand break (DSB)-inducing genome editing system to edit a nucleotide sequence (e.g., a genome) to achieve a desired genetic outcome (e.g., a correction to a disease-causing mutation to treat a disease, or modification of a wild-type gene to confer an improved trait or phenotype). For any given set of inputs (a context sequence and a selected cut site), the trained computational model computes the probability distribution of indel lengths and the probability distribution of genotype frequencies, enabling the user to select the optimal input (e.g., cut site) for editing by a DSB-inducing genome editing system to achieve the highest frequency of the desired genetic outcome. The computational method may be used to predict, for a given local sequence context, template-free repair genotypes and their frequencies of occurrence.
FIGs. 25A-25D provide a characterization of lib-B data, including pathogenic microduplication repair in wild-type mESCs, wild-type U2OS cells, and mESCs treated with DPKi3, NU7026 and MLN4924. FIG. 25A shows box plots of the number of unique high-confidence editing outcomes called by simulating data subsampling in data at 2,000 lib-B target sites in mESCs (combined data from n = 2 independent technical replicates) and U2OS cells (combined data from n = 2 independent biological replicates). In ‘all’, the full non-subsampled data are presented (see Table 8 herein for read counts). Each box depicts data for 2,000 target sites. The box denotes the 25th, 50th, and 75th percentiles and whiskers show 1.5 times the interquartile range. Outliers are not depicted. FIG. 25B shows the frequencies of repair to wild-type genotype at 567 ClinVar pathogenic alleles versus predicted frequencies in lib-B in human U2OS cells with Pearson’s r. FIG. 25C shows the frequencies of repair to wild-type frame at 437 ClinVar pathogenic alleles versus predicted frequencies in lib-B in human U2OS cells with Pearson’s r. FIG. 25D shows the frequency of pathogenic microduplication repair in wild-type mESCs (n = 1,480 target sites) compared to mESCs treated with MLN4924 (n = 1,569), NU7026 (n = 1,561) and DPKi3 (n = 1,563).
FIGs. 26A-26G show the altered distributions of Cas9-mediated genotypic products in Prkdc-/-Lig4-/- mESCs and mESCs treated with DPKi3, NU7026, and MLN4924 compared to wild-type mESCs. FIG. 26A shows a comparison of MH deletions among all deletions at lib-B target sites in wild-type cells (n = 1,909 target sites), cells treated with DPKi3 (n = 1,999), MLN4924 (n = 1,995) or NU7026 (n = 1,999), and Prkdc-/-Lig4-/- cells (n = 1,446). Statistical tests were performed against the wild-type population. *P = 5.6 × 10^-5, **P = 3.5 × 10^-13, ***P = 5.0 × 10^-41, two-sided Welch’s t-test. FIG. 26B shows a comparison of the frequency of each class of MH-less deletions among all deletion products in wild-type (lib-A and lib-B target sites, n = 3,829 target sites), DPKi3 (lib-B, n = 1,990), MLN4924 (lib-B, n = 1,980), NU7026 (lib-B, n = 1,992) and Prkdc-/-Lig4-/- (lib-A and lib-B target sites, n = 3,344). P values are compared to wild-type, two-sided Welch’s t-test. FIG. 26C shows the frequency of 1-bp insertions at 1,055 target sites in lib-A in Prkdc-/-Lig4-/- mESCs. FIG. 26D shows frequencies of deletion repair to wild-type genotype in lib-B in wild-type mESCs (n = 1,480 target sites, combined data from two technical replicates) compared to other conditions, with combined data from two independent biological replicates for each of Prkdc-/-Lig4-/- (n = 1,041 target sites), MLN4924 (n = 1,569), NU7026 (n = 1,561) and DPKi3 (n = 1,563). FIG. 26E provides a table of Pearson’s r of the change in disease correction frequency compared to wild-type at n = 791 target sites for each pair of conditions. FIGs. 26F and 26G show Annexin V-568 staining flow cytometry contour plots (FIG. 26F) and mean ± standard deviation values (FIG. 26G) in wild-type and Prkdc-/-Lig4-/- lib-A mESCs following transfection with SpCas9-P2A-GFP (representative data for n = 2 experiments). Box plots denote the 25th, 50th and 75th percentiles, whiskers show 1.5 times the interquartile range, and outliers are depicted as diamonds.
DEFINITIONS
As used herein and in the claims, the singular forms “a,” “an,” and “the” include the singular and the plural reference unless the context clearly indicates otherwise. Thus, for example, a reference to “an agent” includes a single agent and a plurality of such agents.
The term “Cas9” or “Cas9 nuclease” refers to an RNA-guided nuclease comprising a Cas9 protein, or a fragment thereof (e.g., a protein comprising an active or inactive DNA cleavage domain of Cas9, and/or the gRNA binding domain of Cas9). A Cas9 nuclease is also sometimes referred to as a casn1 nuclease or a CRISPR (clustered regularly interspaced short palindromic repeat)-associated nuclease. CRISPR is an adaptive immune system that provides protection against mobile genetic elements (viruses, transposable elements and conjugative plasmids). CRISPR clusters contain spacers, sequences complementary to antecedent mobile elements, and target invading nucleic acids. CRISPR clusters are transcribed and processed into CRISPR RNA (crRNA). In type II CRISPR systems, correct processing of pre-crRNA requires a trans-encoded small RNA (tracrRNA), endogenous ribonuclease 3 (rnc) and a Cas9 protein. The tracrRNA serves as a guide for ribonuclease 3-aided processing of pre-crRNA. Subsequently, Cas9/crRNA/tracrRNA endonucleolytically cleaves linear or circular dsDNA target complementary to the spacer. The target strand not complementary to crRNA is first cut endonucleolytically, then trimmed 3'-5' exonucleolytically. In nature, DNA-binding and cleavage typically require protein and both RNAs. However, single guide RNAs (“sgRNA”, or simply “gRNA”) can be engineered so as to incorporate aspects of both the crRNA and tracrRNA into a single RNA species. See, e.g., Jinek M., Chylinski K., Fonfara I., Hauer M., Doudna J.A., Charpentier E. Science 337:816-821 (2012), the entire contents of which are hereby incorporated by reference. Cas9 recognizes a short motif in the CRISPR repeat sequences (the PAM or protospacer adjacent motif) to help distinguish self versus non-self. Cas9 nuclease sequences and structures are well known to those of skill in the art (see, e.g., “Complete genome sequence of an M1 strain of Streptococcus pyogenes.” Ferretti J.J., McShan W.M., Ajdic D.J., Savic D.J., Savic G., Lyon K., Primeaux C., Sezate S., Suvorov A.N., Kenton S., Lai H.S., Lin S.P., Qian Y., Jia H.G., Najar F.Z., Ren Q., Zhu H., Song L., White J., Yuan X., Clifton S.W., Roe B.A., McLaughlin R.E., Proc. Natl. Acad. Sci. U.S.A. 98:4658-4663 (2001); “CRISPR RNA maturation by trans-encoded small RNA and host factor RNase III.” Deltcheva E., Chylinski K., Sharma C.M., Gonzales K., Chao Y., Pirzada Z.A., Eckert M.R., Vogel J., Charpentier E., Nature 471:602-607 (2011); and “A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity.” Jinek M., Chylinski K., Fonfara I., Hauer M., Doudna J.A., Charpentier E. Science 337:816-821 (2012), the entire contents of each of which are incorporated herein by reference). Cas9 orthologs have been described in various species, including, but not limited to, S. pyogenes and S. thermophilus.
Additional suitable Cas9 nucleases and sequences will be apparent to those of skill in the art based on this disclosure, and such Cas9 nucleases and sequences include Cas9 sequences from the organisms and loci disclosed in Chylinski, Rhun, and Charpentier, “The tracrRNA and Cas9 families of type II CRISPR-Cas immunity systems” (2013) RNA Biology 10:5, 726-737, the entire contents of which are incorporated herein by reference. In some embodiments, a Cas9 nuclease has an inactive (e.g., an inactivated) DNA cleavage domain.
In some embodiments, Cas9 refers to Cas9 from: Corynebacterium ulcerans (NCBI Refs: NC_015683.1, NC_017317.1); Corynebacterium diphtheriae (NCBI Refs: NC_016782.1, NC_016786.1); Spiroplasma syrphidicola (NCBI Ref: NC_021284.1); Prevotella intermedia (NCBI Ref: NC_017861.1); Spiroplasma taiwanense (NCBI Ref: NC_021846.1); Streptococcus iniae (NCBI Ref: NC_021314.1); Belliella baltica (NCBI Ref: NC_018010.1); Psychroflexus torquis I (NCBI Ref: NC_018721.1); Streptococcus thermophilus (NCBI Ref: YP_820832.1); Listeria innocua (NCBI Ref: NP_472073.1); Campylobacter jejuni (NCBI Ref: YP_002344900.1); or Neisseria meningitidis (NCBI Ref: YP_002342100.1).
The term “Cas-based genome editing system” refers to a system comprising any naturally occurring or variant Cas endonuclease (e.g., Cas9), or functional variant, homolog, or orthologue thereof, and a cognate guide RNA. The term “Cas-based genome editing system” may also refer to an expression vector having at least one expressible nucleotide sequence encoding a Cas protein (or homolog, variant, or orthologue thereof) and at least one other expressible nucleotide sequence encoding a guide RNA.
The term “DSB-based genome editing system” refers to a system comprising any naturally occurring or variant endonuclease that catalyzes the formation of a double-strand break at a cut site (e.g., Cas9, Cpf1, a TALEN, or a zinc finger nuclease), or functional variant, homolog, or orthologue thereof, and a cognate guide RNA if required (e.g., TALENs and zinc finger nucleases do not require a guide RNA for targeting to a cut site). The term “DSB-based genome editing system” may also refer to an expression vector having at least one expressible nucleotide sequence encoding a DSB endonuclease protein (or homolog, variant, or orthologue thereof) and at least one other expressible nucleotide sequence encoding a guide RNA, if required (e.g., as required for Cas9 or Cpf1).
The term “effective amount,” as used herein, refers to an amount of a biologically active agent that is sufficient to elicit a desired biological response. For example, in some embodiments, an effective amount of a nuclease may refer to the amount of the nuclease that is sufficient to induce cleavage of a target site specifically bound and cleaved by the nuclease. In some embodiments, an effective amount of a recombinase may refer to the amount of the recombinase that is sufficient to induce recombination at a target site specifically bound and recombined by the recombinase. As will be appreciated by the skilled artisan, the effective amount of an agent, e.g., a nuclease, a recombinase, a hybrid protein, a fusion protein, a protein dimer, a complex of a protein (or protein dimer) and a polynucleotide, or a polynucleotide, may vary depending on various factors, for example, the desired biological response, the specific allele, genome, target site, cell, or tissue being targeted, and the agent being used.
The term “linker,” as used herein, refers to a chemical group or a molecule linking two molecules or moieties, e.g., a binding domain and a cleavage domain of a nuclease. In some embodiments, a linker joins a gRNA binding domain of an RNA-programmable nuclease and the catalytic domain of a recombinase. In some embodiments, a linker joins a dCas9 and a recombinase. Typically, the linker is positioned between, or flanked by, two groups, molecules, or other moieties and connected to each one via a covalent bond, thus connecting the two. In some embodiments, the linker is an amino acid or a plurality of amino acids (e.g., a peptide or protein). In some embodiments, the linker is an organic molecule, group, polymer, or chemical moiety. In some embodiments, the linker is 5-100 amino acids in length, for example, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 30-35, 35-40, 40-45, 45-50, 50-60, 60-70, 70-80, 80-90, 90-100, 100-150, or 150-200 amino acids in length. Longer or shorter linkers are also contemplated.
The term “mutation,” as used herein, refers to a substitution of a residue within a sequence, e.g., a nucleic acid or amino acid sequence, with another residue, or a deletion or insertion of one or more residues within a sequence. Mutations are typically described herein by identifying the original residue, followed by the position of the residue within the sequence and the identity of the newly substituted residue. Various methods for making the amino acid substitutions (mutations) provided herein are well known in the art and are provided by, for example, Green and Sambrook, Molecular Cloning: A Laboratory Manual (4th ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. (2012)). Mutations can include a variety of categories, such as single-base polymorphisms, microduplication regions, indels, and inversions; these examples are not meant to be limiting in any way.
The terms “nucleic acid” and “nucleic acid molecule,” as used herein, refer to a compound comprising a nucleobase and an acidic moiety, e.g., a nucleoside, a nucleotide, or a polymer of nucleotides. Typically, polymeric nucleic acids, e.g., nucleic acid molecules comprising three or more nucleotides, are linear molecules, in which adjacent nucleotides are linked to each other via a phosphodiester linkage. In some embodiments, “nucleic acid” refers to individual nucleic acid residues (e.g., nucleotides and/or nucleosides). In some embodiments, “nucleic acid” refers to an oligonucleotide chain comprising three or more individual nucleotide residues. As used herein, the terms “oligonucleotide” and “polynucleotide” can be used interchangeably to refer to a polymer of nucleotides (e.g., a string of at least three nucleotides).
In some embodiments, “nucleic acid” encompasses RNA as well as single- and/or double-stranded DNA. Nucleic acids may be naturally occurring, for example, in the context of a genome, a transcript, an mRNA, tRNA, rRNA, siRNA, snRNA, a plasmid, cosmid, chromosome, chromatid, or other naturally occurring nucleic acid molecule. On the other hand, a nucleic acid molecule may be a non-naturally occurring molecule, e.g., a recombinant DNA or RNA, an artificial chromosome, an engineered genome, or fragment thereof, or a synthetic DNA, RNA, or DNA/RNA hybrid, or a molecule including non-naturally occurring nucleotides or nucleosides.
Furthermore, the terms “nucleic acid,” “DNA,” “RNA,” and/or similar terms include nucleic acid analogs, e.g., analogs having other than a phosphodiester backbone. Nucleic acids can be purified from natural sources, produced using recombinant expression systems and optionally purified, chemically synthesized, etc. Where appropriate, e.g., in the case of chemically synthesized molecules, nucleic acids can comprise nucleoside analogs such as analogs having chemically modified bases or sugars, and backbone modifications. A nucleic acid sequence is presented in the 5' to 3' direction unless otherwise indicated. In some embodiments, a nucleic acid is or comprises natural nucleosides (e.g., adenosine, thymidine, guanosine, cytidine, uridine, deoxyadenosine, deoxythymidine, deoxyguanosine, and deoxycytidine); nucleoside analogs (e.g., 2-aminoadenosine, 2-thiothymidine, inosine, pyrrolo-pyrimidine, 3-methyladenosine, 5-methylcytidine, C5-bromouridine, C5-fluorouridine, C5-iodouridine, C5-propynyl-uridine, C5-propynyl-cytidine, C5-methylcytidine, 7-deazaadenosine, 7-deazaguanosine, 8-oxoadenosine, 8-oxoguanosine, O(6)-methylguanine, and 2-thiocytidine); chemically modified bases; biologically modified bases (e.g., methylated bases); intercalated bases; modified sugars (e.g., 2'-fluororibose, ribose, 2'-deoxyribose, arabinose, and hexose); and/or modified phosphate groups (e.g., phosphorothioates and 5'-N-phosphoramidite linkages).
The terms “protein,” “peptide,” and “polypeptide” are used interchangeably herein and refer to a polymer of amino acid residues linked together by peptide (amide) bonds. The terms refer to a protein, peptide, or polypeptide of any size, structure, or function. Typically, a protein, peptide, or polypeptide will be at least three amino acids long. A protein, peptide, or polypeptide may refer to an individual protein or a collection of proteins. One or more of the amino acids in a protein, peptide, or polypeptide may be modified, for example, by the addition of a chemical entity such as a carbohydrate group, a hydroxyl group, a phosphate group, a farnesyl group, an isofarnesyl group, a fatty acid group, a linker for conjugation, functionalization, or other modification, etc. A protein, peptide, or polypeptide may also be a single molecule or may be a multi-molecular complex. A protein, peptide, or polypeptide may be just a fragment of a naturally occurring protein or peptide. A protein, peptide, or polypeptide may be naturally occurring, recombinant, or synthetic, or any combination thereof. The term “fusion protein” as used herein refers to a hybrid polypeptide that comprises protein domains from at least two different proteins. One protein may be located at the amino-terminal (N-terminal) portion of the fusion protein or at the carboxy-terminal (C-terminal) portion, thus forming an “amino-terminal fusion protein” or a “carboxy-terminal fusion protein,” respectively. A protein may comprise different domains, for example, a nucleic acid binding domain (e.g., the gRNA binding domain of Cas9 that directs the binding of the protein to a target site) and a nucleic acid cleavage domain or a catalytic domain of a recombinase. In some embodiments, a protein comprises a proteinaceous part, e.g., an amino acid sequence constituting a nucleic acid binding domain, and an organic compound, e.g., a compound that can act as a nucleic acid cleavage agent.
In some embodiments, a protein is in a complex with, or is in association with, a nucleic acid, e.g., RNA. Any of the proteins provided herein may be produced by any method known in the art. For example, the proteins provided herein may be produced via recombinant protein expression and purification, which is especially suited for fusion proteins comprising a peptide linker. Methods for recombinant protein expression and purification are well known, and include those described by Green and Sambrook, Molecular Cloning: A Laboratory Manual (4th ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. (2012)), the entire contents of which are incorporated herein by reference.
The terms “RNA-programmable nuclease” and “RNA-guided nuclease” are used interchangeably herein and refer to a nuclease that forms a complex with (e.g., binds or associates with) one or more RNAs that are not a target for cleavage. In some embodiments, an RNA-programmable nuclease, when in a complex with an RNA, may be referred to as a nuclease:RNA complex. Typically, the bound RNA(s) is referred to as a guide RNA (gRNA). gRNAs can exist as a complex of two or more RNAs, or as a single RNA molecule. gRNAs that exist as a single RNA molecule may be referred to as single-guide RNAs (sgRNAs), though “gRNA” is used interchangeably to refer to guide RNAs that exist either as single molecules or as a complex of two or more molecules. Typically, gRNAs that exist as a single RNA species comprise two domains: (1) a domain that shares homology to a target nucleic acid (e.g., and directs binding of a Cas9 complex to the target); and (2) a domain that binds a Cas9 protein. In some embodiments, domain (2) corresponds to a sequence known as a tracrRNA and comprises a stem-loop structure. For example, in some embodiments, domain (2) is homologous to a tracrRNA as depicted in Figure 1E of Jinek et al., Science 337:816-821 (2012), the entire contents of which are incorporated herein by reference. Other examples of gRNAs (e.g., those including domain 2) can be found in U.S. Provisional Patent Application, U.S.S.N. 61/874,682, filed September 6, 2013, entitled “Switchable Cas9 Nucleases And Uses Thereof,” and U.S. Provisional Patent Application, U.S.S.N. 61/874,746, filed September 6, 2013, entitled “Delivery System For Functional Nucleases,” the entire contents of each of which are hereby incorporated by reference.
In some embodiments, a gRNA comprises two or more of domains (1) and (2) and may be referred to as an “extended gRNA.” For example, an extended gRNA may bind two or more Cas9 proteins and bind a target nucleic acid at two or more distinct regions, as described herein. The gRNA comprises a nucleotide sequence that complements a target site, which mediates binding of the nuclease:RNA complex to said target site, providing the sequence specificity of the nuclease:RNA complex. In some embodiments, the RNA-programmable nuclease is the (CRISPR-associated system) Cas9 endonuclease, for example, Cas9 (Csn1) from Streptococcus pyogenes (see, e.g., “Complete genome sequence of an M1 strain of Streptococcus pyogenes.” Ferretti J.J., McShan W.M., Ajdic D.J., Savic D.J., Savic G., Lyon K., Primeaux C., Sezate S., Suvorov A.N., Kenton S., Lai H.S., Lin S.P., Qian Y., Jia H.G., Najar F.Z., Ren Q., Zhu H., Song L., White J., Yuan X., Clifton S.W., Roe B.A., McLaughlin R.E., Proc. Natl. Acad. Sci. U.S.A. 98:4658-4663 (2001); “CRISPR RNA maturation by trans-encoded small RNA and host factor RNase III.” Deltcheva E., Chylinski K., Sharma C.M., Gonzales K., Chao Y., Pirzada Z.A., Eckert M.R., Vogel J., Charpentier E., Nature 471:602-607 (2011); and “A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity.” Jinek M., Chylinski K., Fonfara I., Hauer M., Doudna J.A., Charpentier E. Science 337:816-821 (2012), the entire contents of each of which are incorporated herein by reference).
Because RNA-programmable nucleases (e.g., Cas9) use RNA:DNA hybridization to target DNA cleavage sites, these proteins can be targeted, in principle, to any sequence specified by the guide RNA. Methods of using RNA-programmable nucleases, such as Cas9, for site-specific cleavage (e.g., to modify a genome) are known in the art (see, e.g., Cong, L. et al. Multiplex genome engineering using CRISPR/Cas systems. Science 339, 819-823 (2013); Mali, P. et al. RNA-guided human genome engineering via Cas9. Science 339, 823-826 (2013); Hwang, W.Y. et al. Efficient genome editing in zebrafish using a CRISPR-Cas system. Nature Biotechnology 31, 227-229 (2013); Jinek, M. et al. RNA-programmed genome editing in human cells. eLife 2, e00471 (2013); DiCarlo, J.E. et al. Genome engineering in Saccharomyces cerevisiae using CRISPR-Cas systems. Nucleic Acids Research (2013); Jiang, W. et al. RNA-guided editing of bacterial genomes using CRISPR-Cas systems. Nature Biotechnology 31, 233-239 (2013); the entire contents of each of which are incorporated herein by reference).
The term “subject,” as used herein, refers to an individual organism, for example, an individual mammal. In some embodiments, the subject is a human. In some embodiments, the subject is a non-human mammal. In some embodiments, the subject is a non-human primate. In some embodiments, the subject is a rodent. In some embodiments, the subject is a sheep, a goat, a bovine, a cat, or a dog. In some embodiments, the subject is a vertebrate, an amphibian, a reptile, a fish, an insect, a fly, or a nematode. In some embodiments, the subject is a research animal. In some embodiments, the subject is genetically engineered, e.g., a genetically engineered non-human subject. The subject may be of either sex and at any stage of development.
As used herein, the terms “treatment,” “treat,” and “treating” refer to a clinical intervention aimed to reverse, alleviate, delay the onset of, or inhibit the progress of a disease or disorder, or one or more symptoms thereof, as described herein. In some embodiments, treatment may be administered after one or more symptoms have developed and/or after a disease has been diagnosed. In other embodiments, treatment may be administered in the absence of symptoms, e.g., to prevent or delay onset of a symptom or inhibit onset or progression of a disease. For example, treatment may be administered to a susceptible individual prior to the onset of symptoms (e.g., in light of a history of symptoms and/or in light of genetic or other susceptibility factors). Treatment may also be continued after symptoms have resolved, for example, to prevent or delay their recurrence.
Use of ordinal terms such as “first,” “second,” “third,” etc. in the claims to modify a claim element does not by itself connote any priority, precedence, or order of one claim element over another or the temporal order in which acts of a method are performed; such terms are used merely as labels to distinguish one claim element having a certain name from another element having the same name (but for use of the ordinal term).
Also, the phraseology and terminology used herein is for the purpose of description and should not be regarded as limiting. The use of “including,” “comprising,” “having,” “containing,” “involving,” and variations thereof herein is meant to encompass the items listed thereafter and equivalents thereof as well as additional items.
DETAILED DESCRIPTION OF CERTAIN EMBODIMENTS
Major research efforts focus on improving the efficiency and specificity of genome editing systems, such as CRISPR/Cas9, other Cas-based, TALEN-based, and zinc finger-based genome editing systems. For instance, with regard to CRISPR/Cas9 systems, efficiency may be improved by predicting optimal Cas9 guide RNA (gRNA) sequences, while specificity may be improved by modeling factors leading to off-target cutting and by engineering Cas9 enzymes. Variant Cas9 enzymes and fusion proteins may be developed to alter the protospacer adjacent motif (PAM) sequences recognized by Cas9 and to produce base-editing Cas9 constructs with high efficiency and specificity. In addition, Cpf1 (also known as Cas12a) and other alternatives may be used in CRISPR genome editing in addition to, or instead of, Cas9.
The inventors have recognized and appreciated that less attention has been devoted to understanding and modulating repair outcomes. In that respect, nucleotide insertions and/or deletions resulting from template-free repair mechanisms (e.g., NHEJ and MMEJ, as opposed to homology-directed repair (HDR)) are commonly thought to be random and therefore suitable only for gene knock-out applications. For gene knock-in or gain-of-function applications, a template-based repair mechanism such as HDR is typically used.
CRISPR/Cas with HDR allows arbitrarily designed DNA sequences to be incorporated at precise genomic locations. However, this technique suffers from low efficiency: HDR occurs rarely under typical biological conditions (e.g., at a frequency of around 10%), because cells permit HDR only after sister chromatids are synthesized in S phase and before M phase, when mitosis splits the sister chromatids into daughter cells. For many cell types, the fraction of time spent in the S, G2, and M phases of the cell cycle is low. In sum, while outcomes are predictable when HDR does occur, HDR occurs infrequently, and therefore a desired DNA sequence will be incorporated into only a small percentage of cells. In addition, post-mitotic cell types of interest, such as neurons, no longer use the HDR repair pathway, further limiting HDR's utility for genetic engineering.
Some research has been done to improve the efficiency of HDR, for example, through improved homology templates and small-molecule modulation. Despite these efforts, template-based repair efficiency remains low, and proposed CRISPR/Cas gene knock-in or gain-of-function applications have thus far been limited to ex vivo applications, where cells can be screened for a desired repair genotype.
Unlike HDR, NHEJ can occur during any phase of the cell cycle and in post-mitotic cells. However, as discussed above, NHEJ has been perceived as a random process that produces a large variety of repair genotypes with insertions and/or deletions, and it has been used mainly to knock out genes. In short, NHEJ is efficient but unpredictable.
Recent work suggests that outcomes of some template-free repair mechanisms are actually non-random. For instance, it has been observed that MMEJ is involved in repair outcomes. Furthermore, repair outcomes have been analyzed to predict gRNAs that are more likely to produce frameshifts. However, there is still a need for accurate prediction of genotypic outcomes of CRISPR/Cas cutting and ensuing cellular DNA repair.
The present inventors have unexpectedly found through computational analyses that template-free DNA/genome editing systems, e.g., CRISPR/Cas9, Cas-based, Cpf1-based, or other DSB (double-strand break)-based genome editing systems, produce a predictable set of repair genotypes, thereby enabling the use of such editing systems for applications involving or requiring precise manipulation of DNA, e.g., the correction of a disease-causing genetic mutation or the modification of a wild-type sequence to confer a genetic advantage. This finding is contrary to the accepted view that DNA double-strand break repair (i.e., template-free, non-homology-dependent repair) following cleavage by genome editing systems produces stochastic and heterogeneous repair products and is therefore impractical for applications beyond gene disruption. Thus, the specification describes and discloses, in various aspects and embodiments, computation-based methods and systems for practically harnessing the innate efficiencies of template-free DNA repair systems to carry out precise DNA and/or genome editing without reliance on homology-based repair.
In accordance with some embodiments, techniques are provided for predicting genotypes of CRISPR/Cas editing outcomes. For instance, a high-throughput approach may be used for monitoring CRISPR/Cas cutting outcomes, and/or a computer-implemented method may be used to predict genotypic repair outcomes for NHEJ and/or MMEJ. The inventors have recognized and appreciated that accurate prediction of repair genotypes may allow development of CRISPR/Cas gene knock-in or gain-of-function applications based on one or more template-free repair mechanisms. This approach may simplify a genome editing process by reducing or eliminating the need to introduce exogenous DNA into a cell as a template.
Additionally, or alternatively, using one or more template-free repair mechanisms for gene knock-in may provide improved efficiency. For instance, the inventors have recognized and appreciated that NHEJ and MMEJ may account for a large portion of CRISPR/Cas repair products. While template-free repair mechanisms may not always produce desired repair genotypes with sufficiently high frequencies, one or more desired repair genotypes may occur with sufficiently high frequencies in some specific local sequence contexts. For such a local sequence context, template-free repair mechanisms may outperform HDR with respect to simplicity and efficiency.
In some embodiments, one or more of the techniques provided herein may be used to predict, for a given local sequence context, template-free repair genotypes and their frequencies of occurrence, which may facilitate the design of gene knock-in or gain-of-function applications. For example, the inventors have recognized and appreciated that some disease-causing alleles, when cut at a selected location by CRISPR/Cas, may exhibit one or just a few repair outcomes that occur at a high frequency and transform the disease-causing allele into one or more healthy alleles. Disease-causing alleles may occur in genomic sequences that code for proteins or regulatory RNAs, or in genomic sequences that regulate transcription or other genomic functions.
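As an illustration only, whether a site exhibits "one or just a few" high-frequency repair outcomes can be quantified from a predicted genotype distribution. The Python sketch below assumes that predictions are supplied as a hypothetical dictionary mapping each repair genotype to its predicted frequency; it reports the frequency of the most common genotype and the Shannon entropy (in bits) of the distribution, where lower entropy indicates a more predictable site. The function name `precision_summary`, the genotype labels, and any threshold used to call a site "high-precision" are assumptions, not part of the disclosure.

```python
import math


def precision_summary(genotype_freqs):
    """Summarize how concentrated a predicted repair-genotype distribution is.

    genotype_freqs: hypothetical mapping {genotype: predicted frequency}.
    Returns (top_frequency, entropy_bits) after renormalizing to sum to 1.
    """
    total = sum(genotype_freqs.values())
    if total <= 0:
        raise ValueError("frequencies must sum to a positive value")
    probs = sorted((f / total for f in genotype_freqs.values()), reverse=True)
    # Shannon entropy in bits; zero-probability genotypes contribute nothing.
    entropy = -sum(p * math.log2(p) for p in probs if p > 0)
    return probs[0], entropy


# Hypothetical site with one dominant outcome: high top frequency, low entropy.
top, h = precision_summary({"del_GCT": 0.7, "del_1bp": 0.2, "ins_A": 0.1})
print(round(top, 2), round(h, 3))
```

A site could then be flagged as a candidate for template-free correction when its top predicted genotype is the healthy allele and its top frequency exceeds a user-chosen threshold; the threshold itself is a design choice not specified here.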
In some embodiments, one or more of the techniques provided herein may be used to predict, for a given local sequence context, template-free repair genotypes and their frequencies of occurrence, which may be used to select one or more desirable guide RNAs when HDR is employed to edit DNA. Since HDR does not occur 100% of the time, the template-free repair genotypes predicted by this method will be a natural byproduct at sites where HDR failed to occur. The one or more techniques provided herein allow these failed-HDR byproducts to be predicted, so that one or more guide RNAs can be chosen that will produce the most desirable byproducts when HDR fails. For example, a disease-causing allele may be targeted for HDR repair, but if HDR does not occur at a specific site, the guide RNA can be chosen so that the template-free repair products transform the disease-causing allele into one or more healthy alleles or at least do not have deleterious effects. Deleterious effects could result from template-free repair that changes a weakly functional allele into a non-functional allele or into a dominant allele that negatively impacts health. In some embodiments, guide RNA selection consists of considering all guide RNAs that are compatible with HDR repair of a disease-causing allele and, for each guide RNA, using one or more of the techniques provided herein to predict its template-free repair genotypes. One or more guide RNAs whose template-free repair genotypes are most advantageous for health are then selected for use with the HDR template. Alternatively, in some embodiments, one or more guide RNAs whose template-free repair genotypes are most likely to disrupt gene function are selected for use with the HDR template. It should be appreciated that the techniques disclosed herein may be implemented in any of numerous ways, as the disclosed techniques are not limited to any particular manner of implementation.
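The guide RNA selection procedure described above can be sketched as a weighted scoring of each guide's predicted template-free byproducts. In this minimal, hypothetical Python example, each genotype is labeled (e.g., "healthy", "neutral", or "deleterious"), each label carries a weight, and guides are ranked by the expected weight of their predicted byproducts. The data layout, the function name `rank_guides`, and the default weights are all assumptions for illustration; genotypes without a label are treated as neutral.

```python
def rank_guides(predictions, genotype_labels, label_weights=None):
    """Rank candidate guide RNAs by the predicted value of their byproducts.

    predictions: {guide_id: {genotype: predicted frequency}} (hypothetical format)
    genotype_labels: {genotype: label}, e.g. "healthy", "neutral", "deleterious"
    label_weights: score per label; unlabeled genotypes count as "neutral".
    Returns a list of (guide_id, score) sorted from best to worst.
    """
    if label_weights is None:
        # Assumed default: reward healthy outcomes, penalize deleterious ones.
        label_weights = {"healthy": 1.0, "neutral": 0.0, "deleterious": -1.0}
    scores = {}
    for guide, freqs in predictions.items():
        scores[guide] = sum(
            freq * label_weights.get(genotype_labels.get(g, "neutral"), 0.0)
            for g, freq in freqs.items()
        )
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical predictions for two HDR-compatible guides.
preds = {
    "guide_1": {"healthy_allele": 0.6, "null_allele": 0.4},
    "guide_2": {"healthy_allele": 0.3, "in_frame_del": 0.7},
}
labels = {"healthy_allele": "healthy", "null_allele": "deleterious", "in_frame_del": "neutral"}
print(rank_guides(preds, labels))
```

For the alternative embodiment that favors gene disruption, the same function could be called with labels marking, for example, frameshift genotypes as "disrupting" and `label_weights={"disrupting": 1.0}`.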
Examples of implementation details are provided solely for illustrative purposes. For instance, while examples are given in which CRISPR/Cas9 is used to perform genome editing, it should be appreciated that aspects of the present application are not so limited. In some embodiments, another genome editing technique, such as CRISPR/Cpf1, may be used. Furthermore, the disclosed techniques may be used individually or in any suitable combination, as aspects of the present disclosure are not limited to the use of any particular technique or combination of techniques.
FIG. 1 shows an illustrative DNA segment 100, in accordance with some embodiments. For instance, the DNA segment 100 may be exon 43 of a dystrophin gene. About 4% of Duchenne's muscular dystrophy cases are caused by mutations in this exon. Therapeutic solutions showing success in clinical trials use antisense oligonucleotides to cause this exon to be skipped during pre-mRNA splicing, thereby restoring normal dystrophin function.
The inventors have recognized and appreciated that another therapeutic approach may be possible, using genome editing to make permanent changes to dystrophin exon 43. For instance, in some embodiments, CRISPR/Cas9 (or another suitable technique for cutting a DNA sequence, such as CRISPR/Cpf1) may be used to disrupt a donor splice site motif of dystrophin exon 43, and one or more template-free repair mechanisms may restore normal dystrophin function.
In one aspect, the specification discloses a computational model.
In certain embodiments, the computational model can predict and/or compute an optimized or preferred cut site for a DSB-based genome editing system for introducing a genetic change into a nucleotide sequence. In preferred embodiments, the repair does not require homology-based repair mechanisms.
In certain other embodiments, the computational model can predict and/or compute an optimized or preferred cut site for a Cas-based genome editing system for introducing a genetic change into a nucleotide sequence. In preferred embodiments, the repair does not require homology-based repair mechanisms.
In still other embodiments, the computational model provides for the selection of an optimized or preferred guide RNA for use with a Cas-based genome editing system for introducing a genetic change in a genome. In preferred embodiments, the repair does not require homology-based repair mechanisms.
In various embodiments, the computational model is a neural network model having one or more hidden layers.
In other embodiments, the computational model is a deep learning computational model.
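For readers less familiar with the terminology, a neural network "having one or more hidden layers" can be sketched in a few lines. The minimal pure-Python example below is illustrative only: the layer sizes, ReLU activation, Gaussian initialization, and the choice of two input features (for example, a microhomology length and its GC fraction) are assumptions rather than parameters taken from the disclosure, and no training loop is shown. The class name `TinyMLP` is hypothetical.

```python
import math
import random


def relu(x):
    return max(0.0, x)


class TinyMLP:
    """Fully connected network: ReLU hidden layers, one linear output layer."""

    def __init__(self, sizes, seed=0):
        # sizes, e.g. [2, 16, 16, 1]: 2 inputs, two hidden layers of 16, 1 output.
        rng = random.Random(seed)
        self.layers = []
        for n_in, n_out in zip(sizes[:-1], sizes[1:]):
            scale = 1.0 / math.sqrt(n_in)
            weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
            biases = [0.0] * n_out
            self.layers.append((weights, biases))

    def forward(self, x):
        last = len(self.layers) - 1
        for k, (weights, biases) in enumerate(self.layers):
            # Affine transform for every unit in this layer.
            z = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]
            # Hidden layers use ReLU; the output layer stays linear (a raw score).
            x = z if k == last else [relu(v) for v in z]
        return x


# Hypothetical features: microhomology length 3, GC fraction 2/3.
net = TinyMLP([2, 16, 16, 1])
score = net.forward([3.0, 2 / 3])[0]
```

A "deep learning" variant in the sense of the preceding paragraph would simply use more or wider hidden layers and be trained by gradient descent in a numerical framework; that choice is outside this sketch.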
In various embodiments, the DSB-based genome editing system (e.g., a Cas-based genome editing system) edits the genome without relying on homology-based repair.
In various embodiments, the computational model is trained with experimental data to predict the probability distribution of indel lengths for any given nucleotide sequence and cut site. In other embodiments, the computational model is trained with experimental data to predict the probability distribution of genotype frequencies for any given nucleotide sequence and cut site.
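Training against experimental data requires some measure of how closely a predicted indel-length (or genotype) distribution matches observed frequencies. The sketch below shows two common choices, Pearson correlation and Kullback-Leibler divergence; using either as the training objective here is an assumption for illustration, since this passage does not specify the loss. Both functions take two equal-length lists of non-negative frequencies indexed by the same outcomes (e.g., deletion lengths 1 through 60), and the function names are hypothetical.

```python
import math


def normalize(freqs):
    total = float(sum(freqs))
    if total <= 0:
        raise ValueError("frequencies must sum to a positive value")
    return [f / total for f in freqs]


def pearson_r(pred, obs):
    """Pearson correlation between predicted and observed frequencies."""
    n = len(pred)
    if n != len(obs) or n < 2:
        raise ValueError("need two equal-length vectors with at least 2 entries")
    mp, mo = sum(pred) / n, sum(obs) / n
    cov = sum((p - mp) * (o - mo) for p, o in zip(pred, obs))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    so = math.sqrt(sum((o - mo) ** 2 for o in obs))
    if sp == 0 or so == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / (sp * so)


def kl_divergence(obs, pred, eps=1e-12):
    """KL(obs || pred) in nats after normalizing both to distributions."""
    obs, pred = normalize(obs), normalize(pred)
    # eps guards against log(0) when a predicted outcome has zero probability.
    return sum(o * math.log((o + eps) / (p + eps)) for o, p in zip(obs, pred) if o > 0)


# A hypothetical training loop could minimize -pearson_r(pred, obs)
# or kl_divergence(obs, pred), averaged over training target sites.
```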
In some embodiments, the computational model comprises one or more training modules for evaluating experimental data.
In an embodiment, the computational model comprises: a first training module (305) for computing a microhomology score matrix; a second training module (310) for computing a microhomology-independent score matrix; and a third training module (315) for computing a probability distribution over 1-bp insertions, wherein, once trained with experimental data, the computational model computes a probability distribution over indel genotypes and a probability distribution over indel lengths for any given input nucleotide sequence and cut site.
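One plausible way to combine the outputs of the three modules into the two distributions named above is sketched below. It assumes, for illustration only, that the microhomology and microhomology-independent modules emit unnormalized log-scores (normalized jointly with a softmax), that the insertion module emits the overall 1-bp insertion fraction plus a distribution over the inserted base, and that indel length is signed (-L for an L-bp deletion, +1 for a 1-bp insertion). None of these conventions or the function name `combine_module_outputs` is taken from the disclosure.

```python
import math
from collections import defaultdict


def combine_module_outputs(mh_scores, mhless_scores, p_insertion, insertion_base_probs):
    """Combine per-module outputs into genotype and indel-length distributions.

    mh_scores: {(deletion_length, genotype_id): score} for MH deletions (module 1)
    mhless_scores: {deletion_length: score} for MH-less deletions (module 2)
    p_insertion: predicted fraction of products that are 1-bp insertions (module 3)
    insertion_base_probs: {inserted_base: probability}, assumed to sum to 1
    Scores are treated as unnormalized log-weights; this is an assumption.
    """
    if not 0.0 <= p_insertion <= 1.0:
        raise ValueError("p_insertion must lie in [0, 1]")
    weights = {}
    for (del_len, genotype_id), score in mh_scores.items():
        weights[("MH", del_len, genotype_id)] = math.exp(score)
    for del_len, score in mhless_scores.items():
        weights[("MH-less", del_len)] = math.exp(score)
    z = sum(weights.values())

    # Deletions share the probability mass not taken by 1-bp insertions.
    genotype_probs = {}
    if z > 0:
        for key, w in weights.items():
            genotype_probs[key] = (1.0 - p_insertion) * w / z
    for base, p in insertion_base_probs.items():
        genotype_probs[("1-bp ins", base)] = p_insertion * p

    # Indel length: -L for an L-bp deletion, +1 for a 1-bp insertion.
    length_probs = defaultdict(float)
    for key, p in genotype_probs.items():
        length = 1 if key[0] == "1-bp ins" else -key[1]
        length_probs[length] += p
    return genotype_probs, dict(length_probs)
```

Keeping the MH and MH-less deletions in a single softmax lets the two modules compete for the same deletion mass; a different weighting scheme would be equally compatible with the description above.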
In certain embodiments, the computational model predicts genomic repair outcomes for any given input nucleotide sequence (i.e., context sequence) and cut site.
In certain embodiments, the genomic repair outcomes comprise microhomology deletions, microhomology-less deletions, and 1-bp insertions.
In various embodiments, the one or more modules each comprise one or more input features selected from the group consisting of: a target site nucleotide sequence; a cut site; a PAM sequence; microhomology lengths relative to a cut site; % GC content at a cut site; and microhomology deletion lengths at a cut site.
In certain embodiments, the nucleotide sequence analyzed by the computational model is between about 25-100 nucleotides, 50-200 nucleotides, 100-400 nucleotides, 200-800 nucleotides, 400-1600 nucleotides, 800-3200 nucleotides, or 1600-6400 nucleotides, or more. In various embodiments, the computational model predicts genetic repair outcomes at double-strand breaks induced by any DSB-based genome editing system (e.g., CRISPR/Cas9, Cas-based, Cpf1-based, or others).
FIG. 1 depicts the anatomy of a double-strand break. In the example shown in FIG. 1, the DNA segment 100 includes a top strand 105A and a bottom strand 105B. These two strands are complementary and therefore encode the same information. In some embodiments, CRISPR/Cas9 may be used to create a double-strand cut at a selected donor splice site motif, which may be a specific sequence of 6-10 nucleotides. In the example of FIG. 1, an NGG PAM may be used, as underlined and shown at 115, so that a cut site 110 occurs within the selected donor splice site motif. Any suitable algorithm may be used to detect the presence or absence of the splice site motif in repair products, thereby verifying whether the splice site motif has been successfully eliminated.
FIGs. 2A-D show an illustrative matching of the 3’ ends of the top and bottom strands of a DNA segment at a cut site, and an illustrative repair product, in accordance with some embodiments. For instance, the strands may be the illustrative top strand 105A and the illustrative bottom strand 105B of FIG. 1, and the cut site may be the illustrative cut site 110 of FIG. 1. (To avoid clutter, the surrounding sequence context is omitted in FIGs. 2B-D.)
In some embodiments, a segment of double-stranded DNA may be represented such that the top strand runs 5’ on the left to 3’ on the right. Given a cut in this double stranded DNA, nucleotides and their complementary base-paired nucleotides that lie between the 5’ end of the top strand and the cut site may be said to be located at the 5’ side of the cut site. Likewise, nucleotides and their complementary base-paired nucleotides that lie between the cut site and the 3’ end of the top strand may be said to be located at the 3’ side of the cut site.
In the example shown in FIG. 2A, a deletion length of 5 base pairs is considered, for example, as a result of 5’ end resection, where the top strand 105A has an overhang 200A of length 5 at the 5’ side of the cut site 110, and the bottom strand 105B has an overhang 200B of length 5 at the 3’ side of the cut site 110. As shown in FIG. 2B, there is no match between the overhangs 200A and 200B in the first three bases, but there is a match in each of the last two bases. Thus, in this example, a microhomology 205 is present, with a 2 base pair match.
FIG. 2C shows an illustrative result of flap removal, where the three mismatched bases in the overhang 200B are removed. For instance, in some embodiments, given a microhomology, some or all nucleotides on the 3’ side of the microhomology on the top strand, and/or some or all nucleotides on the 3’ side of the microhomology on the bottom strand, may be resected.
Pictorially, with the top strand running 5’ to 3’, nucleotides to the right of the microhomology on the top strand may be resected, and nucleotides to the left of the microhomology on the bottom strand may be resected.
FIG. 2D shows an illustrative repair product resulting from polymerase fill-in and ligation, where three matching bases are added to the overhang 200B.
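The MMEJ steps of FIGs. 2A-2D (annealing on the microhomology, flap removal, fill-in, and ligation) can be sketched as a single string operation on the top strand. This is a minimal illustration, not the disclosed implementation: it assumes the top strand is a Python string written 5’ to 3’, that `cut` is the index of the first nucleotide on the 3’ side of the cut, and that the microhomology occupies positions i to j - 1 of the two aligned length-L windows flanking the cut. The function name `mmej_product` is hypothetical. The example context GACAAG|GGTAGG is the six nucleotides on each side of the FIG. 1 cut site, as given later in this description by left[6] and right[6].

```python
def mmej_product(seq: str, cut: int, length: int, i: int, j: int) -> str:
    """Repair product of a microhomology-mediated deletion of `length` bases.

    seq    -- top strand, 5' (left) to 3' (right)
    cut    -- index of the first nucleotide on the 3' side of the cut
    length -- deletion length L
    i, j   -- microhomology occupies positions i..j-1 of both aligned windows
    """
    if length > cut or cut + length > len(seq):
        raise ValueError("deletion window extends past the sequence")
    left = seq[cut - length:cut]   # L bases on the 5' side of the cut
    right = seq[cut:cut + length]  # L bases on the 3' side of the cut
    if not (0 <= i < j <= length) or left[i:j] != right[i:j]:
        raise ValueError("positions i..j-1 are not a microhomology")
    # Annealing on the microhomology, removing the unpaired 3' flaps and
    # filling in the gap leaves one copy of the microhomology; the net
    # effect is that L consecutive bases are deleted.
    start = cut - length + j
    return seq[:start] + seq[start + length:]


print(mmej_product("GACAAGGGTAGG", 6, 5, 3, 5))
```

For the 5 bp deletion of FIG. 2 (microhomology at window positions 3 and 4), the call above returns GACAAGG. Deleting the same L bases starting at any offset from cut - L + i to cut - L + j yields the same product, which is one way to see why each microhomology corresponds to exactly one genotype.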
FIG. 3A shows an illustrative machine learning model 300, in accordance with some embodiments. The machine learning model 300 may be trained using experimental data to compute, given an input DNA sequence seq and a cut site location, a probability distribution over any suitable set of deletion and/or insertion genotypes, and/or a probability distribution over any suitable set of deletion and/or insertion lengths. For instance, in some embodiments, 1 base pair insertions and 1-60 base pair deletions may be considered. (These repair outcomes may also be referred to herein as +1 to -60 indels.) The inventors have observed empirically that indels outside of this range occur infrequently. However, it should be appreciated that aspects of the present disclosure are not limited to any particular set of repair outcomes. In some embodiments, only insertions (e.g., 1-2 base pair insertions), or only deletions (e.g., 1-28 base pair deletions), may be considered, for example, based on availability of training data.
The inventors have recognized and appreciated that accurate predictions of repair outcomes may be facilitated by separating the repair outcomes into three classes: microhomology (MH) deletions, microhomology-less (MH-less) deletions, and insertions. The inventors have further recognized and appreciated that different machine learning techniques may be applied to the different classes of repair outcomes. For instance, in the example of FIG. 3A, the machine learning model 300 includes three modules: the MH deletion module 305, the MH-less deletion module 310, and the insertion module 315. As discussed below, these modules may compute scores for various indel genotypes and/or indel lengths, which may in turn be used to compute a probability distribution over indel genotypes and/or a probability distribution over indel lengths. In some embodiments, one or more modules (e.g., the MH deletion module 305 and the MH-less deletion module 310) may be trained jointly. In some embodiments, a module may be dependent upon one or more other modules. For instance, as discussed below, an input feature used in the insertion module 315 may be derived based on outputs of the MH deletion module 305 and/or the MH-less deletion module 310.
In some embodiments, MH deletions may include deletions that are derivable analytically by simulating MMEJ. For instance, all microhomologies may be identified for deletion lengths of interest (e.g., deletion lengths 1-60). A genotypic outcome may be derived for each such microhomology by simulating polymerase fill-in, for example, as discussed in connection with FIGs. 2A-2D. (The inventors have recognized and appreciated that there is a one-to-one correspondence between the microhomologies and the genotypic outcomes.) A deletion that is derivable in this manner may be classified as an MH deletion, whereas a deletion that is not derivable in this manner may be classified as an MH-less deletion.
Techniques for identifying microhomologies for a given deletion length L of interest (e.g., each deletion length between 1 and 60) are described below. However, it should be appreciated that aspects of the present disclosure are not limited to the use of any particular technique for identifying microhomologies.
In some embodiments, an input DNA sequence seq may be represented as a vector with integer indices, where each element of the vector is a nucleotide from the set {A, C, G, T}, the cut site is between seq[-1] and seq[0], and seq is oriented 5’ on the left to 3’ on the right.
A subsequence seq[i:j], i < j, may be a vector of length j - i, including elements seq[i] to seq[j - 1]. For each deletion length L of interest (e.g., L between 1 and 60), left[L] may be used to denote seq[-L:0], and right[L] may be used to denote seq[0:L]. Thus, with reference to the example shown in FIGs. 1 and 2A, left[5] may be ACAAG, and right[5] may be GGTAG. Because the top strand 105A and the bottom strand 105B are complementary, a microhomology (e.g., the microhomology 205) may be identified by looking for exact matches between left[5] and right[5] (which may be equivalent to complementary matches between the overhang 200A and the overhang 200B). For instance, a match vector may be constructed for each deletion length L of interest (e.g., L between 1 and 60) as follows: match[L][i] = '|' if left[L][i] = right[L][i], otherwise match[L][i] = '.'. Such matching between left[5] and right[5] is illustrated below.
ACAAG
...||
GGTAG
In some embodiments, a microhomology may be identified by looking for a maximal run match[L][i:j] such that match[L][k] = '|' for all i ≤ k < j, match[L][i - 1] != '|' (or i = 0), and match[L][j] != '|' (or j = L). For instance, with reference to the example shown in FIG. 1, there may be no microhomology for deletion length 3, no microhomology for deletion length 4, one microhomology for deletion length 5, three microhomologies for deletion length 6, etc., as illustrated below.
AAG
...
GGT
CAAG
....
GGTA
ACAAG
...||
GGTAG
GACAAG
|..|.|
GGTAGG
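The match-vector construction and maximal-run search described above can be sketched in a few lines of Python. This is an illustrative sketch with hypothetical function names; it assumes the top strand is a 0-based Python string and that `cut` is the index of the first nucleotide to the right of the cut, so the text's seq[-L:0] and seq[0:L] become seq[cut - L:cut] and seq[cut:cut + L]. On the FIG. 1 context it reproduces the alignments shown above: no microhomology for L = 3 or 4, one for L = 5, and three for L = 6.

```python
from typing import List, Tuple


def match_vector(seq: str, cut: int, length: int) -> str:
    """'|' where left[L][k] == right[L][k], '.' otherwise."""
    if length > cut or cut + length > len(seq):
        raise ValueError("deletion window extends past the sequence")
    left = seq[cut - length:cut]
    right = seq[cut:cut + length]
    return "".join("|" if a == b else "." for a, b in zip(left, right))


def microhomologies(seq: str, cut: int, length: int) -> List[Tuple[int, int]]:
    """Return (i, j) for each maximal run of '|' in match[L]."""
    match = match_vector(seq, cut, length)
    runs: List[Tuple[int, int]] = []
    k = 0
    while k < length:
        if match[k] != "|":
            k += 1
            continue
        i = k
        while k < length and match[k] == "|":
            k += 1
        runs.append((i, k))  # k is now the first position past the run
    return runs


for L in range(3, 7):
    print(L, match_vector("GACAAGGGTAGG", 6, L), microhomologies("GACAAGGGTAGG", 6, L))
```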
In some embodiments, microhomologies identified for a suitable set of deletion lengths (e.g., 1-60) may be enumerated using indices n = 1 ... N, where N is the number of identified microhomologies. For each n, let G[n] denote the genotypic outcome corresponding to the microhomology n, let ML[n] denote the microhomology length of the microhomology n, let C[n] denote the GC fraction of the microhomology n, and let DL[n] denote the deletion length of the microhomology n.
Although examples of representations of DNA sequences and subsequences are discussed herein, it should be appreciated that aspects of the present disclosure are not limited to the use of any particular representation.
FIG. 3B shows an illustrative process 350 for building one or more machine learning models for predicting frequencies of deletion genotypes and/or deletion lengths, in accordance with some embodiments. For instance, the process 350 may be used to build the illustrative MH deletion module 305 and/or the illustrative MH-less deletion module 310 in the example of FIG. 3A. These modules may be used to compute, given an input DNA sequence seq and a cut site location, a probability distribution over any suitable set of deletion genotypes and/or a probability distribution over any suitable set of deletion lengths.
In some embodiments, a probability distribution over deletion lengths from 1-60 may be computed. However, it should be appreciated that aspects of the present disclosure are not limited to any particular set of deletion lengths. In some embodiments, an upper limit of deletion lengths may be determined based on availability of training data and/or any other one or more suitable considerations.
Referring to FIG. 3B, act 355 of the process 350 may include, for each deletion length L of interest (e.g., each deletion length between 1-60), aligning subsequences of length L on the 5’ and 3’ sides of a cut site in an input DNA sequence to identify one or more microhomologies, as discussed in connection with FIG. 3A. This may be performed for an input DNA sequence and a cut site for which repair genotype data from a CRISPR/Cas9 experiment is available.
At act 360, one or more microhomologies identified at act 355 may be featurized. Any suitable one or more features may be used, as aspects of the present disclosure are not so limited. As one example, the inventors have recognized and appreciated that energetic stability of a microhomology may increase proportionately with a length of the microhomology. Accordingly, in some embodiments, a microhomology length j - i may be used as a feature for a microhomology match[L][i:j].
As another example, the inventors have recognized and appreciated that thermodynamic stability of a microhomology may depend on specific base pairings, and that G-C pairings have three hydrogen bonds and therefore have higher thermodynamic stability than A-T pairings, which have two hydrogen bonds. Accordingly, in some embodiments, a GC fraction, as shown below, may be used as a feature for a microhomology match[L][i:j], where indicator(boolean) equals 1 if boolean is true, and 0 otherwise.
$$\mathrm{GC}[L][i:j] = \frac{1}{j - i} \sum_{k=i}^{j-1} \mathrm{indicator}\big(\mathrm{left}[L][k] \in \{\mathrm{G}, \mathrm{C}\}\big)$$
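Combining the enumeration with featurization, the sketch below lists every microhomology for deletion lengths 1 to `max_len` and records, for each index n, the genotype G[n], the microhomology length ML[n] = j - i, the GC fraction C[n] (the fraction of G or C bases in the microhomology), and the deletion length DL[n]. It is a hypothetical helper, not the authors' code. Representing the simulated fill-in of FIGs. 2A-2D as "delete L consecutive bases so that a single copy of the microhomology remains" is an assumption of this sketch.

```python
from typing import List, NamedTuple


class Microhomology(NamedTuple):
    genotype: str       # G[n]
    mh_length: int      # ML[n] = j - i
    gc_fraction: float  # C[n]
    del_length: int     # DL[n]


def enumerate_microhomologies(seq: str, cut: int, max_len: int = 60) -> List[Microhomology]:
    """List every microhomology for deletion lengths 1..max_len (n = list position)."""
    found: List[Microhomology] = []
    for L in range(1, max_len + 1):
        if L > cut or cut + L > len(seq):
            break  # window would run past the available context
        left, right = seq[cut - L:cut], seq[cut:cut + L]
        k = 0
        while k < L:
            if left[k] != right[k]:
                k += 1
                continue
            i = k
            while k < L and left[k] == right[k]:
                k += 1
            j = k  # maximal run: left[i:j] == right[i:j]
            gc = sum(base in "GC" for base in left[i:j]) / (j - i)
            start = cut - L + j  # delete L bases, keeping one copy of the microhomology
            genotype = seq[:start] + seq[start + L:]
            found.append(Microhomology(genotype, j - i, gc, L))
    return found


for mh in enumerate_microhomologies("GACAAGGGTAGG", 6):
    print(mh)
```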
In some embodiments, a length N vector may be constructed for each feature (e.g., microhomology length, GC fraction, etc.), where N is the number of microhomologies identified at act 355 for a set of deletion lengths of interest (e.g., 1-60), as discussed in connection with FIG. 3A. As discussed above, the inventors have recognized and appreciated that there is a one-to-one correspondence between microhomologies and genotypic outcomes that are classified as MH deletions. Therefore, feature vectors for microhomologies may be viewed as feature vectors for MH deletions. In some embodiments, acts 355 and 360 may be repeated for different input DNA sequences and/or cut sites for which repair genotype data from CRISPR/Cas9 experiments is available.
It should be appreciated that aspects of the present disclosure are not limited to any particular featurization technique. For instance, in some embodiments, two features may be used, such as microhomology length and GC fraction. However, that is not required, as in some embodiments one feature may be used (e.g., microhomology length, GC fraction, or some other suitable feature), or more than two features may be used (e.g., three, four, five, etc.). Examples of features that may be used for a microhomology match[L][i:j] within a deletion of length L include, but are not limited to, a position of the microhomology within the deletion and a ratio between the length of the microhomology (i.e., j - i) and the deletion length L. As another example, the inventors have recognized and appreciated that deoxyribonuclease (DNase) hypersensitivity may be used to classify genomic sequences into open or closed chromatin, which may impact DNA repair outcomes. Accordingly, in some embodiments, open vs. closed chromatin may be used as a feature. Any one or more of these features, and/or other features, may be used in addition to, or instead of, microhomology length and GC fraction. Furthermore, in some embodiments, explicit featurization may be reduced or eliminated by automatically learning data representations (e.g., using one or more deep learning techniques).
Returning to FIG. 3B, one or more machine learning models may be trained at act 365 to compute one or more target probability distributions. For instance, a neural network model may be built for the illustrative MH deletion module 305 in the example of FIG. 3A. This model may take as input a length N vector for each of one or more features, as constructed at act 360, and output a length N vector of MH scores, where N is the number of microhomologies identified at act 355 for a set of deletion lengths of interest (e.g., 1-60). Additionally, or alternatively, a neural network model may be built for the illustrative MH-less deletion module 310 in the example of FIG. 3A. This model may take as input a vector for each of one or more features, and output a vector of MH-less scores. Both the input vector and the output vector may be indexed by the set of deletion lengths of interest (e.g., 1-60). These neural network models may then be trained jointly using repair genotype data collected from CRISPR/Cas9 experiments.
FIG. 4A shows an illustrative neural network 400A for computing MH scores, in accordance with some embodiments. For instance, the neural network 400A may be used in the illustrative MH deletion module 305 in the example of FIG. 3A, and may be trained at act 365 of the illustrative process 350 shown in FIG. 3B.
In some embodiments, the neural network 400A may have one input node for each microhomology feature being used. For instance, in the example shown in FIG. 4A, there are two input nodes, which are associated with microhomology length and GC fraction, respectively. Each input node may receive a length N vector, where N is the number of microhomologies identified for a set of deletion lengths of interest (e.g., 1-60), for example, as discussed in connection with act 355 in the example of FIG. 3B.
In some embodiments, the neural network 400A may include one or more hidden layers, each having one or more nodes. In the example shown in FIG. 4A, there are two hidden layers, each having 16 nodes. However, it should be appreciated that aspects of the present disclosure are not limited to the use of any particular number of hidden layers or any particular number of nodes in a hidden layer. Furthermore, different hidden layers may have different numbers of nodes.
In some embodiments, the neural network 400A may be fully connected. (To avoid clutter, the connections are not illustrated in FIG. 4A.) However, that is not required. For instance, in some embodiments, a dropout technique may be used, where a parameter p may be selected, and during training each node’s value is independently set to 0 with probability p. This may result in a neural network that is not fully connected.
In some embodiments, a leaky rectified linear unit (ReLU) nonlinearity sigma may be used in the neural network 400A. For instance, at hidden layer h and node i, an activation function may be provided as follows:
unit[h][i] = sigma(w[h][i] * unit[h - 1] + b[h][i]), where sigma(x) = max(0, x) + 0.001 * min(0, x).
Thus, the neural network 400A may be parameterized by w[h] and b[h] for each hidden layer h. In some embodiments, these parameters may be initialized randomly, for example, from a spherical Gaussian distribution with some suitable center (e.g., 0) and some suitable variance (e.g., 0.1). These parameters may then be trained using repair genotype data collected from CRISPR/Cas9 experiments, for instance, as discussed below. In some embodiments, the neural network 400A may have one output node, producing a length N vector y_MH of scores, where N is the number of microhomologies identified for the set of deletion lengths of interest (e.g., 1-60). Thus, there may be one score for each identified microhomology.
In some embodiments, the neural network 400A may operate independently for each microhomology, taking as input the length of that microhomology (from the first input node) and the GC fraction of that microhomology (from the second input node), transforming those two values into 16 values (at the first hidden layer), then transforming those 16 values into 16 other values (at the second hidden layer), and finally outputting a single value (at the output node). In such an embodiment, parameters for the first hidden layer, w[1][i] and b[1][i], are vectors of length 2 for each node i from 1 to 16, whereas parameters for the second hidden layer, w[2][i] and b[2][i], are vectors of length 16 for each node i from 1 to 16, and parameters for the output layer, w[3][1] and b[3][1], are also vectors of length 16.
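A forward pass through a network shaped like the neural network 400A (two inputs, two hidden layers of 16 leaky-ReLU units, one output) can be sketched in plain Python. Assumptions made here for illustration: one scalar bias per node, a linear (identity) output node, and initialization from a Gaussian with mean 0 and variance 0.1 (standard deviation sqrt(0.1)). Training is not shown, and the function names are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Layer = Tuple[List[List[float]], List[float]]  # (weights[node][input], bias[node])


def leaky_relu(x: float) -> float:
    return max(0.0, x) + 0.001 * min(0.0, x)


def init_mh_network(sizes: Sequence[int] = (2, 16, 16, 1), seed: int = 0) -> List[Layer]:
    """Gaussian init with mean 0 and variance 0.1 (assumed values)."""
    rng = random.Random(seed)
    std = math.sqrt(0.1)
    network: List[Layer] = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        weights = [[rng.gauss(0.0, std) for _ in range(n_in)] for _ in range(n_out)]
        biases = [rng.gauss(0.0, std) for _ in range(n_out)]
        network.append((weights, biases))
    return network


def forward(network: List[Layer], features: Sequence[float]) -> float:
    units = list(features)
    for h, (weights, biases) in enumerate(network):
        pre = [sum(w * u for w, u in zip(row, units)) + b for row, b in zip(weights, biases)]
        last = h == len(network) - 1
        units = pre if last else [leaky_relu(p) for p in pre]  # linear output node
    return units[0]


def raw_mh_scores(network: List[Layer], mh_lengths: Sequence[float],
                  gc_fractions: Sequence[float]) -> List[float]:
    # The same network is applied independently to each microhomology.
    return [forward(network, (ml, gc)) for ml, gc in zip(mh_lengths, gc_fractions)]


net = init_mh_network()
print(raw_mh_scores(net, [1, 2, 1], [1.0, 0.5, 0.0]))
```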
In some embodiments, the vector y_MH of raw scores may be converted into a vector φ_MH of MH scores. The inventors have recognized and appreciated (e.g., from experimental data) that the strength of a microhomology decreases exponentially with deletion length. Accordingly, in some embodiments, an exponential linear model may be used to convert the raw scores into the MH scores. For instance, the following formula may be used:
$$\varphi_{MH}[n] = \exp\big(y_{MH}[n] - 0.25 \cdot DL[n]\big),$$
where n is an index for a microhomology (and thus a number between 1 and N), and DL[n] is the deletion length of the microhomology n.
In some embodiments, 0.25 may be a hyperparameter value chosen to improve training speed by appropriate scaling. However, it should be appreciated that aspects of the present disclosure are not limited to the use of any particular hyperparameter value for exponential conversion, or any conversion at all. In some embodiments, the vector y_MH of raw scores may be used directly as MH scores.
FIG. 4B shows an illustrative neural network 400B for computing MH-less scores, in accordance with some embodiments. For instance, the neural network 400B may be used in the illustrative MH-less deletion module 310 in the example of FIG. 3A, and may be trained at act 365 of the illustrative process 350 shown in FIG. 3B. In some embodiments, deletion length may be modeled explicitly as an input to the neural network 400B. Thus, in an example where the set of deletion lengths of interest is 1-60, an input node of the neural network 400B may receive a deletion length vector, [1, 2, ..., 60].
In some embodiments, the neural network 400B may include one or more hidden layers, each having one or more nodes. In the example shown in FIG. 4B, the neural network 400B has two hidden layers that are similarly constructed as the illustrative neural network 400A in the example of FIG. 4A. However, it should be appreciated that aspects of the present disclosure are not limited to the use of a similar construction between the neural network 400A and the neural network 400B.
In some embodiments, the neural network 400B may have an output node producing a vector y_MH-less of scores. There may be one score for each deletion length L of interest. Thus, in an example where the set of deletion lengths of interest is 1-60, the length of the vector y_MH-less may be 60.
In some embodiments, an exponential linear model may be used to convert the vector y_MH-less into a vector φ_MH-less of MH-less scores. For instance, the following formula may be used:
$$\varphi_{MH\text{-}less}[L] = \exp\big(y_{MH\text{-}less}[L] - 0.25 \cdot L\big),$$
where L is a deletion length of interest. However, it should be appreciated that aspects of the present disclosure are not limited to the use of any particular hyperparameter value for exponential conversion, or any conversion at all.
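Both exponential linear conversions can be sketched as below, with the 0.25 scale exposed as a parameter. The sketch assumes `raw_by_length[L - 1]` holds the raw MH-less score for deletion length L; the function names are illustrative only.

```python
import math
from typing import List, Sequence


def mh_scores(raw: Sequence[float], del_lengths: Sequence[int], scale: float = 0.25) -> List[float]:
    """phi_MH[n] = exp(y_MH[n] - scale * DL[n])."""
    return [math.exp(y - scale * dl) for y, dl in zip(raw, del_lengths)]


def mh_less_scores(raw_by_length: Sequence[float], scale: float = 0.25) -> List[float]:
    """phi_MH-less[L] = exp(y_MH-less[L] - scale * L), with raw_by_length[L - 1] = y_MH-less[L]."""
    return [math.exp(y - scale * L) for L, y in enumerate(raw_by_length, start=1)]


print(mh_scores([1.0, 1.0], [2, 6]))
print(mh_less_scores([0.25, 0.5]))
```

With equal raw scores, a deletion that is 4 bp longer receives a score e^-1 (about 0.37) times as large, which is the exponential decay with deletion length described above.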
FIG. 4C shows an illustrative process 400C for training two neural networks jointly, in accordance with some embodiments. For instance, the process 400C may be used to jointly train the illustrative neural networks 400A and 400B of FIGs. 4A-4B.
In some embodiments, the MH score vector φ_MH and the MH-less score vector φ_MH-less may be used to predict a probability distribution over MH deletion genotypes and/or a probability distribution over deletion lengths. For instance, given a microhomology n, a frequency may be predicted for the corresponding MH deletion genotype, out of all MH deletion genotypes. As discussed above, the inventors have recognized and appreciated that there is a one-to-one correspondence between microhomologies and genotypic outcomes that are classified as MH deletions. Thus, n = 1 ... N may be used as an index both for microhomologies and for MH deletions. In some embodiments, a frequency prediction for a microhomology n may depend on whether the microhomology n is full. A microhomology n is said to be full if the length of the microhomology n is the same as the deletion length associated with the microhomology n. For a microhomology n that is not full, a frequency may be predicted as follows, out of all MH deletion genotypes.
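$$V_{MHG}[n] = \frac{\varphi_{MH}[n]}{\sum_{m=1}^{N} \varphi_{MH}[m] + \sum_{m=1}^{N} \mathrm{indicator}\big(ML[m] = DL[m]\big)\, \varphi_{MH\text{-}less}\big[DL[m]\big]}$$
Here DL[m] denotes the deletion length of the microhomology m, ML[m] denotes its microhomology length (so that ML[m] = DL[m] exactly when the microhomology m is full), and indicator(boolean) equals 1 if boolean is true, and 0 otherwise.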
The inventors have recognized and appreciated that, for a full microhomology, only a single deletion genotype is possible for the entire deletion length. Moreover, the single genotype may be generated via different pathways, such as MMEJ and MH-less end-joining. Therefore, full microhomologies may be modeled as receiving contributions from both MH-dependent and MH-less mechanisms. Thus, for a microhomology n that is full, a frequency may be predicted as follows, out of all MH deletion genotypes.
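$$V_{MHG}[n] = \frac{\varphi_{MH}[n] + \varphi_{MH\text{-}less}\big[DL[n]\big]}{\sum_{m=1}^{N} \varphi_{MH}[m] + \sum_{m=1}^{N} \mathrm{indicator}\big(ML[m] = DL[m]\big)\, \varphi_{MH\text{-}less}\big[DL[m]\big]}$$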
Because the predicted frequencies are normalized, V_MHG is a probability distribution over all microhomologies identified for the set of deletion lengths of interest, and hence also a probability distribution over all MH deletions.
In some embodiments, given a deletion length L, a frequency may be predicted as follows for the set of all deletions having the deletion length L, out of all deletions, taking into account contributions from MH-dependent and MH-less mechanisms.
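$$V_{DL}[L] = \frac{\sum_{m=1}^{N} \mathrm{indicator}\big(DL[m] = L\big)\, \varphi_{MH}[m] + \varphi_{MH\text{-}less}[L]}{\sum_{m=1}^{N} \varphi_{MH}[m] + \sum_{L'} \varphi_{MH\text{-}less}[L']}$$
Here DL[m] denotes the deletion length of the microhomology m, L' ranges over the deletion lengths of interest, and indicator(boolean) equals 1 if boolean is true, and 0 otherwise.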
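The two predicted distributions can be computed from the score vectors as sketched below. The sketch follows the description above under stated assumptions: a microhomology n is treated as full when ML[n] = DL[n], a full microhomology receives the extra MH-less contribution for its deletion length, V_MHG is normalized over all microhomologies, and V_DL is normalized over the combined MH and MH-less scores. `phi_mh_less[L - 1]` is assumed to hold the MH-less score for deletion length L, and the function name is hypothetical.

```python
from typing import List, Sequence, Tuple


def predict_distributions(phi_mh: Sequence[float], mh_lengths: Sequence[int],
                          del_lengths: Sequence[int],
                          phi_mh_less: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Return (V_MHG, V_DL); phi_mh_less[L - 1] is the MH-less score for length L."""
    # Full microhomology (ML[n] == DL[n]): its single genotype also gets the MH-less score.
    numer = [p + (phi_mh_less[dl - 1] if ml == dl else 0.0)
             for p, ml, dl in zip(phi_mh, mh_lengths, del_lengths)]
    z = sum(numer)
    v_mhg = [x / z for x in numer]

    # Deletion-length distribution: MH scores grouped by DL plus the MH-less score.
    total = sum(phi_mh) + sum(phi_mh_less)
    v_dl = []
    for L in range(1, len(phi_mh_less) + 1):
        mh_part = sum(p for p, dl in zip(phi_mh, del_lengths) if dl == L)
        v_dl.append((mh_part + phi_mh_less[L - 1]) / total)
    return v_mhg, v_dl


v_mhg, v_dl = predict_distributions([1.0, 2.0], [1, 1], [1, 2], [0.5, 0.5])
print(v_mhg, v_dl)
```

Both returned lists sum to 1, so each can be compared directly against an observed distribution.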
In some embodiments, the parameters w[h] and b[h] for each hidden layer h of the neural networks 400A and 400B may be trained using a gradient descent method with L2-loss:
$$\mathrm{Loss} = \sum_{n=1}^{N} \big(V_{MHG}[n] - V^{*}_{MHG}[n]\big)^2 + \sum_{L} \big(V_{DL}[L] - V^{*}_{DL}[L]\big)^2,$$
where V*_MHG is an observed probability distribution on MH deletion genotypes, and V*_DL is an observed probability distribution on deletion lengths (e.g., based on repair genotype data collected from CRISPR/Cas9 experiments).
In some embodiments, multiple instantiations of the neural networks 400A and 400B may be trained with different loss functions. For instance, in addition to, or instead of, L2-loss, a loss based on the squared Pearson correlation may be used:
$$\mathrm{Loss} = -\big(\mathrm{pearsonr}(V_{MHG}, V^{*}_{MHG})\big)^2 - \big(\mathrm{pearsonr}(V_{DL}, V^{*}_{DL})\big)^2$$
The function pearsonr(x, y) may be defined as follows for length N vectors x and y, where x̄ and ȳ denote the averages of x and y, respectively.
$$\mathrm{pearsonr}(x, y) = \frac{\sum_{i=1}^{N} (x[i] - \bar{x})(y[i] - \bar{y})}{\sqrt{\sum_{i=1}^{N} (x[i] - \bar{x})^2}\,\sqrt{\sum_{i=1}^{N} (y[i] - \bar{y})^2}}$$
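Both loss functions can be evaluated with a few lines of plain Python, as sketched below. Only the loss values are computed; the gradients needed for gradient descent would, in practice, come from an automatic differentiation framework, which is not shown. The helper names are hypothetical.

```python
import math
from typing import Sequence


def l2_loss(pred_mhg: Sequence[float], obs_mhg: Sequence[float],
            pred_dl: Sequence[float], obs_dl: Sequence[float]) -> float:
    """Sum of squared errors on both distributions."""
    return (sum((p - o) ** 2 for p, o in zip(pred_mhg, obs_mhg))
            + sum((p - o) ** 2 for p, o in zip(pred_dl, obs_dl)))


def pearsonr(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; undefined (division by zero) if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def pearson_loss(pred_mhg: Sequence[float], obs_mhg: Sequence[float],
                 pred_dl: Sequence[float], obs_dl: Sequence[float]) -> float:
    """Negative sum of squared correlations; reaches -2 at perfect (anti)correlation."""
    return -pearsonr(pred_mhg, obs_mhg) ** 2 - pearsonr(pred_dl, obs_dl) ** 2


print(l2_loss([0.5, 0.5], [1.0, 0.0], [1.0], [1.0]))
print(pearson_loss([1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [0.2, 0.8], [0.1, 0.9]))
```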
Although neural networks are used in the examples shown in FIGs. 4A-4C, it should be appreciated that aspects of the present disclosure are not so limited. For instance, in some embodiments, one or more other types of machine learning techniques, such as linear regression, non-linear regression, random-forest regression, etc., may be used additionally or alternatively.
Furthermore, in some embodiments, one or more neural networks that are different from the neural networks 400A and 400B may be used additionally or alternatively. As one example, a different activation function may be used for one or more nodes, such as sigma(x) = max(0, x) (rectified linear unit, or ReLU), sigma(x) = max(0.001x, x) (another example of leaky ReLU), sigma(x) = 0.5 * (tanh(x/2) + 1) (sigmoid), sigma(x) = max(0, x) + min(0, x) * 0.5 * (tanh(x) + 1) (Swish), etc. As another example, batch normalization may be performed at one or more hidden layers.
It should be appreciated that aspects of the disclosure are not limited to training the neural networks 400A and 400B jointly. For instance, given a microhomology n, a frequency may be predicted as follows for the corresponding MH deletion genotype, out of all MH deletion genotypes.
$$V_{MHG}[n] = \frac{\varphi_{MH}[n]}{\sum_{m=1}^{N} \varphi_{MH}[m]}$$
Since this prediction does not depend on the φ_MH-less scores, the neural network 400A may be trained independently.
In some embodiments, one or more other probability distributions may be predicted in addition to, or instead of, V_MHG and V_DL. As one example, given a microhomology n, a frequency may be predicted as follows for the corresponding MH deletion genotype, out of all deletion genotypes (both MH and MH-less).
[Equation not reproduced.]
As another example, for a microhomology n that is not full, a frequency may be predicted as follows, out of all deletion genotypes (both MH and MH-less).
[Equation not reproduced.]
For a microhomology n that is full, a frequency may be predicted as follows, out of all deletion genotypes (both MH and MH-less).
[Equation not reproduced.]
Here DL [n] denotes the deletion length of the microhomology n.
As another example, given a deletion length L, a frequency may be predicted as follows for the set of MH-less deletions having the deletion length L, out of all MH-less deletion genotypes.
$$\frac{\varphi_{MH\text{-}less}[L]}{\sum_{L'} \varphi_{MH\text{-}less}[L']}$$
As another example, given a deletion length L, a frequency may be predicted as follows for the set of MH-less deletions having the deletion length L, out of all deletion genotypes (both MH and MH-less).
$$\frac{\varphi_{MH\text{-}less}[L]}{\sum_{m=1}^{N} \varphi_{MH}[m] + \sum_{L'} \varphi_{MH\text{-}less}[L']}$$
Any one or more of the above predicted probability distributions may be used to train the neural networks 400A and 400B, with some suitable loss function.
FIG. 4D shows an illustrative implementation of the insertion module 315 shown in FIG. 3A, in accordance with some embodiments. In this example, the insertion module 315 includes two models. First, an insertion rate model 405 may be constructed to predict, given an input DNA sequence and a cut site, a frequency of 1 base pair insertions out of all +1 to -60 indels (i.e., 1 base pair insertions and 1-60 base pair deletions). Second, an insertion base pair model 410 may be constructed to predict frequencies of 1 base pair insertion genotypes (i.e., A, C, G, T), again out of all +1 to -60 indels (i.e., 1 base pair insertions and 1-60 base pair deletions). However, it should be appreciated that aspects of the present disclosure are not limited to any particular set of indels. In some embodiments, a small set of indels (e.g., 1 base pair insertions and 1-28 base pair deletions) may be considered, for instance, when less training data is available.
In some embodiments, the insertion rate model 405 may have one or more input features, which may be encoded as an M-dimensional vector of values for some suitable M. The insertion rate model 405 may have at least one output value. A set of training data for the insertion rate model 405 may include a plurality of M-dimensional training vectors and respective output values. Given an M-dimensional query vector, a k-nearest neighbor (k-NN) algorithm with weighting by inverse distance may be used to compute a predicted output value for the query vector. For instance, k = 5 may be used, and five training vectors that are closest to the query vector may be identified, and a predicted output value for the query vector may be computed as a sum of the output values corresponding to the five closest training vectors, weighted by inverse distance, as follows.
$$\hat{y} = \frac{\sum_{i=1}^{5} y[i] / d(x, x[i])}{\sum_{i=1}^{5} 1 / d(x, x[i])}$$
Here x is the query vector, d is a distance function for the M-dimensional vector space, x[1], ..., x[5] are the five closest training vectors, y[1], ..., y[5] are the output values corresponding respectively to x[1], ..., x[5], and ŷ is the predicted output value for the query vector x.
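The inverse-distance-weighted k-NN prediction can be sketched as follows, assuming Euclidean distance and plain Python lists. How to handle a query that coincides exactly with a training vector (where 1/d is undefined) is an assumption of this sketch: the output values of all exact matches are averaged. The function names are hypothetical.

```python
import math
from typing import Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(query: Sequence[float], train_x: Sequence[Sequence[float]],
                train_y: Sequence[float], k: int = 5) -> float:
    """Inverse-distance-weighted average of the k nearest training outputs."""
    nearest = sorted(zip((euclidean(query, x) for x in train_x), train_y))[:k]
    exact = [y for d, y in nearest if d == 0.0]
    if exact:  # 1/d is undefined; fall back to the mean of exact matches
        return sum(exact) / len(exact)
    weights = [1.0 / d for d, _ in nearest]
    return sum(w * y for w, (_, y) in zip(weights, nearest)) / sum(weights)


xs = [[0.0], [1.0], [2.0], [3.0], [4.0], [10.0]]
ys = [0.0, 1.0, 2.0, 3.0, 4.0, 100.0]
print(knn_predict([1.5], xs, ys))  # the distant training point (y = 100) is not among the 5 nearest
```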
It should be appreciated that aspects of the present disclosure are not limited to the use of any particular k, or to the use of any k-NN algorithm. For instance, any one or more of the following techniques, and/or Bayesian variants thereof, may be used in addition to, or instead of, k-NN: gradient-boosted regression, linear regression, nonlinear regression, multilayer perceptron, deep neural network, etc. Also, any suitable distance metric d may be used, such as Euclidean distance.
In some embodiments, the insertion rate model 405 may have three input features: an overall deletion score, a precision score, and one or more cut site nucleotides. The overall deletion score may be computed based on outputs of the MH deletion module 305 and the MH-less deletion module 310 in the example of FIG. 3A, for instance, as follows.
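$$\text{overall deletion score} = \sum_{n=1}^{N} \varphi_{MH}[n] + \sum_{L} \varphi_{MH\text{-}less}[L]$$
Alternatively, the logarithm of this sum may be used as the overall deletion score.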
In some embodiments, the precision score may be indicative of an amount of entropy in predicted frequencies of a suitable set of deletion lengths. The inventors have recognized and appreciated that it may be desirable to calculate precision based on a large set of deletion lengths, but in some instances a smaller set (e.g., 1-28) may be used due to one or more constraints associated with available data. As discussed above, given a deletion length L, a frequency may be predicted as follows for the set of all deletions having the deletion length L, out of all deletions, taking into account contributions from MH-dependent and MH-less mechanisms.
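$$V_{DL}[L] = \frac{\sum_{m=1}^{N} \mathrm{indicator}\big(DL[m] = L\big)\, \varphi_{MH}[m] + \varphi_{MH\text{-}less}[L]}{\sum_{m=1}^{N} \varphi_{MH}[m] + \sum_{L'} \varphi_{MH\text{-}less}[L']}$$
Here DL[m] denotes the deletion length of the microhomology m, L' ranges over the deletion lengths of interest, and indicator(boolean) equals 1 if boolean is true, and 0 otherwise. The precision score may be computed as follows.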
$$\text{precision} = 1 - \frac{-\sum_{L=1}^{28} V_{DL}[L]\,\log\left(V_{DL}[L]\right)}{\log(28)}$$
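A minimal sketch of the precision score, assuming the predicted deletion-length frequencies are supplied as a list indexed by deletion length. Precision is 1 when all probability falls on a single length and 0 for a uniform distribution. The function name is hypothetical, and the sketch normalizes by the logarithm of the number of lengths supplied, which is log(28) for lengths 1-28.

```python
import math
from typing import Sequence


def precision_score(freqs: Sequence[float]) -> float:
    """1 - normalized Shannon entropy of predicted deletion-length frequencies.

    freqs[L - 1] is the predicted frequency of deletion length L (e.g., L = 1..28).
    Frequencies are renormalized to sum to 1 before computing entropy.
    """
    total = sum(freqs)
    probs = [f / total for f in freqs]
    # Terms with p = 0 contribute 0 to the entropy (limit of p * log p as p -> 0).
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return 1.0 - entropy / math.log(len(probs))
```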
In some embodiments, the one or more cut site nucleotides may include the nucleotides on either side of the cut site (i.e., seq[-1] and seq[0]). In the example shown in FIG. 1, the cut site nucleotides are G and G, which are the third and fourth nucleotides to the left of the PAM sequence 115. However, it should be appreciated that aspects of the present disclosure are not limited to the use of two cut site nucleotides as input features to the insertion rate model 405.
For instance, only one cut site nucleotide (e.g., seq[-1], which may be the fourth nucleotide to the left of the PAM sequence) may be used when less training data is available, whereas more than two cut site nucleotides (e.g., seq[-2], seq[-1], and seq[0], which may be, respectively, the fifth, fourth, and third nucleotides to the left of the PAM sequence) may be used when more training data is available. In some embodiments, one or more input features to the insertion rate model 405 may be encoded in any suitable manner. For instance, the one or more cut site nucleotides may be one-hot encoded, for example, as follows.
A = 1000, C = 0100, G = 0010, T = 0001
In some embodiments, encoded input features may be concatenated to form an input vector. In an example in which two cut site nucleotides are used, an input vector may have a length of 10: four for each of the two cut site nucleotides, one for the precision score, and one for the overall deletion score.
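The length-10 input vector described above can be assembled as in the following sketch. The concatenation order (two one-hot codes, then the precision score, then the overall deletion score) follows the order in which the text lists the features but is otherwise an assumption. So is the hypothetical `cut` parameter: a 0-based index of the first nucleotide to the right of the cut, so that Python's `seq[cut - 1]` and `seq[cut]` correspond to the text's cut-relative seq[-1] and seq[0].

```python
from typing import List

# One-hot codes for the cut-site nucleotides, as listed in the text.
ONE_HOT = {
    "A": [1.0, 0.0, 0.0, 0.0],
    "C": [0.0, 1.0, 0.0, 0.0],
    "G": [0.0, 0.0, 1.0, 0.0],
    "T": [0.0, 0.0, 0.0, 1.0],
}


def insertion_rate_features(
    seq: str, cut: int, precision: float, deletion_score: float
) -> List[float]:
    """Build a length-10 input vector for the insertion rate model (hypothetical helper)."""
    left_nt, right_nt = seq[cut - 1].upper(), seq[cut].upper()
    # List concatenation builds a new list; the ONE_HOT entries are not modified.
    return ONE_HOT[left_nt] + ONE_HOT[right_nt] + [precision, deletion_score]
```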
In some embodiments, training data for a certain input DNA sequence may be organized into a matrix X. Each row of the matrix (X[i, -]) may correspond to a possible cut site and may store a training vector of length M for that cut site (e.g., M = 10). In some embodiments, each column of the matrix (X[-, j]) may be normalized to mean 0 and variance 1, as follows.
$$X[i,j] \leftarrow \frac{X[i,j] - \operatorname{mean}(X[-,j])}{\sqrt{\operatorname{var}(X[-,j])}}$$
In some embodiments, values in a query vector may be normalized in a like fashion, using the column statistics of the training matrix. For instance, the j-th value in a query vector x may be normalized as follows.
$$x[j] \leftarrow \frac{x[j] - \operatorname{mean}(X[-,j])}{\sqrt{\operatorname{var}(X[-,j])}}$$
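Column standardization of the training matrix, with the same statistics reused for a query vector, can be sketched as below. To reach variance 1, the sketch divides by the standard deviation. The population standard deviation and the guard that leaves constant columns (for example, an unused one-hot position) unscaled are assumptions. The function names are hypothetical.

```python
import statistics
from typing import List, Sequence, Tuple


def fit_column_stats(X: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Per-column mean and population standard deviation of the training matrix."""
    columns = list(zip(*X))
    means = [statistics.fmean(col) for col in columns]
    # A constant column has standard deviation 0; use 1 to avoid division by zero.
    sds = [statistics.pstdev(col) or 1.0 for col in columns]
    return means, sds


def standardize(row: Sequence[float], means: Sequence[float], sds: Sequence[float]) -> List[float]:
    """Apply training-set statistics to one row (a training row or a query vector)."""
    return [(v - m) / s for v, m, s in zip(row, means, sds)]
```

Reusing the training statistics for queries keeps query vectors in the same coordinate system as the stored training vectors, which matters for a distance-based method such as k-NN.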
In some embodiments, an output value may be computed for each row in the training matrix X. For instance, an output value F[i], where i corresponds to a possible cut site, may be the frequency of observed 1 base pair insertions, relative to all observed +1 to -60 indels, at that cut site.
In some embodiments, the insertion base pair model 410 may be constructed to predict frequencies of 1 base pair insertion genotypes (i.e., A, C, G, T). For instance, the insertion base pair model 410 may predict that the probability of a certain insertion genotype given one or more cut site nucleotides is the same as the frequency of that insertion genotype as observed in the subset of training data in which those one or more cut site nucleotides are observed. Thus, given an input DNA sequence seq and a cut site, the insertion base pair model 410 may determine one or more cut site nucleotides (e.g., seq[-1] = "C"). The insertion base pair model 410 may then score the insertion genotype A as follows.
$$s[\text{"A"}] = y \cdot \text{ObsFreq}(\text{"A"} \mid seq[-1] = \text{"C"})$$
Here y is the frequency of 1 base pair insertions as predicted by the insertion rate model 405, and ObsFreq("A" | seq[-1]="C") is the observed frequency of insertion genotype A given that the nucleotide immediately to the left of the cut site is C. The other three insertion genotypes may be scored similarly.
In some embodiments, more than one cut site nucleotide may be considered. For instance, the insertion base pair model 410 may determine that seq[-2] = "A", seq[-1] = "C", and seq[0] = "G". The insertion base pair model 410 may then score the insertion genotype A as follows, and the other three insertion genotypes may be scored similarly.
$$s[\text{"A"}] = y \cdot \text{ObsFreq}(\text{"A"} \mid seq[-2] = \text{"A"} \,\&\, seq[-1] = \text{"C"} \,\&\, seq[0] = \text{"G"})$$
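The insertion base pair scoring described above can be sketched as a lookup table of observed frequencies, keyed by the tuple of cut-site nucleotides. In this hypothetical encoding, training data is a list of observed 1 bp insertion events, each paired with its cut-site context. Real training data might instead carry read-frequency weights. The function names are illustrative only.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

# e.g., ("C",) for seq[-1] alone, or ("A", "C", "G") for seq[-2], seq[-1], seq[0]
Context = Tuple[str, ...]


def fit_obs_freq(events: Iterable[Tuple[Context, str]]) -> Dict[Context, Dict[str, float]]:
    """Estimate ObsFreq(base | context) from observed 1 bp insertion events."""
    counts: Dict[Context, Counter] = defaultdict(Counter)
    for context, inserted_base in events:
        counts[context][inserted_base] += 1
    table = {}
    for context, c in counts.items():
        total = sum(c.values())
        table[context] = {base: c[base] / total for base in "ACGT"}
    return table


def score_insertions(
    y: float, context: Context, obs_freq: Dict[Context, Dict[str, float]]
) -> Dict[str, float]:
    """s[base] = y * ObsFreq(base | context), where y is the predicted 1 bp insertion frequency."""
    return {base: y * obs_freq[context][base] for base in "ACGT"}
```

Because the observed frequencies for one context sum to 1 across A, C, G, and T, the four scores sum to y. A context absent from the training data raises a `KeyError` here. A practical version would need a fallback, for example to a shorter context, which mirrors the text's suggestion to use fewer cut site nucleotides when less training data is available.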
In some embodiments, a frequency of 1 base pair insertion genotype A, out of all +1 to -60 indels (i.e., 1 base pair insertions and 1-60 base pair deletions), may be predicted as follows. Frequencies for the other three insertion genotypes may be predicted similarly.
[Equation image imgf000053_0001 not reproduced.]
In some embodiments, a frequency of 1 base pair insertions, out of all +1 to -60 indels (i.e., 1 base pair insertions and 1-60 base pair deletions), may be predicted as follows.
[Equation image imgf000053_0002 not reproduced.]
In some embodiments, given a deletion length L, a frequency may be predicted as follows for the set of all deletions having the deletion length L, out of all +1 to -60 indels (i.e., 1 base pair insertions and 1-60 base pair deletions), taking into account contributions from MH-dependent and MH-less mechanisms.
[Equation image imgf000053_0003 not reproduced.]
In some embodiments, given a deletion length L, a frequency may be predicted as follows for the set of MH-less deletions having the deletion length L, out of all +1 to -60 indels (i.e., 1 base pair insertions and 1-60 base pair deletions).
[Equation image imgf000054_0001 not reproduced; a surviving fragment refers to $V_{DL+ins}[L]$.]
In some embodiments, given a microhomology n, a frequency may be predicted as follows for the corresponding MH deletion genotype, out of all +1 to -60 indels (i.e., 1 base pair insertions and 1-60 base pair deletions).
[Equation image imgf000054_0002 not reproduced.]
In some embodiments, for a microhomology n that is not full, a frequency may be predicted as follows, out of all +1 to -60 indels (i.e., 1 base pair insertions and 1-60 base pair deletions).
[Equation image imgf000054_0003 not reproduced.]
For a microhomology n that is full, a frequency may be predicted as follows, out of all +1 to -60 indels (i.e., 1 base pair insertions and 1-60 base pair deletions).
[Equation image imgf000054_0004 not reproduced.]
Here DL[n] denotes the deletion length of the microhomology n.
FIG. 5 shows an illustrative process 500 for processing data collected from CRISPR/Cas9 experiments, in accordance with some embodiments. For instance, the process 500 may be performed for each input DNA sequence and CRISPR/Cas9 cut site, and a resulting dataset may be used to train the illustrative computational models described in connection with FIGs. 4A-4D.
At act 505, repair genotypes observed from CRISPR/Cas9 experiments may be aligned with an original DNA sequence. Any suitable technique may be used to observe the repair genotypes, such as Illumina DNA sequencing. Any suitable alignment algorithm may be used, such as a Needleman-Wunsch algorithm with suitable scoring parameters (e.g., +1 for match, -2 for mismatch, -4 for gap open, and -1 for gap extend; or +1 for match, -1 for mismatch, -5 for gap open, and 0 for gap extend). At act 510, one or more filter criteria may be applied to the reads aligned at act 505.
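A score-only sketch of global alignment with affine gap penalties (Needleman-Wunsch extended with Gotoh's three-matrix recurrence), using the first parameter set above as defaults. It assumes that a gap of length n scores gap_open + (n - 1) * gap_extend, but alignment tools differ on this convention. A real pipeline would also need a traceback to recover the indel itself, which is omitted here. The name `global_align_score` is hypothetical.

```python
NEG_INF = float("-inf")


def global_align_score(a: str, b: str, match: int = 1, mismatch: int = -2,
                       gap_open: int = -4, gap_extend: int = -1) -> float:
    """Best global alignment score of a and b with affine gaps (score only, no traceback)."""
    n, m = len(a), len(b)
    # M: a[i-1] aligned to b[j-1]; X: a[i-1] aligned to a gap; Y: b[j-1] aligned to a gap.
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(1, n + 1):
        X[i][0] = gap_open + (i - 1) * gap_extend
    for j in range(1, m + 1):
        Y[0][j] = gap_open + (j - 1) * gap_extend
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            M[i][j] = s + max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            # Opening a gap from a match or the other gap state; extending stays in state.
            X[i][j] = max(M[i - 1][j] + gap_open, X[i - 1][j] + gap_extend, Y[i - 1][j] + gap_open)
            Y[i][j] = max(M[i][j - 1] + gap_open, Y[i][j - 1] + gap_extend, X[i][j - 1] + gap_open)
    return max(M[n][m], X[n][m], Y[n][m])
```

With these penalties, a single 4 bp deletion (score -7) is preferred over two separate 2 bp gaps (score -10), so affine scoring favors reporting one contiguous indel.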
For instance, in some embodiments, only those reads in which a deletion includes at least one base directly 5’ or 3’ of the CRISPR/Cas9 cut site are considered. This may filter out deletions that are unlikely to have resulted from CRISPR/Cas9.
At act 515, frequencies of indels of interest (e.g., from +1 to -60) may be normalized into a probability distribution.
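Acts 510 and 515 can be sketched together: keep only deletions that remove the base directly 5' or 3' of the cut site, then normalize counts of +1 to -60 indels into a probability distribution. The event encoding `(kind, start, length)`, the half-open deletion span `[start, start + length)`, and the `cut` convention (0-based index of the first base to the right of the cut) are all hypothetical choices for this sketch.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Event = Tuple[str, int, int]  # (kind, start, length); kind is "del" or "ins"


def touches_cut(start: int, length: int, cut: int) -> bool:
    """True if the deletion [start, start + length) removes base cut - 1 or base cut."""
    end = start + length
    return length > 0 and start <= cut <= end


def indel_distribution(events: Iterable[Event], cut: int,
                       min_size: int = -60, max_size: int = 1) -> Dict[int, float]:
    """Filter deletions (act 510), then normalize indel counts (act 515).

    Indels are keyed by signed size: +n for an n bp insertion, -n for an n bp deletion.
    """
    counts: Counter = Counter()
    for kind, start, length in events:
        if kind == "del" and not touches_cut(start, length, cut):
            continue  # unlikely to have resulted from CRISPR/Cas9
        size = length if kind == "ins" else -length
        if size != 0 and min_size <= size <= max_size:
            counts[size] += 1
    total = sum(counts.values())
    return {size: n / total for size, n in counts.items()} if total else {}
```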
FIG. 6 shows an illustrative process 600 for using a machine learning model to predict frequencies of indel genotypes and/or indel lengths, in accordance with some embodiments.
Acts 605 and 610 may be similar to, respectively, acts 355 and 360 of the illustrative process 350 of FIG. 3B, except that acts 605 and 610 may be performed for an input DNA sequence seq and a cut site location for which repair genotype data from a CRISPR/Cas9 experiment may not be available. At act 615, one or more machine learning models, such as the machine learning models trained at act 365 of the illustrative process 350 of FIG. 3B, may be applied to an output of act 610 to compute a frequency distribution over deletion lengths of interest.
The inventors have recognized and appreciated that, while Cas9 is typically understood to induce a blunt-end double-strand break, some evidence suggests that Cas9 may generate a 1 base pair staggered end cut instead. FIG. 7 shows illustrative examples of a blunt-end cut and a staggered cut, in accordance with some embodiments.
FIG. 8A shows an illustrative plot 800A of predicted repair genotypes, in accordance with some embodiments. For instance, the plot 800A may be generated by applying one or more of the illustrative techniques described in connection with FIGs. 2A-2D, 3A-3B, 4A-4D, 5-6 to the example shown in FIG. 1. Each vertical bar may correspond to a deletion length, and a height of the bar may correspond to a predicted frequency of that deletion length. The lighter color may indicate repair genotypes that successfully eliminate the donor splice site motif, whereas the darker color may indicate failure. In this example, about 90% of repair products in the 3-26 base pair deletion class are predicted to be successful for the illustrative local sequence context and cut site shown in FIG. 1.
The inventors have recognized and appreciated that the 3-26 base pair deletion class may occur as frequently as 50%, for example, when assaying selected sequences (e.g., patient genotypes underlying certain diseases) integrated into the genome of mouse embryonic stem cells with a 14-day exposure to CRISPR/Cas9. Thus, given the 90% success rate predicted above within the 3-26 base pair deletion class, a genetic editing approach using CRISPR/Cas9 may be provided that achieves the desired result at a rate of about 45% (50% × 90%). In contrast, genetic editing using HDR may achieve a success rate of 10% or lower and may require a more complex experimental protocol.
FIG. 8B shows another illustrative plot 800B of predicted repair genotypes, in accordance with some embodiments. For instance, the plot 800B may be generated by applying one or more of the illustrative techniques described in connection with FIGs. 2A-2D, 3A-3B, 4A-4D, 5-6 to an illustrative DNA sequence 805B, which may be associated with spinal muscular atrophy (SMA).
In some patients, a specific single nucleotide polymorphism (SNP) in exon 7 of the SMN2 gene may induce skipping of exon 7, erroneously including exon 8 instead. Exon 8 encodes a protein degradation signal (namely, EMLA-STOP, as shown in FIG. 8B), which causes degradation of the SMN2 gene product, thereby inducing spinal muscular atrophy. In this region, a disease genotype must have precisely EMLA-STOP; nearly any other genotype is considered healthy.
In the example of FIG. 8B, each vertical bar corresponds to a deletion length, and a height of the bar corresponds to a predicted frequency of that deletion length. The lighter color may indicate repair genotypes that successfully disrupt the EMLA-STOP signal, whereas the darker color may indicate failure. In this example, over 90% of repair products in the 3-26 base pair deletion class are predicted to be healthy.
FIG. 8C shows another illustrative plot 800C of predicted repair genotypes, in accordance with some embodiments. For instance, the plot 800C may be generated by applying one or more of the illustrative techniques described in connection with FIGs. 2A-2D, 3A-3B, 4A-4D, 5-6 to an illustrative DNA sequence associated with breast-ovarian cancer.
In the example of FIG. 8C, a clinically observed patient genotype includes an abnormal duplication of 14 base pairs that the wild type sequence from a normal/healthy individual lacks.
The patient genotype is incorporated into the genome of mouse embryonic stem cells, and CRISPR/Cas9 is then applied. The 3-26 base pair deletion class is observed to account for 65% of all repair outcomes at this local sequence context. Moreover, as shown in FIG. 8C, repair to wild type is observed at an 89% rate among all 3-26 base pair deletions. Thus, the overall wild type repair rate is about 57%. FIG. 8D shows a microhomology identified in the example of FIG. 8C.
As discussed above, the inventors have recognized and appreciated at least two tasks of interest: predicting frequencies of deletion lengths, as well as predicting frequencies of repair genotypes. In some embodiments, a single machine learning model may be provided that performs both tasks.
In some embodiments, repair genotypes corresponding to a deletion of length L may be labeled as follows: for every integer K ranging from 0 to L, a K-genotype associated with deletion length L may be obtained by concatenating left[L][-inf:K] with right[L][K:+inf].
A vector COLLECTION of length Q, where each element is a tuple (K, L), may be constructed by enumerating each K-genotype for each deletion length L of interest and removing tuples that yield the same repair genotype, e.g., (k', L) and (k, L) such that left[L][-inf:k'] concatenated with right[L][k':+inf] is equivalent to left[L][-inf:k] concatenated with right[L][k:+inf], for example, by retaining only the tuple with the larger K. A training data set may be constructed from observational data by constructing a vector X of length Q, where X sums to 1 and X[q] represents the observed frequency of the repair genotype generated by COLLECTION[q].
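The enumeration of K-genotypes and the removal of duplicates can be sketched as follows. The sketch assumes a coordinate convention that the text does not state. With `cut` the 0-based index of the first nucleotide to the right of the cut, it takes left[L][i] = seq[cut - L + i] and right[L][i] = seq[cut + i], so that left[L][-inf:K] concatenated with right[L][K:+inf] equals seq[:cut - L + K] + seq[cut + K:]. Under this convention, the 0-genotype deletes the L bases immediately left of the cut and the L-genotype deletes the L bases immediately right of it. The function names are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def k_genotype(seq: str, cut: int, k: int, l: int) -> str:
    """Repair product of the K-genotype (k) for deletion length l (assumed coordinates)."""
    return seq[: cut - l + k] + seq[cut + k :]


def build_collection(seq: str, cut: int, lengths: Iterable[int]) -> List[Tuple[int, int]]:
    """Enumerate (k, l) tuples, keeping one tuple (the largest k) per distinct product."""
    collection: List[Tuple[int, int]] = []
    for l in lengths:
        if cut - l < 0 or cut + l > len(seq):
            raise ValueError(f"deletion length {l} extends past the sequence")
        best_k: Dict[str, int] = {}
        for k in range(l + 1):
            # Iterating k upward means a later (larger) k overwrites an equivalent product.
            best_k[k_genotype(seq, cut, k, l)] = k
        collection.extend(sorted((k, l) for k in best_k.values()))
    return collection
```

In `"ACGTACGT"` cut after position 4, every K-genotype of the 4 bp deletion yields `"ACGT"` (a full 4 bp microhomology), so only `(4, 4)` survives.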
In some embodiments, the vector COLLECTION may be featurized. This may be performed for a given tuple (k, l) by determining whether there is an index i such that match[l][i:k] is a microhomology. If no such i exists, then the tuple (k, l) may be considered not to partake in microhomology.
The inventors have recognized and appreciated that frequencies of repair products may be influenced by certain features of microhomologies, such as microhomology length, fraction of G-C pairings, and/or deletion length. The inventors have also recognized and appreciated that default values may be useful for repair genotypes that are considered not to partake in microhomology.
For example, the inventors have recognized and appreciated that the energetic stability of a microhomology may increase in proportion to its length. Accordingly, in some embodiments, the microhomology length k - i may be used as a feature for a tuple (k, l), and a default value of 0 may be used if (k, l) does not partake in microhomology.
As another example, the inventors have recognized and appreciated that the thermodynamic stability of a microhomology may depend on specific base pairings, and that G-C pairings have three hydrogen bonds and therefore have higher thermodynamic stability than A-T pairings, which have two hydrogen bonds. Accordingly, in some embodiments, a GC fraction, as shown below, may be used as a feature for (k, l), where indicator(boolean) equals 1 if boolean is true, and 0 otherwise. A default value of -1 may be used if (k, l) does not partake in microhomology.
[Equation image imgf000058_0001 not reproduced.]
In some embodiments, a feature for deletion length may be considered, represented as l for the tuple (k, l).
The inventors have also recognized and appreciated (e.g., from experimental data) that 0-genotype and l-genotype repair products may occur despite a lack of microhomology, through microhomology-free end-joining repair pathways. Accordingly, (k, l) may be featurized with a Boolean feature for 0-genotypes that is equal to 1 if k = 0 and (k, l) does not partake in microhomology, and 0 otherwise. A Boolean feature for l-genotypes may also be used that is equal to 1 if k = l and (k, l) does not partake in microhomology, and 0 otherwise.
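The featurization of a tuple (k, l) described above can be sketched as follows. The feature order follows the order of the input nodes of neural network 900: microhomology length, GC fraction, 0-genotype flag, l-genotype flag, deletion length. Two things are assumptions. The first is the coordinate convention, under which match[l][j] is true when seq[cut - l + j] equals seq[cut + j] and `cut` is the 0-based index of the first nucleotide right of the cut. The second is how i is chosen when several values qualify: this sketch takes the longest run of matches ending at k. The function name is hypothetical.

```python
from typing import List


def featurize(seq: str, cut: int, k: int, l: int) -> List[float]:
    """Features for (k, l): [MH length, GC fraction, 0-genotype flag, l-genotype flag, deletion length]."""
    match = [seq[cut - l + j] == seq[cut + j] for j in range(l)]
    # Extend leftward from k to find the smallest i with match[l][i:k] all True.
    i = k
    while i > 0 and match[i - 1]:
        i -= 1
    if i < k:  # (k, l) partakes in a microhomology of length k - i
        mh = seq[cut - l + i : cut - l + k]
        gc = sum(base in "GC" for base in mh) / len(mh)
        return [float(k - i), gc, 0.0, 0.0, float(l)]
    # No microhomology: default length 0 and GC fraction -1, plus the Boolean flags.
    return [0.0, -1.0, float(k == 0), float(k == l), float(l)]
```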
FIG. 9 shows another illustrative neural network 900 for computing a frequency distribution over deletion lengths, in accordance with some embodiments.
In some embodiments, the neural network 900 may be parameterized by w[h] and b[h] for each hidden layer h. In some embodiments, these parameters may be initialized randomly, for example, from a spherical Gaussian distribution with some suitable center (e.g., 0) and some suitable variance (e.g., 0.1). These parameters may then be trained using repair genotype data collected from CRISPR/Cas9 experiments.
In some embodiments, the neural network 900 may operate independently for each microhomology, taking as input the length of that microhomology (from the first input node), the GC fraction of that microhomology (from the second input node), Boolean features for 0-genotypes and l-genotypes (from the third and fourth input nodes, where N-flag corresponds to l-genotypes), and the length of the deletion (from the fifth input node); transforming those five values into 16 values (at the first hidden layer); then transforming those 16 values into 16 other values (at the second hidden layer); and finally outputting a single value (at the output node). In such an embodiment, parameters for the first hidden layer, w[1][i] and b[1][i], are vectors of length 5 for each node i from 1 to 16, whereas parameters for the second hidden layer, w[2][i] and b[2][i], are vectors of length 16 for each node i from 1 to 16, and parameters for the output layer, w[3][1] and b[3][1], are also vectors of length 16.
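A forward pass through a 5-16-16-1 network like the one described above can be sketched in plain Python. The text does not specify activation functions, so sigmoid hidden layers and a linear output node are assumptions. Each parameter is drawn independently from a Gaussian with mean 0 and variance 0.1, which is a spherical Gaussian over all parameters. This sketch uses one scalar bias per node, a common convention. The function names are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Layer = Tuple[List[List[float]], List[float]]  # (weights[node][input], bias[node])


def init_network(rng: random.Random, sizes: Sequence[int] = (5, 16, 16, 1),
                 variance: float = 0.1) -> List[Layer]:
    """Draw every weight and bias from a Gaussian with mean 0 and the given variance."""
    sd = math.sqrt(variance)
    layers: List[Layer] = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        weights = [[rng.gauss(0.0, sd) for _ in range(n_in)] for _ in range(n_out)]
        biases = [rng.gauss(0.0, sd) for _ in range(n_out)]
        layers.append((weights, biases))
    return layers


def sigmoid(v: float) -> float:
    # Numerically stable logistic function.
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def mh_score(features: Sequence[float], layers: List[Layer]) -> float:
    """Forward pass: sigmoid hidden layers, linear output node (activations are assumptions)."""
    h = list(features)
    for depth, (weights, biases) in enumerate(layers):
        z = [sum(w * x for w, x in zip(row, h)) + b for row, b in zip(weights, biases)]
        h = z if depth == len(layers) - 1 else [sigmoid(v) for v in z]
    return h[0]
```

Applying `mh_score` to each featurized tuple in COLLECTION gives one score per repair genotype. Gradients for training would in practice come from an automatic differentiation library rather than being written by hand.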
In some embodiments, the neural network 900 may be applied independently (e.g., as discussed above) to each featurized (k, l) in COLLECTION to produce a vector of Q microhomology scores called Z.
In some embodiments, Z may be normalized into a probability distribution over all unique repair genotypes of interest within all deletion lengths of interest (e.g., deletion lengths between 3 and 26). The inventors have recognized and appreciated (e.g., from experimental data) that frequency may decrease exponentially with deletion length. Accordingly, in some embodiments, an exponential linear model may be used to normalize the vector of repair genotype scores. For example, the following formula may be used:
[Equation image imgf000059_0001 not reproduced.]
where DL[q] = l for each q such that COLLECTION[q] = (k, l), and beta is a parameter.
In some embodiments, a probability distribution Y over all unique repair genotypes of interest within all deletion lengths of interest may be converted to a probability distribution Y' over all deletion lengths. The following formula may be used for this:
$$Y'[L] = \sum_{q=1}^{Q} Y[q] \cdot \text{indicator}(DL[q] = L)$$
In some embodiments, the parameter beta may be initialized to -1. This parameter may then be trained using repair genotype data collected from CRISPR/Cas9 experiments.
In some embodiments, the parameters w[h] and b[h] for each hidden layer h and the parameter beta may be trained using a gradient descent method with an L2 loss on Y':
$$\text{loss} = \sum_{L} \left(\text{predY}[L] - \text{obsY}[L]\right)^2$$
where predY is a predicted probability distribution on deletion lengths (e.g., as computed by the neural network 900 using current parameter values), and obsY is an observed probability distribution on deletion lengths (e.g., based on repair genotype data collected from CRISPR/Cas9 experiments).
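The normalization of the score vector Z, the conversion to a deletion-length distribution, and the loss can be sketched as follows. The text's normalization formula is an image that is not reproduced here. The exponential-linear form below, a softmax over Z[q] + beta * DL[q], is an assumption. It was chosen so that beta = -1 makes predicted frequency decay exponentially with deletion length, as the text describes. The squared-error form of the L2 loss is also an assumption, and the function names are hypothetical. Gradient-based training is omitted.

```python
import math
from typing import Dict, List, Sequence


def genotype_distribution(Z: Sequence[float], DL: Sequence[int], beta: float = -1.0) -> List[float]:
    """Assumed form: Y[q] proportional to exp(Z[q] + beta * DL[q])."""
    logits = [z + beta * d for z, d in zip(Z, DL)]
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def length_distribution(Y: Sequence[float], DL: Sequence[int], lengths: Sequence[int]) -> Dict[int, float]:
    """Y'[L]: total probability of all repair genotypes whose deletion length is L."""
    return {L: sum(y for y, d in zip(Y, DL) if d == L) for L in lengths}


def l2_loss(pred: Dict[int, float], obs: Dict[int, float]) -> float:
    """Sum of squared differences between predicted and observed length distributions."""
    return sum((pred[L] - obs.get(L, 0.0)) ** 2 for L in pred)
```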
The inventors have recognized and appreciated that one or more of the techniques described herein may be used to identify therapeutic guide RNAs that are expected to produce a therapeutic outcome when used in combination with a genomic editing system without an HDR template. For instance, one or more of the techniques described herein may be used to identify a therapeutic guide RNA that is expected to result in a substantial fraction of genotypic consequences that cause a gain-of-function mutation in DNA in the absence of an HDR template. A therapeutic guide RNA may be used singly, or in combination with other therapeutic guide RNAs. An action of the therapeutic guide RNA may be independent of, or dependent on, one or more genomic consequences of the other therapeutic guide RNAs.
FIG. 10 shows, schematically, an illustrative computer 1000 on which any aspect of the present disclosure may be implemented. In the embodiment shown in FIG. 10, the computer 1000 includes a processing unit 1001 having one or more processors and a non-transitory computer-readable storage medium 1002 that may include, for example, volatile and/or non-volatile memory. The memory 1002 may store one or more instructions to program the processing unit 1001 to perform any of the functions described herein. The computer 1000 may also include other types of non-transitory computer-readable media, such as storage 1005 (e.g., one or more disk drives), in addition to the system memory 1002. The storage 1005 may also store one or more application programs and/or external components used by application programs (e.g., software libraries), which may be loaded into the memory 1002.
The computer 1000 may have one or more input devices and/or output devices, such as devices 1006 and 1007 illustrated in FIG. 10. These devices can be used, among other things, to present a user interface. Examples of output devices that can be used to provide a user interface include printers or display screens for visual presentation of output and speakers or other sound generating devices for audible presentation of output. Examples of input devices that can be used for a user interface include keyboards and pointing devices, such as mice, touch pads, and digitizing tablets. As another example, the input devices 1007 may include a microphone for capturing audio signals, and the output devices 1006 may include a display screen for visually rendering, and/or a speaker for audibly rendering, recognized text.
As shown in FIG. 10, the computer 1000 may also comprise one or more network interfaces (e.g., the network interface 1010) to enable communication via various networks (e.g., the network 1020). Examples of networks include a local area network or a wide area network, such as an enterprise network or the Internet. Such networks may be based on any suitable technology and may operate according to any suitable protocol and may include wireless networks, wired networks or fiber optic networks.
Having thus described several aspects of at least one embodiment, it is to be appreciated that various alterations, modifications, and improvements will readily occur to those skilled in the art. Such alterations, modifications, and improvements are intended to be within the spirit and scope of the present disclosure. Accordingly, the foregoing description and drawings are by way of example only.
The above-described embodiments of the present disclosure can be implemented in any of numerous ways. For example, the embodiments may be implemented using hardware, software or a combination thereof. When implemented in software, the software code can be executed on any suitable processor or collection of processors, whether provided in a single computer or distributed among multiple computers.
Also, the various methods or processes outlined herein may be coded as software that is executable on one or more processors that employ any one of a variety of operating systems or platforms. Additionally, such software may be written using any of a number of suitable programming languages and/or programming or scripting tools, and also may be compiled as executable machine language code or intermediate code that is executed on a framework or virtual machine.
In this respect, the concepts disclosed herein may be embodied as a non-transitory computer-readable medium (or multiple computer-readable media) (e.g., a computer memory, one or more floppy discs, compact discs, optical discs, magnetic tapes, flash memories, circuit configurations in Field Programmable Gate Arrays or other semiconductor devices, or other non-transitory, tangible computer storage medium) encoded with one or more programs that, when executed on one or more computers or other processors, perform methods that implement the various embodiments of the present disclosure discussed above. The computer-readable medium or media can be transportable, such that the program or programs stored thereon can be loaded onto one or more different computers or other processors to implement various aspects of the present disclosure as discussed above.
The terms “program” or “software” are used herein to refer to any type of computer code or set of computer-executable instructions that can be employed to program a computer or other processor to implement various aspects of the present disclosure as discussed above. Additionally, it should be appreciated that according to one aspect of this embodiment, one or more computer programs that, when executed, perform methods of the present disclosure need not reside on a single computer or processor, but may be distributed in a modular fashion amongst a number of different computers or processors to implement various aspects of the present disclosure.
Computer-executable instructions may be in many forms, such as program modules, executed by one or more computers or other devices. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. Typically the functionality of the program modules may be combined or distributed as desired in various embodiments.
Also, data structures may be stored in computer-readable media in any suitable form. For simplicity of illustration, data structures may be shown to have fields that are related through location in the data structure. Such relationships may likewise be achieved by assigning storage for the fields with locations in a computer-readable medium that conveys relationship between the fields. However, any suitable mechanism may be used to establish a relationship between information in fields of a data structure, including through the use of pointers, tags or other mechanisms that establish relationship between data elements.
Various features and aspects of the present disclosure may be used alone, in any combination of two or more, or in a variety of arrangements not specifically discussed in the embodiments described in the foregoing and is therefore not limited in its application to the details and arrangement of components set forth in the foregoing description or illustrated in the drawings. For example, aspects described in one embodiment may be combined in any manner with aspects described in other embodiments.
In an exemplary embodiment, a computational model described herein is trained with experimental data as outlined in Example 1. The method outlined in Example 1 for training a computational model with experimental data is meant to be non-limiting.
Accordingly, the specification discloses a method for training a computational model described herein, comprising: (i) preparing a library comprising a plurality of nucleic acid molecules each encoding a nucleotide target sequence and a cognate guide RNA, wherein each nucleotide target sequence comprises a cut site; (ii) introducing the library into a plurality of host cells; (iii) contacting the library in the host cells with a Cas-based genome editing system to produce a plurality of genomic repair products; (iv) determining the sequences of the genomic repair products; and (v) training the computational model with input data that comprises at least the sequences of the nucleotide target sequence and/or the genomic repair products and the cut sites.
In another aspect, the specification discloses a method for training a computational model, comprising: (i) preparing a library comprising a plurality of nucleic acid molecules each encoding a nucleotide target sequence and a cut site; (ii) introducing the library into a plurality of host cells; (iii) contacting the library in the host cells with a DSB-based genome editing system to produce a plurality of genomic repair products; (iv) determining the sequences of the genomic repair products; and (v) training the computational model with input data that comprises at least the sequences of the nucleotide target sequence and/or the genomic repair products and the cut sites.
Methods for preparing nucleic acid libraries, vectors, host cells, and sequencing methods are well known in the art. The instant description is not meant to be limiting in any way as to the construction and configuration of the libraries described herein for training the computational model.
Accordingly, the specification provides in one aspect a method of introducing a desired genetic change in a nucleotide sequence using a double-strand break (DSB)-inducing genome editing system, the method comprising: identifying one or more available cut sites in a nucleotide sequence; analyzing the nucleotide sequence and available cut sites with a computational model to identify the optimal cut site for introducing the desired genetic change into the nucleotide sequence; and contacting the nucleotide sequence with a DSB-inducing genome editing system, thereby introducing the desired genetic change in the nucleotide sequence at the cut site.
A cut site can be at any position in a nucleotide sequence and its position is not particularly limiting.
The nucleotide sequence into which a genetic change is desired is not intended to have any limitations as to sequence, source, or length. The nucleotide sequence may comprise one or more mutations, which can include one or more disease-causing mutations.
In another aspect, the specification provides a method of treating a genetic disease by correcting a disease-causing mutation using a double-strand break (DSB)-inducing genome editing system, the method comprising: identifying one or more available cut sites in a nucleotide sequence comprising a disease-causing mutation; analyzing the nucleotide sequence and available cut sites with a computational model to identify the optimal cut site for correcting the disease-causing mutation in the nucleotide sequence; and contacting the nucleotide sequence with a DSB-inducing genome editing system, thereby correcting the disease-causing mutation and treating the disease.
In yet another aspect, the specification provides a method of altering a genetic trait by introducing a genetic change in a nucleotide sequence using a double-strand break (DSB)-inducing genome editing system, the method comprising: identifying one or more available cut sites in a nucleotide sequence; analyzing the nucleotide sequence and available cut sites with a computational model to identify the optimal cut site for introducing the genetic change into the nucleotide sequence; and contacting the nucleotide sequence with a DSB-inducing genome editing system, thereby introducing the desired genetic change in the nucleotide sequence at the cut site and consequently altering the associated genetic trait.
In another aspect, the specification provides a method of selecting a guide RNA for use in a Cas-genome editing system capable of introducing a genetic change into a nucleotide sequence of a target genomic location, the method comprising: identifying in a nucleotide sequence of a target genomic location one or more available cut sites for a Cas-based genome editing system; and analyzing the nucleotide sequence and cut site with a computational model to identify a guide RNA capable of introducing the genetic change into the nucleotide sequence of the target genomic location.
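As an illustration of identifying available cut sites and selecting a guide RNA, the following sketch enumerates forward-strand SpCas9 sites and ranks them with a scoring function supplied by the caller, which stands in for the computational model. Each site is a 20-nt protospacer followed by an NGG PAM, with the blunt cut 3 bp upstream of the PAM. The scoring callback, the function names, and the forward-strand-only search are assumptions made for brevity. A fuller search would also scan the reverse complement.

```python
import re
from typing import Callable, List, Tuple


def spcas9_sites(seq: str) -> List[Tuple[str, int]]:
    """Return (20-nt guide, cut index) pairs for forward-strand NGG PAMs.

    The cut index is the 0-based position of the first nucleotide right of the cut.
    """
    seq = seq.upper()
    sites = []
    # A lookahead finds overlapping PAMs (e.g., both PAMs in "AGGG").
    for m in re.finditer(r"(?=[ACGT]GG)", seq):
        pam_start = m.start()
        if pam_start >= 20:
            sites.append((seq[pam_start - 20 : pam_start], pam_start - 3))
    return sites


def rank_guides(seq: str, predict: Callable[[str, int], float]) -> List[Tuple[float, str, int]]:
    """Score each candidate cut site with predict(seq, cut) and sort best first."""
    scored = [(predict(seq, cut), guide, cut) for guide, cut in spcas9_sites(seq)]
    return sorted(scored, key=lambda item: item[0], reverse=True)
```

In use, `predict` would return, for example, the predicted fraction of repair outcomes that produce the desired genotype at that cut site.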
In still another aspect, the specification provides a method of introducing a genetic change in the genome of a cell with a Cas-based genome editing system comprising: selecting a guide RNA for use in the Cas-based genome editing system in accordance with the method of the above aspect; and contacting the genome of the cell with the guide RNA and the Cas-based genome editing system, thereby introducing the genetic change.
In various embodiments, the cut sites available in the nucleotide sequence are a function of the particular DSB-inducing genome editing system in use, e.g., a Cas-based genome editing system.
In certain embodiments, the nucleotide sequence is a genome of a cell. In certain other embodiments, the method for introducing the desired genetic change is done in vivo within a cell or an organism (e.g., a mammal), or ex vivo within a cell isolated or separated from an organism (e.g., an isolated mammalian cancer cell), or in vitro on an isolated nucleotide sequence outside the context of a cell.
In various embodiments, the DSB-inducing genome editing system can be a Cas-based genome editing system, e.g., a type II Cas-based genome editing system. In other embodiments, the DSB-inducing genome editing system can be a TALEN-based editing system or a zinc-finger-based genome editing system. In still other embodiments, the DSB-inducing genome editing system can be any endonuclease-based system that catalyzes the formation of a double-strand break at one or more specific cut sites.
In embodiments involving a Cas-based genome editing system, the method can further comprise selecting a cognate guide RNA capable of directing a double-strand break at the optimal cut site by the Cas-based genome editing system.
In certain embodiments, the guide RNA is selected from the group consisting of the guide RNA sequences listed in any of Tables 1-6. In various embodiments, the guide RNA can be known or can be newly designed.
In various embodiments, the double-strand break (DSB)-inducing genome editing system is capable of editing the genome without homology-directed repair.
In other embodiments, the double-strand break (DSB)-inducing genome editing system comprises a type I Cas RNA-guided endonuclease, or a variant or orthologue thereof.
In still other embodiments, the double-strand break (DSB)-inducing genome editing system comprises a type II Cas RNA-guided endonuclease, or a functional variant or orthologue thereof.
The double-strand break (DSB)-inducing genome editing system may comprise a Cas9 RNA-guided endonuclease, or a variant or orthologue thereof in certain embodiments.
In still other embodiments, the double-strand break (DSB)-inducing genome editing system can comprise a Cpf1 RNA-guided endonuclease, or a variant or orthologue thereof.
In yet further embodiments, the double-strand break (DSB)-inducing genome editing system can comprise a Streptococcus pyogenes Cas9 (SpCas9), Staphylococcus pyogenes Cas9 (SpCas9), Staphylococcus aureus Cas9 (SaCas9), Francisella novicida Cas9 (FnCas9), or a functional variant or orthologue thereof. In various embodiments, the desired genetic change to be introduced into the nucleotide sequence, e.g., a genome, is a correction to a genetic mutation. In embodiments, the genetic mutation is a single-nucleotide polymorphism, a deletion mutation, an insertion mutation, or a microduplication error.
In still other embodiments, the genetic change can comprise a 2-60-bp deletion or a 1-bp insertion.
The genetic change in other embodiments can comprise a deletion of between 2-20, or 4-40, or 8-80, or 16-160, or 32-320, or 64-640, or up to 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, or 10,000 or more nucleotides. Preferably, the deletion can restore the function of a defective gene, e.g., a gain-of-function frameshift genetic change.
In other embodiments, the desired genetic change is a desired modification to a wildtype gene that confers and/or alters one or more traits, e.g., conferring increased resistance to a pathogen or altering a monogenic trait (e.g., eye color) or polygenic trait (e.g., height or weight).
In embodiments involving correcting a disease-causing mutation, the disease can be a monogenic disease. Such monogenic diseases can include, for example, sickle cell disease, cystic fibrosis, polycystic kidney disease, Tay-Sachs disease, achondroplasia, beta-thalassemia, Hurler syndrome, severe combined immunodeficiency, hemophilia, glycogen storage disease Ia, and Duchenne muscular dystrophy.
In any of the above aspects and embodiments, the step of identifying the available cut sites can involve identifying one or more PAM sequences in the case of a Cas-based genome editing system.
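As a minimal sketch of the PAM-identification step for SpCas9, assuming an NGG PAM, a full 20-nt protospacer, and a blunt cut 3 bp upstream of the PAM (between positions -4 and -3 in the numbering used in the Examples), the hypothetical helper `find_spcas9_cut_sites` below scans both strands of a forward-strand sequence and reports each cut position on forward-strand coordinates. It illustrates the idea under those assumptions and is not the implementation of the specification.

```python
from typing import List, Tuple


def find_spcas9_cut_sites(seq: str) -> List[Tuple[int, str]]:
    """Return (cut_index, strand) for every NGG PAM with a full 20-nt protospacer.

    cut_index is a 0-based boundary on the forward strand: the assumed blunt
    break falls between seq[cut_index - 1] and seq[cut_index].
    """
    seq = seq.upper()
    n = len(seq)
    sites = []
    # Forward strand: PAM "NGG" occupies seq[i:i+3] and the protospacer seq[i-20:i].
    # The assumed cut lies 3 bp upstream of the PAM (between positions -4 and -3).
    for i in range(20, n - 2):
        if seq[i + 1:i + 3] == "GG":
            sites.append((i - 3, "+"))
    # Reverse strand: a forward "CCN" at seq[j:j+3] is an NGG PAM on the other
    # strand; its protospacer is seq[j+3:j+23], so the cut falls at index j + 6.
    for j in range(0, n - 22):
        if seq[j:j + 2] == "CC":
            sites.append((j + 6, "-"))
    return sorted(sites)
```

For example, `find_spcas9_cut_sites("A" * 20 + "TGG" + "AAAAA")` returns `[(17, "+")]`. Other DSB-inducing systems would need their own PAM or recognition rules and cut offsets.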
In various embodiments of the above methods, the computational model used to analyze the nucleotide sequence is a deep learning computational model, or a neural network model having one or more hidden layers. In various embodiments, the computational model is trained with experimental data to predict the probability distribution of indel lengths for any given nucleotide sequence and cut site. In still other embodiments, the computational model is trained with experimental data to predict the probability distribution of genotype frequencies for any given nucleotide sequence and cut site.
In various embodiments, the computational model comprises one or more training modules for evaluating experimental data. In various embodiments, the computational model can comprise: a first training module for computing a microhomology score matrix; a second training module for computing a microhomology-independent score matrix; and a third training module for computing a probability distribution over 1-bp insertions, wherein, once trained with experimental data, the computational model computes a probability distribution over indel genotypes and a probability distribution over indel lengths for any given input nucleotide sequence and cut site.
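The following hypothetical sketch shows one way the outputs of three such modules could be merged into both a genotype-level and an indel-length-level distribution. It assumes each deletion module has already produced nonnegative weights (MH deletions at the genotype level, MH-less deletions at the length level only) and that the insertion module supplies a 1-bp insertion fraction plus a distribution over inserted bases; the function name, argument layout, and proportional normalization are illustrative assumptions, not the trained model.

```python
from typing import Dict, Tuple


def combine_module_outputs(
    mh_weights: Dict[str, Tuple[int, float]],
    mhless_weights: Dict[int, float],
    ins_fraction: float,
    ins_base_probs: Dict[str, float],
):
    """Merge hypothetical module outputs into genotype and indel-length distributions.

    mh_weights: MH deletion genotype id -> (deletion length, nonnegative weight)
    mhless_weights: MH-less deletion length -> nonnegative weight (length level only)
    ins_fraction: predicted fraction of all products that are 1-bp insertions
    ins_base_probs: inserted base -> probability (assumed to sum to 1)
    Indel lengths are signed: deletions negative, insertions positive.
    """
    total_del = sum(w for _, w in mh_weights.values()) + sum(mhless_weights.values())
    scale = (1.0 - ins_fraction) / total_del  # deletions share what insertions leave
    genotypes, lengths = {}, {}
    for gid, (del_len, w) in mh_weights.items():
        genotypes[gid] = w * scale
        lengths[-del_len] = lengths.get(-del_len, 0.0) + w * scale
    for del_len, w in mhless_weights.items():
        # MH-less deletions are only resolved to a length class, not a genotype.
        genotypes[("mhless", del_len)] = w * scale
        lengths[-del_len] = lengths.get(-del_len, 0.0) + w * scale
    for base, p in ins_base_probs.items():
        genotypes[("ins", base)] = ins_fraction * p
        lengths[1] = lengths.get(1, 0.0) + ins_fraction * p
    return genotypes, lengths
```

Because the length distribution is accumulated from the same entries as the genotype distribution, the two stay mutually consistent and each sums to one.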
In other embodiments, the computational model predicts genomic repair outcomes for any given input nucleotide sequence and cut site.
In various embodiments, the genomic repair outcomes can comprise microhomology deletions, microhomology-less deletions, and/or 1-bp insertions.
In still other embodiments, the computational model can comprise one or more modules, each comprising one or more input features selected from the group consisting of: a target site nucleotide sequence; a cut site; a PAM sequence; microhomology lengths relative to a cut site; %GC content at a cut site; microhomology deletion lengths at a cut site; and type of DSB-inducing genome editing system.
In various embodiments, the nucleotide sequence analyzed by the computational model is between about 25-100, 50-200, 100-400, 200-800, 400-1600, 800-3200, or 1600-6400 nucleotides, or even up to 7K, 8K, 9K, 10K, 11K, 12K, 13K, 14K, 15K, 16K, 17K, 18K, 19K, or 20K nucleotides, or more in length.
In another aspect, the specification relates to guide RNAs which are identified by various methods described herein. In certain embodiments, the guide RNAs can be any of those presented in Tables 1-6, the contents of which form part of this specification.
According to various embodiments, the guide RNAs can be purely ribonucleic acid molecules. However, in other embodiments, the guide RNAs can comprise one or more naturally occurring or non-naturally occurring modifications. In various embodiments, the modifications can include, but are not limited to, nucleoside analogs, chemically modified bases, intercalated bases, modified sugars, and modified phosphate group linkers. In certain embodiments, the guide RNAs can comprise one or more phosphorothioate and/or 5’-N-phosphoramidite linkages.
In still other aspects, the specification discloses vectors comprising one or more nucleotide sequences disclosed herein, e.g., vectors encoding one or more guide RNAs, one or more target nucleotide sequences which are being edited, or a combination thereof. The vectors may comprise naturally occurring sequences, or non-naturally occurring sequences, or a combination thereof.
In still other aspects, the specification discloses host cells comprising the herein disclosed vectors encoding one or more nucleotide sequences embodied herein, e.g., one or more guide RNAs, one or more target nucleotide sequences which are being edited, or a combination thereof.
In other aspects, the specification discloses a Cas-based genome editing system comprising a Cas protein (or homolog, variant, or orthologue thereof) complexed with at least one guide RNA. In certain embodiments, the guide RNA can be any of those disclosed in Tables 1-6, or a functional variant thereof.
In still other aspects, the specification provides a Cas-based genome editing system comprising an expression vector having at least one expressible nucleotide sequence encoding a Cas protein (or homolog, variant, or orthologue thereof) and at least one other expressible nucleotide sequence encoding a guide RNA, wherein the guide RNA can be identified by the methods disclosed herein for selecting a guide RNA.
In still a further aspect, the specification provides a library for training a computational model for selecting a guide RNA sequence for use with a Cas-based genome editing system capable of introducing a genetic change into a genome without homology-directed repair, wherein the library comprises a plurality of vectors each comprising a first nucleotide sequence of a target genomic location having a cut site and a second nucleotide sequence encoding a cognate guide RNA capable of directing a Cas-based genome editing system to carry out a double-strand break at the cut site of the first nucleotide sequence. In another aspect, the specification provides a library and its use for training a computational model for selecting an optimized cut site for use with a DSB-based genome editing system (e.g., a Cas-based system, a TALEN-based system, or a zinc-finger-based system) that is capable of introducing a desired genetic change into a nucleotide sequence (e.g., a genome) at the selected cut site without homology-directed repair, wherein the library comprises a plurality of vectors each comprising a nucleotide sequence having a cut site, and optionally a second nucleotide sequence encoding a cognate guide RNA (in embodiments involving a Cas-based genome editing system).
Also, the concepts disclosed herein may be embodied as a method, of which an example has been provided. The acts performed as part of the method may be ordered in any suitable way. Accordingly, embodiments may be constructed in which acts are performed in an order different than illustrated, which may include performing some acts simultaneously, even though shown as sequential acts in illustrative embodiments.
EXAMPLES
In order that the invention described herein may be more fully understood, the following examples are set forth. It should be understood that these examples are for illustrative purposes only and are not to be construed as limiting this invention in any manner.
Example 1: Demonstration of predictable and precise template-free CRISPR editing of pathogenic variants
Summary
DNA double-strand break repair following cleavage by Cas9 is generally considered stochastic, heterogeneous, and impractical for applications beyond gene disruption. Here, it is shown that template-free Cas9 nuclease-mediated DNA repair is predictable in human and mouse cells and is capable of precise repair to a predicted genotype in certain sequence contexts, enabling correction of human disease-associated mutations. A genomically integrated library of guide RNAs (gRNAs), each paired with its corresponding DNA target sequence, was constructed, and a machine learning model, inDelphi, was trained on the end-joining repair products of 1,095 sequences cleaved by Cas9 nuclease in mammalian cells. The resulting model accurately predicted frequencies of 1- to 60-bp deletions and 1-bp insertions (median r = 0.87) with single-base resolution at 194 held-out library sites and ~90 held-out endogenous sequence contexts in four human and mouse cell lines. The inDelphi model predicts that 26% of all Streptococcus pyogenes Cas9 (SpCas9) gRNAs targeting the human genome result in outcomes in which a single predictable product accounts for >30% of all edited products, while 5% of gRNAs are “high-precision guides” that result in repair outcomes in which one product accounts for >50% of all edited products. It was experimentally confirmed that 183 human disease-associated microduplication alleles can each be corrected to their wild-type genotypes with >50% frequency among edited products following Cas9 cleavage in mammalian cells. Using these insights, genotypic and functional rescue of pathogenic LDLR microduplication alleles was achieved in human and mouse cells, and an endogenous genomic Hermansky-Pudlak syndrome (HPS1) pathogenic allele was restored to wild-type in primary patient-derived fibroblasts. This study establishes that template-free Cas9 nuclease activity can be harnessed for precise genome editing applications.
More in particular, this study developed a high-throughput Streptococcus pyogenes Cas9 (SpCas9)-mediated repair outcome assay to characterize end-joining repair products at Cas9-induced double-strand breaks using 1,872 target sites based on sequence characteristics of the human genome. The study used the resulting rich set of repair product data to train the herein disclosed machine-learning algorithm (i.e., inDelphi), which accurately predicts the frequencies of the substantial majority of template-free Cas9-induced insertion and deletion events at single-base resolution (as further described in M. Shen et al., “Predictable and precise template-free CRISPR editing of pathogenic variants,” Nature, vol. 563, November 29, 2018, pp. 646-651, including Extended Data). In contrast to the notion that end-joining repair is heterogeneous, inDelphi identifies 5-11% of SpCas9 gRNAs in the human genome that induce a single predictable repair genotype in >50% of editing products.
Building on this idea of precision gRNAs, this study further uses inDelphi to design 14 gRNAs for high-precision template-free editing yielding predictable 1-bp insertion genotypes in endogenous human disease-relevant loci, and highly precise editing (median 61% among edited products) was experimentally confirmed in two human cell lines. As described herein, inDelphi was used to reveal human pathogenic alleles that are candidates for efficient and precise template-free gain-of-function genotypic correction, and template-free correction of 183 pathogenic human microduplication alleles to the wild-type genotype was achieved in >50% of all editing products. Finally, these developments were integrated to achieve high-precision correction of five pathogenic low-density lipoprotein receptor (LDLR) microduplication alleles in human and mouse cells, as well as correction of endogenous pathogenic microduplication alleles for Hermansky-Pudlak syndrome (HPS1) and Menkes disease (ATP7A) to the wild-type sequence in primary patient-derived fibroblasts.
Results
Cas9-mediated DNA repair products are predictable
To capture Cas9-mediated end-joining repair products across a wide variety of target sequences, a genome-integrated gRNA and target library screen was designed in which many unique gRNAs are paired with corresponding 55-bp target sequences, each containing a single canonical “NGG” SpCas9 protospacer-adjacent motif (PAM) that directs cleavage to the center of the target sequence (FIG. 11A). To explore repair products among sequences representative of the human genome, 1,872 target sequences were computationally designed that collectively span the human genome’s distributions of %GC, number of nucleotides participating in microhomology, predicted Cas9 on-target cutting efficiency4, and estimated precision of deletion products24 (FIGs. 16A-16C). Through a multi-step process (FIGs. 16A-16C), the library (Lib-A; see Table 4) was cloned into a plasmid backbone allowing Tol2 transposon-based integration into the genome25, gRNA expression, and hygromycin selection for cells with genomically integrated library members.
Lib-A was stably integrated into the genomes of mouse embryonic stem cells (mESCs). Next, these cells were targeted with a Tol2 transposon-based SpCas9 expression plasmid containing a blasticidin expression cassette, and cells with stable Cas9 expression were selected. Sufficient numbers of cells were maintained throughout the experiment to ensure >2,000-fold coverage of the library. After one week, genomic DNA was collected from these cells in three independent replicate experiments, and paired-end high-throughput DNA sequencing (HTS) was performed using primers flanking the gRNA and the target site to reveal the spectrum of repair products at each target site. Using a sequence alignment procedure, the resulting 96,838,690 sequence reads were tabulated into observed frequencies of, on average, 1,262 unique repair genotypes for each target site.
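The alignment procedure itself is not detailed here. As a simplified stand-in, assuming reads that span the whole target and carry at most one contiguous edit, a read can be described relative to the reference by trimming their longest shared prefix and suffix; edited reads can then be tabulated into genotype frequencies. The function names `call_genotype` and `tabulate` are hypothetical, and this is not the procedure used in the study.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# (position, replaced reference bases, inserted bases)
Genotype = Tuple[int, str, str]


def call_genotype(ref: str, read: str) -> Genotype:
    """Describe a read as a single contiguous edit relative to the reference."""
    m = min(len(ref), len(read))
    p = 0
    while p < m and ref[p] == read[p]:  # longest common prefix
        p += 1
    s = 0
    while s < m - p and ref[-1 - s] == read[-1 - s]:  # common suffix, not overlapping the prefix
        s += 1
    # Greedy prefix matching places an ambiguous (microhomology) deletion at its
    # rightmost equivalent position; both fields non-empty indicates a complex edit.
    return (p, ref[p:len(ref) - s], read[p:len(read) - s])


def tabulate(ref: str, reads: Iterable[str]) -> Dict[Genotype, float]:
    """Frequency of each called genotype among edited reads (unedited reads excluded)."""
    counts = Counter(call_genotype(ref, r) for r in reads)
    edited = {g: c for g, c in counts.items() if g[1] or g[2]}
    total = sum(edited.values())
    return {g: c / total for g, c in edited.items()} if total else {}
```

A real pipeline must also handle sequencing errors, substitutions near the cut, and reads that do not span the target, which this sketch ignores.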
To test the correspondence between library repair products and endogenous repair products, Lib-A included the 55-bp sequences surrounding 90 endogenous genomic loci for which the products of Cas9-mediated repair were previously characterized by HTS24. Previously reported repair products from this endogenous dataset (VO) in three human cell lines (HCT116, K562, and HEK293) reveal that 94% of endogenous Cas9-mediated deletions are 30 bp or shorter (FIGs. 16A-16C), suggesting that the Lib-A analysis method is capable of assessing the vast majority of Cas9-mediated editing products. It was found that repair outcomes for the Lib-A members corresponding to the VO sites in mESCs are consistent with previously reported endogenous repair products in human cells (median r = 0.76, FIGs. 17A-17D). Lib-A repair genotype frequencies are also consistent between experimental replicates (median r = 0.89, FIGs. 17A-17D). Together, these results confirm that Cas9-mediated editing products of the target library reflect previously reported endogenous target locus editing products in human cells.
In Lib-A data from mESCs and in the three VO datasets from endogenous HEK293, K562, and HCT116 cells, end-joining repair of Cas9-mediated double-strand breaks primarily causes deletions (73-87% of all products) and insertions (13-25% of all products) (FIGs. 11B, 11C, FIGs. 17A-17D). Rarer Cas9-mediated repair products were also detected, such as combination insertion/deletions (0.5-2% of all products) and deletions and insertions distal to the cut site (3-5% of all products), which occur more often on the PAM-distal side of the double-strand break (FIGs. 17A-17D). The majority of products are deletions containing microhomology consistent with MMEJ (53-58% of all products, and 70-75% of deletions) (FIGs. 11B, 11C; see FIGs. 17A-17D for a definition of microhomology-containing deletions).
Using the wealth of Cas9 outcome data provided by Lib-A, a novel machine learning model, inDelphi, was trained to predict the spectrum of Cas9-mediated editing products at a given target site. This model consists of three interconnected modules aimed at predicting the three major classes of repair outcomes: microhomology deletions (MH deletions), microhomology-less deletions (MH-less deletions), and single-base insertions (1-bp insertions, FIG. 12A). These three repair classes are defined as constituting all major editing outcomes; together they comprise 80-95% of all observed editing products (FIGs. 11B, 11C). Motivated by the abundance of MH deletion products in Lib-A and VO data, a deep neural network was designed to predict MH deletions as one module of inDelphi. This module simulates MH deletions using the MMEJ repair mechanism, in which 5’-to-3’ end resection at a double-strand break reveals two 3’ ssDNA overhangs that can anneal through sequence microhomology. Extraneous ssDNA overhangs are eliminated, and DNA synthesis and ligation generate a dsDNA repair product26 (FIG. 12B). Through this mechanism, each potential microhomology results in a distinct deletion genotype, allowing a 1:1 mapping between possible microhomologies at a target site and available MH deletion outcome genotypes (FIG. 12B). inDelphi models MH deletions as a competition between different MH-mediated hybridization possibilities. Using the input features of MH length, MH %GC, and deletion length, inDelphi outputs a score (phi) reflecting the predicted strength of each microhomology (FIG. 12A). From training data, inDelphi learned that strong microhomologies tend to be long and have high GC content (FIGs. 18A-18H).
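The 1:1 mapping between microhomologies and deletion genotypes can be made concrete. In the sketch below (an illustration, not inDelphi's featurization code), every deletion of length d whose junction touches the cut is enumerated; deletion start positions that yield the same product are grouped, and a group of k equivalent starts corresponds to a microhomology of length k - 1. The function name and output fields are hypothetical; they produce the three MH-module input features named above (MH length, MH %GC, deletion length), and in this sketch deletions with `mh_len == 0` play the role of MH-less deletions.

```python
from collections import defaultdict
from typing import Dict, List


def enumerate_deletions(seq: str, cut: int, max_del: int = 60) -> List[Dict]:
    """List deletion genotypes whose junction touches a blunt cut at seq[cut-1] | seq[cut]."""
    results = []
    for d in range(1, max_del + 1):
        if cut - d < 0 or cut + d > len(seq):
            break  # longer deletions would run off the end of the sequence
        groups = defaultdict(list)
        for p in range(cut - d, cut + 1):  # deletion removes seq[p:p+d]
            groups[seq[:p] + seq[p + d:]].append(p)
        for product, starts in groups.items():
            # Equivalent starts are contiguous: p and p+1 give the same product
            # exactly when seq[p] == seq[p+d], i.e. one base of microhomology.
            p0 = min(starts)
            mh_len = len(starts) - 1
            mh_seq = seq[p0:p0 + mh_len]
            gc = (mh_seq.count("G") + mh_seq.count("C")) / mh_len if mh_len else 0.0
            results.append({"del_len": d, "start": p0, "mh_len": mh_len,
                            "mh_gc": gc, "product": product})
    return results
```

For a sequence such as `"GGGGACGTACGTCCCC"` cut between the two `ACGT` copies (index 8), the 4-bp deletion appears as a single genotype with a 4-nt microhomology of 50% GC.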
To account for all deletions that cannot be simulated through the MMEJ mechanism, inDelphi also contains a second neural network module that predicts the distribution of MH-less deletion lengths using the minimum required resection length as the only input feature (FIG. 12A). Because there are many MH-less genotypes for each deletion length with frequencies that do not fit a simple pattern, inDelphi predicts the frequencies of deletion lengths but not of genotypic outcomes for MH-less deletions. This module learned from training data that the frequency of MH-less deletions decays rapidly with increasing length (FIGs. 18A-18H). It is hypothesized that MH-less deletions arise primarily from the activity of the classical and alternative NHEJ pathways27. The two neural networks were jointly trained using observed distributions of deletion genotypes from 1,095 Lib-A target sites (FIG. 12A).
The inDelphi model contains a third module to predict 1-bp insertions (FIG. 12A). In VO and Lib-A data, insertions represent a major class of DNA repair at Cas9-mediated double-strand breaks (13-25% of all products, FIGs. 11B, 11C, FIGs. 17A-17D). Among insertions, 1-bp insertions are dominant (9-21% of all products, FIGs. 11B, 11C, FIGs. 17A-17D). Surprisingly, it was found that the frequency of 1-bp insertions and their resultant genotypes depend strongly on local sequence context. In endogenous and library settings, 1-bp insertions predominantly comprise duplications of the -4 nucleotide (counting the NGG PAM as nucleotides 0-2, FIG. 12A), with higher precision when the -4 nucleotide is an A or T and with lower precision when it is a C or G (FIG. 12C). While 1-bp insertions were observed in 9% of products on average in Lib-A, this frequency varies significantly depending on the nucleotide at position -4, falling to less than 4% on average when the -4 nucleotide is G (FIG. 12D, P < 10 V). While position -4 is most strongly associated with 1-bp insertion frequency, other surrounding bases also contribute to insertion frequency (FIG. 12E). In addition, it was found that target sites with poor microhomology (low total phi score) and target sites with imprecise deletion product distributions are more likely to contain insertions at the expense of deletions (FIGs. 18A-18H).
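The dominant 1-bp insertion genotype described above can be written down directly from the target sequence. The hypothetical helper below, assuming the SpCas9 cut lies between positions -4 and -3 (PAM at 0-2), returns the -4 base and the product in which that base is duplicated at the cut. It predicts only the most likely insertion genotype, not its frequency.

```python
from typing import Tuple


def predicted_1bp_insertion(seq: str, pam_start: int) -> Tuple[str, str]:
    """Return (inserted base, product) for duplication of the -4 nucleotide.

    pam_start is the index of the N in the NGG PAM (position 0). The assumed
    cut lies between positions -4 and -3, i.e. at index pam_start - 3.
    """
    cut = pam_start - 3
    base = seq[pam_start - 4]  # the -4 nucleotide, immediately 5' of the cut
    return base, seq[:cut] + base + seq[cut:]
```

Per the observations above, an A or T returned as the inserted base would be expected to give a more precise insertion outcome than a C or G.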
Based on these empirical observations, inDelphi models insertions and deletions as competitive processes in which the total deletion phi score (overall microhomology strength) and predicted deletion precision influence the relative frequency of 1-bp insertions, and the local sequence context influences the relative frequency and genotypic outcomes of 1-bp insertions (FIG. 12A). inDelphi integrates these factors into predictions of 1-bp insertion genotype frequencies using a k-nearest neighbor approach. Collectively, from sequence context alone, inDelphi predicts the indel lengths of 80-95% of Cas9-mediated editing products and the single-base-resolution genotypes of 65-80% of all products (FIG. 13A, FIGs. 19A-19D).
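The k-nearest neighbor step can be sketched as plain k-NN regression. The feature vector below (total phi, predicted deletion precision, and a one-hot encoding of the -4 nucleotide) is an assumption chosen to mirror the factors named above, and the helper names and toy training values are hypothetical; this is not inDelphi's actual feature set or code.

```python
import numpy as np

BASES = "ACGT"


def featurize(total_phi: float, del_precision: float, minus4_base: str) -> np.ndarray:
    """Hypothetical features: total MH strength, deletion precision, one-hot -4 base."""
    one_hot = [1.0 if b == minus4_base else 0.0 for b in BASES]
    return np.array([total_phi, del_precision] + one_hot)


def knn_predict(train_X, train_y, query, k: int = 5) -> float:
    """Mean target of the k training sites closest to the query (Euclidean distance)."""
    X = np.asarray(train_X, dtype=float)
    y = np.asarray(train_y, dtype=float)
    dists = np.linalg.norm(X - np.asarray(query, dtype=float), axis=1)
    nearest = np.argsort(dists)[:k]
    return float(y[nearest].mean())
```

Here the target would be the observed 1-bp insertion fraction at each training site. The features are left unscaled for brevity; standardizing them would keep the total phi score from dominating the distance.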
Trained on data from 1,095 Lib-A sequence contexts in mESCs, inDelphi demonstrates highly accurate genotypic prediction of 1-bp insertions and 1- to 60-bp deletions at 87-90 VO target sequences previously characterized experimentally in endogenous K562, HCT116, and HEK293 cells (median r = 0.87, FIG. 13B). Notably, the Lib-A versions of these target sites were held out of inDelphi training. The inDelphi model also performs well when predicting indel length distributions from 1-bp insertions to 60-bp deletions at the endogenous VO sites in three human cell lines (median r = 0.84, FIG. 13C). Additionally, inDelphi accurately predicts relative frequencies of genotypic outcomes (median r = 0.94) and indel length distributions (median r = 0.91) of 189 held-out Lib-A targets in mESCs (FIGs. 19A-19D). As a control to test whether the features used in training inDelphi are crucial for its performance, the MH length feature was removed from the inDelphi MH deletion module, which reduced inDelphi’s performance at predicting genotype frequency to that of a model with random weights. A second control, in which the deep neural networks were replaced with linear models, showed 10-24% reduced performance on the genotype frequency and indel length prediction tasks. Together, these controls indicate that inDelphi’s computational structure is important for its accuracy. An online implementation of inDelphi is provided to predict the spectrum of Cas9-mediated products at any target site (crisprindelphi.design). Taken together, these results establish that, in data from human and mouse cells, the relative frequencies of most Cas9 nuclease-mediated editing outcomes are highly predictable.
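The accuracy figures above are reported as median Pearson r. A minimal sketch of that kind of summary, assuming r is computed per target site between predicted and observed frequency vectors over the same set of outcomes and the median is then taken across sites, is shown below; the function name is hypothetical.

```python
import numpy as np


def median_pearson(predicted, observed) -> float:
    """Median over sites of Pearson r between predicted and observed outcome frequencies.

    predicted[i] and observed[i] are equal-length frequency vectors for site i,
    indexed over the same outcomes (genotypes or indel lengths). A constant
    vector makes r undefined (NaN), so such sites should be filtered beforehand.
    """
    rs = [np.corrcoef(p, o)[0, 1] for p, o in zip(predicted, observed)]
    return float(np.median(rs))
```

Reporting the median rather than the mean limits the influence of a few poorly predicted sites on the summary.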
The ability of Cas9-mediated end-joining repair to induce frameshifts enables efficient gene knockout28. It was reasoned that inDelphi’s accurate prediction of the indel length distribution of 80-95% of template-free Cas9-mediated editing products should also enable accurate prediction of Cas9-induced frameshifts. This task was simulated in 86 endogenous VO target sequences in HEK293 cells by tabulating the observed frequency of indels resulting in +0, +1, and +2 frameshifts. inDelphi’s predictions of the frequency of indels in each frame (median r = 0.81) compare favorably to those generated by Microhomology Predictor (median r = 0.37), a previously published method29 (FIG. 13D). Thus, it is expected that inDelphi facilitates Cas9-mediated gene knockout approaches by allowing a priori selection of gRNAs that induce high or low knockout frequencies. To this end, an online tool is provided to predict frameshift frequencies for any SpCas9 gRNA targeting the coding human and mouse genome (crisprindelphi.design). Human exons have a significant tendency (p < 10^-100, FIGs. 19A-19D) to favor frame-preserving deletion repair compared to shuffled exon sequences or non-coding human DNA. Taken together, the results show that inDelphi provides accurate single-base-resolution predictions for the relative frequencies of most Cas9 nuclease-mediated end-joining repair outcomes, including frameshifts.
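Converting an indel length distribution into frame frequencies is a modular reduction. The sketch below assumes signed indel lengths (deletions negative, insertions positive) and assigns each indel to frame +0, +1, or +2 by its net length change modulo 3; the function name is hypothetical.

```python
from typing import Dict


def frame_distribution(indel_freqs: Dict[int, float]) -> Dict[int, float]:
    """Fraction of indels in each frame (net length change mod 3), normalized to 1."""
    total = sum(indel_freqs.values())
    frames = {0: 0.0, 1: 0.0, 2: 0.0}
    for length, freq in indel_freqs.items():
        # Python's % returns 0, 1 or 2 even for negative lengths (e.g. -1 % 3 == 2).
        frames[length % 3] += freq / total
    return frames
```

Under this convention, the predicted frameshift (knockout-prone) fraction of a gRNA is `frames[1] + frames[2]`.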
Designing high-precision template-free Cas9 nuclease-mediated editing
While end-joining repair is highly efficient at inducing mutations after Cas9 treatment, its tendency to induce a heterogeneous mixture of repair genotypes has limited its application primarily to gene disruption and removal of intervening sequences between two double-stranded breaks30-33. Motivated by inDelphi’s ability to predict Cas9-mediated repair outcomes from target sequences alone, it was sought to identify target sites for which the repair profile is highly skewed toward a single outcome. In principle, the ability of inDelphi to identify such sites may enable efficient, template-free, nuclease-mediated precision gene editing.
It was reasoned that a single strong microhomology hybridization possibility with a high phi score would outcompete a background of weaker alternative microhomologies to yield efficient and precise repair to a single deletion genotype. Microduplications, in which a stretch of DNA is repeated in tandem, contain stretches of exact microhomology and thus are predicted by inDelphi to collapse precisely through deletion upon MMEJ repair (FIG. 14A).
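Whether a cut falls inside a tandem microduplication can be checked directly from sequence. The hypothetical helpers below test, for each length L, whether the L bases on either side of the cut are identical, and return the product of collapsing one copy, which is the MH deletion expected to dominate. The default 7-25 bp range mirrors the microduplication lengths used in Lib-B and is otherwise an assumption.

```python
from typing import List


def microduplications_at_cut(seq: str, cut: int, min_len: int = 7, max_len: int = 25) -> List[int]:
    """Lengths L for which seq[cut-L:cut] == seq[cut:cut+L] (tandem copies split by the cut)."""
    hits = []
    for L in range(min_len, min(max_len, cut, len(seq) - cut) + 1):
        if seq[cut - L:cut] == seq[cut:cut + L]:
            hits.append(L)
    return hits


def collapse(seq: str, cut: int, L: int) -> str:
    """Product of deleting one copy of the L-bp duplication (the predicted MH deletion)."""
    return seq[:cut - L] + seq[cut:]
```

For a pathogenic microduplication allele, the collapsed product is the wild-type sequence, which is why such alleles are candidates for template-free correction.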
To test this prediction, a second high-throughput Cas9 substrate library (Lib-B; see Table 5) was designed and constructed that contains three families of target sequences with microduplications of each length from 7-25 bp. Cas9-mediated double-strand break repair products were analyzed in Lib-B in mESCs and in human U2OS and HEK293T cells using the same procedure as for Lib-A evaluation. Highly precise repair was consistently observed, in which 40-80% of all repair events correspond to a single repair genotype (FIG. 14B), substantially higher than the 21% median frequency of the most abundant deletion genotype in 90 VO sites that were not pre-selected for microhomology. The fraction of microduplication repair to a single collapsed product, as compared to other outcomes, increased with microduplication length in mESCs, U2OS, and HEK293T cells (r = 0.35, p < 7×10^-5, FIG. 14B, FIGs. 20A-20E). These sites have significantly higher phi scores and precision scores than VO sites, and significantly fewer 1-bp insertions (FIGs. 20A-20E). Thus, sites with strong MH deletion candidates are enriched in that specific deletion outcome at the expense of MH-less deletion and 1-bp insertion outcomes. It was also hypothesized that sequence contexts with no strong MHs (low total phi scores) could enable precise 1-bp insertion repair. To test this possibility, three target sequence frameworks with low total phi scores were included in Lib-B (FIGs. 20A-20E), containing randomization at the four positions surrounding the Cas9 cleavage site (positions -5 to -2 with respect to the PAM at positions 0-2; see FIG. 12A). Cas9 nuclease treatment of 205 such sequences in Lib-B resulted in highly precise (up to 90% of all repair events) and reproducible (r = 0.90 between mESC replicates) 1-bp insertions (FIG. 14C, FIGs. 20A-20E). Strikingly, the efficiency of 1-bp insertions is strongly influenced by the nucleotide identities in positions -5 to -2 (FIG. 14C). Similar to the findings from Lib-A (FIG. 12E, FIGs. 20A-20E), -4T and -3G correlate with higher relative frequencies of 1-bp insertion among all products, while -4G correlates with lower frequencies of insertion (FIG. 14D). Among these three fixed sequence contexts in Lib-B with low total MH, 1-bp insertions comprise a median of 29% of all repair products, which is significantly higher than in VO sites (FIGs. 20A-20E). Moreover, a median of 61% of all products are 1-bp insertions at sites with TG at the -4 and -3 positions (FIG. 14D), revealing that precise 1-bp insertion can be obtained through Cas9-mediated end-joining at specific, predictable sequence contexts.
Sequences that support higher insertion efficiencies (>50%) have on average 33% lower total efficiencies of Cas9-mediated indels than sequences that yield lower insertion efficiencies (r = -0.35, p = 3.3×10^-7, FIGs. 20A-20E), possibly because the lower efficiency of MMEJ at such sites decreases the likelihood of mutagenic repair of the Cas9-induced double-strand break. These observations collectively establish that Cas9-mediated repair of target sites with predictable sequence features can lead to precise editing favoring one particular outcome.
Based on these findings, inDelphi was used to predict gRNAs that lead to such precise outcomes. A metric was defined using information entropy to measure the precision of a repair outcome spectrum as a score ranging from zero (highest entropy, lowest precision) to one (lowest entropy, highest precision), and it was demonstrated that inDelphi is capable of predicting the precision of Cas9-induced deletions in 86 VO target sequences in HEK293 cells (median r = 0.64, FIG. 14E). inDelphi was then used to discover SpCas9 gRNAs that support precise end-joining repair in the human genome. It was found that substantial fractions of all genome-targeting SpCas9 gRNAs are predicted to produce relatively precise outcomes (Table 2). Indeed, inDelphi predicts that 26% of SpCas9 gRNAs that target human exons and introns are “precision gRNAs” (FIG. 14F), defined as gRNAs predicted to produce a single genotypic outcome in >30% of all major editing products, with 20% of gRNAs predicted to produce a single deletion genotype at >30% efficiency and 6.2% predicted to produce a single 1-bp insertion genotype at >30% efficiency. Moreover, inDelphi predicts that 4.8% of SpCas9 gRNAs targeting human exons and introns are “high-precision gRNAs,” defined as gRNAs that produce a single genotype in >50% of all major editing products (FIG. 14F; 3.8% producing high-precision deletions, 0.94% producing high-precision 1-bp insertions). These findings suggest that Cas9-mediated end-joining outcomes at many target sites are both predictable and precise, and that precision and high-precision gRNAs offer new opportunities for precise deletion and insertion by Cas9 nuclease-mediated editing. An online tool is provided to predict the precision of a given gRNA and to identify precision and high-precision SpCas9 gRNAs targeting the human and mouse genomes (crisprindelphi.design).
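The exact normalization of the entropy-based precision score is not given here. One natural choice, assumed in the sketch below, is one minus the Shannon entropy divided by its maximum (log of the number of outcomes), which yields 0 for a uniform spectrum and 1 for a single outcome. The helper `classify_grna` applies the >30% and >50% thresholds that define precision and high-precision gRNAs in the text; both function names are hypothetical.

```python
import math
from typing import Sequence


def precision_score(freqs: Sequence[float]) -> float:
    """1 - normalized Shannon entropy (assumed normalization): 0 uniform, 1 single outcome."""
    if len(freqs) < 2:
        return 1.0
    total = sum(freqs)
    probs = [f / total for f in freqs if f > 0]  # 0 * log 0 is taken as 0
    entropy = -sum(p * math.log(p) for p in probs)
    return 1.0 - entropy / math.log(len(freqs))


def classify_grna(freqs: Sequence[float]) -> str:
    """Label by the share of the most frequent genotype among major editing products."""
    top = max(freqs) / sum(freqs)
    if top > 0.5:
        return "high-precision"
    if top > 0.3:
        return "precision"
    return "other"
```

The two measures are complementary: the entropy score summarizes the whole spectrum, while the thresholds ask only whether one genotype dominates.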
Efficient template-free repair of pathogenic alleles to wild-type genotypes
Next, inDelphi-classified high-precision gRNAs were used to identify new targets for therapeutic genome editing. Starting with 23,018 insertion, short deletion, and microduplication disease genotypes from the ClinVar and HGMD databases16,17, inDelphi was tasked with identifying pathogenic alleles that are suitable for template-free Cas9-mediated editing to effect precise gain-of-function repair of the pathogenic genotype. inDelphi predicts that two categories of genetic disease alleles that have not previously been identified as targets for Cas9-mediated repair are candidates for high-precision repair. The first category is a selected subset of pathogenic frameshifts for which, because of high-precision repair, inDelphi predicts that 50-90% of Cas9-mediated deletion products will correct the reading frame, compared with an average frequency of 34% among all disease-associated frameshift mutations. The second category is pathogenic microduplication alleles, in which a short sequence duplication leads to a frameshift or to loss-of-function protein sequence changes (FIG. 15A).
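The frame-correction criterion behind the first category reduces to modular arithmetic. The sketch below is an illustration, not inDelphi code: it takes a predicted deletion-length distribution (a hypothetical input format) and sums the predicted fraction of deletions whose length brings the allele's net length change back to a multiple of three. Restoring the frame this way does not by itself guarantee a wild-type protein sequence.

```python
def frame_restoration_fraction(net_change, deletion_len_dist):
    """Predicted fraction of deletion products that restore the reading frame.

    net_change: pathogenic allele length minus wild-type length, in bp
        (e.g. +1 for a 1-bp insertion, +16 for a 16-bp microduplication).
    deletion_len_dist: {deletion length in bp: predicted fraction among deletions}.
    A Cas9 deletion of length L leaves a net change of (net_change - L) relative
    to wild type; the frame is restored when that is a multiple of 3.
    """
    return sum(
        frac for length, frac in deletion_len_dist.items()
        if (net_change - length) % 3 == 0
    )
```

For a 1-bp insertion allele with predicted deletion lengths `{1: 0.5, 2: 0.3, 4: 0.2}`, the 1-bp and 4-bp deletions restore the frame, giving about 0.7.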
To test the accuracy of inDelphi at predicting repair genotypes of therapeutically relevant alleles, 1,592 pathogenic human loci that inDelphi identified as having the highest predicted rates of frameshift or microduplication repair to the wild-type sequence were included in Lib-B. Cas9-mediated repair of genome-integrated Lib-B in mESCs and in human U2OS and HEK293T cells confirmed highly efficient and precise gain-of-function editing. It was observed that 183 human disease microduplication alleles included in Lib-B were repaired to wild-type in >50% of all products (FIG. 15B), and 508 pathogenic human frameshift alleles were restored to the proper reading frame in >50% of all products in mESCs (FIG. 15C), in agreement with inDelphi’s predictions (r = 0.64 for frame restoration, r = 0.64 for wild-type repair). Similar results were observed in HEK293T and U2OS cells (FIGs. 21A-21D). While microduplication repair to the wild-type genotype unambiguously restores wild-type protein function, it is noted that frame restoration that alters the coding sequence requires case-by-case analysis to validate rescue of protein function.
To determine whether the efficiency of microduplication repair can be increased by manipulation of DNA repair pathways, Cas9 cleavage of Lib-B was performed in Prkdc-/-Lig4-/- mESCs34, which are deficient for two proteins involved in NHEJ repair35. As expected, the frequency of MH-less deletion repair in cells with impaired NHEJ is decreased (25% to 16%) (FIGs. 22A-22E). It was also observed that the precision of 1-bp insertions that result in duplication of the -4 nucleotide is increased in Prkdc-/-Lig4-/- mESCs (FIGs. 22A-22E).
Importantly, the frequency of MH-dependent deletion repair is substantially increased (58% to 72%) in Prkdc-/-Lig4-/- mESCs, enabling a subset of pathogenic alleles to be repaired to wild-type with strikingly high precision. In wild-type mESCs, 183 pathogenic alleles are repaired to wild-type in >50% of all edited products and 11 pathogenic alleles are repaired to wild-type in >70% of all edited products, while in Prkdc-/-Lig4-/- mESCs, 286 pathogenic alleles are repaired to wild-type in >50% of all edited products and 153 pathogenic alleles are repaired to wild-type in >70% of products (FIG. 15D, Table 6). Thus, impairing NHEJ can further increase the precise repair of pathogenic microduplications to wild-type (p = 7.8 × 10^-12, FIG. 15D). These data support the model that competing end-joining repair mechanisms determine the relative frequencies of specific editing outcome types, and demonstrate that template-free genotypic correction of hundreds of pathogenic microduplication alleles in genes such as PKD1 (corrected in 92% of edited Prkdc-/-Lig4-/- mESC alleles), GJB2 (91%), MSH2 (88%), LDLR (87%), and BRCA1 (82%) can be optimized to occur with strikingly high efficiency by manipulation of repair pathways.
inDelphi’s prediction of highly efficient wild-type repair was further tested on pathogenic LDLR microduplication alleles, which cause dominantly inherited familial hypercholesterolemia36. Five pathogenic LDLR microduplication alleles were separately introduced, within a full-length LDLR coding sequence upstream of a P2A-GFP cassette, into the genome of human and mouse cells, such that Cas9-mediated repair to the wild-type LDLR sequence should induce phenotypic gain of LDL uptake and restore the reading frame of GFP. Cas9 and a gRNA that is specific to each pathogenic allele and does not target the wild-type repaired sequence were then delivered. Robust restoration of LDL uptake was observed, as well as restoration of GFP fluorescence in mESCs, U2OS cells, and HCT116 cells in up to 79% of cells following transfection with Cas9 and inDelphi gRNAs (FIGs. 15E, 15F, FIGs. 23A-23E). HTS confirms efficient genotypic repair to wild-type of these five LDLR microduplication alleles in human and mouse cells, as well as of three other pathogenic microduplication alleles in the GAA, GLB1, and PORCN genes introduced to cells using the same method (Table 1, Table 3). Importantly, in these experiments, high-frequency LDLR phenotypic correction was observed when cutting with either SpCas9 or Staphylococcus aureus Cas9 (SaCas9)37 (Table 3), suggesting that microduplication repair is a feature of cellular repair after a Cas9-mediated double-strand break that does not require a specific nuclease.
Finally, precise template-free Cas9-mediated MMEJ was used to repair an endogenous pathogenic 16-bp microduplication in primary fibroblasts from a patient with Hermansky-Pudlak syndrome (HPS1). HPS1 causes blood clotting deficiency and albinism in patients and is particularly common in Puerto Ricans38. Simultaneous delivery of Cas9 and a gRNA specific to the pathogenic microduplication allele induced high-efficiency correction to the wild-type sequence (mean frequency = 71% of edited alleles, N = 3, Table 1). These findings suggest the potential of template-free, precise Cas9 nuclease-mediated repair of microduplication alleles to achieve efficient repair to the wild-type sequence for therapeutic gain-of-function genome editing.
The following tables are referenced in this specification.
Table 1: Repair of microduplication pathogenic alleles through template-free Cas9- nuclease treatment.
Table 2: Frequency of gRNAs in the human genome with denoted Cas9-mediated outcome precision.
Fraction of 1,083,524 SpCas9 gRNAs in human exons and introns for which the most-common repair genotype comprises XX% of all major editing products
Precision frequency (XX%) | Precise product is a deletion (% of gRNAs) | Precise product is a 1-bp insertion (% of gRNAs) | Any precise product (% of gRNAs)
10 | [illegible] | 35 | [illegible]
15 | 63 | 23 | 76
20 | 42 | 14 | 55
25 | 29 | 10 | 38
30 | 20 | 6.2 | 26
35 | 13 | 4.1 | 17
40 | 8.6 | 2.6 | 11
45 | 5.8 | 1.3 | 7.1
50 | 3.8 | 0.94 | 4.8
55 | 2.4 | 0.52 | 3.0
60 | 1.5 | 0.39 | 1.9
65 | [illegible] | [illegible] | 1.1
70 | 0.58 | 0.026 | 0.61
75 | [illegible] | [illegible] | 0.32
80 | 0.12 | 0 | 0.12
85 | 0.040 | 0 | 0.040
Table 3: Repair of eight pathogenic microduplication alleles in individual cellular experiments.
Table 4: Lib-A sequences (presented below between the end of this specification and Table 5).
Table 5: Lib-B sequences (presented below between Table 4 and Table 6).
Table 6: inDelphi predictions and observed results. Table 6 comprises Table 6A: inDelphi predictions and observed results for Lib-B, showing all sequences with replicate-consistent mESC results; Table 6B (continued from 6A); Table 6C (continued from 6B); Table 6D (continued from 6C); and Table 6E (continued from 6D) (presented below between Table 5 and the claims).
Table 7: Frequency of gRNAs in the human genome with denoted Cas9-mediated outcome precision
Table 8: Endogenous repair of 24 designed high-precision gRNAs in human cell lines
Discussion
The Cas9-mediated end-joining repair products of thousands of target DNA loci integrated into mammalian cells were used to train a machine learning model, inDelphi, that accurately predicts the spectrum of genotypic products resulting from double-strand break repair at a target DNA site of interest. The ability to predict Cas9-mediated products enables new precision genome editing research applications and facilitates existing applications.
The inDelphi model identifies target loci in which a substantial fraction of all repair products consists of a single genotype. The findings suggest that 26% of SpCas9 gRNAs targeting the human genome are precision gRNAs, yielding a single genotypic outcome in >30% of all major repair products, and 5% are high-precision gRNAs in which >50% of all major repair products are of a single genotype. Such precision and high-precision gRNAs enable uses of Cas9 nuclease in which the major genotypic products can be predicted a priori. Indeed, it was experimentally shown that high-precision, template-free Cas9-mediated editing can mediate efficient gain-of-function repair at hundreds of pathogenic alleles including microduplications (FIGs. 15B, 15E, 15F) in cell lines and in patient-derived primary cells (Table 1).
Moreover, evidence is presented that manipulation of available DNA repair pathways can further increase the precision of template-free repair outcomes. Suppressing NHEJ augments repair of pathogenic microduplication alleles, suggesting that temporary manipulation of DNA repair pathways could be combined with Cas9-mediated editing to favor specific editing genotypes with high precision. Genome editing currently lacks flexible strategies to correct indels in post-mitotic cells because of the limited efficiency of HDR in non-dividing cells39. As MMEJ is thought to occur throughout the cell cycle40, inDelphi may provide access to predictable and precise post-mitotic genome editing in a wider range of cell states. It is also anticipated that, given appropriate training data, inDelphi will be able to accurately predict repair genotypes from other double-strand break creation methods, including other Cas9 homologs, Cpf1, transcription activator-like effector nucleases (TALENs), and zinc-finger nucleases (ZFNs)37,41-43. This work establishes that the prediction and judicious application of template-free Cas9 nuclease-mediated genome editing offers new capabilities for the study and potential treatment of genetic diseases.
Cellular repair of double-stranded DNA breaks and inDelphi
DNA double-strand breaks are detrimental to genomic stability, and as such the detection and faithful repair of genomic lesions is crucial to cellular integrity. A large number of genes have evolved to respond to and repair DNA double-strand breaks, and these genes can be broadly grouped into a set of DNA repair pathways26, each of which differs in the biochemical steps it takes to repair DNA double-strand breaks. Accordingly, these pathways tend to produce characteristically distinguishable non-wild-type genotypic outcomes.
The goal of the machine learning algorithm, inDelphi, is to accurately predict the identities and relative frequencies of non-wild-type genotypic outcomes produced following a CRISPR/Cas9-mediated DNA double-strand break. To accomplish this goal, parameters were developed to classify three distinct categories of genotypic outcomes (microhomology deletions, microhomology-less deletions, and insertions), informed by the biochemical mechanisms underlying the DNA repair pathways that typically give rise to them.
Double-strand breaks are thought to be repaired via four major pathways: classical non-homologous end-joining (c-NHEJ), alternative NHEJ (alt-NHEJ), microhomology-mediated end-joining (MMEJ), and homology-directed repair (HDR)1. To create inDelphi, three machine learning modules were developed to model genotypic outcomes characteristic of the c-NHEJ, microhomology-mediated alt-NHEJ, and MMEJ pathways. While a template-free CRISPR/Cas9 DNA double-strand break may be repaired by HDR via endogenous homology templates that exist in trans45, HDR-characteristic outcomes are not explicitly modeled by the algorithm.
Before proceeding, it is important to note that while specific DNA repair pathways are characteristically associated with distinct genotypic outcomes, the proteins involved in the various pathways and the resulting repair products may at times overlap. This fact has several implications. First, conclusive statements cannot be made about the role of specific proteins or pathways in specific genotypic outcomes without perturbation experiments (e.g., the comparison of wild-type and Prkdc-/-Lig4-/- mESCs can illuminate the roles of these specific proteins). Second, because assigning genotypic outcomes to biochemical mechanisms is likely imperfect, machine learning methods were used to identify trends and patterns in genotype frequencies that refine this crude binning process.
In the first step of the inDelphi method, genotypic outcomes were separated into three classes: microhomology deletions (MH deletions), microhomology-less deletions (MH-less deletions), and single-base insertions (1-bp insertions) (FIG. 12A). Below, the algorithmic definition of each genotypic outcome class, the pathways associated with each class, and the DNA sequence parameters used in training inDelphi on each class are outlined. More detailed technical algorithmic definitions of the genotypic outcome classes are provided elsewhere in this specification.
MH deletions are predicted from MH length, MH GC content, and deletion length
The majority of Cas9-mediated double-strand break repair genotypes observed in the datasets are classified as MH deletions (53-58% in mESC, K562, HCT116, and HEK293 cells). It is hypothesized that these deletions occur through MMEJ-like processes, and known features of this pathway are used to inform a machine learning module that predicts MH deletion outcomes. Following 5’-end resection, as occurs in MMEJ, alt-NHEJ, and HDR26, microhomologous basepairing of single-stranded DNA (ssDNA) sequences occurs across the border of the double-strand breakpoint46,47. To restore a contiguous double-stranded DNA chain, the 5’-overhangs not participating in the microhomology are removed up until the paired microhomology region, and the unpaired ssDNA sequences are extended by DNA polymerase using the opposing strand as a template (FIG. 12B, FIGs. 18A-18H).
Assuming these same processes, inDelphi calculates the set of all MH deletions available given a specific sequence context and cleavage site.
As an example workflow, given the following sequence and its cleavage site:
ACGTG|CATGA
TGCAC|GTACT
For every possible deletion length from 1 bp to 60 bp, the 3’-overhang downstream of the cut site is overlapped under the upstream 3’-overhang, and it is determined whether there is any microhomologous basepairing. As an example, given the 4-bp deletion length:
ACGTG
 | ||
 GTACT
it is seen that there are three microhomologous basepairing events.
Then a particular microhomology is chosen (here, the C:G basepair formed by the second base of the top strand):
ACGTG
 |
 GTACT
Its unique repair genotype is then generated by following the top strand from left to right and jumping down to the complement of the bottom strand to simulate DNA polymerase fill-in.
Here, this yields:
ACATGA
TGTACT
This can also be displayed as an alignment. Note that by “jumping down” after the first base in the top strand, this outcome can also be described using delta-position 1 (see the section on delta-positions). A deletion at delta-position 0 yields the same genotype.
Deletion: A----CATGA
Wt:       ACGTGCATGA (SEQ ID NO: 7)
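The enumeration walked through above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the inDelphi implementation: `cut` is taken to be the index of the first base 3’ of the break on the top strand, and a position pairs when the upstream top-strand base equals the downstream top-strand base laid under it (the shifted bottom strand then carries its complement). Each maximal run of paired positions is treated as one microhomology. The function name, field names, and the offset indexing are hypothetical and may not match the specification’s delta-position convention.

```python
def mh_deletions(seq, cut, max_del_len=60):
    """Enumerate microhomology (MH) deletion genotypes at one cut site.

    seq: top strand, 5'->3'. cut: index of the first base 3' of the break,
    so the break lies between seq[cut - 1] and seq[cut].
    For each deletion length L, the L bases downstream of the cut are laid
    under the L bases upstream of it; a position pairs when the two top-strand
    bases are identical. Each maximal run of paired positions is one MH.
    """
    results = []
    for del_len in range(1, max_del_len + 1):
        if cut - del_len < 0 or cut + del_len > len(seq):
            break  # the overlap would run off the sequence
        upstream = seq[cut - del_len:cut]
        downstream = seq[cut:cut + del_len]
        paired = [a == b for a, b in zip(upstream, downstream)]
        i = 0
        while i < del_len:
            if not paired[i]:
                i += 1
                continue
            start = i
            while i < del_len and paired[i]:
                i += 1
            mh = upstream[start:i]
            results.append({
                "del_len": del_len,
                "mh_len": len(mh),
                "mh_gc": sum(base in "GC" for base in mh) / len(mh),
                # Offsets d for which deleting seq[cut - L + d : cut + d] gives
                # this genotype (assumed indexing, not the spec's own numbering).
                "start_offsets": list(range(start, i + 1)),
                "genotype": seq[:cut - del_len + start] + seq[cut + start:],
            })
    return results
```

For the example sequence, `mh_deletions("ACGTGCATGA", 5)` returns two 4-bp MH deletions: the 1-bp C microhomology, which yields ACATGA as in the worked example, and the 2-bp TG microhomology, which yields ACGTGA. Together they account for the three microhomologous basepairs noted above.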
Thus, there may be multiple MH deletion outcome genotypes for a given deletion length, and there is always a 1:1 mapping between the microhomologous basepairing used in that MH deletion and the resultant genotypic outcome. The set of MH deletions thus includes all 1-bp to 60-bp deletions that can be derived from the steps above that simulate the MMEJ mechanism.
MMEJ efficiency has been reported to depend on the thermodynamic favorability and stability of a candidate microhomology46,47. To parameterize MH deletions using the biochemical sequence features that influence this form of DNA repair, inDelphi calculates the MH length, MH GC content, and resulting deletion length for each possible MH deletion. These features are input into a machine learning module, the microhomology neural network (MH-NN), which learns the relationship between these features and the frequency of an MH deletion outcome in a training CRISPR/Cas9 genotypic outcome dataset. Although it was predicted, and empirically found, that favored MH deletions have long MH lengths relative to total deletion length and high MH GC content, no explicit direction or comparative weighting of these parameters is provided at the outset. inDelphi then outputs a phi-score for any MH deletion genotype (whether or not it was in the training data) that represents the favorability of that outcome as predicted by MH-NN. It is important to emphasize that the phi-score of a particular MH deletion does not by itself represent the likelihood of that MH deletion occurring in the context of all MH deletions at a given site. Some CRISPR/Cas9 target sites may have many possible favorable MH deletion outcomes while other sites have few, and thus phi-scores must be normalized for a given target site to generate the fractional likelihood of each genotypic outcome at that site. The total unnormalized MH deletion phi-score is one factor that is further used to predict the relative frequency of the different repair classes: MH deletions, MH-less deletions, and insertions.
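The per-site normalization of phi-scores can be sketched as below. The sketch takes phi-scores as given (from MH-NN or any other model) and is not inDelphi’s code; the helper name is hypothetical. It shows why a phi-score alone is not a likelihood: the same score maps to different fractions at sites with different competing MH deletions.

```python
def normalize_phi(phi_scores):
    """Turn unnormalized MH-deletion phi-scores at one site into fractions.

    phi_scores: {genotype: phi-score}, e.g. as produced by MH-NN.
    Returns (fractions, total_phi). total_phi is kept because the total
    unnormalized score is reused to weigh repair classes against each other.
    """
    total_phi = sum(phi_scores.values())
    if total_phi <= 0:
        raise ValueError("phi-scores must sum to a positive value")
    fractions = {g: phi / total_phi for g, phi in phi_scores.items()}
    return fractions, total_phi
```

A genotype with phi-score 2.0 receives a fraction of 0.5 at a site with one equally scored competitor, but only 0.25 at a site whose other MH deletion scores 6.0.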
MH-less deletions are predicted from their length
MH-less deletions are defined as all possible deletions that have not been accounted for by the workflow described above for MH deletions. Mechanistically, the data analysis suggests that MH-less deletions are associated with repair genotypes produced by the c-NHEJ and microhomology-mediated alt-NHEJ pathways.
Following a double-strand break, c-NHEJ-associated proteins rapidly bind the DNA strands flanking the double-strand DNA breakpoint and recruit ligases, exonucleases, and polymerases to process and re-anneal the breakpoint in the absence of 5’-end resection (FIGs. 18A-18H)26,35. Commonly, c-NHEJ repair is error-free; however, in the context of Cas9-mediated cutting, faithful repair leads to repeated cutting, thereby increasing the eventual likelihood of mutagenic repair. Erroneous c-NHEJ repair products are mainly thought to consist of small insertions or deletions, or combinations thereof, that most frequently occur in the direct vicinity of the DNA breakpoint35,48,49. The resulting deletions, which are referred to as medial end-joining MH-less deletions, have often lost bases both upstream and downstream of the cleavage site.
Microhomology-mediated alt-NHEJ is a distinct pathway that produces MH-less deletion products. In contrast to c-NHEJ, which is microhomology-independent, this form of alt-NHEJ repair occurs following 5’-end resection and is mediated by microhomology in the sequence surrounding the double-strand breakpoint1. Microhomologous basepairing stabilizes the 3’-ssDNA overhangs following 5’-end resection, similarly to MMEJ, allowing DNA ligases to join the break across one of the strands of this temporarily configured complex. The opposing un-annealed flap is then removed, and newly synthesized DNA templated off of the remaining strand is annealed to repair the lesion (FIGs. 18A-18H). Although alt-NHEJ uses microhomology, the repair products it produces do not follow the predictable genotypic patterns induced by MMEJ and are thus grouped into MH-less deletion genotypes. MH deletions are a direct merger of both annealed strands, in which the outcome genotype switches from the top to the bottom strand at the exact end-point of a microhomology. In contrast, although alt-NHEJ employs microhomology in its repair mechanism, the deletion outcomes it generates comprise bases derived exclusively from either the top or the bottom strand. Mechanistically, this occurs because ligation of a 3’-overhang to its downstream ligation partner results in removal of the entire opposing ssDNA overhang up to the point of ligation. This process prevents any deletion from occurring in the 3’-overhang strand that is first attached to the DNA backbone, while inducing loss of an indeterminate length of sequence on the opposing strand. The resulting deletion genotypes, which are referred to as unilateral end-joining MH-less deletions, do not retain information on the exact microhomology causal to their occurrence, and are thus also referred to as MH-less.
Consequently, the various mechanisms that give rise to MH-less deletions are capable of generating a vast number of genotypic outcomes for any given deletion length. Because less is known about the biochemical mechanisms that govern the relative frequency of NHEJ deletion products, inDelphi models these deletions without assuming any particular mechanism.
inDelphi detects MH-less deletions in training data as the set of all deletions that are not MH deletions, and parameterizes them solely by the length of the resulting deletion. This is based on the simple assumption, supported by empirical observation, that c-NHEJ and alt-NHEJ processes are most likely to produce short deletions. As with MH deletions, this assumption is not explicitly coded into the inDelphi MH-less deletion prediction module; instead, it is “learned” by a neural network called MHless-NN.
MHless-NN optimizes a phi-score for a given MH-less deletion length, grounded in the frequency of MH-less deletion outcomes of that length observed in the training data. It was observed that MHless-NN learns a near-exponentially decaying phi-score with increasing deletion length that reflects the sum total frequency of all MH-less deletion genotypes of that length. The total unnormalized MH-less deletion phi-score for a given target and cut site is also used to inform the relative frequency of different repair classes.
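How per-length MH-less phi-scores might be pooled with MH phi-scores into a deletion-length distribution is sketched below. Two assumptions are made for illustration only: MHless-NN is replaced by a plain exponential decay with an arbitrary, unfitted rate, and the two classes are combined by adding their phi-scores at each length before normalizing. The function names are hypothetical.

```python
import math


def mhless_phi(del_len, rate=0.5):
    """Hypothetical stand-in for MHless-NN: phi decays exponentially with length.

    `rate` is an arbitrary illustrative value, not a fitted parameter.
    """
    return math.exp(-rate * del_len)


def deletion_length_distribution(mh_phi_by_len, max_del_len=60, rate=0.5):
    """Combine MH and MH-less phi-scores into one deletion-length distribution.

    mh_phi_by_len: {deletion length: summed MH-deletion phi-score at that length}.
    Returns (distribution, total MH phi, total MH-less phi); the two totals are
    the unnormalized class scores used when weighing repair classes.
    """
    lengths = range(1, max_del_len + 1)
    mhless = {n: mhless_phi(n, rate) for n in lengths}
    scores = {n: mh_phi_by_len.get(n, 0.0) + mhless[n] for n in lengths}
    total = sum(scores.values())
    distribution = {n: s / total for n, s in scores.items()}
    return distribution, sum(mh_phi_by_len.values()), sum(mhless.values())
```

With a single strong microhomology at 4 bp, the 4-bp class dominates, while lengths without microhomology follow the decaying MH-less curve.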
1-bp insertions are predicted from sequence context and deletion phi-scores
Lastly, inDelphi predicts 1-bp insertions from both the broader sequence context and the immediate vicinity of the cleavage site. It was empirically found that 1-bp insertions are far more common than longer insertions, so the focus is on their prediction. Short sequence insertions are classically assumed to result from c-NHEJ48,49; however, little else is known about how their biochemical mechanism relates to local sequence context that could help inform prediction.
Nonetheless, powerful correlations were found between the identities of the bases surrounding the Cas9 cleavage site and the frequency and identity of the inserted base (see main text).
Motivated by these empirical observations, inDelphi is trained on 1-bp insertion frequencies and identities at each training site, using as features the identities of the -3, -4, and -5 bases upstream of the NGG PAM sequence when the training set is sufficiently large, or the -4 base alone when training data are limited. The precision score of the deletion length distribution and the total deletion phi-score at that site are also added as features. These features are combined in a k-nearest neighbor algorithm that predicts the relative frequencies and identities of 1-bp insertion products at any target site.
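A k-nearest neighbor predictor over these features can be sketched as follows. This is an illustration, not inDelphi’s implementation: the one-hot encoding, the plain Euclidean distance, the unweighted neighbor averaging, and the assumption that the two numeric features are already scaled to comparable ranges are all choices made here for clarity. The function names and the training-row layout are hypothetical.

```python
import math

BASES = "ACGT"


def featurize(context, precision, total_del_phi):
    """One-hot encode the -5, -4, -3 bases (a 3-letter string), then append the
    deletion precision score and total deletion phi-score. Assumes the two
    numeric features are already scaled to comparable ranges."""
    onehot = [1.0 if base == b else 0.0 for base in context for b in BASES]
    return onehot + [precision, total_del_phi]


def knn_insertions(train, query, k=3):
    """Average the k nearest training sites (Euclidean distance).

    train: list of (features, insertion_frequency, {base: fraction}) tuples.
    Returns (predicted 1-bp insertion frequency, predicted inserted-base fractions).
    """
    nearest = sorted(train, key=lambda row: math.dist(row[0], query))[:k]
    ins_freq = sum(row[1] for row in nearest) / len(nearest)
    base_fracs = {b: sum(row[2][b] for row in nearest) / len(nearest) for b in BASES}
    return ins_freq, base_fracs
```

`math.dist` requires Python 3.8 or later. In practice k and the feature scaling would be tuned on held-out sites.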
The combination of the MH, MH-less, and insertion models predicts genotype fractions
Altogether, informed by known paradigms of DNA repair, two neural networks and a k-nearest neighbor model were built to predict genotypic outcomes following Cas9 cutting. These models compete and collaborate within inDelphi to generate predictions of the relative frequencies of these products. This competition among repair types reflects empirical evidence from Lib-A and Lib-B that sequence context influences the classes of repair outcomes. Sequence contexts with high phi-scores (high microhomology) have higher efficiencies of MH deletions among all editing outcomes (FIG. 14B, FIGs. 20A-20E), and sequence contexts with low phi-scores (low microhomology) have higher efficiencies of 1-bp insertions among all editing outcomes (FIGs. 14C, 14D, FIGs. 18A-18H, FIGs. 20A-20E). While it is tempting to generalize that the competition and collaboration among outcome classes modeled by inDelphi reflect interactions among components of distinct DNA repair pathways, the classes of outcomes considered by inDelphi do not necessarily arise from distinct DNA repair pathways as described above. inDelphi is trained on repair outcomes only and cannot determine whether a genotype that could arise through both MH-mediated and MH-less mechanisms arose through one or the other, and it is conceivable that some repair products result from more than one repair pathway. Additionally, while NHEJ is generally assumed to dominate double-strand break repair of environmentally induced damage35, it was found that in the context of Cas9 cutting, MH deletion genotypes are more common than MH-less deletions and insertions. It is possible that error-free c-NHEJ occurs frequently in response to Cas9 cutting but that its perfect repair allows recurring Cas9 cutting that goes undetected by the workflow, thus skewing the observed relative frequency profile of mutagenic outcomes toward MMEJ-type repair.
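One simple way for the class-level predictions to “compete” for probability mass is sketched below. For illustration only, it assumes that the k-nearest neighbor model supplies the fraction of all products that are 1-bp insertions, and that deletions share the remainder according to their normalized phi-scores. The function name and the insertion genotype labels are hypothetical.

```python
def combine_predictions(deletion_fracs, ins_freq, ins_base_fracs):
    """Merge class-level predictions into one genotype distribution.

    deletion_fracs: {genotype: fraction among deletions} (sums to 1).
    ins_freq: predicted fraction of all products that are 1-bp insertions.
    ins_base_fracs: {inserted base: fraction among 1-bp insertions} (sums to 1).
    """
    if not 0.0 <= ins_freq <= 1.0:
        raise ValueError("ins_freq must lie in [0, 1]")
    combined = {g: (1.0 - ins_freq) * f for g, f in deletion_fracs.items()}
    for base, f in ins_base_fracs.items():
        combined["1-bp insertion of " + base] = ins_freq * f
    return combined
```

Because each input is a normalized distribution, the output also sums to 1. Raising the insertion fraction directly lowers every deletion genotype in proportion.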
Prkdc-/-Lig4-/- mutants have distinct and predictable DNA repair product distributions
While the work generally cannot establish roles for specific DNA repair pathways in specific types of Cas9-mediated outcomes, an experiment was performed in which Cas9-mediated genotypic outcomes were measured in mESCs lacking Prkdc and Lig4, two proteins known to be key in c-NHEJ5. An increase in the relative frequency of MH deletions compared with MH-less deletions was found in Prkdc-/-Lig4-/- mESCs relative to wild-type mESCs (see main text), which is suggestive of an increase in MMEJ outcomes at the expense of NHEJ outcomes.
Intriguingly, it was also found that Prkdc-/-Lig4-/- mESCs are impaired in unilateral deletions, in which only bases from one side of the cut site are removed, but not in medial MH-less deletion outcomes that have lost bases on both sides of the breakpoint (FIGs. 22A-22E). As discussed earlier, microhomology-mediated alt-NHEJ, which was hypothesized to give rise to unilateral MH-less deletions, proceeds through a mechanism in which DNA repair intermediates that mimic MMEJ-mediated repair are formed initially (FIGs. 18A-18H), as microhomology basepairing temporarily stabilizes 3’-overhangs following 5’-end resection. Subsequently, ligation joins one 3’-overhang with the sequence on the other side of the DNA double-strand break, giving rise to a unilateral deletion. If the unilateral joining products observed in the experiments indeed arise through mechanisms similar to those described for this form of alt-NHEJ, it is conceivable that the MMEJ pathway may overtake 3’-end ligation at this microhomology-containing intermediate step when ligation is impaired through loss of Lig4. Thus, cross-talk between microhomology-mediated repair pathways could account for the loss of unilateral end-joining MH-less outcomes and the concomitant increase in MH deletion outcomes. Medial joining outcomes are not hypothesized to originate from intermediates that overlap with microhomology-mediated deletion products (FIGs. 18A-18H). Therefore, the repair genotypes generated via this orthogonal pathway may be afforded more time to be completed by ligases other than Lig4, which would explain why these outcomes appear unaffected by NHEJ impairment.
While DNA repair products in Prkdc-/-Lig4-/- mESCs differ substantially from those in wild-type cells, it was found that these DNA repair products are also highly predictable. In particular, inDelphi performed well on held-out Prkdc-/-Lig4-/- data when trained on Prkdc-/-Lig4-/- data (indel genotype prediction median Pearson correlation = 0.84, indel length frequency prediction Pearson correlation = 0.80), showing that the modeling approach is robustly capable of learning accurate predictions for Cas9 editing data not just in wild-type experimental settings but also in settings with significant biochemical perturbation. As such, it is suggested here that inDelphi’s modeling approach can be useful for additional tasks unexplored here, provided that inDelphi is supplied with appropriate training data.
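One natural reading of the “median Pearson correlation” reported above is to compute a per-target Pearson correlation between predicted and observed genotype frequencies and then take the median across held-out targets. The sketch below assumes that reading, and also assumes that a genotype present on only one side counts as frequency 0. The data layout and helper names are hypothetical.

```python
import math
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / (sx * sy)


def median_genotype_correlation(predicted, observed):
    """predicted, observed: {target: {genotype: frequency}}; scores every observed target."""
    rs = []
    for target, obs in observed.items():
        pred = predicted[target]
        genotypes = sorted(set(pred) | set(obs))
        rs.append(pearson([pred.get(g, 0.0) for g in genotypes],
                          [obs.get(g, 0.0) for g in genotypes]))
    return statistics.median(rs)
```

Taking the median, rather than the mean, keeps a few poorly predicted targets from dominating the summary.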
Methods
Library cloning.
Specified pools of 2000 oligos were synthesized by Twist Bioscience and amplified with NEBNext polymerase (New England Biolabs) using primers OligoLib_Fw and OligoLib_Rv (see below), to extend the sequences with overhangs complementary to the donor template used for circular assembly. To avoid over-amplification in the library cloning process, qPCR was first performed by addition of SybrGreen Dye (Thermo Fisher) to determine the number of cycles required to complete the exponential phase of amplification. The PCR reaction was run for half of the determined number of cycles at this stage. Extension time for all PCR reactions was extended to 1 minute per cycle to prevent skewing towards GC-rich sequences. The 246-bp fragment was purified using a PCR purification kit (Qiagen).
Separately, the donor template for circular assembly was amplified with NEBNext polymerase (New England Biolabs) for 20 cycles from an SpCas9 sgRNA expression plasmid (Addgene 71485)34 using primers CircDonor_Fw and CircDonor_Rv (see below) to amplify the sgRNA hairpin and terminator, and extended further with a linker region meant to separate the gRNA expression cassette from the target sequence in the final library. The 146-bp amplicon was gel-purified (Qiagen) from a 2.5% agarose gel.
The amplified synthetic library and donor templates were ligated by Gibson Assembly (New England Biolabs) in a 1:3 molar ratio for 1 hour at 50°C, and unligated fragments were digested with Plasmid-Safe ATP-Dependent DNase (Lucigen) for 1 hour at 37°C. Assembled circularized sequences were purified using a PCR purification kit (Qiagen), linearized by digestion with SspI for >3 hours at 37°C, and the 237-bp product was gel-purified (Qiagen) from a 2.5% agarose gel.
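Setting up the 1:3 library-to-donor molar ratio requires converting picomoles to nanograms for the 246-bp library fragment and the 146-bp donor amplicon. The sketch below uses the common approximation of 650 g/mol per base pair of double-stranded DNA. The 0.05 pmol library input in the usage note is a hypothetical amount, not one stated in this protocol, and the helper names are illustrative.

```python
# Approximate average mass of one base pair of double-stranded DNA (g/mol).
AVG_BP_MASS_G_PER_MOL = 650.0


def ng_for_pmol(pmol, length_bp):
    """Mass in ng of `pmol` picomoles of a dsDNA fragment `length_bp` long.

    pmol * (length_bp * 650 g/mol) gives picograms; dividing by 1000 gives ng.
    """
    return pmol * length_bp * AVG_BP_MASS_G_PER_MOL / 1000.0


def gibson_masses(library_bp, donor_bp, library_pmol, donor_per_library=3.0):
    """ng of library and donor needed for a library:donor = 1:donor_per_library molar mix."""
    library_ng = ng_for_pmol(library_pmol, library_bp)
    donor_ng = ng_for_pmol(library_pmol * donor_per_library, donor_bp)
    return library_ng, donor_ng
```

For example, `gibson_masses(246, 146, 0.05)` gives about 8.0 ng of library and 14.2 ng of donor.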
The linearized fragment was further amplified with NEBNext polymerase (New England Biolabs) using primers PlasmidIns_Fw and PlasmidIns_Rv (see below). This adds overhangs complementary to the 5' and 3' regions of a Tol2-transposon-containing gRNA expression plasmid (Addgene 71485)34 previously digested with BbsI and XbaI (New England Biolabs), which enables gRNA expression and integration of the library into the genome of mammalian cells. To avoid over-amplification, qPCR was performed with SYBR Green dye (Thermo Fisher) to determine the number of cycles required to complete the exponential phase of amplification, and the PCR reaction was then run for that number of cycles. The 375-bp amplicon was gel-purified (Qiagen) from a 2.5% agarose gel.
The 375-bp amplicon and the double-digested Tol2-transposon-containing gRNA expression plasmid were joined by Gibson Assembly (New England Biolabs) at a 3:1 ratio for 1 hour at 50°C. Assembled plasmids were purified by isopropanol precipitation with GlycoBlue Coprecipitant (Thermo Fisher), reconstituted in milliQ water, and transformed into NEB 10-beta (New England Biolabs) electrocompetent cells. Following recovery, a small dilution series was plated to assess transformation efficiency, and the remainder was grown overnight in liquid culture in DRM medium at 37°C. A detailed step-by-step library cloning protocol is provided below.
The plasmid library was isolated by Midiprep plasmid purification (Qiagen). Library integrity was verified by restriction digest with SapI (New England Biolabs) for 1 hour at 37°C, and sequence diversity was validated by high-throughput sequencing (HTS) as described below.
Library cloning primers
OligoLib_Fw
TTTTTGTTTTCTGTGTTCCGTTGTCCGTGCTGTAACGAAAGGATGGGTGCGACGCGTCAT (SEQ ID NO: 8)
OligoLib_Rv
GTTGATAACGGACTAGCCTTATTTAAACTTGCTATGCTGTTTCCAGCATAGCTCTTAAAC (SEQ ID NO: 9)
CircDonor_Fw
GTTTAAGAGCTATGCTGGAAACAGC (SEQ ID NO: 10)
CircDonor_Rv
ATGACGCGTCGCACCCATCCTTTCGTTACAGCACGGACAACGGAACACAGAAAACAAAAAAGCACCGACTC (SEQ ID NO: 11)
PlasmidIns_Fw
GTAACTTGAAAGTATTTCGATTTCTTGGCTTTATATATCTTGTGGAAAGGACGAAACACC (SEQ ID NO: 12)
PlasmidIns_Rv
TTGTGGTTTGTCCAAACTCATCAATGTATCTTATCATGTCTGCTCGAAGCGGCCGTACCTCTAGATTCAGACGTGTGCTCTTCCGATCT (SEQ ID NO: 13)
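The circular-assembly design depends on sequence overlaps between these primers. The short Python check below uses only the sequences listed above; the constant and function names are illustrative. It confirms two of these overlaps. First, the 5' end of CircDonor_Rv is the reverse complement of the full OligoLib_Fw primer. Second, the 3' end of OligoLib_Rv is the reverse complement of CircDonor_Fw, which is also the sgRNA-hairpin stub at the 3' end of each synthesized oligo.

```python
# Primer sequences (5'->3') copied from the list above.
OLIGOLIB_FW = "TTTTTGTTTTCTGTGTTCCGTTGTCCGTGCTGTAACGAAAGGATGGGTGCGACGCGTCAT"
OLIGOLIB_RV = "GTTGATAACGGACTAGCCTTATTTAAACTTGCTATGCTGTTTCCAGCATAGCTCTTAAAC"
CIRCDONOR_FW = "GTTTAAGAGCTATGCTGGAAACAGC"
CIRCDONOR_RV = ("ATGACGCGTCGCACCCATCCTTTCGTTACAGCACGGACAACGGAACACAGAAAA"
                "CAAAAAAGCACCGACTC")

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    # CircDonor_Rv begins with the reverse complement of the whole OligoLib_Fw primer.
    print(CIRCDONOR_RV.startswith(revcomp(OLIGOLIB_FW)))  # True
    # OligoLib_Rv ends with the reverse complement of CircDonor_Fw (sgRNA-hairpin stub).
    print(OLIGOLIB_RV.endswith(revcomp(CIRCDONOR_FW)))    # True
```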
Cloning.
A base plasmid was constructed starting from a Tol2-transposon containing plasmid (Addgene 71485)34. The sequence between Tol2 sites was replaced with a CAGGS promoter, multi-cloning site, P2A peptide sequence followed by eGFP sequence, and Puromycin resistance cassette to produce p2T-CAG-MCS-P2A-GFP-PuroR. The full sequence of this plasmid is appended in the Sequences section below, and this plasmid has been submitted to Addgene.
Plasmids with this backbone were constructed containing wild-type and micro-duplication mutant versions of LDLR and of three other genes: GAA, GLB1, and PORCN.
Information on cloning these genes is provided below, and the gene sequences are appended below.
LDLR: To generate p2T-CAGGS-LDLRwt-P2A-GFP-PuroR, LDLR (NCBI Gene ID #3949, transcript variant 1 CDS) was PCR-amplified from a base plasmid ordered from the Harvard PlasmID resource core and cloned between the BamHI and NheI sites of the base plasmid.
The following mutants were generated through InFusion (Clontech) cloning. Sequences are provided below, and the internal allele nomenclature is in parentheses:
LDLR:c.526_533dupGGCTCGGA (LDLRdup252)
LDLR:c.668_681dupAGGACAAATCTGAC (LDLRdup254/255) (SEQ ID NO: 14)
LDLR:c.669_680dupGGACAAATCTGA (LDLRdup258) (SEQ ID NO: 15)
LDLR:c.672_683dupCAAATCTGACGA (LDLRdup261) (SEQ ID NO: 16)
LDLR:c.1662_1669dupGCTGGTGA (LDLRdup264)
PORCN: NCBI Gene ID #64840, transcript variant C CDS was PCR-amplified from HCT116 cDNA and cloned between the BamHI and NheI sites of the base plasmid.
PORCN:c.1059_1071dupCCTGGCTTTTATC (SEQ ID NO: 17) (PORCNdup20) was generated through InFusion cloning.
GLB1: NCBI Gene ID #2720, transcript variant 1 CDS was PCR-amplified from HCT116 cDNA and cloned between the BamHI and NheI sites of the base plasmid.
GLB1:c.1456_1466dupGGTGCATATAT (SEQ ID NO: 18) (GLB1dup84) was generated through InFusion cloning.
GAA: NCBI Gene ID #2548, transcript variant 1 CDS was PCR-amplified from a base plasmid ordered from the Harvard PlasmID resource core and cloned between the BamHI and NheI sites of the base plasmid. GAA:c.2704_2716dupCAGAAGGTGACTG (SEQ ID NO: 19) (GAAdup327/328) was generated through InFusion cloning.
SpCas9 and KKH SaCas9 expression plasmids were constructed starting from a Tol2-transposon-containing plasmid (Addgene 71485)34. The sequence between the Tol2 sites was replaced with a CAGGS promoter, the Cas9 sequence, and a blasticidin resistance cassette to produce p2T-CAG-SpCas9-BlastR and p2T-CAG-KKHSaCas9-BlastR. These plasmids have been submitted to Addgene.
SpCas9 guide RNAs were cloned as a pool into a Tol2-transposon-containing gRNA expression plasmid (Addgene 71485)34 using BbsI plasmid digestion and Gibson Assembly (NEB). SaCas9 guide RNAs were cloned, also using BbsI plasmid digestion and Gibson Assembly, into a similar Tol2-transposon-containing SaCas9 gRNA expression plasmid (p2T-U6-sgsaCas2xBbsI-HygR), which has been submitted to Addgene. The protospacer sequences used are listed below under internal names that match the duplication alleles.
LDLR gRNAs
sgsaLDLRdup252: GCTGCGAAGATGGCTCGGAGGC (SEQ ID NO: 20)
sgsaLDLRdup254: GTGCAAGGACAAATCTGACAGG (SEQ ID NO: 21)
sgsaLDLRdup255: GTTCCTCGTCAGATTTGTCCTG (SEQ ID NO: 22)
sgsaLDLRdup258: GACTGCAAGGACAAATCTGAGG (SEQ ID NO: 23)
sgsaLDLRdup261: GTTTTCCTCGTCAGATTTGTCG (SEQ ID NO: 24)
sgspLDLRdup264: GACATCTACTCGCTGGTGAGC (SEQ ID NO: 25)
PORCN gRNAs
sgspPORCNdup20: GCTGTCCCTGGCTTTTATCCC (SEQ ID NO: 26)
GLB1 gRNAs
sgspGLB1dup84: GTGTGAACTATGGTGCATATA (SEQ ID NO: 27)
GAA gRNAs
sgsaGAAdup327: GCAGCTGCAGAAGGTGACTGCA (SEQ ID NO: 28)
sgspGAAdup328: GCTGCAGAAGGTGACTGCAGA (SEQ ID NO: 29)
Cell culture.
The mouse embryonic stem cell lines used have been described previously and were cultured as described previously44. HEK293T, HCT116, and U2OS cells were purchased from ATCC and cultured as recommended by ATCC. For stable Tol2 transposon plasmid integration, cells were transfected with Lipofectamine 3000 (Thermo Fisher) following standard protocols, using equimolar amounts of Tol2 transposase plasmid25 (a gift from Koichi Kawakami) and transposon-containing plasmid. For library applications, 15-cm plates with >10^7 initial cells were used; for single gRNA targeting, 6-well plates with >10^6 initial cells were used. To generate lines with stable Tol2-mediated genomic integration, cells were selected with the appropriate agent (blasticidin, hygromycin, or puromycin) at an empirically defined concentration. Selection started 24 hours after transfection and continued for >1 week. Some plasmids were integrated sequentially, for example the gRNA/target library followed by Cas9, or a micro-duplication plasmid followed by Cas9 plus gRNA. In these cases, the same Lipofectamine 3000 transfection protocol with Tol2 transposase plasmid was used each time, and >1 week of appropriate drug selection was performed after each transfection.
Deep sequencing.
Genomic DNA was collected from cells after >1 week of selection. For library samples, 16 µg gDNA was used per sample. For individual locus samples, 2 µg gDNA was used, and for plasmid library verification, 0.5 µg purified plasmid DNA was used.
For individual locus samples, the locus surrounding the CRISPR/Cas9-induced mutation was PCR-amplified in two steps using primers located >50 bp from the Cas9 target site. PCR1 was performed using the primers specified below. PCR2 added full-length Illumina sequencing adapters using the NEBNext Index Primer Sets 1 and 2 (NEB) or internally ordered primers with equivalent sequences. All PCRs were performed using NEBNext polymerase (New England Biolabs). The extension time for all PCR reactions was increased to 1 min per cycle to prevent skewing towards GC-rich sequences. The pooled samples were sequenced on a NextSeq (Illumina) at the Harvard Medical School Biopolymers Facility, the MIT BioMicro Center, or the Broad Institute Sequencing Facility.
Library prep primers:
For LDLRDup252, 254, 255, 258, 261:
120417_LDLRDup254_r1seq_A: CTTTCCCTACACGACGCTCTTCCGATCT NNN ACTCCAGCTGGCGCTGTGAT (SEQ ID NO: 30)
120417_LDLR254_r2seq_A: GGAGTTCAGACGTGTGCTCTTCCGATCT CAACTTCATCGCTCATGTCCTTG (SEQ ID NO: 31)
For LDLRDup264:
120817_LDLR264_r1seq_B: CTTTCCCTACACGACGCTCTTCCGATCT NNN AACTCCCGCCAAGATCAAGAAAG (SEQ ID NO: 32)
120817_LDLR264_r2seq_B: GGAGTTCAGACGTGTGCTCTTCCGATCT CAGCCTCTTTTCATCCTCCAAGA (SEQ ID NO: 33)
For PORCNDup20:
120517_PORCN20_r1seq: CTTTCCCTACACGACGCTCTTCCGATCT NNN CCTCCTACATGGCTTCAGTTTCC (SEQ ID NO: 34)
120517_PORCN20_r2seq: GGAGTTCAGACGTGTGCTCTTCCGATCT CCAGAGCTCCAAAGAGCAAGTTT (SEQ ID NO: 35)
For GLB1Dup84:
120517_GLB184_r1seq: CTTTCCCTACACGACGCTCTTCCGATCT NNN AGCCACTCTGGACCTTCTGGTA (SEQ ID NO: 36)
120517_GLB184_r2seq: GGAGTTCAGACGTGTGCTCTTCCGATCT CCAGTCCGTGAGGATATTGGAAC (SEQ ID NO: 37)
For GAADup327/328:
120517_GAA327_r1seq: CTTTCCCTACACGACGCTCTTCCGATCT NNN GATCGTGAATGAGCTGGTACGTG (SEQ ID NO: 38)
120517_GAA327_r2seq: GGAGTTCAGACGTGTGCTCTTCCGATCT AACAGCGAGACACAGATGTCCAG (SEQ ID NO: 39)
Data availability.
High-throughput sequencing data have been deposited in the NCBI Sequence Read Archive database under accession codes SRP141261 and SRP141144.
Code availability.
All data processing, analysis, and modeling code is available at github.com/maxwshen/inDelphi-dataprocessinganalysis. The inDelphi model is available online at crisprindelphi.design.
Library cloning protocol
Synthesized oligo library sequence:
GATGGGTGCGACGCGTCATT [55-bp target] AGATCGGAAGAGCACACGTCTGAATATTGTGGAAAGGACGAAACACCG [19/20-nt protospacer, depending on whether it naturally starts with a G] GTTTAAGAGCTATGCTGGAAACAGC (SEQ ID NO: 40)
Annotated sequence elements:
- Linker region / oligo library amplification primer anneal region
- Read 2 sequencing primer stub
- SspI restriction site
- U6 promoter stub
- sgRNA hairpin stub
1. Oligo library QPCR to determine number of amplification cycles for Oligo Library PCR
Notes: Oligos with relatively low GC content amplify less efficiently than GC-rich sequences. NEBNext polymerase was found to be the least biased polymerase for amplifying the library. Increasing the elongation time to 1 min per cycle for all cloning and sequencing library prep PCRs eliminates GC-skewing of library sequences and reduces the rate of PCR recombination.
- Set up the following reaction:
[Reaction setup table: figure not reproduced]
67°C annealing temperature
- Check 246bp amplicon size on 2.5% agarose gel.
- Determine the point that signal amplification has plateaued.
2. Oligo Library PCR amplification
- Set up the following reaction:
[Reaction setup table: figure not reproduced]
67°C annealing temperature, 1 minute extension time.
Cycle number is half the number of cycles needed to reach the signal amplification plateau in the qPCR in step 1, reduced by 1 cycle to scale for DNA input.
- PCR purify amplified sequence.
3. Donor template amplification
- Set up the following reaction:
[Reaction setup table: figure not reproduced]
62°C annealing temperature
20 cycles
- Gel purify 167-bp band from 2.5% agarose gel.
4. Circular assembly and restriction digest linearization
Note: A 3:1 molar ratio of donor template to amplified oligo library was used. Raising the proportion of amplified oligo library increases crossover between library members, which causes protospacer and target sequences to be mismatched.
- Set up the following reaction:
[Reaction setup table: figure not reproduced]
50°C incubation for 1 hour.
- Exonuclease treatment
[Reaction setup table: figure not reproduced]
37°C incubation for 1 hour.
- PCR purify and elute in 50 ul.
- Digest to linearize library
[Reaction setup table: figure not reproduced]
37°C incubation for >3 hours.
- Gel purify 273-bp band from 2.5% agarose gel.
Note: Band is sometimes fuzzy and poorly visible. If not clearly discernible, proceed with gel isolation between 200-300bp.
5. Linearized library QPCR to determine number of amplification cycles for PCR amplification
- Set up the following reaction:
[Reaction setup table: figure not reproduced]
65°C annealing temperature
- Determine the point that signal amplification has plateaued.
6. Linearized Library PCR amplification
- Set up the following reaction:
[Reaction setup table: figure not reproduced]
65°C annealing temperature, 1 minute extension time.
Cycle number is the number of cycles needed to reach the signal amplification plateau in the qPCR in step 5, reduced by 4 cycles to scale for increased DNA input.
- Gel purify 375bp band from 2.5% agarose gel.
7. Vector backbone digest
- Set up the following reaction:
[Reaction setup table: figure not reproduced]
37°C incubation for >3 hours.
- Gel purify 5.9 kb band from 1% agarose gel.
8. Vector assembly and cleanup
Note: Include a ligation with water for insert as a control.
Set up the following reaction:
[Reaction setup table: figure not reproduced]
50°C incubation for 1 hour.
- Isopropanol precipitation
[Reaction setup table: figure not reproduced]
- Vortex and incubate at room temperature for 15 minutes.
- Spin down at >15,000 × g for 15 minutes, and carefully remove supernatant.
- Wash pellet with 300 ul 80% EtOH and repeat spin at >15,000 × g for 5 minutes.
- Carefully remove all liquid without disturbing the pellet, and let air dry for 1-3 minutes.
- Dissolve dried pellet in 10 ul H2O at 55°C for 10 minutes.
9. Transformation
Note: Electroporation-competent cells give a higher transformation efficiency than chemically competent cells. NEB 10-beta electrocompetent cells were used. Other strains can be substituted and transformed according to the manufacturer's instructions.
Note: DRM was used as recovery and culture medium to enhance yield. If you substitute a less rich medium such as LB, scale up the culture volume to obtain similar plasmid DNA quantities.
Note: Antibiotic-free recovery time should be limited to 15 minutes to prevent shedding of transformed plasmids from replicating bacteria.
Note: Also transform the water ligation as a control.
- Pre-warm 3.5 mL recovery medium per electroporation reaction at 37°C for 1 hour.
- Pre-warm LB-agar plates containing the appropriate antibiotic.
- Per reaction, add 1 ul purified vector assembly to 25 ul competent cells on ice. Perform 8 replicate reactions.
- Electroporate according to the manufacturer's instructions.
- Immediately add 100 ul pre-warmed recovery medium per cuvette and pool all replicates into a culture flask.
- Add 1 mL recovery medium per replicate reaction to the culture flask and shake at 200 rpm, 37°C for 10-15 minutes.
- Plate a dilution series from 1:10^4 to 1:10^6 on LB-agar plates containing antibiotic and grow overnight at 37°C.
- Add 2 mL medium per replicate reaction and admix the appropriate antibiotic.
- Grow overnight in a shaking incubator at 200 rpm, 37°C.
- Assess transformation efficiency from the serial dilution LB-agar plates. Expect ~10^6 clones.
The development of this cloning protocol was guided by work described in Videgal et al. 2015.
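The cycle-number rules in steps 2 and 6 and the expected library coverage come down to simple arithmetic. A minimal Python sketch follows. The function names are hypothetical, rounding down when halving an odd plateau cycle count is an assumption, and the plateau values in the example are illustrative, not measured.

```python
def oligo_library_pcr_cycles(qpcr_plateau_cycles: int) -> int:
    """Step 2: half of the qPCR plateau cycle count, minus 1 (rounding down is assumed)."""
    return qpcr_plateau_cycles // 2 - 1


def linearized_library_pcr_cycles(qpcr_plateau_cycles: int) -> int:
    """Step 6: the qPCR plateau cycle count, minus 4 for the increased DNA input."""
    return qpcr_plateau_cycles - 4


def clones_per_oligo(total_clones: float, library_size: int = 2000) -> float:
    """Average number of transformants per designed oligo, assuming uniform representation."""
    return total_clones / library_size


if __name__ == "__main__":
    print(oligo_library_pcr_cycles(20))       # 9  (illustrative plateau of 20 cycles)
    print(linearized_library_pcr_cycles(18))  # 14 (illustrative plateau of 18 cycles)
    print(clones_per_oligo(1e6))              # 500.0
```

For a 2000-oligo pool with ~10^6 clones (step 9), this works out to roughly 500 clones per designed sequence.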
Sequence alignment and data processing
For library data, each sequenced pair of gRNA fragment and target was associated with a set G of designed sequence contexts. G was built in two ways. First, it collected the designed sequence contexts of all gRNAs whose beginning perfectly matches the sequenced gRNA fragment (read 1 generally does not sequence the full gRNA). Second, it used locality-sensitive hashing (LSH) with 7-mers on the sequenced target to search for similar designed targets. The LSH score between a reference and a sequenced context reflects the number of 7-mers the two share. If the best LSH candidate scored more than 5 higher than the best LSH score among the candidates obtained from the gRNA fragment, the LSH candidate was also added to G. LSH was needed because of extensive PCR recombination between read 1 and read 2 (~33% rate), which appears in sequencing data as mismatched read 1/read 2 pairs. The sequenced target was aligned to each candidate in G, and the alignment with the highest number of matches was kept. Sequence alignment was performed using the Needleman-Wunsch algorithm with the parameters +1 match, -1 mismatch, -5 gap open, and 0 gap extend. For library data, starting gaps cost 0; for all other data, both starting and ending gaps cost 0. For VO data, sequence alignments were derived from SAM files from SRA.
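The candidate search and alignment scoring can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation. The 7-mer score is computed exactly as a set intersection rather than through a hashed LSH index, and the helper names are hypothetical. Gap penalties follow Gotoh's three-matrix recursion: gap open is charged on the first gapped position and gap extend on each further one, so with the parameters above every gap costs -5 regardless of length. Only the score is returned; traceback is omitted.

```python
NEG_INF = float("-inf")


def kmers(seq: str, k: int = 7) -> set:
    """All distinct k-mers of seq (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmer_score(ref: str, read: str, k: int = 7) -> int:
    """Exact count of shared k-mers; stands in for the LSH similarity score."""
    return len(kmers(ref, k) & kmers(read, k))


def candidate_set(read_target, grna_candidates, all_references, margin=5, k=7):
    """Set G: gRNA-matched references, plus the best k-mer hit if it wins by > margin."""
    G = set(grna_candidates)
    best_grna = max((shared_kmer_score(r, read_target, k) for r in G), default=0)
    best_ref = max(all_references, key=lambda r: shared_kmer_score(r, read_target, k))
    if shared_kmer_score(best_ref, read_target, k) > best_grna + margin:
        G.add(best_ref)
    return G


def nw_affine_score(a, b, match=1, mismatch=-1, gap_open=-5, gap_extend=0,
                    free_start=True, free_end=False):
    """Needleman-Wunsch score with affine gaps (Gotoh's three-matrix recursion)."""
    n, m = len(a), len(b)
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]  # a[i-1] aligned to b[j-1]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]  # a[i-1] aligned to a gap
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]  # b[j-1] aligned to a gap
    M[0][0] = 0.0
    for i in range(1, n + 1):  # leading gaps: free, or charged as one affine gap
        X[i][0] = 0.0 if free_start else gap_open + (i - 1) * gap_extend
    for j in range(1, m + 1):
        Y[0][j] = 0.0 if free_start else gap_open + (j - 1) * gap_extend
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            M[i][j] = s + max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            X[i][j] = max(M[i - 1][j] + gap_open, X[i - 1][j] + gap_extend,
                          Y[i - 1][j] + gap_open)
            Y[i][j] = max(M[i][j - 1] + gap_open, Y[i][j - 1] + gap_extend,
                          X[i][j - 1] + gap_open)

    def best(i, j):
        return max(M[i][j], X[i][j], Y[i][j])

    if free_end:  # trailing gaps cost 0: take the best cell in the last row or column
        return max(max(best(n, j) for j in range(m + 1)),
                   max(best(i, m) for i in range(n + 1)))
    return best(n, m)
```

For example, `nw_affine_score("AAAATTTTGGGG", "AAAAGGGG")` scores 8 matches and one 4-bp gap, giving 8 - 5 = 3.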
Alignments with low accuracy, or with short matching sections flanked by long (10 bp+) insertions and deletions, were filtered out as PCR recombination products (observed frequency of ~5%). These PCR recombination products differ from those occurring between read 1 and read 2 because they occur strictly within read 2. Alignments with low matching rates were also removed.
Deletions and insertions were shifted towards the expected cleavage site while preserving the total alignment score. CRISPR-associated DNA repair events were defined as follows: (1) any alignment with deletions or insertions occurring within a 4-bp window centered at the expected cut site, and (2) any alignment with both deletions and insertions (combination indel) occurring within a 10-bp window centered at the expected cut site. For each CRISPR-associated DNA repair event observed in control data, its control frequency was subtracted from its frequency in the treatment data, with a floor of 0.
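The control subtraction amounts to a per-outcome difference floored at zero. This minimal sketch assumes outcome frequencies are stored in dictionaries keyed by a hypothetical outcome identifier. The text does not say whether frequencies were renormalized afterwards, so no renormalization is applied here.

```python
def subtract_control(treatment: dict, control: dict) -> dict:
    """Subtract control frequencies from treatment frequencies, flooring each at 0.

    Keys are hypothetical outcome identifiers; outcomes absent from the
    control are left unchanged.
    """
    return {outcome: max(0.0, freq - control.get(outcome, 0.0))
            for outcome, freq in treatment.items()}
```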
Replicate experiments were carried out for library data in each cell type. For each cell type, sequence contexts failing the following data quality criteria were filtered out. Consider the two replicates with the highest read counts. The sequence context must have at least 1000 reads of CRISPR editing outcomes in both of them, and the two must have a Pearson correlation of at least 0.85 in the frequencies of microhomology-based deletion events. Microhomology-based deletions were used for this criterion because they are a major repair class with the highest average replicability across experiments. For disease library data in U2OS and HEK293T cells, a less stringent read-count threshold of 500 was used instead.
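The replicate filter can be sketched as below. Several details are assumptions. Each replicate is summarized as a (CRISPR-outcome read count, MH-deletion frequency vector) pair, with vectors aligned on the same genotypes. Replicates are ranked by that read count, and `passes_qc` is a hypothetical name. Pass `min_reads=500` for the disease library data in U2OS and HEK293T cells.

```python
import numpy as np


def passes_qc(replicates, min_reads=1000, min_pearson=0.85):
    """replicates: list of (crispr_read_count, mh_deletion_frequencies) per replicate."""
    if len(replicates) < 2:
        return False
    # Keep the two replicates with the highest read counts.
    (n1, f1), (n2, f2) = sorted(replicates, key=lambda r: r[0], reverse=True)[:2]
    if min(n1, n2) < min_reads:
        return False
    r = np.corrcoef(np.asarray(f1, dtype=float), np.asarray(f2, dtype=float))[0, 1]
    return bool(r >= min_pearson)  # a NaN correlation (constant vector) fails the check
```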
Details on alignment processing
All alignments with gaps were shifted as far as possible towards the cleavage site while preserving the overall alignment score. The following criteria were then used to sort alignments into four groups: noise; not noise but not CRISPR-associated (for example, wild-type); primary CRISPR activity; and secondary CRISPR activity. All data used in modeling and analysis derive solely from outcomes binned as primary CRISPR activity.
The following criteria were used to filter library alignments into “noise” categories.
Homopolymer: Entire read is homopolymer of a single nucleotide. Not considered a CRISPR repair product.
Has N: Read contains at least one N. Discarded as noise, not considered a CRISPR repair product.
PCR Recombination: Contains a recombination alignment signature, defined as follows. (1) A long indel (10 bp+), followed by a chance overlap, followed by a long indel (10 bp+) of the opposite type, i.e., insertion-randommatch-deletion or deletion-randommatch-insertion. Alternatively, if one of the two indels is 30 bp+, the other can be arbitrarily short. If either condition holds and the chance overlap is 5 bp or shorter (or of any length with less than an 80% match rate), the alignment satisfies the recombination signature. (2) If both indels are 30 bp+, the alignment satisfies the recombination signature regardless of the middle match region. (3) If the random match has length 0, the indels may be of any length. Not considered a CRISPR repair product.
Poor-Matches: 55bp designed sequence context has less than 5 bp representation (could occur from 50 bp+ deletions or severe recombination) or less than 80% match rate. Not considered a CRISPR repair product.
Cutsite-Not-Sequenced: The read does not contain the expected cleavage site.
Other: An alignment with multiple indels where at least one non-gap region has lower than an 80% match rate. Or generally, any alignment not matching any defined category above or below. In practice, can include near-homopolymers. Not considered a CRISPR repair product.
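The PCR recombination signature can be written as a predicate over an indel-match-indel triple. The sketch rests on these assumptions: lengths are in bp, `match_rate` is the fraction of matching positions in the intervening region, and the opposite-type requirement also applies to the zero-length case. The function name is hypothetical.

```python
def is_recombination_signature(indel1_type, indel1_len, match_len, match_rate,
                               indel2_type, indel2_len):
    """indel types are "ins" or "del"; the match region sits between the two indels."""
    if indel1_type == indel2_type:
        return False                      # the signature requires opposite indel types
    if match_len == 0:
        return True                       # no chance overlap: indels may be any length
    if indel1_len >= 30 and indel2_len >= 30:
        return True                       # both very long: middle region is irrelevant
    long_pair = ((indel1_len >= 10 and indel2_len >= 10)
                 or max(indel1_len, indel2_len) >= 30)
    poor_overlap = match_len <= 5 or match_rate < 0.80
    return long_pair and poor_overlap
```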
The following criteria were used to filter library alignments into “main” categories.
Wildtype: No indels in all of alignment. Not considered a CRISPR repair product.
Deletion: An alignment with only a single deletion event. Subdivided into:
Deletion - Not CRISPR: Single deletion occurring outside of a 2-bp window around the cleavage site. Not considered a CRISPR repair product.
Deletion - Not at cut: Single deletion occurring within a 2-bp window around the cleavage site, but not immediately at the cleavage site. Considered a CRISPR repair product.
Deletion: Single deletion occurring immediately at the cleavage site. Considered a CRISPR repair product.
Insertion: An alignment with only a single insertion event. Subdivided into:
Insertion - Not CRISPR: Single insertion occurs outside of 10 bp window around cleavage site. Not considered a CRISPR repair product.
Insertion - Not at cut: Single insertion occurring within 2 bp window around cleavage site, but not immediately at cleavage site. Considered a CRISPR repair product.
Insertion: Single insertion occurring immediately at cleavage site. Considered a CRISPR repair product.
Combination indel: An alignment with multiple indels where all non-gap regions have at least 80% match rate. Subdivided into:
Combination Indel: All indels are within a 10-bp window around the cleavage site. Considered a primary CRISPR repair product.
Forgiven Combination Indel: At least two indels, but not all, are within a 10 bp window around the cleavage site. Considered a rarer secondary CRISPR repair product, ignored.
Forgiven Single Indel: Exactly one indel is within a 10 bp window around the cleavage site. Considered a rarer secondary CRISPR repair product, ignored.
Combination Indel - Not CRISPR: No indels are within a 10 bp window around the cleavage site. Not considered a CRISPR repair product.
It is noted that deletion and insertion events, even those spanning many bases, are defined to occur at a single location between bases. As such, events occurring up to 5 bp away from the cleavage site are defined as events where there are five or fewer matched/mismatched alignment positions between the event and the cleavage site, irrespective of the number of gap dashes in the alignment.
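The distance convention and window rules above can be sketched as follows. Reference coordinates are assumed to be 0-based and half-open. The 10-bp window is taken as up to 5 positions on either side of the cut, consistent with the paragraph above. The 2-bp window for single deletions is taken as up to 2 positions away, matching the 4-bp centered window used earlier to define CRISPR-associated events. Single insertions are left out because their windows are stated differently. All names are hypothetical.

```python
def event_distance(start: int, end: int, cut: int) -> int:
    """Matched/mismatched reference positions between an indel and the cut site.

    A deletion removes ref[start:end]; an insertion sits between ref[start-1]
    and ref[start], so start == end. Gap characters are never counted.
    """
    if start <= cut <= end:
        return 0                          # event touches or spans the cut site
    return cut - end if end < cut else start - cut


def classify_single_deletion(start, end, cut, half_window=2):
    d = event_distance(start, end, cut)
    if d == 0:
        return "Deletion"
    if d <= half_window:
        return "Deletion - Not at cut"
    return "Deletion - Not CRISPR"


def classify_combination(distances, half_window=5):
    """distances: event_distance of each indel in a multi-indel alignment."""
    near = sum(d <= half_window for d in distances)
    if near == len(distances):
        return "Combination Indel"
    if near >= 2:
        return "Forgiven Combination Indel"
    if near == 1:
        return "Forgiven Single Indel"
    return "Combination Indel - Not CRISPR"
```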
Selection of variants from disease databases
Disease variants were selected from the NCBI ClinVar database (downloaded September 9, 2017)16 and the Human Gene Mutation Database (publicly available variant data from before 2014.3)17 for computational screening and subsequent experimental correction.
A total of 4,935 unique variants were selected from ClinVar submissions whose functional consequence is described as a complete insertion, deletion, or duplication, and whose reference or alternate allele is at most 30 nucleotides long. A variant was included if at least one submitting lab designated its clinical significance as ‘pathogenic’ or ‘likely pathogenic’ and no submitting lab designated it as ‘benign’ or ‘likely benign’; variants with all disease associations were included. More complex indels and somatic variants were included. A total of 18,083 unique insertion variants between 2 and 30 nucleotides in length were selected from HGMD. Variants with any disease association were included if they had the HGMD classification ‘DM’ (disease-causing mutation).
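The ClinVar inclusion rule can be expressed as a filter. This sketch assumes per-submitter clinical-significance strings. It also reads the length criterion as requiring both alleles to be at most 30 nucleotides. The function name is hypothetical.

```python
PATHOGENIC = {"pathogenic", "likely pathogenic"}
BENIGN = {"benign", "likely benign"}


def keep_clinvar_variant(ref: str, alt: str, significances, max_len: int = 30) -> bool:
    """True if the variant is short enough, called pathogenic by some lab, and benign by none."""
    if max(len(ref), len(alt)) > max_len:
        return False
    calls = {s.strip().lower() for s in significances}
    return bool(calls & PATHOGENIC) and not (calls & BENIGN)
```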
SpCas9 gRNAs and their cleavage sites were enumerated for each disease allele. Using a previous version of inDelphi, genotype frequencies and indel length distributions were predicted for each pair of disease variant and unique cleavage site. For each unique disease variant, the single best gRNA was the one inducing the highest predicted frequency of repair to the wild-type genotype. If this was impossible (for example, for a disease allele with a 2+ bp deletion), the single best gRNA was instead the one inducing the highest predicted frameshift repair rate. 1327 sequence contexts were designed in this manner for Lib-B. An additional 265 sequence contexts were designed by taking sequence contexts from any disease in decreasing order of predicted wild-type repair rate. ClinVar contexts were taken first until the predicted wild-type repair rate fell to 45%, and HGMD contexts were taken after that. This yielded 1592 total sequences derived from ClinVar and HGMD.
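The per-variant gRNA choice is a two-tier maximum. In this sketch, each candidate is a hypothetical tuple of three fields: the gRNA ID; the predicted wild-type repair frequency, or None when repair to wild type is impossible; and the predicted frameshift repair frequency.

```python
from typing import Optional, Sequence, Tuple

# (gRNA ID, predicted wild-type repair frequency or None, predicted frameshift repair frequency)
Prediction = Tuple[str, Optional[float], float]


def select_best_grna(preds: Sequence[Prediction]) -> str:
    with_wt = [p for p in preds if p[1] is not None]
    if with_wt:
        # Prefer the gRNA with the highest predicted repair to wild type.
        return max(with_wt, key=lambda p: p[1])[0]
    # Otherwise fall back to the highest predicted frameshift repair rate.
    return max(preds, key=lambda p: p[2])[0]
```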
Definition of Delta-Positions
Using the MMEJ mechanism, deletion events can be predicted at single-base resolution. For computational convenience, the tuple (deletion length, delta-position) was used as a unique identifier for deletion genotypes. A delta-position associated with a deletion length N is an integer between 0 and N inclusive (FIGs. 19A-19D). In a sequence alignment, a delta-position describes the starting position of the deletion gap in the read, measured relative to the cleavage site on the reference sequence. Consider a deletion length N and a cleavage site at position C such that seq[:C] and seq[C:] yield the expected DSB products. Here the slicing operation vector[index1:index2] includes the first index and excludes the second (Python-style). A delta-position of 0 corresponds to a deletion gap at seq[C-N+0 : C+0], and in general, with a delta-position of D, the deletion gap occurs at seq[C-N+D : C+D]. A microhomology can be described by multiple delta-positions, so microhomology-based deletion genotypes are uniquely identified by the single maximum delta-position in the redundant set. Each microhomology-less deletion genotype is associated with only a single (deletion length, delta-position) tuple, which is used as its unique identifier.
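Delta-position enumeration follows directly from the slicing definition. The sketch below (function name hypothetical) enumerates every delta-position for one deletion length and groups the delta-positions that produce the same repaired sequence. Each unique genotype is keyed by the maximum delta-position in its group. The group size minus one gives the microhomology length, which is 0 for MH-less genotypes.

```python
def deletion_genotypes(seq: str, cut: int, del_len: int) -> dict:
    """Map each unique deletion genotype of length del_len to its MH length.

    Keys are delta-positions (the maximum of each redundant set); values are
    the microhomology length, 0 for MH-less genotypes.
    """
    products = {}
    for delta in range(del_len + 1):
        start, end = cut - del_len + delta, cut + delta  # gap at seq[C-N+D : C+D]
        if start < 0 or end > len(seq):
            continue                                       # deletion would leave the sequence
        product = seq[:start] + seq[end:]
        products.setdefault(product, []).append(delta)
    return {max(deltas): len(deltas) - 1 for deltas in products.values()}


if __name__ == "__main__":
    print(deletion_genotypes("GGACTTACGG", cut=6, del_len=4))  # {2: 2, 3: 0, 4: 0}
```

For the toy sequence GGACTT|ACGG and a 4-bp deletion, delta-positions 0, 1, and 2 all yield GGACGG (a 2-bp "AC" microhomology), so that genotype is identified by delta-position 2.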
Another way to define delta-positions is motivated by the example workflow shown above for MH deletions, which describes how each microhomology is associated with a deletion genotype. In that workflow, the delta-position is the number of bases included on the top strand before “jumping down” to the bottom strand.
MH-less medial end-joining products correspond to all MH-less genotypes with delta-position between 1 and N-1, where N is the deletion length. MH-less unilateral end-joining products correspond to MH-less genotypes with delta-position 0 or N. Note that a deletion genotype with delta-position N is not necessarily a microhomology-less unilateral end-joining product, since it may contain microhomology: delta-positions N-j, N-j+1, ..., N may all correspond to the same MH deletion.
Definition of Precision Score
For a distribution X, where |X| denotes its cardinality (or its length when represented as a vector):
$$\mathrm{PrecisionScore}(X) = 1 - \frac{-\sum_{i} p(x_i)\,\log p(x_i)}{\log |X|}$$
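The precision score is one minus the Shannon entropy normalized by its maximum, log|X|. Below is a NumPy sketch; returning 1.0 for a single-outcome distribution, where log|X| = 0, is an assumption.

```python
import numpy as np


def precision_score(freqs) -> float:
    """1 - normalized Shannon entropy of a (possibly unnormalized) frequency vector."""
    p = np.asarray(freqs, dtype=float)
    if p.size < 2:
        return 1.0            # single outcome: treated as maximally precise (assumption)
    p = p / p.sum()
    nz = p[p > 0]             # 0 * log 0 is taken as 0
    entropy = -np.sum(nz * np.log(nz))
    return float(1.0 - entropy / np.log(p.size))
```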
This precision score ranges from zero (minimally precise, or highest entropy) to one (maximally precise, or lowest entropy).
inDelphi Deletion Modeling: Neural network input and output
inDelphi receives as input a sequence context and a cleavage site location, and outputs two objects: a frequency distribution on deletion genotypes, and a frequency distribution on deletion lengths.
To model deletions, inDelphi trains two neural networks: MH-NN and MHless-NN. MH-NN receives as input a microhomology described by two features: the microhomology length and the GC fraction of the microhomology. From these features, MH-NN outputs a number (psi). MHless-NN receives the deletion length as input and from it outputs a number (psi). A phi score is obtained from a psi score using phi_i = exp(psi_i - 0.25*deletion_length). The constant 0.25 is a “redundant” hyperparameter that affects only training speed, through helpful scaling. This relationship between psi and phi is differentiable. It encodes the assumption that an event's frequency increases exponentially with the neural network output psi (which empirically appears to reflect MH strength) and decreases exponentially with the event's minimum necessary resection length (the deletion length).
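Here is a minimal NumPy sketch of the psi-to-phi transform, paired with a network of the shape given in the architecture paragraph. Weights and biases are assumed to be drawn from a standard normal scaled by the initial weight scaling factor of 0.10 reported in this section, and the output layer is linear. All function names are hypothetical. The original implementation computes gradients with the autograd library, which is not reproduced here.

```python
import numpy as np


def init_mlp(rng, input_dim, hidden=(16, 16), scale=0.10):
    """Gaussian-initialized weights for input_dim -> 16 -> 16 -> 1."""
    sizes = (input_dim, *hidden, 1)
    return [(scale * rng.standard_normal((m, n)), scale * rng.standard_normal(n))
            for m, n in zip(sizes[:-1], sizes[1:])]


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def mlp_psi(params, x):
    """x: (n_examples, input_dim) -> psi: (n_examples,)."""
    h = np.asarray(x, dtype=float)
    for layer, (W, b) in enumerate(params):
        h = h @ W + b
        if layer < len(params) - 1:
            h = sigmoid(h)    # sigmoidal hidden layers; the output layer stays linear
    return h[:, 0]


def phi_from_psi(psi, del_len, scale=0.25):
    """phi = exp(psi - 0.25 * deletion_length)."""
    return np.exp(psi - scale * np.asarray(del_len, dtype=float))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mh_nn = init_mlp(rng, input_dim=2)            # features: (MH length, GC fraction)
    features = np.array([[2.0, 0.5], [3.0, 1 / 3]])  # illustrative microhomologies
    print(phi_from_psi(mlp_psi(mh_nn, features), del_len=[4, 7]))
```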
The MH-NN and MHless-NN networks share the architecture input-dimension -> 16 -> 16 -> 1, i.e., two fully connected hidden layers. Sigmoidal activations are used in all layers except the output layer. All neural network parameters are initialized with Gaussian noise centered at 0.
inDelphi Deletion Modeling: Making predictions
Given a sequence context and cleavage site, inDelphi enumerates all unique deletion genotypes as a tuple of its deletion length and its delta-position for deletion lengths from 1 bp to 60 bp. For each microhomology enumerated, an MH-phi score is obtained using MH-NN. In addition, for each deletion length from 1 bp to 60 bp, an MHindep-phi score is obtained using MHless-NN.
For a given sequence context, inDelphi combines all MH-phi and MHindep-phi scores into two objects, a frequency distribution on deletion genotypes and a frequency distribution on deletion lengths, and both are compared to observations for training. The model is designed to output two separate objects because both are of biological interest, and separate but intertwined modeling approaches are useful for generating them. By learning to generate both objects, inDelphi jointly learns about microhomology-based and microhomology-less deletion repair.
To generate a frequency distribution on deletion genotypes, inDelphi assigns a score to each microhomology. Score assignment treats “full” and not-full microhomologies differently.
A microhomology is “full” if its length equals its deletion length. Full microhomologies matter biologically because only a single deletion genotype is possible for the entire deletion length, whereas a single deletion length is generally consistent with multiple genotypes. In addition, this single genotype can be generated both through the MH-dependent MMEJ mechanism and through MH-less end-joining, for example as mediated by Lig4. Full microhomologies were therefore modeled as receiving contributions from both MH-containing and MH-less mechanisms, and were scored as MH-phi[i] + MHindep-phi[j] for deletion length j and microhomology index i. Microhomologies that are not full are assigned a score of MH-phi[i] for MH index i.
Scores for all deletion genotypes assigned this way are normalized to sum to 1 to produce a predicted frequency distribution on deletion genotypes.
To generate a frequency distribution on deletion lengths, inDelphi assigns a score for each deletion length. Score assignment integrates contributions from both MH-dependent and MH-independent mechanisms via the following procedure: For each deletion length j, its score is assigned as MHindep-phi[j] plus the sum of MH-phi for each microhomology with that deletion length. Scores for all deletion lengths are normalized to sum to 1 to produce a frequency distribution.
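Following the scoring rules above, the two predicted objects can be assembled as below. This sketch is not the authors' code. Each microhomology is a hypothetical (deletion length, delta-position, MH length, MH-phi) tuple, and MH-less phi scores are keyed by deletion length. As the paragraphs above describe, only microhomologies are scored in the genotype distribution.

```python
from collections import defaultdict


def deletion_distributions(mh_entries, mhless_phi):
    """mh_entries: list of (del_len, delta, mh_len, mh_phi); mhless_phi: {del_len: phi}.

    Returns (genotype frequencies keyed by (del_len, delta),
             deletion-length frequencies keyed by del_len).
    """
    genotype_scores = {}
    length_scores = defaultdict(float, mhless_phi)  # start each length at MHindep-phi[j]
    for del_len, delta, mh_len, phi in mh_entries:
        full = mh_len == del_len
        # Full microhomologies also receive the MH-less contribution for that length.
        genotype_scores[(del_len, delta)] = phi + (mhless_phi.get(del_len, 0.0) if full else 0.0)
        length_scores[del_len] += phi
    g_total = sum(genotype_scores.values())
    l_total = sum(length_scores.values())
    return ({g: s / g_total for g, s in genotype_scores.items()},
            {j: s / l_total for j, s in length_scores.items()})
```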
For each sequence context, inDelphi produces a predicted frequency distribution on deletion genotypes and one on deletion lengths. Its parameters are trained to minimize the negative of the sum of the squared Pearson correlations between each predicted object and its observed counterpart.
In practice, deletion genotype frequency distributions are formed from observations for deletion lengths 1-60, and deletion length frequency distributions from observations for deletion lengths 1-28. Both neural networks are trained simultaneously on both tasks. inDelphi is trained with stochastic gradient descent on batched training sets and is implemented in Python using the autograd library. A batch size of 200, an initial weight scaling factor of 0.10, an initial step size of 0.10, and an exponential decay factor for the step size of 0.999 per step were used.
inDelphi Deletion Modeling: Summary and Revisiting Assumptions
In summary, inDelphi trains MH-NN, which takes (microhomology length, microhomology GC content) as input and outputs a psi score; the psi score is translated into a phi score using the deletion length. This phi score represents the “strength” of the microhomology corresponding to a particular MH deletion genotype. inDelphi also trains MHless-NN, which takes the deletion length as input and directly outputs a phi score representing the “total strength” of all MH-independent activity at that deletion length. The model assumes that microhomology and microhomology-less repair can both contribute to a single repair genotype, but makes this assumption conservatively: their contributions overlap only when there is no alternative. Specifically, at a deletion length with full microhomology, the model assumes that the two must overlap. At a deletion length without full microhomology, MHindep-phi represents all MH-less repair genotypes and none of the MH-dependent ones, which are represented solely by their MH-phi scores. To see this, note that at a deletion length j without full microhomology, MH genotypes are scored using their MH-phi scores, while the length j is scored as MHindep-phi[j] plus the sum of MH-phi over each microhomology. The subset of MH-less genotypes at this deletion length therefore has a score of MHindep-phi[j].
When this subset contains only one MH-less genotype, that genotype’s score equals MHindep-phi[j]. In general, multiple MH-less genotypes are possible, in which case the total score of all MH-less genotypes equals MHindep-phi[j].
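The following minimal Python sketch illustrates the score bookkeeping described above for deletion lengths without full microhomology. It aggregates phi scores per deletion length. The psi-to-phi translation is an assumption: phi = exp(psi - beta * deletion length), with an illustrative `beta = 0.25` standing in for the exponential penalty on deletion length. All function names and input values are hypothetical. Subtracting the MH-phi scores at length j from that length's total recovers MHindep-phi[j], which is the combined score of the MH-less genotypes.

```python
import math


def mh_phi(psi, del_len, beta=0.25):
    """Translate an MH-NN psi score into a phi score.

    Assumption: phi = exp(psi - beta * del_len); beta is illustrative only.
    """
    return math.exp(psi - beta * del_len)


def deletion_length_scores(mh_genotypes, mhless_phi):
    """Total score per deletion length j (lengths without full microhomology).

    mh_genotypes: list of (del_len, psi) pairs, one per MH deletion genotype.
    mhless_phi:   dict mapping del_len -> MHindep-phi[del_len] from MHless-NN.
    Returns a dict j -> MHindep-phi[j] + sum of MH-phi over MH genotypes at j.
    """
    scores = dict(mhless_phi)  # every length starts with its MH-independent strength
    for del_len, psi in mh_genotypes:
        scores[del_len] = scores.get(del_len, 0.0) + mh_phi(psi, del_len)
    return scores


def mhless_subset_score(length_scores, mh_genotypes, j):
    """Combined score of the MH-less genotypes at length j (equals MHindep-phi[j])."""
    mh_total = sum(mh_phi(psi, d) for d, psi in mh_genotypes if d == j)
    return length_scores[j] - mh_total


mh = [(3, 1.0), (3, 0.5), (5, 2.0)]   # hypothetical MH genotypes (length, psi)
mhless = {3: 0.2, 5: 0.1, 7: 0.05}    # hypothetical MHindep-phi values
scores = deletion_length_scores(mh, mhless)
print(round(mhless_subset_score(scores, mh, 3), 6))  # 0.2
```

The sketch does not handle lengths with full microhomology. At those lengths the single resulting genotype would absorb both its MH-phi and MHindep-phi[j].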
The relative frequency of MH deletions and MH-less deletions is learned implicitly through the balance between the sum of all MH-phi scores and MHindep-phi. Because MHindep-phi does not vary with sequence context while MH-phi does, the model assumes that variation in the fraction of deletions that use MH can be explained, at least partially, by varying sequence microhomology as represented by MH-NN.
inDelphi Insertion Modeling
Once inDelphi is trained on both deletion tasks, it predicts insertions from a sequence context and cleavage site. It does so using the precision score of the predicted deletion length distribution and the total deletion phi (the sum of all MH-phi and MHindep-phi scores). inDelphi also uses one-hot-encoded binary vectors encoding nucleotides -4 and -3. In a training set, these features are collected and normalized to zero mean and unit variance. The fraction of 1-bp insertions out of the summed counts of 1-bp insertions and all deletions is tabulated as the prediction target. A k-nearest-neighbor model is built from the training data; inDelphi uses the default parameter k = 5.
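The following is a minimal k-nearest-neighbor sketch of the insertion-rate model just described, written with NumPy only. The feature layout follows the text: precision score, total deletion phi, and one-hot nucleotides -4 and -3. Several choices are assumptions because the text does not specify them: the Euclidean distance, the unweighted neighbor average, the `standardize` switch, and the class and function names. The precision score is taken as an input here, not computed.

```python
import numpy as np

BASES = "ACGT"


def one_hot(base):
    """One-hot vector for a single nucleotide."""
    vec = np.zeros(len(BASES))
    vec[BASES.index(base)] = 1.0
    return vec


def featurize(precision, total_del_phi, nt_minus4, nt_minus3):
    """Features: precision, total deletion phi, one-hot -4 and -3 nucleotides."""
    return np.concatenate(([precision, total_del_phi], one_hot(nt_minus4), one_hot(nt_minus3)))


class KNNInsertionRate:
    """k-NN regressor for the fraction of 1-bp insertions among 1-bp insertions + deletions."""

    def __init__(self, k=5, standardize=True):
        self.k = k
        self.standardize = standardize

    def _transform(self, X):
        return (X - self.mean_) / self.std_

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        if self.standardize:
            self.mean_ = X.mean(axis=0)
            std = X.std(axis=0)
            self.std_ = np.where(std > 0, std, 1.0)  # guard constant features
        else:
            self.mean_, self.std_ = np.zeros(X.shape[1]), np.ones(X.shape[1])
        self.X_ = self._transform(X)
        self.y_ = np.asarray(y, dtype=float)
        return self

    def predict(self, X):
        Z = self._transform(np.atleast_2d(np.asarray(X, dtype=float)))
        # Euclidean distance from each query row to each training row
        dists = np.linalg.norm(Z[:, None, :] - self.X_[None, :, :], axis=2)
        nearest = np.argsort(dists, axis=1)[:, : self.k]
        return self.y_[nearest].mean(axis=1)


rng = np.random.default_rng(0)
X_train = [featurize(rng.random(), 10 * rng.random(), BASES[rng.integers(4)], BASES[rng.integers(4)])
           for _ in range(50)]
y_train = 0.3 * rng.random(50)  # hypothetical insertion fractions
model = KNNInsertionRate(k=5).fit(X_train, y_train)
print(model.predict(X_train[0]))
```

With `standardize=False`, the sketch skips feature normalization, which matches the VO experiments described later in this section.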
On test data, the above procedure is used to predict the frequency of 1-bp insertions out of 1-bp insertions and all deletions for a particular sequence context. This predicted frequency is then divided among the 4 possible insertion genotypes. Each inserted base receives a share equal to its average insertion frequency in the training set given the local sequence context. When the training set is small, only the -4 nucleotide is used as context. When the training set is relatively large, nucleotides -5, -4, and -3 are used.
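Below is a minimal sketch of the insertion genotype step, assuming each training site provides its observed fractions of inserted A, C, G, and T. It averages those fractions per local context and scales them by the predicted 1-bp insertion frequency. The context key can be the -4 nucleotide or a (-5, -4, -3) tuple. The uniform fallback for contexts unseen in training is an assumption, and the function names are hypothetical.

```python
from collections import defaultdict

import numpy as np

BASES = "ACGT"


def fit_insertion_genotype_model(contexts, base_fractions):
    """Average, per local context, the fraction of 1-bp insertions inserting A, C, G, T.

    contexts:       context key per training site, e.g. the -4 nucleotide ("A"),
                    or a (-5, -4, -3) tuple when the training set is large.
    base_fractions: length-4 arrays (A, C, G, T) of observed insertion fractions per site.
    """
    sums = defaultdict(lambda: np.zeros(len(BASES)))
    counts = defaultdict(int)
    for ctx, frac in zip(contexts, base_fractions):
        sums[ctx] += np.asarray(frac, dtype=float)
        counts[ctx] += 1
    return {ctx: sums[ctx] / counts[ctx] for ctx in sums}


def predict_insertion_genotypes(model, ctx, insertion_freq):
    """Split a predicted 1-bp insertion frequency across the four inserted bases."""
    # Assumption: fall back to a uniform split for contexts unseen in training.
    probs = model.get(ctx, np.full(len(BASES), 1.0 / len(BASES)))
    probs = probs / probs.sum()
    return {base: insertion_freq * p for base, p in zip(BASES, probs)}


model = fit_insertion_genotype_model(
    ["A", "A", "C"],
    [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1]],
)
print(predict_insertion_genotypes(model, "A", 0.2))
```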
To produce a frequency distribution over 1-bp insertions and 1-60-bp deletion genotypes, scores for all deletion genotypes and all 1-bp insertions are normalized to sum to 1. To produce a frequency distribution over indel lengths (+1 to -60), scores for all deletion lengths and 1-bp insertions are normalized to sum to 1.
inDelphi: Repair classes predicted at varying resolution
inDelphi predicts MH deletions and 1-bp insertions at single-base resolution. Performance on the task of genotype frequency prediction is therefore measured on this subset of repair outcomes only (about 60-70% of all outcomes).
inDelphi predicts MH-less deletions at the resolution of deletion length. That is, inDelphi predicts a single frequency corresponding to the total frequency of all unique MH-less deletion genotypes possible for a particular deletion length. This modeling choice was made because genotype frequency replicability is substantially lower among MH-less deletions than among MH deletions.
Performance on the task of indel length frequency prediction is measured on MH deletions, MH-less deletions, and 1-bp insertions (90% of all outcomes).
In practice, end-users who need genotype-level detail can extend inDelphi's predictions to specific MH-less deletion genotypes. To do so, they can use the fact that MH-less deletions are distributed uniformly among 0-delta-position genotypes, medial genotypes, and N-delta-position genotypes.
Comparison with a linear baseline model
inDelphi was compared, on Lib-A mESC data, to a baseline model with the same structure but with the deep neural networks replaced by linear models. inDelphi achieves mean held-out Pearson correlations of 0.851 on deletion genotype frequency prediction and 0.837 on deletion length frequency prediction. The linear baseline model achieves 0.816 and 0.796 on the same two tasks. The third model component handles 1-bp insertions. With it included, and with testing on 1-bp insertions and all deletions, inDelphi achieves median held-out Pearson correlations of 0.937 on genotype frequency prediction and 0.910 on indel length frequency prediction. The linear baseline model achieves median held-out Pearson correlations of 0.919 and 0.900 on these two tasks, respectively.
These results indicate that much of the model’s power derives from its designed structure, independent of the choice between linear and nonlinear modeling. The linear baseline does not severely degrade performance, but the deep nonlinear neural networks offer a substantial improvement (10-24%) over linear modeling. In addition, the strong performance of the linear baseline model shows that, given the model structure, the prediction task is relatively straightforward. This suggests that the model should generalize well to unseen data.
The deep neural network version of MH-NN learns that microhomology length is more important than % GC (FIGs. 18A-18H). The linear version learns the same concept, with a weight of 1.1585 for MH length and 0.332 for % GC.
Comparison with a baseline model lacking microhomology length as a feature
Microhomology length is an important feature for MH-NN (FIGs. 18A-18H). To test its contribution, a model was trained that uses only % GC as input to MH-NN while keeping the rest of the model structure identical. On held-out data, this baseline model at convergence achieves a mean Pearson correlation of 0.59 on predicting deletion genotype frequencies and 0.58 on predicting deletion length frequencies. Notably, a model at iteration 0 with randomly initialized weights achieves mean Pearson correlations of 0.55 and 0.54 on the same two tasks on held-out data. This basal Pearson correlation is relatively high because of the model structure, in particular the exponential penalty on deletion length. In sum, removing MH length as a feature severely impairs model performance. The resulting predictions are not appreciably better than those of the untrained, randomly initialized model.
inDelphi training and testing on data from varying cell-types
To predict genotype and indel length frequencies in any particular cell type C for which data D are available, inDelphi’s deletion component was first trained on a subset of Lib-A mESC data. Then, k-fold cross-validation was applied to D, with D iteratively split into training and test datasets. In each cross-validation iteration, the training set is used to train two models. The first is the insertion frequency model (k-nearest neighbors). The second is the insertion genotype model, a matrix of observed probabilities of each inserted base given the local sequence context. That context is just the -4 nucleotide when the training dataset is small, and nucleotides -5, -4, and -3 when it is large. In each cross-validation iteration, predictions are made for each sequence context in the test set and compared with the observations for that context to yield a Pearson correlation. For any particular sequence context, the median test-time Pearson correlation across all cross-validation iterations serves as a single-number summary of inDelphi's overall performance. For all reported results, 100-fold cross-validation was used with 80%/20% training and testing splits. Empirically, test-time Pearson correlations showed small variance, highlighting the stability of inDelphi’s modeling approach.
inDelphi testing on endogenous VO data
On this task, the deletion component of inDelphi was trained on a subset of the Lib-A mESC data. For each of the HCT116, K562, and HEK293T cell types, all VO sequence contexts (about 100) were randomly split into training and test datasets 100 times. In each split, the training set was used for k-nearest-neighbor modeling of 1-bp insertion frequencies. Features were not normalized to zero mean and unit variance. The average frequency of each 1-bp insertion genotype was also derived from the training set. For each of the ~100 sequence contexts, the median test-time Pearson correlation was used for plotting in FIGs. 13A-13D. Because the training set was small, only the -4 nucleotide was used to model both the insertion frequency and the insertion genotype frequencies.
inDelphi testing on library data
On this task, the deletion component of inDelphi was trained on a subset of the Lib-A mESC data. The remaining test set was used to measure test-time prediction performance on Lib-A. Nucleotides -5, -4, and -3 were used for the insertion genotype model. For testing on Lib-B, Lib-B was split into training and test datasets in the same manner as the VO data, and nucleotide -4 was used for the insertion genotype model. The median test-time Pearson correlation serves as a single-number summary of inDelphi's overall performance on any particular sequence context. For reporting predictive results in FIGs. 15A-15F, sequence contexts with low replicability in observed editing outcome frequencies (Pearson correlation below 0.85) were first removed.
inDelphi training and testing on Prkdc-/- Lig4-/- data
inDelphi was trained on data from 946 Lib-A sequence contexts and tested on 168 held-out Lib-A sequence contexts. Nucleotide -4 was used for insertion rate modeling; all other modeling choices were as described above. On held-out data, this version of inDelphi achieved a median Pearson correlation of 0.84 on predicting indel genotype frequencies and 0.80 on predicting indel length frequencies.
Training the online public version of inDelphi and its expected properties
For general use on arbitrary cell types, a version of inDelphi was trained using additional data from diverse cell types. Deletion modeling was trained on data from 2,464 sequence contexts. These came from high-replicability Lib-A and Lib-B data in mESCs (including clinical variants and microduplications, “fourbp”, and “longdup” contexts) and from VO sequence contexts in HEK293 and K562 cells. Insertion frequency modeling is implemented as above. Insertion genotype modeling uses nucleotides -5, -4, and -3. The insertion frequency model and the insertion genotype model are trained on the following data:
- VO endogenous data in K562 and HEK293T cells;
- Lib-A data in mESCs;
- Lib-B data (including clinical variants and microduplications, “fourbp”, and “longdup” contexts) in mESCs and U2OS cells.
Though MHless-NN, as trained on library data, never receives information on deletion lengths beyond 28, it was allowed to generalize its learned function and make predictions on deletion lengths up to 60 bp to match the supported range of MH-NN.
inDelphi makes predictions on 1-bp insertions and 1-60-bp deletions, which were empirically shown to account for more than 90% of all Cas9 editing outcomes in data from multiple human and mouse cell lines. Nevertheless, there is a subset of repair outcomes (about 8% on average) that inDelphi does not attempt to predict. End-users should take this into account, depending on which predicted quantities are of interest. For example, suppose inDelphi predicts that 60% of 1-bp insertions and 1-60-bp deletions at a disease allele correspond to repair to the wildtype genotype. A quantity of interest may then be the rate of wildtype repair among all Cas9 editing outcomes, including the 8% not predicted by inDelphi. In such a situation, this quantity can be calculated as 92% × 60% = 55.2%.
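The rescaling in the worked example above can be written as a small helper. The default coverage of 0.92 is the average figure quoted in the text. The function name is hypothetical. The helper assumes, as the example implicitly does, that the unmodeled outcomes contribute none of the outcome of interest, so the result reads as a conservative estimate.

```python
def fraction_of_all_outcomes(predicted_fraction, coverage=0.92):
    """Convert a fraction predicted among 1-bp insertions and 1-60-bp deletions
    into a fraction of all Cas9 editing outcomes.

    coverage: share of all outcomes that inDelphi models (about 92% on average).
    Assumption: the unmodeled ~8% contributes none of the outcome of interest.
    """
    if not 0.0 <= predicted_fraction <= 1.0:
        raise ValueError("predicted_fraction must be in [0, 1]")
    return predicted_fraction * coverage


print(round(fraction_of_all_outcomes(0.60), 4))  # 0.552
```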
Because of how the 1872 Lib-A sequence contexts were designed, the training dataset has rich and uniform representation across all quintiles of several major axes of variation. These axes include GC content, precision, and the number of bases participating in microhomology, as measured empirically in the human genome. This design strategy enables inDelphi to generalize well to arbitrary sequence contexts from the human genome.
The training data also include examples in the outlier range of statistics of interest. These include extremely high- and low-precision repair distributions, and extremely weak and strong microhomology (minimal microhomology and extensive microduplication microhomology sequences). Having such sequences in the training data enables inDelphi to generalize well to sequence contexts of clinical interest and to sequence contexts that support unusually high frequencies of precise repair. In particular, because it was trained on more than 1000 examples of repair at clinical microduplications, inDelphi is well prepared to predict repair accurately at other clinical microduplications.
Training on data from many cell types enables inDelphi to make predictions that are generally applicable to many human cell types. Notably, the HCT116 human colon cancer cell line shows a markedly higher frequency of single-base insertions than all other cell lines studied. This is possibly because its MLH1 deficiency impairs DNA mismatch repair. For this reason, HCT116 data were excluded from the training dataset. For best results, end-users should keep in mind that repair class frequencies can be cell type-dependent, an issue that has not yet been well characterized.
inDelphi’s main error tendency is to overestimate rather than underestimate the precision of repair (FIGs. 14A-14F, FIGs. 15A-15F). This tendency can generally be explained by the fact that inDelphi considers only sequence microhomology. In biological experiments, even sequence contexts with very strong microhomology may fail to yield precise outcomes because of noise factors that inDelphi does not consider. For best results, end-users should take this tendency into account when using inDelphi predictions to plan further experiments. In particular, if gRNAs are selected using a minimum precision threshold, the observed repair outcomes may have empirical precision below that threshold. Conversely, it is unlikely that a gRNA will have precision higher than inDelphi predicts.
Lib-A design (see Table 4)
All designed sequence contexts were 55 bp in length with cutting between the 27th and 28th base.
1872 sequence contexts were designed by empirically determining the distribution of four statistics in sequence contexts from the human genome:
- GC content;
- the total number of bases participating in microhomology for 3-27-bp deletions;
- the Azimuth predicted on-target efficiency score;
- the statistical entropy of the predicted 3-27-bp deletion length distribution from a previous version of inDelphi.

For each of these statistics, empirical quintiles were derived by calculating the statistic in a large number of sequence contexts from the human genome. For the library, sequence contexts were designed by randomly generating DNA sequences and assigning each to a combination of quintiles across the four statistics. For example, random search was used to find a sequence context falling into the 1st quintile of GC content, the 2nd quintile of total MH, the 1st quintile of Azimuth score, and the 5th quintile of entropy. With four statistics and five bins each (quintiles), there are 5^4 = 625 possible combinations. Three sequence contexts were attempted for each combination, for a total of 1875. Every bin was filled, but 3 sequences could not be designed, giving a total of 1872. 90 sequence contexts were designed from VO sequence contexts. Other sequence contexts were also designed, for a total of 2000 sequence contexts in Lib-A. Lib-A sequence names, gRNAs, and sequence contexts are listed in Table 4 (appended, forming part of the instant specification).
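The quintile-stratified random search can be sketched as follows, under several assumptions. Only GC content is implemented; the other statistics (total MH bases, Azimuth score, predicted entropy) would be supplied by the caller as functions. `purine_fraction` is a hypothetical stand-in, and random reference sequences stand in for genomic contexts. The sampler also differs from the described procedure in one respect: each random sequence is accepted into whichever combination it lands in, instead of being searched for one target combination at a time.

```python
import itertools

import numpy as np

NUCLEOTIDES = np.array(list("ACGT"))


def random_dna(rng, length):
    """Uniformly random DNA string."""
    return "".join(rng.choice(NUCLEOTIDES, size=length))


def gc_content(seq):
    return (seq.count("G") + seq.count("C")) / len(seq)


def quintile_edges(values):
    """Interior cut points (20th, 40th, 60th, 80th percentiles) of a statistic."""
    return np.quantile(values, [0.2, 0.4, 0.6, 0.8])


def quintile_bin(value, edges):
    """0-based quintile index (0..4) of a value given the cut points."""
    return int(np.searchsorted(edges, value, side="right"))


def design_stratified_library(stat_fns, reference_seqs, per_combo=3, length=55,
                              max_tries=100_000, seed=0):
    """Random search that tries to fill every combination of quintile bins.

    Returns {combo: [sequences]}; combos still short after max_tries stay under-filled.
    """
    rng = np.random.default_rng(seed)
    edges = [quintile_edges([fn(s) for s in reference_seqs]) for fn in stat_fns]
    designs = {c: [] for c in itertools.product(range(5), repeat=len(stat_fns))}
    for _ in range(max_tries):
        seq = random_dna(rng, length)
        combo = tuple(quintile_bin(fn(seq), e) for fn, e in zip(stat_fns, edges))
        if len(designs[combo]) < per_combo:
            designs[combo].append(seq)
        if all(len(v) >= per_combo for v in designs.values()):
            break
    return designs


def purine_fraction(seq):  # hypothetical stand-in for a second design statistic
    return (seq.count("A") + seq.count("G")) / len(seq)


ref_rng = np.random.default_rng(1)
reference = [random_dna(ref_rng, 55) for _ in range(2000)]
lib = design_stratified_library([gc_content, purine_fraction], reference, max_tries=20_000)
print(sum(len(v) for v in lib.values()), "of", 3 * len(lib), "slots filled")
```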
Lib-B design (see Table 5)
All designed sequence contexts were 55 bp in length with cutting between the 27th and 28th base.
1592 sequence contexts were designed from ClinVar and HGMD (see the section on Selection of variants from disease databases). Some disease sequence contexts were designed such that the corrected wildtype or frameshift allele supports further cutting by the original gRNA; data from such sequence contexts were ignored during analysis.

57 “longdup” sequence contexts were designed by repeating the following procedure three times. For N = 7 to 25, an N-mer was randomly generated, duplicated, and surrounded by randomly generated sequence, while ensuring that an SpCas9 NGG PAM was included and appropriately positioned for cutting between positions 27 and 28.

90 sequence contexts were designed from VO sequence contexts.

228 “fourbp” sequence contexts were designed from 3 contexts with random sequences (with total phi score on average lower than that of VO sequence contexts) by varying positions -5 to -2. For each of the 3 “low-microhomology” contexts, 76 four-base sequences were randomly designed. The design ensured representation of all possible 2-bp microhomology patterns: no microhomology, one base of microhomology at either position, and two full bases of microhomology.

Other sequence contexts were also designed, for a total of 2000 sequence contexts in Lib-B. Lib-B sequence names, gRNAs, and sequence contexts are listed in Table 5.
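The “longdup” procedure can be sketched as rejection sampling. The sketch relies on standard SpCas9 geometry: the enzyme cuts 3 bp upstream of the PAM, so a cut between 1-based positions 27 and 28 places the NGG PAM at positions 31-33. Where the duplicated copies sit is not stated in the text. The sketch assumes the two copies abut exactly at the cut site. Function and constant names are hypothetical.

```python
import numpy as np

NUCLEOTIDES = np.array(list("ACGT"))
SEQ_LEN = 55
CUT = 27                 # cut falls between 1-based positions 27 and 28
PAM_GG = slice(31, 33)   # 0-based indices of "GG" in an NGG PAM at 1-based positions 31-33


def random_dna(rng, length):
    return "".join(rng.choice(NUCLEOTIDES, size=length))


def design_longdup(n, rng, max_tries=10_000):
    """One 55-bp context holding a tandem duplication of a random n-mer.

    Assumption: the two copies abut exactly at the cut site; random sequences
    are resampled until an NGG PAM sits 3 bp downstream of the cut.
    """
    for _ in range(max_tries):
        kmer = random_dna(rng, n)
        left = random_dna(rng, CUT - n)
        right = random_dna(rng, SEQ_LEN - CUT - n)
        seq = left + kmer + kmer + right
        if seq[PAM_GG] == "GG":
            return seq
    raise RuntimeError(f"no valid design found for n={n}")


def design_longdup_set(seed=0, repeats=3, n_values=range(7, 26)):
    """Run the procedure for N = 7..25, three times over, giving 57 contexts."""
    rng = np.random.default_rng(seed)
    return [(n, design_longdup(n, rng)) for _ in range(repeats) for n in n_values]


designs = design_longdup_set()
print(len(designs))  # 57
```

For N ≥ 7, the PAM falls inside the second copy, so roughly 1 in 16 random N-mers is accepted.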
Generating a DNA motif for 1-bp insertion frequencies
Nucleotides at positions -7 to 0 were one-hot-encoded and used in ridge regression to predict the observed frequency of 1-bp insertions out of all Cas9 editing events in 1996 sequence contexts from Lib-A mESC data. The data were split into training and testing sets (80/20 split) 10,000 times to obtain bootstrapped estimates of the linear regression weights and of the test-set predictive Pearson correlation. The median test-set Pearson correlation was 0.62. To generate a DNA motif, any feature whose bootstrapped weight range included 0 was excluded (probability that the weight is zero > 1e-4). The average bootstrapped weight estimate was used as the “logo height” for all remaining features. Each feature is independent; the vertical stacking of features follows the published convention for DNA motifs.
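The resampling procedure above can be sketched with closed-form ridge regression in NumPy. The sketch one-hot encodes the -7..0 window, refits on repeated 80/20 splits, records each test-set Pearson correlation, and keeps only features whose resampled weight range excludes zero. The ridge penalty `alpha = 1.0` is an assumption because the text gives no value. The synthetic data and the function names are hypothetical.

```python
import numpy as np

BASES = "ACGT"


def one_hot_positions(seqs):
    """One-hot encode equal-length sequences (e.g. positions -7..0, 8 nt) into n x 4L."""
    X = np.zeros((len(seqs), 4 * len(seqs[0])))
    for i, seq in enumerate(seqs):
        for j, base in enumerate(seq):
            X[i, 4 * j + BASES.index(base)] = 1.0
    return X


def ridge_fit(X, y, alpha):
    """Closed-form ridge regression with an unpenalized intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    return w, y_mean - x_mean @ w


def motif_from_resampling(X, y, n_splits=10_000, test_frac=0.2, alpha=1.0, seed=0):
    """Repeated 80/20 splits -> (median test Pearson r, logo heights, kept features)."""
    rng = np.random.default_rng(seed)
    n_test = int(round(test_frac * len(y)))
    weights, test_r = [], []
    for _ in range(n_splits):
        perm = rng.permutation(len(y))
        test, train = perm[:n_test], perm[n_test:]
        w, b = ridge_fit(X[train], y[train], alpha)
        test_r.append(np.corrcoef(X[test] @ w + b, y[test])[0, 1])
        weights.append(w)
    W = np.array(weights)
    # Keep a feature only if its resampled weight range excludes zero.
    keep = (W.min(axis=0) > 0) | (W.max(axis=0) < 0)
    heights = np.where(keep, W.mean(axis=0), 0.0)
    return float(np.median(test_r)), heights, keep


rng = np.random.default_rng(42)
seqs = ["".join(rng.choice(list(BASES), size=8)) for _ in range(300)]
# Hypothetical signal: a T at position -4 (index 3 of the -7..0 window) raises the insertion rate.
y = np.array([0.3 * (s[3] == "T") for s in seqs]) + rng.normal(0, 0.02, size=300)
median_r, heights, keep = motif_from_resampling(one_hot_positions(seqs), y, n_splits=200)
print(round(median_r, 3))
```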
Plasmid and insert sequences
P2T-CAG-MCS-P2A-GFP-PuroR complete plasmid sequence
CCACCTAAATTGTAAGCGTTAATATTTTGTTAAAATTCGCGTTAAATTTTTGTTAAAT
CAGCTCATTTTTTAACCAATAGGCCGAAATCGGCAAAATCCCTTATAAATCAAAAG
AATAGACCGAGATAGGGTTGAGTGTTGTTCCAGTTTGGAACAAGAGTCCACTATTAA
AGAACGTGGACTCCAACGTCAAAGGGCGAAAAACCGTCTATCAGGGCGATGGCCC
ACTACGTGAACCATCACCCTAATCAAGTTTTTTGGGGTCGAGGTGCCGTAAAGCAC
TAAATCGGAACCCTAAAGGGAGCCCCCGATTTAGAGCTTGACGGGGAAAGCCGGC
GAACGTGGCGAGAAAGGAAGGGAAGAAAGCGAAAGGAGCGGGCGCTAGGGCGC
TGGCAAGTGTAGCGGTCACGCTGCGCGTAACCACCACACCCGCCGCGCTTAATGC
GCCGCTACAGGGCGCGTCCCATTCGCCATTCAGGCTGCGCAACTGTTGGGAAGG
GCGATCGGTGCGGGCCTCTTCGCTATTACGCCAGCTGGCGAAAGGGGGATGTGC
TGCAAGGCGATTAAGTTGGGTAACGCCAGGGTTTTCCCAGTCACGACGTTGTAAAA
CGACGGCCAGTGAGCGCGCGTAATACGACTCACTATAGGGCGAATTGGGTACCG
GCATATGGTTCTTGACAGAGGTGTAAAAAGTACTCAAAAATTTTACTCAAGTGAAAG
TACAAGTACTTAGGGAAAATTTTACTCAATTAAAAGTAAAAGTATCTGGCTAGAATC
TTACTTGAGTAAAAGTAAAAAAGTACTCCATTAAAATTGTACTTGAGTATTAAGGAA
GTAAAAGTAAAAGCAAGAAAGATCGATCTCGAAGGATCTGGAGGCCACCATGGTG
TCGATAACTTCGTATAGCATACATTATACGAAGTTATCGTGCTCGACATTGATTATT
GACTAGTTATTAATAGTAATCAATTACGGGGTCATTAGTTCATAGCCCATATATGGA
GTTCCGCGTTACATAACTTACGGTAAATGGCCCGCCTGGCTGACCGCCCAACGAC
CCCCGCCCATTGACGTCAATAATGACGTATGTTCCCATAGTAACGCCAATAGGGAC
TTTCCATTGACGTCAATGGGTGGAGTATTTACGGTAAACTGCCCACTTGGCAGTAC
ATCAAGTGTATCATATGCCAAGTACGCCCCCTATTGACGTCAATGACGGTAAATGG
CCCGCCTGGCATTATGCCCAGTACATGACCTTATGGGACTTTCCTACTTGGCAGTA
CATCTACGTATTAGTCATCGCTATTACCATGGTCGAGGTGAGCCCCACGTTCTGCT
TCACTCTCCCCATCTCCCCCCCCTCCCCACCCCCAATTTTGTATTTATTTATTTTTTA
ATTATTTTGTGCAGCGATGGGGGCGGGGGGGGGGGGGGGGCGCGCGCCAGGCG
GGGCGGGGCGGGGCGAGGGGCGGGGCGGGGCGAGGCGGAGAGGTGCGGCGG
CAGCCAATCAGAGCGGCGCGCTCCGAAAGTTTCCTTTTATGGCGAGGCGGCGGC
GGCGGCGGCCCTATAAAAAGCGAAGCGCGCGGCGGGCGGGGAGTCGCTGCGAC
GCTGCCTTCGCCCCGTGCCCCGCTCCGCCGCCGCCTCGCGCCGCCCGCCCCGG
CTCTGACTGACCGCGTTACTCCCACAGGTGAGCGGGCGGGACGGCCCTTCTCCTC
CGGGCTGTAATTAGCGCTTGGTTTAATGACGGCTTGTTTCTTTTCTGTGGCTGCGT
GAAAGCCTTGAGGGGCTCCGGGAGGGCCCTTTGTGCGGGGGGAGCGGCTCGGG
GGGTGCGTGCGTGTGTGTGTGCGTGGGGAGCGCCGCGTGCGGCTCCGCGCTGC
CCGGCGGCTGTGAGCGCTGCGGGCGCGGCGCGGGGCTTTGTGCGCTCCGCAGT
GTGCGCGAGGGGAGCGCGGCCGGGGGCGGTGCCCCGCGGTGCGGGGGGGGCT
GCGAGGGGAACAAAGGCTGCGTGCGGGGTGTGTGCGTGGGGGGGTGAGCAGGG
GGTGTGGGCGCGTCGGTCGGGCTGCAACCCCCCCTGCACCCCCCTCCCCGAGTT
GCTGAGCACGGCCCGGCTTCGGGTGCGGGGCTCCGTACGGGGCGTGGCGCGGG
GCTCGCCGTGCCGGGCGGGGGGTGGCGGCAGGTGGGGGTGCCGGGCGGGGCG
GGGCCGCCTCGGGCCGGGGAGGGCTCGGGGGAGGGGCGCGGCGGCCCCCGGA
GCGCCGGCGGCTGTCGAGGCGCGGCGAGCCGCAGCCATTGCCTTTTATGGTAAT
CGTGCGAGAGGGCGCAGGGACTTCCTTTGTCCCAAATCTGTGCGGAGCCGAAATC
TGGGAGGCGCCGCCGCACCCCCTCTAGCGGGCGCGGGGCGAAGCGGTGCGGCG
CCGGCAGGAAGGAAATGGGCGGGGAGGGCCTTCGTGCGTCGCCGCGCCGCCGT
CCCCTTCTCCCTCTCCAGCCTCGGGGCTGTCCGCGGGGGGACGGCTGCCTTCGG
GGGGGACGGGGCAGGGCGGGGTTCGGCTTCTGGCGTGTGACCGGCGGCTCTAG
AGCCTCTGCTAACCATGTTCATGCCTTCTTCTTTTTCCTACAGCTCCTGGGCAACGT
GCTGGTTATTGTGCTGTCTCATCATTTTGGCAAAGAATTCCTCGAGCGGCCGCCAG
TGTGATGGATATCGGATCCGCTAGCGCTACTAACTTCAGCCTGCTGAAGCAGGCT
GGAGACGTGGAGGAGAACCCTGGACCTGGACCGGTCGCCACCATGGTGAGCAAG
GGCGAGGAGCTGTTCACCGGGGTGGTGCCCATCCTGGTCGAGCTGGACGGCGAC
GTAAACGGCCACAAGTTCAGCGTGTCCGGCGAGGGCGAGGGCGATGCCACCTAC
GGCAAGCTGACCCTGAAGTTCATCTGCACCACCGGCAAGCTGCCCGTGCCCTGG
CCCACCCTCGTGACCACCCTGACCTACGGCGTGCAGTGCTTCAGCCGCTACCCCG
ACCACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCA
GGAGCGCACCATCTTCTTCAAGGACGACGGCAACTACAAGACCCGCGCCGAGGT
GAAGTTCGAGGGCGACACCCTGGTGAACCGCATCGAGCTGAAGGGCATCGACTT
CAAGGAGGACGGCAACATCCTGGGGCACAAGCTGGAGTACAACTACAACAGCCAC
AACGTCTATATCATGGCCGACAAGCAGAAGAACGGCATCAAGGTGAACTTCAAGAT
CCGCCACAACATCGAGGACGGCAGCGTGCAGCTCGCCGACCACTACCAGCAGAA
CACCCCCATCGGCGACGGCCCCGTGCTGCTGCCCGACAACCACTACCTGAGCAC
CCAGTCCGCCCTGAGCAAAGACCCCAACGAGAAGCGCGATCACATGGTCCTGCTG
GAGTTCGTGACCGCCGCCGGGATCACTCTCGGCATGGACGAGCTGTACAAGTAAA
GCGGCCGCCACCGCGGTGGAGCTCGAATTAATTCATCGATGATGATCCAGACATG
ATAAGATACATTGATGAGTTTGGACAAACCACAACTAGAATGCAGTGAAAAAAATG
CTTTATTTGTGAAATTTGTGATGCTATTGCTTTATTTGTAACCATTATAAGCTGCAAT
AAACAAGTTAACAACAACAATTGCATTCATTTTATGTTTCAGGTTCAGGGGGAGGTG
TGGGAGGTTTTTTAAAGCAAGTAAAACCTCTACAAATGTGGTATGGCTGATTATGAT
CCTCTAGAGTCGGTGGGCCTCGGGGGCGGGTGCGGGGTCGGCGGGGCCGCCCC
GGGTGGCTTCGGTCGGAGCCATGGGGTCGTGCGCTCCTTTCGGTCGGGCGCTGC
GGGTCGTGGGGCGGGCGTCAGGCACCGGGCTTGCGGGTCATGCACCAGGTGCG
CGGTCCTTCGGGCACCTCGACGTCGGCGGTGACGGTGAAGCCGAGCCGCTCGTA
GAAGGGGAGGTTGCGGGGCGCGGAGGTCTCCAGGAAGGCGGGCACCCCGGCGC
GCTCGGCCGCCTCCACTCCGGGGAGCACGACGGCGCTGCCCAGACCCTTGCCCT
GGTGGTCGGGCGAGACGCCGACGGTGGCCAGGAACCACGCGGGCTCCTTGGGC
CGGTGCGGCGCCAGGAGGCCTTCCATCTGTTGCTGCGCGGCCAGCCGGGAACCG
CTCAACTCGGCCATGCGCGGGCCGATCTCGGCGAACACCGCCCCCGCTTCGACG
CTCTCCGGCGTGGTCCAGACCGCCACCGCGGCGCCGTCGTCCGCGACCCACACC
TTGCCGATGTCGAGCCCGACGCGCGTGAGGAAGAGTTCTTGCAGCTCGGTGACC
CGCTCGATGTGGCGGTCCGGGTCGACGGTGTGGCGCGTGGCGGGGTAGTCGGC
GAACGCGGCGGCGAGGGTGCGTACGGCCCGGGGGACGTCGTCGCGGGTGGCGA
GGCGCACCGTGGGCTTGTACTCGGTCATGGAAGGTCGTCTCCTTGTGAGGGGTCA
GGGGCGTGGGTCAGGGGATGGTGGCGGCACCGGTCGTGGCGGCCGACCTGCAG
GCATGCAAGCTTTTTGCAAAAGCCTAGGCCTCCAAAAAAGCCTCCTCACTACTTCT
GGAATAGCTCAGAGGCCGAGGCGGCCTCGGCCTCTGCATAAATAAAAAAAATTAG
TCAGCCATGGGGCGGAGAATGGGCGGAACTGGGCGGAGTTAGGGGCGGGATGG
GCGGAGTTAGGGGCGGGACTATGGTTGCTGACTAATTGAGATGCATGCTTTGCAT
ACTTCTGCCTGCTGGGGAGCCTGGGGACTTTCCACACCTGGTTGCTGACTAATTG
AGATGCATGCTTTGCATACTTCTGCCTGCTGGGGAGCCTGGGGACTTTCCACACC
CTAACTGACACACATTCCACAGAATTCAAGTGATCTCCAAAAAATAAGTACTTTTTG
ACTGTAAATAAAATTGTAAGGAGTAAAAAGTACTTTTTTTTCTAAAAAAATGTAATT
AAGTAAAAGTAAAAGTATTGATTTTTAATTGTACTCAAGTAAAGTAAAAATCCCCAA
AAATAATACTTAAGTACAGTAATCAAGTAAAATTACTCAAGTACTTTACACCTCTGG
TTCTTGACCCCCTACCTTCAGCAAGCCCAGCAGATCCGAGCTCCAGCTTTTGTTCCCT
TTAGTGAGGGTTAATTGCGCGCTTGGCGTAATCATGGTCATAGCTGTTTCCTGTGTG
AAATTGTTATCCGCTCACAATTCCACACAACATACGAGCCGGAAGCATAAAGTGTA
AAGCCTGGGGTGCCTAATGAGTGAGCTAACTCACATTAATTGCGTTGCGCTCACTG
CCCGCTTTCCAGTCGGGAAACCTGTCGTGCCAGCTGCATTAATGAATCGGCCAAC
GCGCGGGGAGAGGCGGTTTGCGTATTGGGCGCTCTTCCGCTTCCTCGCTCACTGA
CTCGCTGCGCTCGGTCGTTCGGCTGCGGCGAGCGGTATCAGCTCACTCAAAGGC
GGTAATACGGTTATCCACAGAATCAGGGGATAACGCAGGAAAGAACATGTGAGCA
AAAGGCCAGCAAAAGGCCAGGAACCGTAAAAAGGCCGCGTTGCTGGCGTTTTTCC
ATAGGCTCCGCCCCCCTGACGAGCATCACAAAAATCGACGCTCAAGTCAGAGGTG
GCGAAACCCGACAGGACTATAAAGATACCAGGCGTTTCCCCCTGGAAGCTCCCTC
GTGCGCTCTCCTGTTCCGACCCTGCCGCTTACCGGATACCTGTCCGCCTTTCTCC
CTTCGGGAAGCGTGGCGCTTTCTCATAGCTCACGCTGTAGGTATCTCAGTTCGGT
GTAGGTCGTTCGCTCCAAGCTGGGCTGTGTGCACGAACCCCCCGTTCAGCCCGAC
CGCTGCGCCTTATCCGGTAACTATCGTCTTGAGTCCAACCCGGTAAGACACGACTT
ATCGCCACTGGCAGCAGCCACTGGTAACAGGATTAGCAGAGCGAGGTATGTAGGC
GGTGCTACAGAGTTCTTGAAGTGGTGGCCTAACTACGGCTACACTAGAAGGACAG
TATTTGGTATCTGCGCTCTGCTGAAGCCAGTTACCTTCGGAAAAAGAGTTGGTAGC
TCTTGATCCGGCAAACAAACCACCGCTGGTAGCGGTGGTTTTTTTGTTTGCAAGCA
GCAGATTACGCGCAGAAAAAAAGGATCTCAAGAAGATCCTTTGATCTTTTCTACGG
GGTCTGACGCTCAGTGGAACGAAAACTCACGTTAAGGGATTTTGGTCATGAGATTA
TCAAAAAGGATCTTCACCTAGATCCTTTTAAATTAAAAATGAAGTTTTAAATCAATC
TAAAGTATATATGAGTAAACTTGGTCTGACAGTTACCAATGCTTAATCAGTGAGGCA
CCTATCTCAGCGATCTGTCTATTTCGTTCATCCATAGTTGCCTGACTCCCCGTCGTGT
AGATAACTACGATACGGGAGGGCTTACCATCTGGCCCCAGTGCTGCAATGATACC
GCGAGACCCACGCTCACCGGCTCCAGATTTATCAGCAATAAACCAGCCAGCCGGA
AGGGCCGAGCGCAGAAGTGGTCCTGCAACTTTATCCGCCTCCATCCAGTCTATTA
ATTGTTGCCGGGAAGCTAGAGTAAGTAGTTCGCCAGTTAATAGTTTGCGCAACGTT
GTTGCCATTGCTACAGGCATCGTGGTGTCACGCTCGTCGTTTGGTATGGCTTCATT
CAGCTCCGGTTCCCAACGATCAAGGCGAGTTACATGATCCCCCATGTTGTGCAAAA
AAGCGGTTAGCTCCTTCGGTCCTCCGATCGTTGTCAGAAGTAAGTTGGCCGCAGT
GTTATCACTCATGGTTATGGCAGCACTGCATAATTCTCTTACTGTCATGCCATCCGT
AAGATGCTTTTCTGTGACTGGTGAGTACTCAACCAAGTCATTCTGAGAATAGTGTAT
GCGGCGACCGAGTTGCTCTTGCCCGGCGTCAATACGGGATAATACCGCGCCACAT
AGCAGAACTTTAAAAGTGCTCATCATTGGAAAACGTTCTTCGGGGCGAAAACTCTC
AAGGATCTTACCGCTGTTGAGATCCAGTTCGATGTAACCCACTCGTGCACCCAACT
GATCTTCAGCATCTTTTACTTTCACCAGCGTTTCTGGGTGAGCAAAAACAGGAAGG
CAAAATGCCGCAAAAAAGGGAATAAGGGCGACACGGAAATGTTGAATACTCATACT
CTTCCTTTTTCAATATTATTGAAGCATTTATCAGGGTTATTGTCTCATGAGCGGATAC
ATATTTGAATGTATTTAGAAAAATAAACAAATAGGGGTTCCGCGCACATTTCCCCGA
AAAGTG (SEQ ID NO: 41)
LDLRwt
ATGGGGCCCTGGGGCTGGAAATTGCGCTGGACCGTCGCCTTGCTCCTCGCCGCG
GCGGGGACTGCAGTGGGCGACAGATGCGAAAGAAACGAGTTCCAGTGCCAAGAC
GGGAAATGCATCTCCTACAAGTGGGTCTGCGATGGCAGCGCTGAGTGCCAGGATG
GCTCTGATGAGTCCCAGGAGACGTGCTTGTCTGTCACCTGCAAATCCGGGGACTT
CAGCTGTGGGGGCCGTGTCAACCGCTGCATTCCTCAGTTCTGGAGGTGCGATGGC
CAAGTGGACTGCGACAACGGCTCAGACGAGCAAGGCTGTCCCCCCAAGACGTGC
TCCCAGGACGAGTTTCGCTGCCACGATGGGAAGTGCATCTCTCGGCAGTTCGTCT
GTGACTCAGACCGGGACTGCTTGGACGGCTCAGACGAGGCCTCCTGCCCGGTGC
TCACCTGTGGTCCCGCCAGCTTCCAGTGCAACAGCTCCACCTGCATCCCCCAGCT
GTGGGCCTGCGACAACGACCCCGACTGCGAAGATGGCTCGGATGAGTGGCCGCA
GCGCTGTAGGGGTCTTTACGTGTTCCAAGGGGACAGTAGCCCCTGCTCGGCCTTC
GAGTTCCACTGCCTAAGTGGCGAGTGCATCCACTCCAGCTGGCGCTGTGATGGTG
GCCCCGACTGCAAGGACAAATCTGACGAGGAAAACTGCGCTGTGGCCACCTGTCG
CCCTGACGAATTCCAGTGCTCTGATGGAAACTGCATCCATGGCAGCCGGCAGTGT
GACCGGGAATATGACTGCAAGGACATGAGCGATGAAGTTGGCTGCGTTAATGTGA
CACTCTGCGAGGGACCCAACAAGTTCAAGTGTCACAGCGGCGAATGCATCACCCT
GGACAAAGTCTGCAACATGGCTAGAGACTGCCGGGACTGGTCAGATGAACCCATC
AAAGAGTGCGGGACCAACGAATGCTTGGACAACAACGGCGGCTGTTCCCACGTCT
GCAATGACCTTAAGATCGGCTACGAGTGCCTGTGCCCCGACGGCTTCCAGCTGGT
GGCCCAGCGAAGATGCGAAGATATCGATGAGTGTCAGGATCCCGACACCTGCAGC
CAGCTCTGCGTGAACCTGGAGGGTGGCTACAAGTGCCAGTGTGAGGAAGGCTTC
CAGCTGGACCCCCACACGAAGGCCTGCAAGGCTGTGGGCTCCATCGCCTACCTCT
TCTTCACCAACCGGCACGAGGTCAGGAAGATGACGCTGGACCGGAGCGAGTACA
CCAGCCTCATCCCCAACCTGAGGAACGTGGTCGCTCTGGACACGGAGGTGGCCA
GCAATAGAATCTACTGGTCTGACCTGTCCCAGAGAATGATCTGCAGCACCCAGCTT
GACAGAGCCCACGGCGTCTCTTCCTATGACACCGTCATCAGCAGAGACATCCAGG
CCCCCGACGGGCTGGCTGTGGACTGGATCCACAGCAACATCTACTGGACCGACTC
TGTCCTGGGCACTGTCTCTGTTGCGGATACCAAGGGCGTGAAGAGGAAAACGTTA
TTCAGGGAGAACGGCTCCAAGCCAAGGGCCATCGTGGTGGATCCTGTTCATGGCT
TCATGTACTGGACTGACTGGGGAACTCCCGCCAAGATCAAGAAAGGGGGCCTGAA
TGGTGTGGACATCTACTCGCTGGTGACTGAAAACATTCAGTGGCCCAATGGCATCA
CCCTAGATCTCCTCAGTGGCCGCCTCTACTGGGTTGACTCCAAACTTCACTCCATC
TCAAGCATCGATGTCAATGGGGGCAACCGGAAGACCATCTTGGAGGATGAAAAGA
GGCTGGCCCACCCCTTCTCCTTGGCCGTCTTTGAGGACAAAGTATTTTGGACAGAT
ATCATCAACGAAGCCATTTTCAGTGCCAACCGCCTCACAGGTTCCGATGTCAACTT
GTTGGCTGAAAACCTACTGTCCCCAGAGGATATGGTCCTCTTCCACAACCTCACCC
AGCCAAGAGGAGTGAACTGGTGTGAGAGGACCACCCTGAGCAATGGCGGCTGCC
AGTATCTGTGCCTCCCTGCCCCGCAGATCAACCCCCACTCGCCCAAGTTTACCTG
CGCCTGCCCGGACGGCATGCTGCTGGCCAGGGACATGAGGAGCTGCCTCACAGA
GGCTGAGGCTGCAGTGGCCACCCAGGAGACATCCACCGTCAGGCTAAAGGTCAG
CTCCACAGCCGTAAGGACACAGCACACAACCACCCGGCCTGTTCCCGACACCTCC
CGGCTGCCTGGGGCCACCCCTGGGCTCACCACGGTGGAGATAGTGACAATGTCT
CACCAAGCTCTGGGCGACGTTGCTGGCAGAGGAAATGAGAAGAAGCCCAGTAGC
GTGAGGGCTCTGTCCATTGTCCTCCCCATCGTGCTCCTCGTCTTCCTTTGCCTGGG
GGTCTTCCTTCTATGGAAGAACTGGCGGCTTAAGAACATCAACAGCATCAACTTTG
ACAACCCCGTCTATCAGAAGACCACAGAGGATGAGGTCCACATTTGCCACAACCA
GGACGGCTACAGCTACCCCTCGAGACAGATGGTCAGTCTGGAGGATGACGTGGCG
(SEQ ID NO: 42)
LDLRDup252 with surrounding region
CCCCCAAGACGTGCTCCCAGGACGAGTTTCGCTGCCACGATGGGAAGTGCATCTC
TCGGCAGTTCGTCTGTGACTCAGACCGGGACTGCTTGGACGGCTCAGACGAGGC
CTCCTGCCCGGTGCTCACCTGTGGTCCCGCCAGCTTCCAGTGCAACAGCTCCACC
TGCATCCCCCAGCTGTGGGCCTGCGACAACGACCCCGACTGCGAAGATGGCTCG
GAGGCTCGGATGAGTGGCCGCAGCGCTGTAGGGGTCTTTACGTGTTCCAAGGGG
ACAGTAGCCCCTGCTCGGCCTTCGAGTTCCACTGCCTAAGTGGCGAGTGCATCCA
CTCCAGCTGGCGCTGTGATGGTGGCCCCGACTGCAAGGACAAATCTGACGAGGA
AAACTGCG (SEQ ID NO: 43)
LDLRDup254/255 with surrounding region
CCCCCAAGACGTGCTCCCAGGACGAGTTTCGCTGCCACGATGGGAAGTGCATCTC
TCGGCAGTTCGTCTGTGACTCAGACCGGGACTGCTTGGACGGCTCAGACGAGGC
CTCCTGCCCGGTGCTCACCTGTGGTCCCGCCAGCTTCCAGTGCAACAGCTCCACC
TGCATCCCCCAGCTGTGGGCCTGCGACAACGACCCCGACTGCGAAGATGGCTCG
GATGAGTGGCCGCAGCGCTGTAGGGGTCTTTACGTGTTCCAAGGGGACAGTAGC
CCCTGCTCGGCCTTCGAGTTCCACTGCCTAAGTGGCGAGTGCATCCACTCCAGCT
GGCGCTGTGATGGTGGCCCCGACTGCAAGGACAAATCTGACAGGACAAATCTGAC
GAGGAAAACTGCGCTGTGGCCACCTGTCGCCCTGACGAATTCCAGTGCTCTGATG
GAAACTGCATCCATG (SEQ ID NO: 44)
LDLRDup258 with surrounding region
CCCCCAAGACGTGCTCCCAGGACGAGTTTCGCTGCCACGATGGGAAGTGCATCTC
TCGGCAGTTCGTCTGTGACTCAGACCGGGACTGCTTGGACGGCTCAGACGAGGC
CTCCTGCCCGGTGCTCACCTGTGGTCCCGCCAGCTTCCAGTGCAACAGCTCCACC
TGCATCCCCCAGCTGTGGGCCTGCGACAACGACCCCGACTGCGAAGATGGCTCG
GATGAGTGGCCGCAGCGCTGTAGGGGTCTTTACGTGTTCCAAGGGGACAGTAGC
CCCTGCTCGGCCTTCGAGTTCCACTGCCTAAGTGGCGAGTGCATCCACTCCAGCT
GGCGCTGTGATGGTGGCCCCGACTGCAAGGACAAATCTGAGGACAAATCTGACGA
GGAAAACTGCGCTGTGGCCACCTGTCGCCCTGACGAATTCCAGTGCTCTGATGGA
AACTGCATCCATG (SEQ ID NO: 45)
LDLRDup261 with surrounding region
CCCCCAAGACGTGCTCCCAGGACGAGTTTCGCTGCCACGATGGGAAGTGCATCTC
TCGGCAGTTCGTCTGTGACTCAGACCGGGACTGCTTGGACGGCTCAGACGAGGC
CTCCTGCCCGGTGCTCACCTGTGGTCCCGCCAGCTTCCAGTGCAACAGCTCCACC
TGCATCCCCCAGCTGTGGGCCTGCGACAACGACCCCGACTGCGAAGATGGCTCG
GATGAGTGGCCGCAGCGCTGTAGGGGTCTTTACGTGTTCCAAGGGGACAGTAGC
CCCTGCTCGGCCTTCGAGTTCCACTGCCTAAGTGGCGAGTGCATCCACTCCAGCT
GGCGCTGTGATGGTGGCCCCGACTGCAAGGACAAATCTGACGACAAATCTGACGA
GGAAAACTGCGCTGTGGCCACCTGTCGCCCTGACGAATTCCAGTGCTCTGATGGA
AACTGCATCCATG (SEQ ID NO: 46)
LDLRDup264 with surrounding region
CTTCATGTACTGGACTGACTGGGGAACTCCCGCCAAGATCAAGAAAGGGGGCCTG
AATGGTGTGGACATCTACTCGCTGGTGAGCTGGTGACTGAAAACATTCAGTGGCC
CAATGGCATCACCCTAG (SEQ ID NO: 47)
GAAwt
ATGGGAGTGAGGCACCCGCCCTGCTCCCACCGGCTCCTGGCCGTCTGCGCCCTC
GTGTCCTTGGCAACCGCTGCACTCCTGGGGCACATCCTACTCCATGATTTCCTGCT
GGTTCCCCGAGAGCTGAGTGGCTCCTCCCCAGTCCTGGAGGAGACTCACCCAGCT
CACCAGCAGGGAGCCAGCAGACCAGGGCCCCGGGATGCCCAGGCACACCCCGG
CCGTCCCAGAGCAGTGCCCACACAGTGCGACGTCCCCCCCAACAGCCGCTTCGA
TTGCGCCCCTGACAAGGCCATCACCCAGGAACAGTGCGAGGCCCGCGGCTGTTG
CTACATCCCTGCAAAGCAGGGGCTGCAGGGAGCCCAGATGGGGCAGCCCTGGTG
CTTCTTCCCACCCAGCTACCCCAGCTACAAGCTGGAGAACCTGAGCTCCTCTGAAA
TGGGCTACACGGCCACCCTGACCCGTACCACCCCCACCTTCTTCCCCAAGGACAT
CCTGACCCTGCGGCTGGACGTGATGATGGAGACTGAGAACCGCCTCCACTTCACG
ATCAAAGATCCAGCTAACAGGCGCTACGAGGTGCCCTTGGAGACCCCGCATGTCC
ACAGCCGGGCACCGTCCCCACTCTACAGCGTGGAGTTCTCCGAGGAGCCCTTCG
GGGTGATCGTGCGCCGGCAGCTGGACGGCCGCGTGCTGCTGAACACGACGGTG
GCGCCCCTGTTCTTTGCGGACCAGTTCCTTCAGCTGTCCACCTCGCTGCCCTCGC
AGTATATCACAGGCCTCGCCGAGCACCTCAGTCCCCTGATGCTCAGCACCAGCTG
GACCAGGATCACCCTGTGGAACCGGGACCTTGCGCCCACGCCCGGTGCGAACCT
CTACGGGTCTCACCCTTTCTACCTGGCGCTGGAGGACGGCGGGTCGGCACACGG
GGTGTTCCTGCTAAACAGCAATGCCATGGATGTGGTCCTGCAGCCGAGCCCTGCC
CTTAGCTGGAGGTCGACAGGTGGGATCCTGGATGTCTACATCTTCCTGGGCCCAG
AGCCCAAGAGCGTGGTGCAGCAGTACCTGGACGTTGTGGGATACCCGTTCATGCC
GCCATACTGGGGCCTGGGCTTCCACCTGTGCCGCTGGGGCTACTCCTCCACCGCT
ATCACCCGCCAGGTGGTGGAGAACATGACCAGGGCCCACTTCCCCCTGGACGTC
CAGTGGAACGACCTGGACTACATGGACTCCCGGAGGGACTTCACGTTCAACAAGG
ATGGCTTCCGGGACTTCCCGGCCATGGTGCAGGAGCTGCACCAGGGCGGCCGGC
GCTACATGATGATCGTGGATCCTGCCATCAGCAGCTCGGGCCCTGCCGGGAGCTA
CAGGCCCTACGACGAGGGTCTGCGGAGGGGGGTTTTCATCACCAACGAGACCGG
CCAGCCGCTGATTGGGAAGGTATGGCCCGGGTCCACTGCCTTCCCCGACTTCACC
AACCCCACAGCCCTGGCCTGGTGGGAGGACATGGTGGCTGAGTTCCATGACCAG
GTGCCCTTCGACGGCATGTGGATTGACATGAACGAGCCTTCCAACTTCATCAGGG
GCTCTGAGGACGGCTGCCCCAACAATGAGCTGGAGAACCCACCCTACGTGCCTG
GGGTGGTTGGGGGGACCCTCCAGGCGGCCACCATCTGTGCCTCCAGCCACCAGT
TTCTCTCCACACACTACAACCTGCACAACCTCTACGGCCTGACCGAAGCCATCGCC
TCCCACAGGGCGCTGGTGAAGGCTCGGGGGACACGCCCATTTGTGATCTCCCGC
TCGACCTTTGCTGGCCACGGCCGATACGCCGGCCACTGGACGGGGGACGTGTGG
AGCTCCTGGGAGCAGCTCGCCTCCTCCGTGCCAGAAATCCTGCAGTTTAACCTGC
TGGGGGTGCCTCTGGTCGGGGCCGACGTCTGCGGCTTCCTGGGCAACACCTCAG
AGGAGCTGTGTGTGCGCTGGACCCAGCTGGGGGCCTTCTACCCCTTCATGCGGAA
CCACAACAGCCTGCTCAGTCTGCCCCAGGAGCCGTACAGCTTCAGCGAGCCGGC
CCAGCAGGCCATGAGGAAGGCCCTCACCCTGCGCTACGCACTCCTCCCCCACCT
CTACACACTGTTCCACCAGGCCCACGTCGCGGGGGAGACCGTGGCCCGGCCCCT
CTTCCTGGAGTTCCCCAAGGACTCTAGCACCTGGACTGTGGACCACCAGCTCCTG
TGGGGGGAGGCCCTGCTCATCACCCCAGTGCTCCAGGCCGGGAAGGCCGAAGTG
ACTGGCTACTTCCCCTTGGGCACATGGTACGACCTGCAGACGGTGCCAGTAGAGG
CCCTTGGCAGCCTCCCACCCCCACCTGCAGCTCCCCGTGAGCCAGCCATCCACAG
CGAGGGGCAGTGGGTGACGCTGCCGGCCCCCCTGGACACCATCAACGTCCACCT
CCGGGCTGGGTACATCATCCCCCTGCAGGGCCCTGGCCTCACAACCACAGAGTC
CCGCCAGCAGCCCATGGCCCTGGCTGTGGCCCTGACCAAGGGTGGGGAGGCCC
GAGGGGAGCTGTTCTGGGACGATGGAGAGAGCCTGGAAGTGCTGGAGCGAGGG
GCCTACACACAGGTCATCTTCCTGGCCAGGAATAACACGATCGTGAATGAGCTGG
TACGTGTGACCAGTGAGGGAGCTGGCCTGCAGCTGCAGAAGGTGACTGTCCTGG
GCGTGGCCACGGCGCCCCAGCAGGTCCTCTCCAACGGTGTCCCTGTCTCCAACTT
CACCTACAGCCCCGACACCAAGGTCCTGGACATCTGTGTCTCGCTGTTGATGGGA
GAGCAGTTTCTCGTCAGCTGGTGT (SEQ ID NO: 48)
GAADup327/328
ATGGGAGTGAGGCACCCGCCCTGCTCCCACCGGCTCCTGGCCGTCTGCGCCCTC
GTGTCCTTGGCAACCGCTGCACTCCTGGGGCACATCCTACTCCATGATTTCCTGCT
GGTTCCCCGAGAGCTGAGTGGCTCCTCCCCAGTCCTGGAGGAGACTCACCCAGCT
CACCAGCAGGGAGCCAGCAGACCAGGGCCCCGGGATGCCCAGGCACACCCCGG
CCGTCCCAGAGCAGTGCCCACACAGTGCGACGTCCCCCCCAACAGCCGCTTCGA
TTGCGCCCCTGACAAGGCCATCACCCAGGAACAGTGCGAGGCCCGCGGCTGTTG
CTACATCCCTGCAAAGCAGGGGCTGCAGGGAGCCCAGATGGGGCAGCCCTGGTG
CTTCTTCCCACCCAGCTACCCCAGCTACAAGCTGGAGAACCTGAGCTCCTCTGAAA
TGGGCTACACGGCCACCCTGACCCGTACCACCCCCACCTTCTTCCCCAAGGACAT
CCTGACCCTGCGGCTGGACGTGATGATGGAGACTGAGAACCGCCTCCACTTCACG
ATCAAAGATCCAGCTAACAGGCGCTACGAGGTGCCCTTGGAGACCCCGCATGTCC
ACAGCCGGGCACCGTCCCCACTCTACAGCGTGGAGTTCTCCGAGGAGCCCTTCG
GGGTGATCGTGCGCCGGCAGCTGGACGGCCGCGTGCTGCTGAACACGACGGTG
GCGCCCCTGTTCTTTGCGGACCAGTTCCTTCAGCTGTCCACCTCGCTGCCCTCGC
AGTATATCACAGGCCTCGCCGAGCACCTCAGTCCCCTGATGCTCAGCACCAGCTG
GACCAGGATCACCCTGTGGAACCGGGACCTTGCGCCCACGCCCGGTGCGAACCT
CTACGGGTCTCACCCTTTCTACCTGGCGCTGGAGGACGGCGGGTCGGCACACGG
GGTGTTCCTGCTAAACAGCAATGCCATGGATGTGGTCCTGCAGCCGAGCCCTGCC
CTTAGCTGGAGGTCGACAGGTGGGATCCTGGATGTCTACATCTTCCTGGGCCCAG
AGCCCAAGAGCGTGGTGCAGCAGTACCTGGACGTTGTGGGATACCCGTTCATGCC
GCCATACTGGGGCCTGGGCTTCCACCTGTGCCGCTGGGGCTACTCCTCCACCGCT
ATCACCCGCCAGGTGGTGGAGAACATGACCAGGGCCCACTTCCCCCTGGACGTC
CAGTGGAACGACCTGGACTACATGGACTCCCGGAGGGACTTCACGTTCAACAAGG
ATGGCTTCCGGGACTTCCCGGCCATGGTGCAGGAGCTGCACCAGGGCGGCCGGC
GCTACATGATGATCGTGGATCCTGCCATCAGCAGCTCGGGCCCTGCCGGGAGCTA
CAGGCCCTACGACGAGGGTCTGCGGAGGGGGGTTTTCATCACCAACGAGACCGG
CCAGCCGCTGATTGGGAAGGTATGGCCCGGGTCCACTGCCTTCCCCGACTTCACC
AACCCCACAGCCCTGGCCTGGTGGGAGGACATGGTGGCTGAGTTCCATGACCAG
GTGCCCTTCGACGGCATGTGGATTGACATGAACGAGCCTTCCAACTTCATCAGGG
GCTCTGAGGACGGCTGCCCCAACAATGAGCTGGAGAACCCACCCTACGTGCCTG
GGGTGGTTGGGGGGACCCTCCAGGCGGCCACCATCTGTGCCTCCAGCCACCAGT
TTCTCTCCACACACTACAACCTGCACAACCTCTACGGCCTGACCGAAGCCATCGCC
TCCCACAGGGCGCTGGTGAAGGCTCGGGGGACACGCCCATTTGTGATCTCCCGC
TCGACCTTTGCTGGCCACGGCCGATACGCCGGCCACTGGACGGGGGACGTGTGG
AGCTCCTGGGAGCAGCTCGCCTCCTCCGTGCCAGAAATCCTGCAGTTTAACCTGC
TGGGGGTGCCTCTGGTCGGGGCCGACGTCTGCGGCTTCCTGGGCAACACCTCAG
AGGAGCTGTGTGTGCGCTGGACCCAGCTGGGGGCCTTCTACCCCTTCATGCGGAA
CCACAACAGCCTGCTCAGTCTGCCCCAGGAGCCGTACAGCTTCAGCGAGCCGGC
CCAGCAGGCCATGAGGAAGGCCCTCACCCTGCGCTACGCACTCCTCCCCCACCT
CTACACACTGTTCCACCAGGCCCACGTCGCGGGGGAGACCGTGGCCCGGCCCCT
CTTCCTGGAGTTCCCCAAGGACTCTAGCACCTGGACTGTGGACCACCAGCTCCTG
TGGGGGGAGGCCCTGCTCATCACCCCAGTGCTCCAGGCCGGGAAGGCCGAAGTG
ACTGGCTACTTCCCCTTGGGCACATGGTACGACCTGCAGACGGTGCCAGTAGAGG
CCCTTGGCAGCCTCCCACCCCCACCTGCAGCTCCCCGTGAGCCAGCCATCCACAG
CGAGGGGCAGTGGGTGACGCTGCCGGCCCCCCTGGACACCATCAACGTCCACCT
CCGGGCTGGGTACATCATCCCCCTGCAGGGCCCTGGCCTCACAACCACAGAGTC
CCGCCAGCAGCCCATGGCCCTGGCTGTGGCCCTGACCAAGGGTGGGGAGGCCC
GAGGGGAGCTGTTCTGGGACGATGGAGAGAGCCTGGAAGTGCTGGAGCGAGGG
GCCTACACACAGGTCATCTTCCTGGCCAGGAATAACACGATCGTGAATGAGCTGG
TACGTGTGACCAGTGAGGGAGCTGGCCTGCAGCTGCAGAAGGTGACTGCAGAAG
GTGACTGTCCTGGGCGTGGCCACGGCGCCCCAGCAGGTCCTCTCCAACGGTGTC
CCTGTCTCCAACTTCACCTACAGCCCCGACACCAAGGTCCTGGACATCTGTGTCTC
GCTGTTGATGGGAGAGCAGTTTCTCGTCAGCTGGTGT (SEQ ID NO: 49)
GLB1wt
ATGCCGGGGTTCCTGGTTCGCATCCTCCCTCTGTTGCTGGTTCTGCTGCTTCTGG
GCCCTACGCGCGGCTTGCGCAATGCCACCCAGAGGATGTTTGAAATTGACTATAG
CCGGGACTCCTTCCTCAAGGATGGCCAGCCATTTCGCTACATCTCAGGAAGCATTC
ACTACTCCCGTGTGCCCCGCTTCTACTGGAAGGACCGGCTGCTGAAGATGAAGAT
GGCTGGGCTGAACGCCATCCAGACGTATGTGCCCTGGAACTTTCATGAGCCCTGG
CCAGGACAGTACCAGTTTTCTGAGGACCATGATGTGGAATATTTTCTTCGGCTGGC
TCATGAGCTGGGACTGCTGGTTATCCTGAGGCCCGGGCCCTACATCTGTGCAGAG
TGGGAAATGGGAGGATTACCTGCTTGGCTGCTAGAGAAAGAGTCTATTCTTCTCCG
CTCCTCCGACCCAGATTACCTGGCAGCTGTGGACAAGTGGTTGGGAGTCCTTCTG
CCCAAGATGAAGCCTCTCCTCTATCAGAATGGAGGGCCAGTTATAACAGTGCAGG
TTGAAAATGAATATGGCAGCTACTTTGCCTGTGATTTTGACTACCTGCGCTTCCTGC
AGAAGCGCTTTCGCCACCATCTGGGGGATGATGTGGTTCTGTTTACCACTGATGGA
GCACATAAAACATTCCTGAAATGTGGGGCCCTGCAGGGCCTCTACACCACGGTGG
ACTTTGGAACAGGCAGCAACATCACAGATGCTTTCCTAAGCCAGAGGAAGTGTGA
GCCCAAAGGACCCTTGATCAATTCTGAATTCTATACTGGCTGGCTAGATCACTGGG
GCCAACCTCACTCCACAATCAAGACCGAAGCAGTGGCTTCCTCCCTCTATGATATA
CTTGCCCGTGGGGCGAGTGTGAACTTGTACATGTTTATAGGTGGGACCAATTTTGC
CTATTGGAATGGGGCCAACTCACCCTATGCAGCACAGCCCACCAGCTACGACTAT
GATGCCCCACTGAGTGAGGCTGGGGACCTCACTGAGAAGTATTTTGCTCTGCGAA
ACATCATCCAGAAGTTTGAAAAAGTACCAGAAGGTCCTATCCCTCCATCTACACCA
AAGTTTGCATATGGAAAGGTCACTTTGGAAAAGTTAAAGACAGTGGGAGCAGCTCT
GGACATTCTGTGTCCCTCTGGGCCCATCAAAAGCCTTTATCCCTTGACATTTATCCA
GGTGAAACAGCATTATGGGTTTGTGCTGTACCGGACAACACTTCCTCAAGATTGCA
GCAACCCAGCACCTCTCTCTTCACCCCTCAATGGAGTCCACGATCGAGCATATGTT
GCTGTGGATGGGATCCCCCAGGGAGTCCTTGAGCGAAACAATGTGATCACTCTGA
ACATAACAGGGAAAGCTGGAGCCACTCTGGACCTTCTGGTAGAGAACATGGGACG
TGTGAACTATGGTGCATATATCAACGATTTTAAGGGTTTGGTTTCTAACCTGACTCT
CAGTTCCAATATCCTCACGGACTGGACGATCTTTCCACTGGACACTGAGGATGCAG
TGTGCAGCCACCTGGGGGGCTGGGGACACCGTGACAGTGGCCACCATGATGAAG
CCTGGGCCCACAACTCATCCAACTACACGCTCCCGGCCTTTTATATGGGGAACTTC
TCCATTCCCAGTGGGATCCCAGACTTGCCCCAGGACACCTTTATCCAGTTTCCTGG
ATGGACCAAGGGCCAGGTCTGGATTAATGGCTTTAACCTTGGCCGCTATTGGCCA
GCCCGGGGCCCTCAGTTGACCTTGTTTGTGCCCCAGCACATCCTGATGACCTCGG
CCCCAAACACCATCACCGTGCTGGAACTGGAGTGGGCACCCTGCAGCAGTGATGA
TCCAGAACTATGTGCTGTGACGTTCGTGGACAGGCCAGTTATTGGCTCATCTGTGA
CCTACGATCATCCCTCCAAACCTGTTGAAAAAAGACTCATGCCCCCACCCCCGCAA
AAAAACAAAGATTCATGGCTGGACCATGTA (SEQ ID NO: 50)
GLB1Dup84
ATGCCGGGGTTCCTGGTTCGCATCCTCCCTCTGTTGCTGGTTCTGCTGCTTCTGG
GCCCTACGCGCGGCTTGCGCAATGCCACCCAGAGGATGTTTGAAATTGACTATAG
CCGGGACTCCTTCCTCAAGGATGGCCAGCCATTTCGCTACATCTCAGGAAGCATTC
ACTACTCCCGTGTGCCCCGCTTCTACTGGAAGGACCGGCTGCTGAAGATGAAGAT
GGCTGGGCTGAACGCCATCCAGACGTATGTGCCCTGGAACTTTCATGAGCCCTGG
CCAGGACAGTACCAGTTTTCTGAGGACCATGATGTGGAATATTTTCTTCGGCTGGC
TCATGAGCTGGGACTGCTGGTTATCCTGAGGCCCGGGCCCTACATCTGTGCAGAG
TGGGAAATGGGAGGATTACCTGCTTGGCTGCTAGAGAAAGAGTCTATTCTTCTCCG
CTCCTCCGACCCAGATTACCTGGCAGCTGTGGACAAGTGGTTGGGAGTCCTTCTG
CCCAAGATGAAGCCTCTCCTCTATCAGAATGGAGGGCCAGTTATAACAGTGCAGG
TTGAAAATGAATATGGCAGCTACTTTGCCTGTGATTTTGACTACCTGCGCTTCCTGC
AGAAGCGCTTTCGCCACCATCTGGGGGATGATGTGGTTCTGTTTACCACTGATGGA
GCACATAAAACATTCCTGAAATGTGGGGCCCTGCAGGGCCTCTACACCACGGTGG
ACTTTGGAACAGGCAGCAACATCACAGATGCTTTCCTAAGCCAGAGGAAGTGTGA
GCCCAAAGGACCCTTGATCAATTCTGAATTCTATACTGGCTGGCTAGATCACTGGG
GCCAACCTCACTCCACAATCAAGACCGAAGCAGTGGCTTCCTCCCTCTATGATATA
CTTGCCCGTGGGGCGAGTGTGAACTTGTACATGTTTATAGGTGGGACCAATTTTGC
CTATTGGAATGGGGCCAACTCACCCTATGCAGCACAGCCCACCAGCTACGACTAT
GATGCCCCACTGAGTGAGGCTGGGGACCTCACTGAGAAGTATTTTGCTCTGCGAA
ACATCATCCAGAAGTTTGAAAAAGTACCAGAAGGTCCTATCCCTCCATCTACACCA
AAGTTTGCATATGGAAAGGTCACTTTGGAAAAGTTAAAGACAGTGGGAGCAGCTCT
GGACATTCTGTGTCCCTCTGGGCCCATCAAAAGCCTTTATCCCTTGACATTTATCCA
GGTGAAACAGCATTATGGGTTTGTGCTGTACCGGACAACACTTCCTCAAGATTGCA
GCAACCCAGCACCTCTCTCTTCACCCCTCAATGGAGTCCACGATCGAGCATATGTT
GCTGTGGATGGGATCCCCCAGGGAGTCCTTGAGCGAAACAATGTGATCACTCTGA
ACATAACAGGGAAAGCTGGAGCCACTCTGGACCTTCTGGTAGAGAACATGGGACG
TGTGAACTATGGTGCATATATGGTGCATATATCAACGATTTTAAGGGTTTGGTTTCT
AACCTGACTCTCAGTTCCAATATCCTCACGGACTGGACGATCTTTCCACTGGACAC
TGAGGATGCAGTGTGCAGCCACCTGGGGGGCTGGGGACACCGTGACAGTGGCCA
CCATGATGAAGCCTGGGCCCACAACTCATCCAACTACACGCTCCCGGCCTTTTATA
TGGGGAACTTCTCCATTCCCAGTGGGATCCCAGACTTGCCCCAGGACACCTTTATC
CAGTTTCCTGGATGGACCAAGGGCCAGGTCTGGATTAATGGCTTTAACCTTGGCC
GCTATTGGCCAGCCCGGGGCCCTCAGTTGACCTTGTTTGTGCCCCAGCACATCCT
GATGACCTCGGCCCCAAACACCATCACCGTGCTGGAACTGGAGTGGGCACCCTG
CAGCAGTGATGATCCAGAACTATGTGCTGTGACGTTCGTGGACAGGCCAGTTATT
GGCTCATCTGTGACCTACGATCATCCCTCCAAACCTGTTGAAAAAAGACTCATGCC
CCCACCCCCGCAAAAAAACAAAGATTCATGGCTGGACCATGTA (SEQ ID NO: 51)
PORCNwt
ATGGCCACCTTTAGCCGCCAGGAATTTTTCCAGCAGCTACTGCAAGGCTGTCTCCT
GCCTACTGCCCAGCAGGGCCTTGACCAGATCTGGCTGCTCCTTGCCATCTGCCTC
GCCTGCCGCCTCCTCTGGAGGCTCGGGTTGCCATCCTACCTGAAGCATGCAAGCA
CCGTGGCAGGCGGGTTCTTCAGCCTCTACCACTTCTTCCAGCTGCACATGGTTTG
GGTCGTGCTGCTCAGCCTCCTGTGCTACCTCGTGCTGTTCCTCTGCCGACATTCCT
CCCATCGAGGCGTCTTCCTATCCGTCACCATCCTCATCTACCTACTCATGGGTGAG
ATGCACATGGTAGACACCGTGACATGGCACAAGATGCGAGGGGCACAGATGATTG
TGGCCATGAAGGCAGTGTCTCTGGGCTTCGACCTGGACCGGGGCGAGGTGGGTA
CGGTGCCCTCGCCAGTGGAGTTCATGGGCTACCTCTACTTCGTGGGCACCATCGT
CTTCGGGCCCTGGATATCCTTCCACAGCTACCTACAAGCTGTCCAAGGCCGCCCA
CTGAGCTGCCGGTGGCTGCAGAAGGTGGCCCGGAGCCTGGCACTGGCCCTGCTG
TGCCTTGTGCTGTCCACTTGCGTGGGCCCCTACCTCTTCCCGTACTTCATCCCCCT
CAACGGTGACCGCCTCCTTCGCAAGGGCACCATGGTAAGGTGGCTGCGAGCCTA
CGAGAGTGCTGTCTCCTTCCACTTCAGCAACTATTTTGTGGGCTTTCTTTCCGAGG
CCACGGCCACGTTGGCGGGGGCTGGCTTTACCGAGGAGAAGGATCACCTGGAAT
GGGACCTGACGGTGTCCAAGCCACTGAATGTGGAGCTGCCTCGGTCAATGGTGG
AAGTTGTCACAAGCTGGAACCTGCCCATGTCTTATTGGCTAAATAACTATGTTTTCA
AGAATGCTCTCCGCCTGGGGACCTTCTCGGCTGTGCTGGTCACCTATGCAGCCAG
CGCCCTCCTACATGGCTTCAGTTTCCACCTGGCTGCGGTCCTGCTGTCCCTGGCT
TTTATCACTTACGTGGAGCATGTCCTCCGGAAGCGCCTGGCTCGGATCCTCAGTG
CCTGTGTCTTGTCAAAGCGGTGCCCGCCAGACTGTTCGCACCAGCATCGCTTGGG
CCTGGGGGTGCGAGCCTTAAACTTGCTCTTTGGAGCTCTGGCCATCTTCCACCTG
GCCTACCTGGGCTCCCTGTTTGATGTCGATGTGGATGACACCACAGAGGAGCAGG
GCTACGGCATGGCATACACTGTCCACAAGTGGTCAGAGCTCAGCTGGGCCAGTCA
CTGGGTCACTTTTGGATGCTGGATCTTCTACCGTCTCATAGGC (SEQ ID NO: 52)
PORCNDup20
ATGGCCACCTTTAGCCGCCAGGAATTTTTCCAGCAGCTACTGCAAGGCTGTCTCCT
GCCTACTGCCCAGCAGGGCCTTGACCAGATCTGGCTGCTCCTTGCCATCTGCCTC
GCCTGCCGCCTCCTCTGGAGGCTCGGGTTGCCATCCTACCTGAAGCATGCAAGCA
CCGTGGCAGGCGGGTTCTTCAGCCTCTACCACTTCTTCCAGCTGCACATGGTTTG
GGTCGTGCTGCTCAGCCTCCTGTGCTACCTCGTGCTGTTCCTCTGCCGACATTCCT
CCCATCGAGGCGTCTTCCTATCCGTCACCATCCTCATCTACCTACTCATGGGTGAG
ATGCACATGGTAGACACCGTGACATGGCACAAGATGCGAGGGGCACAGATGATTG
TGGCCATGAAGGCAGTGTCTCTGGGCTTCGACCTGGACCGGGGCGAGGTGGGTA
CGGTGCCCTCGCCAGTGGAGTTCATGGGCTACCTCTACTTCGTGGGCACCATCGT
CTTCGGGCCCTGGATATCCTTCCACAGCTACCTACAAGCTGTCCAAGGCCGCCCA
CTGAGCTGCCGGTGGCTGCAGAAGGTGGCCCGGAGCCTGGCACTGGCCCTGCTG
TGCCTTGTGCTGTCCACTTGCGTGGGCCCCTACCTCTTCCCGTACTTCATCCCCCT
CAACGGTGACCGCCTCCTTCGCAAGGGCACCATGGTAAGGTGGCTGCGAGCCTA
CGAGAGTGCTGTCTCCTTCCACTTCAGCAACTATTTTGTGGGCTTTCTTTCCGAGG
CCACGGCCACGTTGGCGGGGGCTGGCTTTACCGAGGAGAAGGATCACCTGGAAT
GGGACCTGACGGTGTCCAAGCCACTGAATGTGGAGCTGCCTCGGTCAATGGTGG
AAGTTGTCACAAGCTGGAACCTGCCCATGTCTTATTGGCTAAATAACTATGTTTTCA
AGAATGCTCTCCGCCTGGGGACCTTCTCGGCTGTGCTGGTCACCTATGCAGCCAG
CGCCCTCCTACATGGCTTCAGTTTCCACCTGGCTGCGGTCCTGCTGTCCCTGGCT
TTTATCCCTGGCTTTTATCACTTACGTGGAGCATGTCCTCCGGAAGCGCCTGGCTC
GGATCCTCAGTGCCTGTGTCTTGTCAAAGCGGTGCCCGCCAGACTGTTCGCACCA
GCATCGCTTGGGCCTGGGGGTGCGAGCCTTAAACTTGCTCTTTGGAGCTCTGGCC
ATCTTCCACCTGGCCTACCTGGGCTCCCTGTTTGATGTCGATGTGGATGACACCAC
AGAGGAGCAGGGCTACGGCATGGCATACACTGTCCACAAGTGGTCAGAGCTCAG
CTGGGCCAGTCACTGGGTCACTTTTGGATGCTGGATCTTCTACCGTCTCATAGGC
(SEQ ID NO: 53)
Executive Summary
It was found that template-free DNA repair of Cas9-cleaved and Cpf1-cleaved DNA produces a predictable set of repair genotypes that can result in the gain-of-function repair of human disease mutations. Contrary to the assumption that end-joining following double-strand breaks is random and difficult to harness for applications beyond gene disruption, it is shown here that template-free end-joining repair of DNA cleaved by CRISPR-associated nucleases produces a predictable set of repair genotypes. A library of 2,000 guide RNAs paired with their target DNA sites was constructed and integrated into mouse and human genomes; Cas9 was then applied, and the resulting repair genotypes were characterized by high-throughput sequencing. Data from this assay are consistent with results from 98 endogenous loci. Building upon prior work, it is shown that the majority of repair genotypes in cells with saturated exposure to both CRISPR-Cas9 and Cpf1 are deletions associated with sequence microhomology. CRISPR-Texture, a machine learning method, was trained on 1,588 sequence contexts from the data and accurately predicted the frequencies of template-free Cas9-mediated microhomology-associated deletions as well as 1-bp insertions. On 282 held-out sequence contexts, CRISPR-Texture predicted frameshift rates more accurately than published methods and accurately predicted the statistical entropy of repair product distributions. Applied to the human genome, CRISPR-Texture identified an appreciable fraction of Cas9 target sites supporting high-precision repair distributions that are dominated by a single genotype. Further, it was found that a class of human disease-associated microduplication mutations can be repaired to wildtype at high frequency by template-free Cas9 nuclease editing, and the assay was used to validate hundreds of such alleles. Template-free Cas9 nuclease-mediated rescue of pathogenic LDLR alleles to a wildtype phenotype in cellular models was also validated.
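The summary refers to the statistical entropy of a repair-product distribution and to "high-precision" distributions dominated by a single genotype. As a minimal sketch, assuming a distribution is given as a list of predicted genotype frequencies, Shannon entropy and one possible entropy-based precision measure can be computed as below. The function names, and the choice to normalize by log2 of the number of observed genotypes, are illustrative assumptions rather than the exact definitions used by CRISPR-Texture or inDelphi.

```python
import math


def repair_entropy(freqs):
    """Shannon entropy (bits) of a repair-genotype frequency distribution.

    Frequencies need not sum to 1; they are renormalized. Zero entries are skipped.
    """
    total = sum(freqs)
    if total <= 0:
        raise ValueError("frequencies must sum to a positive value")
    return -sum((f / total) * math.log2(f / total) for f in freqs if f > 0)


def normalized_precision(freqs):
    """Hypothetical precision score in [0, 1]: 1 - entropy / maximum possible entropy.

    A distribution dominated by one genotype scores close to 1;
    a uniform distribution over n > 1 genotypes scores 0.
    """
    n = sum(1 for f in freqs if f > 0)
    if n == 0:
        raise ValueError("no genotype has positive frequency")
    if n == 1:
        return 1.0
    return 1.0 - repair_entropy(freqs) / math.log2(n)


# A site dominated by a single genotype versus an evenly spread site.
print(normalized_precision([0.90, 0.05, 0.05]))
print(normalized_precision([0.25, 0.25, 0.25, 0.25]))
```

Normalizing by the maximum entropy makes scores comparable between sites with different numbers of possible outcomes.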
This work establishes a strategy for predicting the outcomes of template-free end-joining and demonstrates that CRISPR editing can also mediate efficient gain-of-function editing at certain disease alleles without homology-directed repair.
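One intuition for why certain microduplications revert to wildtype, sketched here under simplifying assumptions, is that when Cas9 cuts inside a tandem duplication, the two copies act as long microhomology. Every deletion of one copy's length that spans the cut within the repeat then yields the same wild-type product. The helper names are hypothetical, the model ignores outcome frequencies, and the example sequences are short excerpts of SEQ ID NO: 50 (wild type) and SEQ ID NO: 51 (duplication allele) listed above.

```python
from collections import Counter


def find_insertion(wt, mut):
    """Return (pos, inserted) such that mut == wt[:pos] + inserted + wt[pos:]."""
    k = len(mut) - len(wt)
    if k <= 0:
        raise ValueError("mutant is not longer than wild type")
    pos = 0
    while pos < len(wt) and wt[pos] == mut[pos]:
        pos += 1  # first position where the two sequences diverge
    if mut[:pos] + mut[pos + k:] != wt:
        raise ValueError("not a single contiguous insertion")
    return pos, mut[pos:pos + k]


def is_tandem_duplication(wt, pos, inserted):
    """True if the inserted block repeats the sequence immediately before or after it."""
    k = len(inserted)
    return wt[max(0, pos - k):pos] == inserted or wt[pos:pos + k] == inserted


def deletion_products(seq, cut, length):
    """Tally products of every `length`-bp deletion whose deleted span touches `cut`.

    `cut` is a 0-based boundary index: the break lies between seq[cut-1] and seq[cut].
    """
    products = Counter()
    for start in range(max(0, cut - length), min(cut, len(seq) - length) + 1):
        products[seq[:start] + seq[start + length:]] += 1
    return products


wt_excerpt = "TGTGAACTATGGTGCATATATCAACG"
dup_excerpt = "TGTGAACTATGGTGCATATATGGTGCATATATCAACG"
pos, inserted = find_insertion(wt_excerpt, dup_excerpt)
print(pos, inserted, is_tandem_duplication(wt_excerpt, pos, inserted))
# Cut between the two copies and tally all one-copy-length deletions spanning it.
print(deletion_products(dup_excerpt, cut=pos, length=len(inserted)))
```

In this excerpt every deletion placement collapses to a single product identical to the wild-type excerpt, which is the behavior that makes such alleles attractive for template-free correction.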
References
1. Cong, L. et al. Multiplex Genome Engineering Using CRISPR/Cas Systems. Science 339, 819 (2013).
2. Mali, P. et al. RNA-Guided Human Genome Engineering via Cas9. Science 339, 823-826 (2013).
3. Jinek, M. et al. RNA-programmed genome editing in human cells. eLife 2, e00471 (2013).
4. Doench, J. G. et al. Optimized sgRNA design to maximize activity and minimize off-target effects of CRISPR-Cas9. Nat. Biotechnol. 34, 1-12 (2016).
5. Kleinstiver, B. P. et al. High-fidelity CRISPR-Cas9 nucleases with no detectable genome-wide off-target effects. Nature 529, 490-495 (2016).
6. Slaymaker, I. M. et al. Rationally engineered Cas9 nucleases with improved specificity. Science 351, 84-88 (2016).
7. Fu, Y., Sander, J. D., Reyon, D., Cascio, V. M. & Joung, J. K. Improving CRISPR-Cas nuclease specificity using truncated guide RNAs. Nat. Biotechnol. 32, 279-284 (2014).
8. Kleinstiver, B. P. et al. Engineered CRISPR-Cas9 nucleases with altered PAM specificities. Nature 523, 481-485 (2015).
9. Kleinstiver, B. P. et al. Broadening the targeting range of Staphylococcus aureus CRISPR-Cas9 by modifying PAM recognition. Nat. Biotechnol. 33, 1293-1298 (2015).
10. Hu, J. H. et al. Evolved Cas9 variants with broad PAM compatibility and high DNA specificity. Nature 1-24 (2018). doi:10.1038/nature26155
11. Komor, A. C., Kim, Y. B., Packer, M. S., Zuris, J. A. & Liu, D. R. Programmable editing of a target base in genomic DNA without double-stranded DNA cleavage. Nature 533, 420-424 (2016).
12. Gaudelli, N. M. et al. Programmable base editing of A•T to G•C in genomic DNA without DNA cleavage. Nature 1-27 (2017). doi:10.1038/nature24644
13. Chu, V. T. et al. Increasing the efficiency of homology-directed repair for CRISPR-Cas9-induced precise gene editing in mammalian cells. Nat. Biotechnol. 33, 543-548 (2015).
14. Richardson, C. D., Ray, G. J., DeWitt, M. A., Curie, G. L. & Corn, J. E. Enhancing homology-directed genome editing by catalytically active and inactive CRISPR-Cas9 using asymmetric donor DNA. Nat. Biotechnol. 34, 339-344 (2016).
15. Paquet, D. et al. Efficient introduction of specific homozygous and heterozygous mutations using CRISPR/Cas9. Nature 533, 1-18 (2016).
16. Landrum, M. J. et al. ClinVar: Public archive of interpretations of clinically relevant variants. Nucleic Acids Res. 44, D862-D868 (2016).
17. Stenson, P. D. et al. Human Gene Mutation Database: towards a comprehensive central mutation database. J. Med. Genet. 45, 124 (2008).
18. Shin, H. Y. et al. CRISPR/Cas9 targeting events cause complex deletions and insertions at 17 sites in the mouse genome. Nat. Commun. 8, 1-10 (2017).
19. Sakuma, T., Nakade, S., Sakane, Y., Suzuki, K.-I. T. & Yamamoto, T. MMEJ-assisted gene knock-in using TALENs and CRISPR-Cas9 with the PITCh systems. Nat. Protoc. 11, 118-133 (2015).
20. Suzuki, K. et al. In vivo genome editing via CRISPR/Cas9 mediated homology-independent targeted integration. Nature 540, 144-149 (2016).
21. Nakade, S. et al. Microhomology-mediated end-joining-dependent integration of donor DNA in cells and animals using TALENs and CRISPR/Cas9. Nat. Commun. 5, 5560 (2014).
22. Kraft, K. et al. Deletions, Inversions, Duplications: Engineering of Structural Variants using CRISPR/Cas in Mice. Cell Rep. 10, 833-839.
23. Koike-Yusa, H., Li, Y., Tan, E.-P., Velasco-Herrera, M. D. C. & Yusa, K. Genome-wide recessive genetic screening in mammalian cells with a lentiviral CRISPR-guide RNA library. Nat. Biotechnol. 32, 267-273 (2013).
24. van Overbeek, M. et al. DNA Repair Profiling Reveals Nonrandom Outcomes at Cas9-Mediated Breaks. Mol. Cell 63, 633-646 (2016).
25. Urasaki, A., Morvan, G. & Kawakami, K. Functional dissection of the Tol2 transposable element identified the minimal cis-sequence and a highly repetitive sequence in the subterminal region essential for transposition. Genetics 174, 639-649 (2006).
26. Ceccaldi, R., Rondinelli, B. & D’Andrea, A. D. Repair Pathway Choices and Consequences at the Double-Strand Break. Spec. Issue Qual. Control 26, 52-64 (2016).
27. Deriano, L. & Roth, D. B. Modernizing the Nonhomologous End-Joining Repertoire: Alternative and Classical NHEJ Share the Stage. Annu. Rev. Genet. 47, 433-455 (2013).
28. Evers, B. et al. CRISPR knockout screening outperforms shRNA and CRISPRi in identifying essential genes. Nat. Biotechnol. 34, 631-633 (2016).
29. Bae, S., Kweon, J., Kim, H. S. & Kim, J.-S. Microhomology-based choice of Cas9 nuclease target sites. Nat Methods 11, 705-706 (2014).
30. Cornu, T. I., Mussolino, C. & Cathomen, T. Refining strategies to translate genome editing to the clinic. Nat. Med. 23, 415 (2017).
31. Shalem, O., Sanjana, N. E. & Zhang, F. High-throughput functional genomics using CRISPR-Cas9. Nat. Rev. Genet. 16, 299 (2015).
32. Mandal, P. K. et al. Efficient Ablation of Genes in Human Hematopoietic Stem and Effector Cells using CRISPR/Cas9. Cell Stem Cell 15, 643-652 (2014).
33. Tabebordbar, M. et al. In vivo gene editing in dystrophic mouse muscle and muscle stem cells. Science 351, 407 (2016).
34. Arbab, M., Srinivasan, S., Hashimoto, T., Geijsen, N. & Sherwood, R. I. Cloning-free CRISPR. Stem Cell Rep. 5, 908-917 (2015).
35. Davis, A. J. & Chen, D. J. DNA double strand break repair via non-homologous end joining. Transl. Cancer Res. 2, 130-143 (2013).
36. Bourbon, M., Alves, A. C. & Sijbrands, E. J. Low-density lipoprotein receptor mutational analysis in diagnosis of familial hypercholesterolemia. Curr. Opin. Lipidol. 28, 120-129 (2017).
37. Ran, F. A. et al. In vivo genome editing using Staphylococcus aureus Cas9. Nature 520, 186 (2015).
38. Oh, J. et al. Positional cloning of a gene for Hermansky-Pudlak syndrome, a disorder of cytoplasmic organelles. Nat. Genet. 14, 300-306 (1996).
39. Orthwein, A. et al. A mechanism for the suppression of homologous recombination in G1 cells. Nature 528, 422 (2015).
40. Biehs, R. et al. DNA Double-Strand Break Resection Occurs during Non-homologous End Joining in G1 but Is Distinct from Resection during Homologous Recombination. Mol. Cell 671-684 (2017). doi:10.1016/j.molcel.2016.12.016
41. Zetsche, B. et al. Cpf1 Is a Single RNA-Guided Endonuclease of a Class 2 CRISPR-Cas System. Cell 163, 759-771 (2015).
42. Christian, M. et al. Targeting DNA Double-Strand Breaks with TAL Effector Nucleases. Genetics 186, 757 (2010).
43. Kim, Y. G., Cha, J. & Chandrasegaran, S. Hybrid restriction enzymes: zinc finger fusions to Fok I cleavage domain. Proc. Natl. Acad. Sci. 93, 1156 (1996).
44. Sherwood, R. I. et al. Discovery of directional and nondirectional pioneer transcription factors by modeling DNase profile magnitude and shape. Nat. Biotechnol. 32, 171-178 (2014).
45. DiCarlo, J. E., Chavez, A., Dietz, S. L., Esvelt, K. M. & Church, G. M. Safeguarding CRISPR-Cas9 gene drives in yeast. Nat. Biotechnol. 33, 1250 (2015).
46. McVey, M. & Lee, S. E. MMEJ repair of double-strand breaks (director’s cut): deleted sequences and alternative endings. Trends Genet. 24, 529-538 (2008).
47. Yu, A. M. & McVey, M. Synthesis-dependent microhomology-mediated end joining accounts for multiple types of repair junctions. Nucleic Acids Res. 38, 5706-5717 (2010).
48. Heidenreich, E., Novotny, R., Kneidinger, B., Holzmann, V. & Wintersberger, U. Non-homologous end joining as an important mutagenic process in cell cycle-arrested cells. EMBO J. 22, 2274 (2003).
49. Pfeiffer, P., Goedecke, W. & Obe, G. Mechanisms of DNA double-strand break repair and their potential to induce chromosomal aberrations. Mutagenesis 15, 289-302 (2000).
Example 2: Workflow description of using inDelphi to design Cas9 gRNAs for efficient genome editing to induce exon skipping
One application of CRISPR-Cas9 for therapeutic purposes is to alter the genome so that RNA splicing skips a pathogenic exon, by changing the splicing regulatory sites that control inclusion of that exon. In some cases, a single exon in a gene may contain a pathogenic variant, and the entire exon can be skipped to produce an alternative isoform with normal function. For example, certain cases of Duchenne muscular dystrophy (DMD) are caused by a deleterious mutation in exon 23 of the Dmd gene that introduces a premature stop codon and produces a dysfunctional Dmd protein. To restore function, the deleterious exon 23 can be skipped by editing the DNA proximal to the diseased exon with a single gRNA (Long, 2016).
This Example tests a new method for selecting gRNAs for CRISPR editing that disrupt splice site sequences, causing pathogenic exons to be skipped and thereby restoring wild-type cellular function. Splice site acceptor DNA motifs occur at the boundary between introns and exons, depicted as 5’-intron-AG-exon-3’, where the AG is a highly conserved element of the splice site acceptor motif and is considered to reside in the intron. The splice site acceptor DNA motif is considered to be up to 23 bp long, including the AG. Algorithms such as MaxEntScan (Yeo, 2004) receive a DNA sequence as input and output a numerical score representing how strongly the splice site acceptor motif is present in that sequence.
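MaxEntScan's acceptor model (score3) scores a 23-bp window consisting of the last 20 intronic bases, ending in the AG, followed by the first 3 exonic bases. A minimal sketch of extracting that window from a genomic sequence follows, assuming 0-based coordinates and a known first exonic base; the parameter name `exon_start` is hypothetical, and the scoring itself is left to MaxEntScan.

```python
def acceptor_23mer(seq, exon_start):
    """Return the 23-bp splice acceptor window: 20 intronic + 3 exonic bases.

    `exon_start` is the 0-based index of the first exonic base, so the
    conserved AG occupies seq[exon_start - 2:exon_start].
    Intronic bases are returned in lowercase and exonic bases in uppercase.
    """
    if exon_start < 20 or exon_start + 3 > len(seq):
        raise ValueError("need 20 intronic and 3 exonic bases around exon_start")
    window = seq[exon_start - 20:exon_start + 3]
    if window[18:20].upper() != "AG":
        raise ValueError("no canonical AG immediately upstream of exon_start")
    return window[:20].lower() + window[20:].upper()


example = "T" * 12 + "CTTTTCAG" + "GGA" + "CCCCC"
print(acceptor_23mer(example, exon_start=20))
```

Checking for the AG explicitly catches off-by-one errors in exon coordinates before any scores are compared.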
CRISPR-Cas9 induces a DNA double-strand break at a specific location specified by the gRNA. When no homology template is present for homology-directed repair (HDR), DNA repair fixes the double-strand break through non-homologous end-joining (NHEJ) and microhomology-mediated end-joining (MMEJ), inducing insertions and deletions. inDelphi, as disclosed herein, predicts the frequency distribution of NHEJ/MMEJ-mediated repair genotypes following Cas9 cutting. In this Example, inDelphi’s ability to predict the spectrum of repair genotypes is used to identify gRNAs that ablate the splice site acceptor motif at a high frequency out of all non-wild-type repair outcomes.
Other computational methods have focused on predicting the on-target efficiency of CRISPR-Cas9 cutting. This Example aims to identify gRNAs that will efficiently cut and induce non-wild-type repair outcomes (otherwise described as high on-target activity). A relevant published method (Doench et al., 2016, Nat. Biotechnol., also known as “Azimuth”) uses the DNA sequence surrounding the gRNA, as well as the position of the cutsite in the protein, to predict the gRNA’s ability to knock out the protein, as observed by gRNA enrichment in screens of cell-essential genes; a higher score indicates higher gRNA cutting efficacy. However, Doench et al. does not directly predict the frequency of non-wild-type repair. It is reasoned here that the frequency of protein knockout depends on both the rate of non-wild-type repair and the rate of frameshift repair out of all non-wild-type repair outcomes. Despite this concern, gRNAs are filtered based on a minimum threshold Azimuth score in order to maximize the chances that selected gRNAs have high on-target activity (increasing true positives), at the risk of filtering away some gRNAs that also have high on-target activity (increasing false negatives). In alternative embodiments, this Azimuth filtering step may be skipped to decrease the rate of false negatives at the risk of decreasing the rate of true positives.
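The reasoning above, that knockout depends both on how often a site is edited and on how often those edits shift the reading frame, can be written as a simple product. The sketch below assumes that every frameshifting outcome knocks out the protein and every in-frame outcome does not, which is a simplification; the function names are hypothetical.

```python
def frameshift_fraction(indel_freqs):
    """Fraction of non-wild-type outcomes whose net length change is not a multiple of 3.

    `indel_freqs` maps signed indel length (negative = deletion, positive = insertion)
    to predicted frequency among non-wild-type outcomes.
    """
    total = sum(indel_freqs.values())
    if total <= 0:
        raise ValueError("frequencies must sum to a positive value")
    shifted = sum(f for length, f in indel_freqs.items() if length % 3 != 0)
    return shifted / total


def expected_knockout_rate(nonwt_rate, indel_freqs):
    """Simple model: P(knockout) ~ P(non-wild-type repair) * P(frameshift | non-wild-type)."""
    return nonwt_rate * frameshift_fraction(indel_freqs)


# 80% of reads edited; among edits, 1-bp deletions, 3-bp deletions, and 1-bp insertions.
print(expected_knockout_rate(0.8, {-1: 0.2, -3: 0.3, 1: 0.5}))
```

A score that predicts only cutting efficiency captures the first factor, whereas the repair-genotype distribution supplies the second.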
To more directly address the question of on-target efficiency, this Example developed an algorithm referred to as the Basic On-Target Model (BOTM), which directly predicts the frequency of non-wild-type observations from DNA sequence features. BOTM takes DNA sequence as input and outputs a predicted frequency of non-wild-type repair. Non-wild-type repair is defined as the summed frequency of CRISPR-associated deletions and insertions (reads aligning to the reference with exactly one gap residing within 1 bp of the cutsite, using alignment scores of +1 match, -1 mismatch, -5 gap open, and 0 gap extend), divided by the summed frequency of non-noise outcomes, which consist of CRISPR-associated indels, wildtype repair, reads with multiple indels at least one of which occurs near the cutsite, and reads with exactly one indel occurring anywhere outside the cutsite. BOTM is implemented as an ensemble of 100 gradient boosted regression trees, each with maximum depth 3, fitted in consecutive stages on the negative gradient of the least squares loss function. BOTM uses the following input features: one-hot encoded nucleotides at positions -7 to 0 (such that “NGG” occupies positions 0 to 2); the GC fraction of the 40-bp window around the cutsite; and the following features from inDelphi: log phi score (microhomology score), precision score (ranging from 0 to 1, with 1 being more precise), expected value of the indel length distribution, the frequencies of 1-bp insertions, microhomology deletions, and microhomology-less deletions, the highest frequency of any single 1-bp insertion outcome, the highest frequency of any single deletion outcome, and the highest frequency of any single outcome.
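The sequence-derived BOTM features and the training label described above can be sketched as follows. This is an illustration, not the BOTM code: function names are hypothetical, the SpCas9 cut is assumed to lie 3 bp upstream of the PAM (between positions -4 and -3), the 40-bp window is taken as 20 bp on each side of the cut, and the inDelphi-derived features would be appended to this vector from inDelphi's own output.

```python
NUCLEOTIDES = "ACGT"


def botm_sequence_features(seq, pam_index):
    """One-hot bases at positions -7..0 plus GC fraction of the 40-bp cut window.

    `pam_index` is the 0-based index of the N in NGG (position 0), so the
    SpCas9 cut lies between positions -4 and -3, i.e. at boundary pam_index - 3.
    """
    seq = seq.upper()
    if seq[pam_index + 1:pam_index + 3] != "GG":
        raise ValueError("no NGG PAM at pam_index")
    cut = pam_index - 3
    if cut - 20 < 0 or cut + 20 > len(seq):
        raise ValueError("sequence too short around the target site")
    features = []
    for offset in range(-7, 1):  # positions -7..0 give 8 x 4 one-hot values
        base = seq[pam_index + offset]
        features.extend(1.0 if base == n else 0.0 for n in NUCLEOTIDES)
    window = seq[cut - 20:cut + 20]  # 20 bp on each side of the cut
    features.append(sum(b in "GC" for b in window) / len(window))
    return features


def nonwt_frequency(crispr_indels, wildtype, multi_indel_near_cut, single_indel_elsewhere):
    """Training label: CRISPR-associated indel reads over all non-noise reads."""
    denominator = crispr_indels + wildtype + multi_indel_near_cut + single_indel_elsewhere
    if denominator == 0:
        raise ValueError("no non-noise reads")
    return crispr_indels / denominator


site = "A" * 30 + "CGG" + "T" * 20
print(len(botm_sequence_features(site, pam_index=30)))
print(nonwt_frequency(600, 300, 50, 50))
```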
Trained on deep sequencing data at 3,600 target sites from our genome-integrated library construct in mES cells (also used to train inDelphi), BOTM achieves a Pearson correlation of 0.42 when predicting the observed frequency of non-wild-type repair on 400 held-out target sites from the same library. On held-out data, a BOTM predicted frequency of 0.65 or greater was manually determined to identify gRNAs that have a high frequency of non-wild-type repair. One computational workflow for identifying Cas9 gRNAs with clinical relevance for the correction of genetic diseases by inducing exon skipping consists of four steps: identify relevant exons, select gRNAs for these exons with effective targeting, determine the genotypic products of each gRNA using inDelphi, and select gRNAs whose genotypic products are predicted to disrupt the relevant splicing motif. Using this approach, we have identified 4000 gRNAs that target splice sites to correct genetic diseases (Appendix attached).
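The BOTM configuration (100 least-squares gradient boosted regression trees of maximum depth 3), its held-out evaluation by Pearson correlation, and the 0.65 cutoff can be illustrated with a small dependency-free sketch. In practice a library implementation such as scikit-learn's `GradientBoostingRegressor` would typically be used. The code below is not BOTM itself: the learning rate of 0.1 is an assumption because the source does not state one, and the exhaustive-split tree builder is only suited to small data.

```python
def _sse(values):
    """Sum of squared errors around the mean."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def _fit_tree(X, r, rows, depth):
    """Greedy least-squares regression tree on `rows`; leaves hold mean residuals."""
    leaf = sum(r[i] for i in rows) / len(rows)
    if depth == 0 or len(rows) < 2:
        return leaf
    best = None
    for j in range(len(X[0])):
        # Candidate thresholds: every distinct value except the largest.
        for t in sorted({X[i][j] for i in rows})[:-1]:
            left = [i for i in rows if X[i][j] <= t]
            right = [i for i in rows if X[i][j] > t]
            cost = _sse([r[i] for i in left]) + _sse([r[i] for i in right])
            if best is None or cost < best[0]:
                best = (cost, j, t, left, right)
    if best is None:  # no feature varies within these rows
        return leaf
    _, j, t, left, right = best
    return (j, t, _fit_tree(X, r, left, depth - 1), _fit_tree(X, r, right, depth - 1))


def _predict_tree(tree, x):
    while isinstance(tree, tuple):
        j, t, left, right = tree
        tree = left if x[j] <= t else right
    return tree


def fit_boosted_trees(X, y, n_estimators=100, max_depth=3, learning_rate=0.1):
    """Stagewise boosting: each tree fits the negative gradient of squared loss (residuals)."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    trees = []
    rows = list(range(len(y)))
    for _ in range(n_estimators):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        tree = _fit_tree(X, residuals, rows, max_depth)
        trees.append(tree)
        pred = [p + learning_rate * _predict_tree(tree, x) for p, x in zip(pred, X)]
    return base, learning_rate, trees


def predict(model, x):
    base, learning_rate, trees = model
    return base + learning_rate * sum(_predict_tree(t, x) for t in trees)


def is_high_activity(model, x, threshold=0.65):
    """Flag a gRNA whose predicted non-wild-type frequency meets the cutoff."""
    return predict(model, x) >= threshold


def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / (va * vb) ** 0.5
```

A held-out evaluation would fit on one subset of sites, call `predict` on the remainder, and report `pearson` between predicted and observed frequencies.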
First, 6,805 exons with the following characteristics were identified: the exon length is evenly divisible by 3, so that skipping the exon preserves the reading frame; the exon contains at least one HGMD pathogenic indel, which is likely to disrupt normal protein function (basal frameshift rate ~66%, column “hgmd_indel_count” in the Appendix spreadsheet); the exon is not constitutive, as measured by <100% presence in Ensembl transcripts (Ensembl); and the exon does not contain an annotated protein domain in Pfam (Pfam). The last two criteria are used to identify exons that may not be essential for wild-type protein function. The resulting 6,805 exons were candidates for disease correction by exon skipping.
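The four exon-selection criteria can be expressed as a single predicate over a per-exon annotation record. In this sketch the record type and all field names other than `hgmd_indel_count` (which mirrors the Appendix column) are hypothetical, and the annotation values are assumed to have been precomputed from HGMD, Ensembl, and Pfam.

```python
from dataclasses import dataclass


@dataclass
class ExonRecord:
    """Hypothetical per-exon annotation record (field names are illustrative)."""
    exon_id: str
    length: int                 # exon length in bp
    hgmd_indel_count: int       # HGMD pathogenic indels within the exon
    transcript_fraction: float  # fraction of Ensembl transcripts that include the exon
    has_pfam_domain: bool       # overlaps an annotated Pfam domain


def is_skip_candidate(exon):
    """Return True if the exon meets all four exon-skipping selection criteria."""
    return (
        exon.length % 3 == 0                 # skipping preserves the reading frame
        and exon.hgmd_indel_count >= 1       # harbors at least one pathogenic indel
        and exon.transcript_fraction < 1.0   # not constitutive
        and not exon.has_pfam_domain         # no annotated Pfam domain
    )


exons = [
    ExonRecord("exon_a", 114, 2, 0.5, False),
    ExonRecord("exon_b", 100, 2, 0.5, False),  # length not divisible by 3
]
print([e.exon_id for e in exons if is_skip_candidate(e)])
```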
Then, SpCas9 gRNAs (NGG PAM) with cutsites in a 6-bp window surrounding and including the AG motif were selected, resulting in an average of 2.2 SpCas9 gRNAs per exon. We then ensured high predicted on-target editing efficiency by removing all gRNAs with an Azimuth score below 0.20 (threshold set manually) or a BOTM score below 0.65 (chosen to separate gRNAs with high versus low frequencies of non-wild-type repair).
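The gRNA selection step can be sketched as a scan of both strands for SpCas9 NGG PAMs, keeping guides whose blunt cut (3 bp upstream of the PAM) falls near the AG. As an assumption, the 6-bp window is interpreted here as the 2 bases upstream of the AG, the AG itself, and the 2 bases downstream, and a cut at any boundary inside or at the edges of that window counts as within it. Azimuth and BOTM scores are taken as precomputed inputs, and all function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    return seq.translate(COMPLEMENT)[::-1]


def spcas9_guides(seq):
    """Yield (strand, 20-nt guide 5'->3', cut boundary index on the + strand)."""
    seq = seq.upper()
    for i in range(20, len(seq) - 2):
        if seq[i + 1:i + 3] == "GG":  # + strand: protospacer seq[i-20:i], PAM seq[i:i+3]
            yield "+", seq[i - 20:i], i - 3
    for j in range(0, len(seq) - 22):
        if seq[j:j + 2] == "CC":  # - strand PAM appears as CCN on the + strand
            yield "-", revcomp(seq[j + 3:j + 23]), j + 6


def guides_near_acceptor(seq, ag_index, flank=2):
    """Guides whose cut boundary lies within `flank` bp of the AG at seq[ag_index:ag_index+2]."""
    lo, hi = ag_index - flank, ag_index + 2 + flank  # window covers seq[lo:hi]
    return [g for g in spcas9_guides(seq) if lo <= g[2] <= hi]


def passes_activity_filters(azimuth_score, botm_score, min_azimuth=0.20, min_botm=0.65):
    """Keep gRNAs meeting both on-target activity thresholds."""
    return azimuth_score >= min_azimuth and botm_score >= min_botm


print(list(spcas9_guides("T" * 20 + "AGG" + "TTTT")))
print(list(spcas9_guides("AAAA" + "CCT" + "A" * 20)))
```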
Each gRNA and exon target site pair was then scored for splice site motif disruption. We obtained this prediction by first using inDelphi to predict the frequency distribution of 1-bp insertion and deletion (1-60 bp) genotypes resulting from template-free DNA repair of a CRISPR gRNA-induced cut at the target exon site.
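inDelphi's genotype representation and frequency model are not reproduced here. As a hedged illustration of the genotype space being scored, the sketch below enumerates the distinct products of 1-bp insertions and 1-60 bp deletions at a cut site, merging deletion placements that yield identical sequences, as happens across microhomology. The function name and return format are assumptions.

```python
def candidate_genotypes(seq, cut, max_del=60):
    """Enumerate distinct repair products for 1-bp insertions and 1..max_del-bp deletions.

    `cut` is the 0-based boundary index of the Cas9 break. Each deletion removes a
    contiguous span that touches the break; placements giving the same sequence are
    merged. Returns a dict mapping product sequence -> list of (kind, length, start).
    """
    products = {}
    for length in range(1, max_del + 1):
        for start in range(max(0, cut - length), min(cut, len(seq) - length) + 1):
            product = seq[:start] + seq[start + length:]
            products.setdefault(product, []).append(("del", length, start))
    for base in "ACGT":
        product = seq[:cut] + base + seq[cut:]
        products.setdefault(product, []).append(("ins", 1, cut))
    return products


for product, placements in candidate_genotypes("ACGTAC", cut=3, max_del=1).items():
    print(product, placements)
```

Frequencies for each product would come from inDelphi; this enumeration only defines which outcomes receive a frequency.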
Finally, each genotype predicted by inDelphi was classified as “motif disrupting” when its MaxEntScan score is < 0.9 of the unedited MaxEntScan score; otherwise, the genotype was classified as “no effect”. This classification rule is from Tang, 2016, where it was validated on experimental splicing data with a sensitivity of 83.6% and a specificity of 79.2%. The total frequency of all motif-disrupting repair genotypes was used as the predicted splice site motif disruption frequency out of all inDelphi-predicted genotypes.
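The classification rule and the summed disruption frequency can be sketched as below. The scorer is passed in as a function, since MaxEntScan itself is not reimplemented here; `toy_acceptor_score` is a hypothetical stand-in used only to make the example runnable, and the function names are assumptions.

```python
def motif_disruption_frequency(genotype_freqs, unedited_score, score_fn, ratio=0.9):
    """Predicted fraction of repair outcomes that disrupt the splice acceptor motif.

    genotype_freqs: mapping edited sequence -> predicted frequency (e.g. from inDelphi).
    score_fn: splice acceptor scorer applied to an edited sequence (e.g. a wrapper
    around MaxEntScan). A genotype counts as "motif disrupting" when its score is
    below `ratio` times the unedited score, and as "no effect" otherwise.
    """
    total = sum(genotype_freqs.values())
    if total <= 0:
        raise ValueError("frequencies must sum to a positive value")
    disrupted = sum(f for s, f in genotype_freqs.items()
                    if score_fn(s) < ratio * unedited_score)
    return disrupted / total


def toy_acceptor_score(seq):
    """Hypothetical stand-in scorer (NOT MaxEntScan): 8.0 if a CAG is present, else 1.0."""
    return 8.0 if "CAG" in seq else 1.0


freqs = {"TTTCAGGT": 0.30, "TTTGGT": 0.50, "TTTCGT": 0.20}
print(motif_disruption_frequency(freqs, unedited_score=8.0, score_fn=toy_acceptor_score))
```

Because MaxEntScan scores are log-odds values that can be negative, the literal "< 0.9 of the unedited score" comparison behaves differently for sites whose unedited score is below zero; a real pipeline would need to decide how to handle that case.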
The top 4000 gRNA and target site pairs were selected based on this predicted frequency of splice site disruption. Long, 2016 identified several SpCas9 gRNAs that, in a mouse model of muscular dystrophy, restored some degree of dystrophin protein expression and improved skeletal muscle function by inducing skipping of exon 23 (which contains a nonsense mutation) via NHEJ-mediated DNA repair of a Cas9-induced cut. Without considering the results of their experiments, and focusing solely on the DNA sequence context and background biological knowledge, our computational workflow recognizes that exon 23 of DMD is a good candidate for disease correction via exon skipping: the exon has a length evenly divisible by 3, is associated with a pathogenic nonsense variant that destroys normal protein function, and is not constitutive or required for normal protein function. Long, 2016 reports results for only one SpCas9 gRNA targeting the 5’ end of exon 23, called sgRNA-L8 (ATAATTTCTATTATATTACA with PAM GGG). In their experiments with sgRNA-L8, they observe exon 23 skipping in 9/18 pups. This gRNA targets mm10 chrX:83,803,134-83,803,156 (minus strand), while the exon 23 boundary is 149 bp downstream at mm10 chrX:83,803,305. Because this Example’s computational workflow currently identifies only gRNAs cutting within a 6-bp window of the AG motif at the exon 5’ boundary, the workflow as described does not identify Long’s sgRNA-L8.
Other methods of selecting exons for splice site acceptor removal include selecting exons with mutant splice regulatory sites that result in the inappropriate inclusion of exons in RNA transcripts (Sterne-Weiler, 2014). Alternatively, subsequent expressed exons can be skipped to restore the reading frame. In this case, the reading frame can be restored by skipping a subsequent expressed exon where the length of the skipped exon and the length of the indel sum to 0 mod 3. In addition, constitutive exons and/or exons known to contain annotated protein domains in Pfam can be selected for exon skipping as an alternative method for knocking out a gene.
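The frame-restoration rule for skipping a subsequent exon can be checked with a short helper. As an interpretation not stated in the source, the indel is given as a signed length change; for deletions this reduces exactly to the stated rule that the deletion length and the skipped-exon length sum to 0 mod 3. Function names are hypothetical.

```python
def restores_frame(indel_change, skipped_exon_length):
    """True if skipping an exon compensates the frameshift caused by an indel.

    indel_change is the signed length change (+ insertion, - deletion). Skipping an
    exon removes skipped_exon_length bases, so the net change is
    indel_change - skipped_exon_length; the frame is restored when it is 0 mod 3.
    For a deletion of d bp this equals (d + skipped_exon_length) % 3 == 0.
    """
    return (indel_change - skipped_exon_length) % 3 == 0


def frame_restoring_exons(indel_change, downstream_exon_lengths):
    """Indices of downstream exons whose skipping would restore the reading frame."""
    return [i for i, length in enumerate(downstream_exon_lengths)
            if restores_frame(indel_change, length)]


# A 2-bp deletion followed by candidate downstream exons of 100, 101, and 102 bp.
print(frame_restoring_exons(-2, [100, 101, 102]))
```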
Correcting genetic disorders using predictable CRISPR/Cas9-induced exon skipping
Exon skipping has emerged as a powerful method to restore gene function in a number of genetic disorders. These therapies force the splicing machinery to bypass exons that contain deleterious point mutations or frameshifts. The FDA has recently approved an antisense oligonucleotide therapy that induces exon skipping in Duchenne muscular dystrophy to restore dystrophin function, and several other related strategies have shown pre-clinical promise. Yet, oligonucleotide therapies are transient treatments that require frequent dosing.
CRISPR/Cas9 instead promises to alleviate genetic disease permanently, through genome alteration. Using a high-throughput experimental-computational pipeline, as described herein, the inventors have developed an algorithm capable of highly accurate prediction of CRISPR/Cas9 genotypic alterations. At a predictable subset of genomic target sites, CRISPR/Cas9 induces precise sequence deletions. These modifications are highly specific and have excellent potential for therapeutic genome editing through controlled deletion of splice-acceptor sites.
The inventors will systematically evaluate this new approach to treating genetic disorders using CRISPR/Cas9 deletions. At intron-exon junctions, the inventors will induce small deletions to bypass exons containing deleterious variants that affect protein function or alter the reading frame.
While not every splice site can be successfully deleted through CRISPR/Cas9 modification, and not every exon can be skipped without compromising gene function, the inventors expect that this approach will succeed in enough genes to have broad therapeutic implications.
To measure the applicability of this approach to treat disease throughout the genome, the inventors will establish a principled computational approach to identify exons known to harbor disease-causing mutations where omission is unlikely to impact gene function. The inventors will then apply a novel, high-throughput CRISPR/Cas9 assay that quantifies the impact of high-precision genome editing on splicing at thousands of these intron-exon boundaries. After determining a set of candidate exons that can be skipped efficiently, the inventors will measure the impact of CRISPR/Cas9-mediated exon skipping on transcript structure and gene function for dozens of human disease exons. This exhaustive approach promises to chart a systematic path toward classifying the disease genes that would be most amenable to future pre-clinical evaluation of permanent therapeutic exon skipping.
Induce exon skipping using CRISPR/Cas9 at thousands of exons that harbor disease variants.
By mapping coding variants known to be associated with genetic disorders, the inventors will develop a set of exons whose skipping could feasibly provide clinical benefit. The inventors will use measures of selective constraint from large-scale population data, together with alternative splicing data, to prioritize exons whose skipping is least likely to compromise protein function. Using the algorithm described herein, which predicts the genotypic consequence of targeting with CRISPR/Cas9, the inventors will refine a list of up to 10,000 intron-exon boundaries where modification is predicted to induce exon skipping at high rates. To test these predictions at high throughput, the inventors will adapt their CRISPR/Cas9 cutting assay to read out context-specific splicing outcomes in human cells in vitro. This novel assay will allow paired evaluation of CRISPR/Cas9 cutting genotype and splicing phenotype for hundreds of distinct replicates in each of 10,000 human exons. The inventors will perform this assay in several human cell lines and will computationally identify exons that can be skipped at high frequency for further study. The inventors will also explore Cas9 base editing in the same high-throughput system to determine whether splicing can be altered by single-base changes. Using these data, the inventors will derive computational rules for which sequence alterations do and do not lead to exon skipping.
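The paired genotype/phenotype readout lends itself to a simple per-exon summary. The sketch below assumes each sequenced read (or replicate) has already been reduced to a hypothetical `(exon_id, genotype, skipped)` tuple; it computes the skipping rate for each repair genotype and flags exons whose overall skipping rate meets an assumed threshold. The record format, function names, and thresholds are illustrative assumptions, not the assay's actual data model.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical record: (exon_id, repair genotype label, was the exon skipped?)
Record = Tuple[str, str, bool]


def skip_rates_by_genotype(records: Iterable[Record]) -> Dict[str, Dict[str, float]]:
    """Fraction of reads with the exon skipped, per exon and per repair genotype."""
    totals: Dict[str, Counter] = defaultdict(Counter)
    skips: Dict[str, Counter] = defaultdict(Counter)
    for exon_id, genotype, skipped in records:
        totals[exon_id][genotype] += 1
        if skipped:
            skips[exon_id][genotype] += 1
    return {
        exon_id: {g: skips[exon_id][g] / n for g, n in counts.items()}
        for exon_id, counts in totals.items()
    }


def high_frequency_exons(records: Iterable[Record], min_rate: float, min_reads: int = 1) -> List[str]:
    """Exons with at least min_reads reads whose overall skipping rate is >= min_rate."""
    n: Counter = Counter()
    k: Counter = Counter()
    for exon_id, _genotype, skipped in records:
        n[exon_id] += 1
        k[exon_id] += int(skipped)
    return sorted(e for e in n if n[e] >= min_reads and k[e] / n[e] >= min_rate)


reads = [
    ("exonA", "del_2bp", True),
    ("exonA", "del_2bp", True),
    ("exonA", "wt", False),
    ("exonA", "wt", False),
    ("exonB", "ins_1bp", False),
    ("exonB", "ins_1bp", False),
    ("exonB", "del_4bp", True),
]
print(skip_rates_by_genotype(reads)["exonA"])   # {'del_2bp': 1.0, 'wt': 0.0}
print(high_frequency_exons(reads, min_rate=0.5))  # ['exonA']
```

Keeping the per-genotype breakdown alongside the overall rate is what makes it possible to ask which specific repair products, rather than which cut sites, drive skipping.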
Evaluate the consequences of CRISPR/Cas9 exon skipping on transcript structure and function.
Using results from the high-throughput assay, the inventors will select up to 100 exon-skipping guide RNAs to pursue in greater depth. The inventors will prioritize exons that are natively excluded from at least one experimentally validated splice isoform, that lack characterized protein domains, and that are under relaxed selective constraint. For these exons, the inventors will edit native genomes in a cell line appropriate to the gene's function and disease process. By performing transcript-specific RNA deep sequencing, the inventors will determine the rate of exon skipping and the transcript structure after the exon is skipped, monitoring for the appearance of aberrant splice acceptors. The inventors will also assay the function of these genes with skipped exons, using appropriate cellular and biochemical assays for each gene. This analysis will identify a set of disease genes that are promising candidates for further study in mutated cell lines and animal models. Overall, this systematic study will elucidate which disease genes are compelling candidates for pre-clinical evaluation of CRISPR/Cas9-mediated exon skipping therapy.
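The three prioritization criteria above, combined with the cap of 100 guide RNAs, can be expressed as a filter followed by a ranking. This is a minimal sketch under assumptions: the `SkipCandidate` fields, the numeric constraint scale (higher meaning more constrained), the `max_constraint` threshold, and the choice to rank survivors by measured skipping rate are all illustrative and not specified in the text.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class SkipCandidate:
    # Hypothetical record; field names and scales are assumptions.
    exon_id: str
    skip_rate: float         # skipping frequency measured in the high-throughput assay
    natively_excluded: bool  # absent from at least one experimentally validated isoform
    overlaps_domain: bool    # overlaps a characterized protein domain (e.g., Pfam)
    constraint: float        # selective-constraint score; higher = more constrained


def prioritize(cands: Iterable[SkipCandidate], max_constraint: float, limit: int = 100) -> List[SkipCandidate]:
    """Keep exons meeting all three criteria, then rank by measured skipping rate."""
    eligible = [
        c for c in cands
        if c.natively_excluded and not c.overlaps_domain and c.constraint <= max_constraint
    ]
    eligible.sort(key=lambda c: c.skip_rate, reverse=True)
    return eligible[:limit]


cands = [
    SkipCandidate("exon1", 0.80, True, False, 0.2),
    SkipCandidate("exon2", 0.90, True, True, 0.1),    # excluded: overlaps a domain
    SkipCandidate("exon3", 0.60, True, False, 0.4),
    SkipCandidate("exon4", 0.95, False, False, 0.1),  # excluded: never natively skipped
]
print([c.exon_id for c in prioritize(cands, max_constraint=0.5)])  # ['exon1', 'exon3']
```

Applying the biological criteria as hard filters before ranking keeps a highly skippable but domain-containing exon from displacing a safer candidate.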
References
Long C, Amoasii L, Mireault A, McAnally J, Li H, Sanchez-Ortiz E, Bhattacharyya S, Shelton J, Bassel-Duby R, Olson E. Postnatal genome editing partially restores dystrophin expression in a mouse model of muscular dystrophy. Science. 2016;351(6271):400-403.
Yeo G, Burge CB. Maximum entropy modeling of short sequence motifs with applications to RNA splicing signals. J Comput Biol. 2004;11(2-3):377-94.
Doench JG, Fusi N, Sullender M, Hegde M, Vaimberg EW, Donovan KF, Smith I, Tothova Z, Wilen C, Orchard R, Virgin HW, Listgarten J, Root DE. Optimized sgRNA design to maximize activity and minimize off-target effects for genetic screens with CRISPR-Cas9. Nat Biotechnol. 2016 Jan.
Tang R, Prosser DO, Love DR. Evaluation of bioinformatic programmes for the analysis of variants within splice site consensus regions. Adv Bioinformatics. 2016;2016:5614058.
Sterne-Weiler T, Sanford J. Exon identity crisis: disease-causing mutations that disrupt the splicing code. Genome Biol. 2014;15:201.
OTHER EMBODIMENTS
The foregoing has been a description of certain non-limiting embodiments of the invention. Those of ordinary skill in the art will appreciate that various changes and modifications to this description may be made without departing from the spirit or scope of the present invention, as defined in the following claims.
EQUIVALENTS AND SCOPE
In the claims, articles such as “a,” “an,” and “the” may mean one or more than one unless indicated to the contrary or otherwise evident from the context. Claims or descriptions that include “or” between one or more members of a group are considered satisfied if one, more than one, or all of the group members are present in, employed in, or otherwise relevant to a given product or process unless indicated to the contrary or otherwise evident from the context. The invention includes embodiments in which exactly one member of the group is present in, employed in, or otherwise relevant to a given product or process. The invention includes embodiments in which more than one, or all of the group members are present in, employed in, or otherwise relevant to a given product or process.
Furthermore, the invention encompasses all variations, combinations, and permutations in which one or more limitations, elements, clauses, and descriptive terms from one or more of the listed claims are introduced into another claim. For example, any claim that is dependent on another claim can be modified to include one or more limitations found in any other claim that is dependent on the same base claim. Where elements are presented as lists, e.g., in Markush group format, each subgroup of the elements is also disclosed, and any element(s) can be removed from the group. It should be understood that, in general, where the invention, or aspects of the invention, is/are referred to as comprising particular elements and/or features, certain embodiments of the invention or aspects of the invention consist, or consist essentially of, such elements and/or features. For purposes of simplicity, those embodiments have not been specifically set forth in haec verba herein. It is also noted that the terms “comprising” and “containing” are intended to be open and permit the inclusion of additional elements or steps. Where ranges are given, endpoints are included. Furthermore, unless otherwise indicated or otherwise evident from the context and understanding of one of ordinary skill in the art, values that are expressed as ranges can assume any specific value or sub-range within the stated ranges in different embodiments of the invention, to the tenth of the unit of the lower limit of the range, unless the context clearly dictates otherwise.
This application refers to various issued patents, published patent applications, journal articles, and other publications, all of which are incorporated herein by reference. If there is a conflict between any of the incorporated references and the instant specification, the specification shall control. In addition, any particular embodiment of the present invention that falls within the prior art may be explicitly excluded from any one or more of the claims. Because such embodiments are deemed to be known to one of ordinary skill in the art, they may be excluded even if the exclusion is not set forth explicitly herein. Any particular embodiment of the invention can be excluded from any claim, for any reason, whether or not related to the existence of prior art.
Those skilled in the art will recognize or be able to ascertain using no more than routine experimentation many equivalents to the specific embodiments described herein. The scope of the present embodiments described herein is not intended to be limited to the above Description, but rather is as set forth in the appended claims. Those of ordinary skill in the art will appreciate that various changes and modifications to this description may be made without departing from the spirit or scope of the present invention, as defined in the following claims.
Figures imgf000145_0001 through imgf000546_0001

Claims

What is claimed is:
1. A method of selecting a guide RNA for use in a Cas-based genome editing system capable of introducing a genetic change into a nucleotide sequence of a target genomic location, the method comprising:
(i) identifying in a nucleotide sequence of a target genomic location one or more available cut sites for a Cas-based genome editing system;
(ii) analyzing the nucleotide sequence and cut site with a computational model to identify a guide RNA capable of introducing the genetic change into the nucleotide sequence of the target genomic location.
2. The method of claim 1, wherein the guide RNA identified by the method is selected from the group consisting of the guide RNA sequences listed in any of Tables 1-6.
3. The method of claim 1, wherein the guide RNA identified by the method is a guide RNA known in the art.
4. The method of claim 1, wherein the guide RNA identified by the method is a new guide RNA disclosed in any of Tables 1-6.
5. The method of claim 1, wherein the Cas-based genome editing system is capable of editing the genome without homology-directed repair.
6. The method of claim 1, wherein the Cas-based genome editing system comprises a type I Cas RNA-guided endonuclease, or a variant or orthologue thereof.
7. The method of claim 1, wherein the Cas-based genome editing system comprises a type II Cas RNA-guided endonuclease, or a functional variant or orthologue thereof.
8. The method of claim 1, wherein the Cas-based genome editing system comprises a Cas9 RNA-guided endonuclease, or a variant or orthologue thereof.
9. The method of claim 1, wherein the Cas-based genome editing system comprises a Cpf1 RNA-guided endonuclease, or a variant or orthologue thereof.
10. The method of claim 1, wherein the Cas is Streptococcus pyogenes Cas9 (SpCas9), Staphylococcus aureus Cas9 (SaCas9), Francisella novicida Cas9 (FnCas9), or a functional variant or orthologue thereof.
11. The method of claim 1, wherein the genetic change is to a genetic mutation.
12. The method of claim 11, wherein the genetic mutation is a single-nucleotide polymorphism, a deletion mutation, an insertion mutation, or a microduplication error.
13. The method of claim 1, wherein the genetic change comprises a 2-60-bp deletion or a 1-bp insertion.
14. The method of claim 11, wherein the genetic mutation causes a disease or a risk of a disease.
15. The method of claim 14, wherein the disease is a monogenic disease.
16. The method of claim 15, wherein the monogenic disease is sickle cell disease, cystic fibrosis, polycystic kidney disease, Tay-Sachs disease, achondroplasia, beta-thalassemia, Hurler syndrome, severe combined immunodeficiency, hemophilia, glycogen storage disease Ia, or Duchenne muscular dystrophy.
17. The method of claim 1, wherein the step of identifying the available cut sites comprises identifying one or more PAM sequences.
18. The method of claim 1, wherein the computational model is a deep learning computational model.
19. The method of claim 1, wherein the computational model is a neural network model having one or more hidden layers.
20. The method of claim 1, wherein the computational model is trained with experimental data to predict the probability distribution of indel lengths for any given nucleotide sequence and cut site.
21. The method of claim 1, wherein the computational model is trained with experimental data to predict the probability distribution of genotype frequencies for any given nucleotide sequence and cut site.
22. The method of claim 1, wherein the computational model comprises one or more training modules for evaluating experimental data.
23. The method of claim 1, wherein the computational model comprises: a first training module (305) for computing a microhomology score matrix (305); a second training module (310) for computing a microhomology-independent score matrix; and a third training module (315) for computing a probability distribution over 1-bp insertions, wherein once trained with experimental data the computational model computes a probability distribution over indel genotypes and a probability distribution over indel lengths for any given input nucleotide sequence and cut site.
24. The method of claim 1, wherein the computational model predicts genomic repair outcomes for any given input nucleotide sequence and cut site.
25. The method of claim 24, wherein the genomic repair outcomes comprise microhomology deletions, microhomology-less deletions, and 1-bp insertions.
26. The method of claim 1, wherein the computational model comprises one or more modules each comprising one or more input features selected from the group consisting of: a target site nucleotide sequence; a cut site; a PAM sequence; microhomology lengths relative to a cut site; % GC content at a cut site; and microhomology deletion lengths at a cut site.
27. The method of claim 1, wherein the nucleotide sequence analyzed by the computational model is between about 25-100 nucleotides, 50-200 nucleotides, 100-400 nucleotides, 200-800 nucleotides, 400-1600 nucleotides, 800-3200 nucleotides, or 1600-6400 nucleotides, or more.
28. A method of introducing a genetic change in the genome of a cell with a Cas-based genome editing system comprising:
(i) selecting a guide RNA for use in the Cas-based genome editing system in accordance with the method of any of claims 1-27; and
(ii) contacting the genome of the cell with the guide RNA and the Cas-based genome editing system, thereby introducing the genetic change.
29. The method of claim 28, wherein the method of introducing the genetic change in the genome of the cell is performed in vivo.
30. The method of claim 28, wherein the method of introducing the genetic change in the genome of the cell is performed ex vivo or in vivo.
31. The method of claim 28, wherein one or more repair mechanisms of the cell are inhibited.
32. The method of claim 28, wherein the genetic change restores the function of a gene.
33. The method of claim 28, wherein the genetic change corrects a disease-causing mutation.
34. The method of claim 33, wherein the disease of the disease-causing mutation is sickle cell disease, cystic fibrosis, polycystic kidney disease, Tay-Sachs disease, achondroplasia, beta-thalassemia, Hurler syndrome, severe combined immunodeficiency, hemophilia, glycogen storage disease Ia, or Duchenne muscular dystrophy.
35. A method of treating a genetic disease in a subject caused by a genetic error in the genome of one or more cells of the subject, comprising:
(i) selecting a guide RNA for use in a Cas-based genome editing system in accordance with the method of any of claims 1-27; and
(ii) contacting the genome of the one or more cells of the subject with the guide RNA and the Cas-based genome editing system, thereby correcting the genetic error in the genome of the cell.
36. The method of claim 35, wherein the method is in vivo, ex vivo, or in vitro.
37. The method of claim 35, wherein the method is performed outside a cellular context.
38. The method of claim 35, wherein one or more repair mechanisms of the cell are inhibited.
39. The method of claim 35, wherein the genetic error is a disease-causing mutation.
40. The method of claim 39, wherein the disease of the disease-causing mutation is sickle cell disease, cystic fibrosis, polycystic kidney disease, Tay-Sachs disease, achondroplasia, beta-thalassemia, Hurler syndrome, severe combined immunodeficiency, hemophilia, glycogen storage disease Ia, or Duchenne muscular dystrophy.
41. A guide RNA identified by the method of claim 1, wherein the guide RNA is selected from the group consisting of a guide RNA nucleotide sequence of Tables 1-6.
42. The guide RNA of claim 41, wherein the guide RNA comprises one or more modifications.
43. The guide RNA of claim 42, wherein the modifications are selected from the group consisting of: nucleoside analogs, chemically modified bases, intercalated bases, modified sugars, and modified phosphate group linkers.
44. The guide RNA of claim 41, wherein the guide RNA further comprises one or more phosphorothioate and/or 5’-N-phosphoramidite linkages.
45. A vector comprising a nucleotide sequence encoding one or more guide RNAs of claim 41.
46. A host cell comprising a vector encoding one or more guide RNAs of claim 41.
47. An isolated guide RNA for use in a Cas-based genome editing system for editing a genome of a mammalian cell selected from the group consisting of a guide RNA nucleotide sequence of Tables 1-6.
48. A vector comprising a nucleotide sequence encoding a guide RNA of claim 47.
49. A host cell comprising the vector of claim 48.
50. A Cas-based genome editing system comprising a Cas protein complexed with at least one guide RNA identified by the method of claim 1.
51. A Cas-based genome editing system comprising a Cas protein complexed with at least one guide RNA selected from the group consisting of a guide RNA nucleotide sequence of Tables 1-6.
52. A Cas-based genome editing system comprising an expression vector having at least one expressible nucleotide sequence encoding a Cas protein and at least one other expressible nucleotide sequence encoding a guide RNA, wherein the guide RNA is identified by the method of claim 1.
53. A Cas-based genome editing system comprising an expression vector having at least one expressible nucleotide sequence encoding a Cas protein and at least one other expressible nucleotide sequence encoding a guide RNA selected from the group consisting of a guide RNA nucleotide sequence of Tables 1-6.
54. A library for training a computational model for selecting a guide RNA sequence for use with a Cas-based genome editing system capable of introducing a genetic change into a genome without homology-directed repair, wherein the library comprises a plurality of vectors each comprising a first nucleotide sequence of a target genomic location having a cut site and a second nucleotide sequence encoding a cognate guide RNA capable of directing a Cas-based genome editing system to carry out a double-strand break at the cut site of the first nucleotide sequence.
55. The library of claim 54, wherein the first nucleotide sequence is a non-naturally occurring sequence.
56. The library of claim 54, wherein the first nucleotide sequence is a naturally-occurring genomic sequence.
57. The library of claim 56, wherein the naturally-occurring genomic sequence comprises a disease-causing mutation.
58. The library of claim 57, wherein the disease-causing mutation causes sickle cell disease, cystic fibrosis, polycystic kidney disease, Tay-Sachs disease, achondroplasia, beta-thalassemia, Hurler syndrome, severe combined immunodeficiency, hemophilia, glycogen storage disease Ia, or Duchenne muscular dystrophy.
59. A host cell comprising at least one member vector of the library of claim 54.
60. The host cell of claim 59, wherein the vector or a portion thereof is integrated into the genome of the host cell.
61. An isolated Cas protein complexed with a guide RNA of any of claims 41-44.
62. A computational model that is capable of selecting a guide RNA for use with a Cas-based genome editing system to introduce a genetic change in a genome.
63. The computational model of claim 62, wherein the model is a neural network model having one or more hidden layers.
64. The computational model of claim 62, wherein the model is a deep learning computational model.
65. The computational model of claim 62, wherein the Cas-based genome editing system edits the genome without homology-directed repair.
66. The computational model of claim 62, wherein the computational model is trained with experimental data to predict the probability distribution of indel lengths for any given nucleotide sequence and cut site.
67. The computational model of claim 62, wherein the computational model is trained with experimental data to predict the probability distribution of genotype frequencies for any given nucleotide sequence and cut site.
68. The computational model of claim 62, wherein the computational model comprises one or more training modules for evaluating experimental data.
69. The computational model of claim 62, wherein the computational model comprises: a first training module (305) for computing a microhomology score matrix (305); a second training module (310) for computing a microhomology-independent score matrix; and a third training module (315) for computing a probability distribution over 1-bp insertions, wherein once trained with experimental data the computational model computes a probability distribution over indel genotypes and a probability distribution over indel lengths for any given input nucleotide sequence and cut site.
70. The computational model of claim 62, wherein the computational model predicts genomic repair outcomes for any given input nucleotide sequence and cut site.
71. The computational model of claim 70, wherein the genomic repair outcomes comprise microhomology deletions, microhomology-less deletions, and 1-bp insertions.
72. The computational model of claim 62, wherein the computational model comprises one or more modules, each comprising one or more input features selected from the group consisting of: a target site nucleotide sequence; a cut site; a PAM sequence; microhomology lengths relative to a cut site; % GC content at a cut site; and microhomology deletion lengths at a cut site.
73. The computational model of claim 62, wherein the nucleotide sequence analyzed by the computational model is about 25-100 nucleotides, 50-200 nucleotides, 100-400 nucleotides, 200-800 nucleotides, 400-1600 nucleotides, 800-3200 nucleotides, or 1600-6400 nucleotides or more.
74. A method for training a computational model of any of claims 62-73, comprising: (i) preparing a library comprising a plurality of nucleic acid molecules each encoding a nucleotide target sequence and a cognate guide RNA, wherein each nucleotide target sequence comprises a cut site; (ii) introducing the library into a plurality of host cells; (iii) contacting the library in the host cells with a Cas-based genome editing system to produce a plurality of genomic repair products; (iv) determining the sequences of the genomic repair products; and (v) training the computational model with input data that comprises at least the sequences of the genomic repair products and the cut sites.
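The final training step of claim 74 uses the sequenced repair products but does not name a training objective. As a minimal sketch, assuming each read has already been aligned and reduced to an indel length (negative for deletions, positive for insertions), the observed outcomes can be turned into a target frequency distribution and compared with a model's prediction by cross-entropy. Both helper names and the choice of cross-entropy as the loss are assumptions made for illustration.

```python
import math
from collections import Counter


def empirical_length_distribution(observed_indel_lengths):
    """Hypothetical helper: normalize observed indel lengths into frequencies.

    Negative values are deletions and positive values are insertions.
    """
    counts = Counter(observed_indel_lengths)
    n = sum(counts.values())
    return {length: c / n for length, c in counts.items()}


def cross_entropy(observed, predicted, eps=1e-12):
    """Cross-entropy of a predicted distribution against an observed one; lower is a closer fit.

    eps guards against log(0) when the model assigns no mass to an observed outcome.
    """
    return -sum(p * math.log(predicted.get(k, 0.0) + eps) for k, p in observed.items())
```

A training loop would adjust model parameters to lower this loss summed over all library target sites; the observed distribution itself is the lowest-loss prediction for any single site.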
75. The method of claim 74, wherein the trained computational model resulting from the method is capable of computing a probability distribution of indel lengths for any given nucleotide sequence and cut site, and/or a probability distribution of genotype frequencies for any given nucleotide sequence and cut site.
76. The method of claim 74, wherein the trained computational model is capable of selecting a guide RNA for use with a Cas-based genome editing system for introducing a genetic change into a genome.
77. The method of claim 76, wherein the genetic change is a microhomology deletion, a microhomology-less deletion, or a 1-bp insertion.
78. The method of claim 76, wherein the genetic change corrects a disease-causing mutation.
79. The method of claim 78, wherein the disease-causing mutation causes sickle cell disease, cystic fibrosis, polycystic kidney disease, Tay-Sachs disease, achondroplasia, beta-thalassemia, Hurler syndrome, severe combined immunodeficiency, hemophilia, glycogen storage disease Ia, or Duchenne muscular dystrophy.
80. A method for selecting one or more guide RNAs (gRNAs) from a plurality of gRNAs for CRISPR, comprising acts of: for at least one gRNA of the plurality of gRNAs, using a local DNA sequence and a cut site targeted by the at least one gRNA to predict a frequency of one or more repair genotypes resulting from template-free repair following application of CRISPR with the at least one gRNA; and determining whether to select the at least one gRNA based at least in part on the predicted frequency of the one or more repair genotypes.
81. The method of claim 80, wherein the one or more repair genotypes correspond to one or more healthy alleles of a gene related to a disease.
82. The method of claim 80, wherein the predicted frequency of the one or more repair genotypes is at least about 30%, or at least about 40%, or at least about 50%.
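Claims 80 to 82 describe selecting gRNAs by the predicted frequency of a desired repair genotype, with thresholds of about 30%, 40% or 50%. A minimal Python sketch of that filter follows. The predictor is passed in as a function because the claims do not tie the selection step to one model; the names `select_grnas` and `predict_repair_freqs`, and the dictionary formats, are hypothetical.

```python
def select_grnas(candidates, predict_repair_freqs, desired_products, threshold=0.3):
    """Hypothetical filter: keep gRNAs whose predicted desired-product frequency meets threshold.

    candidates: {grna_id: (local_dna_sequence, cut_site)}
    predict_repair_freqs: callable (sequence, cut_site) -> {repair_product: frequency}
    desired_products: set of repair products counted as a success (e.g. a healthy allele)
    """
    selected = []
    for grna_id, (sequence, cut_site) in candidates.items():
        freqs = predict_repair_freqs(sequence, cut_site)
        # Sum over all desired products, since several genotypes may restore the same allele.
        hit = sum(freqs.get(product, 0.0) for product in desired_products)
        if hit >= threshold:
            selected.append((grna_id, hit))
    # Rank the surviving gRNAs from highest to lowest predicted success.
    return sorted(selected, key=lambda item: item[1], reverse=True)
```

Raising `threshold` to 0.4 or 0.5 corresponds to the stricter alternatives in claim 82.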
83. The method of claim 80, wherein predicting the frequency of the one or more repair genotypes comprises: for each deletion length of a plurality of deletion lengths, aligning subsequences of that deletion length on the 5’ and 3’ sides of the cut site to identify one or more longest microhomologies; featurizing the identified microhomologies; applying a machine learning model to compute a frequency distribution over the plurality of deletion lengths; and using the frequency distribution over the plurality of deletion lengths to determine the frequency of the one or more repair genotypes.
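The alignment and featurization steps of claim 83 can be sketched as follows. For a deletion length L, the L bases immediately 5’ of the cut are compared position by position with the L bases immediately 3’ of it; each maximal run of matches is a microhomology, and deleting L bases at any offset within that run yields the same product. The sketch records the length and GC fraction of each run (two of the features listed in claim 72), then sums per-microhomology weights into a deletion-length distribution. The weight function is a stand-in for the trained machine learning model and, like the function names and the 60-bp default limit, is an assumption; microhomology-less deletions are not covered here.

```python
import math


def microhomology_features(seq, cut, max_del_len=60):
    """Hypothetical featurizer: for each deletion length L, align the L bases 5' of the cut
    with the L bases 3' of it.

    Each maximal run of position-wise matches is a microhomology. Returns one feature
    dict per run. The break is assumed to fall between seq[cut - 1] and seq[cut].
    """
    features = []
    for del_len in range(1, max_del_len + 1):
        if cut - del_len < 0 or cut + del_len > len(seq):
            break  # the deletion would run off the end of the supplied sequence
        left = seq[cut - del_len:cut]
        right = seq[cut:cut + del_len]
        run_start = None
        # Iterate one position past the end so a run reaching the end is closed.
        for i in range(del_len + 1):
            match = i < del_len and left[i] == right[i]
            if match and run_start is None:
                run_start = i
            elif not match and run_start is not None:
                mh = left[run_start:i]
                features.append({
                    "del_len": del_len,
                    "mh_len": len(mh),
                    "gc_frac": (mh.count("G") + mh.count("C")) / len(mh),
                })
                run_start = None
    return features


def deletion_length_distribution(features, weight_fn):
    """Sum per-microhomology weights by deletion length and normalize to frequencies."""
    totals = {}
    for f in features:
        totals[f["del_len"]] = totals.get(f["del_len"], 0.0) + weight_fn(f)
    z = sum(totals.values())
    return {L: w / z for L, w in totals.items()} if z > 0 else {}
```

On `AAGTCAGTCAA` with the cut at index 5, the scan finds the repeated AGTC as a 4-bp microhomology for a 4-bp deletion, plus a 1-bp microhomology at deletion length 5.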
84. The method of claim 80, wherein the plurality of gRNAs comprise gRNAs for CRISPR/Cas9, and the application of CRISPR comprises application of CRISPR/Cas9.
85. A system comprising: at least one processor; and at least one computer-readable storage medium having encoded thereon instructions which, when executed, cause the at least one processor to perform the method of any of claims 80-84.
86. At least one computer-readable storage medium having encoded thereon instructions which, when executed, cause at least one processor to perform the method of any of claims 80-84.
87. A method for CRISPR editing of DNA that utilizes a guide RNA in the absence of a homology directed repair template, wherein the guide RNA is selected to produce one or more selected genotypic outcomes.
88. A method of predicting a genomic repair profile of a double-strand break (DSB)-inducing genome editing system capable of introducing a genetic change into a nucleotide sequence of a target genomic location, the method comprising:
(i) identifying one or more available cut sites at a nucleotide sequence of a target genomic location;
(ii) analyzing the nucleotide sequence and available cut sites with a computational model to identify the optimal cut site for introducing the genetic change into the nucleotide sequence of the target genomic location.
89. The method of claim 88, wherein the available cut sites are a function of the DSB-inducing genome editing system.
90. The method of claim 89, wherein the DSB-inducing genome editing system is a TALEN-based genome editing system, a Cas-based genome editing system, or a zinc finger-based genome editing system.
91. The method of claim 89, wherein the DSB-inducing genome editing system is a Cas-based genome editing system.
92. The method of claim 91, wherein the method further comprises selecting a cognate guide RNA capable of directing a double-strand break at the optimal cut site by the Cas-based genome editing system.
93. The method of claim 92, wherein the guide RNA is selected from the group consisting of the guide RNA sequences listed in any of Tables 1-6.
94. The method of claim 92, wherein the guide RNA is known in the art.
95. The method of claim 88, wherein the double-strand break (DSB)-inducing genome editing system is capable of editing the genome without homology-directed repair.
96. The method of claim 88, wherein the double-strand break (DSB)-inducing genome editing system comprises a type I Cas RNA-guided endonuclease, or a variant or orthologue thereof.
97. The method of claim 88, wherein the double-strand break (DSB)-inducing genome editing system comprises a type II Cas RNA-guided endonuclease, or a functional variant or orthologue thereof.
98. The method of claim 88, wherein the double-strand break (DSB)-inducing genome editing system comprises a Cas9 RNA-guided endonuclease, or a variant or orthologue thereof.
99. The method of claim 88, wherein the double-strand break (DSB)-inducing genome editing system comprises a Cpf1 RNA-guided endonuclease, or a variant or orthologue thereof.
100. The method of claim 88, wherein the double-strand break (DSB)-inducing genome editing system comprises a Streptococcus pyogenes Cas9 (SpCas9), Staphylococcus aureus Cas9 (SaCas9), Francisella novicida Cas9 (FnCas9), or a functional variant or orthologue thereof.
101. The method of claim 88, wherein the genetic change is made to a genetic mutation.
102. The method of claim 101, wherein the genetic mutation is a single-nucleotide polymorphism, a deletion mutation, an insertion mutation, or a microduplication error.
103. The method of claim 88, wherein the genetic change comprises a 2-60-bp deletion or a 1-bp insertion.
104. The method of claim 101, wherein the genetic mutation causes a disease or a risk of a disease.
105. The method of claim 88, wherein the genetic change is a desired modification to a wildtype gene that confers one or more beneficial traits.
106. The method of claim 104, wherein the disease is a monogenic disease.
107. The method of claim 106, wherein the monogenic disease is sickle cell disease, cystic fibrosis, polycystic kidney disease, Tay-Sachs disease, achondroplasia, beta-thalassemia, Hurler syndrome, severe combined immunodeficiency, hemophilia, glycogen storage disease Ia, or Duchenne muscular dystrophy.
108. The method of claim 88, wherein the step of identifying the available cut sites comprises identifying one or more PAM sequences in the case of a Cas-based genome editing system.
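Claim 108 identifies available cut sites from PAM sequences. For SpCas9, whose PAM is NGG and which cuts 3 bp upstream (5’) of the PAM within a 20-nt protospacer, a minimal scan of both strands might look like the Python sketch below. The function names are hypothetical, and the other nucleases named in claims 96 to 100 (for example SaCas9 or Cpf1) use different PAMs and cut positions, so the pattern and offsets would need to change for them.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(s):
    """Reverse complement of an uppercase A/C/G/T string."""
    return s.translate(_COMPLEMENT)[::-1]


def spcas9_cut_sites(seq):
    """Hypothetical scanner: list (strand, protospacer, cut_index) for every NGG PAM on either strand.

    cut_index is 0-based on the forward strand: the break falls between
    seq[cut_index - 1] and seq[cut_index]. SpCas9 cuts 3 bp 5' of its PAM.
    Windows containing characters other than A/C/G/T are skipped.
    """
    seq = seq.upper()
    sites = []
    # Forward strand: 20-nt protospacer immediately followed by NGG.
    # The zero-width lookahead lets overlapping sites all be reported.
    for m in re.finditer(r"(?=([ACGT]{20})[ACGT]GG)", seq):
        sites.append(("+", m.group(1), m.start() + 17))
    # Reverse strand: appears on the forward strand as CCN followed by the
    # reverse complement of the protospacer; the cut lies 3 bp past the CCN.
    for m in re.finditer(r"(?=CC[ACGT]([ACGT]{20}))", seq):
        sites.append(("-", revcomp(m.group(1)), m.start() + 6))
    return sites
```

For example, `spcas9_cut_sites("A" * 20 + "TGG")` returns a single forward-strand site with cut index 17; each returned cut index could then be passed, with the surrounding sequence, to a repair-outcome model.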
109. The method of claim 88, wherein the computational model is a deep learning computational model.
110. The method of claim 88, wherein the computational model is a neural network model having one or more hidden layers.
111. The method of claim 88, wherein the computational model is trained with experimental data to predict the probability distribution of indel lengths for any given nucleotide sequence and cut site.
112. The method of claim 88, wherein the computational model is trained with experimental data to predict the probability distribution of genotype frequencies for any given nucleotide sequence and cut site.
113. The method of claim 88, wherein the computational model comprises one or more training modules for evaluating experimental data.
114. The method of claim 88, wherein the computational model comprises: a first training module (305) for computing a microhomology score matrix (305); a second training module (310) for computing a microhomology independent score matrix; and a third training module (315) for computing a probability distribution over 1-bp insertions, wherein once trained with experimental data the computational model computes a probability distribution over indel genotypes and a probability distribution over indel lengths for any given input nucleotide sequence and cut site.
115. The method of claim 88, wherein the computational model predicts genomic repair outcomes for any given input nucleotide sequence and cut site.
116. The method of claim 115, wherein the genomic repair outcomes comprise microhomology deletions, microhomology-less deletions, and 1-bp insertions.
117. The method of claim 1, wherein the computational model comprises one or more modules, each comprising one or more input features selected from the group consisting of: a target site nucleotide sequence; a cut site; a PAM sequence; microhomology lengths relative to a cut site; % GC content at a cut site; microhomology deletion lengths at a cut site; and the type of DSB-inducing genome editing system.
118. The method of claim 1, wherein the nucleotide sequence analyzed by the computational model is about 25-100 nucleotides, 50-200 nucleotides, 100-400 nucleotides, 200-800 nucleotides, 400-1600 nucleotides, 800-3200 nucleotides, or 1600-6400 nucleotides or more.
PCT/US2018/065886 2017-12-15 2018-12-15 Systems and methods for predicting repair outcomes in genetic engineering Ceased WO2019118949A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US16/772,747 US12406749B2 (en) 2017-12-15 2018-12-15 Systems and methods for predicting repair outcomes in genetic engineering
EP18887576.9A EP3724214A4 (en) 2017-12-15 2018-12-15 SYSTEMS AND METHODS FOR PREDICTING REPAIR RESULTS IN GENETIC ENGINEERING

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US201762599623P 2017-12-15 2017-12-15
US62/599,623 2017-12-15
US201862669771P 2018-05-10 2018-05-10
US62/669,771 2018-05-10

Publications (1)

Publication Number Publication Date
WO2019118949A1 true WO2019118949A1 (en) 2019-06-20

Family

ID=66819534

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2018/065886 Ceased WO2019118949A1 (en) 2017-12-15 2018-12-15 Systems and methods for predicting repair outcomes in genetic engineering

Country Status (3)

Country Link
US (1) US12406749B2 (en)
EP (1) EP3724214A4 (en)
WO (1) WO2019118949A1 (en)

Cited By (39)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10745677B2 (en) 2016-12-23 2020-08-18 President And Fellows Of Harvard College Editing of CCR5 receptor gene to protect against HIV infection
WO2021003343A1 (en) * 2019-07-03 2021-01-07 Integrated Dna Technologies, Inc. Identification, characterization, and quantitation of crispr-introduced double-stranded dna break repairs
US10912833B2 (en) 2013-09-06 2021-02-09 President And Fellows Of Harvard College Delivery of negatively charged proteins using cationic lipids
US10947530B2 (en) 2016-08-03 2021-03-16 President And Fellows Of Harvard College Adenosine nucleobase editors and uses thereof
US10954548B2 (en) 2013-08-09 2021-03-23 President And Fellows Of Harvard College Nuclease profiling system
WO2021072309A1 (en) * 2019-10-09 2021-04-15 Massachusetts Institute Of Technology Systems, methods, and compositions for correction of frameshift mutations
US11046948B2 (en) 2013-08-22 2021-06-29 President And Fellows Of Harvard College Engineered transcription activator-like effector (TALE) domains and uses thereof
US11053481B2 (en) 2013-12-12 2021-07-06 President And Fellows Of Harvard College Fusions of Cas9 domains and nucleic acid-editing domains
US11214780B2 (en) 2015-10-23 2022-01-04 President And Fellows Of Harvard College Nucleobase editors and uses thereof
EP3814510A4 (en) * 2018-05-04 2022-02-23 University of Massachusetts MICRO-HOMELOGY-MEDIATED REPAIR OF MICRO-DUPLICATION GENE MUTATIONS
US11268082B2 (en) 2017-03-23 2022-03-08 President And Fellows Of Harvard College Nucleobase editors comprising nucleic acid programmable DNA binding proteins
US11299755B2 (en) 2013-09-06 2022-04-12 President And Fellows Of Harvard College Switchable CAS9 nucleases and uses thereof
US11306324B2 (en) 2016-10-14 2022-04-19 President And Fellows Of Harvard College AAV delivery of nucleobase editors
US11319532B2 (en) 2017-08-30 2022-05-03 President And Fellows Of Harvard College High efficiency base editors comprising Gam
US11345932B2 (en) 2018-05-16 2022-05-31 Synthego Corporation Methods and systems for guide RNA design and use
US11447770B1 (en) 2019-03-19 2022-09-20 The Broad Institute, Inc. Methods and compositions for prime editing nucleotide sequences
US11542509B2 (en) 2016-08-24 2023-01-03 President And Fellows Of Harvard College Incorporation of unnatural amino acids into proteins using base editing
US11542496B2 (en) 2017-03-10 2023-01-03 President And Fellows Of Harvard College Cytosine to guanine base editor
US11560566B2 (en) 2017-05-12 2023-01-24 President And Fellows Of Harvard College Aptazyme-embedded guide RNAs for use with CRISPR-Cas9 in genome editing and transcriptional activation
US11578343B2 (en) 2014-07-30 2023-02-14 President And Fellows Of Harvard College CAS9 proteins including ligand-dependent inteins
CN115806989A (en) * 2022-11-25 2023-03-17 昆明理工大学 sgRNA, carrier and application for DMD gene exon 5 mutation
US11661590B2 (en) 2016-08-09 2023-05-30 President And Fellows Of Harvard College Programmable CAS9-recombinase fusion proteins and uses thereof
US11732274B2 (en) 2017-07-28 2023-08-22 President And Fellows Of Harvard College Methods and compositions for evolving base editors using phage-assisted continuous evolution (PACE)
US11795443B2 (en) 2017-10-16 2023-10-24 The Broad Institute, Inc. Uses of adenosine base editors
US11898179B2 (en) 2017-03-09 2024-02-13 President And Fellows Of Harvard College Suppression of pain by gene editing
US11912985B2 (en) 2020-05-08 2024-02-27 The Broad Institute, Inc. Methods and compositions for simultaneous editing of both strands of a target double-stranded nucleotide sequence
US12006520B2 (en) 2011-07-22 2024-06-11 President And Fellows Of Harvard College Evaluation and improvement of nuclease cleavage specificity
WO2024163862A2 (en) 2023-02-03 2024-08-08 The Broad Institute, Inc. Gene editing methods, systems, and compositions for treating spinal muscular atrophy
US12123033B2 (en) 2019-10-24 2024-10-22 Integrated Dna Technologies, Inc. Modified double-stranded donor templates
US12157760B2 (en) 2018-05-23 2024-12-03 The Broad Institute, Inc. Base editors and uses thereof
WO2025040617A1 (en) * 2023-08-18 2025-02-27 Universität Zürich Microhomology mediated integration of cargo nucleic acid molecules
US12281338B2 (en) 2018-10-29 2025-04-22 The Broad Institute, Inc. Nucleobase editors comprising GeoCas9 and uses thereof
US12351837B2 (en) 2019-01-23 2025-07-08 The Broad Institute, Inc. Supernegatively charged proteins and uses thereof
US12390514B2 (en) 2017-03-09 2025-08-19 President And Fellows Of Harvard College Cancer vaccine
US12406749B2 (en) 2017-12-15 2025-09-02 The Broad Institute, Inc. Systems and methods for predicting repair outcomes in genetic engineering
US12435330B2 (en) 2019-10-10 2025-10-07 The Broad Institute, Inc. Methods and compositions for prime editing RNA
US12473543B2 (en) 2019-04-17 2025-11-18 The Broad Institute, Inc. Adenine base editors with reduced off-target effects
US12522807B2 (en) 2018-07-09 2026-01-13 The Broad Institute, Inc. RNA programmable epigenetic RNA modifiers and uses thereof
US12559737B2 (en) 2013-09-06 2026-02-24 President And Fellows Of Harvard College Cas9 variants and uses thereof

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024227082A1 (en) * 2023-04-26 2024-10-31 Vijay Sankaran Base-editing perturbation screens and uses thereof
WO2025240206A1 (en) * 2024-05-14 2025-11-20 The Broad Institute, Inc. Mechanistic model for improving prime editing systems, compositions, and methods of using same

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016196805A1 (en) * 2015-06-05 2016-12-08 The Regents Of The University Of California Methods and compositions for generating crispr/cas guide rnas
WO2017049129A2 (en) * 2015-09-18 2017-03-23 President And Fellows Of Harvard College Methods of making guide rna
WO2017081097A1 (en) * 2015-11-09 2017-05-18 Ifom Fondazione Istituto Firc Di Oncologia Molecolare Crispr-cas sgrna library
WO2017083766A1 (en) * 2015-11-13 2017-05-18 Massachusetts Institute Of Technology High-throughput crispr-based library screening
WO2017147056A1 (en) 2016-02-22 2017-08-31 Caribou Biosciences, Inc. Methods for modulating dna repair outcomes
US20170283831A1 (en) * 2014-12-12 2017-10-05 The Broad Institute Inc. Protected guide rnas (pgrnas)

Family Cites Families (1852)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4217344A (en) 1976-06-23 1980-08-12 L'oreal Compositions containing aqueous dispersions of lipid spheres
US4235871A (en) 1978-02-24 1980-11-25 Papahadjopoulos Demetrios P Method of encapsulating biologically active materials in lipid vesicles
US4186183A (en) 1978-03-29 1980-01-29 The United States Of America As Represented By The Secretary Of The Army Liposome carriers in chemotherapy of leishmaniasis
US4182449A (en) 1978-04-18 1980-01-08 Kozlow William J Adhesive bandage and package
US4261975A (en) 1979-09-19 1981-04-14 Merck & Co., Inc. Viral liposome particle
US4663290A (en) 1982-01-21 1987-05-05 Molecular Genetics, Inc. Production of reverse transcriptase
US4485054A (en) 1982-10-04 1984-11-27 Lipoderm Pharmaceuticals Limited Method of encapsulating biologically active materials in multilamellar lipid vesicles (MLV)
US4501728A (en) 1983-01-06 1985-02-26 Technology Unlimited, Inc. Masking of liposomes from RES recognition
US4880635B1 (en) 1984-08-08 1996-07-02 Liposome Company Dehydrated liposomes
US5049386A (en) 1985-01-07 1991-09-17 Syntex (U.S.A.) Inc. N-ω,(ω-1)-dialkyloxy)- and N-(ω,(ω-1)-dialkenyloxy)Alk-1-YL-N,N,N-tetrasubstituted ammonium lipids and uses therefor
US4946787A (en) 1985-01-07 1990-08-07 Syntex (U.S.A.) Inc. N-(ω,(ω-1)-dialkyloxy)- and N-(ω,(ω-1)-dialkenyloxy)-alk-1-yl-N,N,N-tetrasubstituted ammonium lipids and uses therefor
US4897355A (en) 1985-01-07 1990-01-30 Syntex (U.S.A.) Inc. N[ω,(ω-1)-dialkyloxy]- and N-[ω,(ω-1)-dialkenyloxy]-alk-1-yl-N,N,N-tetrasubstituted ammonium lipids and uses therefor
US4797368A (en) 1985-03-15 1989-01-10 The United States Of America As Represented By The Department Of Health And Human Services Adeno-associated virus as eukaryotic expression vector
US4921757A (en) 1985-04-26 1990-05-01 Massachusetts Institute Of Technology System for delayed and pulsed release of biologically active substances
US4774085A (en) 1985-07-09 1988-09-27 501 Board of Regents, Univ. of Texas Pharmaceutical administration systems containing a mixture of immunomodulators
US5139941A (en) 1985-10-31 1992-08-18 University Of Florida Research Foundation, Inc. AAV transduction vectors
US4737323A (en) 1986-02-13 1988-04-12 Liposome Technology, Inc. Liposome extrusion method
US5017492A (en) 1986-02-27 1991-05-21 Life Technologies, Inc. Reverse transcriptase and method for its production
DE122007000007I1 (en) 1986-04-09 2007-05-16 Genzyme Corp Genetically transformed animals secreting a desired protein in milk
US5374553A (en) 1986-08-22 1994-12-20 Hoffmann-La Roche Inc. DNA encoding a thermostable nucleic acid polymerase enzyme from thermotoga maritima
US5079352A (en) 1986-08-22 1992-01-07 Cetus Corporation Purified thermostable enzyme
WO1992006200A1 (en) 1990-09-28 1992-04-16 F. Hoffmann-La-Roche Ag 5' to 3' exonuclease mutations of thermostable dna polymerases
US4889818A (en) 1986-08-22 1989-12-26 Cetus Corporation Purified thermostable enzyme
US4920016A (en) 1986-12-24 1990-04-24 Linear Technology, Inc. Liposomes with enhanced circulation time
US4837028A (en) 1986-12-24 1989-06-06 Liposome Technology, Inc. Liposomes with enhanced circulation time
JPH0825869B2 (en) 1987-02-09 1996-03-13 株式会社ビタミン研究所 Antitumor agent-embedded liposome preparation
US4911928A (en) 1987-03-13 1990-03-27 Micro-Pak, Inc. Paucilamellar lipid vesicles
US4917951A (en) 1987-07-28 1990-04-17 Micro-Pak, Inc. Lipid vesicles formed of surfactants and steroids
DE3874735T2 (en) 1987-04-23 1993-04-22 Fmc Corp INSECTICIDAL CYCLOPROPYL SUBSTITUTED DI (ARYL) COMPOUNDS.
US4873316A (en) 1987-06-23 1989-10-10 Biogen, Inc. Isolation of exogenous recombinant proteins from the milk of transgenic mammals
ATE115999T1 (en) 1987-12-15 1995-01-15 Gene Shears Pty Ltd RIBOZYMES.
US5244797B1 (en) 1988-01-13 1998-08-25 Life Technologies Inc Cloned genes encoding reverse transcriptase lacking rnase h activity
US4965185A (en) 1988-06-22 1990-10-23 Grischenko Valentin I Method for low-temperature preservation of embryos
US5223409A (en) 1988-09-02 1993-06-29 Protein Engineering Corp. Directed evolution of novel binding proteins
EP0436597B1 (en) 1988-09-02 1997-04-02 Protein Engineering Corporation Generation and selection of recombinant varied binding proteins
US5270179A (en) 1989-08-10 1993-12-14 Life Technologies, Inc. Cloning and expression of T5 DNA polymerase reduced in 3'- to-5' exonuclease activity
US5047342A (en) 1989-08-10 1991-09-10 Life Technologies, Inc. Cloning and expression of T5 DNA polymerase
AU637800B2 (en) 1989-08-31 1993-06-10 City Of Hope Chimeric dna-rna catalytic sequences
US5264618A (en) 1990-04-19 1993-11-23 Vical, Inc. Cationic lipids for intracellular delivery of biologically active molecules
US5427908A (en) 1990-05-01 1995-06-27 Affymax Technologies N.V. Recombinant library screening methods
WO1991017424A1 (en) 1990-05-03 1991-11-14 Vical, Inc. Intracellular delivery of biologically active substances by means of self-assembling lipid complexes
US5637459A (en) 1990-06-11 1997-06-10 Nexstar Pharmaceuticals, Inc. Systematic evolution of ligands by exponential enrichment: chimeric selex
US5580737A (en) 1990-06-11 1996-12-03 Nexstar Pharmaceuticals, Inc. High-affinity nucleic acid ligands that discriminate between theophylline and caffeine
DE553264T1 (en) 1990-10-05 1994-04-28 Wayne M Barnes THERMOSTABLE DNA POLYMERASE.
JP3257675B2 (en) 1990-10-12 2002-02-18 マックス−プランク−ゲゼルシャフト ツール フェルデルング デル ビッセンシャフテン エー.ファウ. Modified ribozyme
US5173414A (en) 1990-10-30 1992-12-22 Applied Immune Sciences, Inc. Production of recombinant adeno-associated virus vectors
NZ241310A (en) 1991-01-17 1995-03-28 Gen Hospital Corp Trans-splicing ribozymes
NZ241311A (en) 1991-01-17 1995-03-28 Gen Hospital Corp Rna sequence having trans-splicing activity, plant strains
DK0580737T3 (en) 1991-04-10 2004-11-01 Scripps Research Inst Heterodimeric receptor libraries using phagemids
DE4216134A1 (en) 1991-06-20 1992-12-24 Europ Lab Molekularbiolog SYNTHETIC CATALYTIC OLIGONUCLEOTIDE STRUCTURES
US6872816B1 (en) 1996-01-24 2005-03-29 Third Wave Technologies, Inc. Nucleic acid detection kits
US5652094A (en) 1992-01-31 1997-07-29 University Of Montreal Nucleozymes
JPH05274181A (en) 1992-03-25 1993-10-22 Nec Corp Setting/canceling system for break point
US5587308A (en) 1992-06-02 1996-12-24 The United States Of America As Represented By The Department Of Health & Human Services Modified adeno-associated virus vector capable of expression from a novel promoter
US5496714A (en) 1992-12-09 1996-03-05 New England Biolabs, Inc. Modification of protein by use of a controllable interveining protein sequence
US5834247A (en) 1992-12-09 1998-11-10 New England Biolabs, Inc. Modified proteins comprising controllable intervening protein sequences or their elements methods of producing same and methods for purification of a target protein comprised by a modified protein
US5434058A (en) 1993-02-09 1995-07-18 Arch Development Corporation Apolipoprotein B MRNA editing protein compositions and methods
US5436149A (en) 1993-02-19 1995-07-25 Barnes; Wayne M. Thermostable DNA polymerase with enhanced thermostability and enhanced length and efficiency of primer extension
CN1127527A (en) 1993-05-17 1996-07-24 加利福尼亚大学董事会 Ribozyme gene therapy method for human immunodeficiency virus infection and AIDS
US5512462A (en) 1994-02-25 1996-04-30 Hoffmann-La Roche Inc. Methods and reagents for the polymerase chain reaction amplification of long DNA sequences
US5651981A (en) 1994-03-29 1997-07-29 Northwestern University Cationic phospholipids for transfection
US5874560A (en) 1994-04-22 1999-02-23 The United States Of America As Represented By The Department Of Health And Human Services Melanoma antigens and their use in diagnostic and therapeutic methods
US5912155A (en) 1994-09-30 1999-06-15 Life Technologies, Inc. Cloned DNA polymerases from Thermotoga neapolitana
US5614365A (en) 1994-10-17 1997-03-25 President & Fellow Of Harvard College DNA polymerase having modified nucleotide binding site for DNA sequencing
US5449639A (en) 1994-10-24 1995-09-12 Taiwan Semiconductor Manufacturing Company Ltd. Disposable metal anti-reflection coating process used together with metal dry/wet etch
US5767099A (en) 1994-12-09 1998-06-16 Genzyme Corporation Cationic amphiphiles containing amino acid or dervatized amino acid groups for intracellular delivery of therapeutic molecules
US6057153A (en) 1995-01-13 2000-05-02 Yale University Stabilized external guide sequences
US5795587A (en) 1995-01-23 1998-08-18 University Of Pittsburgh Stable lipid-comprising drug delivery complexes and methods for their production
US5830430A (en) 1995-02-21 1998-11-03 Imarx Pharmaceutical Corp. Cationic lipids and the use thereof
US5851548A (en) 1995-06-07 1998-12-22 Gen-Probe Incorporated Liposomes containing cationic lipids and vitamin D
US5773258A (en) 1995-08-25 1998-06-30 Roche Molecular Systems, Inc. Nucleic acid amplification using a reversibly inactivated thermostable enzyme
NO953680D0 (en) 1995-09-18 1995-09-18 Hans Prydz Cell cycle Enzymes
US5962313A (en) 1996-01-18 1999-10-05 Avigen, Inc. Adeno-associated virus vectors comprising a gene encoding a lyosomal enzyme
US5840839A (en) 1996-02-09 1998-11-24 The United States Of America As Represented By The Secretary Of The Department Of Health And Human Services Alternative open reading frame DNA of a normal gene and a novel human cancer antigen encoded therein
US6077705A (en) 1996-05-17 2000-06-20 Thomas Jefferson University Ribozyme-mediated gene replacement
US20040156861A1 (en) 1996-07-11 2004-08-12 Figdor Carl Gustav Melanoma associated peptide analogues and vaccines against melanoma
US6887707B2 (en) 1996-10-28 2005-05-03 University Of Washington Induction of viral mutation by incorporation of miscoding ribonucleoside analogs into viral RNA
GB9701425D0 (en) 1997-01-24 1997-03-12 Bioinvent Int Ab A method for in vitro molecular evolution of protein function
CA2278931A1 (en) 1997-01-30 1998-08-06 University Of Virginia Patent Foundation Cysteine-depleted peptides recognized by a3-restricted cytotoxic lymphocytes, and uses therefor
US5981182A (en) 1997-03-13 1999-11-09 Albert Einstein College Of Medicine Of Yeshiva University Vector constructs for the selection and identification of open reading frames
US20040203109A1 (en) 1997-06-06 2004-10-14 Incyte Corporation Human regulatory proteins
US5849528A (en) 1997-08-21 1998-12-15 Incyte Pharmaceuticals, Inc.. Polynucleotides encoding a human S100 protein
US6355415B1 (en) 1997-09-29 2002-03-12 Ohio University Compositions and methods for the use of ribozymes to determine gene function
US6156509A (en) 1997-11-12 2000-12-05 Genencor International, Inc. Method of increasing efficiency of directed evolution of a gene using phagemid
US6429301B1 (en) 1998-04-17 2002-08-06 Whitehead Institute For Biomedical Research Use of a ribozyme to join nucleic acids and peptides
US6183998B1 (en) 1998-05-29 2001-02-06 Qiagen Gmbh Max-Volmer-Strasse 4 Method for reversible modification of thermostable enzymes
CA2331378A1 (en) 1998-06-12 1999-12-16 Sloan-Kettering Institute For Cancer Research Vaccination strategy to prevent and treat cancers
US8097648B2 (en) 1998-06-17 2012-01-17 Eisai R&D Management Co., Ltd. Methods and compositions for use in treating cancer
AU1115300A (en) 1998-10-13 2000-05-01 Advanced Research And Technology Institute, Inc. Assays for identifying functional alterations in the p53 tumor suppressor
DK1129064T3 (en) 1998-11-12 2008-04-28 Invitrogen Corp transfection
US6599692B1 (en) 1999-09-14 2003-07-29 Sangamo Bioscience, Inc. Functional genomics using zinc finger proteins
US6534261B1 (en) 1999-01-12 2003-03-18 Sangamo Biosciences, Inc. Regulation of endogenous gene expression in cells using zinc finger proteins
US7013219B2 (en) 1999-01-12 2006-03-14 Sangamo Biosciences, Inc. Regulation of endogenous gene expression in cells using zinc finger proteins
US6453242B1 (en) 1999-01-12 2002-09-17 Sangamo Biosciences, Inc. Selection of sites for targeting by zinc finger proteins and methods of designing zinc finger proteins to bind to preselected sites
US20090130718A1 (en) 1999-02-04 2009-05-21 Diversa Corporation Gene site saturation mutagenesis
AU3330700A (en) 1999-03-29 2000-10-16 Tasuku Honjo Novel cytidine deaminase
US6365410B1 (en) 1999-05-19 2002-04-02 Genencor International, Inc. Directed evolution of microorganisms
GB9920194D0 (en) 1999-08-27 1999-10-27 Advanced Biotech Ltd A heat-stable thermostable DNA polymerase for use in nucleic acid amplification
AU1661201A (en) 1999-11-18 2001-05-30 Epimmune, Inc. Heteroclitic analogs and related methods
CA2392490A1 (en) 1999-11-24 2001-05-31 Mcs Micro Carrier Systems Gmbh Polypeptides comprising multimers of nuclear localization signals or of protein transduction domains and their use for transferring molecules into cells
CA2394850C (en) 1999-12-06 2012-02-07 Sangamo Biosciences, Inc. Methods of using randomized libraries of zinc finger proteins for the identification of gene function
KR20020086508A (en) 2000-02-08 2002-11-18 상가모 바이오사이언스 인코포레이티드 Cells for drug discovery
US7378248B2 (en) 2000-03-06 2008-05-27 Rigel Pharmaceuticals, Inc. In vivo production of cyclic peptides for inhibiting protein-protein interaction
US7078208B2 (en) 2000-05-26 2006-07-18 Invitrogen Corporation Thermostable reverse transcriptases and uses thereof
US6573092B1 (en) 2000-10-10 2003-06-03 Genvec, Inc. Method of preparing a eukaryotic viral vector
DK1328543T3 (en) 2000-10-27 2009-11-23 Novartis Vaccines & Diagnostic Nucleic acids and proteins of streptococcus group A & B
US20040003420A1 (en) 2000-11-10 2004-01-01 Ralf Kuhn Modified recombinase
US7067650B1 (en) 2000-11-22 2006-06-27 National Institute Of Advanced Industrial Science And Technology Ribozymes targeting bradeion transcripts and use thereof
EP1360308B1 (en) 2001-01-25 2007-10-10 Evolva Ltd. Concatemers of differentially expressed multiple genes
US20050222030A1 (en) 2001-02-21 2005-10-06 Anthony Allison Modified annexin proteins and methods for preventing thrombosis
US20040115184A1 (en) 2001-02-27 2004-06-17 Smith Harold C Methods and compositions for modifying apolipoprotein b mrna editing
EP1423400B2 (en) 2001-03-19 2013-05-22 President and Fellows of Harvard College Evolving new molecular function
US7476500B1 (en) 2001-03-19 2009-01-13 President And Fellows Of Harvard College In vivo selection system for enzyme activity
AU2002257076A1 (en) 2001-03-19 2002-10-03 President And Fellows Of Harvard College Nucleic acid shuffling
US7807408B2 (en) 2001-03-19 2010-10-05 President & Fellows Of Harvard College Directed evolution of proteins
US20040197892A1 (en) 2001-04-04 2004-10-07 Michael Moore Composition binding polypeptides
US7083970B2 (en) 2001-04-19 2006-08-01 The Scripps Research Institute Methods and compositions for the production of orthogonal tRNA-aminoacyl tRNA synthetase pairs
AU2002330714A1 (en) 2001-05-30 2003-01-02 Biomedical Center In silico screening for phenotype-associated expressed sequences
WO2003004608A2 (en) 2001-07-06 2003-01-16 Incyte Genomics, Inc. Drug metabolizing enzymes
CA2454319A1 (en) 2001-07-26 2003-03-27 Stratagene Multi-site mutagenesis
US20030167533A1 (en) 2002-02-04 2003-09-04 Yadav Narendra S. Intein-mediated protein splicing
FR2837837B1 (en) 2002-03-28 2006-09-29 Roussy Inst Gustave PEPTIDE EPITOPES COMMON TO ANTIGENS OF THE SAME MULTIGENIC FAMILY
EP1506288B1 (en) 2002-05-10 2013-04-17 Medical Research Council Activation induced deaminase (aid)
AU2003274397A1 (en) 2002-06-05 2003-12-22 University Of Florida Production of pseudotyped recombinant aav virions
US9388459B2 (en) 2002-06-17 2016-07-12 Affymetrix, Inc. Methods for genotyping
AU2003251905A1 (en) 2002-07-12 2004-02-02 Affymetrix, Inc. Synthetic tag genes
AU2003263937B2 (en) 2002-08-19 2010-04-01 The President And Fellows Of Harvard College Evolving new molecular function
JP2006500030A (en) 2002-09-20 2006-01-05 Yale University Riboswitch, method of using the same, and composition for use with riboswitch
US20090183270A1 (en) 2002-10-02 2009-07-16 Adams Thomas R Transgenic plants with enhanced agronomic traits
US8017323B2 (en) 2003-03-26 2011-09-13 President And Fellows Of Harvard College Free reactant use in nucleic acid-templated synthesis
ATE412902T1 (en) 2003-04-14 2008-11-15 Caliper Life Sciences Inc REDUCING MIGRATION SHIFT ASSAY INTERFERENCE
US8017755B2 (en) 2003-05-23 2011-09-13 President And Fellows Of Harvard College RNA-based transcriptional regulators
US20050136429A1 (en) 2003-07-03 2005-06-23 Massachusetts Institute Of Technology SIRT1 modulation of adipogenesis and adipose function
WO2005019415A2 (en) 2003-07-07 2005-03-03 The Scripps Research Institute Compositions of orthogonal lysyl-trna and aminoacyl-trna synthetase pairs and uses thereof
JP4555292B2 (en) 2003-08-08 2010-09-29 Sangamo Biosciences, Inc. Methods and compositions for targeted cleavage and recombination
EP2478913A1 (en) 2003-12-01 2012-07-25 Sloan-Kettering Institute For Cancer Research Synthetic HLA binding peptide analogues and uses thereof
JP5060134B2 (en) 2003-12-12 2012-10-31 The Government Of The United States Of America As Represented By The Secretary, Department Of Health And Human Services Human cytotoxic T-lymphocyte epitope and its agonist epitope from the non-variable number of tandem repeat (non-VNTR) sequence of MUC-1
US7670807B2 (en) 2004-03-10 2010-03-02 East Tennessee State Univ. Research Foundation RNA-dependent DNA polymerase from Geobacillus stearothermophilus
WO2005098043A2 (en) 2004-03-30 2005-10-20 The President And Fellows Of Harvard College Ligand-dependent protein splicing
US7595179B2 (en) 2004-04-19 2009-09-29 Applied Biosystems, Llc Recombinant reverse transcriptases
US7919277B2 (en) 2004-04-28 2011-04-05 Danisco A/S Detection and typing of bacterial strains
US7476734B2 (en) 2005-12-06 2009-01-13 Helicos Biosciences Corporation Nucleotide analogs
US8202841B2 (en) 2004-06-17 2012-06-19 Mannkind Corporation SSX-2 peptide analogs
EP1814896A4 (en) 2004-07-06 2008-07-30 Commercialisation Des Produits Target-dependent nucleic acid adapter
WO2007008226A2 (en) 2004-08-17 2007-01-18 The President And Fellows Of Harvard College Palladium-catalyzed carbon-carbon bond forming reactions
WO2006023207A2 (en) 2004-08-19 2006-03-02 The United States Of America As Represented By The Secretary Of Health And Human Services, Nih Coacervate of anionic and cationic polymer forming microparticles for the sustained release of therapeutic agents
JP5101288B2 (en) 2004-10-05 2012-12-19 California Institute Of Technology Aptamer-regulated nucleic acids and uses thereof
US9034650B2 (en) 2005-02-02 2015-05-19 Intrexon Corporation Site-specific serine recombinases and methods of their use
US8178291B2 (en) 2005-02-18 2012-05-15 Monogram Biosciences, Inc. Methods and compositions for determining hypersusceptibility of HIV-1 to non-nucleoside reverse transcriptase inhibitors
JP2006248978A (en) 2005-03-10 2006-09-21 Mebiopharm Co Ltd New liposome preparation
DE602006013134D1 (en) 2005-06-17 2010-05-06 Harvard College Iterated branching reaction pathways via nucleic acid-mediated chemistry
NZ564359A (en) 2005-06-17 2011-09-30 Mannkind Corp Analogs of peptides corresponding to class I MHC-restricted T cell epitopes
WO2007011722A2 (en) 2005-07-15 2007-01-25 President And Fellows Of Harvard College Reaction discovery system
US9783791B2 (en) 2005-08-10 2017-10-10 Agilent Technologies, Inc. Mutant reverse transcriptase and methods of use
AU2015252023B2 (en) 2005-08-26 2017-06-29 Dupont Nutrition Biosciences Aps Use
AU2012244264B2 (en) 2005-08-26 2015-08-06 Dupont Nutrition Biosciences Aps Use
ES2398918T3 (en) 2005-08-26 2013-03-22 Dupont Nutrition Biosciences Aps A method and arrangement to vertically support outstanding electrical resistance elements
WO2007037444A1 (en) 2005-09-30 2007-04-05 National University Corporation Hokkaido University Vector for delivering target substance into nucleus or cell
KR100784478B1 (en) 2005-12-05 2007-12-11 Korea Advanced Institute Of Science And Technology Method of manufacturing a protein with renal function by simultaneous insertion of functional elements
US20080051317A1 (en) 2005-12-15 2008-02-28 George Church Polypeptides comprising unnatural amino acids, methods for their production and uses therefor
PT2161038E (en) 2006-01-26 2014-03-10 Isis Pharmaceuticals Inc COMPOSITIONS AND THEIR USES DIRECTED TO HUNTINGTIN
EP2604255B1 (en) 2006-05-05 2017-10-25 Molecular Transfer, Inc. Novel reagents for transfection of eukaryotic cells
PL2018441T3 (en) 2006-05-19 2012-03-30 Dupont Nutrition Biosci Aps LABELED MICROORGANISMS AND LABELING METHODS
EP2030015B1 (en) 2006-06-02 2016-02-17 President and Fellows of Harvard College Protein surface remodeling
EP2028272B1 (en) 2006-06-06 2014-01-08 Panasonic Corporation Method of modifying nucleotide chain
US7572618B2 (en) 2006-06-30 2009-08-11 Bristol-Myers Squibb Company Polynucleotides encoding novel PCSK9 variants
WO2008005529A2 (en) 2006-07-07 2008-01-10 The Trustees Columbia University In The City Of New York Cell-mediated directed evolution
US20120322861A1 (en) 2007-02-23 2012-12-20 Barry John Byrne Compositions and Methods for Treating Diseases
AU2008223544B2 (en) 2007-03-02 2014-06-05 Dupont Nutrition Biosciences Aps Cultures with improved phage resistance
WO2009002418A2 (en) 2007-06-21 2008-12-31 Merck & Co., Inc. T-cell peptide epitopes from carcinoembryonic antigen, immunogenic analogs, and uses thereof
FR2919804B1 (en) 2007-08-08 2010-08-27 Erytech Pharma COMPOSITION AND ANTI-TUMOR THERAPEUTIC VACCINE
WO2009033027A2 (en) 2007-09-05 2009-03-12 Medtronic, Inc. Suppression of scn9a gene expression and/or function for the treatment of pain
US20110014616A1 (en) 2009-06-30 2011-01-20 Sangamo Biosciences, Inc. Rapid screening of biologically active nucleases and isolation of nuclease-modified cells
CA2700231C (en) 2007-09-27 2018-09-18 Sangamo Biosciences, Inc. Rapid in vivo identification of biologically active nucleases
US9029524B2 (en) 2007-12-10 2015-05-12 California Institute Of Technology Signal activated RNA interference
EP2087789A1 (en) 2008-02-06 2009-08-12 Heinrich-Heine-Universität Düsseldorf Fto-modified non-human mammal
AU2009212247A1 (en) 2008-02-08 2009-08-13 Sangamo Therapeutics, Inc. Treatment of chronic pain with zinc finger proteins
GB0806562D0 (en) 2008-04-10 2008-05-14 Fermentas Uab Production of nucleic acid
WO2009146179A1 (en) 2008-04-15 2009-12-03 University Of Iowa Research Foundation Zinc finger nuclease for the cftr gene and methods of use thereof
WO2009134808A2 (en) 2008-04-28 2009-11-05 President And Fellows Of Harvard College Supercharged proteins for cell penetration
WO2009132455A1 (en) 2008-04-30 2009-11-05 Paul Xiang-Qin Liu Protein splicing using short terminal split inteins
WO2010011961A2 (en) 2008-07-25 2010-01-28 University Of Georgia Research Foundation, Inc. Prokaryotic rnai-like system and methods of use
FR2934346B1 (en) 2008-07-28 2010-09-03 Claude Benit VALVE FOR SANITARY INSTALLATION AND MULTIFUNCTION DEVICE FOR SANITARY APPARATUS COMPRISING SUCH A VALVE
JP2010033344A (en) 2008-07-29 2010-02-12 Azabu Jui Gakuen Method for expressing uneven distribution of nucleic acid constituent base
EP2159286A1 (en) 2008-09-01 2010-03-03 Consiglio Nazionale Delle Ricerche Method for obtaining oligonucleotide aptamers and uses thereof
US8790664B2 (en) 2008-09-05 2014-07-29 Institut National De La Sante Et De La Recherche Medicale (Inserm) Multimodular assembly useful for intracellular delivery
EP2342336B1 (en) 2008-09-05 2016-12-14 President and Fellows of Harvard College Continuous directed evolution of proteins and nucleic acids
US8636884B2 (en) 2008-09-15 2014-01-28 Abbott Diabetes Care Inc. Cationic polymer based wired enzyme formulations for use in analyte sensors
US20100076057A1 (en) 2008-09-23 2010-03-25 Northwestern University TARGET DNA INTERFERENCE WITH crRNA
WO2010054108A2 (en) 2008-11-06 2010-05-14 University Of Georgia Research Foundation, Inc. Cas6 polypeptides and methods of use
MX337838B (en) 2008-11-07 2016-03-22 Dupont Nutrition Biosci Aps Bifidobacteria crispr sequences.
US20110016540A1 (en) 2008-12-04 2011-01-20 Sigma-Aldrich Co. Genome editing of genes associated with trinucleotide repeat expansion disorders in animals
CN102317473A (en) 2008-12-11 2012-01-11 Pacific Biosciences Of California, Inc. Shenzhen tcl new technology co. , ltd
US9175338B2 (en) 2008-12-11 2015-11-03 Pacific Biosciences Of California, Inc. Methods for identifying nucleic acid modifications
WO2010075424A2 (en) 2008-12-22 2010-07-01 The Regents Of University Of California Compositions and methods for downregulating prokaryotic genes
CA2748314C (en) 2009-02-03 2018-10-02 Amunix Operating Inc. Extended recombinant polypeptides and compositions comprising same
US20130022980A1 (en) 2009-02-04 2013-01-24 Lucigen Corporation Rna- and dna-copying enzymes
US20100305197A1 (en) 2009-02-05 2010-12-02 Massachusetts Institute Of Technology Conditionally Active Ribozymes And Uses Thereof
US8389679B2 (en) 2009-02-05 2013-03-05 The Regents Of The University Of California Targeted antimicrobial moieties
BRPI1009221A2 (en) 2009-03-04 2016-03-15 Univ Texas stabilized reverse transcriptase fusion proteins.
SG10201400436PA (en) 2009-03-06 2014-06-27 Synthetic Genomics Inc Methods For Cloning And Manipulating Genomes
EP2406289B1 (en) 2009-03-10 2017-02-22 Baylor Research Institute Antigen presenting cell targeted anti-viral vaccines
CA2760155A1 (en) 2009-04-27 2010-11-11 Pacific Biosciences Of California, Inc. Real-time sequencing methods and systems
JP2012525146A (en) 2009-04-28 2012-10-22 President And Fellows Of Harvard College Supercharged proteins for cell penetration
WO2010132092A2 (en) 2009-05-12 2010-11-18 The Scripps Research Institute Cytidine deaminase fusions and related methods
US9063156B2 (en) 2009-06-12 2015-06-23 Pacific Biosciences Of California, Inc. Real-time analytical methods and systems
US8569256B2 (en) 2009-07-01 2013-10-29 Protiva Biotherapeutics, Inc. Cationic lipids and methods for the delivery of therapeutic agents
US20120178647A1 (en) 2009-08-03 2012-07-12 The General Hospital Corporation Engineering of zinc finger arrays by context-dependent assembly
EP2462230B1 (en) 2009-08-03 2015-07-15 Recombinetics, Inc. Methods and compositions for targeted gene modification
GB0913681D0 (en) 2009-08-05 2009-09-16 Glaxosmithkline Biolog Sa Immunogenic composition
US8889394B2 (en) 2009-09-07 2014-11-18 Empire Technology Development Llc Multiple domain proteins
EP2494060B1 (en) 2009-10-30 2016-04-27 Synthetic Genomics, Inc. Encoding text into nucleic acid sequences
KR101934923B1 (en) 2009-11-02 2019-04-10 University Of Washington Through Its Center For Commercialization Therapeutic Nuclease Compositions and Methods
WO2011056185A2 (en) 2009-11-04 2011-05-12 President And Fellows Of Harvard College Reactivity-dependent and interaction-dependent pcr
US20110104787A1 (en) 2009-11-05 2011-05-05 President And Fellows Of Harvard College Fusion Peptides That Bind to and Modify Target Nucleic Acid Sequences
ES2693167T3 (en) 2009-11-13 2018-12-07 Inserm - Institut National De La Santé Et De La Recherche Médicale Direct administration of proteins with engineered microvesicles
WO2011068916A1 (en) 2009-12-01 2011-06-09 Intezyne Technologies, Incorporated Pegylated polyplexes for polynucleotide delivery
HUE042177T2 (en) 2009-12-01 2019-06-28 Translate Bio Inc Steroid derivative for the delivery of mrna in human genetic diseases
HUE041436T2 (en) 2009-12-10 2019-05-28 Univ Minnesota Tal-effector-mediated DNA modification
US20130011380A1 (en) 2009-12-18 2013-01-10 Blau Helen M Use of Cytidine Deaminase-Related Agents to Promote Demethylation and Cell Reprogramming
KR101866578B1 (en) 2010-01-22 2018-06-11 Dow Agrosciences Llc Targeted genomic alteration
NZ600546A (en) 2010-01-22 2014-08-29 Dow Agrosciences Llc Excision of transgenes in genetically modified organisms
US9198983B2 (en) 2010-01-25 2015-12-01 Alnylam Pharmaceuticals, Inc. Compositions and methods for inhibiting expression of Mylip/Idol gene
EP2542676A1 (en) 2010-03-05 2013-01-09 Synthetic Genomics, Inc. Methods for cloning and manipulating genomes
GB201004575D0 (en) 2010-03-19 2010-05-05 Immatics Biotechnologies Gmbh Composition of tumor associated peptides and related anti cancer vaccine for the treatment of gastric cancer and other cancers
WO2011123830A2 (en) 2010-04-02 2011-10-06 Amunix Operating Inc. Alpha 1-antitrypsin compositions and methods of making and using same
WO2011140284A2 (en) 2010-05-04 2011-11-10 Fred Hutchinson Cancer Research Center Conditional superagonist ctl ligands for the promotion of tumor-specific ctl responses
SG185481A1 (en) 2010-05-10 2012-12-28 Univ California Endoribonuclease compositions and methods of use thereof
JP6208580B2 (en) 2010-05-17 2017-10-04 Sangamo Therapeutics, Inc. Novel DNA binding protein and use thereof
GB201008267D0 (en) 2010-05-18 2010-06-30 Univ Edinburgh Cationic lipids
CN103154256A (en) 2010-05-27 2013-06-12 Heinrich-Pette-Institut, Leibniz-Institut für Experimentelle Virologie Tailored recombinases for recombination of asymmetric target sites in various retroviral strains
US8748667B2 (en) 2010-06-04 2014-06-10 Sirna Therapeutics, Inc. Low molecular weight cationic lipids for oligonucleotide delivery
EP2392208B1 (en) 2010-06-07 2016-05-04 Helmholtz Zentrum München Deutsches Forschungszentrum für Gesundheit und Umwelt (GmbH) Fusion proteins comprising a DNA-binding domain of a Tal effector protein and a non-specific cleavage domain of a restriction nuclease and their use
AU2011265733B2 (en) 2010-06-14 2014-04-17 Iowa State University Research Foundation, Inc. Nuclease activity of TAL effector and Foki fusion protein
WO2012016186A1 (en) 2010-07-29 2012-02-02 President And Fellows Of Harvard College Macrocyclic kinase inhibitors and uses thereof
WO2012019168A2 (en) 2010-08-06 2012-02-09 Moderna Therapeutics, Inc. Engineered nucleic acids and methods of use thereof
US8900814B2 (en) 2010-08-13 2014-12-02 Kyoto University Variant reverse transcriptase
AU2011305572B2 (en) 2010-09-20 2016-08-04 Diane Goll Microencapsulation process and product
CN103261213A (en) 2010-10-20 2013-08-21 Dupont Nutrition Biosciences Aps Lactococcus crispr-cas sequences
US9458484B2 (en) 2010-10-22 2016-10-04 Bio-Rad Laboratories, Inc. Reverse transcriptase mixtures with improved storage stability
BR112013011194B1 (en) 2010-11-05 2020-12-22 Novavax, Inc micellar particle, pharmaceutical composition comprising it, use of a micellar particle and method of preparing a micelle
CN103327970A (en) 2010-11-26 2013-09-25 University Of The Witwatersrand, Johannesburg Polymeric matrix of polymer-lipid nanoparticles as a pharmaceutical dosage form
KR101255338B1 (en) 2010-12-15 2013-04-16 Postech Academy-Industry Foundation Polynucleotide delivering complex for a targeting cell
WO2012083017A2 (en) 2010-12-16 2012-06-21 Celgene Corporation Controlled release oral dosage forms of poorly soluble drugs and uses thereof
EP3202903B1 (en) 2010-12-22 2020-02-12 President and Fellows of Harvard College Continuous directed evolution
US9499592B2 (en) 2011-01-26 2016-11-22 President And Fellows Of Harvard College Transcription activator-like effectors
KR101818126B1 (en) 2011-02-09 2018-01-15 Bioneer Corporation Reverse Transcriptase Having Improved Thermostability
US9528124B2 (en) 2013-08-27 2016-12-27 Recombinetics, Inc. Efficient non-meiotic allele introgression
US9200045B2 (en) 2011-03-11 2015-12-01 President And Fellows Of Harvard College Small molecule-dependent inteins and uses thereof
US9164079B2 (en) 2011-03-17 2015-10-20 Greyledge Technologies Llc Systems for autologous biological therapeutics
US20120244601A1 (en) 2011-03-22 2012-09-27 Bertozzi Carolyn R Riboswitch based inducible gene expression platform
JP2012210172A (en) 2011-03-30 2012-11-01 Japan Science & Technology Agency Liposome varying inner material composition responding to external environment
US8709466B2 (en) 2011-03-31 2014-04-29 International Business Machines Corporation Cationic polymers for antimicrobial applications and delivery of bioactive materials
EP2694089B1 (en) 2011-04-05 2024-06-05 Cellectis New tale-protein scaffolds and uses thereof
US20140128449A1 (en) 2011-04-07 2014-05-08 The Board Of Regents Of The University Of Texas System Oligonucleotide modulation of splicing
US10092660B2 (en) 2011-04-25 2018-10-09 Stc.Unm Solid compositions for pharmaceutical use
KR102068107B1 (en) 2011-04-27 2020-01-20 Amyris, Inc. Methods for genomic modification
WO2012158986A2 (en) 2011-05-17 2012-11-22 Transposagen Biopharmaceuticals, Inc. Methods for site-specific genetic modification in stem cells using xanthomonas tal nucleases (xtn) for the creation of model organisms
US8691750B2 (en) 2011-05-17 2014-04-08 Axolabs Gmbh Lipids and compositions for intracellular delivery of biologically active compounds
WO2012158985A2 (en) 2011-05-17 2012-11-22 Transposagen Biopharmaceuticals, Inc. Methods for site-specific genetic modification in spermatogonial stem cells using zinc finger nuclease (zfn) for the creation of model organisms
WO2012164565A1 (en) 2011-06-01 2012-12-06 Yeda Research And Development Co. Ltd. Compositions and methods for downregulating prokaryotic genes
PL3586861T3 (en) 2011-06-08 2022-05-23 Translate Bio, Inc. Lipid nanoparticle compositions and methods for mRNA delivery
CA2843853A1 (en) 2011-07-01 2013-01-10 President And Fellows Of Harvard College Macrocyclic insulin-degrading enzyme (ide) inhibitors and uses thereof
CA2841710C (en) 2011-07-15 2021-03-16 The General Hospital Corporation Methods of transcription activator like effector assembly
EP2734622B1 (en) 2011-07-19 2018-09-05 Vivoscript, Inc. Compositions and methods for re-programming cells without genetic modification for repairing cartilage damage
JP6261500B2 (en) 2011-07-22 2018-01-17 President And Fellows Of Harvard College Evaluation and improvement of nuclease cleavage specificity
EP3384938A1 (en) 2011-09-12 2018-10-10 Moderna Therapeutics, Inc. Engineered nucleic acids and methods of use thereof
EP2755986A4 (en) 2011-09-12 2015-05-20 Moderna Therapeutics Inc MODIFIED NUCLEIC ACIDS AND METHODS OF USE
EP2755675B1 (en) 2011-09-12 2018-06-06 Amunix Operating Inc. Glucagon-like peptide-2 compositions and methods of making and using same
WO2013047844A1 (en) 2011-09-28 2013-04-04 Ribomic Inc. Ngf aptamer and application thereof
EP2761006B1 (en) 2011-09-28 2016-12-14 Zera Intein Protein Solutions, S.L. Split inteins and uses thereof
CN103088008B (en) 2011-10-31 2014-08-20 Institute Of Microbiology, Chinese Academy Of Sciences Cytidine deaminase, its coding gene, and applications of cytidine deaminase and its coding gene
EP2788487B1 (en) 2011-12-08 2018-04-04 Sarepta Therapeutics, Inc. Oligonucleotide analogues targeting human lmna
CN104114572A (en) 2011-12-16 2014-10-22 Moderna Therapeutics, Inc. Modified nucleosides, nucleotides and nucleic acid compositions
PH12014501360B1 (en) 2011-12-16 2022-05-20 Targetgene Biotechnologies Ltd Compositions and methods for modifying a predetermined target nucleic acid sequence
GB201122458D0 (en) 2011-12-30 2012-02-08 Univ Wageningen Modified cascade ribonucleoproteins and uses thereof
WO2013119602A1 (en) 2012-02-06 2013-08-15 President And Fellows Of Harvard College Arrdc1-mediated microvesicles (armms) and uses thereof
WO2013120022A2 (en) 2012-02-08 2013-08-15 Seneb Biosciences, Inc. Treatment of hypoglycemia
BR112014020694A2 (en) 2012-02-15 2018-05-08 Amunix Operating Inc. factor viii fusion protein comprising extended recombinant polypeptide (xten) fusion factor polypeptide and its method of manufacture, nucleic acid, vectors, host cell, as well as pharmaceutical composition and its use in the treatment of coagulopathy, bleeding episode and hemophilia a
RU2650811C2 (en) 2012-02-24 2018-04-17 Fred Hutchinson Cancer Research Center Compositions and methods for treatment of hemoglobinopathies
CN117462693A (en) 2012-02-27 2024-01-30 Amunix Operating Inc. XTEN conjugate compositions and methods of making the same
CN108285491B (en) 2012-02-29 2021-08-10 Sangamo Biosciences, Inc. Methods and compositions for treating huntington's disease
CN104364394B (en) 2012-03-17 2019-02-22 The Regents Of The University Of California Rapid diagnosis and individualized treatment of acne
US9637739B2 (en) 2012-03-20 2017-05-02 Vilnius University RNA-directed DNA cleavage by the Cas9-crRNA complex
WO2013141680A1 (en) 2012-03-20 2013-09-26 Vilnius University RNA-DIRECTED DNA CLEAVAGE BY THE Cas9-crRNA COMPLEX
WO2013152359A1 (en) 2012-04-06 2013-10-10 The Regents Of The University Of California Novel tetrazines and method of synthesizing the same
AU2013254857B2 (en) 2012-04-23 2018-04-26 Bayer Cropscience Nv Targeted genome engineering in plants
CA2872124C (en) 2012-05-02 2022-05-03 Dow Agrosciences Llc Plant with targeted modification of the endogenous malate dehydrogenase gene
AU2013259647B2 (en) 2012-05-07 2018-11-08 Corteva Agriscience Llc Methods and compositions for nuclease-mediated targeted integration of transgenes
US11120889B2 (en) 2012-05-09 2021-09-14 Georgia Tech Research Corporation Method for synthesizing a nuclease with reduced off-site cleavage
EP3241902B1 (en) 2012-05-25 2018-02-28 The Regents of The University of California Methods and compositions for rna-directed target dna modification and for rna-directed modulation of transcription
KR102437522B1 (en) 2012-05-25 2022-08-26 Cellectis Methods for engineering allogeneic and immunosuppressive resistant t cell for immunotherapy
US20150017136A1 (en) 2013-07-15 2015-01-15 Cellectis Methods for engineering allogeneic and highly active t cell for immunotherapy
US20140056868A1 (en) 2012-05-30 2014-02-27 University of Washington Center for Commercialization Supercoiled MiniVectors as a Tool for DNA Repair, Alteration and Replacement
US9102936B2 (en) 2012-06-11 2015-08-11 Agilent Technologies, Inc. Method of adaptor-dimer subtraction using a CRISPR CAS6 protein
CN104540382A (en) 2012-06-12 2015-04-22 F. Hoffmann-La Roche AG Methods and compositions for generating conditional knock-out alleles
EP2674501A1 (en) 2012-06-14 2013-12-18 Agence nationale de sécurité sanitaire de l'alimentation,de l'environnement et du travail Method for detecting and identifying enterohemorrhagic Escherichia coli
WO2013188638A2 (en) 2012-06-15 2013-12-19 The Regents Of The University Of California Endoribonucleases and methods of use thereof
EP2861737B1 (en) 2012-06-19 2019-04-17 Regents Of The University Of Minnesota Gene targeting in plants using dna viruses
US9267127B2 (en) 2012-06-21 2016-02-23 President And Fellows Of Harvard College Evolution of bond-forming enzymes
DK3431497T3 (en) 2012-06-27 2022-10-10 Univ Princeton Split inteins, conjugates and uses thereof
SG11201408736SA (en) 2012-06-29 2015-03-30 Massachusetts Inst Technology Massively parallel combinatorial genetics
US9125508B2 (en) 2012-06-30 2015-09-08 Seasons 4, Inc. Collapsible tree system
DK3444342T3 (en) 2012-07-11 2020-08-24 Sangamo Therapeutics Inc Methods and compositions for the treatment of lysosomal storage diseases
WO2014011901A2 (en) 2012-07-11 2014-01-16 Sangamo Biosciences, Inc. Methods and compositions for delivery of biologics
KR102530118B1 (en) 2012-07-25 2023-05-08 The Broad Institute, Inc. Inducible dna binding proteins and genome perturbation tools and applications thereof
US10058078B2 (en) 2012-07-31 2018-08-28 Recombinetics, Inc. Production of FMDV-resistant livestock by allele substitution
JP6340366B2 (en) 2012-07-31 2018-06-06 Yeda Research And Development Co. Ltd. Methods for diagnosing and treating motor neuron disease
HK1207111A1 (en) 2012-08-03 2016-01-22 The Regents Of The University Of California Methods and compositions for controlling gene expression by rna processing
SI2890780T1 (en) 2012-08-29 2020-11-30 Sangamo Therapeutics, Inc. Methods and compositions for treating a genetic condition
DK2893022T3 (en) 2012-09-04 2020-07-27 Scripps Research Inst CHIMERIC POLYPEPTIDES WITH TARGETED BINDING SPECIFICITY
WO2014039513A2 (en) 2012-09-04 2014-03-13 The Trustees Of The University Of Pennsylvania Inhibition of diacylglycerol kinase to augment adoptive t cell transfer
CN104769103B (en) 2012-09-04 2018-06-08 Cellectis Multi-chain chimeric antigen receptors and uses thereof
CA2884162C (en) 2012-09-07 2020-12-29 Dow Agrosciences Llc Fad3 performance loci and corresponding target site specific binding proteins capable of inducing targeted breaks
UA118090C2 (en) 2012-09-07 2018-11-26 Dow Agrosciences Llc METHOD OF THE METHER OF THE METHOD OF THE INTEGRED EMBLED SUBSTITUTION OF NUCLEIC NUCLE OF NUCLEIC ACID AND NON-NUCLIC ACID AND NON-SPECIAL SPECIES
US20140075593A1 (en) 2012-09-07 2014-03-13 Dow Agrosciences Llc Fluorescence activated cell sorting (facs) enrichment to generate plants
US9557336B2 (en) 2012-09-07 2017-01-31 University Of Rochester Methods and compositions for site-specific labeling of peptides and proteins
UA119135C2 (en) 2012-09-07 2019-05-10 Dow Agrosciences Llc Engineered transgene integration platform (etip) for gene targeting and trait stacking
WO2014043143A1 (en) 2012-09-11 2014-03-20 Life Technologies Corporation Nucleic acid amplification
GB201216564D0 (en) 2012-09-17 2012-10-31 Univ Edinburgh Genetically edited animal
US10612053B2 (en) 2012-09-18 2020-04-07 The Translational Genomics Research Institute Isolated genes and transgenic organisms for producing biofuels
US9181535B2 (en) 2012-09-24 2015-11-10 The Chinese University Of Hong Kong Transcription activator-like effector nucleases (TALENs)
WO2014055778A2 (en) 2012-10-03 2014-04-10 Agrivida, Inc. Multiprotein expression cassettes
JO3470B1 (en) 2012-10-08 2020-07-05 Merck Sharp & Dohme 5-phenoxy-3h-pyrimidin-4-one derivatives and their use as hiv reverse transcriptase inhibitors
EP2906684B8 (en) 2012-10-10 2020-09-02 Sangamo Therapeutics, Inc. T cell modifying compounds and uses thereof
EP3789405A1 (en) 2012-10-12 2021-03-10 The General Hospital Corporation Transcription activator-like effector (tale) - lysine-specific demethylase 1 (lsd1) fusion proteins
EP4357457B1 (en) 2012-10-23 2024-10-16 Toolgen Incorporated Composition for cleaving a target dna comprising a guide rna specific for the target dna and cas protein-encoding nucleic acid or cas protein, and use thereof
US20140115728A1 (en) 2012-10-24 2014-04-24 A. Joseph Tector Double knockout (gt/cmah-ko) pigs, organs and tissues
CA2889502A1 (en) 2012-10-30 2014-05-08 Recombinetics, Inc. Control of sexual maturation in animals
CA2890160A1 (en) 2012-10-31 2014-05-08 Cellectis Coupling herbicide resistance with targeted insertion of transgenes in plants
BR112015009931A2 (en) 2012-10-31 2017-12-05 Two Blades Found gene, protein, nucleic acid molecule, vector, host cell, methods for in vitro preparation of a mutant gene and for generating a plant, mutant plant, product, mutant plant seed, antibody, use of an antibody, gene probe, pair of primer oligonucleotides, and use of genetic probe
US20150315576A1 (en) 2012-11-01 2015-11-05 Massachusetts Institute Of Technology Genetic device for the controlled destruction of dna
WO2014071219A1 (en) 2012-11-01 2014-05-08 Factor Bioscience Inc. Methods and products for expressing proteins in cells
US20140127752A1 (en) 2012-11-07 2014-05-08 Zhaohui Zhou Method, composition, and reagent kit for targeted genomic enrichment
CA2890824A1 (en) 2012-11-09 2014-05-15 Marco Archetti Diffusible factors and cancer cells
WO2014081855A1 (en) 2012-11-20 2014-05-30 Universite De Montreal Methods and compositions for muscular dystrophies
WO2014081730A1 (en) 2012-11-20 2014-05-30 Cold Spring Harbor Laboratory Mutations in solanaceae plants that modulate shoot architecture and enhance yield-related phenotypes
CN104884626A (en) 2012-11-20 2015-09-02 J.R. Simplot Company TAL-mediated transfer DNA insertion
SG11201504038XA (en) 2012-11-27 2015-06-29 Childrens Medical Center Targeting bcl11a distal regulatory elements for fetal hemoglobin reinduction
CA2892551A1 (en) 2012-11-29 2014-06-05 North Carolina State University Synthetic pathway for biological carbon dioxide sequestration
WO2014085830A2 (en) 2012-11-30 2014-06-05 The Parkinson's Institute Screening assays for therapeutics for parkinson's disease
WO2014082644A1 (en) 2012-11-30 2014-06-05 WULFF, Peter, Samuel Circular rna for inhibition of microrna
WO2014089212A1 (en) 2012-12-05 2014-06-12 Sangamo Biosciences, Inc. Methods and compositions for regulation of metabolic disorders
US9447422B2 (en) 2012-12-06 2016-09-20 Synthetic Genomics, Inc. Autonomous replication sequences and episomal DNA molecules
EP3363902B1 (en) 2012-12-06 2019-11-27 Sigma Aldrich Co. LLC Crispr-based genome modification and regulation
WO2014089533A2 (en) 2012-12-06 2014-06-12 Synthetic Genomics, Inc. Algal mutants having a locked-in high light acclimated phenotype
WO2014089348A1 (en) 2012-12-07 2014-06-12 Synthetic Genomics, Inc. Nannochloropsis spliced leader sequences and uses therefor
EP2928303A4 (en) 2012-12-07 2016-07-13 Haplomics Inc REPAIR OF FACTOR VIII MUTATION AND INDUCTION OF TOLERANCE
WO2014093479A1 (en) 2012-12-11 2014-06-19 Montana State University Crispr (clustered regularly interspaced short palindromic repeats) rna-guided control of gene regulation
DK2931897T3 (en) 2012-12-12 2018-02-05 Broad Inst Inc CONSTRUCTION, MODIFICATION AND OPTIMIZATION OF SYSTEMS, METHODS AND COMPOSITIONS FOR SEQUENCE MANIPULATION AND THERAPEUTIC APPLICATIONS
DK2931898T3 (en) 2012-12-12 2016-06-20 Massachusetts Inst Technology CONSTRUCTION AND OPTIMIZATION OF SYSTEMS, METHODS AND COMPOSITIONS FOR SEQUENCE MANIPULATION WITH FUNCTIONAL DOMAINS
CN105121648B (en) 2012-12-12 2021-05-07 The Broad Institute, Inc. Systems, methods and engineering of guide compositions for sequence manipulation
EP2931899A1 (en) 2012-12-12 2015-10-21 The Broad Institute, Inc. Functional genomics using crispr-cas systems, compositions, methods, knock out libraries and applications thereof
WO2014093718A1 (en) 2012-12-12 2014-06-19 The Broad Institute, Inc. Methods, systems, and apparatus for identifying target sequences for cas enzymes or crispr-cas systems for target sequences and conveying results thereof
WO2014093694A1 (en) 2012-12-12 2014-06-19 The Broad Institute, Inc. Crispr-cas nickase systems, methods and compositions for sequence manipulation in eukaryotes
AU2013359262C1 (en) 2012-12-12 2021-05-13 Massachusetts Institute Of Technology CRISPR-Cas component systems, methods and compositions for sequence manipulation
CA2894684A1 (en) 2012-12-12 2014-06-19 The Broad Institute, Inc. Engineering and optimization of improved crispr-cas systems, methods and enzyme compositions for sequence manipulation in eukaryotes
US8697359B1 (en) 2012-12-12 2014-04-15 The Broad Institute, Inc. CRISPR-Cas systems and methods for altering expression of gene products
WO2014093709A1 (en) 2012-12-12 2014-06-19 The Broad Institute, Inc. Methods, models, systems, and apparatus for identifying target sequences for cas enzymes or crispr-cas systems for target sequences and conveying results thereof
CA2895117A1 (en) 2012-12-13 2014-06-19 James W. Bing Precision gene targeting to a particular locus in maize
KR102240135B1 (en) 2012-12-13 2021-04-14 다우 아그로사이언시즈 엘엘씨 Dna detection methods for site specific nuclease activity
JP6419082B2 (en) 2012-12-13 2018-11-07 マサチューセッツ インスティテュート オブ テクノロジー Recombinase-based logic / memory system
CA2895155C (en) 2012-12-17 2021-07-06 President And Fellows Of Harvard College Rna-guided human genome engineering
US9708589B2 (en) 2012-12-18 2017-07-18 Monsanto Technology Llc Compositions and methods for custom site-specific DNA recombinases
EP2934097B1 (en) 2012-12-21 2018-05-02 Cellectis Potatoes with reduced cold-induced sweetening
ES2953523T3 (en) 2012-12-27 2023-11-14 Keygene Nv Method for inducing a directed translocation in a plant
LT2943579T (en) 2013-01-10 2018-11-12 Dharmacon, Inc. Libraries and methods for generating molecules
EP2943060A4 (en) 2013-01-14 2016-11-09 Recombinetics Inc Hornless livestock
EP3919505B1 (en) 2013-01-16 2023-08-30 Emory University Uses of cas9-nucleic acid complexes
CN103233028B (en) 2013-01-25 2015-05-13 南京徇齐生物技术有限公司 Specie limitation-free eucaryote gene targeting method having no bio-safety influence and helical-structure DNA sequence
KR20150133695A (en) 2013-02-05 2015-11-30 유니버시티 오브 조지아 리서치 파운데이션, 인코포레이티드 Cell lines for virus production and methods of use
US10660943B2 (en) 2013-02-07 2020-05-26 The Rockefeller University Sequence specific antimicrobials
WO2014127287A1 (en) 2013-02-14 2014-08-21 Massachusetts Institute Of Technology Method for in vivo tergated mutagenesis
DK2963113T3 (en) 2013-02-14 2020-02-17 Univ Osaka PROCEDURE FOR ISOLATING SPECIFIC GENOMREGION USING MOLECULES BINDING SPECIFIC TO ENDOGENIC DNA SEQUENCE
MX2015010841A (en) 2013-02-20 2016-05-09 Regeneron Pharma Genetic modification of rats.
WO2014128659A1 (en) 2013-02-21 2014-08-28 Cellectis Method to counter-select cells or organisms by linking loci to nuclease components
ES2522765B2 (en) 2013-02-22 2015-03-18 Universidad De Alicante Method to detect spacer insertions in CRISPR structures
US10227610B2 (en) 2013-02-25 2019-03-12 Sangamo Therapeutics, Inc. Methods and compositions for enhancing nuclease-mediated gene disruption
WO2014131833A1 (en) 2013-02-27 2014-09-04 Helmholtz Zentrum München Deutsches Forschungszentrum Für Gesundheit Und Umwelt (Gmbh) Gene editing in the oocyte by cas9 nucleases
US10047366B2 (en) 2013-03-06 2018-08-14 The Johns Hopkins University Telomerator-a tool for chromosome engineering
WO2014143381A1 (en) 2013-03-09 2014-09-18 Agilent Technologies, Inc. Methods of in vivo engineering of large sequences using multiple crispr/cas selections of recombineering events
HK1217968A1 (en) 2013-03-12 2017-01-27 桑格摩生物科学股份有限公司 Methods and compositions for modification of hla
RU2694686C2 (en) 2013-03-12 2019-07-16 Е.И.Дюпон Де Немур Энд Компани Methods for identifying variant recognition sites for rare-cutting engineered double-strand-break-inducing agents and compositions and uses thereof
EP2970923B1 (en) 2013-03-13 2018-04-11 President and Fellows of Harvard College Mutants of cre recombinase
WO2014153118A1 (en) 2013-03-14 2014-09-25 The Board Of Trustees Of The Leland Stanford Junior University Treatment of diseases and conditions associated with dysregulation of mammalian target of rapamycin complex 1 (mtorc1)
US20160184458A1 (en) 2013-03-14 2016-06-30 Shire Human Genetic Therapies, Inc. Mrna therapeutic compositions and use to treat diseases and disorders
US20140283156A1 (en) 2013-03-14 2014-09-18 Cold Spring Harbor Laboratory Trans-splicing ribozymes and silent recombinases
MX374090B (en) 2013-03-14 2025-03-05 Caribou Biosciences Inc COMPOSITIONS AND METHODS OF NUCLEIC ACIDS DIRECTED TO NUCLEIC ACID.
AU2014227653B2 (en) 2013-03-15 2017-04-20 The General Hospital Corporation Using RNA-guided foki nucleases (RFNs) to increase specificity for RNA-guided genome editing
US11332719B2 (en) 2013-03-15 2022-05-17 The Broad Institute, Inc. Recombinant virus and preparations thereof
US9234213B2 (en) 2013-03-15 2016-01-12 System Biosciences, Llc Compositions and methods directed to CRISPR/Cas genomic engineering systems
US10760064B2 (en) 2013-03-15 2020-09-01 The General Hospital Corporation RNA-guided targeting of genetic and epigenomic regulatory proteins to specific genomic loci
US20160046959A1 (en) 2013-03-15 2016-02-18 Carlisle P. Landel Reproducible method for testis-mediated genetic modification (tgm) and sperm-mediated genetic modification (sgm)
WO2014144094A1 (en) 2013-03-15 2014-09-18 J.R. Simplot Company Tal-mediated transfer dna insertion
US20140349400A1 (en) 2013-03-15 2014-11-27 Massachusetts Institute Of Technology Programmable Modification of DNA
US20140273230A1 (en) 2013-03-15 2014-09-18 Sigma-Aldrich Co., Llc Crispr-based genome modification and regulation
US20140273235A1 (en) 2013-03-15 2014-09-18 Regents Of The University Of Minnesota ENGINEERING PLANT GENOMES USING CRISPR/Cas SYSTEMS
HRP20220803T1 (en) 2013-03-15 2022-09-30 Cibus Us Llc Methods and compositions for increasing efficiency of targeted gene modification using oligonucleotide-mediated gene repair
WO2014204578A1 (en) 2013-06-21 2014-12-24 The General Hospital Corporation Using rna-guided foki nucleases (rfns) to increase specificity for rna-guided genome editing
JP6346266B2 (en) 2013-03-21 2018-06-20 サンガモ セラピューティクス, インコーポレイテッド Targeted disruption of T cell receptor genes using engineered zinc finger protein nucleases
EP2981614A1 (en) 2013-04-02 2016-02-10 Bayer CropScience NV Targeted genome engineering in eukaryotes
AU2014248119B2 (en) 2013-04-03 2019-06-20 Memorial Sloan-Kettering Cancer Center Effective generation of tumor-targeted T-cells derived from pluripotent stem cells
JP6576904B2 (en) 2013-04-04 2019-09-18 トラスティーズ・オブ・ダートマス・カレッジ Compositions and methods for in vivo excision of HIV-1 proviral DNA
JP2016522679A (en) 2013-04-04 2016-08-04 プレジデント アンド フェローズ オブ ハーバード カレッジ Therapeutic use of genome editing with the CRISPR / Cas system
EP2981166B1 (en) 2013-04-05 2020-09-09 Dow AgroSciences LLC Methods and compositions for integration of an exogenous sequence within the genome of plants
US20150056629A1 (en) 2013-04-14 2015-02-26 Katriona Guthrie-Honea Compositions, systems, and methods for detecting a DNA sequence
SI2986729T1 (en) 2013-04-16 2019-02-28 Regeneron Pharmaceuticals, Inc. Targeted modification of rat genome
WO2014172458A1 (en) 2013-04-16 2014-10-23 University Of Washington Through Its Center For Commercialization Activating an alternative pathway for homology-directed repair to stimulate targeted gene correction and genome engineering
US20160186208A1 (en) 2013-04-16 2016-06-30 Whitehead Institute For Biomedical Research Methods of Mutating, Modifying or Modulating Nucleic Acid in a Cell or Nonhuman Mammal
US10053725B2 (en) 2013-04-23 2018-08-21 President And Fellows Of Harvard College In situ interaction determination
EP2796558A1 (en) 2013-04-23 2014-10-29 Rheinische Friedrich-Wilhelms-Universität Bonn Improved gene targeting and nucleic acid carrier molecule, in particular for use in plants
CN103224947B (en) 2013-04-28 2015-06-10 陕西师范大学 Gene targeting system
EP3546484B1 (en) 2013-05-10 2021-09-08 Whitehead Institute for Biomedical Research In vitro production of red blood cells with sortaggable proteins
US10604771B2 (en) 2013-05-10 2020-03-31 Sangamo Therapeutics, Inc. Delivery methods and compositions for nuclease-mediated genome engineering
PL3546572T3 (en) 2013-05-13 2024-07-22 Cellectis Cd19 specific chimeric antigen receptor and uses thereof
MX2015015638A (en) 2013-05-13 2016-10-28 Cellectis Methods for engineering highly active t cell for immunotherapy.
CN105683376A (en) 2013-05-15 2016-06-15 桑格摩生物科学股份有限公司 Methods and compositions for treating genetic conditions
WO2014186686A2 (en) 2013-05-17 2014-11-20 Two Blades Foundation Targeted mutagenesis and genome engineering in plants using rna-guided cas nucleases
EP3778899A1 (en) 2013-05-22 2021-02-17 Northwestern University Rna-directed dna cleavage and gene editing by cas9 enzyme from neisseria meningitidis
US20160122774A1 (en) 2013-05-29 2016-05-05 Cellectis A method for producing precise dna cleavage using cas9 nickase activity
US11414695B2 (en) 2013-05-29 2022-08-16 Agilent Technologies, Inc. Nucleic acid enrichment using Cas9
WO2014191128A1 (en) 2013-05-29 2014-12-04 Cellectis Methods for engineering t cells for immunotherapy by using rna-guided cas nuclease system
US11685935B2 (en) 2013-05-29 2023-06-27 Cellectis Compact scaffold of Cas9 in the type II CRISPR system
WO2014194190A1 (en) 2013-05-30 2014-12-04 The Penn State Research Foundation Gene targeting and genetic modification of plants via rna-guided genome editing
JP6488283B2 (en) 2013-05-31 2019-03-20 セレクティスCellectis LAGLIDADG homing endonuclease that cleaves CC chemokine receptor type 5 (CCR5) gene and its use
US10000746B2 (en) 2013-05-31 2018-06-19 Cellectis LAGLIDADG homing endonuclease cleaving the T cell receptor alpha gene and uses thereof
US20140359796A1 (en) 2013-05-31 2014-12-04 Recombinetics, Inc. Genetically sterile animals
EP4596565A3 (en) 2013-06-04 2025-11-05 President And Fellows Of Harvard College Rna-guided transcriptional regulation
US20140356956A1 (en) 2013-06-04 2014-12-04 President And Fellows Of Harvard College RNA-Guided Transcriptional Regulation
EP3004370B1 (en) 2013-06-05 2024-08-21 Duke University Rna-guided gene editing and gene regulation
US9593356B2 (en) 2013-06-11 2017-03-14 Takara Bio Usa, Inc. Protein enriched microvesicles and methods of making and using the same
US9982277B2 (en) 2013-06-11 2018-05-29 The Regents Of The University Of California Methods and compositions for target DNA modification
US20150315252A1 (en) 2013-06-11 2015-11-05 Clontech Laboratories, Inc. Protein enriched microvesicles and methods of making and using the same
CN105531372A (en) 2013-06-14 2016-04-27 塞尔克蒂斯股份有限公司 Non-transgenic genome editing methods in plants
WO2014204724A1 (en) 2013-06-17 2014-12-24 The Broad Institute Inc. Delivery, engineering and optimization of tandem guide systems, methods and compositions for sequence manipulation
KR20160019553A (en) 2013-06-17 2016-02-19 더 브로드 인스티튜트, 인코퍼레이티드 Delivery, engineering and optimization of systems, methods and compositions for targeting and modeling diseases and disorders of post mitotic cells
EP3725885A1 (en) 2013-06-17 2020-10-21 The Broad Institute, Inc. Functional genomics using crispr-cas systems, compositions methods, screens and applications thereof
EP3011030B1 (en) 2013-06-17 2023-11-08 The Broad Institute, Inc. Optimized crispr-cas double nickase systems, methods and compositions for sequence manipulation
MX2015017312A (en) 2013-06-17 2017-04-10 Broad Inst Inc SUPPLY AND USE OF CRISPR-CAS COMPOSITIONS, VECTORS AND SYSTEMS FOR DIRECTED MODIFICATION AND HEPATIC THERAPY.
WO2014204723A1 (en) 2013-06-17 2014-12-24 The Broad Institute Inc. Oncogenic models based on delivery and use of the crispr-cas systems, vectors and compositions
KR20250012194A (en) 2013-06-17 2025-01-23 더 브로드 인스티튜트, 인코퍼레이티드 Delivery, use and therapeutic applications of the crispr-cas systems and compositions for targeting disorders and diseases using viral components
BR112015031639A2 (en) 2013-06-19 2019-09-03 Sigma Aldrich Co Llc target integration
CA2915779A1 (en) 2013-06-25 2014-12-31 Cellectis Modified diatoms for biofuel production
US20160369268A1 (en) 2013-07-01 2016-12-22 The Board Of Regents Of The University Of Texas System Transcription activator-like effector (tale) libraries and methods of synthesis and use
JP7120717B2 (en) 2013-07-09 2022-08-17 プレジデント アンド フェローズ オブ ハーバード カレッジ Multiple RNA-guided genome editing
JP2016528890A (en) 2013-07-09 2016-09-23 プレジデント アンド フェローズ オブ ハーバード カレッジ Therapeutic use of genome editing using the CRISPR / Cas system
EP3019005B1 (en) 2013-07-10 2019-02-20 EffStock, LLC Mrap2 knockouts
WO2015006294A2 (en) 2013-07-10 2015-01-15 President And Fellows Of Harvard College Orthogonal cas9 proteins for rna-guided gene regulation and editing
US10435731B2 (en) 2013-07-10 2019-10-08 Glykos Finland Oy Multiple proteases deficient filamentous fungal cells and methods of use thereof
SMT202100691T1 (en) 2013-07-11 2022-01-10 Modernatx Inc Compositions comprising synthetic polynucleotides encoding crispr related proteins and synthetic sgrnas and methods of use
CN106222197A (en) 2013-07-16 2016-12-14 中国科学院上海生命科学研究院 Plant Genome pointed decoration method
US9663782B2 (en) 2013-07-19 2017-05-30 Larix Bioscience Llc Methods and compositions for producing double allele knock outs
GB201313235D0 (en) 2013-07-24 2013-09-04 Univ Edinburgh Antiviral Compositions Methods and Animals
US10563225B2 (en) 2013-07-26 2020-02-18 President And Fellows Of Harvard College Genome engineering
CN103388006B (en) 2013-07-26 2015-10-28 华东师范大学 A kind of construction process of site-directed point mutation
US10421957B2 (en) 2013-07-29 2019-09-24 Agilent Technologies, Inc. DNA assembly using an RNA-programmable nickase
WO2015017866A1 (en) 2013-08-02 2015-02-05 Enevolv, Inc. Processes and host cells for genome, pathway, and biomolecular engineering
ITTO20130669A1 (en) 2013-08-05 2015-02-06 Consiglio Nazionale Ricerche ADENO-ASSOCIATED MOMCULAR-SPECIFIC VECTOR AND ITS EMPLOYMENT IN THE TREATMENT OF MUSCLE PATHOLOGIES
US20150044192A1 (en) 2013-08-09 2015-02-12 President And Fellows Of Harvard College Methods for identifying a target site of a cas9 nuclease
WO2015021426A1 (en) 2013-08-09 2015-02-12 Sage Labs, Inc. A crispr/cas system-based novel fusion protein and its application in genome editing
WO2015024017A2 (en) 2013-08-16 2015-02-19 President And Fellows Of Harvard College Rna polymerase, methods of purification and methods of use
WO2015021990A1 (en) 2013-08-16 2015-02-19 University Of Copenhagen Rna probing method and reagents
USRE48801E1 (en) 2013-08-20 2021-11-02 Vib Vzw Inhibition of a lncRNA for treatment of melanoma
CA3109801C (en) 2013-08-22 2024-01-09 Andrew Cigan Plant genome modification using guide rna/cas endonuclease systems and methods of use
US9359599B2 (en) 2013-08-22 2016-06-07 President And Fellows Of Harvard College Engineered transcription activator-like effector (TALE) domains and uses thereof
GB201315321D0 (en) 2013-08-28 2013-10-09 Koninklijke Nederlandse Akademie Van Wetenschappen Transduction Buffer
AU2014312295C1 (en) 2013-08-28 2020-08-13 Sangamo Therapeutics, Inc. Compositions for linking DNA-binding domains and cleavage domains
US9925248B2 (en) 2013-08-29 2018-03-27 Temple University Of The Commonwealth System Of Higher Education Methods and compositions for RNA-guided treatment of HIV infection
AU2014316676B2 (en) 2013-09-04 2020-07-23 Csir Site-specific nuclease single-cell assay targeting gene regulatory elements to silence gene expression
WO2015032494A2 (en) 2013-09-04 2015-03-12 Kws Saat Ag Plant resistant to helminthosporium turcicum
KR102238137B1 (en) 2013-09-04 2021-04-09 다우 아그로사이언시즈 엘엘씨 Rapid targeting analysis in crops for determining donor insertion
WO2015034872A2 (en) 2013-09-05 2015-03-12 Massachusetts Institute Of Technology Tuning microbial populations with programmable nucleases
WO2016070129A1 (en) 2014-10-30 2016-05-06 President And Fellows Of Harvard College Delivery of negatively charged proteins using cationic lipids
US9388430B2 (en) 2013-09-06 2016-07-12 President And Fellows Of Harvard College Cas9-recombinase fusion proteins and uses thereof
US9340799B2 (en) 2013-09-06 2016-05-17 President And Fellows Of Harvard College MRNA-sensing switchable gRNAs
US9526784B2 (en) 2013-09-06 2016-12-27 President And Fellows Of Harvard College Delivery system for functional nucleases
WO2015040075A1 (en) 2013-09-18 2015-03-26 Genome Research Limited Genomic screening methods using rna-guided endonucleases
EP3418379B1 (en) 2013-09-18 2020-12-09 Kymab Limited Methods, cells & organisms
US10202593B2 (en) 2013-09-20 2019-02-12 President And Fellows Of Harvard College Evolved sortases and uses thereof
CN105579068A (en) 2013-09-23 2016-05-11 伦斯勒理工学院 Nanoparticle-mediated gene delivery, genomic editing and ligand-targeted modification in various cell populations
WO2015048577A2 (en) 2013-09-27 2015-04-02 Editas Medicine, Inc. Crispr-related methods and compositions
US10822606B2 (en) 2013-09-27 2020-11-03 The Regents Of The University Of California Optimized small guide RNAs and methods of use
CA2925050A1 (en) 2013-09-30 2015-04-02 The Regents Of The University Of California Identification of cxcr8, a novel chemokine receptor
US20160237451A1 (en) 2013-09-30 2016-08-18 Regents Of The University Of Minnesota Conferring resistance to geminiviruses in plants using crispr/cas systems
CA2932580A1 (en) 2013-10-02 2015-04-09 Northeastern University Methods and compositions for generation of developmentally-incompetent eggs in recipients of nuclear genetic transfer
JP5774657B2 (en) 2013-10-04 2015-09-09 国立大学法人京都大学 Method for genetic modification of mammals using electroporation
US20160237402A1 (en) 2013-10-07 2016-08-18 Northeastern University Methods and Compositions for Ex Vivo Generation of Developmentally Competent Eggs from Germ Line Cells Using Autologous Cell Systems
DE102013111099B4 (en) 2013-10-08 2023-11-30 Eberhard Karls Universität Tübingen Medizinische Fakultät Permanent gene correction using nucleotide-modified messenger RNA
WO2015052231A2 (en) 2013-10-08 2015-04-16 Technical University Of Denmark Multiplex editing system
JP2015076485A (en) 2013-10-08 2015-04-20 株式会社ジャパンディスプレイ Display device
US20150098954A1 (en) 2013-10-08 2015-04-09 Elwha Llc Compositions and Methods Related to CRISPR Targeting
WO2015052335A1 (en) 2013-10-11 2015-04-16 Cellectis Methods and kits for detecting nucleic acid sequences of interest using dna-binding protein domain
WO2015057671A1 (en) 2013-10-14 2015-04-23 The Broad Institute, Inc. Artificial transcription factors comprising a sliding domain and uses thereof
EP3057991B8 (en) 2013-10-15 2019-09-04 The Scripps Research Institute Chimeric antigen receptor t cell switches and uses thereof
KR102339240B1 (en) 2013-10-15 2021-12-15 더 스크립스 리서치 인스티튜트 Peptidic chimeric antigen receptor t cell switches and uses thereof
US10117899B2 (en) 2013-10-17 2018-11-06 Sangamo Therapeutics, Inc. Delivery methods and compositions for nuclease-mediated genome engineering in hematopoietic stem cells
CN105899665B (en) 2013-10-17 2019-10-22 桑格摩生物科学股份有限公司 Delivery methods and compositions for nuclease-mediated genome engineering
WO2015058047A2 (en) 2013-10-18 2015-04-23 President And Fellows Of Harvard College Fluorination of organic compounds
KR102251168B1 (en) 2013-10-25 2021-05-13 셀렉티스 Design of rare-cutting endonucleases for efficient and specific targeting dna sequences comprising highly repetitive motives
WO2015065964A1 (en) 2013-10-28 2015-05-07 The Broad Institute Inc. Functional genomics using crispr-cas systems, compositions, methods, screens and applications thereof
WO2015066119A1 (en) 2013-10-30 2015-05-07 North Carolina State University Compositions and methods related to a type-ii crispr-cas system in lactobacillus buchneri
UY35814A (en) 2013-11-04 2015-05-29 Dow Agrosciences Llc OPTIMAL PLACES FOR SOYBEAN.
MX358066B (en) 2013-11-04 2018-08-03 Dow Agrosciences Llc Optimal soybean loci.
KR102269769B1 (en) 2013-11-04 2021-06-28 코르테바 애그리사이언스 엘엘씨 Optimal maize loci
TWI669395B (en) 2013-11-04 2019-08-21 美商陶氏農業科學公司 A universal donor system for gene targeting
KR102269371B1 (en) 2013-11-04 2021-06-28 코르테바 애그리사이언스 엘엘씨 Optimal maize loci
US10752906B2 (en) 2013-11-05 2020-08-25 President And Fellows Of Harvard College Precise microbiota engineering at the cellular level
AU2014346559B2 (en) 2013-11-07 2020-07-09 Editas Medicine,Inc. CRISPR-related methods and compositions with governing gRNAs
US20160282354A1 (en) 2013-11-08 2016-09-29 The Broad Institute, Inc. Compositions and methods for selecting a treatment for b-cell neoplasias
US20150132263A1 (en) 2013-11-11 2015-05-14 Radiant Genomics, Inc. Compositions and methods for targeted gene disruption in prokaryotes
WO2015070212A1 (en) 2013-11-11 2015-05-14 Sangamo Biosciences, Inc. Methods and compositions for treating huntington's disease
HUE044540T2 (en) 2013-11-13 2019-10-28 Childrens Medical Center Nuclease-mediated regulation of gene expression
WO2015073867A1 (en) 2013-11-15 2015-05-21 The United States Of America, As Represented By The Secretary, Department Of Health & Human Services Engineering neural stem cells using homologous recombination
EP3760719A1 (en) 2013-11-18 2021-01-06 CRISPR Therapeutics AG Crispr-cas system materials and methods
KR20160091920A (en) 2013-11-18 2016-08-03 예일 유니버시티 Compositions and methods of using transposons
US9074199B1 (en) 2013-11-19 2015-07-07 President And Fellows Of Harvard College Mutant Cas9 proteins
US10787684B2 (en) 2013-11-19 2020-09-29 President And Fellows Of Harvard College Large gene excision and insertion
WO2015075056A1 (en) 2013-11-19 2015-05-28 Thermo Fisher Scientific Baltics Uab Programmable enzymes for isolation of specific dna fragments
EP3071592B1 (en) 2013-11-20 2021-01-06 Fondazione Telethon Artificial dna-binding proteins and uses thereof
CA3236835A1 (en) 2013-11-22 2015-05-28 Mina Therapeutics Limited C/ebp alpha short activating rna compositions and methods of use
KR102348577B1 (en) 2013-11-22 2022-01-06 셀렉티스 Method of engineering chemotherapy drug resistant t-cells for immunotherapy
US10357515B2 (en) 2013-11-22 2019-07-23 Cellectis Method for generating batches of allogeneic T-cells with averaged potency
CN103642836A (en) 2013-11-26 2014-03-19 苏州同善生物科技有限公司 Method for establishing fragile X-syndrome non-human primate model on basis of CRISPR gene knockout technology
CN103614415A (en) 2013-11-27 2014-03-05 苏州同善生物科技有限公司 Method for establishing obese rat animal model based on CRISPR (clustered regularly interspaced short palindromic repeat) gene knockout technology
CN106103699B (en) 2013-11-28 2019-11-26 地平线探索有限公司 Body cell monoploid Human cell line
DK3080274T3 (en) 2013-12-09 2020-08-31 Sangamo Therapeutics Inc Methods and compositions for genome manipulation
US9546384B2 (en) 2013-12-11 2017-01-17 Regeneron Pharmaceuticals, Inc. Methods and compositions for the targeted modification of a mouse genome
EP3080259B1 (en) 2013-12-12 2023-02-01 The Broad Institute, Inc. Engineering of systems, methods and optimized guide compositions with new architectures for sequence manipulation
US9994831B2 (en) 2013-12-12 2018-06-12 The Regents Of The University Of California Methods and compositions for modifying a single stranded target nucleic acid
BR112016013207A2 (en) 2013-12-12 2017-09-26 Massachusetts Inst Technology administration, use and therapeutic applications of crisp systems and compositions for hbv and viral disorders and diseases
AU2014361834B2 (en) 2013-12-12 2020-10-22 Massachusetts Institute Of Technology CRISPR-Cas systems and methods for altering expression of gene products, structural information and inducible modular Cas enzymes
MX2016007325A (en) 2013-12-12 2017-07-19 Broad Inst Inc Compositions and methods of use of crispr-cas systems in nucleotide repeat disorders.
JP6793547B2 (en) 2013-12-12 2020-12-02 ザ・ブロード・インスティテュート・インコーポレイテッド Optimization Function Systems, methods and compositions for sequence manipulation with the CRISPR-Cas system
AU2014361781B2 (en) 2013-12-12 2021-04-01 Massachusetts Institute Of Technology Delivery, use and therapeutic applications of the CRISPR -Cas systems and compositions for genome editing
US20150165054A1 (en) 2013-12-12 2015-06-18 President And Fellows Of Harvard College Methods for correcting caspase-9 point mutations
EP3470089A1 (en) 2013-12-12 2019-04-17 The Broad Institute Inc. Delivery, use and therapeutic applications of the crispr-cas systems and compositions for targeting disorders and diseases using particle delivery components
WO2015089364A1 (en) 2013-12-12 2015-06-18 The Broad Institute Inc. Crystal structure of a crispr-cas system, and uses thereof
CA2933134A1 (en) 2013-12-13 2015-06-18 Cellectis Cas9 nuclease platform for microalgae genome engineering
EP3080275B1 (en) 2013-12-13 2020-01-15 Cellectis Method of selection of transformed diatoms using nuclease
US20150191744A1 (en) 2013-12-17 2015-07-09 University Of Massachusetts Cas9 effector-mediated regulation of transcription, differentiation and gene editing/labeling
MX2016007797A (en) 2013-12-19 2016-09-07 Amyris Inc Methods for genomic integration.
CA2935032C (en) 2013-12-26 2024-01-23 The General Hospital Corporation Multiplex guide rnas
ES2818625T3 (en) 2013-12-30 2021-04-13 Univ Pittsburgh Commonwealth Sys Higher Education Fusion genes associated with progressive prostate cancer
CN103668472B (en) 2013-12-31 2014-12-24 北京大学 Method for constructing eukaryon gene knockout library by using CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats)/Cas9 system
US9963689B2 (en) 2013-12-31 2018-05-08 The Regents Of The University Of California Cas9 crystals and methods of use thereof
CN106133141B (en) 2014-01-08 2021-08-20 哈佛学院董事及会员团体 RNA-guided gene drives
EP3094729A1 (en) 2014-01-14 2016-11-23 Lam Therapeutics, Inc. Mutagenesis methods
US10774338B2 (en) 2014-01-16 2020-09-15 The Regents Of The University Of California Generation of heritable chimeric plant traits
US10179911B2 (en) 2014-01-20 2019-01-15 President And Fellows Of Harvard College Negative selection and stringency modulation in continuous evolution systems
GB201400962D0 (en) 2014-01-21 2014-03-05 Kloehn Peter C Screening for target-specific affinity binders using RNA interference
CA2937429A1 (en) 2014-01-21 2015-07-30 Caixia Gao Modified plants
EP3097190A2 (en) 2014-01-22 2016-11-30 Life Technologies Corporation Novel reverse transcriptases for use in high temperature nucleic acid synthesis
JP2017503514A (en) 2014-01-24 2017-02-02 ノースカロライナ ステート ユニバーシティーNorth Carolina State University Methods and compositions relating to sequences that guide CAS9 targeting
US10034463B2 (en) 2014-01-24 2018-07-31 Children's Medical Center Corporation High-throughput mouse model for optimizing antibody affinities
US10354746B2 (en) 2014-01-27 2019-07-16 Georgia Tech Research Corporation Methods and systems for identifying CRISPR/Cas off-target sites
CN104805078A (en) 2014-01-28 2015-07-29 北京大学 Design, synthesis and use of RNA molecule for high-efficiency genome editing
WO2015116686A1 (en) 2014-01-29 2015-08-06 Agilent Technologies, Inc. Cas9-based isothermal method of detection of specific dna sequence
US20150291969A1 (en) 2014-01-30 2015-10-15 Chromatin, Inc. Compositions for reduced lignin content in sorghum and improving cell wall digestibility, and methods of making the same
WO2015116969A2 (en) 2014-01-30 2015-08-06 The Board Of Trustees Of The University Of Arkansas Method, vectors, cells, seeds and kits for stacking genes into a single genomic site
ES2939542T3 (en) 2014-01-31 2023-04-24 Factor Bioscience Inc Methods and products for nucleic acid production and delivery
GB201401707D0 (en) 2014-01-31 2014-03-19 Sec Dep For Health The Adeno-associated viral vectors
WO2015115903A1 (en) 2014-02-03 2015-08-06 Academisch Ziekenhuis Leiden H.O.D.N. Lumc Site-specific dna break-induced genome editing using engineered nucleases
WO2015117081A2 (en) 2014-02-03 2015-08-06 Sangamo Biosciences, Inc. Methods and compositions for treatment of a beta thalessemia
PL3102722T3 (en) 2014-02-04 2021-03-08 Jumpcode Genomics, Inc. Genome fractioning
US9783803B2 (en) 2014-02-07 2017-10-10 Vib Vzw Inhibition of NEAT1 for treatment of solid tumors
EP4063503A1 (en) 2014-02-11 2022-09-28 The Regents of the University of Colorado, a body corporate Crispr enabled multiplexed genome engineering
WO2015122967A1 (en) 2014-02-13 2015-08-20 Clontech Laboratories, Inc. Methods of depleting a target molecule from an initial collection of nucleic acids, and compositions and kits for practicing the same
ES3063961T3 (en) 2014-02-14 2026-04-21 Cellectis Cells for immunotherapy engineered for targeting antigen present both on immune cells and pathological cells
JP2017506893A (en) 2014-02-18 2017-03-16 デューク ユニバーシティ Viral replication inactivating composition and method for producing and using the same
WO2015124718A1 (en) 2014-02-20 2015-08-27 Dsm Ip Assets B.V. Phage insensitive streptococcus thermophilus
US10196608B2 (en) 2014-02-21 2019-02-05 Cellectis Method for in situ inhibition of regulatory T cells
AU2015218576B2 (en) 2014-02-24 2020-02-27 Sangamo Therapeutics, Inc. Methods and compositions for nuclease-mediated targeted integration
US20170015994A1 (en) 2014-02-24 2017-01-19 Massachusetts Institute Of Technology Methods for in vivo genome editing
WO2015129686A1 (en) 2014-02-25 2015-09-03 国立研究開発法人 農業生物資源研究所 Plant cell having mutation introduced into target dna, and method for producing same
US11186843B2 (en) 2014-02-27 2021-11-30 Monsanto Technology Llc Compositions and methods for site directed genomic modification
CN103820454B (en) 2014-03-04 2016-03-30 上海金卫生物技术有限公司 The method of CRISPR-Cas9 specific knockdown people PD1 gene and the sgRNA for selectively targeted PD1 gene
CN103820441B (en) 2014-03-04 2017-05-17 黄行许 Method for human CTLA4 gene specific knockout through CRISPR-Cas9 (clustered regularly interspaced short palindromic repeat) and sgRNA(single guide RNA)for specially targeting CTLA4 gene
WO2015134812A1 (en) 2014-03-05 2015-09-11 Editas Medicine, Inc. Crispr/cas-related methods and compositions for treating usher syndrome and retinitis pigmentosa
SG11201609211VA (en) 2014-03-05 2016-12-29 Nat Univ Corp Univ Kobe Genomic sequence modification method for specifically converting nucleic acid bases of targeted dna sequence, and molecular complex for use in same
WO2015138510A1 (en) 2014-03-10 2015-09-17 Editas Medicine., Inc. Crispr/cas-related methods and compositions for treating leber's congenital amaurosis 10 (lca10)
MX373460B (en) 2014-03-11 2020-04-07 Cellectis METHOD FOR GENERATING COMPATIBLE T CELLS FOR ALLOGENIC TRANSPLANTATION.
CA2942268A1 (en) 2014-03-12 2015-09-17 Precision Biosciences, Inc. Dystrophin gene exon deletion using engineered nucleases
WO2015138870A2 (en) 2014-03-13 2015-09-17 The Trustees Of The University Of Pennsylvania Compositions and methods for targeted epigenetic modification
WO2015138855A1 (en) 2014-03-14 2015-09-17 The Regents Of The University Of California Vectors and methods for fungal genome engineering by crispr-cas9
EA201691581A1 (en) 2014-03-14 2017-02-28 Кибус Юс Ллс METHODS AND COMPOSITIONS FOR IMPROVING THE EFFICIENCY OF DIRECTED MODIFICATION OF GENES WITH THE APPLICATION OF MEDIATED OLIGONUCLEOTIDE REPAIR GENES
CN106459894B (en) 2014-03-18 2020-02-18 桑格摩生物科学股份有限公司 Methods and compositions for modulating zinc finger protein expression
CA2942915A1 (en) 2014-03-20 2015-09-24 Universite Laval Crispr-based methods and products for increasing frataxin levels and uses thereof
BR112016019940A2 (en) 2014-03-21 2017-10-24 Univ Leland Stanford Junior nuclease genome editing
PL3122766T3 (en) 2014-03-24 2021-09-13 IMMCO Diagnostics, Inc. Improved anti-nuclear antibody detection and diagnostics for systemic and non-systemic autoimmune disorders
US20170143848A1 (en) 2014-03-24 2017-05-25 Shire Human Genetic Therapies, Inc. Mrna therapy for the treatment of ocular diseases
WO2015148680A1 (en) 2014-03-25 2015-10-01 Ginkgo Bioworks, Inc. Methods and genetic systems for cell engineering
CA2943622A1 (en) 2014-03-25 2015-10-01 Editas Medicine Inc. Crispr/cas-related methods and compositions for treating hiv infection and aids
WO2015148860A1 (en) 2014-03-26 2015-10-01 Editas Medicine, Inc. Crispr/cas-related methods and compositions for treating beta-thalassemia
US10349639B2 (en) 2014-03-26 2019-07-16 University Of Maryland, College Park Targeted genome editing in zygotes of domestic large animals
US9609415B2 (en) 2014-03-26 2017-03-28 Bose Corporation Headphones with cable management
US11242525B2 (en) 2014-03-26 2022-02-08 Editas Medicine, Inc. CRISPR/CAS-related methods and compositions for treating sickle cell disease
US9993563B2 (en) 2014-03-28 2018-06-12 Aposense Ltd. Compounds and methods for trans-membrane delivery of molecules
CA2944141C (en) 2014-03-28 2023-03-28 Aposense Ltd. Compounds and methods for trans-membrane delivery of molecules
WO2015153789A1 (en) 2014-04-01 2015-10-08 Editas Medicine, Inc. Crispr/cas-related methods and compositions for treating herpes simplex virus type 1 (hsv-1)
WO2015153791A1 (en) 2014-04-01 2015-10-08 Editas Medicine, Inc. Crispr/cas-related methods and compositions for treating herpes simplex virus type 2 (hsv-2)
WO2015153760A2 (en) 2014-04-01 2015-10-08 Sangamo Biosciences, Inc. Methods and compositions for prevention or treatment of a nervous system disorder
WO2015153889A2 (en) 2014-04-02 2015-10-08 University Of Florida Research Foundation, Incorporated Materials and methods for the treatment of latent viral infection
US12460231B2 (en) 2014-04-02 2025-11-04 Editas Medicine, Inc. Crispr/CAS-related methods and compositions for treating primary open angle glaucoma
EP3126503A1 (en) 2014-04-03 2017-02-08 Massachusetts Institute Of Technology Methods and compositions for the production of guide rna
CN103911376B (en) 2014-04-03 2017-02-15 黄行许 CRISPR-Cas9 targeted knockout hepatitis b virus cccDNA and specific sgRNA thereof
US11439712B2 (en) 2014-04-08 2022-09-13 North Carolina State University Methods and compositions for RNA-directed repression of transcription using CRISPR-associated genes
EP3556858A3 (en) 2014-04-09 2020-01-22 Editas Medicine, Inc. Crispr/cas-related methods and compositions for treating cystic fibrosis
US10253311B2 (en) 2014-04-10 2019-04-09 The Regents Of The University Of California Methods and compositions for using argonaute to modify a single stranded target nucleic acid
AU2015245469B2 (en) 2014-04-11 2020-11-12 Cellectis Method for generating immune cells resistant to arginine and/or tryptophan depleted microenvironment
WO2015159068A1 (en) 2014-04-14 2015-10-22 Nemesis Bioscience Ltd Therapeutic
EP3132025B1 (en) 2014-04-14 2023-08-30 Maxcyte, Inc. Methods and compositions for modifying genomic dna
CN103923911B (en) 2014-04-14 2016-06-08 上海金卫生物技术有限公司 The method of CRISPR-Cas9 specific knockdown CCR5 gene and the sgRNA for selectively targeted CCR5 gene
GB201406968D0 (en) 2014-04-17 2014-06-04 Green Biologics Ltd Deletion mutants
GB201406970D0 (en) 2014-04-17 2014-06-04 Green Biologics Ltd Targeted mutations
CA2945335A1 (en) 2014-04-18 2015-10-22 Editas Medicine, Inc. Crispr-cas-related methods, compositions and components for cancer immunotherapy
CN105039399A (en) 2014-04-23 2015-11-11 复旦大学 Pluripotent stem cell-hereditary cardiomyopathy cardiac muscle cell and preparation method thereof
WO2015164748A1 (en) 2014-04-24 2015-10-29 Sangamo Biosciences, Inc. Engineered transcription activator like effector (tale) proteins
EP3134515B1 (en) 2014-04-24 2019-03-27 Board of Regents, The University of Texas System Application of induced pluripotent stem cells to generate adoptive cell therapy products
US20170076039A1 (en) 2014-04-24 2017-03-16 Institute For Basic Science A Method of Selecting a Nuclease Target Sequence for Gene Knockout Based on Microhomology
WO2015168158A1 (en) 2014-04-28 2015-11-05 Fredy Altpeter Targeted genome editing to modify lignin biosynthesis and cell wall composition
WO2015168125A1 (en) 2014-04-28 2015-11-05 Recombinetics, Inc. Multiplex gene editing in swine
RU2016143352A (en) 2014-04-28 2018-05-28 ДАУ АГРОСАЙЕНСИЗ ЭлЭлСи HAPLOID CORN TRANSFORMATION
WO2015167766A1 (en) 2014-04-29 2015-11-05 Seattle Children's Hospital (dba Seattle Children's Research Institute) Ccr5 disruption of cells expressing anti-hiv chimeric antigen receptor (car) derived from broadly neutralizing antibodies
WO2015168404A1 (en) 2014-04-30 2015-11-05 Massachusetts Institute Of Technology Toehold-gated guide rna for programmable cas9 circuitry with rna input
EP3156493B1 (en) 2014-04-30 2020-05-06 Tsinghua University Use of tale transcriptional repressor for modular construction of synthetic gene line in mammalian cell
CN104178506B (en) 2014-04-30 2017-03-01 清华大学 TALER albumen is by sterically hindered performance transcripting suppressioning action and its application
WO2015165276A1 (en) 2014-04-30 2015-11-05 清华大学 Reagent kit using tale transcriptional repressor for modular construction of synthetic gene line in mammalian cell
CN107405411A (en) 2014-05-01 2017-11-28 华盛顿大学 Use genetic modification inside adenovirus vector
GB201407852D0 (en) 2014-05-02 2014-06-18 Iontas Ltd Preparation of libraries od protein variants expressed in eukaryotic cells and use for selecting binding molecules
WO2015171603A1 (en) 2014-05-06 2015-11-12 Two Blades Foundation Methods for producing plants with enhanced resistance to oomycete pathogens
RU2691102C2 (en) 2014-05-08 2019-06-11 Сангамо Байосайенсиз, Инк. Methods and compositions for treating huntington's disease
US10487336B2 (en) 2014-05-09 2019-11-26 The Regents Of The University Of California Methods for selecting plants after genome editing
CA2948580A1 (en) 2014-05-09 2015-11-12 Adam Zlotnick Methods and compositions for treating hepatitis b virus infections
EP3140403A4 (en) 2014-05-09 2017-12-20 Université Laval Prevention and treatment of alzheimer's disease by genome editing using the crispr/cas system
AU2015259191B2 (en) 2014-05-13 2019-03-21 Sangamo Therapeutics, Inc. Methods and compositions for prevention or treatment of a disease
CN104004782B (en) 2014-05-16 2016-06-08 安徽省农业科学院水稻研究所 A kind of breeding method extending paddy rice breeding time
CN104017821B (en) 2014-05-16 2016-07-06 安徽省农业科学院水稻研究所 Directed editor's grain husk shell color determines the gene OsCHI method formulating brown shell rice material
CN103981212B (en) 2014-05-16 2016-06-01 安徽省农业科学院水稻研究所 The clever shell color of the rice varieties of yellow grain husk shell is changed into the breeding method of brown
WO2015173436A1 (en) 2014-05-16 2015-11-19 Vrije Universiteit Brussel Genetic correction of myotonic dystrophy type 1
CN103981211B (en) 2014-05-16 2016-07-06 安徽省农业科学院水稻研究所 A kind of breeding method formulating cleistogamous rice material
EP3152221A4 (en) 2014-05-20 2018-01-24 Regents of the University of Minnesota Method for editing a genetic sequence
CA2852593A1 (en) 2014-05-23 2015-11-23 Universite Laval Methods for producing dopaminergic neurons and uses thereof
WO2015183885A1 (en) 2014-05-27 2015-12-03 Dana-Farber Cancer Institute, Inc. Methods and compositions for perturbing gene expression in hematopoietic stem cell lineages in vivo
CN106687601A (en) 2014-05-28 2017-05-17 株式会社图尔金 Method for sensitive detection of target DNA using target-specific nuclease
EP3149171A1 (en) 2014-05-30 2017-04-05 The Board of Trustees of The Leland Stanford Junior University Compositions and methods of delivering treatments for latent viral infections
EP3152319A4 (en) 2014-06-05 2017-12-27 Sangamo BioSciences, Inc. Methods and compositions for nuclease design
RS60359B1 (en) 2014-06-06 2020-07-31 Regeneron Pharma Methods and compositions for modifying a targeted locus
US11030531B2 (en) 2014-06-06 2021-06-08 Trustees Of Boston University DNA recombinase circuits for logical control of gene expression
KR20170010893A (en) 2014-06-06 2017-02-01 더 캘리포니아 인스티튜트 포 바이오메디칼 리써치 Methods of constructing amino terminal immunoglobulin fusion proteins and compositions thereof
WO2015188094A1 (en) 2014-06-06 2015-12-10 President And Fellows Of Harvard College Methods for targeted modification of genomic dna
US20170210818A1 (en) 2014-06-06 2017-07-27 The California Institute For Biomedical Research Constant region antibody fusion proteins and compositions thereof
CN104004778B (en) 2014-06-06 2016-03-02 重庆高圣生物医药有限责任公司 Targeting knockout carrier containing CRISPR/Cas9 system and adenovirus thereof and application
WO2015191693A2 (en) 2014-06-10 2015-12-17 Massachusetts Institute Of Technology Method for gene editing
US11274302B2 (en) 2016-08-17 2022-03-15 Diacarta Ltd Specific synthetic chimeric Xenonucleic acid guide RNA; s(XNA-gRNA) for enhancing CRISPR mediated genome editing efficiency
CA2951882A1 (en) 2014-06-11 2015-12-17 Tom E. HOWARD Factor viii mutation repair and tolerance induction and related cdnas, compositions, methods and systems
JP6730199B2 (en) 2014-06-11 2020-07-29 デューク ユニバーシティ Compositions and methods for rapid and dynamic flux control using synthetic metabolic valves
WO2015189693A1 (en) 2014-06-12 2015-12-17 King Abdullah University Of Science And Technology Targeted viral-mediated plant genome editing using crispr/cas9
WO2015191911A2 (en) 2014-06-12 2015-12-17 Clontech Laboratories, Inc. Protein enriched microvesicles and methods of making and using the same
WO2015195547A1 (en) 2014-06-16 2015-12-23 University Of Washington Methods for controlling stem cell potential and for gene editing in stem cells
DK3155101T3 (en) 2014-06-16 2020-05-04 Univ Johns Hopkins Compositions and Methods for Expression of CRISPR Leader RNAs Using the H1 Promoter
EP3157328B1 (en) 2014-06-17 2021-08-04 Poseida Therapeutics, Inc. A method for directing proteins to specific loci in the genome and uses thereof
CA2952906A1 (en) 2014-06-20 2015-12-23 Cellectis Potatoes with reduced granule-bound starch synthase
EP3919621A1 (en) 2014-06-23 2021-12-08 The General Hospital Corporation Genomewide unbiased identification of dsbs evaluated by sequencing (guide-seq)
HUE049405T2 (en) 2014-06-23 2020-09-28 Regeneron Pharma Nuclease-mediated DNA assembly
WO2015200555A2 (en) 2014-06-25 2015-12-30 Caribou Biosciences, Inc. Rna modification to engineer cas9 activity
KR102386101B1 (en) 2014-06-26 2022-04-14 리제너론 파마슈티칼스 인코포레이티드 Methods and compositions for targeted genetic modifications and methods of use
GB201411344D0 (en) 2014-06-26 2014-08-13 Univ Leicester Cloning
JP6342491B2 (en) 2014-06-30 2018-06-13 花王株式会社 Adhesive sheet for cooling
US20170152787A1 (en) 2014-06-30 2017-06-01 Nissan Motor Co., Ltd. Internal combustion engine
WO2016004010A1 (en) 2014-07-01 2016-01-07 Board Of Regents, The University Of Texas System Regulated gene expression from viral vectors
BR112016030852A2 (en) 2014-07-02 2018-01-16 Shire Human Genetic Therapies rna messenger encapsulation
WO2016007604A1 (en) 2014-07-09 2016-01-14 Gen9, Inc. Compositions and methods for site-directed dna nicking and cleaving
EP2966170A1 (en) 2014-07-10 2016-01-13 Heinrich-Pette-Institut Leibniz-Institut für experimentelle Virologie-Stiftung bürgerlichen Rechts - HBV inactivation
WO2016007948A1 (en) 2014-07-11 2016-01-14 Pioneer Hi-Bred International, Inc. Agronomic trait modification using guide rna/cas endonuclease systems and methods of use
AU2015288157A1 (en) 2014-07-11 2017-01-19 E. I. Du Pont De Nemours And Company Compositions and methods for producing plants resistant to glyphosate herbicide
ES3047792T3 (en) 2014-07-14 2025-12-04 Univ California Crispr/cas transcriptional modulation
CN104109687A (en) 2014-07-14 2014-10-22 四川大学 Construction and application of Zymomonas mobilis CRISPR (clustered regularly interspaced short palindromic repeats)-Cas (CRISPR-association proteins)9 system
AU2015289644A1 (en) 2014-07-15 2017-02-02 Juno Therapeutics, Inc. Engineered cells for adoptive cell therapy
US9944933B2 (en) 2014-07-17 2018-04-17 Georgia Tech Research Corporation Aptamer-guided gene targeting
WO2016011428A1 (en) 2014-07-17 2016-01-21 University Of Pittsburgh - Of The Commonwealth System Of Higher Education Methods of treating cells containing fusion genes
US20160053272A1 (en) 2014-07-18 2016-02-25 Whitehead Institute For Biomedical Research Methods Of Modifying A Sequence Using CRISPR
US10975406B2 (en) 2014-07-18 2021-04-13 Massachusetts Institute Of Technology Directed endonucleases for repeatable nucleic acid cleavage
US20160053304A1 (en) 2014-07-18 2016-02-25 Whitehead Institute For Biomedical Research Methods Of Depleting Target Sequences Using CRISPR
AU2015294354B2 (en) 2014-07-21 2021-10-28 Illumina, Inc. Polynucleotide enrichment using CRISPR-Cas systems
TWI750110B (en) 2014-07-21 2021-12-21 瑞士商諾華公司 Treatment of cancer using humanized anti- bcma chimeric antigen receptor
US10210987B2 (en) 2014-07-22 2019-02-19 Panasonic Intellectual Property Management Co., Ltd. Composite magnetic material, coil component using same, and composite magnetic material manufacturing method
US10244771B2 (en) 2014-07-24 2019-04-02 Dsm Ip Assets B.V. Non-CRISPR-mediated phage resistant Streptococcus thermophilus
WO2016012544A2 (en) 2014-07-25 2016-01-28 Boehringer Ingelheim International Gmbh Enhanced reprogramming to ips cells
WO2016014837A1 (en) 2014-07-25 2016-01-28 Sangamo Biosciences, Inc. Gene editing for hiv gene therapy
US9816074B2 (en) 2014-07-25 2017-11-14 Sangamo Therapeutics, Inc. Methods and compositions for modulating nuclease-mediated genome engineering in hematopoietic stem cells
US10301367B2 (en) 2014-07-26 2019-05-28 Consiglio Nazionale Delle Ricerche Compositions and methods for treatment of muscular dystrophy
FR3024464A1 (en) 2014-07-30 2016-02-05 Centre Nat Rech Scient TARGETING NON-VIRAL INTEGRATIVE VECTORS IN NUCLEOLAR DNA SEQUENCES IN EUKARYOTES
WO2016022363A2 (en) 2014-07-30 2016-02-11 President And Fellows Of Harvard College Cas9 proteins including ligand-dependent inteins
WO2016019144A2 (en) 2014-07-30 2016-02-04 Sangamo Biosciences, Inc. Gene correction of scid-related genes in hematopoietic stem and progenitor cells
US9850521B2 (en) 2014-08-01 2017-12-26 Agilent Technologies, Inc. In vitro assay buffer for Cas9
EP2982758A1 (en) 2014-08-04 2016-02-10 Centre Hospitalier Universitaire Vaudois (CHUV) Genome editing for the treatment of huntington's disease
US20160076093A1 (en) 2014-08-04 2016-03-17 University Of Washington Multiplex homology-directed repair
ES2865275T3 (en) 2014-08-06 2021-10-15 College Of Medicine Pochon Cha Univ Industry Academic Cooperation Foundation Immunocompatible cells created by nuclease-mediated editing of HLA-encoding genes
CN106922154B (en) 2014-08-06 2022-01-07 基因工具股份有限公司 Gene editing using Campylobacter jejuni CRISPR/CAS system-derived RNA-guided engineered nucleases
US11299732B2 (en) 2014-08-07 2022-04-12 The Rockefeller University Compositions and methods for transcription-based CRISPR-Cas DNA editing
WO2016022866A1 (en) 2014-08-07 2016-02-11 Agilent Technologies, Inc. Cis-blocked guide rna
WO2016025469A1 (en) 2014-08-11 2016-02-18 The Board Of Regents Of The University Of Texas System Prevention of muscular dystrophy by crispr/cas9-mediated gene editing
US10513711B2 (en) 2014-08-13 2019-12-24 Dupont Us Holding, Llc Genetic targeting in non-conventional yeast using an RNA-guided endonuclease
WO2016025759A1 (en) 2014-08-14 2016-02-18 Shen Yuelei Dna knock-in system
CN104178461B (en) 2014-08-14 2017-02-01 北京蛋白质组研究中心 CAS9-carrying recombinant adenovirus and application thereof
US9879270B2 (en) 2014-08-15 2018-01-30 Wisconsin Alumni Research Foundation Constructs and methods for genome editing and genetic engineering of fungi and protists
EP3686279B1 (en) 2014-08-17 2023-01-04 The Broad Institute, Inc. Genome editing using cas9 nickases
EP3633047B1 (en) 2014-08-19 2022-12-28 Pacific Biosciences of California, Inc. Method of sequencing nucleic acids based on an enrichment of nucleic acids
EP3183358B1 (en) 2014-08-19 2020-10-07 President and Fellows of Harvard College Rna-guided systems for probing and mapping of nucleic acids
WO2016026444A1 (en) 2014-08-20 2016-02-25 Shanghai Institutes For Biological Sciences, Chinese Academy Of Sciences Biomarker and therapeutic target for triple negative breast cancer
ES2778727T3 (en) 2014-08-25 2020-08-11 Geneweave Biosciences Inc Non-replicative transduction particles and reporter systems based on transduction particles
CA2958767A1 (en) 2014-08-26 2016-03-03 The Regents Of The University Of California Hypersensitive aba receptors
ES2730378T3 (en) 2014-08-27 2019-11-11 Caribou Biosciences Inc Procedures to increase the efficiency of the modification mediated by Cas9
EP3633032A3 (en) 2014-08-28 2020-07-29 North Carolina State University Novel cas9 proteins and guiding features for dna targeting and genome editing
WO2016036754A1 (en) 2014-09-02 2016-03-10 The Regents Of The University Of California Methods and compositions for rna-directed target dna modification
KR20160029247A (en) 2014-09-05 2016-03-15 한국외국어대학교 연구산학협력단 A novel fusion protein and manufacturing method thereof
WO2016035044A1 (en) 2014-09-05 2016-03-10 Vilnius University Programmable rna shredding by the type iii-a crispr-cas system of streptococcus thermophilus
EP3188746B1 (en) 2014-09-05 2024-06-19 The Johns Hopkins University Targeting capn9 activity as a therapeutic strategy for the treatment of myofibroblast differentiation and associated pathologies
WO2016040594A1 (en) 2014-09-10 2016-03-17 The Regents Of The University Of California Reconstruction of ancestral cells by enzymatic recording
CN108064129A (en) 2014-09-12 2018-05-22 纳幕尔杜邦公司 Methods for the generation and use of site-specific integration sites for complex trait loci in maize and soybean
HUE055583T2 (en) 2014-09-16 2021-12-28 Sangamo Therapeutics Inc Methods and compositions for nuclease-mediated genome engineering and correction in hematopoietic stem cells
CA2960436C (en) 2014-09-16 2021-01-05 Gilead Sciences, Inc. Solid forms of a toll-like receptor modulator
BR112017005892A2 (en) 2014-09-24 2017-12-12 City of Hope Adeno-associated virus vector variants for high-efficiency genome editing and methods
WO2016049024A2 (en) 2014-09-24 2016-03-31 The Broad Institute Inc. Delivery, use and therapeutic applications of the crispr-cas systems and compositions for modeling competition of multiple cancer mutations in vivo
WO2016049163A2 (en) 2014-09-24 2016-03-31 The Broad Institute Inc. Use and production of chd8+/- transgenic animals with behavioral phenotypes characteristic of autism spectrum disorder
WO2016049251A1 (en) 2014-09-24 2016-03-31 The Broad Institute Inc. Delivery, use and therapeutic applications of the crispr-cas systems and compositions for modeling mutations in leukocytes
WO2016049258A2 (en) 2014-09-25 2016-03-31 The Broad Institute Inc. Functional screening with optimized functional crispr-cas systems
WO2016046635A1 (en) 2014-09-25 2016-03-31 Institut Pasteur Methods for characterizing human papillomavirus associated cervical lesions
US20160090603A1 (en) 2014-09-30 2016-03-31 Sandia Corporation Delivery platforms for the domestication of algae and plants
WO2016054326A1 (en) 2014-10-01 2016-04-07 The General Hospital Corporation Methods for increasing efficiency of nuclease-induced homology-directed repair
DK3204399T3 (en) 2014-10-09 2025-02-24 Seattle Childrens Hospital Dba Seattle Childrens Res Inst LONG POLY (A) PLASMIDS AND METHODS FOR INSERTING LONG POLY (A) SEQUENCES INTO THE PLASMID
WO2016057951A2 (en) 2014-10-09 2016-04-14 Life Technologies Corporation Crispr oligonucleotides and gene editing
US10583201B2 (en) 2014-10-10 2020-03-10 Massachusetts Eye And Ear Infirmary Efficient delivery of therapeutic molecules in vitro and in vivo
EP3204496A1 (en) 2014-10-10 2017-08-16 Editas Medicine, Inc. Compositions and methods for promoting homology directed repair
WO2016061073A1 (en) 2014-10-14 2016-04-21 Memorial Sloan-Kettering Cancer Center Composition and method for in vivo engineering of chromosomal rearrangements
DK3207124T3 (en) 2014-10-15 2019-08-12 Regeneron Pharma METHODS AND COMPOSITIONS FOR GENERATION OR STORAGE OF PLURIPOTENT CELLS
CN104342457A (en) 2014-10-17 2015-02-11 杭州师范大学 Method for targetedly integrating exogenous gene into target gene
BR112017007923B1 (en) 2014-10-17 2023-12-12 The Penn State Research Foundation METHOD FOR PRODUCING GENETIC MANIPULATION MEDIATED BY MULTIPLEX REACTIONS WITH RNA IN A RECEIVING CELL, CONSTRUCTION OF NUCLEIC ACID, EXPRESSION CASSETTE, VECTOR, RECEIVING CELL AND GENETICALLY MODIFIED CELL
US11174506B2 (en) 2014-10-17 2021-11-16 Howard Hughes Medical Institute Genomic probes
BR112017008082A2 (en) 2014-10-20 2017-12-26 Envirologix Inc compositions and methods for detecting an rna virus
US10920208B2 (en) 2014-10-22 2021-02-16 President And Fellows Of Harvard College Evolution of proteases
US20170306306A1 (en) 2014-10-24 2017-10-26 Life Technologies Corporation Compositions and Methods for Enhancing Homologous Recombination
WO2016069591A2 (en) 2014-10-27 2016-05-06 The Broad Institute Inc. Compositions, methods and use of synthetic lethal screening
WO2016069774A1 (en) 2014-10-28 2016-05-06 Agrivida, Inc. Methods and compositions for stabilizing trans-splicing intein modified proteases
US10258697B2 (en) 2014-10-29 2019-04-16 Massachusetts Eye And Ear Infirmary Efficient delivery of therapeutic molecules in vitro and in vivo
MA40880A (en) 2014-10-30 2017-09-05 Temple Univ Of The Commonwealth RNA-GUIDED ERADICATION OF HUMAN JC VIRUS AND OTHER POLYOMAVIRUSES
WO2016069282A1 (en) 2014-10-31 2016-05-06 The Trustees Of The University Of Pennsylvania Altering gene expression in modified t cells and uses thereof
CN107429246B (en) 2014-10-31 2021-06-01 麻省理工学院 Massively parallel combinatorial genetics for CRISPR
US9816080B2 (en) 2014-10-31 2017-11-14 President And Fellows Of Harvard College Delivery of CAS9 via ARRDC1-mediated microvesicles (ARMMs)
US10435697B2 (en) 2014-11-03 2019-10-08 Nanyang Technological University Recombinant expression system that senses pathogenic microorganisms
CN104404036B (en) 2014-11-03 2017-12-01 赛业(苏州)生物科技有限公司 Conditional gene knockout method based on CRISPR/Cas9 technologies
CN104504304B (en) 2014-11-03 2017-08-25 深圳先进技术研究院 Method and device for identifying clustered regularly interspaced short palindromic repeats
US10920215B2 (en) 2014-11-04 2021-02-16 National University Corporation Kobe University Method for modifying genome sequence to introduce specific mutation to targeted DNA sequence by base-removal reaction, and molecular complex used therein
US20180291382A1 (en) 2014-11-05 2018-10-11 The Regents Of The University Of California Methods for Autocatalytic Genome Editing and Neutralizing Autocatalytic Genome Editing
CN107406838A (en) 2014-11-06 2017-11-28 纳幕尔杜邦公司 Peptide-mediated delivering of the endonuclease of RNA guiding into cell
CA2963820A1 (en) 2014-11-07 2016-05-12 Editas Medicine, Inc. Methods for improving crispr/cas-mediated genome-editing
CN107532142A (en) 2014-11-11 2018-01-02 应用干细胞有限公司 Mescenchymal stem cell is transformed using homologous recombination
WO2016077350A1 (en) 2014-11-11 2016-05-19 Illumina, Inc. Polynucleotide amplification using crispr-cas systems
WO2016076672A1 (en) 2014-11-14 2016-05-19 기초과학연구원 Method for detecting off-target site of genetic scissors in genome
EP3467110A1 (en) 2014-11-15 2019-04-10 Zumutor Biologics Inc. Dna-binding domain, non-fucosylated and partially fucosylated proteins, and methods thereof
WO2016080097A1 (en) 2014-11-17 2016-05-26 国立大学法人東京医科歯科大学 Method for easily and highly efficiently creating genetically modified nonhuman mammal
US10858662B2 (en) 2014-11-19 2020-12-08 Institute For Basic Science Genome editing with split Cas9 expressed from two vectors
US11319555B2 (en) 2014-11-20 2022-05-03 Duke University Compositions, systems and methods for cell therapy
US10227661B2 (en) 2014-11-21 2019-03-12 GeneWeave Biosciences, Inc. Sequence-specific detection and phenotype determination
ES2731437T3 (en) 2014-11-21 2019-11-15 Regeneron Pharma Methods and compositions for directed genetic modification through the use of guide RNA pairs
US20180334732A1 (en) 2014-11-25 2018-11-22 Drexel University Compositions and methods for hiv quasi-species excision from hiv-1-infected patients
WO2016084088A1 (en) 2014-11-26 2016-06-02 Ramot At Tel-Aviv University Ltd. Targeted elimination of bacterial genes
EP3224363B1 (en) 2014-11-27 2021-11-03 Yissum Research Development Company of the Hebrew University of Jerusalem Ltd. Nucleic acid constructs for genome editing
US20180105834A1 (en) 2014-11-27 2018-04-19 Institute Of Animal Sciences, Chinese Academy Of Agrigultural Sciences A method of site-directed insertion to h11 locus in pigs by using site-directed cutting system
CN105695485B (en) 2014-11-27 2020-02-21 中国科学院上海生命科学研究院 A Cas9-encoding gene for filamentous fungal Crispr-Cas system and its application
GB201421096D0 (en) 2014-11-27 2015-01-14 Imp Innovations Ltd Genome editing methods
WO2016089866A1 (en) 2014-12-01 2016-06-09 President And Fellows Of Harvard College Rna-guided systems for in vivo gene editing
EP3227446A1 (en) 2014-12-01 2017-10-11 Novartis AG Compositions and methods for diagnosis and treatment of prostate cancer
US10900034B2 (en) 2014-12-03 2021-01-26 Agilent Technologies, Inc. Guide RNA with chemical modifications
CN104450774A (en) 2014-12-04 2015-03-25 中国农业科学院作物科学研究所 Construction of soybean CRISPR/Cas9 system and application of soybean CRISPR/Cas9 system in soybean gene modification
US10975392B2 (en) 2014-12-05 2021-04-13 Abcam Plc Site-directed CRISPR/recombinase compositions and methods of integrating transgenes
CN104531705A (en) 2014-12-09 2015-04-22 中国农业大学 Method for knocking off animal myostatin gene by using CRISPR-Cas9 system
CN104531704B (en) 2014-12-09 2019-05-21 中国农业大学 Utilize the method for CRISPR-Cas9 system knock-out animal FGF5 gene
AU2015360502A1 (en) 2014-12-10 2017-06-29 Regents Of The University Of Minnesota Genetically modified cells, tissues, and organs for treating disease
CN104480144B (en) 2014-12-12 2017-04-12 武汉大学 CRISPR/Cas9 recombinant lentiviral vector for human immunodeficiency virus gene therapy and lentivirus of CRISPR/Cas9 recombinant lentiviral vector
EP4372091A3 (en) 2014-12-12 2024-07-31 Tod M. Woolf Compositions and methods for editing nucleic acids in cells utilizing oligonucleotides
CN107249645A (en) 2014-12-12 2017-10-13 朱坚 Methods and compositions for selective elimination of cells of interest
WO2016094874A1 (en) 2014-12-12 2016-06-16 The Broad Institute Inc. Escorted and functionalized guides for crispr-cas systems
WO2016094872A1 (en) 2014-12-12 2016-06-16 The Broad Institute Inc. Dead guides for crispr transcription factors
WO2016094880A1 (en) 2014-12-12 2016-06-16 The Broad Institute Inc. Delivery, use and therapeutic applications of crispr systems and compositions for genome editing as to hematopoietic stem cells (hscs)
CA2971187C (en) 2014-12-16 2023-10-24 Danisco Us Inc. Fungal genome modification systems and methods of use
EP3234136B1 (en) 2014-12-16 2024-08-21 C3J Therapeutics, Inc. Compositions of and methods for in vitro viral genome engineering
JP6839082B2 (en) 2014-12-17 2021-03-03 E.I. Du Pont De Nemours And Company Compositions and methods for efficient gene editing in E. coli using a guide RNA/Cas endonuclease system in combination with a circular polynucleotide modification template
US10676737B2 (en) 2014-12-17 2020-06-09 Proqr Therapeutics Ii B.V. Targeted RNA editing
CA2969384A1 (en) 2014-12-17 2016-06-23 Cellectis Inhibitory chimeric antigen receptor (icar or n-car) expressing non-t cell transduction domain
WO2016097751A1 (en) 2014-12-18 2016-06-23 The University Of Bath Method of cas9 mediated genome engineering
CN112877327B (en) 2014-12-18 2024-11-22 综合基因技术公司 CRISPR-based compositions and methods of use
CN104745626B (en) 2014-12-19 2018-05-01 中国航天员科研训练中心 A kind of fast construction method of conditional gene knockout animal model and application
EP3234192B1 (en) 2014-12-19 2021-07-14 The Broad Institute, Inc. Unbiased identification of double-strand breaks and genomic rearrangement by genome-wide insert capture sequencing
CA2971444A1 (en) 2014-12-20 2016-06-23 Arc Bio, Llc Compositions and methods for targeted depletion, enrichment, and partitioning of nucleic acids using crispr/cas system proteins
US10190106B2 (en) 2014-12-22 2019-01-29 Univesity Of Massachusetts Cas9-DNA targeting unit chimeras
CN104560864B (en) 2014-12-22 2017-08-11 中国科学院微生物研究所 Utilize the 293T cell lines of the knockout IFN β genes of CRISPR Cas9 system constructings
WO2016106236A1 (en) 2014-12-23 2016-06-30 The Broad Institute Inc. Rna-targeting system
US11053271B2 (en) 2014-12-23 2021-07-06 The Regents Of The University Of California Methods and compositions for nucleic acid integration
AU2015101792A4 (en) 2014-12-24 2016-01-28 Massachusetts Institute Of Technology Engineering of systems, methods and optimized enzyme and guide scaffolds for sequence manipulation
US20170369855A1 (en) 2014-12-24 2017-12-28 Dana-Farber Cancer Institute, Inc. Systems and methods for genome modification and regulation
CN104651398A (en) 2014-12-24 2015-05-27 杭州师范大学 Method for knocking out microRNA gene family by utilizing CRISPR-Cas9 specificity
WO2016106244A1 (en) 2014-12-24 2016-06-30 The Broad Institute Inc. Crispr having or associated with destabilization domains
EP3239298A4 (en) 2014-12-26 2018-06-13 Riken Gene knockout method
WO2016108926A1 (en) 2014-12-30 2016-07-07 The Broad Institute Inc. Crispr mediated in vivo modeling and genetic screening of tumor growth and metastasis
CN104498493B (en) 2014-12-30 2017-12-26 武汉大学 The method of CRISPR/Cas9 specific knockdown hepatitis type B viruses and the gRNA for selectively targeted HBV DNA
US20180002706A1 (en) 2014-12-30 2018-01-04 University Of South Florida Methods and compositions for cloning into large vectors
SG11201704272YA (en) 2014-12-31 2017-06-29 Synthetic Genomics Inc Compositions and methods for high efficiency in vivo genome editing
CN104651399B (en) 2014-12-31 2018-11-16 广西大学 A method of gene knockout being realized in Pig embryos cell using CRISPR/Cas system
US10590436B2 (en) 2015-01-06 2020-03-17 Dsm Ip Assets B.V. CRISPR-CAS system for a lipolytic yeast host cell
CN104651392B (en) 2015-01-06 2018-07-31 华南农业大学 A method of obtaining temp-sensing sterile line using CRISPR/Cas9 system rite-directed mutagenesis P/TMS12-1
WO2016110512A1 (en) 2015-01-06 2016-07-14 Dsm Ip Assets B.V. A crispr-cas system for a yeast host cell
JP6603721B2 (en) 2015-01-06 2019-11-06 インダストリー−アカデミック コーポレーション ファウンデーション,ヨンセイ ユニバーシティ Endonuclease targeting blood coagulation factor VIII gene and composition for treating hemophilia containing the same
DK3242950T3 (en) 2015-01-06 2021-12-20 Dsm Ip Assets Bv CRISPR-CAS SYSTEM FOR A FILAMENTOUS FUNGAL HOST CELL
CN104593422A (en) 2015-01-08 2015-05-06 中国农业大学 Method of cloning reproductive and respiratory syndrome resisting pig
WO2016112242A1 (en) 2015-01-08 2016-07-14 President And Fellows Of Harvard College Split cas9 proteins
WO2016112351A1 (en) 2015-01-09 2016-07-14 Bio-Rad Laboratories, Inc. Detection of genome editing
WO2016114972A1 (en) 2015-01-12 2016-07-21 The Regents Of The University Of California Heterodimeric cas9 and methods of use thereof
CN107250373A (en) 2015-01-12 2017-10-13 麻省理工学院 Gene editing by microfluidic delivery
WO2016112963A1 (en) 2015-01-13 2016-07-21 Riboxx Gmbh Delivery of biomolecules into cells
MA41349A (en) 2015-01-14 2017-11-21 Univ Temple RNA-GUIDED ERADICATION OF HERPES SIMPLEX TYPE I AND OTHER ASSOCIATED HERPES VIRUSES
PL3244909T3 (en) 2015-01-14 2020-04-30 Université D'aix-Marseille Proteasome inhibitors for treating a disorder related to an accumulation of non-degraded abnormal protein or a cancer
CN107429263A (en) 2015-01-15 2017-12-01 斯坦福大学托管董事会 Methods for Regulating Genome Editing
CN104611370A (en) 2015-01-16 2015-05-13 深圳市科晖瑞生物医药有限公司 Method for rejecting B2M (beta 2-microglobulin) gene segment
EA201791633A1 (en) 2015-01-19 2018-03-30 Инститьют Оф Дженетикс Энд Девелопментал Байолоджи, Чайниз Акэдеми Оф Сайенсиз METHOD OF PRECISE MODIFICATION OF A PLANT BY MEANS OF TRANSIENT EXPRESSION OF THE GENE
CN104725626B (en) 2015-01-22 2016-06-29 漳州亚邦化学有限公司 A kind of preparation method of the unsaturated-resin suitable in artificial quartz in lump
CN105821072A (en) 2015-01-23 2016-08-03 深圳华大基因研究院 CRISPR-Cas9 system used for assembling DNA and DNA assembly method
WO2016123071A1 (en) 2015-01-26 2016-08-04 Cold Spring Harbor Laboratory Methods of identifying essential protein domains
US10059940B2 (en) 2015-01-27 2018-08-28 Minghong Zhong Chemically ligated RNAs for CRISPR/Cas9-lgRNA complexes as antiviral therapeutic agents
CN104561095B (en) 2015-01-27 2017-08-22 深圳市国创纳米抗体技术有限公司 A kind of preparation method for the transgenic mice that can produce growth factor of human nerve
WO2016123243A1 (en) 2015-01-28 2016-08-04 The Regents Of The University Of California Methods and compositions for labeling a single-stranded target nucleic acid
EP3250691B9 (en) 2015-01-28 2023-08-02 Caribou Biosciences, Inc. Crispr hybrid dna/rna polynucleotides and methods of use
US11248240B2 (en) 2015-01-29 2022-02-15 Meiogenix Method for inducing targeted meiotic recombinations
WO2016123578A1 (en) 2015-01-30 2016-08-04 The Regents Of The University Of California Protein delivery in primary hematopoietic cells
PL3265563T3 (en) 2015-02-02 2021-09-13 Meiragtx Uk Ii Limited Regulation of gene expression by aptamer-mediated modulation of alternative splicing
CN104593418A (en) 2015-02-06 2015-05-06 中国医学科学院医学实验动物研究所 Method for establishing humanized rat drug evaluation animal model
US10676726B2 (en) 2015-02-09 2020-06-09 Duke University Compositions and methods for epigenome editing
KR101584933B1 (en) 2015-02-10 2016-01-13 성균관대학교산학협력단 Recombinant vector for inhibiting antibiotic resistance and uses thereof
WO2016130697A1 (en) 2015-02-11 2016-08-18 Memorial Sloan Kettering Cancer Center Methods and kits for generating vectors that co-express multiple target molecules
CN104928321B (en) 2015-02-12 2018-06-01 中国科学院西北高原生物研究所 A kind of scale loss zebra fish pattern and method for building up by Crispr/Cas9 inductions
CN104726494B (en) 2015-02-12 2018-10-23 中国人民解放军第二军医大学 The method that CRISPR-Cas9 technologies build chromosome translocation stem cell and animal model
WO2016131009A1 (en) 2015-02-13 2016-08-18 University Of Massachusetts Compositions and methods for transient delivery of nucleases
US20160244784A1 (en) 2015-02-15 2016-08-25 Massachusetts Institute Of Technology Population-Hastened Assembly Genetic Engineering
WO2016132122A1 (en) 2015-02-17 2016-08-25 University Of Edinburgh Assay construct
JP6354100B2 (en) 2015-02-19 2018-07-11 国立大学法人徳島大学 Method for introducing Cas9 mRNA into a fertilized egg of a mammal by electroporation
WO2016135559A2 (en) 2015-02-23 2016-09-01 Crispr Therapeutics Ag Materials and methods for treatment of human genetic diseases including hemoglobinopathies
EP3262162A4 (en) 2015-02-23 2018-08-08 Voyager Therapeutics, Inc. Regulatable expression using adeno-associated virus (aav)
AU2016225178B2 (en) 2015-02-23 2022-05-05 Crispr Therapeutics Ag Materials and methods for treatment of hemoglobinopathies
KR20160103953A (en) 2015-02-25 2016-09-02 연세대학교 산학협력단 Method for target DNA enrichment using CRISPR system
WO2016137774A1 (en) 2015-02-25 2016-09-01 Pioneer Hi-Bred International Inc Composition and methods for regulated expression of a guide rna/cas endonuclease complex
WO2016135507A1 (en) 2015-02-27 2016-09-01 University Of Edinburgh Nucleic acid editing systems
CN104805099B (en) 2015-03-02 2018-04-13 中国人民解放军第二军医大学 A kind of nucleic acid molecules and its expression vector of safe coding Cas9 albumen
EP3265559B1 (en) 2015-03-03 2021-01-06 The General Hospital Corporation Engineered crispr-cas9 nucleases with altered pam specificity
CN104651401B (en) 2015-03-05 2019-03-08 东华大学 A method for biallelic knockout of mir-505
CN104673816A (en) 2015-03-05 2015-06-03 广东医学院 PCr-NHEJ (non-homologous end joining) carrier as well as construction method of pCr-NHEJ carrier and application of pCr-NHEJ carrier in site-specific knockout of bacterial genes
US20160264934A1 (en) 2015-03-11 2016-09-15 The General Hospital Corporation METHODS FOR MODULATING AND ASSAYING m6A IN STEM CELL POPULATIONS
WO2016145150A2 (en) 2015-03-11 2016-09-15 The Broad Institute Inc. Selective treatment of prmt5 dependent cancer
GB201504223D0 (en) 2015-03-12 2015-04-29 Genome Res Ltd Biallelic genetic modification
US20180195084A1 (en) 2015-03-12 2018-07-12 Institute Of Genetics And Developmental Biology Chinese Academy Of Sciences Method for increasing ability of a plant to resist an invading dna virus
JP6588995B2 (en) 2015-03-13 2019-10-09 ザ ジャクソン ラボラトリーThe Jackson Laboratory Three component CRISPR / Cas composite system and its use
AR103926A1 (en) 2015-03-16 2017-06-14 Inst Genetics & Dev Biolog Cas METHOD FOR MAKING MODIFICATIONS DIRECTED SITE IN THE GENOMA OF A PLANT USING NON-HEREDABLE MATERIALS
CN106032540B (en) 2015-03-16 2019-10-25 中国科学院上海生命科学研究院 Adeno-associated virus vector construction and application of CRISPR/Cas9 endonuclease system
WO2016149484A2 (en) 2015-03-17 2016-09-22 Temple University Of The Commonwealth System Of Higher Education Compositions and methods for specific reactivation of hiv latent reservoir
CN113846144B (en) 2015-03-17 2023-09-26 生物辐射实验室股份有限公司 Detecting genome editing
EP3271461A1 (en) 2015-03-20 2018-01-24 Danmarks Tekniske Universitet Crispr/cas9 based engineering of actinomycetal genomes
MA41382A (en) 2015-03-20 2017-11-28 Univ Temple GENE EDITING BASED ON THE TAT-INDUCED CRISPR / ENDONUCLEASE SYSTEM
CN104726449A (en) 2015-03-23 2015-06-24 国家纳米科学中心 CRISPR-Cas9 system for preventing and/or treating HIV, as well as preparation method and application thereof
CN106148416B (en) 2015-03-24 2019-12-17 华东师范大学 Breeding method of Cyp gene knockout rats and preparation method of liver microsomes
US20180112213A1 (en) 2015-03-25 2018-04-26 Editas Medicine, Inc. Crispr/cas-related methods, compositions and components
EP3851530A1 (en) 2015-03-26 2021-07-21 Editas Medicine, Inc. Crispr/cas-mediated gene conversion
WO2016161004A1 (en) 2015-03-30 2016-10-06 The Board Of Regents Of The Nevada System Of Higher Educ. On Behalf Of The University Of Nevada, La Compositions comprising talens and methods of treating hiv
KR20250171479A (en) 2015-03-31 2025-12-08 소흠, 인코포레이티드 Cas 9 retroviral integrase and cas 9 recombinase systems for targeted incorporation of a dna sequence into a genome of a cell or organism
EP3748004A1 (en) 2015-04-01 2020-12-09 Editas Medicine, Inc. Crispr/cas-related methods and compositions for treating duchenne muscular dystrophy and becker muscular dystrophy
EP3300507A4 (en) 2015-04-02 2019-03-13 Agenovir Corporation Gene delivery methods and compositions
US20170166928A1 (en) 2015-04-03 2017-06-15 Whitehead Institute For Biomedical Research Compositions And Methods For Genetically Modifying Yeast
AU2016243052C1 (en) 2015-04-03 2022-11-24 Dana-Farber Cancer Institute, Inc. Composition and methods of genome editing of B-cells
CN106167810A (en) 2015-04-03 2016-11-30 内蒙古中科正标生物科技有限责任公司 Monocot genes knockout carrier based on CRISPR/Cas9 technology and application thereof
EP3280803B1 (en) 2015-04-06 2021-05-26 The Board of Trustees of the Leland Stanford Junior University Chemically modified guide rnas for crispr/cas-mediated gene regulation
RS61907B1 (en) 2015-04-06 2021-06-30 Subdomain Llc De novo binding domain containing polypeptides and uses thereof
US11214779B2 (en) 2015-04-08 2022-01-04 University of Pittsburgh—of the Commonwealth System of Higher Education Activatable CRISPR/CAS9 for spatial and temporal control of genome editing
JP6892642B2 (en) 2015-04-13 2021-06-23 国立大学法人 東京大学 A set of polypeptides that exhibit nuclease or nickase activity photodependently or in the presence of a drug, or suppress or activate the expression of a target gene.
US10155938B2 (en) 2015-04-14 2018-12-18 City Of Hope Coexpression of CAS9 and TREX2 for targeted mutagenesis
GB201506509D0 (en) 2015-04-16 2015-06-03 Univ Wageningen Nuclease-mediated genome editing
WO2016168631A1 (en) 2015-04-17 2016-10-20 President And Fellows Of Harvard College Vector-based mutagenesis system
WO2016170484A1 (en) 2015-04-21 2016-10-27 Novartis Ag Rna-guided gene editing system and uses thereof
CN104762321A (en) 2015-04-22 2015-07-08 东北林业大学 Knockout vector construction method based on CRISPR/Cas9 system target knockout KHV gene and crNRA prototype thereof
CN104805118A (en) 2015-04-22 2015-07-29 扬州大学 A method for targeted knockout of specific genes in Suqin yellow chicken embryonic stem cells
CA2982966C (en) 2015-04-24 2024-02-20 Editas Medicine, Inc. Evaluation of cas9 molecule/guide rna molecule complexes
US11268158B2 (en) 2015-04-24 2022-03-08 St. Jude Children's Research Hospital, Inc. Assay for safety assessment of therapeutic genetic manipulations, gene therapy vectors and compounds
CN107614012A (en) 2015-04-24 2018-01-19 加利福尼亚大学董事会 Using the cell detection of engineering, monitoring or treatment disease or the system of the patient's condition and preparation and use their method
EP3288594B1 (en) 2015-04-27 2022-06-29 The Trustees of The University of Pennsylvania Dual aav vector system for crispr/cas9 mediated correction of human disease
WO2016174056A1 (en) 2015-04-27 2016-11-03 Genethon Compositions and methods for the treatment of nucleotide repeat expansion disorders
EP3087974A1 (en) 2015-04-29 2016-11-02 Rodos BioTarget GmbH Targeted nanocarriers for targeted drug delivery of gene therapeutics
EP3289080B1 (en) 2015-04-30 2021-08-25 The Trustees of Columbia University in the City of New York Gene therapy for autosomal dominant diseases
US20190002920A1 (en) 2015-04-30 2019-01-03 The Brigham And Women's Hospital, Inc. Methods and kits for cloning-free genome editing
US20160346359A1 (en) 2015-05-01 2016-12-01 Spark Therapeutics, Inc. Adeno-associated Virus-Mediated CRISPR-Cas9 Treatment of Ocular Disease
US20180344817A1 (en) 2015-05-01 2018-12-06 Precision Biosciences, Inc. Precise deletion of chromosomal sequences in vivo and treatment of nucleotide repeat expansion disorders using engineered nucleases
EP3292219B9 (en) 2015-05-04 2022-05-18 Ramot at Tel-Aviv University Ltd. Methods and kits for fragmenting dna
CN104894068A (en) 2015-05-04 2015-09-09 南京凯地生物科技有限公司 Method for preparing CAR-T cell by CRISPR/Cas9
GB2531454A (en) 2016-01-10 2016-04-20 Snipr Technologies Ltd Recombinogenic nucleic acid strands in situ
CN107667173A (en) 2015-05-06 2018-02-06 斯尼普技术有限公司 Altering microbial populations and improving microbiota
WO2016182893A1 (en) 2015-05-08 2016-11-17 Teh Broad Institute Inc. Functional genomics using crispr-cas systems for saturating mutagenesis of non-coding elements, compositions, methods, libraries and applications thereof
ES2835861T5 (en) 2015-05-08 2025-02-18 Childrens Medical Ct Corp Targeting bcl11a enhancer functional regions for fetal hemoglobin reinduction
AU2016261358B2 (en) 2015-05-11 2021-09-16 Editas Medicine, Inc. Optimized CRISPR/Cas9 systems and methods for gene editing in stem cells
CA2985615A1 (en) 2015-05-11 2016-11-17 Editas Medicine, Inc. Crispr/cas-related methods and compositions for treating hiv infection and aids
KR101785847B1 (en) 2015-05-12 2017-10-17 연세대학교 산학협력단 Targeted genome editing based on CRISPR/Cas9 system using short linearized double-stranded DNA
MX382223B (en) 2015-05-12 2025-03-13 Sangamo Therapeutics Inc NUCLEASE-MEDIATED REGULATION OF GENE EXPRESSION.
US20180119174A1 (en) 2015-05-13 2018-05-03 Seattle Children's Hospita (dba Seattle Children's Research Institute Enhancing endonuclease based gene editing in primary cells
WO2016183402A2 (en) 2015-05-13 2016-11-17 President And Fellows Of Harvard College Methods of making and using guide rna for use with cas9 systems
CN105886498A (en) 2015-05-13 2016-08-24 沈志荣 Method for specifically knocking out human PCSK9 gene by virtue of CRISPR-Cas9 and sgRNA for specifically targeting PCSK9 gene
EP3294774B1 (en) 2015-05-13 2024-08-28 Zumutor Biologics, Inc. Afucosylated protein, cell expressing said protein and associated methods
WO2016183438A1 (en) 2015-05-14 2016-11-17 Massachusetts Institute Of Technology Self-targeting genome editing system
EP3294879A4 (en) 2015-05-14 2019-02-20 University of Southern California OPTIMIZED GENOMIC EDITION USING A RECOMBINANT ENDONUCLEASE SYSTEM
CN107849546A (en) 2015-05-15 2018-03-27 先锋国际良种公司 To the quick sign of CAS endonuclease systems, PAM sequences and guide RNA element
CN107709555A (en) 2015-05-15 2018-02-16 达尔马科恩有限公司 The unidirectional of synthesis for the gene editing of Cas9 mediations leads RNA
WO2016186772A2 (en) 2015-05-16 2016-11-24 Genzyme Corporation Gene editing of deep intronic mutations
CN104846010B (en) 2015-05-18 2018-07-06 安徽省农业科学院水稻研究所 A kind of method for deleting transgenic paddy rice riddled basins
EP3298149A1 (en) 2015-05-18 2018-03-28 King Abdullah University Of Science And Technology Method of inhibiting plant virus pathogen infections by crispr/cas9-mediated interference
EP3095870A1 (en) 2015-05-19 2016-11-23 Kws Saat Se Methods for the in planta transformation of plants and manufacturing processes and products based and obtainable therefrom
CN106011104B (en) 2015-05-21 2019-09-27 清华大学 Method for gene editing and expression regulation using split Cas system
CN105518135B (en) 2015-05-22 2020-11-24 深圳市第二人民医院 CRISPR-Cas9 specific knockout method of porcine CMAH gene and sgRNA for specific targeting of CMAH gene
US20160340622A1 (en) 2015-05-22 2016-11-24 Nabil Radi Abdou Bar Soap Anchoring Core
WO2016187904A1 (en) 2015-05-22 2016-12-01 深圳市第二人民医院 Method for pig cmah gene specific knockout by means of crispr-cas9 and sgrna for specially targeting cmah gene
WO2016187717A1 (en) 2015-05-26 2016-12-01 Exerkine Corporation Exosomes useful for genome editing
CN105624146B (en) 2015-05-28 2019-02-15 中国科学院微生物研究所 Molecular cloning method based on CRISPR/Cas9 and endogenous homologous recombination in Saccharomyces cerevisiae cells
WO2016191684A1 (en) 2015-05-28 2016-12-01 Finer Mitchell H Genome editing vectors
CN104894075B (en) 2015-05-28 2019-08-06 华中农业大学 CRISPR/Cas9 and Cre/lox system editor's Pseudorabies virus genome prepares vaccine approach and application
EP3325620A4 (en) 2015-05-29 2019-06-26 Agenovir Corporation Antiviral methods and compositions
EP3302556A4 (en) 2015-05-29 2018-12-05 Clark Atlanta University Human cell lines mutant for zic2
EP3331582A4 (en) 2015-05-29 2019-08-07 Agenovir Corporation Methods and compositions for treating cells for transplant
US20160346362A1 (en) 2015-05-29 2016-12-01 Agenovir Corporation Methods and compositions for treating cytomegalovirus infections
EP3331571A4 (en) 2015-05-29 2019-04-10 Agenovir Corporation COMPOSITIONS AND METHODS FOR TREATING VIRAL INFECTIONS
US20160346360A1 (en) 2015-05-29 2016-12-01 Agenovir Corporation Compositions and methods for cell targeted hpv treatment
US10117911B2 (en) 2015-05-29 2018-11-06 Agenovir Corporation Compositions and methods to treat herpes simplex virus infections
KR20220139447A (en) 2015-05-29 2022-10-14 노쓰 캐롤라이나 스테이트 유니버시티 Methods for screening bacteria, archaea, algae, and yeast using crispr nucleic acids
EA037359B1 (en) 2015-06-01 2021-03-17 Тэмпл Юниверсити - Оф Зе Коммонвэлс Систем Оф Хайе Эдьюкейшн Methods and compositions for rna-guided treatment of hiv infection
CA2987684A1 (en) 2015-06-01 2016-12-08 The Hospital For Sick Children Delivery of structurally diverse polypeptide cargo into mammalian cells by a bacterial toxin
CN105112445B (en) 2015-06-02 2018-08-10 广州辉园苑医药科技有限公司 A kind of miR-205 gene knockout kits based on CRISPR-Cas9 gene Knockouts
WO2016196887A1 (en) 2015-06-03 2016-12-08 Board Of Regents Of The University Of Nebraska Dna editing using single-stranded dna
EP3303634B1 (en) 2015-06-03 2023-08-30 The Regents of The University of California Cas9 variants and methods of use thereof
US10626393B2 (en) 2015-06-04 2020-04-21 Arbutus Biopharma Corporation Delivering CRISPR therapeutics with lipid nanoparticles
US20180245074A1 (en) 2015-06-04 2018-08-30 Protiva Biotherapeutics, Inc. Treating hepatitis b virus infection using crispr
CN105039339B (en) 2015-06-05 2017-12-19 新疆畜牧科学院生物技术研究所 A kind of method of specific knockdown sheep FecB genes with RNA mediations and its special sgRNA
EP3307887A1 (en) 2015-06-09 2018-04-18 Editas Medicine, Inc. Crispr/cas-related methods and compositions for improving transplantation
WO2016198500A1 (en) 2015-06-10 2016-12-15 INSERM (Institut National de la Santé et de la Recherche Médicale) Methods and compositions for rna-guided treatment of human cytomegalovirus (hcmv) infection
JP6961494B2 (en) 2015-06-10 2021-11-05 フイルメニツヒ ソシエテ アノニムFirmenich Sa Cell lines for screening odorant and aroma receptors
JP7085841B2 (en) 2015-06-10 2022-06-17 フイルメニツヒ ソシエテ アノニム Identification method of musk compound
US20160362667A1 (en) 2015-06-10 2016-12-15 Caribou Biosciences, Inc. CRISPR-Cas Compositions and Methods
WO2016197356A1 (en) 2015-06-11 2016-12-15 深圳市第二人民医院 Method for knockout of swine sla-2 gene using crispr-cas9 specificity, and sgrna used for specifically targeting sla-2 gene
WO2016197360A1 (en) 2015-06-11 2016-12-15 深圳市第二人民医院 Method for specific knockout of swine gfra1 gene using crispr-cas9 specificity, and sgrna used for specifically targeting gfra1 gene
CN105518137B (en) 2015-06-11 2021-04-30 深圳市第二人民医院 Method for specifically knocking out pig SALL1 gene by CRISPR-Cas9 and sgRNA for specifically targeting SALL1 gene
WO2016197358A1 (en) 2015-06-11 2016-12-15 深圳市第二人民医院 Method for specific knockout of swine fgl-2 gene using crispr-cas9 specificity, and sgrna used for specifically targeting fgl-2 gene
WO2016197357A1 (en) 2015-06-11 2016-12-15 深圳市第二人民医院 Method for specific knockout of swine sla-3 gene using crispr-cas9 specificity, and sgrna used for specifically targeting sla-3 gene
CN105492608B (en) 2015-06-11 2021-07-23 深圳市第二人民医院 CRISPR-Cas9 specific knockout method of porcine PDX1 gene and sgRNA used to specifically target PDX1 gene
CN105518140A (en) 2015-06-11 2016-04-20 深圳市第二人民医院 Method for pig vWF gene specific knockout through CRISPR-Cas9 and sgRNA for specially targeting vWF gene
WO2016197361A1 (en) 2015-06-11 2016-12-15 深圳市第二人民医院 Method for specific knockout of swine ggta1 gene using crispr-cas9 specificity, and sgrna used for specifically targeting ggta1 gene
CN105593367A (en) 2015-06-11 2016-05-18 深圳市第二人民医院 CRISPR-Cas9 specificity pig SLA-1 gene knockout method and sgRNA used for specific targeting SLA-1 gene
US20180187190A1 (en) 2015-06-12 2018-07-05 Erasmus University Medical Center Rotterdam New crispr assays
GB201510296D0 (en) 2015-06-12 2015-07-29 Univ Wageningen Thermostable CAS9 nucleases
WO2016201138A1 (en) 2015-06-12 2016-12-15 The Regents Of The University Of California Reporter cas9 variants and methods of use thereof
KR102468240B1 (en) 2015-06-15 2022-11-17 노쓰 캐롤라이나 스테이트 유니버시티 Methods and compositions for the efficient delivery of nucleic acid and RNA-based antimicrobial agents
CA2989858A1 (en) 2015-06-17 2016-12-22 The Uab Research Foundation Crispr/cas9 complex for introducing a functional polypeptide into cells of blood cell lineage
WO2016205623A1 (en) 2015-06-17 2016-12-22 North Carolina State University Methods and compositions for genome editing in bacteria using crispr-cas9 systems
WO2016205728A1 (en) 2015-06-17 2016-12-22 Massachusetts Institute Of Technology Crispr mediated recording of cellular events
US11643668B2 (en) 2015-06-17 2023-05-09 The Uab Research Foundation CRISPR/Cas9 complex for genomic editing
WO2016205745A2 (en) 2015-06-18 2016-12-22 The Broad Institute Inc. Cell sorting
TWI813532B (en) 2015-06-18 2023-09-01 美商博得學院股份有限公司 Crispr enzyme mutations reducing off-target effects
US9957501B2 (en) 2015-06-18 2018-05-01 Sangamo Therapeutics, Inc. Nuclease-mediated regulation of gene expression
US10954513B2 (en) 2015-06-18 2021-03-23 University Of Utah Research Foundation RNA-guided transcriptional regulation and methods of using the same for the treatment of back pain
WO2016205759A1 (en) 2015-06-18 2016-12-22 The Broad Institute Inc. Engineering and optimization of systems, methods, enzymes and guide scaffolds of cas9 orthologs and variants for sequence manipulation
US9790490B2 (en) 2015-06-18 2017-10-17 The Broad Institute Inc. CRISPR enzymes and systems
EP4159856A1 (en) 2015-06-18 2023-04-05 The Broad Institute, Inc. Novel crispr enzymes and systems
CA3012607A1 (en) 2015-06-18 2016-12-22 The Broad Institute Inc. Crispr enzymes and systems
UY36743A (en) 2015-06-22 2017-01-31 Bayer Cropscience Ag NEW 3-PHENYL-PIRROLIDIN-2,4-DIONA ALQUINIL-SUBSTANCES AND THEIR USES AS HERBICIDES
GB201511191D0 (en) 2015-06-25 2015-08-12 Immatics Biotechnologies Gmbh T-cell epitopes for the immunotherapy of myeloma
EP4545544A3 (en) 2015-06-29 2025-10-08 Ionis Pharmaceuticals, Inc. Modified crispr rna and modified single crispr rna and uses thereof
GB201511376D0 (en) 2015-06-29 2015-08-12 Ecolab Usa Inc Process for the treatment of produced water from chemical enhanced oil recovery
US11279928B2 (en) 2015-06-29 2022-03-22 Massachusetts Institute Of Technology Compositions comprising nucleic acids and methods of using the same
EP4043556B1 (en) 2015-06-30 2024-02-07 Cellectis Methods for improving functionality in nk cell by gene inactivation using specific endonuclease
CA2989331A1 (en) 2015-07-02 2017-01-05 The Johns Hopkins University Crispr/cas9-based treatments
US20170009242A1 (en) 2015-07-06 2017-01-12 Whitehead Institute For Biomedical Research CRISPR-Mediated Genome Engineering for Protein Depletion
DK3320091T3 (en) 2015-07-06 2021-02-01 Dsm Ip Assets Bv GUIDE RNA COLLECTION VECTOR
CN105132451B (en) 2015-07-08 2019-07-23 电子科技大学 A kind of single transcriptional units directed modification skeleton carrier of CRISPR/Cas9 and its application
EP3322797B1 (en) 2015-07-13 2023-11-29 Institut Pasteur Improving sequence-specific antimicrobials by blocking dna repair
US20170014449A1 (en) 2015-07-13 2017-01-19 Elwha LLC, a limited liability company of the State of Delaware Site-specific epigenetic editing
EP3322297B1 (en) 2015-07-13 2024-12-04 Sangamo Therapeutics, Inc. Delivery methods and compositions for nuclease-mediated genome engineering
JP6624743B2 (en) 2015-07-14 2019-12-25 学校法人福岡大学 Site-specific RNA mutagenesis method, target editing guide RNA used therefor, and target RNA-target editing guide RNA complex
EP3322801A1 (en) 2015-07-15 2018-05-23 Juno Therapeutics, Inc. Engineered cells for adoptive cell therapy
US11479793B2 (en) 2015-07-15 2022-10-25 Rutgers, The State University Of New Jersey Nuclease-independent targeted gene editing platform and uses thereof
US20170020922A1 (en) 2015-07-16 2017-01-26 Batu Biologics Inc. Gene editing for immunological destruction of neoplasia
WO2017015015A1 (en) 2015-07-17 2017-01-26 Emory University Crispr-associated protein from francisella and uses related thereto
WO2017015101A1 (en) 2015-07-17 2017-01-26 University Of Washington Methods for maximizing the efficiency of targeted gene correction
WO2017015545A1 (en) 2015-07-22 2017-01-26 President And Fellows Of Harvard College Evolution of site-specific recombinases
WO2017015637A1 (en) 2015-07-22 2017-01-26 Duke University High-throughput screening of regulatory element function with epigenome editing technologies
JP2018524992A (en) 2015-07-23 2018-09-06 メイヨ・ファウンデーション・フォー・メディカル・エデュケーション・アンド・リサーチ Editing mitochondrial DNA
CA2993474A1 (en) 2015-07-25 2017-02-02 Habib FROST A system, device and a method for providing a therapy or a cure for cancer and other pathological states
CN106399360A (en) 2015-07-27 2017-02-15 上海药明生物技术有限公司 FUT8 gene knockout method based on CRISPR technology
JP6937740B2 (en) 2015-07-28 2021-09-22 ダニスコ・ユーエス・インク Genome editing system and usage
CN105063061B (en) 2015-07-28 2018-10-30 华南农业大学 A kind of rice mass of 1000 kernel gene tgw6 mutant and the preparation method and application thereof
CN106701808A (en) 2015-07-29 2017-05-24 深圳华大基因研究院 DNA polymerase I defective strain and construction method thereof
US10612011B2 (en) 2015-07-30 2020-04-07 President And Fellows Of Harvard College Evolution of TALENs
WO2017069829A2 (en) 2015-07-31 2017-04-27 The Trustees Of Columbia University In The City Of New York High-throughput strategy for dissecting mammalian genetic interactions
GB2592821B (en) 2015-07-31 2022-01-12 Univ Minnesota Modified cells and methods of therapy
WO2017024047A1 (en) 2015-08-03 2017-02-09 Emendobio Inc. Compositions and methods for increasing nuclease induced recombination rate in cells
WO2017023974A1 (en) 2015-08-03 2017-02-09 President And Fellows Of Harvard College Cas9 genome editing and transcriptional regulation
WO2017024318A1 (en) 2015-08-06 2017-02-09 Dana-Farber Cancer Institute, Inc. Targeted protein degradation to attenuate adoptive t-cell therapy associated adverse inflammatory responses
US9580727B1 (en) 2015-08-07 2017-02-28 Caribou Biosciences, Inc. Compositions and methods of engineered CRISPR-Cas9 systems using split-nexus Cas9-associated polynucleotides
CN104962523B (en) 2015-08-07 2018-05-25 苏州大学 A kind of method for measuring non-homologous end joining repairing activity
WO2017024343A1 (en) 2015-08-07 2017-02-16 Commonwealth Scientific And Industrial Research Organisation Method for producing an animal comprising a germline genetic modification
CA2994746A1 (en) 2015-08-11 2017-02-16 Cellectis Cells for immunotherapy engineered for targeting cd38 antigen and for cd38 gene inactivation
EA201890492A1 (en) 2015-08-14 2018-08-31 Инститьют Оф Дженетикс Энд Девелопментал Байолоджи, Чайниз Акэдеми Оф Сайенсиз METHOD OF OBTAINING RIS, SUSTAINABLE TO GLYFOSAT, BY SITE-DIRECTED REPLACEMENT OF NUCLEOTIDE
CN105255937A (en) 2015-08-14 2016-01-20 西北农林科技大学 Method for expression of CRISPR sgRNA by eukaryotic cell III-type promoter and use thereof
AU2016308283B2 (en) 2015-08-19 2022-04-21 Arc Bio, Llc Capture of nucleic acids using a nucleic acid-guided nuclease-based system
US11339408B2 (en) 2015-08-20 2022-05-24 Applied Stemcell, Inc. Nuclease with enhanced efficiency of genome editing
CN105112519A (en) 2015-08-20 2015-12-02 郑州大学 CRISPR-based Escherichia coli O157:H7 strain detection reagent box and detection method
CN105177126B (en) 2015-08-21 2018-12-04 东华大学 It is a kind of using Fluorescence PCR assay to the Classification Identification method of mouse
EP3341727B1 (en) 2015-08-25 2022-08-10 Duke University Compositions and methods of improving specificity in genomic engineering using rna-guided endonucleases
CN106480083B (en) 2015-08-26 2021-12-14 中国科学院分子植物科学卓越创新中心 CRISPR/Cas9-mediated Large Fragment DNA Splicing Method
AU2016316845B2 (en) 2015-08-28 2022-03-10 The General Hospital Corporation Engineered CRISPR-Cas9 nucleases
US9512446B1 (en) 2015-08-28 2016-12-06 The General Hospital Corporation Engineered CRISPR-Cas9 nucleases
US9926546B2 (en) 2015-08-28 2018-03-27 The General Hospital Corporation Engineered CRISPR-Cas9 nucleases
CN105087620B (en) 2015-08-31 2017-12-29 中国农业大学 One kind is overexpressed the 1BB carriers of pig costimulation acceptor 4 and its application
WO2017040709A1 (en) 2015-08-31 2017-03-09 Caribou Biosciences, Inc. Directed nucleic acid repair
WO2017040511A1 (en) 2015-08-31 2017-03-09 Agilent Technologies, Inc. Compounds and methods for crispr/cas-based genome editing by homologous recombination
CA2996599A1 (en) 2015-09-01 2017-03-09 Dana-Farber Cancer Institute Inc. Systems and methods for selection of grna targeting strands for cas9 localization
WO2017040813A2 (en) 2015-09-02 2017-03-09 University Of Massachusetts Detection of gene loci with crispr arrayed repeats and/or polychromatic single guide ribonucleic acids
WO2017040786A1 (en) 2015-09-04 2017-03-09 Massachusetts Institute Of Technology Multilayer genetic safety kill circuits based on single cas9 protein and multiple engineered grna in mammalian cells
CN105400810B (en) 2015-09-06 2019-05-07 吉林大学 A method for establishing a hypophosphatemic rickets model by knockout technology
WO2017044419A1 (en) 2015-09-08 2017-03-16 University Of Massachusetts Dnase h activity of neisseria meningitidis cas9
ES2938623T3 (en) 2015-09-09 2023-04-13 Univ Kobe Nat Univ Corp Method for converting a genome sequence of a gram-positive bacterium by specific nucleic acid base conversion of a targeted DNA sequence and the molecular complex used therein
JP6780860B2 (en) 2015-09-09 2020-11-04 国立大学法人神戸大学 Genome sequence modification method that specifically converts the nucleobase of the targeted DNA sequence and the molecular complex used for it.
WO2017044776A1 (en) 2015-09-10 2017-03-16 Texas Tech University System Single-guide rna (sgrna) with improved knockout efficiency
WO2017044857A2 (en) 2015-09-10 2017-03-16 Youhealth Biotech, Limited Methods and compositions for the treatment of glaucoma
CN105274144A (en) 2015-09-14 2016-01-27 徐又佳 Preparation method of zebrafish with hepcidin gene knocked out by use of CRISPR / Cas9 technology
US10109551B2 (en) 2015-09-15 2018-10-23 Intel Corporation Methods and apparatuses for determining a parameter of a die
CN105210981B (en) 2015-09-15 2018-09-28 中国科学院生物物理研究所 Establish the method and its application for the ferret model that can be applied to human diseases research
US10301613B2 (en) 2015-09-15 2019-05-28 Arizona Board Of Regents On Behalf Of Arizona State University Targeted remodeling of prokaryotic genomes using CRISPR-nickases
CN105112422B (en) 2015-09-16 2019-11-08 中山大学 Application of Gene miR408 and UCL in Breeding High-yielding Rice
WO2017053431A2 (en) 2015-09-21 2017-03-30 Arcturus Therapeutics, Inc. Allele selective gene editing and uses thereof
CN105132427B (en) 2015-09-21 2019-01-08 新疆畜牧科学院生物技术研究所 A kind of dual-gene method for obtaining gene editing sheep of specific knockdown mediated with RNA and its dedicated sgRNA
US20180237800A1 (en) 2015-09-21 2018-08-23 The Regents Of The University Of California Compositions and methods for target nucleic acid modification
WO2017053753A1 (en) 2015-09-23 2017-03-30 Sangamo Biosciences, Inc. Htt repressors and uses thereof
AU2016326711B2 (en) 2015-09-24 2022-11-03 Editas Medicine, Inc. Use of exonucleases to improve CRISPR/Cas-mediated genome editing
CN108350489B (en) 2015-09-24 2022-04-29 西格马-奥尔德里奇有限责任公司 Methods and Reagents for Molecular Proximity Detection Using RNA-Guided Nucleic Acid Binding Proteins
US20190048340A1 (en) 2015-09-24 2019-02-14 Crispr Therapeutics Ag Novel family of rna-programmable endonucleases and their uses in genome editing and other applications
KR101795999B1 (en) 2015-09-25 2017-11-09 전남대학교산학협력단 Primer for Beta2-Microglobulin gene remove using CRISPR/CAS9 system
EP3353309A4 (en) 2015-09-25 2019-04-10 Tarveda Therapeutics, Inc. COMPOSITIONS AND METHODS FOR GENOMIC EDITION
KR101745863B1 (en) 2015-09-25 2017-06-12 전남대학교산학협력단 Primer for prohibitin2 gene remove using CRISPR/CAS9 system
WO2017053729A1 (en) 2015-09-25 2017-03-30 The Board Of Trustees Of The Leland Stanford Junior University Nuclease-mediated genome editing of primary cells and enrichment thereof
EP3147363B1 (en) 2015-09-26 2019-10-16 B.R.A.I.N. Ag Activation of taste receptor genes in mammalian cells using crispr-cas-9
EP3356521A4 (en) 2015-09-28 2019-03-13 Temple University - Of The Commonwealth System of Higher Education METHODS AND COMPOSITIONS FOR RNA-GUIDED TREATMENT OF HIV INFECTION
HK1258900A1 (en) 2015-09-29 2019-11-22 埃吉诺维亚公司 Delivery methods and compositions
WO2017058796A1 (en) 2015-09-29 2017-04-06 Agenovir Corporation Antiviral fusion proteins and genes
CN105177038B (en) 2015-09-29 2018-08-24 中国科学院遗传与发育生物学研究所 A kind of CRISPR/Cas9 systems of efficient fixed point editor Plant Genome
WO2017058791A1 (en) 2015-09-29 2017-04-06 Agenovir Corporation Compositions and methods for treatment of latent viral infections
CA2999923A1 (en) 2015-09-29 2017-04-06 Agenovir Corporation Compositions and methods for latent viral transcription regulation
CN105331627B (en) 2015-09-30 2019-04-02 华中农业大学 A method for prokaryotic genome editing using the endogenous CRISPR-Cas system
EP3356520B1 (en) 2015-10-02 2022-03-23 The U.S.A. as represented by the Secretary, Department of Health and Human Services Lentiviral protein delivery system for rna-guided genome editing
US11497816B2 (en) 2015-10-06 2022-11-15 The Children's Hospital Of Philadelphia Compositions and methods for treating fragile X syndrome and related syndromes
US10760081B2 (en) 2015-10-07 2020-09-01 New York University Compositions and methods for enhancing CRISPR activity by POLQ inhibition
WO2017062886A1 (en) 2015-10-08 2017-04-13 Cellink Corporation Battery interconnects
EP4491732A3 (en) 2015-10-08 2025-03-26 President and Fellows of Harvard College Multiplexed genome editing
EP4400597A3 (en) 2015-10-09 2024-10-16 Monsanto Technology LLC Novel rna-guided nucleases and uses thereof
WO2017062983A1 (en) 2015-10-09 2017-04-13 The Children's Hospital Of Philadelphia Compositions and methods for treating huntington's disease and related disorders
AU2016338785B2 (en) 2015-10-12 2022-07-14 E. I. Du Pont De Nemours And Company Protected DNA templates for gene modification and increased homologous recombination in cells and methods of use
EP4089175A1 (en) 2015-10-13 2022-11-16 Duke University Genome engineering with type i crispr systems in eukaryotic cells
JP2018532404A (en) 2015-10-14 2018-11-08 ライフ テクノロジーズ コーポレーション Ribonucleoprotein transfection agent
CN105400779A (en) 2015-10-15 2016-03-16 芜湖医诺生物技术有限公司 Target sequence, recognized by streptococcus thermophilus CRISPR-Cas9 system, of human CCR5 gene, sgRNA and application of CRISPR-Cas9 system
FR3042506B1 (en) 2015-10-16 2018-11-30 IFP Energies Nouvelles GENETIC TOOL FOR PROCESSING BACTERIA CLOSTRIDIUM
US20190083656A1 (en) 2015-10-16 2019-03-21 Temple University - Of The Commonwealth System Of Higher Education Methods and compositions utilizing cpf1 for rna-guided gene editing
US10947559B2 (en) 2015-10-16 2021-03-16 Astrazeneca Ab Inducible modification of a cell genome
WO2017066781A1 (en) 2015-10-16 2017-04-20 Modernatx, Inc. Mrna cap analogs with modified phosphate linkage
US20180327706A1 (en) 2015-10-19 2018-11-15 The Methodist Hospital Crispr-cas9 delivery to hard-to-transfect cells via membrane deformation
CN105331607A (en) 2015-10-19 2016-02-17 芜湖医诺生物技术有限公司 Human CCR5 gene target sequence recognized by streptococcus thermophilus CRISPR (clustered regularly interspaced short palindromic repeat)-Cas9 (CRISPR-associated protein 9) system, sgRNA (single guide ribonucleic acid) and application
CN105331608A (en) 2015-10-20 2016-02-17 芜湖医诺生物技术有限公司 Human CXCR4 gene target sequence identified by neisseria meningitidis CRISPR-Cas9 system, sgRNA and application of target sequence and sgRNA
CN105316337A (en) 2015-10-20 2016-02-10 芜湖医诺生物技术有限公司 Streptococcus thermophilus derived human CXCR3 gene target sequence recognizable by CRISPR (clustered regularly interspaced short palindromic repeat)-Cas9 (CRISPR associated 9) system and sgRNA (single guide ribonucleic acid) and application thereof
CN105331609A (en) 2015-10-20 2016-02-17 芜湖医诺生物技术有限公司 Human CCR5 gene target sequence identified by neisseria meningitidis CRISPR-Cas9 system, sgRNA and application of target sequence and sgRNA
CN105316324A (en) 2015-10-20 2016-02-10 芜湖医诺生物技术有限公司 Streptococcus thermophilus derived human CXCR3 gene target sequence recognizable by CRISPR (clustered regularly interspaced short palindromic repeat)-Cas9 (CRISPR associated 9) system and sgRNA (single guide ribonucleic acid) and application thereof
WO2017068077A1 (en) 2015-10-20 2017-04-27 Institut National De La Sante Et De La Recherche Medicale (Inserm) Methods and products for genetic engineering
EP3365437B1 (en) 2015-10-20 2025-06-04 Institut National de la Santé et de la Recherche Médicale (INSERM) Methods and products for genetic engineering
EP3365440B1 (en) 2015-10-20 2022-09-14 Pioneer Hi-Bred International, Inc. Restoring function to a non-functional gene product via guided cas systems and methods of use
WO2017070284A1 (en) 2015-10-21 2017-04-27 Editas Medicine, Inc. Crispr/cas-related methods and compositions for treating hepatitis b virus
CN109153980B (en) 2015-10-22 2023-04-14 布罗德研究所有限公司 Type VI-B CRISPR enzymes and systems
CN105219799A (en) 2015-10-22 2016-01-06 天津吉诺沃生物科技有限公司 The breeding method of a kind of English ryegrass based on CRISPR/Cas system
EP3159407A1 (en) 2015-10-23 2017-04-26 Silence Therapeutics (London) Ltd Guide rnas, methods and uses
DK3350327T3 (en) 2015-10-23 2019-01-21 Caribou Biosciences Inc CONSTRUCTED CRISPR CLASS-2-NUCLEIC ACID TARGETING-NUCLEIC ACID
JP7109784B2 (en) 2015-10-23 2022-08-01 プレジデント アンド フェローズ オブ ハーバード カレッジ Evolved Cas9 protein for gene editing
TW201715041A (en) 2015-10-26 2017-05-01 國立清華大學 Method for bacterial genome editing
US9988637B2 (en) 2015-10-26 2018-06-05 National Tsing Hua University Cas9 plasmid, genome editing system and method of Escherichia coli
US10280411B2 (en) 2015-10-27 2019-05-07 Pacific Biosciences of California, Inc. Methods, systems, and reagents for direct RNA sequencing
WO2017075261A1 (en) 2015-10-27 2017-05-04 Recombinetics, Inc. Engineering of humanized car t-cells and platelets by genetic complementation
US20180230489A1 (en) 2015-10-28 2018-08-16 Voyager Therapeutics, Inc. Regulatable expression using adeno-associated virus (aav)
JP2019507579A (en) 2015-10-28 2019-03-22 クリスパー セラピューティクス アーゲー Materials and methods for the treatment of Duchenne muscular dystrophy
CA3002524A1 (en) 2015-10-28 2017-05-04 Sangamo Therapeutics, Inc. Liver-specific constructs, factor viii expression cassettes and methods of use thereof
US11111508B2 (en) 2015-10-30 2021-09-07 Brandeis University Modified CAS9 compositions and methods of use
WO2017075475A1 (en) 2015-10-30 2017-05-04 Editas Medicine, Inc. Crispr/cas-related methods and compositions for treating herpes simplex virus
CN105238806B (en) 2015-11-02 2018-11-27 中国科学院天津工业生物技术研究所 A kind of building and its application of the CRISPR/Cas9 gene editing carrier for microorganism
CN105316327B (en) 2015-11-03 2019-01-29 中国农业科学院作物科学研究所 Wheat TaAGO4a gene CRISPR/Cas9 vector and its application
WO2017079428A1 (en) 2015-11-04 2017-05-11 President And Fellows Of Harvard College Site specific germline modification
JP6928604B2 (en) 2015-11-04 2021-09-01 フェイト セラピューティクス,インコーポレイテッド Genome modification of pluripotent cells
MY185961A (en) 2015-11-04 2021-06-14 Univ Pennsylvania Methods and compositions for gene editing in hematopoietic stem cells
GB2544270A (en) 2015-11-05 2017-05-17 Fundació Centre De Regulació Genòmica Nucleic acids, peptides and methods
US20180320138A1 (en) 2015-11-05 2018-11-08 Centro De Investigación Biomédica En Red (Ciber) Process of gene-editing of cells isolated from a subject suffering from a metabolic disease affecting the erythroid lineage, cells obtained by said process and uses thereof
WO2017078751A1 (en) 2015-11-06 2017-05-11 The Methodist Hospital Microfluidic cell deformability assay for enabling rapid and efficient kinase screening via the crispr-cas9 system
AU2016349738A1 (en) 2015-11-06 2018-05-24 The Jackson Laboratory Large genomic DNA knock-in and uses thereof
WO2017081288A1 (en) 2015-11-11 2017-05-18 Lonza Ltd Crispr-associated (cas) proteins with reduced immunogenicity
WO2017083722A1 (en) 2015-11-11 2017-05-18 Greenberg Kenneth P Crispr compositions and methods of using the same for gene therapy
CA2947904A1 (en) 2015-11-12 2017-05-12 Pfizer Inc. Tissue-specific genome engineering using crispr-cas9
KR101885901B1 (en) 2015-11-13 2018-08-07 기초과학연구원 RGEN RNP delivery method using 5'-phosphate removed RNA
US20170191047A1 (en) 2015-11-13 2017-07-06 University Of Georgia Research Foundation, Inc. Adenosine-specific rnase and methods of use
ES2905558T3 (en) 2015-11-13 2022-04-11 Avellino Lab Usa Inc Procedures for the treatment of corneal dystrophies
CA3005633C (en) 2015-11-16 2023-11-21 Research Institute Of Nationwide Children's Hospital Materials and methods for treatment of titin-based myopathies and other titinopathies
CN106893739A (en) 2015-11-17 2017-06-27 香港中文大学 New methods and systems for targeted genetic manipulation
JP2019500899A (en) 2015-11-23 2019-01-17 ザ リージェンツ オブ ザ ユニバーシティ オブ カリフォルニア Cellular RNA tracking and manipulation through nuclear delivery of CRISPR / Cas9
CN105602987A (en) 2015-11-23 2016-05-25 深圳市默赛尔生物医学科技发展有限公司 High-efficiency knockout method for XBP1 gene in DC cell
US20170145438A1 (en) 2015-11-24 2017-05-25 University Of South Carolina Viral Vectors for Gene Editing
US10612044B2 (en) 2015-11-25 2020-04-07 National University Corporation Gunma University DNA methylation editing kit and DNA methylation editing method
US10240145B2 (en) 2015-11-25 2019-03-26 The Board Of Trustees Of The Leland Stanford Junior University CRISPR/Cas-mediated genome editing to treat EGFR-mutant lung cancer
US20180346940A1 (en) 2015-11-27 2018-12-06 The Regents Of The University Of California Compositions and methods for the production of hydrocarbons, hydrogen and carbon monoxide using engineered azotobacter strains
CN105505979A (en) 2015-11-28 2016-04-20 湖北大学 Method for acquiring aromatic rice strain by targeting Badh2 gene via CRISPR/Cas9 gene editing technology
CN106811479B (en) 2015-11-30 2019-10-25 中国农业科学院作物科学研究所 The system and application of CRISPR/Cas9 system to modify ALS gene to obtain herbicide-resistant rice
KR101906491B1 (en) 2015-11-30 2018-12-05 기초과학연구원 Composition for Genome Editing comprising Cas9 derived from F. novicida
RU2634395C1 (en) 2015-12-01 2017-10-26 Федеральное государственное автономное образовательное учреждение высшего профессионального образования "Балтийский Федеральный Университет имени Иммануила Канта" (БФУ им. И. Канта) GENETIC CONSTRUCT BASED ON CRISPR/Cas9 GENOME SYSTEM EDITING, CODING Cas9 NUCLEASE, SPECIFICALLY IMPORTED IN HUMAN CELLS MITOCHONDRIA
CN105296518A (en) 2015-12-01 2016-02-03 中国农业大学 Homologous arm vector construction method used for CRISPR/Cas 9 technology
EP3383168A4 (en) 2015-12-02 2019-05-08 Ceres, Inc. METHODS FOR GENETIC MODIFICATION OF PLANTS
WO2017096041A1 (en) 2015-12-02 2017-06-08 The Regents Of The University Of California Compositions and methods for modifying a target nucleic acid
WO2017093370A1 (en) 2015-12-03 2017-06-08 Technische Universität München T-cell specific genome editing
CN105779448B (en) 2015-12-04 2018-11-27 新疆农业大学 A kind of cotton promoters GbU6-7PS and application
CN105779449B (en) 2015-12-04 2018-11-27 新疆农业大学 A kind of cotton promoters GbU6-5PS and application
EA201891338A1 (en) 2015-12-04 2018-12-28 Новартис Аг COMPOSITIONS AND METHODS FOR IMMUNICOLOGY
CN106845151B (en) 2015-12-07 2019-03-26 中国农业大学 The screening technique and device of CRISPR-Cas9 system sgRNA action target spot
CN105462968B (en) 2015-12-07 2018-10-16 北京信生元生物医学科技有限公司 It is a kind of targeting apoC III CRISPR-Cas9 systems and its application
EP3387001A4 (en) 2015-12-09 2019-08-14 Excision Biotherapeutics, Inc. METHODS AND COMPOSITIONS OF GENE EDITING TO ELIMINATE THE RISK OF ACTIVATING JC VIRUS AND PML (PROGRESSIVE MULTIFOCAL LEUKOENCEPHALOPATHY) DURING IMMUNOSUPPRESSIVE TREATMENT
EP3387134B1 (en) 2015-12-11 2020-10-14 Danisco US Inc. Methods and compositions for enhanced nuclease-mediated genome modification and reduced off-target site effects
CN105463003A (en) 2015-12-11 2016-04-06 扬州大学 Recombinant vector for eliminating activity of kanamycin drug resistance gene and building method of recombinant vector
CN105296537A (en) 2015-12-12 2016-02-03 西南大学 Fixed-point gene editing method based on intratestis injection
WO2017105350A1 (en) 2015-12-14 2017-06-22 Cellresearch Corporation Pte Ltd A method of generating a mammalian stem cell carrying a transgene, a mammalian stem cell generated by the method and pharmaceuticals uses of the mammalian stem cell
CN105400773B (en) 2015-12-14 2018-06-26 同济大学 CRISPR/Cas9 applied to Large-scale Screening cancer gene is enriched with sequencing approach
WO2017106616A1 (en) 2015-12-17 2017-06-22 The Regents Of The University Of Colorado, A Body Corporate Varicella zoster virus encoding regulatable cas9 nuclease
CN105463027A (en) 2015-12-17 2016-04-06 中国农业大学 Method for preparing high muscle content and hypertrophic cardiomyopathy model cloned pig
NO343153B1 (en) 2015-12-17 2018-11-19 Hydra Systems As A method of assessing the integrity status of a barrier plug
IL297018A (en) 2015-12-18 2022-12-01 Sangamo Therapeutics Inc Targeted disruption of the mhc cell receptor
EP3708665A1 (en) 2015-12-18 2020-09-16 Danisco US Inc. Methods and compositions for t-rna based guide rna expression
CN109072218B (en) 2015-12-18 2023-04-18 国立研究开发法人科学技术振兴机构 Genetically modified non-human organism, egg cell, fertilized egg, and method for modifying target gene
WO2017106569A1 (en) 2015-12-18 2017-06-22 The Regents Of The University Of California Modified site-directed modifying polypeptides and methods of use thereof
AU2016369490C1 (en) 2015-12-18 2021-12-23 Sangamo Therapeutics, Inc. Targeted disruption of the T cell receptor
AU2016370726B2 (en) 2015-12-18 2022-12-08 Danisco Us Inc. Methods and compositions for polymerase II (Pol-II) based guide RNA expression
WO2017106657A1 (en) 2015-12-18 2017-06-22 The Broad Institute Inc. Novel crispr enzymes and systems
US11761007B2 (en) 2015-12-18 2023-09-19 The Scripps Research Institute Production of unnatural nucleotides using a CRISPR/Cas9 system
US11542466B2 (en) 2015-12-22 2023-01-03 North Carolina State University Methods and compositions for delivery of CRISPR based antimicrobials
EP3701963A1 (en) 2015-12-22 2020-09-02 CureVac AG Method for producing rna molecule compositions
BR112018012894A2 (en) 2015-12-23 2018-12-04 Crispr Therapeutics Ag Materials and Methods for Treatment of Amyotrophic Lateral Sclerosis and / or Frontotemporal Lobular Degeneration
CN105543270A (en) 2015-12-24 2016-05-04 中国农业科学院作物科学研究所 Double resistance CRISPR/Cas9 carrier and application
CN105505976A (en) 2015-12-25 2016-04-20 安徽大学 Construction method of penicillin-producing recombined strain of streptomyces virginiae IBL14
CN105543266A (en) 2015-12-25 2016-05-04 安徽大学 CRISPR (Clustered Regularly Interspaced Short Palindromic Repeat sequences)-Cas (CRISPR-associated proteins) system in Streptomyces virginiae IBL14 and method for carrying out gene editing by using CRISPR-Cas system
KR102860636B1 (en) 2015-12-28 2025-09-17 노파르티스 아게 Compositions and methods for the treatment of hemoglobinopathies
AU2016380351B2 (en) 2015-12-29 2023-04-06 Monsanto Technology Llc Novel CRISPR-associated transposases and uses thereof
CN105441451B (en) 2015-12-31 2019-03-22 暨南大学 A kind of sgRNA targeting sequencing of special target people ABCB1 gene and application
CN105567735A (en) 2016-01-05 2016-05-11 华东师范大学 Site specific repairing carrier system and method of blood coagulation factor genetic mutation
EP3400296A1 (en) 2016-01-08 2018-11-14 Novozymes A/S Genome editing in bacillus host cells
CN105647922A (en) 2016-01-11 2016-06-08 中国人民解放军疾病预防控制所 Application of CRISPR-Cas9 system based on new gRNA (guide ribonucleic acid) sequence in preparing drugs for treating hepatitis B
US11441146B2 (en) 2016-01-11 2022-09-13 Christiana Care Health Services, Inc. Compositions and methods for improving homogeneity of DNA generated using a CRISPR/Cas9 cleavage system
WO2017123609A1 (en) 2016-01-12 2017-07-20 The Regents Of The University Of California Compositions and methods for enhanced genome editing
US12049625B2 (en) 2016-01-14 2024-07-30 The Brigham And Women's Hospital, Inc. Genome editing for treating glioblastoma
MX2018008733A (en) 2016-01-14 2019-01-28 Memphis Meats Inc Methods for extending the replicative capacity of somatic cells during an ex vivo cultivation process.
KR20180097756A (en) 2016-01-15 2018-08-31 더 잭슨 래보라토리 Genetically engineered non-human mammals by multi-cycle electroporation of CAS9 protein
CN105567734A (en) 2016-01-18 2016-05-11 丹弥优生物技术(湖北)有限公司 Method for precisely editing genome DNA sequence
CN105567738A (en) 2016-01-18 2016-05-11 南开大学 Method for inducing CCR5-delta32 deletion with genome editing technology CRISPR-Cas9
WO2017126987A1 (en) 2016-01-18 2017-07-27 Анатолий Викторович ЗАЗУЛЯ Red blood cells for targeted drug delivery
SE540921C2 (en) 2016-01-20 2018-12-27 Apr Tech Ab Electrohydrodynamic control device
US10731153B2 (en) 2016-01-21 2020-08-04 Massachusetts Institute Of Technology Recombinases and target sequences
WO2017127807A1 (en) 2016-01-22 2017-07-27 The Broad Institute Inc. Crystal structure of crispr cpf1
CA3011270A1 (en) 2016-01-25 2018-06-14 Temple University Of The Commonwealth System Of Higher Education Rna guided eradication of human jc virus and other polyomaviruses
CN105567689B (en) 2016-01-25 2019-04-09 重庆威斯腾生物医药科技有限责任公司 CRISPR/Cas9 targeting knockout people TCAB1 gene and its specificity gRNA
CN105543228A (en) 2016-01-25 2016-05-04 宁夏农林科学院 Method for transforming rice into fragrant rice rapidly
AU2017211062A1 (en) 2016-01-25 2018-06-07 Excision Biotherapeutics Methods and compositions for RNA-guided treatment of HIV infection
EP3199632A1 (en) 2016-01-26 2017-08-02 ACIB GmbH Temperature-inducible crispr/cas system
CN105567688A (en) 2016-01-27 2016-05-11 武汉大学 CRISPR/SaCas9 system for gene therapy of AIDS
ES2949163T3 (en) 2016-01-29 2023-09-26 Univ Princeton Split Inteines with Exceptional Splicing Activity
US11518994B2 (en) 2016-01-30 2022-12-06 Bonac Corporation Artificial single guide RNA and use thereof
CN107022562B (en) 2016-02-02 2020-07-17 中国种子集团有限公司 A method for site-directed mutagenesis of maize genes using the CRISPR/Cas9 system
CN105647968B (en) 2016-02-02 2019-07-23 浙江大学 A kind of CRISPR/Cas9 working efficiency fast testing system and its application
CN105671083B (en) 2016-02-03 2017-09-29 安徽柯顿生物科技有限公司 The gene recombined virus plasmids of PD 1 and structure, the Puro of recombinant retrovirus Lenti PD 1 and packaging and application
US11845933B2 (en) 2016-02-03 2023-12-19 Massachusetts Institute Of Technology Structure-guided chemical modification of guide RNA and its applications
WO2017136520A1 (en) 2016-02-04 2017-08-10 President And Fellows Of Harvard College Mitochondrial genome editing and regulation
WO2017136629A1 (en) 2016-02-05 2017-08-10 Regents Of The University Of Minnesota Vectors and system for modulating gene expression
WO2017139264A1 (en) 2016-02-09 2017-08-17 President And Fellows Of Harvard College Dna-guided gene editing and regulation
RU2016104674A (en) 2016-02-11 2017-08-16 Анатолий Викторович Зазуля ERYTHROCYT MODIFICATION DEVICE WITH DIRECTED MEDICINAL TRANSPORT MECHANISM FOR CRISPR / CAS9 GENE THERAPY FUNCTIONS
WO2017139505A2 (en) 2016-02-11 2017-08-17 The Regents Of The University Of California Methods and compositions for modifying a mutant dystrophin gene in a cell's genome
CN105647962A (en) 2016-02-15 2016-06-08 浙江大学 Gene editing method for knocking out rice MIRNA393b stem-loop sequences with application of CRISPR(clustered regulatory interspersed short palindromic repeat)-Cas9 system
US9896696B2 (en) 2016-02-15 2018-02-20 Benson Hill Biosystems, Inc. Compositions and methods for modifying genomes
MX2018009904A (en) 2016-02-15 2019-07-08 Univ Temple Excision of retroviral nucleic acid sequences.
US11274288B2 (en) 2016-02-16 2022-03-15 Emendobio Inc. Compositions and methods for promoting homology directed repair mediated gene editing
US11136597B2 (en) 2016-02-16 2021-10-05 Yale University Compositions for enhancing targeted gene editing and methods of use thereof
CN105647969B (en) 2016-02-16 2020-12-15 湖南师范大学 A method for gene knockout and breeding of stat1a gene-deficient zebrafish
CN105594664B (en) 2016-02-16 2018-10-02 湖南师范大学 A kind of method of gene knockout selection and breeding stat1a Gene Deletion zebra fish
CN105624187A (en) 2016-02-17 2016-06-01 天津大学 Site-directed mutation method for genomes of saccharomyces cerevisiae
WO2017142999A2 (en) 2016-02-18 2017-08-24 President And Fellows Of Harvard College Methods and systems of molecular recording by crispr-cas system
CN105646719B (en) 2016-02-24 2019-12-20 无锡市妇幼保健院 Efficient fixed-point transgenic tool and application thereof
US20170275665A1 (en) 2016-02-24 2017-09-28 Board Of Regents, The University Of Texas System Direct crispr spacer acquisition from rna by a reverse-transcriptase-cas1 fusion protein
US11530253B2 (en) 2016-02-25 2022-12-20 The Children's Medical Center Corporation Customized class switch of immunoglobulin genes in lymphoma and hybridoma by CRISPR/CAS9 technology
US20170246260A1 (en) 2016-02-25 2017-08-31 Agenovir Corporation Modified antiviral nuclease
CA3015353A1 (en) 2016-02-25 2017-08-31 Agenovir Corporation Viral and oncoviral nuclease treatment
US20170247703A1 (en) 2016-02-25 2017-08-31 Agenovir Corporation Antiviral nuclease methods
CA3015665C (en) 2016-02-26 2020-09-22 Lanzatech New Zealand Limited Crispr/cas systems for c1-fixing bacteria
WO2017151444A1 (en) 2016-02-29 2017-09-08 Agilent Technologies, Inc. Methods and compositions for blocking off-target nucleic acids from cleavage by crispr proteins
US11447768B2 (en) 2016-03-01 2022-09-20 University Of Florida Research Foundation, Incorporated Molecular cell diary system
CN105671070B (en) 2016-03-03 2019-03-19 江南大学 A kind of CRISPRCas9 system and its construction method for Bacillus subtilis genes group editor
KR102438360B1 (en) 2016-03-04 2022-08-31 에디타스 메디신, 인코포레이티드 CRISPR-CPF1-related methods, compositions and components for cancer immunotherapy
CN107177591A (en) 2016-03-09 2017-09-19 北京大学 SgRNA sequences using CRISPR technical editor's CCR5 genes and application thereof
CN105821039B (en) 2016-03-09 2020-02-07 李旭 Specific sgRNA combined with immune gene to inhibit HBV replication, expression vector and application of specific sgRNA
CN105821040B (en) 2016-03-09 2018-12-14 李旭 Combined immunization gene inhibits sgRNA, gene knockout carrier and its application of high-risk HPV expression
CN105861547A (en) 2016-03-10 2016-08-17 黄捷 Method for permanently embedding identity card number into genome
CA3010628A1 (en) 2016-03-11 2017-09-14 Pioneer Hi-Bred International, Inc. Novel cas9 systems and methods of use
US20180112234A9 (en) 2016-03-14 2018-04-26 Intellia Therapeutics, Inc. Methods and compositions for gene editing
MX2018011114A (en) 2016-03-14 2019-02-20 Editas Medicine Inc Crispr/cas-related methods and compositions for treating beta hemoglobinopathies.
WO2017157422A1 (en) 2016-03-15 2017-09-21 Carrier Corporation Refrigerated sales cabinet
US11530394B2 (en) 2016-03-15 2022-12-20 University Of Massachusetts Anti-CRISPR compounds and methods of use
EP3219799A1 (en) 2016-03-17 2017-09-20 IMBA-Institut für Molekulare Biotechnologie GmbH Conditional crispr sgrna expression
US20200291370A1 (en) 2016-03-18 2020-09-17 President And Fellows Of Harvard College Mutant Cas Proteins
WO2017165741A1 (en) 2016-03-24 2017-09-28 Karim Aftab S Reverse transcriptase dependent conversion of rna templates into dna
WO2017165862A1 (en) 2016-03-25 2017-09-28 Editas Medicine, Inc. Systems and methods for treating alpha 1-antitrypsin (a1at) deficiency
EP3433363A1 (en) 2016-03-25 2019-01-30 Editas Medicine, Inc. Genome editing systems comprising repair-modulating enzyme molecules and methods of their use
WO2017172645A2 (en) 2016-03-28 2017-10-05 The Charles Stark Draper Laboratory, Inc. Bacteriophage engineering methods
CN106047803A (en) 2016-03-28 2016-10-26 青岛市胶州中心医院 Cell model obtained after targeted knockout of rabbit bone morphogenetic protein-2 (BMP2) gene based on CRISPR/Cas9 and application thereof
TWI773666B (en) 2016-03-30 2022-08-11 美商英特利亞醫療公司 Lipid nanoparticle formulations for crispr/cas components
EP3436578B1 (en) 2016-03-30 2022-01-19 F. Hoffmann-La Roche AG Improved sortase
WO2017173004A1 (en) 2016-03-30 2017-10-05 Mikuni Takayasu A method for in vivo precise genome editing
WO2017173092A1 (en) 2016-03-31 2017-10-05 The Regents Of The University Of California Methods for genome editing in zygotes
GB2565461B (en) 2016-03-31 2022-04-13 Harvard College Methods and compositions for the single tube preparation of sequencing libraries using Cas9
CN106167525B (en) 2016-04-01 2019-03-19 北京康明百奥新药研发有限公司 Methods and applications for screening ultra-low fucose cell lines
US10301619B2 (en) 2016-04-01 2019-05-28 New England Biolabs, Inc. Compositions and methods relating to synthetic RNA polynucleotides created from synthetic DNA oligonucleotides
CN118185874A (en) 2016-04-04 2024-06-14 苏黎世联邦理工学院 A recombinant mammalian B cell
WO2017176529A1 (en) 2016-04-06 2017-10-12 Temple University - Of The Commonwealth System Of Higher Education Compositions for eradicating flavivirus infections in subjects
CN105802980A (en) 2016-04-08 2016-07-27 北京大学 CRISPR/Cas9 system with Gateway compatibility and application of CRISPR/Cas9 system
CN106399306B (en) 2016-04-12 2019-11-05 西安交通大学第一附属医院 Target sgRNA, genophore and its application that people lncRNA-UCA1 inhibits bladder cancer
WO2017180694A1 (en) 2016-04-13 2017-10-19 Editas Medicine, Inc. Cas9 fusion molecules gene editing systems, and methods of use thereof
EP3443088B1 (en) 2016-04-13 2024-09-18 Editas Medicine, Inc. Grna fusion molecules, gene editing systems, and methods of use thereof
US20190127713A1 (en) 2016-04-13 2019-05-02 Duke University Crispr/cas9-based repressors for silencing gene targets in vivo and methods of use
US20190167814A1 (en) 2016-04-14 2019-06-06 Université de Lausanne Treatment And/Or Prevention Of DNA-Triplet Repeat Diseases Or Disorders
WO2017180926A1 (en) 2016-04-14 2017-10-19 Boco Silicon Valley. Inc. Genome editing of human neural stem cells using nucleases
CN105821116A (en) 2016-04-15 2016-08-03 扬州大学 A detection method for directional knockout of sheep MSTN gene and its effect on myogenic differentiation
US12065667B2 (en) 2016-04-16 2024-08-20 Ohio State Innovation Foundation Modified Cpf1 MRNA, modified guide RNA, and uses thereof
WO2017184334A1 (en) 2016-04-18 2017-10-26 The Board Of Regents Of The University Of Texas System Generation of genetically engineered animals by crispr/cas9 genome editing in spermatogonial stem cells
EP3445852A1 (en) 2016-04-18 2019-02-27 Ruprecht-Karls-Universität Heidelberg Means and methods for inactivating therapeutic dna in a cell
US20200263190A1 (en) 2016-04-19 2020-08-20 The Broad Institute, Inc. Novel crispr enzymes and systems
EP3445853A1 (en) 2016-04-19 2019-02-27 The Broad Institute, Inc. Cpf1 complexes with reduced indel activity
CN110382692A (en) 2016-04-19 2019-10-25 博德研究所 Novel C RISPR enzyme and system
CN106086062A (en) 2016-04-19 2016-11-09 上海市农业科学院 A kind of tomato dna group that obtains pinpoints the method knocking out mutant
CN105886616B (en) 2016-04-20 2020-08-07 广东省农业科学院农业生物基因研究中心 Efficient specific sgRNA recognition site guide sequence for pig gene editing and screening method thereof
EP3235908A1 (en) 2016-04-21 2017-10-25 Ecole Normale Superieure De Lyon Methods for selectively modulating the activity of distinct subtypes of cells
CN105821075B (en) 2016-04-22 2017-09-12 湖南农业大学 A kind of construction method of tea tree CaMTL5 CRISPR/Cas9 genome editor's carriers
CN107304435A (en) 2016-04-22 2017-10-31 中国科学院青岛生物能源与过程研究所 A kind of Cas9/RNA systems and its application
CN105861552B (en) 2016-04-25 2019-10-11 西北农林科技大学 A method for constructing a T7 RNA polymerase-mediated CRISPR/Cas9 gene editing system
WO2017189336A1 (en) 2016-04-25 2017-11-02 The Regents Of The University Of California Methods and compositions for genomic editing
CN107326046A (en) 2016-04-28 2017-11-07 上海邦耀生物科技有限公司 A kind of method for improving foreign gene homologous recombination efficiency
JP7184648B2 (en) 2016-04-29 2022-12-06 ビーエーエスエフ プラント サイエンス カンパニー ゲーエムベーハー Improved methods for modification of target nucleic acids
CN105821049B (en) 2016-04-29 2019-06-04 中国农业大学 A kind of preparation method of Fbxo40 gene knockout pig
CN105886534A (en) 2016-04-29 2016-08-24 苏州溯源精微生物科技有限公司 Tumor metastasis inhibition method
CN109477109B (en) 2016-04-29 2022-09-23 萨勒普塔医疗公司 Oligonucleotide analogs targeting human LMNA
WO2017190257A1 (en) 2016-05-01 2017-11-09 Neemo Inc Harnessing heterologous and endogenous crispr-cas machineries for efficient markerless genome editing in clostridium
US20170362609A1 (en) 2016-05-02 2017-12-21 Massachusetts Institute Of Technology AMPHIPHILIC NANOPARTICLES FOR CODELIVERY OF WATER-INSOLUBLE SMALL MOLECULES AND RNAi
EP3452101A2 (en) 2016-05-04 2019-03-13 CureVac AG Rna encoding a therapeutic protein
WO2017191210A1 (en) 2016-05-04 2017-11-09 Novozymes A/S Genome editing by crispr-cas9 in filamentous fungal host cells
CN105950639A (en) 2016-05-04 2016-09-21 广州美格生物科技有限公司 Preparation method of staphylococcus aureus CRISPR/Cas9 system and application of system in constructing mouse model
US20190134221A1 (en) 2016-05-05 2019-05-09 Duke University Crispr/cas-related methods and compositions for treating duchenne muscular dystrophy
WO2017192172A1 (en) 2016-05-05 2017-11-09 Temple University - Of The Commonwealth System Of Higher Education Rna guided eradication of varicella zoster virus
CN105907785B (en) 2016-05-05 2020-02-07 苏州吉玛基因股份有限公司 Application of chemically synthesized crRNA in CRISPR/Cpf1 system in gene editing
WO2017190664A1 (en) 2016-05-05 2017-11-09 苏州吉玛基因股份有限公司 Use of chemosynthetic crrna and modified crrna in crispr/cpf1 gene editing systems
CN106244591A (en) 2016-08-23 2016-12-21 苏州吉玛基因股份有限公司 Modify crRNA application in CRISPR/Cpf1 gene editing system
CN105985985B (en) 2016-05-06 2019-12-31 苏州大学 Preparation method of allogeneic mesenchymal stem cells edited by CRISPR technology and optimized with IGF and its application in the treatment of myocardial infarction
JP6872560B2 (en) 2016-05-06 2021-05-19 エム. ウルフ、トッド Improved methods for genome editing with programmable nucleases and genome editing without programmable nucleases
US20190161743A1 (en) 2016-05-09 2019-05-30 President And Fellows Of Harvard College Self-Targeting Guide RNAs in CRISPR System
JP2019519250A (en) 2016-05-10 2019-07-11 ユナイテッド ステイツ ガバメント アズ リプレゼンテッド バイ ザ デパートメント オブ ベテランズ アフェアーズUnited States Government As Represented By The Department Of Veterans Affairs Lentiviral delivery of a CRISPR / CAS construct that cleaves genes essential for HIV-1 infection and replication
CN105861554B (en) 2016-05-10 2020-01-31 华南农业大学 method for realizing animal sex control based on editing Rbmy gene and application
WO2017197301A1 (en) 2016-05-12 2017-11-16 Hanley Brian P Safe delivery of crispr and other gene therapies to large fractions of somatic cells in humans and animals
WO2017197238A1 (en) 2016-05-12 2017-11-16 President And Fellows Of Harvard College Aav split cas9 genome editing and transcriptional regulation
CN107365786A (en) 2016-05-12 2017-11-21 中国科学院微生物研究所 A kind of method and its application being cloned into spacer sequences in CRISPR-Cas9 systems
KR101922989B1 (en) 2016-05-13 2018-11-28 연세대학교 산학협력단 Generation and tracking of substitution mutations in the genome using a CRISPR/Retron system
CN105907758B (en) 2016-05-18 2020-06-05 世翱(上海)生物医药科技有限公司 CRISPR-Cas9 guide sequence and primer thereof, transgenic expression vector and construction method thereof
CN105838733A (en) 2016-05-18 2016-08-10 云南省农业科学院花卉研究所 Cas9 mediated carnation gene editing carrier and application
CN106011171B (en) 2016-05-18 2019-10-11 西北农林科技大学 A seamless gene editing method based on SSA repair using CRISPR/Cas9 technology
IL323024A (en) 2016-05-20 2025-10-01 Regeneron Pharma Methods for breaking immunological tolerance using multiple guide rnas
CN106446600B (en) 2016-05-20 2019-10-18 同济大学 A design method of sgRNA based on CRISPR/Cas9
US20190300867A1 (en) 2016-05-23 2019-10-03 The Trustees Of Columbia University In The City Of New York Bypassing the pam requirement of the crispr-cas system
WO2017205423A1 (en) 2016-05-23 2017-11-30 Washington University Pulmonary targeted cas9/crispr for in vivo editing of disease genes
CN105950560B (en) 2016-05-24 2019-07-23 苏州系统医学研究所 Humanized PD-L1 tumor cell line and animal model and application with the cell line
CN106011167B (en) 2016-05-27 2019-11-01 上海交通大学 The method of the application and rice fertility restorer of male sterility gene OsDPW2
JP2019517261A (en) 2016-06-01 2019-06-24 Kws Saat Se Hybrid nucleic acid sequences for genome manipulation
LT3604527T (en) 2016-06-02 2021-06-25 Sigma-Aldrich Co., Llc USE OF PROGRAMMABLE DNA-BINDING PROTEINS TO IMPROVE TARGETED GENOME MODIFICATION
WO2017208247A1 (en) 2016-06-02 2017-12-07 Yissum Research Development Company Of The Hebrew University Of Jerusalem Ltd. Assay for the removal of methyl-cytosine residues from dna
CA3026332A1 (en) 2016-06-03 2017-12-14 Temple University - Of The Commonwealth System Of Higher Education Negative feedback regulation of hiv-1 by gene editing strategy
US11140883B2 (en) 2016-06-03 2021-10-12 Auburn University Gene editing of reproductive hormones to sterilize aquatic animals
WO2017213898A2 (en) 2016-06-07 2017-12-14 Temple University - Of The Commonwealth System Of Higher Education Rna guided compositions for preventing and treating hepatitis b virus infections
CN106119275A (en) 2016-06-07 2016-11-16 湖北大学 Targeting vector and method for converting non-glutinous rice strains into waxy strains based on CRISPR/Cas9 technology
US10767175B2 (en) 2016-06-08 2020-09-08 Agilent Technologies, Inc. High specificity genome editing using chemically modified guide RNAs
WO2017222834A1 (en) 2016-06-10 2017-12-28 City Of Hope Compositions and methods for mitochondrial genome editing
CN106086008B (en) 2016-06-10 2019-03-12 中国农业科学院植物保护研究所 CRISPR/cas9 system of TRP gene of B. tabaci MED cryptic species and its application
AU2017286122A1 (en) 2016-06-14 2018-11-22 Pioneer Hi-Bred International, Inc. Use of Cpf1 endonuclease for plant genome modifications
CN106434752A (en) 2016-06-14 2017-02-22 南通大学附属医院 Process of knocking out Wnt3a gene and verification method thereof
CN106167808A (en) 2016-06-16 2016-11-30 郑州大学 Method for eliminating mecA plasmids based on CRISPR/Cas9 technology
CN105950633B (en) 2016-06-16 2019-05-03 复旦大学 Application of gene OsARF4 in controlling grain length and 1000-grain weight of rice
CN106167821A (en) 2016-06-16 2016-11-30 郑州大学 Staphylococcus aureus CRISPR site detection kit and detection method
EP3472321A2 (en) 2016-06-17 2019-04-24 Genesis Technologies Limited Crispr-cas system, materials and methods
CA3028158A1 (en) 2016-06-17 2017-12-21 The Broad Institute, Inc. Type vi crispr orthologs and systems
EP3472311A4 (en) 2016-06-17 2020-03-04 Montana State University BIDIRECTIONAL TARGETING FOR GENOME EDITING
CN105950626B (en) 2016-06-17 2018-09-28 新疆畜牧科学院生物技术研究所 Method for obtaining sheep with different coat colors based on CRISPR/Cas9, and sgRNA targeting the ASIP gene
US20170362635A1 (en) 2016-06-20 2017-12-21 University Of Washington Muscle-specific crispr/cas9 editing of genes
WO2017223107A1 (en) 2016-06-20 2017-12-28 Unity Biotechnology, Inc. Genome modifying enzyme therapy for diseases modulated by senescent cells
ES2981548T3 (en) 2016-06-20 2024-10-09 Keygene Nv Method for targeted alteration of DNA in plant cells
CA3018430A1 (en) 2016-06-20 2017-12-28 Pioneer Hi-Bred International, Inc. Novel cas systems and methods of use
CN106148370A (en) 2016-06-21 2016-11-23 苏州瑞奇生物医药科技有限公司 Obese rat animal model and construction method
EP3475424A1 (en) 2016-06-22 2019-05-01 ProQR Therapeutics II B.V. Single-stranded rna-editing oligonucleotides
JP2019522481A (en) 2016-06-22 2019-08-15 Icahn School of Medicine at Mount Sinai Viral delivery of RNA using self-cleaving ribozymes and its CRISPR-based application
CN106047877B (en) 2016-06-24 2019-01-11 中山大学附属第一医院 sgRNA and CRISPR/Cas9 lentivirus system for targeted knockout of FTO gene and application
CN105925608A (en) 2016-06-24 2016-09-07 广西壮族自治区水牛研究所 Method for targeted knockout of gene ALK6 by using CRISPR-Cas9
CN106119283A (en) 2016-06-24 2016-11-16 广西壮族自治区水牛研究所 Method for targeted knockout of the MSTN gene using CRISPR-Cas9
CN106148286B (en) 2016-06-29 2019-10-29 牛刚 Construction method of a cell model for detecting pyrogens, the cell model, and a pyrogen test kit
WO2018005691A1 (en) 2016-06-29 2018-01-04 The Regents Of The University Of California Efficient genetic screening method
US12595478B2 (en) 2016-06-29 2026-04-07 The Broad Institute, Inc. Crispr-Cas systems having destabilization domain
US20190185849A1 (en) 2016-06-29 2019-06-20 Crispr Therapeutics Ag Compositions and methods for gene editing
US10927383B2 (en) 2016-06-30 2021-02-23 Ethris Gmbh Cas9 mRNAs
US20180004537A1 (en) 2016-07-01 2018-01-04 Microsoft Technology Licensing, Llc Molecular State Machines
US10892034B2 (en) 2016-07-01 2021-01-12 Microsoft Technology Licensing, Llc Use of homology direct repair to record timing of a molecular event
CN109477130B (en) 2016-07-01 2022-08-30 微软技术许可有限责任公司 Storage by iterative DNA editing
BR112019000057A2 (en) 2016-07-05 2019-04-02 The Johns Hopkins University crispr / cas9 based compositions and methods for the treatment of retinal degeneration
CN106191057B (en) 2016-07-06 2018-12-25 中山大学 sgRNA sequence for knocking out the human CYP2E1 gene, construction method of CYP2E1 gene-deleted cell strains, and application thereof
US20190185847A1 (en) 2016-07-06 2019-06-20 Novozymes A/S Improving a Microorganism by CRISPR-Inhibition
CN106051058A (en) 2016-07-07 2016-10-26 上海格昆机电科技有限公司 Rotating rack used for spaceflight storage tank and particle treatment instrument and transmission mechanism of rotation rack
CN107586777A (en) 2016-07-08 2018-01-16 上海吉倍生物技术有限公司 Use of sgRNA targeting the human PDCD1 gene, and related drugs
WO2018009822A1 (en) 2016-07-08 2018-01-11 Ohio State Innovation Foundation Modified nucleic acids, hybrid guide rnas, and uses thereof
CN106047930B (en) 2016-07-12 2020-05-19 北京百奥赛图基因生物技术有限公司 Preparation method of Flox rat with conditional knockout of PS1 gene
KR102319845B1 (en) 2016-07-13 2021-11-01 디에스엠 아이피 어셋츠 비.브이. CRISPR-CAS system for avian host cells
US20190330659A1 (en) 2016-07-15 2019-10-31 Zymergen Inc. Scarless dna assembly and genome editing using crispr/cpf1 and dna ligase
WO2018013932A1 (en) 2016-07-15 2018-01-18 Salk Institute For Biological Studies Methods and compositions for genome editing in non-dividing cells
CN106191062B (en) 2016-07-18 2019-06-14 广东华南疫苗股份有限公司 TCR-/PD-1- double-negative T cell and construction method thereof
CN106190903B (en) 2016-07-18 2019-04-02 华中农业大学 Cas9 gene deletion mutant of R. anatipestifer and its application
CN106191061B (en) 2016-07-18 2019-06-18 暨南大学 sgRNA guide sequence specifically targeting the human ABCG2 gene, and application thereof
WO2018017754A1 (en) 2016-07-19 2018-01-25 Duke University Therapeutic applications of cpf1-based genome editing
CN106434651B (en) 2016-07-19 2021-05-18 广西大学 Agrobacterium tumefaciens and CRISPR-Cas9 mediated gene site-directed insertion inactivation method and application thereof
KR20190031306A (en) 2016-07-21 2019-03-25 맥스시티 인코포레이티드 Methods and compositions for altering genomic DNA
CN106191107B (en) 2016-07-22 2020-03-20 湖南农业大学 Molecular improvement method for reducing rice grain falling property
WO2018015444A1 (en) 2016-07-22 2018-01-25 Novozymes A/S Crispr-cas9 genome editing with multiple guide rnas in filamentous fungi
CN106191064B (en) 2016-07-22 2019-06-07 中国农业大学 A method of preparing MC4R gene knock-out pig
US20190270980A1 (en) 2016-07-25 2019-09-05 Mayo Foundation For Medical Education And Research Treating cancer
WO2018018979A1 (en) 2016-07-26 2018-02-01 浙江大学 Recombinant plant vector and method for screening non-transgenic gene-edited strain
CN106222193B (en) 2016-07-26 2019-09-20 浙江大学 A screening method for recombinant vectors and non-transgenic gene editing plants
WO2018022634A1 (en) 2016-07-26 2018-02-01 The General Hospital Corporation Variants of crispr from prevotella and francisella 1 (cpf1)
CN106086061A (en) 2016-07-27 2016-11-09 苏州泓迅生物科技有限公司 Saccharomyces cerevisiae genome editing vector based on the CRISPR-Cas9 system, and application thereof
CN106191099A (en) 2016-07-27 2016-12-07 苏州泓迅生物科技有限公司 Parallel multiplex Saccharomyces cerevisiae genome editing vector based on the CRISPR-Cas9 system, and application thereof
KR101828958B1 (en) 2016-07-28 2018-02-13 BMT Co., Ltd. Heating jacket for outdoor pipe
CN106191113B (en) 2016-07-29 2020-01-14 中国农业大学 Preparation method of MC3R gene knockout pig
CN106191114B (en) 2016-07-29 2020-02-11 中国科学院重庆绿色智能技术研究院 Breeding method for knocking out fish MC4R gene by using CRISPR-Cas9 system
CN106191124B (en) 2016-07-29 2019-10-11 中国科学院重庆绿色智能技术研究院 A Fish Breeding Method Using Fish Egg Preservation Solution to Improve CRISPR-Cas9 Gene Editing and Passaging Efficiency
CN106434748A (en) 2016-07-29 2017-02-22 中国科学院重庆绿色智能技术研究院 Development and applications of heat-shock-inducible Cas9 transgenic Danio rerio
GB201613135D0 (en) 2016-07-29 2016-09-14 Medical Res Council Genome editing
US11866733B2 (en) 2016-08-01 2024-01-09 University of Pittsburgh—of the Commonwealth System of Higher Education Human induced pluripotent stem cells for high efficiency genetic engineering
CN106011150A (en) 2016-08-01 2016-10-12 云南纳博生物科技有限公司 Rice grain number per ear Gn1a gene artificial site-directed mutant and application thereof
CN106434688A (en) 2016-08-01 2017-02-22 云南纳博生物科技有限公司 Artificial site-directed mutant of the rice dense and erect panicle (DEP1) gene, and application thereof
WO2018026976A1 (en) 2016-08-02 2018-02-08 Editas Medicine, Inc. Compositions and methods for treating cep290 associated disease
WO2018025206A1 (en) 2016-08-02 2018-02-08 Kyoto University Method for genome editing
KR102547316B1 (en) 2016-08-03 2023-06-23 President And Fellows Of Harvard College Adenosine nucleobase editing agents and uses thereof
CN106282241A (en) 2016-08-05 2017-01-04 无锡市第二人民医院 Method for obtaining bmp2a gene knockout zebrafish (Brachydanio rerio) by CRISPR/Cas9
US11661590B2 (en) 2016-08-09 2023-05-30 President And Fellows Of Harvard College Programmable CAS9-recombinase fusion proteins and uses thereof
KR101710026B1 (en) 2016-08-10 2017-02-27 주식회사 무진메디 Composition comprising delivery carrier of nano-liposome having Cas9 protein and guide RNA
CN106222203A (en) 2016-08-10 2016-12-14 云南纳博生物科技有限公司 Bombyx mori silk fibroin heavy chain gene mutant obtained using CRISPR/Cas technology, mutation method, and application thereof
CN106172238B (en) 2016-08-12 2019-01-22 中南大学 Construction method and application of miR-124 gene knockout mouse animal model
WO2018029534A1 (en) 2016-08-12 2018-02-15 Oxitec Ltd. A self-limiting, sex-specific gene and methods of using
CN106222177B (en) 2016-08-13 2018-06-26 江苏集萃药康生物科技有限公司 CRISPR-Cas9 system targeting human STAT6 and its application in treating allergic diseases
US12431216B2 (en) 2016-08-17 2025-09-30 Broad Institute, Inc. Methods for identifying class 2 crispr-cas systems
WO2018035300A1 (en) 2016-08-17 2018-02-22 The Regents Of The University Of California Split trans-complementing gene-drive system for suppressing aedes aegypti mosquitos
US11810649B2 (en) 2016-08-17 2023-11-07 The Broad Institute, Inc. Methods for identifying novel gene editing elements
WO2018035503A1 (en) 2016-08-18 2018-02-22 The Regents Of The University Of California Crispr-cas genome engineering via a modular aav delivery system
MA46018A (en) 2016-08-19 2019-06-26 Bluebird Bio Inc GENOME EDITING ACTIVATORS
JP2019524149A (en) 2016-08-20 2019-09-05 Avellino Lab USA, Inc. Single-stranded guide RNA, CRISPR / Cas9 system, and methods of use thereof
CN106191071B (en) 2016-08-22 2018-09-04 广州资生生物科技有限公司 CRISPR-Cas9 system and application thereof in treating breast cancer diseases
CN106191116B (en) 2016-08-22 2019-10-08 西北农林科技大学 CRISPR/Cas9-based foreign gene knock-in integration system, method for establishing it, and application thereof
CN106086028B (en) 2016-08-23 2019-04-23 中国农业科学院作物科学研究所 A method for improving rice resistant starch content by genome editing and its dedicated sgRNA
CN106244555A (en) 2016-08-23 2016-12-21 广州医科大学附属第三医院 Method for improving gene targeting efficiency, and method for in-situ base repair at the beta-globin gene locus
SG10201913948PA (en) 2016-08-24 2020-03-30 Sangamo Therapeutics Inc Engineered target specific nucleases
CN106244609A (en) 2016-08-24 2016-12-21 浙江理工大学 Screening system and screening method for non-coding genes regulating the PI3K-AKT signaling pathway
CN106109417A (en) 2016-08-24 2016-11-16 李因传 Liver plasma membrane biomimetic liposome drug carrier, manufacturing method and application thereof
KR101856345B1 (en) 2016-08-24 2018-06-20 Gyeongsang National University Industry-Academic Cooperation Foundation Method for generating APOBEC3H and APOBEC3CH double-knockout cats using the CRISPR/Cas9 system
US11542509B2 (en) 2016-08-24 2023-01-03 President And Fellows Of Harvard College Incorporation of unnatural amino acids into proteins using base editing
LT3504229T (en) 2016-08-24 2021-12-10 Sangamo Therapeutics, Inc. Regulation of gene expression using engineered nucleases
CN106544357B (en) 2016-08-25 2018-08-21 湖南杂交水稻研究中心 A method of cultivating low cadmium-accumulation rice variety
CN106350540A (en) 2016-08-26 2017-01-25 苏州系统医学研究所 Highly efficient lentivirus-mediated inducible CRISPR/Cas9 gene knockout vector and application thereof
CN106318973B (en) 2016-08-26 2019-09-13 深圳市第二人民医院 A CRISPR-Cas9-based gene regulation device and gene regulation method
CN107784200B (en) 2016-08-26 2020-11-06 深圳华大生命科学研究院 Method and device for screening novel CRISPR-Cas system
CN106244557B (en) 2016-08-29 2019-10-25 中国农业科学院北京畜牧兽医研究所 Method for site-directed mutation of ApoE gene and LDLR gene
CN106480097A (en) 2016-10-13 2017-03-08 南京凯地生物科技有限公司 Method for constructing novel MSLN-targeting CAR-T cells with the human PD-1 gene knocked out using CRISPR/Cas9 technology, and application thereof
CN106399367A (en) 2016-08-31 2017-02-15 深圳市卫光生物制品股份有限公司 Method for improving efficiency of CRISPR mediated homologous recombination
CN106399375A (en) 2016-08-31 2017-02-15 南京凯地生物科技有限公司 Method for constructing CD19 targeting CAR-T (chimeric antigen receptor-T) cells by knocking out PD-1 (programmed death 1) genes by virtue of CRISPR/Cas9
CN107794272B (en) 2016-09-06 2021-10-12 中国科学院上海营养与健康研究所 High-specificity CRISPR genome editing system
JP7682604B2 (en) 2016-09-07 2025-05-26 Flagship Pioneering Innovations V, Inc. Methods and compositions for modulating gene expression
CN106399377A (en) 2016-09-07 2017-02-15 同济大学 Method for screening drug target genes based on CRISPR/Cas9 high-throughput technology
WO2018048827A1 (en) 2016-09-07 2018-03-15 Massachusetts Institute Of Technology Rna-guided endonuclease-based dna assembly
CN106399311A (en) 2016-09-07 2017-02-15 同济大学 Endogenous protein marking method used for Chip-seq genome-wide binding spectrum
CN106367435B (en) 2016-09-07 2019-11-08 电子科技大学 A method for targeted knockout of miRNA in rice
EP4431607A3 (en) 2016-09-09 2024-12-11 The Board of Trustees of the Leland Stanford Junior University High-throughput precision genome editing
CN107574179B (en) 2016-09-09 2018-07-10 康码(上海)生物科技有限公司 Highly efficient CRISPR/Cas9 gene editing system optimized for Kluyveromyces
EP3512943B1 (en) 2016-09-14 2023-04-12 Yeda Research and Development Co. Ltd. Crisp-seq, an integrated method for massively parallel single cell rna-seq and crispr pooled screens
CN106318934B (en) 2016-09-21 2020-06-05 上海交通大学 Complete gene sequence of carrot β(1,2) xylose transferase and construction of CRISPR/CAS9 plasmid for transfection of dicotyledonous plants
FI3516056T3 (en) 2016-09-23 2025-02-28 Dsm Ip Assets Bv GUIDE RNA EXPRESSION SYSTEM FOR THE HOST CELL
EP3516058A1 (en) 2016-09-23 2019-07-31 Casebia Therapeutics Limited Liability Partnership Compositions and methods for gene editing
US9580698B1 (en) 2016-09-23 2017-02-28 New England Biolabs, Inc. Mutant reverse transcriptase
CN106957858A (en) 2016-09-23 2017-07-18 西北农林科技大学 Method for jointly knocking out the sheep MSTN, ASIP, and BCO2 genes using the CRISPR/Cas9 system
EP3497215B1 (en) 2016-09-28 2024-01-10 Cellivery Therapeutics, Inc. Cell-permeable (cp)-cas9 recombinant protein and uses thereof
CN107881184B (en) 2016-09-30 2021-08-27 中国科学院分子植物科学卓越创新中心 Cpf 1-based DNA in-vitro splicing method
CN107880132B (en) 2016-09-30 2022-06-17 北京大学 Fusion protein and method for carrying out homologous recombination by using same
CA3038960A1 (en) 2016-09-30 2018-04-05 The Regents Of The University Of California Rna-guided nucleic acid modifying enzymes and methods of use thereof
CN106480027A (en) 2016-09-30 2017-03-08 重庆高圣生物医药有限责任公司 CRISPR/Cas9 targeted knockout of the human PD-1 gene and its specific gRNA
JP7306696B2 (en) 2016-09-30 2023-07-11 The Regents Of The University Of California RNA-guided nucleic acid-modifying enzyme and method of use thereof
WO2018064516A1 (en) 2016-09-30 2018-04-05 Monsanto Technology Llc Method for selecting target sites for site-specific genome modification in plants
US11730823B2 (en) 2016-10-03 2023-08-22 President And Fellows Of Harvard College Delivery of therapeutic RNAs via ARRDC1-mediated microvesicles
US20190241899A1 (en) 2016-10-05 2019-08-08 President And Fellows Of Harvard College Methods of Crispr Mediated Genome Modulation in V. Natriegens
US10669539B2 (en) 2016-10-06 2020-06-02 Pioneer Biolabs, Llc Methods and compositions for generating CRISPR guide RNA libraries
CN118726313A (en) 2016-10-07 2024-10-01 综合Dna技术公司 CAS9 mutant gene of Streptococcus pyogenes and polypeptide encoded thereby
CN106479985A (en) 2016-10-09 2017-03-08 上海吉玛制药技术有限公司 Application of virus-mediated Cpf1 protein in the CRISPR/Cpf1 gene editing system
IT201600102542A1 (en) 2016-10-12 2018-04-12 Univ Degli Studi Di Trento Plasmid and lentiviral system containing a self-limiting Cas9 circuit that increases its safety.
WO2018071623A2 (en) 2016-10-12 2018-04-19 Temple University - Of The Commonwealth System Of Higher Education Combination therapies for eradicating flavivirus infections in subjects
CN106434663A (en) 2016-10-12 2017-02-22 遵义医学院 Method for CRISPR/Cas9 targeted knockout of human ezrin gene enhancer key region and specific gRNA thereof
EP3526320A1 (en) 2016-10-14 2019-08-21 President and Fellows of Harvard College Aav delivery of nucleobase editors
AU2017341926B2 (en) 2016-10-14 2022-06-30 The General Hospital Corporation Epigenetically regulated site-specific nucleases
CN106434782B (en) 2016-10-14 2020-01-10 南京工业大学 Method for producing cis-4-hydroxyproline
US20190330620A1 (en) 2016-10-14 2019-10-31 Emendobio Inc. Rna compositions for genome editing
SG10201913505WA (en) 2016-10-17 2020-02-27 Univ Nanyang Tech Truncated crispr-cas proteins for dna targeting
US10640810B2 (en) 2016-10-19 2020-05-05 Drexel University Methods of specifically labeling nucleic acids using CRISPR/Cas
WO2018081534A1 (en) 2016-10-28 2018-05-03 President And Fellows Of Harvard College Assay for exo-site binding molecules
WO2018081504A1 (en) 2016-10-28 2018-05-03 Editas Medicine, Inc. Crispr/cas-related methods and compositions for treating herpes simplex virus
US20180119141A1 (en) 2016-10-28 2018-05-03 Massachusetts Institute Of Technology Crispr/cas global regulator screening platform
WO2018081535A2 (en) 2016-10-28 2018-05-03 Massachusetts Institute Of Technology Dynamic genome engineering
WO2018081728A1 (en) 2016-10-31 2018-05-03 Emendobio Inc. Compositions for genome editing
US20190198214A1 (en) 2016-10-31 2019-06-27 Eguchi High Frequency Co., Ltd. Reactor
US20180245065A1 (en) 2016-11-01 2018-08-30 Novartis Ag Methods and compositions for enhancing gene editing
WO2018085288A1 (en) 2016-11-01 2018-05-11 President And Fellows Of Harvard College Inhibitors of rna guided nucleases and uses thereof
WO2018085414A1 (en) 2016-11-02 2018-05-11 President And Fellows Of Harvard College Engineered guide rna sequences for in situ detection and sequencing
GB201618507D0 (en) 2016-11-02 2016-12-14 Stichting Voor De Technische Wetenschappen And Wageningen Univ Microbial genome editing
CN106544353A (en) 2016-11-08 2017-03-29 宁夏医科大学总医院 Method for removing drug resistance genes from Acinetobacter baumannii using CRISPR-Cas9
WO2018089664A1 (en) 2016-11-11 2018-05-17 The Regents Of The University Of California Variant rna-guided polypeptides and methods of use
CN106755088A (en) 2016-11-11 2017-05-31 广东万海细胞生物科技有限公司 Method for preparing autologous CAR-T cells, and application thereof
CN106566838B (en) 2016-11-14 2019-11-01 上海伯豪生物技术有限公司 CRISPR-Cas9-based miR-126 full-length gene knockout kit and application thereof
EA201990815A1 (en) 2016-11-14 2019-09-30 Institute Of Genetics And Developmental Biology, Chinese Academy Of Sciences METHOD FOR BASE EDITING IN PLANTS
CN106554969A (en) 2016-11-15 2017-04-05 陕西理工学院 Multi-target CRISPR/Cas9 expression vectors for bacteriostasis and sterilization
CN106754912B (en) 2016-11-16 2019-11-08 上海交通大学 A class of plasmids and preparations for directed removal of HBV cccDNA in hepatocytes
WO2018093990A1 (en) 2016-11-16 2018-05-24 The Regents Of The University Of California Inhibitors of crispr-cas9
US20180282722A1 (en) 2016-11-21 2018-10-04 Massachusetts Institute Of Technology Chimeric DNA:RNA Guide for High Accuracy Cas9 Genome Editing
CN106480067A (en) 2016-11-21 2017-03-08 中国农业科学院烟草研究所 Application of the Nicotiana tabacum L. NtNAC096 gene in regulating senescence of Nicotiana tabacum L.
WO2018098383A1 (en) 2016-11-22 2018-05-31 Integrated Dna Technologies, Inc. Crispr/cpf1 systems and methods
CA3044531A1 (en) 2016-11-28 2018-05-31 The Board Of Regents Of The University Of Texas System Prevention of muscular dystrophy by crispr/cpf1-mediated gene editing
CN106755091A (en) 2016-11-28 2017-05-31 中国人民解放军第三军医大学第附属医院 Gene knockout vector and method for knocking out the NLRP1 gene in MH7A cells
CN106480036B (en) 2016-11-30 2019-04-09 华南理工大学 A DNA fragment with promoter function and its application
CN106834323A (en) 2016-12-01 2017-06-13 安徽大学 Gene editing method based on streptomyces virginiae IBL14 gene cas7-5-3
US20200056206A1 (en) 2016-12-01 2020-02-20 UNIVERSITÉ LAVAL Crispr-based treatment of friedreich ataxia
CN107043779B (en) 2016-12-01 2020-05-12 中国农业科学院作物科学研究所 Application of a CRISPR/nCas9-mediated site-directed base replacement in plants
US9816093B1 (en) 2016-12-06 2017-11-14 Caribou Biosciences, Inc. Engineered nucleic acid-targeting nucleic acids
CN106701830B (en) 2016-12-07 2020-01-03 湖南人文科技学院 Method for knocking out the p66shc gene in pig embryos
CN108165573B (en) 2016-12-07 2022-01-14 中国科学院分子植物科学卓越创新中心 Chloroplast genome editing method
US11192929B2 (en) 2016-12-08 2021-12-07 Regents Of The University Of Minnesota Site-specific DNA base editing using modified APOBEC enzymes
AU2017374044B2 (en) 2016-12-08 2023-11-30 Intellia Therapeutics, Inc. Modified guide RNAs
CN106544351B (en) 2016-12-08 2019-09-10 江苏省农业科学院 Method for in vitro knockout of the drug resistance gene mcr-1 by CRISPR-Cas9, and dedicated cell-penetrating peptides therefor
US12404514B2 (en) 2016-12-09 2025-09-02 The Broad Institute, Inc. CRISPR-systems for modifying a trait of interest in a plant
CA3049961A1 (en) 2016-12-09 2018-06-14 The Broad Institute, Inc. Crispr effector system based diagnostics
WO2018111947A1 (en) 2016-12-12 2018-06-21 Integrated Dna Technologies, Inc. Genome editing enhancement
WO2018111946A1 (en) 2016-12-12 2018-06-21 Integrated Dna Technologies, Inc. Genome editing detection
CN107893074A (en) 2016-12-13 2018-04-10 广东赤萌医疗科技有限公司 gRNA, expression vector, knockout system, and kit for knocking out the CXCR4 gene
WO2018109101A1 (en) 2016-12-14 2018-06-21 Wageningen Universiteit Thermostable cas9 nucleases
JP7182545B2 (en) 2016-12-14 2022-12-02 Wageningen Universiteit Thermostable CAS9 nuclease
WO2018112336A1 (en) 2016-12-16 2018-06-21 Ohio State Innovation Foundation Systems and methods for dna-guided rna cleavage
KR101748575B1 (en) 2016-12-16 2017-06-20 주식회사 엠젠플러스 Insulin gene knockout animal model of diabetes mellitus or diabetic complications, and a method for producing the same
WO2018112446A2 (en) 2016-12-18 2018-06-21 Selonterra, Inc. Use of apoe4 motif-mediated genes for diagnosis and treatment of alzheimer's disease
CN106755026A (en) 2016-12-18 2017-05-31 吉林大学 Construction of sgRNA expression vectors and establishment of an enamel hypocalcification model
EP3559223A1 (en) 2016-12-23 2019-10-30 President and Fellows of Harvard College Gene editing of pcsk9
WO2018119359A1 (en) 2016-12-23 2018-06-28 President And Fellows Of Harvard College Editing of ccr5 receptor gene to protect against hiv infection
CN107354173A (en) 2016-12-26 2017-11-17 浙江省医学科学院 Method for establishing a liver-specific knockout mouse model based on CRISPR technology and hydrodynamic tail vein injection
CN106755424B (en) 2016-12-26 2020-11-06 郑州大学 A CRISPR-based Escherichia coli ST131 strain detection primer, kit and detection method
CN106834347A (en) 2016-12-27 2017-06-13 安徽省农业科学院畜牧兽医研究所 Goat CDK2 gene knockout vector and construction method thereof
CN108243575B (en) 2016-12-27 2020-04-17 Bgt材料有限公司 Method for manufacturing polymer printed circuit board
CN106755097A (en) 2016-12-27 2017-05-31 安徽省农业科学院畜牧兽医研究所 Goat TLR4 gene knockout vector and construction method thereof
CN106597260B (en) 2016-12-29 2020-04-03 合肥工业大学 Analog circuit fault diagnosis method based on continuous wavelet analysis and ELM network
CN106834341B (en) 2016-12-30 2020-06-16 中国农业大学 Gene site-directed mutagenesis vector, and construction method and application thereof
CN106755077A (en) 2016-12-30 2017-05-31 华智水稻生物技术有限公司 Method for site-directed point mutation of rice CENH3 using CRISPR-Cas9 technology
CN106701763B (en) 2016-12-30 2019-07-19 重庆高圣生物医药有限责任公司 CRISPR/Cas9 targeted knockout of the human hepatitis B virus P gene and its specific gRNA
CN106868008A (en) 2016-12-30 2017-06-20 重庆高圣生物医药有限责任公司 CRISPR/Cas9 targeted knockout of the human Lin28A gene and its specific gRNA
CN106701818B (en) 2017-01-09 2020-04-24 湖南杂交水稻研究中心 Method for cultivating common genic male sterile line of rice
WO2018130830A1 (en) 2017-01-11 2018-07-19 Oxford University Innovation Limited Crispr rna
CN107012164B (en) 2017-01-11 2023-03-03 电子科技大学 CRISPR/Cpf1 plant genome directed modification functional unit, vector containing functional unit and application of functional unit
EP3572525A4 (en) 2017-01-17 2020-09-30 Institute for Basic Science PROCESS FOR IDENTIFYING A BASE-EDITING OFF-TARGET SITE BY DNA STRAND BREAKING
US20180201921A1 (en) 2017-01-18 2018-07-19 Excision Biotherapeutics, Inc. CRISPRs
CN106701823A (en) 2017-01-18 2017-05-24 上海交通大学 Establishment and application of CHO cell line for producing fucose-free monoclonal antibody
CN107058372A (en) 2017-01-18 2017-08-18 四川农业大学 Construction method of CRISPR/Cas9 vectors for use in plants
CN106801056A (en) 2017-01-24 2017-06-06 中国科学院广州生物医药与健康研究院 sgRNA, lentiviral vector constructed therewith, and application thereof
ES2950676T3 (en) 2017-01-30 2023-10-11 Kws Saat Se & Co Kgaa Repair of template binding to endonucleases for gene modification
TWI608100B (en) 2017-02-03 2017-12-11 國立清華大學 Cas9 expression plasmid, E. coli gene editing system and method thereof
US10465187B2 (en) 2017-02-06 2019-11-05 Trustees Of Boston University Integrated system for programmable DNA methylation
TW201839136A (en) 2017-02-06 2018-11-01 瑞士商諾華公司 Composition and method for treating hemochromatosis
JP2020506948A (en) 2017-02-07 2020-03-05 The Regents Of The University Of California Gene therapy for haploinsufficiency
US20190345501A1 (en) 2017-02-07 2019-11-14 Massachusetts Institute Of Technology Methods and compositions for rna-guided genetic circuits
WO2018148647A2 (en) 2017-02-10 2018-08-16 Lajoie Marc Joseph Genome editing reagents and their use
IT201700016321A1 (en) 2017-02-14 2018-08-14 Univ Degli Studi Di Trento HIGH-SPECIFICITY CAS9 MUTANTS AND THEIR APPLICATIONS.
WO2018152197A1 (en) 2017-02-15 2018-08-23 Massachusetts Institute Of Technology Dna writers, molecular recorders and uses thereof
KR102772726B1 (en) 2017-02-15 2025-02-25 Keygene N.V. Methods for targeted genetic modification in plant cells
CN106957855B (en) 2017-02-16 2020-04-17 上海市农业科学院 Method for targeted knockout of rice dwarf gene SD1 by using CRISPR/Cas9 technology
WO2018152418A1 (en) 2017-02-17 2018-08-23 Temple University - Of The Commonwealth System Of Higher Education Gene editing therapy for hiv infection via dual targeting of hiv genome and ccr5
EP3583216A4 (en) 2017-02-20 2021-03-10 Institute Of Genetics And Developmental Biology, Chinese Academy Of Sciences Genome editing system and method
EP3585899A1 (en) 2017-02-22 2020-01-01 CRISPR Therapeutics AG Materials and methods for treatment of primary hyperoxaluria type 1 (ph1) and other alanine-glyoxylate aminotransferase (agxt) gene related conditions or disorders
WO2018156372A1 (en) 2017-02-22 2018-08-30 The Regents Of The University Of California Genetically modified non-human animals and products thereof
US11559588B2 (en) 2017-02-22 2023-01-24 Crispr Therapeutics Ag Materials and methods for treatment of Spinocerebellar Ataxia Type 1 (SCA1) and other Spinocerebellar Ataxia Type 1 Protein (ATXN1) gene related conditions or disorders
EP3585897A1 (en) 2017-02-22 2020-01-01 CRISPR Therapeutics AG Materials and methods for treatment of dystrophic epidermolysis bullosa (deb) and other collagen type vii alpha 1 chain (col7a1) gene related conditions or disorders
EP3585894A1 (en) 2017-02-22 2020-01-01 CRISPR Therapeutics AG Compositions and methods for treatment of proprotein convertase subtilisin/kexin type 9 (pcsk9)-related disorders
WO2018154462A2 (en) 2017-02-22 2018-08-30 Crispr Therapeutics Ag Materials and methods for treatment of spinocerebellar ataxia type 2 (sca2) and other spinocerebellar ataxia type 2 protein (atxn2) gene related conditions or disorders
JP2020508056A (en) 2017-02-22 2020-03-19 CRISPR Therapeutics AG Compositions and methods for gene editing
US20200095579A1 (en) 2017-02-22 2020-03-26 Crispr Therapeutics Ag Materials and methods for treatment of merosin-deficient congenital muscular dystrophy (mdcmd) and other laminin, alpha 2 (lama2) gene related conditions or disorders
US20200040061A1 (en) 2017-02-22 2020-02-06 Crispr Therapeutics Ag Materials and methods for treatment of early onset parkinson's disease (park1) and other synuclein, alpha (snca) gene related conditions or disorders
WO2018156824A1 (en) 2017-02-23 2018-08-30 President And Fellows Of Harvard College Methods of genetic modification of a cell
CN106868031A (en) 2017-02-24 2017-06-20 北京大学 A kind of cloning process of multiple sgRNA series parallels expression based on classification assembling and application
WO2018161009A1 (en) 2017-03-03 2018-09-07 Yale University Aav-mediated direct in vivo crispr screen in glioblastoma
GB2574769A (en) 2017-03-03 2019-12-18 Univ California RNA Targeting of mutations via suppressor tRNAs and deaminases
US11111492B2 (en) 2017-03-06 2021-09-07 Florida State University Research Foundation, Inc. Genome engineering methods using a cytosine-specific Cas9
EP3592853A1 (en) 2017-03-09 2020-01-15 President and Fellows of Harvard College Suppression of pain by gene editing
JP2020510038A (en) 2017-03-09 2020-04-02 President And Fellows Of Harvard College Cancer vaccine
JP2020510439A (en) 2017-03-10 2020-04-09 President And Fellows Of Harvard College Cytosine-to-guanine base editor
EP3595694A4 (en) 2017-03-14 2021-06-09 The Regents of The University of California CONSTRUCTION OF IMMUNE-STEALTH CAS9 CRISPR
CN111108220B (en) 2017-03-15 2024-11-19 博德研究所 CRISPR effector system-based diagnostics for virus detection
CN106978428A (en) 2017-03-15 2017-07-25 上海吐露港生物科技有限公司 Method and kit for specific binding of Cas protein to target DNA and regulation of target gene transcription
CN106906242A (en) 2017-03-16 2017-06-30 重庆高圣生物医药有限责任公司 Method for improving the efficiency of non-homologous end joining produced by CRISPR/Cas9 targeted gene knockout
CA3057330A1 (en) 2017-03-21 2018-09-27 Anthony P. Shuber Treating cancer with cas endonuclease complexes
CA3057192A1 (en) 2017-03-23 2018-09-27 President And Fellows Of Harvard College Nucleobase editors comprising nucleic acid programmable dna binding proteins
CN107012213A (en) 2017-03-24 2017-08-04 南开大学 Biomarkers for colorectal cancer
US10876101B2 (en) 2017-03-28 2020-12-29 Locanabio, Inc. CRISPR-associated (Cas) protein
CN106947780A (en) 2017-03-28 2017-07-14 Yangzhou University Method for editing the rabbit MSTN gene
CN106906240A (en) 2017-03-29 2017-06-30 Zhejiang University Method for knocking out HPT, a key gene in the barley vitamin E synthesis pathway, with the CRISPR/Cas9 system
KR102758434B1 (en) 2017-03-30 2025-01-21 Kyoto University Method for inducing exon skipping by genome editing
CN108660161B (en) 2017-03-31 2023-05-09 Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences Method for preparing non-chimeric gene knockout animals based on CRISPR/Cas9 technology
CN107058358B (en) 2017-04-01 2020-06-09 Institute of Microbiology, Chinese Academy of Sciences Construction of a CRISPR-Cas9 vector that recognizes and cleaves via two spacer sequences, and its application in Verrucobacterium
US9938288B1 (en) 2017-04-05 2018-04-10 President And Fellows Of Harvard College Macrocyclic compound and uses thereof
CN106967726B (en) 2017-04-05 2020-12-29 South China Agricultural University Method for creating interspecific hybrid-compatible lines of Asian and African cultivated rice, and application thereof
CN107142282A (en) 2017-04-06 2017-09-08 Sun Yat-sen University Method for site-directed integration of large DNA fragments in mammalian cells using CRISPR/Cas9
CN107034229A (en) 2017-04-07 2017-08-11 Jiangsu Beiruili Biotechnology Co., Ltd. High-efficiency screening system for candidate sgRNAs of the CRISPR/Cas9 gene editing system in plants, and application thereof
EP3610006B1 (en) 2017-04-11 2021-05-19 Roche Diagnostics GmbH Mutant reverse transcriptase with increased thermal stability as well as products, methods and uses involving the same
AU2018251801B2 (en) 2017-04-12 2024-11-07 Massachusetts Institute Of Technology Novel type VI CRISPR orthologs and systems
CN107058320B (en) 2017-04-12 2019-08-02 Nankai University Preparation and application of an IL7R gene-deletion zebrafish mutant
CN106916852B (en) 2017-04-13 2020-12-04 ShanghaiTech University A base editing system and its construction and application methods
CN108728476A (en) 2017-04-14 2018-11-02 Fudan University Method for generating a diverse antibody library using the CRISPR system
CN107298701B (en) 2017-04-18 2020-10-30 Shanghai University Maize transcription factor ZmbZIP22 and its application
WO2018195402A1 (en) 2017-04-20 2018-10-25 Egenesis, Inc. Methods for generating genetically modified animals
CN106957844A (en) 2017-04-20 2017-07-18 Huaqiao University CRISPR/Cas9 gRNA sequences for effectively knocking out the HTLV-1 viral genome
WO2018195555A1 (en) 2017-04-21 2018-10-25 The Board Of Trustees Of The Leland Stanford Junior University Crispr/cas 9-mediated integration of polynucleotides by sequential homologous recombination of aav donor vectors
WO2018195545A2 (en) 2017-04-21 2018-10-25 The General Hospital Corporation Variants of cpf1 (cas12a) with altered pam specificity
EP3615665B1 (en) 2017-04-24 2025-11-26 International N&H Denmark ApS Novel anti-crispr genes and proteins and methods of use
CN107043775B (en) 2017-04-24 2020-06-16 Biotechnology Research Institute, Chinese Academy of Agricultural Sciences sgRNA capable of promoting cotton lateral root development and its application
CN206970581U (en) 2017-04-26 2018-02-06 Chongqing Weisiteng Biomedical Technology Co., Ltd. Kit for assisting CRISPR/Cas9 gene knockout
US20180312822A1 (en) 2017-04-26 2018-11-01 10X Genomics, Inc. Mmlv reverse transcriptase variants
WO2018197020A1 (en) 2017-04-27 2018-11-01 Novozymes A/S Genome editing by crispr-cas9 using short donor oligonucleotides
WO2018202800A1 (en) 2017-05-03 2018-11-08 Kws Saat Se Use of crispr-cas endonucleases for plant genome engineering
CN107012174A (en) 2017-05-04 2017-08-04 Kunming University of Science and Technology Application of CRISPR/Cas9 technology in obtaining silkworm zinc finger protein gene mutants
CA3062165A1 (en) 2017-05-04 2018-11-08 The Trustees Of The University Of Pennsylvania Compositions and methods for gene editing in t cells using crispr/cpf1
CN107254485A (en) 2017-05-08 2017-10-17 Nanjing Agricultural University Novel reaction system for rapid construction of site-directed plant gene knockout vectors
WO2018208755A1 (en) 2017-05-09 2018-11-15 The Regents Of The University Of California Compositions and methods for tagging target proteins in proximity to a nucleotide sequence of interest
CN107129999A (en) 2017-05-09 2017-09-05 Institute of Animal Husbandry and Veterinary Medicine, Fujian Academy of Agricultural Sciences A method for targeted editing of viral genomes using the stable CRISPR/Cas9 system
EP3622070A2 (en) 2017-05-10 2020-03-18 Editas Medicine, Inc. Crispr/rna-guided nuclease systems and methods
EP3622062A4 (en) 2017-05-10 2020-10-14 The Regents of the University of California DIRECTED EDITING OF CELLULAR RNA THROUGH NUCLEAR DELIVERY OF CRISPR / CAS9
WO2018209320A1 (en) 2017-05-12 2018-11-15 President And Fellows Of Harvard College Aptazyme-embedded guide rnas for use with crispr-cas9 in genome editing and transcriptional activation
CN107130000B (en) 2017-05-12 2019-12-17 Zhejiang Weiwei Biomedical Technology Co., Ltd. A CRISPR-Cas9 system for simultaneously knocking out the KRAS and EGFR genes and its application
CN106939303B (en) 2017-05-16 2021-02-23 Shanghai Jiao Tong University Cas9 nuclease R919P and use thereof
CN106957830B (en) 2017-05-16 2020-12-25 Shanghai Jiao Tong University Cas9 nuclease delta F916 and application thereof
CN106916820B (en) 2017-05-16 2019-09-27 Jilin University sgRNA capable of effectively editing porcine ROSA26 gene and its application
CN106987570A (en) 2017-05-16 2017-07-28 Shanghai Jiao Tong University Cas9 nuclease R780A and application thereof
US11692184B2 (en) 2017-05-16 2023-07-04 The Regents Of The University Of California Thermostable RNA-guided endonucleases and methods of use thereof
CN107012250B (en) 2017-05-16 2021-01-29 Shanghai Jiao Tong University Method for analyzing the editing accuracy of genomic DNA fragments in the CRISPR/Cas9 system, and application thereof
CN106947750B (en) 2017-05-16 2020-12-08 Shanghai Jiao Tong University Cas9 nuclease Q920P and use thereof
CN106957831B (en) 2017-05-16 2021-03-12 Shanghai Jiao Tong University Cas9 nuclease K918A and use thereof
CN107326042A (en) 2017-05-16 2017-11-07 Shanghai Jiao Tong University Site-directed knockout system for the rice TMS10 gene and its application
CN106967697B (en) 2017-05-16 2021-03-26 Shanghai Jiao Tong University Cas9 nuclease G915F and application thereof
EP3625342B1 (en) 2017-05-18 2022-08-24 The Broad Institute, Inc. Systems, methods, and compositions for targeted nucleic acid editing
WO2018213791A1 (en) 2017-05-18 2018-11-22 Children's National Medical Center Compositions comprising aptamers and nucleic acid payloads and methods of using the same
WO2018213708A1 (en) 2017-05-18 2018-11-22 The Broad Institute, Inc. Systems, methods, and compositions for targeted nucleic acid editing
EP3625340A4 (en) 2017-05-18 2021-02-24 Cargill, Incorporated Genome editing system
CN107043787B (en) 2017-05-19 2017-12-26 Nanjing Medical University Construction method and application of a CRISPR/Cas9-based MARF1 site-directed mutant mouse model
CN107236737A (en) 2017-05-19 2017-10-10 Shanghai Jiao Tong University sgRNA sequences specifically targeting the Arabidopsis ILK2 gene and their application
WO2018217852A1 (en) 2017-05-23 2018-11-29 Gettysburg College Crispr based tool for characterizing bacterial serovar diversity
CN107034188B (en) 2017-05-24 2018-07-24 Hospital of Stomatology, Sun Yat-sen University Bone-targeting exosome vector, CRISPR/Cas9 gene editing system, and application thereof
US20200172895A1 (en) 2017-05-25 2020-06-04 The General Hospital Corporation Using split deaminases to limit unwanted off-target base editor deamination
EP3630975B1 (en) 2017-05-26 2025-11-19 North Carolina State University Altered guide rnas for modulating cas9 activity and methods of use
CN107177625B (en) 2017-05-26 2021-05-25 Institute of Plant Protection, Chinese Academy of Agricultural Sciences A site-directed mutagenesis artificial vector system and site-directed mutagenesis method
CN107287245B (en) 2017-05-27 2020-03-17 Nanjing Agricultural University Construction method of Glrx1 gene knockout animal model based on CRISPR/Cas9 technology
CN107142272A (en) 2017-06-05 2017-09-08 Nanjing GenScript Biotech Co., Ltd. Method for controlling plasmid replication in Escherichia coli
CA3065946A1 (en) 2017-06-05 2018-12-13 Research Institute At Nationwide Children's Hospital Enhanced modified viral capsid proteins
WO2018226855A1 (en) 2017-06-06 2018-12-13 The General Hospital Corporation Engineered crispr-cas9 nucleases
CN107034218A (en) 2017-06-07 2017-08-11 Zhejiang University Targeting sgRNA and modification vector for pig APN gene editing, and preparation method and application thereof
CN107119071A (en) 2017-06-07 2017-09-01 Jiangsu Sanshu Biotechnology Co., Ltd. Method for reducing plant amylose content and application thereof
CN107177595A (en) 2017-06-07 2017-09-19 Zhejiang University Targeting sgRNA and modification vector for pig CD163 gene editing, and preparation method and application thereof
CN107236739A (en) 2017-06-12 2017-10-10 Shanghai Jieyi Biotechnology Co., Ltd. Method for specific knockout of the human CXCR4 gene using CRISPR/SaCas9
CN106987757A (en) 2017-06-12 2017-07-28 Suzhou Shuangjin Industrial Co., Ltd. Corrosion-resistant austenite-based alloy
CN107227352A (en) 2017-06-13 2017-10-03 Xi'an Medical University eGFP-based method for detecting GPR120 gene expression and application thereof
CN107083392B (en) 2017-06-13 2020-09-08 Institute of Pathogen Biology, Chinese Academy of Medical Sciences CRISPR/Cpf1 gene editing system and application thereof in mycobacteria
CN107245502B (en) 2017-06-14 2020-11-03 Wuhan Institute of Virology, Chinese Academy of Sciences CD2-binding protein (CD2AP) and its interacting proteins
CN107312798B (en) 2017-06-16 2020-06-23 Wuhan University CRISPR/Cas9 recombinant lentiviral vector containing a gRNA sequence specifically targeting the CCR5 gene, and application thereof
CN107099850B (en) 2017-06-19 2018-05-04 Northeast Agricultural University Method for constructing a CRISPR/Cas9 genome knockout library by enzymatic digestion of the genome
CN107446951B (en) 2017-06-20 2021-01-08 Wens Foodstuff Group Co., Ltd. Method for rapidly screening recombinant fowlpox virus through CRISPR/Cas9 system and application thereof
CN107266541B (en) 2017-06-20 2021-06-04 Shanghai University Corn transcription factor ZmbHLH167 and application thereof
CN107058328A (en) 2017-06-22 2017-08-18 Jiangsu Sanshu Biotechnology Co., Ltd. Method for increasing plant amylose content and application thereof
US10011849B1 (en) 2017-06-23 2018-07-03 Inscripta, Inc. Nucleic acid-guided nucleases
CN107099533A (en) 2017-06-23 2017-08-29 Northeast Agricultural University sgRNA guide sequences specifically targeting the pig IGFBP3 gene and application thereof
CN107227307A (en) 2017-06-23 2017-10-03 Northeast Agricultural University sgRNA guide sequences specifically targeting the pig IRS1 gene and application thereof
US9982279B1 (en) 2017-06-23 2018-05-29 Inscripta, Inc. Nucleic acid-guided nucleases
CN107119053A (en) 2017-06-23 2017-09-01 Northeast Agricultural University sgRNA guide sequences specifically targeting the pig MC4R gene and application thereof
CN107177631B (en) 2017-06-26 2020-11-24 China Agricultural University Method for knocking out the Slc22a2 gene in NRK cells using CRISPR-Cas9 technology
WO2019005886A1 (en) 2017-06-26 2019-01-03 The Broad Institute, Inc. Crispr/cas-cytidine deaminase based compositions, systems, and methods for targeted nucleic acid editing
AU2018290843B2 (en) 2017-06-26 2025-04-24 Massachusetts Institute Of Technology CRISPR/Cas-adenine deaminase based compositions, systems, and methods for targeted nucleic acid editing
CN107217075B (en) 2017-06-28 2021-07-02 First Affiliated Hospital of Xi'an Jiaotong University School of Medicine A method for constructing EPO gene knockout zebrafish animal model, primers, plasmids and preparation method
CN107356793A (en) 2017-07-01 2017-11-17 Hefei Dongjiu Electric Co., Ltd. Fireproof electricity meter box
CN107312793A (en) 2017-07-05 2017-11-03 Horticultural Crops Research Institute, Xinjiang Academy of Agricultural Sciences Cas9-mediated tomato DNA editing vector and its application
CN107190006A (en) 2017-07-07 2017-09-22 Affiliated Hospital of Nantong University sgRNA targeting the IGF-IR gene and its application
WO2019010384A1 (en) 2017-07-07 2019-01-10 The Broad Institute, Inc. Methods for designing guide sequences for guided nucleases
CN107236741A (en) 2017-07-19 2017-10-10 Fifth Affiliated Hospital of Guangzhou Medical University gRNA and method for knocking out the TCR alpha chain of wild-type T cells
CN107190008A (en) 2017-07-19 2017-09-22 Suzhou Jisai Gene Sequencing Technology Co., Ltd. CRISPR/Cas9-based method for capturing genomic target sequences and its application in high-throughput sequencing
CN107400677B (en) 2017-07-19 2020-05-22 Jiangnan University Bacillus licheniformis genome editing vector based on CRISPR-Cas9 system and preparation method thereof
CN107354156B (en) 2017-07-19 2021-02-09 Fifth Affiliated Hospital of Guangzhou Medical University gRNA and method for knocking out the TCR beta chain of wild-type T cells
CN107267515B (en) 2017-07-28 2020-08-25 Children's Hospital of Chongqing Medical University CRISPR/Cas9 targeted knockout of human CNE10 gene and its specific gRNA
CN107418974A (en) 2017-07-28 2017-12-01 Xinxiang Medical University Method for rapidly obtaining stable CRISPR/Cas9 gene knockout cell lines by monoclonal cell sorting
CN107384922A (en) 2017-07-28 2017-11-24 Children's Hospital of Chongqing Medical University CRISPR/Cas9 targeted knockout of the human CNE9 gene and its specific gRNA
WO2019023680A1 (en) 2017-07-28 2019-01-31 President And Fellows Of Harvard College Methods and compositions for evolving base editors using phage-assisted continuous evolution (pace)
CN107435069A (en) 2017-07-28 2017-12-05 Xinxiang Medical University Rapid method for detecting CRISPR/Cas9 gene knockout in cell lines
CN107446954A (en) 2017-07-28 2017-12-08 Xinxiang Medical University Method for preparing a T-cell gene-deletion model in SD rats
CN107435051B (en) 2017-07-28 2020-06-02 Xinxiang Medical University Cell line gene knockout method for rapidly obtaining large fragment deletion through CRISPR/Cas9 system
BR112019028146A2 (en) 2017-07-31 2020-07-07 Sigma-Aldrich Co. Llc Synthetic guide RNA for CRISPR/Cas activator systems
CN107217042B (en) 2017-07-31 2020-03-06 Jiangsu Dongkang Biomedical Technology Co., Ltd. Genetically engineered cell line for producing afucosylated proteins and method for establishing the same
CN107446922A (en) 2017-08-03 2017-12-08 Wuxi No. 2 People's Hospital gRNA sequences for knocking out the hepcidin gene in a human osteoblast cell line and method of use thereof
CN107502618B (en) 2017-08-08 2021-03-12 Institute of Microbiology, Chinese Academy of Sciences Controllable vector elimination method and easy-to-use CRISPR-Cas9 tool
CN107312785B (en) 2017-08-09 2019-12-06 Sichuan Agricultural University Application of OsKTN80b Gene in Reducing Plant Height of Rice
CN107446923B (en) 2017-08-13 2019-12-31 Institute for Disease Prevention and Control of the Chinese People's Liberation Army rAAV8-CRISPR-SaCas9 system and its application in the preparation of drugs for treating hepatitis B
CN107365804B (en) 2017-08-13 2019-12-20 Institute for Disease Prevention and Control of the Chinese People's Liberation Army Method for packaging CRISPR-Cas9 system by using temperate phage vector
CN107384926B (en) 2017-08-13 2020-06-26 Institute for Disease Prevention and Control of the Chinese People's Liberation Army CRISPR-Cas9 system for targeted removal of bacterial drug-resistant plasmids and application
CN107815463A (en) 2017-08-15 2018-03-20 Southwest University Method for establishing a CRISPR/Cas9-mediated editing system for the miR167 precursor sequence
CN108034656A (en) 2017-08-16 2018-05-15 Biotechnology and Nuclear Technology Research Institute, Sichuan Academy of Agricultural Sciences sgRNA and CRISPR/Cas9 vector related to the rice bronze-glume trait, vector construction, and application
CN107446924B (en) 2017-08-16 2020-01-14 South China Botanical Garden, Chinese Academy of Sciences Kiwi fruit gene AcPDS editing vector based on CRISPR-Cas9 and construction method and application thereof
CN107384894B (en) 2017-08-21 2019-10-22 South China Normal University A method for efficient delivery of CRISPR/Cas9 on functionalized graphene oxide for gene editing
CN107299114B (en) 2017-08-23 2021-08-27 CAS Center for Excellence in Molecular Plant Sciences Efficient yeast chromosome fusion method
CN107557393B (en) 2017-08-23 2020-05-08 Shanghai Institute of Applied Physics, Chinese Academy of Sciences A magnetic nanomaterial-mediated CRISPR/Cas9 intracellular delivery system, preparation method and application thereof
CN107312795A (en) 2017-08-24 2017-11-03 Zhejiang Academy of Agricultural Sciences Gene editing method for creating pink-fruited tomato with the CRISPR/Cas9 system
CN107488649A (en) 2017-08-25 2017-12-19 Southern Medical University Fusion protein of Cpf1 and the p300 core domain, corresponding DNA-targeted activation system, and application thereof
CN107460196A (en) 2017-08-25 2017-12-12 Tongji University Construction method and application of an immunodeficient mouse model
CN107541525B (en) 2017-08-26 2021-12-10 Inner Mongolia University Method for CRISPR/Cas9-mediated site-directed knock-in of the goat Tbeta4 gene
CN107446932B (en) 2017-08-29 2020-02-21 Jiangxi Academy of Agricultural Sciences Gene for controlling male reproductive development of rice and application thereof
EP3676376B1 (en) 2017-08-30 2025-01-15 President and Fellows of Harvard College High efficiency base editors comprising gam
WO2019041296A1 (en) 2017-09-01 2019-03-07 ShanghaiTech University Base editing system and method
CN107519492B (en) 2017-09-06 2019-01-25 Wuhan Maiteweier Biotechnology Co., Ltd. Knockout of miR-3187-3p using CRISPR technology in coronary atherosclerotic heart disease
CN107641631A (en) 2017-09-07 2018-01-30 Zhejiang University of Technology A CRISPR/Cas9 system-based method for gene knockout in Escherichia coli mediated by chemical transformation
CN107362372B (en) 2017-09-07 2019-01-11 Foshan Boruoen Biotechnology Co., Ltd. Application of CRISPR technology in coronary atherosclerotic heart disease
US11649442B2 (en) 2017-09-08 2023-05-16 The Regents Of The University Of California RNA-guided endonuclease fusion polypeptides and methods of use thereof
CN107502608B (en) 2017-09-08 2020-10-16 Sun Yat-sen University Construction method and application of sgRNA for knocking out human ALDH2 gene and ALDH2 gene deletion cell line
CN107557455A (en) 2017-09-15 2018-01-09 National Center for Nanoscience and Technology CRISPR-Cas13a-based method for detecting specific nucleic acid fragments
CN107557390A (en) 2017-09-18 2018-01-09 Jiangnan University Method for screening high-expression sites in Chinese hamster ovary (CHO) cell lines
CN107475300B (en) 2017-09-18 2020-04-21 Shanghai Tongji Hospital Construction method and application of Ifit3-eKO1 knockout mouse animal model
US11624130B2 (en) 2017-09-18 2023-04-11 President And Fellows Of Harvard College Continuous evolution for stabilized proteins
CN107557373A (en) 2017-09-19 2018-01-09 Anhui University Gene editing method based on the cas3 gene of the type I-B CRISPR-Cas system
CN107557378B (en) 2017-09-19 2025-04-25 Anhui University A eukaryotic gene editing method based on the gene cas7-3 in the type I CRISPR-Cas system
CN107630042A (en) 2017-09-19 2018-01-26 Anhui University Prokaryotic gene editing method based on the cas4 gene of the type I Cas system
CN107523583A (en) 2017-09-19 2017-12-29 Anhui University Prokaryotic gene editing method based on the gene cas5-3 of the type I CRISPR-Cas system
CN107630041A (en) 2017-09-19 2018-01-26 Anhui University Eukaryotic gene editing method based on the type I-B Cas system of Streptomyces virginiae IBL14
CN107619837A (en) 2017-09-20 2018-01-23 Northwest A&F University Method for obtaining transgenic bovine fetal fibroblasts by Cas9 nuclease-mediated site-directed insertion of Ipr1
CN107513531B (en) 2017-09-21 2020-02-21 Wuxi Maternal and Child Health Hospital gRNA target sequence for endogenously over-expressing lncRNA-XIST and application thereof
CN107686848A (en) 2017-09-26 2018-02-13 Sun Yat-sen Memorial Hospital, Sun Yat-sen University Single-plasmid vector for stable knockout by a transposon combined with the CRISPR/Cas9 system, and its application
CN107760652A (en) 2017-09-29 2018-03-06 South China University of Technology Caco-2 cell model with CRISPR/Cas9-mediated targeted knockout of drug transporters, and method thereof
CN107557394A (en) 2017-09-29 2018-01-09 Nanjing Drum Tower Hospital Method for reducing the off-target rate of CRISPR/Cas9-mediated embryo gene editing
CN107828794A (en) 2017-09-30 2018-03-23 Shanghai Agrobiological Gene Center Method for creating mutants of the rice salt-tolerance gene OsRR22, the encoded amino acid sequence, plants, and the mutants
CN107630006B (en) 2017-09-30 2020-09-11 Shandong Xingrui Biotechnology Co., Ltd. Method for preparing T cells with double knockout of the TCR and HLA genes
CN107760663A (en) 2017-09-30 2018-03-06 Xinjiang University Cloning of the chufa pepc gene and construction and application of its expression vector
CN107604003A (en) 2017-10-10 2018-01-19 Southern Medical University Genome knockout kit based on a linearized CRISPR-Cas9 lentiviral vector and its application
CN107557381A (en) 2017-10-12 2018-01-09 Nanjing Agricultural University Establishment and application of a CRISPR/Cas9 gene editing system for Chinese cabbage
EP3694530A4 (en) 2017-10-12 2021-06-30 Wave Life Sciences Ltd. Oligonucleotide compositions and methods thereof
CN107474129B (en) 2017-10-12 2018-10-19 Jiangxi Hanshi United Stem Cell Technology Co., Ltd. Method for specifically enhancing the gene editing efficiency of the CRISPR-Cas system
CN108102940B (en) 2017-10-12 2021-07-13 Sinopec Shanghai Engineering Co., Ltd. Industrial Saccharomyces cerevisiae strain with the XKS1 gene knocked out using the CRISPR/Cas9 system, and its construction method
CN108103586A (en) 2017-10-13 2018-06-01 ShanghaiTech University CRISPR/Cas9 random library and its construction and application
CN107586779B (en) 2017-10-14 2018-08-28 Tianjin Jinshi Biotechnology Co., Ltd. Method for knocking out the CASP3 gene in mesenchymal stem cells using the CRISPR-Cas system
CN107619829B (en) 2017-10-14 2018-08-24 Nanjing Pinggang Biotechnology Co., Ltd. Method for knocking out the GINS2 gene in mesenchymal stem cells using the CRISPR-Cas system
CN107523567A (en) 2017-10-16 2017-12-29 Zunyi Medical College Method for constructing an esophageal cancer cell line with the human ezrin gene enhancer knocked out
KR20250107288A (en) 2017-10-16 2025-07-11 The Broad Institute, Inc. Uses of adenosine base editors
CN107760715B (en) 2017-10-17 2021-12-10 Zhang Yesheng Transgenic vector and construction method and application thereof
CN107937427A (en) 2017-10-20 2018-04-20 Guangdong University of Petrochemical Technology CRISPR/Cas9-based method for constructing homologous repair vectors
US20210130800A1 (en) 2017-10-23 2021-05-06 The Broad Institute, Inc. Systems, methods, and compositions for targeted nucleic acid editing
CN107893086B (en) 2017-10-24 2021-09-03 Wuhan Botanical Garden, Chinese Academy of Sciences Method for rapidly constructing Cas9 binary expression vector library of paired sgRNAs
CN111712248B (en) 2017-11-02 2024-07-26 University of Iowa Research Foundation Method for rescuing stop codons via genetic code reassignment using ACE-tRNA
CN107760684B (en) 2017-11-03 2018-09-25 Shanghai Ladefangsi Biotechnology Co., Ltd. Method for knocking out the RBM17 gene in mesenchymal stem cells using the CRISPR-Cas system
WO2019090367A1 (en) 2017-11-05 2019-05-09 Aveterra Corp Method and apparatus for automated composting of organic wastes
CN107858346B (en) 2017-11-06 2020-06-16 Tianjin University Method for knocking out a Saccharomyces cerevisiae chromosome
CN107794276A (en) 2017-11-08 2018-03-13 Institute of Crop Sciences, Chinese Academy of Agricultural Sciences Fast and efficient CRISPR-mediated method and system for site-directed replacement of gene fragments or alleles in crops
EP3707252A1 (en) 2017-11-10 2020-09-16 Novozymes A/S Temperature-sensitive cas9 protein
CN107630043A (en) 2017-11-14 2018-01-26 Jilin University Method for establishing a Gadd45a knockout rabbit model using gene knockout technology
CN108441519A (en) 2017-11-15 2018-08-24 China Agricultural University Method for improving homologous repair efficiency in CRISPR/Cas9 gene editing
CN107858373B (en) 2017-11-16 2020-03-17 Shandong Provincial Qianfoshan Hospital Construction method of a mouse model with endothelial cell-specific conditional knockout of the CCR5 gene
CN107893075A (en) 2017-11-17 2018-04-10 OBiO Technology (Shanghai) Corp., Ltd. CRISPR-Cas9 targeted knockout of the RITA gene in human colon cancer cells and its specific sgRNA
CN108192956B (en) 2017-11-17 2021-06-01 Southeast University A DNA detection and analysis method based on Cas9 nuclease and its application
CN107828874B (en) 2017-11-20 2020-10-16 Southeast University DNA detection and typing method based on CRISPR and application thereof
CN107904261A (en) 2017-11-21 2018-04-13 Fuzhou University Preparation of a CRISPR/Cas9 nano-gene system and its application in transfection
CN107653256A (en) 2017-11-21 2018-02-02 Yunnan Academy of Tobacco Agricultural Sciences Tobacco polyphenol oxidase gene NtPPO1 and its directed mutagenesis method and application
CN107893076A (en) 2017-11-23 2018-04-10 OBiO Technology (Shanghai) Corp., Ltd. CRISPR-Cas9 targeted knockout of the RASSF2 gene in human breast cancer cells and its specific sgRNA
CN107937501A (en) 2017-11-24 2018-04-20 Anhui Normal University Fast and convenient method for screening CRISPR/Cas gene-editing-positive individuals
CN107937432B (en) 2017-11-24 2020-05-01 Huazhong Agricultural University A Genome Editing Method Based on CRISPR System and Its Application
CN107828738A (en) 2017-11-28 2018-03-23 Xinxiang Medical University DNA methyltransferase-deficient Chinese hamster ovary cell line and its preparation method and application
CN107988256B (en) 2017-12-01 2020-07-28 Jinan University Human huntingtin gene knock-in recombinant vector and its construction method and its application in the construction of model pigs
CN108570479B (en) 2017-12-06 2020-04-03 Inner Mongolia University Method for CRISPR/Cas9-mediated site-directed knock-in of the VEGF gene in cashmere goats
CN108148873A (en) 2017-12-06 2018-06-12 Southern Medical University CAV-1 gene-deletion zebrafish and preparation method thereof
CN108148835A (en) 2017-12-07 2018-06-12 OBiO Technology (Shanghai) Corp., Ltd. CRISPR-Cas9 targeted knockout of the SLC30A1 gene and its specific sgRNA
CN108315330B (en) 2017-12-07 2020-05-19 First Hospital of Jiaxing sgRNA of the CRISPR-Cas9 system specifically targeting the human RSPO2 gene, knockout method, and application
CN107974466B (en) 2017-12-07 2020-09-29 Institute of Hydrobiology, Chinese Academy of Sciences A Sturgeon CRISPR/Cas9 Gene Editing Method
CN108251423B (en) 2017-12-07 2020-11-06 First Hospital of Jiaxing sgRNA of the CRISPR-Cas9 system specifically targeting the human RSPO2 gene, activation method, and application
CN108103090B (en) 2017-12-12 2021-06-15 First Affiliated Hospital of Sun Yat-sen University RNA Cas9-m6A modified vector system targeting RNA methylation and its construction method and application
CN107828826A (en) 2017-12-12 2018-03-23 Nankai University In vitro method for efficiently obtaining neural stem cells (NSCs)
CN108103098B (en) 2017-12-14 2020-07-28 South China University of Technology Cell model for in vitro evaluation of compound skin sensitization and its construction method
JP2021506251A (en) 2017-12-14 2021-02-22 CRISPR Therapeutics AG Novel RNA-programmable endonuclease systems and their use in genome editing and other applications
WO2019118949A1 (en) 2017-12-15 2019-06-20 The Broad Institute, Inc. Systems and methods for predicting repair outcomes in genetic engineering
CN107988268A (en) 2017-12-18 2018-05-04 Hunan Normal University Method for breeding tcf25 gene-deletion zebrafish by gene knockout
CN108018316A (en) 2017-12-20 2018-05-11 Hunan Normal University Method for breeding rmnd5b gene-deletion zebrafish by gene knockout
WO2019123430A1 (en) 2017-12-21 2019-06-27 Casebia Therapeutics Llp Materials and methods for treatment of usher syndrome type 2a and/or non-syndromic autosomal recessive retinitis pigmentosa (arrp)
CN108048466B (en) 2017-12-21 2020-02-07 First Hospital of Jiaxing crRNA of the CRISPR-Cas13a system specifically targeting the human RSPO2 gene, system, and application
WO2019126709A1 (en) 2017-12-22 2019-06-27 The Broad Institute, Inc. Cas12b systems, methods, and compositions for targeted dna base editing
RU2652899C1 (en) 2017-12-28 2018-05-03 Federal Budgetary Institution of Science "Central Research Institute of Epidemiology" of the Federal Service for Surveillance on Consumer Rights Protection and Human Wellbeing (FBUN TsNII Epidemiologii Rospotrebnadzora) Guide RNAs for suppressing hepatitis B virus replication and eliminating hepatitis B virus from host cells
CN107893080A (en) 2017-12-29 2018-04-10 Jiangsu Academy of Agricultural Sciences sgRNA targeting the rat Inhba gene and its application
CN107988246A (en) 2018-01-05 2018-05-04 Shantou University Medical College Gene knockout vector and zebrafish glioma model based thereon
CN107988229B (en) 2018-01-05 2020-01-07 Institute of Crop Sciences, Chinese Academy of Agricultural Sciences A method for obtaining tiller-altered rice by modifying the OsTAC1 gene using CRISPR-Cas
CN108103092B (en) 2018-01-05 2021-02-12 Institute of Crop Sciences, Chinese Academy of Agricultural Sciences System for modifying OsHPH gene by using CRISPR-Cas system to obtain dwarf rice and application thereof
CN108559760A (en) 2018-01-09 2018-09-21 Shaanxi Normal University Method for establishing luciferase knock-in cell lines based on CRISPR targeted genome modification technology
WO2019139951A1 (en) 2018-01-09 2019-07-18 The United States Of America As Represented By The Secretary Of The Department Of Health And Human Services Detecting protein interaction sites in nucleic acids
CN108559730B (en) 2018-01-12 2021-09-24 Fourth Military Medical University of the Chinese People's Liberation Army An experimental method for constructing Hutat2:Fc gene knock-in monocytes using CRISPR/Cas9 technology
CN108148837A (en) 2018-01-12 2018-06-12 Nanjing Medical University ApoE-CRISPR/Cas9 vector and its application in knocking out the ApoE gene
US11268092B2 (en) 2018-01-12 2022-03-08 GenEdit, Inc. Structure-engineered guide RNA
CN108251451A (en) 2018-01-16 2018-07-06 Southwest University CRISPR/Cas9-gRNA targeting sequence pair and plasmid for HTT, and application thereof
CN108251452A (en) 2018-01-17 2018-07-06 Yangzhou University Transgenic zebrafish expressing the Cas9 gene and its construction method and application
KR102839528B1 (en) 2018-01-23 2025-07-29 Institute for Basic Science Extended single guide RNA and uses thereof
CN208034188U (en) 2018-02-09 2018-11-02 Hengyang Zhenyang Auto Parts Co., Ltd. Quick-positioning fixture for hole machining
CN108359712B (en) 2018-02-09 2020-06-26 Agricultural Biological Gene Research Center, Guangdong Academy of Agricultural Sciences Method for rapidly and efficiently screening sgRNA target DNA sequences
CN108559745A (en) 2018-02-10 2018-09-21 OBiO Technology (Shanghai) Corp., Ltd. Method for improving B16F10 cell transfection efficiency based on CRISPR-Cas9 technology
CN108359691B (en) 2018-02-12 2021-09-28 Chongqing Institute of Green and Intelligent Technology, Chinese Academy of Sciences Kit and method for knocking out abnormal mitochondrial DNA by mito-CRISPR/Cas9 system
CN108486145A (en) 2018-02-12 2018-09-04 Institute of Genetics and Developmental Biology, Chinese Academy of Sciences Efficient CRISPR/Cas9-based homologous recombination method for plants
WO2019161251A1 (en) 2018-02-15 2019-08-22 The Broad Institute, Inc. Cell data recorders and uses thereof
CN109021111B (en) 2018-02-23 2021-12-07 ShanghaiTech University Gene base editor
US20220307001A1 (en) 2018-02-27 2022-09-29 President And Fellows Of Harvard College Evolved cas9 variants and uses thereof
CN108396027A (en) 2018-02-27 2018-08-14 OBiO Technology (Shanghai) Corp., Ltd. CRISPR-Cas9 targeted knockout of the DEAF1 gene in human colon cancer cells and its specific sgRNA
CN108486159B (en) 2018-03-01 2021-10-22 Affiliated Hospital of Nantong University A CRISPR-Cas9 system for knocking out GRIN2D gene and its application
CN108410906A (en) 2018-03-05 2018-08-17 Huaihai Institute of Technology CRISPR/Cpf1 gene editing method applicable to marine shellfish mitochondrial genomes
CN108342480B (en) 2018-03-05 2022-03-01 Beijing Hospital Gene variation detection quality control substance and preparation method thereof
CN108410907B (en) 2018-03-08 2021-08-27 Hunan Agricultural University Method for realizing HMGCR gene knockout based on CRISPR/Cas9 technology
CN108410911B (en) 2018-03-09 2021-08-20 Guangxi Medical University LMNA knockout cell line based on CRISPR/Cas9 technology
CN108486108B (en) 2018-03-16 2020-10-09 South China Agricultural University A cell line knocking out human HMGB1 gene and its application
CN108486146B (en) 2018-03-16 2021-02-19 Institute of Crop Sciences, Chinese Academy of Agricultural Sciences Application of LbCpf1-RR mutant in CRISPR/Cpf1 system in plant gene editing
CN108384784A (en) 2018-03-23 2018-08-10 Guangxi Medical University Method for knocking out the Endoglin gene using CRISPR/Cas9 technology
CA3094828A1 (en) 2018-03-23 2019-09-26 Massachusetts Eye And Ear Infirmary Crispr/cas9-mediated exon-skipping approach for ush2a-associated usher syndrome
CN108504685A (en) 2018-03-27 2018-09-07 宜明细胞生物科技有限公司 A method of utilizing CRISPR/Cas9 system homologous recombination repair IL-2RG dcc genes
CN108410877A (en) 2018-03-27 2018-08-17 和元生物技术(上海)股份有限公司 The sgRNA of CRISPR-Cas9 targeting knock outs people's cell SANIL1 genes and its specificity
CN108486234B (en) 2018-03-29 2022-02-11 东南大学 A kind of CRISPR typing PCR method and its application
CN108424931A (en) 2018-03-29 2018-08-21 内蒙古大学 The method that CRISPR/Cas9 technologies mediate goat VEGF Gene targetings
CN108486154A (en) 2018-04-04 2018-09-04 福州大学 A kind of construction method of sialidase gene knock-out mice model and its application
CN108486111A (en) 2018-04-04 2018-09-04 山西医科大学 The method and its specificity sgRNA of CRISPR-Cas9 targeting knock out people's SMYD3 genes
CN108504693A (en) 2018-04-04 2018-09-07 首都医科大学附属北京朝阳医院 The O-type that T synthase genes structure is knocked out using Crispr technologies glycosylates abnormal colon carcinoma cell line
CN108441520B (en) 2018-04-04 2020-07-31 苏州大学 Conditional gene knockout method constructed by CRISPR/Cas9 system
CN108753772B (en) 2018-04-04 2020-10-30 南华大学 Construction method of human neuroblastoma cell line with CAPNS1 gene knocked out based on CRISPR/Cas technology
CN108504657B (en) 2018-04-12 2019-06-14 中南民族大学 The method for knocking out HEK293T cell KDM2A gene using CRISPR-CAS9 technology
CN108588182B (en) 2018-04-13 2025-11-28 武汉中科先进技术研究院有限公司 Isothermal amplification and detection technology based on CRISPR-chain substitution
CN108753817A (en) 2018-04-13 2018-11-06 北京华伟康信生物科技有限公司 The enhanced cell for enhancing the method for the anti-cancer ability of cell and being obtained using this method
JP2021521889A (en) 2018-04-17 2021-08-30 アプライド ステムセル,インコーポレイテッド Compositions and Methods for Treating Spinal Muscular Atrophy
CN108823248A (en) 2018-04-20 2018-11-16 中山大学 A method of Luchuan pigs CD163 gene is edited using CRISPR/Cas9
CN108753832A (en) 2018-04-20 2018-11-06 中山大学 A method of editing Large White CD163 genes using CRISPR/Cas9
CN108588071A (en) 2018-04-25 2018-09-28 和元生物技术(上海)股份有限公司 The sgRNA of CRISPR-Cas9 targeting knock out people colon-cancer cell CNR1 genes and its specificity
CN108588128A (en) 2018-04-26 2018-09-28 南昌大学 A kind of construction method of high efficiency soybean CRISPR/Cas9 systems and application
CN108707621B (en) 2018-04-26 2021-02-12 中国农业科学院作物科学研究所 CRISPR/Cpf1 system-mediated homologous recombination method taking RNA transcript as repair template
CN108546712B (en) 2018-04-26 2020-08-07 中国农业科学院作物科学研究所 Method for realizing homologous recombination of target gene in plant by using CRISPR/LbCpf1 system
CN108642053A (en) 2018-04-28 2018-10-12 和元生物技术(上海)股份有限公司 The sgRNA of CRISPR-Cas9 targeting knock out people colon-cancer cell PPP1R1C genes and its specificity
CN108611364A (en) 2018-05-03 2018-10-02 南京农业大学 A kind of preparation method of non-transgenic CRISPR mutant
CN108588123A (en) 2018-05-07 2018-09-28 南京医科大学 CRISPR/Cas9 carriers combine the application in the blood product for preparing gene knock-out pig
CN121555430A (en) 2018-05-11 2026-02-24 比姆医疗股份有限公司 Method for substituting pathogenic amino acids using programmable base editor system
CN108610399B (en) 2018-05-14 2019-09-27 河北万玛生物医药有限公司 The method that specificity enhancing CRISPR-CAS system carries out gene editing efficiency in epidermal stem cells
CN108546717A (en) 2018-05-15 2018-09-18 吉林大学 The method that antisense lncRNA mediates cis regulatory inhibition expression of target gene
CN108624622A (en) 2018-05-16 2018-10-09 湖南艾佳生物科技股份有限公司 A kind of genetically engineered cell strain that can secrete mouse interleukin -6 based on CRISPR-Cas9 systems structure
CN108546718B (en) 2018-05-16 2021-07-09 康春生 Application of crRNA-mediated CRISPR/Cas13a gene editing system in tumor cells
CN108642055B (en) 2018-05-17 2021-12-03 吉林大学 sgRNA capable of effectively editing pig miR-17-92 gene cluster
CN108642077A (en) 2018-05-18 2018-10-12 江苏省农业科学院 Method based on CRISPR/Cas9 gene editing technology selection and breeding mung bean sterile mutants and special gRNA
CN108642090A (en) 2018-05-18 2018-10-12 中国人民解放军总医院 Method and the application that Nogo-B knocks out pattern mouse are obtained based on CRISPR/Cas9 technologies
CN108642078A (en) 2018-05-18 2018-10-12 江苏省农业科学院 Method based on CRISPR/Cas9 gene editing technology selection and breeding Mung Bean Bloomings pollination mutant and special gRNA
CN108559732A (en) 2018-05-21 2018-09-21 陕西师范大学 The method for establishing KI-T2A-luciferase cell lines based on CRISPR/Cas9 targeted genomic modification technologies
CN108707620A (en) 2018-05-22 2018-10-26 西北农林科技大学 A kind of Gene drive carriers and construction method
EP3797160A1 (en) 2018-05-23 2021-03-31 The Broad Institute Inc. Base editors and uses thereof
US11117812B2 (en) 2018-05-24 2021-09-14 Aqua-Aerobic Systems, Inc. System and method of solids conditioning in a filtration system
CN108690844B (en) 2018-05-25 2021-10-15 西南大学 CRISPR/Cas9-gRNA targeting sequence pair, plasmid and HD cell model for HTT
CN108707628B (en) 2018-05-28 2021-11-23 上海海洋大学 Preparation method of zebra fish notch2 gene mutant
CN108823249A (en) 2018-05-28 2018-11-16 上海海洋大学 The method of CRISPR/Cas9 building notch1a mutant zebra fish
CN108707629A (en) 2018-05-28 2018-10-26 上海海洋大学 The preparation method of zebra fish notch1b gene mutation bodies
CN108707604B (en) 2018-05-30 2019-07-23 江西汉氏联合干细胞科技有限公司 CNE10 gene knockout is carried out using CRISPR-Cas system in epidermal stem cells
CN108753835A (en) 2018-05-30 2018-11-06 中山大学 A method of editing pig BMP15 genes using CRISPR/Cas9
CN108753836B (en) 2018-06-04 2021-10-12 北京大学 Gene regulation or editing system utilizing RNA interference mechanism
CN108715850B (en) 2018-06-05 2020-10-23 艾一生命科技(广东)有限公司 GING2 gene knockout in epidermal stem cells by using CRISPR-Cas system
BR112020024863A2 (en) 2018-06-05 2022-02-01 Lifeedit Inc RNA-guided nucleases, active fragments and variants thereof and methods of use
CN108753813B (en) 2018-06-08 2021-08-24 中国水稻研究所 Methods of obtaining marker-free transgenic plants
CN108753783A (en) 2018-06-13 2018-11-06 上海市同济医院 The construction method of Sqstm1 full genome knock-out mice animal models and application
WO2019241649A1 (en) 2018-06-14 2019-12-19 President And Fellows Of Harvard College Evolution of cytidine deaminases
CN108728486A (en) 2018-06-20 2018-11-02 江苏省农业科学院 A kind of construction method of eggplant CRISPR/Cas9 gene knockout carriers and application
CN108841845A (en) 2018-06-21 2018-11-20 广东石油化工学院 A kind of CRISPR/Cas9 carrier and its construction method with selection markers
CN108893529A (en) 2018-06-25 2018-11-27 武汉博杰生物医学科技有限公司 A kind of crRNA being mutated based on CRISPR technology specific detection people KRAS gene 2 and 3 exons
CN108866093B (en) 2018-07-04 2021-07-09 广东三杰牧草生物科技有限公司 Method for performing site-directed mutagenesis on alfalfa gene by using CRISPR/Cas9 system
CN108795902A (en) 2018-07-05 2018-11-13 深圳三智医学科技有限公司 A kind of safe and efficient CRISPR/Cas9 gene editings technology
CN108913714A (en) 2018-07-05 2018-11-30 江西省超级水稻研究发展中心 A method of BADH2 gene, which is knocked out, using CRISPR/Cas9 system formulates fragrant rice
US12522807B2 (en) 2018-07-09 2026-01-13 The Broad Institute, Inc. RNA programmable epigenetic RNA modifiers and uses thereof
CN108913691B (en) 2018-07-16 2020-09-01 山东华御生物科技有限公司 Card3 gene knockout in epidermal stem cells by using CRISPR-Cas system
CN108913664B (en) 2018-07-20 2020-09-04 嘉兴学院 Method for knocking out CFP1 gene in ovarian cancer cell by CRISPR/Cas9 gene editing method
CN108853133A (en) 2018-07-25 2018-11-23 福州大学 A kind of preparation method of PAMAM and CRISPR/Cas9 System reorganization plasmid delivery nanoparticle
CN108823291B (en) 2018-07-25 2022-04-12 领航医学科技(深圳)有限公司 Specific nucleic acid fragment quantitative detection method based on CRISPR technology
CN113348245A (en) 2018-07-31 2021-09-03 博德研究所 Novel CRISPR enzymes and systems
CN108913717A (en) 2018-08-01 2018-11-30 河南农业大学 A method of using CRISPR/Cas9 system to rice PHYB site-directed point mutation
AU2019316094B2 (en) 2018-08-03 2026-02-19 Beam Therapeutics Inc. Multi-effector nucleobase editors and methods of using same to modify a nucleic acid target sequence
EP3841203A4 (en) 2018-08-23 2022-11-02 The Broad Institute Inc. CAS9 VARIANTS WITH NON-CANONICAL PAM SPECIFICITIES AND USES OF THEM
CN113286880A (en) 2018-08-28 2021-08-20 旗舰先锋创新Vi有限责任公司 Methods and compositions for regulating a genome
US20240173430A1 (en) 2018-09-05 2024-05-30 The Broad Institute, Inc. Base editing for treating Hutchinson-Gilford progeria syndrome
BR112021007123A2 (en) 2018-10-15 2021-08-10 University Of Massachusetts Programmable DNA base editing by Nme2Cas9-deaminase fusion proteins
WO2020086908A1 (en) 2018-10-24 2020-04-30 The Broad Institute, Inc. Constructs for improved HDR-dependent genomic editing
WO2020092453A1 (en) 2018-10-29 2020-05-07 The Broad Institute, Inc. Nucleobase editors comprising GeoCas9 and uses thereof
US20220282275A1 (en) 2018-11-15 2022-09-08 The Broad Institute, Inc. G-to-T base editors and uses thereof
WO2020102709A1 (en) 2018-11-16 2020-05-22 The Regents Of The University Of California Compositions and methods for delivering CRISPR/Cas effector polypeptides
CN109517841B (en) 2018-12-05 2020-10-30 华东师范大学 Composition, method and application for nucleotide sequence modification
US12351837B2 (en) 2019-01-23 2025-07-08 The Broad Institute, Inc. Supernegatively charged proteins and uses thereof
AU2020215232A1 (en) 2019-01-28 2021-08-26 Proqr Therapeutics Ii B.V. RNA-editing oligonucleotides for the treatment of Usher syndrome
CN113396220A (en) 2019-01-29 2021-09-14 华盛顿大学 Method for gene editing
EP3918077A4 (en) 2019-01-31 2023-03-29 Beam Therapeutics, Inc. NUCLEOBASE EDITORS WITH REDUCED OFF-TARGET DEAMINATION AND METHODS OF USING THEM TO MODIFY A NUCLEOBASE TARGET SEQUENCE
WO2020160481A1 (en) 2019-02-01 2020-08-06 The General Hospital Corporation Targetable 3'-overhang nuclease fusion proteins
WO2020180975A1 (en) 2019-03-04 2020-09-10 President And Fellows Of Harvard College Highly multiplexed base editing
WO2020181180A1 (en) 2019-03-06 2020-09-10 The Broad Institute, Inc. A:T to C:G base editors and uses thereof
WO2020181202A1 (en) 2019-03-06 2020-09-10 The Broad Institute, Inc. A:T to T:A base editing through adenine deamination and oxidation
WO2020181195A1 (en) 2019-03-06 2020-09-10 The Broad Institute, Inc. T:A to A:T base editing through adenine excision
WO2020181178A1 (en) 2019-03-06 2020-09-10 The Broad Institute, Inc. T:A to A:T base editing through thymine alkylation
WO2020181193A1 (en) 2019-03-06 2020-09-10 The Broad Institute, Inc. T:A to A:T base editing through adenosine methylation
JP7657726B2 (en) 2019-03-19 2025-04-07 ザ ブロード インスティテュート,インコーポレーテッド Methods and compositions for editing nucleotide sequences
US20220195514A1 (en) 2019-03-29 2022-06-23 The Broad Institute, Inc. Construct for continuous monitoring of live cells
US20220204975A1 (en) 2019-04-12 2022-06-30 President And Fellows Of Harvard College System for genome editing
US12473543B2 (en) 2019-04-17 2025-11-18 The Broad Institute, Inc. Adenine base editors with reduced off-target effects
WO2020236982A1 (en) 2019-05-20 2020-11-26 The Broad Institute, Inc. AAV delivery of nucleobase editors
AU2020288623A1 (en) 2019-06-06 2022-01-06 Inscripta, Inc. Curing for recursive nucleic acid-guided cell editing
EP4010474A1 (en) 2019-08-08 2022-06-15 The Broad Institute, Inc. Base editors with diversified targeting scope
WO2021030666A1 (en) 2019-08-15 2021-02-18 The Broad Institute, Inc. Base editing by transglycosylation
US12435330B2 (en) 2019-10-10 2025-10-07 The Broad Institute, Inc. Methods and compositions for prime editing RNA
US20230086199A1 (en) 2019-11-26 2023-03-23 The Broad Institute, Inc. Systems and methods for evaluating Cas9-independent off-target editing of nucleic acids
EP4085141A4 (en) 2019-12-30 2024-03-06 The Broad Institute, Inc. GENOME EDITING USING ACTIVATED, FULLY ACTIVE CRISPR COMPLEXES OF REVERSE TRANSCRIPTASE
CA3166153A1 (en) 2020-01-28 2021-08-05 The Broad Institute, Inc. Base editors, compositions, and methods for modifying the mitochondrial genome
WO2021158995A1 (en) 2020-02-05 2021-08-12 The Broad Institute, Inc. Base editor predictive algorithm and method of use
WO2021158921A2 (en) 2020-02-05 2021-08-12 The Broad Institute, Inc. Adenine base editors and uses thereof
WO2021158999A1 (en) 2020-02-05 2021-08-12 The Broad Institute, Inc. Gene editing methods for treating spinal muscular atrophy
US20230127008A1 (en) 2020-03-11 2023-04-27 The Broad Institute, Inc. STAT3-targeted base editor therapeutics for the treatment of melanoma and other cancers
WO2021222318A1 (en) 2020-04-28 2021-11-04 The Broad Institute, Inc. Targeted base editing of the USH2A gene
DE112021002672T5 (en) 2020-05-08 2023-04-13 President And Fellows Of Harvard College METHODS AND COMPOSITIONS FOR SIMULTANEOUSLY EDITING BOTH STRANDS OF A DOUBLE-STRANDED NUCLEOTIDE TARGET SEQUENCE
EP4217490A2 (en) 2020-09-24 2023-08-02 The Broad Institute Inc. Prime editing guide RNAs, compositions thereof, and methods of using the same
EP4274894A2 (en) 2021-01-11 2023-11-15 The Broad Institute, Inc. Prime editor variants, constructs, and methods for enhancing prime editing efficiency and precision
US20240287487A1 (en) 2021-06-11 2024-08-29 The Broad Institute, Inc. Improved cytosine to guanine base editors
WO2023015309A2 (en) 2021-08-06 2023-02-09 The Broad Institute, Inc. Improved prime editors and methods of use

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170283831A1 (en) * 2014-12-12 2017-10-05 The Broad Institute Inc. Protected guide RNAs (pgRNAs)
WO2016196805A1 (en) * 2015-06-05 2016-12-08 The Regents Of The University Of California Methods and compositions for generating CRISPR/Cas guide RNAs
WO2017049129A2 (en) * 2015-09-18 2017-03-23 President And Fellows Of Harvard College Methods of making guide RNA
WO2017081097A1 (en) * 2015-11-09 2017-05-18 Ifom Fondazione Istituto Firc Di Oncologia Molecolare CRISPR-Cas sgRNA library
WO2017083766A1 (en) * 2015-11-13 2017-05-18 Massachusetts Institute Of Technology High-throughput CRISPR-based library screening
WO2017147056A1 (en) 2016-02-22 2017-08-31 Caribou Biosciences, Inc. Methods for modulating dna repair outcomes

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
ARAKAWA, H.: "A method to convert mRNA into a gRNA library for CRISPR/Cas9 editing of any organism", SCIENCE ADVANCES, vol. 2, no. 8, 24 August 2016 (2016-08-24), pages 1 - 10, XP055340557, DOI: doi:10.1126/sciadv.1600699 *
See also references of EP3724214A4
SHEN ET AL.: "Predictable and precise template-free CRISPR editing of pathogenic variants", NATURE, vol. 563, 7 November 2018 (2018-11-07), pages 646 - 651, XP036647874, DOI: doi:10.1038/s41586-018-0686-x *
SÜRÜN ET AL.: "High Efficiency Gene Correction in Hematopoietic Cells by Donor-Template-Free CRISPR/Cas9 Genome Editing", MOLECULAR THERAPY: NUCLEIC ACIDS, vol. 10, 10 November 2017 (2017-11-10), pages 1 - 8, XP055606586, DOI: doi:10.1016/j.omtn.2017.11.001 *

Cited By (67)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US12006520B2 (en) 2011-07-22 2024-06-11 President And Fellows Of Harvard College Evaluation and improvement of nuclease cleavage specificity
US10954548B2 (en) 2013-08-09 2021-03-23 President And Fellows Of Harvard College Nuclease profiling system
US11920181B2 (en) 2013-08-09 2024-03-05 President And Fellows Of Harvard College Nuclease profiling system
US11046948B2 (en) 2013-08-22 2021-06-29 President And Fellows Of Harvard College Engineered transcription activator-like effector (TALE) domains and uses thereof
US11299755B2 (en) 2013-09-06 2022-04-12 President And Fellows Of Harvard College Switchable CAS9 nucleases and uses thereof
US12559737B2 (en) 2013-09-06 2026-02-24 President And Fellows Of Harvard College Cas9 variants and uses thereof
US12473573B2 (en) 2013-09-06 2025-11-18 President And Fellows Of Harvard College Switchable Cas9 nucleases and uses thereof
US12584118B2 (en) 2013-09-06 2026-03-24 President And Fellows Of Harvard College Cas9 variants and uses thereof
US10912833B2 (en) 2013-09-06 2021-02-09 President And Fellows Of Harvard College Delivery of negatively charged proteins using cationic lipids
US12215365B2 (en) 2013-12-12 2025-02-04 President And Fellows Of Harvard College Cas variants for gene editing
US11124782B2 (en) 2013-12-12 2021-09-21 President And Fellows Of Harvard College Cas variants for gene editing
US11053481B2 (en) 2013-12-12 2021-07-06 President And Fellows Of Harvard College Fusions of Cas9 domains and nucleic acid-editing domains
US11578343B2 (en) 2014-07-30 2023-02-14 President And Fellows Of Harvard College CAS9 proteins including ligand-dependent inteins
US12398406B2 (en) 2014-07-30 2025-08-26 President And Fellows Of Harvard College CAS9 proteins including ligand-dependent inteins
US12344869B2 (en) 2015-10-23 2025-07-01 President And Fellows Of Harvard College Nucleobase editors and uses thereof
US11214780B2 (en) 2015-10-23 2022-01-04 President And Fellows Of Harvard College Nucleobase editors and uses thereof
US12043852B2 (en) 2015-10-23 2024-07-23 President And Fellows Of Harvard College Evolved Cas9 proteins for gene editing
US11702651B2 (en) 2016-08-03 2023-07-18 President And Fellows Of Harvard College Adenosine nucleobase editors and uses thereof
US10947530B2 (en) 2016-08-03 2021-03-16 President And Fellows Of Harvard College Adenosine nucleobase editors and uses thereof
US11999947B2 (en) 2016-08-03 2024-06-04 President And Fellows Of Harvard College Adenosine nucleobase editors and uses thereof
US11661590B2 (en) 2016-08-09 2023-05-30 President And Fellows Of Harvard College Programmable CAS9-recombinase fusion proteins and uses thereof
US12084663B2 (en) 2016-08-24 2024-09-10 President And Fellows Of Harvard College Incorporation of unnatural amino acids into proteins using base editing
US11542509B2 (en) 2016-08-24 2023-01-03 President And Fellows Of Harvard College Incorporation of unnatural amino acids into proteins using base editing
US11306324B2 (en) 2016-10-14 2022-04-19 President And Fellows Of Harvard College AAV delivery of nucleobase editors
US10745677B2 (en) 2016-12-23 2020-08-18 President And Fellows Of Harvard College Editing of CCR5 receptor gene to protect against HIV infection
US11820969B2 (en) 2016-12-23 2023-11-21 President And Fellows Of Harvard College Editing of CCR2 receptor gene to protect against HIV infection
US11898179B2 (en) 2017-03-09 2024-02-13 President And Fellows Of Harvard College Suppression of pain by gene editing
US12390514B2 (en) 2017-03-09 2025-08-19 President And Fellows Of Harvard College Cancer vaccine
US12516308B2 (en) 2017-03-09 2026-01-06 President And Fellows Of Harvard College Suppression of pain by gene editing
US12435331B2 (en) 2017-03-10 2025-10-07 President And Fellows Of Harvard College Cytosine to guanine base editor
US11542496B2 (en) 2017-03-10 2023-01-03 President And Fellows Of Harvard College Cytosine to guanine base editor
US11268082B2 (en) 2017-03-23 2022-03-08 President And Fellows Of Harvard College Nucleobase editors comprising nucleic acid programmable DNA binding proteins
US11560566B2 (en) 2017-05-12 2023-01-24 President And Fellows Of Harvard College Aptazyme-embedded guide RNAs for use with CRISPR-Cas9 in genome editing and transcriptional activation
US11732274B2 (en) 2017-07-28 2023-08-22 President And Fellows Of Harvard College Methods and compositions for evolving base editors using phage-assisted continuous evolution (PACE)
US12359218B2 (en) 2017-07-28 2025-07-15 President And Fellows Of Harvard College Methods and compositions for evolving base editors using phage-assisted continuous evolution (PACE)
US11319532B2 (en) 2017-08-30 2022-05-03 President And Fellows Of Harvard College High efficiency base editors comprising Gam
US11932884B2 (en) 2017-08-30 2024-03-19 President And Fellows Of Harvard College High efficiency base editors comprising Gam
US11795443B2 (en) 2017-10-16 2023-10-24 The Broad Institute, Inc. Uses of adenosine base editors
US12406749B2 (en) 2017-12-15 2025-09-02 The Broad Institute, Inc. Systems and methods for predicting repair outcomes in genetic engineering
EP3814510A4 * 2018-05-04 2022-02-23 University of Massachusetts MICROHOMOLOGY-MEDIATED REPAIR OF MICRO-DUPLICATION GENE MUTATIONS
US11697827B2 (en) 2018-05-16 2023-07-11 Synthego Corporation Systems and methods for gene modification
US11345932B2 (en) 2018-05-16 2022-05-31 Synthego Corporation Methods and systems for guide RNA design and use
US11802296B2 (en) 2018-05-16 2023-10-31 Synthego Corporation Methods and systems for guide RNA design and use
US12157760B2 (en) 2018-05-23 2024-12-03 The Broad Institute, Inc. Base editors and uses thereof
US12522807B2 (en) 2018-07-09 2026-01-13 The Broad Institute, Inc. RNA programmable epigenetic RNA modifiers and uses thereof
US12281338B2 (en) 2018-10-29 2025-04-22 The Broad Institute, Inc. Nucleobase editors comprising GeoCas9 and uses thereof
US12351837B2 (en) 2019-01-23 2025-07-08 The Broad Institute, Inc. Supernegatively charged proteins and uses thereof
US11447770B1 (en) 2019-03-19 2022-09-20 The Broad Institute, Inc. Methods and compositions for prime editing nucleotide sequences
US11795452B2 (en) 2019-03-19 2023-10-24 The Broad Institute, Inc. Methods and compositions for prime editing nucleotide sequences
US12281303B2 (en) 2019-03-19 2025-04-22 The Broad Institute, Inc. Methods and compositions for prime editing nucleotide sequences
US12570972B2 (en) 2019-03-19 2026-03-10 The Broad Institute, Inc. Methods and compositions for prime editing nucleotide sequences
US12509680B2 (en) 2019-03-19 2025-12-30 The Broad Institute, Inc. Methods and compositions for prime editing nucleotide sequences
US11643652B2 (en) 2019-03-19 2023-05-09 The Broad Institute, Inc. Methods and compositions for prime editing nucleotide sequences
US12473543B2 (en) 2019-04-17 2025-11-18 The Broad Institute, Inc. Adenine base editors with reduced off-target effects
US12254959B2 (en) 2019-07-03 2025-03-18 Integrated Dna Technologies, Inc. Identification, characterization, and quantitation of CRISPR-introduced double-stranded DNA break repairs
WO2021003343A1 * 2019-07-03 2021-01-07 Integrated Dna Technologies, Inc. Identification, characterization, and quantitation of CRISPR-introduced double-stranded DNA break repairs
US11530425B2 (en) 2019-10-09 2022-12-20 Massachusetts Institute Of Technology Systems, methods, and compositions for correction of frameshift mutations
US12146152B2 (en) 2019-10-09 2024-11-19 Massachusetts Institute Of Technology Systems, methods, and compositions for correction of frameshift mutations
WO2021072309A1 (en) * 2019-10-09 2021-04-15 Massachusetts Institute Of Technology Systems, methods, and compositions for correction of frameshift mutations
US12435330B2 (en) 2019-10-10 2025-10-07 The Broad Institute, Inc. Methods and compositions for prime editing RNA
US12123033B2 (en) 2019-10-24 2024-10-22 Integrated Dna Technologies, Inc. Modified double-stranded donor templates
US12031126B2 (en) 2020-05-08 2024-07-09 The Broad Institute, Inc. Methods and compositions for simultaneous editing of both strands of a target double-stranded nucleotide sequence
US11912985B2 (en) 2020-05-08 2024-02-27 The Broad Institute, Inc. Methods and compositions for simultaneous editing of both strands of a target double-stranded nucleotide sequence
CN115806989B (en) * 2022-11-25 2023-08-08 昆明理工大学 sgRNA aiming at mutation of exon 5 of DMD gene, vector and application
CN115806989A (en) * 2022-11-25 2023-03-17 昆明理工大学 sgRNA, carrier and application for DMD gene exon 5 mutation
WO2024163862A2 (en) 2023-02-03 2024-08-08 The Broad Institute, Inc. Gene editing methods, systems, and compositions for treating spinal muscular atrophy
WO2025040617A1 (en) * 2023-08-18 2025-02-27 Universität Zürich Microhomology mediated integration of cargo nucleic acid molecules

Also Published As

Publication number Publication date
US20220238182A1 (en) 2022-07-28
EP3724214A4 (en) 2021-09-01
US12406749B2 (en) 2025-09-02
EP3724214A1 (en) 2020-10-21

Similar Documents

Publication Publication Date Title
US12406749B2 (en) Systems and methods for predicting repair outcomes in genetic engineering
Koblan et al. Efficient C•G-to-G•C base editors developed using CRISPRi screens, target-library analysis, and machine learning
Arbab et al. Determinants of base editing outcomes from target library analysis and machine learning
JP7550816B2 (en) Genome-wide, unbiased identification of DSBs assessed by sequencing (GUIDE-Seq)
Lee et al. CRISPR/Cas9‐mediated genome engineering of CHO cell factories: Application and perspectives
Sharon et al. Functional genetic variants revealed by massively parallel precise genome editing
US11913017B2 (en) Efficient genetic screening method
US10738303B2 (en) Comprehensive in vitro reporting of cleavage events by sequencing (CIRCLE-seq)
Xie et al. High-fidelity SaCas9 identified by directional screening in human cells
CN113646434B (en) Compositions and methods for efficient gene screening using tagged guide RNA constructs
US11155814B2 (en) Methods for using DNA repair for cell engineering
Graham et al. Resources for the design of CRISPR gene editing experiments
Canaj et al. Deep profiling reveals substantial heterogeneity of integration outcomes in CRISPR knock-in experiments
US20200190699A1 (en) Assessing nuclease cleavage
Lee et al. Efficient single-nucleotide microbial genome editing achieved using CRISPR/Cpf1 with maximally 3′-end-truncated crRNAs
Farmiloe et al. Structural evolution of gene promoters driven by primate-specific KRAB zinc finger proteins
Lei et al. Chemical and biological approaches to interrogate off-target effects of genome editing tools
McDiarmid et al. A parts list of promoters and gRNA scaffolds for mammalian genome engineering and molecular recording
US20260088126A1 (en) Method and device for predicting prime editing efficiency of various prime editors in different cell types
WO2024033378A1 (en) Method of parallel, rapid and sensitive detection of DNA double strand breaks
US20220238181A1 (en) CRISPR guide selection
Koblan et al. Development of a set of C•G-to-G•C transversion base editors from CRISPRi screens, target-library analysis, and machine learning
Shen et al. Haplotype‐resolved telomere‐to‐telomere genome assembly of Populus lasiocarpa unveils retrotransposon‐driven centromere evolution
McDiarmid et al. Diversified, miniaturized and ancestral parts for mammalian genome engineering and molecular recording
EP4321630A1 (en) Method of parallel, rapid and sensitive detection of DNA double strand breaks

Legal Events

Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application. Ref document number: 18887576; country of ref document: EP; kind code of ref document: A1
NENP Non-entry into the national phase. Ref country code: DE
ENP Entry into the national phase. Ref document number: 2018887576; country of ref document: EP; effective date: 2020-07-15
WWG WIPO information: grant in national office. Ref document number: 16772747; country of ref document: US