WO2025003358A2 - Novel nucleic acid targeting systems comprising RNA-guided nucleases - Google Patents

Novel nucleic acid targeting systems comprising RNA-guided nucleases

Info

Publication number
WO2025003358A2
Authority
WO
WIPO (PCT)
Prior art keywords
sgrna
sequence
nucleic acid
rgn
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/EP2024/068177
Other languages
English (en)
Other versions
WO2025003358A3 (fr
Inventor
Tyson David BOWEN
Lila Herk RIEBER
Meng Wang
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
UCB Biopharma SRL
Original Assignee
UCB Biopharma SRL
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by UCB Biopharma SRL
Publication of WO2025003358A2
Publication of WO2025003358A3
Anticipated expiration
Ceased

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N: MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00: Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09: Recombinant DNA-technology
    • C12N15/11: DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
    • C12N15/113: Non-coding nucleic acids modulating the expression of genes, e.g. antisense oligonucleotides; Antisense DNA or RNA; Triplex-forming oligonucleotides; Catalytic nucleic acids, e.g. ribozymes; Nucleic acids used in co-suppression or gene silencing
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N: MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00: Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/14: Hydrolases (3)
    • C12N9/16: Hydrolases (3) acting on ester bonds (3.1)
    • C12N9/22: Ribonucleases [RNase]; Deoxyribonucleases [DNase]
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N: MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N2310/00: Structure or type of the nucleic acid
    • C12N2310/10: Type of nucleic acid
    • C12N2310/20: Type of nucleic acid involving clustered regularly interspaced short palindromic repeats [CRISPR]

Definitions

  • RNA-guided nucleases (RGN) and nucleic acid targeting systems comprising such.
BACKGROUND
  • Targeted genome editing or modification has changed considerably in recent years following the discovery of novel technologies and systems.
  • The first systems relied on meganucleases, zinc finger fusion proteins or transcription activator-like effector nucleases (TALENs), requiring the generation of chimeric nucleases with engineered, sequence-specific DNA-binding domains for each particular target sequence.
  • RNA-guided nucleases such as the Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)-associated (Cas) proteins allow for the targeting of specific sequences by using a short RNA sequence that specifically hybridizes with a particular target sequence.
  • Such CRISPR systems became popular and gained multiple uses in research, diagnostics and therapeutics due to the ease of production of target-specific short RNA sequences and use of such with the same RGN protein.
  • Such RGNs can be used to edit genomes through the introduction of a sequence-specific, double-stranded break that is either repaired in a way that introduces a mutation or repaired by introducing a stretch of heterologous DNA.
  • Type V-U4 CRISPR systems have been described in WO2018/035250, including some exemplary RGN sequences identified using bioinformatics methods. However, no guidance on how to make those RGNs functional as gene editing tools was provided.
  • Provided herein are type V-F5 (previously identified as Type V-U4) nucleic acid targeting systems comprising RGNs and RNA molecules, nucleic acid molecules encoding the same, and vectors and host cells comprising such nucleic acid molecules.
  • Figure 1 represents schematically the location of the RGNs, final catalytic RuvC residue, tracrRNA and CRISPR repeats within the locus.
  • Figure 2 shows the activity of different truncations of EGS0293 RGN.
  • Figure 3 shows the performance of several sgRNA truncation designs for EGS0293.
  • [011] Figure 4 shows the performance of several sgRNA stabilization designs for EGS0293.
  • [012] Figure 5 shows the performance of DNA affinity mutations of EGS0293.
  • [013] Figure 6 shows the recognition sequence diversity of Cas12 effector proteins: the first two principal components (PCs) in the PC decomposition of PAM position weight matrices of diverse Cas12-related effector proteins.
DETAILED DESCRIPTION OF THE INVENTION
Definitions
  • [014] Table 1.
      Cas      CRISPR associated Sequence
      Cas9     Cas protein 9
      CRISPR   Clustered Regularly Interspaced Short Palindromic Repeats
      gRNA     long monomeric nucleic acid targeting RNAs
      RGN      RNA-guided nuclease
      ssDNA    Single stranded DNA
      dsDNA    Double stranded DNA
  • Table 2. Amino acids abbreviations
      Abbreviation   1 letter abbreviation   Amino acid name
      Ala            A                       Alanine
      Arg            R                       Arginine
      Asn            N                       Asparagine
      Asp            D                       Aspartic acid
      Cys            C                       Cysteine
      Gln            Q                       Glutamine
      Glu            E                       Glutamic acid
      Gly            G                       Glycine
      His            H                       Histidine
      Ile            I                       Isoleucine
      Leu            L                       Leucine
      Lys            K                       Lysine
      Met            M                       Methionine
      Phe            F                       Phenylalanine
      Pro            P                       Proline
      Pyl            O                       Pyrrolysine
      Ser            S                       Serine
      Sec            U                       Selenocysteine
      Thr            T                       Threonine
      Trp            W                       Tryptophan
      Tyr            Y                       Tyrosine
      Val            V                       Valine
PF0434 ⁇ WO ⁇ PCT
  • [016] Table 3. Nucleotide code abbreviations
      Code   Nucleotide
      A      Adenine
      G      Guanine
      C      Cytosine
      T      Thymine
      U      Uracil
      R      Purine (A or G)
      Y      Pyrimidine (C or T)
      N      Any nucleotide
      W      Weak (A or T)
      S      Strong (G or C)
      M      Amino (A or C)
      K      Keto (G or T)
      B      Not A (G or C or T)
      H      Not G (A or C or T)
      D      Not C (A or G or T)
      V      Not T (A or G or C)
  • [017] The following definitions are used throughout the description.
  • AAV adeno-associated virus
  • a biological sample may contain whole cells and/or live cells and/or cell debris.
  • the biological sample may contain (or be derived from) a “bodily fluid”. Bodily fluids may be obtained from a mammal organism, for example by puncture, or other collecting or sampling procedures.
  • Cas12f1 refers to a type of RGN that cleaves nucleic acid, is encoded by the CRISPR loci, and is part of the Type V-F1 CRISPR system.
  • The Cas12f1 protein commonly used is from an uncultured archaeon (Un1).
  • The Cas12f1 protein may be mutated so that the nuclease activity is partly or completely inactivated.
  • Cas12f1 RGNs are described in Harrington et al (2018) Science, 362(6416), 839–842 and Karvelis et al (2020) Nucleic Acids Research, 48(9), 5016–5023.
  • Cas12f5 or “c2c9” refers to a type of RGN that cleaves nucleic acid, is encoded by the CRISPR loci, and is part of a subtype of the Type V-F5 CRISPR system.
  • The Cas12f5 protein consists of a Rec1 domain and a tri-split RuvC domain and may be mutated so that the nuclease activity is partly or completely inactivated.
  • The term “Cas9” refers to a type of RGN that cleaves nucleic acid, is encoded by the CRISPR loci, and is part of the Type II CRISPR system.
  • the Cas9 protein commonly used is from bacterial species Streptococcus pyogenes.
  • the Cas9 protein may be mutated so that the nuclease activity is partly or completely inactivated.
  • complement or “complementary” as used herein can mean Watson-Crick or Hoogsteen base pairing between nucleotides or nucleotide analogs of nucleic acid molecules.
  • complementarity refers to a property shared between two nucleic acid sequences, such that when they are aligned antiparallel to each other, the nucleotide bases at each position will be complementary.
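The antiparallel pairing described in this definition can be illustrated with a minimal sketch. It assumes plain DNA letters and Watson-Crick pairs only; Hoogsteen pairing, RNA and nucleotide analogs are not modelled, and the function names are illustrative rather than taken from the application.

```python
# Watson-Crick DNA pairs only; Hoogsteen pairing, RNA (U) and nucleotide
# analogs mentioned in the definition are deliberately left out of this sketch.
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(seq.upper()))


def are_complementary(first: str, second: str) -> bool:
    """True if every base pairs when the two strands are aligned antiparallel."""
    return len(first) == len(second) and reverse_complement(first) == second.upper()
```

Both sequences are written 5' to 3', so a fully complementary partner is simply the reverse complement of the other strand.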
  • CRISPR system refers collectively to transcripts and other elements involved in the expression of or directing the activity of CRISPR-associated (“Cas”) proteins, including sequences encoding a Cas protein, a tracr (trans-activating CRISPR) sequence (e.g., tracrRNA or an active partial tracrRNA), a tracr-mate sequence (containing a “direct repeat” and a tracrRNA-processed partial direct repeat in the context of an endogenous CRISPR system), a guide sequence (also referred to herein as a “spacer” in the context of an endogenous CRISPR system), or other sequences and transcripts from a CRISPR locus.
  • an effective amount of a nuclease may refer to the amount of the nuclease that is sufficient to induce cleavage of a target site specifically bound and cleaved by the nuclease.
  • an effective amount of a recombinase may refer to the amount of the recombinase that is sufficient to induce recombination at a target site specifically bound and recombined by the recombinase.
  • an agent e.g., a nuclease, a recombinase, a hybrid protein, a fusion protein, a protein dimer, a complex of a protein (or protein dimer) and a polynucleotide, or a polynucleotide
  • the term “enhancer” as used herein refers to non-coding DNA sequences containing multiple activator and repressor binding sites.
  • Enhancers range from 200 bp to 1 kb in length and may be either proximal, 5' upstream to the promoter or within the first intron of the regulated gene, or distal, in introns of neighboring genes or intergenic regions far away from the locus. Through DNA looping, active enhancers contact the promoter dependently of the core DNA binding motif promoter specificity. 4 to 5 enhancers may interact with a promoter.
  • fusion protein refers to a chimeric protein created through the covalent or non-covalent joining of two or more genes, directly or indirectly, that originally coded for separate proteins. In some embodiments, the translation of the fusion gene results in a single polypeptide with functional properties derived from each of the original proteins.
  • gRNA, also used interchangeably herein as a chimeric single guide RNA (“sgRNA”), refers to nucleic acid which is a fusion of two noncoding RNAs: a crRNA and a tracrRNA. “gRNA” is used interchangeably to refer to guide RNAs that exist as either single molecules or as a complex of two or more molecules. Typically, gRNAs that exist as single RNA species comprise two domains: (1) a domain that shares homology to a target nucleic acid (e.g., and directs binding of a Cas complex to the target); and (2) a domain that binds a Cas protein.
  • an "isolated” or “purified” polypeptide, or biologically active portion thereof, is substantially or essentially free from components that normally accompany or interact with the polypeptide as found in its naturally occurring environment.
  • an isolated or purified polypeptide is substantially free of other cellular material, or culture medium when produced by recombinant techniques, or substantially free of chemical precursors or other chemicals when chemically synthesized.
  • a protein that is substantially free of cellular material includes preparations of protein having less than 30%, 20%, 10%, 5%, or 1% (by dry weight) of contaminating protein.
  • optimally culture medium represents less than 30%, 20%, 10%, 5%, or 1% (by dry weight) of chemical precursors or non-protein-of-interest chemicals.
  • The term “leader sequence” refers to the final region of the CRISPR repeat before the reprogrammable spacer that does not base pair with the final portion of the tracrRNA known as the antirepeat. This leader sequence may be useful for interactions with other regions of the tracrRNA or with the Cas12f5 protein itself.
  • linker refers to a chemical group or a molecule linking two molecules or moieties, e.g., a binding domain and a cleavage domain of a nuclease. Typically, the linker is positioned between, or flanked by, two groups, molecules, or other moieties and connected to each one via a covalent bond, thus connecting the two.
  • the linker is an amino acid or a plurality of amino acids (e.g., a peptide or protein).
  • the linker is an organic molecule, group, polymer, or chemical moiety.
  • the linker is a polypeptide of 5-100 amino acids in length, for example, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 30-35, 35-40, 40-45, 45-50, 50-60, 60-70, 70-80, 80-90, 90-100, 100-150, or 150-200 amino acids in length. Longer or shorter linkers are also contemplated.
  • modification in reference to a nucleic acid molecule refers to a change in the nucleotide sequence of the nucleic acid molecule, which can be a deletion, insertion, or substitution of one or more nucleotides, or a combination thereof.
  • mutation refers to a substitution of a residue within a sequence, e.g., a nucleic acid or amino acid sequence, with another residue, or a deletion or insertion of one or more residues within a sequence. Mutations are typically described herein by identifying the original residue followed by the position of the residue within the sequence and by the identity of the newly substituted residue. Various methods for making the amino acid substitutions (mutations) provided herein are well known in the art, and are provided by, for example, Green and Sambrook, Molecular Cloning: A Laboratory Manual (4th ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. (2012)).
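The substitution notation described above (original residue, position, new residue) can be sketched in code as follows. This is a minimal illustration assuming one-letter amino acid codes and 1-based positions; the helper name `apply_substitution` and the example sequence are hypothetical and not taken from the application.

```python
import re

# Original residue, 1-based position, substituted residue, e.g. "D3A".
MUTATION_RE = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_substitution(sequence: str, mutation: str) -> str:
    """Apply a single substitution written as <original><position><new>."""
    match = MUTATION_RE.match(mutation)
    if match is None:
        raise ValueError(f"unrecognized mutation format: {mutation!r}")
    original, new = match.group(1), match.group(3)
    position = int(match.group(2))
    if not 1 <= position <= len(sequence):
        raise ValueError(f"position {position} is outside the sequence")
    index = position - 1  # convert to 0-based indexing
    if sequence[index] != original:
        raise ValueError(
            f"expected {original} at position {position}, found {sequence[index]}"
        )
    return sequence[:index] + new + sequence[index + 1:]
```

Checking the original residue before substituting catches off-by-one errors between numbering schemes, which is a common source of mistakes when positions are copied between sequence versions.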
  • As used herein, the terms “nucleic acid,” “nucleic acid sequence,” “nucleotide sequence,” “oligonucleotide,” and “polynucleotide” are interchangeable and refer to a polymeric form of nucleotides.
  • the nucleotides may be deoxyribonucleotides (DNA), ribonucleotides (RNA), analogs thereof, or combinations thereof, and may be of any length.
  • Polynucleotides may perform any function and may have any secondary and tertiary structures.
  • the terms encompass known analogs of natural nucleotides and nucleotides that are modified in the base, sugar and/or phosphate moieties.
  • a polynucleotide may comprise one modified nucleotide or multiple modified nucleotides. Examples of modified nucleotides include fluorinated nucleotides, methylated nucleotides, and nucleotide analogs. Nucleotide structure may be modified before or after a polymer is assembled. Following polymerization, polynucleotides may be additionally modified via, for example, conjugation with a labeling component or target binding component. A nucleotide sequence may incorporate non-nucleotide components.
  • nucleic acids comprising modified backbone residues or linkages, which are synthetic, naturally occurring, and non-naturally occurring, and have similar binding properties as a reference polynucleotide (e.g., DNA or RNA).
  • analogs include, but are not limited to, phosphorothioates, phosphoramidates, methyl phosphonates, chiral-methyl phosphonates, 2-O-methyl ribonucleotides, peptide-nucleic acids (PNAs), Locked Nucleic Acid (LNA™) (Exiqon, Inc., Woburn, MA) nucleosides, glycol nucleic acid, bridged nucleic acids, and morpholino structures.
  • operably linked means that expression of a gene is under the control of a promoter with which it is spatially connected.
  • a promoter may be positioned 5′ (upstream) or 3′ (downstream) of a gene under its control.
  • the distance between the promoter and a gene may be approximately the same as the distance between that promoter and the gene it controls in the gene from which the promoter is derived. As is known in the art, variation in this distance may be accommodated without loss of promoter function.
  • As used herein, the terms “peptide,” “polypeptide,” and “protein” are interchangeable and refer to polymers of amino acids.
  • a polypeptide may be of any length. It may be branched or linear, it may be interrupted by non-amino acids, and it may comprise modified amino acids.
  • polypeptide sequences are displayed herein in the conventional N-terminal to C-terminal orientation.
  • Polypeptides and polynucleotides can be made using routine techniques in the field of molecular biology (see, e.g., standard texts set forth above). Further, essentially any polypeptide or polynucleotide can be custom ordered from commercial sources.
  • percentage of sequence identity means the value determined by comparing two optimally aligned sequences over a comparison window, wherein the portion of the polynucleotide sequence in the comparison window may comprise additions or deletions (i.e., gaps) as compared to the reference sequence (which does not comprise additions or deletions) for optimal alignment of the two sequences. The percentage is calculated by determining the number of positions at which the identical nucleic acid base or amino acid residue occurs in both sequences to yield the number of matched positions, dividing the number of matched positions by the total number of positions in the window of comparison, and multiplying the result by 100 to yield the percentage of sequence identity.
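The calculation in this definition can be written out as a short sketch. It assumes the two sequences have already been optimally aligned by some other tool, that gaps are written as `-`, and that the comparison window is the full length of the alignment; the function name is illustrative.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of sequence identity over a pre-aligned comparison window.

    Matched positions are divided by the total number of positions in the
    window (gap columns included) and multiplied by 100.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("comparison window is empty")
    matched = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-"
    )
    return 100.0 * matched / len(aligned_a)
```

For example, the aligned pair `ACGT-A` and `ACCTTA` has four matched positions in a window of six, giving roughly 66.7% identity.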
  • promoter means a synthetic or naturally-derived molecule which is capable of conferring, activating or enhancing expression of a nucleic acid in a cell.
  • a promoter may comprise one or more specific transcriptional regulatory sequences to further enhance expression and/or to alter the spatial expression and/or temporal expression of same.
  • a promoter may also comprise distal enhancer or repressor elements, which may be located as much as several thousand base pairs from the start site of transcription.
  • a promoter may be derived from sources including viral, bacterial, fungal, plants, insects, and animals.
  • RNA-guided endonuclease or “RGN” are used interchangeably herein and refer to a nuclease that forms a complex with (e.g., binds or associates with) one or more RNA that is not a target for cleavage.
  • sequence identity or “identity” in the context of two polynucleotides or polypeptide sequences makes reference to the residues in the two sequences that are the same when aligned for maximum correspondence over a specified comparison window.
  • sequence similarity or “similarity”. Means for making this adjustment are well known to those of skill in the art. Typically this involves scoring a conservative substitution as a partial rather than a full mismatch, thereby increasing the percentage sequence identity.
  • spacer sequence refers to a part of gRNA nucleotide sequence that directly hybridizes with the target nucleotide sequence of interest.
  • subject and patient as used herein interchangeably refer to any vertebrate, including, but not limited to, a mammal (e.g., cow, pig, camel, llama, horse, goat, rabbit, sheep, hamsters, guinea pig, cat, dog, rat, and mouse, a non-human primate (for example, a monkey, such as a cynomolgus or rhesus monkey, chimpanzee, etc.) and a human).
  • the subject may be a
  • target region refers to the region of the target gene that the CRISPR-based system targets.
  • treatment refers to a clinical intervention aimed to reverse, alleviate, delay the onset of, or inhibit the progress of a disease or disorder, or one or more symptoms thereof, as described herein.
  • treatment may be administered after one or more symptoms have developed and/or after a disease has been diagnosed.
  • treatment may be administered in the absence of symptoms, e.g., to prevent or delay onset of a symptom or inhibit onset or progression of a disease.
  • treatment may be administered to a susceptible individual prior to the onset of symptoms (e.g., in light of a history of symptoms and/or in light of genetic or other susceptibility factors).
  • Type II CRISPR system refers to an effector system that carries out targeted DNA double-strand breaks in four sequential steps, using a single effector enzyme, Cas9, to cleave dsDNA.
  • the Type II effector system may function in alternative contexts such as eukaryotic cells.
  • the Type II effector system consists of a long pre-crRNA, which is transcribed from the spacer-containing CRISPR locus, the Cas9 protein, and a tracrRNA, which is involved in pre-crRNA processing.
  • Type V-F5 refers to a novel type of CRISPR system provided in this disclosure comprising an effector protein, such as an RGN, located near a CRISPR repeat spacer array. No other common CRISPR proteins are found nearby. Additionally, the system comprises a trans-activating crRNA (tracrRNA) within 900 nt of the last catalytic RuvC domain of the potential Cas12f5, including regions within the ORF, which is capable of hybridizing with the CRISPR RNA (crRNA) expressed from the CRISPR array.
  • vector as used herein means a nucleic acid sequence containing an origin of replication.
  • a crRNA refers to the mature form of the spacer-repeat unit.
  • a crRNA contains a spacer sequence that is involved in targeting a target nucleic acid.
  • crRNA has a region of complementarity to a potential DNA or RNA target sequence and in some cases, e.g., in currently characterized Type II systems, a second region that forms base-pair hydrogen bonds with a transactivating CRISPR RNA (tracrRNA) to form a secondary structure, typically to form at least a stem structure.
  • CRISPR locus comprises polynucleotide sequences encoding CRISPR-associated (cas) genes. Cas genes are involved in the biogenesis and/or the interference stages of crRNA function. Cas genes display extreme sequence diversity between different species and homologs.
  • Mature crRNAs are processed from a longer polycistronic CRISPR locus transcript, also referred to as pre-crRNA array.
  • a pre-crRNA array comprises a plurality of crRNAs. The repeats in the pre-crRNA array are recognized by Cas proteins. Cas proteins bind to the repeats and cleave the repeats. This action can liberate the plurality of crRNAs. crRNAs can be subjected to further events to produce the mature crRNA form, such as trimming (e.g., with an exonuclease).
  • a crRNA may comprise all, or some, of the CRISPR repeat sequences.
  • Interference refers to the stage in the CRISPR system that is functionally responsible for combating infection by a foreign invader.
  • CRISPR interference follows a similar mechanism to RNA interference, which results in target RNA degradation and/or destabilization.
  • Currently characterized CRISPR systems perform interference of a target nucleic acid by coupling crRNAs and Cas proteins, thereby forming CRISPR ribonucleoproteins (RNPs).
  • crRNA of the RNP guides the RNP to foreign invader nucleic acid (e.g., by recognizing the foreign invader nucleic acid through hybridization).
  • Hybridized target foreign invader nucleic acid-crRNA units are subjected to cleavage by Cas proteins.
  • the RuvC-like nuclease domain of Cas12a cleaves one strand of the double-stranded nucleic acid target sequence, and a putative nuclease domain cleaves the other strand of the double-stranded nucleic acid target sequence in a staggered configuration, producing 5' overhangs, which is different from the blunt ends generated by Cas9 cleavage. These 5' overhangs may facilitate insertion of DNA.
  • the Cas12a cleavage activity of Type V systems also does not require hybridization of crRNA to tracrRNA to form a duplex; rather, these Type V systems use a single crRNA that has a stem-loop structure forming an internal duplex.
  • Cas12a binds the crRNA in a sequence- and structure-specific manner that recognizes the stem loop and sequences adjacent to the stem loop, most notably the nucleotides 5' of the spacer sequence that hybridizes to the nucleic acid target sequence.
  • This stem-loop structure is typically in the range of 15 to 19 nucleotides in length. Substitutions that disrupt this stem-loop duplex abolish cleavage activity, whereas other substitutions that do not disrupt the stem-loop duplex do not abolish cleavage activity.
  • the crRNA forms a stem-loop structure at the 5' end, and the sequence at the 3' end is complementary to a sequence in a nucleic acid target sequence.
  • Type V-F1 is represented by Cas12f1 protein.
  • the Cas12f1 protein cleavage activity of Type V-F1 systems does require hybridization of crRNA to tracrRNA to form a duplex.
  • Cas12f1 protein binds the tracrRNA/crRNA in a sequence- and structure-specific manner by recognizing the stem loops and sequences adjacent to the stem loops, most notably the nucleotides 5' of the spacer sequence, which hybridizes to the nucleic acid target sequence.
  • These stem-loop structures are typically in the range of 150 to 170 nucleotides in length for the tracrRNA and 28-34 nucleotides in length for the crRNA. Substitutions that disrupt this stem-loop duplex abolish cleavage activity, whereas other substitutions that do not disrupt the stem-loop duplex do not abolish cleavage activity.
  • nucleic acid target sequence binding involves Cas12f1 and the tracrRNA/crRNA, as does the nucleic acid target sequence cleavage.
  • the tracrRNA/crRNA forms stem-loop structures at the 5' end, and the sequence at the 3' end is complementary to a sequence in a nucleic acid target sequence.
  • Other proteins associated with Type V crRNA and nucleic acid target sequence binding and cleavage include Cas12b, Cas12c, Cas12d, and Cas12e, which are similar in length to Cas12a proteins, ranging from approximately 1000-1500 amino acids, but also require an additional RNA (either a tracrRNA or a scoutRNA) (see for example Harrington et al, Molecular Cell, Volume 79, Issue 3, 2020, Pages 416-424).
  • Type VI systems include the Cas13a protein (also known as Class 2 candidate 2 protein, or C2c2) which does not share sequence similarity with other CRISPR effector proteins (see Abudayyeh, et al, Science (2016) 353:aaf5573). Cas13a proteins have two HEPN domains and possess single-stranded RNA cleavage activity.
  • Genomic and/or metagenomic samples are searched for open reading frames (ORFs), and those predicted to be genes are selected.
  • a hidden Markov model (HMM) is used to compare the putative genes to profiles of known Cas proteins.
  • the identified Cas genes are subsequently grouped into operons, and the operon type is determined based on the presence of known signature genes.
  • the CRISPR arrays are identified based on the presence of regularly spaced repeats. The subtype of each CRISPR array is predicted using machine learning. Cas operons are considered linked to CRISPR arrays if they are less than 10 kilobases apart.
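The linkage rule in the last step (Cas operons are linked to CRISPR arrays less than 10 kilobases apart) can be sketched as follows. This is an illustration under stated assumptions: the `Feature` type and function names are hypothetical, coordinates are 0-based half-open intervals on a named contig, and distance is measured as the gap between the nearest edges of the two features, which the source does not specify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    """A genomic interval; start is inclusive and end is exclusive."""
    contig: str
    start: int
    end: int


def gap_between(a: Feature, b: Feature) -> int:
    """Bases separating two features on one contig (0 if they overlap)."""
    return max(0, max(a.start, b.start) - min(a.end, b.end))


def link_operons_to_arrays(operons, arrays, max_gap=10_000):
    """Pair each Cas operon with every CRISPR array closer than max_gap bases."""
    links = []
    for operon in operons:
        for array in arrays:
            same_contig = operon.contig == array.contig
            if same_contig and gap_between(operon, array) < max_gap:
                links.append((operon, array))
    return links
```

The nested loop is quadratic in the number of features, which is adequate for a single locus or contig; a genome-scale search would more likely sort features by position first.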
  • Regions in the identified CRISPR operon are manually searched for potential tracrRNAs by searching for antirepeat sequences capable of hybridizing to approximately bases 1-10 of the putative crRNA sequence within 900 nt of the last catalytic RuvC domain of the RGN, including regions within the open reading frame (ORF) of the corresponding RGN (Figure 1).
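The source describes this search as manual. The sketch below shows one way the same criterion could be screened automatically, and every detail of it is an assumption: it scans only the given strand, uses strict Watson-Crick pairing with a mismatch allowance (no G-U wobble), takes the first 10 bases of the repeat as the probe region, and treats `anchor` as the locus coordinate of the last catalytic RuvC residue. All names are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def find_antirepeat_candidates(locus, repeat, anchor, window=900,
                               seed_len=10, max_mismatches=2):
    """Return (start, mismatches) for candidate antirepeats near the anchor.

    A candidate is a stretch of the locus that could base pair with the
    first seed_len bases of the repeat, allowing up to max_mismatches.
    Only positions within `window` nt of `anchor` are examined.
    """
    probe = reverse_complement(repeat[:seed_len])
    lo = max(0, anchor - window)
    hi = min(len(locus), anchor + window)
    hits = []
    for start in range(lo, hi - len(probe) + 1):
        segment = locus[start:start + len(probe)]
        mismatches = sum(1 for x, y in zip(segment, probe) if x != y)
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits
```

Candidates returned this way would still need the structural checks described in the surrounding text; the scan only narrows down where to look.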
  • when the putative tracrRNA is joined via a flexible linker, such as, for example, a GAAA tetraloop, to the putative crRNA, the resulting sgRNA consists of 4-6 stem-loop sequences.
  • the essential structure of the sgRNA consists of the antirepeat from the putative tracrRNA and approximately the final 5-16 3' bases of the CRISPR repeat, of which the first 1-12 bases are complementary to the antirepeat of the putative tracrRNA, followed by 1-4 bases of unpaired “leader sequence” before the reprogrammable spacer sequence.
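The sgRNA layout described above can be sketched as simple string assembly. The 5'-to-3' order used here (tracrRNA, linker, 3' tail of the repeat, spacer) is an assumption inferred from the description, as are the function name, the default tail length of 16 bases, and the example sequences, none of which come from the application.

```python
def to_rna(seq: str) -> str:
    """Convert a DNA-alphabet sequence to the RNA alphabet."""
    return seq.upper().replace("T", "U")


def build_sgrna(tracr: str, repeat: str, spacer: str,
                repeat_tail_len: int = 16, linker: str = "GAAA") -> str:
    """Join tracrRNA, linker, the 3' tail of the repeat and the spacer.

    repeat_tail_len is the number of final 3' bases of the CRISPR repeat
    kept in the sgRNA (the text gives approximately 5-16).
    """
    if not 1 <= repeat_tail_len <= len(repeat):
        raise ValueError("repeat_tail_len must fit within the repeat")
    repeat_tail = repeat[-repeat_tail_len:]
    return to_rna(tracr) + linker + to_rna(repeat_tail) + to_rna(spacer)
```

Keeping the spacer as a separate argument reflects that it is the reprogrammable part: the scaffold stays fixed while the spacer changes per target.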
  • said genomic and metagenomic sequences are obtained from a sequence database such as Ensembl or NCBI genome databases.
  • the ribonucleoprotein complex can be purified from a cell or organism that has been transformed with polynucleotides that encode an RGN polypeptide and a gRNA and cultured under conditions to allow for the expression of the RGN polypeptide and guide RNA.
  • methods are provided for making an RGN polypeptide or an RGN ribonucleoprotein complex. Such methods comprise culturing a cell comprising a nucleotide sequence encoding an RGN polypeptide under conditions in which the RGN polypeptide is expressed. In some embodiments the cell further comprises a nucleotide sequence encoding a gRNA.
  • the RGN polypeptide or RGN ribonucleoprotein can then be purified from the cultured cells.
  • Methods for purifying an RGN polypeptide or RGN ribonucleoprotein complex from a biological sample are known in the art (e.g., size exclusion and/or affinity chromatography, 2D-PAGE, HPLC, reversed-phase chromatography, immunoprecipitation).
  • the tagged RGN polypeptide or RGN ribonucleoprotein complex is purified using immobilized metal affinity chromatography. It will be appreciated that other similar methods known in the art may be used, including other forms of chromatography or for example immunoprecipitation, either alone or in combination.
  • Some methods provided herein for binding and/or cleaving a target sequence of interest involve the use of an in vitro assembled RGN ribonucleoprotein complex. In vitro assembly of an RGN ribonucleoprotein complex can be performed using any method known in the art in which an RGN polypeptide is contacted with a guide RNA under conditions to allow for binding of the RGN polypeptide to the gRNA.
  • the RGN polypeptide can be purified from a biological sample, cell lysate, or culture medium, produced via in vitro translation, or chemically synthesized.
  • the gRNA can be purified from a biological sample, cell lysate, or culture medium, transcribed in vitro, or chemically synthesized.
  • the RGN polypeptide and gRNA can be brought into contact in solution (e.g., buffered saline solution) to allow for in vitro assembly of the RGN ribonucleoprotein complex.
  • RNA-guided nucleases [083]
  • the present disclosure provides CRISPR-based nucleic acid targeting systems that comprise an RNA-guided nuclease (RGN) as defined in Table 4. [084]
  • Table 4. Novel RGNs
    RGN code | SEQ ID NO | CRISPR repeat length | Protein length
    EGS0290 | 1 | 29 | 492
    EGS0293 | 2 | 29 | 542
    EGS0294 | 3 | 30 | 537
    EGS0346 | 4 | 28 | 532
    EGS0380 | 5 | 28 | 520
    EGS0288 | 79 | 29 | 506
    EGS0291 | 80 | 28 | 606
    EGS0295 | 81 | 30 | 492
    EGS0318 | 82 | 28 | 503
    EGS0334 | 83 | 25 | 552
    EGS0336 | 84 | 30 | 496
    EGS0337 | 85 | 29 | 539
    EGS0338 | 86 | 29 | 559
    EGS0341 | 87 | 29 | 572
    EGS0343 | 88 | 29 | 506
    EGS0344 | 89 | 29 | 583
  • [085] An RGN provided herein binds to a target nucleotide sequence and hybridizes with the RNA molecule (crRNA) specific to the RNA-guided nuclease.
  • the target sequence can then be subsequently cleaved by the RGN if the RGN polypeptide possesses nuclease activity.
  • the presently disclosed RGNs can cleave nucleotides within a polynucleotide, functioning as an endonuclease.
  • the disclosed RGNs can cleave nucleotides of a target nucleotide sequence within any position of a polynucleotide and thus function as both an endonuclease and exonuclease.
  • the presently disclosed RGNs can be wild-type sequences derived from bacterial or archaeal species. Alternatively, the RGNs can be variants or fragments of wild-type polypeptides.
  • the wild-type RGN can be modified to alter nuclease activity or alter PAM specificity, for example.
  • the RGN is not naturally-occurring. Such RGNs have a single functioning nuclease domain.
  • the RGN lacks nuclease activity altogether or exhibits reduced nuclease activity and is referred to herein as a nuclease-dead RGN. Any method known in the art for introducing mutations into an amino acid sequence, such as PCR-mediated mutagenesis and site-directed mutagenesis, can be used for generating nuclease-dead RGNs (see, e.g., US9,790,490).
  • nuclease dead RGNs can be targeted to particular genomic locations to alter the expression of a desired sequence.
  • the binding of a nuclease-dead RNA-guided nuclease to a target sequence results in the repression of expression of the target sequence or a gene under transcriptional control by the target sequence by interfering with the binding of RNA polymerase or transcription factors within the targeted genomic region.
  • the RGN (e.g., a nuclease-dead RGN) or its complexed gRNA further comprises an expression modulator that, upon binding to a target sequence, serves to either repress or activate the expression of the target sequence or a gene under transcriptional control by the target sequence.
  • the expression modulator modulates the expression of the target sequence or regulated gene through epigenetic mechanisms.
  • one or more of the nuclease-dead RGNs disclosed herein can be targeted to particular genomic locations to modify the sequence of a target polynucleotide through fusion to a base editing polypeptide, for example a deaminase polypeptide or active variant or fragment thereof that deaminates a nucleotide base, resulting in conversion from one nucleotide base to another.
  • the base-editing polypeptide can be fused to the RGN at its N-terminal or C-terminal end.
  • the base-editing polypeptide may be fused to the RGN via a peptide linker.
  • a deaminase polypeptide that is useful for such compositions and methods includes a cytidine deaminase or the adenosine deaminase base editor described in Gaudelli et al. (2017) Nature 551:464-471, and WO2018/027078.
  • Structural elements of the RGN peptides [090]
  • the RGN proteins used in the present disclosure employ multiple domains distributed in a recognition lobe (REC) and a nuclease lobe (NUC) for substrate recognition and cleavage.
  • an RGN polypeptide of the disclosure comprises an amino-terminal domain (NTD) and a carboxy-terminal domain (CTD), which are connected by a linker loop.
  • the NTD consists of two domains: the wedge (WED) and recognition (REC) domains.
  • the CTD consists of the tri-split RuvC domain, which is split by a second REC domain and a target nucleic acid-binding (TNB) domain, and an unstructured tail that corresponds with the expression region of the tracrRNA and is dispensable for activity.
  • the RGN polypeptides of the present disclosure do not contain an HNH domain.
  • the RGNs of the present disclosure may comprise one or more additional domains, e.g., one or more REC domains.
  • the RGN polypeptides provided herein are between 300 and 700 amino acids in size, between 400 and 600 amino acids in size, or between 450 and 550 amino acids in size. Size variation may be dependent on the particular domain architecture of the RGN polypeptides provided herein.
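  • As a quick cross-check of the size ranges above against the protein lengths listed in Table 4, the following sketch counts which RGNs fall into each range. The dictionary is transcribed from Table 4; the helper name `in_range` is a hypothetical choice for illustration.

```python
# Protein lengths (amino acids) transcribed from Table 4.
PROTEIN_LENGTH = {
    "EGS0290": 492, "EGS0293": 542, "EGS0294": 537, "EGS0346": 532,
    "EGS0380": 520, "EGS0288": 506, "EGS0291": 606, "EGS0295": 492,
    "EGS0318": 503, "EGS0334": 552, "EGS0336": 496, "EGS0337": 539,
    "EGS0338": 559, "EGS0341": 572, "EGS0343": 506, "EGS0344": 583,
}


def in_range(low: int, high: int) -> list[str]:
    """Return the sorted RGN codes whose protein length lies in [low, high]."""
    return sorted(code for code, n in PROTEIN_LENGTH.items() if low <= n <= high)


for low, high in [(300, 700), (400, 600), (450, 550)]:
    print(f"{low}-{high} aa: {len(in_range(low, high))} of {len(PROTEIN_LENGTH)} RGNs")
```

  • All listed RGNs fall within the broadest range, while the narrower ranges exclude the longer proteins such as EGS0291.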
  • RuvC domain [093] The RuvC domain may comprise multiple subdomains: RuvC-I, RuvC-II and RuvC-III. The subdomains may be separated by other sequences on the amino acid sequence of the protein.
  • RuvC domains include any polypeptides having a structural similarity and/or sequence similarity to a RuvC domain described in the art.
  • the RuvC domain may share a structural similarity and/or sequence similarity to a RuvC of Cas9.
  • the RuvC domain may have an amino acid sequence that shares at least 80%, at least 85%, at least 90%, at least 95%, at least 99%, or 100% sequence identity with RuvC domains.
  • the RuvC domain comprises a RuvC-I polypeptide, a RuvC-II polypeptide, and a RuvC-III polypeptide.
  • RuvC-I, RuvC-II, and RuvC-III domains also include any polypeptides having a structural similarity and/or sequence similarity to the RuvC-I, II, and III domains described in the art, such as the corresponding domains of Cas9.
  • the RuvC domain may have an amino acid sequence that shares at least 80%, at least 85%, at least 90%, at least 95%, at least 99%, or 100% sequence identity with a RuvC domain of Cas9.
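  • The percent identity thresholds above can be illustrated with a small calculation over two sequences that have already been aligned. This is a sketch under stated assumptions: the inputs are pre-aligned strings using "-" for gaps, identity is computed over all alignment columns that are not gap-only, and the function names are hypothetical. Other conventions for the denominator exist and give slightly different values.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned sequences ('-' marks a gap)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Ignore columns where both sequences carry a gap.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" or b != "-"]
    if not columns:
        raise ValueError("no aligned columns")
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def meets_identity_threshold(aligned_a: str, aligned_b: str, threshold: float = 80.0) -> bool:
    """True if the aligned pair shares at least `threshold` percent identity."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```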
  • the RuvC domain of Cas9 consists of a six-stranded mixed beta-sheet flanked by α-helices and two additional two-stranded antiparallel beta-sheets (see e.g., Nishimasu et al.
  • the RuvC domain of Cas9 shares structural similarity with the retroviral integrase superfamily members characterized by an RNase H fold, such as Escherichia coli RuvC (PDB code 1HJR, 14% identity, root-mean-square deviation (rmsd) of 3.6 Å for 126 equivalent Cα atoms) and Thermus thermophilus RuvC (PDB code 4LD0, 12% identity, rmsd of 3.4 Å for 131 equivalent Cα atoms).
  • E. coli RuvC is a 3-layer alpha-beta sandwich containing a 5-stranded beta-sheet sandwiched between 5 alpha-helices.
  • RuvC nucleases have four catalytic residues (e.g., Asp7, Glu70, His143 and Asp146 in T. thermophilus RuvC), and cleave Holliday junctions (or structurally analogous cruciform junctions) through a two-metal mechanism. Asp10 (Ala), Glu762, His983 and Asp986 of the Cas9 RuvC domain are located at positions similar to those of the catalytic residues of T. thermophilus RuvC.
  • REC domain [097]
  • the REC domain may comprise multiple subdomains: REC1 and REC2. The subdomains may be separated by other sequences on the amino acid sequence of the protein.
  • Examples of REC domains include any polypeptides having a structural similarity and/or sequence similarity to a REC domain described in the art.
  • the REC domain may share a structural similarity and/or sequence similarity to a REC of Cas12a.
  • the REC domain may have an amino acid sequence that shares at least 80%, at least 85%, at least 90%, at least 95%, at least 99%, or 100% sequence identity with REC domains.
  • the REC domain may have a helix-turn-helix (HTH) DNA binding domain.
  • HTH is the DNA-binding motif used in prokaryotic regulatory proteins such as Cro, CAP, and λ repressor and in many eukaryotic activators such as Myc, MyoD, E12, E47, and AP-4.
  • in prokaryotic regulatory proteins, the HTH motif is a tightly packed amino acid structure consisting of an α-helical region followed by a sharp β-turn and then another α-helical region.
  • the HTH motif of the protein directly interacts with DNA, with the second α helix (the "recognition helix") binding in the major groove of the DNA.
  • the REC domain may comprise a bridge helix (BH) domain.
  • the bridge helix domain refers to a helical, arginine-rich polypeptide.
  • the bridge helix domain may be located next to any one of the amino acid domains in the nucleic acid-guided nuclease.
  • the bridge helix domain is next to a RuvC domain, e.g., next to a RuvC-I, RuvC-II, or RuvC-III subdomain.
  • the bridge helix domain is between the RuvC-I and RuvC-II subdomains.
  • the REC domain may have an amino acid sequence that shares at least 80%, at least 85%, at least 90%, at least 95%, at least 99%, or 100% sequence identity with a REC domain of Cas12a.
  • the REC domain of Cas12a consists of the REC1 and REC2 domains, where REC1 comprises 13 alpha helices, and REC2 comprises ten alpha helices and two beta strands that form a small antiparallel sheet (see e.g., Yamano et al. (2016) Cell 165(4):949-962).
  • Target nucleic acid-binding domains include any polypeptides having a structural similarity and/or sequence similarity to a TNB domain described in the art.
  • the TNB domain may share a structural similarity and/or sequence similarity to a TNB of Cas12f1.
  • the TNB domain may have an amino acid sequence that shares at least 80%, at least 85%, at least 90%, at least 95%, at least 99%, or 100% sequence identity with TNB domains.
  • the TNB domain may consist of a zinc finger binding domain.
  • a zinc finger is a small protein structural motif that is characterized by the coordination of one or more zinc ions in order to stabilize the fold, typically consisting of a dual CXXC motif where X can be any amino acid and there are a variable number of amino acids between the two pairs of cysteines.
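  • A dual CXXC motif of the kind described above can be located in a protein sequence with a regular expression. The sketch below is illustrative only: the function name is hypothetical, and the bound of 2 to 40 residues between the two cysteine pairs is an assumption chosen for the example, since the text states only that the spacing is variable.

```python
import re

# Two CXXC pairs separated by a variable-length spacer. The 2-40 residue
# spacer bound is an illustrative assumption, not a value from the text.
DUAL_CXXC = re.compile(r"C..C.{2,40}?C..C")


def find_dual_cxxc(protein: str) -> list[tuple[int, int]]:
    """Return (start, end) spans (0-based, end exclusive) of dual CXXC motifs."""
    return [(m.start(), m.end()) for m in DUAL_CXXC.finditer(protein)]
```

  • A sequence-level hit only suggests a candidate zinc-coordinating site; confirming a zinc finger fold would require structural or biochemical evidence.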
  • Modified RGN peptides [104] The RGNs may comprise one or more modifications. The modified RGNs may be catalytically inactive (also referred to as dead). A catalytically inactive or dead nuclease may have reduced or no nuclease activity compared to a wildtype counterpart nuclease. In some cases, a catalytically inactive or dead nuclease may have nickase activity.
  • the RGN polypeptide comprises a mutation of a catalytic RuvC residue corresponding to D289A, E388A or D486A (catalytic residues of RuvC-I, II, and III, which are well known in the prior art) of SEQ ID NO:14 (mutated EGS0293) or equivalent residues of other RGN sequences provided herein (see for example Kleinstiver, et al. (2019) Nat Biotechnol 37, 276–282).
  • the modifications of the RGN polypeptide may or may not cause an altered functionality.
  • the RGN polypeptide comprises a deletion of the unstructured tail at the carboxy terminus of the RGN polypeptide while maintaining the core catalytic activity of the enzyme (SEQ ID NO:6-13). This can be done to facilitate greater packaging into delivery vectors and to save on manufacturing costs.
  • the RGN polypeptide comprises mutations in the DNA binding pocket to increase affinity for DNA leading to enhanced binding activity. Such enhanced binding activity can lead to increased cleavage activity or can lead to increased activity of the fusion domain.
  • the RGN polypeptide comprises a mutation that increases the positive charge of the enzyme, corresponding to T97R, and/or T101R, and/or N150R, and/or D153R, and/or N157R, and/or A190K or A190R, and/or E247R, and/or Q336R, and/or Q343K or Q343R, and/or N347K or N347R, and/or Q373R, and/or D389K or D389R, and/or A424K or A424R, and/or V427R of SEQ ID NO: 2.
  • the improved RGN is EGS0293v2_Q343K_N347K_A424R (SEQ ID NO: 98) ( Figure 5).
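  • The substitution labels used above (for example Q343K or D289A) follow the usual convention of wild-type residue, 1-based position, replacement residue. The sketch below shows how such labels could be applied to a sequence while verifying the wild-type residue. The function name is hypothetical and the test sequence is a toy string, not any SEQ ID NO from the disclosure.

```python
import re

# Label format: wild-type residue, 1-based position, replacement residue.
MUTATION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_mutations(protein: str, mutations: list[str]) -> str:
    """Apply point substitutions such as 'Q343K', checking the wild-type residue."""
    residues = list(protein)
    for label in mutations:
        match = MUTATION.match(label)
        if match is None:
            raise ValueError(f"unrecognized mutation label: {label}")
        wild_type, position, replacement = match.group(1), int(match.group(2)), match.group(3)
        index = position - 1  # labels are 1-based
        if index < 0 or index >= len(residues) or residues[index] != wild_type:
            raise ValueError(f"{label}: expected {wild_type} at position {position}")
        residues[index] = replacement
    return "".join(residues)


# Toy example only: positions refer to this 4-residue string.
print(apply_mutations("MQNA", ["Q2K", "N3K", "A4R"]))
```

  • Checking the wild-type residue guards against applying a label that was numbered against a different reference sequence, which matters when "equivalent residues of other RGN sequences" are intended.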
  • Fusion proteins may include, for example, fusions with heterologous domains or functional domains (e.g., localization signals, enzymes).
  • various different modifications may be combined (e.g., a mutated nuclease which is catalytically inactive and which further is fused to a functional domain, such as for instance to induce DNA methylation or another nucleic acid modification, such as, for example, a mutation, a deletion, an insertion, a replacement).
  • the RGNs can comprise at least one nuclear localization signal (NLS) to enhance transport of the RGN to the nucleus of a cell.
  • Nuclear localization signals are known in the art and generally comprise a stretch of basic amino acids (see, e.g., Lange et al., J. Biol. Chem. (2007) 282:5101-5105).
  • the RGN comprises 2, 3, or more nuclear localization signals.
  • the nuclear localization signal(s) can be a heterologous NLS.
  • Non-limiting examples of nuclear localization signals useful for the presently disclosed RGNs are the nuclear localization signals of SV40 Large T-antigen, nucleoplasmin, and c-Myc (see.
  • Other localization signal sequences known in the art that localize polypeptides to particular subcellular location(s) can also be used to target the RGNs, including, but not limited to, plastid localization sequences, mitochondrial localization sequences, and dual-targeting signal sequences that target to both the plastid and mitochondria (see, e.g., Nassoury and Morse (2005) Biochim Biophys Acta 1743:5-19; Herrmann and Neupert (2003) IUBMB Life 55:219-225; Soll (2002) Curr Opin Plant Biol 5:529-535; Carrie and Small (2013) Biochim Biophys Acta 1833:253-259).
  • a non-limiting example of a cell-penetrating domain is the trans-activating transcriptional activator (TAT) from the human immunodeficiency virus 1.
  • the nuclear localization signal, plastid localization signal, mitochondrial localization signal, dual targeting localization signal, and/or cell-penetrating domain can be located at the amino-terminus (N-terminus), the carboxyl-terminus (C-terminus), or in an internal location of the RNA-guided nuclease.
  • Additional tags and labels [113]
  • the presently disclosed RGN polypeptides may comprise a detectable label or a purification tag.
  • the RGN polypeptide or guide RNA can be fused to a detectable label to allow for detection of a particular sequence.
  • a nuclease-dead RGN can be fused to a detectable label (e.g., fluorescent protein) and targeted to a particular sequence associated with a disease to allow for detection of the disease-associated sequence.
  • a detectable label is a molecule that can be visualized or otherwise observed.
  • the detectable label may be fused to the RGN as a fusion protein (e.g., fluorescent protein) or may be a small molecule conjugated to the RGN polypeptide that can be detected visually or by other means.
  • Detectable labels that can be fused to the presently disclosed RGNs as a fusion protein include any detectable protein domain, including but not limited to, a fluorescent protein or a protein domain that can be detected with a specific antibody.
  • fluorescent proteins include green fluorescent proteins (e.g., GFP, EGFP, ZsGreen) and yellow fluorescent proteins (e.g., YFP, EYFP, ZsYellow).
  • RGN polypeptides can also comprise a purification tag, which is any molecule that can be utilized to isolate a protein or fused protein from a mixture (e.g., biological sample, culture medium).
  • Non-limiting examples of purification tags include biotin, myc, maltose binding protein (MBP), and glutathione-S-transferase (GST).
  • Fusion proteins comprising the RGNs [117]
  • the presently disclosed RGNs can be fused to an effector domain (a fusion protein of an RGN and an effector domain), such as a cleavage domain, a deaminase domain, or an expression modulator domain, either directly or indirectly via a linker.
  • an effector domain can be located at the N-terminus, the C-terminus, or an internal location of the RNA-guided nuclease.
  • the RGN component of the fusion protein is a nuclease-dead RGN.
  • RGNs that are fused to a polypeptide or domain can be separated or joined by a linker.
  • a linker joins a gRNA binding domain of an RNA-guided nuclease and a base-editing polypeptide, such as a deaminase.
  • the RGN fusion protein comprises a cleavage domain, which is any domain that is capable of cleaving a polynucleotide (i.e., RNA, DNA) and includes, but is not limited to, restriction endonucleases and homing endonucleases (see, e.g., Linn et al.
  • the RGN fusion protein comprises a deaminase domain that deaminates a nucleotide base, resulting in conversion from one nucleotide base to another, and includes, but is not limited to, a cytidine deaminase or an adenosine deaminase base editor.
  • the effector domain of the fusion protein can be an expression modulator domain, which is a domain that either serves to upregulate or downregulate transcription.
  • the expression modulator domain can be an epigenetic modification domain, a transcriptional repressor domain or a transcriptional activation domain.
  • the expression modulator of the RGN fusion protein comprises an epigenetic modification domain that covalently modifies DNA or histone proteins to alter histone structure and/or chromosomal structure without altering the DNA sequence, leading to changes in gene expression (i.e., upregulation or downregulation).
  • Non-limiting examples of epigenetic modifications include acetylation or methylation of lysine residues, arginine methylation, serine and threonine phosphorylation, and lysine ubiquitination and sumoylation of histone proteins, and methylation and hydroxymethylation of cytosine residues in DNA.
  • Non-limiting examples of epigenetic modification domains include histone acetyltransferase domains, histone deacetylase domains, histone methyltransferase domains, histone demethylase domains, DNA methyltransferase domains, and DNA demethylase domains.
  • the expression modulator of the fusion protein comprises a transcriptional repressor domain, which interacts with transcriptional control elements and/or transcriptional regulatory proteins, such as RNA polymerases and transcription factors, to reduce or terminate transcription of at least one gene.
  • Transcriptional repressor domains are known in the art and include, but are not limited to, IκB and Krüppel-associated box (KRAB) domains.
  • the expression modulator of the fusion protein comprises a transcriptional activation domain, which interacts with transcriptional control elements and/or transcriptional regulatory proteins, such as RNA polymerases and transcription factors, to increase or activate transcription of at least one gene.
  • Transcriptional activation domains are known in the art and include, but are not limited to, a VP16 activation domain and an NFAT activation domain.
  • the nucleic acid-targeting effector protein-guide RNA complex as a whole may be associated with two or more functional domains.
  • there may be two or more functional domains associated with the nucleic acid-targeting effector protein or there may be two or more functional domains associated with the guide RNA (via one or more adaptor proteins), or there may be one or more functional domains associated with the nucleic acid-targeting effector protein and one or more functional domains associated with the guide RNA (via one or more adaptor proteins).
  • the fusion between the adaptor protein and the activator or repressor may include a linker.
  • For example, Gly-Ser linkers (GGGS) can be used. They can be used in repeats of 3, 6, 9, or even 12 or more, to provide suitable lengths, as required. Linkers can be used between the guide RNAs and the functional domain (activator or repressor), or between the nucleic acid-targeting effector protein and the functional domain (activator or repressor).
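  • The repeated Gly-Ser linker described above is simple to express in code. The following sketch builds a linker from a chosen number of GGGS units and joins two protein segments with it. The function names are hypothetical and the example segments are placeholder strings, not sequences from the disclosure.

```python
def gly_ser_linker(repeats: int, unit: str = "GGGS") -> str:
    """Return a flexible linker made of `repeats` copies of `unit`."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    return unit * repeats


def fuse(n_terminal: str, linker: str, c_terminal: str) -> str:
    """Join two polypeptide segments through a linker, N-terminus first."""
    return n_terminal + linker + c_terminal


# Linker lengths for the repeat counts mentioned in the text.
for n in (3, 6, 9, 12):
    print(n, len(gly_ser_linker(n)))
```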
  • Guide RNAs (gRNAs), tracrRNA, and crRNA [127] The present disclosure provides RGNs that can bind to gRNAs.
  • gRNA refers to a nucleotide sequence having sufficient complementarity with a target nucleotide sequence to hybridize with the target sequence and direct sequence-specific binding of an associated RNA-guided nuclease to the target nucleotide sequence.
  • an RGN's respective gRNA is one or more RNA molecules (generally, one or two) that can bind to the RGN and guide the RGN to bind to a particular target nucleotide sequence and, in those instances wherein the RGN has nickase or nuclease activity, also cleave the target nucleotide sequence.
  • a gRNA comprises a CRISPR RNA (crRNA) and a trans-activating CRISPR RNA (tracrRNA).
  • Native gRNAs that comprise both a crRNA and a tracrRNA generally comprise two separate RNA molecules that hybridize to each other through the repeat sequence of the crRNA and the anti-repeat sequence of the tracrRNA.
  • Native direct repeat sequences within a CRISPR array generally range in length from 28 to 37 base pairs, although the length can vary between 23 bp to 55 bp.
  • Spacer sequences within a CRISPR array generally range from 28 to 34 bp in length, although the length can be between 21 bp to 72 bp.
  • the spacer sequences of the present disclosure are normally between 35 and 46 nucleotides.
  • Each CRISPR array generally comprises fewer than 60 units of the CRISPR repeat-spacer sequence.
  • the CRISPRs are transcribed as part of a long transcript termed the primary CRISPR transcript, which comprises much of the CRISPR array.
  • the primary CRISPR transcript is cleaved by Cas proteins to produce crRNAs or in some cases, to produce pre-crRNAs that are further processed by additional Cas proteins into mature crRNAs.
  • Mature crRNAs comprise a spacer sequence and a CRISPR repeat sequence.
  • maturation involves the removal of one to six or more 5', 3', or 5' and 3' nucleotides.
  • these nucleotides that are removed during maturation of the pre-crRNA molecule are not necessary for generating or designing a gRNA.
  • the length of the sequence within the gRNA that is complementary to the target DNA is 17 to 23 bp, 18 to 23 bp, 19 to 23 bp, more specifically 20 to 23 bp, and still more specifically may be 21 to 23 bp, but is not limited thereto.
  • a CRISPR RNA comprises a spacer sequence and a CRISPR repeat sequence.
  • the “spacer sequence” is the nucleotide sequence that directly hybridizes with the target nucleotide sequence of interest.
  • the spacer sequence is engineered to be fully or partially complementary with the target sequence of interest.
  • the spacer sequence can comprise from 8 nucleotides to 30 nucleotides, or more.
  • the spacer sequence can be 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, or more nucleotides in length.
  • a trans-activating CRISPR RNA (tracrRNA) molecule comprises a nucleotide sequence comprising a region that has sufficient complementarity to hybridize to a CRISPR repeat sequence of a crRNA, which is referred to herein as the anti-repeat region.
  • the tracrRNA molecule further comprises a region with secondary structure (e.g., stem-loop) or forms secondary structure upon hybridizing with its corresponding crRNA.
  • the region of the tracrRNA that is fully or partially complementary to a CRISPR repeat sequence is at the 3' end of the molecule and the 5' end of the tracrRNA comprises secondary structure.
  • This region of secondary structure generally comprises several hairpin structures, including the nexus hairpin, which is found adjacent to the anti-repeat sequence. There are often hairpins at the 5' end of the tracrRNA that can vary in structure and number. [132] Table 5.
  • a single gRNA comprises the crRNA and tracrRNA on a single molecule of RNA.
  • a dual-gRNA system comprises a crRNA and a tracrRNA present on two distinct RNA molecules, hybridized to one another through at least a portion of the CRISPR repeat sequence of the crRNA and at least a portion of the tracrRNA, which may be fully or partially complementary to the CRISPR repeat sequence of the crRNA.
  • the crRNA and tracrRNA are separated by a linker nucleotide sequence.
  • the linker nucleotide sequence is one that does not include complementary bases in order to avoid the formation of secondary structure within or comprising nucleotides of the linker nucleotide sequence.
  • the linker nucleotide sequence between the crRNA and tracrRNA is at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, or more nucleotides in length.
  • the linker nucleotide sequence of a single gRNA is at least 4 nucleotides in length.
  • the linker nucleotide sequence is the nucleotide sequence set forth in Table 5.
  • the linker nucleotide sequence is at least 6 nucleotides in length. In certain embodiments, the linker nucleotide sequence is RAAA.
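  • The single gRNA layout described in this section can be sketched as a string assembly. The 5'-to-3' order used here (tracrRNA, linker, repeat-derived segment, spacer) is an assumption inferred from the statements that the anti-repeat lies at the 3' end of the tracrRNA and that the spacer follows the repeat; the function name, the default GAAA linker, and the minimum linker length check are illustrative choices rather than requirements of the disclosure.

```python
RNA_ALPHABET = set("ACGU")


def assemble_sgrna(tracr: str, repeat_tail: str, spacer: str, linker: str = "GAAA") -> str:
    """Concatenate tracrRNA, linker, repeat-derived segment and spacer (5' to 3').

    The segment order is an assumption for illustration; see the lead-in.
    """
    for name, seq in (("tracr", tracr), ("linker", linker),
                      ("repeat_tail", repeat_tail), ("spacer", spacer)):
        if not seq or set(seq) - RNA_ALPHABET:
            raise ValueError(f"{name} must be a non-empty RNA string over A, C, G, U")
    if len(linker) < 3:
        raise ValueError("linker should be at least 3 nucleotides long")
    return tracr + linker + repeat_tail + spacer


# Placeholder sequences, not taken from Table 5.
sgrna = assemble_sgrna("GGCUUC", "GAAGCC", "ACGUACGUACGUACGUACGU")
print(len(sgrna), sgrna)
```

  • Keeping the spacer as a separate argument reflects that it is the reprogrammable part of the molecule, while the scaffold segments stay fixed for a given RGN.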
  • the single gRNA or dual-gRNA can be synthesized chemically or via in vitro transcription. Assays for determining sequence-specific binding between a RGN and a gRNA are known in the art and include, but are not limited to, in vitro binding assays between an expressed RGN and the gRNA, which can be tagged with a detectable label (e.g., biotin) and used in a pull-down detection assay in which the gRNA:RGN complex is captured via the detectable label (e.g., with streptavidin beads).
  • a control gRNA with an unrelated sequence or structure to the gRNA can be used as a negative control for non-specific binding of the RGN to RNA.
  • the gRNA can be introduced into a target cell, organelle, or embryo as an RNA molecule.
  • the gRNA can be transcribed in vitro or chemically synthesized.
  • a nucleotide sequence encoding the gRNA is introduced into the cell, organelle, or embryo.
  • the nucleotide sequence encoding the gRNA is operably linked to a promoter (e.g., an RNA polymerase III promoter).
  • the promoter can be a native promoter or heterologous to the gRNA-encoding nucleotide sequence.
  • the gRNA can be introduced into a target cell, organelle, or embryo as a ribonucleoprotein complex, as described herein, wherein the gRNA is bound to an RNA-guided nuclease polypeptide.
  • the gRNA directs an associated RNA-guided nuclease to a particular target nucleotide sequence of interest through hybridization of the gRNA to the target nucleotide sequence.
  • a target nucleotide sequence can comprise DNA, RNA, or a combination of both and can be single-stranded or double-stranded.
  • a target nucleotide sequence can be genomic DNA (i.e., chromosomal DNA), plasmid DNA, or an RNA molecule (e.g., messenger RNA, ribosomal RNA, transfer RNA, micro RNA, small interfering RNA).
  • the target nucleotide sequence can be bound (and in some embodiments, cleaved) by an RNA-guided nuclease in vitro or in a cell.
  • the chromosomal sequence targeted by the RGN can be a nuclear, plastid or mitochondrial chromosomal sequence.
  • the target nucleotide sequence is unique in the target genome.
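  • Whether a candidate target sequence is unique in a genome can be checked by counting its occurrences on both strands. The sketch below uses exact string matching only, which is an assumption made for simplicity: near-matches with a few mismatches, which matter for off-target assessment, are not detected. Function names are hypothetical.

```python
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def _count_overlapping(sequence: str, pattern: str) -> int:
    """Count occurrences of `pattern` in `sequence`, allowing overlaps."""
    count, start = 0, sequence.find(pattern)
    while start != -1:
        count += 1
        start = sequence.find(pattern, start + 1)
    return count


def count_occurrences(genome: str, target: str) -> int:
    """Count exact matches of `target` on the given strand and its reverse complement."""
    reverse_complement = target.translate(DNA_COMPLEMENT)[::-1]
    total = _count_overlapping(genome, target)
    if reverse_complement != target:  # avoid double-counting palindromes
        total += _count_overlapping(genome, reverse_complement)
    return total


def is_unique(genome: str, target: str) -> bool:
    """True if `target` occurs exactly once across both strands."""
    return count_occurrences(genome, target) == 1
```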
  • the present disclosure also provides methods for binding and/or modifying a target nucleotide sequence of interest.
  • the methods include delivering a system comprising at least one gRNA or a polynucleotide encoding the same, and at least one fusion polypeptide comprising an RGN of the invention and a base-editing polypeptide, for example a cytidine deaminase or an adenosine deaminase, or a polynucleotide encoding the fusion polypeptide, to the target sequence or a cell, organelle, or embryo comprising the target sequence.
  • methods comprise the use of a single RGN polypeptide in combination with multiple, distinct gRNAs, which can target multiple, distinct sequences within a single gene and/or multiple genes. Also encompassed herein are methods wherein multiple, distinct gRNAs are introduced in combination with multiple, distinct RGN polypeptides. These gRNAs and gRNA/RGN polypeptide systems can target multiple, distinct sequences within a single gene and/or multiple genes.
  • Protospacer adjacent motif (PAM) sequences [139] The present disclosure also provides a gRNA, or a composition comprising a DNA encoding the gRNA, that includes a sequence capable of base pairing with the complementary strand of a target DNA sequence adjacent to a PAM (protospacer adjacent motif) sequence.
  • [140] the target nucleotide sequence of the RGNs is adjacent to a sequence called a protospacer adjacent motif (PAM).
  • a protospacer adjacent motif is generally within 1 to 30 nucleotides from the target nucleotide sequence.
  • a protospacer adjacent motif can be within 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 nucleotides from the target nucleotide sequence.
  • the PAM can be 5' or 3' of the target sequence depending on the RGN. In some embodiments, the PAM is 5' of the target sequence for the presently disclosed RGNs. Generally, the PAM is a consensus sequence of 3-4 nucleotides, but in particular embodiments, can be 2, 3, 4, 5, 6, 7, 8, 9, or more nucleotides in length.
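  • For an RGN whose PAM lies 5' of the target, candidate target sites can be enumerated by scanning a DNA sequence for the PAM and taking the bases immediately 3' of it. The sketch below is illustrative: the PAM passed in the example is a hypothetical purine-rich motif and not a PAM from Table 6, the default protospacer length is an assumption, only the given strand is scanned, and the function name is invented for this example.

```python
import re

# Minimal IUPAC subset for writing degenerate PAMs (R = purine, Y = pyrimidine).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "Y": "[CT]", "N": "[ACGT]"}


def find_targets(dna: str, pam: str, spacer_len: int = 20) -> list[tuple[int, str]]:
    """Return (pam_start, protospacer) pairs for a PAM located 5' of the target.

    Only the supplied strand is scanned; positions are 0-based.
    """
    # Lookahead so that overlapping PAM occurrences are all reported.
    pattern = re.compile("(?=(" + "".join(IUPAC[base] for base in pam) + "))")
    hits = []
    for match in pattern.finditer(dna):
        start = match.start() + len(pam)
        protospacer = dna[start:start + spacer_len]
        if len(protospacer) == spacer_len:  # skip PAMs too close to the 3' end
            hits.append((match.start(), protospacer))
    return hits


# Hypothetical PAM "RRG" used purely for demonstration.
print(find_targets("CCAAGACGTACGTACTT", "RRG", spacer_len=10))
```

  • Scanning the reverse complement in the same way would cover targets on the opposite strand.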
  • the PAM sequences of the presently disclosed RGN proteins are purine rich as opposed to pyrimidine rich PAM sequences of all other Type V systems. [141] Table 6.
  • the RGN, or an active variant or fragment thereof, binds a target nucleotide sequence adjacent to the respective PAM sequence set forth in Table 6.
  • the RGN binds to a guide sequence comprising a CRISPR repeat sequence set forth in Table 5, or an active variant or fragment thereof, and a tracrRNA sequence set forth in Table 5, or an active variant or fragment thereof.
  • PAM sequence specificity for a given nuclease enzyme is affected by enzyme concentration (see, e.g., Karvelis et al. (2015) Genome Biol 16:253), which may be modified by altering the promoter used to express the RGN, or the amount of ribonucleoprotein complex delivered to the cell, organelle, or embryo.
  • the RGN can cleave the target nucleotide sequence at a specific cleavage site.
  • a cleavage site is made up of the two particular nucleotides within a target nucleotide sequence between which the nucleotide sequence is cleaved by an RGN.
  • the cleavage site can comprise the 1st and 2nd, 2nd and 3rd, 3rd and 4th, 4th and 5th, 5th and 6th, 7th and 8th, or 8th and 9th nucleotides from the PAM in either the 5' or 3' direction.
  • the cleavage site may be over 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 nucleotides from the PAM in either the 5’ or 3’ direction. In some embodiments, the cleavage site is 4 nucleotides away from the PAM. In other embodiments, the cleavage site is at least 15 nucleotides away from the PAM.
  • the cleavage site is defined based on the distance of the two nucleotides from the PAM on the positive (+) strand of the polynucleotide and the distance of the two nucleotides from the PAM on the negative (-) strand of the polynucleotide.
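  • The definition of a cleavage site as the pair of nucleotides flanking the cut can be made concrete with a small index calculation. The sketch assumes a PAM located 5' of the target on the strand shown and counts nucleotides in the 3' direction starting from the first base after the PAM, so that an offset of 4 means the cut falls between the 4th and 5th nucleotides. This counting convention and the function names are assumptions for illustration; each strand would be computed separately when the cut is staggered.

```python
def cleavage_site(pam_start: int, pam_len: int, offset: int) -> tuple[int, int]:
    """Return 0-based indices of the two nucleotides flanking the cut.

    `offset` = n places the cut between the n-th and (n+1)-th nucleotides
    3' of the PAM (convention assumed for this sketch).
    """
    if offset < 1:
        raise ValueError("offset must be at least 1")
    first_after_pam = pam_start + pam_len
    left = first_after_pam + offset - 1
    return left, left + 1


def cut(dna: str, pam_start: int, pam_len: int, offset: int) -> tuple[str, str]:
    """Split one strand at the cleavage site and return the two fragments."""
    _, right = cleavage_site(pam_start, pam_len, offset)
    if right > len(dna) - 1:
        raise ValueError("cleavage site lies outside the sequence")
    return dna[:right], dna[right:]
```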
  • Target nucleotide sequence [146]
  • the target polynucleotide of an RGN system can be any polynucleotide endogenous or exogenous to the eukaryotic cell.
  • the target polynucleotide can be a polynucleotide residing in the nucleus of the eukaryotic cell.
  • the target polynucleotide can be a sequence coding a gene product (e.g., a protein) or a non-coding sequence (e.g., a regions or introns).
  • the target sequence is generally associated with a PAM (protospacer adjacent motif). The precise sequence and length requirements for the PAM differ depending on the RGN used, but PAMs are typically 2-5 base pair sequences adjacent to the protospacer (that is, the target sequence).
  • a target nucleic acid can be single stranded DNA (ssDNA) or double stranded DNA (dsDNA).
  • the source of the target DNA can be the same as the source of the sample, e.g., as described below.
  • the source of the target DNA can be any source.
  • the target DNA is a viral DNA (e.g., a genomic DNA of a DNA virus).
  • a subject method can be used for detecting the presence of a viral DNA amongst a population of nucleic acids (e.g., in a sample).
  • a subject method can also be used for the cleavage of non-target ssDNAs in the presence of a target DNA. For example, if a method takes place in a cell, a subject method can be used to promiscuously cleave non-target ssDNAs in the cell (ssDNAs that do not hybridize with the guide sequence of the guide RNA) when a particular target DNA is present in the cell (e.g., when the cell is infected with a virus and viral target DNA is detected).
  • the target polynucleotide of a RGN/RNA complex may be a disease-associated gene or polynucleotides or a gene/ polynucleotide associated with a biological pathway.
  • target DNAs include, but are not limited to, viral DNAs such as: a papovavirus (e.g., human papillomavirus (HPV), polyoma virus); a hepadnavirus (e.g., Hepatitis B Virus (HBV)); a herpesvirus (e.g., herpes simplex virus (HSV), varicella zoster virus (VZV), Epstein-Barr virus (EBV), cytomegalovirus (CMV), herpes lymphotropic virus, Pityriasis Rosea, Kaposi's sarcoma-associated herpesvirus); an adenovirus (e.g., atadenovirus, aviadenovirus, ichtadenovirus, mastadenovirus, siadenovirus); a poxvirus (e.g., smallpox, vaccinia virus, cowpox virus, monkeypox virus, orf virus, pseudocow
  • the target DNA is parasite DNA.
  • the target DNA is bacterial DNA, e.g., DNA of a pathogenic bacterium.
  • RGN complexes with RNA [151] RGNs can be complexed with a gRNA (gRNA/RGN complex) in order to deliver Cas in proximity to a target nucleic acid sequence.
  • the gRNA is a polynucleotide that site-specifically guides a Cas nuclease, or a deactivated Cas nuclease, to a target nucleic acid region.
  • the binding specificity is determined jointly by the complementary region on the cognate guide and a short DNA motif (protospacer adjacent motif or PAM) juxtaposed to the complementary region.
  • RNA/Cas complexes can be produced using methods well known in the art.
  • the RNA of the complexes can be produced in vitro and RGN polypeptides can be recombinantly produced and then the RNA and RGN proteins can be complexed together using methods known in the art.
  • cell lines constitutively expressing RGN proteins can be developed and can be transfected with the gRNA components, and complexes can be purified from the cells using standard purification techniques, such as but not limited to affinity, ion exchange and size exclusion chromatography. See, e.g., Jinek M., et al., "A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity," Science (2012) 337:816-821.
  • the components, i.e., the gRNA and RGN polynucleotides, may be provided separately to a cell, e.g., using separate constructs, or together, in a single construct, or in any combination, and complexes can be purified as above.
  • Variants of RGNs [154] The present disclosure provides RGNs comprising at least 50, 100, 150, 200, 250, 300, 350, 400, 450 or more contiguous amino acid residues of the amino acid as provided above in Table 4.
  • RNA-guided nucleases provided herein can comprise at least one nuclease domain (e.g., DNase, RNase domain) and at least one RNA recognition and/or RNA binding domain to interact with gRNAs.
  • Further domains of RNA-guided nucleases include, but are not limited to, DNA binding domains, helicase domains, protein-protein interaction domains, and dimerization domains.
  • the RNA-guided nucleases provided herein can comprise at least 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% sequence identity to one or more of a DNA binding domain, a helicase domain, a protein-protein interaction domain, and a dimerization domain.
  • While the activity of a variant or fragment may be altered compared to the polynucleotide or polypeptide of interest, the variant and fragment should retain the functionality of the polynucleotide or polypeptide of interest. For example, a variant or fragment may have increased activity, decreased activity, different spectrum of activity or any other alteration in activity when compared to the polynucleotide or polypeptide of interest.
  • Fragments and variants of naturally-occurring RGN polypeptides, such as those disclosed herein, will retain sequence-specific, RNA-guided DNA-binding activity.
  • fragments and variants of naturally-occurring RGN polypeptides will retain nuclease activity (single-stranded or double-stranded).
  • a biologically active variant of an RGN polypeptide of the invention may differ by as few as 1-15 amino acid residues, as few as 1-10, such as 6-10, as few as 5, as few as 4, as few as 3, as few as 2, or as few as 1 amino acid residue.
  • the polypeptides can comprise an N-terminal or a C-terminal truncation, which can comprise at least a deletion of 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 150, 200, 250, 300, 350 amino acids or more from either the N or C terminus of the polypeptide.
  • a biologically active variant of an RGN polypeptide of the invention may differ by as few as 1 or 2 amino acids.
  • RGN proteins can have varying sensitivity to mismatches between a spacer sequence in a gRNA and its target sequence that affects the efficiency of cleavage.
  • the CRISPR RNA repeat sequence comprises a nucleotide sequence that comprises a region with sufficient complementarity to hybridize to a tracrRNA.
  • the CRISPR RNA repeat sequence can comprise from 5 nucleotides to 30 nucleotides, or more.
  • the CRISPR repeat sequence can be 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, or more nucleotides in length.
  • the CRISPR repeat sequence is 12 nucleotides in length.
  • the degree of complementarity between a CRISPR repeat sequence and its corresponding tracrRNA sequence, when optimally aligned using a suitable alignment algorithm, is at least 50%, 60%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or more.
  • the CRISPR repeat sequence comprises the nucleotide sequence set forth in Table 5, or an active variant or fragment thereof that when comprised within a gRNA, is capable of directing the sequence-specific binding of an associated RNA-guided nuclease provided herein to a target sequence of interest.
  • an active CRISPR repeat sequence variant of a wild-type sequence comprises a nucleotide sequence having at least 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or more sequence identity to the nucleotide sequence set forth in Table 5.
  • an active CRISPR repeat sequence fragment of a wild-type sequence comprises at least 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 contiguous nucleotides from the 3’ end of the nucleotide sequence set forth in Table 5 (also listed in Table 10).
  • an active CRISPR repeat sequence fragment of a wild-type sequence comprises at least 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 contiguous nucleotides from the 3’ end of the nucleotide sequence set forth in Table 10.
  • an active CRISPR repeat sequence fragment of a wild-type sequence comprises a nucleotide sequence set forth in Table 11.
  • Fragments and variants of naturally occurring CRISPR repeats will retain the ability, when part of a gRNA (comprising a tracrRNA), to bind to and guide an RNA-guided nuclease (complexed with the gRNA) to a target nucleotide sequence in a sequence-specific manner.
  • tracrRNAs such as those disclosed herein, will retain the ability, when part of a gRNA (comprising a CRISPR RNA), to guide an RNA-guided nuclease (complexed with the gRNA) to a target nucleotide sequence in a sequence-specific manner.
  • the anti-repeat region of the tracrRNA that is fully or partially complementary to the CRISPR repeat sequence comprises from 3 nucleotides to 30 nucleotides, or more.
  • the region of base pairing between the tracrRNA anti-repeat sequence and the CRISPR repeat sequence can be 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, or more nucleotides in length.
  • the anti-repeat region of the tracrRNA that is fully or partially complementary to a CRISPR repeat sequence is 8 nucleotides in length.
  • the degree of complementarity between a CRISPR repeat sequence and its corresponding tracrRNA anti-repeat sequence, when optimally aligned using a suitable alignment algorithm, is at least 50%, 60%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or more.
  • the entire tracrRNA can comprise from 60 nucleotides to more than 140 nucleotides.
  • the tracrRNA can be 60, 65, 70, 75, 80, 85, 90, 95, 100, 105, 110, 115, 120, 125, 130, 135, 140, or more nucleotides in length.
  • the tracrRNA is 80 to 90 nucleotides in length, including 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, and 90 nucleotides in length.
  • the tracrRNA is 107 nucleotides in length.
  • the tracrRNA comprises the nucleotide sequence set forth in Table 5 or an active variant or fragment thereof that when comprised within a gRNA is capable of directing the sequence-specific binding of an associated RNA-guided nuclease provided herein to a target sequence of interest.
  • an active tracrRNA sequence variant of a wild-type sequence comprises a nucleotide sequence having at least 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or more sequence identity to the nucleotide sequence set forth in Table 5.
  • an active tracrRNA sequence fragment of a wild-type sequence comprises at least 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, or more contiguous nucleotides of the nucleotide sequence set forth in Table 5.
  • the presently disclosed polynucleotides comprise or encode a CRISPR repeat comprising a nucleotide sequence having at least 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or greater identity to the nucleotide sequence set forth in Table 5.
  • the presently disclosed polynucleotides can comprise or encode a tracrRNA comprising a nucleotide sequence having at least 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or greater identity to the nucleotide sequence set forth in Table 5.
  • Biologically active variants of a CRISPR repeat or tracrRNA of the invention may differ by as few as 1-15 nucleotides, as few as 1-10, such as 6-10, as few as 5, as few as 4, as few as 3, as few as 2, or as few as 1 nucleotide.
  • the polynucleotides can comprise a 5' or 3' truncation, which can comprise at least a deletion of 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80 nucleotides or more from either the 5' or 3' end of the polynucleotide.
  • the degree of complementarity between a spacer sequence and its corresponding target sequence, when optimally aligned using a suitable alignment algorithm, is at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or more.
  • the spacer sequence is free of secondary structure, which can be predicted using any suitable polynucleotide folding algorithm known in the art, including but not limited to mFold (see, e.g., Zuker and Stiegler (1981) Nucleic Acids Res. 9:133-148) and RNAfold (see, e.g., Gruber et al. (2008) Cell 106(1):23-24).
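The complementarity percentages recited above (CRISPR repeat versus tracrRNA anti-repeat, and spacer versus target) can be illustrated with a minimal Python sketch. It assumes an ungapped, antiparallel pairing of two equal-length strands and counts Watson-Crick pairs only; a real analysis would first align the sequences with a suitable alignment algorithm, and might also credit G-U wobble pairs. The function name `percent_complementarity` and the 8-nucleotide example sequences are hypothetical and are not taken from Table 5.

```python
# Watson-Crick pairs in RNA notation; DNA input is converted T -> U.
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def percent_complementarity(strand_a, strand_b):
    """Percent of positions that pair when strand_b runs antiparallel to strand_a.

    Both strands are written 5'->3'. Assumes equal length and no gaps.
    """
    a = strand_a.upper().replace("T", "U")
    b = strand_b.upper().replace("T", "U")
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    paired = sum((x, y) in WATSON_CRICK for x, y in zip(a, reversed(b)))
    return 100.0 * paired / len(a)


# Hypothetical 8-nt repeat segment and two candidate anti-repeat segments.
repeat = "GUUGCAGA"
perfect_anti_repeat = "UCUGCAAC"   # exact reverse complement of the repeat
one_mismatch = "UCUGCAAA"          # last base changed, so one pair is lost

print(percent_complementarity(repeat, perfect_anti_repeat))
print(percent_complementarity(repeat, one_mismatch))
```

Because thymine is converted to uracil before comparison, the same function can be applied to an RNA spacer and a DNA target strand.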
  • Nucleotides encoding RNA-guided nucleases, crRNA, and/or tracrRNA [169] The present disclosure provides polynucleotides comprising the presently disclosed crRNAs, tracrRNAs, and/or gRNAs and polynucleotides comprising a nucleotide sequence encoding the presently disclosed RGNs, crRNAs, tracrRNAs, and/or gRNAs.
  • polynucleotides include those comprising or encoding a CRISPR repeat sequence comprising the nucleotide sequence set forth in Table 5, or an active variant or fragment thereof that when comprised within a gRNA is capable of directing the sequence-specific binding of an associated RNA-guided nuclease to a target sequence of interest.
  • the disclosure also provides polynucleotides comprising or encoding a tracrRNA comprising the nucleotide sequence set forth in Table 5, or an active variant or fragment thereof that when comprised within a gRNA is capable of directing the sequence-specific binding of an associated RNA-guided nuclease to a target sequence of interest.
  • Polynucleotides are also provided that encode an RGN comprising the amino acid sequence set forth in Table 4, and active fragments or variants thereof that retain the ability to bind to a target nucleotide sequence in an RNA-guided sequence-specific manner.
  • the expression cassette will include, in the 5'-3' direction of transcription, a transcriptional (and, in some embodiments, translational) initiation region (i.e., a promoter), an RGN-, crRNA-, tracrRNA-, and/or gRNA-encoding polynucleotide, and a transcriptional (and, in some embodiments, translational) termination region (i.e., a termination region) functional in the organism of interest.
  • the promoters of the invention are capable of directing or driving expression of a coding sequence in a host cell.
  • the regulatory regions e.g., promoters, transcriptional regulatory regions, and translational termination regions
  • Additional regulatory signals may include, but are not limited to, transcriptional initiation start sites, operators, activators, enhancers, other regulatory elements, ribosomal binding sites, an initiation codon, and termination signals.
  • the various DNA fragments may be manipulated, so as to provide for the DNA sequences in the proper orientation and, as appropriate, in the proper reading frame.
  • adapters or linkers may be employed to join the DNA fragments or other manipulations may be involved to provide for convenient restriction sites, removal of superfluous DNA, removal of restriction sites, or the like.
  • in vitro mutagenesis, primer repair, restriction, annealing, resubstitutions, e.g., transitions and transversions, may be involved.
  • a number of promoters can be used in the practice of the invention.
  • the promoters can be selected based on the desired outcome.
  • the nucleic acids can be combined with constitutive, inducible, growth stage-specific, cell type-specific, tissue-preferred, tissue-specific, or other promoters for expression in the organism of interest.
  • the nucleotide comprises a tissue-preferred promoter.
  • the nucleic acid molecules encoding an RGN, crRNA, tracrRNA, and/or gRNA comprise a cell type-specific promoter.
  • the nucleic acid sequences encoding the RGNs, crRNA, tracrRNA, and/or gRNA can be operably linked to a promoter sequence that is recognized by a phage RNA polymerase, for example for in vitro mRNA synthesis.
  • the promoter sequence can be a pol I, pol II, pol III, T7, T3, U6, CMV or SP6 promoter sequence or a variation of a T7, T3, U6, CMV or SP6 promoter sequence.
  • the expressed protein and/or RNAs can be purified for use in the methods of genome modification described herein. Any Pol II promoter or terminator could express the RGN.
  • the choice of a promoter depends on how strongly the RGN needs to be expressed and in what tissue type.
  • the RGN is expressed using the CMV promoter.
  • the gRNA can be expressed by Pol III promoters (e.g. U6 promoter).
  • the polynucleotide encoding the RGN also can be linked to a polyadenylation signal (e.g., SV40 polyA signal, or SV40 polyA with rrnG terminator) and/or at least one transcriptional termination sequence.
  • the sequence encoding the RGN also can be linked to sequence(s) encoding at least one nuclear localization signal, at least one cell-penetrating domain, and/or at least one signal peptide capable of trafficking proteins to particular subcellular locations.
  • Additional regulatory signals include, but are not limited to, transcriptional initiation start sites, operators, activators, enhancers, other regulatory elements, ribosomal binding sites, an initiation codon, termination signals, and the like. See, for example, U.S. Pat. Nos. 5,039,523 and 4,853,331.
  • Variants of polynucleotides include those sequences that, because of the degeneracy of the genetic code, encode the native amino acid sequence of the gene of interest.
  • Naturally occurring allelic variants such as these can be identified with the use of well-known molecular biology techniques, as, for example, with polymerase chain reaction (PCR) and hybridization techniques as outlined below.
  • Variant polynucleotides also include synthetically derived polynucleotides, such as those generated, for example, by using site-directed mutagenesis but which still encode the polypeptide or the polynucleotide of interest.
  • variants of a particular polynucleotide disclosed herein will have at least 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or more sequence identity to that particular polynucleotide as determined by sequence alignment programs and parameters described elsewhere herein.
  • Variants of a particular polynucleotide disclosed herein i.e. , the reference polynucleotide
  • variants of a particular polynucleotide disclosed herein can also be evaluated by comparison of the percent sequence identity between the polypeptide encoded by a variant polynucleotide and the polypeptide encoded by the reference polynucleotide.
  • Percent sequence identity between any two polypeptides can be calculated using sequence alignment programs and parameters described elsewhere herein. Where any given pair of polynucleotides disclosed herein is evaluated by comparison of the percent sequence identity shared by the two polypeptides they encode, the percent sequence identity between the two encoded polypeptides is at least 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or more sequence identity.
  • the presently disclosed polynucleotides encode an RGN polypeptide comprising an amino acid sequence having at least 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or greater identity to an amino acid sequence set forth in Table 4.
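The percent sequence identity thresholds above depend on how two sequences are aligned. As a rough illustration only, the following Python sketch performs a Needleman-Wunsch global alignment and reports identical positions as a percentage of the alignment length. The scoring values (match +1, mismatch -1, linear gap -1), the choice of denominator, and the function names are assumptions made for this sketch; they are not the alignment programs or parameters referred to in the disclosure.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment; returns the two gapped strings."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner, preferring the diagonal.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(a, b):
    """Identical aligned positions as a percentage of alignment length."""
    if not a or not b:
        raise ValueError("both sequences must be non-empty")
    aligned_a, aligned_b = global_align(a, b)
    matches = sum(x == y and x != "-" for x, y in zip(aligned_a, aligned_b))
    return 100.0 * matches / len(aligned_a)


print(global_align("ACGT", "ACT"))
print(percent_identity("ACGTACGT", "ACGTTCGT"))
```

The same functions work on amino acid strings, although protein comparisons in practice normally use a substitution matrix and affine gap penalties rather than the flat scores assumed here.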
  • Variant polynucleotides and proteins also encompass sequences and proteins derived from a mutagenic and recombinogenic procedure such as DNA shuffling.
  • an RGN protein disclosed herein can be manipulated to create a new RGN protein possessing the desired properties.
  • libraries of recombinant polynucleotides are generated from a population of related sequence polynucleotides comprising sequence regions that have substantial sequence identity and can be homologously recombined in vitro or in vivo.
  • sequence motifs encoding a domain of interest may be shuffled between the RGN sequences provided herein and other known RGN genes to obtain a new gene coding for a protein with an improved property of interest, such as an increased Km in the case of an enzyme.
  • Strategies for such DNA shuffling are known in the art.
  • nucleic acid molecules encoding RGNs and/or gRNA can be codon optimized for expression in a target cell or tissue of interest.
  • Such polynucleotide coding sequence normally has its frequency of codon usage designed to mimic the frequency of preferred codon usage or transcription conditions of a particular host cell. Expression in the particular host cell or organism is enhanced as a result of the alteration of one or more codons at the nucleic acid level such that the translated amino acid sequence is not changed.
  • Nucleic acid molecules can be codon optimized, either wholly or in part.
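Codon optimization as described above changes codons without changing the encoded amino acid sequence. A minimal Python sketch of that idea follows. The standard genetic code table is built from its usual compact representation; the `PREFERRED` codon choices, the example coding sequence, and the function names are hypothetical and do not represent measured codon usage for any particular host.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Hypothetical host preferences (illustrative values, not real usage data).
PREFERRED = {"L": "CTG", "S": "AGC", "R": "CGG", "A": "GCC", "G": "GGC"}


def translate(dna):
    """Translate an in-frame coding sequence; '*' marks a stop codon."""
    if len(dna) % 3:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def codon_optimize(dna, preferred):
    """Swap each codon for the host-preferred synonym, if one is listed."""
    protein = translate(dna)
    optimized = "".join(
        preferred.get(aa, dna[3 * k:3 * k + 3]) for k, aa in enumerate(protein)
    )
    # Guard against a preference table that would alter the protein.
    if translate(optimized) != protein:
        raise ValueError("preferred codons change the encoded protein")
    return optimized


original = "ATGTTATCTCGTGCTGGTTAA"  # hypothetical open reading frame
optimized = codon_optimize(original, PREFERRED)
print(optimized, translate(optimized))
```

Amino acids absent from the preference table keep their original codon, which corresponds to optimizing the molecule in part rather than wholly.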
  • Vectors The polynucleotide encoding the RGN, and/or gRNA can be present in a vector or multiple vectors. Suitable vectors include plasmid vectors, phagemids, cosmids, artificial/mini-chromosomes, transposons, and viral vectors (e.g., lentiviral vectors, adeno-associated viral vectors, adenoviral vectors).
  • the vector may comprise additional expression control sequences (e.g., enhancer sequences, Kozak sequences, polyadenylation sequences, transcriptional termination sequences), selectable marker sequences (e.g., antibiotic resistance genes), origins of replication, and the like.
  • see, e.g., "Current Protocols in Molecular Biology", Ausubel et al., John Wiley & Sons, New York, 2003, or "Molecular Cloning: A Laboratory Manual", Sambrook & Russell, Cold Spring Harbor Press, Cold Spring Harbor, N.Y., 3rd edition, 2001.
  • the vector may also comprise a selectable marker gene for the selection of transformed cells. Selectable marker genes are utilized for the selection of transformed cells or tissues.
  • Marker genes include genes encoding antibiotic resistance, such as those encoding neomycin phosphotransferase II (NEO) and hygromycin phosphotransferase (HPT), as well as genes conferring resistance to herbicidal compounds, such as glufosinate ammonium, bromoxynil, imidazolinones, and 2,4-dichlorophenoxyacetate (2,4-D).
  • Delivery of the components to the target cells [187]
  • components of the present invention are delivered using nanoscale delivery systems, such as nanoparticles.
  • liposomes and other particulate delivery systems can be used.
  • vectors including the components of the present methods can be packaged in liposomes prior to delivery.
  • expression constructs comprising nucleotide sequences encoding the RGNs, and/or gRNA can be used to transform organisms of interest.
  • Methods for transformation involve introducing a nucleotide construct into an organism of interest.
  • the methods of the invention do not require a particular method for introducing a nucleotide construct to a host organism, only that the nucleotide construct gains access to the interior of a target cell.
  • the host cell can be a eukaryotic or prokaryotic cell.
  • the eukaryotic host cell is a plant cell, a mammalian cell, or an insect cell.
  • Methods for introducing nucleotide constructs into host cells are known in the art including, but not limited to, stable transformation methods, transient transformation methods, and virus-mediated methods.
  • transformation of a host cell may be performed by infection, transfection, microinjection, electroporation, microprojection, biolistics or particle bombardment, silica/carbon fibers, ultrasound mediated, PEG mediated, calcium phosphate co-precipitation, polycation DMSO technique, DEAE dextran procedure, and viral mediated, liposome mediated and other similar methods.
  • Viral-mediated introduction of a polynucleotide encoding an RGN, and/or gRNA includes retroviral, lentiviral, adenoviral, and adeno-associated viral mediated introduction and expression.
  • Transformation may result in stable or transient incorporation of the nucleic acid into the cell.
  • the cells that have been transformed may be grown into a transgenic organism using well-known methods. Alternatively, cells that have been transformed may be introduced into an organism. These cells could have originated from the organism, wherein the cells are transformed in an ex vivo approach.
  • the polynucleotides encoding the RGNs, and/or gRNAs can also be used to transform any prokaryotic cells, including but not limited to, archaea and bacteria.
  • the polynucleotides encoding the RGNs, and/or gRNAs can be used to transform any eukaryotic cells, including but not limited to animal (e.g., mammals, insects, fish, birds, and reptiles), fungi, amoeba, algae, and yeast cells.
  • Conventional viral and non-viral based gene transfer methods can be used to introduce nucleic acids in mammalian cells or target tissues.
  • Non-viral vector delivery systems include DNA plasmids, RNA (e.g. a transcript of a nucleic acid described herein), naked nucleic acid, and nucleic acid complexed with a delivery vehicle, such as a liposome.
  • Viral vector delivery systems include DNA and RNA viruses, which have either episomal or integrated genomes after delivery to the cell.
  • Methods of non-viral delivery of nucleic acids include lipofection, nucleofection, microinjection, biolistics, virosomes, liposomes, immunoliposomes, polycation or lipid:nucleic acid conjugates, naked DNA, artificial virions, and agent-enhanced uptake of DNA.
  • Lipofection is described in, e.g., US 5,049,386 and lipofection reagents are widely available commercially. Delivery can be to cells (e.g. in vitro or ex vivo administration) or target tissues (e.g. in vivo administration).
  • Viral delivery for therapeutic applications allows targeting a virus to specific cells and trafficking the viral payload to the nucleus.
  • Viral vectors can be administered directly to patients (in vivo) or they can be used to treat cells in vitro, and the modified cells may optionally be administered to patients (ex vivo).
  • Conventional viral based systems could include retroviral, lentivirus, adenoviral, adeno-associated and herpes simplex virus vectors for gene transfer. Integration in the host genome is possible with the retrovirus, lentivirus, and adeno-associated virus gene transfer methods, often resulting in long term expression of the inserted transgene.
  • Lentiviral vectors are retroviral vectors that are able to transduce or infect non-dividing cells and typically produce high viral titers. Selection of a retroviral gene transfer system would therefore depend on the target tissue. Retroviral vectors are comprised of cis-acting long terminal repeats (LTRs) with packaging capacity for up to 6-10 kb of foreign sequence. The minimum cis-acting LTRs are sufficient for replication and packaging of the vectors, which are then used to integrate the therapeutic gene into the target cell to provide permanent transgene expression.
  • Widely used retroviral vectors include those based upon murine leukemia virus (MuLV), gibbon ape leukemia virus (GaLV), simian immunodeficiency virus (SIV), human immunodeficiency virus (HIV), and combinations thereof (see, e.g., Buchscher et al., J. Virol. 66:2731-2739 (1992); Johann et al., J. Virol. 66:1635-1640 (1992); Sommerfelt et al., Virol. 176:58-59 (1990); Wilson et al., J. Virol. 63:2374-2378 (1989); Miller et al., 7.
  • adenoviral based systems may be used.
  • Adenoviral based vectors are capable of very high transduction efficiency in many cell types and do not require cell division. With such vectors, high titer and levels of expression have been obtained. This vector can be produced in large quantities in a relatively simple system.
  • Adeno-associated virus (“AAV”) vectors may also be used to transduce cells with target nucleic acids. Construction of recombinant AAV vectors is described in a number of publications, including U.S. 5,173,414. Packaging cells are typically used to form virus particles that are capable of infecting a host cell.
  • Viral vectors used in gene therapy are usually generated by producing a cell line that packages a nucleic acid vector into a viral particle.
  • the vectors typically contain the minimal viral sequences required for packaging and subsequent integration into a host, other viral sequences being replaced by an expression cassette for the polynucleotide(s) to be expressed.
  • the missing viral functions are typically supplied in trans by the packaging cell line.
  • AAV vectors used in gene therapy typically only possess ITR sequences from the AAV genome which are required for packaging and integration into the host genome.
  • Viral DNA is packaged in a cell line, which contains a helper plasmid encoding the other AAV genes, namely rep and cap, but lacking ITR sequences.
  • the cell line may also be infected with adenovirus as a helper.
  • the helper virus promotes replication of the AAV vector and expression of AAV genes from the helper plasmid.
  • the helper plasmid is not packaged in significant amounts due to a lack of ITR sequences. Contamination with adenovirus can be reduced by, e.g., heat treatment to which adenovirus is more sensitive than AAV. Additional methods for the delivery of nucleic acids to cells are known to those skilled in the art.
  • the disclosure provides methods of modifying a target polynucleotide in a eukaryotic cell, which may be performed in vivo, ex vivo or in vitro.
  • the method comprises sampling a cell or population of cells from a human or non-human animal or plant (including microalgae) and modifying the cell or cells. Culturing may occur at any stage ex vivo. The cell or cells may even be re-introduced into the non-human animal or plant (including microalgae).
  • the present disclosure provides methods for binding, cleaving, and/or modifying a target nucleotide sequence of interest.
  • the methods include delivering a system comprising at least one gRNA or a polynucleotide encoding the same, and at least one RGN polypeptide or a polynucleotide encoding the same to the target sequence or a cell, organelle, or embryo comprising the target sequence.
  • the RGN comprises the amino acid sequence as disclosed above, or an active variant or fragment thereof.
  • the gRNA comprises a CRISPR repeat sequence comprising the nucleotide sequence as provided above, or an active variant or fragment thereof.
  • the gRNA comprises the nucleotide sequence as provided above, or an active variant or fragment thereof.
  • the RGN of the system may be a nuclease-dead RGN, or may be a fusion polypeptide.
  • the fusion polypeptide comprises a base-editing polypeptide, for example a cytidine deaminase or an adenosine deaminase.
  • the RGN and/or gRNA is heterologous to the cell, organelle, or embryo to which the RGN and/or gRNA (or polynucleotide(s) encoding at least one of the RGN and gRNA) are introduced.
  • the cell or embryo can then be cultured under conditions in which the gRNA and/or RGN polypeptide are expressed.
  • the method comprises contacting a target sequence with an RGN ribonucleoprotein complex.
  • the RGN ribonucleoprotein complex may comprise an RGN that is nuclease dead or has nickase activity.
  • the RGN of the ribonucleoprotein complex is a fusion polypeptide comprising a base-editing polypeptide.
  • the method comprises introducing into a cell, organelle, or embryo comprising a target sequence an RGN ribonucleoprotein complex.
  • the RGN ribonucleoprotein complex can be one that has been purified from a biological sample, recombinantly produced and subsequently purified, or in vitro-assembled as described herein.
  • the method can further comprise the in vitro assembly of the complex prior to contact with the target sequence, cell, organelle, or embryo.
  • a purified or in vitro assembled RGN ribonucleoprotein complex can be introduced into a cell, organelle, or embryo using any method known in the art, including, but not limited to electroporation.
  • an RGN polypeptide and/or polynucleotide encoding or comprising the gRNA can be introduced into a cell, organelle, or embryo using any method known in the art.
  • [207] Upon delivery to or contact with the target sequence or cell, organelle, or embryo comprising the target sequence, the gRNA directs the RGN to bind to the target sequence in a sequence-specific manner.
  • the RGN polypeptide cleaves the target sequence of interest upon binding.
  • the target sequence can subsequently be modified via endogenous repair mechanisms, such as non-homologous end joining, or homology-directed repair with a provided donor polynucleotide.
  • methods to measure cleavage or modification of a target sequence include in vitro or in vivo cleavage assays wherein cleavage is confirmed using PCR, sequencing, or gel electrophoresis, with or without the attachment of an appropriate label (e.g., radioisotope, fluorescent substance) to the target sequence to facilitate detection of degradation products.
  • NTEXPAR nicking triggered exponential amplification reaction
  • In vivo cleavage can be evaluated using the Surveyor assay (Guschin et al. (2010) Methods Mol Biol 649:247-256).
  • the methods involve the use of a single type of RGN complexed with more than one gRNA.
  • the more than one guide RNA can target different regions of a single gene or can target multiple genes.
  • a double-stranded break introduced by an RGN polypeptide can be repaired by a non-homologous end-joining (NHEJ) repair process. Due to the error-prone nature of NHEJ, repair of the double-stranded break can result in a modification to the target sequence. Modification of the target sequence can result in the expression of an altered protein product or inactivation of a coding sequence.
  • the donor sequence in the donor polynucleotide can be integrated into or exchanged with the target nucleotide sequence during the course of repair of the introduced double-stranded break, resulting in the introduction of the exogenous donor sequence.
  • a donor polynucleotide thus comprises a donor sequence that is desired to be introduced into a target sequence of interest.
  • the donor sequence alters the original target nucleotide sequence such that the newly integrated donor sequence will not be recognized and cleaved by the RGN.
  • the donor polynucleotide can comprise a donor sequence flanked by compatible overhangs, allowing for direct ligation of the donor sequence to the cleaved target nucleotide sequence comprising overhangs by a non-homologous repair process during repair of the double-stranded break.
  • a method for binding a target nucleotide sequence and detecting the target sequence, wherein the method comprises introducing into a cell, organelle, or embryo at least one guide RNA or a polynucleotide encoding the same, and at least one RGN polypeptide or a polynucleotide encoding the same, expressing the guide RNA and/or RGN polypeptide (if coding sequences are introduced), wherein the RGN polypeptide is a nuclease-dead RGN and further comprises a detectable label, and the method further comprises detecting the detectable label.
  • the detectable label may be fused to the RGN as a fusion protein (e.g., fluorescent protein) or may be a small molecule conjugated to or incorporated within the RGN polypeptide that can be detected visually or by other means.
  • Methods of modulating gene expression [213] Also provided herein are methods for modulating the expression of a target sequence or a gene of interest under the regulation of a target sequence.
  • the methods comprise introducing into a cell, organelle, or embryo at least one gRNA or a polynucleotide encoding the same, and at least one RGN polypeptide or a polynucleotide encoding the same, expressing the gRNA and/or RGN polypeptide (if coding sequences are introduced), wherein the RGN polypeptide is a nuclease-dead RGN.
  • the nuclease-dead RGN is a fusion protein comprising an expression modulator domain (i.e., epigenetic modification domain, transcriptional activation domain or a transcriptional repressor domain) as described herein.
  • An RGN polypeptide of the present disclosure, once activated by detection of a target DNA (double or single stranded), can cleave non-targeted single stranded DNA (ssDNA).
  • Once an RGN polypeptide is activated by a gRNA, which occurs after hybridization of the gRNA with a target sequence of a target DNA, the protein becomes a nuclease that promiscuously cleaves ssDNAs.
  • Thus, when the target DNA is present in the sample, the result is cleavage of ssDNAs in the sample, which can be detected using any common detection method (such as using a labeled single stranded DNA).
  • the present disclosure provides systems and methods for detecting a target DNA (double stranded or single stranded) in a sample.
  • a detector DNA is used that is single stranded (ssDNA) and does not hybridize with the gRNA (i.e., the detector ssDNA is a non-target ssDNA).
  • Such methods comprise steps of: (a) contacting the sample with: (i) an RGN polypeptide; (ii) a gRNA comprising: a region that binds to the RGN polypeptide, and a spacer sequence that hybridizes with the target DNA; and (iii) a detector DNA that is single stranded and does not hybridize with the spacer sequence; and (b) measuring a detectable signal produced by cleavage of the single stranded detector DNA by the RGN polypeptide, thereby detecting the target DNA.
  • the contacting step of a subject method can be carried out in a composition comprising divalent metal ions.
  • the contacting step can be carried out outside of a cell.
  • the contacting step can be carried out inside a cell.
  • the contacting step can be carried out in a cell in vitro.
  • the contacting step can also be carried out in a cell ex vivo.
  • the contacting step can be carried out in a cell in vivo.
  • the sample is contacted for 2 hours or less (e.g., 1.5 hours or less, 1 hour or less, 40 minutes or less, 30 minutes or less, 20 minutes or less, 10 minutes or less, or 5 minutes or less, or 1 minute or less), under conditions that provide for trans cleavage of the detector DNA.
  • Conditions that provide for trans cleavage of the detector DNA include temperature conditions such as from 17°C to 39°C (e.g., 37°C).
  • kits containing any one or more of the elements disclosed in the above methods and compositions.
  • the kit comprises a vector system and instructions for using the kit.
  • the vector system comprises (a) a first regulatory element operably linked to a gRNA sequence and one or more insertion sites for inserting a guide sequence downstream of the gRNA sequence, wherein when expressed, the gRNA directs sequence-specific binding of a CRISPR complex to a target sequence in a eukaryotic cell, wherein the CRISPR complex comprises a CRISPR enzyme complexed with (1) the gRNA sequence that is hybridized to the target sequence, and (2) a second regulatory element operably linked to an enzyme coding sequence encoding said CRISPR enzyme comprising a nuclear localization sequence.
  • Elements may be provided individually or in combinations, and may be provided in any suitable container, such as a vial, a bottle, or a tube.
  • the kit includes instructions in one or more languages.
  • a kit comprises one or more reagents for use in a process utilizing one or more of the elements described herein.
  • Reagents may be provided in any suitable container.
  • a kit may provide one or more reaction or storage buffers.
  • Reagents may be provided in a form that is usable in a particular assay, or in a form that requires addition of one or more other components before use (e.g. in concentrate or lyophilized form).
  • a buffer can be any buffer, including but not limited to a sodium carbonate buffer, a sodium bicarbonate buffer, a borate buffer, a Tris buffer, a MOPS buffer, a HEPES buffer, and combinations thereof.
  • the buffer is alkaline.
  • the buffer has a pH from 6 to 10.
  • the kit comprises one or more oligonucleotides corresponding to a guide sequence for insertion into a vector so as to operably link the guide sequence and a regulatory element.
  • the kit comprises a homologous recombination template polynucleotide.
  • the invention provides methods for using one or more elements of a CRISPR system.
  • the CRISPR complex of the invention provides an effective means for modifying a target polynucleotide.
  • the CRISPR complex of the disclosure has a wide variety of utility including modifying (e.g., deleting, inserting, translocating, inactivating, activating) a target polynucleotide in a multiplicity of cell types.
  • the CRISPR complex of the invention has a broad spectrum of applications in, e.g., gene therapy, drug screening, disease diagnosis, and prognosis.
  • An exemplary CRISPR complex comprises a CRISPR enzyme complexed with a guide sequence hybridized to a target sequence within the target polynucleotide.
  • Cells comprising the RGN-based systems [222] Provided herein are cells and organisms comprising a target sequence of interest that has been modified using a process or the system based on an RGN and/or gRNA as described herein. Also provided are cells and organisms comprising the system for binding a target sequence of interest comprising an RGN and/or gRNA as described herein.
  • the RGN comprises the amino acid sequence as disclosed above, or an active variant or fragment thereof.
  • the gRNA comprises a CRISPR repeat sequence comprising the nucleotide sequence as disclosed above, or an active variant or fragment thereof.
  • the gRNA comprises the nucleotide sequence as disclosed above, or an active variant or fragment thereof.
  • the modified cells can be eukaryotic (e.g., mammalian, plant, insect cell) or prokaryotic.
  • organelles and embryos comprising at least one nucleotide sequence that has been modified by a process utilizing an RGN and/or gRNA as described herein.
  • the genetically modified cells, organisms, organelles, and embryos can be heterozygous or homozygous for the modified nucleotide sequence.
  • the chromosomal modification of the cell, organism, organelle, or embryo can result in altered expression (up-regulation or down-regulation), inactivation, or the expression of an altered protein product or an integrated sequence.
  • the genetically modified cell, organism, organelle, or embryo is referred to as a “knock-out”.
  • the knock-out phenotype can be the result of a deletion mutation (i.e., deletion of at least one nucleotide), an insertion mutation (i.e., insertion of at least one nucleotide), or a nonsense mutation (i.e., substitution of at least one nucleotide such that a stop codon is introduced).
  • the chromosomal modification of a cell, organism, organelle, or embryo can produce a “knock-in”, which results from the chromosomal integration of a nucleotide sequence that encodes a protein.
  • the coding sequence is integrated into the chromosome such that the chromosomal sequence encoding the wild-type protein is inactivated, but the exogenously introduced protein is expressed.
  • the chromosomal modification results in the production of a variant protein product.
  • the expressed variant protein product can have at least one amino acid substitution and/or the addition or deletion of at least one amino acid.
  • the variant protein product encoded by the altered chromosomal sequence can exhibit modified characteristics or activities when compared to the wild-type protein, including but not limited to altered enzymatic activity or substrate specificity.
  • the chromosomal modification can result in an altered expression pattern of a protein.
  • chromosomal alterations in the regulatory regions controlling the expression of a protein product can result in the overexpression or downregulation of the protein product or an altered tissue or temporal expression pattern.
  • Pharmaceutical compositions [228]
  • the polypeptides, nucleic acids and vectors of the present disclosure may be in a form of a pharmaceutical composition.
  • the pharmaceutical composition may comprise 1 ng to 10 mg of DNA encoding the RGN/gRNA-based system or RGN/gRNA-based system protein component, i.e., the fusion protein.
  • the pharmaceutical composition may comprise 1 ng to 10 mg of the DNA of the modified lentiviral vector.
  • the pharmaceutical composition may comprise 1 ng to 10 mg of the DNA of the modified AAV vector and a nucleotide sequence encoding the site-specific nuclease.
  • the pharmaceutical compositions according to the present invention can be formulated according to the mode of administration to be used. In cases where pharmaceutical compositions are injectable pharmaceutical compositions, they are sterile, pyrogen free and particulate free. An isotonic formulation is preferably used.
  • additives for isotonicity may include sodium chloride, dextrose, mannitol, sorbitol and lactose.
  • isotonic solutions such as phosphate buffered saline are preferred.
  • Stabilizers include gelatin and albumin.
  • a vasoconstriction agent is added to the formulation.
  • the composition may further comprise a pharmaceutically acceptable excipient.
  • the pharmaceutically acceptable excipient may be functional molecules as vehicles, adjuvants, carriers, or diluents.
  • the pharmaceutically acceptable excipient may be a transfection facilitating agent, which may include surface active agents, such as immune-stimulating complexes (ISCOMS), Freund's incomplete adjuvant, LPS analog including monophosphoryl lipid A, muramyl peptides, quinone analogs, vesicles such as squalene and squalene, hyaluronic acid, lipids, liposomes, calcium ions, viral proteins, polyanions, polycations, or nanoparticles, or other known transfection facilitating agents.
  • the transfection facilitating agent can be a polyanion, polycation, including poly-L-glutamate (LGS), or lipid.
  • the transfection facilitating agent is poly-L-glutamate, and more preferably, the poly-L-glutamate is present in the composition for genome editing in skeletal muscle or cardiac muscle at a concentration less than 6 mg/ml.
  • the transfection facilitating agent may also include surface active agents such as immune-stimulating complexes (ISCOMS), Freund's incomplete adjuvant, LPS analog including monophosphoryl lipid A, muramyl peptides, quinone analogs and vesicles such as squalene and squalene; hyaluronic acid may also be administered in conjunction with the genetic construct.
  • tracrRNA sequences of the RGN polypeptides
    Name | Sequence | SEQ ID NO
    EGS0290 tracrRNA | CGTCGTCCCCCTGGTAGAGGGGGACGCTAACGCCCGCAGAAGTCCACGCCCGGCCGATGCAGGCCGGGACGGACACGCCGAAAGGCAGAAGCGGGAAGGCCCACCACCCGGTGGGTCACCTCGGCGGGAGTAATCCCCCCGACAGCCCC | 30
    EGS0293 tracrRNA | ACTCGCCTCAGAAAATGGGGAGAGTCTAAACGGACGTGGAAGTCGAGGCGCTCTTCAGGGTGCTGAGACTGTGGAAGCGTCAAGACCACCTACGAGTCATCGTAGAGGGTCACCGTAGATGAGTAATCATCTGCCCATCTAT | 31
    EGS0294 tracrRNA | GTGAACACTCGGCTCCGGAGGGGGCAGAGGATAAACGGCCGTGGAGTGTACATCTCCCATCAACGCAGTATGTTGATGGACTGTATACGAGGAAGCGGCAAGGCCCCAGCGGT | 32
  • Example 2 gRNA identification [241] Systems that fit the putative domain and CRISPR orientation for Cas12f5 were confirmed by predicting the structure computationally using neural network based models, similar to methods described, for example, in Jumper et al. (2021) Nature 596, pp. 583-589. These structural models were compared to each other and to solved crystal structures to identify possible gRNA structures and to confirm the catalytic residue of the final RuvC domain of the proteins. The crRNA was held to be the last 14 bases of the CRISPR repeat followed by the reprogrammable spacer sequence.
  • Regions in the identified CRISPR operon were manually searched for potential tracrRNAs by searching for antirepeat sequences capable of hybridizing to approximately bases 1-10 of the putative crRNA sequence within 400 nt of the last catalytic RuvC domain of the potential Cas12f5, including regions within the ORF.
  • when the putative tracrRNA was joined via a flexible linker, such as a GAAA tetraloop, to the putative crRNA, and there was sufficient hybridization between the antirepeat and the repeat regions, the complex adopted an sgRNA form consisting of four to six stem loop sequences.
  • the second stem loop of the structure was typically the longest and most poorly paired, and was able to be truncated down to various lengths while maintaining catalytic activity, as long as a single transcript was produced that maintained the core structure of the sgRNA.
  • the final stem loop of the sgRNA consisted of the antirepeat from the putative tracrRNA paired with approximately bases 1-10 of the CRISPR repeat, followed by 1-4 bases of unpaired “leader sequence” before the reprogrammable spacer sequence.
  • Example 3 Determination of PAM requirements for each RGN through Bacterial PAM Depletion [242] PAM requirements for each RGN were determined using a bacterial PAM depletion assay essentially adapted from Kleinstiver et al.
  • Both the RGN and sgRNA in the pET28b backbone were under the control of separate T7 promoters.
  • the transformation reaction was allowed to recover for 1 hr, after which it was diluted into LB media containing carbenicillin and kanamycin and grown overnight. The following day the mixture was diluted into self-inducing Overnight Express™ Instant TB Medium (Millipore Sigma) to allow expression of the RGN and sgRNA, grown for an additional 4 h at 37°C, and then shifted to 30°C for an additional 16 h, after which the cells were spun down and plasmid DNA was isolated with a Mini-prep kit (Qiagen, Germantown, MD).
  • the PAM and protospacer regions of uncleaved plasmids were PCR-amplified and prepared for sequencing following published protocols (16S metagenomic library prep guide 15044223B, Illumina, San Diego, CA). Deep sequencing (55 bp paired-end reads) was performed on a NextSeq (Illumina).
  • 1-4M reads were obtained per amplicon. PAM regions were extracted, counted, and normalized to total reads for each sample. PAMs that lead to plasmid cleavage were identified by being underrepresented when compared to controls (i.e., when the library is transformed into E. coli containing the RGN but lacking an appropriate sgRNA). To identify the PAM requirements for a novel RGN, an enrichment value was computed for each kmer as the difference between the library size-normalized read counts in the control sample and in the targeting sample. This value was rounded to the nearest integer for positive numbers and set to zero for negative numbers.
  • Enrichment values were then summed across all kmers to yield a position frequency matrix, which was represented visually as a sequence logo using the command line utility weblogo.
  • the final PAMs for these RGNs were obtained by summing counts across both plasmid libraries, normalizing counts, computing kmer enrichment values, summing across kmers to yield a position frequency matrix, then visually representing each PAM as a sequence logo using the command line utility weblogo.
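  • The enrichment and summation steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the pipeline actually used: the function name `pam_position_frequency_matrix`, the `scale` parameter (a common library size to which counts are normalized so that the difference can be rounded to an integer), and the toy counts in the usage note are all introduced here for illustration.

```python
from collections import Counter


def pam_position_frequency_matrix(control, targeting, scale=1_000_000):
    """Build a position frequency matrix from PAM depletion counts.

    control / targeting: Counter mapping each PAM kmer to its read count.
    scale: hypothetical common library size used for normalization.
    """
    control_total = sum(control.values())
    targeting_total = sum(targeting.values())
    k = len(next(iter(control)))
    pfm = [dict.fromkeys("ACGT", 0) for _ in range(k)]

    for kmer, count in control.items():
        control_norm = count / control_total * scale
        targeting_norm = targeting.get(kmer, 0) / targeting_total * scale
        # Depleted kmers give a positive difference; negatives are set to zero.
        # Note: Python's round() rounds exact halves to the nearest even integer.
        enrichment = max(0, round(control_norm - targeting_norm))
        # Add the kmer's enrichment to the base observed at every position.
        for position, base in enumerate(kmer):
            pfm[position][base] += enrichment
    return pfm
```

  • For example, with hypothetical counts `Counter({"TTA": 50, "GCA": 50})` for the control and `Counter({"TTA": 10, "GCA": 90})` for the targeting sample, only the depleted kmer TTA contributes to the matrix. The resulting per-position counts could then be passed to a logo tool.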
  • Example 4 Active Truncations of the Protein [245] Truncations to the proteins were created to identify just a conserved active domain. Activity was impaired with small truncations at the amino terminus (Figure 2). The carboxyl terminus of the protein could be truncated without impacting catalytic activity, provided that at least 9 amino acids remained past the final RuvC catalytic residue. Truncated sequences are listed in Table 9.
  • Example 5 Improved gRNA designs
  • Improvements to the editing ability of novel RGNs can be accomplished by changing the sgRNA scaffolds: altering the tracrRNA and crRNA linkage, removing hairpin mismatches, strengthening hairpins by swapping A:U base pairs for G:C base pairs, altering the starting position of the tracrRNA, and removing non-protein-contacting regions of the sgRNA to minimize the design.
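  • One of the scaffold changes listed above, swapping A:U base pairs for G:C base pairs in a hairpin stem, can be illustrated with a short Python sketch. The function name `strengthen_stem` and the assumption of a perfectly paired stem with arms of equal length are hypothetical simplifications. Which pairs can actually be changed without disrupting protein contacts is not addressed here.

```python
def strengthen_stem(five_prime_arm, three_prime_arm):
    """Swap A:U and U:A pairs for G:C and C:G in a perfectly paired RNA stem.

    Both arms are written 5'->3', so base i of the 5' arm pairs with
    base (len - 1 - i) of the 3' arm. Other pairs (including G:U) are kept.
    """
    if len(five_prime_arm) != len(three_prime_arm):
        raise ValueError("this sketch assumes arms of equal length")
    swaps = {("A", "U"): ("G", "C"), ("U", "A"): ("C", "G")}
    five = list(five_prime_arm)
    three = list(three_prime_arm)
    for i in range(len(five)):
        j = len(three) - 1 - i  # index of the pairing partner
        pair = (five[i], three[j])
        if pair in swaps:
            five[i], three[j] = swaps[pair]
    return "".join(five), "".join(three)
```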
  • novel sgRNA designs were developed and tested for genomic editing according to Table 15. Bacterial Editing was confirmed with all v1 sgRNA designs to identify the PAM sequence.
  • gRNAs were synthesized by in vitro transcription of the gRNA cassettes with a GeneArt™ Precision gRNA Synthesis Kit (ThermoFisher).
  • Example 7 Demonstration of gene editing activity on endogenous targets in mammalian cells
  • the RGN was codon optimized for human expression and cloned into expression cassettes with a Nterm SV40 NLS, and a Cterm FLAGtag and c-myc NLS under control of a CMV promoter for mammalian expression. The sequences are set forth in Table 17. [249] Table 17.
  • NLS and FLAG tag sequences
    Tag | Sequence | SEQ ID NO
    NTerm SV40 NLS | CCCAAGAAGAAGAGGAAGGTG | 76
    CTerm FLAGtag | GACTATAAGGACCACGACGGAGACTACAAGGATCATGATATTGATTACAAAGACGATGACGATAAG | 77
    CTerm c-myc NLS | CCTGCTGCCAAACGTTAAACTAGAC | 78
    [250]
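  • As a quick consistency check on coding sequences such as those in Table 17, a tag can be translated in frame with the standard genetic code. The sketch below is illustrative only: the helper `translate` and the constant names are introduced here, and only the SV40 NLS (SEQ ID NO: 76) and FLAG tag (SEQ ID NO: 77) sequences from the table are used.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

SV40_NLS_DNA = "CCCAAGAAGAAGAGGAAGGTG"  # SEQ ID NO: 76
FLAG_TAG_DNA = (
    "GACTATAAGGACCACGACGGAGACTACAAGGATCATGATATTGAT"
    "TACAAAGACGATGACGATAAG"
)  # SEQ ID NO: 77


def translate(dna):
    """Translate an in-frame DNA coding sequence; '*' marks a stop codon."""
    if len(dna) % 3 != 0:
        raise ValueError("sequence length is not a multiple of three")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))
```

  • Under the standard code, `translate(SV40_NLS_DNA)` yields PKKKRKV and `translate(FLAG_TAG_DNA)` yields DYKDHDGDYKDHDIDYKDDDDK.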
  • gRNA expression constructs encoding a single gRNA each under the control of a human RNA polymerase III U6 promoter were produced and introduced into an expression vector containing GFP under control of a CMV promoter. Guides were designed to target regions of selected genes with the appropriate PAM for the system.
  • HEK293T cells (Sigma) were plated in 24-well dishes in Dulbecco’s modified Eagle medium (DMEM) plus 10% (vol/vol) fetal bovine serum (Gibco) and 1% Penicillin-Streptomycin (Gibco). The next day, when the cells were at 50-60% confluency, 500 ng of an RGN expression plasmid plus 500 ng of a single gRNA expression plasmid were co-transfected using 1.5 µL of Lipofectamine 3000 (Thermo Scientific) per well, following the manufacturer’s instructions.
  • genomic DNA was harvested using a genomic DNA isolation kit (Macherey-Nagel) according to the manufacturer’s instructions.
  • [251] The total genomic DNA was then analyzed to determine the rate of editing in the targeted gene. Oligonucleotides were produced to be used for PCR amplification and subsequent analysis of the amplified genomic target site.
  • PCR reactions were performed using 10 uL of 2X Master Mix Platinum SuperFi DNA polymerase (Thermo Scientific) in a 20 uL reaction including 0.5 uM of each primer specific for each guide using a program of: 98°C, 1 min; 35 cycles of [98°C, 10 sec; 65°C, 15 sec; 72°C, 30 sec]; 72°C, 5 min; 12°C, forever.
  • Primers for PCR#2 include Nextera Read 1 and Read 2 Transposase Adapter overhang sequences for Illumina sequencing. [252] Following the PCR amplification, DNA was cleaned using a PCR cleanup kit (Zymo) according to the manufacturer’s instructions and eluted in water.
  • the deaminase is operably linked at its C-terminal end to a flexible amino acid linker, and the amino acid linker is operably linked at its C-terminal end to the RNA-guided nuclease, which has been mutated to have an inactive RuvC domain (dEGS0293v2_D289A_D486A); that is, it has been mutated into an RGN that is catalytically dead.
  • the RNA-guided DNA binding polypeptide is operably linked at its C-terminal end to a flexible amino acid linker, and the amino acid linker is operably linked to a uracil protecting peptide (developed in house).
  • the uracil protecting peptide is operably linked at its C-terminal end to a flexible amino acid linker, and the amino acid linker is operably linked at its C-terminal end to a second NLS.
  • Each of these expression cassettes is introduced into a vector capable of driving the expression of the fusion protein in mammalian cells.
  • a vector capable of expressing gRNA to target the deaminase-RGN-UPP fusion protein to the determined genomic location was also produced.
  • These guide RNAs can guide the deaminase-RGN-UPP fusion protein to the target genome sequence for base editing.
  • using liposome transfection, vectors capable of expressing the deaminase-RGN-UPP fusion protein and guide RNAs were transfected into HEK293T cells.
  • the day before transfection, the cells were distributed in a 24-well plate in growth medium (DMEM + 10% fetal bovine serum + 1% penicillin/streptomycin) at 1.3 × 10^5 cells/well.
  • Lipofectamine® 3000 reagent Thermo Fisher Scientific
  • genomic DNA is harvested from the transfected cells, and the DNA is sequenced and analyzed for the presence of targeted cytosine base editing mutations using CRISPResso2 (Clement K, et al. Nat Biotechnol. 2019 Mar;37(3):224-226. doi: 10.1038/s41587-019-0032-3. PubMed PMID: 30809026).
  • Tables 19, 20 and 21 show the editing rate of cytidine bases for each deaminase-RGN-UPP fusion protein and the rate for targeted cytosine deamination for the deaminase-RGN-UPP targeted to the same region as the catalytically dead RGN-UPP.
  • Active cytosine base editing was defined as a greater than 5x increase of C>D SNP base editing along the targeted window of the deaminase-RGN-UPP under investigation, and >60% of specific C>T SNP base editing at highly mutated cytosines. With a catalytically dead nuclease domain, the RGN will not generate detectable INDEL formation by itself.
  • When the RGN is fused with an active cytosine deaminase that acts on the opposite strand, a cytosine is converted into a uracil.
  • the uracil is rapidly removed from the DNA, leaving an abasic site and eventually a gap on the strand opposite the strand bound by the gRNA. This can result in a double-stranded break, which is repaired through non-homologous end joining (NHEJ) with detectable INDEL formation. However, in the presence of an active UPP, the converted uracil is protected from removal, so the abasic site and gap do not form and NHEJ does not occur.
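  • The activity criterion stated above (a greater than 5x increase in C>D editing over the catalytically dead control, and more than 60% of SNPs being C>T) can be expressed as a small check. The sketch below is a per-position simplification written for illustration: the function names are hypothetical, and the usage note approximates C>D by summing the C>T and C>G percentages because C>A values are not listed in the excerpted tables.

```python
def fold_increase(edited_pct, control_pct):
    """Ratio of editing in the fusion sample to editing in the control sample."""
    if control_pct == 0:
        return float("inf") if edited_pct > 0 else 0.0
    return edited_pct / control_pct


def is_active_site(edited_cd_pct, control_cd_pct, ct_pct_of_snps,
                   min_fold=5.0, min_ct_share=60.0):
    """Apply both thresholds: >5x C>D over control and >60% C>T among SNPs."""
    enough_editing = fold_increase(edited_cd_pct, control_cd_pct) > min_fold
    mostly_c_to_t = ct_pct_of_snps > min_ct_share
    return enough_editing and mostly_c_to_t
```

  • For instance, using the Target 1 values at position C 4 (fusion: 1.776085 + 0.005331; control: 0.006001 + 0.004081; C>T share of SNPs: 99.19183), `is_active_site` returns True under these assumptions.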
  • Target 1 Specific cytosine editing results
    Columns 2-5: A3A.dEGS0293v2.UPP12; columns 6-9: dEGS0293v2
    Base (downstream from PAM) | C>T (total %) | C>T (% of SNPs) | C>G (total %) | C>G (% of SNPs) | C>T (total %) | C>T (% of SNPs) | C>G (total %) | C>G (% of SNPs)
    C 3 | 0.291698 | 96.47355 | 0.003046 | 1.007557 | 0.005041 | 26.92308 | 0.006001 | 32.05128
    C 4 | 1.776085 | 99.19183 | 0.005331 | 0.297746 | 0.006001 | 25 | 0.004081 | 17
    C 9 | 2.713633 | 99.33092 | 0.006855 | 0.250906 | 0.007201 | 33.33333 | 0.007441 | 34.44444
    C 11 | 2.53313 | 99.46172 | 0.006855 | 0.269139 | 0.007201 | 29.12621 | 0.004081 | 16.50485
    C 12 | 1.92003 | 99
  • Target 2 Specific cytosine editing results
    Columns 2-5: A3A.dEGS0293v2.UPP12; columns 6-9: dEGS0293v2
    Base (downstream from PAM) | C>T (total %) | C>T (% of SNPs) | C>G (total %) | C>G (% of SNPs) | C>T (total %) | C>T (% of SNPs) | C>G (total %) | C>G (% of SNPs)
    C 8 | 9.072732 | 99.59171 | 0.016347 | 0.17944 | 0.007903 | 14.16667 | 0.032078 | 57.5
    C 11 | 8.431651 | 98.67746 | 0.0552 | 0.64602 | 0.013017 | 9.210526 | 0.087866 | 62.17105
    C 14 | 6.545605 | 87.06983 | 0.951433 | 12.65599 | 0.007903 | 7.589286 | 0.075314 | 72.32143
    C 15 | 7.495143 | 98.22715 | 0.100687 | 1.319548 | 0.010228 | 5.37
  • the construct With a mutated catalytic domain, the construct will not be able to cleave the dsDNA but will still bind to the dsDNA in the presence of a targeting sgRNA. While bound to the DNA, the fused activator domains will recruit expression proteins to the same location and increase expression of the targeted gene of interest.
  • CRISPR inhibition
  • the catalytically dead dEGS0293v2_D289A_D486A is codon optimized for human expression and cloned into expression cassettes with an Nterm FLAGtag and SV40 NLS and a Cterm Nucleoplasmin NLS and a KRAB repression domain under control of a hUbC promoter for mammalian expression.
  • a vector capable of expressing a single sgRNA under the control of a human RNA polymerase III U6 promoter is produced and introduced into an expression vector containing GFP under control of a CMV promoter. Guides are designed to target regions upstream of selected genes with the appropriate PAM for the system, thereby directing the fusion protein to the targeted gene.
  • the constructs described are introduced into mammalian cells.
  • K562 cells (Sigma) are combined with SF Nucleofector solution (Lonza) and 600 ng of an RGN expression plasmid plus 500 ng of a single crRNA expression plasmid, nucleofected at FF120 per the manufacturer’s protocol (Lonza), and plated in duplicate 96-well dishes in Dulbecco’s modified Eagle medium (RPMI) plus 10% (vol/vol) fetal bovine serum (Gibco) and 1% Penicillin-Streptomycin (Gibco). After 72 hours of growth, cells are washed and harvested in PBS+1% BSA and stained with APC-labeled target specific antibody (BioLegend).
  • Genome Wide protein coding knock out libraries are designed by first identifying all possible PAM matches on both DNA strands inferred to cut within a protein-coding exon.
  • crRNAs corresponding to all PAM matches are inferred.
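  • The first design step, finding PAM matches on both DNA strands and inferring the adjacent spacers, can be sketched as follows. The PAM pattern `TTTN`, its placement 5' of the protospacer, the 20 nt spacer length, and the function names are assumptions made for illustration only. The actual PAM and spacer length for a given RGN are those determined experimentally, and restricting sites to protein-coding exons is left out of this sketch.

```python
import re

# Subset of IUPAC nucleotide codes, expressed as regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]",
    "R": "[AG]", "Y": "[CT]", "W": "[AT]", "S": "[CG]",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def find_target_sites(seq, pam="TTTN", spacer_len=20):
    """Return (strand, pam_position, spacer) for every PAM match on both strands.

    pam_position is the forward-strand index of the PAM's 5'-most base.
    """
    # A lookahead lets overlapping PAM matches be reported.
    pam_regex = re.compile("(?=(" + "".join(IUPAC[b] for b in pam) + "))")
    sites = []
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for match in pam_regex.finditer(strand_seq):
            spacer_start = match.start() + len(pam)
            spacer = strand_seq[spacer_start:spacer_start + spacer_len]
            if len(spacer) < spacer_len:
                continue  # not enough sequence left for a full spacer
            if strand == "+":
                position = match.start()
            else:
                position = len(seq) - 1 - match.start()
            sites.append((strand, position, spacer))
    return sites
```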
  • the specificity of all gRNAs is computed using GuideScan2 (Schmidt et al. 2022, bioRxiv, https://doi.org/10.1101/2022.05.02.490368), based on all potential off-targets up to 5 mismatches (with a maximum of 1,000 off-targets considered for each crRNA) and where each off-target site is weighted for probability of cutting based on a cutting frequency determination matrix.
  • the cutting frequency determination matrix is based on either an approximation, given knowledge of related RGN(s), or an empirical estimation computed from systematic variation and testing of known, high efficiency crRNAs.
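  • The text does not state how the weighted off-targets are combined into a single specificity value. A minimal sketch, assuming the widely used aggregate 1 / (1 + sum of off-target cutting weights) and assuming that the cap of 1,000 simply keeps the first 1,000 off-targets supplied, is shown below. The function name and parameters are hypothetical and do not represent the GuideScan2 interface.

```python
def specificity_score(offtarget_weights, max_offtargets=1000):
    """Aggregate per-off-target cutting weights (each in [0, 1]) into one score.

    Returns 1.0 when there are no off-targets and approaches 0 as the
    summed off-target cutting probability grows.
    """
    considered = list(offtarget_weights)[:max_offtargets]
    return 1.0 / (1.0 + sum(considered))
```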
  • Efficiency of all crRNAs is computed using either an in-house machine learning model of efficiency trained on empirical in-house data or using established models from related RGN(s). Accessibility of each crRNA target site is computed as the proportion of unique cellular contexts in ENCODE for which the crRNA target site falls wholly within a DNase-seq narrow or broad peak as called using the standardized ENCODE pipeline. crRNAs that target exons shared across transcripts are preferable to those that target rarely utilized alternative exons.
  • crRNA target sites are scored for ‘inclusion-by-expression’, which is an estimate of the inclusion of an exon across transcripts and is estimated by exon-level expression across diverse RNA-seq data sources and normalized within-gene and across the length of the gene to control for the 3’-bias of RNA-seq. crRNAs that target exons earlier in genes are preferable because indels at these locations are more likely to induce nonsense-mediated decay than those in later exons. All crRNA target sites are scored for exon-level ‘exon priority’, which is computed as a nonlinear function of order of the targeted exon and the total number of exons in the gene model.
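  • The accessibility measure described above, the proportion of cellular contexts in which the target site lies wholly inside a DNase-seq peak, can be sketched as follows. The function name, the half-open coordinate convention, and the dictionary of per-context peak intervals are assumptions for illustration. Parsing of ENCODE peak files is not shown.

```python
def accessibility(site_start, site_end, peaks_by_context):
    """Fraction of contexts with a peak that wholly contains the target site.

    site_start, site_end: half-open interval [start, end) of the target site.
    peaks_by_context: dict mapping a context name to a list of
        (peak_start, peak_end) intervals on the same chromosome.
    """
    if not peaks_by_context:
        return 0.0
    accessible_contexts = sum(
        any(peak_start <= site_start and site_end <= peak_end
            for peak_start, peak_end in peaks)
        for peaks in peaks_by_context.values()
    )
    return accessible_contexts / len(peaks_by_context)
```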
  • CRISPRi Genome Wide protein-coding inhibition
  • CRISPRa Genome Wide protein-coding activation
  • TSS annotated protein-coding transcription start site
  • The specificity of all crRNAs is computed using GuideScan2 (Schmidt et al. 2022), based on all potential off-targets with up to 5 mismatches (with a maximum of 1,000 off-targets considered for each crRNA), where each off-target site is weighted for probability of cutting based on a cutting frequency determination matrix.
  • The cutting frequency determination matrix is based on either an approximation, given knowledge of related RGN(s), or an empirical estimation computed from systematic variation and testing of known, high-efficiency crRNAs.
  • Efficiency of all crRNAs is computed using an in-house machine learning model of efficiency trained on empirical in-house data or using established models from related RGN(s).
  • Accessibility of each crRNA target site is computed as the proportion of unique cellular contexts in ENCODE for which the crRNA target site falls wholly within a DNase-seq narrow or broad peak, as called using the standardized ENCODE pipeline.
  • All crRNA target sites are assigned an activation- or inhibition-specific distance weight, computed by a nonlinear function of the primary genomic sequence distance from the cut site to the TSS. For CRISPRa the function peaks 50 bp upstream of the TSS, and for CRISPRi it peaks 50 bp downstream of the TSS; in both cases it decays in both directions.
  • All crRNAs per gene are ranked by a linear combination of specificity, efficiency, accessibility, and distance weight.
  • The top N crRNAs are iteratively selected per gene such that the midpoints of no two selected crRNA target sites are less than 30 bp apart.
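The distance weighting, ranking, and spaced selection steps above can be sketched as follows. The source specifies only that the distance function is nonlinear with a peak 50 bp upstream (CRISPRa) or downstream (CRISPRi) of the TSS; the Gaussian shape, its width, the equal weights in the linear combination, and all names here are assumptions chosen for illustration.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Guide:
    name: str
    midpoint: int         # genomic midpoint of the target site (bp)
    cut_offset: int       # cut site relative to the TSS; negative = upstream
    specificity: float
    efficiency: float
    accessibility: float


def distance_weight(cut_offset: int, mode: str, width: float = 100.0) -> float:
    """Gaussian-shaped weight (an assumed form) peaking at -50 or +50 bp."""
    if mode not in ("CRISPRa", "CRISPRi"):
        raise ValueError("mode must be 'CRISPRa' or 'CRISPRi'")
    peak = -50 if mode == "CRISPRa" else 50
    return math.exp(-((cut_offset - peak) ** 2) / (2 * width ** 2))


def score(g: Guide, mode: str, w: Sequence[float] = (1.0, 1.0, 1.0, 1.0)) -> float:
    """Linear combination of the four per-guide scores (weights are assumed)."""
    return (w[0] * g.specificity + w[1] * g.efficiency
            + w[2] * g.accessibility + w[3] * distance_weight(g.cut_offset, mode))


def select_top(guides: List[Guide], mode: str, n: int, min_gap: int = 30) -> List[Guide]:
    """Greedily take the best-scoring guides whose midpoints are >= min_gap apart."""
    chosen: List[Guide] = []
    for g in sorted(guides, key=lambda x: score(x, mode), reverse=True):
        if len(chosen) >= n:
            break
        if all(abs(g.midpoint - c.midpoint) >= min_gap for c in chosen):
            chosen.append(g)
    return chosen
```

Greedy selection in rank order is one simple way to satisfy the 30 bp spacing constraint; it keeps the best guide in each neighbourhood and skips lower-ranked guides that overlap it.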
  • Separate sgRNA libraries are created for gene knockout, gene activation, and gene repression. These sequences are synthesized in parallel on arrays, cloned as a pool into a lentiviral transfer plasmid, and packaged into lentivirus for pooled delivery to the cell type of interest.
  • The relevant protein construct for the appropriate screen (such as catalytically active for gene knockout, dEGS0293-VPR for CRISPRa, and dEGS0293-KRAB for CRISPRi) is codon-optimized for human expression and cloned into expression cassettes with an N-terminal FLAG tag and SV40 NLS and a C-terminal nucleoplasmin NLS, with a P2A cleavable linker and a selectable marker, under control of a hUbC promoter for mammalian expression in a plasmid capable of being packaged into a lentivirus. The constructs are then packaged into a lentivirus for delivery to the cell type of interest.
  • Cell types are then transduced with a lentivirus to stably express the effector protein.
  • The pooled sgRNA libraries are then transduced to stably express the targeted sgRNA.
  • Cells with differential target gene expression are isolated by fluorescence-activated cell sorting, and the responsible sgRNAs are recovered from the lentiviral vector in the genomic DNA by PCR. These recovered libraries are then subjected to high-throughput next-generation DNA sequencing to identify the genomic sequences associated with altering the expression of each gene.
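As a rough illustration of how sequencing reads from the recovered libraries might be turned into per-sgRNA enrichment values, the sketch below counts reads that exactly match a library spacer and compares a sorted population with a reference population. The source does not describe the analysis method; the exact-match counting, the use of an unsorted reference sample, the pseudocount, and all names are assumptions.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Mapping


def count_guides(reads: Iterable[str], library: Mapping[str, str]) -> Counter:
    """Count reads whose sequence exactly matches a spacer in the library.

    `library` maps spacer sequence -> sgRNA identifier (hypothetical layout).
    Reads that match no spacer are ignored.
    """
    counts: Counter = Counter()
    for read in reads:
        guide = library.get(read)
        if guide is not None:
            counts[guide] += 1
    return counts


def log2_enrichment(sorted_counts: Mapping[str, int],
                    reference_counts: Mapping[str, int],
                    pseudocount: float = 1.0) -> Dict[str, float]:
    """log2 ratio of pseudocount-smoothed read fractions, sorted vs reference."""
    guides = set(sorted_counts) | set(reference_counts)
    total_s = sum(sorted_counts.values()) + pseudocount * len(guides)
    total_r = sum(reference_counts.values()) + pseudocount * len(guides)
    result = {}
    for g in guides:
        frac_s = (sorted_counts.get(g, 0) + pseudocount) / total_s
        frac_r = (reference_counts.get(g, 0) + pseudocount) / total_r
        result[g] = math.log2(frac_s / frac_r)
    return result
```

Positive values mark sgRNAs over-represented in the sorted cells relative to the reference; a real analysis would also need replicate handling and statistical testing, which this sketch omits.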

Landscapes

  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Genetics & Genomics (AREA)
  • Chemical & Material Sciences (AREA)
  • Organic Chemistry (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Wood Science & Technology (AREA)
  • Zoology (AREA)
  • Biotechnology (AREA)
  • General Engineering & Computer Science (AREA)
  • Microbiology (AREA)
  • Biochemistry (AREA)
  • General Health & Medical Sciences (AREA)
  • Medicinal Chemistry (AREA)
  • Physics & Mathematics (AREA)
  • Biophysics (AREA)
  • Plant Pathology (AREA)
  • Enzymes And Modification Thereof (AREA)
  • Pharmaceuticals Containing Other Organic And Inorganic Compounds (AREA)

Abstract

The present invention relates to a novel nucleic acid targeting system comprising RNA-guided nuclease proteins for cleaving and/or modifying the target nucleotide of interest.
PCT/EP2024/068177 2023-06-29 2024-06-27 Novel nucleic acid targeting systems comprising RNA-guided nucleases Ceased WO2025003358A2 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202363510918P 2023-06-29 2023-06-29
US63/510,918 2023-06-29

Publications (2)

Publication Number Publication Date
WO2025003358A2 true WO2025003358A2 (fr) 2025-01-02
WO2025003358A3 WO2025003358A3 (fr) 2025-02-20

Family

ID=91782447

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2024/068177 Ceased WO2025003358A2 (fr) 2023-06-29 2024-06-27 Novel nucleic acid targeting systems comprising RNA-guided nucleases

Country Status (1)

Country Link
WO (1) WO2025003358A2 (fr)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4853331A (en) 1985-08-16 1989-08-01 Mycogen Corporation Cloning and expression of Bacillus thuringiensis toxin gene toxic to beetles of the order Coleoptera
US5039523A (en) 1988-10-27 1991-08-13 Mycogen Corporation Novel Bacillus thuringiensis isolate denoted B.t. PS81F, active against lepidopteran pests, and a gene encoding a lepidopteran-active toxin
US5049386A (en) 1985-01-07 1991-09-17 Syntex (U.S.A.) Inc. N-ω,(ω-1)-dialkyloxy)- and N-(ω,(ω-1)-dialkenyloxy)Alk-1-YL-N,N,N-tetrasubstituted ammonium lipids and uses therefor
US5173414A (en) 1990-10-30 1992-12-22 Applied Immune Sciences, Inc. Production of recombinant adeno-associated virus vectors
WO1993024640A2 (fr) 1992-06-04 1993-12-09 The Regents Of The University Of California Methods and compositions for in vivo gene therapy
US20030087817A1 (en) 1999-01-12 2003-05-08 Sangamo Biosciences, Inc. Regulation of endogenous gene expression in cells using zinc finger proteins
US9790490B2 (en) 2015-06-18 2017-10-17 The Broad Institute Inc. CRISPR enzymes and systems
WO2018027078A1 (fr) 2016-08-03 2018-02-08 President And Fellows Of Harvard College Adenosine nucleobase editors and uses thereof
WO2018035250A1 (fr) 2016-08-17 2018-02-22 The Broad Institute, Inc. Methods for identifying class 2 CRISPR-Cas systems

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021178933A2 (fr) * 2020-03-06 2021-09-10 Metagenomi Ip Technologies, Llc Class II, type V CRISPR systems
JP2023508731A (ja) * 2019-12-30 2023-03-03 ライフエディット セラピューティクス,インコーポレイティド RNA-guided nucleases, active fragments and variants thereof, and methods of use
EP4349979A4 (fr) * 2021-05-27 2024-08-21 Institute Of Zoology, Chinese Academy Of Sciences Engineered Cas12i nuclease, effector protein and use thereof
CN116200368A (zh) * 2021-11-30 2023-06-02 上海科技大学 Novel genome editing system based on C2c9 nuclease and use thereof

Non-Patent Citations (39)

* Cited by examiner, † Cited by third party
Title
"Nucleases", 1993, COLD SPRING HARBOR LABORATORY PRESS
ABUDAYYEH ET AL., SCIENCE, vol. 353, 2016, pages aaf5573
AUSUBEL ET AL.: "Current Protocols in Molecular Biology", 2003, JOHN WILEY & SONS
BUCHSCHER ET AL., J. VIROL, vol. 66, 1992, pages 1635 - 1640
CARRIE & SMALL, BIOCHIM BIOPHYS ACTA, vol. 1833, 2013, pages 253 - 259
CLEMENT K ET AL., NAT BIOTECHNOL., vol. 37, no. 3, March 2019 (2019-03-01), pages 224 - 226
GAUDELLI ET AL., NATURE, vol. 551, 2017, pages 464 - 471
GRUBER ET AL., CELL, vol. 106, no. 1, 2008, pages 23 - 24
GUSCHIN ET AL., METHODS MOL BIOL, vol. 649, 2010, pages 247 - 256
HARRINGTON ET AL., MOLECULAR CELL, vol. 79, 2020, pages 416 - 424
HARRINGTON ET AL., SCIENCE, vol. 360, no. 6387, 2018, pages 436 - 439
HERRMANN & NEUPERT, IUBMB LIFE, vol. 55, 2003, pages 219 - 225
JINEK M. ET AL.: "A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity", SCIENCE, vol. 337, 2012, pages 816 - 821, XP055229606, DOI: 10.1126/science.1225829
JUMPER ET AL., NATURE, vol. 596, 2021, pages 583 - 589
KAMINSKI ET AL., NAT BIOMED ENG, vol. 5, 2021, pages 643 - 656
KARVELIS ET AL., GENOME BIOL, vol. 16, 2015, pages 253
KARVELIS ET AL., NUCLEIC ACIDS RESEARCH, vol. 48, no. 9, 2020, pages 5016 - 5023
KLEINSTIVER ET AL., NAT BIOTECHNOL, vol. 37, 2019, pages 276 - 282
KLEINSTIVER ET AL., NATURE, vol. 523, 2015, pages 481 - 485
LANGE ET AL., J. BIOL. CHEM, vol. 282, 2007, pages 5101 - 5105
MAKAROVA ET AL., NATURE REVIEWS MICROBIOLOGY, vol. 13, 2015, pages 1 - 15
MILLER, VIROL, vol. 65, pages 2220 - 2224
MILLETTI F, DRUG DISCOV TODAY, vol. 17, 2012, pages 850 - 860
NASSOURY & MORSE, BIOCHIM BIOPHYS ACTA, vol. 1743, 2005, pages 5 - 19
NISHIMASU ET AL., CELL, 2014
PINELLO ET AL., NATURE BIOTECH, vol. 34, 2016, pages 695 - 697
RAY ET AL., BIOCONJUG CHEM, vol. 26, no. 6, 2015, pages 1004 - 7
RUSSEL ET AL., THE CRISPR JOURNAL, vol. 3, no. 6, 2020, pages 462 - 469
SAMBROOK & RUSSELL: "Molecular Cloning: A Laboratory Manual", 2001, COLD SPRING HARBOR PRESS
SCHMIDT ET AL., BIORXIV, 2022, Retrieved from the Internet <URL:https://doi.org/10.1101/2022.05.02.490368>
SOLL, CURR OPIN PLANT BIOL, vol. 5, 2002, pages 529 - 535
SOMMNERFELT ET AL., VIROL, vol. 176, 1990, pages 58 - 59
STEMMER, PROC. NATL. ACAD. SCI. USA, 1994
SWARTS ET AL., MOL. CELL, vol. 66, 2017, pages 221 - 233
WILSON ET AL., J. VIROL, vol. 63, 1989, pages 2374 - 2378
YAMANO ET AL., CELL, vol. 165, no. 4, 2016, pages 949 - 962
ZETSCHE ET AL., CELL, vol. 163, 2015, pages 759 - 771
ZHANG ET AL., CHEM. SCI, vol. 7, 2016, pages 4951 - 4957
ZUKER & STIEGLER, NUCLEIC ACIDS RES, vol. 9, 1981, pages 133 - 148

Also Published As

Publication number Publication date
WO2025003358A3 (fr) 2025-02-20

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 24737915

Country of ref document: EP

Kind code of ref document: A2

NENP Non-entry into the national phase

Ref country code: DE